THE DEVELOPMENT OF THE TOOLS TO IMPLEMENT EVOLUTIONARY OPERATION AS AN OPERATIONS STRATEGY IN WASTEWATER TREATMENT PLANTS

by

PATRICK F. COLEMAN
B.A.Sc. (Civil Engineering), University of British Columbia, 1982

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in the FACULTY OF GRADUATE STUDIES, CIVIL ENGINEERING

We accept this thesis as conforming to the required standard.

THE UNIVERSITY OF BRITISH COLUMBIA
December 1991
© Patrick F. Coleman, 1991

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Civil Engineering
The University of British Columbia
Vancouver, Canada
Date: December 21, 1991

Abstract

A technical operator of a wastewater treatment plant responds to changes in performance, while a fundamentalist operator causes the performance to change. A good operator is both, responding to maintain control and acting to optimize performance. Most operators are technicians, afraid to experiment on their systems. This state of affairs exists for three reasons. First, management and regulatory authorities accept consistent suboptimal performance because they do not know what the plant is capable of. Second, many treatment plants are inflexible and poorly designed because the designers have no way to evaluate their work. Third, the operator does not understand what is going on in his process because the data are unreliable, insufficient, incompatible or unavailable. Computer-based solutions to performance limitations have had mixed success because researchers have ignored this situation. The goal of this research is to provide a new way to look at treatment plant data that will free the operator to be both a technician and a fundamentalist. This new view is based on three paradigms. The Structure paradigm uncouples the structure of the system from the task of reasoning about the system. The Measurement paradigm maps all data collected on the system, qualitative and quantitative, into a single space so that it can be analyzed as a single unit. The Operation paradigm enables the operator to determine the effect of his actions on his process. A computer program that incorporates this approach will enable the operator to learn about his process while operating his plant.
Table of Contents

Abstract
List of Tables
List of Figures
Acknowledgement

1 Introduction
  1.1 Objectives
  1.2 Summary
    1.2.1 Technician Or Fundamentalist?
    1.2.2 Technician By Necessity
    1.2.3 Fundamentalist By Design
    1.2.4 Information: The Basis For Fundamental Operation
    1.2.5 Computers - Information Assistant
    1.2.6 From Concept To Code: Objectives
    1.2.7 Paradigms - A Frame Of Mind
    1.2.8 Structure Paradigm
    1.2.9 Measurement Paradigm
    1.2.10 Operation Paradigm
    1.2.11 Technician or Fundamentalist - The Operator Decides
  1.3 Layout
2 PLF's and Computer-Based Solutions
  2.1 Sources Of Information
    2.1.1 CPE/CCP
    2.1.2 Audits
    2.1.3 Plant Experimentation
    2.1.4 Examination Of Historical Records
    2.1.5 Summary: Quality and Information
    2.1.6 Results
    2.1.7 Summary
    2.1.8 Equipping The Operator To Improve Performance
  2.2 Data Analysis
    2.2.1 Initial Data Analysis
    2.2.2 Data Processing
    2.2.3 Data Quality
    2.2.4 Summary Statistics
    2.2.5 Relationships
    2.2.6 Statistical Graphs
    2.2.7 Advanced Statistical Methods
  2.3 Modeling
    2.3.1 Identifiability
  2.4 Expert Systems
    2.4.1 Integration: Access To Other Data
    2.4.2 Expert: Is There One?
    2.4.3 Rule Conflict: Wait and Short Term Gain
    2.4.4 Are Expert Systems Needed?
    2.4.5 Future: The Role Of Expert System Technology
  2.5 Automatic Control
  2.6 Summary
3 Cause and Effect
  3.1 Temporal Reasoning
    3.1.1 Introduction
    3.1.2 Problems and Solutions
  3.2 Cause And Effect
    3.2.1 Association Versus Causation
    3.2.2 Effect To Cause: The Poor Cousin
    3.2.3 Effect To Cause - Treatment Plants
    3.2.4 Cause To Effect - Fundamental Problem Of Causal Inference
    3.2.5 Treatment Plants and Time Series Experiments
  3.3 Synthesis
  3.4 Summary: Control Cycle
4 Measurement Process
  4.1 Measurement
    4.1.1 Measurement Theory
    4.1.2 Operational Measures
  4.2 Preservation
  4.3 Sampling
    4.3.1 Sampling Protocol
    4.3.2 Sampling Plan
    4.3.3 Sampling Viewpoints
    4.3.4 References
  4.4 Quality Assurance/Quality Control
    4.4.1 Concern For Quality
    4.4.2 References
    4.4.3 Data Quality
  4.5 Measurement Model
    4.5.1 Model Components
    4.5.2 Preparation Errors
    4.5.3 Error Correction
  4.6 Good Data, Good Decisions
5 Fuzzy Sets, Logic and Reasoning
  5.1 Literature
  5.2 What Is Fuzziness?
  5.3 Notion Of Sets
    5.3.1 Notion Of Crisp Sets
    5.3.2 Notion Of Fuzzy Sets
  5.4 Notion Of Variables
  5.5 Notion Of Fuzzy Controllers
  5.6 Conclusions
6 Structure Paradigm
  6.1 Database Management Systems
  6.2 A Simple Plant
  6.3 Overview Of The Structure Paradigm
    6.3.1 Nodes
    6.3.2 Links
    6.3.3 Mapping
    6.3.4 Class
    6.3.5 Structural Relationships: Path, Loop and Network
    6.3.6 Level and Plane
    6.3.7 Stream and Currency
  6.4 Structure Of Planes
    6.4.1 Structural Plane
    6.4.2 Monitoring, Diagnostic and Capacity
    6.4.3 Quality Assurance Plane
    6.4.4 Derived Planes
    6.4.5 Reasoning Planes
  6.5 An Example: Construction Of The Structural Skeleton
  6.6 Algorithms
    6.6.1 Phase 1: Parse In Node Information From A Text File
    6.6.2 Phase 2: Construct Hierarchy
    6.6.3 Phase 3: Parse In Link Information
    6.6.4 Phase 4: Construct Link Sets
    6.6.5 Phase 5: Build Interplanar Links
    6.6.6 Phase 6: Build Sample and Measurement Sets
    6.6.7 Phase 7: Loop Detection
    6.6.8 Phases 8-11: Preparation For Plotting
    6.6.9 Phases 12-15: Object and Graphic Space
    6.6.10 Validation: Draw A Plane
  6.7 Conclusion
7 Measurement Paradigm
  7.1 Declaration Space: Origin Of A Datum
    7.1.1 Parameter Context
    7.1.2 Sample Context
    7.1.3 Measurement Context
  7.2 Data Space: Derivation Of A Datum
    7.2.1 Quality
    7.2.2 Preference
  7.3 Data Space: Internal Representation Of A Quad
    7.3.1 Manipulated Parameter: A Special Case
  7.4 Primary Mapping: Mapping A Datum Into The Data Space
    7.4.1 Mapping: {Crisp, {Ratio, Interval, Ordinal}}
    7.4.2 Mapping: {Crisp, Nominal}
    7.4.3 Mapping: {Mean/Standard Deviation, {Ratio or Interval}}
    7.4.4 Mapping: {Fuzzy Number, {Ratio or Interval}}
    7.4.5 Mapping: {Linguistic Variable, Any Scale}
  7.5 Viewpoint
  7.6 Secondary Mapping: Quad To Series
  7.7 Tertiary Mapping: Grouping Data Series
  7.8 Summary
8 Operation Paradigm
  8.1 What Is Status?
  8.2 What Is Change?
    8.2.1 Level
    8.2.2 Warn-Alarm
    8.2.3 Limits
    8.2.4 Trend
    8.2.5 Frequency
  8.3 Postdiction: Why Was A Manipulated Variable Changed?
    8.3.1 Reason #1: Anticipate Disturbance
    8.3.2 Reason #2: Anticipate Performance
    8.3.3 Reason #3: Optimize Performance
    8.3.4 Reason #4: Compensate Disturbance
    8.3.5 Reason #5: Compensate Manipulated
    8.3.6 Reason #6: Counteract Performance
    8.3.7 Reason #7: Non-Operational Change
    8.3.8 Reason #8: Catastrophic Intervention
  8.4 Prediction: How To Change A Manipulated Variable
  8.5 Conclusions
9 Synthesis
  9.1 The Relationship Between The Program and The Operator
  9.2 Construction Of A Simple Example
  9.3 Information Generating System
    9.3.1 Treatment Process Module
    9.3.2 Information Gathering Process Module
    9.3.3 Adjustable Parameters
    9.3.4 Simulation Scenario
    9.3.5 Detection Of Change
    9.3.6 Detection and Response To Change
  9.4 Simulation: Coping With A Bulking Sludge
    9.4.1 March 17, 1991
    9.4.2 March 27, 1991
    9.4.3 April 22, 1991
    9.4.4 May 20, 1991
    9.4.5 Postscript
10 Conclusion: From Box's EVOP to Evolutionary Operation
Appendices
A Abbreviations And Copyrights
B Glossary
  B.1 Introduction
  B.2 Items
C Uncertainty In The MCRT and F/M Ratio
  C.1 Goal
  C.2 Result
  C.3 Derivation
D Sludge Quality
  D.1 Plant Observations
    D.1.1 Foam On Surface Of Aeration Basin
    D.1.2 Secondary Clarifier
  D.2 Microscopic Examination
    D.2.1 Morphology
    D.2.2 Size
    D.2.3 Composition
    D.2.4 Protozoa
  D.3 30 Minute Settling Test
    D.3.1 Floc Formation
    D.3.2 Blanket Formation
    D.3.3 Settling Velocity
  D.4 Laboratory Measurements
    D.4.1 Analytical Measurements
    D.4.2 Derived Measures
E Structure Paradigm Example
  E.1 Database Schema
Bibliography

List of Tables

2.1 Performance Limiting Factors As Identified By The CPE/CCP Program
2.2 DMR QA Study 2 - Average Failure Rate
2.3 Mean, Median, Mode - Common Location Estimators
2.4 Common Width Estimators - Single Values
2.5 Skewness and Kurtosis - Order Statistics
2.6 Correlation Example: Correlation Analysis and Outlier Detection
2.7 Holmberg's Batch Reactor Equations
3.1 Statistical Solution - Effect Archetypes
3.2 Time Series Experiment - Cause Archetypes
3.3 Time Series Experiment - Analysis Of Variance
3.4 Mixed Culture/Mixed Substrate Interactions
4.1 Measurement Scales
4.2 Appropriate Statistical Operations
4.3 Ratio And Interval Temperature Scales
4.4 Effectiveness Of ATU As Effluent BOD5 Sample Nitrification Inhibitor
4.5 Outline Of A Sampling Protocol
4.6 Green's Sampling Design Principles
4.7 Sample Selection Methodology
4.8 DMR QA/PESP Effectiveness
4.9 BOD and Suspended Solids Quality Assurance Results: % Unacceptable
4.10 Potassium Determination In An Agricultural Laboratory
4.11 Sample Plan and Type Selection Based On The Quality Fluctuation Error
5.1 Properties Of Crisp Set Operations
6.1 Portion Of An Index
6.2 Phase 1: Text File Syntax
6.3 Phase 2: Section Numbers and Node Relationships
6.4 Phase 3: Text File Syntax
7.1 Measure Paradigm: Effluent COD Example
7.2 Linguistic Variable: Clarifier Condition Is Poor
7.3 Calculate Exploratory Statistics
7.4 Time Series Viewpoint: Display Decision Table
7.5 Differenced Time Series Viewpoint: Series Manipulation Table
7.6 Measure Paradigm: Effluent COD Summary
8.1 Forms Of Change
8.2 Why Change A Manipulated Variable?
8.3 Rule For Reason #1: State Of Anticipated Disturbance
8.4 Rule For Reason #1: State Of Common Performance
8.5 Rule For Reason #2: State Of Anticipated Performance
8.6 Rule For Reason #3: State Of Performance
8.7 Rule For Reason #4: State Of Initiating Disturbance
8.8 Rules For Reason #4 - State Of Common Performance Variables
8.9 Rule For Reason #6: State Of Common Performance
9.1 Observed Measurements
A.1 Copyrights
A.2 List Of Abbreviations
D.1 Plant Observations - Aeration Basin Foam
D.2 Plant Observations - Secondary Clarifier
D.3 Microscopic Examination - Floc Morphology
D.4 Microscopic Examination - Floc Size
D.5 Microscopic Examination - Floc Composition
D.6 Microscopic Examination - Protozoa
D.7 30 Minute Settling Test - Floc Formation
D.8 30 Minute Settling Test - Blanket Formation
D.9 30 Minute Settling Test - Settling Velocity
D.10 Laboratory Measurements - 1982 DMR-QA Results

List of Figures

1.1 From Concept To Code: Program Development
1.2 Datum As Information
1.3 Elements Of Information: Datum, Series and Relationship
2.1 Statistical Sample Size: {Small, Medium, Large}
2.2 Correlation Example: Filament Length vs SVI
2.3 Anscombe's Quartet
2.4 Ljung's System Identification Loop
3.1 Scale-up: Volume to Surface Area Ratio Contours
3.2 Hysteresis
3.3 Scientific Solution
3.4 Statistical Solution
4.1 Samples Taken On Linsley Pond
4.2 Gy's Sampling Error Model
4.3 Example Audit Procedure
5.1 Butterfly Cluster
5.2 Butterfly Cluster: Fuzzy Clustering
5.3 Representations Of Uncertainty
6.1 Book Index Schema
6.2 Simple Plant
6.3 Outline Of Plant
6.4 Simple Plant: Bird's Eye View Of A Hierarchical Network
6.5 LP: Treatment Plant Plane
6.6 LP.3: Primary Treatment Plane
6.7 LP.i: Secondary Treatment Plant
6.8 LP.i.3: Bioreactor
6.9 Overlap Of Classes Of Information
6.10 Link
6.11 Influence Map
6.12 PMS Plane Owned By Structural Node LP.1 Influent
6.13 LP..3: Bioreactor
6.14 LP.1 Influent: QA/QC Plane For Composite Sample and COD Measure
6.15 Graphic Space
6.16 Object Space
7.1 Measure Paradigm: Parameter Context
7.2 Model Context: An Example
7.3 Measurement Paradigm: Sample Context
7.4 Measurement Paradigm: Measurement Context
7.5 Primary Mapping: Datum To Data Space
7.6 Quality Distribution For COD Test
7.7 Preference Distribution For COD Test
7.8 Measure Paradigm: Effluent COD Example
7.9 Secondary Mapping
7.10 Tertiary Mapping
8.1 Derived Time Series: Level
8.2 Derived Time Series: Stability
8.3 Simple Direction Algorithm
8.4 Change In Level
8.5 Change In Trend
8.6 Change In Frequency
8.7 Why A Manipulated Parameter Is Changed
8.8 Stability And Control Actions
8.9 Causal and Noncausal Change
8.10 Reason #1: Anticipate A Change In A Disturbance Parameter
8.11 Reason #2: Anticipate A Change In A Performance Parameter
8.12 Reason #3: Optimize A Performance Parameter
8.13 Reason #4: Respond To A Change In A Disturbance Parameter
8.14 Reason #5: Compensate For A Change In A Manipulated Parameter
8.15 Reason #6: Compensate For A Change In A Performance Parameter
8.16 Reason #7: Correct An Operational Error
9.1 Synthesis: Information Generation and Interpretation
9.2 Simple Example: Program Layout
9.3 Simulated Plant's Flow Sheet
9.4 Simulation Scenario: True SVI Value
9.5 Control Cycle: Wastewater Treatment Plant
9.6 Daily Trend
9.7 Stability
9.8 Weekly Trend: Disturbance, Performance and Adjustable Parameters
9.9 Weekly Trend: Status Parameters
9.10 Reason #2: Anticipate A Change In A Performance Parameter
9.11 Uncontrollable System
9.12 System Recovery
9.13 Effluent COD
9.14 Wastage Rate Control
B.1 Simple Box and Whisker Plot
B.2 Age - Stem and Leaf Plot
C.1 95% Confidence Intervals For MCRT
C.2 MCRT Range Of Insignificance
C.3 95% Confidence Intervals For F/M Ratio
C.4 F/M Ratio Range Of Insignificance

Acknowledgement

I owe a deep debt of gratitude to the many people who assisted in the completion of this work. My greatest debt is to my supervisor, Dr. W. K. Oldham, who was extremely patient and supportive through my many mistakes, wanderings and false starts. I am also grateful for the assistance I received from the members of my committee, my fellow graduate students, computing services and interlibrary loan staff. I am especially grateful for the support given to me by Fred Koch, Susan Liptak, Dave Wareham, Timo Pera and Kerry Seale.

The concepts in this thesis began with my stay at the Penticton Pollution Control Center. I am grateful to the operator, Bernie Udala, and the staff for taking the time to teach me how to operate a plant. I also acknowledge the financial support provided by the Natural Sciences and Engineering Research Council of Canada.

Without the support of friends, family, church and, most important, my wife Myrna, this thesis could never have been completed. This thesis is dedicated to my father, who taught me that a university education is a privilege. AMDG

Chapter 1

Introduction

    The man who knows what question to ask is on the verge of understanding; the man who is beginning to understand what he does not know is not far from knowledge.
    — Abba Isaac, fifth century

This chapter consists of three sections: Objectives, Summary and Layout. The purpose of these sections is to introduce the thesis material and provide an overview of the structure of this document.

1.1 Objectives

Successful control of a wastewater treatment plant is less dependent on plant configuration or sophisticated and expensive equipment than on the operator's ability to recognize and influence causal relationships in his plant. For this reason, operators are coming under increasing pressure to improve or maintain their plant's effluent quality despite increasing flows and shrinking budgets. The operator therefore needs a new tool which will assist him in improving and interpreting his monitoring data so that he can improve his plant's performance. The goal of this thesis is to develop the logical paradigms for a computer program which could become such a tool.

Computers are already being used in the wastewater industry, but much of their function is information storage. This accumulation of monitoring data solely as a historical record is a luxury few municipalities can afford. With a reallocation of some of these computing resources to data analysis, however, the benefit accrued from each monitoring dollar would significantly increase. Or, as Lord Rutherford once said, "We haven't the money, so we've got to think."

A new partnership between the operator and the computer must be struck if the operator is to take on a more aggressive role in operating his plant. This new partnership requires new paradigms that guide the operator in his search for optimal performance.
For this reason, this research concentrates on the specification of an appropriate tool rather than the development of a working program. The thesis objectives are defined as follows:

1. Define the Concept of Computer-Assisted EVOP: Explain how expressing process information in a cause/effect framework enables the operator to improve both his information gathering and treatment processes.

2. Design the Logical Structure: Develop a set of paradigms that guide the operator and the computer program in organizing the data to best elucidate causal patterns in the data.

3. Demonstrate the Paradigms' Use: Explain how an operator uses these paradigms to decide how and when to respond to a process change.

Figure 1.1 outlines the evolution of a concept to a computer program [319] [206]. The first thesis objective, concept definition, initiates the design process. The definition's content provides the basis for the development of a logically correct program. The second objective, paradigm development, carries this evolution to level 2: Logical Structure. From then on, the evolution becomes more of an exercise in Computer Science than one of Environmental Engineering.

Implicit in this model is the notion that a de-evolution must take place as well. The goal of top-down planning (levels 1 to 5) is to maintain logical consistency (i.e. avoid "bugs") while the goal of bottom-up planning (levels 5 to 1) is to maintain feasibility (i.e. can the machine do it). In order to ensure that it was feasible to implement these paradigms as part of a computer program, a number of test programs were written as part of this research. These programs form the basis of the examples in Chapters 6 and 9.

[Figure 1.1: From Concept To Code: Program Development — five levels, each with a Data view and an Activities view: (1) Overview (strategic overview of corporate data/functions), (2) Logical Structure (detailed logical data model, logical relationships among processes), (3) Program Structure (program-level view of data, overall program structure), (4) Program Logic (program usage of data, detailed program logic), (5) State Space and Modules (data structure design, function design).]

1.2 Summary

An operator would prefer to influence and control his process, rather than just react to it. The goal of this thesis is to provide the operator with a tool which could assist him in this decision-making process.

1.2.1 Technician Or Fundamentalist?

A good operator acts as a fundamentalist or a technician depending on the situation in his plant at a given time. A fundamentalist is an author of cause (i.e. acts) while a technician is a reader of effects (i.e. reacts). As a technician, an operator assumes that he does not need to know (or cannot determine) the cause of an effect. Instead, he responds to changes in effects by adjusting his plant's manipulated parameters, e.g. feedback control. As a fundamentalist, an operator exploits causal relationships within the plant by changing a manipulated parameter to offset the effect of a change in a disturbance parameter, e.g. feed-forward control.

A Technician: Reader Of Effects

The technical approach is based on three premises:

1. A cause worth worrying about will have a detectable effect on the process.

2. An operator detects change by monitoring changes in an effect's quality, preference and value.

3. An operator can reverse a change in an effect by changing a manipulated parameter, i.e. the process is elastic.

This approach fails when the change in the system makes the process uncontrollable, i.e. the change in the manipulated parameter no longer affects the process.
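The technician's posture is, in essence, feedback control. The sketch below is an illustration only, not code from the thesis: a hypothetical rule that trims one manipulated parameter (the wastage rate) in proportion to the deviation of one observed effect (effluent ammonia) from a target. The parameter names, setpoint and gain are invented.

    # Illustrative feedback rule; parameter names, setpoint and gain are hypothetical.
    def feedback_adjust(wastage_rate_m3d, effluent_nh3_mgl, setpoint_mgl=1.0, gain=0.05):
        """React to an observed effect: trim the manipulated parameter in
        proportion to the deviation of the effect from its target."""
        error = effluent_nh3_mgl - setpoint_mgl
        # Ammonia above target -> waste less sludge, keeping nitrifiers in the system.
        return max(0.0, wastage_rate_m3d * (1.0 - gain * error))

    print(feedback_adjust(50.0, 3.0))   # effect drifted up -> smaller wastage rate
    print(feedback_adjust(50.0, 0.5))   # effect below target -> slightly larger wastage rate

A rule of this kind presumes the premises listed above: the deviation is detectable, and the process still responds to the adjustment.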
A Fundamentalist: Author Of Cause

A fundamentalist operates on a different set of premises:

1. A change in performance is due to either a change in a disturbance parameter or an operator-manipulated parameter, i.e. the process is controllable.

2. A change in a disturbance can be detected or anticipated early enough that the operator can act to offset it, i.e. feed-forward control.

3. The cause and effect relationships in the plant are known, i.e. an operator knows what to change and how to change it.

4. An operator knows when the system reaches a stable state, i.e. a stable state is determinable. (To avoid confusion with the microbiological definition of steady state, the term "stable state" is used to describe the situation when the components of interest in a system stabilize around a constant value.)

This approach fails when the operator's model of the system fails, i.e. the process responds differently than expected.

1.2.2 Technician By Necessity

By necessity, most operators are technicians. There are three reasons for this:

1. Management and regulatory authorities are generally willing to accept consistent suboptimal performance, as long as regulatory demands are met, i.e. they satisfice [168].

2. Wastewater treatment plants often are either inflexibly or poorly designed.

3. The operator does not understand what is going on in his process.

What Will Management Accept?

Operators generally aim for consistent operation rather than optimal performance. The reason management and regulatory authorities settle for this situation is that they do not understand what the treatment plant is capable of doing. There is no easy way for them to "tap" into the plant and monitor its performance. It is very costly in time and resources to collect and examine the information required when one uses an optimizing strategy to arrive at a decision. A consequence of the information age is that most managers suffer from severe time pressures and information overload [265]. In other words, a manager is "so busy manning the fire hose that he cannot devise a fire prevention program" [168]. The tendency is to go with the first solution that works (i.e. satisfices) rather than search for a solution that is optimal. For this reason, management settles for adherence to a numerical standard rather than evidence that the operator is making careful and informed judgements about the plant's operation.

Management needs a method of viewing the treatment plant data that will organize, value and filter information so that a manager is able to understand what is happening at the plant. A manager and an operator should be able to view the same screen (or at least the same data in the same way) and discuss what they see.

How Does One Learn How To Design A Plant?

The one conclusion that can be drawn from the EPA's Comprehensive Performance Evaluation/Composite Correction Program (CPE/CCP, discussed in Section 2.1.1) and numerous other field studies
This is why the CPE/CCP concluded that many plants are over-designed and that design deci sions are often the source of operational problems. It is an onerous task for a designer to return to the plant and sort through the log book, data files and maintenance database in the hope of evaluating his design. Neither the client or the designer have the resources to fund such an evaluation. The cost is high because of the way the data are stored. Most of the cost associated with an evaluation is due to collecting and re-entering the data so that it is in a format that the designer understands and the computer can manipulate. If the data were already in such a format, the operator could send the data to the designer on a disk and the designer could analyze the data at his office. Why Don’t Operators Know What Is Going On? One of the leading Performance Limiting Factors (PLF) identified during the EPA’s CPE/CCP was operator ignorance. This PLF invariably was cited along with staffing and design problems. The EPA felt that this PLF was too ambiguous because it did not identify why the operator appeared ignorant to the evaluator. One explanation was that the evaluator expected the operator to be a fundamentalist while the conditions in the plant forced the operator to be a technician. This situation occurs for three reasons: • Data Quality: The data the operator receives from the laboratory is often not reliable. Due to inadequate equipment, training or just poor quality control, most treatment data is of poor quality [50] [215] [164] [244]. The uncertainty of control variables such as MCRT increase as the data quality deteriorates. As  well, poor  Chapter 1. Introduction  9  data limits statistical analyses to descriptive statistics and increases the data size requirements for inferential statistics. It also renders models non-identifiable on the system and makes automatic control impossible. In other words, “Data of poor quality are a pollutant of clear thinking and rational decision making” [164, p. 870]. • Data Coverage: The operator does not receive enough data to make an informed decision about “what is causing what”. There are three reasons why the data do not “cover” the process: 1. Data is collected hut is either not recorded or recorded but not considered part of the process’s data set. Among this missing data is information on per sonnel, energy usage, sludge management, maintenance, equipment failures and operating costs [237]. Given that personnel, power and chemical expendi tures can account for 80% of a typical plant’s operating costs [89] [253], these expenditures should be analyzed. 2. The operator relies on qualitative observations and intuition because he lacks the resources for on-line instrumentation or additional laboratory work. This information is usually not part of the monitoring data set due to its categorical or linguistic nature, i.e. data apartheid. 3. Data may be collected at different frequencies or not collected at all. This creates a problem because most statistical methods require data pairs, not just data. Therefore, the loss of a single datum can mean that all the data collected at that time is also lost to the analysis. In other words, “The best thing that can be said about missing data is  -  don’t have any!” [264, p. 188].  • Data Apartheid: Data Apartheid occurs when two or more data sets, collected on the same system at the same time, are stored, analyzed and interpreted separately  Chapter 1. Introduction  10  simply because of their form. 
For example, the EPA CPE/CCP identified that plant performance was affected by a number of factors:  —  —  —  —  —  —  —  —  scheduling of preventive maintenance equipment failure staffing levels quality assurance tests on on-]ine samplers and instruments industrial wastes sludge processing streams in-plant pumping cycles abnormal influent events (e.g. storms)  Because of their form, these data are often spread across log books and databases, or, in some cases, not recorded at all. Operators can manage information if the information is in a digitally compatible form. Converting information from one form to another is far more time consuming than analyzing that same information. Once the information is in one place, the operator can cross check to ensure that (1) the information is reliable (i.e. of good quality) and (2) he has collected all the information he needs (i.e. the data covers his process). The need to manage information such that its use results in decisions being made wisely is not unique to wastewater treatment plant operators. Grace Hopper,  throughout her career, has  Rear Adm. Grace Hopper USN Retired, now in her 80’s, is a pioneer in the development of stan 4 dardized application programming languages, including COBOL. She continues to act as consultant to the Digital Equipment Corporation  Chapter 1. Introduction  11  argued that any organization that relies on information must put into place a system that ensures that the required information is collected, that it’s quality is maintained and its value is known [161, p. 170]: We have a raw material that is called data. We feed it into a process. In this case, the process consists of hardware, software, communications, and trained people. Hopefully, the output product is information. Equally hopefully, this process is under some form of control and there’s a feedback ioop from the information to the control to improve the quality of the information. 1.2.3  Fundamentalist By Design  In 1922, Abel Wolman  argued that the wastewater treatment profession would benefit  greatly if treatment plant operators applied the Scientific Method to their routine oper ation and management decisions [313]. Wolman realized that (1) if operators improved the performance of their plants through systematic assessment of the plants’ design and operation and (2) if the design profession had access to these assessments, then both the design and operation of treatment plants would improve. As a technical community, the wastewater industry has been slow to follow Wolman’s advice [228]  6  and the lack of  communication between operators and engineers continues to hurt the industry [30]. Since Wolman’s article was published, advances in statistics have provided a new basis for the application of the Scientific Method to the operation of wastewater treatment plants.  Statistical Sampling Theory provides guidance on how to design an effective  monitoring program while Statistical Quality Control enables the operator to monitor the quality of the data collected by this program. The Statistical Design Of Experiments provides the operator with guidance on how to 5 A bel Wolinan (1893-1989) was one of the founders of the Water Pollution Control Federation. The Engineering Profession as a whole has been slow to adopt the use of statistical methods. 
For 6 example, one conclusion of the Post-Challenger Evaluation was that “NASA is not adequately staffed with specialists and engineers trained in the statistical sciences to aid in the transformation of complex data into information useful to decision makers, and for use in setting standards and goals” [148]  Chapter 1. Introduction  12  operate his plant to both treat the waste and provide information on how to improve both the plant’s operation and design. This theory has been applied to treatment plants in two ways: Plant Experimentation (PLEX) and Evolutionary Operation (EVOP). PLEX involves executing one or more experiments in a limited amount of time to learn more about the process. PLEX usually involves “getting in and getting out”. A plant exper iment is usually carried out when there is additional personnel to closely monitor the process and take corrective action at the slightest  warning  EVOP is less intrusive and more long term [58].  of  an  upset  [143].  EVOP is a sequential form of  experimentation conducted in a treatment plant during normal operation. The principal theses of EVOP are that knowledge to improve the process should be obtained along with a product and that designed experiments using relatively small shifts in factor levels can yield this knowledge at minimum cost. The range of variation of the factors for any one EVOP experiment is usually quite small in order to avoid upsetting the process. However, the smaller the change, the smaller the experimental error must he if the effect of the change is to be detected. The smaller the experimental error, the more replicates are required [6]. The more replicates, the longer the experiment and the more danger that something else has changed, e.g. the weather. The North American economy has been hurt by the Engineering Profession’s reluc tance to integrate these statistical methods into the day-to-day operation of industry [94] [175]. On the other hand, many foreign economies, especially that of Japan, have grown partly because of their emphasis on statistical methods. They have managed to improve the quality of their products by improving the design and operation of their processes. When coupled with a sound understanding of the fundamentals of a process, the use of statistical methods increases creativity and improves both productivity and quality. A similar change must come about in the wastewater treatment profession if Wolman’s dream is to become reality. Operators, managers and designers will all benefit once  Chapter 1. Introduction  13  operators integrate plant experimentation and process fundamentals into their day-to day operational decisions. Both the design and operation of plants will improve as this information filters out from each plant into the profession as a whole. 1.2.4  Information: The Basis For Fundamental Operation  The goal of a fundamentalist is to lead rather than be led by his process. For this reason, an operator must (1) understand how his process functions and (2) know his process’s current state. The last requirement, the need for timely and reliable information, is critical to an operator’s success.  Information is data whose origin and derivation is  known, i.e. where it comes from and how it got here. In this thesis, information is defined with respect to a datum, a data series or a relationship among data series (Figure 1.3). The origin and derivation of a datum provide the datum with meaning (Figure 1.2), i.e.  context.  
In the case of wastewater treatment plants, a datum’s context derives  from where in the system the datum originates (i.e. structural context), how the datum is obtained on the system (i.e. measurement context) and how the operator uses the datum to make a decision (i.e. operational context).  Chapter i.  ctjo mt 1 rodu1  14  Datum Declaration Origin Structure Measure Operation Deni tion Derivation Time Preference Quality Value Figure 1.2: Datum As Tnformation  Chapter 1. Introduction  15  Time (Or Space)  Series Datum  Datum  Datum  Datinn  Datnm  Datum  I Relationship  4 Series Datiam  Datum  Datum  Datum  Daturj  Datm  Figure 1.3: Elements Of Information: Datum, Series and Relationship  Chapter 1. Introduction  16  The origin and derivation of a datum also provide a datum with value (Figure 1.2). For example, Grace Hopper argues that a datum is valuable to an operator if it enables him to make a control decision [161, p. 169-170]. For a couple of decades now, I’ve beenì asking people how they value their information. I haven’t received any answers but I have received a great as sortment of blank stares. We have totally failed to consider the criteria for the value of information. We haven’t even defined onr criteria. And yet we must know something about the value of the information and data we are processing. I think we must create several priorities: the time you have to act on the data and the number of lives and dollars at stake. But there’s another one the importance of that piece of information in making decisions. .  .  .  -  In the race for faster computers and more powerful software, Hopper argues that the industry has lost sight of what it is doing with data. All data are not of equal value and the allocation of monitoring, QA/QC and data analysis resources should reflect this fact. A datum’s worth is described, in part, by its preference and quality. Preference describes how the operator feels about a datum. For example, an effluent COD concen tration that is above the permit level would be less preferable than one below. Quality describes how reliable the datum is. For example, a datum derived from what later turns out to be a contaminated sample is of little use to the operator. Good data are accurate, precise, complete, representative and comparable. A data series describes the history of a characteristic with respect to time (i.e. time series) or space (i.e. space series). The history describes change: change in level, change in trend and change in stability. A data series should cover the interval of interest, i.e. describe the important dynamics of a characteristic. A set of data series describe the history of a system, i.e. a treatment plant. The set is informative if location of the series in the process and the role of the series in the decision making process is known. A set of data series should be comprehensive, i.e. describe anything of importance that might impact on the plant.  Chapter 1. Introduction  17  An operator will intervene when he knows he can make a difference. The operator responds to changes in his information, first by ruling out any explanation other than a process change and second, by acting on the information provided. For this reason, an operator needs more than just data, he needs information data with meaning and value. -  If an operator is to do more than just react to change in order to maintain the status quo, he must be in control of the process that provides him with information. 
He must design his monitoring program carefully to ensure that (1) he receives the information he requires and (2) he can separate changes in the system from problems in the monitoring program.  The consequence of not taking control is “infolock”, a situation where an  operator is so busy processing the data that he has no time to analyze it [265]. 1.2.5  Computers  -  Information Assistant  Researchers who design computer-based solutions to overcome performance limitations may be divided into two groups: those who seek to replace the operator and those who seek to utilize the operator. This thesis takes the latter approach. The operator’s presence is our insurance against the unpredictable (e.g. toxic spill) and the inevitable (e.g. equipment failure)  .  In this case, the computer assumes responsibility for tasks  the operator either dislikes doing or does not do well. This frees the operator to do what he does best 1.2.6  -  operate his treatment plant.  From Concept To Code: Objectives  The goal of this thesis is to develop a tool that will enable an operator to implement an EVOP-like operations strategy. The evolution of a concept to a program involves both a bottom-up and top-down approach to design (Figure 1.1). The goal of top-down planning Guariso and Werthner provide a convincing argument of this viewpoint with respect to expert 7 systems [131]  Chapter 1. Introduction  18  is to maintain logical consistency (i.e. avoid “bugs”) while the goal of bottom-np planning is to maintain feasibility (i.e. machine can do it,). The tension of design is to translate an organic concept into a digital one. The tension exists because humans and computers “think” differently. For example, imagine the difficulty of the task of translating a work written in a language with a vocabulary of over 100,000 words into one with less than 200. This is exactly what happens when a client describes a concept to a programmer and the programmer translates the concept into a computer program  .  This man-machine  compromise is what makes the design of an EVOP-like tool so difficult. 1.2.7  Paradigms  -  A Frame Of Mind  The success of computer-based solutions has been limited for at least three reasons: • The solutions usually take advantage of only a fraction of the available information. • The solutions usually ignore both the quality of the data and the measurement characteristics of the data. • The solutions usually do not take full advantage of the operator’s presence in the plant. In order to deal with these limitations, three paradigms  are developed: Structure,  Measurement and Operation. 1.2.8  Structure Paradigm  The aim of the structure paradigm (Chapter 6) is to construct a physical and causal network of a wastewater treatment plant in both the memory of the operator and the 5 T he Intel ©8086/88 chip has an instruction set of less than 200 triples A paradigm is a model or pattern that organizes knowledge about a subject, explains phenomena, 9 and serves as the basis for what measurements to take [77).  Chapter 1. Introduction  19  machine. An operator uses the network to guide his reasoning while the machine uses the model to construct relationships between parameters. The computer uses the physical network to follow the passage of mass and the causal network to establish cause and effect relationships. The causal network, constructed from the physical and influence structure of the plant, forms the skeleton onto which all other knowledge is huug. 
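One example of a task that suits the machine is the routine screening of each data series for a change in level, leaving the interpretation to the operator. The sketch below is an illustration only; the window lengths, the threshold and the SVI values are invented, not taken from the thesis.

    from statistics import mean, stdev

    def level_change(series, baseline=14, recent=3, threshold=2.0):
        """Flag a shift when the recent mean departs from the baseline mean by
        more than `threshold` baseline standard deviations."""
        base = series[-(baseline + recent):-recent]
        last = series[-recent:]
        spread = stdev(base) or 1e-9          # guard against a zero-variance baseline
        return abs(mean(last) - mean(base)) / spread > threshold

    svi = [120, 130, 125, 118, 128, 122, 131, 127, 124, 129,
           126, 123, 130, 125, 160, 175, 190]      # hypothetical SVI record
    print(level_change(svi))   # True: the last three values sit well above the baseline

A flag of this kind is only a prompt to look further; the operator must still rule out sampling or laboratory problems before treating the shift as a process change.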
This knowledge includes information on how the process works and how information on the process is obtained. The structural network consists of nodes and links, classed into planes, classes, and streams. A plane is a level of abstractiou, a class is a form of knowledge and a stream is a the form of flow that passes through the links. The paradigm places information in context of the system’s structure fixing it in space, time and process continuum. The structure forms the basis of communication between the different forms of information collected on the process. 1.2.9  Measurement Paradigm  The goal of the measurement paradigm (Chapter 7) is to establish a datum’s context and value (Figure 1.2). The paradigm accomplishes this by mapping information into a number of spaces. A space is a predefined logical structure that formalizes the meaning of a datum. The spaces define the meaning of a datum at the datum, series and relationship level. 1.2.10  Operation Paradigm  The goal of the operation paradigm is to link a past change to a change now, and a change now to a change in the future. For example, the operator decreases the wastage because he detects ammonia in the effluent (no nitrification), in the hope that he will  Chapter 1. Introduction  20  see these ammonia levels drop in the future. These linkages enable the operator to learn from his actions. 1.2.11  Technician or Fundamentalist  -  The Operator Decides  It is clear that in each plant, no matter how small, no matter how crude, phenomena of great importance and of peculiar significance are occurring and recurring. They are not always observed and still less often reported. It is a special plea that this condition be remedied. for with its remedy, per haps, many of both scientist and practical people will avoid voyages “bound nowhere, under full sail” [313. p. 14]. .  .  .  .  .  .  Over sixty years have passed since Abel Wolman argued that each plant, if operated scientifically, was an untapped reservoir of process knowledge. This thesis will be counted a success if it moves the wastewater industry one step closer to realizing Wolman’s dream.  Chapter 1. Introduction  1.3  21  Layout  The thesis is divided into four parts: • PLF’s And Computer-Based Solutions: Chapter 2 consists of two parts: (1) a review of what limits performance and (2) a discussion of problems with existing computer-based solutions to these limitations. The goal of this chapter is to show that the computer’s role is that of an assistant to, rather than a replacement for, the operator. • Theory: Chapters 3, 4, 5 present theory necessary to overcome many of the limi tations identified in Chapter 2:  —  The focus of Chapter 3, “Cause and Effect”, is on how to determine the cause of an effect in wastewater treatment plant where temporal assumptions may change, a number of causes may be confounded and the operator may intervene.  —  The focus of Chapter 4, “Measurement Process”, is on how the process of obtaining information on a characteristic affects the detection of cause and effect relationships.  —  The focus of Chapter 5, “Fuzzy Sets,Logic and Reasoning”, is on the repre sentation of data that are not crisp numbers.  • Paradigm: Chapters 6, 7 and 8 provide a logical basis for an EVOP-hke tool: —  The goal of Chapter 6, “Structure Paradigm”, is to reduce the structure of the process to a hierarchical network.  —  The goal of Chapter 7, “Measurement Paradigm”, is to establish the meaning of a datum and a data series.  Chapter 1. 
Introduction  —  22  The goal of Chapter 8, “Operation Paradigm”, is to establish the operational meaning of the relationships among the data series, i.e.  cause and effect  viewpoint. • Synthesis: Chapter 9 uses a simple example to demonstrate how these paradigms assist the operator to run his plant. • Conclusions And Recommendations: The last chapter summarizes the re search findings and provides suggestions for further research.  Chapter 2  PLF’s and Computer-Based Solutions  It is a great fault of descriptive poetry to describe everything . 1 The purpose of this chapter is to discuss what limits treatment plant performance and how computer-based solutions can ameliorate the problem. The intent of this chapter is to place this research in context with other research in the area of computer-based solutions to performance limitations.  2.1  Sources Of Information  The information in the literature on what limits plant performance comes from four types of studies: 1. Comprehensive Performance Evaluation/Composite Correction Program (CPE/CCP) 2. Audits 3. Plant Experimentation 4. Review Of Historical Records 2.1.1  CPE/CCP  The EPA’s Comprehensive Performance Evaluation and Composite Correction Program consists of two steps: (1) Evaluation Phase and (2) Performance Improvement Phase Alexander Pope 1  23  Chapter 2. PLF’s and Computer-Based Solutions  24  [125] [126] [139] [140] [145] [252] [40] [138] [251]. The evaluation phase is a thorough review and analysis of a treatment plant’s design capabilities and associated adminis tration, operation and maintenance practices. The performance improvement phase is a systematic approach to eliminating those factors that limit performance in existing treatment plants. 2.1.2  Audits  An audit involves the comprehensive monitoring of a part of a treatment plant over a short time in order to characterize its behaviour. All audits share at least four characteristics: 1. An audit is not an experiment. 2. An audit involves additional staff and resources not normally available to the op erator. 3. An audit is (usually) short-term. 4.  An audit is plant-specific.  An audit suffers from the same drawbacks as all short-term observational studies. For this reason, if an auditor recommends a change to the operator based on the audit’s results, the operator should be careful to introduce one change at a time to ensure that beneath a short term benefit there is not a long term disaster. Resource Audit A resource audit monitors how a resource is allocated for use in the treatment plant, e.g energy or labour. For example, Steiger et al [282] conducted an energy audit at the Jackson Pike Treatment Plant  2  The auditors suggested a number of changes that  The reader should refer to [239] [112j and [17] for more information on energy audits 2  Chapter 2. PLF’s and Computer-Based Solutions  25  would reduce the plant’s consumption of energy including the use of hot water boilers in lieu of steam boilers, the institution of regular cleaning of air socks, screen and filters and the elimination of leaks in the air mains. Given that personnel, energy, chemical expenditures and sludge disposal account for at least 80% of a typical treatment plant’s operating budget [89], these parameters shonld be audited on a regular basis if they are not being monitored by the operator. Design or Research Audits Audits are an effective way to evaluate design assumptions. 
For example, most clari fier models are based on the assumption that activated sludge flocs settle according to Kynch ‘s Theory Of Thickening. However, at least two researchers confirmed that this is not the case. Laikari [191] compared the settling velocities obtained from a conventional cylinder test with those obtained in a clarifier during stable operation and concluded that cylinder test results are misleading. This deviation probably explains the high degree of variability Morris et al [216] observed in the Vesilind coefficients (k, V ) on a day-to-day 0 basis in three Burlington, Vermont, completely mixed activated sludge (CMAS) plants. Most clarifier models operate in, at most, two dimensions and assume that the hy draulic and mass transfer characteristics of a clarifier are fixed. Again, field studies show that this is not the case. Crosby [83] conducted a series of innovative dye tests on seven different clariflers in seven different plants. Based on these tests, Crosby concluded that both the hydraulic and mass transfer characteristics of a clarifier depend on the clarifier’s design and the plant’s operation. For example, Crosby found that clarifier performance was affected by such things as clarifier depth, influent flow distribution, baffles, weirs, sludge rakes and blanket height. This level of complexity is not represented in any ex isting clarifier model.  Chapter 2. FLE’s and Computer-Based Solutions  26  The focus of both Crosby’s investigations of seven American plants and Tendaj-Xavier and Hnltgren’s investigations [291] at the Bromma STP (Stockholm) was operational stability. Both researchers argue that depth along with inlet, underfiow and overflow design are critical to stable operation.  They also suggest that the design should be  checked and adjusted using dye tests once the clarifier is on-line. Crosby also observed that clarifier performance is sensitive to changes in the hy draulic loading. A change induces a small amplitude wave in the clarifier increasing the turbulence in the settler. Retention time is not related to the time for a flow change to propagate through a plant unless flow equalizing storage volume intervenes. The main factor controlling the speed with which a flow change propagates is the small amplitude wave speed: c  =  /jh where g is the gravitational acceleration and h is the water depth.  Crosby estimates a change in the flow into the Orange County Sand Lake Plant would take about 4 minutes to propagate through the plant. However, this dynamic dies out within one hydraulic retention time [216]. Process Audit A process audit is a “stepping-up” of the monitoring program to define the time or space variation of a number of plant characteristics to determine if an operational or design change is in order [286] [280]. For this reason, process audits usually are used during start-up studies or to evaluate the operation of older, often overloaded, treatment plants. For example, Stephenson et al [286] conducted an audit of the Cambridge (Hespeler) Water Pollution Control Plant and identified air flow disturbances and pump station induced hydraulic disturbances as causes of problems in the secondary clarifier. The CPE documentation describes how to design, execute and interpret process audits.  Chapter 2. PLF’s and Computer-Based Solutions  27  Performance Audit In some cases. an audit is necessary to provide an independent evaluation of some aspect of the treatment plant.  
For example, regulatory authorities may audit a laboratory  as part of their quality assurance program [107] [223]. Similarly, legal authorities may require an independent assessment of piece of equipment to settle a dispute between a client and an equipment supplier. 2.1.3  Plant Experimentation  Plant Experimentation (PLEX) involves executing one or more experiments in a limited amount of time to learn more about the process. Treatment plants with parallel streams enable the operator to experiment with one stream and use the second stream as a control. For example, Curran et al [86] conducted a stress test on one stream in a plant to evaluate its nitrification and clarification capacity. Experimentation on a single stream is also possible. Poole and Biol [245] modified the Mold Treatment Plant (Wales) from a completely mix system to a plug flow system with an anoxic zone. They supplemented their experimental work with plant simulations to help them understand the diurnal variation in nitrification. 2.1.4  Examination Of Historical Records  A researcher cannot make sense of a plant’s historical records without discussing their contents with the plant staff. Historical records are often incomplete, fragmented and, in some cases, open to misinterpretation. This information may be found in log books, data files and work orders. Berthouex et al [46] went through this process when they reviewed the operations records of 15 well-operated plants  .:  A well-operated plant is a plant that meets its discharge permit using conventional control methods. 3  Chapter 2, PLE’s and Computer-Based Solutions  28  One difficulty in working with historical data is judging its accuracy and precision. Making quality assurance tests today can only show that the ana lytical work is being done properly or improperly today. It cannot prove that past work was of the same quality. The quality of the data used was assessed by meeting with plant chemists and operators, seeing how samples were col lected, stored and analyzed, reviewing analytical procedures, and learning what quality assurance methods were being used. From this kind of on-site evaluation, an impression was formed that the data were reliable. In their study, a typical plant was upset 9% of the time and an average upset lasted about 3.5 days. Together, low DO, high flow, low MLSS, or solids handling problems, caused 60% of these upsets. 2.1.5  Summary: Quality and Information  Sources must be evaluated in light of their quality, coverage and completeness. The advantage of historical data over data from the other three sources is that it is long term. The disadvantage is that the researcher has no control over the process by which its was gathered. This is a situation that this thesis hopes to help change as well as one that the EPA has expended a great deal of effort to improve (see Section 4.4). Data collected as part of an experiment are easier to interpret than those collected as part of an audit. For this reason, PLEX data are preferred over CPE/CCP and audit data. CPE/CCP data are preferred over audit data because the CPE/CCP program in cludes numerous follow-up studies that ensure that the auditor’s diagnosis of the problem is correct.  Chapter 2. 
PLF’s and Computer-Based Solutions  29  Table 2.1: Performance Limiting Factors As Identified By The CPE/CCP Program PLF Code A B C D E F G H I J K L lvi N 0 P  Q R S T U  Description Of Performance Limiting Factor Poor understanding and application of process control by operator Staffing (too few staff, low pay, turnover, etc.) Support from municipality (administrative and technical) Operating Budget and user charge system Operability and maintainability considerations (process flexibility, au tomation, standby units, etc.) Infiltration and Inflow Construction Problems Process design errors (clariflers, aerators, disinfection, etc.) Over design Under Design Solids handling and sludge disposal Pretreatment, industrial discharges and toxics Operation and maintenance manual Preventive maintenance program Spare parts inventory Chemical inventory Laboratory capability for process / NP DES testing NPDES reporting Equipment/Unit process broken down or inoperable Hydraulic overload Poor aeration system  Results  2.1.6  The CPE/CCP EPA studies identified 21 factors that limit a plant’s performance (Ta ble 2.1).  These Performance Limiting Factors (PLF’s) can be sorted into one of six  groups [14]: 1. Operator  The plant’s performance is limited because the operator does not understand his process and/or is unable to use his knowledge to control his process. Because this  Chapter 2. PLF’s and Computer-Based Solutions  30  PLF did not identifywhy the operator had problems, the EPA feels in retrospect that this PLF was too ambiguous to be useful [14]. This PLF is often cited with “staffing problems” and “design errors”. This indicates that “operator ignorauce” may sometimes be a symptom of a problem with the plant’s design or management. 2. Management, Staff and Budgets Many PLF’s are due to staffing, management and budget problems. Inadequate staffing, high staff turnover, non-supportive management, and under funding can all cause suhoptimal plant performance, especially in small plants (< 10 mgd) [288] [88] [14]. 3. Plant Design and Construction Inflexible or poorly designed/constructed plants do not perform. In some cases, audits can identify the design weaknesses and modifications can be made to the process. This is particularly true of aeration and pumping systems. For example, Crosby [83] suggests that a leading cause of clarifier problems is due to hydraulic disturbances caused by pump cycles. With new pumps and/or controls, this prob lem can be eliminated.  Chapter 2. FLF’s and Computer-Based Solutions  31  4. Influent Characteristics A treatment plant can cope with changes in the influent if the operator can prevent these changes from destabilizing the process, i.e.  wash ont  the solids. If a problem  is acnte and is a serious threat to the plant’s operation, then the problem must be dealt with before that flow enters the plant. For example, in a community where a storm may cause a large increase in flow into the plant, the municipality can use a computer to divert flow past the plant and/or store flow in the sewerage system to enable the plant to treat as much flow as possible without being overloaded [11]. Apart from storm events and infiltration, the predominant cause of problem influ ents is industrial wastes. Wetzel and Scott [308] observed that 80% of the plants reporting operation and maintenance problems received industrial wastes. These problems included corrosion, flow obstruction, process upsets and fires. 
Wetzel argues that industrial wastes are the main cause of permit violations and ongoing plant upsets.

5. Preventive Maintenance and Proper Operation

Equipment failure is a common PLF. Berthouex et al [46] cited mechanical failure, power interruptions, maintenance activities and system modifications as common causes of upsets in well-operated plants. Sensor failure is a common problem in highly automated plants. A 1984 survey showed that a Georgia plant had only 20 sensors out of 700 functioning properly, a Huntsville plant had only 1 sensor in 40 working, and the Washington DC sanitary commission reported that their sensors were down 70% of the time [246].

6. Laboratory

To produce good data, a plant needs knowledgeable staff and adequate equipment. The issue of data quality is discussed in detail in Section 4.4. In the mid 1970's, the EPA realized that the data collected by wastewater treatment plants were very poor. In response to this finding, the EPA launched a number of programs to improve the quality of data in treatment plants. Table 2.2 contains a summary of the 1982 DMR-QA Study 2 [64] [244]. Only 42% of the 7500 treatment plants studied passed all the tests.

Table 2.2: DMR QA Study 2 - Average Failure Rate

  Parameter                           % Failure
  pH                                  14.1
  Total Suspended Solids              17.1
  Oil and Grease                      23.1
  Nutrients                           36.9
    Ammonia-N                         37.8
    Nitrate-N                         43.2
    Kjeldahl-N                        29.8
    Total Phosphorus                  37.8
    Ortho-Phosphorus                  34.1
  Demands                             19.7
    Chemical Oxygen Demand            25.8
    Total Organic Carbon              17.0
    Biochemical Oxygen Demand         17.9
  Metals                              20.8
  Plants That Failed All Tests        21.6
  Plants With At Least One Failure    58.0

  The 1982 National Discharge Monitoring Report (DMR) Quality Assurance (QA) study involved 7500 plants [244]. A plant failed if its determination was more than two standard deviations from the average determination of 100 EPA and state laboratories.

2.1.7 Summary

No one factor limits performance more than lack of information. The problem is not that any of the above situations is without a solution; rather, the problem is that operators, management and designers lack the confidence to act because they lack information. In other words, they need evidence that there is a problem and an indication of what that problem is. The most cost effective way to overcome this lack of information is to improve the reliability of the historical record.

2.1.8 Equipping The Operator To Improve Performance

A computer excels at well-defined and tedious tasks. For example, operators currently use computers to keep records, do repetitive calculations, draw graphs, print reports, log data collected on-line, manage preventive maintenance programs, track costs, and communicate with other computers [115] [232] [227] [246] [234] [236] [269]. The cost of CPU power and storage has dropped considerably, giving treatment plants access to much more powerful machines. As software development usually lags hardware development, much of the ongoing research concentrates on developing software to help the operator run his plant. The following discussion of this new software is broken into four parts: data analysis, modeling, expert systems and automatic control.

Figure 2.1: Statistical Sample Size: {Small: N < 30, Medium: 30 < N < 100, Large: N > 100}
2.2 Data Analysis

The two areas of statistics that impact the most on treatment plants are Initial Data Analysis and Statistical Quality Control, as they suit the analysis of small data sets (Figure 2.1).

2.2.1 Initial Data Analysis

Initial Data Analysis (IDA) [71] [73] or Exploratory Data Analysis (EDA) [72] [217] [294] [264] [98] describes the first phase of a statistical analysis and includes the following steps:

1. Processing of data into a suitable form for analysis.
2. Checking data quality.
3. Calculating simple descriptive statistics.
4. Preparing graphs.

2.2.2 Data Processing

The goal of this step is to determine the structure of the data. The structure consists of two components: the system on which the data are collected and the measurement process used to collect the data.

The number of measures taken on a parameter and the number of parameters determine what types of analysis can be conducted on the data set. Chatfield [73] warns that any model-fitting is likely to be unreliable if the sample size is less than ten. (Inferential statistics often involves the derivation of a model to which the statistician fits the data. For example, if the population is normally distributed, then 95% confidence intervals can be constructed from a sample using the t-distribution.) In a treatment plant, ten data points may represent ten days of data. For this reason, a researcher must be careful not to "over-analyze" the data [50] [44] [45], i.e. use a statistical method that the data do not support.

The number of parameters is also important. For example, if an analyst is given n parameters and knows nothing about the structure, then he must look at n(n - 1)/2 relationships. Moreover, the analysis of two parameters is significantly easier than the analysis of three parameters. If the analyst knows the layout of the system, he may be able to simplify the analysis considerably, i.e. parameter A cannot affect parameter B because there is no pipe that connects their unit processes. If the number of parameters exceeds the number of data points for a given parameter, then multivariate techniques should not be used.

The type of measure and parameter also affects subsequent analyses. For example, if a measure is ordinal then the mean should not be used. The relationship between measurement scales and arithmetic manipulations is discussed in Section 4.1. Similarly, the analyst needs to know which parameters are causes or factors and which ones are effects or responses. This topic is discussed in Chapter 3.

2.2.3 Data Quality

Data are collected from the laboratory, from the plant and on-line. These data points must somehow be entered into a database. Missing values and non-numerical data must be coded in such a way that the computer can distinguish them from numerical data. (This is a problem if the software does not allow the user some method of distinguishing an empty field from a zero. dBASE IV and Lotus 123 suffer from this limitation while BMDP does not.)

The value of a data set rapidly diminishes as the number of missing values increases. The first impact of a missing datum is loss of power. If the datum forms part of a record (i.e. an x,y pair), the record is ignored (i.e. if x is missing, the analysis cannot use y). If missing data form part of an intervention analysis or an experiment, the loss can make
PLE’s and Computer-Based Solutions  37  analysis of the data impossible. Most statistical analyses can work around a few missing values but even this has some problems [198]. The best advice with respect to missing data remains, don’t have any [264]. The operator does have one advantage over the statistician he knows the origin of his -  data. For example, if two measures (e.g. COD and BOD) estimate the same parameter e.g. Substrate), the operator can exploit the relationship between the two measures (e.g. use a COD value to derive an estimate for a missing BOD value). Similarly, if a value for an operator-set value (i.e. manipulated parameter) is missing, the operator can safely assume that the value has not changed since it was last set. An outlier may occur due to an error in the measurement process (therefore can be ignored) or from a fault in the process (therefore indicates a problem). An operator can tell the difference only if he has a quality assurance/quality control program in place and software that checks the data as it is entered. For example, if the operator notices that most of the parameters measured on the same sample are outliers, he can go check if there is a problem with the sampler. One purpose of the measurement paradigm described in Chapter 7 is to help the operator exploit his knowledge of his plant and his monitoring program to overcome some of the problems associated with missing values and outliers. Other data quality issues are discussed in Section 4.4.  Chapter 2. PLF’s and Computer-Based Solutions  2.2.4  38  Summary Statistics  The initial examination of a data set uses robust estimators  6  of population parameters.  A robust estimator is one that is insensitive to small departures from the assumptions from which it is derived. The population parameters of interest include location, width, shape and association. Location A location statistic estimates where the “center” of a population lies, i.e. fixes its location on a number line. The operator should examine the characteristics of his data set in light of what he knows about his process in order to determine which estimator to use: • The number of data points in the sample: The different location statistics give similar values if the sample size is large but perform with varying degrees of accuracy if the sample size is small [12]. For example, if the data set is small, the median is a more efficient estimator of a population’s mean than is the arithmetic average [302].  • The distribution of the sample values with respect to magnitude: Location statistics perform about the same if the distribution is symmetric and unimodal but perform differently if this is not the case. For example, given a process with a limit cycle and an upset condition, the median will provide the best estimate of the expected performance because it is insensitive to the upset data points [255].  An estimator is a method that estimates the value of a particular population parameter. For example, 6 assume we have a normally distributed population (p, c). The median, mean, mode and ADA are all estimators of p.  Chapter 2. PLF’s and Computer-Based Solutions  39  • The distribution of sample values over time: Most estimates weight each sample equally. However, if the sample is taken to estimate an interval’s expected value and the sampling frequency varies, the analyst may have to apply a different weight to each datum. For example, assume a manager asks the operator to provide him with the average annual influent COD concentration. 
If the operator measured the influent COD daily during the summer (when it was strongest) but only weekly during the winter (when it was weakest), then the arithmetic average would over-estimate the influent COD concentration. Instead, the operator should weight each COD value by the interval it represents and calculate the weighted average.

An ideal (robust) estimator exhibits four characteristics:

1. Restricts the influence of any fixed fraction of wrong observations on the value of the estimate.
2. Rejects outliers which are too far away from the bulk of the data.
3. Limits the effect of rounding, grouping and other local inaccuracies.
4. Performs well under ideal conditions.

The three most commonly used location statistics are the mean, mode and median (Table 2.3). The arithmetic mean is a linear combination of sample values, the median is an order statistic and the mode is an occurrence statistic. This is important because the type of statistic determines the type of data to which the estimator can be applied (see Section 4.1). The Princeton Monte Carlo Project [12] evaluated these and 65 other location statistics using these criteria. The Project found that the median performed better than the mean or mode when the sample size was small and the population's distribution was unimodal but asymmetric. (The Project also found that the best location statistic for small sample sizes, n < 10, was the Adaptive-M Estimator (ADA), described in the glossary, Appendix B.) This is why the Interlaboratory Quality Assurance Program (International Joint Commission) decided to use the median instead of the mean in their program [16].

The mean is not a robust estimator. The Project observed that "... the mean is a horrible estimator, except under strict normal distribution; it gets rapidly worse even under mild deviations from normality, and at some distance from the Gaussian distribution, it is totally disastrous ..." [12, p. 244]. The mean is difficult to interpret if the distribution of the data is not known. Despite these problems, the mean is still the most commonly used location statistic. There are at least five reasons for this:

1. Application or tradition requires it, e.g. a permit written in terms of the average monthly discharge.
2. The cost of redoing analysis programs is prohibitive, e.g. Lotus 123 has a function for the mean but not the median or mode.
3. The majority of inference statistics use the mean, e.g. Analysis Of Variance.
4. The mean is easy to calculate as it is a linear combination of the data, e.g. the mean can be calculated recursively while the median and mode cannot.
5. Some populations of data are indeed symmetric and unimodal, e.g. instrument error (unless a standard is near the detection limit of an instrument, repeated measurement of a standard should result in a Gaussian distribution).

The mean should not be used when reviewing data, at least, not by itself. Instead, the median should be used for non-categorical data and the mode, accompanied by its frequency of occurrence, for categorical data.

Table 2.3: Mean, Median, Mode - Common Location Estimators

  Mean (Arithmetic)^a:  Xbar = sum of p(x_i) x_i, where p(x_i) = 1/n
  Mean (Geometric)^b:   Xbar_G = (x_1 x_2 ... x_n)^(1/n)
  Mean (Harmonic)^c:    Xbar_H = n / (sum of 1/x_i)
  Median:               the middle value x_((n+1)/2) if n is odd; (x_(n/2) + x_(n/2+1))/2 if n is even
  Mode^d:               the most frequent value or interval

  a: If p(x_i) is not 1/n, then we refer to the mean as being a weighted mean.
  b: The geometric mean is used when the variable changes at a rate proportional to itself, i.e. growth rates [255]. The geometric mean should be used with data with a ratio scale.
  c: The harmonic mean is used when the observations of what we wish to express with the arithmetic mean are given in inverse proportion, i.e. mean velocity over a proportion of road or mean lifetime [255]. The harmonic mean should be used with data with a ratio scale.
  d: The mode of a non-categorical measure is an interval rather than a singleton. The number of groups (intervals), m, should satisfy the criterion [134]: n <= 2^m.
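To make the behaviour of these estimators concrete, the following sketch (Python, with invented numbers rather than real plant data) computes the location statistics of Table 2.3 for the kind of small, skewed sample an operator might review:

```python
import statistics

# Hypothetical week of effluent BOD5 values (mg/L); the last value is an upset day.
bod5 = [12.0, 14.0, 11.0, 13.0, 12.0, 15.0, 48.0]

arithmetic_mean = statistics.mean(bod5)
median = statistics.median(bod5)

# Weighted mean: weight each value by the interval (days) it represents,
# e.g. if the last sample stood for a two-day interval.
weights = [1, 1, 1, 1, 1, 1, 2]
weighted_mean = sum(w * x for w, x in zip(weights, bod5)) / sum(weights)

# Geometric and harmonic means (ratio-scale data only).
geometric_mean = statistics.geometric_mean(bod5)
harmonic_mean = statistics.harmonic_mean(bod5)

print(f"mean={arithmetic_mean:.1f}  median={median:.1f}  "
      f"weighted={weighted_mean:.1f}  geometric={geometric_mean:.1f}  "
      f"harmonic={harmonic_mean:.1f}")
# The single upset day pulls the mean well above the median, which is why
# the median is the safer summary for small, skewed samples.
```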
The operator should have no problem interpreting these statistics because of their straight-forward definition and robust behaviour. The more esoteric and efficient estimators, such as ADA, should not be used because they are difficult to calculate and equally hard to explain. The mean should be reserved for those cases where it is required or the data support it.

Width

A width statistic measures the variation of the samples around the center (of a unimodal distribution). A location estimate, e.g. the mean, should always be accompanied by (at least) a width statistic and a sample size. Table 2.4 lists four commonly used width estimators.

Table 2.4: Common Width Estimators (Single Values)

  Variance:                   s^2 = sum of (x_i - Xbar)^2 / (n - 1)
  Absolute Mean Deviation:    MD = sum of |x_i - Xbar| / n
  Absolute Median Deviation:  D = median{ |x_i - median| }
  Quartile Range^a:           QR = Q_3 - Q_1

  a: Quartiles are values that divide a set of observations into 4 equal parts. The values, denoted by Q_1, Q_2 and Q_3, are such that 25% of the data falls below Q_1, 50% falls below Q_2, and 75% falls below Q_3 [303]. Q_2 is the median.

The most commonly used width statistic is the variance (or standard deviation). The variance is calculated using the mean. Therefore, it performs poorly when the sample size is small and/or the distribution is skewed. The interpretation of the variance is dependent on the distribution being normal [264]:

  If the signal is Gaussian, the limit of guaranteed detection is centered at six standard deviations from the mean of the background noise. At this point, only 0.13% of the distributions overlap. However, if the background distribution is not Gaussian, this probability (as given by the Tschebyscheff inequality) can be as high as 11%.

The Project [193] observed that few dispersion estimators work well when the sample size is below 20. The reason for this is two-fold: (1) most width statistics are calculated using location statistics and (2) most width statistics involve the calculation of differences. For example, assume that the mass of solids at the start of an experiment is given by x and the mass at the end by y, and that the standard deviations of the two masses are s_x and s_y respectively. If the destruction of the solids is given by x - y, then its standard deviation is s_(x-y) = sqrt(s_x^2 + s_y^2). In other words, the uncertainty of a sample-based variance is higher than that of a sample-based mean because the variance is calculated using a difference that involves the mean. A more effective approach to describing a sample's dispersion (for small sample sizes) is to provide a histogram that shows how the sample is distributed.
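As a rough illustration of Table 2.4 (again a sketch with made-up values rather than a prescribed procedure), the four width estimators can be compared on a small sample containing one wild value:

```python
import statistics

# Hypothetical MLSS grab samples (mg/L); the last value is a wild reading.
mlss = [2400.0, 2550.0, 2380.0, 2620.0, 2500.0, 2450.0, 3900.0]

mean = statistics.mean(mlss)
median = statistics.median(mlss)

std_dev = statistics.stdev(mlss)                                   # uses the mean
mean_abs_dev = sum(abs(x - mean) for x in mlss) / len(mlss)        # absolute mean deviation
median_abs_dev = statistics.median(abs(x - median) for x in mlss)  # absolute median deviation

q1, q2, q3 = statistics.quantiles(mlss, n=4)                       # quartiles
quartile_range = q3 - q1

print(f"s={std_dev:.0f}  MD={mean_abs_dev:.0f}  "
      f"D={median_abs_dev:.0f}  QR={quartile_range:.0f}")
# The single outlying value inflates the standard deviation far more than the
# median-based estimators, which is why s is a poor choice for small samples.
```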
An alternative for non-categorical data to the histogram is Tukey's Five Number Summary, which consists of the maximum, upper quartile, median, lower quartile, and minimum. A sixth number, the sample size, should also be provided. The Box and Whisker Plot is a graphical representation of this summary. Given that most categorical measures in a treatment plant have no more than ten categories, the best approach would be to provide a histogram for categorical measures and Tukey's summary for the non-categorical measures.

Shape

A given distribution may deviate from a Normal distribution in three ways:

1. The distribution develops another mode.
2. One of the two tails lengthens and the distribution skews: the skewness is positive if the right side skews and negative if the left side skews.
3. The maximum occurrence is higher or lower than that of a normal distribution: if the maximum is higher, the distribution is peaked or exhibits positive kurtosis, while if the maximum is lower, the distribution is flat or exhibits negative kurtosis.

Table 2.5 lists three shape statistics that use order statistics. These statistics work best if the sample size is over 20 and the distribution is unimodal. A more effective way to examine the shape of the frequency distribution of a small sample is to use a histogram or a stem and leaf plot.

Table 2.5: Skewness and Kurtosis - Order Statistics

  Skewness II:   S_II = (DZ_9 + DZ_1 - 2 median) / (DZ_9 - DZ_1)   (ranges between -1 and 1)
  Skewness III:  S_III = (Q_3 + Q_1 - 2 median) / (Q_3 - Q_1)      (ranges between -1 and 1)
  Kurtosis:      K = (Q_3 - Q_1) / (2 (DZ_9 - DZ_1))               (for a normal distribution, K = 0.263)

  There exist 3 values which partition a frequency distribution into 4 equal parts. The central value is the median; the other two are designated the lower or first quartile, Q_1, and the upper or third quartile, Q_3. One quarter of the samples are less than the first quartile while three-quarters of the samples are less than the third quartile. The deciles, DZ_i, divide the distribution into ten parts: 10% of the values lie beneath the first decile, DZ_1, while 90% lie beneath the ninth decile, DZ_9. These statistics can only be applied to non-categorical scales [255].

2.2.5 Relationships

The detection, definition and determination of relationships among data sets are fundamental to treatment plant operation and design. An operator detects the possibility of a relationship primarily through plotting the data. If the relationship is linear, the operator may confirm his findings by calculating a correlation coefficient and reasoning whether such a relationship can exist.

Correlation analysis should only be done after the data have been plotted, i.e. screened. For example, Sadalagekar et al [256] determined the SVI and filament length for a number of activated sludge samples and calculated the Pearson correlation coefficient to be 0.95. However, a plot of their data shows that most of the SVI data points are under 125 mL/g (Figure 2.2). If Tukey's outlier detection method [41] is applied to screen out the outliers, the correlation coefficient drops considerably (Table 2.6) and the range of interest shrinks. The same pattern is observed if the more robust but less powerful Spearman coefficient is used.

Table 2.6: Correlation Example: Correlation Analysis and Outlier Detection

  Case                         SVI Range (mL/g)   Pearson's (Calculated / Critical)   Spearman's (Calculated / Critical)
  All Data (n=30)              9.2-500            0.95 / 0.36                         0.97 / 0.36
  No Extreme Outliers (n=28)   9.2-180            0.82 / 0.37                         0.83 / 0.38
  No Outliers (n=25)           9.2-125            0.50 / 0.40                         0.74 / 0.40

  Critical values are for alpha = 0.05, two-sided test [255, pp. 398 and 425].

Figure 2.2: Correlation Example: Filament Length vs SVI (Tukey's outlier detection; R = 0.95 with all data, R = 0.82 with extreme outliers removed, R = 0.53 with all outliers removed)
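The screening step itself is easy to automate. The sketch below (Python; the data pairs are placeholders, not the published measurements) applies Tukey-style fences at 1.5 times the quartile range and recomputes the Pearson coefficient before and after screening:

```python
import statistics

def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences at k times the quartile range; k=3.0 flags extreme outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    qr = q3 - q1
    return q1 - k * qr, q3 + k * qr

def pearson_r(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Placeholder (SVI, filament length) pairs -- illustrative only.
svi      = [65, 70, 80, 85, 90, 95, 100, 105, 110, 120, 180, 500]
filament = [20, 25, 22, 30, 35, 28,  40,  45,  50,  60, 150, 380]

lo, hi = tukey_fences(svi)
keep = [i for i, v in enumerate(svi) if lo <= v <= hi]

r_all      = pearson_r(svi, filament)
r_screened = pearson_r([svi[i] for i in keep], [filament[i] for i in keep])
print(f"r (all data) = {r_all:.2f},  r (outliers removed) = {r_screened:.2f}")
```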
Apart from data entry, no part of statistics is as tedious and yet as important as screening the data before an analysis.

In order to construct a scatter plot or to calculate a correlation coefficient, each data point on the x-axis (abscissa) must be paired with a single datum on the y-axis (ordinate). This implies that all parameters must be measured at the same time and frequency, which, from a wastewater treatment plant operator's viewpoint, is ludicrous. The alternative is to match the series on the basis of the least common interval. For example, assume an operator wishes to plot DO vs MLSS, and DO vs effluent solids concentration (TS). The operator obtains the three measures as follows:

1. The operator collects the DO probe readings over an hour and calculates their mean and standard deviation, i.e. one datum every hour.
2. The operator takes a grab sample of the mixed liquor every 12 hours and conducts a single solids test, i.e. one datum every 12 hours.
3. The operator uses an automatic sampler to form a 24 hour composite sample of the effluent and conducts a solids test (TS) on this sample, i.e. one datum every 24 hours.

The least common interval between the DO and MLSS is 12 hours. Therefore, the operator pools the DO data to form the best estimate for each 12 hour interval and plots it against a single MLSS value. The least common interval between the MLSS and effluent TS is also 12 hours, not 24 hours: the TS is a physical (versus statistical) 24 hour average and is the best estimate of the TS concentration over both 12 hour intervals in a day. If the TS had resulted from a grab sample, the least common interval would have been 24 hours.

The construction of these derived series is not an exercise a statistician would feel comfortable with. The primary difference between a statistician and an operator is that a statistician analyzes data while an operator analyzes his plant. In other words, an operator is able to make compromises during the analysis of the data based on his knowledge of his process. This is another manifestation of the Result Evaluation Principle.
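A minimal sketch of this pooling, assuming one hourly DO series and two 12-hour MLSS grabs (all values invented), shows how the least common interval is formed in practice:

```python
import statistics

# Hypothetical 24 hourly DO means (mg/L) and two 12-hour MLSS grabs (mg/L).
do_hourly = [2.1, 2.0, 1.9, 1.8, 1.9, 2.0, 2.2, 2.4, 2.3, 2.1, 2.0, 1.9,
             1.8, 1.7, 1.8, 2.0, 2.2, 2.3, 2.4, 2.2, 2.1, 2.0, 1.9, 1.9]
mlss_12h = [2450.0, 2510.0]

def pool(series, block):
    """Average consecutive blocks of `block` values, i.e. re-express the
    series on the coarser (least common) interval."""
    return [statistics.mean(series[i:i + block])
            for i in range(0, len(series), block)]

do_12h = pool(do_hourly, 12)           # two DO values, one per 12-hour interval
pairs = list(zip(do_12h, mlss_12h))    # each DO value now has a matching MLSS value
print(pairs)

# A 24-hour composite TS value would simply be reused for both 12-hour
# intervals, since it is already a physical average over the whole day.
```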
What an operator regresses depends on what an operator is looking for. The operator groups parameters to determine particular data characteristics. In wastewater treatment, parameters are most commonly grouped to address concerns arising from the monitoring program (PMS), the quality assurance program (QA/QC), the passage of material through the plant, and operating decisions:

• Parameter-Measure-Sample: A datum originates from a measure conducted on a sample to estimate a parameter. A number of measures may be taken on a sample (e.g. COD, solids and total phosphorus taken on the influent composite sample) while a number of measures may be taken to estimate a parameter (e.g. COD and BOD5 estimate substrate concentration). These relationships are formalized in the PMS plane discussed in Chapter 7. An operator can make at least three comparisons using this grouping:

  - An operator may compare two measures (e.g. COD, BOD5) of the same parameter (e.g. influent substrate) if he suspects one of the measures or he needs to replace a missing data point.
  - An operator may compare two parameters (e.g. influent and effluent Fe) to ensure that one parameter is consistently less than the other (e.g. not adding too much pickle liquor).
  - An operator may compare two measures taken on the same sample if he suspects the sample is contaminated, i.e. all values that day appear to be out of line.

• Quality Assurance/Quality Control: A datum's measure may be compared against a standard or a more reliable measure, while a datum's sample may be checked against an independent sample. Koopman et al [187] discuss an example where a plant's inability to meet its discharge permit was traced, using these types of comparisons, to a problem with the sampling equipment.

• Structure: A set of parameters may be grouped by whether they precede or follow a given parameter, e.g. what affects the MLSS.

• Operation: A set of parameters may be grouped by whether they are causes (i.e. disturbance or manipulated parameters) or effects (i.e. status or performance parameters).

Once a relationship is detected, an operator may decide to reduce the relationship to an equation, i.e. characterize the relationship. Characterization consists of identifying the equation's structure and determining the equation's coefficients. A discussion of model identification techniques is beyond the scope of this thesis. However, this does not preclude a discussion of the types of relationships that interest an operator and some of the limitations his process places on his analysis.

An operator may regress data for one of three reasons:

1. Measure: A regression can be used to summarize data by reducing an aspect of the data to a single value, i.e. a statistic. Morris et al [216] showed that the Vesilind coefficients provide a unique measure of the settleability of a sludge. They recommend the coefficients be determined every 2-4 days during stable operation and daily when the plant is stressed (e.g. settling problems).

2. Prediction and Planning: A regression can be used to formalize an observed relationship between two or more parameters. Because the basis of the equation is association, not causation (see Subsection 3.2.1, correlation due to inhomogeneity), the equation will work most of the time as long as the circumstances in the plant do not change. For example, Keinath [184] derived an empirical relationship that expresses the Vesilind coefficients (V_0, k) of the settling velocity equation in terms of the stirred SVI, valid over 35 mL/g <= SSVI <= 220 mL/g:

      V_s = V_0 e^(-kX),  with V_0 = 15.3 - 0.0615 SSVI          (2.1)

Daigger and Roper [87] developed a similar relationship which is suspect because their data included plants that used pickle liquor for phosphorus removal. A relationship may also be characterized via a reference or frequency distribution. A reference distribution (i.e. a sample-based histogram) provides a measure of what can be expected [45], while a frequency distribution [224] may be used to predict rare events (e.g. equipment failure [271], peak flows [113]).

3. Operation: A regression equation between a cause and effect may be used to control the process.
The independent variables should include at least one manipulated parameter, while the dependent variable may be a status or performance parameter. A status parameter may also be used as an independent variable. The range of applicability of the equation should also be provided. For example, White [309] developed the following relationship for allocating clarification capacity and setting the recycle rate:

      ((Q_I + Q_R) / A) MLSS <= 310.85 SSVI^(-0.77) (Q_R / A)^(0.68)          (2.2)

where

      A      Area [m^2]
      MLSS   Mixed Liquor Suspended Solids [g/l]
      Q_I    Influent Flow Rate [m^3/h]
      Q_R    Recycle Flow Rate [m^3/h]
      SSVI   Stirred Specific Volume Index [ml/g]

Keinath et al proposed a similar but slightly more complex system that uses an experimentally determined mass flux curve [185].

Three characteristics of treatment plant data limit the utility of regression:

1. Treatment plant data are often multi-collinear, i.e. independent variables are correlated.
2. Treatment plants are under control, i.e. the operator compensates for important sources of variability.
3. Manipulated parameters may change over a limited range, i.e. the range of applicability of an equation is narrow.

Most secondary clarification models use an empirically derived relationship that includes the clarifier influent flow rate and MLSS [225]. If the operator sets the recycle rate to be a fixed fraction of the plant influent flow rate, then the two flow rates will be correlated. A step-wise regression algorithm will use only one of them. The resulting relationship will hold true as long as the operator does not change his operating strategy.

The second problem is that the operator compensates for important sources of variation, i.e. disturbance parameters. One possible signature of a well-operated process is that there is no correlation between the influent and effluent characteristics. NCASI looked at the influent and effluent of 39 treatment systems including 17 activated sludge plants that treat pulp and paper mill wastes [226]:

  The major observation is that the treatment processes reviewed in this bulletin appeared relatively insensitive to the variation in influent quality. Effluent quality and its variation more than likely reflects the operating state of the individual systems.

The third problem is a corollary to the second problem. In order to learn what happens when you interfere with a process, you must interfere with it, not just passively observe it [135]. Therefore, another possible signature of a well-operated plant (with a fixed influent pattern) is that the data contain little information on how to change a manipulated parameter. For example, W. G. Hunter recounts the story of a newly graduated statistician who analyzed a chemical plant's data and then prioritized the causes of variance in the plant's performance. At the end of the presentation, he stated that the least important variable was the amount of water present. Much to the statistician's surprise, his audience laughed. "It was, in fact, easily the most important one, because, if any water were allowed to enter this particular plant, the plant would gloriously and very definitely explode." [165].

A computer program can aid the operator in determining relationships in three ways:

1. Data Management: A computer is faster than an operator at grouping parameters and extracting their respective data series. An operator would gladly pass this tedious task over to a computer.
2. Data Manipulation: A computer, if provided with the context of the data, can derive the data series based on the least common interval. A computer excels at this well-defined, repetitious task.

3. Guidance: A computer can retrieve information (e.g. interpretive rules) much in the same way as it does data if it is provided with a method of matching the information needed with a particular situation.

The latter task is the subject of a review paper (followed by a set of discussion papers) by Gerald J. Hahn [136]. Using elements of Hahn's expert system for product-life data analysis, we can describe what a data analysis program should be able to do for the operator:

• Setup: The operator indicates what he wants to find out and the computer program sets up the problem, i.e. the viewpoint. The computer determines which parameters are involved and extracts their data series.

• Verify: The program characterizes the data and matches the data to the statistical method.

• Execute: The program steps the operator through the analysis. When the computer requires the operator's opinion, the computer queries the operator and provides a list of possible answers.

• Interpret: The computer indicates how the operator can interpret the results by weighing the outcome against the data series' characteristics.

Ideally, each stage of the analysis should have a graphical and textual representation. As well, a complete report should be generated at the end so the operator can review what he and the computer have accomplished. A good operator is both suspicious and appreciative of statistics. The computer should cultivate these attitudes by pointing out the weaknesses and strengths of an analysis. George Box summed up this attitude in his axiom [181]: "All models are wrong, but some are useful".

Figure 2.3: Anscombe's Quartet

2.2.6 Statistical Graphs

Graphics reveal data and can be more precise than conventional statistical computations [77]. For example, F. J. Anscombe derived data sets that can all be described by the same linear model (Figure 2.3) [141]:

      Number of Data Points:         10
      Mean of X:                     9.0
      Mean of Y:                     7.5
      Coefficient of Determination:  0.82
      Equation:                      Y = 3 + 0.5X

This example illustrates why an operator should always plot his data first. The most commonly used plots are listed below [77] [70] [98]:

• Scatter Plot: Plots two parameters against each other. A Time Series Plot is a special case of a scatter plot in which one parameter is time. A common way to view a number of parameters is as an array of scatter plots. One very powerful method to analyze an array of scatter plots is to allow the user to define a box in one plot that contains a set of data points, then highlight the location of these data points in every other plot in the array.

• Cusum Charts, Difference Plots: A cusum plot is a plot of the cumulative sum of differences from a location statistic. Difference plots are plots of first and second differences between elements in a time series.

• Histograms, Percentile, Box and Whisker Plots and Stem and Leaf Plots: These plots show how the data are distributed.
Plant operators find that histograms are an effective way to monitor a process [45] [44]. Figure B.2 contains a simple box plot and Figure B.2 contains a stem and leaf plot. • Andrew’s Curve, Chernoff Faces and Glyphs: These plots display more than one parameter in a single plot, e.g. the plant’s effluent characteristics at a particular instance. An operator assigns each variable to a parameter in an harmonic function. The frequency spectra of this function forms the Andrew’s curve. A Chernoff face is constructed by assigning the deviation of a measure from its set point to a feature of a face. The further a measure is from its desired value, the more distorted the facial feature. A Glyph is a circle with rays emanating from its circumference. A datum’s value is assigned to a ray. The purpose of these plots is to provide an image that an operator can scan quickly but still notice important changes in the data. • Control Charts: These charts are plots of successive statistical measures or other values of a random variable, e.g. mean, range, proportion and trend.  Control  charts are routinely used in QA/QC programs [248]. Berthouex et al [43] used control charts to the monitor effluent quality. • Autocorrelation Function and Spectral Density: The autocorrelation func tion and spectral density function enable the analyst to detect cycles in the data. Graphical methods can be abused much in the same way that statistical methods can because the interpretation of a plot is invariably based on some assumptions about the data. Graphical analysis is time consuming unless the software is set up to automate the production of plots. For example, given a small data set consisting of ten parameters  Chapter 2. PLF’s and Computer-Based Solutions  (one of which is time) there would be ((10(10  —  l))/2)  58  =  45 plots to analyze. If we assume  it takes 10 minutes to analyze one plot, then it would take 450 minutes or a work day to analyze the data. And even then, the more complex relationships involving three or more features would be missed [264]. The alternative is to provide a computer program with the ability to decide which plots the operator may be interested in seeing. The operator could page through these plots first, and if wanted to view additional plots, request the computer to add these plots to its list. 2.2.7  Advanced Statistical Methods  A simple rule in statistics is that the more powerful or complex an analysis, the more demands the analysis places on the data set. If the data set cannot match these demands, then the results of the analysis are suspect. This fact motivated many universities to redesign their statistics courses to free the student from the rigors of computation so that they concentrate on problem solving [71]. The students are given real data, complete with missing values and asymmetric distributions, and taught how use statistics to characterize then analyze these data. Similarly, applied statisticians advocate that statistical software he made more “intelligent” [136]. The software would step the user through a complete analysis starting with IDA and moving onto more sophisticated methods if the data and problem warrant it. The goal of these two groups is the same  to emphasize that  statistical dollar is better spent on planning a study than on analysis. A number of advanced statistical methods have been tried on treatment plant data including time series analysis. Time series analysis exploits auto- and cross- correlations within the data set. 
The cause of this correlation in wastewater treatment is twofold. The first cause is due to the actions within the unit process, e.g. mixing, aeration and separation. For example, assume two aeration basins are linked in series with tank A upstream of tank B. The MLSS in tank A and tank B are correlated because the solids  Chapter 2. PLF’s and Computer-Based Solutions  in A flow into B. Similarly, the MLSS in tank  14  59  is correlated with the MLSS in tank  A an honr later for two reasons: (1) the biological activity is partly dependent on the concentration of the active mass and (2) the mixing averages the characteristics of the solids across the reactor. The second cause of antocorrelation is due to auto correlated inputs, e.g. diurnal flow variation.  Numerous researchers, including Dehelak and Sims [93], Crowther et  al [85] [84], Berthouex et al [8] [50] [47] [48] [49] [51] and Hiroaka et al [146] [147], examined the auto- and cross- correlation functions in wastewater treatment plants. Most characteristics show a 1 to 3 day dependency and a seasonal component. The reason for this low dimensionality is that the noise to signal ratio of most environmental measures is high.  Chapter 2. PLF’s and Computer-Based Solutions  60  Modeling  2.3  This section focuses on the use of a model to simulate and control the process. For a review of the current state of wastewater treatment plant modelling, refer to Lessard and Beck’s survey paper [195]. All advanced control schemes require a model of the process. Adaptive controllers usually use a stochastic model that is identified recursively on-line while model following (model reference adaptive) systems use a mechanistic model. For example, Kabonris and Georgakakos [178] designed a model reference set-point control system for wastage and recycle rate. The controller uses the IAWPRC model [142] to determine the optimal control settings to maintain the effluent quality and minimize energy consumption over a 24 hour period. Before a loop can be closed using a controller, the structure and the parameters of the model must be correctly identified. There is a great deal of doubt in the wastewater and automatic control literature whether this is possible with some control loops. 2.3.1  Identifiability  The construction of a model (i.e. structural and parameter identification) from data involves three entities [199]: • The Data: A set of ordered pairs of input and output • The Model Set: A set of candidate models containing the real model  10  • The Criteria: A method to determine which model is the real one Figure 2.4 outlines the identification cycle. The data are obtained from an experiment. The type of model in a model set ranges from being mechanistic to being stochastic (i.e. Best Model: The real—life system is an object of a different kind than our mathematical models. t0 Mathematical descriptions are templates we place over the real-world to order what we see. Therefore, the real model is one that describes what we see in full.  Chapter 2. PLF’s and Computer-Based Solutions  61  black box). There are several ways to fit a model to data (i.e. criterion), of which, least squares is only one of many [199]. A model is valid if it satisfies at least three criteria: • Model agrees sufficiently with observed data: The model is run alongside the system to see if it follows the process. The data set on which a model is identified should not be the same data set on which it is validated [31]. 
• Model is good enough for the purpose it was intended: The model's performance is compared against the modelling objectives.

• Model describes the "true" system: The model's dynamics are compared against past data and theory (via simulation). ("The true system is an esoteric entity that cannot be attained in practical modelling. We have to be content with partial descriptions that are purposeful for our applications" [199, p. 430].)

The identification fails if the model is invalid or if the model is nonidentifiable. The latter case is the focus of this section as it is the "Achilles heel" of most modelling and control exercises in wastewater treatment. The problem of deciding how to use the criteria to determine which parameter set best fits the model to the data is referred to as the parameter identification problem. A model is considered identifiable if an analyst can use the criteria to determine the best parameter set. Therefore, parameter identifiability is dependent on both the characteristics of the model and the data set.

Godfrey and DiStefano [123] divide parameter identifiability into two classes: (1) deterministic, structural or a priori identifiability and (2) numerical or a posteriori identifiability. Deterministic non-identifiability occurs when two parameters are confounded and cannot be separately identified. Complex nonlinear models are susceptible to this type of problem. Once a parameter is known to be identifiable, the next step is to determine the accuracy of the parameter estimate given a particular stochastic input function (i.e. numerical identifiability).

For example, Holmberg [156] published a paper on the identifiability of microbial growth models incorporating Michaelis-Menten (Monod) type nonlinearities. Holmberg examined microbial growth in a batch reactor (see Table 2.7).

Table 2.7: Holmberg's Batch Reactor Equations

      dx/dt = mu x - K_d x
      ds/dt = -(mu / Y) x
      mu = mu_m s / (K_s + s)

      x      concentration of microorganisms
      s      concentration of growth limiting substrate
      mu     specific growth rate
      Y      yield coefficient
      K_d    decay rate coefficient
      mu_m   maximum growth rate
      K_s    Michaelis-Menten constant

Holmberg studied the sensitivity functions for the four parameters {K_s, mu_m, K_d, Y}. A sensitivity function describes the effect of small perturbations in the parameters, {p: K_s, mu_m, K_d, Y}, on the state variables, {z: x, s}. The sensitivity functions of mu_m and K_s are indistinguishable with respect to either x or s. This indicates that these parameters may be difficult to identify, i.e. will give the analyst the most problems. Pohjanpalo's identifiability test confirmed that all the parameters are structurally identifiable.

Holmberg then introduced noise and observed that the standard deviations of the K_s and mu_m estimates could be 200% to 700% greater than the standard deviations of s and x.

Figure 2.4: Ljung's System Identification Loop

Holmberg warns that with a few noisy measurements, an analyst will not be able to determine K_s and mu_m with sufficient accuracy. The paper concludes that (1) the Michaelis-Menten model is at best an empirical model rather than an internally descriptive one and (2) a number of K_s and mu_m pairs fit a data set with a moderate amount of noise equally well.
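The practical meaning of Holmberg's conclusion can be illustrated with a short simulation of the batch model in Table 2.7 (simple Euler integration; all parameter values below are illustrative assumptions, not Holmberg's). Two quite different (mu_m, K_s) pairs chosen with a similar mu_m/K_s ratio produce substrate curves that differ by only a milligram or two per litre:

```python
def simulate(mu_m, K_s, Y=0.5, K_d=0.005, x0=100.0, s0=20.0, dt=0.01, t_end=8.0):
    """Euler integration of dx/dt = (mu - K_d) x, ds/dt = -(mu / Y) x (Table 2.7).
    Returns the substrate concentration recorded once per hour."""
    x, s, hourly = x0, s0, []
    for i in range(int(t_end / dt)):
        mu = mu_m * s / (K_s + s)
        dx = (mu - K_d) * x
        ds = -(mu / Y) * x
        x, s = x + dx * dt, max(s + ds * dt, 0.0)
        if i % int(1.0 / dt) == 0:
            hourly.append(s)
    return hourly

# Two deliberately different parameter pairs with a similar mu_m/K_s ratio
# (illustrative values only).
s_a = simulate(mu_m=0.15, K_s=100.0)
s_b = simulate(mu_m=0.30, K_s=200.0)

max_diff = max(abs(a - b) for a, b in zip(s_a, s_b))
print(f"largest difference in hourly substrate values: {max_diff:.2f} mg/L")
# Differences of this size are comparable to routine substrate measurement
# error, so the two (mu_m, K_s) pairs cannot be told apart from noisy data.
```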
Holmberg's results challenge the validity of wastewater treatment plant models that consist of a series of Monod type equations.

M. B. Beck examined the area of wastewater modeling in a number of papers [31] [32] [33] [34] [35] [37] [36]. He observed that modellers take one of two approaches: stochastic and mechanistic. Stochastic models are limited by treatment plant data, which usually vary within a narrow range exhibiting very few degrees of freedom. These models are typically under-parameterized but accurate as long as the conditions under which they were identified prevail. The second approach is to construct complex nonlinear models. The model parameters are identified in laboratory experiments [114, pp. 347]:

  Most of our knowledge of the activated sludge system is based on experiments carried out under so-called controlled laboratory conditions. Such studies can be much broader and more diversified than experiments in pilot or full-scale conditions, but are often biased by undue influence of some factors which cannot be properly controlled or modelled. It is usually taken for granted that the behavior of the full-scale plants will be close to that of the models studied, and that the relationships among process parameters, observed in smaller scale, will be equally valid in the full-scale plants. However, quite often the fluctuation of wastewater quantity and quality, as well as the instability of some process parameters, together with the inadequacies of metering and sampling, are such that in specific cases expected correlations cannot be realized in practice. The data are characterized by such a scatter of information that their correlation would be without any practical meaning.

In other words, complex mechanistic models tend to be precise but inaccurate.

M. B. Beck summarized these problems in two dilemmas [35]: the Model Complexity Dilemma and the Prediction Error Dilemma. The basis of these dilemmas is the over- and under-parameterization of the model with respect to the data set [199]. For example, assume the true system is given by Equation 2.3 (linear case) and Equation 2.4 (nonlinear case):

      Y = a X_1 + b X_2 + c X_3 + d          (2.3)

      Y = d X_1^a X_2^b X_3^c                (2.4)

In order to understand the consequence of Beck's dilemmas, consider the following scenarios:

• Scenario 1: The data set has 2 degrees of freedom (df), i.e. X_2 and X_3 are constant, and the model has 2 df:

   - Linear Case: Yhat = ahat X_1 + dhat. In this case, dhat = d + b X_2 + c X_3.
   - Nonlinear Case: Yhat = dhat X_1^ahat, assuming that ahat = a. In this case, dhat = d X_2^b X_3^c.

  Scenario 1 is the ideal stochastic case, i.e. the df of the data equal that of the model. If X_2 or X_3 cease to be constant, then both model equations will give precise but inaccurate predictions, i.e. they will consistently predict the wrong future. If X_2 and X_3 change slowly, the analyst can use some form of recursive identification scheme [200] to update the estimates of ahat and dhat.

• Scenario 2: The data have two degrees of freedom but the model only one:

   - Linear Case: Yhat = ahat X_1
   - Nonlinear Case: Yhat = X_1^ahat

  Scenario 2 is Ljung's under-parameterized case [199]. In these cases, ahat may be difficult to determine and its value will depend on the fit criterion. Scenario 2 will never provide a good prediction of Y even when X_2 and X_3 remain constant. The problem is similar to trying to describe a plane with a line. Scenario 2 will also have the same problems as scenario 1.
• Scenario 3: The data set has 2 df and the model has 3 df:

   - Linear Case: Yhat = ahat X_1 + ehat X_2 + dhat

     A step-wise regression algorithm will "kick out" the X_2 term. If the user sets the ehat term a priori (e.g. from a laboratory experiment), then dhat will change accordingly.

   - Nonlinear Case: Yhat = dhat X_1^ahat X_2^ehat

     The identification algorithm can choose any values for dhat and ehat as long as they satisfy the following equation:

          dhat X_2^ehat = d X_2^b X_3^c          (2.5)

     If the extra variable in the model is X_2, and X_2 later ceases to be constant, the model has the potential of predicting the correct future. However, the user has no way of knowing when it is doing so since the values of the extra parameters were chosen arbitrarily.

  Scenario 3 is Ljung's over-parameterization case. Scenario 3 presents a different problem to 1 and 2. If the user conducts a series of laboratory tests to determine some of the model coefficients (e.g. kinetic rates, half saturation constants, substrate fractions), then the user must have a method of determining if the laboratory values hold true in the field. The user cannot determine this from the field data alone and, therefore, can never be certain that the model will predict the correct future. Beck [36] argues that this is particularly a problem with carbonaceous substrate degradation because the responsible biomass is heterogeneous and the substrate measurements are noisy and nonspecific.

The consequence of Beck's dilemmas can be reduced to three points:

1. Over-parameterized models that rely on laboratory experiments for the determination of their coefficients cannot be relied on because there is no way to verify that the coefficient values hold true in the field.

2. Any model that uses field data to determine some of its coefficients can be relied on as long as conditions in the plant do not change, i.e. there is no advantage to over-parameterized models.

3. Any model that is used for control must be recursively identified on-line and, therefore, should not be over-parameterized.

A model cannot be used to control a complex system unless the model's structure and parameters are identifiable. The model must be complete in the sense that it enables the operator or computer to explain the past and plan the future. To plan means to be able to define a set of control actions that will move the system from its current state to a desired state. To explain means that given its current state and its input/output history, you can determine what a past state was (see Section 3.1). (The Kalman Decomposition Theorem [18] states that a linear system must be both observable and reachable before it is identifiable. Observable means that given the input/output history of the system, you can predict its current state, while reachable means that given its current state, you can define a set of inputs that will drive it to another state.)

The point of this discussion is to underline how a computer can use a model to assist the operator. A model can be used to help the operator analyze his data, i.e. act as a basis of comparison. A model may also be used to make short term predictions provided the model's parameters are being constantly adjusted to reduce the error in these predictions. However, over-parameterized (and therefore partly laboratory identified) models should not be used as the basis for control.
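A small numerical sketch of Scenario 3 (Python with numpy; synthetic data and illustrative coefficients, not plant records) shows the over-parameterization problem directly: when X_2 never moves, least squares cannot separate its coefficient from the intercept, yet both the over-parameterized and the simple model reproduce the data equally well.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "truth": Y = 2*X1 + 3*X2 + 1, but X2 never moves during data collection.
n = 50
x1 = rng.uniform(0.0, 10.0, n)
x2 = np.full(n, 5.0)                        # constant operating condition
y = 2.0 * x1 + 3.0 * x2 + 1.0 + rng.normal(0.0, 0.5, n)

# Over-parameterized model: the X2 column and the intercept column are
# linearly dependent, so the design matrix is rank-deficient.
A_over = np.column_stack([x1, x2, np.ones(n)])
coef_over, *_ = np.linalg.lstsq(A_over, y, rcond=None)

# Simple model: X2's fixed contribution is absorbed into the intercept.
A_simple = np.column_stack([x1, np.ones(n)])
coef_simple, *_ = np.linalg.lstsq(A_simple, y, rcond=None)

print("over-parameterized coefficients:", np.round(coef_over, 2))
print("simple model coefficients:      ", np.round(coef_simple, 2))
print("residual norms:",
      round(float(np.linalg.norm(y - A_over @ coef_over)), 2),
      round(float(np.linalg.norm(y - A_simple @ coef_simple)), 2))
# Both fits have essentially the same residuals, but the split between the X2
# coefficient and the intercept in the first model is arbitrary, so neither
# value can be trusted if X2 later starts to vary.
```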
PLF’s and Computer-Based Solutions  2.4  69  Expert Systems  An expert system is an automated process which incorporates the judgement, experience, rules of thumb, and intuition used by a human specialist to emulate that specialist’s problem solving ability. Usually, the knowledge is stored in a computer in the form of facts and decision rules, although many more complex knowledge representations are available (e.g. neural networks). Au expert system consists of two parts: • Knowledge Base: A database of domain specific knowledge (cf basic source code). • Inference Engine: A knowledge interpreter (cf basic interpreter). The inference engine forms part of the shell. A shell is the total system minus the knowledge base. The main advantage of an expert system is that it is an available source of knowledge no matter what time of day it is, i.e. human experts unavailable. Expert systems have been applied to a number of tasks in wastewater treatment plants: • Recommend solutions to wastewater and water treatment process problems [242] [174] [203] [230] • Review monitoring data and make control decisions [190] • Recognition and diagnosis of sludge bulking [121] • Detect failure in an anaerobic digester and recommend action [192] • Recommend control actions based ou the DO profile in a PHOSTRIP plant [109] • Evaluate compliance data from a number of plants [312]  Chapter 2. PLF’s and Computer-Based Solutions  70  • Identification of performance limiting factors following the EPA’s CPE/CCP pro cednres [69]. There are at least fonr reasons why expert systems have met with mixed snccess in wastewater treatment: 1. Expert systems are limited by lack of access to other databases, e.g. monitoring and maintenance information [242]. 2. Expert knowledge is not available. Knowledge is either is too vagne or too specific leading to a mediocre system [241]. 3. False alarms and rnle conflicts undermine the operator’s confidence in the sys tem [190] 4. A good operator does not need an expert system to help him run his plant, i.e .the operator is the expert [190]. 2.4.1  Integration: Access To Other Data  The trend in the industry is away from stand alone expert systems to embedding the technology into existing programs, e.g. spreadsheets, databases, statistical packages [230]. An expert system can be integrated with other software in a number of ways including using mixed language programming (e.g. Prolog and C), linkable expert systems  13,  shells with “hooks” to other commercial software, and rule compilers. In order not to “cripple” a piece of software, like the one envisioned in this thesis, the expert system component should he coded as a number of small rule bases that the software’s internal language interpreter can process. At this point in time in wastewater treatment, the e.g. NEXPERTOBJECT 13  © Neuron Data, 165 University Avenue, Palo Alto, CA  Chapter 2. PLF’s and Computer-Based Solutions  71  integration issue far outweighs the issues surrounding how knowledge is represented and interpreted. 2.4.2  Expert: Is There One?  To develop an expert system, there must be an expert. This may seem trivial, hut in reality, many experts are not experts at all. An “expert” is a person who can both do and explain the task [38]. The task should be one that is routinely taught to beginners, and even after they are trained, the expert should be able to outperform them. The expert must be willing to communicate his knowledge and to commit himself to the project. 
Researchers have expressed serious doubts that there are individuals of this caliber for all areas of wastewater treatment operations [241]. The most disconcerting question an engineer can ask a computer scientist is “How do you know your expert is an expert?”  14  This question is analogous to the identifiability question posed in the previous section. 2.4.3  Rule Conflict: Wait and Short Term Gain  A rule conflict occurs when two rules suggest opposing actions for the same condition. There are at least three reasons why this may happen: 1. A change in a manipulated parameter may have a positive effect on one part of the plant and a negative effect on the other. For example, an increase in the recycle rate increases turbulence in the clarifier, reduces the hydraulic retention time in the aeration basin, increases the hydraulic loading on the clarifier, decreases the sludge blanket height and increases the plant’s energy consumption. 2. A change in a manipulated parameter produces a hydraulic, chemical and biological dynamic in the process. Some of these may be preferable and some not. From experience. I never heard from him again. 4 ‘  Chapter 2. PLE’s and Computer-Based Solutions  72  3. The operator is sometimes unsure how to use a manipulated variable to affect his process. One cause of this uncertainty is due to interactions between status and manipulated parameters, i.e. the effect of a change in the recycle ratio is dependent on the MLSS and the air  flow  rate  into  the aeration basin [295].  Lai [190] lists a number of guidelines used by his expert system to resolve these conflicts:  1. Select the control action that has an immediate effect on protecting effluent quality. 2. Examine the trends in the variables. If the variables are headed in the right direc tion, leave alone. 3. If a recent control action was taken, wait (action may influence today’s decision). 4. Be cautious about acting on the basis of a single unusually high or low value. 5. If there is no risk, wait. The first guideline may lead to accepting immediate (hydraulic) improvement at the cost of long term (biological) deterioration. The second guideline explains why the computer needs to know when a number is preferable, i.e. operator prefers a low effluent BOD 5 concentration over a high one. The other guidelines advise the operator to wait until the itself out, i.e. when in doubt, do nothing.  process  sorts  2.4.4  Are Expert Systems Needed?  An expert system application  is  a success if it meets three criteria [128]:  1. The expert system must represent/emulate the decision processes used by knowl edgeable person in the field.  Chapter 2. PLF’s and Computer-Based Solutions  73  2. The expert system must produce a sufficient increase in decision efficiency and quality to justify the development and maintenance costs 3. The expert system must be accepted by those who will use it. Most wastewater expert systems fail on the first two criteria. In other words, they were not the best tool for the job. An expert system is a new tool to most wastewater practitioners, and therefore, there is a danger it will be used inappropriately. For this reason, Beckman [38] developed a checklist of 75 qnestions to help a practitioner decide whether or not to develop an expert system. 
The list is broken into six categories:

Category           Points Possible
Task               25
Payoff             20
Management         20
Domain Expert      15
System Designer    10
User               10
Total Possible     100

The main problem with many of the stand alone expert systems developed to date is that there is no payoff because there is no need for such a system. For example, the consensus in the wastewater literature is that experienced operators do not need expert systems to help them run their plants [230] [190]. The reason there is no need is because an expert system is either the wrong tool or not the best tool to use to solve the problem at hand.

Most problems in wastewater treatment plants can be placed into two groups: those an experienced operator can solve and those that require outside help. The basic premise of an expert system is that there is an expert and that the expert is either unavailable or expensive. If an experienced operator on staff can solve the problem, then there is no need for an expert system. If the problem requires outside help, then it usually is plant specific and not worth developing an expert system for.

In most cases, the operator solves problems by conducting an investigation (e.g. additional analyses, in-plant measurements), and when necessary, will refer to an operations manual for help (e.g. [293]). The nature of the problem is iterative, meaning the operator may decide to take additional measurements as part of his diagnosis. If the problem is generic, then an expert system could be used to solve it if the problem requires cognitive reasoning and is neither too complex nor too simple (i.e. an expert can solve it in 1-8 hours) [38]. If the problem is plant specific, the expert system probably will not be able to help the operator due to the system's ignorance of the process's particulars. In this case, it is probably cheaper to "call out" an experienced operator (if he is not there already) than to develop a plant specific expert system. The second type of problems happen infrequently and usually require a process audit to solve. The "expert" designs and sometimes supervises the audit to diagnose the problem. Again the expert is available.

Expert systems excel at tasks that have a narrow domain and involve classification or heuristics. For example, AT&T developed an expert system, Autoprint, to simplify printing out a file under UNIX [263]. This is a successful application because UNIX is a "well-behaved" system, i.e. logically consistent and predictable. Biological systems are not. There is a gap between what the expert system requires and what the operator (or expert) can provide it with, created by the lack of monitoring information and process understanding.

An expert system should provide a number of benefits including reduced costs, increased autonomy from consultants, and improved quality. The developer of such a system must look at a minimum of three other criteria. The first criterion is the nature of the platform that is needed to run the system.15 Ideally, the system should run on the machines now in use in the plant. The second criterion is the portability of the product. A developer will not recoup his investment by developing a system that is useable only at a single plant. The final criterion is whether the product would be useful if only partially finished. AI (Artificial Intelligence) projects are notorious for not meeting their deadlines.
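As an illustration of how a weighted checklist of this kind can be turned into a screening step (this is not Beckman's questionnaire itself; only the category ceilings come from the table above, and the project scores and the 60-point cut-off are invented for the example):

```python
# Category ceilings from the checklist table above; the project scores are hypothetical.
max_points = {"Task": 25, "Payoff": 20, "Management": 20,
              "Domain Expert": 15, "System Designer": 10, "User": 10}

project_scores = {"Task": 18, "Payoff": 6, "Management": 12,
                  "Domain Expert": 5, "System Designer": 8, "User": 7}

# Cap each category at its ceiling and total the result out of 100.
total = sum(min(score, max_points[cat]) for cat, score in project_scores.items())
print(f"Screening score: {total}/100")
# Assumed decision rule: below 60/100, an expert system is probably the wrong tool.
print("Proceed to detailed design" if total >= 60 else "Re-examine the need")
```

A proposal that loses most of its points in the Payoff and Domain Expert categories is exactly the kind of project the surrounding discussion argues should not be built.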
The cost to benefit ratio for a successful project should be approximately [38]. Given the plant specific nature of process knowledge and the high cost to  1:10  benefit ratio, it is not surprising that most wastewater expert systems are developed by universities and government departments. Most wastewater expert system projects would not have gone ahead or at least taken the form they took if the researcher had worked through Beckman’s checklist. The consequence of this is that most environmental expert systems sit in a filing cabinet nnused [166]. The fault is not the technology but rather its misapplication. Future: The Role Of Expert System Technology  2.4.5  If a researcher works through Beckman’s list [38] or Greathouse’s six questions [128], he will realize that expert systems show their greatest potential when (1) they are part of other systems and (2) their purpose is to support rather than replace the decision maker  16  These expert systems should be small, fast, run on conventional hardware and  be imbedded in existing applications [231]: A recent trend is that expert systems are moving away from being “stand alone” systems to one interacting with other useful plant software packages, such as database-management systems, spreadsheets, and graphics. Thus, embedibility of expert systems into other programs and the calling of these For most plants at the writing of this thesis, the system should run on an IBM AT class machine. 15 This is a significant limitation as the POTW Expert [69] is noticeably slow on a machine with an Intel 80386 running at 25 Mhz. The use of expert systems to support environmental decision makers is discussed in [131] 5 ‘  Chapter 2. PLF’s and Computer-Based Solutions  76  programs from the exert system is becoming increasingly important. The underlying philosophy is to accept expert-system technology as just another software tool, and to make a judicious use of this tool in tasks best suited to it. For example, expert systems are being used now in a number of computer programs to make a piece of equipment “more intelligent”: • On-site Help: A complex piece of equipment would have a small expert system built into it that helps the operator trouble-shoot and/or operate his equipment. This is a common feature in photocopiers. • Failure Detection: A plant with a large number of sensors and other on-line instrumentation should have a program that monitors the data sent back to the plant to detect sensor failure.  The program could do on-line calibration, take  a sensor off-line and/or warn the operator that a sensor needs attention. This software exists in some chemical plants and nuclear reactors. • Data Analysis: An expert system could make data analysis easier by improving a program’s interface, interpreting analytical results, selecting appropriate data anal yses, and making suggestions on the basis of the monitoring and maintenance data. This was discussed in Section 2.2 and is the subject of Hahn’s review paper [136]. In order to imbed expert system technology into an application, the expert system must be made aware of the process and given access to structural and monitoring information. This is the purpose of the structure paradigm discussed in Chapter 6.  Chapter 2. PLF’s and Computer-Based Solutions  2.5  77  Automatic Control  The main goals in applying control methods to microbiological systems are to improve operational stability, prodnction efficiency and profit, and to handle dynamic changes during start-up and shutdown [157]. 
Most biological processes are difficult to control automatically because they exhibit non-linear, time-varying behavior and many of the available process state measurements are of poor quality [109]. For this reason, control in these systems remains a hierarchical two-level problem with the operator active on the upper level [155]. Gustaf Olsson [234] [236] notes that the operator cannot be factored out of the loop as he is the best person to monitor slow changes in the process, leaving the computer to deal with the fast ones. Conventional control theory deals predominantly with linear systems having constant parameters [20]. Loops in these systems are usually controlled using a Proportional Integral-Derivative (PID) controller. However, if operating conditions change, the con troller needs to be re-tuned. This becomes a problem if the system is constantly changing, which is the case with aeration and disinfection systems in wastewater treatment plants. In the last decade, Foxboro (1984), ASEA (1982) and SattControl (1984) introduced auto-tuning PID controllers. In the spring of 1986, Novatune’s product controlled over 1000 loops in a wide range of industrial processes including loops in wastewater treat ment plants and pulp and paper mills [19]. In 1986, Rundqwist [254] implemented a self-tuning dissolved oxygen controller in the Kappalla Sewage Works in Sweden. The standard deviation of the DO was 0.15 mg/l over the test period. Since then, Holmberg et al [158] and Bocken et al [54] introduced systems that estimate the oxygen utilization rate on-line to improve the controller by predicting the demand.  Chapter 2. PLE’s and Computer-Based Solutions  2.6  78  Summary  Treatment plant performance is limited by a number of factors, some of which the com puter can ameliorate. The computer is a powerful tool only if it is applied to the right problem. The right problem is something the operator either does not do well or dislikes doing. In addition to the tasks the computer now performs in treatment plants, it could be used to analyze, model, reason and control the system. However, there are limitations:  • The computer may be used analyze the data as long as the software is sensitive to the data’s characteristics. In most plants, these characteristics limit the analysis to simple summary statistics and statistical graphs. • The computer may be used to model the process but these models cannot form the basis of controllers unless they can be identified reliably on-line. • The computer may be used to reason about the process hut these expert systems should he coupled to existing software and act in support of the operator. Typically, these expert systems will be small and fast, and perform supervisory tasks such as monitoring on-line instrumentation for failures, reviewing monitoring data and providing on-line help with complex pieces of equipment. • The computer may he used to control the system but only under the supervision of the operator. The computer can control difficult loops using adaptive control algorithms such as auto-tuning. Information on what limits and improves performance comes from four sources: PLEX, CPE/CCP, audits and historical data. Because historical data is a long term record of the operator’s interaction with his process, it should he the best source of information.  Chapter 2. PLF’s and Computer-Based Solutions  79  This is not the case because the quality, coverage and completeness of these data is poor. 
The computer can help the operator overcome these limitations by managing the data in such a manner that the operator regains control of both his information gathering and treatment processes.  Chapter 3  Cause and Effect  The approximate answer to the right question is worth a great deal more than a precise answer to the wrong question [73J.’ The purpose of this chapter is to discuss time and cause/effect. The chapter consists of three sections: 1. Temporal Reasoning: This section provides an overview of Shoham’s Temporal Reasoning paradigm and discusses the paradigm’s application to treatment plant operations. 2. Cause and Effect: This section is broken into three parts. The first part discusses the difference between association and causation. The second part discusses the difficulty of determining an effect’s cause while the third section discusses experi mental design. 3. Operation Paradigm: This section outlines the requirements of an treatment plant operations paradigm.  ‘John Tukey’s Fzrst Golden Rule Of Statistics 80  Chapter 3.  3.1  Cause and Effect  81  Temporal Reasoning  The importance of reasoning about cause over time is fundamental in all areas of science. Given this issue’s interdisciplinary nature, the terminology is not standardized. For this reason, we chose to use the definitions laid out in Yoav Shoham’s book “Reasoning About Change” [270].  Introduction  3.1.1  Without the possibility of change, there is no reason to keep track of time. Consequently, knowledge possesses a temporal component, i.e. what is true now may not be true later. A theory describing change in the context of time provides both a language for describing what is true or false over an interval and a way of manipulating rules describing lawful change. We classify the types of temporal reasoning into four classes  2;  • Prediction: Given a description of the world over some time period and a set of rules governing change, predict the state of the world at some future point in time. For example, given the treatment plant’s current state and a calibrated plant model, determine what the effluent characteristic will be in a week. • Explanation: Given a description of the world over a time period and a set of rules governing change, describe the state of the world at a previous point in time. For example, given a set of monitoring data up to the point the clarifier started to bulk and a model, determine when the system started to change. The notions of planning and explanation are abstractions of the concepts of controllability and 2 observability in linear systems  Chapter 3.  Cause and Effect  82  • Planning: Given a description of a desired futnre state and a description of the current state, provide a set of actions to achieve this future state. For example, draft a strategy that will fix the bulking problem in the secondary clarifier. • Learning: Given a description of the world at a number different times, provide a set of rules that account for regularities in the description. For example. explain how the Mean Cell Residence Time (MCRT) influences the sludge’s settleability. To plan and to learn require a higher level of thinking than to predict or to explain (cf Bloom’s Taxonomy). These four types of reasoning form the basis of plant operations. If an operator can predict, then he can plan. Similarly, if he can explain, then he can learn. If he can plan and learn, then his ability to operate the plant will improve along with the plant’s performance. 
However, the operator must know when he can optimize if he is to plan and he must be able to review his decisions if he is to learn. These are the principles that form the basis of Box and Draper’s EVOP [58]. 3.1.2  Problems and Solutions  The goal of a temporal reasoning algorithm is to reason correctly and efficiently about what is true over extended periods of time. In other words, the algorithm’s goal is to reach a decision quickly expending the least amount of resources without making a mistake. The general problem exists because these two criteria, efficiency and accuracy, are contradictory. The general problem can be split into two problems: the qualification problem and the extended prediction problem. The qualification problem arises from the relationship between the amount of knowledge required to make a decision and the accuracy of the decision. For example, an operator observes ash-like particles on the liquid surface of his  Chapter 3.  Cause and Effect  83  secondary clarifiers. This condition may be caused by either the onset of denitrification in the sludge blanket or high grease levels in the solids in the aeration basin [293]. The operator mnst decide how much information he needs to collect before he can decide what the canse of the problem is. For example, is the operator willing to conduct a grease analysis on the MLSS (as suggested by Tsngita et al [293]) in order to diagnose his problem? The extended prediction problem arises from the relationship between length of time over which a prediction is made and the accuracy of the prediction. A special case of this problem is the persistence problem. The persistence problem occurs when we predict on the basis that a fact remains true over the prediction interval. For example, we may map out an operations strategy that is based on the assumption that the operator can control the SRT over the prediction interval [301]. Both the extended prediction and qualification problems must be taken into account when controlling a treatment plant using a model. The amount of data needed to calibrate a model may be more than that needed to operate the plant. In this case, it makes sense to calibrate the model intermittently. The length of time between calibrations must be short enough to ensure the model is reliable but long enough to ensure that the use of the model is feasible. The solution to these problems is chronological ignorance. An operator chooses a compromise between efficiency and accuracy that provides him with sufficient accuracy to run the plant given the resources he has available. However, the operator must inclnde in his daily decision making an evaluation of the content of this compromise. For example, the operator assumes that by manipulating the MCRT. he can effect a degree of control on the system. This is true as long as the clarifier is able to separate the solids. However, if he has to stop wasting in order to maintain some solids in the aeration basin, then he can no longer manipulate either the F/M ratio or the MCRT. What was true when he  Chapter 3.  Cause and Effect  84  made his compromise is no longer true, so he must reevaluate his monitoring program to reflect the new state of his system. The is the situation that one researcher encountered when he was developing a process control strategy based on decision theory [301]. The consequence of this solution is nonmonotonicity. Nonmonotonic reasoning occurs when the addition of a rule to the knowledge base can force us to retract an inference. 
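The kind of retraction just described can be sketched in a few lines. The example below is hypothetical and reuses the wasting scenario from the preceding paragraph: the conclusion that the MCRT is a usable control handle is drawn by default and is withdrawn as soon as a new fact (wasting had to stop to hold solids) is added.

```python
# A toy default rule: the conclusion is recomputed from the current fact base,
# not stored, so adding a fact can remove an earlier inference (nonmonotonicity).
def controllable_via_mcrt(facts):
    # Default assumption: the operator can steer the process through the MCRT...
    if "wasting_stopped_to_hold_solids" in facts:
        return False   # ...unless wasting had to stop to keep solids in the basin.
    return True

facts = set()
print(controllable_via_mcrt(facts))            # True  (default assumption holds)
facts.add("wasting_stopped_to_hold_solids")
print(controllable_via_mcrt(facts))            # False (the inference is retracted)
```

Because the conclusion is recomputed from the current fact base rather than stored, adding knowledge can remove an inference, which is the defining feature of nonmonotonic reasoning.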
Nonmonotonicity and common sense are two reasons why expert systems have difficulty with treatment plant operations  .  However, an operator can live with nonmonotonicity  as long as he periodically reviews his temporal assumptions.  The problem with common sense is that it leads to nonmonotonic logic. For example, if we know 3 something is a bird, we assume it can fly. If we find out it is an Ostrich, then we retract this conclusion.  Chapter 3.  Cause and Effect  85  Cause And Effect  3.2  A clear understanding of the cause and effect relationships within a treatment plant is a prerequisite for successful operations. If the operator searches for the effect of a cause, he is an observer, while if he causes an effect, he is an experimenter. All other things being equal, it is better to be an experimenter than an observer. In both cases, the operator avoids obscuring these causal relationships by the introducing additional changes or by not monitoring the process. 3.2.1  Association Versus Causation  Assume operator plots the effluent COD against a number of plant parameters including the volume of supernatant returned by the anaerobic digester to the primary clarifier. Assume he notices a linear relationship and regresses one on the other. In this case, the Coefficient of Determination, R , is high [137]. The Coefficient of Determination is that 2 proportion of the dependent variable, effluent COD, that is accounted for by the regres sion equation in the independent variable, volume supernatant returned. Unfortunately, the Coefficient of Determination gives no indication of whether the lack of perfect pre diction is caused by an inadequate model or by experimental uncertainty [95]. On other words, a high R 2 value does not necessarily indicate a statistically significant equation. All statistical methods are based on a set of assumptions. Therefore, the result of any statistical  test is conditional on the fit between the test’s assumptions and the data  analyzed. The R 2 value is no exception. For example, even though the data may satisfy the test’s assumptions, the significance of the R 2 value is still dependent on the number of data points and the number of parameters in the regression equation  .  The intent of this section is not to discuss regression but rather the determination of cause and 4 effect relationships. The topic of regression is thoroughly discussed in Draper and Smith’s text Applied Regression Analysis [97] and Mosteller and Tukey’s text Data Analysts and Regression [217]  Chapter 3.  Cause and Effect  86  Similarly, a high R 2 value does not necessarily indicate a useful regression equation either. A large R 2 value can result from the data being taken over an unrealistically large range of the independent variables or a small R 2 may result because the system is being restrained, i.e. controlled. Because the data do not come from an experiment, a high R 2 may indicate association but not necessarily causation [255]. For example, given the function y  =  .sirz(x), y is functionally related to x yet the  correlation between the two is zero [26]. On the other hand, there are at least four situations where two variables are correlated but not in a causal relationship: 1. Chance: Two unrelated phenomena may show a correlation for no reason at all, e.g. an observed association between the circulation of New York public library and the changes in the ozone layer. 2. Dependence: Two variables may share a common cause, e.g. 
the decrease in the number of stork nests in East Prussia and the decrease in the number of human births, both caused by growing industrialization. A second example would be the decreasing rate of substrate utilization and the decreasing rate of solids production during a batch test, both of which are a function of the metabolism rate.

3. Inhomogeneity: The data are taken from three distinct groups. The correlation between the two variables is nonexistent within each group but, when the data are pooled, the correlation exists because of the inhomogeneity between groups. For example, Horvath, in his text on scale-up of wastewater unit processes, argues that the ratio of volume to surface area should be held constant when scaling up a bioreactor [162]:

The wall effect is more or less pronounced in virtually all cases and may modify not only the hydraulic but the physico-chemical and biological conditions. The boundary surfaces may, for instance, act as a catalyser [sic] and the biological film developing on the wall must also be taken into account especially in small tanks. The ratio of the internal (active) boundary surfaces and volumes of the reactors of different size should be possibly equal.

[Figure 3.1: Scale-up: Volume to Surface Area Ratio Contours. Contours of the volume to surface area ratio as a function of depth and diameter for a cylindrical tank; the full-scale tank (diameter 13.7 m, depth 8.5 m) has a ratio of 2.44 m.]

Given a full-scale cylindrical reactor, 8.5 m high and 13.7 m in diameter, the volume to surface area ratio is 2.44 m. Figure 3.1 shows the relationship between depth and diameter for a number of volume to surface area ratios. A researcher must maintain a reactor with a height over 4.0 m in order to maintain the full scale ratio of 2.44 m. If the wall effect is significant and if a different ratio is used at bench, pilot and full scale, then when these data are pooled, a correlation will exist because of the varying importance of the wall effect in each group of data.

4. Dead Soldier: One of the variables is used to derive the other, e.g. plot effluent solids against Mean Cell Residence Time (MCRT) where the calculation of MCRT includes unintentional wastage.

3.2.2 Effect To Cause: The Poor Cousin

The weakest form of scientific investigation is observation. In this case, the researcher conducts measurements on the system and tries to sort out what is a cause and what is an effect. However, this is often the only way to study a phenomenon because of the nature of what is being studied, i.e. field conditions cannot be duplicated in the laboratory. The underlying premise of an observational study is that association may indicate causation, i.e. "where there is correlational smoke, there may be causational fire" [154].

Different disciplines have formalized guidelines for sorting out cause and effect. In medicine, Koch's Postulates are used to implicate an organism as a pathogen. Social scientists use path analysis and other correlation techniques while economists rely heavily on time series analysis. One set of guidelines broad enough to be useful in environmental engineering was proposed by Sir Austin Bradford Hill, who was among the first public health officials to argue that there is a causal connection between smoking and lung cancer.
Hill’s criteria are listed below [154]: • Temporality: The cause should precede the effect, i.e. what happens upstream may be the cause of what happens downstream. To this we must add one proviso, we can only reason about monitored causes. For example, we cannot link denitrification to rising sludge in the secondary clarifier unless we monitor changes in the nitrate concentration. • Natural Experiment: A Natural Experiment  is an unplanned event that ap  proximates a planned experiment. For example, due to a malfunction at a winery connected to the sewerage system, the plant is organically overloaded for a short period of time. A Natural Experiment may approximate a quasi-experiment. A quasi-experiment is an experiment 5 where the various treatment groups cannot be presumed to be initially equivalent within the limits of sampling error [80], e.g. repeated experiments on a treatment plant.  Chapter 3.  Cause and Effect  90  Plausibility, Coherence, Analogy: A causation link must be plausible and coherent in the sense that it does not conflict with known facts. Similarly, the dy namics of the situation should be analogous to similar situations that have occurred elsewhere. These principles are very important when a number of causes lead to the same effect. For example, low DO concentration in the aeration basin, nutri ent deficiencies, presence of sulfides, low organic loading and fixed film cultures in bench-scale treatment processes (e.g. in tubing) have all been identified as causes of sludge bulking [171]. • Strength: It is easier to suspect a casual link if the cause and effect are strongly associated. Association may take various forms. For example, a spectral analysis  6  of the winery waste and the plant’s effluent characteristics shows that certain dy namics found in the winery waste appear in the effluent characteristics that cannot be attributed to dynamics in the domestic influent. • Consistency: The same conditions should produce similar results. All good re search uses replicates, either through time or in space. The reason for this is that there is no guarantee that the same conditions will produce the same results. For example, Zaloum [317] applied the same organic loading to two systems at the same SRT and observed a different response. The difference occurred because the two systems were operated at different hydraulic retention times (HRT). • Specificity: A specific cause has a specific effect. Based on experience, Junkins et al [177] recommend that the operator maintain a 3 feet (< im) sludge blanket depth in the secondary clarifier. However, Crosby [83] observed that the effect of Spectral analysis is a useful exploratory tool as long as person is aware of its limitations. In our 5 case, we used spectral analysis to examine the Penticton’s data and identified behaviors in the data that the operator later explained. See spectral density in glossary.  Cause and Effect  Chapter 3.  91  the blanket height on the effluent solids depended on the settling properties of the sludge. Crosby recommends the following:  —  If sludge settleability is poor, and flow rate is steady, then raise the blanket so that the inlet is below the blanket. If the height of the blanket is not controllable, increase the recycle to lower the blanket below the inlet.  —  If the sludge settleability is good, then lower the recycle to lower the blanket below the inlet.  
In this case, the specific cause (blanket height) has a specific effect (effluent solids) depending on the settleability of the sludge. Statisticians refer to this as an inter action. To this list, we need to add one more item. A cause must be a variable that can be manipulated. An attribute (or an intrinsic variable) cannot be a cause because an attribute’s value changes only when the object it is describing changes. For example, the redox potential of an anaerobic reactor in a biological phosphorus removal plant does not cause the bacteria to release phosphorus. However, the redox potential and phosphorus release may be correlated because both are caused by the same mechanism. In other words, “causes are only those things that could, in principle, be treatments in experiments” [154]. 3.2.3  Effect To Cause  -  Treatment Plants  A treatment plant is a complex system. If an operator must determine what caused an effect in the system, he starts by tracing upstream. Possible causes fall into two categories: disturbances and control actions. A disturbance is a cause whose origin is outside the plant, e.g. influent, weather, power failure. A control action is something the  Chapter 3.  Cause and Effect  92  operator does to the system, e.g. recycle rate, wastage rate, DO set point. The analysis is complicated by three factors: limit cycles, loops and the operator’s actions. A layman’s definition of a limit cycle is a finctuation in the system that the operator cannot remove.  Limit cycles may be induced by predator/prey type relationships in  the biomass or by relay-like controls in the plant, e.g. pump cycles governed by a level sensor. The operator must be able to distingnish between a “real” effect and a natural fluctuation in the system. Baltzis and Fredrickson [24] describe a situation where a pure culture exhibits a limit cycle when grown on two limiting nutrients. Assume we have an organism growing in a chemostat where either S or S 2 may be the rate limiting nutrient. At dilution rate D (Figure 3.2), there are two steady states, A and B, depending on the direction from which the equilibrium is approached. If we establish a steady state at B, then decrease the dilution rate slowly maintaining a quasi steady state, we move along line b If we move slightly past b, the system will jump to line c rate, we will proceed along c  —  the system will jump to line b  —  —  d to b.  a. If we increase the dilution  a until we hit a. If we increase the dilution rate further,  —  d.  A loop is a path that returns to its starting point, not crossing any intermediate variable twice. Feedback loops come in two varieties: negative and positive feedback. A negative feedback occurs when an increase in the initiator sets into motion processes that lead to its decrease. For example, if the dissolved oxygen (DO) level in the tank increases, the DO controller responds by decreasing the air supply leading to a decrease in the DO. Positive feedback occurs when an increase in the initiator sets into motion processes that lead to an increase in the initiator. For example, the effluent solids level rises due to poor settling sludge. The operator reacts by increasing the recycle rate with The reader should refer to either Astrom and Wittenmark’s Adaptive Control [20] or Beltrami’s 7 Mathematics For Dynamzc Modeling [39]. Limit cycles are fine in the process but not in the data analysis.  Chapter 3.  
a view of drawing the blanket down. The increased recycle rate increases the hydraulic loading on the clarifier, increasing the turbulence and causing less sludge to settle out.

[Figure 3.2: Hysteresis. The two steady states A and B at dilution rate D (steady state: μ = D), plotted in the S1-S2 plane, showing the jumps between the branches (c-a and b-d) described in the text.]

If an effect is in a loop, then its cause could be downstream. For example, if the MLSS in the aeration basin is dropping, the cause may be due to problems downstream in the secondary clarifier.

A common identification problem is identifying the open loop characteristics of a system when the process is in a closed loop. We will not discuss this problem in detail but rather will focus on three points raised in the review paper by Gustavsson et al [132]. The first point is that an inherent feedback system with a rate faster than the sampling rate is part of the open loop system. The second point is that the degrees of freedom in the model cannot exceed that in the data collected on the system. These two points limit on-line identification in treatment plants to simple models whose response time is in hours or days rather than seconds. The last point is that one cannot identify the open loop unless one knows the structure of the regulator. In automatic control, this means there must be a regulator model, e.g. a PID model. In other words, in order to determine how a process reacts to a change, an operator must know why the change occurred.

These limitations may be extended to the case where the operator acts as the controller. In order to understand the system, we must understand the operator. If the operator makes a change, we must know what caused the operator to act. Assume the operator changes the recycle rate in response to a change in the effluent solids concentration. Physically, there is no causal connection between the two, yet the change in effluent solids caused the recycle rate to change. In this case, the connection between these two parameters is through the operator.

If an operator is aware of how difficult it is to determine the cause of an effect, he will do all he can to not complicate it any more than it already is. The computer knows the plant's structure and understands the monitoring program. If an operator wants to control an effect, the computer can identify which control variables will influence the effect. In other words, the computer provides the "road map" and the operator does the "driving".

3.2.4 Cause To Effect - Fundamental Problem Of Causal Inference

The Fundamental Problem Of Causal Inference forms the basis of our discussion of detection of cause and effect. The problem, posed formally by Paul W. Holland [154], frustrates treatment plant operations as much as it does laboratory research:

Because it takes at least two treatments (cause present, cause absent) to determine an effect, and only one treatment can be applied to a unit at a time, it is impossible to observe the effect of the treatment on the unit.

For example, an engineer wants to know if he can improve the operation of an aerobic sludge digester by controlling the pH. However, he cannot simultaneously observe the effect of two causes. There are two general solutions to this problem: the Scientific Solution and the Statistical Solution. We use data extracted from a thesis by Anderson as the basis of our example [10].

Assume our researcher obtains a second reactor, identical in every way to the first.
Also assume that he operates both reactors in such a way that they both reach steady state. Once he is convinced that the two reactors are identical, he introduces pH control to the second reactor. By comparing the performance of the two reactors, he can infer what the effect of pH control is. Our researcher solved the problem by making the untestable assumption that the two reactors are exactly the same in all ways except for the pH controller. This approach is referred to as the Scientific Solution.

Figure 3.3 shows the solids destruction data for the two reactors. Given the noise in the data, the researcher cannot detect a difference between the reactors (cf Jenkins [170]). The Analysis Of Variance (ANOVA) Table confirms this (p = 0.229). To overcome this problem, our researcher decides to repeat the experiment. He redesigns his work using the principles underlying Experimental Design [79] [59] [53]:

1. Replication: A single treatment is applied to more than one experimental unit. The experimental error is the variation between replicates. Replication is analogous to Hill's consistency requirement. For example, our researcher repeats the experiment twice, once with sludge from plant A and a second time with sludge from plant B.

2. Randomization: The decision about which experimental unit shall receive what treatment should have a random component. The goal here is to avoid confounding the researcher's bias with the experimental results. In our case, randomization is less an issue because we have only two treatment units within each run.

3. Control: The researcher exploits the structure of the experimental units to reduce the debate over what caused what:

• Balance: Assign the treatments to maintain symmetry, i.e. avoid confounding and incomplete blocks if possible. For example, the researcher applies both treatments twice, once in each run.

• Block: Assign the treatment units such that the units within the blocks are identical in all ways except for the treatment they receive. For example, the only difference between run 1 and 2 is the source of the sludge.

• Group: Placement of experimental units into a homogeneous group to which the treatment is applied. With only two reactors, grouping is not an issue.

[Figure 3.3: Scientific Solution. Solids destroyed [kg/d] versus day for the two fill-and-draw aerobic digesters (No Control and pH = 7); the day-to-day scatter (roughly -60 to +140 kg/d over 60 days) obscures any difference between the reactors.]

Table 3.1: Statistical Solution - Analysis Of Variance

Reactor Statistics
Reactor (n=34)           Average Destruction [kg/d]   Standard Deviation [kg/d]
Run 1: No pH Control     6.8                          14.3
Run 1: pH Control        18.0                         16.7
Run 2: No pH Control     10.9                         17.0
Run 2: pH Control        22.4                         26.0

Analysis Of Variance
Treatment                 Probability
pH Control/No Control     0.065     Moderately Significant
Run or Sludge Source      0.003     Significant

Figure 3.4 is a plot of the results from the two runs. Just as in the Scientific Solution example, the measurement error in the solids test obscures the difference between the reactors. However, the researcher made two assumptions when he laid out his experiment that will enable him to factor out the sampling error:

• Unit Homogeneity: The characteristics of the reactors remained the same throughout the experiment.
• Constant Effect: The effect of the pH controller on the reactors was the same no matter when it was applied.

The experimental layout is a randomized block design with days within a run nested within the treatment. The ANOVA confirms that the pH controlled reactor performs better than the uncontrolled one. We used BMDP (3V) [298] (General Mixed Model Of Variance) to analyze the data (see Table 3.1).

[Figure 3.4: Statistical Solution. Solids destroyed [kg/d] versus day for both runs (Run 1: No Control, Run 1: pH = 7, Run 2: No Control, Run 2: pH = 7); as in Figure 3.3, the day-to-day scatter is large relative to the difference between treatments.]

The above example illustrates what is required to detect the effect of a cause under the ideal conditions in the laboratory. In the first example, the measurement error swamped out the experimental error, making it impossible to detect any difference between the reactors. In the second example, our researcher replicated the experiment. In this case, the analysis was able to remove the sampling error from the experimental error and detect the improvement in solids destruction caused by controlling the pH. If the solids measurements were less precise or the researcher did not take care to prevent a second cause from acting on the reactors, the experiment would have failed.

3.2.5 Treatment Plants and Time Series Experiments

The previous section may appear to be an argument for constructing treatment plants with parallel treatment trains. In a sense, it is, because one treatment train can act as the control and the other as the experiment. If the experiment fails, the inventory and capacity of the control can be used to regain control of the system. However, in many cases, having two trains is too costly. In this case, one must be careful not to upset the system with a poor control or optimization decision.

The question we need to ask is: what if our researcher in the previous section had only one reactor? Both the Scientific Solution and the Statistical Solution could be applied if he can assume the characteristics of the sludge do not change from run to run. Our researcher can take one of two approaches. The first approach would be to fill the reactor with fresh sludge, wait for steady state, measure the destruction, empty the reactor and start over. In this case, there would be no carry-over effect from run to run due to the application of a treatment to the sludge. Our researcher would lay out his experiment such that the application of the treatments is independent of time [91] [90] [173]. The second approach would be to use the same sludge throughout. We refer to this as a Time Series Experiment [122]. Tables 3.2 and 3.3 list cause and effect archetypes respectively.

Table 3.2: Time Series Experiment - Cause Archetypes
Object      Elements
Type        { Level, Trend, Stability, Frequency }
Duration    { Pulse, Square, Step }

In classical experimental design, a cause is a change in version (e.g. fertilizer A or B) or a change in level (e.g. high and low dosage of fertilizer A). In a time series experiment, the cause may also be a trend (e.g. increasing flow), instability (e.g. greater distance between peak and low flows) or a new dynamic (e.g. an industrial discharge that follows a 2 day cycle). A time series experiment must also be concerned with a cause's duration.
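Returning briefly to the randomized block analysis above (Table 3.1), the sketch below shows the arithmetic involved in separating the pH-control effect from the run (sludge source) effect. It is a simplified, main-effects-only analysis of variance in Python (NumPy and SciPy assumed available) run on simulated data whose cell means and spread only loosely resemble Table 3.1; it is not the nested BMDP 3V analysis and does not reproduce the reported p-values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Simulated daily solids destruction [kg/d]; cell means loosely follow Table 3.1.
cell_means = {("no pH control", "run 1"):  7.0, ("pH control", "run 1"): 18.0,
              ("no pH control", "run 2"): 11.0, ("pH control", "run 2"): 22.0}
n = 17                                        # observations per cell (balanced layout)
y, treat, run = [], [], []
for (t, r), mu in cell_means.items():
    y += list(rng.normal(mu, 16.0, n)); treat += [t] * n; run += [r] * n
y, treat, run = np.array(y), np.array(treat), np.array(run)

grand = y.mean()
ss_total = ((y - grand) ** 2).sum()

def main_effect_ss(labels):
    # Sum over factor levels of n_level * (level mean - grand mean)^2
    return sum((labels == lev).sum() * (y[labels == lev].mean() - grand) ** 2
               for lev in np.unique(labels))

ss_t, ss_r = main_effect_ss(treat), main_effect_ss(run)
df_t = df_r = 1
df_err = y.size - 1 - df_t - df_r
ms_err = (ss_total - ss_t - ss_r) / df_err
for name, ss, df in [("pH control vs none", ss_t, df_t),
                     ("run / sludge source", ss_r, df_r)]:
    F = (ss / df) / ms_err
    print(f"{name}: F = {F:.2f}, p = {stats.f.sf(F, df, df_err):.3f}")
```

Because the layout is balanced, the run (block) sum of squares is removed from the error term before the treatment F ratio is formed, which is what allows a modest pH effect to show through the noisy solids measurements.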
Classical experimental design detects an increase in levels of effects. However, a time series effect may also manifest itself as a trend, an instability or a new dynamic. The experimenter must also be concerned with the elasticity of the effect (i.e. if the cause is removed, does the effect remain) and the lag between the start of the cause and the start of the effect. For these reasons, a time series experiment is much more difficult to interpret than a classical experiment. The discussion that follows elaborates on this increase in complexity.

Table 3.3: Time Series Experiment - Effect Archetypes
Object      Elements
Type        { Level, Trend, Stability, Frequency }
Elasticity  { Elasticity, Inelasticity }
Delay       { Delay, No Delay }

The independence of experimental units is a fundamental assumption underlying experimental design. In the hard sciences, the determination of the independence of the experimental units is trivial unless the layout is a time series experiment. The reason for this is twofold: (1) difficulty in determining steady state; and (2) the danger of causing permanent changes to the biomass such that the response is dependent on previous treatments.

A reactor must be at steady state before the researcher can measure its performance and before he can apply a new treatment. The reason for this is that during the transition time between steady states, a reactor's performance may fluctuate. In some cases, the performance may deteriorate before it improves. For example, Olsson and Chapman [235] introduced a step change into a secondary clarifier in both directions and observed that the transient response depended on the direction of the step change.

A mixed culture is a complex, nonlinear system [28] [278] [273].8 For this reason, the researcher must determine

1. if the system is at steady state,

2. if there is more than one possible steady state, and

3. if the steady state is stable.

Portions of a complex system may reach a steady state9 while other portions may still be in transition. For example, Kucnerowicz and Verstraete [189] wrote that their laboratory scale activated sludge system reached a reasonably consistent steady state in terms of substrate removal efficiency, effluent suspended solids and sludge volume index in 2-4 sludge ages. However, it took the same culture at least 10 sludge ages to establish a steady state in terms of nitrification and endogenous respiration. Turk [296] operated one reactor for over a year and did not achieve steady state with respect to nitrite build-up.

8 For a survey of wastewater microbiology, the reader should refer to references [127] or [218].
9 A better term may be stable state, as steady state is impossible in a mixed culture system driven by a changing input function.

Table 3.4: Mixed Culture/Mixed Substrate Interactions
Type            Effect On Species A    Effect On Species B
Mutualism       +                      +
Competition     -                      -
Neutralism      0                      0
Commensalism    +                      0
Parasitism      +                      -
Predation       +                      -
Amensalism      0                      -

A mixed culture's current state is dependent on its history as well as the current environmental conditions [305, pp. 642]:

and for that matter almost all models presently in use, state that the behavior of the biomass-substrate system depends only on the present state, and there is no provision for the past history of the microorganism.
It has been recognized for a long time, however, that the observed response of a cell population at a certain time instant is the composite result of various biological processes that were initiated at different time instants in the past as a response to instantaneous environmental conditions prevailing at each particular time. Thble 3.4 lists seven types of interactions found in a mixed culture/mixed substrate system [65]. The significance of these interactions is that they make the state of the culture dependent on its history, i.e. non-markovian. For example, the shifts between competition, mutualism and predation depend on the density and age structure. For this reason, an operator may not be able to maintain the system at the current steady state. Once the dynamics in the culture start to change, so does the system. An excellent example of this is documented in Turk’s thesis [296]. Turk decreased the oxidation of nitrite to nitrate using the inhibitory effects of free ammonia. The system  Chapter 3.  Cause and Effect  104  operated in what appeared to be a steady state for over 4 months. However, once the biomass became acclimatized to the ammonia, the system’s state began to change. Turk was unable to maintain control over the system at this point. A history dependent, nonlinear system may have a number of steady states, both stable and unstable. For this reason, stability analysis should be the part of any lab oratory or modelling research. Takamatsu et al [289] conducted a stability analysis on a simple system with two types of organisms, fioc forming and bulking. The organisms competed for the same substrate. They identified four equilibrium points: normal state, bulking state, coexistence state and washout state Only the normal state was desirable from an operations point of view Bertucco et al [52] examined the stabihty of a reactor with inhibition kinetics and identified conditions under which the system may become unstable. Stability is as important to an operator as is performance. This is why most operators prefer to respond to change rather than to induce change in their processes, i.e. they are risk adverse. When a treatment plant operator introduces a change, he is conducting a time series experiment. However, unlike the researcher, the operator must be concerned about the transient as well as steady state response because he must meet his discharge permit at all times. An operator must be careful not to mistake a transient response as a steady state. An action in a wastewater treatment plant may have a hydraulic, chemical and/or microbiological response. For example, an operator may make a change in his plant that causes his performance to improve in the short term but deteriorate in the long term. This may occur if the hydraulic response to the change is positive but the biological response is negative. If the operator assumes that the system has now reached a stable state and introduces a second change, the long term effect of the first change will be confounded with the short term effect of the second change. George Box introduced Evolutionary Operation (EVOP) as a sequence of small two  Chapter 3.  Cause and Effect  los  level factorial experiments run in a completely randomized design. However, EVOP may be viewed as a methodology rather than a sequence of a particular type of experimental design. EVOP has four distinguishing characteristics [58] [60]: • EVOP is a sequence of statistically designed experiments. 
• EVOP searches for an optimum on a response surface under the direction of an EVOP committee, i.e. plant staff. • EVOP is a day-to-day operational procedure. • EVOP couples the staff’s knowledge of the process with the results of a statistical analysis. EVOP has a number of variants but the underlying principles remain the same. Springer et al [281] applied Box’s EVOP to an activated sludge plant in Miamisburg, Ohio. Box’s EVOP is probably the simplest statistically based procedure that could be applied to wastewater treatment plants. Even so, EVOP suffered from the same limi tation that most sophisticated schemes face  —  time. For example, a typical run could  take at least 12 sludge retention times (SRI). At a ten day SRT, a study that began on August 1 would end near the beginning of December. If the operator replicates the run, the study would end near the beginning of April. During this eight month period, the plant would have experienced a number of disturbances whose effect may mask the effect of the operator’s actions. The other alternative is to use intervention analysis. At best this analysis requires 30 days of data prior to the change followed by at least 30 days of data following the change. If the operator suspects that the system’s response may lag the change, the post change period would have to be even longer [298]. Most fault detection and change detection routines have the same appetite for data [27] [310].  Chapter 3.  3.3  Cause and Effect  106  Synthesis  The determination of cause and effect is the basis for the Scientific Operation of wastewa ter treatment plants. Scientific Operation is a strategy that Abel Wolman [313] advocated as early as 1922 and is still being advocated today [228]: The scientific method involves observation and experience, formulation of a working hypothesis, testing the validity of the hypothesis through experi mentation, and the acceptance or rejection of the hypothesis. In other words, the integration of past experience and close observation allows an operator to formulate a conjecture about the most efficient method to run a piece of equipment or process. From these conjectures, a working hypothesis may be developed that serves as the basis for further investigation. Once a useful working hypothesis has been developed, sampling and testing programs may be implemented. Such programs will provide relevant infor mation on the validity of the hypothesis. As a final step, the information gathered during the optimization study can he used as a measurement of the soundness of the original hypothesis. The scientific method provides a framework for operation and should not be tied to a particular form of statistical analysis (as is EVOP). Instead, an operation’s strategy should allow the operator to use an analytical method that fits the data. Although the scientific method forms the basis of experimentation, observation and operation, operation differs from its counterparts in that the operator must maintain control of his system in real time. For this reason, a computer program that assists the operator must track cause and effect relationship in the plant so that the operator can decide whether to intervene or not. Given the importance of an intervention, the program must enable the operator to perform three tasks: • To rule out spurious causes: Prevent low quality data or problems in the monitoring program from being mistaken for a change in the system.  Chapter 3.  
Cause and Effect  107  • To evaluate his decisions: Store the reason for a change with the change so that the operator can determine whether his reasoning was correct. • To determine when a causal link no longer exists: Warn the operator when a control act will no longer effect the system.  Chapter 3.  3.4  Cause and Effect  108  Summary: Control Cycle  An operator controls his process by changing an operator-set parameter (i.e manipulated parameter) to respond to changes in an effect or disturbance parameter. A control cycle includes the detection, the introduction, and the monitoring of a change (Figure 9.5). An operator detects change through his monitoring program. First, an operator must define change. Change may be a shift in magnitude, trend, stability or hmits of the value, preference or quality of a measure. Second, an operator must decide when he has enough information to confirm that a change has taken place, i.e qualification problem. If he acts too soon, he may make a mistake; if he waits too long, he may be too late to make an effective improvement to the biological system. Third, he must decide in which system the change took place. A change may indicate a problem in his information gathering or his treatment processes. Finally, an operator must decide whether he should intervene. If he is too late, an intervention will just perturb the process. An operator should not intervene unless he absolutely has to. If an operator decides to intervene, he must decide what to change. This decision is based on the operator’s understanding of the cause and effect relationships in his process. First, if the operator is responding to an effect, he must try to determine its cause. Reasoning from effect to cause is fraught with many pitfalls so the operator must reason carefully. Once the operator constructs a list of possible causes, he must decide whether they still exist (i.e did they change back?) and whether their effect is elastic or inelastic (i.e is the effect dependent upon the ongoing presence of its cause?). In order to determine how he should intervene, the operator must work back to find operator manipulated causes of the effects of these disturbances. Once the operator constructs a list of operator set parameters, he must decide if a change in one of them will effect a change in the process and what this change will be. If an operator set  Chapter 3.  Cause and Effect  109  parameter’s relationship with its effect is on its limit of existence (i.e. is the process controllable), then it is not the parameter to change. An operator cannot assume that cause and effect relationships remain static; they do change, and in some cases, cease to exist (i.e. nonmonotonicity). If none of the candidates will do and a change must be effected, then the operator must intervene catastrophically (i.e. push the process back into being controllable by introducing a new cause). Once an operator effects a change, he must monitor the change to determine if (1) the change in the operator-set parameter did what he thought it would, (2) the situation that caused him to act has changed and (3) his process has reached a stable state. An operator faces two challenges. First, he must define what “stable state” is and second, he must decide what to do if he has not reached a stable state and a new change starts to act on his system. This control cycle forms the basis of evolutionary operation as each cycle provides information on the system. 
Therefore, the operator should keep the causal relationships as simple as possible (i.e. limit the number of changes) and follow his decisions through to their end result to determine if he reasoned correctly.

Apart from detecting change and grouping parameters into cause/effect relationships, the role of the computer is twofold: (1) to link the change that caused the operator to act to the change (if any) the operator made, and (2) to link the change the operator made to the change's effect. In Chapter 8, we show how a computer accomplishes these tasks using a network database management system (without an expert system).

Chapter 4

Measurement Process

[Measurement] does not belong to the modelling of a reactor as such. Nevertheless, if theoretical methods are to be used for estimation and control, it is necessary to model the measurements, i.e. how the measurement output relates to the states of the process. [108]

An operator obtains information on his plant in two ways: observation and measurement. The difference between the two mechanisms is the role of the operator: the operator makes an observation but takes a measurement. The purpose of this chapter is to discuss this process in the context of the operation of wastewater treatment plants. This discussion is broken into five sections:

1. Measurement

2. Sample Handling And Preservation

3. Sampling

4. Quality Assurance and Quality Control (QA/QC)

5. Measurement Process Model

A datum has meaning because we know where it was obtained, how it was obtained and its quality. A datum represents a point or interval in time depending on whether it is an average or originates from a composite sample. How we manipulate a datum depends on its scale while how much emphasis we place on it depends on the datum's quality.

The measurement process is as important to understanding the meaning of a datum as is knowing where in the plant the datum originates. However, control systems and models continue to be built that ignore this process altogether. The result is research that is never used and systems that are either unreliable or unusable. Recognizing this fact, the Measurement Paradigm discussed later in this thesis draws heavily on the material discussed in this chapter.

4.1 Measurement

This section consists of two parts: a discussion of measurement theory and wastewater measurements. Two points can be extracted from this discussion:

1. The onus is on the analyst to ensure that an unstable measure's value is correct.

2. A datum's measure determines, in part, how the datum can be analyzed.

4.1.1 Measurement Theory

A measurement is the assignment of symbols to attributes of objects according to a predefined set of rules. A method of measurement consists of specifications of the equipment to be used, the operations to be performed, the sequence in which these operations are to be executed, and the conditions under which these operations are to be conducted [164]. A measurement process is the realization of a method of measurement in terms of particular conditions that, at best, only approximate the conditions prescribed. A method of measurement may be viewed as a template (or a model) which when imposed on reality extracts information. The degree to which the template fits determines the utility of the result [111, pp. 27]:

Measurement then is seen as the construction of a model of some property of the world.
Like all modelling it involves the establishment of a correspondence between an empirical relational system (the world) and a formal relational system (the model), so that one can be said to represent the other.

Therefore, the meaning of a measure is determined by the degree of correspondence between the conditions under which a method of measurement was derived and the conditions under which the measurement was made [103].

A measure may be expressed using any symbol to which we can attach meaning. We attach meaning by comparing a measure against a scale. In the hard sciences (an electrical circuit is a hard system while the politics in the Middle East are a soft system), the predominant expression is a number, preferably accompanied with an indication of the number's precision and accuracy. Numbers are a convenient and an "unambiguous way of delimiting and fixing our ideas of things" [111]. Alternate forms of expression include linguistic variables and fuzzy numbers.

Rules define how symbols are assigned to attributes. Different rules provide different measures of the same attribute. For example, substrate concentration is an attribute of an effluent sample. This attribute can be measured using the Chemical Oxygen Demand (COD) test or the Biochemical Oxygen Demand (BOD) test. The method by which the attribute is measured in each test differs, resulting in different numbers for the same sample. The mapping of an attribute into a measure should be homomorphic, meaning the structure of the attributes appears unmodified in the measure of the attribute, e.g. high substrate concentration means both a high COD and BOD value.

Measures of an object's mass and a person's Intelligence Quotient (IQ) are at opposite ends of the spectrum of possible measures. The former is based on verifiable scientific theory and is easy to execute properly. As the measurement of mass is standardized, measures conducted anywhere in the world can be compared. In contrast, because scientists cannot agree on what intelligence is, they cannot agree on what the IQ test measures. Typical of measures in the soft sciences, the correct execution and interpretation of an IQ test requires a high level of training. Consequently, these measures fall under a greater degree of scrutiny than measures in the hard sciences. However, until a better test of intelligence is found, IQ is as important to researchers as the measurement of mass because "measurement is a necessary bridge between the real world and our ability to investigate its attributes" [110, pp. 71].

Statistical Control

In order to qualify as a specification of a measurement method, a set of instructions must be sufficiently definite to insure statistical stability (of repeated measurements) [103]. W. A. Shewhart, an early proponent of this concept, viewed a measurement process as being in statistical control when samples, cumulated over a suitable time interval, give a distribution of a given shape, time after time. Under these conditions, unaccounted variation is random in nature [4], [106] [307]. When this is the case, the arithmetic mean of a set of measurements approaches a limiting value which may or may not be the true value. Eisenhart expresses this fact in the Postulate Of Measurement [103, N. E.
Dorsey, quoted on page 168]:

The mean of a family of measurements (of a number of measurements for a given quantity carried out by the same apparatus, procedure and observer) approaches a definite value as the number of measurements is indefinitely increased. Otherwise, they could not be properly called measurements of a given quantity. In the theory of errors, its limiting mean is frequently called the true value, although it bears no necessary relation to the true quaesitum, to the actual value of the quality that the observer desires to measure ... Let us call it the limiting mean.

If an analyst can maintain statistical control during the measurement process, then the measurement method is considered to be statistically stable. In practice, the impetus to measure comes from the need to know, not the availability of acceptable measurement methods. Consequently, not all measures used in wastewater treatment plants are statistically stable. For example, the BOD5 test, which is used by regulatory authorities and plant operators, was dropped by ASTM from its list of standardized tests because it is statistically unstable. ASTM argued that the test gave different values for the same substrate depending upon which seed was used. Because there is no way to characterize the seed bacteria, there is no way to determine the test's accuracy. When an unstable test is used, the analyst cannot rely solely on the proper execution of the method to ensure validity. Instead, the analyst must assume more of the responsibility for determining when the method produces a valid result (Result Evaluation Principle) [103, pp. 163]:

To the extent that complete elimination of the subjective element is not always possible, the responsibility for an important and sometimes difficult part of the evaluation is shifted from the shoulders of the statistician to the shoulders of the subject matter expert.

In wastewater treatment, this is why a redox measure is much more difficult to interpret than a dissolved oxygen reading.

Scale Of Measurement

A measurement scale may be one of four types: nominal, ordinal, interval and ratio. The minimal requirement of a scale is that the scale consistently assigns a symbol to a particular attribute. Consistent assignment is the basis of the occurrence property. If the scale forms a lattice, then a second property, order, is introduced. In order to perform linear transformations on the attribute values, the scale must be delineated with a unit interval. To perform nonlinear transformations, the scale's zero must coincide with the attribute's zero in the real world. Table 4.1 describes the basic empirical operations and the mathematical group structure for each scale. The table is adapted from [111, pp. 26] and [73].

The type of scale partly determines what type of statistical measures and methodologies are appropriate. Parametric tests should only be applied to data from interval and ratio scales [255, pp. 133] because one should not use statistics that could be distorted by admissible transformations of the scale values [111]. Table 4.2 lists examples of statistical measures appropriate for measurements made on various scale types. The table is adapted from a similar table in [111, pp. 31].
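The practical force of this restriction is easy to show in code. The short sketch below is a hypothetical illustration in Python rather than anything from the thesis: it computes only the summary statistics that Table 4.2 admits for a given scale type, so a mean of a nominal measure is simply never offered.

```python
from statistics import mean, median, mode, stdev

# Summary statistics admissible for each scale type, following Table 4.2.
ADMISSIBLE = {
    "nominal": ("mode",),
    "ordinal": ("mode", "median"),
    "interval": ("mode", "median", "mean", "stdev"),
    "ratio": ("mode", "median", "mean", "stdev"),   # geometric/harmonic means also allowed
}
FUNCTIONS = {"mode": mode, "median": median, "mean": mean, "stdev": stdev}

def summarize(values, scale):
    """Return only the summary statistics that the measurement scale admits."""
    return {name: FUNCTIONS[name](values) for name in ADMISSIBLE[scale]}

# A nominal measure (supernatant appearance) admits only frequency statistics.
print(summarize(["clear", "clear", "pin-floc", "clear"], "nominal"))
# A ratio measure (MLSS in mg/l) admits parametric statistics as well.
print(summarize([2450.0, 2510.0, 2480.0, 2390.0], "ratio"))
```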
Table 4.1: Measurement Scales

Scale: Nominal
Operation: Determination of equality for unordered categorical variables
Group: Permutation group, y = f(x) where f(x) means any one-to-one substitution
Example: Supernatant appearance (see Appendix D)

Scale: Ordinal
Operation: Determination of greater or less; ordering but no implication of distance between scale positions
Group: Isotonic group, y = f(x) where f(x) means any increasing monotonic function
Example: Mohs hardness of minerals

Scale: Interval
Operation: Determination of equality of intervals or of differences; equal differences between successive integers but zero position arbitrary
Group: Linear affine group, y = ax + b where a > 0
Example: Nephelometric method of turbidity measurement

Scale: Ratio
Operation: Determination of equality of ratios; the highest measurement, where one can compare differences in scores as well as relate magnitudes
Group: Similarity group, y = cx where c > 0
Example: Mass

Table 4.2: Appropriate Statistical Operations

Scale: Nominal
Measures of location: Mode
Measures of dispersion: Information, H (see Appendix B)
Measures of association: Information transmitted; contingency correlation
Significance tests: Chi-square

Scale: Ordinal
Measures of location: Median
Measures of dispersion: Percentiles
Measures of association: Rank order correlation
Significance tests: Sign test; run test

Scale: Interval
Measures of location: Arithmetic mean
Measures of dispersion: Standard deviation
Measures of association: Product-moment correlation
Significance tests: t-test; F-test

Scale: Ratio
Measures of location: Geometric mean; harmonic mean
Measures of dispersion: Percent variation

The simplest type of scale is a nominal or classificatory scale. Attributes are compared with archetypes and those that match the archetype are assigned the same symbol as the archetype. For example, an operator determines the Appearance of the clarifier supernatant by comparing a sample with six archetypal conditions (see Table D.2, Appendix D): bulking, clumping, ashing, straggler-floc, pin-floc, and clear. The supernatant appearance is recorded as being the archetypal condition that is most indicative of the sample's condition.

Given that the only statistical property these data possess is commonness of occurrence, the only appropriate summary statistics are frequency statistics such as the mode.

An ordinal scale is a nominal scale where the sets that form the scale can be ranked. When measuring, attributes are compared with the archetypes that form the scale. If an attribute matches an archetype, the attribute is given the archetype's symbol. Because the scale is ordered, if an attribute does not match any of the archetypes, then it must lie between two archetypes. If this is the case, then the attribute is given a symbol that indicates between which two archetypes it lies. Mohs scale of hardness of minerals uses an ordinal scale. The hardness of A is ranked higher than B if A can scratch B, but B cannot scratch A.

Both commonness of occurrence and rank are important statistical properties of ordinal measures. The appropriate summary statistics are frequency and rank statistics (e.g. median, maximum, minimum, quartile).

If the size of the interval between archetypes is known, then when an attribute is between two archetypes, a symbol can be assigned that expresses where in the interval the attribute lies.

Table 4.3: Ratio And Interval Temperature Scales

Case 1: Reactor A / Reactor B / Ratio A:B
Celsius: 20 C / 40 C / 1:2.00
Fahrenheit: 68 F / 104 F / 1:1.54
Rankine: 528 R / 563 R / 1:1.05
Kelvin: 293 K / 313 K / 1:1.05

Case 2: Reactor A / Reactor B / Ratio A:B
Celsius: -162 C / -51 C / 1:3.13
Fahrenheit: -260 F / -60 F / 1:4.35
Rankine: 200 R / 400 R / 1:2.00
Kelvin: 111 K / 222 K / 1:2.00
If the choice of zero on the scale is arbitrary, we refer to this as an interval scale. Unlike the previous scale, this scale is divided into units of equal and known size (i.e. degree). Arithmetic operations can be applied to differences between measures sharing the same interval scale as long as the rank and relative difference between measures is preserved. This limits arithmetic manipulations to linear transformations. For example, the arithmetic mean is a valid statistic but not the geometric or harmonic mean.

An interval scale becomes a ratio scale when measures of real world attributes have both ranking and interval properties and a natural zero. The natural zero implies that any two intervals on the scale have comparable magnitudes determined by the number of times one contains the other. Celsius and Fahrenheit are measured on an interval scale and Rankine and Kelvin are measured on a ratio scale. Table 4.3 lists the temperature of two reactors. In case 1, the temperature of reactor B is 40 C and A is 20 C, giving us a ratio of 2 to 1. None of the other measures give this ratio. However, in case 2, the temperature of reactor B is 400 R and A is 200 R. The only other measure that gives this ratio is degrees Kelvin. The reason for this is that the zero of both the Fahrenheit and Celsius scales is different from the zero of the Rankine and Kelvin scales, which is absolute zero. Absolute zero is defined as the temperature at which the thermal energy of random motion of particles of a system in equilibrium is zero. In other words, absolute zero is a 'true' zero. This is why we cannot say that reactor B is twice as hot as reactor A based on either a Celsius or Fahrenheit measure (e.g. Case 1).

4.1.2 Operational Measures

Many of the measures that an operator uses to operate his treatment plant are not standardized. A measure is non-standard if (1) the operator cannot determine its accuracy (e.g. BOD5) or (2) the measure's interpretation is context dependent (e.g. redox). In both cases, the operator preserves the quality of these data by (1) applying the measure to the process in a consistent manner and (2) establishing a method to validate these data using other measures conducted on the process.

The operator also uses derived measures to run his process. Derived measures, such as F/M ratio and SRT, are statistics. A statistic's interpretation is based upon a model. If the model is meaningless, then so is the statistic. The Mean Cell Residence Time (MCRT) is defined as the average time an organism remains in the system under steady state conditions. In this case, the growth rate is a function of the dilution rate. In a chemostat containing a single pure culture, the MCRT is equal to the inverse of the dilution rate. If a solids separation device and a recycle line are added, the MCRT is defined as the ratio of the number of organisms in the system to the net loss of organisms from the system.

SRT (Solids Retention Time) is the extension of this concept to a wastewater bioreactor. SRT is a noisy and misleading operational parameter for three reasons: (1) a treatment plant cannot reach steady state conditions as defined by microbiologists, (2) a wastewater bioreactor is a flocculating mixed culture and (3) the raw measurements used to calculate SRT are noisy. (On the second point: "When suddenly increased loading results in increased sludge wasting and wasting is adjusted to maintain a removal of a constant fraction of the mass, the traditional sludge age calculation predicts constant sludge age. In reality, the sludge initially gets younger due to the increased amount of new sludge present." [300]) Vaccari [300] [299] derived an alternative measure, Dynamic Sludge Age (DSA), based on an age distribution function.
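The third reason, noise in the raw solids measurements, can be made concrete with a small Monte Carlo sketch. Everything in it is an illustrative assumption (the simplified SRT formula, the invented solids masses and the 10% relative error); it is not the derivation used elsewhere in this thesis.

```python
import random

def srt_days(mixed_liquor_kg, wasted_kg_per_day, effluent_kg_per_day):
    """Simplified SRT: solids held in the system divided by solids leaving per day."""
    return mixed_liquor_kg / (wasted_kg_per_day + effluent_kg_per_day)

def noisy(value, relative_sd):
    """One measurement of 'value' with the given relative standard deviation."""
    return random.gauss(value, relative_sd * value)

random.seed(1)
true_value = srt_days(20000.0, 1800.0, 200.0)        # 10 days with these invented masses
simulated = sorted(
    srt_days(noisy(20000.0, 0.10), noisy(1800.0, 0.10), noisy(200.0, 0.10))
    for _ in range(10000)
)
print(true_value, round(simulated[250], 1), round(simulated[9750], 1))   # ~95% interval
```

With 10% noise on each raw measurement, the central 95% of the simulated values spreads over several days around the true ten-day SRT.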
DSA is founded on a more realistic model and appears to act as a smoothed SRT. However, both measures still suffer from noisy raw measures. For example, Figure C.1 (Appendix C) shows that if the standard deviation of the solids test is 10%, then a measured MCRT of 10 days indicates that the "real" MCRT is somewhere between 6 and 14 days (the derivation of this plot is given in Appendix C). For this reason, data analysis should be conducted on raw values rather than derived values.

Measures used to monitor and control biological systems have been the subject of a number of reviews, some of which are listed below:

• Meyer et al [210] published a review of microbial growth control measures along with a discussion of growth control strategies.

• Stephanopoulos et al [285] [257] [130] [258] discussed on-line instrumentation and bioreactor identification in a four paper series.

• Wang and Stephanopoulos [304] reviewed real time digital-computer applications to fermentation control with an emphasis on using on-line instrumentation.

• Briggs and Grattan [62] [63] provide a survey of instrumentation used in the United Kingdom.

• The EPA published a manual on Wastewater Treatment Plant Instrumentation that covers installation and maintenance of conventional instrumentation [205].

The development of measures to control wastewater treatment plants and other biological systems is ongoing (the author found that most of this material was published in IAWPRC specialty conferences, IFAC specialty and general conferences, ISA proceedings and American Automatic Control Conference proceedings).

4.2 Preservation

A sample is preserved to protect its integrity between collection and analysis. Preservation techniques attempt to minimize physical changes caused by volatilization, adsorption and absorption, diffusion and precipitation, and chemical changes caused by chemical reactions, photochemical reactions and microbiological degradation. The changes are minimized with the use of temperature control, type of sample container used and chemical addition [240] [247] [42].

The validity of the preservation method is usually tested as part of a laboratory's QA/QC procedure. For example, Tyers and Shaw [297] looked at the effect of adding allylthiourea (ATU) to a BOD5 sample to inhibit nitrification. Table 4.4 shows the percent decreases in the BOD5 concentration when the ATU is increased from 0.5 mg/l ATU to 2.0 mg/l ATU.

Table 4.4: Effectiveness Of ATU As Effluent BOD5 Sample Nitrification Inhibitor
(Percent decrease in BOD5 when ATU increased from 0.5 to 2.0 mg/l ATU)

Statistic: Activated Sludge / Biological Filters / Both
Maximum: 27% / 88% / 63%
Upper Quartile: 19% / 58% / 46%
Median: 14% / 46% / 39%
Lower Quartile: 10% / 30% / 33%
Minimum: 6% / 6% / 28%
Number: 29 / 9 / 10

4.3 Sampling

By definition, a sample is a fraction of a stream. A fraction may be delineated by the processes of extraction or splitting, or by the interaction between the characteristics of an in situ measurement device and the system.
The size of the fraction must satisfy two criteria: • The size must be manageable. • The size must allow the analyst to infer from the fraction’s characteristics the characteristics of the stream at the time and place the fraction is delineated, i.e. the sample is representative. The aim of sampling is to obtain relevant information on an object, e.g. a waste stream. The analyst must translate a request for information into a monitoring program. This program can be optimized by considering the purpose of the program, what the program is seeking information on and how this information will be obtained. The more prior information that is available on these items, the better a plan can be developed to gather the needed process information. By using the theoretical and practical work from other disciplines, the analyst can improve the monitoring program. Two disciplines that play an important role in the discussions in this section are Statistical Sampling Theory and Signal Theory. The appli cation of knowledge from these disciplines to the design of treatment plant monitoring programs is “tricky” because the conditions in a treatment plant are different from the conditions under which both these disciplines were developed. However, these disciplines provide a logical framework that can be used to solve the problems at hand. Most objects hold in themselves the proof of their value. Their value can be esti mated by measurements carried out on the object itself, i.e. all a good jeweller needs to  Chapter 4. Measurement Process  124  determine the value of a diamond is the diamond itself. This is not true of a sample. The economical value of a sample is its represent ativeness, i.e. the closeness with which the composition of the sample reflects the composition of the stream from which it was delineated. The only way to distinguish a representative sample from a useless sample after the fact is to examine how the sample was obtained. Over the last decade, many studies have shown that poor sampling is a commonly identified cause of poor data. No matter how elaborate the analysis made on a sample is, the resulting datum is of no value or worse, misleading, unless the sample is representative [82, pp. 171]: Sampling in environmental analyses is akin to dish washing in analytical chemistry. If the dishes are dirty, all analytical activity thereafter is for nought. Similarly, nothing is gained from the chemical endeavor if the samples are not representative of the environment from which they are taken. The discussion that follows this section is broken into four parts. The first section discusses the EPA’s definition of a sampling protocol. The second section discusses one aspect of this protocol, the sampling plan. The third section discusses why a sample is taken and the final section lists a number of sampling references. 4.3.1  Sampling Protocol  A sampling protocol is a thorough written description of detailed steps and procedures involved in the collection of samples (see Table 4.5). The protocol forms part of a Quality Assurance Project Plan and is usually included in a project’s proposal and report [233]. The three requirements for a sampling protocol as established by the American Chemical Society Committee on Environmental Improvement, are listed below [25]: 1. A proper statistical design that takes into account the goals of the study and its certainties and uncertainties. This item discussed in the next section.  Chapter 4. 
Measurement Process  125  Table 4.5: Outline Of A Sampling Protocol Purpose Analytes of interest Locations Sampling points Sample collection Sample handling Field determinants Sample storage and transport  Subelements Primary and secondary chemical con stituents and criteria for representativeness Site, depth and frequency Design, construction and performance evaluation Mechanisms, materials and methodology Preservation, filtration, and field control samples Unstable species and additional sampling variables Preservation of sample integrity  2. Instruction for sample collection, labelling, preservation and transport to the ana lytical facility. 3. Training of personnel in the sampling techniques and procedures specified. A sampling protocol is an expression of professional accountability because the pro tocol enables a researcher’s peers to evaluate the quality (particularly the representa tiveness) of the data. The objective of requiring a protocol is to prevent a researcher from sampling until the questions to be answered have been determined and properly framed [9]. This is evident in Green’s methodology which is one of the best set of guide lines for developing a sampling plan [129](see Table 4.6).  Chapter 4. Measurement Process  Table 4.6: Green’s Sampling Design Principles I.  2.  3.  4.  5.  6.  7.  8.  9.  10.  Be able to state concisely to someone else what question you are asking. Your results will be as coherent and as comprehensible as your initial conception of the problem. Take replicate samples within each combination of time, location and any other controlled variable. Differences among can only be demon strated by comparison to differences between. Take an equal number of randomly allocated replicate samples for each combination of controlled variables. Putting samplers in representative or typical places is not random sampling. To test whether a condition has an effect, collect samples both where the condition is present and where the condition is absent but all else is the same. An effect can only be demonstrated with a comparison with a control [cf Fundamental Problem Of Causal Inference]. Carry out some preliminary sampling to provide a basis for evaluation of sampling design and statistical analysis options. Those who skip this step because they do not have enough time usually end up losing time. Verify that your sampling device or method is sampling the population you think you are sampling, and with equal and adequate efficiency over the entire range of sampling conditions to be encountered. Variation in efficiency of sampling from area to area biases among-area comparisons. If the area to be sampled has a large-scale environmental pattern, break the area up into relatively homogeneous subareas and allocate samples to each in proportion to the size of the subarea. If it is an estimate of total abundance over the entire area that is desired, make the allocation proportional to the number of organisms in the subarea. Verify that your sample size unit is appropriate to the size , densities and spatial distributions of the organisms you are sampling. Then esti mate the number of replicate samples required to obtain the precision you want. Test your data to determine whether the error variation is homoge neous, normally distributed, and independent of the mean. If it is not, as will be the case for most field data, then (a) appropriately trans form the data. 
(b) use a distribution-free (nonparametric) procedure, (c) use an appropriate sequential sampling design, or (d) test against simulated H 0 data. Having chosen the best statistical method to test your hypothesis, stick with the result. An unexpected or undesired result is not a valid reason for rejecting the method and hunting for a better one.  126  Chapter 4. Measurement Process  4.3.2  127  Sampling Plan  A sampling scheme consists of at least five components  6:  • Sample Volume: how much sample to take • Sample Delineation And Location: what defines a sample • Sample Selection: random or systematic • Sample Frequency, Number, and Length of Study: how many samples should be taken and how often should they be taken • Sample Type: grab sample, composite sample or probe reading Sample Volume The population consists of the totality of the observations with which we are concerned, i.e. the effluent over the last 24 hours. The population must be defined before we can design a sampling scheme because a sample is a subset of the population. The sampling objective is to obtain a precise and accurate picture of the population’s characteristics. For this reason, the sample size should be large enough that it averages over the com positional heterogeneity yet is still small enough to be handled. Figure 4.1 shows the relationship between sample volume and the precision of the determination [82]. The samples were taken from the euphotic zone of a lake using a Van Dorn bottle outfitted with a graduated steel tape. The 5 Liter volume was the smallest size that gave similar results for both sides of the boat. 1f the reader is unfamiliar with the terminology used in sampling theory, he could refer to 6 ASTM D 3370-82 [2], ASTM D 4392-87 [4] and/or ASTM D 4375-84 [3]. The terms pertinent to this thesis are defined in the glossary (Appendix B).  128  Chapter 4. Measurement Process  Effect Of Sample Volume Mean and Standard Deviation 100 ., Sfl  -7,.’ ‘I—,  0 C)  60  —  E 50  —  :: 20  F  FL  2L-S 2L-N 5L-S 5 L-N 20 L-S 20 L-N 30 L-S 30 Volume and Side Of Boat  Figure 4.1: Samples Taken On Linsley Pond  L-N  Chapter 4. Measurement Process  129  Sample Delineation and Location Sample delineation depends on the type of sample the analyst wishes to take: • Increment Sample: An increment sample is a portion of a stream at a given instance in time. The analyst must decide where in the stream to take his sample. For example, if the analyst is sampling the aeration basin for MLSS, he would take the sample in a well-mixed zone away from the edge of the basin. • Split Sample: A split sample is the total volume of the stream over a time interval, i.e. the stream is diverted into the sample container. The analyst must decide when and how much of the stream to divert. For example, if the analyst wants to sample the pickle liquor addition during a pump cycle, he should divert the flow at the start, middle and end of the cycle to form a composite sample. Split samples are commonly used in the mining and other solids handling industries. • Probe Sample: A probe sample is the volume over which the probe measures. This volume depends on the characteristics of the probe and what is being sam pled. For example, a probe in a quiescent basin samples a smaller volume than a probe in a turbulent tank. In this case, the analyst must position the probe to obtain a representative measure of the tank’s contents. 
One way to determine the approximate size of this sample is to measure the concentration profile of the tank and plot out the zone around the probe that is within two standard deviations of the probe’s measurement error. If more than one probe is used, then the sample value may be either the mean or median probe reading. This is the case when an operator attempts to improve data reliability by using two or more probes rather than a single probe [66]  ‘.  lnstrument redundancy does not eliminate the need for a QA/QC program. For a complete discussion 7 of DO probe redundancy aud failure, refer to [66]  Chapter 4. Measurement Process  130  Sample Selection In standard sampling theory, the rule is that samples must be chosen at random unless there is a very good reason for doing otherwise (see Table 4.7). The reason for this is that a researcher does not know everything about the population being studied. Therefore at some point, he must make some assumptions. Statistical Sampling Theory reduces the consequences of a faulty assumption [9] by forcing the effect of what is unknown to manifest itself as a random variable. Consequently, at some level, sample selection must be random [129, pp. 10]: It is critical that at some level the sampling be random or most statistical tests will be invalid. The assumption of independence of error is the only one in most statistical methods for which violation is both serious and impossible to cure after the data have been collected. Truly random allocation of samples is the necessary and sufficient safeguard against this violation. A researcher may decide to restrict or eliminate random selection given prior knowledge of the population. However, if this knowledge later proves to be inaccurate, so may the end results. In most situations, researchers control sample selection while still retaining a random component. The case when the selection is completely controlled is referred to as System atic Sampling. The main advantage of systematic sampling is that the program is easier to execute. The disadvantage is that the system used to select the samples may appear in the results as being a characteristic of the system. For example, when evaluating a Total Oxygen Demand Meter, researchers observed that the sample-based two month average of a systematic sample of the plant’s effluent oxygen demand could be out by as much as 35% depending on the time of day the sample was taken [117]. A similar situation can occur if the sample time coincides with equipment cycles [105]. Despite its dangers, systematic sampling is commonly used because “to some extent, good judgement is a substitute for money” [57]. C. Cochran provides a list of circumstances under which it is safe to use systematic sampling [78, pp. 229]:  Chapter 4. Measurement Process  131  Table 4.7: Sample Selection Methodology Select Units  Method  Description  Comment  Using prior information on population  Haphazard or Arbitrary  No prior information available, so any sample will do  Population must be homogeneous or parameter estimates will be bi ased. The method is used primar ily in pilot sampling studies, i.e. look at whatever data is available.  Judgement  Sampler knows population very well so he selects which sample will be indicative of the population.  Same as above. This method is often used when the sample itself is important, i.e. check suspicious cars at the border.  Search  Sampler has historical information on the location being searched for.  
The sampling goal is to find something in a population which is different from studying the population as a whole, i.e. find an illegal connection in a sewer.  Using a regular pattern or system  Systematic  The sampler chooses the samples systematically.  Systematic sampling is used to map across space and/or time. If the population exhibits a cyclical pattern, the sampler must he sure the system’s resolution is high enough to detect the pattern.  Using a random number  Probability  The sample selection process has a random component.  Statistical Sample Theory deals with probability sampling almost exclusively. Random selection of samples is the preferred method ology, unless there is a very good reason for doing otherwise.  Chapter 4. Measurement Process  132  • Population is relatively homogeneous: The population exhibits no systematic patterns that could bias the program’s results. Over a day, the nature of the fish entering a spawning channel would not change dramatically nor would they exhibit any sort of systematic pattern. Therefore, examining every tenth fish would be equally as effective as sampling at random. • Population is stratified and an independent sample is taken from each strata: The population consists of strata. Although the average values between strata may differ, the variation within the strata is random and small when com pared to the variation between strata. • Population is stratified but the sampling frequency is high enough to characterize the strata: This situation is like the previous one except the varia tion within the strata is significant. However, the number of samples taken on each strata is sufficient to ensure that the results are not biased. • Population is systematically stratified and an error estimate is not re quired: This situation is similar to the previous example. However, the form of the variation within the strata is not random and is known, thus the judicious choice of sample time and location provides a reasonable estimate of the strata’s average. Systematic sampling is also used when subsequent analyses require a systematic data set (i.e. Time Series Analysis). If systematic sampling is used instead of random sam pling, the onus shifts from the statistician to the researcher to prove that the results are representative. This is another manifestation of the Result Evaluation Principle. For this reason, samples whose results may be used in a court case or to determine compliance should be selected at random.  Chapter 4. Measurement Process  133  Sample Frequency and Number Of Samples Taken A researcher uses foreknowledge of the system’s structure to decide on the sample type and frequency. The Statistical Sampling Frequency is determined from estimates of the underlying variability and the desired precision. If the system exhibits a regular cycli cal behaviour, the researcher should also consider the Signal Sampling Frequency. The Signal Sampling Frequency should be at least three times faster than the fastest cyclical component in the system being sampled. The larger of the two sampling frequencies gov erns. The other considerations that must be examined when determining the sampling frequency are mentioned below [50, pp. 
85]: For selection of an appropriate sampling interval, a careful analysis must take into account the natural variability of the process, the precision of control that is possible, the required uniformity of performance, the accuracy of mea surements used to judge performance, the penalty for failing to detect bad performance, and the cost of sampling, laboratory work, and data analysis. Sample Type In pollution control, there are three types of samples that deserve mention: in-situ, grab and composite. An in-situ sample is one where the sample is not removed from the population. For example, a redox probe measures the condition of the biomass in the vicinity of the probe. An in-situ sample is used whenever a parameter cannot be measured any other way or when it is cheaper or more convenient to measure in-situ. A grab sample is one where the sample is removed from the population and the analysis is conducted on a portion of the sample. Berg lists three situations where a grab sample should be used [42, pp. 19-21]: 1. Sampling batch flows (i.e. on delivery of a tank of waste pickle liquor to a treatment plant using the liquor for phosphorus removal) 2. Sampling parameters that may change when stored or composited (i.e. Dissolved Oxygen, PH)  Chapter 4. Measurement Process  134  3. Sampling a parameter that changes very slowly making a composite unwarranted (i.e. MLSS is an aeration basin). A composite sample is a sample where a number of grab samples are pooled prior to analyses. The volume and time of sampling of the grab samples determines how the samples are pooled.  A composite sample is routinely used to screen populations, to  reduce the variability in a sample and to facilitate an analytical procedure [116]. The advantage of a composite sample is that it allows the operator to uncouple the sampling frequency from the analytical frequency. For example, assume an operator wishes to estimate the plant’s daily load. Also assume that the variation in the load over the day is unimportant but significant. The ideal approach is to take and analyze enough grab samples within each day to characterize the variation completely, then filter out the within-day dynamics from the data set.  A more pragmatic approach would  be to composite the grab samples and conduct one analysis each day.  Time-based  compositing effectively filters out frequencies up to the analytical period while distorting the contributions of components with a period between the analytical period and four times the analytical period. Composite samples reduce analytical costs and reduce inter-sample variance. How ever, the disadvantage of composite samples warrants some examination: 1. Dilution below detection limit: Under some circumstances, compositing may dilute the parameter of interest to the point it is undetectable. 2. Loss of representativeness: The parameter of interest should behave like a conservative substance.  This excludes parameters that are in equilibrium with  other phases such as dissolved oxygen, pH, free cyanide ion or dissolved heavy metals [214].  Chapter 4. Measurement Process  135  3. Loss of information and possible bias: Information is lost because the grab samples that form the composite are not measured individually. Bias results from three sources: • The nature of the composite function: The simplest composite sample is a time based composite  -  samples of equal volume are taken at fixed time  intervals. 
A more complex composite sample is a flow-based composite  -  the  size of the aliquot is a determined by the current flow rate. In this case, the statistic is a ratio. The variance introduced by imperfect knowledge of the flow affects the variance and distribution of the load [260]. • The nature of the statistic: A time based composite estimates the mean concentration while a volume composite estimates the mean load. Because the flow and concentration both vary with time, the load estimated from a time-based composite and the mean concentration estimated from a volume based composite are biased. • The relationship between the analytical frequency and the stream’s dynamics: Bias may result if the dominant dynamic in the stream lies be tween the Nyquist frequency and the evaluation frequency. The evaluation frequency is one quarter the analytical frequency. Sample Interval A sample represents an interval in space or time. The optimal size of this interval is determined by the relationship between the characteristics of the sample and of the population.  Chapter 4. Measurement Process  4.3.3  136  Sampling Viewpoints  We can view sampling from three perspectives: 1. Principal and Objectives: A sample is taken on a system to either describe, monitor or control the system. A descriptive measure may be infrequent and noisy as long as it provides the operator with the information he needs, while a control measure must be precise, accurate, fast and frequent if the operator is to maintain control of his system. 2. Object and Its Properties: A sample must be representative. Therefore, the sampling process should not modify the sample’s characteristics. In some cases, a composite sample may be desired but a grab sample necessary because the attribute of interest is in equilibrium with the atmosphere, i.e. pH. 3. Analyst and Analytical Procedure: The sampling method depends on the analytical method. For example, if the attribute is near the detection limit of the analytical method, then a composite sample should not be used. We will elaborate on the first perspective as the previous sections dealt with the other two items in sufficient detail. Principal and Objectives A sample is taken for one of three reasons: 1. To Describe 2. To Monitor 3. To Control  Chapter 4. Measurement Process  137  A description is a measure of the properties of an object to determine the object’s average characteristics along with some measnre of the precision of these characteristics. The researcher mnst be aware of the type of estimator nsed if he is to nse the proper precision estimator. Monitoring is the measurement of the changing properties of an object in order to detect deviations from a preset value that are considered too large. This is also referred to this as Threshold ControL The sampler increases the sampling frequency as the properties approach their bounds. In other words, the sampling frequency should be a function of the previous value. In this case, the most important sample is the most recent one. We use this type of monitoring when we want to detect abnormal conceutrations, to define peaks, to detect trends, to detect when a parameter exceeds a limit, or to follow a parameter as a unit process approaches failure. Control is the measurement of the changing properties of an object to detect a signif icant deviation from a set-point aud to adjust the system to correct for these deviations. The difference between control and monitoring is the degree of intervention required to maintain stability. 
The turn-arovnd-time of a sample must be faster than the dynamic being controlled. If a correction is applied after the dynamic passed through the system, then the correction itself may become a cause of deviation. Turn-around-time should not be confused with dead-time. Dead-time is a lag in the system while turn-around-time is a lag in the measurement process. Control requires precise, accurate and fast measures.  Chapter 4. Measurement Process  4.3.4  138  References  A standard reference in Statistical Sampling Theory is W. G. Cochran’s text “Sampling Techniques”  .  However, statistical sampling theory does not consider sampling and re  construction of a time series, particularly Shannon’s Sampling Theorem. For this reason, Cochran’s text is complemented with short sections from other time series and auto matic control texts [18] [61] [284] [133]. Three texts discuss the application of Statistical Sampling Theory areas which share some characteristics with wastewater treatment: • R. 0. Gilbert’s “Statistical Methods For Environmental Monitoring”: monitoring surface waters, hazardous wastes and ground water. • R. H. Green’s “Sampling Design and Statistical Methods For Environmental Biol ogists”: ecology. • P. M. Gy’s “Sampling Of Particulate Materials”: mineral processing, sewage sludges and solid wastes. The EPA published two sampling manuals for wastewater treatment plants which focus on the methodological aspects of sampling [267] [42]. Montgomery and Hart [214], Ellis and Lacey [105] and Katema [182] published sam pling reviews. Additional information can be found in chapters 1 and 2 in Examination Of Water For Pollution Control [204] [311]. The balance of the literature discusses specific sampling concerns, some of which are listed below: • Volume Of Sample [82]: The volume of a sample is a compromise between a sample being representative and a sample being manageable. Two of W. 0. Cochran’s texts are considered standard references in applied statistics: Sampling 5 Theory [78] and Experimental Design [79].  Chapter 4. Measurement Process  139  • Validity Of Composite Samples [2.59], [261], [260]. [116]: A composite sample loses its advantage over a grab sample when the composite function is noisy. • Mass Discharge Estimation [55]: A stratified random design using a ratio esti mator to estimate baseline pollution to the Great Lakes. • Sampling Sludges and Slurries [243], [81]: Application of Gy’s sampling theory. • Scheduling Of Sample Analysis In A Laboratory [169]: If the objective is process control, then the most recent sample should be processed first. • Design Of Sampling Plans When Autocorrelation Is Present [221], [275], [276]: The methods exploit the antocorrelation function sample to reduce the num ber of samples taken. • Sample Collection Routing [266]: The Lincoln Division Of Anglian Water must collect samples from over 600 locations.  The Division optimized the collection  routes and was able to collect over 15000 more samples without an increase in staff. • Ecological Sampling [9]: The distribution of a population is as important as the mean because most ecological populations are not normally distributed. • Sampling Intervals For Compliance [23], [215], [279], [117]: Optimize the abil ity to enforce compliance while controlling the cost of monitoring. • Sampling Protocol [25], [233], [101]: The EPA requires a sampling protocol as part of the Quality Assurance Project Plan. Autocorrelation of 9 servation. 
[201]

• Automatic Samplers [229], [197], [187]: Reviews of automatic sampling equipment.

The bulk of the literature on sampling is published either in ASTM or EPA documents or in "Water Pollution Control (GB)" ("Water Pollution Control" was merged with the "Public Health Engineer" in 1987 to form "Water and Environmental Management"; the last issue of WPC was 86(2)). Perhaps, what is more interesting is what is not in the literature. Apart from M. B. Beck's work, little reference is made to the impact of sampling (and measurement) on modelling and automatic control in wastewater treatment. This is not the case in the biotechnology and automatic control fields.

4.4 Quality Assurance/Quality Control

4.4.1 Concern For Quality

The data obtained from wastewater treatment plants are used to determine compliance, to control and optimize the process, and to provide information leading to new designs and new regulations. The type and frequency of these measurements must increase if a treatment plant is to meet its new more stringent discharge requirements. Measurement for these purposes across the industrial and public sector accounts for about 6% of an industrial nation's Gross National Product [164]. Although this cost is large, the indirect cost of making poor measurements is staggering.

The immediate effect of poor quality data is noise. Although the amount of information published each day is increasing, there is also an increase in confusion caused by poorly designed experiments, haphazard sampling plans and flawed measurement processes. The cost of this noise is reflected in the wastage of research effort when results cannot be duplicated, and conflict between regulators and industry over permit violations and the applicability of new technology [164]. This cost also includes the time readers
This program is known  by various names including the Discharge Monitoring Report Quality Assurance (DMR QA) program and the Performance Evaluation Sample Program (PESP). Table 4.8 sum marize the program’s effectiveness [7, pp. 6]. By 1985 October 1, the EPA required all environmental data used for decision making purposes to be of known (and documented) quality [213]. The next step in this evolution is the Wisconsin plan to certify laboratories based on their QA/QC programs and their performance in a regular reference sample test program. National Pollution Discharge Elimination System 11  Chapter 4. Measurement Process  142  Table 4.8: DMR QA/PESP Effectiveness Year  1980-81 1982 1983 1984 1985 1986 1987 1988  91 Of Permittees e With All Data Acceptable 30.8 41.4 49.3 54.2 55.8 52.1 54.1 56.9  % Of DMR QA Analyses A ccept able 73.9 78.9 82.8 85.4 85.5 86.1 87.1 88.0  The development of the EPA’s concern over data quality is not unique. An increasing number of publications concerned with data quality have emanated from ASTM [290], National Bureau of Standards [164], Bureau of Census [21] and the National Council for Air and Stream Information [223] [226]. The quality of data is judged relative to its intended use. In a wastewater treatment plant, effluent data are collected to satisfy regulatory authorities and to provide an indi cation of the plant’s performance. Other data are collected to monitor and control the process. These data are also used to establish new regulations and expand our under standing of existing processes. The data quality required for these last two purposes may be different than that required for process monitoring and control. 4.4.2  References  Quality Assurance is the system of activities whose purpose it is to provide assurance that the quality control job is being done effectively. Quality Control is the system of activities whose purpose is to provide a quality of product or service that meets the needs of the user. The application of these systems to wastewater and water laboratories is  Chapter 4. Measurement Process  143  clearly explained in three EPA documents: • Handbook For Analytical Quality Control In Water and Wastewater Laborato ries [56] • Choosing Cost-Effective QA/QC Programs For Chemical Analysis [248] • Quality Assurance/Quality Control QA/QC for 301h Monitoring Programs: Guid ance On Field and Laboratory Methods [22] 4.4.3  12  Data Quality  Data Quality can be defined as “the totality of features and characteristics of data that bears upon its ability to satisfy a given purpose [233]. By definition, a data set may be of both poor and excellent quality at the same time depending upon its end use. The EPA formalizes the Quality concept by identifying five characteristics of major importance accuracy, precision, completeness, representativeness and comparability. Accuracy And Precision Accuracy is a function of precision and bias [103]. Accuracy is a measure of the degree of correctness while precision is a measure of reproducibility. If two measures are without bias, then the more precise measurement would be considered the more accurate while if two measurements are equally precise, the one whose mean value is closest to the true value would be considered the most accurate. A precise measure cannot he considered accurate unless the bias is negligible. Accuracy, precision and bias are usually reported with a description of how they were determined. A reliable measure is defined as being both a precise and an accurate measure. 
To estimate accuracy, an estimate of the bias, imprecision and (strictly speaking) the form of the distribution of individual measures about the sample-based average are required. If the measurement process is under statistical control, it is assumed that the distribution is Gaussian. Accuracy cannot be determined unless the true value is known beforehand.

Analytical inaccuracy may be caused by interference, nonselectivity, a biased calibration or an erroneous blank correction [104]. An interference is a biological or chemical attribute (other than the determinand) of a test sample that positively or negatively offsets the measurement result from the true value. For example, reduced inorganics such as ferrous iron interfere with the Chemical Oxygen Demand (COD) measurement of organic material such as fatty acids. Nonselectivity is the inability to measure all forms of the determinand equally. A typical COD test is nonselective because it oxidizes all biodegradable organics but only partially oxidizes aromatic organics [202].

Many environmental measurements use at least one of the following techniques in their measurement methodology: blanks, standard solutions or calibration curves. For example, the BOD, COD and Suspended Solids tests use a blank to provide a zero point. Blank results, like other analytical results, are subject to error. Therefore the use of blanks may introduce additional variation and/or bias [248].

Correction is a less obvious source of analytical inaccuracy. "Corrections applied in practice are usually of more limited scope than the names that they are given appear to indicate" [103]. For example, rate constants are often corrected for temperature using the Arrhenius law. However, the law is only an approximation because enzyme catalysis is a bounded nonlinear function of temperature. For this reason, correction is a form of extrapolation and extrapolation increases the uncertainty and possibly the inaccuracy of a measure.

Table 4.9: BOD and Suspended Solids Quality Assurance Results: % Unacceptable

Study: BOD5 / SS
Wisconsin 1978, Major Municipal: 80 / 20
Wisconsin 1978, Minor Municipal: 72 / 35
Wisconsin 1978, Major Commercial: 60 / 70
Wisconsin 1978, Minor Commercial: 100 / 33
Wisconsin 1980 EPA DMR QA: 22 / 20
Wisconsin 1982 EPA DMR QA: 18 / 17
DMR QA 1982: 17.9 / 17.1
Number: 150, 107, 122, 109, 123, 7500

Table 4.9 contains quality assurance data for BOD and Suspended Solids tests [21]
The discrepancy between the initial percentage of unacceptable results is indicative of the observation that “it is more difficult to ensure adequate ac curacy and comparable results with biological/microbiological determinations than with physical or chemical determinations [104, pp. 266]. Sampling inaccuracy may result from a poor sampling plan, an inappropriate sampling technique or an unacceptable sample custody chain. The latter two sources occur because the researcher has not evaluated the appropriateness of the sampling process for each determinand. For example, four deficiencies identified by NPDES inspectors at pulp and paper mills are given below [223]:  Chapter 4. Measurement Process  146  1. Insufficient linear velocity in automatic sample collection to prevent loss of set tleable solids 2. Collection of an unrefrigerated sample when a refrigerated sample is required 3. Use of dirty sampling equipment and sample containers 4. Holding the collected sample prior to analysis longer than the maximum allowable holding time Inaccuracy resulting from the sampling plan was discussed in Section 4.3. A Quality Assurance Project plan addresses the issue of accuracy in two ways [233] [213]. First, the project proposal is justified with respect to experimental and sampling design theory, and standard sampling and sample custody practices. Second, an analytical qual ity control program is instituted to prove that the analytical method provides accurate data throughout the project’s life. Quality control usually consists of analyzing refer ence or spiked samples. These data should be accompanied with supporting information including a description of how these data were collected and the number of data points involved [274]. Precision is meaningless if the measurement process is not under statistical control. An analyst in a wastewater treatment plant maintains statistical control over his mea surement process by ensuring that the measurement of blanks, standards and spikes are statistically stable, i.e. vary randomly about their true value.  Chapter 4. Measurement Process  147  The precision of an analytical process is often concentration dependent. For example, a 1:1 mixture of glucose and glutamic acid in a total concentration range of 5 to 340 mg/l was analyzed by between 86 and 102 laboratories in an interlaboratory study [176, pp. 489]. The precision expressed as a Standard Deviation (5) is given by the following equation: S  =  0.120(Added Level in mg/l) + 1.04  This type of dependency must be taken into account when applying Statistical Qual ity Control methodologies to environmental analyses  -  particularly when using Control  Charts [56]. A precision measure should be reported with an explanation of how it was determined [233] [5]. Precision is dependent upon the analyst, the apparatus, the laboratory and the date and time of the analysis. Given all the permutations possible. ASTM restricts the use of the terms “Repeatability” and “Reproducibility” to the following situations: • Repeatability is a measure of the variability between at least two test results ob tained from a single operator on a particular piece of equipment on a particular sample in the shortest length of time. Repeatability is an indication of the high est precision that can be obtained by a single operator using a particular piece of apparatus on a particular sample. • Reproducibility is a measure of the variability of a measurement method carried out on a single sample (usually by different laboratories). 
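One common way of demonstrating this kind of statistical control is a Shewhart-type control chart on the recovery of standards or spikes. The fragment below is a minimal sketch (the certified value, the historical standard deviation and the recoveries are invented for illustration); it simply flags any recovery that falls outside the true value plus or minus three standard deviations.

```c
/* Sketch: checking that standard or spike recoveries stay within control
 * limits of the true value +/- 3 standard deviations.  Values invented. */
#include <stdio.h>

int main(void)
{
    double true_value = 200.0;   /* mg/l, certified value of the standard */
    double s = 4.0;              /* standard deviation from historical QC */
    double recovery[] = { 203.1, 197.8, 214.5, 201.2 };
    int n = sizeof recovery / sizeof recovery[0];
    double ucl = true_value + 3.0 * s;
    double lcl = true_value - 3.0 * s;
    int i;

    for (i = 0; i < n; i++)
        printf("run %d: %6.1f mg/l  %s\n", i + 1, recovery[i],
               (recovery[i] > ucl || recovery[i] < lcl)
                   ? "OUT of control" : "in control");
    return 0;
}
```

In practice the limits would be updated as the laboratory accumulates quality control data and, as discussed next, they may need to vary with concentration.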
Reproducibility is a measure of the precision of a measurement method carried out on a single sample. By definition, reproducibility is almost always greater than repeatability. Precision of a measurement should be presented as a function of the measured value across the range of applicability [274]. At minimum, regression coefficients, means and  Chapter 4. Measurement Process  148  Table 4.10: Potassium Determination In An Agricultural Laboratory  Source Of Imprecision Sampling Error due to field sampling due to sampling inhomogeneity in the laboratory  Percentage j 87.8 84.0 3.8  Between Laboratories  9.4  Sample Preparation  1.4  Precision of Ivleasurement  1.4  Variability is expressed as a percentage of the total.  other statistics should be accompanied by an indication of their precision. The most effective way to present the precision of a measurement process is as a Statistical Exper iment because the Analysis of Variance Table speaks for itself. For example, a researcher looking at Table 4.10 knows immediately what sources of imprecision are significant [207]. This approach requires that the sampling plan and measurement schedule are laid out to accommodate this type of analysis. Precision estimates are usually accompanied by some descriptive information including the type of precision, the components of the process included in the estimate, and any outlier identification schemes [274]. Completeness Completeness is a measure of the amount of valid data obtained from a measurement system compared to the amount that was expected to be obtained during planning. The significance of completeness depends on the consequence of not having all the data you hoped for [274]. For example, the impact of the missing data might he a loss of statistical  Chapter 4. Measurement Process  149  power resulting in large confidence intervals or insensitive statistical tests. When data are unavailable, common practice dictates that the report identifies in what phases of the work the data were lost, provides an explanation and suggests a remedy. The impact of missing data on the study’s utility may be severe (see Section 2.2). Representativeness And Comparability Representativeness is a geneiic term referring to the degree of correspondence between what is measured in the laboratory and what occurred in situ  The strength of this  correspondence determines a study’s utility [29] No matter how elaborate or expensive the analysis made on a sample is, it is of no value or worse, misleading, unless the sample is representative. Representativeness is affected by how and when samples are collected, what happens to the samples between their being collected and analyzed and how the samples are analyzed [274] [240]. For example, inspections of paper industry NPDES self-monitoring facilities identified a number of deficiencies that possibly affected the representativeness of their effluent 5 data [223]: BUD • Sampling —  —  Samples not kept at 4 degrees C during collection Dirty samplers and sample containers  • Storage  —  Holding time exceeded the maximum recommended  Chapter 4. Measurement Process  150  Analysis  —  —  —  Improper seed storage Insufficient thiosulfate standardization frequency Improper incubation temperature  Most deficiencies occurred in laboratories without a Quality Assurance/Quality Control program in place. A representative sample is not necessarily a random sample. In the literature, the two words are sometimes assumed to be synonyms [1]. 
Random refers to how a sample is chosen while representative refers to how well a sample describes the part of the system from which it was taken. By definition, representativeness like accuracy, is impossible to quantify. For this reason, the EPA requires that the researcher include a justification of the project plan with respect to representativeness and proof that the plan was implemented [233]. Comparability expresses “the confidence with which one data set can be compared to another” [233]. The comparability of two data sets can only be decided if the condi tions under which the data were generated and collected are known. For example, the EPA [233] list six criteria that to determine if two data sets might be comparable: • Similar siting criteria • Same observables measured • Compatible sampling and analysis protocols • Same degree of quality assurance and control • Same units of reporting  Chapter 4. Measurement Process  151  • Correction of measured values to standard conditions Although full disclosure is required to determine whether two data sets are comparable, many articles fall short of providing the information necessary to evaluate comparability. For example, a recent review of three prominent medical journals revealed that only 56% of clinical trials were fully reported. The authors concluded that  ‘.  .  .  investigators often  have good reason for weaknesses in design, but the reasons for weaknesses in reporting can he few” [96, pp. 1337].  Chapter 4. X’ieasuremerzt Process  4.5  152  Measurement Model  Pierre M. Gy observed that the failure of many mining or metallurgical ventures can of ten be attributed to unaccountable sampling errors. Gy developed his General Sampling Theory to improve the accuracy and efficiency of the sampling of particulate solids in these industries, particularly the sampling of ores and concentrates. His theory is sum marized in his text “Sampling Of Particulate Materials” [133]. The theory now forms the basis of process sampling techniques in a multitude of industries including waste treatment [81] [243]. In retrospect, Gy’s theory evolved from the inadequacy of any existing theory in itself to address completely the difficult sampling problems surrounding the monitoring of municipal sludges, sediments, dredge spoils, drilling muds, solid wastes and waste water streams. For this reason, the theory is a synthesis of Statistical Sampling Theory, Signal Theory and Sampling Physics and Mechanics. Statistical Sampling Theory or more specifically, Standard Sample Survey and Classical Sampling Theory, was developed to obtain efficient estimates of a population’s characteristic and its uncertainty. Signal Theory was developed to ensure that a digital reconstruction of an analog signal is free of alias. The physics and mechanics of sampling are concerned with the sample extraction process itself. Two models form the basis of Gy’s theory: The Continuous and The Discrete Selection Models. The Continuous Selection Model describes the variation with respect to time and space of the characteristic of interest within the investigated material. This model consists of two functions: • Qualitative Function: The critical content of the material flowing past the sam pling point at an instance in time.  Chapter 4. Measurement Process  1.53  • Quantitative Function: The rate of flow at an instance in time. If necessary, these functions can be generalized to 2 or 3 dimensions. 
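As an illustration of how the two functions of the Continuous Selection Model interact, the sketch below (all values invented) treats the qualitative function as a series of hourly concentrations a(t) and the quantitative function as the corresponding flows q(t), and compares the flow-weighted mean, which estimates the mass-average concentration over the period, with the simple time-weighted mean.

```c
/* Sketch: combining the qualitative function a(t) (critical content) and
 * the quantitative function q(t) (flow rate) of the Continuous Selection
 * Model into flow-weighted and time-weighted means.  Values invented. */
#include <stdio.h>

int main(void)
{
    double a[] = { 120.0, 180.0, 240.0, 150.0 };  /* mg/l, hourly aliquots  */
    double q[] = { 0.10, 0.25, 0.40, 0.15 };      /* m3/s at the same times */
    int n = sizeof a / sizeof a[0];
    double mass = 0.0, flow = 0.0, sum = 0.0;
    int i;

    for (i = 0; i < n; i++) {
        mass += a[i] * q[i];   /* numerator of the flow-weighted average */
        flow += q[i];
        sum  += a[i];
    }
    printf("flow-weighted mean = %.1f mg/l\n", mass / flow);
    printf("time-weighted mean = %.1f mg/l\n", sum / n);
    return 0;
}
```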
The second model, the Discrete Selection Model, is concerned with the discrete nature of the fragments or particles submitted to the sampling operation. The model describes two types of heterogeneity:

• Constitution or Micro Heterogeneity: The case when there are differences in physical or chemical properties between or within particles.

• Distribution or Macro Heterogeneity: The case when the stream is stratified or multi-phasic.

Gy limits the sample selection schemes to systematic selection with random positioning, random stratified selection and random selection. The reason for this restriction is that purposive selection does not lend itself to statistical examination [78, pp. 10-11]. However, in most situations, understanding the probabilistic case is a prerequisite to understanding the purposive case.

The theory discusses two methods for obtaining a sample: increment sampling and flow splitting. The difference between the two is that in flow splitting, the entire stream is diverted for an instance in its entirety, while with increment sampling, only a portion of the stream is removed at an instance.

Gy defines the underlying objective of a sampling scheme as correctness. Correctness is a measure of the match between a reconstruction of the history of a stream from a set of samples and the actual history of the stream. The required closeness of the match depends on whether the reconstructed history is used for control, failure detection, process identification, monitoring, or compliance determination.

The goal of Gy's theory is to identify sampling problems and determine if they are correctable. The theory accomplishes this by classifying sampling errors into identifiable units. This scheme forms the framework for this section. The units may be identified in part by running a Variographic Experiment to obtain data to construct a Relative Semivariogram. The reader may refer to Gy's text [133] or Pitard et al's paper [243] for more information.

4.5.1 Model Components

Figure 4.2 contains a schematic of Gy's sampling error model. The error components are dealt with during the planning of the monitoring program (i.e. Quality Assurance) and during the execution of the program (i.e. Quality Control). The components tagged in Figure 4.2 with an "A" are dealt with at the planning stage, those with a "C" during the execution stage and those with an "A,C" at both stages. Gy's model could form the basis of a measurement process simulation as well as a measurement process audit.

Overall Error

The Overall Error, e_OE, is the error generated during the measurement process. The process includes sampling, preservation, transport, preprocessing and analysis. All but the last term is contained in the Total Error, e_TE. The Analytical Error, e_AE, is the error generated during the measurement of the sample's determinand. Given the abysmal performance of treatment plants during DMR-QA studies, analytical error remains an important concern.

    e_OE = e_TE + e_AE    (4.6)

Figure 4.2: Gy's Sampling Error Model (a schematic of the error components; the key indicates the stage at which each component is dealt with: A = planning, C = execution)

Total Error

The Total Error, e_TE, is the sum of the individual samples' preparation errors, e_PE, and sampling errors, e_SE.
    e_TE = Σ_{k=1..N} (e_PE,k + e_SE,k)    (4.7)

4.5.2 Preparation Errors

Preparation errors occur during the preparation of a sample for analysis:

• Contamination Error: A foreign substance contaminates the sample, altering the measurement of the determinand, i.e. contamination from a dirty sample bottle.

• Loss Error: A fraction of the sample is lost prior to analysis, i.e. the exclusion of some colloidal matter because the analyst does not use a wide-mouth pipette.

• Alteration Error: Sample characteristics are altered by physical or biological phenomena, i.e. the sample undergoes nitrification in the sampler.

• Mistake and Fraud: A mistake is an unintentional error introduced due to an accident or due to ignorance, i.e. confusing sample bottles. Fraud is an intentional mistake.

Sampling Error

Sampling Error, e_SE, arises from the sample selection, delineation and extraction processes. The sampling error is the sum of the Continuous Selection Error, e_CE, and the Materialization Error, e_ME.

    e_SE = e_ME + e_CE    (4.8)

Materialization Error

The Materialization Error is the sum of the Increment Delineation Error, e_DE, and the Extraction Error, e_EE. The Increment Delineation Error results from the incorrect delineation of the sample in the stream, i.e. the sample is taken at the wrong time or place. The Extraction Error results from the tendency for the sampling device or probe to select for different fractions of the stream with different probabilities. For example, a sampler will select for different portions of the stream depending on the tube size and the sampler pump velocity.

    e_ME = e_DE + e_EE    (4.9)

Continuous Selection Error

The Continuous Selection Error, e_CE, is the error that results from the interaction between the stream's characteristics and the sampling process. The error is the sum of two components, the Weighting Error and the Quality Fluctuation Error.

The Weighting Error, e_WE, results from changes in the stream's flow or volume. For example, an operator can take either a time-based or a flow-based composite sample. In theory, the flow-based sample provides the best estimate of the mass of a parameter entering the plant over the compositing period because it is a flow-weighted average. However, the sample's accuracy and precision rely on the accuracy and precision of the flow meter. Consequently, the utility of the sample decreases as the noise in the weighting function increases (Schaeffer's Composite Sampling Theorem) [259]. A time-based composite sample provides a biased estimate of the mass loading into the plant because each aliquot is weighted equally regardless of the flow into the plant at that instance. However, because flow and strength are inversely correlated in most plants [262], the bias is tolerable for most operating decisions as it is much easier to weight a sample by time than it is by flow.

    e_CE = e_WE + e_QE    (4.10)

Quality Fluctuation Error

The Quality Fluctuation Error consists of three components:

• QE1: Short Range Quality Fluctuation Error, i.e. noise
• QE2: Long Range Quality Fluctuation Error, i.e. trend
• QE3: Periodic Quality Fluctuation Error, i.e. diurnal

Classical Sampling Theory assumes that QE2 and QE3 are not significant or have been compensated for by stratification of the population prior to sampling.

    e_QE = e_QE1 + e_QE2 + e_QE3    (4.11)
Short Range Quality Fluctuation Error

Short Range Quality Fluctuation Error, e_QE1, consists of two components, Fundamental Error, e_FE, and Segregation or Grouping Error, e_GE.

Distributional or Macro Heterogeneity is the source of Segregation or Grouping Error. Macro Heterogeneity occurs when the mass of the investigated material is distributed in a non-random manner across the stream. This distribution may take the form of strata (i.e. secondary clarifier solids distribution) or a velocity profile (i.e. the solids distribution across the secondary sludge recycle line). If the structure of the heterogeneity is not known, the samples should be delineated either systematically or at random. If enough samples have been taken to plot the profile of the population, the samples can be grouped into strata after the fact to give the average concentration in each stratum. 13

Compositional or Micro Heterogeneity is the source of Fundamental Error. Micro Heterogeneity occurs when the stream either contains a dispersed phase or when the properties of the components of a single phase vary. Primary influent is an example of the first situation. The influent, which is mostly water, contains dissolved, colloidal and settleable solids. Dried secondary sludge is an example of the second situation. The characteristics of the sludge at a single concentration vary depending on its microbial composition. Fundamental Error cannot be suppressed by improvements in the measurement process and therefore forms the basis of comparison for determining the importance of the contribution of other sources of error.

4.5.3 Error Correction

A possible procedure for reducing the overall error is listed below:

1. Audit: The operator should familiarize himself with the proper procedures for sampling, preservation, transport, preprocessing and analysis. He should step through his monitoring program and try to identify any obvious errors in the system (Figure 4.3). An audit is much easier to do if the plant has a sampling protocol.

2. Plan: In some cases, the error may be reduced by using a different sample type or plan. Table 4.11 provides some suggestions on what sampling plan or sample type to use given the quality fluctuation error. The sampling frequency, f_s, is the frequency at which an aliquot is removed from the stream. The analytical frequency, f_a, is the frequency at which the sample produces a value. If the sample is a grab sample, the two frequencies are equal. If the sample is a probe average or a composite sample, the analytical frequency is the averaging or compositing frequency respectively. The evaluation frequency, f_e, may be defined as being a quarter of the analytical frequency. These guidelines were derived by the author using Shannon's Sampling Theorem [18] and Gy's sampling theory. 14

3. Experiment: The precision of a measure can be improved by replication. However, the operator must determine what to replicate. Table 4.10 contains the results of a sampling experiment. Because the sampling error accounts for most of the variation in the data (87.8%), the operator would gain more precision if he collects two samples and conducts an analysis on each than if he collects one sample and replicates the analysis.

13 The reader should not confuse stratification before sampling with stratification after sampling. The former improves the precision of the overall mean if the strata are significantly different. The latter does not [78, pp. 89-90].
14 See evaluation frequency in Appendix B.
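The error classification above lends itself to simple bookkeeping. The following C sketch mirrors how the components of equations 4.6 to 4.11 add up for a single increment; the field names follow the text, while the numerical values are placeholders that a plant would replace with estimates from its own variographic or replication experiments.

```c
/* Sketch: Gy's error decomposition (equations 4.6 to 4.11) for a single
 * increment.  The structure only mirrors how the components combine; the
 * values shown are placeholders, not data from the thesis. */
#include <stdio.h>

struct gy_errors {
    double analytical;    /* e_AE                                    */
    double preparation;   /* e_PE                                    */
    double delineation;   /* e_DE \  materialization error e_ME      */
    double extraction;    /* e_EE /                                  */
    double weighting;     /* e_WE                                    */
    double qe_short;      /* e_QE1 \                                 */
    double qe_long;       /* e_QE2  > quality fluctuation error e_QE */
    double qe_periodic;   /* e_QE3 /                                 */
};

static double overall_error(const struct gy_errors *e)
{
    double e_me = e->delineation + e->extraction;            /* eq 4.9  */
    double e_qe = e->qe_short + e->qe_long + e->qe_periodic; /* eq 4.11 */
    double e_ce = e->weighting + e_qe;                       /* eq 4.10 */
    double e_se = e_me + e_ce;                               /* eq 4.8  */
    double e_te = e->preparation + e_se;                     /* eq 4.7, one increment */
    return e_te + e->analytical;                             /* eq 4.6  */
}

int main(void)
{
    struct gy_errors e = { 1.4, 1.4, 0.5, 0.5, 0.5, 84.0, 2.0, 1.8 };
    printf("overall error contribution = %.1f\n", overall_error(&e));
    return 0;
}
```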
Figure 4.3: Example Audit Procedure (a flow chart stepping through checks of the equipment and reactants, the analytical methodology, the laboratory procedures, the sampling equipment, the delineation of the sample and the sample handling procedures)

Table 4.11: Sample Plan and Type Selection Based On The Quality Fluctuation Error

Random Error:
1. Determine the cause.
   If the cause is due to sample heterogeneity, then increase the sample volume and/or homogenize the sample mechanically.
   If the cause is due to subsampling or sample preparation, then increase the quality control and/or improve the subsampling scheme.
   If the cause is due to the analytical method, then either change the method or perform repetitive determinations and calculate the average.
   If a less precise but easy measure is available, a third alternative might be to use a double sampling type scheme.

Trend:
1. Determine or estimate the form of the trend.
2. Decide on the sampling interval that gives the optimal resolution of the trend.

Cyclic (f < f_a):
1. Determine the cost of measurement.
2. Determine if compositing is possible.
3. Determine if the determinand can be measured on-line.
   If on-line measurement can be used, then measure at least at 3f and filter out the dynamics you do not want.
   If the measurement cost is low and compositing is not possible, then use a stratified random design.
   If the cost is high and compositing is not possible, then use a stratified random design sparingly until a better measurement alternative can be found.
   If a composite sample can be used, then sample at 3f and composite.

Cyclic (f > f_e):
A grab sample may be all that is needed. The sampling or analytical frequency should be less than 3f.

Cyclic (f ≈ f_e):
Same as the first cyclic case. The operator should be aware that a composite sampler will introduce f into the time series under a somewhat damped alias. An alias is a high frequency component that appears in the sample (reconstructed signal) as a low frequency component.

Definitions: f = Frequency Of The Fastest Significant Component; f_a = Appraisal Frequency; f_e = Evaluation Frequency (f_a/4)

4.6 Good Data, Good Decisions

Measurement is the Achilles heel of treatment plant data analysis, modelling and control. Most of the problems discussed in this chapter cannot be rectified after the fact. If the data analysis is to produce meaningful results, then the measurement process must produce representative data [207]: statistical treatment is no substitute for good data.

Chapter 5

Fuzzy Sets, Logic and Reasoning

Computers give blunt answers. Yes or no, black or white. Researchers in artificial intelligence are trying to teach their machines a little subtlety: to encode the shades of gray characteristic of human thought. One approach that is producing good results is fuzzy logic, the creation of Lotfi Zadeh of the University of California at Berkeley. His brainchild is now running cement kilns and making decisions for corporate managers. [13]
The purpose of this chapter is to introduce the concept of a fuzzy set and explain how the theory is used to overcome data apartheid. The measurement paradigm uses fuzzy set theory in two ways:

1. To convert a string into a form the computer understands: For example, the operator tells the computer that the floc size is "large" (Table D.4, Appendix D). The computer could map the string "large" into the following fuzzy set:

       μ_large(x) = 0               if x < 400 µm
                  = (x - 400)/100   if 400 µm ≤ x < 500 µm        (5.12)
                  = 1               if x ≥ 500 µm

   The computer maps the data into a common space so that the data can be analyzed as a unit.

2. To convert a datum into a more reliable form: For example, the user tells the computer that the effluent Chemical Oxygen Demand (COD) is 22.1 mg/l. The computer "knows" that the COD test is unreliable below 50 mg/l. Instead of storing the datum supplied by the user, the computer stores the value as being less than 25 mg/l. This avoids false alarms, e.g. reporting a trend in COD values below 25 mg/l.

5.1 Literature

A comprehensive review of fuzzy set literature is beyond the scope of this thesis. This section limits itself to listing introductory reviews, a number of texts and a few examples of applications in wastewater treatment.

Lotfi Zadeh [314] [315] [316] published two short and accessible introductions to fuzzy logic. 1 Matthias Otto [238] published a more extensive review discussing fuzzy theory's impact on the field of chemometrics.

In the course of this research, a number of texts proved to be helpful. Dubois and Prade [99] and, to a lesser degree, Kandel [180] provide an overview of the subject. Smithson [277] discussed the application of fuzzy set analysis to categorical measures while Kaufmann and Gupta [183] dealt with ratio and interval measures (fuzzy numbers). Klir and Folger [186] reviewed fuzzy set theory in the context of information theory.

The predominant application of fuzzy set theory in wastewater treatment is in the development of fuzzy controllers and diagnostic systems. The first controllers were developed by Tong et al [292] and Flanagan [109]. Flanagan used the DO profile in the plant to estimate the F/M activity in the plant and output daily settings for SRT. Recent applications include control of sewage pumping [160] and chlorination [167]. Fuzzy set theory is also used in a number of wastewater treatment expert systems [174] [190].

1 A number of Lotfi Zadeh's papers have been collected into a single edition, "Fuzzy Sets And Applications", New York: Wiley, 1987.

5.2 What Is Fuzziness?

Fuzziness is a form of uncertainty. Uncertainty may be due to vagueness or ambiguity. Vagueness is associated with the difficulty of making sharp or precise distinctions in the world [186], that is, with some domain of interest which cannot be delimited by sharp boundaries. Ambiguity is associated with one-to-many relations, that is, when the choice between two or more alternatives is left unspecified. Fuzziness is a form of vagueness that is unambiguous.

Possibility is one way to describe fuzziness. Possibility is definitely not probability in the frequentist sense; however, under some conditions, the difference between subjective probability and possibility is difficult to discern [277]. For example, given a COD value less than 25 mg/l, the likelihood (belief) and the possibility that the COD is between 0 and 25 mg/l are indistinguishable.
The view among those who use fuzzy set theory in the workplace is that there is some overlap between the various versions of fuzziness and probability. No single uncertainty representation is able to describe every form of vagueness. Therefore, a researcher must choose the one which gives his problem the best coverage. The reader can refer to either Klir [186] or Smithson [277] if he wishes to pursue the issue further.

5.3 Notion Of Sets

Fuzzy Sets are an extension of Crisp Sets. For this reason, this section introduces the notion of crisp sets and then extends this notion to that of fuzzy sets.

5.3.1 Notion Of Crisp Sets

Intuitively a set is a collection of objects (elements of the set). Let X denote a classical (or crisp) set of all objects of concern in each particular context or application, and let x denote a generic element in X. By definition, X is the Universal Set. Subsets are formed on X. Given a set A which contains the dates that a treatment plant does not meet its discharge permit, we can delineate the set by one of two methods:

1. List Method: name all of the set's members.
   A = { a_1, a_2, ..., a_n }
   DaysViolated = { Jan 2/90, Feb 5/90, ..., Mar 2/91 }

2. Rule Method: specify some property the set's elements must satisfy.
   A = { a | a has the specified properties }
   DaysViolated = { Date | BOD5 > 30 mg/l or TS > 25 mg/l }

Throughout the balance of our discussion, we will refer to two important universal sets:

1. R: The set of real numbers.
2. N: The set of positive integers.

We denote an n-dimensional Euclidean vector space of real numbers as R^n. A convex set is a set where if a line is drawn between any two members of the set, all the elements that fall on the line are also members of the set. A set whose members are sets is referred to as an indexed set or a family of sets: B = { A_i | i ∈ I }. For example, if we define B to be the days treatment plants in British Columbia did not meet their permit, then I would be the number of plants in the province and A_i would be the days treatment plant i violated its permit.

The Characteristic or Discriminant Function delineates which elements x ∈ X belong to A:

    for all x ∈ X:  μ_A(x) = 1 if x ∈ A, and μ_A(x) = 0 if x ∉ A

The characteristic function maps elements of X into a set that consists of 0 (Not A Member) or 1 (Member): μ_A : X → {0, 1}. The cardinality of A, |A|, is the number of elements of X that belong to A. The Power Set of A, P(A), is an indexed set of all possible subsets of A. The cardinality of P(A) = 2^|A|. Table 5.1 lists various properties of crisp set operations [186, pp. 9].

Table 5.1: Properties Of Crisp Set Operations (Ā denotes the complement of A)

Involution:                the complement of Ā is A
Commutativity:             A ∪ B = B ∪ A ; A ∩ B = B ∩ A
Associativity:             (A ∪ B) ∪ C = A ∪ (B ∪ C) ; (A ∩ B) ∩ C = A ∩ (B ∩ C)
Distributivity:            A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) ; A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
Idempotence:               A ∪ A = A ; A ∩ A = A
Absorption:                A ∪ (A ∩ B) = A ; A ∩ (A ∪ B) = A
Absorption of Complement:  A ∪ (Ā ∩ B) = A ∪ B ; A ∩ (Ā ∪ B) = A ∩ B
Absorption by X and ∅:     A ∪ X = X ; A ∩ ∅ = ∅
Identity:                  A ∪ ∅ = A ; A ∩ X = A
Law Of Contradiction:      A ∩ Ā = ∅
Law Of Excluded Middle:    A ∪ Ā = X
DeMorgan's Laws:           the complement of (A ∩ B) = Ā ∪ B̄ ; the complement of (A ∪ B) = Ā ∩ B̄

5.3.2 Notion Of Fuzzy Sets

Classical set theory is governed by a logic that permits a proposition to possess one of only two values: true or false. In the real world, the distinction between what is true and what is false is blurred. Truth is relative in the sense that something can be more true than something else or in some circumstances, something is true and in others, it is false. For this reason, a system that forces common sense knowledge into being always true or always false may not be reliable. The solution is to allow the boundaries of a concept such as truth or set membership to be graded. The consequence of this is that the excluded middle law is no longer true, i.e. A ∪ Ā ≠ X.
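The difference between the two kinds of characteristic function is easy to see in code. The sketch below contrasts a crisp membership test with the graded membership of the linguistic value "large" floc from equation 5.12; the 400 to 500 µm ramp follows that equation, but the functions themselves are illustrative and are not part of the program described in Chapter 6.

```c
/* Sketch: a crisp characteristic function versus a fuzzy membership
 * function for the concept "large" floc (equation 5.12). */
#include <stdio.h>

/* crisp: an element is either in the set or not */
static double crisp_large(double x_um)
{
    return (x_um >= 500.0) ? 1.0 : 0.0;
}

/* fuzzy: membership rises linearly between 400 and 500 um */
static double fuzzy_large(double x_um)
{
    if (x_um < 400.0) return 0.0;
    if (x_um >= 500.0) return 1.0;
    return (x_um - 400.0) / 100.0;
}

int main(void)
{
    double x;
    for (x = 350.0; x <= 550.0; x += 50.0)
        printf("%5.0f um   crisp = %.1f   fuzzy = %.2f\n",
               x, crisp_large(x), fuzzy_large(x));
    return 0;
}
```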
A crisp set's characteristic function maps an element either into the set or out of the set: μ_A : X → {0, 1}. A fuzzy set's characteristic (membership) function maps an element into a set by degrees depending on how compatible the element is with the underlying concept of the set: μ_A : X → [0, 1]. Consequently, the utility of a fuzzy set depends on the appropriateness of its membership function.

Membership describes the possibility that an object is a member of a given set. Although both possibility and probability range between 0 and 1, they are very different concepts and should not be confused with one another. For example, Figure 5.1a shows a data set of 15 objects in a two-feature plane [238, pp. 106]. The goal is to cluster the data into two groups. When using a crisp objective function based on Euclidean distance, two clusters are obtained (see Figure 5.1b) where a membership value of 1 is assigned to the left cluster and 0 to the right cluster. Despite the symmetrical nature of the data, the two clusters are not symmetric. The reason for this is that the middle data point lies between the two data patterns. This point should be described as belonging equally to both. If we repeat the analysis using fuzzy clustering, then the middle element is given a membership value of 0.5. Figures 5.2c and 5.2d map the membership of the elements to the right and left clusters respectively.

Figure 5.1: Butterfly Cluster ((a) the 15-point "butterfly" data set; (b) the two clusters produced by crisp clustering)

Figure 5.2: Butterfly Cluster: Fuzzy Clustering ((c) and (d) the membership values, ranging from 0.01 to 0.99 with the middle point at 0.50, assigned to each element for the two clusters)

The choice of the interval [0, 1] is arbitrary as any interval (e.g. [0, L]) could be used. What is not arbitrary is that when elements are mapped into an interval, they are ordered by membership. The interval [0, 1] is the natural choice because in binary logic 1 represents TRUE or ON while 0 represents FALSE or OFF.

Fuzzy Sets should not be confused with Fuzzy Measures. A fuzzy set assigns a degree of membership to each element in the universal set to indicate how compatible an element is with the concept of the set. A fuzzy measure assigns a value to each crisp subset of a universal set signifying the degree of evidence or belief that a particular element belongs in a particular subset.

For example, jury members at a criminal trial are uncertain about the guilt or innocence of the defendant [186, pp. 107]. The boundary between these two sets is clear in that given perfect evidence, the jury would have no problem making a decision.
Unfortunately, the evidence is rarely perfect, therefore the jury must express its belief that the defendant is innocent or guilty. Belief is a fuzzy measure. In contrast, assume that in the course of the trial, a witness states that he saw a tall man run from the crime. After the witness is finished, the prosecutor informs the jury that the defendant is 5' 10" tall. In this case, the problem is not with the evidence but with determining what is a tall man.

Notation

The usual way to denote an element in a fuzzy set is μ/x where μ is the membership and x is the element. For example, 0.8/50 mg/l may denote a possibility of 0.8 that 50 mg/l is considered a high BOD5 value. A less common denotation is to write an element as a pair: (x, μ).

The two most common methods to denote a fuzzy set A are as a union or as a set of ordered pairs:

• As a union:
  A = Σ_i μ_A(x_i)/x_i, where Σ denotes union over a discrete universe,
  or
  A = ∫ μ_A(x)/x, where ∫ denotes union over a continuous universe.

• As ordered pairs:
  A = { (x_1, μ_A(x_1)), (x_2, μ_A(x_2)), ..., (x_n, μ_A(x_n)) }

For example, let X = [0, ∞] be a measure of distance. Let A be the fuzzy set Long, where x_a is the distance beyond which the membership is different from zero. We can denote this set in one of two ways:

• Method 1:
  μ_Long(x) = 0 if x ≤ x_a
            = 1 - e^(-c(x - x_a)) if x > x_a

• Method 2:
  A = ∫_{x ≤ x_a} 0/x + ∫_{x > x_a} [1 - e^(-c(x - x_a))]/x, where + denotes union.

Basic Definitions

The support of A is the crisp set of all the elements of A with positive membership: supp(A) = { x | μ_A(x) > 0 }. The height of A is the highest degree of membership of any element in A: hgt(A) = sup_{x ∈ X} μ_A(x), where sup means supremum. A crossover point is an element in A whose membership is 0.5.

The cardinality of a crisp set is the number of objects in the set. The cardinality of a fuzzy set lies between the number of items with a membership greater than zero and the number of members with a membership of one: |A|_1.0 ≤ |A| ≤ |A|_0.0. The scalar cardinality of a fuzzy set A is given by |A| = Σ_{x ∈ X} μ_A(x), where Σ means sum. 2 The fuzzy cardinality of A is a fuzzy set or a fuzzy number where μ_|A|(|A_α|) = α. The definition of a fuzzy empty set is similar to that of a crisp empty set: A is empty (A = ∅) if for all x ∈ X, μ_A(x) = 0.

2 Normally, Σ means union when discussing fuzzy sets.

Two fuzzy sets are equivalent if they contain the same items with the same degree of membership: A equals B (A = B) if for all x ∈ X, μ_A(x) = μ_B(x). The definition of nonequivalence follows from the definition of equivalence: A does not equal B (A ≠ B) if there exists an x ∈ X for which μ_A(x) ≠ μ_B(x).

One set contains another set if, for every item, the membership in the contained set is no greater than the membership in the containing set: A contains B (A ⊇ B) if for all x ∈ X, μ_A(x) ≥ μ_B(x). B is a proper subset of A if (B ⊂ A) and supp(A) ≠ supp(B).

An α-set, A_α, is a crisp set of the elements of the fuzzy set A whose membership function meets or exceeds a threshold value α: A_α = { x ∈ X | μ_A(x) ≥ α }. A fuzzy set may be represented by a set of α-sets where α is varied from 0 to 1. It is much easier to program a computer to operate on this representation of a fuzzy set than on the more conventional representation described in the previous section [99].
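A brief sketch shows why the α-set representation is convenient to program: each α level reduces the fuzzy set to an ordinary crisp interval. The example below stores the "large" floc set of equation 5.12 as five such intervals; the number of levels and the 1000 µm cap on the universe are arbitrary choices made for illustration.

```c
/* Sketch: the alpha-set representation of the fuzzy set "large" floc
 * (equation 5.12), stored as one crisp interval per alpha level. */
#include <stdio.h>

struct alpha_set {
    double alpha;   /* threshold value                           */
    double lower;   /* smallest element with membership >= alpha */
    double upper;   /* largest element kept in the universe      */
};

int main(void)
{
    struct alpha_set cut[5];
    int i;

    for (i = 0; i < 5; i++) {
        double alpha = (i + 1) / 5.0;          /* 0.2, 0.4, 0.6, 0.8, 1.0 */
        cut[i].alpha = alpha;
        cut[i].lower = 400.0 + 100.0 * alpha;  /* invert the linear ramp  */
        cut[i].upper = 1000.0;                 /* arbitrary finite cap    */
        printf("alpha = %.1f   A_alpha = [%.0f, %.0f] um\n",
               cut[i].alpha, cut[i].lower, cut[i].upper);
    }
    return 0;
}
```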
5.4 Notion Of Variables

A variable's type is determined by what the variable takes for values (Figure 5.3). A crisp variable's values are crisp sets, i.e. DaysViolated. A crisp number and an interval are restrictions on R. A crisp number's set has one member, i.e. Volume is 100 m3. An interval's set contains all the values on R between two limits, i.e. Normal Workday is between 7.5 and 8.5 hours long. A probabilistic variable's values are given by a reference distribution. If the reference distribution is Gaussian, the set is defined by a mean and a standard deviation. A fuzzy variable's value is a fuzzy set, i.e. Large. A fuzzy number's sets are convex normal restrictions on R.

A linguistic variable's values are words. A word is meaningless unless it is defined as part of a language. In fuzzy set theory, a word is equated with a fuzzy variable. For example, Floc Size may take on the values { Large, Mid-Sized, Small } (Table D.4, Appendix D). Each value is associated with a fuzzy set, i.e. Large is defined by equation 5.12.

The measurement paradigm maps all values to a crisp number, a mean/standard deviation or a fuzzy set. Each of these representations may be mapped to a fuzzy set. In this way, all variables may be analyzed at the same time.

Figure 5.3: Representations Of Uncertainty (sketches, each on a 0 to 1 scale, of a crisp value, an interval, a possibility distribution and a mean/standard deviation likelihood)

5.5 Notion Of Fuzzy Controllers

In the early seventies, when expert systems were being developed in the United States, a parallel development took place in the United Kingdom: the development of rule-based controllers. Although not originally conceived as expert systems, rule-based controllers possess all the same characteristics of an expert system. The first controllers used fuzzy set theory to represent the rules, and rule-based control has been associated with fuzzy sets ever since. Control is based on deviations from a set-point, much in the same way as PID control. Efstathiou [100] describes Image Automation's fuzzy controller for a cement kiln. As of 1988, over a third of the United Kingdom's cement making capacity operates using this controller.

A self-organizing rule-based controller is one that corrects its rules on the basis of the controller's performance. The controller is the rule-based counterpart to an adaptive controller.

5.6 Conclusions

Fuzzy set theory and fuzzy logic enable the measurement paradigm to eliminate data apartheid and to represent vague data. Fuzzy set theory may also provide a method to reason about the data analysis and assist the operator in making control decisions.

Chapter 6

Structure Paradigm

As simple as possible and only as complicated as necessary. 1

The purpose of this chapter is to explain the Structure Paradigm. The Structure Paradigm places various forms of process knowledge into a single data structure, providing a datum with a structural context. The importance of the structural context in managing the information gathering and treatment processes was discussed in Chapter 2 2 and Chapter 3. 3 The primary purpose of the structural paradigm is to enable a computer program to create an image of these processes within its memory so that it can establish process-derived relationships among the data.

Process knowledge falls into one of four classes:

• Structural: Layout of the process

• Measurement: Measurements taken on the process (1) to monitor, control or diagnose the process or (2) to validate the measurement process.
• Derivation: Statistics and other numbers derived from the data. • Reasoning: Rules and equations provided by the operator that enable the com puter to reason about the process. Structure is the language by which information in the other classes is expressed. ‘Engineering Maxim PLF’s and Computer-Based Solutions 2 Cause and Effect 3 179  Chapter 6. Structure Paradigm  180  The chapter consists of four parts: • Introduction to Data Base Management System (DBMS) Models: The difference between a relational and network databases is explained using a text book’s subject index as an example. • Layout of a Simple Plant: The concept of hierarchical network is introduced using a simple activated sludge plant. The plant is simple because it has only one primary clarifier, one secondary clarifier and one aeration tank. This makes the plant’s schematic easy to draw. This example is referred to throughout the balance of the chapter. • Definition of the Structure Paradigm: The basic elements of the structure paradigm are defined. These include nodes, links, classes, planes, streams and currency. • Plane Types: An introduction to the structural representation of different types of process information. • An Example of How the Structure Database May Be Built: Chapter 1 mentioned that a number of test programs were written to ensure that the computer can do a task. The example in this section is one of these test programs  .  This example  is written in ANSI C [211] and Assembler [212] under MS-DOS. The database was developed using db_FILE. dh_FILE  ©  [250] is a Database Management System  (DBMS) for use by C language application developers.  The example consists of over 20,000 lines of code not including file I/O and list management. Contact 4 the author about the availability of the source code.  Chapter 6. Structure Paradigm  6.1  181  Database Management Systems  Because most database packages that run on personal compnters are relational databases, most people are unfamiliar with the concept of a network database.  The difference  between the two models is discussed in this section because this research uses a network database. For a more in depth presentation of this topic, one can refer to C. J. Date’s book “An Introduction To Database Systems” [92]. Database Management Systems (DBMS) are based on variations of either the rela tional or network model. The relational model views a database as a set of two dimen sional tables where the columns correspond to fields and the rows form relations between these fields. The advantages of this model are threefold: 1. A relational database is easy to query. 2. New relations may be formed from existing relations. 3. The database structure is simple. A relational database is used when the database structure either is not known ahead of time or may change. The disadvantage of the relational model is that it is difficult to store one-to-many type relations without storing a large body of redundant data. A one-to-many relation consists of a single owner (i.e parent) and any number of members (i.e. children). dBASE IV  ©  and Paradox  ©  both use the relational model.  Chapter 6. Structure Paradigm  182  Table 6.1: Portion Of An Index  Sludge conditioning (see Conditioning of sludge) conversion processes, 520-522 selection of processes, 70, 71 vacuum filters (see Vacuum filtration)  The network model extends the relational model by allowing the user to store rela tionships among records with the records themselves. 
The advantage of a network DBMS over a relational one is twofold: 1. Network databases use storage space more efficiently than relational databases. 2. Network databases allow the storage of relationships among the data with the data itself. A network database is a complex structure. Therefore, once built, its structure is difficult to modify. To illustrate these differences more clearly, consider the following example. An author wishes to construct a subject index for his book. A portion of a typical index is shown in Table 6.1. Using this table, we can make four generalizations about subject indexes: • A subject cannot be more than 40 characters in length (arbitrary). • A subject may have any number of secondary subjects,. i.e. Sludge has at least four.  Chapter 6. Structure Paradigm  183  • A subject may have any nnmber of page references associated with it, i.e. selection of processes has two. • A subject may have any number of synonyms. An author uses a synonym to refer the reader to another part of the index, i.e. see A network DBMS implements this using two records: one to contain a subject and a second to contain a page number. The DBMS constructs two sets.  The first set  links a subject to its synonyms (synonyms) and the second links a subject with a page number (paga.refs)  .  Figure 6.1 contains the database schema. If the user requires the  page numbers associated with a subject, the user simply makes the subject the owner of the pagesef set and retrieves all the set’s members. The reader should refer to the db_FILE [250] manual if they wish to explore this example further. The user could accomplish the same task using a relational DBMS if they took one of two approaches: • Construct the record to accommodate the extreme case. This approach results in a large, mostly empty file. • Mimic a network DBMS by storing large amounts of redundant information and writing code to model the set relationships. This approach requires a large number of database files linked together by user supplied relations. A relational DBMS shifts onto the user (or the programmer) the task of managing re lationships among data. Although this burden is manageable in most small business environments, the burden becomes unacceptable when managing treatment plant infor mation. The book indexing example (Table 6.1) provides the reader with a point of reference when reviewing the example given at the end of this chapter. In this  chapter, slanted type is used to indicate a db_FILE set.  Chapter 6. Structure Paradigm  184  Subject  Jr  Primary Name Secondary Name  page_refs  Reference Page Number  Figure 6.1: Book Index Schema  synonyms  Chapter 6. Structure Paradigm  6.2  185  A Simple Plant  An operator and a computer view networks differently. An operator views a network as a single entity (e.g. story) while a computer views a network one node at a time (e.g word). A computer can deal with a network of almost any size because it deals with each node separately. An operator views a network as an entity in terms of what the links and nodes describe. For this reason, a computer program that processes a network must strike a compromise between these two viewpoints. A network is a mnch more complex entity to think abont than a node. Therefore, most operators reason best when the network consists of no more than eight nodes. The software must both provide a node level description to the computer while providing a network level description to the operator. 
The best way to meet both the needs of the operator and the computer is to represent the system as a hierarchical network. This is one example of the “organic-digital” compromise discussed in Section 1.2.6. A hierarchical network is a tree (i.e. hierarchy) of networks. A tree is a one-to-many relation with one node owning any number of nodes and links, i.e. a network. In this application, the parent node is a more “abstract” concept than its children, i.e. the level in the hierarchy is based on the level of abstraction. Like a conventional network, a hierarchical network describes a node by what immediately precedes and follows it. However, a third dimension (i.e. level of abstraction) is added which elaborates on a node using yet another network. The example discussed in this section illustrates this compromise. Figure 6.2 describes a simple activated sludge plant.  The level of abstraction of the plant’s components  vary. For example, “Split Underfiow” is a more concrete concept than “Aeration Basin”.  Chapter 6. Structure Paradigm  186  Therefore, the first task is to group the nodes iuto sets with the same level of abstraction, i.e an outline  6  Figure 6.3 is an outline of the plant. The first level, LP.?  ,  consists of seven nodes.  Two of these nodes, LP.3 and LP.4 own children. For example, LP.3 Primary Treatment owns [P.3.1  -  LP.3.4. The networks are formed using nodes at the same level of abstrac  tion. Figure 6.4 shows a “bird’s eye view” of the hierarchical network. A node that is designated as being a source or a sink node in the outline marks the edge of the net work, i.e. the origin or destination of a link is outside the process. This is analogous to replacing a connection with a force when drawing a free body diagram. The first network an operator would see provides an overview of the plant (Figure 6.5). The terms “Currency” and “Coordinates” which appear in the figure are defined in Section 6.3.7. The node [P.3 Primary owns a network which describes what primary treatment entails (Figure 6.6).  This is a simple network because the plant has only  one primary clarifier. The term “Capacity” which appears in the figure is discussed in Section 6.4.2 and Chapter 7.  Similarly, the node [P.4 Secondary is described by the  network in Figure 6.7 and [P.4.3 is describe further by the network in Figure 6.8. This representation forms the basis of the operator’s interaction with the computer. An operator would begin his analysis of his plant first by taking an overview (Figure 6.5), and then would proceed to focus in on trouble areas by “opening up” a unit process, e.g. Figure 6.5  Figure 6.7 =i Figure 6.8. The computer provides a datum with a structural  context by associating it with the part of the plant the datum describes, e.g. MLSS with [P.4.3.2 Aeration Basin.  For example, the input for the example describe in this chapter was developed using Microsoft Word’s 5  © outlining feature. In order to keep the example simple, sludge processing has not been elaborated on. In this chapter only, names of nodes which appear in the text will he in sans serif type. T  Chapter 6. Structure Paradigm  Figure 6.2: Simple Plant  187  Chapter 6. Structure Paradigm  P. 
City XYZ Water Pollution Control Center
LP.1 Plant Influent [Source]
LP.2 Mix
LP.3 Primary Treatment
LP.3.1 Primary Influent [Source]
LP.3.2 Primary Clarifiers
LP.3.3 Primary Effluent [Sink]
LP.3.4 Primary Sludge [Sink]
LP.4 Secondary Treatment
LP.4.1 Secondary Influent [Source]
LP.4.2 Mix Return Sludge With Influent [Join]
LP.4.3 Aeration Basin Bioreactor
LP.4.3.1 Aeration Basin Influent [Source]
LP.4.3.2 Aeration Basin
LP.4.3.3 Aeration Basin Effluent [Sink]
LP.4.3.4 Aeration Basin Air Supply [Contains a Source: Energy]
LP.4.4 Secondary Clarifier
LP.4.5 Secondary Effluent [Sink]
LP.4.6 Wastage From Secondary Clarifier Underflow [Sink]
LP.4.7 Split Secondary Clarifier Underflow [Split]
LP.5 Plant Effluent [Sink]
LP.6 Sludge Processing
LP.7 Composting [Sink]

Figure 6.3: Outline Of Plant

Figure 6.4: Simple Plant: Bird's Eye View Of A Hierarchical Network

Figure 6.5: LP.: Treatment Plant Plane (nodes LP.1 Influent, LP.2 Mix, LP.3 Primary, LP.4 Secondary, LP.5 Effluent, LP.6 Sludge, LP.7 Compost; currency: Flow [L3/T]; coordinates [M/L3]: 1. Solids, 2. Substrate, 3. Ammonium-N, 4. Nitrate-N, 5. Toxic Compound)

Figure 6.6: LP.3: Primary Treatment Plane (nodes LP.3.1 Influent, LP.3.2 Clarifier, LP.3.3 Effluent, LP.3.4 Sludge; currency and coordinates as in Figure 6.5; clarifier capacity: 2 tanks, length 46.33 m, width 11.58 m, depth 3.10 m)

Figure 6.7: LP.4: Secondary Treatment Plane (nodes LP.4.1 Influent, LP.4.2 Mix, LP.4.3 Bioreactor, LP.4.4 Clarifier, LP.4.7 Split, LP.4.5 Effluent, LP.4.6 Wastage; currency and coordinates as in Figure 6.5)

Figure 6.8: LP.4.3: Bioreactor (liquid stream nodes LP.4.3.1 Influent, LP.4.3.2 Basin, LP.4.3.3 Effluent, plus the Air supply; liquid stream currency: Flow [L3/T] with the coordinates of Figure 6.5; air stream currency: Mass [M/T])

6.3 Overview Of The Structure Paradigm

The Structure Paradigm offers a structured and extendible way to process information obtained on a process whose structure is relatively stable, e.g. a treatment plant. The paradigm accomplishes this by creating a data structure that provides the computer with a description of the process. This data structure forms a skeleton onto which the computer can graft other classes (i.e. types) of process information (e.g. monitoring program, simulation model).

For example, assume the operator wants to tell the computer where a dissolved oxygen probe is located in his process. The computer draws Figure 6.8 on the screen and asks the operator to point to the probe's location. The operator points to node LP.4.3.2 Aeration Basin. The data structure enables the computer to associate a location on the screen with a node in the hierarchical network. Now assume the operator enters a rule into the computer that the aeration basin dissolved oxygen should not exceed 3 mg/l. The computer can use this rule because the computer can retrieve the aeration basin's location in the network from the data structure and then retrieve any dissolved oxygen data collected at this location.
The data structure consists of two components: (1) skeleton and (2) flesh. The skeleton is a hierarchical network modelled after the structure of the process. The flesh is formed from the remaining forms of process information, e.g. monitoring program, model, diagnostic rules.

The first step is to build the skeleton or structure. We define structure as the physical and influence layout of the plant. The physical layout consists of pipes, unit processes and sampling locations while the influence layout consists of items whose connection cannot be reduced to a physical entity, e.g. the weather. The computer views this structure as a sequence of records in the database that describe nodes, links and loops in the network. The computer gives each of these elements a unique code. All other forms of knowledge "talk about" the layout of the treatment plant using these codes.

The other forms of process knowledge are broken into three classes:

• Measurement: Information on the information gathering processes, e.g. monitoring program, QA/QC program.

• Derivation: Information that tells the computer how to derive information from the data and what this information might mean, e.g. Sludge Age.

• Reasoning: Information that uses information on the process structure, on the information gathering processes, on the information derived from the data and the monitoring data itself to reason about the process, e.g. operating rules and simulation models.

These classes of information are grafted onto the network in the order listed because each is expressed in terms of the classes preceding it (Figure 6.9). For example, the SRT is defined using elements from the monitoring program, e.g. MLSS, and the process structure, e.g. SRT describes an aspect of LP.4 Secondary Treatment (Figure 6.5).

As discussed in the previous section, the skeletal data structure is a cross between an outline and a network. An outline is a branching structure that groups elements according to their level of abstraction. An outline is an example of a one-to-many relationship, as each section owns one or more subsections. We refer to the section as the parent and the subsections as children. An outline enables a user to maintain conceptual control over a complex system. For example, if an operator added a second primary clarifier to Figure 6.2, the figure starts to become "untidy" because the effluent and sludge links cross each other. With each added degree of complexity, the figure becomes more and more incomprehensible to the operator. A similar data structure was used by Narayanan and Viswan [222] to uncouple failure detection rules from the structure of a system.

Figure 6.9: Overlap Of Classes Of Information (the Structural, Measurement, Derivation and Reasoning classes shown as overlapping regions, with labels such as capacity, diagnostic, explain, predict, learn and plan spanning the overlaps)

An alternative approach is to construct an outline of the process (Figure 6.3), nest siblings beneath their parent and display each family as a separate diagram. This is the approach taken in the previous section. The user navigates through the plant by travelling up and down the outline and through the network formed by a group of siblings. The rule of thumb is to limit a network (on a given plane) to no more than eight nodes (see Figures 6.5, 6.6, 6.7 and 6.8).
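A minimal sketch of what a skeleton node record might look like is given below. The outline code doubles as the unique code that the other classes of information use to refer to a location in the plant; the field names and the in-memory array are illustrative only, since the thesis implementation stores such records in db_FILE.

```c
/* Sketch: a skeleton node record.  The outline code carries the node's
 * position in the hierarchy; the parent index carries the one-to-many
 * family structure.  Illustrative only, not the db_FILE schema. */
#include <stdio.h>

struct node {
    char code[16];   /* unique outline code, e.g. "LP.4.3.2"     */
    char name[41];   /* label shown to the operator              */
    int  parent;     /* index of the owning (more abstract) node */
};

int main(void)
{
    struct node plant[] = {
        { "LP.",      "City XYZ Water Pollution Control Center", -1 },
        { "LP.4",     "Secondary Treatment",                       0 },
        { "LP.4.3",   "Aeration Basin Bioreactor",                 1 },
        { "LP.4.3.2", "Aeration Basin",                            2 },
    };
    int i;

    /* walk from a concrete node up through its more abstract parents */
    for (i = 3; i >= 0; i = plant[i].parent)
        printf("%-10s %s\n", plant[i].code, plant[i].name);
    return 0;
}
```

Walking the parent indices reproduces the outline of Figure 6.3, while the siblings of a common parent form the network drawn on a single plane.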
Perman [242] observed that “experts”  8  describe plants  and their components with varying levels of detail, much in the manner described here. The structure paradigm is based on the assumption that structure is the basis of knowledge.  A recent UBC study analyzed the water quality in 12 apartment build  ings [272].  The study took the form of a 2x2 Split-Plot Factorial Design which was  blocked by Run. The experimental layout (i.e. structure) determined what buildings were studied and how the data were analyzed (by associating each datum with the opti mal combination of age, building height and location effects). The analysis exploited the structure of the data in order to extract the maximum amount of information from the data set. An observational study of the same size would have yielded far less information than the study that was done. The reason for this is that the researcher selected the buildings that would give him the optimal combination of plumbing and age effects. In this way, he used the structure of the study to extract from his data the effect of plumbing type, building age and building height and their interactions. In other words, he used the four principles underlying the Statistical Design Of Experiments [79] to his advantage. The alternative would have been to select the buildings at random and attempted to classify data after it had been collected. Similarly, the structure of the treatment plant forces a framework on all process infor mation including monitoring data. The buildings had a “natural organization” that the The definition of an expert is subject to some debate in wastewater treatment (see Section 2.4). 5  Chapter 6. Structure Paradigm  198  researcher recognized and exploited in his study. Similarly, because all the information collected on a single treatment plant “talks about” the same system, the structure of the system provides a framework with which to analyze information. The structure paradigm quantifies this structure so that the data analysis modules can exploit this structure to extract information about the system. Tn other words, if the computer knows where in the process the data originate, the computer can use the structure to organize the data for the operator, making the task of data analysis much easier to do correctly. All the information collected on a system share one common denominator: the sys tem’s structure. At the moment, rule bases, models and monitoring data each have a unique and non-portable manner of expressing structure. This leads to duplication of information as well as an inability to access each other’s data. The alternative is to develop a language that all the classes can nse that expresses structural context. In this way, the program only needs one copy of the process’s structure. For example, assume a user wishes to diagnose a problem in the aeration basin. The user identifies which unit process he wishes to study, and the program then sets up the problem by constructing a cause-effect network description of the unit process and retrieves the relevant data sets. The next step is to retrieve from a rule-base the rules as sociated with the unit process. The inference engine is able to access the monitoring data, information on the monitoring program, a simulation model as well as its own rule-base using the system’s strncture. The efficacy of such an approach is recognized by expert system developers. 
One of the conclusions of Perman's work [242] with SLUDGECADET was that the best way to improve an expert system is to give it the ability to communicate with other plant databases, e.g. monitoring data and maintenance information. No known existing commercially available software package can implement the paradigm efficiently. For this reason, a new package will have to be developed, and this activity forms the main thrust of this thesis.

6.3.1 Nodes

The paradigm starts with the presumption that the universe of discourse [120] consists of nodes. The notion of a node is quite broad. A node may be concrete or abstract. A node is concrete if the object it represents exists as an entity in reality. Examples of concrete nodes include a tank (e.g. Figure 6.8: LP.4.3 Aeration Basin), a sample and a location in the plant. A node is abstract if it represents a concept or a grouping, e.g. the weather and the secondary treatment train (e.g. Figure 6.5: LP.4 Secondary Treatment). The paradigm uses five "built-in" node types:

• Source and Sink nodes form the edge of the system. By definition, a source node has no inputs and a sink node no outputs. For example, the sewer into the plant could be an input node (e.g. Figure 6.5: LP.1) and the receiving water an output node (e.g. Figure 6.5: LP.5).

• A Point node defines a location in the plant. A point node has one input and one output. The location of the effluent sampler could be a point node.

• Split and Join nodes describe the action of splitting and joining a stream. The paradigm assumes that when a stream is split, each of the split streams possesses the same characteristics as the input stream. In other words, a split node reapportions the currency (e.g. flow) but does not alter the stream's coordinate variables (e.g. BOD5, TSS concentrations). Splitting the secondary clarifier underflow into a sludge recycle stream and a sludge wastage stream is a split node (Figure 6.7: LP.4.7 Split). The paradigm assumes that when a number of streams are joined, their contents are completely mixed to form a single exit stream. The joining of the plant influent and the sludge processing supernatant is an example of a join node (Figure 6.5: LP.2 Mix).

The paradigm describes the relationships among nodes using a combination of ordered and unordered sets, and one-to-one and one-to-many relationships.

6.3.2 Links

A link is an ordered set consisting of three nodes (Figure 6.10):

• Parent: The parent node defines the highest level of abstraction that a link reaches.

• Source: The origin of the link.

• Sink: The destination of the link.

For example, Figure 6.7 shows that the clarifier's underflow moves to the sludge splitter box where it is split into two streams: wastage and recycle. The parent node is LP.4 Secondary Treatment, the source node is LP.4.4 Clarifier and the sink node is LP.4.7 Split. This link is a concrete link in that it describes a pipe that connects the clarifier to the sludge splitter box. A link may also be abstract, meaning it describes a causal connection that is not reducible to a physical entity, i.e. the influence the weather exerts on the operation of the plant.

A planar link is a link that connects two nodes on the same plane. The link between the secondary clarifier and the sludge splitter box is a planar link: { LP.4 : LP.4.4 => LP.4.7 }.
The link between the primary clarifier effluent and the secondary treatment influent is a cross planar link (Figure 6.4): { LP. : LP.3.3 => LP.4.1 }. (Footnote: Syntax: { Parent : Source => Sink }.)

A concrete link differs from an abstract link in that a concrete link passes material while an abstract link passes influence. For this reason, we talk about a concrete link's capacity and an abstract link's strength. If a link does not pass anything, it ceases to exist.

[Figure 6.10: Link — a planar link and a cross planar link within a hierarchical network]

Derived Links

A link may be referred to as being raw, derived or hierarchical. A raw link is a link supplied to the program. The other types of links are derived by the program from a raw link in order to accomplish a task, e.g. plot the plane on the screen. These links are archetypes of the implementation rather than the paradigm.

A derived link is a piece of a raw link. For example, if a raw link crosses from one plane to another, the program divides the link into its planar and hierarchical components much in the same way a physicist divides motion into its vertical and horizontal components. A hierarchical link is a link that travels within a family, i.e. parent to child. For example, the link { LP. : LP.3.3 => LP.4.1 } may be broken into the following components:

{ LP.3 : LP.3.3 => LP.3 }   Hierarchical
{ LP. : LP.3 => LP.4 }      Planar
{ LP.4 : LP.4 => LP.4.1 }   Hierarchical

6.3.3 Mapping

A map is an ordered set that charts information from one space into another. The paradigm uses maps in three situations:

• Information must be communicated from one class or stream to another. For example, a map associates a hierarchical network description of a particular clarifier model with LP.3.2 Primary Clarifier (Figure 6.6).

• Information must be communicated between two planes with different coordinate systems. For example, a primary clarifier model may partition Total Solids into { Inert Settleable Solids, Volatile Settleable Solids, Inert Nonsettleable Solids, Volatile Nonsettleable Solids }. A map associates the primary clarifier influent solids measure with the model's solids component.

• Influence is inherited from an ancestor (see Figure 6.11). For example, the user (of the program) indicates that the weather affects the plant, i.e. LP.. The weather data are collected at a nearby weather station and therefore are not associated with any particular part of the plant. The paradigm assumes then that all the planes beneath this node are influenced by the weather, e.g. LP.3 Primary Treatment and LP.4.3.2 Aeration Basin. An influence map propagates this information down the hierarchy.

The example showed that there are a number of situations when a node has only links coming in. If the link is concrete, this signals that the node is a sink, i.e. a part of the system where material passes out of the system into the world. If the link is abstract, the link indicates that the link's source influences the link's sink as well as all its children. An analysis of the example showed that these two situations must be treated differently. This is why the notion of influence mapping was added to the structure paradigm.

6.3.4 Class

A class groups nodes and links by the type of information they represent. There are four classes of process information (Figure 6.9):

1. Structural

2. Measurement

3. Derivation
4. Reasoning

Each class expresses knowledge in terms of the classes listed above it, e.g. derivation is expressed in terms of measurement and structure. The measurement class consists of four forms of measurement knowledge:

• Monitoring Program: At various locations in the plant, the staff take samples and conduct measurements on the process (on a regular basis).

• Diagnostic Program: At various times, the staff take additional samples and measurements to diagnose a problem.

• Quality Assurance/Quality Control: The staff test out the assumptions on which their monitoring program is based.

• Capacity: The capacity is the maximum amount of material a unit process can process. The algorithm cannot interpret the meaning of an allocation measure without knowing the unit process's capacity, e.g. a plant has three clarifiers but is only using two at the moment.

[Figure 6.11: Influence Map — an influence link attached to a parent node is inherited by the parent's children]

The purpose of the measurement class is to describe the structure of the information gathering processes. The purpose of the derivation class is to define numbers that are derived from the monitoring data:

• Summary Statistics: e.g. running average, Tukey's five number summary (see Section 2.2).

• Operation Statistics: e.g. SRT, F/M ratio.

• Mass Balances: e.g. net substrate consumption in the aeration basin.

• Yields: e.g. solids generated per unit of substrate consumed in the aeration basin.

• Relations: e.g. the relationship between effluent solids concentration and MLSS, influent flow rate and underflow flow rate (see Section 2.2.5).

Derivation knowledge may be described by a hierarchical network and a macro. A macro is a set of instructions that the program's internal language can interpret. (Footnote: The use of an internal language interpreter is discussed in Chapter 10. An interpreter enables the user to write macros that the program can execute.) The macro uses a plane as a data structure. The internal language would have functions that perform mass balances, calculate summary statistics and fit relationships. The form that this language should take is a topic for further investigation.

Over time, an operator may develop a set of operational rules that are based on both experience and theory. These rules may take the form of a simulation model or a rule-base. One way to implement these is to express them as functions that the internal language interpreter would call, or as a small rule-base that the interpreter would pass on to an inference engine. The functions and the rules would be expressed in terms of the other classes of knowledge. To date, no commercially available treatment plant software has achieved this level of integration.

The hierarchy of these classes of information highlights the fault of some of the research into treatment plant operations. A good operator does not need a model or an expert system to advise him as much as he needs a way to control the gathering of information on his process and a way to discern the effect of his interaction with his plant [190].

6.3.5 Structural Relationships: Path, Loop and Network

A path is an ordered set of consecutive links and nodes that traces the way (along the time axis) from one node to another. A path event, a path positioned in time, is a three-dimensional entity with the source occurring in time before the sink. This concept is important when trying to understand how a loop affects a system.
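The path idea, and the loop case taken up in the next paragraph, can be illustrated with a small walk over concrete links. The link table and function names below are invented; a node met twice on the current path closes a loop, which is essentially what the loop detection phase of the example in Section 6.6 does.

```python
# Sketch: enumerate paths through the mass network; a repeated node signals a loop.
links = {                              # source node -> list of sink nodes (concrete links)
    "LP.4.2": ["LP.4.3"],              # mix -> bioreactor
    "LP.4.3": ["LP.4.4"],              # bioreactor -> clarifier
    "LP.4.4": ["LP.4.5", "LP.4.7"],    # clarifier -> effluent, splitter
    "LP.4.7": ["LP.4.2", "LP.4.6"],    # splitter -> recycle (back to mix), wastage
}

def walk(node, path=()):
    """Yield (path, is_loop) for every path from `node`; stop a branch on a repeat."""
    if node in path:
        yield path + (node,), True     # the path closed on itself: a loop
        return
    path = path + (node,)
    sinks = links.get(node, [])
    if not sinks:
        yield path, False              # reached a sink node
    for nxt in sinks:
        yield from walk(nxt, path)

for p, is_loop in walk("LP.4.2"):
    print("loop" if is_loop else "path", " -> ".join(p))
```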
A loop is a path that begins and ends at the same node. For example, Figure 6.7 shows that returning a portion of the clarifier under flow to the aeration basin creates a loop. The paradigm constructs two types of networks from the links and nodes: (1) Mass Network and (2) Causal Network. The former includes only concrete links while the latter includes both. The example includes an algorithm for constructing loops given a set of structural links and nodes. Once a loop is constructed, the example program stores the loop and assigns it a unique code. The reason for this is that loop detection is a onerous task. Given that the structure of a plant is relatively static, it is more efficient to direct the program to store the loop rather to ask the program to reconstruct the loop over and over again.  Chapter 6. Structure Paradigm  6.3.6  208  Level and Plane  Levels order planes within a class by degree of abstraction. A plane is a set of nodes and their relationships that share a common parent and the same level of abstraction. The abstraction axis follows a tree-like strncture as it emanates downward until it culminates in the most concrete level of abstraction. For example, a possible tree in the struc tural class might proceed from plant (Figure 6.5) to secondary treatment (Figure 6.7) to aeration basin (Figure 6.8). Abstraction is one step in the problem solving process [319]. Reality is far too complex for us to comprehend so we model what is important and ignore the rest. Abstraction is task driven in that the utility of the abstraction is determined by our ability to use it to accomplish a predefined task. However, abstraction is not easy. Invariably, our first try at abstraction turns out be a vague generality. A vague generality is a simplified answer that appears to be good enough but is not. An abstraction is a refinement of a vague generality. An abstraction has five attributes [319]: 1. An abstraction abbreviates or simplifies the thing from which it is abstracted. An abstraction omits irrelevant details. 2. An abstraction characterizes a set of things. An abstraction groups similar items into one set. 3. An abstraction is precise. The description is exact enough that there is no doubt when the question is asked, “Is this thing an example of that abstraction?” 4. An abstraction is complete and accurate. details.  An abstraction contains all relevant  Chapter 6. Structure Paradigm  209  5. An abstraction may be hierarchical. Abstraction A may be stated in terms of abstraction B and C. Abstraction enables the paradigm to organize the process and its ancillary structures such that the nser can manage the process. An abstraction is the map that guides the user when he must find bugs in the system and when he must find hooks onto which changes can be hung. 6.3.7  Stream and Currency  A Stream is a group of nodes that share a common currency. The best way to illustrate this is to describe some of the streams that may be found in a treatment plant: • Liquid: The main stream in a wastewater treatment plant is the links and nodes that process the liquid waste. The currency is flow in [L /T]. 3 • Dry or Wet Chemical: This stream maps the use of chemicals such as methanol, pickle liquor, lime, or chlorine. The currency is mass [M/T] or flow {L /T]. 3 • Air: Maps the aeration system from the blowers to the aeration basins and/or grit chambers. The currency is flow in [M/T Air]. • Solids: Maps the solids handling in the treatment plant such as composting or drying beds. 
The currency is mass in [M/T].

• Energy: Maps the use of energy throughout the plant. The currency may be cost.

• Labour: Maps the amount of hours personnel spend maintaining a part of the plant. The currency would be man-hours.

• Reliability: Maps the amount of time a piece of equipment is available. The currency would be time.

A stream carries a coordinate variable, e.g. water carries nitrates and solids. A coordinate parameter must be a fundamental quantity [284]. For example, Figure 6.8 contains two streams: liquid (L) and air (A). The currency for the former is volume flow [L³/T water] and for the latter is mass flow [M/T air].

6.4 Structure Of Planes

The paradigm represents all classes of knowledge using planes and macros. A plane consists of a group of nodes and their relationships (i.e. sets) while a macro is a set of instructions that the internal interpreter would execute (i.e. functions). The purpose of this section is to describe the layout of these planes, leaving the procedural elements for future study.

6.4.1 Structural Plane

A structure plane contains a network of nodes connected by links (see Figure 6.7). All but the root plane have a parent. The root plane consists of a single node that delineates the start of the hierarchical tree. This node contains the identity of the system being modelled, e.g. City XYZ Water Pollution Control Center. Node LP.4 Secondary (Figure 6.5) is the parent of the nodes on the secondary treatment plane (Figure 6.7). The parent to child set (ParToChd) defines this relationship. (Footnote: The database schema is in Appendix E.)

A plane consists of a network. The network is defined by node records (NodeInfo) and link records (LinkInfo). The relationship between nodes and links is defined by two sets: a node-to-link set and a node-from-link set.

6.4.2 Monitoring, Diagnostic and Capacity

A plane that describes the monitoring program is called a Parameter-Measurement-Sample (PMS) plane. The plane consists of three types of nodes (Figure 6.12): (Footnote: Figures 6.12, 6.13 and 6.14 use a box to describe a node. The box contains three entries. The top entry is the node's identity or name, the middle entry is a typical value and the bottom entry is the node type.)

• Parameter: Attribute under study, i.e. Substrate.

• Sample: Samples taken to measure the attribute, i.e. Influent composite.

• Measure: Measure conducted on the sample to estimate the attribute, i.e. COD or BOD tests.

[Figure 6.12: PMS Plane Owned By Structural Node LP.1 Influent — a Substrate parameter node joined through the Parameter:Measure set to a BOD5 measure node (typical value 250 mg/l), which is joined through the Sample:Measure set to a sample node]

The relationships among the nodes are described by two sets, Parameter:Measure (ParToMeas) and Sample:Measure (SmpToMeas). The first set connects a parameter to its measures, i.e. Substrate owns COD and BOD5. This set allows the program to reason about a parameter. For example, the user may provide a rule that requires a BOD5 measure. If there is no BOD5 measure available, the program can substitute the COD measure by establishing a relationship between COD and BOD5. The second set connects a sample to the measures conducted on it, i.e. the Influent Composite Sample owns COD, BOD5 and TS. This set enables the program to monitor the sampling program. For example, if the values of all the measures conducted on one sample change, the program can warn the user that the change could be due to a sampling problem (and therefore minimize false alarms).
The Diagnostic and Capacity planes are similar to the PMS plane. The only difference between a Diagnostic Plane and a PMS plane is their occurrence. The program assumes that a diagnostic event occurs when the need arises while a monitoring event occurs on a regular basis. Normally, the program polls only the monitoring data unless instructed otherwise by the user (or by a rule). A Capacity Plane does not contain any sample nodes. Instead, a one-to-one set links the capacity measure with its allocation measure (see Figure 6.13).

[Figure 6.13: LP.4.3: Bioreactor — the Aeration Basin Volume parameter appears on both the Parameter-Measurement-Sample plane (Allocate measure, e.g. 75%) and the Capacity plane (Number: 4 tanks; Volume: 8649 m³), with the allocation measure and the capacity measure linked as the same node]

6.4.3 Quality Assurance Plane

A Quality Assurance Plane consists of three nodes (see Figure 6.14):

• QA/QC Process: Which quality assurance technique is being used.

• Observed Value: The measured result.

• Compare Value: The value to which it should be compared.

An operator conducts QA/QC tests on the following:

• Measurement Process: Blank, Spike or Standard

• Measure: Spike or Duplicate

• Sample: Spike or Duplicate

[Figure 6.14: LP.1 Influent: QA/QC Plane For Composite Sample and COD Measure — nodes of the PMS plane connect to their QA/QC planes]

For example, Figure 6.14 shows three planes. The PMS plane describes how the substrate parameter is measured as COD on a composite sample of the influent sewage. The figure shows that two QA/QC planes are attached to nodes in this PMS plane. The first plane is attached to the composite sample node. In this case, the operator used a second composite sampler (i.e. an itinerant sampler) to obtain a second composite sample of the influent and measured the COD on both samples. In practice, he would conduct the same analyses on the itinerant sample as he would on the regular sample to ensure that the regular sampler is not altering the characteristics of the sample [187].

The second plane is attached to the COD measure node. In this case, the operator takes a second laboratory sample and spikes this sample with potassium hydrogen phthalate. The observed value is the difference between the two laboratory samples, i.e. COD(spiked) − COD(not spiked).

In order to monitor the quality of the analytical methods used in the laboratory, a PMS plane is created and attached to the root node to model the laboratory. This plane contains a list of all the analytical methods used in the laboratory, e.g. COD and BOD5. The parameter nodes represent the parameters (e.g. substrate), the measure nodes represent the measurement methods (e.g. BOD5 and COD tests) and the sample nodes represent the origin of the laboratory sample (e.g. standard solutions, dilution water). These nodes are connected to QA/QC planes that are used to monitor the quality of the measurement methods, e.g. the Chemical Oxygen Demand test. The laboratory samples used by these planes are created in the laboratory, e.g. standard solutions. The purpose of this PMS-like plane is to warn the operator that a datum may be of poor quality due to a problem with the test. The paradigm accomplishes this by grouping all measures that use a particular measurement method (e.g. Effluent COD, Influent COD) into a set that is owned by the measurement method node in this PMS plane (e.g. COD test on a standard).
If the quality of the measurement process deteriorates, the program will note that this may be one of the reasons a parameter has changed (i.e. minimizing false alarms).

6.4.4 Derived Planes

A derived plane consists of two types of nodes. The first type is a node that identifies what is being derived (e.g. F/M ratio) while the second type indicates which measures the derived datum is derived from. The derived datum may be derived from other derived data (e.g. COD load and mass of volatile solids under aeration) or raw data (e.g. Influent COD, Influent flow, volume under aeration and MLVSS). A derived plane does not create a new node for a raw datum. Instead, it uses the datum's PMS measure node. This is why derived information is grafted onto the structural skeleton after measurement information.

A set of instructions that describes how to calculate the datum must be associated with the datum. The procedure may either be an internal function or a macro. For example, an internal function would be a routine that calculates mass balances while a macro would be an operator-supplied procedure written in the program's internal language. An internal function is analogous to a spreadsheet function (e.g. @avg) while a macro is analogous to a macro written using the spreadsheet's macro language.

6.4.5 Reasoning Planes

The best way to describe a reasoning plane is to use an example. Assume the operator wants to store the following rule as a plane [172]: If the effluent NH3 is high and the effluent TSS is low, then increase the air flow rate to the aeration basin. The plane would consist of four types of nodes:

• Identity: A unique name for the rule or equation. The computer uses this node to refer to the rule. This node is the parent (i.e. owner) of the input, parameter and output sets.

• Input: A set of both measurement and derived nodes that are used by the rule, e.g. the effluent NH3 and TSS concentrations. These references would link the rule to nodes in the measurement and derivation classes.

• Parameter: A set of nodes that identify operator-provided definitions, i.e. another rule, equation or constant. For example, the definitions of "high" NH3 and "low" TSS concentrations. (Footnote: The term "parameter" used here is not to be confused with a parameter in a PMS plane. Here, the term is used to describe numbers or sets of numbers supplied by the operator.)

• Output: A set of nodes that the rule or equation estimates or comments on, e.g. the air flow rate to the aeration basin.

The rule or equation should be expressed in terms of the program's internal language.
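As a hedged illustration only — the field names, node codes and membership breakpoints below are invented — the rule above might be stored as a plane that cites measure nodes for its inputs, carries the operator's definitions of "high" and "low" as piece-wise membership functions, and names the output it comments on.

```python
# Sketch of the NH3/TSS rule stored as a reasoning plane (all names hypothetical).
rule = {
    "identity":   "increase-air-when-nitrifying-poorly",
    "inputs":     ["LP.4.5:NH3", "LP.4.5:TSS"],     # references to measure nodes
    "parameters": {
        "NH3 high": [(1.0, 0.0), (3.0, 1.0)],       # (mg/l, membership) breakpoints
        "TSS low":  [(10.0, 1.0), (25.0, 0.0)],
    },
    "outputs":    ["LP.4.3: air flow rate"],
}

def membership(breakpoints, x):
    """Piece-wise linear membership with flat extrapolation beyond the ends."""
    pts = sorted(breakpoints)
    if x <= pts[0][0]:
        return pts[0][1]
    for (x0, m0), (x1, m1) in zip(pts, pts[1:]):
        if x <= x1:
            return m0 + (m1 - m0) * (x - x0) / (x1 - x0)
    return pts[-1][1]

def fire(rule, nh3, tss):
    """Degree to which the rule recommends more air (minimum of the two conditions)."""
    return min(membership(rule["parameters"]["NH3 high"], nh3),
               membership(rule["parameters"]["TSS low"], tss))

print(fire(rule, nh3=2.5, tss=12.0))    # 0.75 -> the rule argues for more air
```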
6.5 An Example: Construction Of The Structural Skeleton

The construction of a data structure, like a building, must proceed one step at a time. If two steps are mutually dependent (i.e. either one can only proceed if the other is completed), the data structure cannot be built. The more complex a data structure becomes, the more difficult it is to determine if it is feasible. The objective of the example discussed in this section is to determine if the database defined by the schema in Appendix E can be built from two operator supplied text files. The purpose of the example is threefold:

1. The example demonstrates that the computer can construct the database. This is an important consideration because it is possible to design a network database that cannot be built.

2. The example enables us to study the paradigm and correct some of its weaknesses, e.g. the difference between abstract and concrete links discussed in Section 6.3.3.

3. The example shows that once structure and graphics information is stored in a network database, retrieval is simple.

An operator can describe a hierarchical network by constructing two text files. The first text file contains an outline of his process much like Figure 6.3 while the second lists the links between the outline's elements as the triple { Parent : Source => Sink }. The format of these files is defined in Tables 6.2 and 6.4 respectively.

The database consists of two elements: records and sets. A set is a one-to-many relation between records. The database schema (Appendix E) divides the records and nodes into three spaces:

• Ideal Space: the space used by the program to reason about the structure, i.e. the hierarchical network. The space consists of three elements: nodes, links and loops. The relationship between nodes and links is defined by two sets: a set of incoming links and a set of outgoing links. The relationship among nodes (or links) and loops is a many-to-many relationship, i.e. a node may belong to more than one loop while a loop consists of one or more nodes. These relationships are modelled using sets by introducing a new record that owns one loop and one node (i.e. an intersection record).

• Object Space: the space used to map the network from the graphic space into the ideal space. An object space describes a plane by dividing the plane into node and channel objects (Figure 6.16). A channel object consists of a number of conduit objects. A conduit is a space through which one link may pass through a channel. A link is mapped to a set of conduits that defines its path from the link's source node to its sink node.

• Graphic Space: The graphic space consists of two sets of coordinates. The first set is used to draw and label a box to represent a node while the second set is used to draw a set of lines to represent a link (Figure 6.15).

For example, assume a user "clicks on" the location 1 shown in Figure 6.15. The program determines which object this location falls into (see Figure 6.16). The program determines that two links pass through this object and lists them on the screen. The user selects the link of interest and the program locates the link in the hierarchical network (i.e. ideal space).

A set of 15 programs consisting of over 20,000 lines of code was written to construct this database. This exercise is an example of bottom-up design (see Chapter 1).

Table 6.2: Phase 1: Text File Syntax
<Section Number> • <Name Of Node>   (required)
<Nick Name>, <Node Type>            (required)
<X-Coord> <Y-Coord>                 (optional)
<Description>                       (optional)
NOTE: • denotes one or more blanks

6.6 Algorithms

The example consists of 15 executable programs described as phases. Each program opens the database, passes through the database performing a function, and closes the database when finished.

6.6.1 Phase 1: Parse In Node Information From A Text File

Phase 1 extracts information on nodes from a text file and constructs node information and node description records. The node description records enable the user to attach an explanation to a node. Table 6.2 contains the source file's syntax. A future implementation should include a designation for a node's class and stream.

6.6.2 Phase 2: Construct Hierarchy

Phase 2 examines each node's section number, then searches for its children. The algorithm reads a node's section number and collects its children. The mode of connecting the child to the parent is dependent on the child's plane type, i.e. capacity, PMS or structural. For simplicity, the PMS planes are attached to their parent structural node
A future implementa tion should include a designation for a node’s class and stream. 6.6.2  Phase 2: Construct Hierarchy  Phase 2 examines each node’s section number, then searches for its children. The algo rithm reads a node’s section number and collects its children. The mode of connecting the child to the parent is dependent on the child’s plane type, i.e. capacity, PMS or structural. For simplicity, the PMS planes are attached to their parent structural node  Chapter 6. Structure Paradigm  224  Table 6.3: Phase 2: Section Numbers and Node Relationships  1.34.54.12 1.34.54 1.34 1.34.54.12.2 1.34.54.12.23.4 1.34.54.12.A2 1.34.54.23 1.3.45  Current Node Parent Ancestor Child Descendent Capacity Sibling No Relationship  via the t ParToC h d set. The final copy of the schema would create a new set for this to avoid blurring the distinctions between the two classes. Table 6.3 explains relationships among structural nodes based on their section number. 6.6.3  Phase 3: Parse In Link Information  Phase 3 parses a text source file to obtain information to construct link records. The parent, source and sink node section numbers describe a link. In this implementation, the source file includes PMS set linkages as well. 6.6.4  Phase 4: Construct Link Sets  Phase 4 constructs two sets, the node’s output links and input links, to model structural links. The program accomplishes this matching of the source and sink section numbers in the link record to those in the node records.  Chapter 6. Structure Paradigm  225  Table 6.4: Phase 3: Text File Syntax  <Parent Section Number> .[<Parent Node Name>] <Source Section Number> <Sink Section Number> <Link Type>,<Nick Name> <Description>  (required) (required) (required) (required) (optional) (optional)  NOTE: • denotes one or more blanks  6.6.5  Phase 5: Build Interplanar Links  An influence may be mapped from a parent to its descendants by either creating a number of links from the parent to the descendants or by creating a number of links from the source to the descendants. In this implementation, the former approach was taken. However, the example showed that the latter approach would have been more efficient and logically safe. Therefore, the next implementation should use the latter approach. 6.6.6  Phase 6: Build Sample and Measurement Sets  Phase 6 constructs sample and measurement sets using links that model PMS type re lationships. In retrospect, it would have been better to process the PMS information separately from the structure information. This would simplify the code considerably.  Chapter 6. Structure Paradigm  6.6.7  226  Phase 7: Loop Detection  Phase 7 detects and stores ioops. The algorithm travels from source nodes to sink nodes. Along the way, the algorithm pushes its path onto a stack. With every move, the al gorithm checks to see if a node appears on the stack twice. If so and if the loop is not already stored in the database, the program dumps the loop into the database. 6.6.8  Phases 8-11: Preparation For Plotting  The purpose of phases 8-11 is to construct a set of nodes and links that form the basis of a graphic presentation of the plane. Phase 8 decomposes a cross-planar link into its constituents within and across planar components. Phase 9 groups the derived and raw links contained in a plane and attaches them to the plane’s parent (PARowrisLI). Phase 10 generates a pseudo node to act as source and sink nodes for those derived links missing a source or sink node in the plane. 
Phase 11 determines the maximum number of links leaving or entering a node in a plane.

6.6.9 Phases 12-15: Object and Graphic Space

Phase 12 maps a link into a set of channel objects, Phase 12a generates the graphics coordinates of these objects and Phase 14 generates a sequence of points that describes a line from the output edge of a source node to the input edge of a sink node. Phase 13 generates the graphic coordinates for a node.

6.6.10 Validation: Draw A Plane

A simple routine was written to retrieve the graphics coordinates and draw a plane. This routine was used to ensure that the database resembles the plant it is supposed to describe.

6.7 Conclusion

The structure of information forms the vocabulary with which information is best expressed. The structure paradigm formalizes this vocabulary in terms of a hierarchical network. The skeleton of the network describes the structure of the process being studied. Onto this structure, measurement, derivation and reasoning classes of information are grafted in the order mentioned. This approach forms the basis of an integrated approach to the storage and manipulation of information collected on a process and on its information gathering processes.

Chapter 7

Measurement Paradigm

The order of evolution is from raw data to information, from information to knowledge, and from knowledge to wisdom. [9]

The purpose of this chapter is to explain the Measurement Paradigm. The Measurement Paradigm maps process measurements into a common space so that the data can be analyzed as a single unit. A datum may take on a variety of forms, many of which are incompatible. For example, a string and a number are incompatible data types because they must be processed differently. The paradigm avoids data apartheid by mapping all data to one of three compatible types: Crisp Number, Mean and Standard Deviation, and Fuzzy Set. Two data types are compatible if they can be stored in the same database and can be manipulated by the same program. For example, time and currency are Lotus 1-2-3 compatible because 1-2-3 stores them both as real numbers.

This chapter consists of 8 sections:

• Declaration Space: Defines the origin of a datum relative to a datum's PMS plane.

• Definition Space: Provides a datum with a preference, quality and (in some cases) value relation that maps a datum to a quad. This quad consists of an interval, value, preference and quality.

• Data Space: Describes how a quad is stored in a database, i.e. internal formats.

• Primary Mapping: Explains how the different forms of data are mapped to one of the three internal formats.

• Viewpoint: Introduces the concept of a viewpoint. A viewpoint is a data analysis template.

• Secondary Mapping: Describes how a viewpoint and a datum's declaration guide the construction of raw and derived time series.

• Tertiary Mapping: Describes how a viewpoint and a datum's declaration guide the grouping of raw and derived time series.

The objective of this paradigm is to help the operator gain control of his information gathering processes.

7.1 Declaration Space: Origin Of A Datum

A datum is the product of a measurement process. A measurement process is characterized by three elements: Parameter, Sample and Measure. In the previous chapter, it was explained that the structure of the measurement process is characterized by a PMS (or PMS-like) plane.
A datum is placed in the context of the information gathering process by associating the datum with the appropriate measure node in the PMS plane. Because the PMS plane is associated with a structure node, the datum is also placed in the context of the layout of the treatment plant. This process of attaching a datum to a measure node, which in turn is associated with a measurement process and a position in the plant, is referred to as "mapping into the declaration space". The objective of this section is to elaborate on the elements of the declaration space.

7.1.1 Parameter Context

A Parameter is a characteristic of the stream under study, i.e. Substrate Concentration. The primary purpose of a parameter is to group measures of the same stream attribute together, e.g. COD and BOD5 under Substrate Concentration. A parameter has the following attributes (Figure 7.1):

[Figure 7.1: Measure Paradigm: Parameter Context — structural context; control context (cause: manipulated (adjustment, allocation) or disturbance; effect: status or performance; QA/QC); model context (currency, coordinate, status); parameters not set by the operator are observable]

• Structural Context: The structural context is a unique code that identifies which node represents the parameter in the structure database.

• Control Context: A user measures a parameter to make a control decision:

— Adjustment or Allocation Parameters: By definition, an adjustable or allocated parameter is not measured, but set (e.g. recycle flow rate). For this reason, the paradigm assumes their measures are without error and their values remain the same until they are reset. An adjustable or allocated parameter is an operator determined input, i.e. a manipulated or operator-set parameter.

— Disturbance Parameters: By definition, the cause of a change in a disturbance parameter is outside the system described by the structure paradigm (e.g. influent BOD5 concentration). In other words, a disturbance parameter is an externally (versus internally) controlled input.

— Performance and Status Parameters: If the operator flags a parameter as being an important effect (i.e. Effluent BOD5), the paradigm refers to this as a performance parameter. Otherwise, the parameter is a status parameter. A status parameter is not necessarily a state parameter. (Footnote: Notion of state: some qualitative information (a set of numbers, a function, etc.) which is the least amount of data one has to know about the past behaviour of the system in order to predict its future behaviour. The dynamics is then described in terms of state transitions, i.e. one must specify how one state is transformed into another as time passes [179].)

— QA/QC Parameter: A QA/QC parameter is not an attribute of the structure but of the monitoring program (i.e. described by a QA/QC plane).

A parameter whose value is not set by the operator is referred to as being an observable parameter. The value of an observable parameter may only be determined through measurement.

• Model Context: A user measures a parameter to model the system (Figure 7.2):

— Currency Parameter: A currency parameter is the flow or volume in a stream, e.g. liquid, mass of air or dollars.

— Coordinate Parameter: A coordinate parameter is an attribute of the currency parameter, e.g. solids or substrate concentration. A coordinate measure must be a fundamental quantity.
— Status Parameter: A status parameter is none of the above, e.g. pH or redox potential. A status parameter is not necessarily a fundamental quantity or a state parameter.

By definition, a QA/QC parameter cannot be a model parameter.

One way to view this classification system is to relate it to a simple model. Assume that we have a chemostat with a pure culture. The influent flow rate is a manipulated parameter (assuming it is set by the operator), the influent substrate concentration is a disturbance (assuming it is not set by the operator), the reactor pH is a status parameter (assuming it is not how the performance of the process is measured) and the effluent solids concentration is a performance parameter (assuming the object of the study is to generate solids). Similarly, given that the liquid volume is the system's currency, then both the influent flow rate and the volume of liquid in the reactor would be currency parameters. Any parameter that satisfies a conservation principle is a coordinate parameter, e.g. substrate and solids concentration. The remaining parameters are (model) status parameters.

[Figure 7.2: Model Context: An Example — Equation 7.13, a nitrate mass balance for a bioreactor, in which the flow Q and volume V are currency parameters and the nitrate-N, nitrite-N and Nitrobacter concentrations are coordinate parameters]

7.1.2 Sample Context

A Sample is the portion of the stream on which a measure is conducted. A sample has the following attributes (Figure 7.3):

• Structural Context: The structural context is a unique code that identifies which node represents the sample in the structure database.

• Type Context: The type of sample refers to how the stream is divided up for study. Common sample types include grab, composite and probe. Sample types were discussed in Section 4.3.

• Time Context: A composite sample, and sometimes a probe sample, samples the stream over an interval rather than at an instance. The effect of this on the data analysis is discussed in Section 2.2.5.

[Figure 7.3: Measurement Paradigm: Sample Context — structural context; type context (grab, composite, probe); time context (instance, interval)]

7.1.3 Measurement Context

A Measure possesses four attributes (Figure 7.4):

• Structural Context: The structural context is a unique code that identifies which node represents the measure in the structure database.

• Scale Context: A measure may possess one of four scales: Nominal, Ordinal, Interval or Ratio. Measurement scales were discussed in Section 4.1.

• Resolution Context: A measure may be crisp or vague. A crisp measure may be accompanied by a measure of its precision. The notions of crisp and vague measures were discussed in Chapter 5.

• Definition Context: A measure possesses a definition, e.g. filtered COD.

[Figure 7.4: Measurement Paradigm: Measurement Context — structural context; scale context (nominal, ordinal, interval, ratio); resolution context (crisp, vague); definition context]
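To make the declaration space tangible, the sketch below records the parameter, sample and measure contexts for two hypothetical measures; every field name and code is invented. The control context is what lets the program treat an operator-set value as error-free.

```python
# Sketch of two declarations (all names and codes hypothetical).
influent_cod = {
    "measure":   {"code": "LP.1:COD", "scale": "ratio", "resolution": "crisp",
                  "definition": "unfiltered COD, dichromate reflux"},
    "sample":    {"code": "LP.1:24h-composite", "type": "composite", "time": "interval"},
    "parameter": {"code": "LP.1:substrate", "control": "disturbance", "model": "coordinate"},
}

recycle_rate = {
    "measure":   {"code": "LP.4.7:recycle-flow", "scale": "ratio", "resolution": "crisp",
                  "definition": "return sludge flow set on the pump"},
    "sample":    {"code": None, "type": "set by operator", "time": "instance"},
    "parameter": {"code": "LP.4.7:recycle", "control": "adjustment", "model": "currency"},
}

def treat_as_error_free(declaration):
    """Adjustment/allocation parameters are set, not measured (see Section 7.1.1)."""
    return declaration["parameter"]["control"] in ("adjustment", "allocation")

print(treat_as_error_free(influent_cod), treat_as_error_free(recycle_rate))   # False True
```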
7.2 Data Space: Derivation Of A Datum

All data are not of equal value. The value of a datum (to the operator) depends partly on the datum's preference and quality. Preference describes how an operator "feels" about a datum while quality describes how reliable the datum is. For this reason, the paradigm maps a datum (and its time) to the quad { Interval, Value, Preference, Quality }, i.e. the Primary Mapping (Figure 7.5). This quad forms the basis of the data space.

[Figure 7.5: Primary Mapping: Datum To Data Space — an input datum, together with its declaration and its definition, is mapped to the quad of interval, value, preference and quality]

In order to map a datum into the data space, both a declaration and a definition must be associated with the datum's measure. A definition consists of two, and in some cases three, relations. The first two relations are the preference and quality distributions. The third relation alters a datum's magnitude. This relation is optional. All three relations are somewhat arbitrary as the operator defines the relations to suit his needs (i.e. the relations are plant specific).

7.2.1 Quality

The measurement process and its execution affect quality. The effect of the process on a datum's quality is known (for the most part) before the fact. For example, consider the measurement specification for the determination of Chemical Oxygen Demand (COD) using the Dichromate Reflux Method [176]. We can derive a quality distribution for a COD measure by recognizing that the COD is dependent on two factors. The first factor is the difference between the amount of Ferrous Ammonium Sulphate (FAS) used to titrate the blank versus that used to titrate the sample. If the sample is weak, the difference is small. If the sample is strong, the difference is large. Standard Methods [176] suggests the difference should not be larger than 22.5 mL, which is half the sample volume, and not smaller than 1 mL. This means if the COD is over 900 mg/l, the sample must be diluted. (Footnote: If a 25 mL sample is used, the dilution breakpoint is 450 mg/l COD. A 25 mL sample is used when a 125 mL Erlenmeyer flask is used.) Because dilution introduces another source of error, the quality of the test deteriorates as the dilution required increases.

We can derive a quality function, θCOD(COD), using the limitations placed on the COD test by the FAS difference and dilution restrictions. The function is given in equation 7.14 and plotted in Figure 7.6:

1. COD < 25 mg/l: The COD test is unreliable so the quality is zero. This means the value is somewhere below 25 mg/l.

2. 25 mg/l < COD < 50 mg/l: Standard Methods suggests the analyst use a 0.025 N rather than a 0.25 N FAS solution [176]. Standard Methods warns that this second test is very sensitive to interferences. A value in this region should be viewed as being in the range of 25 to 50 mg/l. Because the two FAS normalities represent separate tests, the structure paradigm would create a measure node for each test and group the two measure nodes under a single parameter node.

3. 50 mg/l < COD < 900 mg/l: The COD test is reliable over this range. The value should be viewed as being crisp.

4. COD > 900 mg/l: When the COD exceeds 900 mg/l, Standard Methods suggests that the laboratory sample be diluted so that the sample's COD is within the range of the test. However, the reliability of the test result decreases as the dilution required increases.
This quality function, θCOD, is given below:

θCOD(COD) =
  0                    if COD < 25 mg/l
  1 − (50 − COD)/25    if 25 mg/l ≤ COD < 50 mg/l        (7.14)
  1                    if 50 mg/l ≤ COD < 900 mg/l

For COD values above 900 mg/l, the quality decreases as the required dilution increases (see Figure 7.6).

[Figure 7.6: Quality Distribution For COD Test — quality (0 to 1) versus sample COD in mg/l on a logarithmic scale from 1 to 100,000; Chemical Oxygen Demand test, FAS normality = 0.25 N]

The effect of the execution of the measurement process on a datum's quality should be monitored continuously through a QA/QC program. The paradigm encourages the operator to flag suspect data to avoid the change detection module raising a false alarm. Bad data pose a greater threat to the system than missing data because bad data lead to bad conclusions. The secret of dealing with bad data is simple: "don't have any". Or, at least, flag the data so that the program will ignore them. Gy's measurement model, discussed in Section 4.5, could help the operator design his QA/QC program.

7.2.2 Preference

A preference distribution tells the program "how desirable" a datum is. For this reason, the operator wants the preference function to be most sensitive in the range where the desirability of the datum changes. For example, a treatment plant's discharge permit states that the effluent COD must not exceed 50 mg/l. Given the error in the COD test, the operator would prefer that the measured effluent COD value be less than 40 mg/l. The closer the effluent COD is to the permit value, the higher the likelihood that a noncompliant COD will be observed. A typical preference distribution for the Effluent COD is given below (see Figure 7.7):

Preference(COD) =
  1                 if COD < 30 mg/l
  (80 − COD)/50     if 30 mg/l ≤ COD < 80 mg/l           (7.15)
  0                 if COD ≥ 80 mg/l

[Figure 7.7: Preference Distribution For COD Test — preference (0 to 1) versus effluent COD from 0 to 100 mg/l]

7.3 Data Space: Internal Representation Of A Quad

The data space consists of the quad { Interval, Value, Preference, Quality }. A quad is stored by the computer as a queue of real numbers. Depending on a datum's sample type, a datum may represent an interval or an instance. Because an instance can be described as an interval, the paradigm expresses time as an interval, i.e. two queue entries. By definition, preference and quality are expressed as crisp numbers between 0 and 1. Therefore, they require one queue entry each. A datum's value may be stored in one of three forms: crisp number, mean and standard deviation, or fuzzy set. Each of these requires a different amount of queue space:

1. Crisp Number: A crisp number is a single value. The number may represent one of two values: (1) a nominal measure, e.g. supernatant appearance (see Table D.2, Appendix D) or (2) a crisp ordinal, interval or ratio measure, e.g. 75 mg/l COD.

Preference (optional : default to 1)   0.10
Quality (optional : default to 1)      1.0
Value (required)                       75.0

2. Mean and Standard Deviation: A mean/standard deviation consists of two values. The numbers represent an interval and, by definition, apply only to interval and ratio measures. The statistics summarize a number of observations taken over an interval, usually by an on-line device. The interval should be short enough to ensure the standard deviation is a valid estimate of the mean's precision. For
The interval should be short enough to ensure the standard deviation is a valid estimate of the mean’s precision. For  Chapter 7. Measurement Paradigm  246  example, assume we wish to store the dissolved oxygen as measured by an on-line probe over the last hour. In thsi case, the mean is 2.0 mg/l and the standard deviation is 0.8 mg/l: Preference  (optional : default to 1)  1.0  Quality  (optional : default to 1)  1.0  Value  (required)  2.0  Standard Deviation  (required)  0.8  3. Fuzzy Number or Linguistic Variable: Fuzzy numbers and linguistic variables were discussed in Chapter 5. A fuzzy number may be stored in one of three ways: as a standard function, as an a-set or as a piece wise linear approximation. The first alternative limits the operator to a small set of standard functions while the second alternative requires too much space. For these reasons, the last alternative is used. The fuzzy set is represented by a set of membership and magnitude coordinates. If the fuzzy set’s basis set is noncategorical, the paradigm interpolates between coordinates by drawing a straight line. For example, the following queue stores “less than 25 mg/l COD”: Preference  (optional : default to 1)  1.0  Quality  (optional : default to 1)  0.0  Magnitude  (required)  0.0  Possibility  (required)  1.0  Magnitude  (required)  25  Possibility  (required)  1.0  Magnitude  (required)  35.0  Possibility  (required)  0.0  As many pairs as needed  Ghapter 7. Measurement Paradigm  247  Because a queue represents a quad (which represents a datum), the queue must be grouped under a unique record formed from the datum’s time interval and the datum’s measure code. 7.3.1  Manipulated Parameter: A Special Case  Because an operator sets rather than measures a manipulated parameter, manipulated parameters represent a special case. A manipulated datum differs from other data in two ways: 1. By definition, the quality is always 1, i.e. good. 2. A new datum is only entered when the parameter is changed by the operator. For these reasons, the quality measure is replaced by a flag that indicates why the operator changed the parameter (see Table 8.2).  Chapter 7. Measurement Paradigm  7.4  248  Primary Mapping: Mapping A Datum Into The Data Space  A datum may take on a variety of forms: • Single or Crisp Number (e.g. 300 mg/i BOD ) 5 • Range of Equally Possible Numbers (e.g. COD is less than 25 mg/i) • Fuzzy Number or Range of Numbers With Different Degrees of Possibility (e.g. The foam covers about half of the basin’s surface) • Arithmetic Average with a Precision (e.g. the average DO over the last hour was 2.0 mg/l with a standard deviation of 0.5 mg/i) • Category (e.g. the sludge is buiking) • Linguistic Variable that is not a Fuzzy Number (e.g. the condition of the clarifier is very poor) Each datum is important and should not be excluded from the database (and the data analysis) on the basis of their form (i.e. data apartheid). However, these data can not be analyzed in their current form because their types are incompatible (i.e. string versus number). For this reason, the paradigm maps these data into one of the three internal formats modifying their value on the basis of their quality (Figure 7.5). 7.4.1  Mapping: {Crisp, {Ratio, Interval, Ordinal  }}  Given a crisp datum measured on a non-categorical scale, the program calculates the datum’s preference and quality and stores all three in the database. 
The time may be an instance if the sample was a grab, a single probe reading or an observation; otherwise, the time is an interval. For example, the effluent COD is measured on a 24 hour composite sample. The program would record the datum as being indicative of the composite interval, i.e. from Monday 8:00 AM to Tuesday 8:00 AM. An operator may modify the storage of the Effluent COD by including the following rules:

• IF COD ≤ 25 mg/l THEN COD is a number less than 25 mg/l

• IF (COD > 25 mg/l) AND (COD ≤ 50 mg/l) THEN COD is a number between 25 and 50 mg/l

• IF COD > 50 mg/l THEN COD is the measured COD

Table 7.1 and Figure 7.8 show the results. The advantage of this approach is that the program will not detect a change in trend or stability in regions where the quality of the data is low. This will reduce the number of false alarms and simplify the causal analysis.

Table 7.1: Measure Paradigm: Effluent COD Example
Entered    Quality   Preference   Stored
20 mg/l    0.0       1.0          Fuzzy Number: Less than 25 mg/l
30 mg/l    0.2       0.84         Fuzzy Number: Between 25 and 50 mg/l
75 mg/l    1.0       0.10         Crisp: 75 mg/l

[Figure 7.8: Measure Paradigm: Effluent COD Example — the possibility distributions stored for entered values of 30 mg/l COD and 75 mg/l COD]

7.4.2 Mapping: {Crisp, Nominal}

A categorical or nominal datum is stored as a crisp value. If the user supplies the category as a string, the program retrieves the corresponding crisp value from the dictionary. The crisp value is stored with its quality and preference.

7.4.3 Mapping: {Mean/Standard Deviation, {Ratio or Interval}}

By definition, a Mean/Standard Deviation datum must have either a ratio or interval scale. The operator may modify how the datum is stored. If we assume the datum is normally distributed, we can calculate the preference and quality as a weighted average over the distribution. For example, assume the operator measures DO on-line. The monitoring system generates a reading every minute, averages the values over the hour and calculates their standard deviation. This reduces the number of data points from 60 to 2. The average and standard deviation are stored in the database as representing the hour. A moving average should not be used for this purpose. (Footnote: A Discrete Average summarizes the data while a Moving Average smooths the data. The distinction is important because most statistical methods, especially Analysis of Variance, assume the data are independent of each other.)

7.4.4 Mapping: {Fuzzy Number, {Ratio or Interval}}

By definition, a Fuzzy Number must have either a ratio or an interval scale. The program retrieves the number's definition from the dictionary and stores the definition. The definition is stored as a piece-wise function. We calculate the preference and quality as a weighted average over the possibility distribution. For example, the Effluent COD example used two trapezoidal fuzzy numbers (x, μ):

• COD is less than 25 mg/l: {(0, 1.0), (25, 1.0), (35, 0.0)}

• COD is between 25 and 50 mg/l: {(20, 0.0), (30, 1.0), (45, 1.0), (55, 0.0)}

These definitions are arbitrary (i.e. set by the operator).
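The "weighted average over the possibility distribution" can be sketched numerically. The snippet below is a sketch only — the function names and the simple midpoint integration are mine — but with the second trapezoid above and the preference distribution of equation 7.15 it reproduces the stored preference of about 0.84 listed in Table 7.1 for the 30 mg/l entry.

```python
# Sketch: weight a preference function by a trapezoidal possibility distribution.
def trapezoid(a, b, c, d):
    """Membership function of a trapezoidal fuzzy number with corners (a, b, c, d)."""
    def mu(x):
        if x <= a or x >= d:
            return 0.0
        if x < b:
            return (x - a) / (b - a)
        if x <= c:
            return 1.0
        return (d - x) / (d - c)
    return mu

def preference(cod):
    """Effluent COD preference distribution (equation 7.15)."""
    if cod < 30:
        return 1.0
    if cod < 80:
        return (80.0 - cod) / 50.0
    return 0.0

def weighted_average(f, mu, lo, hi, n=2000):
    """Midpoint-rule integral of f weighted by the membership function mu."""
    dx = (hi - lo) / n
    xs = [lo + (i + 0.5) * dx for i in range(n)]
    return sum(f(x) * mu(x) for x in xs) / sum(mu(x) for x in xs)

between_25_and_50 = trapezoid(20.0, 30.0, 45.0, 55.0)
print(round(weighted_average(preference, between_25_and_50, 0.0, 100.0), 2))   # ~0.84
```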
7.4.5  Mapping: {Linguistic Variable, Any Scale}  The program retrieves the linguistic variable’s definition from the dictionary, and stores the definition. We calculate the preference and quality as a weighted average over the possibility distribution. The basis set of the measure “Clarifier Supernatant Appearance” is contained in Appendix D. We define the condition of the clarifier as being very poor if the sludge is clumping, bulking or washing out with less emphasis on the other conditions (see Table 7.2). The supernatant appearance scale is nominal (Table D.2). The definition assigns an integer to each class, e.g. Bulking=1, Ashing=2, etc. The computer stores the value “poor” by replacing the category with its corresponding integer.  Table 7.2: Linguistic Variable: Clarifier Condition Is Poor Base Variable Bulking Ashing Straggler Floc Pin Floc Clumping Washout Normal  Membership 1.0 0.6 0.6 0.6 1.0 1.0 0.0  Chapter 7. Measurement Paradigm  7.5  253  Viewpoint  In order to analyze the data, the operator needs three things: • A list of what to analyze (e.g. all influent parameters) • A time interval over which the analysis should take place (e.g. the last month of data) • A list of what analyses to perform (e.g. cause and effect, summary statistics,  ...)  Given the context of the data (i.e. each datum’s definition and declaration) and an analysis framework, a computer program should be able to do the following: 1. Connect a task to a data set, a set of functions and a set of conditions: For example, “I want to look at the aeration basin” would mean plot the data as a time series, conduct mass balances on the basin’s coordinate parameters, list what series have changed and calculate summary statistics for all the parameters. 2. Perform the correct types of manipulations for a given a data set: For example, the logic for calculating a set of simple exploratory statistics is based on the measurement scale and the number of data points (see Table 7.3). This framework is referred to as a viewpoint. By shifting the question specific components onto a viewpoint, the program is free to use the same menu system and interface for every viewpoint. A viewpoint is a template whose purpose is to focus on the relationships among the data. For this reason, a viewpoint is associated with a question the operator may ask about his data, e.g. which parameters changed? A viewpoint consists of five components:  Chapter 7. Measurement Paradigm  254  Table 7.3: Calculate Exploratory Statistics  IF ELSE IF ELSE IF  Measure(scale) is categorical Calculate reference distribution more than 7 data points Calculate Tukey’s five number summary more than 2 data points Calculate maximum, median and minimum  ELSE Calculate average ENDIF  1. Table: A viewpoint constructs a list of measures that it will look at. The contents and order of this list may be restricted by the viewpoint. The table consists of four components: • View Restrictions: What type of measurements may be viewed, i.e. which raw series to extract from the database. • Order Restrictions: What criteria may be used to control the order in which the data sets are to be viewed, i.e. how to group the data series (e.g. by parameter). • Duration: The length of time over which to extract data from the database, e.g. month or year. • Windows: How the duration is to be split into windows, i.e. which derived series to prepare (e.g. weeks or SRTs). 2. 
Manipulation: A decision table and a set of functions used to massage the data sets into a form used by the viewpoint, e.g. calculate first or second differences.  Chapter 7. Measurement Paradigm  255  3. Display: A decision table and a set of fnnctions to display the data in text and graphics mode. 4. Comment: A decision table and a set of fnnctions that are valid within the view point and operate on the data series, e.g. highlight preference data below 0.5. 5. Report: A list of fnnctions that produce reports, e.g.  list duration summary  statistics.  The following two examples illustrate how a viewpoint uses the measurement scale and parameter control type. Table 7.4 contains part of the display decision table to form a viewpoint that plots the data against time and calculates some simple summary statistics. The viewpoint allows the user to specify where the program should locate the X-axis. The scatter plot function plots the data points. If the datum represents an interval, the scatter plot function represents the datum by a bar that stretches over the interval. Otherwise, the datum is represented by a point. Operator set data are plotted as a post-point step function  .  The other types of parameters are joined by a solid line.  For example, if the parameter is a manipulated parameter (i.e. recycle rate) and the scale is ratio, the operator would see a step function on the screen. He would be able to move the X-axis to the mode-range, median or mean. Table 7.5 is taken from the manipulation section of  viewpoint  that takes the first and  second differences on time series. A transition sequence is a list of intervals over which the first difference changes sign. For example, if the parameter is a manipulated variable (i.e. recycle rate) and the measurement scale is ratio, the viewpoint would calculate the first difference series and construct a list of points where the derivative changes sign. A post point step function is one where the datum starts the step rather than ending it 4  Chapter 7. Measurement Paradigm  256  Table 7.4: Time Series Viewpoint : Display Decision Table  PARA(Control):{Adjust,Allocate} MEAS(Scale):{ N=Nominal, O=Ordinal, I=Interval, RRatio Step Fuuction:Post Point Line:Solid Scatter Plot Y-Axis Type:Category X-Axis Y-Position:Mode X-Axis Y-Position:Mode Range X-Axis Y-Position:Median X-Axis Y-Position:Mean X-Axis Y-Position:Interval Weighted Mean X-Axis Y-Position:Harmonic or Geometric Mean  }  T N X  T 0 X  T I X  T R X  X X X  X X X  X  X  X X X X  X X X X  X  F N  F 0  F I  F R  X X X X  X X X X  X X  X X  X X X X  X X X X X  —  Table 7.5: Differenced Time Series Viewpoint : Series Manipulation Table  PARA(Control):{Adjust,Allocate} MEAS(Scale):{Nominal,Ordinal} MEAS(Scale):{IntervaLRatio} Generate First Difference Time Series Generate Second Difference Time Series Generate Transition Sequence Generate Max, Mm and Inflection Points  X  T T F X  I’ F T X  F T F X  X  X  X  —  —  F F T X X X  Chapter 7. Measurement Paradigm  7.6  257  Secondary Mapping: Quad To Series  A raw series is a set of quads ordered by interval, i.e. time series. The duration of the series is the length of time between the start of the first quad’s interval to the end of the last quad’s interval. The duration of a series may be broken into equally sized windows. A derived series consists of a summary of the quads within each window. For example, an element of a derived series would consist of summaries of the preference, quality and value of quads contained within a window. 
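A minimal sketch of this secondary mapping is given below, assuming each quad is reduced to a time stamp, value, preference and quality; the median is used here only as a stand-in for the window summaries discussed next.

    from statistics import median
    from collections import defaultdict

    def derive_series(quads, duration_start, window_width):
        # Secondary mapping: group the raw quads into equally sized windows and
        # summarise the value, preference and quality of the quads in each window.
        windows = defaultdict(list)
        for t, value, preference, quality in quads:
            index = int((t - duration_start) // window_width)
            windows[index].append((value, preference, quality))

        derived = []
        for index in sorted(windows):
            values, preferences, qualities = zip(*windows[index])
            derived.append({"window": index,
                            "count": len(values),
                            "value": median(values),        # stand-in for a fuller summary
                            "preference": median(preferences),
                            "quality": median(qualities)})
        return derived

    # Example: hourly dissolved oxygen quads (time in hours) summarised into 6 hour windows
    raw_series = [(h, 2.0 + 0.1 * (h % 5), 1.0, 1.0) for h in range(24)]
    derived_series = derive_series(raw_series, duration_start=0, window_width=6)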
The quality and preference summaries could consist of Tukey’s five number summaries (see Section 2.2). The value summary could consist of Tukey’s five number summaries for non-categorical data and histograms for categorical data. The mapping of data quads contained within a duration to a number of time series is referred to as secondary mapping (Figure 7.9). This mapping is controlled by both a viewpoint and each datum’s declaration.  Chapter 7. Measurement Paradigm  258  Viewpoint  Raw Series Datum Quad  Derived Series  Map From  Derived Series  Data Space To Datum Quad  Data Series  Derived Series  Datum Quad Series Summary  Data Space  Data Series Space Declaration  Figure 7.9: Secondary Mapping  Chapter 7. Measurement Paradigm  7.7  259  Tertiary Mapping: Grouping Data Series  A viewpoint may require that the data series be organized into different groups. This is referred to as tertiary mapping (Figure 7.10). For example, a viewpoint that calculates mass balances and yields around unit processes would group the series first by parameter (e.g. substrate) and second by whether the parameter characterizes a stream entering or leaving a unit process.  Chapter 7. Measurement Paradigm  260  Viewpoint  Raw Series Group 1 Series  vedSeries’_  Map From  Derived Series  Group 2 Series  I  Data Series To Viewpoint  ...  Derived Series  Group 3 Series Series Summary  Scc.I Declaration  Figure 7.10: Tertiary Mapping  Chapter 7. Measurement Paradigm  7.8  261  Summary  The Measurement Paradigm builds on the Structural Paradigm by providing a datum with a measurement context. The paradigm relies on three sources of information to guide the paradigm’s extraction of information from a data set. The first two sources of information are a datum’s declaration and definition. A summary of the declaration and definition of the effluent COD example is given in Table 7.6. The third source of information is a viewpoint. A viewpoint is associated with a data analysis task rather than with a datum and its PMS representation. A viewpoint guides the derivation and grouping of time series. The derivation of information is accomplished by three mappings. The primary map ping associates a datum with a quad (Figure 7.5). A quad consists of an interval, a value, a preference and a quality. The interval defines the time span the datum repre sents. The value may take on various forms, all which represent the datum’s magnitude and all which are mapped to one of three internal formats. Preference describes how the operator “feels” about the datum and quality indicates how much weight the operator can put on the datum. The secondary mapping maps data to raw and derived time series (Figure 7.9). A derived time series consists of a series of window summaries. For example, the window may be the least common interval between two raw series (see Section 2.2). The tertiary mapping groups time series according to the needs of a viewpoint (Figure 7.10). For example, the cause and effect viewpoint (which is discussed in the next chapter) groups data series by a datum’s parameter control context, i.e. perform, status, QA/QC  {  adjust or allocate, disturb,  }.  Statistical packages and other data analysis programs assume that quality of the data is good. The onus is on the user to ensure that this is the case. However, when data are  Chapter 7. 
Measurement Paradigm  Table 7.6: Measure Paradigm: Effluent COD Summary  Component Parameter  Sample  Measure  Quality  Preference  Context  Example Declaration Space Structure Effluent Substrate Control Performance Model Coordinate Structure Effluent Composite Sample Type Composite Time 24 Hours Structure Effluent COD Scale Ratio Resolution Crisp Definition None Needed Definition Space Preprocess see equation 7.14 QA/QC Test is checked against a standard Sample is checked against a duplicate Distribution see equation 7.15  262  Chapter 7. Measurement Paradigm  263  collected on a routine basis and simply archived, there is the tendency to get careless. The problem of data quality was discussed in Chapter 4. The measurement paradigm enables the operator to correct the data for the limi tations of the test method, to monitor the quality of the monitoring program and to document any changes in the sampling or measurement methods. i.e. control the infor mation gathering process. This is important because we cannot distinguish between a change induced by the problems in the monitoring program from changes in the system. An excellent example of this occurred at Beckman and Southwest Treatment Plants in Jacksonville where the plants were brought into compliance simply by installing a new composite sampler [187]. In this case, the problem was with the information gathering process, not the plant.  Chapter 8  Operation Paradigm  Finally, we note that with messy data and unclear objectives, the problem is not how to get the optimal solution, but how to get any solution. Once again, philosophical problems seem rather irrelevant. [73] The purpose of this chapter is to explain the Operation Paradigm.  The Operation  Paradigm enables the operator to determine the effect of his actions on the process. The Chapter is divided into five sections; 1. What Is Status? 2. What Is Change? 3. Postdiction: Why Was A Manipulated Variable Changed? 4. Prediction: How Should A Manipulated Variable Be Changed? 5. Conclusion The Chapter mentions two viewpoints; Change Viewpoint and Cause/Effect View point. If an operator changes a manipulated variable, the manipulated variable becomes the response variable in one of the eight Change Viewpoints (Table 8.2). Other manipu  264  Chapter 8.  Operation Paradigm  265  lated variables may be members of the viewpoint depending on their relationship to the response and initiating variables. A change viewpoint ceases to exist when one of the following conditions occurs: • The change is deemed insignificant: The time at which the change was made is now outside the data series duration. • The change is deemed successful or unsuccessful: A change goal is defined by what the operator hopes will happen and/or by what he hopes will not happen. • The viewpoint is no longer relevant: A viewpoint is based on a set of as sumptions about the relationship among the variables. If these assumptions are no longer valid, the viewpoint ceases to exist. A Cause and Effect Viewpoint is nested in each Change Viewpoint.  An operator  can lock on a variable and ask the Cause and Effect Viewpoint to organize the other  variables relative to this variable. For example, the graphics screen is divided into four plots: Manipulated, Disturbances, Performance and Effect. If the operator locks on a disturbance variable, the plots in the other three windows are grouped as follows: • Perform and Status: The set of effect variables affected by the disturbance, i.e what does the disturbance variable affect? 
• Manipulate and Disturb: The set of cause parameters that also affect the above performance parameter, i.e. what might offset the effect of the disturbance parameter?

Because the paradigm relies on the detection of change, the paradigm will fail if the change detection routines fail. Therefore, the operator is able to overrule the paradigm at any time.

8.1 What Is Status?

A derived time series' duration is split into equally sized windows. The status of the series is itself a series that describes the magnitude (location) and stability (width) of the data within a window. A status element consists of the following items:

• Interval: Size of the window.

• Count: Number of data points in the window.

• Position: The window's position relative to the other windows.

• Direction: The direction the time series is moving.

• Stability: The variability of the time series.

The user defines a duration that determines how far back the analysis should look at the data. The user breaks this duration into a number of windows that define short, medium and long term behavior, i.e. at least three derived time series per raw time series. The program generates a status report for each window. When possible, the program calculates the position, trend direction and stability for the window summary's value, preference and quality.

The window statistics should be simple (see Section 2.2). For example, Figures 8.1 and 8.2 show location defined as Tukey's five number summary and stability defined as the absolute median deviation respectively. Trend direction could be described using the simple algorithm in Figure 8.3, where P is the maximum allowable retracement and is usually set at 0.66. This algorithm is one example of a simple trend detection algorithm that will work on small sample sizes [219]. Similar algorithms for stability need to be developed and tested.

Figure 8.1: Derived Time Series: Level (each window within the duration is replaced with Tukey's five number summary, drawn as a box plot)

Figure 8.2: Derived Time Series: Stability (each window within the duration is replaced with the absolute median deviation)

CONSTANT: P = 0.66

IF x(t-2) > x(t-1) THEN
    IF x(t-1) > x(t) THEN
        Direction is Down
    ELSE IF x(t-1) < x(t) THEN
        Δx = (x(t) - x(t-1)) / (x(t-2) - x(t-1))
        IF Δx > P THEN Direction is None
        ELSE Direction is Down
        ENDIF
    ELSE Direction is Down
    ENDIF
ELSE IF x(t-2) < x(t-1) THEN
    IF x(t-1) < x(t) THEN
        Direction is Up
    ELSE IF x(t-1) > x(t) THEN
        Δx = (x(t-1) - x(t)) / (x(t-1) - x(t-2))
        IF Δx > P THEN Direction is None
        ELSE Direction is Up
        ENDIF
    ELSE Direction is Up
    ENDIF
ELSE
    IF x(t-1) < x(t) THEN Direction is Up
    ELSE IF x(t-1) > x(t) THEN Direction is Down
    ELSE Direction is None
    ENDIF
ENDIF

Figure 8.3: Simple Direction Algorithm
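A runnable reading of the direction rule in Figure 8.3 is sketched below in Python, together with the crude trend score suggested in Section 8.2.4; the handling of ties and the form of the retracement ratio are an interpretation of the figure, not the original implementation.

    def direction(x_prev2, x_prev1, x_now, p=0.66):
        # Direction of movement over three consecutive window summaries.
        # p is the maximum allowable retracement from Figure 8.3.
        if x_prev2 > x_prev1:                      # the series had been moving down
            if x_prev1 >= x_now:
                return "Down"
            retracement = (x_now - x_prev1) / (x_prev2 - x_prev1)
            return "None" if retracement > p else "Down"
        if x_prev2 < x_prev1:                      # the series had been moving up
            if x_prev1 <= x_now:
                return "Up"
            retracement = (x_prev1 - x_now) / (x_prev1 - x_prev2)
            return "None" if retracement > p else "Up"
        if x_prev1 < x_now:                        # no prior movement: use the latest step
            return "Up"
        if x_prev1 > x_now:
            return "Down"
        return "None"

    def trend_score(series, p=0.66):
        # Sum of directions (+1, -1, 0) divided by the number of steps in the series.
        steps = [direction(a, b, c, p) for a, b, c in zip(series, series[1:], series[2:])]
        return sum({"Up": 1, "Down": -1, "None": 0}[s] for s in steps) / max(len(steps), 1)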
Table 8.1: Forms Of Change

  Form          Explanation
  Level         The weekly median effluent COD increased.
  Warn/Alarm    The daily effluent COD is above the permit value.
  Limits        Today's effluent COD is the highest window value.
  Frequency     The diurnal flow cycle has developed a new peak.

8.2 What Is Change?

A change event occurs when the value, quality or preference between windows changes. Table 8.1 lists the different forms a change may take.

8.2.1 Level

Figure 8.4 shows a square wave that steps up at T = 10 and down at T = 20. The solid line would represent a change in a manipulated variable because a manipulated variable is set, not measured. The other points represent changes in non-manipulated variables. If the measurement noise is comparable to the step size, the step is undetectable (STD = 0.40). Warn/Alarm and Limits changes are special cases of a level change.

Figure 8.4: Change In Level (square wave, h = 0.50, shown with measurement noise of STD = 0.01, 0.10 and 0.40)

8.2.2 Warn-Alarm

An operator can associate warning and alarm levels with a parameter. A change may be defined as moving from one region into another (change in level). This type of change is useful for the case when the operator collects very little data on a parameter that should not vary outside a predefined region. For example, the operator may assign the effluent COD to a member of the following set (assuming the discharge permit is 50 mg/l):

• Okay: { COD < 40 mg/l }

• Warning: { 40 ≤ COD ≤ 45 mg/l }

• Alarm: { COD > 45 mg/l }

Alarms may be used for nominal parameters as well. The operator assigns each category to a region. For example, the operator may assign the secondary clarifier "appearance" measures to the following sets (see Appendix D):

• Okay: { Normal }

• Warning: { Ashing, Straggler-Floc, Pin-Floc }

• Alarm: { Bulking, Wash-Out, Clumping }

8.2.3 Limits

A change could be defined as the establishment of a new limit, e.g. a 30 day maximum or minimum. A window's maximum and minimum define the limits for a non-categorical measure while the mode and the set of unreferenced categories define the limits for categorical measures. A change would be defined as either the establishment of a new mode or the use of a previously unused category (e.g. bulking).

8.2.4 Trend

A trend is a tendency for an observation to increase (or decrease) over time, over and above what can be attributed to local trends or random variation [219]. Figure 8.5 shows both a positive and negative trend.

A trend possesses four physical characteristics [74] [75]:

1. Existence: The trend is caused. The cause may be a disturbance parameter, a manipulated parameter or an undetected causal parameter.

2. Stable: The trend is different from random variability and resolvable above the noise. A caused trend may be unstable if the measurement noise drowns out its contribution to the time series.

3. Uniqueness In Scaling: A scale and window define a trend. For example, assume an operator detects an upward trend in the average daily effluent solids concentration over the past month. The scale is daily, i.e. an individual datum represents a 24 hour interval, and the window is the month.

4. Well-Behaved: A well-behaved trend is smooth and continuous.
For example, a build-up of filamentous bacteria in the system would cause a trend upward in the effluent solids. Discontinuities are usually due to equipment failures, contaminated samples or analytical errors.

The state of a trend at time t is defined by a triplet: the value x(t), the trend's direction and its stability. A trend event is the ordered set of contiguous trends. The more recent a trend event, the more important it is.

A simple way to detect trend is to assign "Up" a value of 1, "Down" a value of -1 and "None" a value of 0 (see Figure 8.3). Trend then could be defined as the sum of the directions divided by the number of items (i.e. windows) in the series. The operator would pick a threshold value for this which he could "tune" for each parameter. Alternatively, the operator could use trend lines [219]. A violation of a trend line indicates a change in trend.

Figure 8.5: Change In Trend (triangle, h = 0.50, shown with measurement noise of STD = 0.01, 0.10 and 0.40)

8.2.5 Frequency

A change may also be due to a change in the autocorrelation or spectral density function (see Figure 8.6). In order to detect such a change, a large amount of data is required. For example, Watanabe et al [306] proposed a two filter scheme to detect parametric change. A time series model is determined on-line during normal operation. A Kalman filter is used to update the model until the innovations sequence becomes colored. An Extended Kalman Filter is used to correct the model.

A change in frequency may manifest itself as a change in stability. A change in stability is defined as the change in spread between windows. For example, a change in stability may be defined as a change in the windows' median deviation.

Figure 8.6: Change In Frequency (sine wave, T = 10, shown with measurement noise of STD = 0.01, 0.10 and 0.40)

8.3 Postdiction: Why Was A Manipulated Variable Changed?

Table 8.2 summarizes the eight reasons why an operator may change a manipulated variable (change viewpoints). In the previous chapter, it was mentioned that quality is meaningless when speaking of manipulated parameters. For this reason, the quality measure is replaced with a code that indicates why the operator changed a manipulated parameter. These change viewpoints form the basis of this code. Figure 8.7 shows the interrelationships between these viewpoints.

All viewpoints except Catastrophic Intervention require that the process be controllable. A process is uncontrollable when changes in a manipulated parameter cease to effect a change in performance. For example, assume that a biological phosphorus removal plant stops removing phosphorus. The introduction of a readily degradable carbon source, and changes in recycle or wastage, cannot effect an immediate process recovery. The operator decides to polish the effluent by using alum until the process starts removing phosphorus again. The addition of alum is a temporary measure in that the
plant is designed to remove phosphorus biologically only. Once the process recovers, the operator stops adding alum and monitors his performance, controlling the process using conventional means. The viewpoint changes from Catastrophic Intervention to Counteract Performance.

Table 8.2: Why Change A Manipulated Variable?

1. Anticipate Disturbance: The load on the plant will increase during tourist season, so decrease wastage now to build up the solids needed to expand the plant's capacity.
2. Anticipate Performance: The nitrate levels in the effluent rise during the warm summer months, so change your process configuration to allow denitrification.
3. Optimize Performance: Reduce recycle rate with the hope settling will improve.
4. Compensate Disturbance: The organic load in the plant is trending upwards, so decrease wasting to raise the MLSS concentration.
5. Compensate Manipulated: Decrease wasting to offset the effect of an increase in sludge processing streams returned to the aeration basin.
6. Counteract Performance: Increase recycle rate to offset denitrification in the secondary clarifier.
7. Non-Operational Change:
   Failure: The recycle line to the aeration basin was plugged, so monitor the aeration basin's recovery.
   Error: The change in the recycle rate caused a deterioration in performance, not an improvement.
   Maintenance: One clarifier was off-line for three days for maintenance.
   Disruption: The power was out for three hours.
8. Catastrophic Intervention: Chlorinate secondary clarifier sludge to kill back some of the filamentous population.

The viewpoints, Anticipate Disturbance, Anticipate Performance and Optimize Performance, are preemptive control actions in that the cause of the change is in the future. The operator can sort out the effect of the change if the process is stable at the time the change is made. Figure 8.8 shows the distribution of a performance parameter over a window. If recent intervals lie within the optimize region, the operator can assume it is safe to "push" the performance to a new level. Figure 8.7 illustrates that the tendency is to move to a control viewpoint after a preemptive change is made to the process. The reason for this is that the operator must stabilize his process before he acts again.

Figure 8.7: Why A Manipulated Parameter Is Changed (the interrelationships among the eight change viewpoints, separated by whether the process is controllable)

When the operator anticipates a disturbance or decides to optimize the plant's performance, he is making a prediction. The program monitors the performance parameters affected by the change in the manipulated parameter and watches to see if the operator's prediction comes true.

The Control viewpoints, Counteract Performance, Compensate Disturbance and Compensate Manipulated, are the most common operation modes. The operator responds to his process in the hope of stabilizing the process performance. Figure 8.8 describes this as operating on the least preferable tail of a performance distribution. In other words, if the operator cannot assume his plant is stable and if the recent intervals are less preferable than older intervals, then the control objective is to stop the deterioration in performance. Once the deterioration is stopped, then the operator can analyze what happened and improve the process performance. If the process is not stable but is moving in a preferable direction, the operator should leave his process alone until either the
performance "settles down" or the performance starts to deteriorate. One of the most difficult decisions an operator must make is deciding when to wait (not intervene). The rule of thumb is that if the change in performance is tolerable then it is better to wait than to perturb the process further.

Figure 8.8: Stability And Control Actions (the frequency of occurrence of a performance parameter, its magnitude distribution and preference function, divided into Monitor, Control and Optimize regions)

The reason a change is made determines what the operator should monitor. When the operator reacts to a change in performance, he forms a feedback loop, while when he reacts to a disturbance or compensates for another manipulated parameter, he forms a feedforward loop (see Figure 8.9). In both these situations, the operator's goal is to stabilize or control the process. For this reason, the program monitors performance parameters to ensure that they do not deteriorate. If the program detects that another cause is affecting the performance parameters, the program will warn the operator and reset the viewpoint.

Figure 8.9: Causal and Noncausal Change (the relative timing of cause and manipulated-variable change for the Anticipate, Optimize, Feedback Control, Feedforward Control, Interaction and Monitor cases)

If the operator makes an error or a piece of equipment fails, the operator's main concern is with the process's stability (assuming that if an error is detected, it is corrected). If the performance starts to deteriorate, the viewpoint will suggest he act. However, if the operator feels the effect is transient, he may decide to wait the deterioration out and not intervene.

The program assumes that a variable is monitored and that a change in the variable is detected. If this is not the case, the operator will have to assist the program:

• Monitored but Not Detected: A change occurred that the program missed. The operator informs the program of its error and the program enters the change. If the data are noisy, the program's change detection routines may not detect a change that is obvious to the operator. In other words, if the computer and the operator disagree on whether a significant change took place, the operator should be able to override the computer.

• Not Monitored but Detected: The program detects a change in a diagnostic variable. The program obtains the necessary information on the variable from the operator and includes it in the analysis until the viewpoint changes. This usually occurs when the operator monitors a parameter for a couple of months each year, e.g. effluent nitrate concentration, or when the operator is attempting to determine if a normally unmonitored parameter is the cause of a process problem, e.g. anaerobic digester supernatant COD.

• Not Monitored and Not Detected: The operator detects a change in a variable that the program knows nothing about. The operator informs the program and provides it with a location in the process. The program has no data on the variable so it must rely on the operator to tell it when the variable's state changes again. For example, the operator observes that his influent turned a milky color and the wet well smells of dry cleaner solvent. He may tell the program when this occurred but not provide the program with any data.
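The bookkeeping behind these change viewpoints can be pictured with a small sketch such as the one below; the reason codes follow Table 8.2, but the single monitoring rule shown is a deliberate simplification of the viewpoint-specific rules developed in the remainder of this section.

    from enum import Enum

    class ChangeReason(Enum):
        # Why a manipulated variable was changed (Table 8.2).
        ANTICIPATE_DISTURBANCE = 1
        ANTICIPATE_PERFORMANCE = 2
        OPTIMIZE_PERFORMANCE = 3
        COMPENSATE_DISTURBANCE = 4
        COMPENSATE_MANIPULATED = 5
        COUNTERACT_PERFORMANCE = 6
        NON_OPERATIONAL_CHANGE = 7
        CATASTROPHIC_INTERVENTION = 8

    PREEMPTIVE = {ChangeReason.ANTICIPATE_DISTURBANCE,
                  ChangeReason.ANTICIPATE_PERFORMANCE,
                  ChangeReason.OPTIMIZE_PERFORMANCE}

    def what_to_monitor(reason, shared_performance, anticipated=None):
        # The shared performance variables must not deteriorate; for the preemptive
        # viewpoints the program also watches whether the operator's prediction comes true.
        watch = set(shared_performance)
        if reason in PREEMPTIVE and anticipated is not None:
            watch.add(anticipated)
        return watch

    watch = what_to_monitor(ChangeReason.COMPENSATE_DISTURBANCE,
                            shared_performance=["Effluent COD", "Effluent Solids"])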
8.3.1  Reason #1: Anticipate Disturbance  Figure 8.10 outlines the case when the operator changes a manipulated parameter to compensate for an anticipated change in a disturbance parameter  2  For example, the  operator decreases the wasting because he anticipates an increase in load due to the tourism season.  If the anticipated disturbance occurs, the action is considered successful and the  view  point ceases (Table 8.3). If a common performance parameter deteriorates, the action is considered unsuccessful and the viewpoint switches to Error (Table 8.4). However, The reason for the change is the operator’s prediction that a change will take place. The prediction 2 took place before the change, therefore, postdiction.  Chapter 8.  Operation Paradigm  284  Not Deteriorate Must Change  cD  Must Improve Cause To Effect  Anticipate  Confounding  Disturbance  Disturbance  :::: Common  Figure 8.10: Reason #1: Anticipate A Change In A Disturbance Parameter  Chapter 8.  Operation Paradigm  285  Not Deteriorate  0 Q  Must Change Must Improve Cause To Effect  Confounding  Anticipate  -  _[  Disturbance  nding  Confounding  Manipulated  Manipulated Performance  Figure 8.11: Reason #2: Anticipate A Change In A Performance Parameter  the operator may choose to ignore the program’s opinion because he suspects the distur bance is still imminent. The change viewpoint is reset if a confounding manipulated or disturbance variable changes. 8.3.2  Reason #2: Anticipate Performance  Figure 8.11 illustrates the case when the operator changes a manipulated variable because he suspects that a performance variable is about to change.  Chapter 8.  Operation Paradigm  286  The operator is successful if the change in the performance is never detected (Ta ble 8.5). In other words, the absence of failure is success. If a confounding disturbance or manipulated variable intervenes, the viewpoint is reset. 8.3.3  Reason #3: Optimize Performance  An operator should not introduce a change in a manipulated variable with the view to improving the process’s performance if the process is not stable (Figure 8.12)  .  When  the change is made under stable conditions and the subsequent performance does not improve, the action is considered unsuccessful (Table 8.6). The viewpoint is reset if a confounding disturbance or manipulated variable acts on the performance variable being optimized. 8.3.4  Reason #4: Compensate Disturbance  Figure 8.13 outlines the case when the operator changes a manipulated parameter to com pensate for a detected change in a disturbance parameter. For example, the disturbance may be an increase in the winery’s COD while the response may be reduced wasting. The operator is successful if the shared performance variables between the disturbance and the manipulated variable do not deteriorate (Table 8.8). However, if a confounding disturbance or manipulated variable change, the viewpoint must be reset. The viewpoint is cleared if the initiating change is no longer important or the initiating disturbance changes (Table 8.7). 8.3.5  Reason #5: Compensate Manipulated  A change in a manipulated variable may cause some performance variables to improve and others to deteriorate. To offset this, a second performance variable may be changed  Chapter 8.  Operation Paradigm  Table 8.3: Rule For Reason #1: State Of Anticipated Disturbance IF initiating disturbance manifests itself Change is successful WHY? 
Anticipated disturbance detected Change viewpoint to Compensate Disturbance ENDIF  Table 8.4: Rule For Reason #1: State Of Common Performance IF at least one of the performance parameters deteriorates Change viewpoint to Compensate Disturbance Change is unsuccessful WHY? Manipulated parameter caused process to deteriorate ENDIF  Table 8.5: Rule For Reason #2: State Of Anticipated Performance  IF performance variable does not deteriorate IF change in manipulated variable insignificant Change is successful WHY? Performance did not deteriorate ENDIF ELSE Change viewpoint to Error or Compensate Manipztlated Change is unsuccessful WHY? Performance deteriorated ENDIF  287  Chapter 8.  Operation Paradigm  288  Not Deteriorate  I  Q [  Must Change Must Improve Cause To Effect  Disturbance  Target  fl  Performance Common Disturbance  Common  Performance  I  Shared  Imtiatmg  Performance  Manipulated  I  I  Manipulated Manipulated Performance  j  Figure 8.12: Reason #3: Optimize A Performance Parameter  Chapter 8.  Operation Paradigm  Table 8.6: Rule For Reason #3: State Of Performance  IF performance deteriorates Change is unsuccessful WHY? Performance deteriorated Change viewpoint to Error or Compensate Manipulated ELSE IF change in manipulated variable still significant IF performance improved Change is successful WHY? Performance improved ELSE Change is unsuccessful WHY? Performance did not improve. ENDIF ENDIF  Table 8.7: Rule For Reason #4: State Of Initiating Disturbance  IF No change since initiating change in initiating disturbance IF Initiating change insignificant Viewpoint ceases WHY? The initiating disturbance insignificant ENDIF ELSE Viewpoint ceases WHY? Initiating disturbance changed significantly ENDIF  289  Chapter 8.  Operation Paradigm  290  Not Deteriorate Must Change  Q  Must Improve Cause To Effect  Warn Only  Performance  Initiating  Confoundmg  Disturbance  Disturbance Shared Performance  Responding  L  Manipulated  Warn Only  Confounding Manipulated  Manipulated Performance  Figure 8.13: Reason #4: Respond To A Change In A Disturbance Parameter  Chapter 8.  Operation Paradigm  291  Not Deteriorate Must Change Must Improve  0 lU  Cause To Effect  Warn Only  Performance  Shared Responding  L_  Manipulated  Manipulated  I Warn Only  Confounding  Performance  Figure 8.14: Reason #5: Compensate For A Change In A Manipulated Parameter  (Figure 8.14). The operator is successful if common performance variables do not dete riorate. 8.3.6  Reason #6: Counteract Performance  Figure 8.15 outlines the case when the operator decides to respond to a change in a performance parameter. For example, the operator increases the wastage rate because pin-fioc is observed in the clarifier supernatant. If the effect improves or remains the  Chapter 8.  Operation Paradigm  292  Not Deteriorate Must Change Must Improve  Q  Cause To Effect  rn Only Common is ur ance Irfifiatingnding Performance.  Common  ]  Manipulated  Manipulated  rn Only  Figure 8.15: Reason #6: Compensate For A Change In A Performance Parameter  same, then the operator is successful. If the performance deteriorates (Table 8.9), the control action is unsuccessful. In this case, the operator may decide to either compensate for the change in performance or the change in the manipulated variable.  Chapter 8.  Operation Paradigm  Table 8.8: Rules For Reason #4 : State Of Common Performance Variables  IF Performance variables did not deteriorate Change is successful WHY? 
Performance did not deteriorate ELSE Change is unsuccessful Change viewpoint to Error or Compensate Manipulated WHY? Performance variable deteriorated ENDIF  TaEle 8.9: Rule For Reason #6: State Of Common Performance IF at least one of the performance parameters deteriorates Change unsuccessful. Change viewpoint to Error or Compensate Manipulated WHY? A target performance variable deteriorated. ELSE Control action successful. ENDIF  293  Chapter 8.  Operation Paradigm  294  Not Deteriorate Must Change  0  Must Improve Cause To Effect  Changed Back To Correct Error Confounding Disturbance Common In  Confounding  Manipulated  Manipulated Performance  Figllre 8.16: Reason #7: Correct An Operational Error  8.3.7  Reason #7: Non-Operational Change  If a manipulated variable is changed for a non-operational reason (i.e. error or equipment failure), the operator has the choice to compensate for the change or to correct the offending variable (Figure 8.16). If a performance variable deteriorates, the viewpoint changes to Counteract Performance.  Chapter 8.  8.3.8  Operation Paradigm  295  Reason #8: Catastrophic Intervention  A catastrophic intervention differs from all the previous interventions in that it only occurs when the operator has lost control of his process. When this occurs, the operator introduces a new cause that “jolts” the process back into a state where the operator can maintain control using conventional control variables. An operator is successful when the intervention is no longer required (i.e. removed).  Chapter 8.  8.4  Operation Paradigm  296  Prediction: How To Change A Manipulated Variable  Although the focus of this research is postdiction, prediction is required if the operator requires assistance in deciding if, when and how to change a manipulated parameter. In this case, the program could be coupled to a model. The choice of model is important. An operator may decide that a model is not worth bothering with if the model requires maintenance or extensive computing resources. The best approach is to use a simple mechanistic model whose parameters can be recursively identified from the monitoring data.  In this case, the program, not the operator, maintains the model. Simplicity  requires compromise  -  a simple model cannot estimate the transient and steady state  response to the same degree of precision as a complex model. However, when making an operating decision in a  wastewater  treatment  plant, the transient response is of only pass  ing interest. Therefore, the simple model should concentrate on estimating the correct steady state response. The program can introduce an additional degree of complexity by using a set of simple models, each of which is valid over a small range of operating conditions.  8.5  Conclusions  The goal of the Operation Paradigm is to link a set of data in the past to a change now, and change now, to a set of data in the future. These linkages enable the operator to learn from his actions.  Chapter 9  Synthesis  A reasonable scientific goal is to develop a theory of representation and rea soning that explains how information can be structtired in such a way as to be efficiently interpretable by machine, yet understandable to humans [124]. The objective of this chapter is to demonstrate how an operator can use the paradigms, described in the previous three chapters, to improve his ability run his plant. 
An operator can improve his plant's performance by gaining control of his information gathering processes and following the effect of his control decisions. The simple example provided in this chapter focuses on the latter case. The structure of the example discussed in this chapter is presented in Chapter 6.

9.1 The Relationship Between The Program and The Operator

The operator uses the analyst module of the program to work through each control recursion (Figure 9.1). Because the Treatment Process and the Information Gathering Process cannot be uncoupled, control must be maintained over both. For this reason, the system can be broken into two components: the Information Generating System and the Information Interpretation System. The operator and the computer program (the analyst) form the latter while the treatment process and the data collection programs make up the former.

The most common approach to treatment plant control is to assume each set-point is independent. For example, the wastage rate is set to maintain a desired SRT and the recycle rate is set to provide a desired sludge blanket depth in the secondary clarifier. In this case, the operator passively monitors the system's performance until a change occurs in either the plant's performance or the plant's inputs. Although such an approach is easily automated, the approach often leads to suboptimal performance because these two manipulated parameters are in fact linked.

The alternative approach is to recognize that for a given set of input and state conditions, there is at least one optimal set of set-points. The reason for this is that set-points in a treatment plant are not independent. The operator must actively monitor the effect of the set-points on the system in the hope that he can determine their optimum values, i.e. delineate a response surface. This approach is not easily automated. The program outlined in this thesis assists the operator in recognizing the pattern of set-points that provides the best over-all plant performance. If the program is used in this manner it will lead to both an improvement in the plant's performance and the operator's process knowledge.

Figure 9.1: Synthesis: Information Generation and Interpretation (the treatment process and the information gathering process, with its collection programs and observable parameters, make up the Information Generating System; the operator and the analyst program, working through viewpoints, make up the Information Interpretation System)

9.2 Construction Of A Simple Example

The purpose of the example described in this section and analyzed in the following section is to demonstrate how the three paradigms defined in the previous three chapters assist the operator to run his treatment plant. The purpose of the example is not to prove the efficacy of the paradigms over conventional operational strategies, nor is it to simulate in detail the interaction between an operator and his treatment plant.

The structure of the example parallels Figure 9.1 and is shown in Figure 9.2. The first difference between the two figures is that the treatment plant process and the data collection programs are replaced with models. The purpose of these models is to simulate the functions of these two elements of the real system at a very primitive level. The second difference between the figures is that Figure 9.2 contains a model that drives the simulation, i.e. the Simulation Scenario Function.
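To make the idea concrete, a scenario function of this kind might be sketched as a schedule of model-coefficient overrides; the coefficient name, dates and values below are hypothetical placeholders rather than the schedule actually used in the example.

    from datetime import date

    # Hypothetical schedule: each entry gives the period over which a set of model
    # coefficients is overridden.
    SCENARIO = [
        (date(1991, 3, 1), date(1991, 5, 15), {"true_svi": 150.0}),   # a simulated bulking event
    ]

    def scenario_coefficients(today, baseline):
        # Return the model coefficients in force on a given simulation day.
        coefficients = dict(baseline)
        for start, stop, overrides in SCENARIO:
            if start <= today <= stop:
                coefficients.update(overrides)
        return coefficients

    coefficients = scenario_coefficients(date(1991, 4, 1), baseline={"true_svi": 90.0})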
In the real world, the plant responds to events in its system that are caused by outside disturbances (i.e. changes in the weather) and internal state changes (i.e. shifts in the biological population). In the example, the system responds to changes in model coefficients and inputs made by the Simulation Scenario Function.  Chapter 9. Synthesis  301  Information Gathering Process  Treatment Process  Simulation  easuremen  Scenario  Model  Information Generating System  What 4  Changed?  Jr  How Did It Change?  zzzz:tzzzz’ HowTo Respond?  V%ewpoint Operator  Information Interpretation System  Figure 9.2: Simple Example : Program Layout  Observed Parameters  Chapter 9. Synthesis  9.3  302  Information Generating System  The Information Generating System consists of the Treatment Process and Information Gathering Modules. 9.3.1  Treatment Process Module  The Treatment Plant module models the layout and the function of a treatment plant independent of each other. This module constructs a hierarchical network model of the treatment plant in the computer’s memory that enables it to trace the incoming sewage through the plant to the receiving water. The objective of this module is to generate data that describes the state of the plant’s two streams as they pass through the system. The first stream, the liquid stream, consists of one currency parameter, Liquid Flow, and five coordinate parameters: Toxic Compound  }.  {  Solids, Substrate, NH 3  —  N  3 Aro  —  N, Conservative  The second stream, the air stream, consists of a currency parameter  only: Air Flow. The design of this module is patterned after the Structural Paradigm discussed in Chapter 6. •The simulated treatment plant consists of a primary clarifier, an aeration basin and a secondary clarifier. The plant is identical to the one described in Quasim’s book on wastewater treatment plant design [249](Figure 9.3). Influent Characteristics The influent characteristics were modelled by passing yearly average values for each of the liquid stream’s components through a seasonal, weekly and daily hydrograph [209] [283].  Chapter 9. Synthesis  303  Influent  Effluent  Compost  Adjustable Parameter Join Node Split Node  0 0  Figure 9.3: Simulated Plant’s Flow Sheet  Chapter 9. Synthesis  304  Primary Clarifier The primary clarifier model consists of a function that estimates the removal of settleable solids and COD given the clarifier’s hydraulic loading rate [225]. Aeration Basin The aeration basin is modelled using Lessard’s Activated Sludge Model [194]. Lessard’s model is a subset of the JAWPRC model [142]. Secondary Clarifier A secondary clarifier performs two functions: clarification and thickening. Clarification is modelled using a simple linear function that adjusts the percent of solids spilling over the weir using the clarifier loading rate and the SVI of the sludge [32].  The model  assumes that the clarifier is a perfect thickener. i.e. the underfiow solids concentration is calculated using a simple mass balance. Sludge Processing Waters The example uses Arun’s regression model that estimates the return flow rate, solids concentration and COD concentration from the sludge treatment processes using the plant’s influent characteristics [15]. 9.3.2  Information Gathering Process Module  The Information Gathering Module uses the stream parameters estimated by Treatment Process Module to generate the data the operator would obtain through his monitoring program. 
The Information Gathering Module accomplishes this using two functions: Sampling and Measurement Error functions. The role of these two functions is best  Chapter 9. Synthesis  305  described by a simple example. Assume the operator collects a 24 hour composite effiueut sample and determines its COD concentration. The Sampling Function would construct the 24 hour average to which the Measurement Error Function would add a random error. The preference and quality functions calculate a number between 0 and 1 that repre sent the COD value’s desirability or quality. The purpose of the preference function is to make a measure more sensitive to changes around a critical value, i.e. permit limit. The purpose of the quality function is to make a measure less sensitive, particularly when the measure is outside the test’s detection range. In some cases, the quality function rounds off the observed value or replaces the observed value with a more realistic num ber. The Preference and Quality functions used in this example were constructed using information from Standard Methods [176]. Table 9.1 lists the measurements used in this example. 9.3.3  Adjustable Parameters  The example assumes the operator is able to adjust the recycle flow rate and the wastage rate. The operator decides to fix the recycle rate at 60% of the influent flow rate and to fix the wastage rate to maintain a 15 day MCRT. 9.3.4  Simulation Scenario  The simulation scenario function changes model coefficients to simulate changes in the treatment plant. Figure 9.4 shows how the true SVI is changed to simulate the occurrence of a bulking sludge in the treatment plant. 9.3.5  Detection Of Change  The simulation uses three measures to detect change in a data series:  Chapter 9. Synthesis  306  Table 9.1: Observed Measurements  Control Context QI CI SI NI  Measure Name Influent Flow Rate lnf. COD Inf. Solids Inf. Ammonia  Adjustable  WASQ RASQ  Wastage Rate Recycle Rate  Performance  CE  Elf. COD  SE SEP NE  Pref. and Qual. Eff. Solids Preference Elf. Anunonia  PF PC PS PN MVS UAS OUR SVI SET FIN  Sludge Proc. Flow Rate Sludge Proc. COD Sludge Proc. Solids Sludge Proc. Ammonia MLVSS Underfiow Solids (RAS) Oxygen Uptake Rate Sludge Volume Index (SVI) Settleability Filament Number  Disturbance  Status  Sample Type Daily Average Composite Composite Composite  Sampling Frequency Daily Daily Daily MWF  Composite  Daily  Composite  Daily  Composite  Daily  Daily Average Grab Grab Grab Grab Grab Grab Grab Grab Grab  Daily MWF MWF MWF Daily Daily Daily Daily Daily Daily  Chapter 9. Synthesis  307  Simulation Scenario Secondary Clarifier Model Parameter 110 100  StartCL2  90 Bulking Event 80  / StopCL2  22-Jan 03-Mar 12-Apr 22-May 01-Jul 10-Aug 11-Feb 23-Mar 02-May 11-Jun 21-Jul  Figure 9.4: Simulation Scenario: True SVI Value  Chapter 9. Synthesis  308  • Daily trends are detected by a simple direction measure described in Figure 8.3. • Weekly Trends are detected by examining the position of the current week’s val ues relative to the last four weeks. Five number summaries are used for these comparisons. Five number summaries were discussed in Chapter 2. • Changes in stability are detected by examining changes in the Absolute Median Deviation over the last four weeks. This measure was discussed in Chapter 2. 9.3.6  Detection and Response To Change  The detection and response to a change in a data set may be broken into two steps. 
The first step is to rule out the possibility that the change in the data set is due to a problem in the measurement process. Once this is ruled out, the second step is to try to reconstruct a reason for the change in the process and to plan out a response if the performance of the process is deteriorating. Figure 9.5 contains an outline of this process. 9.4  Simulation: Coping With A Bulking Sludge  The simulation starts on November 1, 1990 and ends on August 29, 1991. The Measure ment Model starts January 1,1991 and the hulking event starts on