UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Reconstruction of process topology using historical data and process models Ongalbayeva, Aigerim 2016

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata

Download

Media
24-ubc_2017_february_ongalbayeva_aigerim.pdf [ 1.84MB ]
Metadata
JSON: 24-1.0340670.json
JSON-LD: 24-1.0340670-ld.json
RDF/XML (Pretty): 24-1.0340670-rdf.xml
RDF/JSON: 24-1.0340670-rdf.json
Turtle: 24-1.0340670-turtle.txt
N-Triples: 24-1.0340670-rdf-ntriples.txt
Original Record: 24-1.0340670-source.json
Full Text
24-1.0340670-fulltext.txt
Citation
24-1.0340670.ris

Full Text

RECONSTRUCTION OF PROCESS TOPOLOGY USING HISTORICAL DATA AND PROCESS MODELS  by  Aigerim Ongalbayeva  B.Sc., M. Auezov South Kazakhstan State University, 2012  A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF  MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Chemical and Biological Engineering)  THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)  December 2016  © Aigerim Ongalbayeva, 2016 ii  Abstract  Modern process industries are large and complex. Their units are highly interconnected with each other. If there is an abnormal situation in the process, the faults might propagate from one part of the process to another. To keep the process safe, it is vital to know a causality and connectivity relationship of the process.  Alarms control all process variables and let operators know if there is any fault in the process. During a process malfunction, alarms start from a single process variable and quickly propagate to other variables. This leads to alarm flooding showing continuous appearance of alarms in the monitoring panels. During alarm flooding, it is difficult for operators to find the root cause and solve the problem on time. Causality analysis between different variables is one of the methods to avoid alarm flooding. The method helps to provide a process topology based on the process models and data. Process topology is a map that shows how all units and parts of the process are connected; it helps to find root causes of the fault and to predict future abnormalities. There are many techniques for causality detection. Transfer entropy is a popular method of causality detection that is used for both linear and nonlinear systems. The method estimates the variables’ entropy using their probabilities.  This thesis focuses on the transfer entropy based on historical data of the Tennessee-Eastman process. The Tennessee-Eastman is a widely used benchmark in process control studies. The thesis aims to detect the causality and connectivity map of the continuous process measurements. Particle filters or Sequential Monte Carlo methods are also considered to approximate density functions of the filtering problem by spreading particles. iii  Preface  The “Reconstruction of Process Topology Using Historical Data and Process Models" thesis presents the research that I performed during my MASc study. The research was identified, initiated and supervised by Professor Bhushan Gopaluni. All the equations in the Chapter 3 are extracted from literature review in the field of process control. The application of those theories to the historical data of the Tennessee-Eastman process is the original work. The research and simulation was done by me under the direct supervision of Professor Gopaluni. Chapter 4 is based on research performed with Jiaqi Gao in B. Gopaluni’s group with the professor’s help. The problem definition and literature review of this chapter was done by me. Prof. Gopaluni together with Dr. Aditya Tulsiya supervised and helped with deriving equations and coding on Matlab. iv  Table of Contents  Abstract .......................................................................................................................................... ii Preface ........................................................................................................................................... iii Table of Contents ......................................................................................................................... iv List of Tables ................................................................................................................................ vi List of Figures .............................................................................................................................. vii List of Abbreviations ................................................................................................................. viii Acknowledgements ...................................................................................................................... ix Chapter 1: Introduction ................................................................................................................1 1.1 Background and Motivation ........................................................................................... 1 1.2 Process Safety Design ..................................................................................................... 3 1.3 Alarm Flooding ............................................................................................................... 5 1.4 Outline of the Thesis ....................................................................................................... 8 Chapter 2: Literature Review .......................................................................................................9 2.1 Causality and Connectivity ............................................................................................. 9 2.2 Information Entropy...................................................................................................... 13 2.3 Transfer Entropy ........................................................................................................... 20 Chapter 3: Causality Detection via the Transfer Entropy Method ........................................25 3.1 Data-based Causality and Connectivity Detection Analysis ........................................ 25 3.2 Calculation Method ....................................................................................................... 29 3.3 Tennessee-Eastman Process .......................................................................................... 32 3.4 Case Study .................................................................................................................... 39 v  Chapter 4: Transfer Entropy Using Sequential Monte-Carlo Method ..................................49 4.1 Transfer Entropy Using Sequential Monte Carlo Method ............................................ 49 4.2 Methodology ................................................................................................................. 50 4.2.1 Approximation of               ........................................................................... 52 4.2.2 Approximation of          ....................................................................................... 55 4.3 Case Study .................................................................................................................... 57 4.3.1 A Simple Nonlinear Process ..................................................................................... 58 4.3.2 A Single Tank ........................................................................................................... 59 Chapter 5: Conclusions ...............................................................................................................61 5.1 Summary of the Present Contribution........................................................................... 61 5.2 Areas for the Future Work ............................................................................................ 62 Bibliography .................................................................................................................................63 Appendix A ...................................................................................................................................66 A.1 Particle Approximation of Transfer Entropy Algorithm ................................................ 66  vi  List of Tables  Table 1.1. Protection layers of a chemical plant……………………………………………. 3 Table 1.2. The Engineering Equipment and Materials Users Association (EEMUA) benchmark [6].………………………………………………………………………………  6 Table 3.1. Continuous process measurements……………………………………………… 35 Table 2.2. Sampled process measurements………………………………………………… 36 Table 3.3. Estimated TE values of the TEP………………………………………………… 44 Table 4.1. Transfer Entropy Change as a Weighting Parameter Function…………………. 58 Table 4.2. Transfer Entropy Change as a Recycle Ration Function………………………... 60  vii  List of Figures  Figure 2.1. Causal map of a random process’ variables….…………………………………... 10 Figure 2.2. Detection of X having causal influence on Y………………………………..…... 11 Figure 2.3. The entropy H(q) of a coin toss with probabilities p=q and p=1-q [19] ………… 16 Figure 2.4. Representation of the conditional entropy and the mutual information [20] …… 20 Figure 2.5. Transfer Entropy Graphical Explanation [25]..………………………………….. 21 Figure 3.1. Data-based methods for causality detection…………………..…………………. 26 Figure 3.2. The Tennessee-Eastman Process Flowsheet [28] (© 1993 Elsevier).……………………....................................................................................................  33 Figure 3.3. Time Trends of Several Process Streams………………………………………… 40 Figure 3.4. Mean Testing for Stationarity Results…………………………………………… 41 Figure 3.5. Connectivity Graph of 22 Process Variables Based on Transfer Entropy Values………………………………………………………………………………………..  43 Figure 3.6. Transfer Entropy Values from Variable 1→6 and from Variable 6→1………... 45 Figure 3.7. Connectivity of Process Streams………………………………………………… 46 Figure 3.8. Connectivity Map of the Continuous Process Variables………………………… 48 Figure 4.1. A single tank with a recycle stream……………………………………………… 59   viii  List of Abbreviations  AR – Autoregressive  BPCS – Basic Process Control System DCS – Distributed Control System DTF – Directed Transfer Functions EEMUA – Engineering Equipment and Materials Users Association ESD – Emergency Shutdown GCCA – Granger Causality Connectivity Analysis MI – Mutual Information MIMO – Multiple-Input Multiple-Output MISO – Multi-Input Single-Output MPC – Model Predictive Control P&ID – Piping and Instrumentation Diagram PDC – Partial Directed Coherence PDF – Probability Density Functions PFD – Process Flow Diagrams SIS – Safety Instrumented System SISO – Single-Input Single-Output SMC – Sequential Monte Carlo TE – Transfer Entropy TEP – Tennessee-Eastman Process  ix  Acknowledgements  I would like to express my sincere gratitude to Dr. Bhushan Gopaluni for believing in me and accepting me into his research group. Dr. Gopaluni introduced me into the process control world and shared his knowledge with me. I appreciate his patience, experience,  guidance  and  understanding  that helped  me  considerably  throughout  my graduate school.  I am thankful for the friendly atmosphere in the process control research group. The research group members helped me any time I had questions related to the research and beyond that. Thanks to friends in Chemical and Biological Engineering department for providing information about graduate life and helping with any struggles I faced during my studies. Special thanks to Yiting and Roza for helping me with my thesis. My special thanks go to the Kazakhstani government for providing the Bolashak International Scholarship that funded 2 years of my graduate studies. I also would like to pay very special gratitude to my parents Uzakbay and Nazira and my brothers Yerzhan and Nurzhan for their support and being on my side at the times of difficulty despite being thousands of miles away from me.  I am grateful to all my friends in Vancouver who made me feel like home from the first day I arrived to Vancouver. Thank you for your friendship and helping me throughout our student lives. 1  Chapter 1: Introduction  1.1 Background and Motivation  Modern industries are becoming more complex, and legal requirements of process safety are becoming stricter. Society is more aware of possible hazards of industries. Thus, process safety is one of the main concerns of engineers. The process control system, instrumentation, and alarms play critical roles in ensuring the plant safety. However, dysfunction of these instruments is often a major reason leading a serious incident [1]. To keep high reliability and safety, recent industrial setups are equipped with highly automated diagnosis systems and are controlled by thousands of sensors. The sensors are set in different areas of a process plant; they observe physical and environmental conditions of the plant [2]. The main goal of the diagnosis systems is to trigger alarms when there is an abnormal situation in the process and maintain smooth performance. Operators are reported about any malfunction by devices called alarms. Alarms provide sufficient time to take proper actions to correct abnormalities and avoid huge catastrophes [3].  Finding faults on time is crucial to provide worker and plant safety, cost efficiency and products quality. If operators fail on finding the abnormalities, serious outcomes like human injuries and casualties may occur. When all of system’s locations are monitored, any event is detected well [4]. To operate the process reliably and efficiently, the alarms should be clear and relevant to the operator. They should indicate an abnormal process condition or event and have a defined response. The alarms should be unique for each case [5]. 2  Abnormal situations are detected and reported independently by many sensors causing nuisance alarms. Abundance of alarms leads to complexity of the alarm system. There are two most important qualities for alarms to be effective: to be well-designed and to be true. When the alarm system is designed well, operators are warned immediately about errors in the plant. Therefore, they correct the errors right away. The notification of abnormal situations must be accurate.  In other  words,  operator have to be provided  with  alarms  that are  necessary  and  meaningful, but  not  those  that  are  unnecessary, confusing or redundant. Too many and false alarms mislead operators while taking actions to correct the errors. Poorly designed alarms caused several accidents in the past [6].  Petrochemical plants suffer a major incident in every three years, on average. Inefficient monitoring leads to many of that incidents rather than equipment failures. Several examples of major incidents in petrochemical plants are:  The Deepwater Horizon oil spill in 2011;  The Buncefield fire in 2005;  The BP Texas city refinery explosion in 2005;  The Three Mile Island nuclear power plant accident. All of the given examples may have different causes; however, it is important to remember that most of the accidents could be prevented with better alarm systems and more careful monitoring. Therefore, an effective alarm system is vital [6]. Nine-tenths of industrial processes are non-linear, and it makes alarm design more challenging [45]. The main two challenges are:   Measurements are not necessarily Gaussian distributed;  Measurements are correlated. 3  A quality alarm is an annunciated  process  condition or  event  to  which  the  operator can  and  should  take  corrective action  in  order  to  return  the process to stable and safe operation. Because of nuisance alarms, they might get a false impression about the alarms. Under such tense conditions, their performance level decreases and might lead to serious consequences [7].  1.2 Process Safety Design  Table 1.1. Protection layers of a chemical plant 1 Process design Prevent accidents from happening 2 Basic controls process alarms, and operator supervision 3 Critical alarms, operator supervision, and manual intervention 4 Automatic action of Safety Interlock System (SIS) or Emergency Shutdown (ESD) 5 Physical protection (relief devices) Lessen the loss from accidents 6 Physical protection (dikes) 7 Plant emergency response 8 Community emergency response   4  In current industries process safety relies on several protection layers that are shown in the Table 1.1. The layers are independent from each other. The purpose of the layers is to handle an abnormal incident at the lowest possible layer [8]. The process design itself provides the first level of protection. The basic process control system (BPCS) is in the next two layers; they are extended with two levels of alarms and operator supervision or intervention. If a measurement has exceeded its specified limits, an alarm points it out and may require either operator intervention or an automated response. The fourth protection layer comprises of a safety instrumented system (SIS) and/or emergency shutdown (ESD) system. The SIS automatically corrects the fault when the process and BPCS layers cannot manage an emergency. For instance, if there is a high temperature in a chemical reactor, the SIS could shut off the reactant and catalyst pumps automatically.  There are passive relief devices, such as rupture disks and relief valves in the fifth layer of protection. They provide physical protection by preventing excessive pressures from being generated within process vessels and pipelines. During pressurization, the relief valve or rupture disk opens and fluid is released to a proper location like combustion flare stack, incinerator, scrubber, waste treatment facility, or the atmosphere. Those passive devices work on their own, independently of the SIS system.  Lastly, dikes are situated around process units and storage tanks to restrain liquid spills. When extremely difficult situations occur, emergency response plans should be used to address them, inform the nearby community, and implement evacuation plans. To sum up how the multiple-layer protection system works: “In well-designed and operated chemical processes, majority of malfunctions are contained by the first one or two independent protection layers. The middle levels protect against major releases and the farthest 5  from the center layers provide mitigation response to very rare major events. For major hazard potential, even more layers might be needed”. Obviously, as given in the Table 1.1, process control and automation have significant roles in securing and remaining process safety. Mostly, independent protection layers involve instrumentation and control equipment [1].  1.3 Alarm Flooding  Constant attempts to increase performance, safety, and profitability of today’s industrial processes have made the processes strongly interconnected and sophisticated. The high expectation obviously needs thousands of sensors to control the processes. The information from these sensors is vital for the performance and safety of the whole industry. Pneumatic controls were used in the past, and they were expensive. Because of the computer-based control systems, alarms are cheap these days. The number and frequency have increased, as a consequence. A new term, alarm flood, is needed to describe the excess alarms [9]. When operators in process industries get more than 10 alarms per 10 minutes, there is a crucial issue called the alarm flood. Alarm floods usually lead to emergency shutdown or huge plant upsets. The root cause of analysis of alarm floods history can be used as a signature and can be an advantage for operators to take corrective actions in the future similar incidents [4]. In case of emergency, alarms start from one variable and spread to other variables rapidly causing continuous alarms on a monitoring panel. This is one of the reasons for alarm flooding [10]. According to the Engineering Equipment and Materials Users Association (EEMUA), alarm floods are a main contributor to catastrophic events; operators are overloaded with alarm floods. The notable examples of financial loss, environmental damage, human injuries and loss 6  happened because of alarm floods as shown in the EEMUA Publication 191. There are numerous historical examples where large number of alarms has resulted in catastrophic events with significant human and material losses because operators were overloaded with excess of alarms could not determine correct action on time. These events often result in environmental pollution as well. In order to show the importance of this information, the following equation was proposed for managers who work in corporations, operations, industrial health, safety, and environmental departments [5]:  Floods ~ incidents ~ loss  Table 1.2. The Engineering Equipment and Materials Users Association (EEMUA) benchmark [6]  EEMUA Oil and Gas Petrochemical Power Average alarms per hour ≤6 36 54 48 Average standing alarms 9 50 100 65 Peak alarms per hour 60 1320 1080 2100 Distribution % (low/med/high) 80/15/5 25/40/35 25/40/35 25/40/35  7  The difference between industrial standards and real numbers of alarms in different process industries is given in the Table 1.2. The reality is far from the recommended standards; it means operators have to work on more alarms than expected [6]. Controlling the alarm floods result in less number of accidents, fewer loss, and lower risk. Alteration of a process or process equipment is a critical period for the process alarm system to function fully [34]. Those changes and alarm floods happen at the same time usually. The probability of alarm floods happening in a process control room is troublesome because of the following reasons:  Excess of alarms may lead to operators missing the critical alarms;  A deluge of alarms is a remarkable disturbance while the operator is dealing with the plant’s   upsets;  Alarm floods can be a demonstration of larger problems in the process safety system. Alarms are designed for one operational state – run. When the process changes its state, from run to upset or from run to shutdown, there will be alarm flood. The reason is operating criteria change according to process’s state change; it affects thousands of alarms. A lot of alarms sound in a small time interval when the process state changes. A couple of alarms in the beginning point out the event change and alert operators. As the alarms propagate through the process, excess of needless alarms are displayed in the controlling monitor. In case there already was an alarm flood, the new ones would be added to the existing alarms with no root cause difference. The operators struggle with evaluating and recognizing all alarms. To prevent alarm flooding and minimize the alarm numbers, different strategies like causality analysis and alarm rationalization have been offered by researchers. Alarm system rationalization is one of the main process industry concerns. Distributed control system (DCS) 8  stores huge amounts of data in historian that is used as an alarm rationalization resource. Process engineers are usually aware of the plant’s standard behavior; they are expected to know which data is fault-free and which data is faulty [5].  1.4 Outline of the Thesis  Chapter 2 provides preliminaries and literature review of causality detection and information entropy. To work with alarm flooding, data-based causality and connectivity detection methods and their requirements are explained here. The chapter includes information about the causality and connectivity detection method – transfer entropy (TE). Chapter 3 focuses on historical data-based transfer entropy. Description of the Tennessee-Eastman Process (TEP) is given because the process is used in the research as a case study. In the case study the process topology of the popular TEP is examined and is given. Chapter 4 explains about estimating transfer entropy based on process models. The particle filter or sequential Monte Carlo methods are considered to quantify transfer entropy. The case study shows how transfer entropy values vary by changing process variables. Chapter 5 discusses on conclusion and future work. A summary of work presented in the thesis is described and possible future research topics are discussed. 9  Chapter 2: Literature Review  2.1 Causality and Connectivity  Process units in today’s industrial processes are highly interconnected with each other and mutually dependent. Causality (or causation) and connectivity terms are introduced to describe the causality relationships between units or events. While causality focuses on cause-effect relationships between process variables, connectivity describes physical and information linkage in a process. In the context of process topology, connectivity relates to material and information flow connections between or within process units, or sensors, or actuators, or controllers. This linkage illustrates qualitative process knowledge without the need of first-principle models [11]. Process topology, a constructed network called a causal map, represents the cause-effect relationships. The network has nodes symbolize variables and arcs symbolize their causal relationships.  Because the causal map shows the direction of disturbance propagation, causality analysis is considered an effective way to spot root cause and to explore fault propagation pathways [12]. In the Figure 2.1, an example of an oscillating variables’ causal map obtained from a causality detection method is shown. The blue lines with arrows represent unidirectional causality and the red bold lines indicate bidirectional causality. The primary tools used in establishing connectivity are process flow diagrams (PFDs) and piping and instrumentation diagrams (P&IDs); thus we need to convert them into standard formats like adjacency matrices, digraphs, and semantic web models that are easily accessible and computer-friendly. Causality between process variables can be built through process data as well as process models. Thus, it can be described qualitatively, with certain quantitative 10  information, using structural equation models, matrices and digraphs, and matrix layout plots [11].   Figure2.1. An examples of a causal map of random process variables  Even though causality is in our routine life, it was not considered as a scientific term until a randomized experiment to test causal relations from data was formulated. The term causality between two random variables X and Y was first introduced by Wiener. There, X is a “cause” variable, and Y is the variable whose predictability is improved by including information about X. Wiener’s idea could not be used in practice at the time of its introduction due to the mathematical complexity of the idea.  However, much later, a method to analyze data observed in continuous time series was put forward by Granger. He used the idea of causality and identified its two details:  The cause occurs before the effect;  The cause has unique information about the effect.  11  With this definition of causality, the effect variable can be predicted by the cause variable. In other words, if the future of Y can be forecasted using the past information of both X and Y better than only using the past information of Y, the X variable causes the Y variable. The first experimental example of causality detection by analyzing consecutive time series was demonstrated by Granger. He formalized the causality identification idea in linear regression models by the following thought: we consider that there exists causality from random variable X to Y if the variance of the autoregressive prediction error of Y can be reduced by additionally considering the historical data of X. The definition is – if the variance of the autoregressive (AR) prediction error of Y at the present time is reduced by inclusion of past measurements of X, X is said to have a causal influence on Y.   Figure 2.2. Detection of X having causal influence on Y  According to the definition above, the time flow or temporary direction is the main import of the causality analysis. The interactions might be unidirectional and bidirectional. The direction of causality is asymmetric; thus X causes Y does not mean Y causes X.  Those interactions show the main differences between causal impact and relations represented by ordinary coherence and mutual information. Causality and correlation are not same things. If event X is correlated with Y, it means Y is correlated with X. However, if event X causes event Y, it does not imply Y causes X. Granger’s work motivated researchers to propose many kinds of methods to detect causality [13]. After that, some more advanced techniques for causality 12  identification have been proposed by different researchers, such as extended Granger causality, nearest neighbor methods [14] and transfer entropy (TE) [15]. Causality detection approach identifies the total and spurious causality which are overly sophisticated set of pathways; root cause diagnosis and propagation pathways examination might be erroneous because of the sophistication of the pathways. If direct and indirect, true and spurious causalities are distinguished from each other, the causal map would be more accurate and can be more easily used to reveal the purposes of the causality detection [13]. Causality analysis is applied for capturing process connectivity to find the fault propagation pathways and detect the root cause(s) of plant-wide defects and upsets. For example, there might be oscillations all over the plant. There are many potential methods to find the connectivity of the process, but no rule to determine which method to use in case of oscillations. Illustration of the causality analysis approach’s usefulness is important for giving suggestions on choosing the most appropriate method and providing directives on dealing with the common problems.  It is important to demonstrate the value and efficiency of the causality analysis methods on a benchmark industrial data set. It gives suggestions on how to select the suitable method and provides guidelines on how to deal with similar problems [16]. Causality and connectivity have a lot of potential applications, among which we focus on analysis and design of large-scale complex industrial processes. A direct application of establishing connectivity and causality is to build a topological model before parameter identification for complex industrial processes that are usually multi-input, multi-output (MIMO) systems with many internal closed loops. In abnormal situation management, process topology can be employed for root cause analysis. When an operator finds that something is wrong in a process, there may be multiple abnormal symptoms; to resolve this situation, the operator must 13  find the root cause promptly and eliminate it. Once the root cause is resolved, all the other issues caused by the root cause disappear accordingly. Process topology can be employed for risk analysis that is a way to examine a process to identify and evaluate problems that may represent risks. It can identify consequential alarms. Since a fault can propagate throughout the process, alarms also show up in a specific order. The list of consecutive alarms may be dependent; thus we call them consequential alarms leading to a dangerous situation as the console operators or engineers may not be able to identify true root causes. For this case, process topology would be of great help in describing the relationship between alarms. The most important task is to find the root cause of all related or consequential alarms. If the root cause is resolved, all of the alarms can be removed [11].  2.2 Information Entropy  Information metric, developed by E. Shannon, is a measure of information which is based on probability theory and statistics. The purpose of the term is to find fundamental thresholds on reliably storing and communicating data generated by independent samples with the given distribution. Information theory deals with estimating information of the disturbances mutually exchanged with random variables. Entropy is a probabilistic measure of uncertainty; information is a measure of that uncertainty’s decrease. In other words, statistical entropy is a single random variable’s main information measure. Entropy quantifies the uncertainty associated with predicting a random variable’s estimation; thus it is a random variable’s probability distribution property.  Information I of any discrete event x is:  14                   (2.1)  Where, p(x) is the probability distribution of the event x. The entropy HX of a random variable is:       ∑                              (2.2)  Where, continuity is assumed by assuming 0 log20 = 0 [20]. We shall also use the notation H(p) whenever we want to stress the dependence of the entropy upon the probability distribution of X. The logarithm to the base 2 is used because it is well adapted to digital communication. Thus, “bits” is used to indicate the entropy. In case the natural logarithm (to base e ≈ 2.7182818) is used, entropy is measured in “nats”. Bits differ from nats by a global multiplicative constant that is equal to units’ variation [17]. For evaluation of two random X and Y variables’ joint distribution uncertainty the joint entropy is studied:          ∑                    (2.3) When one of the variables Y is given, the conditional entropy of the variable X is: 15           ∑                    (2.4) Here, the conditional entropy is the average uncertainty about x that remains when y information is known [24]. The information I received from an observation is equal to the degree to which uncertainty is reduced [18]:                            (2.5)  The idea of information theory was first developed for thermodynamic studies. L. Boltzmann and W. Gibbs worked on most computations of the information theory with different possible events for the thermodynamics. Although information-theoretic entropy (denoted by H) and thermodynamic entropy (denoted by S) are different terms in studies, there are connections between them [17].  Missing information is entropy’s another name. Thus, when the random variable’s value is more “stochastic”, the entropy has higher value. Entropy, generally saying, is the logarithm of the variable’s typical values.  To get a better idea about the entropy and uncertainty, a coin toss example is described. The maximum uncertainty of a coin toss happens when the coin is fair; the probability of heads and tails are equal. The entropy of the coin toss reaches the maximum. With a fair coin, prediction of the outcome of the next toss is very difficult because both head and tails are equally 16  likely. The possibility of the coin coming up heads is ½, and that is the best prediction one can achieve by randomly guessing. Because of two possible results with the same probability, one binary bit has a log22=1 Shannon or bit entropy.  In case of double-headed or double-tailed coin toss, the entropy is zero. The reason is that the coin will always come up heads (or tails) only; there is no uncertainty. The prediction can be made with 100% accuracy, and the coin toss does not deliver any new information.   In case of an unfair coin toss, the outcomes still come up heads or tails with probabilities p and q with, p≠q. Thus, uncertainty of the outcome is less than in the fair coin toss. One side of the coin comes up more often than the other side. This leads to lower entropy, less than 1 bit of information in general [19]. In the Figure 2.3 a plot of the entropy H versus the probability p is depicted.   Figure 2.3. The entropy H(p) of a coin toss with probabilities p and 1-p [19] 17  The following characteristics [20], [21] of entropy H of event X (HX) make it important to use as a measure of information: 1. HX ≥ 0. 2. HX = 0 when the random variable X definitely takes one value with probability 1. 3. When x events in any X set with M elements are equally likely to occur, i.e. p(x)=1/M, their entropy H reaches the maximum. The equation of the entropy is:            (2.6)  4. For two independent random variables X and Y with probability pX,Y(x, y)=pX(x)pY(y), their total entropy is:        ∑                         ∑          (                   )            (2.7)  5. General entropy for any pair of random variables satisfies the following property:             (2.8)  The outcome can easily be generalized to n variables.  18  6. Entropy of events in any finite set can be added together. For example, consider a finite set of events X, and decompose it into X = X1∪X2, where X1∩X2=∅. Take the probability of X1 and X2 as q1 = x X1p(x) and q2, respectively. The conditional probability of x in each x   X1 and x   X2 are r1(x) = p(x)/q1 and r2(x), respectively. The total entropy can be written as the sum of two contributions:      ∑                           (2.9)  where:                          (2.10)          ∑                      ∑                    (2.11)  Information content is mathematically expressed by several concepts related to entropy such as the self-information of an individual message or symbol taken from a given probability distribution, the entropy of a given probability distribution of messages or symbols, and the entropy rate of a stochastic process. A measure of information between two random variables is mutual information (MI). It is a property of the joint distribution of those random variables, and is the maximum rate of reliable communication across a noisy channel [19]. 19  The mutual information between jointly distributed discrete X, Y random variables is calculated by the following formulas:          ∑                                                                         (2.12)  Between continuous X, Y variables with probability function f(x,y), the mutual information is found by:         ∬                            (2.13)  For the X and Y variables it is possible to represent the different entropic quantities with an analogy to set theory. The Figure 2.4 below describes various entropic quantities and explains that mutual information is the uncertainty that is common for both of the variables [20].    20   Figure 2.4. Representation of the conditional entropy and the mutual information [20]  2.3 Transfer Entropy  Alarms are important to control and safely operate a process; however, they propagate from one variable to other variables causing alarm flooding. As discussed in the previous chapter, alarm floods fill the monitoring panels with messages and hinder operator’s work. Engineers propose different ways to prevent alarm flooding. One way of tracking down the origin of the alarm flood is by using process topology. A causal network, that has links between two nodes, is constructed using the results of the causality analysis between a pair of variables. A causal network for multivariate systems is constructed by applying a statistical confounding examination. Causality analysis helps to extract process topology from data of any given variables. Different methods exist for quantifying causality between the variables; transfer entropy analysis is a well-known method that can be used with both historical data of the process, and/or it’s physical models. The result of the transfer entropy estimation describes causality relationships between all variables [22]. 21  Transfer Entropy (TE) is a method that effectively quantifies the causality and connectivity between any two variables of a process. The method can be used for both linear and nonlinear process models. When a complex system’s dynamics are not known, the first choice to use is the TE method. The TE method was offered by Schreiber to differentiate driving elements from responding elements in 2000 [26]. The amount of information transferred from an x variable to any y variable is calculated by the TE. The result is entropy that explains causal and connectivity relationships between the process variables [15].  The TE has reached success in different study areas like chemical engineering and neurosciences. Based on random variables, the TE has two forms: discrete TE (TEdisc) for discrete random variables and differential TE (TEdiff) for continuous variables [23].   Figure 2.5. Transfer Entropy Graphical Explanation [25]  22  Information transfer is defined as the amount of information that a source provides about a destination’s next state that was not contained in the destination’s past. This definition pertains to Schreiber’s transfer entropy measure [24]. Graphical explanation of the transfer entropy is depicted in the Figure 2.5. Consider two random variables X and Y. They are sampled at           time instants. Symbols of the variables are xtϵ[xmin,xmax] and ytϵ[ymin,ymax]. At h steps in the future, xt+h is the value of X at t+h time instant. Here, h is the prediction horizon. The embedded vectors with the past values of X and Y are given by xt(l)=[xt,xt- τ,…,xt-(l-1) τ] and yt(k)=[yt,yt- τ,…,yt-(k-1) τ], respectively. Here, l is the embedding dimension of X and k is the embedding dimension of Y. τ is the time interval that allows sampling the embedded vectors [26].  For continuous variables, the equation for the differential TE (TEdiff) from X to Y calculation:        ∫                                                          (2.13) Where,                     is the joint probability density function (pdf) of yt+h, yt(k), and xt(l);                     is the conditional probability density function (pdf) of  yt+h, yt(k), and xt(l);               is the conditional pdf of yt+h and yt(k )variables. Moreover, w represents a random vector [yt+h,yt(k),xt(l)]. When the elements of w are considered as w1, w2,…,ws, then ∫      =∫  ∫                    . Again, k is the embedding dimension of X and l is the embedding dimension of Y. 23  The numerator of the logarithm,                    , represents the prediction result of the Y value when the historical data of X and Y are both known. The denominator of the logarithm,              , represents the prediction result of the Y when the historical data of  Y only is known. The idea of the TE method is: if there is causality and connectivity relationship from X to Y, it is useful to predict the value of Y using the historical data of X. The numerator value should be higher than the denominator value. The value of the whole estimation of the TE should be greater than 0.  TEdiff is a measure for continuous random variables. In real industrial processes, sampled time series data are used to describe continuous variables. The discrete TE formula, TEdisc is needed to estimate the sampled data’s entropy transfer [15]. There are several methods of TEdisc approximation of TEdiff.  In order to estimate TEdisc, x and y variables have to quantized first. Let  ̃ and  ̃ denote quantize x and y continuous variables [12]. The supports of x and y, i.e., xtϵ[xmin,xmax] and ytϵ[ymin,ymax] are classified into nx and ny nonoverlapping bins. The bin sizes of x and y are ∆x and ∆y, respectively. For instance, a uniform quantizer of x:                       (2.14) The bin size is related to the variable supports [xmin,xmax] and [ymin,ymax] and the bin number. For a given variable support, the larger the bin number, the smaller the quantization bin size. The smaller the bin size, the more accurate is the quantization. It means TEdiff and TEdisc 24  values are closer. When the quantization bin sizes of both x and y approach 0, TEdiff from x to y is the same as TEdisc. When nx and ny quantization bin numbers increase, the computational complexity of the summation and the probability estimation will increase significantly. There is a compromise between the quantization accuracy and the computational burden in TEdisc estimation. In real processes, it is difficult to satisfy the quantization bin sizes approach 0.                          (2.15) The TEdisc from  ̃ to  ̃ is approximated by the following formula:     ̃  ̃  ∑   ̃     ̃      ̃           ̃     ̃      ̃        ̃     ̃       (2.16) Where,    ̃     ̃      ̃      is the joint probability distribution;    ̃     ̃      ̃      and    ̃     ̃      are conditional probabilities. The sign of sum means k+l and sums all amplitude bins of the joint and conditional pdfs [23].    25  Chapter 3: Causality Detection via the Transfer Entropy Method  3.1 Data-based Causality and Connectivity Detection Analysis  Causal relationships and effective connectivity are measured by different quantitative and qualitative methods. The quantitative methods use random variable’s information concept to calculate the causality relationships [26]; the qualitative methods use process models to determine causality and connectivity of the process variables [35]. Both methods have benefits and drawbacks. Qualitative methods, like digraph based models, are popular in root cause and hazard propagation investigation. The causal relationships between process variables and their propagation pathways are defined by considering available knowledge of the process [36]. However, expert knowledge is not always easily accessible, and extracting the knowledge about nonlinear processes of a large industry requires a lot of time and is prohibitively expensive. The digraphs are modeled by mathematical equations. The large and integrated processes’ mathematical models are highly complicated and are rarely practical or precise [37]. Data-based approaches may be more suitable in such cases.  Another way of causality detection is data-driven methods like directed transfer functions (DTFs), partial directed coherence (PDC), Granger causality, and path analysis [23]. A valuable source of information for modelling and analysis is data. Process data is determined by values of process variables over a timeseries. A variety of data-based methods are utilized for causality detection between process variables. The methods can be divided into three main categories. The Figure 3.1 below explains the categories. 26    Figure 3.1. Data-based methods for causality detection  The main three classes of causality detection are: lag-based methods like the Granger causality and transfer entropy; conditional independence methods, like the Bayesian network; and higher order statistics, like Patel’s pairwise conditional probability approach [13]. In this thesis the most commonly used lag-based method is considered. Since the method is chosen based on data characteristics, it is very important to know as much as possible about the data. The knowledge about the characteristic features of data helps to make a better choice of algorithms for transfer entropy estimation. Generally, the time series data are noisy and nonstationary. The Granger causality method is the first choice for linearly dependent data; it can be used for cases without linear dependence but with strong linear components. However, in the case of nonstationary, nonlinear, or non-Gaussian data, linear methods should not be used to analyze them [38]. Transfer entropy (TE) detects causality and connectivity based on information between two process variables and their conditional probabilities. Usually, TE is a data-driven method 27  that is suitable for both linear and nonlinear relationships and is used in chemical processes and neurosciences [26]. For linear and Gaussian data, TE is as good as Granger causality method. For causality detection, the first choice might be TE method when the complex system’s dynamics are unknown [23]. TE requires large volumes of data to estimate causal relationships. Taking into account the stochasticity of the data, the TE estimation algorithms are computationally inefficient [22]. The sampled data that are used for causality and connectivity analysis must be stationary and the data length must be large. When the system’s dynamical properties do not change during the observation period, the data are considered stationary. When the measurements from the system are inaccessible and the stationarity of process variables is doubtful, the dataset can be tested for stationarity [39]. The simplest and most popular method of testing for stationarity is measuring the mean and the variance for several parts of the dataset. A standard statistical hypothesis is used to see if the mean and the variance change over time. Less explicit non-stationarity of the dataset is estimated by quantities like spectral components, correlations or nonlinear statistics [40]. The data set is divided into n consecutive segments denoted by               , denoted by X1,X2,...,Xn, each containing s data points. Let µj denote the mean value of Xj,              and denote  ̅  ∑         , then the standard error of the estimated mean  ̅ is given by    √∑      ̅             (3.1)  28  where the standard deviation divided by an extra √  is the error when estimating mean value of Gaussian distributed uncorrelated numbers. The null hypothesis for stationarity testing is that the data set is stationary. The significance level for the mean testing is defined as  |    ̅|                     (3.2)  A six-sigma threshold for the significance level is chosen here. Specifically, if there exists     ̅     or     ̅     for            then the null hypothesis that the data set is stationary is rejected. If  ̅      ̅   ̅     holds for all js, then the null hypothesis is accepted that the data set is stationary.  For variance testing, let   ̂          denote the normalized data set of Xi, and   ̅̅̅̅    ̅̅̅̅     ̅  denote the corresponding consecutive segments, then we have   ̅̅ ̅   ̂           ̂            ̂    for          . Since the sum of squares of the elements in each segment has the chi-squared distribution with s degrees of freedom  ̂   ̂            ̂              ̂        , we can check whether or not the data is stationary by  ̂  with       . If there exists  ̂         for          , then the null hypothesis that the data set is stationary is rejected with (1-α)*100% confidence. If  ̂         for all js, then the null hypothesis is accepted [40]. TE suffers from highly nonstationary data because probability density estimation will be biased. Granger causality method assumes covariance stationarity; there is the asymptotic theory for that case. The Granger causality connectivity analysis (GCCA) toolbox in Matlab offers 29  examining shorter time series and differencing, in order to deal with non-stationarity [41]. However, these methods bring along new problems. Detrending and demeaning the data are two other methods often used. If the non-stationarity is because of the slow change of the parameters, it is usually not a problem [38].  Because of normal operational changes and production strategy changes, industrial processes encounter multimodal probability density functions. A large dataset for multimodal processes is usually nonstationary. The data change based on transitions from one mode to another. Since stationarity is a key assumption for TE method, multimodality has to be handled. The data have to be divided into several segments relative to different modes. For the time series segmentation, several time-series analysis methods are offered. When the segments correspond to the different modes of the process, causality detection is applied to appropriate sections. Causality and connectivity might change when the process mode changes [42].  3.2 Calculation Method  In this chapter, causality and connectivity relationship between process variables is quantified using the historical data of the process. As discussed before, there are different methods to quantify the causality. The TE from X to Y is understood as the information gained when using the past information of both X and Y to predict future of Y compared to only using the past information of Y. Mathematical definition of the differential TE is:  30         (          ) [   [                       ]]  ∫    [                       ]                  (3.3)  Here,                                          . Since there is an integral in the equation, X and Y are considered to be continuous random variables. For discrete random variables, the integral is replaced with summation over the sample space. The discrete form of the TE equation is expressed equivalently [26]:       ∑                                         (3.4) The main difficulty is estimation of conditional probability density function from the historical data.  Bayes theorem states that if the joint probability p(x,y) and marginal probabilities p(x) and p(y)  of two variables are given, the following relation is applied to find conditional probability of the variables:                         (3.5) Moreover, joint probability of two variables can be found by conditional probability and marginal probability of any variable [43].  31                     (3.6) By using the Bayes theorem, the TE equation that is expressed with conditional probabilities is simplified to:       ∑                                                      (3.7)  The joint probabilities of process variables are calculated by the kernel estimation [44]. For any q-dimensional multivariate data, the joint probability is found by the Fukunaga method using the following function:                    ∑ {                   }      (3.8) Here, S is the covariance matrix of the data. N is the number of samples. Xi is a q-dimensional vector that constitutes X1,…,XN. x is the q-dimensional vector [x1, x2,…,xq]T. Г is the chosen bandwidth to minimize the mean integrated squared error of the joint probability density function estimation. The bandwidth is estimated by:                .  {                   } is a kernel estimation. The estimation uses the Gaussian kernel function:                       (3.9) 32  The TE calculation complexity comes with the kernel estimation of joint probability density functions and conditional probabilities. For any q-dimensional joint pdf, the calculation complexity is equal to O(N2q2). The number of samples and the embedding dimension determine the speed of calculation. A large data set and more than 2000 observations are often needed to obtain reasonable accuracy of the density approximation [12]. Large data sets make the computation more complex and slow the computation speed. Thus, the embedded dimensions must have an upper limit.  Complex, large scale systems can be broken down into smaller units and examined for causal relationships within each unit and between different units. The causality and connectivity relationship diagram of the whole process is eventually established by putting together the causality maps from smaller units [23].  3.3 Tennessee-Eastman Process  The Tennessee-Eastman (TEP) process is a famous benchmark in process control and simulation studies. The process was first published by Downs and Vogel in 1993. There are 41 measurable variables and 12 manipulated variables in the process; thus, it is a large and challenging process. The 41 measurements are a mixture of continuous and sampled measurements. Basically, there are two products (G and H labels) from four reactants (A, C, D, and E labels) [27]. In addition, several inert and byproducts are  present (B and F labels).33   Figure 3.2. The Tennessee-Eastman Process Flowsheet [28] (© 1993 Elsevier)34  The following reactions occur in the process:  A(g) + C(g) + D(g) → G(liq), Product 1 A(g) + C(g) + E(g) → H(liq), Product 2 A (g) + E(g) → F(liq), Byproduct 3D(g) → 2F(liq), Byproduct  They are all irreversible and exothermic reactions. With regard to the reactant concentrations, the reactions are first-order. Their reaction rates depend on temperature through the Arrhenius theory. To produce G, the reaction uses more activation energy than others. This leads to more sensitivity to temperature.  The TEP contains 5 unit operations: a reactor, a product condenser, a vapor-liquid separator, a recycle compressor and a product stripper. An illustration of the process is given in the Figure 3.2. To produce liquid products, the gaseous reactants are fed to the reactor. A dissolved non-volatile catalyst is used in the gas phase reactions. To remove the heat of the reaction, an internal cooling bundle is set in the reactor. The produced vapors go from the reactor together with the unreacted reactants, leaving the catalyst in the reactor. The stream mixture from the reactor gets through a cooler and a vapor-liquid separator.  In the cooler the stream is condensed. Uncondensed part of the stream goes back to the reactor feed, through a centrifugal compressor. Condensed part of the stream passes to a product stripping column. In the stripping column, the product stream is stripped with feed stream number 4 and remaining reactants are removed. G and H products are separated in a downstream refining section, after leaving the stripper base. 35  The inert and byproduct are removed from the system from the vapor-liquid separator [28]. Each measurement is corrupted by additive noise; the statistical properties of the noise are unknown. The process is nonlinear, open-loop unstable, and contains a mixture of fast-slow dynamics [27]. As any industrial process, all process variables of the TEP are connected with each other. Since the process is huge, this thesis aims for the connectivity of the TEP derived from the historical data of 22 continuous process variables. In the Table 3.1 and Table 3.2 41 process variables are listed dividing into categories of the process variables like continuous process variables and sampled process measurements [28] (© 1993 Elsevier).   Table 3.1. Continuous process measurements Variable Name Variable Number A feed (stream 1) XMEAS(1) D feed (stream 2) XMEAS(2) E feed (stream 3) XMEAS(3) A and C feed (stream 4) XMEAS(4) Recycle flow (stream 8) XMEAS(5) Reactor feed rate (stream 6) XMEAS(6) Reactor pressure XMEAS(7) Reactor level XMEAS(8) Reactor temperature XMEAS(9) Purge rate (stream 9) XMEAS(10) Product separator temperature XMEAS(11) 36  Variable Name Variable Number Product separator level XMEAS(12) Product separator pressure XMEAS(13) Product separator underflow (stream 10) XMEAS(14) Stripper level XMEAS(15) Stripper pressure XMEAS(16) Stripper underflow (stream 11) XMEAS(17) Stripper temperature XMEAS(18) Stripper steam flow XMEAS(19) Compressor work XMEAS(20) Reactor cooling water outlet temperature XMEAS(21) Separator cooling water outlet temperature XMEAS(22)   Table 3.2. Sampled process measurements  Component Variable Name   Reactor feed analysis  (stream 6) A XMEAS(23) B XMEAS(24) C XMEAS(25) D XMEAS(26) E XMEAS(27) F XMEAS(28) 37   Component Variable Name     Purge gas analysis A XMEAS(29) B XMEAS(30) C XMEAS(31) D XMEAS(32) E XMEAS(33) F XMEAS(34) G XMEAS(35) H XMEAS(36)   Product analysis D XMEAS(37) E XMEAS(38) F XMEAS(39) G XMEAS(40) H XMEAS(41)   As with any other chemical processes, the control objectives for this process are:  1. Keeping process variables in an acceptable range;  2. Maintaining process operating conditions within the limits of equipment;  3. In case of disturbances, decreasing variability of product rate and product quality (stream 11). 4. Minimize valves’ motions that have an effect on other processes. 38  5. Fast and smooth recovery from disturbances and changes in production rate or product mix [28]. According to the Tennessee Eastman process authors, the problem is a testbed for different topics including plant wide control strategy design, multivariable control, optimization, predictive control, estimation/adaptive control, nonlinear control, process diagnostics and education, etc. The TE problem was used by a number of researchers for different purposes.  McAvoy and Ye presented a systematic approach to developing a plant-wide decentralized control system design. This design is based on multiple single-input-single-output (SISO) control loops.  The  resulting  design  can  form  the basis  upon  which  an  advanced  control  scheme such as  predictive  control can  be  built.  In  addition,  it  can also  be  used  to  compare  the  advantages  of  employing  other  more  advanced  control  approaches [29]. Multi-input single output (MISO) models for process characteristics like production rate and key compositions were established by Sriniwas and Arkun. The production rate and key compositions were used in an MPC algorithm [33]. Performance and robustness of the control systems were examined by Farina, Trierweiler, and Secchi. The authors applied the “Robust Performance Number” [30]. The physical model of the TEP was obtained by Ricker and Lee in 1995. They used the obtained model for development of a nonlinear Model Predictive Control (MPC) algorithm [31]. Ricker, moreover, studied optimization of the steady-state conditions for the six modes of operation by using economic data [32].  The resulting applications were accentuated more than the properties of the identified models by the research studies.  39  3.4 Case Study  A simulation case study that explains how the transfer entropy method shows connectivity of a process is given in this section. The Tennessee-Eastman process is considered as a case study. More detailed information about the process is given in the Section 3.3. The Tennessee-Eastman process data was generated using Ricker’s MATLAB Simulink model [1]. The simulation data consists of 7201 samples; however, only 3000 samples are considered here. We considered the data is Gaussian distributed. To ensure stationarity of the data, first 3000 and last 1201 data were removed. Those samples are not stationary because of the startup and process mode transition. Overall, there are 41 variables in the TEP process not including 12 process manipulated variables and 20 process disturbances. Even though manipulated variables and process disturbances affect connectivity of the process variables, we did not consider in the case study.  In this study, only 22 continuous variables (Table 3.1) of the process are considered. The transfer entropy estimation was not conducted on 19 sampled process measurements since the process is huge. Figure 3.3 shows the normalized time trends of several process variables including 1) reactor feed rate, 2) recycle flow, 3) product separator underflow, and 4) stripper underflow.   40   Figure 3.3. Time Trends of Several Process Streams: 1) Reactor Feed Rate, 2) Recycle Flow, 3) Product Separator Underflow, and 4) Stripper underflow  Even though the variables look stationary from the graph, the data have to be tested for stationarity. Using equation 3.1 and 3.2, the 3000 samples were divided into 6 segments that contain 500 samples. The mean test shows the data set we use is stationary. All variables are stationary; to save the space mean test results of only several process streams including 1) reactor feed rate, 2) recycle flow, 3) product separator underflow, and 4) stripper underflow are shown in the Figure 3.4. The solid lines show the six-sigma thresholds, and the stars represent mean values of the variable segments.  The mean values of each data set segments are within their respective thresholds confirming the stationarity.  41  1  2  3  4  Figure 3.4. Mean Testing for Stationarity Results 1) Reactor Feed Rate, 2) Recycle Flow, 3) Product Separator Underflow, and 4) Stripper Underflow  Based on the estimated transfer entropy values of all process variables, causality and connectivity between all 22 continuous variables of the process are shown with arrows in the Figure 3.5. The blue lines show unidirectional connections; the red lines show bidirectional connections. There are several process variables situated in the same process units like 7 – reactor pressure, 8 – reactor level, 9 – rector temperature, and 21 – reactor cooling water outlet temperature. Studying the connectivity graph of 22 variables of the TEP on the process flow sheet would be difficult because of high interconnection and mutual dependency. Thus, process 1 1.5 2 2.5 3 3.5 4 4.5 5 5.5 647.5147.5247.5347.5447.5547.5647.5747.5847.5947.647.61SegmentsMean Values1 1.5 2 2.5 3 3.5 4 4.5 5 5.5 632.3232.3432.3632.3832.432.4232.4432.4632.4832.5SegmentsMean Values1 1.5 2 2.5 3 3.5 4 4.5 5 5.5 625.1925.225.2125.2225.2325.2425.25SegmentsMean Values1 1.5 2 2.5 3 3.5 4 4.5 5 5.5 622.8722.87522.8822.88522.8922.89522.922.90522.91SegmentsMean Values42  variables are numbered to make the graph less confusing.  Explanations of the numbers are listed in the following order in the Table 3.1. The transfer entropy values of all 22 variables are shown in the Table 3.3. In order to see where the information and material flows go next, connections between two variables are compared with each other. The stronger connection is considered to have higher transfer entropy value. Matlab results of the transfer entropy estimation using highly stochastic data of the nonlinear TEP are shown in the Figure 3.6. y-axes of the figure show unpredictability values of the variables; x-axes show indices of corresponding time lags. All other variables show similar graphs based on their TE. From the figure, it is shown that there is a stronger connection from the variable 1 (A feed) to the variable 6 (reactor feed) rather than vice-versa. Strongest connections between variables are shown at different y time lags. UP values show the unpredictability of variables when the past values of one variable and both variables are given, respectively. As it was explained in the Figure 2.5, the TE value of the variables is the difference between uncertainties given past values of one or both variables. 43   Figure 3.5. Connectivity Graph of 22 Process Variables  Based on Transfer Entropy Values 44  Table 3.3. Estimated TE values of the TEP   45     Figure 3.6. Transfer Entropy Values from Variable 1→6 and from Variable 6→1  Since the process is extremely complex and large, the graph of the connectivity relationship is very confusing. Moreover, all process units and variables are interconnected with each other; there might be connection through intermediate variables between any 2 variables. Based on the in-depth explanation of Downs and Vogel [28] about the TEP, the theory-based connections of process variables are drawn to understand where the information flow goes next. Purpose of the case study is to compare the theory-based connections with the transfer entropy based connections.  46   Figure 3.7. Connectivity of Process Streams 47  The bold blue lines represent the estimated connectivity of the streams based on the stronger connections between two variables. The estimated connectivity relationships match almost every theory-based connectivity relationships except from stream 10 to stream 9 and from stream 11 to stream 10. Connectivity graph of 22 continuous variables of the TEP are given in the Figure 3.8. The red lines represent theory-based connectivity of the variables; the bold blue lines show connections of process variables based on the estimated TE values. Instead of the process flow sheet, numerical order is used to avoid difficulty of seeing connections. The theoretical and estimated graphs are adequately similar. The difference is in seven directions: 5→12, 6→3, 6→8, 10→14, 17→14, 19→14, and 19→15.  The reason of the mismatch between theory-based connections and TE-based connections might be approximation errors during the calculation. Among estimated transfer entropies of process variables, negative values are observed (theoretically TE is supposed to be between 0 and 1). The reason might be the data is stochastic and the process is highly nonlinear with several feedback loops including recycle and control loops in it. Moreover, error cause might be in the approximation of differential TE (TEdiff) to discrete TE (TEdisc). It is important to approximate the discrete transfer entropy for applying on discrete historical data. Even though the normalized TE values do not give the overall process topology directly from the data because of the aforementioned reasons, they help to show the process connectivity by showing where the information or material flow go next. Overall, transfer entropy estimation the result is satisfactory for reproducing the theory-based connectivity of the TEP. 48   Figure 3.8. Connectivity Map of the Continuous Process Variables49  Chapter 4: Transfer Entropy Using Sequential Monte-Carlo Method  4.1 Transfer Entropy Using Sequential Monte Carlo Method  The Transfer Entropy (TE) method is calculated using probability density functions (pdfs) of process variables at different time instants. Usually, the processes are complex; the pdfs are high dimensional. Thus, data-based causality detection methods might have errors and be less efficient. When the used data are nonstationary, non-Gaussian, and nonlinear, the probability of getting wrong result from the only data-based TE method is high. To detect causality, the TE method based on models and particle filters is proposed in this chapter. Integrals of the process variables density functions are approximated by particle filtering algorithms. The transfer entropy method using Sequential Monte Carlo (SMC) method finds qualitative causality relationships between process variables by also providing their quantitative strength. This approach uses information from the process models and tries to improve the TE calculation accuracy and speed. By using the physical models of the process and particle filters, the approach can generate the probability directly from the model; thus, there is no need to estimate the pdfs of each vector. Particle filters or sequential Monte Carlo methods based on particles that depict probability densities. The method aims to approximate density functions of the filtering problem by spreading particles. The particles should follow the process dynamic equations [45]. The methods are applied to any state-space model and generalize the traditional Kalman filtering methods [46]. Sequential Monte Carlo methods are simulation-based methods that are 50  convenient and flexible for the posterior distribution calculation [47]. They are a set of Bayesian methods that are used for nonlinear and non-Gaussian filtering problems [48].  4.2 Methodology  Consider any chemical process with the discrete-time stochastic nonlinear state-space model:                   (4.1)  where          is a Markov state process defined the initial density X0~p(x0) and transition density Xt|(Xt-1,Ut-1)~p(xt|xt-1,ut-1).           is a nonlinear state feedback control law. The control law is given by:                (4.2)  where        is random noise sequence. Suppose the noise sequence to be known in its distribution.  The mathematical definition of the TE method formula is discussed in the previous chapter. The symbols of variables are changed because of the process model and control law symbols.  51                    [   [                       ]]  ∫    [                       ]                  (4.3)  Here,                                           .  Next, the particle filter method is used for TE approximation. Because of the Law of Total Probability, the TE formula is written as below:       ∫    [             ∫                 ]                  (4.4)  Here, ∫                 is shown to designate           . By using conditional probability, the equation 4.4 is rewritten as:       ∫   [             ∫                       ]                  (4.5)  Here,           equals to             which is the control law’s distribution.  In order to get the TE’s approximation by particle filter methods, particle sets have to be generated distributed in obedience to               and          with respect to respective integrals in the equation 4.5.   52  4.2.1 Approximation of                The joint density function               can be written by using conditional probability:                                                                  (4.6) Here,               is a state transition density function that is given in the equation 4.1.          is the control law that is given in the equation 4.2. Now using the Law of Total Probability the joint probability               is written as:                                      ∫                                 There is a joint probability in the integrand in the equation 4.7. Appealing to conditional densities, it is rewritten as:                                      ∫                                                                           (         )               (4.8)  Suppose a particle approximation of                    is known. The approximation is given as:  53   ̃                    ∑                                            where {               }                     is a pool of N particles distributed according to the respective joint probability.  ̃                  is a N particle approximation of                   . Then, the distribution function                    is marginalized over XT.   ̃              ∫  ∑                                        ∑                              (4.10)  where  ̃              is a N particle approximation of               .  Substitute the equation 4.10 into the equation 4.7:  54   ̃             ∫                                       ̃               ∫                                       ∑                                ∑                                             (4.11)  where  ̃             is a N-particle approximation of               which is given in the equation 4.8. All density functions are either state transition or control law density functions. N random particles {             }                   that are distributed according to               are sampled in a sequence of the following steps: 1) Pass the particle set {           }     through                   to generate a particle set {   }    ; 2) Pass {   }     through           to generate a set {   }    ; 3) Pass {       }     through                 to generate a set {     }    . When the particle set {             }                   is ready, write:   ̃                  ∑                                  (4.12) 55  The equation 4.12 provides the required particle set distributed according to the initial density function. It is clear that from the equation 4.9 to 4.12 a recursive approach to generate the required particle set {             }                   for all time instants       4.2.2 Approximation of           Suppose, once again, the particle set {               }                     is available in the equation 4.9. After marginalization of the equation 4.9 over Xt-1and Ut-1:   ̃      ∫  ∑                                      ∑                  (4.13) where  ̃      is a N particle approximation of the        distribution. For each particle in the set {   }           , a set {   }                can be generated. The set is distributed according to  (  |   )density function by M random sampling in the equation 4.2. Once the set {   }       (  |   ) is known:   ̃(       )  ∑                 (4.14) where  ̃(       ) is a M particle approximation of  (       ). 56  After the particle approximation of p(               and  (       ) densities are available in the equations 4.12 and 4.14, the approximation of the TE in the equation 4.5 can be calculated. For calculation, substitute the equation 4.12 into 4.5:   ̃    ∫   [             ∫                       ]  ̃                ∫    [             ∫                       ]    ∑                                ∑   [ (     |       )∫  (     |      ) (   |   )]      (4.15) There is another integration in the equation 4.15. Thus, it has to be substituted into the equation 4.12. The approximation of the real Transfer Entropy by Monte Carlo method is:   ̃    ∑   [ (     |       )∫  (     |      ) ̃(   |   )]     ∑   [ (     |       )∫        |     )∑               ]     ∑   [                ∑                     ]      (4.16) Where  ̃    is the approximation of real Tu→x. The Particle Filter for Transfer Entropy approximation algorithm is shown in the Appendix 1.  57  4.3 Case Study  The overall calculation procedure is explained in the general rules below. 1) Particle filters are used for different density functions approximation. The particles of variables approximate the respective density function. The number of particles is known a priori; it is between 200 and1000 for every density functions. 2) ut, xt, and xt+1 process variables are indicated by a n × n × n matrix.  3) The joint probability p(uk,xk) is calculated using                      ∑ {                   }     (3.8)   4) Conditional probability p(xt+1|ut,xt) is found by using the process models regardless of noise. 5) Conditional probability p(xt+1|ut,xt) is calculated based on the examination:                                  (4.17) The calculation is based on the ut pdf.  Probability density functions do not all have same dimensionality. Accuracy of the particle approximation depends on particle numbers and pdfs dimensionality. For each density function, appropriate particle numbers should be chosen.  58  4.3.1 A Simple Nonlinear Process  The equations below describe the process we consider as a case study:                                      (     )            (4.18)  Here vt and wt are two Gaussian distributed noises, their distribution is vt, wt ~ N(0, 0.1). x01 and x02are equal to 0.   is a weighting parameter that balances two states’ effects. A time step consists of 1000 particles. ut is the entire system’s input that is randomly generated by Matlab function “idinput”.  In order to see how transfer entropy values change over time, the relationship between transfer entropy and the weighting parameter   is examined. The transfer entropy between xt(1) and xt(2) states is shown in the Table 4.1.  Table 4.1. Transfer Entropy Change as a Weighting Parameter Function Weighting Parameter,   TE from xt(1) and xt(2) 0.2 0.231 0.4 0.225 0.6 0.218 0.8 0.214  59  When the weighting parameter   increases, the transfer entropy from xt(1) and xt(2) decreases by dropping from 0.231 to 0.214. The reason is the memory contribution of xt(2) state is increasing, and causing the effect the decrease of input xt(1).  4.3.2 A Single Tank   As a case study, a single tank with a recycle stream is considered. Suppose inflow and outflow streams are connected to each other. The amount of the recycled outflow is fixed. The image of the process is shown in the Figure 4.1.   Figure 4.1. A single tank with a recycle stream  The mass balance equations of three streams are given below:                   (4.19) 60                       Where V is a constant influx, α is the recycle ration, k and C are tank parameters, h is the tank level, w1 and w2 are zero-mean Gaussian noise sequences.  Even though all variables are interconnected with each other, the TE between vi1 and v0 is calculated. There is causality from vi1 and h, apparently. However, the precise TE is unknown because of the recycle steam. The Table 4.2 shows how the TE calculation results change, when the recycle ratio α varies.   Table 4.2. Transfer Entropy Change as a Recycle Ration Function Recycle Ratio α TE from vi1 to v0 0.1 0.291 0.3 0.282 0.5 0.2705 0.7 0.266 0.9 0.259  The TE from vi1 to v0 declines when the recycle ratio α grows. The reason is when the recycle ratio α increases in the inflow the proportion of vi1 decreases. This means, the influence of vi1 is also reduced. As a result, the TE from vi1 to v0 dropped from 0.29 to 0.26.   61  Chapter 5: Conclusions  5.1 Summary of the Present Contribution  The modern industrial processes often experience plant-wide abnormalities and disturbances. These processes are becoming more and more complicated. Units and variables are highly interconnected and mutually interdependent. A small error in a certain part of the system may have an influence on the rest of the system by spreading along information and material flows. Alarms may propagate other alarms along the way leading to alarm flooding. Process connectivity and connecting pathways are vital for determination of the root causes of the error and predict future abnormalities. Transfer entropy is a powerful method that discovers information transfer between joint processes. The TE method is suitable for both linear and nonlinear systems. This thesis considers transfer entropy estimation based on both historical data and process models. Simulation case studies are given for each method of estimation. The method works with probability distribution functions of the process variables. Overall, the research shows the importance of the causality and connectivity detection by estimating transfer entropy of different processes.  Data-based TE method is useful to apply when process’s physical model is not given. Building the process model is usually rather time-consuming and expensive. In the thesis, connectivity graph of the popular Tennessee-Eastman process was built by using only historical data of the process. Transfer entropy values were approximated by calculating joint probabilities density functions of 2 process variables at different time instants. 62  On the other hand, it is risky to study for causality and connectivity by using only data when the process is highly stochastic and nonlinear. The causality and connectivity research might give erroneous results because of nonlinearity. The thesis also considers approximating transfer entropy by particle filters or sequential Monte Carlo methods.   5.2 Areas for the Future Work  Both historical data-based transfer entropy and process model-based transfer entropy in this thesis only provide approximate values of transfer entropy. They approximate differential transfer entropy, making the values as close to the actual value as possible. According to the formula, transfer entropy of continuous processes can be approximated by discrete transfer entropy. Since the discrete transfer entropy is used for historical data-based causality and connectivity detection, development of a convenient approximation of the differential transfer entropy for discrete process samples that are stochastic and nonlinear is needed.  It is very important to have a transfer entropy approximation that exposes the strongest connections between these variables. It would be interesting to apply the model-based TE using sequential Monte-Carlo method to more complex processes like the TEP and to compare both historical data based and process model based methods’ results.   63  Bibliography  1) Seborg, Edgar, Mellichamp, Doyle. (2011). Process Dynamics and Control (3rd ed.). John Wiley & Sons. 2) Izadi, I., Shah, S., and Chen, T., “Effective resource utilization for alarm management” in Decision and Control (CDC), 2010 49th IEEE Conference on Dec 2010, pp. 6803-6808.   3) Basseville, M., Nikiforov, I. Detection of Abrupt Changes: Theory and Application. Prentice-Hall, Englewood Cliffs, NJ, 1993. 4) Adnan, N.A. (2013). Performance Assessment and Systematic Design of Industrial Alarm Systems. Edmonton, Canada: University of Alberta.  5) Beebe D., Ferrer, S., Logerot, D. (2012, July). Alarm floods and plant incidents. Retrieved May 27, 2016, from www.digitalrefining.com/article/1000558 6) Izadi, I., Shah, L.S., Shook, D.S., Chen, T. (2009, June 30-July 3). An Introduction to Alarm Analysis and Design. Pp. 645-650: Proceedings of the 7th IFAC Fault Detection, Supervision and Safety of Technical Processes, Barcelona, Spain. 7) Rothenberg, D. H., Alarm management for process control: a best-practice guide for design, implementation, and use of industrial alarm systems, Momentum Press, New York, NY, 2009. 8) Nishiguchi, J., Takai, T. IPL2 and 3 performance improvement method for process safety using event correlation analysis, Computers & Chemical Engineering, vol. 34, no. 12, pp. 2007-2013, 2010. 9) Alrowaie, F., Gopaluni, B., Kwok, K. (2014). Alarm design for nonlinear stochastic systems (pp. 473-479). Shenyang, China: 11th World Congress on Intelligent Control and Automation. 10) Vedam, H. and Venkatasubramanian, V. (1999). Pca-sdg based process monitoring and fault diagnosis. Control Engineering Practice, 7(7), 903-917. 11) Yang, F. (2014). Capturing connectivity and causality in complex industrial processes. Cham: Springer. 12) Bauer, M., Cox, J.W., Caveness, M.H., Downs, J.J., Thornhill N.F.. Finding the direction of disturbance propagation in a chemical process using transfer entropy. IEEE Transactions on Control Systems Technology, 15(1):12–21, 2007. 13) Duan, P. (2014). Information Theory-based Approaches for Causality Analysis with Industrial Applications. University of Alberta. 14) Bauer, M.; Cox, J.W.; Caveness, M.H.; Downs, J.J.; Thornhill, N.F. Nearest Neighbors Methods for Root Cause Analysis of Plantwide Disturbances. Ind. Eng. Chem. Res. 2007, 46, 5977–5984. 15) Yu, W., Yang, F. (2015). Detection of Causality between Process Variables Based on Industrial Alarm Data Using Transfer Entropy. 16) Yang, F., Xiao, D. Progress in root cause and fault propagation analysis of large-scale industrial processes. Journal of Control Science and Engineering, 2012:478373–1–478373–10, 2012. 17) Wikipedia. Entropy (information theory). https://en.wikipedia.org/wiki/Entropy_(information_theory) 64  18) Heylighen F., Joslyn C. (2001): “Cybernetics and Second Order Cybernetics”, in: R.A. Meyers (ed.), Encyclopedia of Physical Science and Technology, Vol.4 (3rd ed.), (Academic Press, New York), p. 155-170. 19) Mezard, M., Montanari, A. (2009). Information, Physics, and Computation. Oxford University Press. 20) Vu, M. McGill University, ESCE 612 Lecture 2. http://www.info612.ece.mcgill.ca/lecture_02.pdf 21) Shannon, C.E., Weaver, W. The Mathematical Theory of Communication, p.628-633. University of Illinois Press, Champaign, Illinois, 1949. 22) Gao, J., Tulsya, A., Yang, F., & Gopaluni, B. (2016). A Transfer Entropy Method to Quantify Causality in Stochastic Nonlinear Systems. IFAC-PapersOnLine, 49(7), 454-459. 23) Duan, P., Chen, T., Shah, S. L. (2013, November 6). Direct Causality Detection via the Transfer Entropy Approach. IEEE Transactions on Control Systems Technology, 21(6), 2052-2065. 24) Lizier, J. T., Mahoney, J. R. (n.d.). Moving Frames of Reference, Relativity and Invariance in Transfer Entropy and Information Dynamics. Entropy, 177-197. Retrieved 2013. 25) Shah, S. L. (2015, June). Speech presented at Detection of Causality between Process Variables Based on Industrial Alarm Data Using Transfer Entropy in UBC, Vancouver. 26) Schreiber, T. “Measuring information transfer,” Phys. Rev. Lett., vol. 85, no. 2, pp. 461–464, 2000 27) Juricek, B. C., Seborg, D. E., & Larimore, W. E. (2001). Identification of the Tennessee Eastman challenge process with subspace methods (pp. 1337-1351). Control Engineering Practice. 28) Downs, J. J., Vogel, E. (1993). A plant-wide industrial process control problem. Computers and Chemical Engineering, 17, 245–255. 29) McAvoy, T.J., Ye, N. “Base Control for Tennessee Eastman Problem” in Computers in Chem.  Engng, 1994, Vol. 18, No. 5, pp. 383-413. 30) Farina, L. A., Trierweiler, J. O., & Secchi, A. R. (2000). A systematic comparison of control structures proposed to the Tennessee Eastman benchmark problem. Proceedings of ADCHEM 2000 (pp. 629–634). Pisa, Italy. 31) Ricker, N. L., Lee, J. H. (1995a). Nonlinear model predictive control of the Tennessee Eastman challenge process. Computers and Chemical Engineering, 19, 961–981. 32) Ricker, N. L. (1995). Optimal steady-state operation of the Tennessee Eastman challenge process. Computers and Chemical Engineering,19, 949–959. 33) Sriniwas, R., Arkun, Y. (1997). Control of the Tennessee Eastman process using input–output models. Journal of Process Control, 7, 387–400. 34) Emmanouilidis, C., Taisch, M., & Kiritsis, D. (Eds.). (2013). Advances in Production Management Systems. Competitive Manufacturing for Innovative Products and Services IFIP WG 5.7 International Conference, APMS 2012, Rhodes, Greece, September 24-26, 2012, Revised Selected Papers, Part II. Berlin, Heidelberg: Springer Berlin Heidelberg. 35) Vicente, R., Wibral, M., Lindner, M., & Pipa, G. (2010, August 13). Transfer entropy—a model-free measure of effective connectivity for the neurosciences. J Comput Neurosci, 45-67. doi:10.1007/s10827-010-0262-3 65  36) Nam, D.S., Han, C., Jeong, C.W., Yoon, E.S. “Automatic construction of extended symptom-fault associations from the signed digraph”, Comput. Chem. Eng., vol.22, no. 1, pp. S605-S610, 1996. 37) Maurya, M.R., Rengaswamy, R., Venkatasubramanian, V. “A systematic framework for the development and analysis of signed digraphs for chemical processes. 1. Algorithms and analysis”, Ind. Eng. Chem. Res., vol. 42, no. 20, pp. 4789-4810, 2003. 38) Zaremba, A., Aste, T. (2014). Measures of Causality in Complex Datasets with Application to Financial Data. Entropy, 6, 2309-2349. doi:10.3390/e16042309 39) Bauer, M., Cox, J.W., Caveness, M.H., Downs, J.J., Thornhill, N.F. “Finding the direction of disturbance propagation in a chemical process using transfer entropy,” IEEE Trans. Control Syst. Technol., vol. 15, no. 1, pp. 12–21, Jan. 2007 40) Kantz, H., Schreiber, T. Nonlinear Time Series Analysis. Cambridge, U.K.: Cambridge Univ. Press, 1997 41) Seth, A.K. A MATLAB toolbox for Granger causal connectivity analysis. J. Neurosci. Methods, 2010, 186, 262–273. 42) Yu, J. “A nonlinear kernel Gaussian mixture model based inferential monitoring approach for fault detection and diagnosis of chemical processes,” Chem. Eng. Sci., vol. 68, no. 1, pp. 506–519, 2012. 43) Bossomaier, T., Barnett, L., Harre, M., & Lizier, J. T. (2016). An Introduction to Transfer Entropy: Information Flow in Complex Systems. Cham, Switzerland: Springer Nature. 44) Silverman, B. W. Density Estimation for Statistics and Data Analysis. New York: Chapman & Hall, 1986, pp. 34–48. 45) Alrowaie, F., Gopaluni, R., and Kwok, K. (2012). Fault detection and isolation in stochastic non-linear state-space models using particle filters. Control Engineering Practice, 20(10), 1016-1032. 46) Arulampalam, M.S., Maskell, S., Gordon, N. & Clapp, T. (2002). A tutorial on particle filters for online nonlinear/non-gaussian Bayesian tracking. IEEE Transactions on Signal Processing. 50(2):174-188 47) Arnaud, D., Freitas, N.D., Gordon, N. eds. Sequential Monte Carlo Methods in Practice. N.p.: Springer, 2001. Print. 48) Gordon, N.J., Salmond, D.J., and Smith, A.F. (1993). Novel approach to nonlinear/non-Gaussian Bayesian state estimation. In IEEE Proceedings F (Radar and Signal Processing), volume 140, 107-113. IET. 49) Ricker, N.L. Decentralized Control of the Tennessee Eastman Challenge Process. J. Process Control 1996, 6, 205–221      66  Appendix A  A.1 Particle Approximation of Transfer Entropy Algorithm     

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.24.1-0340670/manifest

Comment

Related Items