ALGORITHMS FOR NONLINEAR PROCESS MONITORING AND CONTROLLER PERFORMANCE RECOVERY WITH AN APPLICATION TO SEMI-AUTOGENOUS GRINDING

by Kenneth Scott McClure

B.A.Sc., The University of British Columbia, 2011

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in The Faculty of Graduate Studies (Chemical and Biological Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

July 2013

© Kenneth Scott McClure, 2013

Abstract

Chemical and mineral processing industries commonly commission linear feedback controllers to control unit processes over a narrow and linear operating region where the economy of the process is maximized. However, most of these processes are nonlinear outside of this narrow operating region. In the event of a large unmeasured disturbance, a process can shift away from nominal and into an abnormal operating region. Owing to the nonlinearity of these processes, a linear controller tuned for the nominal operating region will perform poorly and possibly lead to an unstable closed-loop system in an abnormal operating region. Moreover, it is often difficult to determine whether a process has shifted to an abnormal operating region if none of the constraints on the measured process outputs are violated. In these events, it is the operator who must detect and recover the process, and this manual response results in a sub-optimal recovery.

This thesis develops and demonstrates a control strategy that monitors several process variables simultaneously and provides an estimate of the process shift to a nonlinear abnormal operating region, where a linear recovery controller is implemented to recover the process back to nominal. To monitor the process, principal component analysis is proposed for process shifts that can be detected by linear variable transformations. Alternatively, for nonlinear or high-dimensional processes, locally linear embedding is proposed.

Once a process shift to an abnormal operating region is detected, the control strategy uses the estimate of the process shift in a recovery controller to recover the process. In the event the linear recovery controller is unable to recover the process, an expert system overrides the recovery controller to return the process to a recoverable region.

A case study on a semi-autogenous grinding mill at a processing facility in British Columbia presents the successful application of the control strategy to detect and recover the mill from overloading. Portions of this control strategy have been implemented at this facility, and it provides the operators with a real-time estimate of the magnitude of the mill overload.

Preface

This thesis was written in partial fulfillment of the degree of Master of Applied Science in the Faculty of Graduate Studies in the Department of Chemical and Biological Engineering at the University of British Columbia. The content herein describes a general control strategy to recover a chemical or mineral process from an abnormal operating region to the nominal range of operation. Portions of this work have been submitted for publication in May 2013. In addition, the findings for locally linear embedding on the case study will be presented at the International Mineral Processing Congress in October 2014. A case study for this work was conducted at a mineral processing facility located in British Columbia. Portions of this work were implemented at this facility in the first half of 2013.
The case study of semi-autogenous grinding at this mineral processing facility demonstrates how this work can improve the stability and economy of comminution circuits in the mineral processing field. However, the stability and economy of comminution can also be improved indirectly by rejecting disturbances upstream at the dry-surge ore stockpile. The case study in this work is complementary to the work on dry-surge control that has been accepted for publication:

McClure, K.S., Nagamune, R., Marshman, D., Chmelyk, T., and Gopaluni, R.B. Robust level control of a dry-surge ore pile. Canadian Journal of Chemical Engineering (In Press).

I, Ken McClure, was the main contributor with respect to research for the implementation of the control strategy in this thesis at the mineral processing facility, along with all other content of this work.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Notation
Acknowledgements
1 Introduction
2 Literature Review
  2.1 Process monitoring
    2.1.1 Model-based methods
    2.1.2 Data-based methods
  2.2 Controller performance recovery
  2.3 SAG milling
    2.3.1 Process overview
    2.3.2 Overloading
3 Problem Statement
  3.1 Process monitoring
  3.2 Controller performance recovery
4 General Control Strategy
  4.1 Process monitoring
    4.1.1 Principal component analysis
    4.1.2 Locally linear embedding
    4.1.3 Linear and quadratic classification
  4.2 Controller performance recovery
    4.2.1 Modeling the abnormal operating region
    4.2.2 Override MPC design
    4.2.3 Modifications to existing MPC
5 Case Study of SAG Mill Overload
  5.1 Description of the process
    5.1.1 Control objective
  5.2 Process monitoring
    5.2.1 Selection of measured and inferred process variables
    5.2.2 PCA
    5.2.3 SLLEP
    5.2.4 Comparison between PCA-LDA and SLLEP-LDA
  5.3 Modeling of the overloaded operating region
  5.4 Recovery from the SAG mill overload
6 Conclusion
  6.1 Summary of contributions
  6.2 Future research
Bibliography
Appendices
A Selection of Process Variables
  A.1 Transformation of sound level
B Locally Linear Embedding
  B.1 Minimization of reconstruction error
  B.2 Minimization of embedding cost
  B.3 Least squares regression to map new samples
C Cross Validation
D Model Predictive Control
E Model Selection

List of Tables

5.1 Input and output variables critical to the recovery control strategy
5.2 Properties of the candidate process variables
5.3 Classification results for PCA Method 1
5.4 PCA modeling results
5.5 PCA-LDA classification results for the 2D and 1D feature spaces
5.6 Summary of the variables retained in each candidate model
5.7 SLLEP classification results of four candidate models
5.8 Classification boundaries for SLLE and the classification error rate
5.9 Comparison of the classification results between PCA-LDA and SLLEP-LDA on the test set
5.10 Modeling steps and fitting results
5.11 Best fit models and their parameter variances
5.12 Key performance indices to measure the potential savings between Scenario 1 and Scenario 2
5.13 Assumptions for calculating the key performance measures

List of Figures

1.1 Example of a process operation with two states
2.1 Types of faults and their development with time
2.2 Residual generation by a parity relation for process monitoring
2.3 Residual generation by observer for process monitoring
2.4 Architecture of a neural network
2.5 Bivariate normal sampled data showing the relative contributions of the two principal components
2.6 Side view of the ellipse from PCA on 3D data to emphasize dimensionality reduction and outlier detection
2.7 SVM trained with two classes
2.8 Dimensionality reduction by LLE performed on a Swiss Roll
2.9 Schematic of a typical grate-discharge SAG mill
2.10 Mill power curve for several process conditions
2.11 Cross-sectional view of the inside of a SAG mill showing load torques and maximum potential energy of cataracting ore
2.12 Distribution of slurry within a mill
2.13 Cross-sectional view of the inside of a SAG mill showing load torques
2.14 Cross-sectional view of the inside of a SAG mill showing load torques and load distribution
4.1 Flowchart of the process monitoring and controller performance recovery strategy applied to a chemical process
4.2 Block diagram of the major steps to statistical pattern recognition for process monitoring
4.3 The three steps of the LLE algorithm
4.4 Locally linear reconstruction of xi
4.5 LLE mapping of an 'S' shaped curve to demonstrate the accuracy of LLE projection
4.6 3D manifold of process operation with two planes showing the linearized regions the nominal and recovery controller are designed to operate
4.7 Block diagram for the input-output relationship of a modelled process
4.8 MPC controller overrides the MV's of the existing controller to recover the process after a process shift
4.9 Expert rules override the MV's if the process is operating in a severely abnormal region
4.10 Existing MPC controller is modified with a constraint variable to allow the MPC to recover the process after a process shift
5.1 Overview of existing SAG mill control at the mineral processing facility in British Columbia
5.2 Candidate process variables of the training data-set
5.3 Pair-wise scatter plot of the selected process variables
5.4 Candidate process variables for a segment of the test data-set for process monitoring
5.5 Candidate process variables of the training data-set for PCA Method 1 only
5.6 Q and T² results on the test data-set for PCA Method 1
5.7 LDA classification results on the PC score space of the training data-set for a) 2D LDA, and b) 1D LDA
5.8 LDA classification results on the PC score space of the test data-set
5.9 LDA and QDA classification on the training data-set mapped into 2D space by SLLE
5.10 LDA and QDA classification on the test data-set mapped into 2D space by SLLEP
5.11 Output from process monitoring on the test set for both PCA-LDA and SLLEP-LDA
5.12 Model fitting results for the response of the process shift to changes in feed ore rate
5.13 Fitting P1D to a new training data-set in order to fit a model for u2 on the fit remainder
5.14 P1D fitting results on a test data-set to validate model
5.15 Scenario 1: control strategy has not been incorporated to mitigate the incipient disturbance
5.16 Scenario 2: process shift constraint variable is exceeded causing the MPC to correct for the incipient disturbance before severe overloading results
5.17 Scenario 3: abrupt disturbance leads to significant overloading that the expert system mitigates before returning control to the MPC
A.1 Difference between measured sound level and a 0.4 day moving average of the sound level

Notation

D number of dimensions in a high-dimensional data-set
F statistic for the F continuous probability distribution
Gp process transfer function
H∞ Hardy space with minimax restrictions in the frequency domain
J cost function
Kf number of folds in cross-validation
Kp process gain
K number of nearest neighbors to a data point
Qα upper control limit of the test statistic for principal component analysis
Q test statistic for principal component analysis
R′ Euclidean distance between vectors considering class information
R Euclidean distance between vectors before considering class information
T²α upper control limit of the test statistic for principal component analysis
T² test statistic for principal component analysis
U(s) input signal in Laplace domain
W weights
Y(s) output signal in Laplace domain
Δ difference operator
Λ diagonal eigenvalue matrix
Φ embedding cost
ℝ real number
α percent confidence level
r̄ average model residual per sample
x̄1 mean vector of nominal operating data
x̄2 mean vector of abnormal operating data
δ binary class information between two vectors
η nearest neighbor
κ class information modifier
ŷ(t+k|t) predicted value of y(t+k) with available information at instant t
∈ element of
L Lagrange multiplier
m unknown parameter in cross-validation
λ eigenvalue
A mapping matrix in locally linear embedding projection
C correlation matrix
E residual matrix
G system dynamic matrix
I identity matrix
K controller matrix
Q penalty on move weight matrix
R penalty on error weight matrix
S1 covariance matrix of nominal operating data
S2 covariance matrix of abnormal operating data
Spooled pooled covariance matrix
S covariance matrix
T principal component score matrix
V eigenvector matrix
X observation data matrix
Y feature data matrix
Λ eigenvalue matrix
⊗ cross product of two vectors
π1 class label for nominal data
π2 class label for abnormal data
⪰ matrix inequality
τc torque provided by charge
τn numerator time constant
τp torque provided by slurry pool
τ time constant
θ time delay
ε reconstruction cost
a basis mapping vector in locally linear embedding projection
f free response vector
t principal component score vector
u array of future control signals
x observation data vector
y feature vector
T transpose
df degrees of freedom
d number of dimensions of a low-dimensional mapped data-set
f arbitrary feature of a process
n1 number of observations of nominal operating data
n2 number of observations of abnormal operating data
nu number of controlled process inputs
ny number of controlled process outputs
n number of observations in a data-set
p number of retained principal components
s operator variable in the frequency domain
tf time of onset of a fault
t time
u process input variable
w process input disturbance variable
xc length of the charge torque arm
xp length of the slurry pool torque arm
x process state variable
y process output variable
|S| determinant of matrix S
||v||²Q equivalent to vᵀQv
1D one-dimensional
2D two-dimensional
3D three-dimensional
ANN artificial neural network
APER apparent error rate
AV constraint variable
BP bearing pressure
c.g. center of gravity
CPV cumulative percent variance
CV control variable
DCS digital control system
DE discharge-end
DPCA dynamic principal components analysis
DV disturbance variable
EKF extended Kalman filter
FE feed-end
FOPTD first-order plus time delay
FTC fault-tolerant control
GPM gallons per minute
KF Kalman filter
KNN K-nearest neighbors
KPCA kernel principal components analysis
LDA linear discriminant analysis
LLE locally linear embedding
LP linear program
LTI linear time-invariant
MP motor power
MPC model predictive control
MV manipulated variable
PC principal component
PCA principal components analysis
PEmax maximum potential energy
PF particle filter
PID proportional-integral-derivative
PRESS prediction error sum of squares
QDA quadratic discriminant analysis
QP quadratic program
SAG semi-autogenous grinding
SLLE supervised locally linear embedding
SLLEP supervised locally linear embedding projection
SOPTD second-order plus time delay
SP set-point
STPH short tons per hour
SVD singular value decomposition
SVM support vector machines
UIO unknown input observer
UKF unscented Kalman filter
USD United States dollars
XV cross-validation

Acknowledgements

This thesis would not have been possible without the encouragement, support, and insight provided by my academic supervisors Dr. Bhushan Gopaluni, Dr. Ryozo Nagamune, and Dr. Sirish Shah, or by my industrial supervisors Terry Chmelyk, Manny Sidhu, and Devin Marshman at Spartan Controls. To them, I owe a huge debt of gratitude over these past two years.

I am grateful to the University of Alberta for allowing me to study at their school under the Western Dean's Agreement. Dr. Shah deserves a huge thank-you for accepting me as his student and for encouraging me to take a productive course load at the university. I am also grateful to NSERC and Spartan Controls, whose financial support made this work possible.

I thank Dr. Nagamune and Dr. Gopaluni for managing the control group at the University of British Columbia. This control group brought me together with colleagues outside of the department, and it motivated me to design a multivariate model predictive controller that was useful in this work.

I sincerely thank my parents for their never-failing interest and support throughout all the years of my study. They are an inspiration to work hard in life. I also thank my uncle, Dan Ashman, for his inspiration to become an engineer and exposing me to mineral processing.

I am thankful to the mineral processing facility in British Columbia for allowing me to use their SAG mill as a case study for my thesis.
It is a rare opportunity to have been able to implement this work in industry and solve a problem in great need of a solution.

I would like to make a very special acknowledgement to my partner, Vanya Jongkind, for her love, patience, and support throughout these years.

Chapter 1 Introduction

It is well-known that industrial processes are controlled over a narrow operating region to maximize profitability. Moreover, processes are usually most profitable at the intersection of process constraints (García et al., 1989). Since processes are intended to be controlled over a narrow operating region, control engineers appropriately assume this narrow operating region is 'locally linear' and describe the process dynamics as linear time-invariant (LTI) (Ljung, 2001). The assumption of an LTI process allows control engineers to design linear controllers for a narrow and desirable operating region, also known as the nominal operating region, even though the process dynamics may be nonlinear outside this region.

Controlling industrial processes with linear controllers is advantageous compared to controlling these processes with nonlinear ones. Linear controllers require relatively minor online computation, and they can be synthesized from many well-known techniques such as proportional-integral-derivative (PID), model predictive, H∞, or µ control methods (Kim et al., 1997). However, linear feedback control, by design, is inherently limited to the nominal operating region where process economics are maximized. This limitation exists because the process model is linearized around the nominal operating region, and a controller is designed based on this linearized model. Since industrial processes are typically nonlinear, a deviation in the state of the process away from the nominal operating region where the controller is tuned, also known as a process shift, results in a degradation of controller performance. If the nonlinearity of the process is significant, the controller may not be able to recover the process back to nominal. As a consequence, the feedback system may become unstable, leading to safety risks or damage to process equipment (Doyle et al., 1992).

Nimmo (1995) estimated that abnormal process situations cost U.S. industries over $20 billion each year. For example, the recent BP oil spill in the Gulf of Mexico in 2010 resulted in billions of dollars in damages to the fishing and energy industries and had enormous social and environmental impacts (Gill et al., 2011). This disaster started when the cement barrier at the bottom of the well failed. Afterwards, a series of equipment failures and poor monitoring and control practices resulted in the venting and ignition of a hydrocarbon-rich vapour that ultimately sank the oil rig and caused one of the worst oil spills in history (Bly, 2011; Liu et al., 2011). In this example, process monitoring could have detected the failure of the cement barrier from the measured increase of the drill pressure, and an automatic recovery control strategy could have responded 40 minutes faster than the operators onboard the oil rig (Bly, 2011). As a result, the hydrocarbons could have been diverted to a safer atmosphere with less risk of ignition or catastrophic damage to the oil rig (Bly, 2011).

In the chemical and mineral processing industries, a process is usually recovered once an operator recognizes the degradation of the process performance and manually manipulates the control elements to recover the process.
However, industrial processes are often very complex with many process variables; consequently, it takes too long for an operator to recognize that the process has shifted and to intervene. In addition, the actions of an operator are not optimal, and it often takes more time than necessary to manually recover the process to its nominal operating region (van Drunick and Penny, 2005). As an example, in the case study for this thesis, semi-autogenous grinding (SAG), operators usually over-correct a mill overload to the point where the mill becomes underloaded—leading to poor throughput and excessive liner wear. On the other hand, past attempts to design a method to automatically recover from a SAG mill overload have been ad hoc and often lead to sub-optimal results (van Drunick and Penny, 2005).

There are few, if any, generalized control strategies applicable to the chemical industry that solve both the problem of identifying a process shift and recovering the process back to the nominal operating region with a linear control method. There have been efforts in the control field to try to prevent a process from shifting away from the nominal operating region. These methods focus mostly on disturbance rejection. However, it is not always feasible to prevent a process shift for every possible scenario. As a result, another approach has been developed that attempts to make the feedback system tolerant to the consequences of a process shift. This approach is called fault-tolerant control (FTC), where reasonable system performance and stability are achieved in the event of an abnormal event, known as a fault, by designing a feedback system that is tolerant to abnormal events such as process equipment failure. These two approaches are discussed briefly in the literature review in Chapter 2.

The approach of this work is to design a general control strategy applicable to industrial processes that experience process shifts regardless of disturbance rejection efforts. Furthermore, these process shifts are caused by disturbances to the inputs of the process that are not a result of any equipment failure. Therefore, in most of these scenarios, the process recovery is controllable (Venkatasubramanian et al., 2003a), and we can design a control method to recover the process to the nominal operating state. In this work, this control method is known as 'controller performance recovery' because the existing controller for the process is modified or overridden in the event of a process shift in order to recover control performance and ultimately recover the process back to nominal. In this thesis, a general control strategy is sought that, first, monitors a process to detect a process shift for linear and nonlinear processes, and afterwards utilizes information from the process monitoring tool to recover linear feedback control performance and return the process to nominal operation.

Figure 1.1 illustrates the two main elements of the control strategy on an example process with two states, x1 and x2. The figure depicts two different types of undesired operating regions. In this example, only the operating region with serious consequences to process stability needs a recovery control strategy. In this region, the existing linear controller, also known as the nominal controller, is unable to recover the process due to the nonlinear nature of the process.
However, with an appropriate control strategy, the process shift can be detected by a process monitoring tool, and a controller specifically tuned for this operating region can optimally recover the process to the nominal operating region.

Figure 1.1: Example of a process operation with two states, x1 and x2, where 1. the shift to an undesired operating region is detected, and 2. the process is recovered with a linear controller

Once a control strategy is developed, it will be tested on a SAG mill at a mineral processing facility located in central British Columbia, Canada. The SAG mill at this facility sometimes shifts to an overloaded state due to disturbances in the ore hardness, particle size, and composition. The overloaded state is difficult to detect because the process is operated as close to an overloaded state as possible in order to maximize the mill's throughput, and changes in the state of the mill are only subtle at first. Once the mill is severely overloaded, the grinding efficiency of the mill decreases to the point where ore rapidly accumulates and the mill's bearings are at risk of damage. To recover from the mill overload, operators manually stop the feed ore conveyor for a long period of time to let the mill grind the ore charge. However, these actions usually cause the state of the mill to be overshot to an underloaded operating state.

The SAG mill at the mineral processing facility in British Columbia is an excellent industrial application for the control strategy in this thesis. The SAG mill can benefit from earlier detection of the process shift to an overloaded operating state. Furthermore, the control system can automatically recover the process to nominal operation in minimal time without underloading the mill. This case study will showcase the adaptability of the general control strategy to a real industrial process, and it demonstrates the return on investment possible when implementing this technique.

The main contribution of this thesis to the chemical and mineral processing industry is the development of a control strategy that can be used in a digital control system to detect and recover a process that has shifted to a nonlinear operating range where the existing controller performs poorly. This control strategy allows industrial processes to operate in process conditions that are close to constraints or undesired operating regions to enhance the economic performance of the process. In addition, a technique that allows locally linear embedding to be commissioned in an industrial controller for use as a nonlinear process monitoring technique is also discussed in this work. Finally, the control strategy developed in this thesis was applied to the SAG mill at the mineral processing facility in British Columbia, and it addresses the problem of detecting and correcting a mill overload before controller performance degrades excessively. As a result, the mineral processing facility in British Columbia can operate their process at a higher throughput with less risk that manual intervention will be needed in the event the process is significantly disturbed.

The remainder of this thesis is organized as follows. In Chapter 2, a literature review covers the process monitoring and control techniques most relevant to this work. Chapter 3 develops the problem statement and scope for this work.
In Chapter 4, a general control strategy is developed that addresses the issues identified in the problem statement. An industrial implementation of the control strategy on the SAG mill at the mineral processing facility in British Columbia is presented in Chapter 5. Finally, in Chapter 6, the conclusions and future work are discussed.

Chapter 2 Literature Review

The literature review focuses on two topics: process monitoring and controller performance recovery. Process monitoring tools are able to estimate whether the state of a process is nominal or abnormal. Controller performance recovery tools are control methods that recover a process from an abnormal operating situation to the nominal operating region where process economy is maximized and the existing nominal controller has the highest performance. Both topics are broad with numerous techniques available. Emphasis is placed on contributions relevant to the work in this thesis.

2.1 Process monitoring

Process monitoring is an important problem in the chemical and mineral processing industry because early detection of abnormal, also referred to as undesired, process conditions can help prevent escalation of abnormal events and consequent production loss. Traditionally, the industry has relied on human operators to monitor processes. However, process monitoring is a non-trivial task because industrial processes are usually complex multivariate systems. For example, a large process plant may have more than 1500 variables that are measured every few seconds (Bailey, 1984). In these systems, measurements may remain within their acceptable alarm limits when the process is in an abnormal operating region, thus making it difficult for operators to detect by monitoring the data directly (Serpas et al., 2012). A process monitoring system, however, would monitor correlations between multiple variables to determine when the process deviates to an undesired state. The benefits of an effective process monitoring method include improved process economics, safety, and environmental impact.

Abnormal events can manifest in many ways, such as process unit failures, instrumentation failures or drifting, and process degradation (Venkatasubramanian et al., 2003a). Furthermore, unmeasured variables, such as feed characteristics, may also introduce large disturbances into a process that shift the process to an undesired operating region. Regardless of the root cause, any departure from an acceptable range of an observed variable or calculated parameter is known as a fault (Himmelblau, 1978). Isermann (2005) showed that the time dependency of these faults can be divided into three categories, as shown in Figure 2.1. The variable f represents the change of some feature, or operating state, of a process due to a fault, and tf signifies the time of onset of the fault. Abrupt faults cause an immediate and drastic change to some feature of the process; for example, a process unit such as a pump has failed. Incipient faults progressively alter the feature of a process; for example, a measurement slowly drifts due to poor maintenance of instrument calibrations. Lastly, intermittent faults are transient faults that exist for a short time—only to eventually reappear again.

Figure 2.1: Types of faults and their development with time
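As a concrete illustration of these three categories, the short sketch below generates archetypal abrupt, incipient, and intermittent fault trajectories f(t); the signal shapes, magnitudes, and time constants are illustrative assumptions, not values from this thesis.

```python
import numpy as np

def fault_trajectories(n=500, t_f=100):
    """Illustrative fault feature f(t) for the three categories of Figure 2.1,
    each zero before the fault onset t_f."""
    t = np.arange(n)
    after = t >= t_f
    abrupt = np.where(after, 1.0, 0.0)                       # step change at t_f
    incipient = np.where(after, (t - t_f) / (n - t_f), 0.0)  # slow drift after t_f
    bursts = ((t - t_f) // 50) % 2 == 0                      # on/off pattern
    intermittent = np.where(after & bursts, 1.0, 0.0)        # appears, vanishes, reappears
    return t, abrupt, incipient, intermittent
```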
Fault detection methods can be divided into two categories: model-based methods and data-based methods (Serpas et al., 2012), and both categories can be further subdivided into quantitative and qualitative methods. Ohran et al. (2009) discuss the main differences between data-based and model-based methods as follows: model-based methods generate data that are compared to measured outputs to create residuals relating to specific process conditions, while data-based methods estimate the location and direction of the system trajectory in state space.

Desirable characteristics of a fault detection system are discussed by Venkatasubramanian et al. (2003a), and the points relevant to this work are listed below:

1. Quick detection: there is a trade-off between quick fault detection and sensitivity to high-frequency influences such as noise.
2. Robustness: the performance should degrade gracefully rather than abruptly, and the method should be robust to noise and uncertainties.
3. Novelty identifiability: the ability to successfully distinguish the known fault from the occurrence of unknown, or novel, faults that may arise.
4. Estimate classification errors: provide a confidence level for the classification of a fault condition.
5. Adaptability: the adaptability of the method to changing process conditions such as disturbances, environmental conditions, production targets, etc.
6. Modeling requirements: modeling effort should be minimal.
7. Computational requirements: the method must operate at a satisfactory scan rate in an industrial controller.

Although a process monitoring method may satisfy these desirable characteristics, it will not be utilized unless the operators of the process accept the method. To increase operator acceptance, the method should be intuitive and simple to operate and maintain. Furthermore, the operators need confidence that the output of the process monitoring tool is accurate and makes sense. Therefore, for new measurements it is preferable to have a two-, or at most, three-dimensional plot depicting the output of the process monitoring tool and the classification boundary between the abnormal and nominal operating conditions.

2.1.1 Model-based methods

Model-based process monitoring tools require a model of the process, often in the form of quantitative methods such as parity relations, diagnostic observers, or optimal state estimators. The output from these models is the expected behaviour of the process, and it is compared to the actual behaviour to generate inconsistencies known as residuals (Venkatasubramanian et al., 2003a). Residual signals are useful for fault detection because they may reflect the occurrence and magnitude of process abnormalities.

The advantages of model-based methods over data-based methods, especially for first-principles models, are that the models reflect physical laws of the process rather than coincidences which may only occur under certain conditions. Furthermore, model-based methods can be reused on similar processes. Additionally, the assumptions and limitations of the models are typically clearer than those of data-based methods. However, model-based methods are usually developed on some fundamental understanding of the physics of the process (Venkatasubramanian et al., 2003b) and require very accurate process models to be effective for process monitoring (Ohran et al., 2009).
Many chemical processes are so complex and poorly understood that it may not be possible to develop a sufficiently accurate model for model-based process monitoring (Yoon and MacGregor, 2001).

There are many model-based techniques that can be used for process monitoring. Each technique offers advantages and disadvantages in terms of computational requirements, required model accuracy, and assumed a priori knowledge of the physics of a process. Moreover, some techniques are better suited for nonlinear processes or stochastic processes. Model-based techniques can be separated into qualitative and quantitative methods. Qualitative methods, such as cause-effect models, are discussed by Venkatasubramanian et al. (2003b). Their main advantage is in diagnosing the root cause of process abnormalities, and the methods are less sensitive to process drift. However, these methods may lead to spurious solutions, and a measurement of the magnitude of the process shift is not provided as an output from these methods. Three model-based methods that can be used for process monitoring are discussed in this section: parity relations, diagnostic observers, and stochastic filters.

Parity relations

A common quantitative model-based process monitoring technique is to develop parity relations. Parity relations are most commonly used for analytical redundancy that may indicate sensor or actuator failures. A parity relation is obtained by rearranging or transforming the input-output process models. The general structure of the parity approach is illustrated in Figure 2.2. This approach is useful when measurements are not heavily corrupted with noise and sensors are calibrated frequently to minimize drift.

Figure 2.2: Residual generation by a parity relation for process monitoring

Diagnostic observers

Chou (2000) discusses the application of diagnostic observers, also known as unknown input observers (UIO), as a model-based process monitoring technique. Often, a dynamic system can be modelled as having unknown inputs representing disturbances. In a disturbance-free case, the UIO is designed so that its output tracks the output of the process regardless of the unknown inputs. When the process is disturbed, the residuals carry a 'signature' of the resultant process shift. The general structure of the UIO approach is shown in Figure 2.3. The figure shows that the UIO approach differs from the parity approach in that the observer provides an estimate of the state of the process, and it is designed to track the real state asymptotically under nominal operating conditions.

Figure 2.3: Residual generation by observer for process monitoring
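The residual-generation idea behind Figures 2.2 and 2.3 can be sketched in a few lines: run a process model in parallel with the plant and compare outputs. The discrete first-order model and its coefficients below are illustrative assumptions; the thesis does not prescribe a specific model form here.

```python
import numpy as np

def residuals(u, y, a=0.9, b=0.1):
    """Parity-style residual generation: simulate an assumed discrete
    first-order model y_hat[k+1] = a*y_hat[k] + b*u[k] alongside the
    measured output y, and return the residual r = y - y_hat."""
    y_hat = np.zeros_like(y, dtype=float)
    y_hat[0] = y[0]                      # start the model at the first measurement
    for k in range(len(u) - 1):
        y_hat[k + 1] = a * y_hat[k] + b * u[k]
    return y - y_hat

# r stays near zero while the plant matches the model; a sustained
# departure, e.g. np.abs(r) > threshold, flags a process abnormality
```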
Stochastic filters

An extension of the observer approach is the optimal stochastic filter known as the Kalman filter (KF), originally proposed by Kalman (1960); applications of the KF to process monitoring are discussed by Basseville (1988). The KF is used to estimate the state of the process from a series of measurements corrupted by white Gaussian noise (Frank et al., 2000). A KF uses a weighted average of the model prediction of a system and a new measurement to produce a new state estimate with minimum mean square error. A KF operates recursively and requires only the previous most trusted estimate; therefore, a KF can be solved in real-time.

The KF has been adapted to nonlinear processes with white Gaussian noise. The extended Kalman filter (EKF) (Maybeck, 1979) and the unscented Kalman filter (UKF) (Wan and Van Der Merwe, 2000) are two such approaches. These methods are more computationally complex than the KF; however, these techniques can be adapted for online process monitoring. For example, Quine (2006) discusses a derivative-free implementation of the EKF that avoids the need to calculate the Jacobian matrix. Particle filters (PFs) can be employed for processes with nonlinear dynamics where the noise is non-Gaussian (Manohar and Roy, 2006). Monte Carlo simulations are performed to provide an asymptotically consistent distribution of the state of the process; however, PFs are computationally expensive and require an exact process model (Saha and Roy, 2011).
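A minimal predict/update cycle of the linear KF is sketched below; the innovation (measurement residual) it returns is the quantity typically monitored for faults. All model and noise covariance matrices are assumed known, which is the standard KF setting rather than anything specific to this thesis.

```python
import numpy as np

def kalman_step(x, P, u, y, A, B, C, Q, R):
    """One recursion of a linear Kalman filter: predict with the model,
    then update with measurement y. Q and R are the process and
    measurement noise covariances (assumed known)."""
    # predict
    x_pred = A @ x + B @ u
    P_pred = A @ P @ A.T + Q
    # update
    S = C @ P_pred @ C.T + R                 # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)      # Kalman gain
    innov = y - C @ x_pred                   # innovation: the monitored residual
    x_new = x_pred + K @ innov
    P_new = (np.eye(len(x)) - K @ C) @ P_pred
    return x_new, P_new, innov
```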
2.1.2 Data-based methods

In contrast to model-based methods, where a priori knowledge about the process is needed, data-based process monitoring requires only a sufficient database of historical process data. Consequently, for systems that are complex or poorly understood, a data-based method is advantageous because the process dynamics would naturally be taken into account if enough historical data around various operating points is provided (Venkatasubramanian et al., 2003c). Ohran et al. (2008) mention that the disadvantage of these methods is that they usually require fault-specific historical data, which may not always be available; however, statistical methods for unsupervised learning of the models may be utilized in these cases.

In data-based process monitoring, features are extracted from a set of data that present some a priori knowledge of the system. Specifically, data-based methods are pattern recognition problems (Venkatasubramanian et al., 2003c). Each data-based method demonstrates some advantage or disadvantage in terms of the desirable characteristics of a fault detection method as listed in Chapter 2.1. Furthermore, each technique has a specific class of systems it is intended to be applied to. For example, some methods work best for only linear or multivariate normal systems.

Data-based methods fall into two categories: qualitative methods and quantitative methods. Two important qualitative methods for process monitoring are expert systems and trend analysis (Venkatasubramanian et al., 2003c). These techniques are best suited to diagnosing the root cause of a fault rather than detecting the onset of an abnormal event. For example, expert systems match existing symptomatic information for a fault, such as alarm information or operator-supplied information, with a rule-set involving logical condition statements (Ramesh et al., 1988). Venkatasubramanian et al. (2003a) suggest that the best fault detection method is likely a hybrid of process monitoring methods in order to both detect and diagnose faults. In this section, we discuss four different data-based tools that may be used for process monitoring: artificial neural networks; principal component analysis; support vector machines; and locally linear embedding.

Artificial neural networks

Artificial neural networks (ANN) are proposed by Venkatasubramanian and Chan (1989) and Patton et al. (1994) as an effective non-statistical quantitative fault detection and diagnosis technique for nonlinear and dynamic systems in the chemical industry. ANN are capable of mapping multi-input and multi-output systems through a network of highly interconnected layers of neuron-like processing elements, as shown in Figure 2.4. There are three layers to an ANN: the input layer, where signals enter the network; a hidden layer, where input signals are routed to the output layer; and the output layer, which contains the response to a given set of inputs.

Figure 2.4: Architecture of a neural network

ANN are trained with historical process data to modify the weights of all interconnections in the network and create an input-output relation useful for detecting faults. However, Venkatasubramanian and Chan (1989) mention that ANNs are limited because of the comprehensive and excessive amounts of data, or very detailed simulation models, required for training. In addition, Zhou et al. (2003) found ANN needed to be constantly re-tuned due to the time-varying behaviour of some processes. Consequently, industrial applications of ANN for fault detection are rare (Koivo, 1994). As an alternative, multivariate statistical methods can be used for fault detection in place of ANN and do not require as extensive a historian database.

Principal component analysis

Multivariate statistical techniques have proven in industry to be effective fault detection techniques. The major benefit of these methods is that there is no requirement for a fundamental or causal model of the system; additionally, they handle large problems, missing data, and noise efficiently (Yoon and MacGregor, 2001). One technique that has seen success in the chemical and mineral processing industries is principal component analysis (PCA). The main concept behind PCA is to transform a number of related process variables to a smaller set of uncorrelated variables, called principal components (PCs), containing the most relevant information (Cheng et al., 2008). Mathematically, PCA utilizes singular value decomposition (SVD) on the covariance matrix of a data-set of the relevant process variables to produce its eigenvectors and eigenvalues (García-Álvarez et al., 2012; Johnson and Wichern, 2007). Only the eigenvectors that explain most of the variability in the data are selected, resulting in the chosen PCs. Figure 2.5 shows a bivariate normal set of data for two arbitrary process variables y1 and y2 with the magnitude and direction of the two PCs for the data. The figure shows that PC1 explains significantly more variability than PC2; as a result, PC2 may be discarded to reduce the dimensionality of the fault detection model.

Figure 2.5: Bivariate normal sampled data showing the relative contributions of the two principal components

From the information-rich PCs, it is possible to perform fault detection through statistical analysis of the PC scores of new sampled data to discover outliers. Two commonly used methods to detect data outliers are the Q test, also known as the squared prediction error, and the Hotelling T² test (García-Álvarez et al., 2012). These methods are two different measures of distance that reveal if any PC scores are located significantly far from the center of the ellipse of the PC model. Figure 2.6 portrays data dimensionality reduction and outlier detection for three-dimensional data from arbitrary process variables y1, y2, and y3. In this figure, the black line through the data points depicts a side view of a 2D ellipse discovered by PCA. Data points along the ellipse axis, but outside of the ellipse itself, are detected by the T² statistic and represent abnormal observations whose variances can be explained by the PC model. Data points that are far away from the axis of the ellipse are detected by the Q statistic and represent the variances that cannot be explained by the PC model (Ko and Shang, 2011).

Figure 2.6: Side view of the ellipse from PCA on 3D data to emphasize dimensionality reduction and outlier detection
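A compact numerical sketch of this monitoring scheme is given below: fit PCs on nominal training data, then score new samples with T² and Q. It is a minimal illustration (standardized variables, eigendecomposition of the correlation matrix, no control limits Qα or T²α), not the exact procedure used later in the case study.

```python
import numpy as np

def pca_monitor(X_train, X_new, p=2):
    """PCA-based monitoring sketch: retain p PCs from nominal data and
    return the Hotelling T^2 and Q (squared prediction error) statistics
    for each new sample. Control limits are omitted for brevity."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    Z_train = (X_train - mu) / sd               # standardize nominal data
    C = np.cov(Z_train, rowvar=False)           # correlation matrix
    eigval, V = np.linalg.eigh(C)
    keep = np.argsort(eigval)[::-1][:p]         # p largest eigenvalues
    V, eigval = V[:, keep], eigval[keep]

    Z = (X_new - mu) / sd
    T = Z @ V                                   # PC scores of new samples
    T2 = np.sum(T**2 / eigval, axis=1)          # distance along the PC ellipse
    E = Z - T @ V.T                             # part the PC model cannot explain
    Q = np.sum(E**2, axis=1)
    return T2, Q
```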
PCA in its basic form has limited application for fault detection in the chemical industry because many processes are nonlinear and/or dynamic. Hence, PCA has been adapted in many ways to address these shortcomings. To address serial correlations between variables due to process dynamics, Ku et al. (1995) proposed a technique called dynamic PCA (DPCA). DPCA accounts for serial correlations by augmenting observation vectors in the PCA method with observations at previous sampling instances. However, a study by Russell et al. (2000) found that for many common industrial faults, DPCA gave similar performance to PCA in terms of detection delay and missed faults.

To extend PCA to nonlinear processes, Schölkopf et al. (1998) proposed a method called kernel PCA. Kernel PCA received attention as a nonlinear fault detection technique because it did not involve solving a nonlinear optimization problem, as had been required of predecessors of nonlinear PCA based on neural networks (Choi et al., 2005; Kramer, 1991). Kernel PCA uses integral operators and nonlinear kernel functions to map an input space to a feature space via a nonlinear mapping, and then it computes the PCs in that space (Lee et al., 2004). Because kernel PCA can use a variety of nonlinear kernel functions, it can handle a wide range of process nonlinearities. However, Choi et al. (2005) note that the computational requirements for kernel PCA may be unreasonable for large systems with many samples, and the feature space is difficult to interpret compared to other methods such as linear PCA.

Support vector machines

An alternative to kernel PCA is support vector machines (SVMs). SVMs were originally proposed by Boser et al. (1992) as a machine learning technique to classify data having two different pre-defined categories, and the technique has been evolving since. SVMs have been shown to be suitable for fault detection in order to classify data as either faulty or non-faulty. Figure 2.7 shows the principle of SVMs on a two-dimensional plot of arbitrary process variables y1 and y2. The technique discovers an optimal linear classifier in the form of a hyperplane which maximizes the margin between data points on either side of the hyperplane (Tong and Koller, 2002).

Figure 2.7: SVM trained with two classes, where data samples on the margin are called support vectors

If a nonlinear classification boundary is required, a kernel transformation can be used to map the data to a higher-dimensional feature space, where a separating hyperplane in this space results in a nonlinear boundary when re-mapped to a lower-dimensional space. SVMs can also be adapted with unsupervised learning so that only nominal operating data are required in the learning algorithm, rather than data from both nominal and abnormal classes (Mahadevan and Shah, 2009). The disadvantage of SVMs, however, is that the kernel mapping technique for nonlinear boundaries is too computationally expensive to be used online in an industrial controller.
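The sketch below trains an SVM on synthetic two-class data, first with a linear kernel (the maximum-margin hyperplane of Figure 2.7) and then with an RBF kernel for a nonlinear boundary. The data, kernel choices, and scikit-learn dependency are all assumptions made for illustration.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)),   # assumed nominal cluster
               rng.normal(4.0, 1.0, (50, 2))])   # assumed fault cluster
labels = np.r_[np.zeros(200), np.ones(50)]       # 0 = nominal, 1 = fault

linear = SVC(kernel="linear").fit(X, labels)                  # separating hyperplane
nonlinear = SVC(kernel="rbf", gamma="scale").fit(X, labels)   # kernel boundary
print(nonlinear.predict([[0.0, 0.0], [4.0, 4.0]]))            # -> [0. 1.]
```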
Locally linear embedding

One of the most promising feature extraction techniques with minimal computational requirements for nonlinear processes is locally linear embedding (LLE). LLE was first proposed by Roweis and Saul (2000) as an image analysis technique and has found applications as a fault detection technique, such as the simulated machinery fault detection case study performed by Li and Zhang (2011). However, no applications or case studies of LLE for the chemical or mineral processing industries have been documented in the literature, nor has this technique been applied in an industrial environment.

LLE is a simple and elegant technique to reduce data dimensionality and extract important features within data that are useful for detecting abnormal operation. For example, Figure 2.8 demonstrates LLE reconstructing a three-dimensional (3D) 'Swiss Roll' shaped manifold into two-dimensional (2D) space with the shape's key features intact. To accomplish this reconstruction, LLE assumes that each data point and its neighbors lie on a locally linear portion of its reduced-dimension manifold (Li and Zhang, 2011). Accordingly, LLE often utilizes the K-nearest neighbors (KNN) method to determine the neighbors of each data point (Johnson and Wichern, 2007). The technique also requires the computation of two quadratic minimization problems to determine the optimal weights and coordinate vector in the reduced dimension. If the problem is well posed, the coordinate reconstruction can be minimized by solving a sparse n × n eigenvector problem, where n is the number of observations in the data matrix (Roweis and Saul, 2000).

Figure 2.8: Dimensionality reduction by LLE performed on A, a 3D manifold of a Swiss Roll, where B is the sampling of the 3D manifold, and C is the reconstruction by LLE of the Swiss Roll in 2D space (Roweis and Saul, 2000)

Li and Zhang (2011) observed that LLE neglects class information which would allow the technique to be trained with supervised learning to enhance fault recognition accuracy. Likewise, Li and Zhang (2011) found that LLE does not provide an explicit mapping from high- to low-dimensional space. Consequently, new data samples cannot be mapped to the reduced-dimension space, resulting in the inability to use LLE for fault detection in an online controller. However, Li and Zhang (2011) proposed two techniques to overcome these drawbacks. The first is to artificially add a distance between data samples that are from different classes. The second is to find a locally linear projection that best approximates the high- to low-dimensional mapping performed by LLE. Subsequently, a simple classification technique, such as linear discriminant analysis (LDA), is used on the low-dimensional feature space discovered by LLE in order to classify fault and non-fault conditions. The resulting method is named by the authors supervised locally linear embedding projection (SLLEP). The results from Li and Zhang (2011) show that SLLEP offers significant improvements in accuracy over PCA, and that the method, once trained, is relatively simple to implement and computationally inexpensive in a digital control system (DCS).

Since LLE is only a feature extraction technique, a simple classifier is used on the feature space discovered by LLE.
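The Swiss Roll reconstruction of Figure 2.8 can be reproduced with an off-the-shelf LLE implementation; the sketch below uses scikit-learn, with the neighbor count K and output dimension d chosen arbitrarily for illustration.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, random_state=0)   # 3D manifold samples
# K = 12 nearest neighbors, d = 2 embedding dimensions (assumed values)
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
Y = lle.fit_transform(X)        # 1000-by-2 embedding preserving local geometry
print(Y.shape, lle.reconstruction_error_)
```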
LDA, also known as Fisher's discriminant analysis and originally proposed by Fisher (1936), may be used as a linear classifier. LDA is a computationally simple technique to train. In addition, LDA provides an intuitive classification boundary between nominal and abnormal operating data. In this way, LDA is more advantageous than SVM as a classifier. Furthermore, LDA can be extended to a multiclass linear classifier (Rao, 1948). This is especially useful if separate control strategies are needed for multiple abnormal process operating regions. Since LLE preserves the nonlinear features of process data, LDA may need to be extended to its quadratic version, known as quadratic discriminant analysis (QDA), to capture this nonlinearity. Both techniques assume a multivariate normal distribution of the data; however, LDA assumes the covariances of the abnormal and nominal operating data are the same. QDA is derived from one fundamental difference: the covariances of the abnormal and nominal operating data are not assumed to be equal (McLachlan, 2004). Li and Zhang (2011) state that the mapping from SLLEP is more separable than that from LLE and can be sufficiently classified with either LDA or QDA.
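On a feature space such as the one produced by SLLEP, both classifiers are one-liners; the following sketch contrasts them on synthetic two-class data with unequal covariances. The data and the scikit-learn dependency are assumptions for illustration.

```python
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)

rng = np.random.default_rng(1)
# stand-in for a 2D feature space from LLE/SLLEP: nominal (pi1) vs abnormal (pi2)
Y = np.vstack([rng.normal(0.0, 1.0, (300, 2)),
               rng.normal(3.0, 2.0, (100, 2))])   # larger spread: unequal covariance
labels = np.r_[np.zeros(300), np.ones(100)]

lda = LinearDiscriminantAnalysis().fit(Y, labels)      # pooled covariance, linear boundary
qda = QuadraticDiscriminantAnalysis().fit(Y, labels)   # per-class covariance, quadratic boundary
print(lda.score(Y, labels), qda.score(Y, labels))      # training accuracies
```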
Rather, FTC is designed to avoid disaster in the event of a malfunction of process equipment, sensors and actuators, the controller, or a control loop. A process shift, however, is a result of acceptable system behaviour that causes the process to deviate from the narrow operating range a linear controller is designed for. In this event, a control method is needed that optimizes only the recovery of the process to its nominal operating range. Nevertheless, FTC shares a similar overall strategy with the objectives of this thesis, and FTC can offer valuable insight into process monitoring and controller performance recovery. Other techniques reviewed with potential application to this work are nonlinear, robust, gain scheduling, and override control.

Nonlinear control

One strategy that addresses the problem of poor control when a process has shifted to a new operating region is nonlinear control. Nonlinear controllers are built from a nonlinear model of the process. As a result, these controllers can have a much wider range of operation than a linear controller. Existing linear control methods are utilized to design a nonlinear controller; however, feedback systems with nonlinear control are usually not analyzable by standard LTI system theory (Nijmeijer and Van der Schaft, 1990). Moreover, Goodwin (2002) shows that it is difficult to guarantee global stability for a nonlinear system, and nonlinear feedback design techniques are usually associated with finding a system output that has stable zero dynamics at the desired operating point. Furthermore, Primbs and Nevistic (1996) show that nonlinear control is not always optimal since nonlinear control problems are usually non-convex, and there is often a trade-off between optimality and global stability.

Although a nonlinear controller can offer greater performance and range of operability than a linear controller, the mathematical analysis involved with nonlinear feedback control is rigorous (Mariño, 1995). Consequently, personnel in the chemical industry are not well-equipped with the knowledge or tools needed to maintain nonlinear controllers. Besides, Ljung (2001) explains that linear feedback control is forgiving and achieves good control based on an approximate model. Therefore, nonlinear controllers are not essential in the chemical and mineral processing industries to maintain reasonable performance. Furthermore, linear control methods can be adapted to recover a process from a nonlinear region.

Robust control

Robust control is a technique that can be formulated as a linear control method and be used in place of the nominal controller in a process. A robust controller can be designed to achieve robust performance and/or stability for a given set of modelled uncertainty (Skogestad and Postlethwaite, 2005). Consequently, disturbances pertaining to certain unknown parameters, such as ore hardness, can be modelled, and a robust controller, such as a µ-controller, can be designed to guarantee stability for a given range of modelled uncertainty (Packard and Doyle, 1993). The challenge with robust control is the performance–stability trade-off of the feedback system. To overcome this challenge, multiple robust controllers may be designed for specific operating intervals. However, there is generally not enough expertise in the chemical industry to maintain robust controllers.

Gain scheduling control

Gain scheduling control is a promising linear technique that is a form of adaptive control.
Nonlinear plant dynamics are modelled as a linear parameter-varying system, and a Jacobian linearization about a family of operating points is taken in a method called linearization scheduling (Rugh and Shamma, 2000). A family of linear controllers is developed for the linear parameter-dependent plant, and the controller gains are scheduled according to the value of the scheduling variable. Gain scheduling techniques are successfully utilized by industry in many applications such as wind turbine control (Bianchi et al., 2006). However, the main disadvantage of gain scheduling is that stability can only be guaranteed under the assumption of a slow variation in the scheduling signals (Leith and Leithead, 2000). Additionally, different gain scheduling techniques need to be developed depending on whether the nominal controller to be scheduled is PID (Rugh and Shamma, 2000), MPC (Chisci et al., 2003), or another type.

Override control

Nonlinear, robust, and gain scheduling control strategies focus on replacing the nominal controller in the process with a new controller that is tuned to control the process at both the nominal and abnormal operating points. An entirely different approach is to keep the nominal controller that is tuned for nominal operation, and to implement a secondary controller that is tuned specifically for the abnormal operating region requiring recovery control. The secondary controller would then override the nominal controller's actions once the process shifts to the abnormal operating region. This controller would be designed to optimally recover the process back to nominal operation.

The benefit of using an override controller is that the nominal controller in the process is kept, and the secondary controller can be designed from linear control methods that are easy to maintain. The secondary controller would be some form of an optimal controller to return the process back to nominal operation, such as MPC (Coetzee, 2009). The disadvantage to this approach is that transient signals caused by switching between controllers may lead to instability. Hespanha and Morse (2002) propose a method to avoid instability from controller switching by properly selecting controller realizations such that the closed-loop system is bounded for every possible switching signal. In addition, some modern DCSs eliminate feedback on inactive loops to prevent the integral windup that would cause unstable oscillations upon switching.

2.3 SAG milling

The case study selected for this research is a SAG mill at a mineral processing facility located in British Columbia. Grinding is one of the most fundamental unit operations in a mineral processing facility. Optimal operation of grinding mills is an important research problem because they are the most energy intensive processes in a mineral processing facility, constituting nearly half of all energy consumed in a concentrator plant (Yang et al., 2010). Furthermore, the product particle size distribution from grinding mills has a large effect on the recovery of valuable minerals in downstream processing (Chen et al., 2008). Many mineral processing facilities, such as the mineral processing facility in British Columbia at the start of this work, have only one SAG mill due to their large size and operating cost. An industrial survey conducted by Wei and Craig (2009) revealed that SAG mills are used in approximately 38% of all concentrator plants, and approximately 35% of all concentrator plants have one milling circuit.
Therefore, SAG mills often bottleneck the total ore throughput of these facilities, and their stable operation is critical to plant economics. Moreover, if there is only one SAG mill, the entire plant may be shut down in the event the mill is shut down.

Improving the control of a SAG mill can lead to a substantial increase in the stability and performance of the mill and downstream processing. However, the input conditions and internal state of a SAG mill are highly uncertain, and disturbances can easily lead to abnormal and unstable operation, even when the most advanced control methods available to the industry are utilized. For example, the mineral processing facility in British Columbia currently utilizes MPC to control its SAG mill. Even so, the MPC at this facility is often set to manual mode so operators can manage the abnormal operating conditions of the SAG mill resulting from input disturbances and variations in the gain of the system.

Previous work to address the stability and abnormal conditions of the SAG mill includes robust control (Galán et al., 2002), adaptive control (Najim et al., 1995), and disturbance-observer MPC (Yang et al., 2010). However, the drawbacks are that robust control sacrifices performance for stability (Skogestad and Postlethwaite, 2005), and adaptive control will be sluggish when dealing with strong disturbances (Chen et al., 2009). Disturbance-observer MPC is a promising method to handle disturbances directly; however, if an unobserved or poorly rejected disturbance perturbs the system to an abnormal operating state, then the controller will no longer effectively manage the process.

2.3.1 Process overview

In a mineral processing facility, ore size must be reduced in order to liberate the valuable minerals from the waste rock for downstream processing. Accordingly, a SAG mill is often used in these facilities as a means to grind the ore. SAG mills are unique among grinding processes because they utilize the ore itself to reduce material size, and can typically grind ore from 200 mm down to 0.1 mm in one unit (Wills and Napier-Munn, 2006). Semi-autogenous grinding implies that a steel ball charge, usually between 4% and 15% of the total mill volume, is added to the ore charge to enhance the grinding process. In these mills, the total charge volume usually occupies between 30% and 35% of the mill interior (Gupta and Yan, 2006). The main components of a typical grate-discharge, trunnion-supported SAG mill, such as the one operated by the mineral processing facility in British Columbia, are described below and shown in Figure 2.9.

1. Feed trunnion: provides a support for the mill and an entrance for ore and water to enter the mill.

2. Mill shell: main chamber where ore is lifted by the lifting bars and tumbles down to grind the ore as the mill rotates.

3. Grate: a screen allowing the finished product in the form of slurry to pass into the discharge chamber.

4. Pulp lifter: lifts the ore slurry that passes through the grate to the discharge trunnion.

5. Discharge trunnion: provides a support for the mill and an exit for ore slurry.

Figure 2.9: Schematic of a typical grate-discharge SAG mill (Latchireddi and Morrell, 2003a)

The primary objective of SAG mill control is to stabilize the circuit, and the secondary objective is to maximize the economic performance of the mill (Coetzee, 2009).
To maximize the economy of the mill, sub-objectives often include (Craig and MacLeod, 1995):

1. maximize throughput.

2. minimize power consumption per tonne of ore processed.

3. minimize steel ball and liner consumption per tonne of ore processed.

4. minimize variations in output particle size.

2.3.2 Overloading

SAG mills are often operated as close to their upper throughput constraint as possible in order to maximize the economic performance of the mill. As a result of high feed rates, the operation of the mill can easily become unstable from overloading. Preventing overloading is non-trivial because the properties of the ore that affect grindability, such as ore size, hardness, and composition, are highly variable, and ore hardness is not feasible to measure online. Likewise, the internal state of the mill cannot be directly measured, and estimates from mill power and bearing pressures are seldom accurate (Wei and Craig, 2009).

The principal variables affecting the internal state of the mill are changes in feed rate and circulating load, size distribution, ore properties, steel ball charge and liner wear, and the rate of feed water addition (Wills and Napier-Munn, 2006). Moreover, slurry rheology is highly variable and can have significant effects on load distribution, breakage behaviour, and discharge efficiency (Moys, 1989; Shi and Napier-Munn, 2002). Figure 2.10 shows how a mill can be overloaded for several cases of these varying process conditions. The figure shows the mill is always overloaded for a specific range in charge volume; however, the maximum power draw, representative of mill overloading, varies between operating conditions A through D. Unfortunately, it is currently impossible to accurately measure charge volume or slurry pooling during operation. Consequently, it is difficult for operators to detect mill overloading, and once it has been detected, usually from increasing bearing pressures and constant or decreasing mill power, the mill has typically been overloaded for a long period. Once detected, operators must then make sudden and large process changes to decrease the feed ore rate and let the mill slowly grind down the charge.

Figure 2.10: Mill power curve for several process conditions A, B, C, and D showing that mill power is a poor indicator of overloading

Overloading occurs as a result of one, or a combination, of three cases: volumetric overloading, slurry pooling, or the freewheel effect. Additionally, slurry pooling and the freewheel effect eventually lead to volumetric overloading. These three modes of overloading are discussed in detail in the remainder of this section. To the best of our knowledge, this thesis is the first publication that discusses all three complex phenomena within a SAG mill that directly lead to mill overloading.

Case 1: Volumetric overload

Volumetric overloading occurs when the ore feed rate, particle size distribution, or ore hardness is too high, thus leading to an accumulation of charge in the mill. Additionally, any disturbance that causes the grinding efficiency in the mill to decrease may also lead to volumetric overloading. As charge accumulates in the mill, the volume occupied by the charge increases such that the total distance ore can fall under gravity, known as cataracting, decreases and causes the grinding efficiency of the mill to degrade (Morrell, 1993).
Figure 2.11 illustrates the reduction of maximum potential energy (PEmax) of cataracting ore resulting from a loss in free-fall distance when the mill is overloaded.

Figure 2.11: Cross-sectional view of the inside of a SAG mill showing load torques and maximum potential energy of cataracting ore for (a) a non-overloaded mill, and (b) an overloaded mill with reduced potential energy of the cataracting ore

Once a SAG mill has begun to overload, ore may rapidly accumulate until the charge level reaches the center of the mill, where slurry is allowed to freely discharge and the feed chute plugs. However, depending on the charge density, the bearing pressures may become too high such that the system alarms well before this event. Depending on the charge density, it may be difficult to detect volumetric overloading early, since this type of overloading is determined by charge volume rather than mass. However, Figure 2.11 shows that the center of gravity (c.g.) shifts inside the mill, as illustrated by the length of the charge torque arm (x_c in Figure 2.11). Consequently, it may be possible to measure the load distribution using the pressure measurements from the bearing units supporting the mill, or by monitoring whether the motor power decreases as a result of a reduced counter-torque provided by the charge. However, the density, volume, and distribution of the charge in the mill are highly variable and may not always lead to a change in the counter-torque provided by the charge (τ_c in Figure 2.11) large enough to affect mill power noticeably.

Case 2: Slurry pooling

Slurry pooling occurs when an excess of slurry builds up inside the mill from poor pulp lifter design, resulting in the formation of a slurry pool (Mokken et al., 1975). The slurry pool drastically dampens the impact energy of the falling rock and reduces grinding efficiency. Figure 2.12 shows how slurry is distributed inside the mill for various slurry levels. The figure shows that slurry is held up within the solid charge as the material is lifted up with the rotation of the mill (Latchireddi and Morrell, 2003a). As slurry content increases, the slurry fills the interstitial space of the solids charge in the direction of the toe, or bottom, of the solids charge. When all the interstitial space is occupied by slurry, a pool then forms at the toe of the charge (Cleary et al., 2006).

Figure 2.12: Distribution of slurry within a mill for (a) low slurry content, (b) charge saturated with slurry, and (c) slurry pool formation due to excess slurry content (Latchireddi and Morrell, 2003a)

As the slurry pool forms inside the mill, grinding efficiency decreases and the mill may become overloaded, leading to Case 1. At this point, operators may realize that, for a constant ore feed rate, the trunnion bearing pressures increase, indicating an increase in mill weight. However, mill power may not increase in proportion to the bearing pressure as would normally be expected. This phenomenon is the result of a shift in the center of gravity of the mill toward the center as the slurry pool builds. Figure 2.13 shows the slurry pool provides a counter-torque (illustrated by τ_p and the length of the slurry torque arm x_p) to the torque provided by the charge, τ_c (Moys et al., 1996).
As a result, the center of gravity of the total charge inside the mill shifts towards the mill center, resulting in a decrease in mill power for severely overloaded mills.

Figure 2.13: Cross-sectional view of the inside of a SAG mill showing load torques for (a) a non-overloaded mill, and (b) an overloaded mill with additional torque from the slurry pool

Case 3: Freewheel effect

The third mode of mill overloading occurs when charge becomes trapped between the lifter bars due to centripetal force and the viscosity and density of the charge (Dean, 1999). This phenomenon is known as freewheeling because the inertia of the trapped charge combines with the kinetic energy of the mill and effectively reduces the motor load. In addition, motor load decreases significantly because the remaining charge in the mill is no longer lifted while the lifter bars are obstructed. Therefore, the remaining charge becomes static, which results in poor comminution and an accumulation of ore in the mill, leading to Case 1. In this abnormal situation, increasing the feed water or decreasing the ore feed rate is needed to decrease the density and viscosity of the charge and recover from the freewheel effect. The freewheel effect is illustrated in Fig. 2.14.

Figure 2.14: Cross-sectional view of the inside of a SAG mill showing load torques and load distribution for (a) a non-overloaded mill, and (b) an overloaded mill with freewheeling effect

Chapter 3

Problem Statement

The objective of this thesis is to develop a linear control strategy to recover a process to its nominal operating region following a disturbance that has shifted the process to an undesired operating region. To satisfy this objective, the problem is divided into two parts:

1. apply a linear or nonlinear process monitoring method to detect when the linear or nonlinear process has shifted to an undesirable operating region.

2. develop a control method to optimally recover the process back to its nominal operating region.

It is also the objective of this thesis to illustrate the overall control strategy with an industrial application on the SAG mill at the mineral processing facility in British Columbia. Consequently, the control strategy must also satisfy the following conditions:

• be easy to understand and maintain for process control engineers in industry.

• be computationally simple enough to operate in an industrial controller.

• be robust to changing process conditions.

There are many forms of disturbances to a process, and many root causes for these disturbances. However, we are not interested in the nature of the disturbance. Rather, this work is concerned with the nature of the process shift. The control strategy must be robust to abrupt, incipient, or transient forms of a process shift. In addition, this control strategy only targets process shifts to specific undesired operating regions where the process is recoverable to the nominal operating region. Therefore, process shifts resulting from equipment failure may not result in a controllable recovery and are not within the scope of this thesis.

A process monitoring and controller performance recovery technique, when applied so that the two techniques work together, may be extremely useful for recovering processes from specific undesired operating regions where nominal controller performance is low.
In particular, these techniques are essential to recover controller performance from undesired operating regions that result in severe consequences to the safety and/or productivity of a process. The following two sections elaborate on the objectives of this thesis.

3.1 Process monitoring

As part of the control strategy to recover a process, the process shift must first be detected. Therefore, the process monitoring method must have the ability to determine whether the state of the system is nominal, or whether it has shifted to an abnormal operating region. Furthermore, there may be many different abnormal operating regions of a process. For example, a process such as a grinding mill may be overloaded or underloaded. In either case, the throughput of the grinding mill will be depressed, and the process equipment will be at a higher risk of damage. In this example, the nominal controller can likely recover the process from the underloaded condition. However, an overloaded mill can quickly become unstable due to the nonlinearity of the process. In this case, the process monitoring and controller performance recovery techniques are designed specifically for the mill overload.

Process monitoring methods have been thoroughly developed and reviewed in the literature. However, for many process monitoring methods in the literature, there has not been an effort to develop a control strategy in which the information from process monitoring is used by a controller to recover the process. Consequently, existing techniques will be adapted in this work in order to provide information to a controller that links the process monitoring and controller performance recovery methods.

3.2 Controller performance recovery

The purpose of controller performance recovery is to recover a process from a nonlinear, undesired operating region back to the nominal operating region where nominal controller performance and process economy are maximized. Conversely, it is not the objective of this work to prevent the occurrence of a process shift that leads to a loss in controller performance. There are plenty of techniques in the literature that try to address this issue; however, it is sometimes inevitable that a process will shift away from its nominal operating region. Nevertheless, it is possible that a control strategy that focuses on recovering a process in order to improve controller performance can also prevent a process from shifting to an undesired operating region. For example, if the process shift is incipient, a control strategy may halt the process shift at the boundary between what is defined as the nominal and undesired operating region. This is an unintended, yet beneficial, consequence of a control strategy that is robust to a diverse nature of process shifts.

The controller performance recovery technique is intended only for a specific class of systems. We are interested in systems that can be reasonably approximated as time-invariant. Moreover, the specific abnormal operating region of a process that the control strategy is designed for must be reasonably approximated as LTI. However, most chemical processes are nonlinear. Therefore, the linear control method can only be effective for a specific range of the abnormal operating region that can be approximated as 'locally linear'. Outside of this range, nonlinear control techniques not covered within the scope of this work need to be applied in place of the linear control method proposed here for optimal recovery.
Alternatively, simple rules based on operators' best practices may be added to the controller performance recovery technique to help recover a process from outside the locally linear region back to the region for which the technique is designed. However, this alternative is not mathematically optimal and is only to be used if the focus is to recover the process from severe conditions where the linear control strategy designed in this work no longer suffices to recover the process.

Implementing a controller performance recovery technique where a nominal controller already exists for a process may have unintended consequences. For example, the overall economic performance or stability of a process should not be compromised; on the contrary, the economic performance and stability of the process should be improved. In addition, the transition between the nominal and abnormal operating region should not result in stability issues or transient behaviours.

Chapter 4

General Control Strategy

This chapter covers the establishment of a general control strategy to recover a chemical process from an abnormal operating region to its nominal operating region, as stated in Chapter 3. The control strategy is divided into two main tasks: the detection of a process shift through process monitoring, and the recovery of the process to nominal operation using a linear controller. An application of this control strategy on semi-autogenous grinding is presented in Chapter 5.

Figure 4.1 presents a flowchart showing the major steps of the general control strategy presented in this chapter. The first step is process monitoring. Process monitoring is the task of monitoring process outputs and determining if the process is operating in an abnormal operating range that requires recovery control. In the event the process requires recovery control, it is also the task of process monitoring to estimate the magnitude of the process shift. Process monitoring is discussed in Chapter 4.1.

In the event that the process monitoring method detects a process shift where it is deemed control action is necessary, the next task is to recover the process from the abnormal situation. The goal of a controller performance recovery technique is to execute control actions that are tuned specifically to recover the process back to nominal operation in the shortest time possible in order to recover nominal controller performance and process economy. Controller performance recovery is discussed in Chapter 4.2.

Figure 4.1: Flowchart of the process monitoring and controller performance recovery strategy applied to a chemical process

4.1 Process monitoring

There are a multitude of techniques available for process monitoring. Data-based methods are selected in preference to model-based methods because model-based methods usually require very accurate process models. Comparatively, chemical processes are often poorly understood and are too complex to model accurately. In addition, the computational effort of nonlinear model-based techniques, such as particle filtering, is usually too large for an industrial controller.

The two main steps for a general data-based statistical process monitoring method are shown in Figure 4.2.
The first step is to extract the most important features from a measured high-dimensional data-set known as the observation space. This feature extraction step reduces data sparsity and the dimensionality of the data-set, thus allowing for more accurate training of a classifier in the second step. The second and final step of a general process monitoring procedure is to classify the low-dimensional feature space into regions of nominal and abnormal operating conditions. These classified operating regions are known as the decision space. In this step, it is possible to include both categorical class information and a measurement of the magnitude of the process shift in the decision space.

Figure 4.2: Block diagram of the major steps to statistical pattern recognition for process monitoring

There are two data-based process monitoring methods that sufficiently satisfy the desirable characteristics of a process monitoring method that were outlined in Chapter 2.1. PCA is a computationally efficient feature extraction method that can be applied to monitor process shifts for linear processes. In addition, the method is forgiving for process shifts that are mildly nonlinear because there will only be small changes in the correlation structure between process variables. Also, PCA can detect a process shift to an abnormal, nonlinear operating region that has a significantly different correlation structure than nominal operation by measuring the squared prediction error. Classification of the feature space from PCA can be performed with Q and T² statistics, or LDA.

For processes that are highly nonlinear, or so highly multivariate that PCA cannot find an at most three-dimensional model to visualize and summarize key information regarding a process shift, LLE can be used instead. LLE has proven in image analysis to map high-dimensional data to three or fewer dimensions and still retain the most important nonlinear features (Saul and Roweis, 2003; Wang, 2011). The resulting low-dimensional space can be classified into regions of nominal and abnormal operation and plotted to visually verify the accuracy and credibility of the technique. Moreover, LLE is also computationally efficient, and it is possible to design a simple approach for it to operate in an industrial controller. Classification of the feature space from LLE can be performed with LDA or QDA.

4.1.1 Principal component analysis

PCA is well-suited as a feature extraction method for process monitoring of linear processes. The technique can easily fit in an industrial controller since there are only O(D²) online computations at each execution. Additionally, due to its simplicity, operator acceptance of PCA is high, with several documented applications (Neogi and Schlags, 1998; Kourti et al., 1996; Kosanovich et al., 1996). In this section, we discuss two different methods of using PCA in process monitoring. In the first method, PCA is trained on nominal operating data, and the PC scores of the model are classified with Q and T² statistics. In the second method, we propose training PCA on nominal and abnormal operating data, and classifying the PC scores using LDA. In Chapter 5, we will demonstrate that the second method gives better process monitoring results if the changes in the correlation matrix during a process shift are too subtle for the first method to detect.
As discussed in Chapter 2.1.2, PCA extracts features within data by finding orthogonal directions of maximum variance using SVD. The orthogonal directions of maximum variance are characterized by the eigenvectors of the correlation matrix of a training data-set, and the magnitude of these vectors is characterized by the eigenvalues. The main underlying assumption of PCA is that the data are multivariate normal; however, reasonable results can often be achieved even if this assumption is invalid (Johnson and Wichern, 2007).

The first, and most challenging, step of PCA is to determine which process variables should be included in the model and how these variables should be transformed to extract key information in the data. Traditionally, nearly all measured variables related to a particular process are used to train the PC model. In fact, this is the advantage of PCA: one does not need to know much about the process to use the technique. However, this method of training the PCA model may incorporate biases in the PC scores towards certain aspects of the process if there is a lot of redundant information between variables. In addition, the PC model may be trained to find correlations between certain variables in the training data-set that are in fact transient conditions that may not recur under the same operating conditions. Appendix A discusses techniques helpful for selecting and transforming process variables for PCA.

Consider a data matrix X ∈ ℝ^{n×D} consisting of n observations and D measured process variables. Generally, n does not need to be very large to adequately train the PC model if there is adequate excitation of the process in the data. Since PCA is scale-dependent, the variables, or columns of X, must be scaled in some meaningful way (Johnson and Wichern, 2007). The most common method is to standardize the D variables by scaling them to zero mean and unit variance. Once X is scaled so that the means and variances of the variables are in relative proportion to each other, the next step is to construct the covariance matrix, S, from X:

S = \frac{1}{n-1} X^T X,    (4.1)

where S is a square symmetric matrix of dimensionality D. Next, SVD is performed on S to find the eigenvector matrix, V ∈ ℝ^{D×D}, and the eigenvalue matrix, Λ ∈ ℝ^{D×D}:

S = V \Lambda V^T,    (4.2)

where the columns of V are the eigenvectors of S, and Λ is a diagonal matrix of the eigenvalues sorted in descending magnitude. Now, it is possible to calculate the PC scores, T, which provide a measurement of how an observation scores for each PC. T is calculated using the following expression:

T = X V,    (4.3)

where the columns of T ∈ ℝ^{n×D} are the scores corresponding to each PC. The next step is to select only the PCs explaining most of the variance in the data-set. There are several procedures to determine the number of PCs to be retained (a numerical sketch of the full training procedure follows this list):

1. Cumulative percent variance (CPV): a measure of the total observed variance captured by the first p PCs (Zumoffen and Basualdo, 2008). Select p such that CPV is greater than some threshold.

2. Scree analysis: the sharpest bend in a graph where the diagonal elements of Λ are plotted in descending order indicates the best choice of p (Jackson, 1991).

3. Cross-validation: partition the data into training and test sets and evaluate the predictive accuracy of the PC model for different values of p. p is chosen such that the squared prediction error no longer significantly decreases if p is increased (Ko and Shang, 2011). The procedure for cross-validation is discussed in Appendix C.
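To make these steps concrete, the following is a minimal numpy sketch of Equations (4.1) through (4.3) together with the CPV selection of Equation (4.4) below. The randomly generated X is a hypothetical stand-in for a scaled process data-set, and all variable names are illustrative only:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))            # hypothetical stand-in for process data

mean = X.mean(axis=0)
std = X.std(axis=0, ddof=1)
Xs = (X - mean) / std                    # standardize to zero mean, unit variance

n, D = Xs.shape
S = Xs.T @ Xs / (n - 1)                  # Equation (4.1)
eigvals, V = np.linalg.eigh(S)           # S = V Lambda V^T, Equation (4.2)
order = np.argsort(eigvals)[::-1]        # sort eigenvalues in descending order
eigvals, V = eigvals[order], V[:, order]

T = Xs @ V                               # PC scores, Equation (4.3)

cpv = np.cumsum(eigvals) / eigvals.sum() * 100.0
p = int(np.searchsorted(cpv, 90.0)) + 1  # smallest p satisfying CPV > 90%
P = V[:, :p]                             # retained PC eigenvector matrix

Note that an eigendecomposition of the symmetric matrix S is used here in place of a full SVD of X; for a symmetric positive semidefinite matrix the two factorizations coincide.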
For process monitoring, it is often acceptable to simply use CPV to find the p that satisfies CPV > 90%. CPV is calculated from the following expression:

\mathrm{CPV}(p) = \frac{\sum_{k=1}^{p} \lambda_k}{\mathrm{trace}(\Lambda)} \times 100\%,    (4.4)

where λ_k is the eigenvalue in Λ corresponding to the kth PC. Once p PCs have been selected, the columns of V corresponding to the first p PCs create the PC eigenvector matrix P ∈ ℝ^{D×p}.

The most common usage of PCA as a process monitoring technique uses T² and Q test statistics to classify process conditions. T² indicates how far observations are from the center of the PC ellipse while remaining within the axes of the ellipse (García-Álvarez et al., 2009). T² ∈ ℝ^{n×1} is calculated from the expression:

T^2 = \sum_{k=1}^{p} \mathrm{diag}\left( t_k \lambda_k^{-1} t_k^T \right),    (4.5)

where t_k is the score vector in T corresponding to the kth fault-specific PC. The acceptable upper limit of any T² value is given by T²_α:

T^2_\alpha = \frac{p(n-1)}{n-p} F_{p,\,n-p,\,\alpha},    (4.6)

where α is the percentage confidence level, and F is the F-distribution at the α statistical significance level with p degrees of freedom in the numerator and n − p degrees of freedom in the denominator. Values of T² larger than T²_α are considered abnormal observations. Furthermore, the magnitude by which T² exceeds T²_α provides an estimate of the magnitude of the process shift.

The Q test statistic may also be used to classify process behaviour (Jackson and Mudholkar, 1979). Q differs from T² in that it provides a measure of how far observations are from the axes of the ellipse of the PC model. In other words, it provides a measure of how poorly the PC model fits a particular observation. Q ∈ ℝ^{n×1} is calculated as:

Q = \mathrm{diag}(E^T E),    (4.7)

where E ∈ ℝ^{D×n} is the residual matrix for the prediction of X from the selected PCs, and E can be calculated from the expression:

E = (I - P P^T) X^T.    (4.8)

Here, I is the D × D identity matrix. For a new observation, x_new ∈ ℝ^{D×1}, the matrix X^T in Equation (4.8) is simply replaced with the vector x_new. The acceptable upper limit of the Q statistic is given by Q_α:

Q_\alpha = \theta_1 \left[ \frac{h_0 c_\alpha \sqrt{2\theta_2}}{\theta_1} + 1 + \frac{\theta_2 h_0 (h_0 - 1)}{\theta_1^2} \right]^{1/h_0},    (4.9)

where c_α is the one-tailed normal deviate corresponding to the upper (1 − α) percentile, and θ_i and h_0 are calculated from the following two expressions:

\theta_i = \sum_{j=p+1}^{D} \lambda_j^i, \qquad h_0 = 1 - \frac{2\theta_1\theta_3}{3\theta_2^2},    (4.10)

with λ_j raised to the ith power, and 1 ≤ i ≤ 3. Q functions similarly to T²: values of Q greater than Q_α are classified as abnormal operating data. Moreover, the magnitude by which Q exceeds Q_α provides an estimate of the magnitude of the process shift that can be used as a control signal.

The Q statistic is unique because it makes process monitoring with PCA applicable to some nonlinear processes. Since Q provides a measure of how well the PC model fits the observed data, Q will become larger as the process shifts to a nonlinear operating region that is outside of the region where the linear PC model was trained. However, the Q statistic estimates the magnitude of the process shift into any nonlinear region. Therefore, the Q statistic should be coupled with some sort of rule set to distinguish which nonlinear operating region the process has shifted to. Taking a SAG mill as an example, if Q > Q_α and the bearing pressure of the mill is high, then the mill is likely to be overloaded as opposed to being underloaded.
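These statistics and their control limits are straightforward to evaluate; below is a hedged numpy/scipy sketch of Equations (4.5) through (4.10) for a single standardized observation, reusing P, eigvals, n, and p from the previous sketch. The function names and the default value of α are illustrative assumptions:

import numpy as np
from scipy import stats

def t2_q(x, P, eigvals, p):
    """T^2 and Q for one standardized observation x (a D-vector)."""
    t = P.T @ x                               # scores on the p retained PCs
    T2 = float(np.sum(t ** 2 / eigvals[:p]))  # Equation (4.5), single observation
    e = x - P @ t                             # residual, Equation (4.8)
    return T2, float(e @ e)                   # Q, Equation (4.7)

def t2_limit(n, p, alpha=0.05):
    """Upper control limit on T^2, Equation (4.6)."""
    return p * (n - 1) / (n - p) * stats.f.ppf(1 - alpha, p, n - p)

def q_limit(eigvals, p, alpha=0.05):
    """Upper control limit on Q, Equations (4.9) and (4.10)."""
    th = [float(np.sum(eigvals[p:] ** i)) for i in (1, 2, 3)]
    h0 = 1 - 2 * th[0] * th[2] / (3 * th[1] ** 2)
    c = stats.norm.ppf(1 - alpha)             # one-tailed normal deviate
    bracket = (h0 * c * np.sqrt(2 * th[1]) / th[0]
               + 1 + th[1] * h0 * (h0 - 1) / th[0] ** 2)
    return th[0] * bracket ** (1 / h0)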
The traditional approach to using PCA as a process monitoring technique is divided into the offline and online portions of the method and summarized in Method 1:

Method 1: Offline

1. Obtain a training data-set, X, from nominal process behaviour with key process variables selected relevant to abnormal operation.

2. Obtain the mean and standard deviation vectors of X, then standardize the columns of X to zero mean and unit variance.

3. Perform SVD on the correlation matrix of X to obtain the PC model.

4. Select the number of PCs explaining most of the variance in the data using CPV.

5. Determine the upper control limits for the T² and Q statistics.

Method 1: Online

1. Obtain a new test sample, x_new.

2. Scale x_new using the original mean and standard deviation vectors of X.

3. Evaluate the T² and Q statistics. If one of these values exceeds its upper control limit, then the situation is considered abnormal.

Traditionally, when T²_α or Q_α is violated, the response of the DCS is to send an alarm to the operators so they can manually recover the process from the abnormal situation. However, in this work, we propose using the magnitude by which these control limits are exceeded as an estimate of the magnitude of the process shift to an abnormal operating region. Consequently, this signal can be controlled in the DCS to reduce the magnitude of the process shift below some constraint value.

A new approach to using PCA for process monitoring is proposed here for cases where it is found that the T² or Q statistics do not correctly identify the process abnormality upon testing the PC model. This may occur if the abnormality causes only subtle changes to the correlation structure of the data. In this event, there may exist some region in the high-dimensional space of X that can be classified as an abnormal operating region. Furthermore, PCA can be used to reduce variable redundancy and lower the dimensionality of X to allow a simple classification method, such as LDA¹, to classify the nominal and abnormal operating regions within the data. The procedure for this approach is outlined below in Method 2, and a sketch of the offline search in Steps 4 and 5 follows Method 2:

Method 2: Offline

1. Obtain a training data-set, X, from a relatively even distribution of nominal and abnormal process behaviour with key process variables selected relevant to detecting the abnormal operation.

2. Obtain the mean and standard deviation vectors of X, then standardize the columns of X to zero mean and unit variance.

3. Perform SVD on the correlation matrix of X to obtain the PC model.

4. Perform LDA classification with 10-fold cross-validation, as discussed in Appendix C, on all 2D combinations of the PC scores in T.

5. Select only the p PCs that best allow the LDA classifier to separate the nominal and abnormal operating data.

6. Find the linear equation of the LDA classifier from Equation (4.27) separating the nominal and abnormal operating data in the pth dimension.

Method 2: Online

1. Obtain a new test sample, x_new.

2. Scale x_new using the original mean and standard deviation vectors of X.

3. Calculate the p PC scores of x_new, where each score is part of the score vector, y, of x_new.

4. Calculate the LDA classifier output value and class label for y from the linear equation discovered in Step 6 of Method 2: Offline.

¹ Refer to Chapter 4.1.3 for a discussion on LDA.

In the offline portion of Method 2, it is most likely that a fault will be characterized by no more than three PCs. It is also not necessary that the fault-specific PCs determined from this procedure be the PCs that explain most of the variance in the data.
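As an illustration of Steps 4 and 5 of Method 2: Offline, the sketch below uses scikit-learn's LDA and cross-validation utilities (an implementation choice of this example, not of the thesis) to search all 2D combinations of PC scores for the pair that best separates the two classes:

from itertools import combinations
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def best_pc_pair(T, labels):
    """Search all 2D combinations of PC scores for the pair that LDA
    separates best under 10-fold cross-validation (Steps 4 and 5).
    T is the n x D score matrix; labels holds the known class labels."""
    best_pair, best_acc = None, 0.0
    for i, j in combinations(range(T.shape[1]), 2):
        acc = cross_val_score(LinearDiscriminantAnalysis(),
                              T[:, [i, j]], labels, cv=10).mean()
        if acc > best_acc:
            best_pair, best_acc = (i, j), acc
    return best_pair, best_acc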
If an abnormal situation is significantly characterized by more than three PCs, then LLE should be used instead of PCA, since LLE is usually capable of mapping high-dimensional data to three or fewer dimensions. It is important to have three or fewer dimensions characterizing a process shift to an abnormal operating region because it allows operators to visually interpret the PC scores and the classification boundary and gain trust in the process monitoring method.

In Method 2, abnormal process training data should be selected such that the correlation structure between the highly correlated variables determined from Method 1 does not change drastically when the model is trained with the addition of the abnormal process data. The objective of training the PC model with the inclusion of abnormal process data is to develop the correlation structure of variables that have poor correlations in the nominal operating region, yet may express a clear correlation structure in the abnormal operating range. The shift in the center of gravity of a SAG mill is a good example of a variable that only has a clear correlation structure once the mill has begun overloading. These types of variables provide extra redundancy in the PC model that is useful for detecting complex phenomena such as mill overloading.

For both Method 1 and Method 2, it may be beneficial to scale the variables in X from 0% to 100% in proportion to the minimum and maximum values that may be expected during process operation. A percentage scale is meaningful to operators and more robust to time-varying changes (MacGregor and Kourti, 1995). The minimum and maximum limits to scale the variables can usually be estimated from the operating limits of the variables measured in the DCS, since industrial control often requires process variables to be scaled from 0% to 100%. When using a percentage scale, it is important to scale so that the variances of the variables are in relative proportion to each other. However, the scaling limits may also be manipulated to bias the variances of certain variables when training the PC model.

In Method 2, the signal to be controlled is the distance by which the PC score vector of x_new lies beyond the classification boundary and into the abnormal operating region. Moreover, using the centroid of several consecutive score vectors reduces the likelihood that stochastic signals will cause a single observation to be classified as abnormal when the process is actually operating nominally.

In Method 2, Fisher's approach to LDA could perform the feature extraction instead of PCA (Johnson and Wichern, 2007). However, we do not use Fisher's LDA because we found, from preliminary experiments on the case study in Chapter 5, that this technique is sensitive to changes in the mean values of variables during nominal operation. This led to poor performance with Fisher's LDA and classification when the model was used on a new test set having different nominal operating conditions. On the other hand, we found that PCA is less sensitive to changes in the mean values of process variables, thus better results are expected when monitoring processes with varying operating conditions.

If either Method 1 or Method 2 is unable to distinguish the abnormal process behaviour of concern from the nominal process behaviour, it may be because the abnormal process behaviour is too nonlinear for PCA, hence a nonlinear process monitoring technique such as LLE is needed.
Likewise, if more than three PCs are needed to classify the abnormal process behaviour, then LLE should be used, since it has been demonstrated in image analysis to be capable of mapping high-dimensional data on the order of thousands of dimensions to three or fewer dimensions (Saul and Roweis, 2003; Wang, 2011).

4.1.2 Locally linear embedding

If the process is so nonlinear or high-dimensional that PCA is found to be unsuitable as a process monitoring method, then LLE can be used as an alternative to PCA. LLE, however, is more computationally expensive than PCA, both in offline training and in online operation.

As discussed in Chapter 2.1.2, LLE is a feature extraction method that nonlinearly maps locally linear patches of a high-dimensional data-set into a low-dimensional space that preserves the critical information in the data-set. Furthermore, if LLE is trained with supervised learning, classification of the low-dimensional mapped space can be done with a simple classifier. As a result, LLE allows operators to visualize how the process monitoring method determines whether new observations are classified as nominal or abnormal.

Before training the LLE model, we must first determine which process variables should be used in the model, and how they should be transformed. Appendix A gives some suggestions for this process. Unlike PCA, LLE does not require as much pruning when selecting process variables; in other words, LLE can tolerate high-dimensional data-sets. On the other hand, training LLE requires substantially more process data than PCA. LLE requires a lot of data because new data are mapped to the feature space by performing a linear regression on the training data, and if a lot of training data are available, then it is expected that a new data point and its neighbors in the training data-set are located on a locally linear patch of the manifold. To properly map a variety of process conditions, we found from the case study in Chapter 5 that we needed to train LLE with nearly n = 2000 observations that cover a wide range of operation and are sampled only as frequently as necessary, including as extensive coverage as possible of the abnormal operation requiring the control strategy. If abnormal operating data are scarce, they should be sampled more frequently than nominal operating data in order to provide enough observations to properly train the technique. Correspondingly, n may differ depending upon the signal to noise ratio of the data, variations in operating conditions, and information redundancy in the training data. Lastly, failure to include enough data when training LLE may result in erroneous results when mapping new observations to the feature space.

In the simplest formulation of LLE, discussed by Roweis and Saul (2000), there are three fundamental steps performed offline of a DCS once X has been obtained: the first step is to select the nearest neighbors of each data point in X; the second step is to reconstruct each data point from its neighbors; and the final step is to map the data to a low-dimensional feature space. Fig. 4.3 reviews the three steps of the LLE procedure. In this form of LLE, there is only one free parameter to determine during training: the number of neighbors, K. Selection of K is discussed at the end of this section.

Figure 4.3: The three steps of the LLE algorithm (Roweis and Saul, 2000)
Step 1: Select neighbors.

The first step is to identify the K-nearest neighbors of each observation in the training data matrix, X ∈ ℝ^{n×D}, where X has been standardized to zero mean and unit variance or scaled from 0% to 100%. To find the neighbors of the ith observation in X, x_i, the Euclidean distance between x_i and every other observation needs to be determined. The K data points with the shortest distances are the neighbors, and a neighbor of x_i is denoted x_η. The Euclidean distance between x_i and another observation in X, x_j, is calculated from the expression:

\| x_i - x_j \|_2,    (4.11)

where ||·||₂ refers to the l2 norm. The K-nearest neighbors characterize the geometry of the local patches through linear coefficients used to reconstruct each data point from its neighbors. In reality, these local patches are not exactly linear; consequently, there are errors when using linear weights to reconstruct each data point from its neighbors in Step 2.

Step 2: Reconstruct each data point from its neighbors.

The second step is to reconstruct each data point from its neighbors by minimizing the cost function:

\underset{W}{\text{minimize}} \quad \varepsilon(W) = \sum_{i=1}^{n} \left\| x_i - \sum_{j=1}^{n} W_{ij} x_j \right\|_2^2,    (4.12)

subject to

W_{ij} = 0, \quad \text{if } x_j \notin \{x_{\eta_1}, x_{\eta_2}, \ldots, x_{\eta_K}\},    (4.13)

\sum_{j=1}^{n} W_{ij} = 1.    (4.14)

Here, the weights W_ij give the magnitude by which the jth observation contributes to the ith reconstruction. The W_ij are constrained to zero in Equation (4.13) if they do not belong to the set of nearest neighbors for the ith reconstruction. ε(W) sums the squared distances between all data points and their reconstructions. The optimal weights, subject to the constraints in Equation (4.13) and Equation (4.14), that minimize the reconstruction error in Equation (4.12) can be found by solving a least-squares problem, as detailed in Appendix B.1.

An illustration of the locally linear reconstruction of x_i is shown in Figure 4.4. This figure shows that a nonlinear curve must be drawn between x_i and its two neighbors. In order to develop a linear reconstruction, the data point x_i must be reconstructed onto a line between the two neighbors x_{η1} and x_{η2}. The weights W_ij are the degrees of freedom used to minimize the reconstruction errors.

Figure 4.4: Locally linear reconstruction of x_i (Saul and Roweis, 2003)

Step 3: Map to embedded coordinates.

The third and final step is to map each high-dimensional observation, x_i, to a low-dimensional feature vector, y_i. Each y_i represents the d-dimensional global internal coordinates on the low-dimensional manifold, where d represents a low dimension such that d < D, and y_i is found by minimizing the embedding cost function:

\underset{y}{\text{minimize}} \quad \Phi(y) = \sum_{i=1}^{n} \left\| y_i - \sum_{j=1}^{n} W_{ij} y_j \right\|_2^2,    (4.15)

subject to

\sum_{i=1}^{n} y_i = 0,    (4.16)

\frac{1}{n} \sum_{i=1}^{n} y_i y_i^T = I,    (4.17)

where I is the d × d identity matrix. Equation (4.16) prevents the y_i from being translated by a constant displacement by forcing them to be centered on the origin, and Equation (4.17) avoids degenerate solutions by constraining the embedding vectors to have unit covariance.

The embedding cost defined in Equation (4.15) is a quadratic program (QP); however, subject to the constraints in Equation (4.16) and Equation (4.17) that make the problem well-posed, the QP can be reformulated to solve a sparse n × n eigenvalue problem, as detailed in Appendix B.2.
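The three steps are compact enough to prototype directly. The following minimal numpy sketch is a dense, unoptimized illustration; the small regularization added to the local Gram matrix in Step 2 is a standard numerical safeguard (Saul and Roweis, 2003) rather than part of Equations (4.12) through (4.14), and the default K, d, and reg values are arbitrary:

import numpy as np

def lle(X, K=12, d=2, reg=1e-3):
    """Map X (n x D, already scaled) to d dimensions with LLE."""
    n = X.shape[0]
    # Step 1: K-nearest neighbors by Euclidean distance, Equation (4.11)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = np.argsort(dist, axis=1)[:, 1:K + 1]  # column 0 is the point itself

    # Step 2: reconstruction weights, Equations (4.12)-(4.14)
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[neighbors[i]] - X[i]              # neighbors centered on x_i
        G = Z @ Z.T                             # local Gram matrix
        G += reg * np.trace(G) * np.eye(K)      # regularize for numerical stability
        w = np.linalg.solve(G, np.ones(K))
        W[i, neighbors[i]] = w / w.sum()        # sum-to-one constraint, Equation (4.14)

    # Step 3: embedding from the eigenproblem on M = (I - W)^T (I - W)
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    _, vecs = np.linalg.eigh(M)
    # discard the constant eigenvector (eigenvalue ~ 0); scale so the
    # embedding has unit covariance, satisfying Equation (4.17)
    return vecs[:, 1:d + 1] * np.sqrt(n)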
Supervised LLE

An adaptation to LLE is discussed by De Ridder and Duin (2002) to improve the method when it is used to extract features within a data-set for subsequent classification. The technique is called supervised LLE (SLLE), and it includes class information in the algorithm by artificially adding a distance between samples from different classes. In process monitoring, there are usually two classes: data that are nominal and data that are abnormal.

To perform SLLE, only Step 1 of the original algorithm is changed: before selecting the K-nearest neighbors of x_i, an additional distance is added between data points if they belong to different classes. The distance R' that incorporates class information is calculated as follows:

R' = R + \alpha \max(R) \left( 1 - \Delta(x_i, x_j) \right),    (4.18)

where R = ||x_i − x_j||₂ is the Euclidean distance between two data points, and max(R) = max_{ij} ||x_i − x_j||₂ is the distance between the two observations that are farthest apart in X. For every combination of x_i and x_j that belong to separate classes, Δ(x_i, x_j) = 0; otherwise Δ(x_i, x_j) = 1. Lastly, the parameter α is used to control the amount of class information incorporated into R', where 0 ≤ α ≤ 1. In practice, α is often chosen to be at most 0.2 because large α values cause data from the same class to coalesce on a single point, resulting in poor mapping of new data to the feature space with LLE projection (De Ridder and Duin, 2002; Li and Zhang, 2011). Comparatively, α can be very small if there are outliers in the data that cause max(R) to be large. Owing to the ability of SLLE to separate data belonging to different classes, we are able to use a simple classifier, such as LDA or QDA, when classifying the feature space into regions of nominal and abnormal process operation.
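The modification amounts to a single adjustment of the pairwise distance matrix before the neighbor search; below is a sketch of Equation (4.18), assuming labels is an integer array aligned with the rows of X (an illustrative assumption of this example):

import numpy as np

def supervised_distances(X, labels, alpha=0.1):
    """Distance matrix R' of Equation (4.18)."""
    R = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    same_class = labels[:, None] == labels[None, :]   # Delta(x_i, x_j)
    return R + alpha * R.max() * (~same_class)        # extra distance across classes

The K-nearest neighbors in Step 1 are then selected from this augmented matrix instead of the plain Euclidean distances.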
LLE projection

Since LLE is a non-parametric method, there is no equation to map new observations into the feature space discovered by LLE during offline training. As a result, Li and Zhang (2011) adapted LLE to project a new D-dimensional observation, x_new, to the d-dimensional space as y_new. This technique is called LLE projection, and there are three steps computed at each controller execution to project new observations into the feature space for subsequent classification: the first step is to select the nearest neighbors of the new observation among the observations in X; the second step is to find a mapping matrix; and the final step is to calculate the embedding coordinate of the new observation.

LLE projection is performed online in a DCS at every scan. Moreover, this technique requires the entire training data-set, X, and the entire embedded coordinate matrix, Y, to be stored in the DCS. Here, X includes the class information from SLLE, and Y is a matrix containing all embedded coordinates, y_i, determined from Equation (4.15).

Step 1: Select neighbors.

For a new high-dimensional sample, x_new ∈ ℝ^{D×1}, the first step is to find its K neighbors, X_η, in the training data-set X by a Euclidean distance measurement, where:

X_\eta = [x_{\eta_1}, x_{\eta_2}, \ldots, x_{\eta_K}] \in \mathbb{R}^{D \times K}.    (4.19)

Step 2: Find a mapping matrix.

The second step is to compute a mapping matrix, A, from the training data that satisfies:

Y_\eta = A X_\eta,    (4.20)

in the least squares sense, where:

A = [a_1, a_2, \ldots, a_d]^T \in \mathbb{R}^{d \times D},    (4.21)

Y_\eta = [y_{\eta_1}, y_{\eta_2}, \ldots, y_{\eta_K}] \in \mathbb{R}^{d \times K},    (4.22)

and Y_η is the neighborhood embedded coordinate matrix of the feature data corresponding to the observations of X_η. In other words, the coordinate vectors in Y_η are taken from the vectors in Y corresponding to the observations X_η. Each basis vector of A, a_j, is obtained by solving the linear least squares regression problem:

a_j = \underset{a}{\text{argmin}} \sum_{i=1}^{K} \left\| a^T x_{\eta_i} - y^j_{\eta_i} \right\|_2^2,    (4.23)

where 1 ≤ j ≤ d, and y^j_{η_i} is the jth element of y_{η_i}. The analytical solution for the case of X_η^T having full column rank and the case where X_η^T is column rank deficient is given in Appendix B.3.

Step 3: Compute the embedding coordinate.

The final step is to calculate the embedding coordinate of the new observation, y_new ∈ ℝ^{d×1}, from x_new using the mapping matrix determined in Step 2:

y_{new} = A x_{new}.    (4.24)

In Step 2 of LLE projection, the solution for A is obtained by solving the QP in Equation (4.23). Since the analytical solution described in Appendix B.3 requires the computation of a matrix inverse, it is difficult to program and execute in an industrial controller. In many industrial controllers, matrix inversion is not built-in and must be programmed using elementary row operations, often using Gauss-Jordan elimination (Friedberg et al., 2002).

A new method is proposed in this thesis to solve for A. Since MPC is a pre-existing module found in most modern DCSs, it can be used as a QP solver for Equation (4.23). A dynamic matrix control MPC algorithm, a common unconstrained MPC formulation, is discussed in detail in Appendix D. To use any form of MPC to solve Equation (4.23), the manipulated variables (MVs) are treated as the elements of A, and the controller will determine the values of these elements in A by minimizing its cost function. Furthermore, the controlled variables (CVs) are treated as the elements of Y_η, whose target set-points (SPs) are the values of the elements in Y_η. Finally, the input-output model gains of the MPC are the values of X_η pertaining to the elements of the input-output relation in Equation (4.20), and it is not necessary to assume a first-order model, or any dynamics, in these models.

Example of LLE projection using MPC

MPC was used to find the projection of a data point on an 'S' shaped curve shown in Figure 4.5. In this figure, n = 800 random samples, or observations, of a 3D 'S' shaped manifold are shown in Figure 4.5a. In this plot, one observation is randomly selected, and its K = 12 neighbors are identified. Next, LLE was performed to map the 3D data to 2D space, and the results are shown in Figure 4.5b. A black box depicts the inset of this plot that is shown in Figure 4.5c. Finally, in Figure 4.5d, MPC was used to perform linear regression on the neighbors of the observation, and the linear equation was used to project the coordinates of the data point in 3D space to the mapped 2D feature space. The plot shows that a discrepancy exists between the coordinates of the data point mapped by LLE and its projection from the linear equation. This is due to the nonlinearity of the 'S' curve, and this error decreases as the number of samples the LLE model is trained with increases.

Figure 4.5: LLE mapping of an 'S' shaped curve to demonstrate the accuracy of LLE projection, where a) is a 3D 'S'-shaped manifold, b) is the 2D mapping of the 3D data with an inset illustrated in c) of a data point and its neighbors, and d) is the data point estimated from its neighbors by LLE projection

The results from MPC for the mapping shown in Figure 4.5 are comparable to the analytical solution of Equation (4.23) described in Appendix B.3. In fact, MPC discovers the exact same solution as the analytical solution from the Moore-Penrose pseudo-inverse in one iteration. To have MPC converge on the solution immediately, the models in the MPC, as described by the coordinates of the K neighbors in 3D space, were treated as gain-only models with no dynamics. In addition, the penalty-on-move term of the MPC was set to zero for all six of the MVs describing the linear mapping matrix A.
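For verification outside of a DCS, the same projection can be computed with an ordinary least-squares routine, which applies the Moore-Penrose pseudo-inverse internally and therefore matches the analytical solution of Appendix B.3. A minimal numpy sketch of Equations (4.19) through (4.24) follows, with A stored transposed relative to Equation (4.21):

import numpy as np

def lle_project(x_new, X, Y, K=12):
    """Project a scaled observation x_new into the trained feature space.
    X is the n x D training set, Y its n x d embedding from Equation (4.15)."""
    # Step 1: K nearest training observations, Equation (4.19)
    idx = np.argsort(np.linalg.norm(X - x_new, axis=1))[:K]
    X_eta, Y_eta = X[idx], Y[idx]              # (K x D) and (K x d)
    # Step 2: least-squares mapping, Equations (4.20)-(4.23); A here is
    # (D x d), the transpose of the A defined in Equation (4.21)
    A, *_ = np.linalg.lstsq(X_eta, Y_eta, rcond=None)
    # Step 3: embedding coordinate of the new observation, Equation (4.24)
    return x_new @ A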
On a final note for LLE projection, we observe that X and Y need to be stored in the DCS in order to determine X_η and Y_η in Equation (4.20). Since n needs to be large for adequate LLE projection results, a large amount of data must be stored in the DCS. To reduce this storage requirement, we recommend pruning the training data-set once the feature space has been trained in order to reduce the size of n. De Ridder and Duin (2002) propose a method to select a subset of the training data once the feature space has been trained to resolve this issue.

SLLEP for classification

The combination of SLLE for training the feature space and LLE projection for mapping new data is collectively known as SLLEP. Once SLLE trains the information-rich feature space, this low-dimensional space can be classified into regions of nominal and abnormal process operation. If a control strategy is sought for only one type of abnormal operation, there will likely be only one contiguous region in this space corresponding to the abnormal operation. However, multiple regions can be classified if the features in the low-dimensional space characterize more than one type of process abnormality.

Owing to the ability of SLLE to incorporate class information when training the feature space, process data from separate classes are well-segregated and can be easily classified with a linear or quadratic classifier. A linear or quadratic classifier is ideal because of the simplicity and robustness of training the classifier offline, and because it is simple to determine a meaningful distance between an observation in the feature space and the classification boundary. LDA and QDA are suitable linear and quadratic classifiers, respectively, to classify the feature space from SLLE into regions of nominal and abnormal operation. LDA and QDA are discussed in Chapter 4.1.3. LDA should be used if the abnormal operating data appear to be linearly separable from the nominal operating data; otherwise, QDA is usually acceptable. New data projected to the feature space are classified by the same classification rule as the training data to provide an estimate of the class label and the distance to the classification boundary. If LDA or QDA classify the feature space poorly into regions of nominal and abnormal operation, then the tuning parameters K and α should be adjusted when training the SLLE technique.

Selection of K and α

The selection of K and α significantly affects the ability of LDA or QDA to classify the feature space discovered by SLLEP. K determines the number of neighbors around a particular data point that are considered to be within a locally linear region of the observation space.
If K is too small, then SLLEP does not have enough information about the global properties within the data-set; if K is too large, then the mapping loses its nonlinear character and behaves like PCA. α determines how much class information is included in training SLLEP. A large α will excessively separate the classes and lead to over-generalization of the feature space. With a large α, erroneous results may occur when projecting new observations into the feature space. Furthermore, the classifier loses the ability to provide meaningful information on the magnitude of a process shift to a controller, since data from the same class converge on a single point in the feature space. On the other hand, a small α does not incorporate any class information when training SLLEP. Consequently, it is less likely that LDA or QDA can classify the feature space, due to the nonlinearity of the features.

Since there are two free parameters to manipulate when training SLLEP, 10-fold cross-validation should be performed using the procedure discussed in Appendix C to optimize the selection of the parameters. Cross-validation finds the best selection of K and α by evaluating the unbiased prediction accuracy of LDA or QDA. Once the parameters have been chosen and the SLLEP model has been trained, the model should then be evaluated on a test set of data before commissioning in industry.
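As a rough illustration of this tuning procedure, the sketch below (assuming NumPy and scikit-learn; the SLLE training and projection routines are passed in as callables because their implementations follow the earlier description) grid-searches K and α by 10-fold cross-validated LDA accuracy:

```python
import numpy as np
from itertools import product
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold

def select_K_alpha(X, labels, train_slle, project,
                   K_grid=(8, 12, 16), alpha_grid=(0.05, 0.1, 0.2)):
    """Grid search over (K, alpha) scored by 10-fold cross-validated LDA
    accuracy. train_slle(X, labels, K, alpha) -> Y and
    project(X_train, Y_train, x_new, K) -> y_new are the SLLE training
    and LLE projection routines described earlier."""
    best = (None, None, -np.inf)          # (K, alpha, mean accuracy)
    folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    for K, alpha in product(K_grid, alpha_grid):
        scores = []
        for tr, te in folds.split(X, labels):
            Y_tr = train_slle(X[tr], labels[tr], K, alpha)
            Y_te = np.array([project(X[tr], Y_tr, x, K) for x in X[te]])
            clf = LinearDiscriminantAnalysis().fit(Y_tr, labels[tr])
            scores.append(clf.score(Y_te, labels[te]))
        if np.mean(scores) > best[2]:
            best = (K, alpha, np.mean(scores))
    return best
```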
4.1.3 Linear and quadratic classification

LDA and QDA are simple classification methods that find a linear or quadratic combination of features that best separates two or more classes. In this thesis, we focus on the classification of two classes, and leave the simple extension to three or more classes to Johnson and Wichern (2007).

The main underlying assumption in LDA and QDA is that the data to be classified are multivariate normal (Johnson and Wichern, 2007). Although this is not always the case, LDA often remains an effective classifier even if this assumption is violated (Johnson and Wichern, 2007). However, QDA may lead to erroneous results when classifying a feature space greater than 2D if this assumption is violated (Johnson and Wichern, 2007). Since PCA only performs well on multivariate normal data, LDA should be a sufficient classifier when PCA is used for feature extraction. SLLEP does not assume multivariate normality; thus α can be tuned to sufficiently separate the classes in the feature space, and LDA may be used in the absence of multivariate normal data. However, QDA is a quadratic classifier, and it may separate a 2D feature space better than LDA without requiring α to be over-tuned.

LDA and QDA classifiers are trained in a supervised manner from the known class labels of the feature data to be classified. Therefore, we must divide our feature data matrix into two groups, Y_1 and Y_2. Here, Y_1 contains the nominal operating data belonging to class π_1, and Y_2 contains the specific abnormal operating data belonging to class π_2 that represents the abnormal operating range for which a control strategy is to be designed. The remaining steps of LDA and QDA differ from each other and are discussed separately.

Linear discriminant analysis

To construct a linear classifier, it is assumed in LDA that the covariance matrices of Y_1 and Y_2 are equivalent. However, since the true covariance matrices of these populations are unknown, we take a pooled estimate of the covariance matrix, S_pooled, from the sample covariances. S_pooled is calculated from the expression:

S_pooled = ((n_1 − 1) S_1 + (n_2 − 1) S_2) / ((n_1 − 1) + (n_2 − 1)),   (4.25)

where S_1 and S_2 are the sample covariance matrices calculated from Y_1 and Y_2, respectively, and the formula to find the covariance matrix is given in Equation (4.2). Additionally, n_1 is the number of observations in Y_1, and n_2 is the number of observations in Y_2. The next step is to find the mean vectors of Y_1 and Y_2:

ȳ_1 = (1/n_1) Y_1^T 1,   ȳ_2 = (1/n_2) Y_2^T 1,   (4.26)

where 1 is a vector of ones with as many elements as there are observations in Y_1 or Y_2. The allocation rule of LDA for a given observation in the feature space, y_i, is:

class = π_1 if â^T y_i − m̂ ≥ cost;   class = π_2 if â^T y_i − m̂ < cost,   (4.27)

where:

â = S_pooled^{−1} (ȳ_1 − ȳ_2),   (4.28)

m̂ = (1/2) (ȳ_1 − ȳ_2)^T S_pooled^{−1} (ȳ_1 + ȳ_2),   (4.29)

cost = ln[ (c(1|2)/c(2|1)) (p_2/p_1) ].   (4.30)

Here, â is the sample coefficient vector, m̂ is the estimated midpoint between the two classes, and the cost equation accounts for the probability of misclassification and the cost of misclassification. In the cost equation, p_1 is the prior probability that an observation belongs to π_1, and vice versa for p_2; for example, if the probability that a new observation will belong to π_2 is small, then LDA should classify a new observation as π_1 unless the data overwhelmingly favor π_2. In addition, c(1|2) is the cost associated with classifying an observation as π_1 when it actually belongs to π_2, and vice versa for c(2|1). For example, if a SAG mill is overloaded and we classify a new observation as belonging to the non-overloaded class, then the mill overload is not corrected, and the consequences are more severe than if the mill was not overloaded but was classified as overloaded.

Not only does the LDA allocation rule dictate the class of an observation, it also gives the equation of the linear classification boundary as the solutions to:

â^T y_i − m̂ − cost = 0.   (4.31)

Equation (4.27) provides the classification rule to determine whether an observation belongs to π_1 or π_2. In addition, the calculated output of â^T y_i − m̂ − cost yields a value that is proportional to the distance of an observation from the linear classification boundary. The output value is simply the closest distance between an observation and the classification line, scaled by an arbitrary amount that is consistent between all observations. In other words, this output value can provide the estimate of the magnitude of process shift that is needed for the controller performance recovery strategy.

Quadratic discriminant analysis

QDA differs from LDA in that there is no assumption that the covariance matrices of Y_1 and Y_2 are equivalent. Without this assumption, the classification regions are defined by quadratic functions of y_i. The allocation rule of QDA is:

f(y_i) = −(1/2) y_i^T (S_1^{−1} − S_2^{−1}) y_i + (ȳ_1^T S_1^{−1} − ȳ_2^T S_2^{−1}) y_i − k − cost,   (4.32)

class = π_1 if f(y_i) ≥ 0;   class = π_2 if f(y_i) < 0,   (4.33)

where:

k = (1/2) ln(|S_1|/|S_2|) + (1/2) (ȳ_1^T S_1^{−1} ȳ_1 − ȳ_2^T S_2^{−1} ȳ_2).   (4.34)

Here, |S| denotes the determinant of S. The QDA allocation rule not only dictates the class of an observation, it also gives the equation of the quadratic classification boundary as the solutions to f(y_i) = 0. Similar to LDA, the calculated value of f(y_i) provides an estimate of the distance of an observation from the quadratic classification boundary. However, unlike LDA, this distance is not the shortest distance from an observation to the classification boundary. Instead, the value of f(y_i) corresponds to the location the observation would occupy on the classification boundary if the quadratic feature were increased or decreased in size. In other words, the output of f(y_i) is a meaningful estimate of the magnitude by which an observation is separated from the classification boundary that also accounts for the curvature of the classifier. As a result, this output value can be used to estimate the magnitude of process shift that is needed in the controller performance recovery technique.
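Both allocation rules condense to a few lines of linear algebra. The following sketch (ours, assuming NumPy and a feature space of dimension two or higher) returns the signed scores â^T y − m̂ − cost and f(y); the sign gives the class, and the magnitude serves as the process shift estimate discussed above:

```python
import numpy as np

def lda_score(Y1, Y2, c12=1.0, c21=1.0, p1=0.5, p2=0.5):
    """LDA score of Eqs. (4.25)-(4.31); score >= 0 allocates to pi_1."""
    n1, n2 = len(Y1), len(Y2)
    S1, S2 = np.cov(Y1, rowvar=False), np.cov(Y2, rowvar=False)
    S_pooled = ((n1 - 1) * S1 + (n2 - 1) * S2) / ((n1 - 1) + (n2 - 1))
    y1, y2 = Y1.mean(axis=0), Y2.mean(axis=0)
    a_hat = np.linalg.solve(S_pooled, y1 - y2)       # Eq. (4.28)
    m_hat = 0.5 * a_hat @ (y1 + y2)                  # Eq. (4.29)
    cost = np.log((c12 / c21) * (p2 / p1))           # Eq. (4.30)
    return lambda y: a_hat @ y - m_hat - cost

def qda_score(Y1, Y2, c12=1.0, c21=1.0, p1=0.5, p2=0.5):
    """QDA score f(y) of Eqs. (4.32)-(4.34); f(y) >= 0 allocates to pi_1."""
    S1, S2 = np.cov(Y1, rowvar=False), np.cov(Y2, rowvar=False)
    S1i, S2i = np.linalg.inv(S1), np.linalg.inv(S2)
    y1, y2 = Y1.mean(axis=0), Y2.mean(axis=0)
    k = 0.5 * np.log(np.linalg.det(S1) / np.linalg.det(S2)) \
        + 0.5 * (y1 @ S1i @ y1 - y2 @ S2i @ y2)      # Eq. (4.34)
    cost = np.log((c12 / c21) * (p2 / p1))
    return lambda y: (-0.5 * y @ (S1i - S2i) @ y
                      + (y1 @ S1i - y2 @ S2i) @ y - k - cost)
```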
4.2 Controller performance recovery

Controller performance recovery is a control strategy in which information from process monitoring is used to improve controller performance in order to recover a process from an abnormal operating range to its nominal operating range. This section discusses how to model the process in an abnormal operating region so that the recovery controller can be tuned. In addition, this section discusses two control strategies to recover the process: the first is applicable to a process that utilizes any style of nominal feedback controller, and the second is applicable to a process utilizing MPC as the nominal feedback controller.

A general control objective of a process is given in the following linear optimization problem:

minimize_u  ε(u) = abs(x − x_sp),   (4.35)
subject to  x_lower ≤ x ≤ x_upper,   (4.36)
            y_lower ≤ y ≤ y_upper,   (4.37)
            u_lower ≤ u ≤ u_upper,   (4.38)
            [ẋ; y] = f(x, u, w),   (4.39)

where x is a vector of process states to control; y is a vector of measured process outputs; u is a vector of manipulated process input variables; the subscript sp represents a set-point; and the subscripts upper and lower represent the upper and lower constraint limits. The process outputs are related to the inputs in Equation (4.39). Here, ẋ is the derivative of x, and w is a vector of measured input disturbances to the process.

The process dynamics of f(x, u, w) are complex due to the nonlinearity of the process. Therefore, it is not always possible for a linear nominal controller to maintain high performance, in the sense of best minimizing Equation (4.35), in the event the process shifts away from the nominal, locally linear, operating region. As a result, a control strategy is needed to recover the process to the nominal operating region.

MPC is used in both controller performance recovery techniques as the recovery controller. In the case where any style of nominal feedback controller is used, MPC externally overrides the existing controller; conversely, in the case where the existing controller is already MPC, the process monitoring information is used as a new constraint variable in the existing MPC. MPC is an ideal choice because it is a multivariate controller that can handle constraints, and it has achieved wide acceptance in industry. MPC also solves an optimization problem in order to choose the best control actions that minimize prediction residuals over a finite prediction horizon. For operators in industry, MPC is reasonably easy to maintain and understand, and tools exist in most modern DCS to easily implement MPC. The principles and design of an unconstrained MPC formulation are presented in Appendix D. Camacho and Bordons (2007) discuss MPC in great detail, including formulations involving constraints.

In addition to using MPC as the recovery controller in the event of a process shift leading to poor nominal controller performance, expert rules are also proposed in this section.
Expert rules are only needed to recover a process that has shifted into a severely abnormal operating region where even the MPC recovery controller performs poorly. In this case, the objective of the expert rules is only to recover the controller performance of the MPC recovery controller, thereby enabling the MPC recovery controller to recover the process to the nominal operating region.

4.2.1 Modeling the abnormal operating region

When modeling the abnormal operating region, we use the output from the process monitoring method as the CV used to monitor the process recovery. This CV contains a measure of the magnitude of the process shift into a specific abnormal operating region. Since this CV is a special transformation of measured process outputs, it is appropriate to use in closed-loop feedback.

The main assumption in modeling the abnormal operating region is that this region is locally linear. For this assumption to be valid, the control boundary for this region must be selected so that reasonably linear process behaviour can be expected from one point in the region to another. Figure 4.6 shows a 3D nonlinear manifold of process operation depicted by three arbitrary feature variables. This figure shows the locally linear regions of the nonlinear manifold depicted by planes tangent to the manifold where the nominal and recovery controllers are linearized for operation. At the edges of the abnormal operating region, we see that on one end lies the nominal operating region, and on the other end is process operation that is considered to be severely abnormal.

Figure 4.6: 3D manifold of process operation with two planes showing the linearized regions where the nominal and recovery controllers are designed to operate.

Process recovery from a severely abnormal operating region is challenging and probably best achieved using expert rules that mimic an operator's best actions in this event. However, discrete control actions from these rules are not mathematically optimal; thus the expert rules should focus only on recovering the process to the abnormal operating region for which the continuous recovery controller is designed. In Figure 4.6, we see that the assumption of local linearity breaks down at the edges of the planes due to the nonlinearity of the manifold; thus the size of these regions is dictated by a predefined allowable tolerance for process model mismatch and controller performance decay.

Once the boundaries of the abnormal operating region have been decided, operating data that lie within the abnormal operating region are needed to model the process in this region. However, it is unlikely that abundant operating data are available within this region, because the process would not have been allowed to operate abnormally for a long time. Thus, it may be hard to find enough operating data to train an accurate process model for this region.

The strategy to develop the process model is to ideally find at least two sets of data where the operators placed the control in open-loop to manually recover the process from an abnormal operating region. At least one data-set is used to train a process model for the abnormal operating region, and one data-set should be set aside as a test set to validate the model.
For each of the training data-sets, train a first-order plus time delay (FOPTD) or second-order plus time delay (SOPTD) model of the process response for each process input-output pair. There are many forms of process models we could use; however, FOPTD and SOPTD are chosen in this work because many chemical processes can be adequately modelled with these forms, and most modern DCS allow for these forms of the process model in the built-in MPC.

Figure 4.7 illustrates the block diagram for a process with output Y(s) and input U(s) that is modelled using a transfer function G_p(s), where Y(s) is the process output, y(t), represented in the frequency domain, and U(s) is the process input, u(t), represented in the frequency domain.

Figure 4.7: Block diagram for the input-output relationship of a modelled process.

The process transfer function, G_p(s), for a FOPTD process is of the form:

G_p(s) = Y(s)/U(s) = [K_p / (τ_1 s + 1)] exp(−θs),   (4.40)

where K_p is the gain of the response in Y(s) for a change in U(s), τ_1 is the time constant, and θ is the time delay of the process response. The Laplace variable, s, is known as the operator variable in the frequency domain. A seldom used form of the FOPTD model includes lead dynamics in the numerator of the transfer function. A FOPTD model with lead dynamics is of the form:

G_p(s) = Y(s)/U(s) = [K_p (τ_n s + 1) / (τ_1 s + 1)] exp(−θs),   (4.41)

where τ_n is the time constant of the first-order numerator dynamics. It is unlikely this transfer function is needed for most chemical processes. However, if the process response is fast, the numerator dynamics may help in modeling the transfer function. A SOPTD model usually arises whenever two first-order processes are connected in series. G_p(s) for a SOPTD process is of the form:

G_p(s) = Y(s)/U(s) = [K_p / ((τ_1 s + 1)(τ_2 s + 1))] exp(−θs),   (4.42)

where τ_1 and τ_2 are the time constants of the two first-order transfer functions making up the SOPTD transfer function. For chemical processes, it may be necessary to include dynamics in the numerator of the SOPTD model, particularly when the process data show an inverse response. The SOPTD model with numerator dynamics is:

G_p(s) = Y(s)/U(s) = [K_p (τ_n s + 1) / ((τ_1 s + 1)(τ_2 s + 1))] exp(−θs).   (4.43)

To identify any of the process models in Equations (4.40)–(4.43), a continuous-time model identification technique should be conducted on the sampled data. Discrete-time model identification techniques, such as output error modeling, are not considered because these models need to be converted to continuous time, and it is not guaranteed that a discrete-time model can be converted to a continuous-time model with a specific pre-determined form (Bequette, 1998; Ljung, 2007b). Garnier et al. (2003) and Ljung (2009) discuss methods to identify a continuous-time model from discrete-time data. One of the most common continuous-time model identification methods is the prediction error method. This technique minimizes the prediction errors when fitting a model and involves an iterative numerical search (Ljung and Wills, 2008). The prediction error method for continuous-time model identification is available in the MATLAB System Identification Toolbox (Ljung, 2007a).

The procedure to find the best process model of an abnormal operating region, and its model parameters, is given below as a guideline and is to be repeated for each input-output pair, u(t) and y(t).

Step 1: Identify a set of training data where the process is in the abnormal operating region and y(t) has been sufficiently excited by changes to u(t) that are uncorrelated with other process inputs.

Step 2: Determine the parameters for each of the four continuous-time models in Equations (4.40)–(4.43) using the MATLAB System Identification Toolbox.

Step 3: Compute the residuals of each model, and evaluate whether increasing the order of the model increases the prediction accuracy at a statistical significance level of α. Choose the model of lowest order that gives the best fit. A more detailed discussion on model selection is given in Appendix E.

Step 4: Evaluate the overall model on the test data-set to see how well it will perform in practice.

Once the process model is satisfactory, it can be used as a predictive model in the MPC controller.
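As an unofficial illustration of this procedure for readers without the MATLAB toolbox, the sketch below fits the FOPTD form of Equation (4.40) to step-response data by least squares with SciPy's curve_fit. This is a simplified stand-in for the prediction error method, and the data here are synthetic:

```python
import numpy as np
from scipy.optimize import curve_fit

def foptd_step(t, Kp, tau1, theta):
    """Unit step response of the FOPTD model in Eq. (4.40)."""
    ts = np.clip(t - theta, 0.0, None)          # zero before the delay
    return np.where(t >= theta, Kp * (1.0 - np.exp(-ts / tau1)), 0.0)

def soptd_step(t, Kp, tau1, tau2, theta):
    """Unit step response of the SOPTD model in Eq. (4.42), tau1 != tau2."""
    ts = np.clip(t - theta, 0.0, None)
    y = Kp * (1.0 - (tau1 * np.exp(-ts / tau1) - tau2 * np.exp(-ts / tau2))
              / (tau1 - tau2))
    return np.where(t >= theta, y, 0.0)

# Synthetic step-test log standing in for plant data: Kp = 1.8,
# tau1 = 10 min, theta = 4 min, 2-minute sampling, measurement noise
t = np.arange(0.0, 120.0, 2.0)
y = foptd_step(t, 1.8, 10.0, 4.0)
y = y + 0.02 * np.random.default_rng(0).standard_normal(t.size)

(Kp, tau1, theta), _ = curve_fit(foptd_step, t, y, p0=(1.0, 5.0, 1.0))
print(f"Kp = {Kp:.2f}, tau1 = {tau1:.1f} min, theta = {theta:.1f} min")
```

Per Step 3, the same fit would be repeated for the higher-order forms, keeping the lowest-order model whose residuals are not significantly improved upon.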
4.2.2 Override MPC design

An override MPC is used when the nominal controller is any type of feedback controller except MPC. The override controller is tuned for a specific abnormal range of process operation. When the measure of process shift output from the process monitoring method, given by the output of the LDA or QDA classifier, indicates that the process shift into this region is high, the controller tries to recover the process shift to a SP of zero by overriding the process inputs that are manipulated by the nominal controller. Figure 4.8 illustrates the override control strategy.

Figure 4.8: MPC controller overrides the MVs of the existing controller to recover the process after a process shift.

Figure 4.8 shows that the override controller takes control of the MVs using high (>) and low (<) signal selects. High selects are used when the optimal action to recover the process for a particular MV is to increase the variable, and vice versa for the low selects. The implementation of the override MPC controller is simple: the only CV is the measure of process shift, there are as many MVs as there are degrees of freedom needed to recover the process, and there are no constraint variables.

In the MPC design, the penalty-on-move weights should be set as low as possible because the main objective of the override control is fast recovery of the process; the fastest recovery possible with MPC occurs when the cost function includes only the prediction errors. When the cost associated with changes to the MVs is included, the time for the process to recover becomes suboptimal. However, some cost associated with the control moves is necessary to prevent excessive wear to the control elements in a recovery situation.

With the override MPC configuration, there is the possibility that switching transients from the high and low selects may introduce instability to the system. In particular, if the disturbance at the root cause of a process shift persists, the recovery controller may keep the process at the boundary between the abnormal and nominal operating regions. In this case, dwell-time switching logic may be used to prevent rapid switching between control signals. In this logic, a pre-specified length of time, known as the dwell time, must pass before the signal can switch from one controller (Morse, 1993). However, dwell-time switching has some serious disadvantages; for example, the controller performance could severely decay within the dwell time. An alternative to dwell-time switching is hysteresis-based switching. Hysteresis switching requires that the signal to be switched exceed the switching threshold by a predefined value before switching occurs (Hespanha et al., 2003). However, hysteresis switching may introduce destabilizing transients resulting from the signal to the control elements changing abruptly.
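The sketch below (our own illustration, not a DCS function block) shows the shape of a high or low signal select with an optional hysteresis band of the kind described above; a plain select corresponds to a band of zero:

```python
def signal_select(u_nominal, u_override, high_select=True,
                  hysteresis=0.0, last="nominal"):
    """High (>) or low (<) signal select with an optional hysteresis band.

    Returns ("nominal" | "override", selected value). With hysteresis > 0,
    the competing signal must beat the currently selected one by the band
    width before the select switches, suppressing rapid back-and-forth
    switching at the boundary between operating regions.
    """
    wins = (u_override > u_nominal) if high_select else (u_override < u_nominal)
    margin = abs(u_override - u_nominal)
    if last == "nominal":
        choice = "override" if (wins and margin > hysteresis) else "nominal"
    else:
        choice = "nominal" if ((not wins) and margin > hysteresis) else "override"
    return choice, u_override if choice == "override" else u_nominal
```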
Given that the nature of the switching logic for high and low signal selects inherently results in reasonably smooth switching, it is unlikely that efforts are required to mediate instabilities resulting from switching control signals to the control elements in the override MPC design. This continuity of switching exists because both signals must be nearly identical in magnitude as switching occurs, with only the signal that is slightly higher or lower than the competing signal being selected by the high or low select functions, respectively. Nevertheless, a quick simulation should be performed using the process models from the nominal and override controllers to determine whether any instabilities result from switching.

In the event that the magnitude of the process shift is large and the process is considered to be in a severely abnormal and nonlinear operating region, expert rules that mimic an operator's best actions may be needed to help recover the process. The expert system acts as the final override to the MVs and has the ability to override any previous control decisions to the MVs. Figure 4.9 illustrates the expert system.

Figure 4.9: Expert rules override the MVs if the process is operating in a severely abnormal region.

The objective of the expert system is only to recover the process from the severely abnormal operating region into the less-severe abnormal operating region that the override MPC is designed for. Likewise, the expert rules should be designed with hysteresis to ensure that the process has been recovered far enough away from the severely abnormal operating region that the override MPC is unlikely to fail to complete the recovery of the process.

With the override MPC control strategy commissioned, it is unlikely that a process will shift to a severely abnormal operating region. However, large abrupt disturbances are possible in some processes, and these disturbances may rapidly shift a process into severely abnormal operation. Therefore, an expert system provides an extra layer of security in the process to maintain safe and stable operation following an extraordinary event.

4.2.3 Modifications to existing MPC

If the existing controller in the process is already MPC, then the controller can be modified to incorporate the measure of process shift as a constraint variable. When the constraint is violated, the MPC switches to the models for the constraint variable and executes the corrective actions to recover the process to nominal operation. The structure of the existing MPC is shown in Figure 4.10. Comparing Figure 4.10 to Figure 4.8, we see that it is more efficient and intuitive to implement the control strategy when the nominal controller is MPC. With this method, there is no need to license a new MPC. Furthermore, the MPC inherently switches the models to the constraint model set after a process shift, and MPC in modern DCS is designed to avoid the risk of creating process instability from internal model switching.
Figure 4.10 also illustrates the expert override control strategy, which is identical to the override strategy depicted in Figure 4.9. In Figure 4.10, however, MV and CV can represent any number of manipulated or controlled variables, since MPC is a multiple-input, multiple-output controller.

Figure 4.10: Existing MPC controller is modified with a constraint variable to allow the MPC to recover the process after a process shift.

Chapter 5

Case Study of SAG Mill Overload

The objective of this thesis is to develop a control strategy to manage processes that are operating outside of their nominal operating region. This was achieved in Chapter 4. This chapter provides an industrial case study in which the control strategy is applied to detect a process shift to an abnormal operating region and to recover the process from the abnormal operating region to the nominal operating region.

This case study is conducted on the SAG mill at a mineral processing facility in British Columbia. The operators at this facility operate the SAG mill conservatively because of their concern that the mill may overload. They find the early detection of overloading to be extremely difficult, and are therefore not able to react until the situation deteriorates significantly. Furthermore, the operators' corrective actions to recover the process result in a sub-optimal recovery and a substantial opportunity loss in mill performance. In this chapter, we show how these issues can be addressed using the control strategy proposed in Chapter 4.

5.1 Description of the process

A detailed overview of the SAG mill process and the overload process abnormality is given in Chapter 2.3. Figure 5.1 concisely shows the SAG mill circuit at the mineral processing facility, with the key process variables labelled that are utilized by the existing MPC nominal control strategy. In the MPC, the CVs are the feed-end west bearing oil pressure and the calculated charge density. The feed ore and water flow rates are the MVs, and the constraint variables, labelled AV in the figure, are the motor powers, the remaining three bearing oil pressure measurements, the pebble recycle, and the sound level. Finally, the disturbance variable (DV) is a measurement from a visual size analyzer of the fraction of ore particles in the feed that are greater than two inches. Here, a particle size of two inches is of interest because the pebble ports on the discharge grate allow two-inch particles to pass; thus particle sizes greater than two inches have a longer residence time in the mill and significantly affect the mill dynamics.

Figure 5.1 shows only the process variables most relevant to the operation of the mill using MPC. There are numerous additional measurements monitoring the condition of the mill, such as bearing temperatures and vibration measurements. Additionally, the mineral processing facility lacks important measurements such as the discharge flow to the pump box; however, some missing measurements can be inferred from available measurements. Based upon the nature of SAG mill overloading, we believe a process monitoring method that utilizes available and inferred measurements can accurately detect the onset of overloading and also provide an estimate of the severity of the overload. The selection of process variables for process monitoring is discussed in Chapter 5.2.1.
5.1.1 Control objective

The objective of the nominal control strategy at the mineral processing facility in British Columbia is to control the feed-end west bearing pressure and the charge density to target values. The feed-end west bearing pressure is the highest bearing pressure because the load is distributed mostly towards the feed chute and, during mill rotation, is lifted up onto the west side of the mill. By controlling this bearing pressure to a target value, it is expected that the throughput of the mill is maximized. This is true only when the mill is not overloaded to any extent. Charge density is used as the second CV to ensure the slurry water content is sufficient to allow adequate discharge and to prevent excessive cohesion of clay minerals that may lead to overloading. However, the density measurement is seldom accurate because the calculation is rudimentary and based upon the feed and water rates measured at a single sample instance in the past. Although not discussed in this work, a more practical approach to SAG mill control would be to maximize the throughput of the mill. To maximize throughput, the discharge flow to the pump box would need to be measured for use as a CV in the MPC.

Figure 5.1: Overview of existing SAG mill control at the mineral processing facility in British Columbia.

The input and output variables of the process shown in Figure 5.1 are listed in Table 5.1 with their proper control notation: x is a state variable to be controlled; u is an input manipulated variable; w is an input disturbance; and y is a vector of output measurements. In the case of the SAG mill overload, the constraint variables will not be violated until the overload becomes extremely severe. Instead, many of these measurements are included in the process monitoring technique in order to estimate the magnitude of the mill overload. PM1 denotes the process monitoring output variable in the table, and it represents the estimated mill overload magnitude as a single variable.

Table 5.1: Input and output variables critical to the recovery control strategy

Outputs   Variable  |  Inputs   Variable
CV1       x1        |  MV1      u1
CV2       x2        |  MV2      u2
PM1       x3        |  DV1      w1
AV1–7     y         |

The control objective of the SAG mill process is shown in the following optimization problem:
(5.4) x3  x3,upper, (5.5)24ẋ y 35 = f(x,u,w), (5.6) Here, the control objective is formulated to minimize the absolute value of the error between the control variables, x1 and x2, and their SPs, x1,sp and x2,sp. In addition, the process monitoring output variable, x3, has an upper constraint limit. If the constraint in Equation (5.5) is violated, then the process is considered to be overloaded and it is desirable to recover controller performance. Otherwise, we will be unable to control x1 and x2 to the target SPs. This will result in poor minimization of Equation (5.1) and possible violations of the constraints in Equations (5.2) to (5.4). The greatest challenge in satisfying Equations (5.1) to (5.6) is the equality con- straint imposed in Equation (5.6). Because the dynamics of f(x,u,w) are nonlinear, the nominal linear controller used to satisfy the SAG mill control objective will be insucient in the event of a mill overload. In particular, a process shift from the nom- inal operating region to the overload operating region is nonlinear, and the nominal controller is not tuned for the overload operating region. In Chapter 5.4, we introduce Equation (5.5) as a constraint variable into the existing nominal MPC controller for the SAG mill. When the upper limit of Equation (5.5) is violated, the MPC switches to the constraint variable models which are tuned specifically for the overload operating region in order to recover controller performance. 80 5.2 Process monitoring The first task in the application of the overall control strategy on the SAG mill is to train the process monitoring method to be sensitive to the overloaded operating region. The SAG mill overload is a complex and nonlinear abnormal situation to detect. This is because the onset of overloading can be the result of any of several phenomena each a↵ecting process outputs in di↵erent ways. In addition, the e↵ects of overloading in the early stages are subtle with only small deviations to the linear characteristic of the nominal range of operation. In this section, we show that traditional PCA using Q and T 2, Method 1, is unable to detect the onset of overloading. Next, we show that PCA with LDA classification, Method 2, successfully detects the onset of overloading when monitoring only a specific selection of process variables. Furthermore, PCA Method 2 was implemented on site at the mineral processing facility in British Columbia in January 2013. Finally, we also show that SLLEP with LDA classification is able to successfully detect the onset of overloading regardless of the selection of process variables, and the classification results are superior to PCA Method 2. Unlike PCA, SLLEP was not implemented on site due to limitations of the MPC tool available at the mineral processing facility. The results discovered from the implementation of PCA demonstrates the first successful implementation of a process monitoring method to both detect SAG mill overloading and estimate the magnitude of overload at a mineral processing facility. 5.2.1 Selection of measured and inferred process variables There are numerous input and output variables considered to train the feature extrac- tion methods for process monitoring. There are nearly 100 measurements available for the SAG mill. 
However, only a handful of these variables are needed to estimate a process overload; other process variables are not sensitive to the changes within the mill relating to overloading, or they provide redundant information that would bias the feature extraction results. The candidate process variables considered for process monitoring are listed in Table 5.2. This table provides a summary of the original units, the filtering time constant, the upper and lower ranges for scaling, and the standard deviation of the scaled variables. All 10 candidate variables are scaled from 0 to 1, and the upper and lower ranges for scaling are manually selected based upon the following heuristics: the variables are scaled considering their upper and lower operational constraints at the mineral processing facility, and the scale range is adjusted to create relatively similar standard deviations between all candidate variables.

Table 5.2: Properties of the candidate process variables

Process Variable        | Units  | Lower Range | Upper Range | Filter Time Constant (min) | Standard Deviation
Bearing Pressure        | psi    | 850         | 1200        | 1                          | 0.1316
Motor Power             | kW     | 4000        | 6200        | 1                          | 0.0701
Sound Level             | n/a    | -40         | 40          | 1                          | 0.1099
Center of Gravity       | n/a    | 0.459       | 0.464       | 4                          | 0.0907
Difference of Slopes    | n/a    | -0.0027     | 0.0027      | 11                         | 0.1135
Recycle                 | STPH   | 0           | 600         | 1                          | 0.0599
Discharge to Pump Box   | STPH   | 2200        | 4800        | 1                          | 0.0870
Specific Energy         | kW/ton | 1           | 12          | 1                          | 0.1291
Vibration               | n/a    | -0.15       | 0.15        | 5                          | 0.1132
Size > 2 inch           | %      | 0           | 100         | 4                          | 0.1017

The filtering time constants listed in Table 5.2 are also determined heuristically. The filter time constant needs to be as small as possible to minimize the detection delay of the overload in process monitoring. Conversely, the time constant needs to be large enough to filter stochastic noise from the data in order to help minimize false alarms. A first-order filter is used to filter the data and has the difference equation:

y[k] = α_f y[k − 1] + (1 − α_f) x[k],   (5.7)

where k is the sample instance, x is the filter input, y is the filter output, and α_f is given by the following expression:

α_f = exp(−Δt/τ).   (5.8)

Here, τ is the filter time constant, and Δt is the elapsed time in minutes between samples.
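A direct transcription of Equations (5.7) and (5.8), assuming NumPy:

```python
import numpy as np

def first_order_filter(x, tau_min, dt_min):
    """First-order filter of Eqs. (5.7)-(5.8).

    x : sequence of raw samples; tau_min : filter time constant (min);
    dt_min : elapsed time between samples (min).
    """
    alpha_f = np.exp(-dt_min / tau_min)
    y = np.empty(len(x))
    y[0] = x[0]                        # initialize at the first sample
    for k in range(1, len(x)):
        y[k] = alpha_f * y[k - 1] + (1.0 - alpha_f) * x[k]
    return y
```

With the two-minute sampling used later in this chapter, a one-minute time constant from Table 5.2 gives alpha_f = exp(−2) ≈ 0.14, that is, very light filtering.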
Bearing pressure, motor power, di↵erence of slopes Bearing pressure and motor power are two variables that are closely monitored by op- erators when they are concerned about an overload. During an overload, they expect to see high values for bearing pressure and motor power. Furthermore, they also ex- 83 pect bearing pressure to continually increase as a mill overloads, yet the motor power may not increase or even decrease during severe overloading. Therefore, based upon operator experience, we believe that the di↵erence in the rates of change between bear- ing pressure and motor power is also important to monitor. A variable transformation of the scaled bearing pressure and scaled motor power is used to create the di↵erence of slopes variable: Di↵erence of Slopes = BP t  MP t , (5.9) where the slopes are determined from a first order finite di↵erence calculation given by: BP t = BP[k] BP[k  1] t , (5.10) MP t = MP[k]MP[k  1] t . (5.11) Here, BP is the feed-end west bearing pressure, MP is the motor power for motor A, k is the sample instance, and t is the time between sample instances and is chosen to be one minute—one minute is reasonable since the time constant of the process is estimated to be around 10 minutes. Sound level Another important variable to monitor is sound level. Experienced operators insist that an overloaded mill is quieter than normal. Moreover, the review of SAG mill overloading in Chapter 2.3 identified that mill acoustics should decrease during an overload resulting from the reduced potential energy or dampening of the cataracting charge. A time plot of the sound level measurement of the SAG mill revealed that the sound level has a moving average. The moving average is likely a result of changes in the steel ball charge and ore type. The condition of the mill liner and auxiliary equipment where the SAG mill is located may also a↵ect the sound level. Consequently, the 84 sound level was transformed into the di↵erence between a 0.4 day moving average of the sound level and the raw measurement filtered at 40 seconds. A time plot of the raw sound level and the moving average is shown in Figure A.1 in Appendix A.1, and the sample instances when the mill was overloaded is highlighted to show the e↵ect of overloading on the raw sound level compared to the moving average. Center of gravity The review in Chapter 2.3 also revealed that load distribution may be another key variable for process monitoring. If the load distribution shifts to the east side of the mill where the charge is not lifted, then it may indicate that ore or slurry is accumulating on this side and reducing the grinding eciency of the mill. To measure load distribution, the change of total mass between the east and west side of the mill can be determined from the bearing pressure measurements. The center of gravity of the mill can be used to estimate the load distribution, and it is calculated from the expression: Center of Gravity = FEEast BP + DEEast BP FEWest BP + DEWest BP + FEEast BP + DEEast BP , (5.12) where FE is the feed-end of the mill, DE is the discharge-end of the mill, and BP refers to the bearing pressure at specific locations of the SAG mill. Since the total force on the bearings is largely a↵ected by the weight of the mill, it is expected that the load distribution estimate will not be sensitive to the small changes in charge distribution inside the mill. However, the estimate can be appropriately scaled to emphasize the expected subtle changes in this measurement. 
Recycle rate, discharge to pump box

During an overload, it is also expected that the recycle flow and the discharge to the pump box will decrease even though the bearing pressure increases and the feed rate may be constant. The discharge flow to the pump box is inferred from the feed rate, the water flow addition, and the change of tonnage in the mill, and is calculated from the mass balance given in the expression:

Δm_charge/Δt = (Feed + Water + Recycle) − (Discharge to Pump Box + Recycle),   (5.13)

which is rearranged to solve for the discharge to the pump box:

Discharge to Pump Box = Feed + Water − Δm_charge/Δt.   (5.14)

Here, feed is given in short tons per hour (STPH), and water is also given in STPH after being converted from gallons per minute (GPM), where 1 GPM is 0.25 STPH of water at 10°C. The discharge to pump box variable is also given in STPH, and Δm_charge is the estimated change in the mass of the charge over a time interval Δt of one minute. At the mineral processing facility in British Columbia, the mass of the total charge in the mill is an available measurement inferred from the four bearing pressure measurements, with a scaling and bias determined from a calibration performed on the empty mill.

Specific energy

Specific energy is a measure of the amount of energy consumed per ton of ore processed in the mill. In the midst of an overload, the specific energy of the charge should decrease because mass is accumulating but motor power does not increase. Specific energy is calculated from the expression:

Specific Energy = Total Motor Power / Total Bearing Pressure,   (5.15)

where the total motor power is the combination of the motor power from both motors A and B, and the total bearing pressure is the combination of all four bearing cells.

Bearing vibration, feed rate of size greater than two inches

The final two variables considered for monitoring are the bearing vibration and the feed rate with a size greater than two inches. The effects of overloading on bearing vibration are poorly understood in the industry. However, we believe bearing vibrations may initially increase upon overloading due to the increased overall inertia and total kinetic energy of the charge as the mass of the charge increases. Eventually, vibrations may decrease upon significant overloading as the impact energy of the cataracting charge decreases. Even though the feed with a particle size greater than two inches is considered as a candidate process variable, a time plot of this variable shows no easily observable correlation between an overload and the particle size disturbance. Time plots of each of the scaled process variables are provided in the following section.
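The two inferred variables of Equations (5.14) and (5.15) in code form (a sketch; the 1 GPM to 0.25 STPH conversion and the minute-to-hour scaling are spelled out as comments):

```python
def discharge_to_pump_box(feed_stph, water_gpm, dm_charge_tons, dt_min=1.0):
    """Inferred discharge flow, Eq. (5.14), from a mass balance on the mill.

    feed_stph      : fresh ore feed (STPH)
    water_gpm      : feed water (GPM); 1 GPM is about 0.25 STPH at 10 C
    dm_charge_tons : change in inferred charge mass (short tons) over dt_min
    """
    water_stph = 0.25 * water_gpm
    dm_dt_stph = dm_charge_tons / (dt_min / 60.0)   # tons/min -> tons/hour
    return feed_stph + water_stph - dm_dt_stph

def specific_energy(motor_a_kw, motor_b_kw, bearing_pressures_psi):
    """Specific energy, Eq. (5.15): total motor power over total bearing
    pressure, where the latter acts as a proxy for charge mass."""
    return (motor_a_kw + motor_b_kw) / sum(bearing_pressures_psi)
```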
This corresponds to a much lower sound 87 0 50 100 150 200 250 0 0.25 0.5 0.75 1 Be ar in g Pr es su re Overload 0 0.25 0.5 0.75 1 M ot or  P ow er 0 50 100 150 200 250 0 0.25 0.5 0.75 1 So un d Le ve l 0 0.25 0.5 0.75 1 Ce nt er  o f G ra vit y 0 50 100 150 200 250 0 0.25 0.5 0.75 1 D iff er en ce  o f S lo pe s 0 0.25 0.5 0.75 1 R ec yc le 0 50 100 150 200 250 0 0.25 0.5 0.75 1 D is ch ar ge  to  P um p Bo x 0 0.25 0.5 0.75 1 Sp ec ific  E ne rg y 0 50 100 150 200 250 0 0.25 0.5 0.75 1 Vi br at io n Sample Number 0 0.25 0.5 0.75 1 Si ze  >  2  in ch Figure 5.2: Candidate process variables of the training data-set for process monitoring 88 level than the moving average of the sound level. The figure also shows the center of gravity increases implying that the load distribution has shifted slightly to the east side of the mill. The third subplot of the figure shows the di↵erence of slopes is positive when the mill is overloaded which is apparent from the trend being above the dashed black line at the center of the scale. In this subplot, it is also observable that the recycle flow decreases once the mill is severely overloaded. The fourth subplot shows the discharge to pump box is high during the mill overload; however, it does not increase enough to o↵set the consistent rise in bearing pressure that indicates an accumulation of ore in the mill. The specific energy is observed to decrease during the overload. Finally, the vibration measurement appears to temporarily increase during an overload, and the feed rate with size greater than two inches is unexpectedly less than normal. From the training data shown in Figure 5.2, the first five candidate process vari- ables listed in Table 5.2 appear to be the most important indicators of the overload situation. Specific energy provides somewhat redundant information to the bearing pressure, recycle indicates the overload only during severe overloading, and vibration only indicates an overload at the onset of overloading. It is unlikely that a linear feature extraction method could utilize the nonlinear characteristics of variables such as the vibration measurement since the correlation between other variables is poor. However, nonlinear feature extraction may appropriately utilize these nonlinear mea- surements. Although it appears that five variables are the most important for process monitoring, all 10 candidate process variables will be considered when comparing PCA to SLLEP in Chapter 5.2.4. Figure 5.3 shows a pair-wise scatter plot between the five most relevant candidate process variables for the training data-set—the other five variables are not shown for brevity. The scatter plots show that the blue data points, representing nominal operating data, are scattered in an approximately elliptical fashion which is most visible for plots of bearing pressure and other variables. However, when considering 89 both the nominal and overload operating data, the data no longer scatters elliptically for the mill overload. This is a clear indication of the nonlinear nature of the SAG mill operation, particularly near the boundary between nominal and overloaded operation. The non-elliptical scattering of the data between nominal and overloading operation shows that at the boundary between these operating states multivariate normality of the data does not hold. This implies that linear feature extraction techniques such as PCA may result in poor performance when used in process monitoring to detect the onset of overloading. 
However, in PCA, the accuracy of fit between the PC model and the measured values can be useful to provide an indication of overloading if the model fit is poor.

Visualization of the test data

Figure 5.4 shows the time trends of the 10 candidate process variables for the test data-set. This data-set is developed from data sampled at two minutes from the mineral processing facility on May 27th, 2012, when three consecutive overloads occurred.

Figure 5.4: Candidate process variables for a segment of the test data-set for process monitoring.

This data-set is excellent for validating the performance of the process monitoring technique because the mill is in a double-fault situation. In this case, the engineers overestimated the steel ball charge in the mill. As a result, the grinding efficiency was depressed and the mill could only be operated at a much lower total charge mass than normal. The effects of the lower charge mass can be observed from the unusually low bearing pressure. Because the bearing pressure was so low, the operators attempted to increase the bearing pressure in order to increase the mill throughput. However, the mill was already nearly overloaded. Consequently, the mill easily overloaded, and it was later discovered that the deficiency of the steel ball charge indirectly led to overloading.

The test data shown in Figure 5.4 have mean and standard deviation values so different from the training data that any attempt to train feature extraction methods on the standardized training data failed during model validation on the test data. Only after scaling the data between 0 and 1 was it possible to mitigate the drastic changes to the mean and standard deviation of the variables and design a feature extraction method effective for both the training and test data-sets. Scaling the variables rather than standardizing them is one of the features that makes the process monitoring models developed in this chapter robust to changing process conditions.

5.2.2 PCA

Two methods of training the PC model and classifying the feature space were investigated. The procedure for both of these methods was discussed in Chapter 4.1.1. First we show that PCA Method 1 is not satisfactory for process monitoring of a complex process such as a SAG mill. Then, we demonstrate that PCA Method 2 is instead more suitable for these processes if careful consideration is given to the selection of process variables in the model.

PCA Method 1

To use PCA as a feature extraction technique, Method 1 was first evaluated.
Since this method requires that the training data be entirely nominal operating data, a training data-set different from Figure 5.2 was used. The training data-set used to train PCA Method 1 is shown in Figure 5.5. This data-set was taken at a two-minute sampling from the mineral processing facility on February 29th, 2012, and includes some of the training data in Figure 5.2 up until the point where the mill became overloaded.

Figure 5.5: Candidate process variables of the training data-set for PCA Method 1 only.

PCA Method 1 was trained on this data using the first five variables in Table 5.2. The test statistics Q and T² were calculated for the test set in Figure 5.4 to classify abnormal and normal process behaviour. Figure 5.6 shows the results for the Q and T² statistical classifiers from Method 1 on the test data-set from May 27th, 2012, sampled every two minutes at the mineral processing facility.

Figure 5.6: Q and T² results on the test data-set for PCA Method 1.

In these results, we see that neither Q nor T² is sensitive enough to the mill overload to detect the early onset of overloading. This is likely because the overload results in only subtle changes to the correlation matrix used to train the PC model. Moreover, these statistics appear to be more sensitive to the highly nonlinear underloading abnormal situation. However, mill underloading does not require a recovery control strategy like overloading does.

Table 5.3 summarizes the classification results for PCA Method 1. Two PCs were retained in the PC model and used to calculate the Q and T² results. The performance of the Q and T² test statistics was evaluated using sensitivity and specificity, with the average of the two reported in a column labelled 'Metric'. Sensitivity is a measure of the proportion of true overload operating data that are correctly identified by the classifier. Sensitivity is defined by the following expression:

Sensitivity = (# of true overload) / (# of true overload + # of false nominal),   (5.16)

where a true overload corresponds to data we labelled as overloaded before training the classifier that are correctly classified, and a false nominal corresponds to data classified as nominal by the classifier when we had instead labelled those data as belonging to the overloaded class. The low sensitivity of both Q and T² reveals that these classifiers are poor indicators of the SAG mill overload.

The other metric used to evaluate the performance of the classifier is specificity. Specificity is a measure of the proportion of true nominal operating data that are correctly identified by the classifier. Specificity is defined by the following expression:

Specificity = (# of true nominal) / (# of true nominal + # of false overload),   (5.17)

where a true nominal corresponds to data we defined as nominal before training the classifier that are correctly classified, and a false overload corresponds to data classified as overloaded by the classifier when we had instead labelled those data as belonging to the nominal class.
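These two metrics, and the averaged 'Metric' column used in the tables that follow, are one-liners:

```python
def sensitivity(true_overload, false_nominal):
    """Eq. (5.16): fraction of labelled overload samples caught."""
    return true_overload / (true_overload + false_nominal)

def specificity(true_nominal, false_overload):
    """Eq. (5.17): fraction of labelled nominal samples kept nominal."""
    return true_nominal / (true_nominal + false_overload)

def metric(sens, spec):
    """The 'Metric' column: average of sensitivity and specificity."""
    return 0.5 * (sens + spec)
```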
Table 5.3 summarizes the classification results for PCA Method 1. Two PCs were retained in the PC model and used to calculate the Q and T² results. The performance of the Q and T² test statistics was evaluated using sensitivity and specificity, and the average of the two appears in a column labelled 'Metric'. Sensitivity is a measure of the proportion of true overload operating data that are correctly identified by the classifier. Sensitivity is defined by the following expression:

\text{Sensitivity} = \frac{\text{\# of true overload}}{\text{\# of true overload} + \text{\# of false nominal}},    (5.16)

where a true overload corresponds to data we labelled as overloaded before training the classifier and that are correctly classified, and a false nominal corresponds to data classified as nominal by the classifier when we had instead labelled those data as belonging to the overloaded class. The low sensitivity of both Q and T² reveals that these classifiers are poor indicators of the SAG mill overload.

The other metric used to evaluate the performance of the classifier is specificity. Specificity is a measure of the proportion of true nominal operating data that are correctly identified by the classifier. Specificity is defined by the following expression:

\text{Specificity} = \frac{\text{\# of true nominal}}{\text{\# of true nominal} + \text{\# of false overload}},    (5.17)

where a true nominal corresponds to data we labelled as nominal before training the classifier and that are correctly classified, and a false overload corresponds to data classified as overloaded by the classifier when we had instead labelled those data as belonging to the nominal class.

Table 5.3: Classification results for PCA Method 1

  Feature space  Data-set  Q_α    Q Sens.  Q Spec.  T²_α  T² Sens.  T² Spec.  Metric
  2D             Train     0.058  0%       97.7%    6.05  0%        89.4%     46.8%
  2D             Test      0.058  0%       74.7%    6.05  9.1%      93.1%     44.2%

Sensitivity and specificity are used instead of an overall classification error rate because we are most interested in knowing whether we incorrectly classified an overload as normal. Also, more nominal data are typically available than overload data, and this imbalance biases an overall error rate towards measuring the successful classification of only the nominal data. Both measures, and the averaged 'Metric', follow directly from the confusion counts, as the sketch below shows.
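This is a minimal sketch of Equations (5.16) and (5.17), assuming hypothetical boolean arrays y_true and y_pred in which True means overloaded.

    import numpy as np

    def sens_spec(y_true, y_pred):
        """Sensitivity (Eq. 5.16), specificity (Eq. 5.17), and their average."""
        tp = np.sum(y_true & y_pred)       # true overloads correctly flagged
        fn = np.sum(y_true & ~y_pred)      # overloads missed (false nominal)
        tn = np.sum(~y_true & ~y_pred)     # nominal data correctly passed
        fp = np.sum(~y_true & y_pred)      # false overload alarms
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        return sensitivity, specificity, 0.5 * (sensitivity + specificity)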
For complex processes, we do not expect to see satisfactory process monitoring results from PCA Method 1. This case study is one example providing evidence that Method 1 is not sensitive enough to subtle changes in the correlation structure of the data in the event of a process shift. However, Method 1 is included in this thesis to demonstrate the advantages of PCA Method 2 and of the SLLEP approach proposed in this work. As a general rule, if it is known that the process dynamics are complicated, we recommend against using Method 1 for process monitoring.

PCA Method 2

PCA Method 2 gives more satisfactory feature extraction and classification results than Method 1. In Method 2, the PC model was trained on the training data-set in Figure 5.2. This data-set includes enough overload data alongside the nominal data that, when the PC model was trained, correlations existed between the first five candidate variables in Table 5.2. In fact, the inclusion of the overload data allowed variables such as the center of gravity to develop a correlation structure that otherwise never existed in the nominal operating region. Figure 5.2 shows that the center of gravity only has a strong correlation with other variables once the mill is overloaded. This is a key reason why overload data are needed when training the PC model. The sacrifice in incorporating overload training data occurs when the overload is severe and the correlation structure differs significantly from nominal operation. However, as long as a limited amount of data from the severe operating region is used, the PC model will not significantly lose performance as a feature extractor.

The PC model parameter results from training are shown in Table 5.4, which lists the five PCs and the coefficients of the five process variables used in the model. From the table, we see that nearly all of the variance in the data-set can be explained with one, two, or three PCs. Scores from each PC were plotted against each other in 2D plots to inspect which PCs were sensitive to overloading. It was found that PC1 and PC2 were most sensitive to the overload. The 2D plot of PC1 and PC2 scores is shown in Figure 5.7a.

Table 5.4: PCA modeling results

                        PC1     PC2     PC3     PC4     PC5
  Bearing Pressure      0.651   0.427   0.073   -0.339  -0.523
  Motor Power           0.313   0.128   -0.232  -0.483  0.774
  Sound Level           0.456   -0.186  -0.688  0.533   -0.027
  Centre of Gravity     0.329   0.300   0.577   0.585   0.355
  Difference of Slopes  0.403   -0.823  0.367   -0.158  -0.016
  Eigenvalue            0.0360  0.0088  0.0071  0.0034  0.0001
  % Variance Explained  64.90   16.00   12.80   6.10    0.30

Figure 5.7a shows that the nominal and overloaded operating data occupy their own distinct regions of the PC score space. A linear classification boundary is drawn in the figure to separate the data into binary classes of nominal and overload operating data. The equation for the classification boundary is determined by 2D LDA and is given in Table 5.5.

Table 5.5: PCA-LDA classification results for the 2D and 1D feature spaces

  Feature space  Classifier          Data-set  Sens.  Spec.  Metric  APER
  2D             y = 1.464x + 1.641  Train     97.8%  99.4%  98.6%   31.2%
                                     Test      18.8%  100%   59.4%
  1D             x = 1.195           Train     100%   95.5%  97.8%   13.0%
                                     Test      65.4%  100%   82.8%

Table 5.5 summarizes the classification results for the PCA-LDA process monitoring method using sensitivity, specificity, a cumulative performance metric labelled 'Metric', and the apparent error rate (APER). The column labelled 'Metric' in Table 5.5 is simply the average of the sensitivity and specificity metrics, and it provides a concise summary of the classification results. APER is the estimated overall error rate of the classifier, and a low APER value is favourable. APER is determined from the classification results on the test set, and it is calculated as follows:

\text{APER} = \frac{\text{\# of false overload} + \text{\# of false nominal}}{\text{\# of samples in the test set}}.    (5.18)

[Figure 5.7: LDA classification results on the PC score space of the training data-set for a) 2D LDA, and b) 1D LDA]

Figure 5.7a demonstrates the importance of being able to visually interpret high-dimensional data in a low-dimensional feature space that is easily plotted. In this case, we see that the linear classifier has separated the nominal and overloaded data across two dimensions. However, it appears the data are almost entirely separable across one dimension: the scores for PC1. As a consequence, the 2D linear classifier over-fits the training data, and the classifier will not generalize well to new data sampled online at the mineral processing facility. To mitigate over-fitting, the classifier is trained on the PC1 score data in 1D LDA, and the classification results are shown in Figure 5.7b and summarized in Table 5.5. The figure and table show that the classification results between 2D and 1D LDA on the feature space are nearly identical. However, when the 2D and 1D classifiers are applied to the test data-set, the results indicate that the 1D classifier has superior classification accuracy to the 2D classifier. The classification results for 2D and 1D LDA on the feature space of the test set are given in Figure 5.8a, Figure 5.8b, and Table 5.5. The 1D pipeline is small enough to sketch in a few lines, as shown below.
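The following is a minimal scikit-learn sketch of the 1D PCA-LDA pipeline; the array names and the 0/1 labelling are hypothetical. The LDA decision function plays the role of the continuous measure of process shift used later in this chapter.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    # X_* are the scaled five-variable arrays; y_train is 0 = nominal,
    # 1 = overloaded, assigned manually before training.
    pca = PCA(n_components=5).fit(X_train)        # PC model (cf. Table 5.4)
    pc1_train = pca.transform(X_train)[:, [0]]    # keep only the PC1 scores
    pc1_test = pca.transform(X_test)[:, [0]]

    lda = LinearDiscriminantAnalysis().fit(pc1_train, y_train)  # 1D boundary
    y_hat = lda.predict(pc1_test)                 # binary class labels
    shift = lda.decision_function(pc1_test)       # continuous process shift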
[Figure 5.8: LDA classification results on the PC score space of the test data-set for a) 2D LDA, and b) 1D LDA]

5.2.3 SLLEP

In this section, we show that SLLEP in combination with LDA can also successfully detect the onset of SAG mill overloading. In addition, in this case study we demonstrate that this technique achieves equivalent or better process monitoring performance than PCA Method 2, and that it is suitable for both low- and high-dimensional process monitoring. PCA Method 2, on the other hand, is limited to monitoring the few scarce variables with a reasonably linear response during a process shift.

The method for training SLLEP as a feature extractor is discussed in Chapter 4.1.2. In this case study, we found that we needed approximately 2000 data points from both nominal and overloaded training data to properly train SLLEP. This training data-set was obtained from the mineral processing facility for February 25th, 2012 to February 29th, 2012, and it includes the data shown in Figure 5.2. All nominal operating data were sampled every five minutes, and the overload data for the overload shown in Figure 5.2 and for a partial overload on February 27th were sampled every 30 seconds. In addition, nominal data from May 26th, 2012 were included in the training set at a five minute sampling to include data from different operating regions. Finally, all data where the SAG mill was not in operation were removed. The result is a training data-set with 1919 data points spanning six days and including two identified overloads.

Four candidate models covering several combinations of the candidate process variables were considered when training and testing SLLEP. The four candidate models are shown in Table 5.6. Model 1 is the same model used to train the PC model in Chapter 5.2.2. Three other models with increasing numbers of variables were also considered.

Table 5.6: Summary of the variables retained in each candidate model

  Variables              Model 1  Model 2  Model 3  Model 4
  Bearing Pressure       X        X        X        X
  Motor Power            X        X        X        X
  Sound Level            X        X        X        X
  Centre of Gravity      X        X        X        X
  Difference of Slopes   X        X        X        X
  Recycle                                  X        X
  Discharge to Pump Box           X        X        X
  Specific Energy                                   X
  Vibration                                X        X
  Size > 2 inch                                     X

The results for SLLE on the training data-set mapped to 2D space are shown in Figure 5.9 for Model 4. The figure shows that the nominal operating data scatter roughly elliptically, and the nominal and overload operating data are well separated, with some misclassifications near the classification boundary. A 2D visual interpretation of all 10 process variables would not be possible without a method such as SLLE, and the overload data are separated well enough from the nominal operating data that LDA can classify the 2D space into regions of nominal and overload operation. Finally, the figure also shows that QDA over-fits the region and is unable to classify some severely overloaded data within the quadratic classification boundary; classification of severely overloaded data needs to be perfect because of the potential safety risks and equipment damage resulting from an overloaded mill. A sketch of how such an embedding and the two classifiers can be produced is given below.
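The following sketch produces a 2D embedding and fits both classifiers. Note that scikit-learn's LocallyLinearEmbedding is the unsupervised variant; the supervised distance modification of SLLE is not included, so this is an approximation of the procedure described in Chapter 4.1.2. X_train and labels are hypothetical names, and the parameters follow Model 4 in Table 5.7.

    from sklearn.manifold import LocallyLinearEmbedding
    from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                               QuadraticDiscriminantAnalysis)

    # K = 26 neighbours and regularization 0.0012, as for Model 4.
    lle = LocallyLinearEmbedding(n_neighbors=26, n_components=2, reg=0.0012)
    Y = lle.fit_transform(X_train)            # 2D coordinates as in Figure 5.9

    lda = LinearDiscriminantAnalysis().fit(Y, labels)
    qda = QuadraticDiscriminantAnalysis().fit(Y, labels)
    print((lda.predict(Y) == labels).mean(),  # apparent accuracy of each
          (qda.predict(Y) == labels).mean())  # boundary on the training set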
In both the LDA and QDA classifiers on the feature space in Figure 5.9, we see that there are misclassified data points. However, these misclassified data points are acceptable because they likely tell us two things. First, since we manually assigned a class label to data ranges that appeared faulty, there is subjectivity to this process. Only the most obvious overload data were classified as overloaded in order to reduce bias towards classifying data as overloaded. Secondly, it is nearly impossible to perfectly classify data as nominal or overloaded when the process shift between the two operating regions is continuous. In other words, there is a 'fuzzy' operating region in the middle that can be considered both slightly overloaded and nominal at the same time. This presents a challenge when assigning a binary class label to a non-binary abnormal situation. Given these points, we expect to see misclassifications, and the linear classification boundary in the figure represents a good compromise between classifying data as nominal or overloaded.

[Figure 5.9: LDA and QDA classification on the training data-set mapped into 2D space by SLLE]

Figure 5.10 shows the 2D SLLEP mapping of the test data-set using Model 4. The results from this plot show that the linear classifier is superior to the quadratic classifier because it misclassifies fewer observations. The figure also shows that there are numerous data points where the overload data were classified incorrectly as nominal by the linear classifier. However, an inspection of these data points revealed that they lie on the fringes of the ranges where manual class labels were added to the overloads in the test set; in other words, we most likely manually classified some data as overloaded when in reality the mill was not necessarily overloaded.

[Figure 5.10: LDA and QDA classification on the test data-set mapped into 2D space by SLLEP]

Table 5.7 summarizes the SLLEP classification results for the four candidate models. Both LDA and QDA were tested for each model. The performance metric used to compare the models is simply the average of the four training and test sensitivity and specificity values. The training sensitivity and specificity were determined from 10-fold cross-validation of the training data-set; the test data were then mapped to the feature space and the classification results were determined. The table shows that the linear classifier consistently scores better than the quadratic classifier for these four models. This is because the quadratic classifier is more susceptible to over-generalizing the feature space than the linear classifier. Table 5.7 also shows that the classification performance increases as the number of variables used to train the model increases. This is not expected to always be the case; however, these results demonstrate that SLLEP works well with high-dimensional data-sets. In addition, SLLEP-LDA classification performance was not hindered by the addition of the particle size data, even though this variable does not correlate well with the overload in the test data-set.

Table 5.7: SLLEP classification results of four candidate models

  Model  Classifier  K   α       Train Sens.  Train Spec.  Test Sens.  Test Spec.  Metric
  1      Quadratic   28  0.0016  91.3%        93.7%        66.2%       99.5%       87.7%
  1      Linear      28  0.0016  91.1%        93.7%        70.0%       99.1%       88.5%
  2      Quadratic   33  0.0022  91.5%        87.1%        81.5%       94.9%       88.8%
  2      Linear      34  0.0020  97.8%        90.1%        85.4%       94.0%       91.8%
  3      Quadratic   31  0.0004  96.2%        93.7%        70.0%       96.8%       89.2%
  3      Linear      31  0.0004  96.2%        93.7%        80.0%       98.1%       92.0%
  4      Quadratic   26  0.0012  99.1%        89.0%        86.9%       94.9%       92.5%
  4      Linear      26  0.0012  98.9%        89.0%        90.8%       93.5%       93.1%

The values of K and α listed for each model were obtained by a grid search wrapped around this cross-validation; a sketch of the procedure is given below.
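This is a minimal sketch of such a grid search, again using scikit-learn's unsupervised LLE as a stand-in for the supervised variant; X_train and labels are hypothetical, and the search ranges are illustrative rather than the ranges actually used.

    import numpy as np
    from sklearn.manifold import LocallyLinearEmbedding
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.metrics import balanced_accuracy_score
    from sklearn.model_selection import StratifiedKFold

    best_params, best_score = None, -np.inf
    for K in range(20, 40):                   # candidate neighbourhood sizes
        for alpha in (5e-4, 1e-3, 2e-3):      # candidate regularization weights
            Y = LocallyLinearEmbedding(n_neighbors=K, n_components=2,
                                       reg=alpha).fit_transform(X_train)
            folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
            scores = []
            for tr, te in folds.split(Y, labels):
                lda = LinearDiscriminantAnalysis().fit(Y[tr], labels[tr])
                # Balanced accuracy = average of sensitivity and specificity.
                scores.append(balanced_accuracy_score(labels[te],
                                                      lda.predict(Y[te])))
            if np.mean(scores) > best_score:
                best_params, best_score = (K, alpha), np.mean(scores)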
The equations of the linear and quadratic classification boundaries, determined from all solutions to f(y_i) = 0, for SLLEP on Model 4 are given in Table 5.8. The APER is calculated from the results on the test data and is provided in the table to give an estimate of the classification error rate expected for each classifier.

Table 5.8: Classification boundaries for SLLE and the classification error rate

  Classifier  f(y_i)                                          APER
  Linear      2.8 + 2.42x − 4.95y                             8.7%
  Quadratic   1.81 + 0.75x − 5.88y + 1.96x² + 3.3xy + 1.13y²  13.3%

5.2.4 Comparison between PCA-LDA and SLLEP-LDA

In Chapters 5.2.2 and 5.2.3 we showed that PCA-LDA and SLLEP-LDA both successfully classify the overload abnormal operating situation in the training and test data-sets. The combined classification results of the two methods are summarized in Table 5.9 for the four models identified in Table 5.6.

Table 5.9: Comparison of the classification results between PCA-LDA and SLLEP-LDA on the test set

  Model  PCA Sens.  LLE Sens.  PCA Spec.  LLE Spec.  PCA Metric  LLE Metric
  1      65.4%      70.0%      100%       99.1%      82.7%       84.5%
  2      53.1%      85.4%      100%       94.0%      76.5%       89.7%
  3      15.4%      80.0%      100%       98.1%      57.7%       89.1%
  4      0.0%       90.8%      100%       93.5%      50.0%       92.1%

Table 5.9 shows that SLLEP-LDA is superior to PCA-LDA as a process monitoring method for the test data-set. For each model, SLLEP-LDA is more sensitive than PCA-LDA in detecting an overload. On the other hand, PCA-LDA consistently has a higher specificity than SLLEP-LDA. However, a high specificity alone is not indicative of a good classifier. Therefore, a comprehensive performance metric for the two methods is included that takes the average of the sensitivity and specificity for each method. The best model chosen for PCA-LDA is Model 1, and the model chosen for SLLEP-LDA is Model 4. Based upon 'Metric' for these two models, SLLEP-LDA outperforms PCA-LDA by 9.4%.

Figure 5.11 provides a visual comparison of the process monitoring output, also defined as the measure of process shift, on the test data-set for Model 1 of PCA-LDA and Model 4 of SLLEP-LDA. Bearing pressure and motor power are included in the figure to help orient these results relative to the complete test data shown in Figure 5.4.

[Figure 5.11: Output from process monitoring on the test set for both PCA-LDA and SLLEP-LDA]

Figure 5.11 demonstrates that the output of the two methods is nearly identical. In fact, SLLEP-LDA is slightly more sensitive to the overload than PCA-LDA, which has resulted in a misclassification around sample 70. False alarms such as these are proportional to the specificity measurement, and SLLEP-LDA could easily be tuned to reduce these misclassifications by adjusting the classification boundary, increasing the cost of misclassifying nominal data as overloaded in Equation (4.30).

This is one example where, if the assumptions of PCA are valid and the dimensionality of the observation space is low, PCA can almost match the performance of SLLEP. However, because of the assumptions of linearity and multivariate normality in PCA that SLLEP does not require, SLLEP as a feature extractor will always be equivalent or better than PCA if trained properly (De Ridder and Duin, 2002). One lightweight way to approximate the cost adjustment of Equation (4.30) is sketched below.
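The cost adjustment of Equation (4.30) is part of the classifier training itself; a crude but readily available approximation in an off-the-shelf toolbox is to assign unequal class priors in LDA, which shifts the boundary in the same direction as raising the cost of a false overload alarm. The names below are hypothetical.

    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    # Weighting the nominal class (label 0) more heavily moves the linear
    # boundary toward the overload side, trading a little sensitivity for
    # fewer false alarms; Y_train holds the 2D embedded coordinates.
    lda_weighted = LinearDiscriminantAnalysis(priors=[0.9, 0.1])
    lda_weighted.fit(Y_train, labels)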
Table 5.9 demonstrates the power of SLLEP-LDA as a superior process monitoring tool for high-dimensional data-sets. PCA-LDA shows deteriorating performance on the test set as the number of variables in the model increases. This observation is likely explained by the fact that the extra variables included in Models 2 to 4 are more nonlinear than the five variables carefully selected for Model 1. The classification performance of SLLEP-LDA, however, improves as the number of observation variables increases because the method does not require linearity of the variables in the model; thus more information about the process operation is mapped to the feature space. In addition, since PC scores are mapped to the feature space using a linear equation, the information about the fault in the PCs becomes masked by the new variables. Consequently, a linear model is not a useful tool for extracting features pertaining to the overload from nonlinear variables. Moreover, PCA-LDA with Model 1 is only successful as a process monitoring method in this case study because of the large effort invested in selecting and transforming the variables to allow PCA to perform well. On the other hand, SLLEP-LDA does not require as large an investment in the selection of process variables because highly nonlinear variables do not need to be pre-screened.

SLLEP-LDA has proven to have equivalent or greater classification accuracy than PCA-LDA in this case study. However, SLLEP-LDA has some disadvantages too. Although less effort is required to select process variables, more effort is required to perform a grid search on the tuning parameters K and α in SLLEP that give the best classification results in the feature space by LDA. However, if a fast computer is available during the offline training of SLLEP-LDA to find K and α, the grid search can be performed in a reasonable amount of time.

Another disadvantage of SLLEP-LDA is the online computational complexity. PCA-LDA requires only a handful of calculations; SLLEP-LDA, by contrast, requires the real-time solution of an unconstrained QP. In Chapter 4.1.2 we demonstrated a new approach that allows SLLEP-LDA to be utilized in industry for process monitoring. Consequently, if MPC is available in the control software at a specific site, it is possible to solve the unconstrained QP in real time with minimal controller loading and programming. The structure of the problem solved at each sample is sketched below.
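The online problem arises when a new observation must be placed into the existing embedding: the reconstruction weights of its K nearest training neighbours minimize a quadratic objective subject to a sum-to-one constraint, which can be folded in analytically. The sketch below solves the problem directly with linear algebra rather than with an MPC's QP solver, and all names are illustrative; this is a standard out-of-sample extension of LLE, not necessarily the exact formulation of Chapter 4.1.2.

    import numpy as np

    def map_new_point(x, X_train, Y_train, K=26, alpha=0.0012):
        """Map a new scaled observation x into the existing 2D embedding."""
        d2 = np.sum((X_train - x)**2, axis=1)
        nbrs = np.argsort(d2)[:K]                 # K nearest training points
        Z = X_train[nbrs] - x                     # centre neighbours on x
        G = Z @ Z.T                               # local Gram matrix
        G += alpha * np.trace(G) * np.eye(K)      # regularize, as in LLE
        w = np.linalg.solve(G, np.ones(K))        # solve G w = 1
        w /= w.sum()                              # enforce sum-to-one
        return w @ Y_train[nbrs]                  # weighted embedding coords

The embedded point is then classified with the stored LDA boundary to produce the measure of process shift.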
In this case study, the mineral processing facility uses a modern DCS that is equipped with MPC. However, in their version of the commercial DCS product, the MPC consists of an optimizer that solves an LP to provide set-points to the MPC controller. Because the optimizer cannot be bypassed in this version, the QP of the MPC controller cannot be directly accessed. As a result, SLLEP-LDA can only be solved with the LP, which results in poor mapping of new data to the feature space. Because of this limitation of the control software available at the mineral processing facility, we were not able to implement the SLLEP-LDA process monitoring tool, and the modeling and simulation portion of this case study is conducted using the PCA-LDA process monitoring method.

Despite this, we have successfully demonstrated to a major control systems manufacturer the benefit of bypassing the optimizer to access the MPC controller directly in order to use the QP solver. Moreover, we submitted a technology request with the advanced control engineers at the major control systems manufacturer, and they have agreed to incorporate these changes into the software in a future release. To put it another way, this work has helped to advance one of the industry's best commercial control products, allowing some of the most forthcoming techniques in the literature to be implemented in industry.

As a final note on the process monitoring method developed for the case study on the SAG mill at the mineral processing facility, this work provides a new technique that is superior to the current best practice. Ko and Shang (2011) demonstrate PCA with Q and T² classification on industrial operating data. However, their study does not validate the process monitoring method on new test data; instead, they performed cross-validation on the training data. We likewise found with 10-fold cross-validation that PCA can successfully classify the training data. By way of contrast, the real test is when the PCA model is applied to new test data where it is likely that the process operating conditions have changed. We found that careful selection of process variables, classification of the PC score space, and scaling of the observation variables is the best way to make sure PCA generalizes well to new data where process conditions have changed. As a result, this work presents two process monitoring methods that are designed to work in industry given realistic operating conditions. Likewise, we successfully implemented PCA-LDA at the mineral processing facility in January 2013 and have found that this tool gives the expected output results for their operating conditions. However, no overloads have occurred in the first half of 2013 with which to fully evaluate the performance of PCA-LDA, because the mineral processing facility has recently added an additional SAG mill to de-bottleneck the original SAG mill.

5.3 Modeling of the overloaded operating region

To model the overload operating region of the SAG mill, we first considered a first-principles model of the internal dynamics of the SAG mill at the mineral processing facility based on the work of Morrell (1993) and Apelt (2002), and the work of the Julius Kruttschnitt Mineral Research Centre at the University of Queensland (Napier-Munn et al., 1996; Valery, 1998). These models are based upon dynamic mass balances for solids, water, ball charge, and shell liner wear. However, we instead chose an empirical modeling approach because it has the advantage of significantly reducing the time required for plant personnel to gather detailed information about the design of the SAG mill. Moreover, many of the model parameters would have to be estimated from data available at similar plants. Finally, the first-principles models do not account for the rheological nature of the slurry, flow-back problems of the pulp-lifter discharge system, slurry pooling, and freewheeling; as a consequence, they do not provide an accurate simulation of the onset of mill overloading required for this case study (Katubilwa and Moys, 2011; Shi and Napier-Munn, 2002; Morrison and Cleary, 2008; Latchireddi and Morrell, 2003a,b).
The modeling techniques described in Chapter 4.2.1 are used to model the overload operating region of the SAG mill at the mineral processing facility. In this case, there are two MVs, feed ore flow u1 and water flow u2, and one CV, the measure of process shift y. The measure of process shift is the output of the process monitoring method discussed in Chapter 5.2. Since there are two MVs and one CV, we must train two process models to find the response between each input-output pair. Knowing that the feed input has the strongest effect on mill behaviour, we first model the response of y to changes in u1.

Overload data from February 27th, 2012 were used to train the response from u1 to y because u2 was not initially changed to correct the overload response. Consequently, we can use the data from this region to find the response in y that is directly caused by changes in u1 and not u2. Figure 5.12 shows the data, sampled every 30 seconds, used to develop the model relation between u1 and y. In these data, we found 29 samples where the mill is overloaded and u1 is reduced to zero to let the mill grind down the ore charge and recover the process to nominal operation.

[Figure 5.12: Model fitting results for the response of the process shift to changes in feed ore rate]

Three different models were fitted to the data shown in Figure 5.12 using the MATLAB System Identification Toolbox (Ljung, 2007a): P1D, P1DZ, and P2D. Here, P1D corresponds to a model with one pole and time delay as given in Equation (4.40). P1DZ corresponds to a model with one pole, a zero, and time delay as given in Equation (4.41). Finally, P2D corresponds to a model with two poles and time delay as given in Equation (4.42). A model with two poles, a zero, and time delay, as given in Equation (4.43), was not considered because there are not enough data to fit such a high-order model, as seen with P2D in the figure. The best fitting model is P1D, as shown in the figure, and the model parameters for P1D and the two other candidate models are shown in Table 5.10 for the data-set labelled Feb 27. In this table, r̄ represents the average model residual per sample: the total residuals divided by the number of samples in the data-set. It is used instead of the total residuals to allow comparison of the residuals across models trained on data-sets with more or fewer samples. In addition, 'Fit' provides a measure that compares how well the model predicts the response relative to a line centered on the mean of the response. Fit is calculated from the expression:

\text{Fit} = 100\left(1 - \frac{\lVert y - \hat{y} \rVert}{\lVert y - \text{mean}(y) \rVert}\right),    (5.19)

where ŷ is the array containing the predicted response from the model, and ‖·‖ denotes the l2 norm.

Table 5.10: Modeling steps and fitting results

  Data-set  Purpose  Name  Model                                                  Fit (%)  r̄ (×10³)
  Feb 27    Train    P1D   Y(s)/U1(s) = 0.638e^{−116s}/(437s + 1)                 94.8     0.37
  Feb 27    Train    P1DZ  Y(s)/U1(s) = 0.836(57.6s + 1)e^{−25.7s}/(731s + 1)     91.7     0.94
  Feb 27    Train    P2D   Y(s)/U1(s) = 0.265e^{−17.8s}/((0.03s + 1)(0.03s + 1))  20.6     85.8
  Feb 29    Fit      P1D   Y(s)/U1(s) = 0.638e^{−116s}/(437s + 1)                 84.3     3.65
  Feb 29    Train    n/a   Y(s)/U2(s) = 0.021e^{−475s}/(3.57s + 1)                0.00     323.5
  May 27    Test     P1D   Y(s)/U1(s) = 0.638e^{−116s}/(437s + 1)                 0.00     109.4

From Table 5.10, we observe that the fit is highest and r̄ is lowest with P1D. A sketch of this identification step is given below.
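The FOPTD structure of P1D is simple enough that the identification step can be sketched compactly. The following is an illustrative alternative to the MATLAB toolbox actually used here, assuming hypothetical arrays t (time in seconds) and y (process-shift deviation) recorded around a single feed step of size du.

    import numpy as np
    from scipy.optimize import least_squares

    def foptd_step(t, K, tau, theta, du):
        """Response of K*exp(-theta*s)/(tau*s + 1) to a step du at t = 0."""
        y = K * du * (1.0 - np.exp(-(t - theta) / tau))
        return np.where(t >= theta, y, 0.0)   # nothing happens before the delay

    def fit_percent(y, y_hat):
        """Equation (5.19)."""
        return 100.0 * (1.0 - np.linalg.norm(y - y_hat)
                        / np.linalg.norm(y - y.mean()))

    # Estimate K, tau, theta by least squares from the step-response data.
    res = least_squares(lambda p: foptd_step(t, *p, du) - y,
                        x0=[0.5, 400.0, 100.0])
    K, tau, theta = res.x
    print(fit_percent(y, foptd_step(t, K, tau, theta, du)))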
Accordingly, it is not necessary to perform an F-test on the model residuals to determine which model has the best fit at the lowest model order, as detailed in Appendix E.

The next step in modeling the overload operating region is to model the response between u2 and y. In this case, data are available for the overload on February 29th, 2012, where both u1 and u2 were reduced to recover from the overload. To find the response between u2 and y, we must first estimate the response of y from u1; the remainder between the estimated response and the true response is then used to fit the model between u2 and y. The data from February 29th are shown in Figure 5.13. The second subplot of the figure shows the estimated response calculated by P1D, and the third subplot shows the remainder of the response that is unexplained by P1D.

The model remainder shown in Figure 5.13 was used to identify the FOPTD model response between u2 and y; however, the figure shows that the model fit is poor. One explanation for the poor fit between u2 and y is that there are not enough data to properly identify the model. Alternatively, there may be a poor correlation between u2 and y, or the correlation may be strongly nonlinear. However, there are not enough data in the overload operating region to fit a nonlinear model, and the model fit from P1D alone satisfactorily explains the response. In fact, Figure 5.13 can be thought of as a test set for P1D, and Table 5.10 shows that the fit is reasonable at 84.3%. As a final point, the engineers at the mineral processing facility were also unable to fit a model between u2 and motor load for the nominal operating range, and the effects of water on mill dynamics are poorly understood in both industry and academia.

Table 5.11 summarizes the final model fitting results for u1 and y and for u2 and y. The table shows that the parameter variances of P1D are only a fraction of a percent of the magnitudes of the parameters. Although this is desirable, it is unlikely that the real parameter variances are so low, because P1D was trained on a data-set of limited size. Moreover, the parameter variances for the response between u2 and y were negligible; the data-set used to train this model was not large enough to appropriately estimate the model or the parameter variances. As a result, the overall process model for the overload operating region is P1D, and the model between u2 and y is not included in the overall model.

[Figure 5.13: Fitting P1D to a new training data-set in order to fit a model for u2 on the fit remainder]

The final step, now that the overall model has been chosen, is to validate the overall model on a test data-set. Overload data from May 27th, 2012 were used to validate the overall process model, P1D. There are 290 data points sampled every 30 seconds in this data-set. This data-set and the estimated model response are shown in Figure 5.14. The two-step residual fit described above is sketched below.
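The two-step procedure (simulate P1D on the feed step, then fit the water model on what is left) can be sketched as follows. The arrays t, y and the step sizes du1, du2 are hypothetical, and the thesis models were identified with the MATLAB toolbox rather than this least-squares sketch.

    import numpy as np
    from scipy.optimize import least_squares

    def foptd_step(t, K, tau, theta, du):
        y = K * du * (1.0 - np.exp(-(t - theta) / tau))
        return np.where(t >= theta, y, 0.0)

    # P1D prediction from the feed step, then the remainder left for water.
    y_hat_u1 = foptd_step(t, 0.638, 437.0, 116.0, du1)
    remainder = y - y_hat_u1

    # Fit a second FOPTD between the water step du2 and the remainder.
    res = least_squares(lambda p: foptd_step(t, *p, du2) - remainder,
                        x0=[0.02, 5.0, 400.0])
    K2, tau2, theta2 = res.x        # candidate u2 -> y model (cf. Table 5.11)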
Figure 5.14 shows that the estimate of the process shift is accurate in certain regions and inaccurate in others. This is expected because the process is nonlinear and P1D is a linear model. Correspondingly, P1D was trained to estimate the response of y only when the system is in an overload operating region. When the system is in the nominal operating region, the model for y is not needed because the nominal controller models will be used instead. In this figure, however, we see from the measure of process shift that the system shifts three times from an overloaded to the nominal operating region. As a result, P1D has an excellent fit on the shift from overload to nominal, yet a poor fit once the process has recovered to the nominal operating region. Only when the process finally returns to the overload does P1D once again show a strong fit for the recovery from overload to nominal. The fitting results for the test set are also included in Table 5.10. However, these results are ambiguous because the model fit will only be accurate for changes to the MVs in the overload region.

Table 5.11: Best fit models and their parameter variances

  Model                                    Parameter  Unit                Value  Variance
  Y(s) = 0.638e^{−116s}/(437s + 1) U1(s)   K          Process Shift/STPH  0.638  0.001
                                           τ          seconds             437    1.7
                                           θ          seconds             116    0.03
  Y(s) = 0.021e^{−475s}/(3.57s + 1) U2(s)  K          Process Shift/GPM   0.021  n/a
                                           τ          seconds             3.57   n/a
                                           θ          seconds             475    n/a

The models determined in this chapter are used in the recovery MPC as the process models for the response in process shift to changes in the feed rate MV. Moreover, the process model between the process shift measurement and water addition is set to a zero-gain model to eliminate changes in water in the recovery response. Since a recovery response will involve reducing the feed while maintaining water flow, it is expected that the response will decrease the density of the charge in the mill sufficiently to help offset mill overloading.

[Figure 5.14: P1D fitting results on a test data-set to validate the model]

Expert rules

The final modeling procedure required is to determine the expert rules and a limit on the magnitude of process shift at which the expert rules take control of the MVs. From the data in Figure 5.14, we see that the operators reduce the feed to zero until the fault has been corrected, and then return the feed to roughly the same level as before their intervention. However, no consistent action is taken with the water addition. Based on these actions, the expert rules should automatically reduce the feed to zero until the recovery controller can finish the process recovery, after which the feed is returned to a rate similar to the level before the expert system intervened. In addition, since the action to take with water is uncertain from the data, the water will be held constant, because the ratio between feed and water will help reduce the density and viscosity of the charge during an overload. With simulation and plant trials these rules may be adjusted. However, to start, based on our observations of the operators' actions during an overload, we recommend the following rules (a sketch of the corresponding override logic follows the list):

Rule 1: If the magnitude of process shift is greater than 1400, override control of the feed and water MVs, and reduce the feed rate to zero.

Rule 2: If Rule 1 is active and the magnitude of process shift is less than 1300, increase feed to 80% of the feed rate before Rule 1 was active and return control to the override or nominal controller.
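The two rules translate directly into a small override state machine. The encoding below is one possible sketch, with the thresholds left as parameters so the simulation values used in Chapter 5.4 can be substituted; water is assumed to be held constant by the caller.

    class OverloadExpertSystem:
        """Override logic implementing Rules 1 and 2."""

        def __init__(self, on_limit=1400.0, off_limit=1300.0,
                     restore_frac=0.8):
            self.on_limit, self.off_limit = on_limit, off_limit
            self.restore_frac = restore_frac
            self.active = False
            self.feed_before = None

        def update(self, process_shift, feed_sp):
            """Return the feed set-point for this sample."""
            if not self.active and process_shift > self.on_limit:
                self.active = True
                self.feed_before = feed_sp      # remember pre-override feed
                return 0.0                      # Rule 1: cut feed to zero
            if self.active and process_shift < self.off_limit:
                self.active = False             # Rule 2: hand control back
                return self.restore_frac * self.feed_before
            return 0.0 if self.active else feed_sp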
5.4 Recovery from the SAG mill overload

The last part of the case study is a simulation that studies the behaviour of the control strategy in the event of an incipient or abrupt disturbance to the process. Because the mineral processing facility has recently added a second SAG mill to their process, they no longer experience overloading with either SAG mill, as comminution is no longer the bottleneck at their facility. Consequently, we were unable to commission the control strategy at the mineral processing facility, and a simulation was performed instead.

The process models used in the simulation are the linear process models identified by Spartan Controls during commissioning of the MPC. However, the time constant of the models was reduced from roughly 10 minutes to approximately one minute, and the dead time was reduced by a factor of 10, in order to speed up the simulation; nonetheless, the sample rate is taken to simulate a two minute sampling of the real process. Moreover, nonlinearity was added to the process outputs to account for the nonlinear nature of the process when the mill has overloaded. The behaviour of the process in this simulation is rudimentary and will not be nearly as accurate as the true response of the SAG mill. However, the models in the simulation do not need to be especially accurate, because the purpose of the simulation is to demonstrate the behaviour of the control strategy. A more accurate test performed at the mineral processing facility would only allow us to demonstrate how well we modelled the abnormal operating region, and this was already demonstrated in Chapter 5.3 on a large test data-set.

In each of the three simulations examined in this section, MPC is used as the nominal controller with two MVs, feed rate and water flow, and two CVs, feed-end west bearing pressure and motor power A. In Scenario 1, an incipient disturbance enters the system where no control strategy is commissioned to recover the process from an overload. Scenario 2 demonstrates the effectiveness of the control strategy in recovering the process from an overload following the same input disturbance given in Scenario 1. Finally, in Scenario 3, an abrupt disturbance leads to significant overloading that the expert system mitigates before returning control to the MPC. The effects of the disturbance in these simulations are most visible in the estimated process shift output representing the simulated PCA-LDA output. Additionally, the disturbance also affects the bearing pressure and motor load. Bearing pressure rises during an overload because the grinding efficiency within the mill is depressed, causing ore to accumulate in the mill. Furthermore, motor power initially increases during an overload; however, when the overload becomes severe, the motor power decreases, indicating the presence of freewheeling or slurry pooling.

Scenario 1: Incipient disturbance without recovery control.

Scenario 1 demonstrates the behaviour of the controller when no control strategy is commissioned to recover the process from an overload. In this case, the overload creates positive feedback: the overload causes ore to accumulate in the mill, which decreases the grinding efficiency of the mill and worsens the overload.
The results of this simulation are shown in Figure 5.15.

[Figure 5.15: Scenario 1: control strategy has not been incorporated to mitigate the incipient disturbance]

In Figure 5.15 we see that the incipient disturbance entering at sample 15 causes the process shift to increase, indicating the process is moving closer to an overload. However, when the process is classified as overloaded around sample 44, the controller reacts slowly because it is only responding to the increase in bearing pressure from the target SP of 950 psi. Even when the disturbance stops increasing at sample 55, the positive feedback mechanism of the overload causes the process shift to gradually increase. As the overload worsens, the bearing pressure increases and the motor power decreases. However, the controller is tuned to decrease feed when the bearing pressure is high and to increase feed when motor power is low. As a result, the nonlinearity of the process produces an incorrect response between motor power and feed rate. This conflicting response within the MPC causes the changes to the feed to be slow and smaller in magnitude than required to prevent the overload from worsening. Only at sample 134 do the operators recognize the process is overloaded and manually take corrective action before the bearing pressure exceeds its upper constraint of 1200 psi. The operators' response arrives 90 samples after the process monitoring method first detected that the mill was overloaded.

Scenario 2: Incipient disturbance with recovery control.

Scenario 2 uses the same disturbance entering the system as Scenario 1; however, Scenario 2 treats the measure of process shift as a constraint variable in the MPC with a constraint limit of 1195, as determined from PCA-LDA. The constraint variable allows the MPC to switch models from controlling the bearing pressure and motor power to controlling the measure of process shift as a priority. If the constraint variable exceeds the constraint limit, or if the MPC predicts the constraint limit will be exceeded within a 120 sample prediction horizon, then the MPC acts to reduce the feed so as to bring the process shift back to the constraint limit. Figure 5.16 shows the results for this simulation.

In Figure 5.16, we observe that the overload was detected by the process monitoring method around sample 44, the same as in Scenario 1. This tells us that the MPC did not predict that the constraint variable would exceed the limit within its prediction horizon. This result makes sense, because the prediction of the constraint variable is determined from the models developed in Chapter 5.3 between the measure of process shift and the feed and water MVs. The overall model response assumed water flow had zero gain on the process shift, and feed had a positive gain. In other words, when the feed decreases in the simulation, the MPC predicts that the measure of process shift should decrease. However, the MPC does not observe the unmeasured disturbance entering the system and cannot compensate until the constraint limit on the constraint variable is violated. A simplified sketch of this prediction-based constraint check is given below.
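For illustration only, the following sketch mimics the constraint check: it rolls a zero-order-hold discretization of P1D forward over the horizon and flags a violation of the 1195 limit. The real MPC prediction also handles disturbance estimation and the (zero-gain) water channel; shift_now and planned_feed are hypothetical.

    import numpy as np

    def predict_shift(shift_now, feed_moves, K=0.638, tau=437.0,
                      theta=116.0, Ts=120.0):
        """Predicted process-shift trajectory for planned feed deviations
        (STPH) over the horizon, using the discretised P1D model."""
        a = np.exp(-Ts / tau)                # discrete pole, 2-minute sample
        delay = int(round(theta / Ts))       # dead time of about one sample
        dy, traj = 0.0, []
        for i in range(len(feed_moves)):
            u = feed_moves[i - delay] if i >= delay else 0.0
            dy = a * dy + K * (1.0 - a) * u  # first-order difference equation
            traj.append(shift_now + dy)
        return np.array(traj)

    # The MPC acts when any point of the 120-sample prediction violates 1195.
    violated = (predict_shift(shift_now, planned_feed) > 1195.0).any()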
[Figure 5.16: Scenario 2: process shift constraint variable is exceeded causing the MPC to correct for the incipient disturbance before severe overloading results]

Once the constraint limit for the constraint variable is violated, we see that the MPC reduces the feed rate, and the corrective action of the feed flow is faster than in Scenario 1: the overload stops worsening by sample 70. Moreover, the MPC was able to keep the process shift at the constraint limit of the constraint variable until the disturbance subsided. We expect the process to have optimal throughput at the constraint variable limit because the process is maximally loaded without being overloaded. Therefore, the MPC maintains optimal operation of the process in the presence of the large disturbance that entered the system.

Scenario 3: Abrupt disturbance with recovery control.

Scenario 3 demonstrates the response of the control strategy in the presence of a large and abrupt disturbance that rapidly overloads the SAG mill. In this case, the MPC is unable to compensate for the unmeasured disturbance alone. Figure 5.17 shows the results for this simulation.

[Figure 5.17: Scenario 3: abrupt disturbance leads to significant overloading that the expert system mitigates before returning control to the MPC]

In Figure 5.17, we see that by sample 20 the process shift has exceeded the limit at which the expert system is designed to act, set at a process shift value of 1300 in this simulation. The rule in the expert system is to decrease the feed to zero when the process shift exceeds 1300. The value of 1300 is for simulation purposes only, and a different value may be used on-site; a limit of 1300 was used instead of 1400 because the recovery controller was aggressive enough that it was difficult in the simulation to bring the process shift above 1400. The expert system maintains control of the feed and keeps the water constant until the process shift returns to below 1230 in this simulation. When the process shift is below 1230, the next rule in the expert system is to increase the feed to 80% of the original feed rate prior to returning control of the MVs to the MPC. Once the expert system activates, we see that the overload is completely corrected after approximately 30 samples. The process recovery is smooth; however, there is a slight overshoot of the process shift around sample 100. The overshoot is small, and it is possible to correct it by adjusting the expert rules slightly.
For example, the expert system could return control of the MVs to the MPC earlier, or the feed rate could be restored closer to 100% of the original feed rate prior to returning control of the MVs.

Key performance measures

We have selected three key performance indices to measure the potential benefits of the controller performance recovery algorithm. These three measures are: the time to detect an overload, the time until the overload is fully corrected, and the amount of extra ore, and its value, that the mineral processing facility is able to process. We will look at the benefits of two potential cases: the difference in responses between Scenario 1 and Scenario 2, and the ability to increase the throughput of the SAG mill given that the risks of an overload are reduced.

The benefits of the control strategy for an event such as Scenario 3 are not considered because of the uncertainty in the costs of this event. Without a control strategy in this scenario, the abrupt disturbance would cause the bearing pressure limit to be exceeded and the mill would be immediately stopped. Likewise, the damage to the mill would need to be assessed, and it would be difficult to estimate the potential equipment damage and downtime of the mill.

Table 5.12 summarizes the three key performance measures for the difference in response between Scenario 1 and Scenario 2. Dollar values are given in U.S. dollars (USD) because these metals are usually traded in USD on the global market. The assumptions for these estimates are provided in Table 5.13.

Table 5.12: Key performance indices to measure the potential savings between Scenario 1 and Scenario 2

  Key performance measure                           Value   Unit
  Reduction in time to detect overload              3       hours
  Reduction in time to fully recover from overload  1.5     hours
  Additional ore processed                          1816    tons
  Profit from additional ore processed              25 257  USD

To determine the amount of additional ore processed between Scenario 1 and Scenario 2, we assumed that the operators reduced the feed to zero for 30 samples, or one hour, after the overload was detected at sample 134 in Figure 5.15; a one hour feed stop was observed in historical data for a similar overload on February 29th, 2012. We then compare the total throughput of the mill over a period of 164 samples, or 5.4 hours, for both scenarios. The profit from the additional ore processed is calculated using the equation:

\text{Profit} = \text{quantity of ore} \times \text{copper grade} \times \text{copper price}/2
  + \text{quantity of ore} \times \text{molybdenum grade} \times \text{molybdenum price}/2
  - \text{quantity of ore} \times \text{cost per ton},    (5.20)

where the grades, prices, and costs are given in Table 5.13, and the minerals in the concentrate are assumed to sell for half their market value. A worked version of this calculation is sketched below.
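For transparency, Equation (5.20) with the Table 5.13 assumptions can be evaluated as follows. The pound-to-ton conversion and the short- versus metric-ton convention are assumptions not spelled out above, so this sketch reproduces the order of magnitude, not the exact figure, in Table 5.12.

    def profit_usd(tons, cu_grade=0.0032, mo_grade=0.0001,
                   cu_price_per_ton=8811.0, mo_price_per_lb=15.49,
                   lb_per_ton=2000.0, cost_per_ton=3.48):
        """Equation (5.20); concentrate minerals sell at half market value."""
        mo_price_per_ton = mo_price_per_lb * lb_per_ton  # assumed short tons
        revenue = (tons * cu_grade * cu_price_per_ton / 2.0
                   + tons * mo_grade * mo_price_per_ton / 2.0)
        return revenue - tons * cost_per_ton

    # Additional ore processed between Scenario 1 and Scenario 2 (Table 5.12).
    print(round(profit_usd(1816.0)))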
Table 5.13: Assumptions for calculating the key performance measures

  Assumption                      Value  Unit
  Copper grade of ore¹            0.32   %
  Molybdenum grade of ore¹        0.01   %
  2011 average copper price²      8811   USD/ton
  2011 average molybdenum price²  15.49  USD/lb
  Operating days in a year        300    days/year
  Cost per ton of ore processed³  3.48   USD/ton

¹ Provided by the mineral processing facility.
² London Metal Exchange.
³ An estimate from Wills and Napier-Munn (2006) of operating expenses including labour, energy, and water usage for a typical copper concentrating facility.

The second return-on-investment opportunity for the mineral processing facility is the ability to slightly increase the throughput of the SAG mill. As we mentioned at the beginning of this chapter, the mineral processing facility operates their process conservatively because of their concern that the SAG mill will overload. However, based upon evidence from historical data, we estimate that they could increase their average throughput by 10 STPH, since they would have the capacity to detect the early onset of an overload and automatically recover from it. This leads to an estimated increase in profit of one million USD per year based on an assumed 300 days of operation per year. Moreover, we believe a 10 STPH throughput increase is a conservative estimate, as it represents a 0.5% increase on the average throughput of approximately 2000 STPH. The assumptions behind this estimate are also given in Table 5.13, with the additional assumption that the SAG mill is the bottleneck of the facility.

Chapter 6

Conclusion

Effective management of abnormal situations in a chemical or mineral processing facility is vital to maximizing process economy and safety. Traditionally, statistical process monitoring has been used to alert operators when a process has shifted to an abnormal operating region. The operators then manually make the corrective control actions to recover the process to the nominal operating range. This thesis bridges the gap between statistical process monitoring and the DCS. Consequently, the DCS is provided with enough information about the process shift to the abnormal operating range, and about the control actions necessary to recover the process to nominal operation. It is in these specific areas that this work has made the following contributions:

1. A general control strategy has been developed in Chapter 4 that proposes two statistical process monitoring techniques that may be used for either linear or nonlinear processes to detect a process shift into an abnormal operating situation. In addition, the control strategy includes two linear control techniques designed to recover a process from the nonlinear abnormal operating region to the nominal operating region.

2. A technique that allows SLLEP to be performed in real time in an industrial DCS has been developed and described in Chapter 4.1.2. Since most modern industrial DCSs include MPC, the MPC may be used as a tool to solve a QP in real time with minimal programming. This allows us to map new data to the feature space for classification in the SLLEP process monitoring algorithm.

3. The control strategy was demonstrated on a SAG mill at the mineral processing facility in British Columbia, where it detects a mill overload and uses the DCS to recover the process without operator intervention. Moreover, the causes and mechanics of a SAG mill overload have been thoroughly studied and comprehensively discussed in Chapter 2.3 for, to the best of our knowledge, the first time in the literature.

6.1 Summary of contributions

The general control strategy proposes using PCA or SLLEP as a feature extractor and then Q, T², LDA, or QDA as a classifier in process monitoring to identify abnormal operating situations. Specifically, SLLEP is proposed as a new feature extractor for processes that behave nonlinearly. The output of the classifiers is a continuous variable that provides an estimate of the magnitude of the process shift to an abnormal operating region.
This measure of process shift can be used in two ways. The first is as a CV in an override MPC controller that overrides the existing nominal controller's MVs. The second applies when the nominal controller is an MPC: the measure of process shift is used as a constraint variable in the MPC, and the constraint variable models in the MPC are identified for the nonlinear abnormal operating region. In addition, if the measure of process shift indicates a severe overload, an expert system based on an operator's best recovery control actions is proposed to recover the process from a severely nonlinear abnormal operating region to the less severe abnormal operating region for which the linear recovery controller is designed.

We have developed a method that uses MPC to solve the QP in the SLLEP algorithm in real time in a DCS. SLLEP is a new and advanced process monitoring technique that this work has shown to be effective for nonlinear processes, or processes with many variables to be monitored. Through this work, we have demonstrated to one of the largest control software developers in the industry the value of SLLEP and the necessity of directly accessing the QP of the MPC controller. As a result, the developers have agreed to incorporate a manual override of the LP optimizer in their DCS so that advanced users can gain direct access to the QP. In other words, we have helped advance the industry enough to allow the commissioning of a highly advanced and forthcoming technique such as SLLEP.

The control strategy in this work was demonstrated on a SAG mill at the mineral processing facility. We successfully demonstrated that the process monitoring method is robust to changes in process operating conditions and is able to accurately detect the early onset of a mill overload even after significant changes to the process occurred over the year in which this study was carried out. In addition, the PCA-LDA process monitoring technique has been installed at the mineral processing facility and is providing information to the operators on the status of their SAG mill.

Although the mineral processing facility was unable to incorporate the full control strategy developed in this thesis due to staff limitations, we have demonstrated that the control strategy has the potential for significant savings in the operation of their SAG mill. The control strategy is capable of optimally recovering their process from an overload before it becomes severely overloaded. As a result, process stability and safety are significantly increased. Likewise, the control strategy allows the process to be operated closer to the maximum throughput of the SAG mill without the consequences of the increased risk of overloading the mill.

6.2 Future research

The following is a list of potential research subjects based on the work described in this thesis:

1. The improvement of modeling of the abnormal operating region in cases where data availability is limited, or where the excitation of the process inputs is so large that the response in the data is nonlinear.

2. The improvement of locally linear embedding to map new observations to the feature space for classification. Specifically, the mapping errors need to be reduced, especially for regions of data sparsity in the training set.

3. The training of a classifier using fuzzy class data rather than binary class data.

4. The application of this control strategy to other industries.

Bibliography

Apelt, T. A. Inferential measurement models for semi-autogenous grinding mills.
Ph.D. Thesis, University of Sydney, Australia, 2002.

Bailey, S. J. From desktop to plant floor, a CRT is the control operator's window on the process. Control Engineering, 31(6):86–90, 1984.

Basseville, M. Detecting changes in signals and systems–A survey. Automatica, 24(3):309–326, 1988.

Bequette, B. W. Process Dynamics: Modeling, Analysis, and Simulation. Prentice-Hall, 1998.

Bianchi, F., De Battista, H., and Mantz, R. Wind Turbine Control Systems: Principles, Modelling and Gain-scheduling Design. Springer, 2006.

Bly, M. Deepwater Horizon Accident Investigation Report. DIANE Publishing, 2011.

Boser, B. E., Guyon, I. M., and Vapnik, V. N. A training algorithm for optimal margin classifiers. In Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pages 144–152, 1992.

Boyd, S. and Vandenberghe, L. Convex Optimization. Cambridge University Press, 2004.

Camacho, E. and Bordons, C. Model Predictive Control. Springer, 2nd edition, 2007.

Chen, W. H. Disturbance observer based control for nonlinear systems. IEEE/ASME Transactions on Mechatronics, 9(4):706–710, 2004.

Chen, X. S., Li, Q., and Fei, S. Constrained model predictive control in ball mill grinding process. Powder Technology, 186:31–39, 2008.

Chen, X. S., Yang, J., Li, S. H., and Li, Q. Disturbance observer based multi-variable control of ball mill grinding circuits. Journal of Process Control, 19:1205–1213, 2009.

Cheng, H., Nikus, M., and Jämsä-Jounela, S.-L. Evaluation of PCA methods with improved fault isolation capabilities on a paper machine simulator. Chemometrics and Intelligent Laboratory Systems, 92(2):186–199, 2008.

Chisci, L., Falugi, P., and Zappa, G. Gain-scheduling MPC of nonlinear systems. International Journal of Robust and Nonlinear Control, 13(3-4):295–308, 2003.

Choi, S. W., Lee, C., Lee, J.-M., Park, J. H., and Lee, I.-B. Fault detection and identification of nonlinear processes based on kernel PCA. Chemometrics and Intelligent Laboratory Systems, 75(1):55–67, 2005.

Chou, H. H. Fault diagnosis of a heat exchanger system using unknown input observers. M.A.Sc. Thesis, University of Toronto, Canada, 2000.

Cleary, P. W., Sinnott, M., and Morrison, R. Prediction of slurry transport in SAG mills using SPH fluid flow in a dynamic DEM based porous media. Minerals Engineering, 19:1517–1527, 2006.

Coetzee, L. C. Robust nonlinear model predictive control of a closed run-of-mine ore milling circuit. Ph.D. Thesis, University of Pretoria, South Africa, 2009.

Craig, I. K. and MacLeod, I. M. Specification framework for robust control of a run-of-mine ore milling circuit. Control Engineering Practice, 3(5):621–630, 1995.

De Ridder, D. and Duin, R. P. W. Locally linear embedding for classification. Pattern Recognition Group, Dept. of Imaging Science & Technology, Delft University of Technology, Delft, The Netherlands, Tech. Rep. PH-2002-01, 2002.

Dean, D. R. Memo to Jim Dunstan, Rosebud Joint Venture, Nevada, January 1999.

Doyle, J. C., Francis, B. A., and Tannenbaum, A. Feedback Control Theory, volume 1. Macmillan Publishing Company, New York, 1992.

Draper, N. R., Smith, H., and Pownell, E. Applied Regression Analysis. John Wiley & Sons, 3rd edition, 1998.

Fisher, R. A. The use of multiple measurements in taxonomic problems. Annals of Human Genetics, 7(2):179–188, 1936.

Frank, P. M., Ding, S. X., and Marcu, T. Model-based fault diagnosis in technical processes. Transactions of the Institute of Measurement and Control, 22(1):57–101, 2000.

Friedberg, S. H., Insel, A. J., and Spence, L. E.
Friedberg, S. H., Insel, A. J., and Spence, L. E. Linear Algebra. Pearson, 4th edition, 2002.
Galán, O., Barton, G. W., and Romagnoli, J. A. Robust control of a SAG mill. Powder Technology, 124(3):264–271, 2002.
García, C. E., Prett, D. M., and Morari, M. Model predictive control: Theory and practice–A survey. Automatica, 25(3):335–348, 1989.
García-Álvarez, D., Fuente, M. J., Vega, P., and Sainz, G. Fault detection and diagnosis using multivariate statistical techniques in a wastewater treatment plant. In 7th IFAC International Symposium on Advanced Control of Chemical Processes, pages 952–957, 2009.
García-Álvarez, D., Fuente, M., and Sainz, G. Fault detection and isolation in transient states using principal component analysis. Journal of Process Control, 22(3):551–563, 2012.
Garnier, H., Mensler, M., and Richard, A. Continuous-time model identification from sampled data: Implementation issues and performance evaluation. International Journal of Control, 76(13):1337–1357, 2003.
Gill, D. A., Picou, J. S., and Ritchie, L. A. The Exxon Valdez and BP oil spills: A comparison of initial social and psychological impacts. American Behavioral Scientist, 56(1):3–23, 2011.
Goodwin, G. C. A brief overview of nonlinear control. Centre for Integrated Dynamics and Control, Department of Electrical and Computer Engineering, The University of Newcastle, Australia, 2002.
Gupta, A. and Yan, D. Mineral Processing Design and Operations. Elsevier, Amsterdam, The Netherlands, 2006.
Hespanha, J. P. and Morse, A. S. Switching between stabilizing controllers. Automatica, 38(11):1905–1917, 2002.
Hespanha, J. P., Liberzon, D., and Morse, A. S. Hysteresis-based switching algorithms for supervisory control of uncertain systems. Automatica, 39(2):263–272, 2003.
Himmelblau, D. M. Fault detection and diagnosis in chemical and petrochemical processes. Elsevier Scientific Publishing Company, 1978.
Horn, R. A. and Johnson, C. R. Matrix Analysis. Cambridge University Press, 1990.
Isermann, R. Model-based fault-detection and diagnosis–Status and applications. Annual Reviews in Control, 29(1):71–85, 2005.
Jackson, E. J. and Mudholkar, G. S. Control procedures for residuals associated with principal component analysis. Technometrics, 21(3):341–349, 1979.
Jackson, J. E. A user's guide to principal components, volume 244. Wiley-Interscience, 1991.
Johnson, R. A. and Wichern, D. W. Applied Multivariate Statistical Analysis. Pearson, 6th edition, 2007.
Kalman, R. E. A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82:35–45, 1960.
Katubilwa, F. M. and Moys, M. H. Effects of filling degree and viscosity of slurry on mill load orientation. Minerals Engineering, 24:1502–1512, 2011.
Kim, B. S., Calise, A. J., and Kam, M. Nonlinear flight control using neural networks and feedback linearization. Journal of Guidance, Control, and Dynamics, 20(1):26–33, 1997.
Ko, Y.-D. and Shang, H. SAG mill system diagnosis using multivariate process variable analysis. The Canadian Journal of Chemical Engineering, 89(6):1492–1501, 2011.
Koivo, H. N. Artificial neural networks in fault diagnosis and control. Control Engineering Practice, 2(1):89–101, 1994.
Kosanovich, K. A., Dahl, K. S., and Piovoso, M. J. Improved process understanding using multiway principal component analysis. Industrial & Engineering Chemistry Research, 35(1):138–146, 1996.
Kourti, T., Lee, J., and MacGregor, J. F. Experiences with industrial applications of projection methods for multivariate statistical process control. Computers & Chemical Engineering, 20:S745–S750, 1996.
Kramer, M. A. Nonlinear principal component analysis using autoassociative neural networks. AIChE Journal, 37(2):233–243, 1991.
Krohling, R. A. and Rey, J. P. Design of optimal disturbance rejection PID controllers using genetic algorithms. IEEE Transactions on Evolutionary Computation, 5(1):78–82, 2001.
Ku, W., Storer, R. H., and Georgakis, C. Disturbance detection and isolation by dynamic principal component analysis. Chemometrics and Intelligent Laboratory Systems, 30(1):179–196, 1995.
Latchireddi, S. and Morrell, S. Slurry flow in mills: Grate-only discharge mechanism (Part-1). Minerals Engineering, 16:625–633, 2003a.
Latchireddi, S. and Morrell, S. Slurry flow in mills: Grate-pulp lifter discharge systems (Part-2). Minerals Engineering, 16:635–642, 2003b.
Lee, J.-M., Yoo, C., Choi, S. W., Vanrolleghem, P. A., and Lee, I.-B. Nonlinear process monitoring using kernel principal component analysis. Chemical Engineering Science, 59(1):223–234, January 2004.
Leith, D. J. and Leithead, W. E. Survey of gain-scheduling analysis and design. International Journal of Control, 73(11):1001–1025, 2000.
Li, B. and Zhang, Y. Supervised locally linear embedding projection (SLLEP) for machinery fault diagnosis. Mechanical Systems and Signal Processing, 25(8):3125–3134, 2011.
Liu, C. S. and Peng, H. Disturbance observer based tracking control. Journal of Dynamic Systems and Control Division, 122:332–335, 2000.
Liu, Y., Weisberg, R. H., Hu, C., and Zheng, L. Tracking the Deepwater Horizon oil spill: A modeling perspective. 92(6):45, 2011.
Ljung, L. Estimating linear time-invariant models of nonlinear time-varying systems. European Journal of Control, 7(2-3):203–219, 2001.
Ljung, L. The System Identification Toolbox: The Manual. The MathWorks Inc., Natick, MA, USA, 7th edition, 2007a.
Ljung, L. System Identification. Prentice-Hall, 2nd edition, 2007b.
Ljung, L. and Wills, A. Issues in sampling and estimating continuous-time models with stochastic disturbances. In IFAC World Congress, Seoul, Korea, July 2008.
Ljung, L. Experiments with identification of continuous time models. In Proceedings of the 15th IFAC Symposium on System Identification, Saint-Malo, France, pages 90–95, 2009.
MacGregor, J. F. and Kourti, T. Statistical process control of multivariate processes. Control Engineering Practice, 3(3):403–414, 1995.
Mahadevan, S. and Shah, S. L. Fault detection and diagnosis in process data using one-class support vector machines. Journal of Process Control, 19(10):1627–1639, 2009.
Manohar, C. S. and Roy, D. Monte Carlo filters for identification of nonlinear structural dynamical systems. Sādhanā, 31(4):399–427, 2006.
Mariño, R. Nonlinear Control Design: Geometric, Adaptive, & Robust. Prentice Hall, New York, 1995.
Maybeck, P. S. Stochastic Models, Estimation, and Control, volume 1. Academic Press, New York, 1979.
McLachlan, G. J. Discriminant Analysis and Statistical Pattern Recognition. John Wiley & Sons, New Jersey, 2nd edition, 2004.
Mhaskar, P., Gani, A., and Christofides, P. D. Fault-tolerant control of nonlinear processes: Performance-based reconfiguration and robustness. International Journal of Robust and Nonlinear Control, 16(3):91–111, 2005.
Mokken, A. H., Blendulf, G. K. I., and Young, G. J. C. A study of the arrangements for pulp discharge on pebble mills, and their influence on mill performance. Journal of the South African IMM, pages 257–289, 1975.
Morrell, S. The prediction of power draw in wet tumbling mills. Ph.D. Thesis, University of Queensland, Australia, 1993.
Morrison, R. D. and Cleary, P. W. Towards a virtual comminution machine. Minerals Engineering, 21:770–781, 2008.
Morse, A. S. Supervisory control of families of linear set point controllers. In Proceedings of the 32nd Conference on Decision and Control, Texas, 1993.
Moys, M. H. Slurry rheology–The key to a further advance in grinding mill control. Proceedings: Advances in Autogenous and Semi-autogenous Grinding Technology, 1989.
Moys, M. H., Van Nierop, M. A., and Smit, I. Progress in measuring and modelling load behaviour in pilot and industrial mills. Minerals Engineering, 9(12):1201–1214, 1996.
Najim, K., Hodouin, D., and Desbiens, A. Adaptive control: State of the art and an application to a grinding process. Powder Technology, 82(1):59–68, 1995.
Napier-Munn, T. J., Morrell, S., Morrison, R. D., and Kojovic, T. Mineral Comminution Circuits: Their Operation and Optimization. Julius Kruttschnitt Mineral Research Centre, Australia, 1996.
Neogi, D. and Schlags, C. E. Multivariate statistical analysis of an emulsion batch process. Industrial & Engineering Chemistry Research, 37(10):3971–3979, 1998.
Nijmeijer, H. and Van der Schaft, A. Nonlinear Dynamical Control Systems. Springer, New York, 1990.
Nimmo, I. Adequately address abnormal operations. Chemical Engineering Progress, 91(9):36–45, 1995.
Ohran, B. J., de la Peña, D. M., Christofides, P. D., and Davis, J. F. Enhancing data-based fault isolation through nonlinear control: Application to a polyethylene reactor. In American Control Conference, 2008, pages 3299–3306, 2008.
Ohran, B. J., Liu, J., de la Peña, D. M., Christofides, P. D., and Davis, J. F. Data-based fault detection and isolation using feedback control: Output feedback and optimality. Chemical Engineering Science, 64(10):2370–2383, 2009.
Packard, A. and Doyle, J. The complex structured singular value. Automatica, 29:71–109, January 1993.
Patton, R. J. Fault-tolerant control systems: The 1997 situation. In IFAC Symposium on Fault Detection Supervision and Safety for Technical Processes, volume 3, pages 1033–1054, 1997.
Patton, R. J., Chen, J., and Siew, T. M. Fault diagnosis in nonlinear dynamic systems via neural networks. In International Conference on Control '94, volume 2, pages 1346–1351, 1994.
Primbs, J. A. and Nevistic, V. Optimality of Nonlinear Design Techniques: A Converse HJB Approach. Technical Report CIT-CDS 96-022, California Institute of Technology, Pasadena, CA 91125, 1996.
Quine, B. M. A derivative-free implementation of the extended Kalman filter. Automatica, 42(11):1927–1934, 2006.
Ramesh, T. S., Shum, S. K., and Davis, J. F. A structured framework for efficient problem solving in diagnostic expert systems. Computers & Chemical Engineering, 12(9):891–902, 1988.
Rao, R. C. The utilization of multiple measurements in problems of biological classification. Journal of the Royal Statistical Society, 10(2):159–203, 1948.
Roweis, S. T. and Saul, L. K. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.
Rugh, W. J. and Shamma, J. S. Research on gain scheduling. Automatica, 36(10):1401–1425, 2000.
Russell, E. L., Chiang, L. H., and Braatz, R. D. Fault detection in industrial processes using canonical variate analysis and dynamic principal component analysis. Chemometrics and Intelligent Laboratory Systems, 51(1):81–93, 2000.
Saha, N. and Roy, D. Two-stage extended Kalman filters with derivative-free local linearizations. Journal of Engineering Mechanics, 137(8):537–546, 2011.
Saul, L. K. and Roweis, S. T. An introduction to locally linear embedding. Unpublished. Available at: http://www.cs.nyu.edu/~roweis/lle/publications.html, 2000.
Saul, L. K. and Roweis, S. T. Think globally, fit locally: Unsupervised learning of low dimensional manifolds. The Journal of Machine Learning Research, 4:119–155, 2003.
Schölkopf, B., Smola, A., and Müller, K. R. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299–1319, 1998.
Serpas, M., Chu, Y., and Hahn, J. Fault detection approach for systems involving soft sensors. Journal of Loss Prevention in the Process Industries, July 2012.
Shi, F. N. and Napier-Munn, T. J. Effects of slurry rheology on industrial grinding performance. International Journal of Mineral Processing, 65:125–140, 2002.
Skogestad, S. and Postlethwaite, I. Multivariable Feedback Control: Analysis and Design. John Wiley & Sons, 2nd edition, 2005.
Tong, S. and Koller, D. Support vector machine active learning with applications to text classification. The Journal of Machine Learning Research, 2:45–66, 2002.
Valery, W. A model for dynamic and steady-state simulation of autogenous and semi-autogenous mills. Ph.D. Thesis, University of Queensland, Australia, 1998.
van Drunick, W. I. and Penny, B. Expert mill control at AngloGold Ashanti. 105(7):497, 2005.
Venkatasubramanian, V. and Chan, K. A neural network methodology for process fault diagnosis. AIChE Journal, 35(12):1993–2002, 1989.
Venkatasubramanian, V., Rengaswamy, R., Kavuri, S. N., and Yin, K. A review of process fault detection and diagnosis: Part I: Quantitative model-based methods. Computers & Chemical Engineering, 27(3):293–311, 2003a.
Venkatasubramanian, V., Rengaswamy, R., Kavuri, S. N., and Yin, K. A review of process fault detection and diagnosis: Part II: Qualitative models and search strategies. Computers & Chemical Engineering, 27(3):313–326, 2003b.
Venkatasubramanian, V., Rengaswamy, R., Kavuri, S. N., and Yin, K. A review of process fault detection and diagnosis: Part III: Process history based methods. Computers & Chemical Engineering, 27(3):327–346, 2003c.
Wan, E. A. and Van Der Merwe, R. The unscented Kalman filter for nonlinear estimation. In Adaptive Systems for Signal Processing, Communications, and Control Symposium 2000, pages 153–158, 2000.
Wang, J. Locally linear embedding. In Geometric Structure of High-Dimensional Data and Dimensionality Reduction, pages 203–220. Springer Berlin Heidelberg, 2011.
Wei, D. and Craig, I. K. Grinding mill circuits–A survey of control and economic concerns. International Journal of Mineral Processing, 90(1):56–66, 2009.
Wills, B. A. and Napier-Munn, T. J. Mineral Processing Technology. Elsevier, Oxford, 7th edition, 2006.
Yang, J., Li, S., Chen, X. S., and Li, Q. Disturbance rejection of ball mill grinding circuits using DOB and MPC. Powder Technology, 198(2):219–228, 2010.
Yoon, S. and MacGregor, J. F. Fault diagnosis with multivariate statistical models Part I: Using steady state fault signatures. Journal of Process Control, 11(4):387–400, 2001.
Zhang, Y. and Jiang, J. Bibliographical review on reconfigurable fault-tolerant control systems. Annual Reviews in Control, 32(2):229–252, 2008.
Zhou, Y., Hahn, J., and Mannan, M. S. Fault detection and classification in chemical processes based on neural networks with feature extraction. ISA Transactions, 42(4):651–664, 2003.
Zumoffen, D. and Basualdo, M. From large chemical plant data to fault diagnosis integrated to decentralized fault-tolerant control: Pulp mill process application. Industrial & Engineering Chemistry Research, 47(4):1201–1220, 2008.

Appendix A Selection of Process Variables

This section acts as a guideline to help select process variables to be used in PCA. To select the most relevant process variables as inputs to a PC model, we need an understanding of the process from both a physics and an operating point of view. With a basic understanding of the process, we can strategically select process variables, or transformations of process variables, that make the PC model robust to extraneous information, noise, or drifting. Moreover, we can replace highly nonlinear process variables with transformations that behave more linearly. As a result, it is possible to use PCA as a statistical process monitoring technique for some nonlinear or time-varying processes. The disadvantage of a careful selection of process variables is the much larger time required to build a PC model compared to the traditional method of building the model from all of the main process inputs and outputs.

From the physics perspective, we need a fundamental understanding of the dynamics inside the process unit. For example, it is important to know how mass and energy flow from the inputs to the outputs of the process boundary. Furthermore, we need to know how the accumulation of mass or energy in the process unit affects the outputs or the economic objective. It is also important to understand how the physics of the process change when the process shifts from a nominal to an abnormal operating region. By gaining a fundamental understanding of the process, we may be able to transform certain process variables, based on physical intuition, in such a way that the transformation more clearly indicates the process has shifted. Some examples of variable transformations based on a physical understanding of the process that can be used in PCA to detect a process shift are:

• transform bearing pressures to reflect load distribution in the process unit;
• design a mass or energy balance to determine if mass or energy is accumulating inside a process;
• develop a model that compares the expected throughput or economics of the process to measured values;
• develop a model that compares the expected acoustic, light, or vibration intensity output of a process to measured values.

From the operational perspective, the operators and the process data offer considerable insight into the nominal and abnormal behaviour of the process. It is important to have a discussion with the operators to learn how they identify nominal and abnormal process operation. Specifically, we need to know which key variables they monitor, and how these variables change following a process shift. Almost certainly, the variables the operators monitor to detect a process shift will also be included in some form in the PC model. Operators can also provide valuable insight on which sensors are poorly calibrated and drift. Once time-varying variables are identified, they can either be excluded from the PC model if they are not critical to the model, or they can be transformed into the deviation of the raw measurement from a moving average. However, as few variables as possible should be transformed in this way, because taking the difference from a moving average may desensitize the process monitoring method if the process remains stable near the undesired operating region for a long time.
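To make the deviation-from-moving-average transformation concrete, a minimal sketch follows. It uses a discrete first-order (exponential) filter as the moving average; the function names, the use of NumPy, and the time constant are our own illustrative choices, not values from the case study.

```python
import numpy as np

def first_order_filter(x, dt, tau):
    """Discrete first-order low-pass filter with time constant tau.

    Equivalent to an exponentially weighted moving average; dt is the
    sample interval in the same units as tau.
    """
    alpha = dt / (tau + dt)              # filter coefficient in (0, 1)
    y = np.empty_like(x, dtype=float)
    y[0] = x[0]                          # initialize at the first sample
    for i in range(1, len(x)):
        y[i] = y[i - 1] + alpha * (x[i] - y[i - 1])
    return y

def deviation_from_moving_average(x, dt, tau):
    """Replace a drifting variable with its deviation from a slow
    first-order moving average, as suggested above."""
    return x - first_order_filter(x, dt, tau)
```

In a DCS, only the previous filter state needs to be stored, so each real-time update reduces to a single multiply-add, which is one reason a first-order filter is attractive as a moving average.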
Operating data can also provide insight into the selection of key process variables. For example, a correlation analysis, such as a correlation plot, between process variables may help to eliminate variables that provide redundant information during a process shift. In addition, plotting time series trends or frequency spectra of various process variables together may show which variables best indicate that a process shift has occurred. A time series plot may also reveal whether particular measurements are significantly corrupted with noise and need to be filtered. Moreover, a time series plot may reveal that the rate of change of a variable, when compared to other variables, offers another indication of the process shift. To incorporate the rate of change of process variables into a PC model, the derivative of each variable needs to be calculated and subsequently filtered.

A.1 Transformation of sound level

Figure A.1 shows sound level data sampled every two minutes at the mineral processing facility from February 25th, 2012 to February 29th, 2012. Data points where the SAG mill was not operational have been removed from this data-set. The measured sound level is filtered in the DCS at the mineral processing facility with a first-order filter with a time constant of 40 seconds. The moving average of the sound level is calculated from a first-order filter with a time constant of 0.4 days. A moving average using a first-order filter is desirable because it is computationally easy to implement and compute in real-time in a DCS. The deviation of the measured sound level from the estimated moving average is used in place of the measured sound level because it is more robust to changes in the mean of the sound level caused by the operation of the SAG mill and auxiliary equipment. The red lines in the figure mark the region on February 29th, 2012 in which the engineers officially recognize the process as being overloaded. From the figure, we see that the sound level during the overload is less than the moving average. This is expected because the energy of the falling ore in the mill is either reduced or dampened due to the presence of slurry pooling or a reduced distance over which the ore can cataract.

[Figure A.1: Difference between the measured sound level and a 0.4-day moving average of the sound level. The plot shows the measured sound level (no units) and its moving average against sample number, with the overload region marked.]
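As a worked instance of the Section A.1 transformation, the sketch below chains the stated 40-second prefilter and 0.4-day moving average at the two-minute sample interval. The sound-level signal here is synthetic, since the plant data are not reproduced; the filter function repeats the one sketched earlier so the snippet is self-contained.

```python
import numpy as np

def first_order_filter(x, dt, tau):
    # y[i] = y[i-1] + alpha * (x[i] - y[i-1]), alpha = dt / (tau + dt)
    alpha = dt / (tau + dt)
    y = np.empty_like(x, dtype=float)
    y[0] = x[0]
    for i in range(1, len(x)):
        y[i] = y[i - 1] + alpha * (x[i] - y[i - 1])
    return y

dt = 120.0                                   # two-minute samples, in seconds
sound = 80 + 5 * np.random.randn(3000)       # synthetic sound level, no units

smoothed = first_order_filter(sound, dt, tau=40.0)            # DCS prefilter
moving_avg = first_order_filter(smoothed, dt, tau=0.4 * 86400)  # 0.4-day average
deviation = smoothed - moving_avg            # candidate input to the PC model
```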
Appendix B Locally Linear Embedding

B.1 Minimization of reconstruction error

The reconstruction error in Equation (4.12) is minimized in three steps:

1. Evaluate the inner products between the K neighbors of x_i to find the neighborhood correlation matrix C. Let the K-nearest neighbors of x_i be represented by X_\eta, where

X_\eta = [x_{\eta_1}, x_{\eta_2}, \ldots, x_{\eta_K}] \in \mathbb{R}^{D \times K}.  (B.1)

Then C is computed as

C_{jk} = x_{\eta_j}^T x_{\eta_k},  (B.2)

where 1 \le j \le K and 1 \le k \le K.

2. Compute the Lagrange multiplier, \lambda_L = \alpha_L / \beta_L, that enforces the sum-to-one constraint in Equation (4.14), where for each x_i the parameters \alpha_L and \beta_L are

\alpha_L = 1 - \sum_{j=1}^{K} \sum_{k=1}^{K} C_{jk}^{-1} (x_i^T x_{\eta_k}),  (B.3)

\beta_L = \sum_{j=1}^{K} \sum_{k=1}^{K} C_{jk}^{-1}.  (B.4)

3. Compute the reconstruction weights, W_j, corresponding to the ith reconstruction:

W_j = \sum_{k=1}^{K} C_{jk}^{-1} (x_i^T x_{\eta_k} + \lambda_L).  (B.5)

If the correlation matrix C is nearly singular, it can be conditioned before inversion by adding a small multiple of the identity matrix; this is known as regularization and essentially penalizes large weights.

B.2 Minimization of embedding cost

The embedding cost as a function of Y in Equation (4.15), subject to the constraints in Equations (4.16) and (4.17), can be rewritten in the form

\underset{Y}{\text{minimize}} \; \Phi(Y) = \sum_{i=1}^{n} \sum_{j=1}^{n} M_{ij} \, y_i^T y_j = \mathrm{tr}(Y M Y^T),  (B.6)

where M is a sparse, symmetric, and positive semidefinite n \times n matrix given by

M = (I - W)^T (I - W).  (B.7)

By the Rayleigh-Ritz theorem, Equation (B.6) can be solved by finding the eigenvectors corresponding to the smallest d + 1 eigenvalues of M (Horn and Johnson, 1990). The eigenvector corresponding to the smallest eigenvalue is the unit vector with all equal components and is discarded: it corresponds to a free translation mode with eigenvalue zero, and discarding it enforces the constraint that the embeddings have zero mean.

B.3 Least squares regression to map new samples

If X_\eta^T has full column rank, then a_j can be solved as

a_j = (X_\eta^T)^{+} y_\eta^j = (X_\eta X_\eta^T)^{-1} X_\eta y_\eta^j,  (B.8)

where (X_\eta^T)^{+} is the Moore-Penrose pseudo-inverse of X_\eta^T, and y_\eta^j is defined as

y_\eta^j = [y_{\eta_1}^j, y_{\eta_2}^j, \ldots, y_{\eta_K}^j]^T.  (B.9)

If X_\eta^T is column rank deficient, such as in the case where K < D, then Equation (4.23) is ill-posed and there are infinitely many solutions for a_j. In this case, an effective method to condition the problem so that the solution for a_j is unique is ridge regression (Draper et al., 1998), which minimizes a regularized least squares problem. With the introduction of a regularization parameter, \gamma > 0, the regularized least squares problem is posed as

a_j = \underset{a}{\arg\min} \sum_{i=1}^{K} \| a^T x_{\eta_i} - y_{\eta_i}^j \|_2^2 + \gamma \| a \|_2^2,  (B.10)

where \| \cdot \| corresponds to the l2 norm. Each basis vector a_j can then be solved as

a_j = (X_\eta X_\eta^T + \gamma I)^{-1} X_\eta y_\eta^j,  (B.11)

where I is the D \times D identity matrix. The parameter \gamma can be determined from the variance of the noise, \sigma^2, since the objective is to project the neighborhoods linearly into a d-dimensional space and discard the noise. Therefore, \gamma can be estimated by

\gamma = \sigma^2 = \frac{1}{K - d} \sum_{i=d+1}^{K} \lambda_i,  (B.12)

where \lambda_i are the non-zero eigenvalues of the matrix X_\eta X_\eta^T sorted from largest to smallest.
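A compact sketch of the two core computations in Sections B.1 and B.2 is given below. Instead of the explicit Lagrange-multiplier formulas of Equations (B.2) to (B.5), it uses the equivalent centered formulation of Saul and Roweis (2003), in which C w = 1 is solved and w is then normalized; the brute-force neighbor search, function names, and regularization constant are our own illustrative choices, not the thesis implementation.

```python
import numpy as np

def lle(X, K, d, reg=1e-3):
    """Locally linear embedding. X: (n, D) array, one observation per
    row. Returns the (n, d) embedding Y."""
    n = X.shape[0]

    # K nearest neighbors by Euclidean distance (brute force).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    nbrs = np.argsort(dist, axis=1)[:, :K]

    # Reconstruction weights: one small constrained least squares per point.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                 # neighbors centered on x_i
        C = Z @ Z.T                           # K x K local correlation matrix
        C += reg * np.trace(C) * np.eye(K)    # condition a near-singular C
        w = np.linalg.solve(C, np.ones(K))
        W[i, nbrs[i]] = w / w.sum()           # enforce the sum-to-one constraint

    # Embedding: bottom eigenvectors of M = (I - W)^T (I - W), Equation (B.7).
    I_n = np.eye(n)
    M = (I_n - W).T @ (I_n - W)
    vals, vecs = np.linalg.eigh(M)            # eigenvalues in ascending order
    return vecs[:, 1:d + 1]                   # discard the constant eigenvector
```

The normalization after solving C w = 1 plays the same role as the Lagrange multiplier \lambda_L above: both yield the weights that minimize the reconstruction error subject to the sum-to-one constraint.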
Appendix C Cross Validation

The purpose of cross-validation is to train a predictive model that has one or more unknown parameters. Cross-validation optimizes the model parameters to find the model that best fits the training data-set. Moreover, cross-validation assesses how well the trained model will perform in practice.

Consider a model, f(x, m), with one unknown model parameter, m, that is to be trained on the training data-set, X, consisting of n observations, x. The procedure to find m so that f(x, m) fits X as well as possible while also generalizing well to new observations is given below and is called K_f-fold cross-validation.

Step 1 Randomly partition X into K_f mutually exclusive data-sets:

X = (X_1, X_2, \ldots, X_{K_f}).  (C.1)

Step 2 For each k = 1, 2, \ldots, K_f, train the model on the entire training set excluding the kth fold, X_k.

Step 3 Fit the observations in X_k to the model trained in the previous step.

Step 4 Compute the prediction error sum of squares (PRESS) statistic for the kth fold:

\mathrm{PRESS}_k = \sum_{x_i \in X_k} \| x_i - \hat{x}_i \|_2^2,  (C.2)

where \hat{x}_i is the estimate of x_i from the model trained in Step 2, and the sum runs over the observations in the kth fold.

Step 5 Compute the overall cross-validation error, XV_error:

\mathrm{XV}_{\mathrm{error}} = \frac{1}{K_f} \sum_{k=1}^{K_f} \mathrm{PRESS}_k.  (C.3)

Step 6 Repeat Steps 1-5 with several m values and choose the best one, m^*, as the value that results in the lowest XV_error.

Step 7 Train f(x, m^*) on X using m^*.

In Step 6, a grid search can be used to select test m values. First, use a coarse grid of roughly 10 equidistant m values within a reasonable range. For the range of m values where XV_error is smallest, perform a fine grid search until satisfied that XV_error has been minimized to a desired tolerance. In the case where there are multiple parameters to optimize, perform a grid search that exhausts all combinations of the gridded model parameters. Once f(x, m^*) has been trained on X, it should be tested on a new data-set, also known as a test set, to further validate the model performance before use in practice.

In the cross-validation procedure presented in this section, K_f was chosen to be 10 for balance. A large K_f results in a better-trained classifier since more training data are available; however, variances may exist between different partitions of the training set that are not captured if K_f is large. On the other hand, if K_f is small, the classifier may be poorly trained. Therefore, K_f = 10 strikes a balance between these competing objectives.
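The procedure above is generic in f(x, m). To make it concrete, the sketch below instantiates Steps 1 to 7 for an invented illustrative case, choosing a polynomial order m by 10-fold cross-validation; the synthetic data and helper names are our own, not from the case study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a noisy quadratic stands in for the training data-set X.
x = np.linspace(-1, 1, 200)
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.2 * rng.standard_normal(x.size)

def xv_error(x, y, m, Kf=10):
    """Kf-fold cross-validation error for a polynomial model of order m."""
    idx = rng.permutation(x.size)                     # Step 1: random partition
    folds = np.array_split(idx, Kf)
    press = []
    for k in range(Kf):
        test = folds[k]
        train = np.setdiff1d(idx, test)
        coef = np.polyfit(x[train], y[train], m)      # Step 2: train without fold k
        y_hat = np.polyval(coef, x[test])             # Step 3: predict fold k
        press.append(np.sum((y[test] - y_hat) ** 2))  # Step 4: PRESS_k
    return np.mean(press)                             # Step 5: XV_error

# Step 6: coarse grid search over the model parameter m.
grid = range(1, 10)
errors = [xv_error(x, y, m) for m in grid]
m_star = grid[int(np.argmin(errors))]

coef_star = np.polyfit(x, y, m_star)                  # Step 7: retrain on all of X
```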
Appendix D Model Predictive Control

The objective of dynamic matrix control (DMC) MPC is to reduce the error between the measured process outputs and their SPs in a least squares sense, with the inclusion of a penalty term on the input moves to the process. Therefore, the manipulated variables are determined such that the cost function, J, is minimized:

\underset{\Delta u}{\text{minimize}} \; J(\Delta u) = \sum_{j=1}^{n_y} \sum_{k=1}^{p} \| \hat{y}_j(t+k|t) - w_j(t+k) \|_R^2 + \sum_{l=1}^{n_u} \sum_{q=1}^{m} \| \Delta u_l(t+q-1) \|_Q^2,  (D.1)

where \hat{y}_j is the predicted output for the kth prediction and jth output variable; \Delta u_l is a future control signal for the qth prediction and lth manipulated variable; n_y is the number of controlled variables; n_u is the number of manipulated variables; p is the length of the prediction horizon; m is the length of the control horizon; R \in \mathbb{R}^{n_y p \times n_y p} is the diagonal penalty-on-error weight matrix; and Q \in \mathbb{R}^{n_u m \times n_u m} is the diagonal penalty-on-move weight matrix. Equation (D.1) can also be written in matrix form:

\underset{\Delta u}{\text{minimize}} \; J(\Delta u) = \| \hat{y} - w \|_R^2 + \| \Delta u \|_Q^2.  (D.2)

Here, \hat{y} \in \mathbb{R}^{p n_y \times 1} is the vector of predicted outputs and \Delta u \in \mathbb{R}^{m n_u \times 1} is the vector of future control signals, where \hat{y} is defined as

\hat{y} = [\hat{y}_1(t+1|t), \ldots, \hat{y}_1(t+p|t), \ldots, \hat{y}_{n_y}(t+1|t), \ldots, \hat{y}_{n_y}(t+p|t)]^T,  (D.3)

and \Delta u is

\Delta u = [\Delta u_1(t), \ldots, \Delta u_1(t+m-1), \ldots, \Delta u_{n_u}(t), \ldots, \Delta u_{n_u}(t+m-1)]^T.  (D.4)

Future output predictions are a function of future control moves; in other words, \hat{y} is a function of \Delta u, and \hat{y} can be computed by the expression

\hat{y} = G \Delta u + f,  (D.5)

where f is the vector for the free response of the system, that is, the part of the system response that does not depend on future control moves, and is given by

f = [f_1(t+1|t), \ldots, f_1(t+p|t), \ldots, f_{n_y}(t+1|t), \ldots, f_{n_y}(t+p|t)]^T.  (D.6)

Here, f_1, \ldots, f_{n_y} are the free responses corresponding to the n_y process outputs, given by

f_j(t+k) = y_{m_j}(t) + \sum_{i=1}^{N_{j,1}} (g_{k+i}^{j,1} - g_i^{j,1}) \Delta u_1(t-i) + \cdots + \sum_{i=1}^{N_{j,n_u}} (g_{k+i}^{j,n_u} - g_i^{j,n_u}) \Delta u_{n_u}(t-i), \quad j = 1, \ldots, n_y,  (D.7)

where y_{m_j} corresponds to the jth measured output; N_{j,l} is the number of samples until steady state is reached in y_j following a step change in u_l, with l = 1, \ldots, n_u; k = 1, \ldots, p; and g_i^{j,l} is the ith coefficient of the response in y_j following a unit step change in u_l. Furthermore, the g_i^{j,l} coefficients are used to construct the system's dynamic matrix between y_j and u_l, G_{j,l} \in \mathbb{R}^{p \times m}, given by

G_{j,l} = \begin{bmatrix} g_1 & 0 & \cdots & 0 \\ g_2 & g_1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ g_m & g_{m-1} & \cdots & g_1 \\ \vdots & \vdots & \ddots & \vdots \\ g_p & g_{p-1} & \cdots & g_{p-m+1} \end{bmatrix};  (D.8)

furthermore, for multiple-input, multiple-output systems, the G_{j,l} are assembled into the combined matrix G containing all of the system dynamic matrices, where G is

G = \begin{bmatrix} G_{1,1} & G_{1,2} & \cdots & G_{1,n_u} \\ G_{2,1} & G_{2,2} & \cdots & G_{2,n_u} \\ \vdots & \vdots & \ddots & \vdots \\ G_{n_y,1} & G_{n_y,2} & \cdots & G_{n_y,n_u} \end{bmatrix}.  (D.9)

Since R \succ 0 and Q \succeq 0, Equation (D.2) is a convex quadratic program (Boyd and Vandenberghe, 2004), and an analytical solution for \Delta u is found by setting the derivative to zero. The solution for \Delta u computed at each time instant t is

\Delta u = (G^T R G + Q)^{-1} G^T R (w - f).  (D.10)

For each t, only (w - f) needs to be recalculated since the system is assumed to be LTI. Therefore, we define K = (G^T R G + Q)^{-1} G^T R and calculate it offline prior to implementation of the MPC. Moreover, since only the first control move is executed at each t, K can be reduced to only the rows corresponding to the first calculated control move for each u_l. Then Equation (D.10) simplifies to

\Delta u = K (w - f),  (D.11)

where each element of \Delta u contains the control move for each u_l to be executed.
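To show how Equations (D.8) to (D.11) translate into code, here is a single-input, single-output sketch. The first-order step-response coefficients, horizons, and weights are placeholder values of our choosing, not the mill models from the case study.

```python
import numpy as np

# Placeholder unit step-response coefficients g_1..g_N (first-order process).
N = 30
g = 1.0 - np.exp(-np.arange(1, N + 1) / 5.0)

p, m = 10, 3                       # prediction and control horizons
R = np.eye(p)                      # penalty on error
Q = 0.1 * np.eye(m)                # penalty on moves

# Dynamic matrix G (Equation D.8): lower-triangular Toeplitz, p x m.
G = np.zeros((p, m))
for col in range(m):
    G[col:, col] = g[:p - col]

# Offline controller gain (Equation D.10), keeping only the row that
# corresponds to the first control move, as in Equation (D.11).
K = np.linalg.solve(G.T @ R @ G + Q, G.T @ R)[0]

def control_move(w, f):
    """One DMC update: w and f are length-p setpoint and free-response
    trajectories; returns the first control move to execute."""
    return K @ (np.asarray(w) - np.asarray(f))
```

Because K is fixed offline, the online computation per scan reduces to updating the free response f and taking one inner product, which is what makes this form attractive for implementation in a DCS.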
Appendix E Model Selection

Model selection is important when modeling a system because there is a trade-off between prediction accuracy and model complexity. The greater the order of the model, the more likely it is that the model overfits the data, resulting in poor generalization to new data. To select the process model that best fits a training data-set, a method known as the extra sum of squares method can be used. Extra sum of squares is a simple technique that allows us to measure whether increasing the order of the model results in a statistically significant increase in prediction accuracy at a confidence level of \alpha.

The first step in extra sum of squares is to find the sum of squares of the model residuals, Q_j, for the jth model, where j = 1, 2, \ldots, n_m, and n_m is the number of models to be compared against each other. The n_m models are sorted in ascending order of the number of fitted parameters, p_j, in the jth model. Q_j is computed from the expression

Q_j = \sum_{i=1}^{n} (y_i - \hat{y}_{i,j})^2,  (E.1)

where n is the number of observations in the training data-set, y_i is the ith observation of the output in the data-set, and \hat{y}_{i,j} is the ith estimate of y_i from the jth model. The next step is to compute the F-statistic between each pair of consecutive models in the ordering, giving a total of n_m - 1 comparisons indexed k = 1, \ldots, n_m - 1, where the kth comparison involves models j = k and j = k + 1:

F_k = \left( \frac{Q_j - Q_{j+1}}{p_{j+1} - p_j} \right) \left( \frac{n - p_{j+1}}{Q_{j+1}} \right).  (E.2)

The next step is to find F_\alpha. To do this, find the value in the F-distribution table for df_1 = p_{j+1} - p_j and df_2 = n - p_{j+1} at a confidence level of \alpha, where df corresponds to the degrees of freedom in the F-distribution. The final step is to compare the F values to find the first case where F_k < F_\alpha. Once this condition is satisfied, the jth model is chosen as the best candidate, where j = k.

With this method, it is possible that comparing two models whose parameter counts differ by more than one would yield a higher-order and better-fitting model. However, it is unlikely in this case that the increase in model order is significantly more beneficial to the predictive accuracy. If it appears that a model that was not selected may be more suitable for modeling the process, the predictive accuracy of the unselected model should be compared to that of the selected model on a test data-set to see if the prediction accuracy substantially improves.

This section is meant to act as a guideline for model selection. There are numerous other methods available for model selection. For example, a cost function incorporating the cost of prediction errors and parameter costs could be minimized to find the best model. The final prediction-error criterion discussed by Ljung (2007b) is one example of model selection based upon the minimization of a cost function.
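The extra sum of squares test reduces to a few lines of code. The sketch below assumes SciPy for the F critical value; the residual sums of squares and parameter counts in the example call are invented placeholders, and the zero-based index it returns corresponds to the model labelled j = k above.

```python
from scipy.stats import f as f_dist

def extra_sum_of_squares(Q, p, n, conf=0.95):
    """Q: residual sums of squares of n_m candidate models, ordered so
    that the parameter counts p are ascending; n: number of training
    observations. Returns the index of the first model for which adding
    parameters gives no significant improvement."""
    for k in range(len(Q) - 1):
        df1 = p[k + 1] - p[k]
        df2 = n - p[k + 1]
        F_k = ((Q[k] - Q[k + 1]) / df1) * (df2 / Q[k + 1])  # Equation (E.2)
        if F_k < f_dist.ppf(conf, df1, df2):  # first F_k below F_alpha:
            return k                          # keep the simpler model
    return len(Q) - 1                         # otherwise keep the largest model

# Placeholder example: three candidate models with 2, 3, and 4 parameters.
best = extra_sum_of_squares(Q=[14.2, 9.1, 8.9], p=[2, 3, 4], n=100)
```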
