ALGORITHMS FOR NONLINEAR PROCESS MONITORING AND CONTROLLER PERFORMANCE RECOVERY WITH AN APPLICATION TO SEMI-AUTOGENOUS GRINDING

by

Kenneth Scott McClure

B.A.Sc., The University of British Columbia, 2011

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in The Faculty of Graduate Studies (Chemical and Biological Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

July 2013

© Kenneth Scott McClure, 2013

Abstract

Chemical and mineral processing industries commonly commission linear feedback controllers to control unit processes over a narrow and linear operating region where the economy of the process is maximized. However, most of these processes are nonlinear outside of this narrow operating region. In the event of a large unmeasured disturbance, a process can shift away from nominal and into an abnormal operating region. Owing to the nonlinearity of these processes, a linear controller tuned for the nominal operating region will perform poorly and possibly lead to an unstable closed-loop system in an abnormal operating region. Moreover, it is often difficult to determine whether a process has shifted to an abnormal operating region if none of the constraints on the measured process outputs are violated. In these events, it is the operator who must detect and recover the process, and this manual response results in a sub-optimal recovery.

This thesis develops and demonstrates a control strategy that monitors several process variables simultaneously and provides an estimate of the process shift to a nonlinear abnormal operating region, where a linear recovery controller is implemented to recover the process back to nominal. To monitor the process, principal component analysis is proposed for process shifts that can be detected by linear variable transformations. Alternatively, for nonlinear or high-dimensional processes, locally linear embedding is proposed. Once a process shift to an abnormal operating region is detected, the control strategy uses the estimate of the process shift in a recovery controller to recover the process. In the event the linear recovery controller is unable to recover the process, an expert system overrides the recovery controller to return the process to a recoverable region.

A case study on a semi-autogenous grinding mill at a processing facility in British Columbia presents the successful application of the control strategy to detect and recover the mill from overloading. Portions of this control strategy have been implemented at this facility, and they provide the operators with a real-time estimate of the magnitude of the mill overload.

Preface

This thesis was written in partial fulfillment of the degree of Master of Applied Science in the Faculty of Graduate Studies in the Department of Chemical and Biological Engineering at the University of British Columbia. The content herein describes a general control strategy to recover a chemical or mineral process from an abnormal operating region to the nominal range of operation. Portions of this work were submitted for publication in May 2013. In addition, the findings for locally linear embedding on the case study will be presented at the International Mineral Processing Congress in October 2014. A case study for this work was conducted at a mineral processing facility located in British Columbia. Portions of this work were implemented at this facility in the first half of 2013.
The case study of semi-autogenous grinding at this mineral processing facility demonstrates how this work can improve the stability and economy of comminution circuits in the mineral processing field. However, the stability and economy of comminution can also be improved indirectly by rejecting disturbances upstream at the dry-surge ore stockpile. The case study in this work is complementary to the work on dry-surge control that has been accepted for publication: McClure, K.S., Nagamune, R., Marshman, D., Chmelyk, T., and Gopaluni, R.B. Robust level control of a dry-surge ore pile. Canadian Journal of Chemical Engineering (In Press).

I, Ken McClure, was the main contributor with respect to research for the implementation of the control strategy in this thesis at the mineral processing facility, along with all other content of this work.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Notation
Acknowledgements
1 Introduction
2 Literature Review
  2.1 Process monitoring
    2.1.1 Model-based methods
    2.1.2 Data-based methods
  2.2 Controller performance recovery
  2.3 SAG milling
    2.3.1 Process overview
    2.3.2 Overloading
3 Problem Statement
  3.1 Process monitoring
  3.2 Controller performance recovery
4 General Control Strategy
  4.1 Process monitoring
    4.1.1 Principal component analysis
    4.1.2 Locally linear embedding
    4.1.3 Linear and quadratic classification
  4.2 Controller performance recovery
    4.2.1 Modeling the abnormal operating region
    4.2.2 Override MPC design
    4.2.3 Modifications to existing MPC
5 Case Study of SAG Mill Overload
  5.1 Description of the process
    5.1.1 Control objective
  5.2 Process monitoring
    5.2.1 Selection of measured and inferred process variables
    5.2.2 PCA
    5.2.3 SLLEP
    5.2.4 Comparison between PCA-LDA and SLLEP-LDA
  5.3 Modeling of the overloaded operating region
  5.4 Recovery from the SAG mill overload
6 Conclusion
  6.1 Summary of contributions
  6.2 Future research
Bibliography
Appendices
A Selection of Process Variables
  A.1 Transformation of sound level
B Locally Linear Embedding
  B.1 Minimization of reconstruction error
  B.2 Minimization of embedding cost
  B.3 Least squares regression to map new samples
C Cross Validation
D Model Predictive Control
E Model Selection

List of Tables

5.1 Input and output variables critical to the recovery control strategy
5.2 Properties of the candidate process variables
5.3 Classification results for PCA Method 1
5.4 PCA modeling results
5.5 PCA-LDA classification results for the 2D and 1D feature spaces
5.6 Summary of the variables retained in each candidate model
5.7 SLLEP classification results of four candidate models
5.8 Classification boundaries for SLLE and the classification error rate
5.9 Comparison of the classification results between PCA-LDA and SLLEP-LDA on the test set
5.10 Modeling steps and fitting results
5.11 Best fit models and their parameter variances
5.12 Key performance indices to measure the potential savings between Scenario 1 and Scenario 2
5.13 Assumptions for calculating the key performance measures

List of Figures

1.1 Example of a process operation with two states
2.1 Types of faults and their development with time
2.2 Residual generation by a parity relation for process monitoring
2.3 Residual generation by observer for process monitoring
2.4 Architecture of a neural network
2.5 Bivariate normal sampled data showing the relative contributions of the two principal components
2.6 Side view of the ellipse from PCA on 3D data to emphasize dimensionality reduction and outlier detection
2.7 SVM trained with two classes
2.8 Dimensionality reduction by LLE performed on a Swiss Roll
2.9 Schematic of a typical grate-discharge SAG mill
2.10 Mill power curve for several process conditions
2.11 Cross-sectional view of the inside of a SAG mill showing load torques and maximum potential energy of cataracting ore
2.12 Distribution of slurry within a mill
2.13 Cross-sectional view of the inside of a SAG mill showing load torques
2.14 Cross-sectional view of the inside of a SAG mill showing load torques and load distribution
4.1 Flowchart of the process monitoring and controller performance recovery strategy applied to a chemical process
4.2 Block diagram of the major steps to statistical pattern recognition for process monitoring
4.3 The three steps of the LLE algorithm
4.4 Locally linear reconstruction of xi
4.5 LLE mapping of an 'S' shaped curve to demonstrate the accuracy of LLE projection
4.6 3D manifold of process operation with two planes showing the linearized regions the nominal and recovery controller are designed to operate
4.7 Block diagram for the input-output relationship of a modelled process
4.8 MPC controller overrides the MV's of the existing controller to recover the process after a process shift
4.9 Expert rules override the MV's if the process is operating in a severely abnormal region
4.10 Existing MPC controller is modified with a constraint variable to allow the MPC to recover the process after a process shift
5.1 Overview of existing SAG mill control at the mineral processing facility in British Columbia
5.2 Candidate process variables of the training data-set
5.3 Pair-wise scatter plot of the selected process variables
5.4 Candidate process variables for a segment of the test data-set for process monitoring
5.5 Candidate process variables of the training data-set for PCA Method 1 only
5.6 Q and T² results on the test data-set for PCA Method 1
5.7 LDA classification results on the PC score space of the training data-set for a) 2D LDA, and b) 1D LDA
5.8 LDA classification results on the PC score space of the test data-set
5.9 LDA and QDA classification on the training data-set mapped into 2D space by SLLE
5.10 LDA and QDA classification on the test data-set mapped into 2D space by SLLEP
5.11 Output from process monitoring on the test set for both PCA-LDA and SLLEP-LDA
5.12 Model fitting results for the response of the process shift to changes in feed ore rate
5.13 Fitting P1D to a new training data-set in order to fit a model for u2 on the fit remainder
5.14 P1D fitting results on a test data-set to validate model
5.15 Scenario 1: control strategy has not been incorporated to mitigate the incipient disturbance
5.16 Scenario 2: process shift constraint variable is exceeded causing the MPC to correct for the incipient disturbance before severe overloading results
5.17 Scenario 3: abrupt disturbance leads to significant overloading that the expert system mitigates before returning control to the MPC
A.1 Difference between measured sound level and a 0.4 day moving average of the sound level

Notation

D  number of dimensions in a high-dimensional data-set
F  statistic for the F continuous probability distribution
Gp  process transfer function
H∞  Hardy space with minimax restrictions in the frequency domain
J  cost function
Kf  number of folds in cross-validation
Kp  process gain
K  number of nearest neighbors to a data point
Qα  upper control limit of the Q test statistic for principal component analysis
Q  test statistic for principal component analysis
R′  Euclidean distance between vectors considering class information
R  Euclidean distance between vectors before considering class information
T²α  upper control limit of the T² test statistic for principal component analysis
T²  test statistic for principal component analysis
U(s)  input signal in Laplace domain
W  weights
Y(s)  output signal in Laplace domain
Δ  difference operator
Λ  diagonal eigenvalue matrix
Φ  embedding cost
ℝ  real number
α  percent confidence level
r̄  average model residual per sample
x̄1  mean vector of nominal operating data
x̄2  mean vector of abnormal operating data
δ  binary class information between two vectors
η  nearest neighbor class information modifier
ŷ(t + k|t)  predicted value of y(t + k) with available information at instant t
∈  element of
L  Lagrange multiplier
m  unknown parameter in cross-validation
λ  eigenvalue
A  mapping matrix in locally linear embedding projection
C  correlation matrix
E  residual matrix
G  system dynamic matrix
I  identity matrix
K  controller matrix
Q  penalty on move weight matrix
R  penalty on error weight matrix
S1  covariance matrix of nominal operating data
S2  covariance matrix of abnormal operating data
Spooled  pooled covariance matrix
S  covariance matrix
T  principal component score matrix
V  eigenvector matrix
X  observation data matrix
Y  feature data matrix
Λ  eigenvalue matrix
⊗  cross product of two vectors
π1  class label for nominal data
π2  class label for abnormal data
⪰  matrix inequality
τc  torque provided by charge
τn  numerator time constant
τp  torque provided by slurry pool
τ  time constant
θ  time delay
ε  reconstruction cost
a  basis mapping vector in locally linear embedding projection
f  free response vector
t  principal component score vector
u  array of future control signals
x  observation data vector
y  feature vector
T  transpose
df  degrees of freedom
d  number of dimensions of a low-dimensional mapped data-set
f  arbitrary feature of a process
n1  number of observations of nominal operating data
n2  number of observations of abnormal operating data
nu  number of controlled process inputs
ny  number of controlled process outputs
n  number of observations in a data-set
p  number of retained principal components
s  operator variable in the frequency domain
tf  time of onset of a fault
t  time
u  process input variable
w  process input disturbance variable
xc  length of the charge torque arm
xp  length of the slurry pool torque arm
x  process state variable
y  process output variable
|S|  determinant of matrix S
||v||²Q  equivalent to vᵀQv
1D  one-dimensional
2D  two-dimensional
3D  three-dimensional
ANN  artificial neural network
APER  apparent error rate
AV  constraint variable
BP  bearing pressure
c.g.  center of gravity
CPV  cumulative percent variance
CV  control variable
DCS  digital control system
DE  discharge-end
DPCA  dynamic principal components analysis
DV  disturbance variable
EKF  extended Kalman filter
FE  feed-end
FOPTD  first-order plus time delay
FTC  fault tolerant control
GPM  gallons per minute
KF  Kalman filter
KNN  K-nearest neighbors
KPCA  kernel principal components analysis
LDA  linear discriminant analysis
LLE  locally linear embedding
LP  linear program
LTI  linear time-invariant
MP  motor power
MPC  model predictive control
MV  manipulated variable
PC  principal component
PCA  principal components analysis
PEmax  maximum potential energy
PF  particle filter
PID  proportional-integral-derivative
PRESS  prediction error sum of squares
QDA  quadratic discriminant analysis
QP  quadratic program
SAG  semi-autogenous grinding
SLLE  supervised locally linear embedding
SLLEP  supervised locally linear embedding projection
SOPTD  second-order plus time delay
SP  set-point
STPH  short tons per hour
SVD  singular value decomposition
SVM  support vector machines
UIO  unknown input observer
UKF  unscented Kalman filter
USD  United States dollars
XV  cross-validation

Acknowledgements

This thesis would not have been possible without the encouragement, support, and insight provided by my academic supervisors Dr. Bhushan Gopaluni, Dr. Ryozo Nagamune, and Dr. Sirish Shah, or by my industrial supervisors Terry Chmelyk, Manny Sidhu, and Devin Marshman at Spartan Controls. To them, I owe a huge debt of gratitude over these past two years.

I am grateful to the University of Alberta for allowing me to study at their school under the Western Dean's Agreement. Dr. Shah deserves a huge thank-you for accepting me as his student and for encouraging me to take a productive course load at the university.

I am also grateful to NSERC and Spartan Controls, whose financial support made this work possible.

I thank Dr. Nagamune and Dr. Gopaluni for managing the control group at the University of British Columbia. This control group brought me together with colleagues outside of the department, and it motivated me to design a multivariate model predictive controller that was useful in this work.
I sincerely thank my parents for their never-failing interest and support throughout all the years of my study. They are an inspiration to work hard in life. I also thank my uncle, Dan Ashman, for inspiring me to become an engineer and for exposing me to mineral processing.

I am thankful to the mineral processing facility in British Columbia for allowing me to use their SAG mill as a case study for my thesis. It is a rare opportunity to have been able to implement this work in the industry and solve a problem in great need of a solution.

I would like to make a very special acknowledgement to my partner, Vanya Jongkind, for her love, patience, and support throughout these years.

Chapter 1

Introduction

It is well-known that industrial processes are controlled over a narrow operating region to maximize profitability. Moreover, processes are usually most profitable at the intersection of process constraints (García et al., 1989). Since processes are intended to be controlled over a narrow operating region, control engineers appropriately assume this narrow operating region is 'locally linear' and describe the process dynamics as linear time-invariant (LTI) (Ljung, 2001). The assumption of an LTI process allows control engineers to design linear controllers for a narrow and desirable operating region, also known as the nominal operating region, even though the process dynamics may be nonlinear outside this region.

Controlling industrial processes with linear controllers is advantageous compared to controlling these processes with nonlinear ones. Linear controllers require relatively minor online computation, and they can be synthesized from many well-known techniques such as proportional-integral-derivative (PID), model predictive, H∞, or µ control methods (Kim et al., 1997). However, linear feedback control, by design, is inherently limited to the nominal operating region where process economics are maximized. This limitation exists because the process model is linearized around the nominal operating region, and a controller is designed based on this linearized model. Since industrial processes are typically nonlinear, a deviation in the state of the process away from the nominal operating region where the controller is tuned, also known as a process shift, results in a degradation of controller performance. If the nonlinearity of the process is significant, the controller may not be able to recover the process back to nominal. As a consequence, the feedback system may become unstable, leading to safety risks or damage to process equipment (Doyle et al., 1992).

Nimmo (1995) estimated that abnormal process situations cost U.S. industries over $20 billion each year. For example, the recent BP oil spill in the Gulf of Mexico in 2010 resulted in billions of dollars in damages to the fishing and energy industries and had enormous social and environmental impacts (Gill et al., 2011). This disaster started when the cement barrier at the bottom of the well failed. Afterwards, a series of equipment failures and poor monitoring and control practices resulted in the venting and ignition of a hydrocarbon-rich vapour that ultimately sank the oil rig and caused one of the worst oil spills in history (Bly, 2011; Liu et al., 2011).
In this example, process monitoring could have detected the failure of the cement barrier from the measured increase of the drill pressure, and an automatic recovery control strategy could have responded 40 minutes faster than the operators onboard the oil rig (Bly, 2011). As a result, the hydrocarbons could have been diverted to a safer atmosphere with less risk of ignition or catastrophic damage to the oil rig (Bly, 2011).

In the chemical and mineral processing industries, a process is usually recovered once an operator recognizes the degradation of the process performance and manually manipulates the control elements to recover the process. However, industrial processes are often very complex with many process variables; consequently, it takes too long for an operator to recognize the process has shifted and intervene. In addition, the actions of an operator are not optimal, and it often takes more time than necessary to manually recover the process to its nominal operating region (van Drunick and Penny, 2005). As an example, in the case study for this thesis, semi-autogenous grinding (SAG), operators usually over-correct a mill overload to the point where the mill becomes underloaded, leading to poor throughput and excessive liner wear. On the other hand, past attempts to design a method to automatically recover from a SAG mill overload have been ad hoc and often lead to sub-optimal results (van Drunick and Penny, 2005). There are few, if any, generalized control strategies applicable to the chemical industry that solve both the problem of identifying a process shift and recovering the process back to the nominal operating region with a linear control method.

There have been efforts in the control field to try to prevent a process from shifting away from the nominal operating region. These methods focus mostly on disturbance rejection. However, it is not always feasible to prevent a process shift for every possible scenario. As a result, another approach has been developed that attempts to make the feedback system tolerant to the consequences of a process shift. This approach is called fault-tolerant control (FTC), where reasonable system performance and stability are achieved in the event of an abnormal event, known as a fault, by designing a feedback system that is tolerant to abnormal events such as process equipment failure. These two approaches are discussed briefly in the literature review in Chapter 2.

The approach of this work is to design a general control strategy applicable to industrial processes that experience process shifts regardless of disturbance rejection efforts. Furthermore, these process shifts are caused by disturbances to the inputs of the process that are not a result of any equipment failure. Therefore, in most of these scenarios, the process recovery is controllable (Venkatasubramanian et al., 2003a), and we can design a control method to recover the process to the nominal operating state. In this work, this control method is known as 'controller performance recovery' because the existing controller for the process is modified or overridden in the event of a process shift in order to recover control performance and ultimately recover the process back to nominal.
In this thesis, a general control strategy is sought that, first, monitors a process to detect a process shift for linear and nonlinear processes, and afterwards utilizes information from the process monitoring tool to recover linear feedback control performance and return the process to nominal operation.

Figure 1.1 illustrates the two main elements of the control strategy on an example process with two states, x1 and x2. The figure depicts two different types of undesired operating regions. In this example, only the operating region with serious consequences to process stability needs a recovery control strategy. In this region, the existing linear controller, also known as the nominal controller, is unable to recover the process due to the nonlinear nature of the process. However, with an appropriate control strategy, the process shift can be detected by a process monitoring tool, and a controller specifically tuned for this operating region can optimally recover the process to the nominal operating region.

[Figure 1.1: Example of a process operation with two states, x1 and x2, where 1. the shift to an undesired operating region is detected, and 2. the process is recovered with a linear controller; the state space distinguishes the nominal state from undesired states with minor and with serious consequences]

Once a control strategy is developed, it will be tested on a SAG mill at a mineral processing facility located in central British Columbia, Canada. The SAG mill at this facility sometimes shifts to an overloaded state due to disturbances in the ore hardness, particle size, and composition. The overloaded state is difficult to detect because the process is operated as close to an overloaded state as possible in order to maximize the mill's throughput, and changes in the state of the mill are only subtle at first. Once the mill is severely overloaded, the grinding efficiency of the mill decreases to
In addition, a technique that allows locally linear embedding to be commissioned in an industrial controller for use as a nonlinear process monitoring technique is also discussed in this work. Finally, the control strategy developed in this thesis was applied to the SAG mill at the mineral processing facility in British Columbia, and it addresses the problem of detecting and correcting a mill overload before controller performance degrades excessively. As a result, the mineral processing facility in British Columbia can operate their process at a higher throughput with less risk that manual intervention will be needed in the event the process is significantly disturbed. The remainder of this thesis is organized as follows. In Chapter 2, a literature review covers the most relevant process monitoring and control techniques to this work. Chapter 3 develops the problem statement and scope for this work. In Chapter 4, a  5  general control strategy is developed that addresses the issues identified in the problem statement. An industrial implementation of the control strategy on the SAG mill at the mineral processing facility in British Columbia is presented in Chapter 5. Finally, in Chapter 6, the conclusions and future work are discussed.  6  Chapter 2 Literature Review The literature review focuses on two topics: process monitoring and controller performance recovery. Process monitoring tools are able to estimate whether the state of a process is nominal or abnormal. Controller performance recovery tools are control methods that recover a process from an abnormal operating situation to the nominal operating region where process economy is maximized and the existing nominal controller has the highest performance. Both topics are broad with numerous techniques available. Emphasis is placed on contributions relevant to the work in this thesis.  2.1  Process monitoring  Process monitoring is an important problem in the chemical and mineral processing industry because early detection of abnormal, also referred to as undesired, process conditions can help prevent escalation of abnormal events and consequent production loss. Traditionally, the industry has relied on human operators to monitor processes. However, process monitoring is a non-trivial task because industrial processes are usually complex multivariate systems. For example, a large process plant may have more than 1500 variables that are measured every few seconds (Bailey, 1984). In these systems, measurements may remain within their acceptable alarm limits when the process is in an abnormal operating region, thus making it difficult for operators to detect by monitoring the data directly (Serpas et al., 2012). However, a process  7  monitoring system would monitor correlations between multiple variables to determine when the process deviates to an undesired state. The benefits of an e↵ective process monitoring method include improved process economics, safety, and environmental impact. Abnormal events can manifest in many ways such as process unit failures, instrumentation failures or drifting, and process degradation (Venkatasubramanian et al., 2003a). Furthermore, unmeasured variables, such as feed characteristics, may also introduce large disturbances into a process that shift the process to an undesired operating region. Regardless of the root cause, any departure from an acceptable range of an observed variable or calculated parameter is known as a fault (Himmelblau, 1978). 
Isermann (2005) showed that the time dependency of these faults can be divided into three categories, as shown in Figure 2.1. The variable f represents the change of some feature, or operating state, of a process due to a fault, and tf signifies the time of onset of the fault. Abrupt faults cause an immediate and drastic change to some feature of the process; for example, a process unit such as a pump has failed. Incipient faults progressively alter the feature of a process; for example, a measurement slowly drifts due to poor maintenance of instrument calibrations. Lastly, intermittent faults are transient faults that exist for a short time, only to eventually reappear again.

[Figure 2.1: Types of faults (abrupt, incipient, and intermittent) and their development with time]

Fault detection methods can be divided into two categories: model-based methods and data-based methods (Serpas et al., 2012), and both categories can be further subdivided into either quantitative or qualitative methods. Ohran et al. (2009) discuss the main differences between data-based and model-based methods as follows: model-based methods generate data that are compared to measured outputs to create residuals relating to specific process conditions, while data-based methods estimate the location and direction of the system trajectory in state space.

Desirable characteristics of a fault detection system are discussed by Venkatasubramanian et al. (2003a), and the points relevant to this work are listed below:

1. Quick detection: there is a trade-off between quick fault detection and sensitivity to high-frequency influences such as noise.
2. Robustness: the performance should degrade gracefully rather than abruptly, and the method should be robust to noise and uncertainties.
3. Novelty identifiability: the ability to successfully distinguish the known fault from the occurrence of unknown, or novel, faults that may arise.
4. Estimation of classification errors: provide a confidence level for the classification of a fault condition.
5. Adaptability: the adaptability of the method to changing process conditions such as disturbances, environmental conditions, production targets, etc.
6. Modeling requirements: modeling effort should be minimal.
7. Computational requirements: the method must operate at a satisfactory scan rate in an industrial controller.

Although a process monitoring method may satisfy these desirable characteristics, it will not be utilized unless the operators of the process accept the method. To increase operator acceptance, the method should be intuitive and simple to operate and maintain. Furthermore, the operators need confidence that the output of the process monitoring tool is accurate and makes sense. Therefore, for new measurements it is preferable to have a two-, or at most, three-dimensional plot depicting the output of the process monitoring tool and the classification boundary between the abnormal and nominal operating conditions.

2.1.1 Model-based methods

Model-based process monitoring tools require a model of the process, often in the form of quantitative methods such as parity relations, diagnostic observers, or optimal state estimators. The output from these models is the expected behaviour of the process, and it is compared to the actual behaviour to generate inconsistencies known as residuals (Venkatasubramanian et al., 2003a). Residual signals are useful for fault detection because they may reflect the occurrence and magnitude of process abnormalities.
The advantage of model-based methods over data-based methods, especially for first-principles models, is that the models reflect physical laws of the process rather than coincidences which may only occur under certain conditions. Furthermore, model-based methods can be reused on similar processes. Additionally, the assumptions and limitations of the models are typically clearer than for data-based methods. However, model-based methods are usually developed from some fundamental understanding of the physics of the process (Venkatasubramanian et al., 2003b), and they require very accurate process models to be effective for process monitoring (Ohran et al., 2009). Many chemical processes are so complex and poorly understood that it may not be possible to develop a sufficiently accurate model for model-based process monitoring (Yoon and MacGregor, 2001).

There are many model-based techniques that can be used for process monitoring. Each technique offers advantages and disadvantages in terms of computational requirements, required model accuracy, and assumed a priori knowledge of the physics of a process. Moreover, some techniques are better suited for nonlinear processes or stochastic processes. Model-based techniques can be separated into qualitative and quantitative methods. Qualitative methods, such as cause-effect models, are discussed by Venkatasubramanian et al. (2003b). Their main advantage is in diagnosing the root cause of process abnormalities, and the methods are less sensitive to process drift. However, these methods may lead to spurious solutions, and a measurement of the magnitude of the process shift is not provided as an output from these methods. Three model-based methods that can be used for process monitoring are discussed in this section: parity relations, diagnostic observers, and stochastic filters.

Parity relations

A common quantitative model-based process monitoring technique is to develop parity relations. Parity relations are most commonly used for analytical redundancy that may indicate sensor or actuator failures. A parity relation is obtained by rearranging or transforming the input-output process models. The general structure of the parity approach is illustrated in Figure 2.2, and a sketch of the residual generation step follows below. This approach is useful when measurements are not heavily corrupted with noise and sensors are calibrated frequently to minimize drift.

[Figure 2.2: Residual generation by a parity relation for process monitoring: process inputs u(t) and measured outputs y(t) feed a residual generator (the parity relation), whose residuals r(t) are passed to a residual evaluator for classification]
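To make the parity idea concrete, the following Python sketch generates residuals by comparing measured outputs against the predictions of a nominal input-output model, then flags samples whose residuals exceed an alarm limit. The first-order model, its parameters, and the threshold are illustrative assumptions for this sketch, not values from the thesis.

```python
import numpy as np

# Minimal parity-style residual generation, assuming a discrete
# first-order nominal model y(k) = a*y(k-1) + b*u(k-1).
# a, b, and the threshold are illustrative values only.
a, b = 0.9, 0.5
threshold = 3.0  # residual alarm limit, e.g. several sigma of nominal noise

def residuals(u, y):
    """Residual r(k): measured output minus model-predicted output."""
    y_pred = a * y[:-1] + b * u[:-1]
    return y[1:] - y_pred

# Example: simulate nominal operation, then add an abrupt output fault.
rng = np.random.default_rng(0)
u = np.ones(200)
y = np.zeros(200)
for k in range(1, 200):
    y[k] = a * y[k - 1] + b * u[k - 1] + 0.1 * rng.standard_normal()
y[100:] += 5.0  # abrupt fault at sample 100

r = residuals(u, y)
fault_flag = np.abs(r) > threshold
print("first flagged sample:", np.argmax(fault_flag) + 1)
```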
Diagnostic observers

Chou (2000) discusses the application of diagnostic observers, also known as unknown input observers (UIO), as a model-based process monitoring technique. Often, a dynamic system can be modelled as having unknown inputs representing disturbances. In a disturbance-free case, the UIO is designed so that its output tracks the output of the process regardless of the unknown inputs. When the process is disturbed, the residuals carry a 'signature' of the resultant process shift. The general structure of the UIO approach is shown in Figure 2.3. The figure shows that the UIO approach differs from the parity approach in that the observer provides an estimate of the state of the process, and it is designed to track the real state asymptotically under nominal operating conditions.

[Figure 2.3: Residual generation by observer for process monitoring: inputs u(t) and measured outputs y(t) feed an unknown input observer, whose state estimates x(t) drive a residual generator; the residuals r(t) are passed to a residual evaluator for classification]

Stochastic filters

An extension of the observer approach is the optimal stochastic filter known as the Kalman filter (KF), originally proposed by Kalman (1960); applications of the KF to process monitoring are discussed by Basseville (1988). The KF is used to estimate the state of the process from a series of measurements corrupted by white Gaussian noise (Frank et al., 2000). A KF uses a weighted average of the model prediction of a system and a new measurement to produce a new state estimate with minimum mean square error. A KF operates recursively and requires only the previous most trusted estimate; therefore, a KF can be solved in real-time.

The KF has been adapted to nonlinear processes with white Gaussian noise. The extended Kalman filter (EKF) (Maybeck, 1979) and the unscented Kalman filter (UKF) (Wan and Van Der Merwe, 2000) are two such approaches. These methods are more computationally complex than the KF; however, these techniques can be adapted for online process monitoring. For example, Quine (2006) discusses a derivative-free implementation of the EKF that avoids the need to calculate the Jacobian matrix. Particle filters (PFs) can be employed for processes with nonlinear dynamics where the noise is non-Gaussian (Manohar and Roy, 2006). Monte Carlo simulations are performed to provide an asymptotically consistent distribution of the state of the process; however, PFs are computationally expensive and require an exact process model (Saha and Roy, 2011).
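Returning to the linear KF, a minimal Python sketch of how the filter supports monitoring is given below: the innovation (measurement residual) and its predicted variance are computed at each step, and a normalized innovation persistently outside roughly ±3 suggests a process shift. The scalar model and noise variances are illustrative assumptions, not values from the thesis.

```python
import numpy as np

# Scalar linear-Gaussian model: x(k+1) = A x(k) + w,  y(k) = C x(k) + v.
# A, C, Q, R are illustrative values only.
A, C = 0.95, 1.0
Q, R = 0.01, 0.04  # process and measurement noise variances

def kalman_innovations(y, x0=0.0, p0=1.0):
    """Run a scalar Kalman filter; return innovations and their variances."""
    x, p = x0, p0
    innovations, variances = [], []
    for yk in y:
        # predict
        x, p = A * x, A * p * A + Q
        # innovation e(k) and its variance s(k)
        e = yk - C * x
        s = C * p * C + R
        innovations.append(e)
        variances.append(s)
        # measurement update
        k = p * C / s
        x, p = x + k * e, (1 - k * C) * p
    return np.array(innovations), np.array(variances)

# Under nominal operation e(k)/sqrt(s(k)) is approximately N(0, 1).
rng = np.random.default_rng(1)
y_sim = 0.5 * rng.standard_normal(100)
e, s = kalman_innovations(y_sim)
print("max normalized innovation:", np.max(np.abs(e / np.sqrt(s))))
```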
For example, expert systems match existing symptomatic information for a fault, such as alarm information or operator supplied information, with a rule-set involving logical condition statements (Ramesh et al., 1988). Venkatasubramanian et al. (2003a) suggest that the best fault detection method is likely a hybrid of process monitoring methods in order to both detect and diagnose faults. In this section, we discuss four di↵erent data-based tools that may be used for process monitoring: artificial neural networks; principal component analysis; support vector machines; and locally linear embedding. Artificial neural networks Artificial neural networks (ANN) are proposed by Venkatasubramanian and Chan (1989) and Patton et al. (1994) as an e↵ective non-statistical quantitative fault detection and diagnosing technique for nonlinear and dynamic systems in the chemical  14  industry. ANN are capable of mapping multi-input and multi-output systems through a network of highly interconnected layers of neuron-like processing elements as shown in Figure 2.4. There are three layers to an ANN: the input layer where signals enter the network, a hidden layer where input signals are routed to the output layer, and the output layer which contains the response to a given set of inputs.  Input Layer  Hidden Layer  Output Layer  Figure 2.4: Architecture of a neural network ANN are trained with historical process data to modify the weights of all interconnections in the network and create an input-output relation useful for detecting faults. However, Venkatasubramanian and Chan (1989) mention ANNs are limited because of the comprehensive and excessive amounts of data or very detailed simulation models required for training. In addition, Zhou et al. (2003) found ANN needed to be constantly re-tuned due to the time-varying behaviour of some processes. Consequently, industrial applications of ANN for fault detection are rare (Koivo, 1994). As an alternative, multivariate statistical methods can be used for fault detection in place of ANN and do not require as extensive of a historian database. Principal component analysis Multivariate statistical techniques have proven in the industry to be e↵ective fault detection techniques. The major benefits of these methods is that there are no requirement for a fundamental or causal model of the system; additionally, they handle 15  large problems, missing data, and noise efficiently (Yoon and MacGregor, 2001). One technique that has seen success in the chemical and mineral processing industries is principal component analysis (PCA). The main concept behind PCA is to transform a number of related process variables to a smaller set of uncorrelated variables, called principal components (PCs), containing the most relevant information (Cheng et al., 2008). Mathematically, PCA utilizes singular value decomposition (SVD) on the covariance matrix of a data-set of the relevant process variables to produce its eigenvec´ tors and eigenvalues (Garc´ıa-Alvarez et al., 2012; Johnson and Wichern, 2007). Only the eigenvectors that explain most of the variability in the data are selected, resulting in the chosen PCs. Figure 2.5 shows a bivariate normal set of data for two arbitrary process variables y 1 and y 2 with the magnitude and direction of the two PCs for the data. The figure shows that PC1 explains significantly more variability than PC2 ; as a result, PC2 may be discarded to reduce the dimensionality of the fault detection model.  
From the information-rich PCs, it is possible to perform fault detection through statistical analysis of the PC scores of new sampled data to discover outliers. Two commonly used methods to detect data outliers are the Q test, also known as the squared prediction error, and the Hotelling T² test (García-Álvarez et al., 2012). These methods are two different measures of distance that reveal whether any PC scores are located significantly far from the center of the ellipse of the PC model. Figure 2.6 portrays data dimensionality reduction and outlier detection for three-dimensional data from arbitrary process variables y1, y2, and y3. In this figure, the black line through the data points depicts a side view of a 2D ellipse discovered by PCA. Data points along the ellipse axis, but outside of the ellipse itself, are detected by the T² statistic and represent abnormal observations whose variances can be explained by the PC model. Data points that are far away from the axis of the ellipse are detected by the Q statistic and represent the variances that cannot be explained by the PC model (Ko and Shang, 2011).

[Figure 2.6: Side view of the ellipse from PCA on 3D data (y1, y2, y3) to emphasize dimensionality reduction and outlier detection: abnormal points along the ellipse axis show up in the T² test, while points for which the correlation structure does not hold show up in the Q test]
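Continuing the illustrative PCA sketch above, the two monitoring statistics can be computed for a new autoscaled sample as follows. The definitions follow the standard textbook forms, with P the retained loading matrix and L the eigenvalues of the training covariance matrix; the upper control limits Qα and T²α would be computed separately from the training data.

```python
import numpy as np

def spe_q(x_scaled, P):
    """Q (squared prediction error): part of x not explained by the PC model."""
    x_hat = P @ (P.T @ x_scaled)           # projection onto the PC subspace
    return float((x_scaled - x_hat) @ (x_scaled - x_hat))

def hotelling_t2(x_scaled, P, L):
    """T^2: Mahalanobis distance of the scores inside the PC subspace."""
    t = P.T @ x_scaled                     # scores of the new sample
    return float(t @ (t / L[: P.shape[1]]))

# A sample is flagged as abnormal when Q or T^2 exceeds its upper
# control limit (Q_alpha or T^2_alpha) estimated from nominal data.
```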
PCA in its basic form has limited application for fault detection in the chemical industry because many processes are nonlinear and/or dynamic. Hence, PCA has been adapted in many ways to address these shortcomings. To address serial correlations between variables due to process dynamics, Ku et al. (1995) proposed a technique called dynamic PCA (DPCA). DPCA accounts for serial correlations by augmenting observation vectors in the PCA method with observations at previous sampling instances. However, a study by Russell et al. (2000) found that for many common industrial faults, DPCA gave similar performance to PCA in terms of detection delay and missed faults.

To extend PCA to nonlinear processes, Schölkopf et al. (1998) proposed a method called kernel PCA. Kernel PCA received attention as a nonlinear fault detection technique because it did not involve solving a nonlinear optimization problem, as had been required of predecessors of nonlinear PCA based on neural networks (Choi et al., 2005; Kramer, 1991). Kernel PCA uses integral operators and nonlinear kernel functions to map an input space to a feature space via a nonlinear mapping, and then it computes the PCs in that space (Lee et al., 2004). Because kernel PCA can use a variety of nonlinear kernel functions, it can handle a wide range of process nonlinearities. However, Choi et al. (2005) note that the computational requirements of kernel PCA may be unreasonable for large systems with many samples, and the feature space is difficult to interpret compared to other methods such as linear PCA.

Support vector machines

An alternative to kernel PCA is support vector machines (SVMs). SVMs were originally proposed by Boser et al. (1992) as a machine learning technique to classify data having two different pre-defined categories, and the technique has been evolving since. SVMs have been shown to be suitable for fault detection in order to classify data as either faulty or non-faulty. Figure 2.7 shows the principle of SVMs on a two-dimensional plot of arbitrary process variables y1 and y2. The technique discovers an optimal linear classifier in the form of a hyperplane which maximizes the margin between data points on either side of the hyperplane (Tong and Koller, 2002).

[Figure 2.7: SVM trained with two classes of samples (normal and fault) in the (y1, y2) plane, where data samples on the margin of the separating hyperplane are called support vectors]

If a nonlinear classification boundary is required, a kernel transformation can be used to map the data to a higher-dimensional feature space where a separating hyperplane in this space results in a nonlinear boundary when re-mapped to a lower-dimensional space. SVMs can also be adapted with unsupervised learning so that only nominal operating data are required in the learning algorithm, rather than data from both nominal and abnormal classes (Mahadevan and Shah, 2009). The disadvantage to SVMs, however, is that the kernel mapping technique for nonlinear boundaries is too computationally expensive to be used online in an industrial controller.

Locally linear embedding

One of the most promising feature extraction techniques with minimal computational requirements for nonlinear processes is locally linear embedding (LLE). LLE was first proposed by Saul and Roweis (2000) as an image analysis technique and has since found applications as a fault detection technique, such as for the machinery fault detection simulated case study performed by Li and Zhang (2011). However, no applications or case studies of LLE for the chemical or mineral processing industries have been documented in the literature, nor has this technique been applied in an industrial environment.

LLE is a simple and elegant technique to reduce data dimensionality and extract important features within data that are useful for detecting abnormal operation. For example, Figure 2.8 demonstrates LLE reconstructing a three-dimensional (3D) 'Swiss Roll' shaped manifold into two-dimensional (2D) space with the shape's key features intact. To accomplish this reconstruction, LLE assumes that each data point and its neighbors lie on a locally linear portion of its reduced-dimension manifold (Li and Zhang, 2011). Accordingly, LLE often utilizes the K-nearest neighbors (KNN) method to determine the neighbors of each data point (Johnson and Wichern, 2007). The technique also requires the computation of two quadratic minimization problems to determine the optimal weights and coordinate vector in the reduced dimension. If the problem is well posed, the coordinate reconstruction can be minimized by solving a sparse n × n eigenvector problem, where n is the number of observations in the data matrix (Roweis and Saul, 2000).

[Figure 2.8: Dimensionality reduction by LLE performed on A, a 3D manifold of a Swiss Roll, where B is the sampling of the 3D manifold, and C is the reconstruction by LLE of the Swiss Roll in 2D space (Roweis and Saul, 2000)]
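As a hedged illustration of the algorithm, the following sketch uses scikit-learn's LocallyLinearEmbedding to repeat the Swiss Roll experiment of Figure 2.8; the neighborhood size K and the sample count are arbitrary choices for this demonstration, not values from the thesis.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# Sample the 3D Swiss Roll manifold, as in panel B of Figure 2.8.
X, _ = make_swiss_roll(n_samples=1000, noise=0.05, random_state=0)

# LLE with K nearest neighbors per point and a 2D target dimension d.
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
Y = lle.fit_transform(X)  # n x 2 embedded coordinates

# reconstruction_error_ is the residual of the embedding cost minimization.
print(Y.shape, lle.reconstruction_error_)
```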
Li and Zhang (2011) observed that LLE neglects class information which would allow the technique to be trained with supervised learning to enhance fault recognition accuracy. Likewise, Li and Zhang (2011) found that LLE does not provide an explicit mapping from high to low-dimensional space. Consequently, new data samples cannot be mapped to the reduced-dimension space, resulting in the inability to use LLE for fault detection in an online controller. However, Li and Zhang (2011) proposed two techniques to overcome these drawbacks. The first is to artificially add a distance between data samples that are from different classes. The second is to find a locally linear projection that best approximates the high to low-dimensional mapping performed by LLE. Subsequently, a simple classification technique, such as linear discriminant analysis (LDA), is used on the low-dimensional feature space discovered by LLE in order to classify fault and non-fault conditions. The resulting method is named by the authors supervised locally linear embedding projection (SLLEP). The results from Li and Zhang (2011) show that SLLEP offers significant improvements in accuracy over PCA, and that the method, once trained, is relatively simple to implement and is computationally inexpensive in a digital control system (DCS).

Since LLE is only a feature extraction technique, a simple classifier is used on the feature space discovered by LLE. LDA, also known as Fisher's discriminant analysis and originally proposed by Fisher (1936), may be used as a linear classifier. LDA is a computationally simple technique to train. In addition, LDA provides an intuitive classification boundary between nominal and abnormal operating data. In this way, LDA is more advantageous than SVM as a classifier. Furthermore, LDA can be extended to a multiclass linear classifier (Rao, 1948). This is especially useful if separate control strategies are needed for multiple abnormal process operating regions. Since LLE preserves the nonlinear features of process data, LDA may need to be extended to its quadratic version, known as quadratic discriminant analysis (QDA), to capture this nonlinearity. Both techniques assume a multivariate normal distribution of the data; however, LDA assumes the covariances of the abnormal and nominal operating data are the same. QDA is derived from one fundamental difference: the covariances of the abnormal and nominal operating data are not assumed to be equal (McLachlan, 2004). Li and Zhang (2011) state that the mapping from SLLEP is more separable than that from LLE and can be sufficiently classified with either LDA or QDA.
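A minimal sketch of a two-class LDA classifier of the kind described above is given below, using the pooled covariance estimate (Spooled in the notation of this thesis) and assuming equal prior probabilities; the variable and function names are illustrative.

```python
import numpy as np

def fit_lda(X1, X2):
    """Two-class LDA: X1 holds nominal features, X2 abnormal features."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    # Pooled covariance, weighted by each class's degrees of freedom.
    S_pooled = ((n1 - 1) * np.cov(X1, rowvar=False) +
                (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    w = np.linalg.solve(S_pooled, m1 - m2)  # discriminant direction
    c = 0.5 * w @ (m1 + m2)                 # boundary midpoint (equal priors)
    return w, c

def classify(x, w, c):
    """Allocate x to class pi_1 (nominal) when the score exceeds c."""
    return "nominal" if w @ x > c else "abnormal"
```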
Fault-tolerant control

Since abnormal situations may still arise despite disturbance rejection efforts, there has been considerable interest in the literature in fault-tolerant control (FTC). FTC is a control strategy that is tolerant to component malfunctions in order to maintain desirable closed-loop performance and stability (Patton, 1997; Zhang and Jiang, 2008). In the event an abnormal situation is detected, FTC attempts to reconfigure the control system by selecting one of several predetermined control configurations whose stability regions have also been predetermined. The selected configuration is the one for which the state of the closed-loop system resides in the stability region and which has the minimal cost function (Mhaskar et al., 2005).

The limitation of FTC is that it is not a control strategy designed to recover a process in the event of a process shift. Rather, FTC is designed to avoid disaster in the event of a malfunction of process equipment, sensors and actuators, controller, or control loop. A process shift, however, is a result of acceptable system behaviour that causes the process to deviate from the narrow operating range a linear controller is designed for. In this event, a control method is needed that optimizes only the recovery of the process to its nominal operating range. Nevertheless, FTC shares a similar overall strategy to the objectives of this thesis, and FTC can offer valuable insight into process monitoring and controller performance recovery. Other techniques reviewed with potential application to this work are nonlinear, robust, gain scheduling, and override control.

Nonlinear control

One strategy that addresses the problem of poor control when a process has shifted to a new operating region is nonlinear control. Nonlinear controllers are built from a nonlinear model of the process. As a result, these controllers can have a much wider range of operation than a linear controller. Existing linear control methods are utilized to design a nonlinear controller; however, feedback systems with nonlinear control are usually not analyzable by standard LTI system theory (Nijmeijer and Van der Schaft, 1990). Moreover, Goodwin (2002) shows that it is difficult to guarantee global stability for a nonlinear system, and nonlinear feedback design techniques are usually associated with finding a system output that has stable zero dynamics at the desired operating point. Furthermore, Primbs and Nevistic (1996) show that nonlinear control is not always optimal, since nonlinear control problems are usually non-convex and there is often a trade-off between optimality and global stability.

Although a nonlinear controller can offer greater performance and range of operability than a linear controller, the mathematical analysis involved in nonlinear feedback control is rigorous (Mariño, 1995). Consequently, personnel in the chemical industry are not well equipped with the knowledge or tools needed to maintain nonlinear controllers. Moreover, Ljung (2001) explains that linear feedback control is forgiving and achieves good control based on an approximate model. Therefore, nonlinear controllers are not essential in the chemical and mineral processing industries to maintain reasonable performance. Furthermore, linear control methods can be adapted to recover a process from a nonlinear region.

Robust control

Robust control is a technique that can be formulated as a linear control method and used in place of the nominal controller in a process.
A robust controller can be designed to achieve robust performance and/or stability for a given set of modelled uncertainty (Skogestad and Postlethwaite, 2005). Consequently, disturbances pertaining to certain unknown parameters, such as ore hardness, can be modelled, and a robust controller, such as a µ-controller, can be designed to guarantee stability for a given range of modelled uncertainty (Packard and Doyle, 1993). The challenge with robust control is the performance–stability trade-off of the feedback system. To overcome this challenge, multiple robust controllers may be designed for specific operating intervals. However, there is generally not enough expertise in the chemical industry to maintain robust controllers.

Gain scheduling control

Gain scheduling control is a promising linear technique that is a form of adaptive control. Nonlinear plant dynamics are modelled as a linear parameter-varying system, and a Jacobian linearization about a family of operating points is taken in a method called linearization scheduling (Rugh and Shamma, 2000). A family of linear controllers is developed for the linear parameter-dependent plant, and the controller gains are scheduled according to the value of the scheduling variable. Gain scheduling techniques are successfully utilized by industry in many applications, such as wind turbine control (Bianchi et al., 2006). However, the main disadvantage of gain scheduling is that stability can only be guaranteed under the assumption of a slow variation in the scheduling signals (Leith and Leithead, 2000). Additionally, different gain scheduling techniques need to be developed depending on whether the nominal controller to be scheduled is PID (Rugh and Shamma, 2000), MPC (Chisci et al., 2003), or another type.

Override control

Nonlinear, robust, and gain scheduling control strategies focus on replacing the nominal controller in the process with a new controller that is tuned to control the process at both the nominal and abnormal operating points. An entirely different approach is to keep the nominal controller that is tuned for nominal operation, and to implement a secondary controller that is tuned specifically for the abnormal operating region requiring recovery control. The secondary controller then overrides the nominal controller's actions once the process shifts to the abnormal operating region. This controller is designed to optimally recover the process back to nominal operation.

The benefit of using an override controller is that the nominal controller in the process is kept, and the secondary controller can be designed from linear control methods that are easy to maintain. The secondary controller would be some form of optimal controller that returns the process back to nominal operation, such as MPC (Coetzee, 2009). The disadvantage of this approach is that transient signals caused by switching between controllers may lead to instability. Hespanha and Morse (2002) propose a method to avoid instability from controller switching by properly selecting controller realizations such that the closed-loop system is bounded for every possible switching signal. In addition, some modern DCSs eliminate feedback on inactive loops to prevent the integral windup that would otherwise cause unstable oscillations upon switching.

2.3 SAG milling

The case study selected for this research is a SAG mill at a mineral processing facility located in British Columbia. Grinding is one of the most fundamental unit operations in a mineral processing facility.
Optimal operation of grinding mills is an important research problem because they are the most energy-intensive processes in a mineral processing facility, constituting nearly half of all energy consumed in a concentrator plant (Yang et al., 2010). Furthermore, the product particle size distribution from grinding mills has a large effect on the recovery of valuable minerals in downstream processing (Chen et al., 2008). Many mineral processing facilities, such as the mineral processing facility in British Columbia at the start of this work, have only one SAG mill due to their large size and operating cost. An industrial survey conducted by Wei and Craig (2009) revealed that SAG mills are used in approximately 38% of all concentrator plants, and approximately 35% of all concentrator plants have one milling circuit. Therefore, SAG mills often bottleneck the total ore throughput of these facilities, and their stable operation is critical to plant economics. Moreover, if there is only one SAG mill, the entire plant may be shut down in the event the mill is shut down.

Improving the control of a SAG mill can lead to a substantial increase in the stability and performance of the mill and downstream processing. However, the input conditions and internal state of a SAG mill are highly uncertain, and disturbances can easily lead to abnormal and unstable operation, even when the most advanced control methods available to the industry are utilized. For example, the mineral processing facility in British Columbia currently utilizes MPC to control its SAG mill. Even so, the MPC at this facility is often set to manual mode so operators can manage the abnormal operating conditions of the SAG mill resulting from input disturbances and variations in the gain of the system.

Previous work to address the stability and abnormal conditions of the SAG mill includes robust control (Galán et al., 2002), adaptive control (Najim et al., 1995), and disturbance-observer MPC (Yang et al., 2010). However, the drawbacks are that robust control sacrifices performance for stability (Skogestad and Postlethwaite, 2005), and adaptive control is sluggish when dealing with strong disturbances (Chen et al., 2009). Disturbance-observer MPC is a promising method to handle disturbances directly; however, if an unobserved or poorly rejected disturbance perturbs the system to an abnormal operating state, then the controller will no longer effectively manage the process.

2.3.1 Process overview

In a mineral processing facility, ore size must be reduced in order to liberate the valuable minerals from the waste rock for downstream processing. Accordingly, a SAG mill is often used in these facilities as a means to grind the ore. SAG mills are unique among grinding processes because they utilize the ore itself to reduce material size, and they can typically grind ore from 200 mm down to 0.1 mm in one unit (Wills and Napier-Munn, 2006). Semi-autogenous grinding implies that a steel ball charge, usually between 4% and 15% of the total mill volume, is added to the ore charge to enhance the grinding process. In these mills, the total charge volume usually occupies between 30% and 35% of the mill interior (Gupta and Yan, 2006). The main components of a typical grate-discharge, trunnion-supported SAG mill, such as the one operated by the mineral processing facility in British Columbia, are described below and shown in Figure 2.9.
1. Feed trunnion: provides a support for the mill and an entrance for ore and water to enter the mill.

2. Mill shell: main chamber where ore is lifted by the lifting bars and tumbles down to grind the ore as the mill rotates.

3. Grate: a screen allowing the finished product in the form of slurry to pass into the discharge chamber.

4. Pulp lifter: lifts the ore slurry that passes through the grate to the discharge trunnion.

5. Discharge trunnion: provides a support for the mill and an exit for ore slurry.

Figure 2.9: Schematic of a typical grate-discharge SAG mill (Latchireddi and Morrell, 2003a).

The primary objective of SAG mill control is to stabilize the circuit, and the secondary objective is to maximize the economic performance of the mill (Coetzee, 2009). To maximize the economy of the mill, sub-objectives often include (Craig and MacLeod, 1995):

1. maximize throughput;
2. minimize power consumption per tonne of ore processed;
3. minimize steel ball and liner consumption per tonne of ore processed;
4. minimize variations in output particle size.

2.3.2 Overloading

SAG mills are often operated as close to their upper throughput constraint as possible in order to maximize the economic performance of the mill. As a result of high feed rates, the operation of the mill can easily become unstable from overloading. Preventing overloading is non-trivial because the properties of the ore that affect grindability, such as ore size, hardness, and composition, are highly variable, and ore hardness is not feasible to measure online. Likewise, the internal state of the mill cannot be directly measured, and estimates from mill power and bearing pressures are seldom accurate (Wei and Craig, 2009).

The principal variables affecting the internal state of the mill are changes in feed rate and circulating load, size distribution, ore properties, steel ball charge and liner wear, and the rate of feed water addition (Wills and Napier-Munn, 2006). Moreover, slurry rheology is highly variable and can have significant effects on load distribution, breakage behaviour, and discharge efficiency (Moys, 1989; Shi and Napier-Munn, 2002). Figure 2.10 shows how a mill can be overloaded under several cases of these varying process conditions. The figure shows the mill is always overloaded for a specific range of charge volume; however, the maximum power draw, representative of mill overloading, varies between operating conditions A through D.

Unfortunately, it is currently impossible to accurately measure charge volume or slurry pooling during operation. Consequently, it is difficult for operators to detect mill overloading, and by the time it has been detected, usually from increasing bearing pressures and constant or decreasing mill power, the mill has typically been overloaded for a long period. Once overloading is detected, operators must make sudden and large process changes to decrease the feed ore rate and let the mill slowly grind down the charge.

Figure 2.10: Mill power curve for several process conditions A, B, C, and D showing that mill power is a poor indicator of overloading.

Overloading occurs as a result of one, or a combination, of three cases: volumetric overloading, slurry pooling, or the freewheel effect. Additionally, slurry pooling and the freewheel effect eventually lead to volumetric overloading.
These three modes of overloading are discussed in detail in the remainder of this section. To the best of our knowledge, this thesis is the first publication that discusses all three of the complex phenomena within a SAG mill that directly lead to mill overloading.

Case 1: Volumetric overload

Volumetric overloading occurs when the ore feed rate, particle size distribution, or ore hardness is too high, leading to an accumulation of charge in the mill. Additionally, any disturbance that causes the grinding efficiency in the mill to decrease may also lead to volumetric overloading. As charge accumulates in the mill, the volume occupied by the charge increases such that the total distance ore can fall under gravity, known as cataracting, decreases, causing the grinding efficiency of the mill to degrade (Morrell, 1993). Figure 2.11 illustrates the reduction of maximum potential energy ($PE_{max}$) of cataracting ore resulting from a loss in free-fall distance when the mill is overloaded.

Figure 2.11: Cross-sectional view of the inside of a SAG mill showing load torques and maximum potential energy of cataracting ore for (a) a non-overloaded mill, and (b) an overloaded mill with reduced potential energy of the cataracting ore.

Once a SAG mill has begun to overload, ore may rapidly accumulate until the charge level reaches the center of the mill, where slurry is allowed to freely discharge and the feed chute plugs. However, depending on the charge density, the bearing pressures may become so high that the system alarms well before this event. Depending on the charge density, it may be difficult to detect volumetric overloading early, since this type of overloading is determined by charge volume rather than mass. However, Figure 2.11 shows that the center of gravity (c.g.) shifts inside the mill, as illustrated by the length of the charge torque arm ($x_c$ in Figure 2.11). Consequently, it may be possible to measure the load distribution using the pressure measurements from the bearing units supporting the mill, or by monitoring whether the motor power decreases as a result of a reduced counter-torque provided by the charge. However, the density, volume, and distribution of the charge in the mill are highly variable and may not always produce a change in the counter-torque provided by the charge ($\tau_c$ in Figure 2.11) large enough to affect mill power noticeably.

Case 2: Slurry pooling

Slurry pooling occurs when an excess of slurry builds up inside the mill from poor pulp lifter design, resulting in the formation of a slurry pool (Mokken et al., 1975). The slurry pool drastically dampens the impact energy of the falling rock and reduces grinding efficiency. Figure 2.12 shows how slurry is distributed inside the mill for various slurry levels. The figure shows that slurry is held up within the solid charge as the material is lifted with the rotation of the mill (Latchireddi and Morrell, 2003a). As slurry content increases, the slurry fills the interstitial space of the solids charge in the direction of the toe, or bottom, of the solids charge. When all the interstitial space is occupied by slurry, a pool then forms at the toe of the charge (Cleary et al., 2006). As the slurry pool forms inside the mill, grinding efficiency decreases and the mill may become overloaded, leading to Case 1.
Figure 2.12: Distribution of slurry within a mill for (a) low slurry content, (b) charge saturated with slurry, and (c) slurry pool formation due to excess slurry content (Latchireddi and Morrell, 2003a).

At this point, operators may realize that, for a constant ore feed rate, the trunnion bearing pressures increase, indicating an increase in mill weight. However, mill power may not increase in proportion to the bearing pressure as would normally be expected. This phenomenon is the result of a shift in the center of gravity of the mill contents toward the mill center as the slurry pool builds. Figure 2.13 shows that the slurry pool provides a counter-torque (illustrated by $\tau_p$ and the length of the slurry torque arm $x_p$) to the torque provided by the charge, $\tau_c$ (Moys et al., 1996). As a result, the center of gravity of the total charge inside the mill shifts towards the mill center, which results in a decrease in mill power for severely overloaded mills.

Figure 2.13: Cross-sectional view of the inside of a SAG mill showing load torques for (a) a non-overloaded mill, and (b) an overloaded mill with additional torque from the slurry pool.

Case 3: Freewheel effect

The third mode of mill overloading occurs when charge becomes trapped between the lifter bars due to centripetal force and the viscosity and density of the charge (Dean, 1999). This phenomenon is known as freewheeling because the inertia of the trapped charge combines with the kinetic energy of the mill and effectively reduces the motor load. In addition, motor load decreases significantly because the remaining charge in the mill is no longer lifted while the lifter bars are obstructed. Therefore, the remaining charge becomes static, resulting in poor comminution and an accumulation of ore in the mill, leading to Case 1. In this abnormal situation, increasing the feed water or decreasing the ore feed rate is needed to decrease the density and viscosity of the charge and recover from the freewheel effect. The freewheel effect is illustrated in Figure 2.14.

Figure 2.14: Cross-sectional view of the inside of a SAG mill showing load torques and load distribution for (a) a non-overloaded mill, and (b) an overloaded mill with the freewheeling effect.

Chapter 3 Problem Statement

The objective of this thesis is to develop a linear control strategy to recover a process to its nominal operating region following a disturbance that has shifted the process to an undesired operating region. To satisfy this objective, the problem is divided into two parts:

1. apply a linear or nonlinear process monitoring method to detect when the linear or nonlinear process has shifted to an undesirable operating region;

2. develop a control method to optimally recover the process back to its nominal operating region.

It is also the objective of this thesis to illustrate the overall control strategy with an industrial application on the SAG mill at the mineral processing facility in British Columbia. Consequently, the control strategy must also satisfy the following conditions:

• be easy to understand and maintain for process control engineers in industry;
• be computationally simple enough to operate in an industrial controller;
• be robust to changing process conditions.

There are many forms of disturbances to a process, and many root causes for these disturbances.
However, we are not interested in the nature of the disturbance. Rather, this work is concerned with the nature of the process shift. The control strategy must be robust to abrupt, incipient, and transient forms of a process shift. In addition, this control strategy only targets process shifts to specific undesired operating regions from which the process is recoverable to the nominal operating region. Therefore, process shifts resulting from equipment failure may not permit a controllable recovery and are not within the scope of this thesis.

Process monitoring and controller performance recovery techniques, applied so that they work together, can be extremely useful for recovering processes from specific undesired operating regions where nominal controller performance is low. In particular, these techniques are essential for recovering controller performance from undesired operating regions that result in severe consequences to the safety and/or productivity of a process. The following two sections elaborate on the objectives of this thesis.

3.1 Process monitoring

As part of the control strategy to recover a process, the process shift must first be detected. Therefore, the process monitoring method must have the ability to determine whether the state of the system is nominal, or whether it has shifted to an abnormal operating region. Furthermore, there may be many different abnormal operating regions of a process. For example, a process such as a grinding mill may be overloaded or underloaded. In either case, the throughput of the grinding mill will be depressed, and the process equipment will be at a higher risk of damage. In this example, the nominal controller can likely recover the process from the underloaded condition. However, an overloaded mill can quickly become unstable due to the nonlinearity of the process. In this case, the process monitoring and controller performance recovery techniques are designed specifically for the mill overload.

Process monitoring methods have been thoroughly developed and reviewed in the literature. However, for many process monitoring methods in the literature, there has not been an effort to develop a control strategy in which the information from process monitoring is used by a controller to recover the process. Consequently, existing techniques are adapted in this work to provide information to a controller, linking the process monitoring and controller performance recovery methods.

3.2 Controller performance recovery

The purpose of controller performance recovery is to recover a process from a nonlinear, undesired operating region back to the nominal operating region where nominal controller performance and process economy are maximized. By comparison, it is not the objective of this work to prevent the occurrence of a process shift that leads to a loss in controller performance. There are plenty of techniques in the literature that address this issue; however, it is sometimes inevitable that a process will shift away from its nominal operating region. Nevertheless, it is possible that a control strategy that focuses on recovering a process in order to improve controller performance can also prevent a process from shifting to an undesired operating region. For example, if the process shift is incipient, the control strategy may halt the process shift at the boundary between what is defined as the nominal and the undesired operating region.
This is an unintended, yet beneficial, consequence of a control strategy that is robust to a diverse range of process shifts.

The controller performance recovery technique is intended only for a specific class of systems. We are interested in systems that can be reasonably approximated as time-invariant. Moreover, the specific abnormal operating region of a process that the control strategy is designed for must be reasonably approximated as LTI. However, most chemical processes are nonlinear. Therefore, the linear control method can only be effective for a specific range of the abnormal operating region that can be approximated as 'locally linear'. Outside of this range, nonlinear control techniques not covered within the scope of this work need to be applied in place of the linear control method proposed here for optimal recovery. Alternatively, simple rules based on operators' best practices may be added to the controller performance recovery technique to help recover a process from outside the locally linear region back to the region for which the technique is designed. However, this alternative is not mathematically optimal and is only to be used if the focus is to recover the process from severe conditions where the linear control strategy designed in this work no longer suffices to recover the process.

Implementing a controller performance recovery technique where a nominal controller already exists for a process may have unintended consequences. The overall economic performance and stability of the process should not be compromised; rather, both should be improved. In addition, the transition between the nominal and abnormal operating regions should not result in stability issues or transient behaviours.

Chapter 4 General Control Strategy

This chapter covers the establishment of a general control strategy to recover a chemical process from an abnormal operating region to its nominal operating region, as stated in Chapter 3. The control strategy is divided into two main tasks: the detection of a process shift through process monitoring, and the recovery of the process to nominal operation using a linear controller. An application of this control strategy to semi-autogenous grinding is presented in Chapter 5.

Figure 4.1 presents a flowchart showing the major steps of the general control strategy presented in this chapter. The first step is process monitoring. Process monitoring is the task of monitoring process outputs and determining if the process is operating in an abnormal operating range that requires recovery control. In the event the process requires recovery control, it is also the task of process monitoring to estimate the magnitude of the process shift. Process monitoring is discussed in Chapter 4.1.

In the event that the process monitoring method detects a process shift for which control action is deemed necessary, the next task is to recover the process from the abnormal situation. The goal of a controller performance recovery technique is to execute control actions that are tuned specifically to recover the process back to nominal operation in the shortest time possible in order to recover nominal controller performance and process economy. Controller performance recovery is discussed in Chapter 4.2.
Figure 4.1: Flowchart of the process monitoring (Chapter 4.1) and controller performance recovery (Chapter 4.2) strategy applied to a chemical process: measured process outputs are checked for abnormal operation; if none is detected, normal control actions apply; otherwise the magnitude of the process shift is determined and recovery control actions are taken.

4.1 Process monitoring

There is a multitude of techniques available for process monitoring. Data-based methods are selected in preference to model-based methods because model-based methods usually require very accurate process models. By comparison, chemical processes are often poorly understood and are too complex to model accurately. In addition, the computational effort of nonlinear model-based techniques, such as particle filtering, is usually too large for an industrial controller.

The two main steps of a general data-based statistical process monitoring method are shown in Figure 4.2. The first step is to extract the most important features from a measured high-dimensional data-set known as the observation space. This feature extraction step reduces data sparsity and the dimensionality of the data-set, thus allowing for more accurate training of a classifier in the second step. The second and final step of a general process monitoring procedure is to classify the low-dimensional feature space into regions of nominal and abnormal operating conditions. These classified operating regions are known as the decision space. In this step, it is possible to include both categorical class information and a measurement of the magnitude of the process shift in the decision space.

Figure 4.2: Block diagram of the major steps of statistical pattern recognition for process monitoring: feature extraction (e.g., PCA, SLLEP) maps the observation space to a feature space, and classification (e.g., $T_\alpha^2$, $Q_\alpha$, LDA, QDA) maps the feature space to the decision space.

There are two data-based process monitoring methods that sufficiently satisfy the desirable characteristics of a process monitoring method outlined in Chapter 2.1. PCA is a computationally efficient feature extraction method that can be applied to monitor process shifts for linear processes. In addition, the method is forgiving for process shifts that are mildly nonlinear, because there will only be small changes in the correlation structure between process variables. Also, PCA can detect a process shift to an abnormal, nonlinear operating region that has a significantly different correlation structure than nominal operation by measuring the squared prediction error. Classification of the feature space from PCA can be performed with $Q$ and $T^2$ statistics, or LDA.

For processes that are highly nonlinear, or so highly multivariate that PCA cannot find an at most three-dimensional model to visualize and summarize key information regarding a process shift, LLE can be used instead. LLE has been proven in image analysis to map high-dimensional data to three or fewer dimensions and still retain the most important nonlinear features (Saul and Roweis, 2003; Wang, 2011). The resulting low-dimensional space can be classified into regions of nominal and abnormal operation and plotted to visually verify the accuracy and credibility of the technique. Moreover, LLE is also computationally efficient, and it is possible to design a simple approach for it to operate in an industrial controller. Classification of the feature space from LLE can be performed with LDA or QDA.

4.1.1 Principal component analysis

PCA is well suited as a feature extraction method for process monitoring of linear processes.
The technique can easily fit in an industrial controller since there are only $O(D^2)$ online computations at each execution. Additionally, due to its simplicity, operator acceptance of PCA is high, with several documented applications (Neogi and Schlags, 1998; Kourti et al., 1996; Kosanovich et al., 1996).

In this section, we discuss two different methods of using PCA in process monitoring. In the first method, PCA is trained on nominal operating data, and the PC scores of the model are classified with $Q$ and $T^2$ statistics. In the second method, we propose training PCA on nominal and abnormal operating data and classifying the PC scores using LDA. In Chapter 5, we will demonstrate that the second method gives better process monitoring results if the changes in the correlation matrix during a process shift are too subtle for the first method to detect.

As discussed in Chapter 2.1.2, PCA extracts features within data by finding orthogonal directions of maximum variance using SVD. The orthogonal directions of maximum variance are characterized by the eigenvectors of the correlation matrix of a training data-set, and the magnitude of these vectors is characterized by the eigenvalues. The main underlying assumption of PCA is that the data are multivariate normal; however, reasonable results can often be achieved even if this assumption is invalid (Johnson and Wichern, 2007).

The first, and most challenging, step of PCA is to determine which process variables should be included in the model and how these variables should be transformed to extract key information in the data. Traditionally, nearly all measured variables related to a particular process are used to train the PC model. In fact, this is the advantage of PCA: one does not need to know much about the process to use the technique. However, this method of training the PCA model may incorporate biases in the PC scores towards certain aspects of the process if there is a lot of redundant information between variables. In addition, the PC model may be trained to find correlations between certain variables in the training data-set that are in fact transient conditions and may not recur under the same operating conditions. Appendix A discusses techniques helpful for selecting and transforming process variables for PCA.

Consider a data matrix $X \in \mathbb{R}^{n \times D}$ consisting of $n$ observations of $D$ measured process variables. Generally, $n$ does not need to be very large to adequately train the PC model if there is adequate excitation of the process in the data. Since PCA is scale-dependent, the variables, or columns of $X$, must be scaled in some meaningful way (Johnson and Wichern, 2007). The most common method is to standardize the $D$ variables by scaling them to zero mean and unit variance. Once $X$ is scaled so that the means and variances of the variables are in relative proportion to each other, the next step is to construct the covariance matrix, $S$, from $X$:

$$S = \frac{1}{n-1} X^T X, \qquad (4.1)$$

where $S$ is a square symmetric matrix of dimensionality $D$. Next, SVD is performed on $S$ to find the eigenvector matrix, $V \in \mathbb{R}^{D \times D}$, and the eigenvalue matrix, $\Lambda \in \mathbb{R}^{D \times D}$:

$$S = V \Lambda V^T, \qquad (4.2)$$

where the columns of $V$ are the eigenvectors of $S$, and $\Lambda$ is a diagonal matrix of the eigenvalues sorted in descending magnitude. Now, it is possible to calculate the PC scores, $T$, which provide a measurement of how an observation scores for each PC. $T$ is calculated using the following expression:

$$T = XV, \qquad (4.3)$$

where the columns of $T \in \mathbb{R}^{n \times D}$ are the scores corresponding to each PC.
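For concreteness, a minimal numpy sketch of these offline computations follows (Equations (4.1)–(4.3)); the function name and return convention are our own illustration, not thesis notation:

```python
import numpy as np

def fit_pca(X):
    """Fit a PC model per Equations (4.1)-(4.3).
    X: n-by-D matrix of raw process measurements."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)
    Xs = (X - mu) / sigma                # standardize: zero mean, unit variance
    n = Xs.shape[0]
    S = (Xs.T @ Xs) / (n - 1)            # Equation (4.1); equals the correlation matrix
    lam, V = np.linalg.eigh(S)           # eigendecomposition of the symmetric S
    order = np.argsort(lam)[::-1]        # sort eigenvalues in descending magnitude
    lam, V = lam[order], V[:, order]     # Equation (4.2): S = V Lambda V^T
    T = Xs @ V                           # Equation (4.3): PC scores
    return mu, sigma, V, lam, T
```

The same `mu`, `sigma`, `V`, and `lam` are reused by the online monitoring sketches later in this section.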
The next step is to select only the PCs explaining most of the variance in the data-set. There are several procedures to determine the number of PCs to be retained:

1. Cumulative percent variance (CPV): a measure of the total observed variance captured by the first $p$ PCs (Zumoffen and Basualdo, 2008). Select $p$ such that the CPV is greater than some threshold.

2. Scree analysis: the sharpest bend in a graph of the diagonal elements of $\Lambda$ plotted in descending order indicates the best choice of $p$ (Jackson, 1991).

3. Cross-validation: partition the data into training and test sets and evaluate the predictive accuracy of the PC model for different values of $p$; $p$ is chosen at the point where increasing $p$ no longer significantly decreases the squared prediction error (Ko and Shang, 2011). The procedure for cross-validation is discussed in Appendix C.

For process monitoring, it is often acceptable to simply use CPV and find the $p$ that satisfies CPV > 90%. CPV is calculated from the following expression:

$$\mathrm{CPV}(p) = \frac{\sum_{k=1}^{p} \lambda_k}{\mathrm{trace}(\Lambda)} \times 100\%, \qquad (4.4)$$

where $\lambda_k$ is the eigenvalue in $\Lambda$ corresponding to the $k$th PC. Once $p$ PCs have been selected, the columns of $V$ corresponding to the first $p$ PCs form the PC eigenvector matrix $P \in \mathbb{R}^{D \times p}$.
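A one-function sketch of the CPV rule in Equation (4.4), assuming `lam` holds the eigenvalues sorted in descending order as returned by the `fit_pca` sketch above:

```python
import numpy as np

def select_p(lam, threshold=90.0):
    """Smallest p whose cumulative percent variance exceeds the
    threshold, per Equation (4.4)."""
    cpv = 100.0 * np.cumsum(lam) / np.sum(lam)   # CPV(p) for p = 1..D
    return int(np.searchsorted(cpv, threshold)) + 1
```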
The Q statistic is unique because it makes process monitoring with PCA applicable to some nonlinear processes. Since Q provides a measure of how well the PC model fits the observed data, Q will become larger as the process shifts to a nonlinear operating region that is outside of the region where the linear PC model was trained. However, the Q statistic estimates the magnitude of process shift into any nonlinear region. Therefore, the Q statistic should be coupled with some sort of rule set to distinguish which nonlinear operating region the process has shifted to. Taking a SAG mill as an example, if Q > Q↵ and the bearing pressure of the mill is high, then the mill is likely to be overloaded as opposed to being underloaded. The traditional approach to use PCA as a process monitoring technique is divided into the o✏ine and online portions of the method and discussed in Method 1: Method 1: O✏ine 1. Obtain a training data-set, X, from nominal process behaviour with key process variables selected relevant to abnormal operation. 46  2. Obtain the mean and standard deviation vectors of X, then standardize the columns of X to zero mean and unit variance. 3. Perform SVD on the correlation matrix of X to obtain the PC model. 4. Select the number of PCs explaining most of the variance in the data using CPV. 5. Determine the upper control limits for the T 2 and Q statistics. Method 1: Online 1. Obtain a new test sample, xnew . 2. Scale xnew using the original mean and standard deviation vectors of X. 3. Evaluate the T 2 and Q statistics. If one of these values exceeds their upper control limit, then it is considered an abnormal situation. Traditionally, when T↵2 or Q↵ are violated, the response of the DCS is to send an alarm to the operators so they can manually recover the process from the abnormal situation. However, in this work, we are proposing using the magnitude that these control limits are exceeded as an estimate of the magnitude of the process shift to an abnormal operating region. Consequently, this signal can be controlled in the DCS to reduce the magnitude of the process shift below some constraint value. A new approach to use PCA for process monitoring is proposed here for cases where it is found that the T 2 or Q statistics do not correctly identify the process abnormality upon testing the PC model. This may occur if the abnormality causes only subtle changes to the correlation structure of the data. In this event, there may exist some region in the high-dimensional space of X that can be classified as an abnormal operating region. Furthermore, PCA can be used to reduce variable redundancy and lower the dimensionality of X to allow for a simple classification  47  method, such as LDA1 , to classify the nominal and abnormal operating regions within the data. The procedure for this approach is outlined below in Method 2: Method 2: O✏ine 1. Obtain a training data-set, X, from a relatively even distribution of nominal and abnormal process behaviour with key process variables selected relevant to detecting the abnormal operation. 2. Obtain the mean and standard deviation vectors of X, then standardize the columns of X to zero mean and unit variance. 3. Perform SVD on the correlation matrix of X to obtain the PC model. 4. Perform LDA classification with 10-fold cross-validation, as discussed in Appendix C, on all 2D combinations of the PC scores in T . 5. Select only the p PCs that best allow the LDA classifier to separate the nominal and abnormal operating data. 6. 
A new approach to using PCA for process monitoring is proposed here for cases where it is found that the $T^2$ or $Q$ statistics do not correctly identify the process abnormality upon testing the PC model. This may occur if the abnormality causes only subtle changes to the correlation structure of the data. In this event, there may still exist some region in the high-dimensional space of $X$ that can be classified as an abnormal operating region. Furthermore, PCA can be used to reduce variable redundancy and lower the dimensionality of $X$ to allow a simple classification method, such as LDA (see Chapter 4.1.3 for a discussion of LDA), to classify the nominal and abnormal operating regions within the data. The procedure for this approach is outlined below in Method 2:

Method 2: Offline

1. Obtain a training data-set, $X$, from a relatively even distribution of nominal and abnormal process behaviour with key process variables selected relevant to detecting the abnormal operation.

2. Obtain the mean and standard deviation vectors of $X$, then standardize the columns of $X$ to zero mean and unit variance.

3. Perform SVD on the correlation matrix of $X$ to obtain the PC model.

4. Perform LDA classification with 10-fold cross-validation, as discussed in Appendix C, on all 2D combinations of the PC scores in $T$.

5. Select only the $p$ PCs that best allow the LDA classifier to separate the nominal and abnormal operating data.

6. Find the linear equation of the LDA classifier from Equation (4.27) separating the nominal and abnormal operating data in the $p$th dimension.

Method 2: Online

1. Obtain a new test sample, $x_{new}$.

2. Scale $x_{new}$ using the original mean and standard deviation vectors of $X$.

3. Calculate the $p$ PC scores of $x_{new}$, where each score is an element of the score vector, $y$, of $x_{new}$.

4. Calculate the LDA classifier output value and class label for $y$ from the linear equation discovered in Step 6 of Method 2: Offline.

In the offline portion of Method 2, it is most likely that a fault will be characterized by no more than three PCs. It is also not necessary that the fault-specific PCs determined from this procedure be the PCs that explain most of the variance in the data. If an abnormal situation is significantly characterized by more than three PCs, then LLE should be used instead of PCA, since LLE is usually capable of mapping high-dimensional data to three or fewer dimensions. It is important to have three or fewer dimensions characterizing a process shift to an abnormal operating region because this allows operators to visually interpret the PC scores and the classification boundary, and so gain trust in the process monitoring method.

In Method 2, abnormal process training data should be selected such that the correlation structure between the highly correlated variables determined from Method 1 does not change drastically when the model is retrained with the addition of the abnormal process data. The objective of training the PC model with the inclusion of abnormal process data is to develop the correlation structure of variables that have poor correlations in the nominal operating region yet may express a clear correlation structure in the abnormal operating range. The shift in the center of gravity of a SAG mill is a good example of a variable that only has a clear correlation structure once the mill has begun overloading. These types of variables provide extra redundancy in the PC model that is useful for detecting complex phenomena such as mill overloading.

For both Method 1 and Method 2, it may be beneficial to scale the variables in $X$ from 0% to 100% in proportion to the minimum and maximum values that may be expected during process operation. A percentage scale is meaningful to operators and more robust to time-varying changes (MacGregor and Kourti, 1995). The minimum and maximum limits to scale the variables can usually be estimated from the operating limits of the variables measured in the DCS, since industrial control often requires process variables to be scaled from 0% to 100%. When using a percentage scale, it is important to scale so that the variances of the variables are in relative proportion to each other. However, the scaling limits may also be manipulated to add bias to the variances of certain variables when training the PC model.

In Method 2, the signal to be controlled is the distance by which the PC score vector of $x_{new}$ is located away from the classification boundary and into the abnormal operating region. Moreover, using the centroid of several consecutive score vectors reduces the likelihood that stochastic signals will cause a single observation to be classified as abnormal when the process is actually operating nominally.
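As an illustration of this control signal, the sketch below fits scikit-learn's LDA to the retained PC scores and returns a signed distance to the boundary; the library call is a stand-in for the linear equation the thesis derives in Equation (4.27), and the names are our own:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def fit_boundary(T_p, labels):
    """Fit LDA on the retained PC scores (n-by-p); labels: 0 = nominal,
    1 = abnormal. Returns a function giving the signed distance of a new
    score vector from the boundary (positive = abnormal side)."""
    lda = LinearDiscriminantAnalysis().fit(T_p, labels)
    w, b = lda.coef_.ravel(), lda.intercept_[0]

    def shift_magnitude(t_new):
        # Signed distance to the hyperplane w.t + b = 0; this is the
        # process-shift estimate used as the recovery control signal.
        return (w @ t_new + b) / np.linalg.norm(w)

    return shift_magnitude
```

In practice the distance would be computed on the centroid of several consecutive score vectors, as recommended above.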
In Method 2, Fisher's approach to LDA could itself perform the feature extraction instead of PCA (Johnson and Wichern, 2007). However, we do not use Fisher's LDA because preliminary experiments on the case study in Chapter 5 showed that this technique is sensitive to changes in the mean values of the variables during nominal operation. This led to poor performance with Fisher's LDA and classification when the model was used on a new test set having different nominal operating conditions. On the other hand, we found that PCA is less sensitive to changes in the mean values of process variables; thus, better results are expected when monitoring processes with varying operating conditions.

If either Method 1 or Method 2 is unable to distinguish the abnormal process behaviour of concern from the nominal process behaviour, it may be because the abnormal process behaviour is too nonlinear for PCA, and hence a nonlinear process monitoring technique such as LLE is needed. Likewise, if more than three PCs are needed to classify the abnormal process behaviour, then LLE should be used, since it has been demonstrated in image analysis to be capable of mapping high-dimensional data on the order of thousands of dimensions to three or fewer dimensions (Saul and Roweis, 2003; Wang, 2011).

4.1.2 Locally linear embedding

If the process is so nonlinear or high-dimensional that PCA is found to be unsuitable as a process monitoring method, then LLE can be used as an alternative to PCA. LLE, however, is more computationally expensive than PCA, both in offline training and in online operation.

As discussed in Chapter 2.1.2, LLE is a feature extraction method that nonlinearly maps locally linear patches of a high-dimensional data-set into a low-dimensional space that preserves the critical information in the data-set. Furthermore, if LLE is trained with supervised learning, classification of the low-dimensional mapped space can be done with a simple classifier. As a result, LLE allows operators to visualize how the process monitoring method determines whether new observations are classified as nominal or abnormal.

Before training the LLE model, we must first determine which process variables should be used in the model and how they should be transformed. Appendix A gives some suggestions for this process. Unlike PCA, LLE does not require as much pruning when selecting process variables; in other words, LLE can tolerate high-dimensional data-sets. On the other hand, training LLE requires substantially more process data than PCA. LLE requires a lot of data because new data are mapped to the feature space by performing a linear regression on the training data, and when plenty of training data are available, it can be expected that a new data point and its neighbors in the training data-set are located on a locally linear patch of the manifold.

To properly map a variety of process conditions, we found from the case study in Chapter 5 that we needed to train LLE with nearly $n = 2000$ observations covering a wide range of operation, sampled only as frequently as necessary, and including as extensive coverage as possible of the abnormal operation requiring the control strategy. If abnormal operating data are scarce, they should be sampled more frequently than nominal operating data in order to provide enough observations to properly train the technique. Correspondingly, $n$ may differ depending upon the signal-to-noise ratio of the data, variations in operating conditions, and information redundancy in the training data. Lastly, failure to include enough data when training LLE may result in erroneous results when mapping new observations to the feature space.
Lastly, failure to include enough data when training LLE may result in erroneous results when mapping new observations to the feature space. In the most simple formulation of LLE discussed by Roweis and Saul (2000), there  51  are three fundamental steps performed o✏ine of a DCS once X has been obtained: the first step is to select the nearest neighbors of each data point in X; the second step is to reconstruct each data point from its neighbors; and the final step is to map the data to a low-dimensional feature space. Fig. 4.3 reviews the three steps of the LLE procedure. In this form of LLE, there is only one free parameter to determine during training: the number of neighbors, K. Selection of K is discussed at the end of this section.  1  Select neighbors  xi  Reconstruct with 2 linear weights  xi  Wik  xk  Wij xj yi  Wik y k Wij yj 3 Map to embedded coordinates  Figure 4.3: The three steps of the LLE algorithm (Roweis and Saul, 2000)  Step 1: Select neighbors. The first step is to identify the K-nearest neighbors of the training data matrix, X 2 <n⇥D , where X has been standardized to zero mean and unit variance or scaled from 0% to 100%. To find the neighbors of the ith observation in X, xi , the Euclidean 52  distance between xi and every observation needs to be determined. Here, the K data points with the shortest distances are the neighbors, and a neighbor of xi is denoted as x⌘ . The Euclidean distance between xi and another observation in X, xj is calculated from the expression: xi  xj  2  ,  (4.11)  where || · ||2 refers to the l2 norm. The K-nearest neighbors characterize the geometry of these local patches by linear coefficients used to reconstruct each data point from its neighbors. In reality, these local patches are not exactly linear; consequently, there are errors when using linear weights to reconstruct each data point from its neighbors in Step 2. Step 2: Reconstruct each data point from its neighbors. The second step is to reconstruct each data point from its neighbors by minimizing the cost function: minimize W  "(W )  =  n X  xi  i=1  subject to Wij n X  = 0,  n X j=1  2  Wij xj  ,  if xj 62 {x⌘1 , x⌘2 , . . . , x⌘K },  Wij = 1.  (4.12)  2  (4.13) (4.14)  j=1  Here, the weights W ij are the magnitude that the jth observation contributes to the ith reconstruction. W ij are constrained to zero in Equation (4.13) if they do not belong to the set of nearest neighbors for the ith reconstruction. "(W ) adds up to the squared distances between all data points and their reconstructions. The optimal weights subject to the constraints in Equation (4.13) and Equation (4.14) that minimizes the reconstruction error in Equation (4.12) can be found by solving a least-squares problem as detailed in Appendix B.1. 53  An illustration of the locally linear reconstruction of xi is shown in Figure 4.4. This figure shows that a nonlinear curve must be drawn between xi and its two neighbors. In order to to develop a linear reconstruction, the data point xi must be reconstructed onto a line between the two neighbors x⌘1 and x⌘2 . The weights W ij are the degrees of freedom to minimize the reconstruction errors.  xi xη2 xη1  ΣWij xj  Figure 4.4: Locally linear reconstruction of xi (Saul and Roweis, 2003)  Step 3: Map to embedded coordinates. The third and final step is to map each high-dimensional observation, xi , to a lowdimensional feature vector, y i . 
Step 3: Map to embedded coordinates.

The third and final step is to map each high-dimensional observation, $x_i$, to a low-dimensional feature vector, $y_i$. Each $y_i$ represents the $d$-dimensional global internal coordinates on the low-dimensional manifold, where $d$ is a low dimension such that $d < D$, and $y_i$ is found by minimizing the embedding cost function:

$$\underset{y}{\text{minimize}} \quad \Phi(y) = \sum_{i=1}^{n} \Big\| y_i - \sum_{j=1}^{n} W_{ij} y_j \Big\|_2^2, \qquad (4.15)$$

$$\text{subject to} \quad \sum_{i=1}^{n} y_i = 0, \qquad (4.16)$$

$$\frac{1}{n} \sum_{i=1}^{n} y_i y_i^T = I, \qquad (4.17)$$

where $I$ is the $d \times d$ identity matrix. Equation (4.16) prevents the $y_i$ from being translated by a constant displacement by forcing them to be centered on the origin, and Equation (4.17) avoids degenerate solutions by constraining the embedding vectors to have unit covariance. The embedding cost defined in Equation (4.15) is a quadratic program (QP); however, subject to the constraints in Equations (4.16) and (4.17), which make the problem well-posed, the QP can be reformulated as a sparse $n \times n$ eigenvalue problem, as detailed in Appendix B.2.

Supervised LLE

An adaptation of LLE is discussed by De Ridder and Duin (2002) to improve the method when it is used to extract features within a data-set for subsequent classification. The technique is called supervised LLE (SLLE), and it includes class information in the algorithm by artificially adding a distance between samples from different classes. In process monitoring, there are usually two classes: data that are nominal and data that are abnormal. To perform SLLE, only Step 1 of the original algorithm is changed: before selecting the K-nearest neighbors of $x_i$, an additional distance, $R'$, is introduced between data points if they belong to different classes. Here, the distance $R'$ that incorporates class information is calculated as follows:

$$R' = R + \alpha \max(R)\,(1 - \Delta(x_i, x_j)), \qquad (4.18)$$

where $R = \| x_i - x_j \|_2$ is the Euclidean distance between two data points, and $\max(R) = \max_{ij} \| x_i - x_j \|_2$ is the distance between the two observations that are farthest apart in $X$. For every combination of $x_i$ and $x_j$ that belong to separate classes, $\Delta(x_i, x_j) = 0$; otherwise, $\Delta(x_i, x_j) = 1$. Lastly, the parameter $\alpha$ is used to control the amount of class information incorporated into $R'$, where $0 \le \alpha \le 1$. In practice, $\alpha$ is often chosen to be at most 0.2, because large $\alpha$ values cause data from the same class to coalesce on the same point, resulting in poor mapping of new data to the feature space with LLE projection (De Ridder and Duin, 2002; Li and Zhang, 2011). Comparatively, $\alpha$ can be very small if there are outliers in the data that cause $\max(R)$ to be large. Owing to the ability of SLLE to separate data belonging to different classes, we are able to use a simple classifier, such as LDA or QDA, when classifying the feature space into regions of nominal and abnormal process operation.
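Continuing the numpy sketches above (again our own illustration, not thesis code), the embedding of Step 3 and the SLLE distance modification of Equation (4.18) can be written as:

```python
import numpy as np

def lle_embed(W, d):
    """Step 3: the minimizer of Equations (4.15)-(4.17) is given by the
    bottom eigenvectors of M = (I - W)^T (I - W) (see Appendix B.2)."""
    n = W.shape[0]
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    lam, V = np.linalg.eigh(M)       # eigenvalues in ascending order
    return V[:, 1:d + 1]             # drop the constant bottom eigenvector

def slle_distances(X, labels, alpha=0.1):
    """SLLE: inflate between-class distances per Equation (4.18)."""
    R = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    different = labels[:, None] != labels[None, :]   # 1 - Delta(x_i, x_j)
    return R + alpha * R.max() * different
```

For SLLE, the neighbor search in Step 1 simply uses the matrix returned by `slle_distances` in place of the plain Euclidean distances.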
LLE projection

Since LLE is a non-parametric method, there is no equation to map new observations into the feature space discovered by LLE during offline training. As a result, Li and Zhang (2011) adapted LLE to project a new $D$-dimensional observation, $x_{new}$, to the $d$-dimensional space as $y_{new}$. This technique is called LLE projection, and there are three steps computed at each controller execution to project new observations into the feature space for subsequent classification: the first step is to select the nearest neighbors of the new observation among the observations in $X$; the second step is to find a mapping matrix; and the final step is to calculate the embedding coordinate of the new observation. LLE projection is performed online in a DCS at every scan. Moreover, this technique requires the entire training data-set, $X$, and the entire embedded coordinate matrix, $Y$, to be stored in the DCS. Here, $X$ includes the class information from SLLE, and $Y$ is the matrix containing all embedded coordinates, $y_i$, determined from Equation (4.15).

Step 1: Select neighbors.

For a new high-dimensional sample, $x_{new} \in \mathbb{R}^{D \times 1}$, the first step is to find its $K$ neighbors, $X_\eta$, in the training data-set $X$ by a Euclidean distance measurement, where:

$$X_\eta = [x_{\eta_1}, x_{\eta_2}, \ldots, x_{\eta_K}] \in \mathbb{R}^{D \times K}. \qquad (4.19)$$

Step 2: Find a mapping matrix.

The second step is to compute a mapping matrix, $A$, from the training data that satisfies:

$$Y_\eta = A X_\eta \qquad (4.20)$$

in the least-squares sense, where:

$$A = [a_1, a_2, \ldots, a_d]^T \in \mathbb{R}^{d \times D}, \qquad (4.21)$$

$$Y_\eta = [y_{\eta_1}, y_{\eta_2}, \ldots, y_{\eta_K}] \in \mathbb{R}^{d \times K}, \qquad (4.22)$$

and $Y_\eta$ is the neighborhood embedded coordinate matrix of the feature data corresponding to the observations of $X_\eta$. In other words, the coordinate vectors in $Y_\eta$ are taken from the vectors in $Y$ corresponding to the observations $X_\eta$. Each basis vector of $A$, $a_j$, is obtained by solving the linear least-squares regression problem:

$$a_j = \underset{a}{\operatorname{argmin}} \sum_{i=1}^{K} \big\| a^T x_{\eta_i} - y_{\eta_i}^j \big\|_2^2, \qquad (4.23)$$

where $1 \le j \le d$, and $y_{\eta_i}^j$ is the $j$th element of $y_{\eta_i}$. The analytical solution for the case of $X_\eta^T$ having full column rank and the case where $X_\eta^T$ is column rank deficient is given in Appendix B.3.

Step 3: Compute the embedding coordinate.

The final step is to calculate the embedding coordinate of the new observation, $y_{new} \in \mathbb{R}^{d \times 1}$, from $x_{new}$ using the mapping matrix determined in Step 2:

$$y_{new} = A x_{new}. \qquad (4.24)$$

In Step 2 of LLE projection, the solution for $A$ is obtained by solving the QP in Equation (4.23). Since the analytical solution described in Appendix B.3 requires the computation of a matrix inverse, it is difficult to program and execute in an industrial controller. In many industrial controllers, matrix inversion is not built in and must be programmed using elementary row operations, often using Gauss-Jordan elimination (Friedberg et al., 2002).
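Where a matrix library is available, the whole projection reduces to a few lines. The numpy sketch below (our own illustration) uses the Moore-Penrose pseudo-inverse, which, as noted below, is the same solution the MPC formulation converges to:

```python
import numpy as np

def lle_project(x_new, X, Y, K):
    """LLE projection per Equations (4.19)-(4.24): map a new observation
    into the trained feature space via a local least-squares fit.
    X: n-by-D training data; Y: n-by-d embedded coordinates."""
    d2 = np.linalg.norm(X - x_new, axis=1)
    nbrs = np.argsort(d2)[:K]            # Step 1: K nearest training points
    X_eta = X[nbrs].T                    # D-by-K neighbor matrix, Eq (4.19)
    Y_eta = Y[nbrs].T                    # d-by-K embedded neighbors, Eq (4.22)
    A = Y_eta @ np.linalg.pinv(X_eta)    # Step 2: least-squares A, Eq (4.23)
    return A @ x_new                     # Step 3: Equation (4.24)
```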
Example of LLE projection using MPC

MPC was used to find the projection of a data point on an 'S'-shaped curve shown in Figure 4.5. In this figure, $n = 800$ random samples, or observations, of a 3D 'S'-shaped manifold are shown in Figure 4.5a. In this plot, one observation is randomly selected, and its $K = 12$ neighbors are identified. Next, LLE was performed to map the 3D data to 2D space, and the results are shown in Figure 4.5b. A black box depicts the inset of this plot that is shown in Figure 4.5c. Finally, in Figure 4.5d, MPC was used to perform linear regression on the neighbors of the observation, and the linear equation was used to project the coordinates of the data point in 3D space to the mapped 2D feature space. The plot shows that a discrepancy exists between the coordinates of the data point mapped by LLE and its projection from the linear equation. This is due to the nonlinearity of the 'S' curve, and this error decreases as the number of samples the LLE model is trained with increases.

[Figure 4.5: LLE mapping of an 'S'-shaped curve to demonstrate the accuracy of LLE projection, where a) is a 3D 'S'-shaped manifold, b) is a 2D mapping of the 3D data with an inset illustrated in c) of a data point and its neighbors, and d) is the data point estimated from its neighbors in LLE projection]

The results from MPC for the mapping shown in Figure 4.5 are comparable to the analytical solution of Equation (4.23) described in Appendix B.3. In fact, MPC discovers exactly the same solution as the analytical solution from the Moore-Penrose pseudo-inverse in one iteration. To have MPC converge on the solution immediately, the models in the MPC, as described by the coordinates of the $K$ neighbors in 3D space, were treated as gain-only models with no dynamics. In addition, the penalty-on-move term of the MPC was set to zero for all six of the MVs describing the linear mapping matrix $\boldsymbol{A}$.

On a final note for LLE projection, we observe that $\boldsymbol{X}$ and $\boldsymbol{Y}$ need to be stored in the DCS in order to determine $\boldsymbol{X}_\eta$ and $\boldsymbol{Y}_\eta$ in Equation (4.20). Since $n$ needs to be large for adequate LLE projection results, a large amount of data must be stored in the DCS. To reduce this storage requirement, we recommend pruning the training data-set once the feature space has been trained in order to reduce the size of $n$. De Ridder and Duin (2002) propose a method to select such a subset of the training data to resolve this issue.

SLLEP for classification

The combination of SLLE for training the feature space and LLE projection for mapping new data is collectively known as SLLEP. Once SLLE trains the information-rich feature space, this low-dimensional space can be classified into regions of nominal and abnormal process operation. If a control strategy is sought for only one type of abnormal operation, there will likely be only one contiguous region in this space corresponding to the abnormal operation. However, multiple regions can be classified if the features in the low-dimensional space characterize more than one type of process abnormality. Owing to the ability of SLLE to incorporate class information when training the feature space, process data from separate classes are well-segregated and can be easily classified with a linear or quadratic classifier. A linear or quadratic classifier is ideal because of the simplicity and robustness of training the classifier offline, and because it is simple to determine a meaningful distance between an observation in the feature space and the classification boundary. LDA and QDA are suitable linear and quadratic classifiers, respectively, to classify the feature space from SLLE into regions of nominal and abnormal operation. LDA and QDA are discussed in Chapter 4.1.3. LDA should be used if the abnormal operating data appear to be linearly separable from the nominal operating data. Otherwise, QDA is usually acceptable.
New data projected into the feature space are classified by the same classification rule as the training data to provide an estimate of the class label and the distance to the classification boundary. If LDA or QDA poorly classifies the feature space into regions of nominal and abnormal operation, then the tuning parameters $K$ and $\alpha$ should be adjusted when training the SLLE technique.

Selection of K and α

The selection of $K$ and $\alpha$ significantly affects the ability of LDA or QDA to classify the feature space discovered by SLLEP. $K$ determines the number of neighbors around a particular data point that are considered to be within a locally linear region of the observation space. If $K$ is too small, then SLLEP does not have enough information about the global properties within the data-set; if $K$ is too large, then the mapping loses its nonlinear character and behaves like PCA.

$\alpha$ determines how much class information is included in training SLLEP. A large $\alpha$ will excessively separate the classes and lead to over-generalization of the feature space. With a large $\alpha$, erroneous results may occur when projecting new observations into the feature space. Furthermore, the classifier loses the ability to provide meaningful information on the magnitude of a process shift to a controller, since data from the same class converge on a single point in the feature space. On the other hand, a small $\alpha$ incorporates little class information when training SLLEP. Consequently, it is less likely that LDA or QDA can classify the feature space, owing to the nonlinearity of the features.

Since there are two free parameters to manipulate when training SLLEP, 10-fold cross-validation should be performed using the procedure discussed in Appendix C to optimize the selection of the parameters. Cross-validation finds the best selection of $K$ and $\alpha$ by evaluating the unbiased prediction accuracy of LDA or QDA. Once the parameters have been chosen and the SLLEP model has been trained, the model should be tested on a held-out test set before being commissioned in industry. A sketch of this parameter search is given below.
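The following is a minimal sketch of the 10-fold cross-validated grid search over $K$ and $\alpha$. It assumes a helper, train_and_score(...), that trains SLLE with the candidate parameters, classifies the held-out fold with LDA or QDA, and returns the fold accuracy; that helper, the candidate grids, and the use of scikit-learn's KFold are illustrative choices, not the exact procedure of Appendix C.

```python
import numpy as np
from sklearn.model_selection import KFold

def cross_validate_slle(X, labels, train_and_score,
                        K_grid=(8, 12, 16, 20),
                        alpha_grid=(0.01, 0.05, 0.1, 0.2)):
    """10-fold cross-validated grid search for the SLLEP parameters K and alpha."""
    best = (None, None, -np.inf)
    kf = KFold(n_splits=10, shuffle=True, random_state=0)
    for K in K_grid:
        for alpha in alpha_grid:
            scores = []
            for train_idx, val_idx in kf.split(X):
                # train SLLE + classifier on the training folds,
                # score classification accuracy on the validation fold
                scores.append(train_and_score(X[train_idx], labels[train_idx],
                                              X[val_idx], labels[val_idx],
                                              K=K, alpha=alpha))
            mean_score = float(np.mean(scores))
            if mean_score > best[2]:
                best = (K, alpha, mean_score)
    return best  # (K, alpha, mean cross-validated accuracy)
```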
4.1.3 Linear and quadratic classification

LDA and QDA are simple classification methods that find a linear or quadratic combination of features that best separates two or more classes. In this thesis, we focus on the classification of two classes and leave the straightforward extension to three or more classes to Johnson and Wichern (2007).

The main underlying assumption in LDA and QDA is that the data to be classified are multivariate normal (Johnson and Wichern, 2007). Although this is not always the case, LDA often remains an effective classifier even if this assumption is violated (Johnson and Wichern, 2007). However, QDA may lead to erroneous results when classifying a feature space of more than two dimensions if this assumption is violated (Johnson and Wichern, 2007). Since PCA only performs well on multivariate normal data, LDA should be a sufficient classifier when PCA is used for feature extraction. SLLEP does not assume multivariate normality; thus $\alpha$ can be tuned to sufficiently separate the classes in the feature space, and LDA may be used in the absence of multivariate normal data. However, QDA is a quadratic classifier, and it may separate a 2D feature space better than LDA without having to over-tune $\alpha$.

LDA and QDA classifiers are trained in a supervised manner from the known class labels of the feature data to be classified. Therefore, we must divide our feature data matrix into two groups, $\boldsymbol{Y}_1$ and $\boldsymbol{Y}_2$. Here, $\boldsymbol{Y}_1$ contains the nominal operating data belonging to class $\pi_1$, and $\boldsymbol{Y}_2$ contains the specific abnormal operating data belonging to class $\pi_2$ that represents the abnormal operating range for which a control strategy is to be designed. The remaining steps of LDA and QDA differ from each other and are discussed separately.

Linear discriminant analysis

To construct a linear classifier, it is assumed in LDA that the covariance matrices of $\boldsymbol{Y}_1$ and $\boldsymbol{Y}_2$ are equivalent. However, since the true covariance matrices of these populations are unknown, we take a pooled estimate of the covariance matrix, $\boldsymbol{S}_{pooled}$, from the sample covariances:

$$\boldsymbol{S}_{pooled} = \frac{n_1 - 1}{(n_1 - 1) + (n_2 - 1)}\,\boldsymbol{S}_1 + \frac{n_2 - 1}{(n_1 - 1) + (n_2 - 1)}\,\boldsymbol{S}_2, \quad (4.25)$$

where $\boldsymbol{S}_1$ and $\boldsymbol{S}_2$ are the sample covariance matrices calculated from $\boldsymbol{Y}_1$ and $\boldsymbol{Y}_2$, respectively, using the formula given in Equation (4.2). Additionally, $n_1$ is the number of observations in $\boldsymbol{Y}_1$, and $n_2$ is the number of observations in $\boldsymbol{Y}_2$. The next step is to find the mean vectors of $\boldsymbol{Y}_1$ and $\boldsymbol{Y}_2$:

$$\bar{\boldsymbol{y}}_1 = \frac{1}{n_1}\boldsymbol{Y}_1^T \boldsymbol{1}, \qquad \bar{\boldsymbol{y}}_2 = \frac{1}{n_2}\boldsymbol{Y}_2^T \boldsymbol{1}, \quad (4.26)$$

where $\boldsymbol{1}$ is a vector of ones with as many elements as there are observations in $\boldsymbol{Y}_1$ or $\boldsymbol{Y}_2$. The allocation rule of LDA for a given observation in the feature space, $\boldsymbol{y}_i$, is:

$$\text{class} = \begin{cases} \pi_1 & \hat{\boldsymbol{a}}^T \boldsymbol{y}_i - \hat{m} \ge \text{cost} \\ \pi_2 & \hat{\boldsymbol{a}}^T \boldsymbol{y}_i - \hat{m} < \text{cost}, \end{cases} \quad (4.27)$$

where:

$$\hat{\boldsymbol{a}}^T = (\bar{\boldsymbol{y}}_1 - \bar{\boldsymbol{y}}_2)^T \boldsymbol{S}_{pooled}^{-1}, \quad (4.28)$$

$$\hat{m} = \frac{1}{2}(\bar{\boldsymbol{y}}_1 - \bar{\boldsymbol{y}}_2)^T \boldsymbol{S}_{pooled}^{-1} (\bar{\boldsymbol{y}}_1 + \bar{\boldsymbol{y}}_2), \quad (4.29)$$

$$\text{cost} = \ln\left[\left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right)\right]. \quad (4.30)$$

Here, $\hat{\boldsymbol{a}}$ is the sample coefficient vector, $\hat{m}$ is the estimated midpoint between the two classes, and the cost equation accounts for the probability of misclassification and the cost of misclassification. In the cost equation, $p_1$ is the prior probability that an observation belongs to $\pi_1$, and vice versa for $p_2$; for example, if the probability that a new observation will belong to $\pi_2$ is small, then LDA should classify a new observation as $\pi_1$ unless the data overwhelmingly favor $\pi_2$. In addition, $c(1|2)$ is the cost associated with classifying an observation as $\pi_1$ when it actually belongs to $\pi_2$, and vice versa for $c(2|1)$. For example, if a SAG mill is overloaded and we classify a new observation as belonging to the non-overloaded class, then the mill overload is not corrected, and the consequences are more severe than if the mill was not overloaded but was classified as overloaded.

Not only does the LDA allocation rule dictate the class of an observation, it also gives the equation of the linear classification boundary as the solutions to:

$$\hat{\boldsymbol{a}}^T \boldsymbol{y}_i - \hat{m} - \text{cost} = 0. \quad (4.31)$$

Equation (4.27) provides the classification rule to determine whether an observation belongs to $\pi_1$ or $\pi_2$. In addition, the calculated output of $\hat{\boldsymbol{a}}^T \boldsymbol{y}_i - \hat{m} - \text{cost}$ yields a value that is proportional to the distance of an observation from the linear classification boundary. The output value is simply the closest distance between an observation and the classification line, scaled by an arbitrary amount that is consistent between all observations. In other words, this output value can provide the estimate of the magnitude of process shift that is needed for the controller performance recovery strategy. A sketch of this allocation rule is given below.
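As an illustration of Equations (4.25)–(4.31), the following is a minimal NumPy sketch of the two-class LDA allocation rule; the function and argument names are illustrative assumptions.

```python
import numpy as np

def train_lda(Y1, Y2, p1=0.5, p2=0.5, c12=1.0, c21=1.0):
    """Two-class LDA from Eqs. (4.25)-(4.30).

    Y1, Y2 : (n1, d) and (n2, d) feature data for classes pi_1 and pi_2.
    p1, p2 : prior probabilities; c12 = c(1|2), c21 = c(2|1).
    """
    n1, n2 = len(Y1), len(Y2)
    S1, S2 = np.cov(Y1, rowvar=False), np.cov(Y2, rowvar=False)
    S_pooled = ((n1 - 1) * S1 + (n2 - 1) * S2) / ((n1 - 1) + (n2 - 1))  # Eq. (4.25)
    y1_bar, y2_bar = Y1.mean(axis=0), Y2.mean(axis=0)                   # Eq. (4.26)
    a_hat = np.linalg.solve(S_pooled, y1_bar - y2_bar)                  # Eq. (4.28)
    m_hat = 0.5 * a_hat @ (y1_bar + y2_bar)                             # Eq. (4.29)
    cost = np.log((c12 / c21) * (p2 / p1))                              # Eq. (4.30)
    return a_hat, m_hat, cost

def lda_score(y, a_hat, m_hat, cost):
    """Signed distance measure of Eq. (4.31): >= 0 allocates to pi_1."""
    return a_hat @ y - m_hat - cost
```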
Quadratic discriminant analysis

QDA differs from LDA in that there is no assumption that the covariance matrices of $\boldsymbol{Y}_1$ and $\boldsymbol{Y}_2$ are equivalent. Without this assumption, the classification regions are defined by quadratic functions of $\boldsymbol{y}_i$. The allocation rule of QDA is:

$$f(\boldsymbol{y}_i) = -\frac{1}{2}\boldsymbol{y}_i^T\big(\boldsymbol{S}_1^{-1} - \boldsymbol{S}_2^{-1}\big)\boldsymbol{y}_i + \big(\bar{\boldsymbol{y}}_1^T \boldsymbol{S}_1^{-1} - \bar{\boldsymbol{y}}_2^T \boldsymbol{S}_2^{-1}\big)\boldsymbol{y}_i - k - \text{cost}, \quad (4.32)$$

$$\text{class} = \begin{cases} \pi_1 & f(\boldsymbol{y}_i) \ge 0 \\ \pi_2 & f(\boldsymbol{y}_i) < 0, \end{cases} \quad (4.33)$$

where:

$$k = \frac{1}{2}\ln\left(\frac{|\boldsymbol{S}_1|}{|\boldsymbol{S}_2|}\right) + \frac{1}{2}\big(\bar{\boldsymbol{y}}_1^T \boldsymbol{S}_1^{-1} \bar{\boldsymbol{y}}_1 - \bar{\boldsymbol{y}}_2^T \boldsymbol{S}_2^{-1} \bar{\boldsymbol{y}}_2\big). \quad (4.34)$$

Here, $|\boldsymbol{S}|$ denotes the determinant of $\boldsymbol{S}$. The QDA allocation rule not only dictates the class of an observation, it also gives the equation of the quadratic classification boundary as the solutions to $f(\boldsymbol{y}_i) = 0$.

Similar to LDA, the calculated value of $f(\boldsymbol{y}_i)$ provides an estimate of the distance of an observation from the quadratic classification boundary. However, unlike LDA, this distance is not the shortest distance from the observation to the classification boundary. Instead, the value of $f(\boldsymbol{y}_i)$ corresponds to the location the observation would occupy on the classification boundary if the quadratic feature were increased or decreased in size. In other words, the output of $f(\boldsymbol{y}_i)$ is a meaningful estimate of the magnitude by which an observation departs from the classification boundary that also accounts for the curvature of the classifier. As a result, this output value can be used to estimate the magnitude of process shift that is needed in the controller performance recovery technique. A sketch of the QDA score is given below.
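For comparison with the LDA sketch, this is a minimal NumPy sketch of the QDA discriminant in Equations (4.32)–(4.34); the names are again illustrative.

```python
import numpy as np

def train_qda(Y1, Y2, p1=0.5, p2=0.5, c12=1.0, c21=1.0):
    """Two-class QDA from Eqs. (4.32)-(4.34)."""
    S1, S2 = np.cov(Y1, rowvar=False), np.cov(Y2, rowvar=False)
    S1_inv, S2_inv = np.linalg.inv(S1), np.linalg.inv(S2)
    y1_bar, y2_bar = Y1.mean(axis=0), Y2.mean(axis=0)
    # Eq. (4.34): k from the log determinant ratio and the class means
    k = 0.5 * (np.log(np.linalg.det(S1) / np.linalg.det(S2))
               + y1_bar @ S1_inv @ y1_bar - y2_bar @ S2_inv @ y2_bar)
    cost = np.log((c12 / c21) * (p2 / p1))
    return S1_inv, S2_inv, y1_bar, y2_bar, k, cost

def qda_score(y, S1_inv, S2_inv, y1_bar, y2_bar, k, cost):
    """f(y) from Eq. (4.32): f(y) >= 0 allocates to pi_1."""
    quad = -0.5 * y @ (S1_inv - S2_inv) @ y
    lin = (y1_bar @ S1_inv - y2_bar @ S2_inv) @ y
    return quad + lin - k - cost
```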
4.2 Controller performance recovery

Controller performance recovery is a control strategy in which information from process monitoring is used to improve controller performance in order to recover a process from an abnormal operating range to its nominal operating range. This section discusses how to model the process in an abnormal operating region so that the recovery controller can be tuned. In addition, this section discusses two control strategies to recover the process: the first is applicable to a process that utilizes any style of nominal feedback controller, and the second is applicable to a process utilizing MPC as the nominal feedback controller.

A general control objective of a process is given in the following optimization problem:

$$\begin{aligned}
\underset{\boldsymbol{u}}{\text{minimize}} \quad & \varepsilon(\boldsymbol{u}) = \operatorname{abs}(\boldsymbol{x} - \boldsymbol{x}_{sp}), & (4.35) \\
\text{subject to} \quad & \boldsymbol{x}_{lower} \le \boldsymbol{x} \le \boldsymbol{x}_{upper}, & (4.36) \\
& \boldsymbol{y}_{lower} \le \boldsymbol{y} \le \boldsymbol{y}_{upper}, & (4.37) \\
& \boldsymbol{u}_{lower} \le \boldsymbol{u} \le \boldsymbol{u}_{upper}, & (4.38) \\
& \begin{bmatrix} \dot{\boldsymbol{x}} \\ \boldsymbol{y} \end{bmatrix} = f(\boldsymbol{x}, \boldsymbol{u}, \boldsymbol{w}), & (4.39)
\end{aligned}$$

where $\boldsymbol{x}$ is a vector of process states to control; $\boldsymbol{y}$ is a vector of measured process outputs; $\boldsymbol{u}$ is a vector of manipulated process input variables; the subscript $sp$ represents a set-point; and the subscripts $upper$ and $lower$ represent the upper and lower constraint limits. The process outputs are related to the inputs in Equation (4.39). Here, $\dot{\boldsymbol{x}}$ is the derivative of $\boldsymbol{x}$, and $\boldsymbol{w}$ is a vector of measured input disturbances to the process.

The process dynamics of $f(\boldsymbol{x}, \boldsymbol{u}, \boldsymbol{w})$ are complex due to the nonlinearity of the process. Therefore, it is not always possible for a linear nominal controller to maintain high performance, in the sense of best minimizing Equation (4.35), in the event the process shifts away from the nominal, locally linear, operating region. As a result, a control strategy is needed to recover the process to the nominal operating region.

MPC is used in both controller performance recovery techniques as the recovery controller. In the case where any style of nominal feedback controller is used, MPC externally overrides the existing controller; conversely, in the case where the existing controller is already MPC, the process monitoring information is used as a new constraint variable in the existing MPC. MPC is an ideal choice because it is a multivariable controller that can handle constraints, and it has achieved wide acceptance in industry. MPC also solves an optimization problem in order to choose the control actions that best minimize prediction residuals over a finite prediction horizon. For operators in industry, MPC is reasonably easy to maintain and understand, and tools exist in most modern DCS to implement MPC with little effort. The principles and design of an unconstrained MPC formulation are presented in Appendix D. For more detail on MPC, including formulations involving constraints, see Camacho and Bordons (2007).

In addition to using MPC as the recovery controller in the event of a process shift leading to poor nominal controller performance, expert rules are also proposed in this section. Expert rules are only needed to recover a process that has shifted into a severely abnormal operating region where even the MPC recovery controller performs poorly. In this case, the objective of the expert rules is only to recover the controller performance of the MPC recovery controller, thereby enabling the MPC recovery controller to recover the process to the nominal operating region.

4.2.1 Modeling the abnormal operating region

When modeling the abnormal operating region, we use the output from the process monitoring method as the CV used to monitor the process recovery. This CV contains a measure of the magnitude of the process shift into a specific abnormal operating region. Since this CV is a special transformation of measured process outputs, it is appropriate to use in closed-loop feedback.

The main assumption when modeling the abnormal operating region is that this region is locally linear. For this assumption to be valid, the control boundary for this region must be selected so that reasonably linear process behaviour can be expected between observations from one point in the region to another. Figure 4.6 shows a 3D nonlinear manifold of process operation depicted by three arbitrary feature variables. The figure shows the locally linear regions of the nonlinear manifold depicted by planes tangent to the manifold, where the nominal and recovery controllers are linearized for operation. At the edges of the abnormal operating region, we see that on one end lies the nominal operating region, and on the other end is process operation that is considered to be severely abnormal.

[Figure 4.6: 3D manifold of process operation (feature variables y1, y2, y3) with two planes showing the linearized regions in which the nominal and recovery controllers are designed to operate; the nominal, abnormal, and severely abnormal operating ranges are labelled]

Process recovery from a severely abnormal operating region is challenging and probably best achieved using expert rules that mimic an operator's best actions in this event. However, discrete control actions from these rules are not mathematically optimal; thus the expert rules should focus only on recovering the process to the abnormal operating region for which the continuous recovery controller is designed.
In Figure 4.6, we see that the assumption of local linearity breaks down at the edges of the planes due to the nonlinearity of the manifold; thus, the size of these regions is dictated by a predefined allowable tolerance for process-model mismatch and controller performance decay.

Once the boundaries of the abnormal operating region have been decided, operating data that lie within the abnormal operating region are needed to model the process in this region. However, it is unlikely that abundant operating data will be available within this region, as the process would not have been allowed to operate abnormally for a long time. Thus, it may be hard to find enough operating data to train an accurate process model in this region.

The strategy to develop the process model is to ideally find at least two sets of data where the operators placed the control in open-loop to manually recover the process from an abnormal operating region. At least one data-set is used to train a process model for the abnormal operating region, and one data-set should be set aside as a test set to validate the model. For each of the training data-sets, train a first-order plus time delay (FOPTD) or second-order plus time delay (SOPTD) model of the process response for each process input-output pair. There are many forms of process models we could use; however, FOPTD and SOPTD are chosen in this work because many chemical processes can be adequately modelled with these forms, and most modern DCS allow for these forms of the process model in the built-in MPC.

[Figure 4.7: Block diagram for the input-output relationship of a modelled process, with input U(s) passing through the transfer function Gp(s) to produce output Y(s)]

Figure 4.7 illustrates the block diagram for a process with output $Y(s)$ and input $U(s)$ that is modelled using a transfer function $G_p(s)$, where $Y(s)$ is the process output, $y(t)$, represented in the frequency domain, and $U(s)$ is the process input, $u(t)$, represented in the frequency domain. The process transfer function, $G_p(s)$, for a FOPTD process is of the form:

$$G_p(s) = \frac{Y(s)}{U(s)} = \frac{K_p}{\tau_1 s + 1}\exp(-\theta s), \quad (4.40)$$

where $K_p$ is the gain of the response in $Y(s)$ for a change in $U(s)$, $\tau_1$ is the time constant, and $\theta$ is the time delay of the process response. The Laplace variable, $s$, is known as the operator variable in the frequency domain.

A seldom used form of the FOPTD model includes lead dynamics in the numerator of the transfer function. A FOPTD model with lead dynamics is of the form:

$$G_p(s) = \frac{Y(s)}{U(s)} = K_p\,\frac{\tau_n s + 1}{\tau_1 s + 1}\exp(-\theta s), \quad (4.41)$$

where $\tau_n$ is the time constant of the first-order numerator dynamics. It is unlikely this transfer function is needed for most chemical processes. However, if the process response is fast, the numerator dynamics may help in modeling the transfer function.

A SOPTD model usually arises whenever two first-order processes are connected in series. $G_p(s)$ for a SOPTD process is of the form:

$$G_p(s) = \frac{Y(s)}{U(s)} = \frac{K_p}{(\tau_1 s + 1)(\tau_2 s + 1)}\exp(-\theta s), \quad (4.42)$$

where $\tau_1$ and $\tau_2$ are the time constants of the two first-order transfer functions making up the SOPTD transfer function. For chemical processes, it may be necessary to include dynamics in the numerator of the SOPTD model, particularly when the process data show an inverse response. The SOPTD model with numerator dynamics is:

$$G_p(s) = \frac{Y(s)}{U(s)} = K_p\,\frac{\tau_n s + 1}{(\tau_1 s + 1)(\tau_2 s + 1)}\exp(-\theta s). \quad (4.43)$$

The step response implied by the FOPTD form is sketched below.
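To make the roles of $K_p$, $\tau_1$, and $\theta$ concrete, the following is a small sketch that evaluates the closed-form unit step response of the FOPTD form in Equation (4.40); the function name and the example parameter values are illustrative assumptions.

```python
import numpy as np

def foptd_step_response(Kp, tau1, theta, t):
    """Unit step response of Gp(s) = Kp/(tau1*s + 1) * exp(-theta*s).

    For a first-order lag with delay, the step response has the closed form
    y(t) = Kp * (1 - exp(-(t - theta)/tau1)) for t >= theta, and 0 before.
    """
    t_shift = np.maximum(t - theta, 0.0)
    return np.where(t >= theta, Kp * (1.0 - np.exp(-t_shift / tau1)), 0.0)

# Example: a 10-minute time constant and a 2-minute delay, sampled each minute
t = np.arange(0, 61, 1.0)
y = foptd_step_response(Kp=1.5, tau1=10.0, theta=2.0, t=t)
```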
To identify any of the process models in Equations (4.40)–(4.43), a continuous-time model identification technique should be applied to sampled data. Discrete-time model identification techniques, such as output-error modeling, are not considered because these models would need to be converted to continuous time, and it is not guaranteed that a discrete-time model can be converted to a continuous-time model with a specific pre-determined form (Bequette, 1998; Ljung, 2007b). Garnier et al. (2003) and Ljung (2009) discuss methods to identify a continuous-time model from discrete-time data. One of the most common continuous-time model identification methods is the prediction error method. This technique minimizes the prediction errors when fitting a model and involves an iterative numerical search (Ljung and Wills, 2008). The prediction error method for continuous-time model identification is available in the MATLAB System Identification Toolbox (Ljung, 2007a).

The procedure to find the best process model of an abnormal operating region, and its model parameters, is given below as a guideline and is to be repeated for each input-output pair, $u(t)$ and $y(t)$.

Step 1: Identify a set of training data where the process is in the abnormal operating region and $y(t)$ has been sufficiently excited by changes to $u(t)$ that are uncorrelated with the other process inputs.

Step 2: Determine the parameters for each of the four continuous-time models in Equations (4.40)–(4.43) using the MATLAB System Identification Toolbox.

Step 3: Compute the residuals of each model, and evaluate whether increasing the order of the model increases the prediction accuracy at a statistical significance level of $\alpha$. Choose the model of lowest order that gives the best fit. A more detailed discussion on model selection is given in Appendix E.

Step 4: Evaluate the overall model on the test data-set to see how well it will perform in practice.

Once the process model is satisfactory, it can be used as the predictive model in the MPC controller. A simple least-squares alternative to Step 2 is sketched below.
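Where the MATLAB toolbox is unavailable, Step 2 can be approximated by fitting the FOPTD step response directly with nonlinear least squares. This is a minimal sketch, assuming step-test data and reusing the closed-form response from the previous sketch; it is not the prediction error method described above, and the initial guesses are heuristic assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_foptd(t, y, u_step=1.0):
    """Fit Kp, tau1, and theta of a FOPTD model, Eq. (4.40), to step-test data.

    t : (N,) sample times; y : (N,) measured response to a step of size u_step.
    """
    def response(t, Kp, tau1, theta):
        t_shift = np.maximum(t - theta, 0.0)
        return np.where(t >= theta,
                        u_step * Kp * (1.0 - np.exp(-t_shift / tau1)), 0.0)

    # rough initial guesses: final value for Kp, a third of the horizon for tau1
    p0 = [y[-1] / u_step, (t[-1] - t[0]) / 3.0, 0.0]
    (Kp, tau1, theta), _ = curve_fit(response, t, y, p0=p0,
                                     bounds=([-np.inf, 1e-6, 0.0],
                                             [np.inf, np.inf, t[-1]]))
    return Kp, tau1, theta
```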
4.2.2 Override MPC design

An override MPC is used when the nominal controller is any type of feedback controller except MPC. The override controller is tuned for a specific abnormal range of process operation. When the measure of process shift output from the process monitoring method, given by the output of the LDA or QDA classifier, indicates that the process shift into this region is high, the controller will try to recover the process shift to a SP of zero by overriding the process inputs that are manipulated by the nominal controller. Figure 4.8 illustrates the override control strategy.

[Figure 4.8: The override MPC takes over the MVs of the existing nominal controller through high (>) and low (<) signal selects to recover the process after a process shift; the process monitoring block supplies the measure of process shift that the override MPC drives to its set-point]

Figure 4.8 shows that the override controller takes control of the MVs using high (>) and low (<) signal selects. High selects are used when the optimal action to recover the process for a particular MV is to increase the variable, and vice versa for the low selects. The implementation of the override MPC controller is simple: the only CV is the measure of process shift, there are as many MVs as there are degrees of freedom needed to recover the process, and there are no constraint variables.

In the MPC design, the penalty-on-move weights should be set as low as possible because the main objective of the override control is fast recovery of the process: the fastest recovery possible with MPC occurs when the cost function includes only the prediction errors. When the cost associated with changes to the MVs is included, the time for the process to recover becomes suboptimal. However, some cost associated with the control moves is necessary to prevent excessive wear to the control elements in a recovery situation.

With the override MPC configuration, there is the possibility that switching transients from the high and low selects may introduce instability to the system. In particular, if the disturbance at the root cause of a process shift persists, the recovery controller may keep the process at the boundary between the abnormal and nominal operating regions. In this case, dwell-time switching logic may be used to prevent rapid switching between control signals. In this logic, a pre-specified length of time, known as the dwell time, must pass before the signal can switch from one controller to another (Morse, 1993). However, dwell-time switching has some serious disadvantages; for example, controller performance could severely decay within the dwell time. An alternative to dwell-time switching is hysteresis-based switching. Hysteresis switching requires that the signal to be switched exceed the switching threshold by a predefined value before switching occurs (Hespanha et al., 2003). However, hysteresis switching may introduce destabilizing transients resulting from the signal to the control elements changing abruptly.

Given that the switching logic of high and low signal selects inherently results in reasonably smooth switching, it is unlikely that any effort is required to mediate instabilities resulting from switching control signals to the control elements in the override MPC design. This continuity exists because both signals must be nearly identical in magnitude as switching occurs, with only the signal that is slightly higher or lower than the competing signal being selected by the high or low select functions, respectively. Nevertheless, a quick simulation should be performed using the process models from the nominal and override controllers to determine whether any instabilities result from switching. Both switching safeguards are sketched below.
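For illustration, the following is a minimal sketch of the two switching safeguards discussed above, dwell-time and hysteresis, applied to the choice between a nominal and an override control signal; the class structure and the threshold values are illustrative assumptions, not a prescribed DCS implementation.

```python
class SwitchGuard:
    """Decide when to hand control between two competing control signals.

    dwell      : minimum number of scans between switches (dwell-time logic).
    hysteresis : margin by which the shift measure must clear the threshold.
    """
    def __init__(self, threshold=0.0, dwell=10, hysteresis=0.05):
        self.threshold = threshold
        self.dwell = dwell
        self.hysteresis = hysteresis
        self.use_override = False
        self.scans_since_switch = 0

    def select(self, shift_measure, u_nominal, u_override):
        self.scans_since_switch += 1
        if self.scans_since_switch >= self.dwell:
            # hysteresis: the measure must clear the band around the threshold
            if (not self.use_override
                    and shift_measure > self.threshold + self.hysteresis):
                self.use_override, self.scans_since_switch = True, 0
            elif (self.use_override
                    and shift_measure < self.threshold - self.hysteresis):
                self.use_override, self.scans_since_switch = False, 0
        return u_override if self.use_override else u_nominal
```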
In the event that the magnitude of process shift is large and the process is considered to be in a severely abnormal and nonlinear operating region, expert rules that mimic an operator's best actions may be needed to help recover the process. The expert system acts as the final override to the MVs and has the ability to override any previous control decisions to the MVs. Figure 4.9 illustrates the expert system.

[Figure 4.9: Expert rules override the MVs if the process is operating in a severely abnormal region]

The objective of the expert system is only to recover the process from the severely abnormal operating region into the less-severe abnormal operating region that the override MPC is designed for. Likewise, the expert rules should be designed with hysteresis to ensure that the process is recovered far enough away from the severely abnormal operating region that the override MPC is unlikely to be unable to complete the recovery of the process.

With the override MPC control strategy commissioned, it is unlikely that a process will shift to a severely abnormal operating region. However, large abrupt disturbances are possible in some processes, and these disturbances may rapidly shift a process into severely abnormal operation. Therefore, an expert system provides an extra layer of security in the process to maintain safe and stable operation following an extraordinary event.

4.2.3 Modifications to existing MPC

If the existing controller in the process is already MPC, then the controller can be modified to incorporate the measure of process shift as a constraint variable. When the constraint is violated, the MPC switches to the models for the constraint variable and executes corrective actions to recover the process to nominal operation. The structure of the modified MPC is shown in Figure 4.10.

[Figure 4.10: Existing MPC controller is modified with a constraint variable (AV) carrying the measure of process shift to allow the MPC to recover the process after a process shift]

When comparing Figure 4.10 to Figure 4.8, we see that it is more efficient and intuitive to implement the control strategy when the nominal controller is MPC. With this method, there is no need to license a new MPC. Furthermore, the MPC inherently switches the models to the constraint model set after a process shift, and MPC in modern DCS is designed to avoid the risk of creating process instability from internal model switching. Figure 4.10 also illustrates the expert override control strategy, which is identical to the override strategy depicted in Figure 4.9. In this figure, however, MV and CV can represent any number of manipulated or controlled variables since MPC is a multiple-input, multiple-output controller.

Chapter 5

Case Study of SAG Mill Overload

The objective of this thesis is to develop a control strategy to manage processes that are operating outside of their nominal operating region; this was achieved in Chapter 4. This chapter provides an industrial case study of the application of the control strategy for the detection of a process shift to an abnormal operating region, and the recovery of the process from the abnormal operating region back to the nominal operating region.

This case study is conducted on the SAG mill at a mineral processing facility in British Columbia. The operators at this facility run the SAG mill conservatively because of their concern that the mill may overload. They find the early detection of overloading to be extremely difficult and are therefore not able to react until the situation has deteriorated significantly. Furthermore, the operators' corrective actions to recover the process result in a sub-optimal recovery and a substantial opportunity loss in mill performance. In this chapter, we show how these issues can be addressed using the control strategy proposed in Chapter 4.

5.1 Description of the process

A detailed overview of the SAG mill process and the overload process abnormality is given in Chapter 2.3. Figure 5.1 concisely shows the SAG mill circuit at the mineral processing facility, with the key process variables utilized by the existing MPC nominal control strategy labelled.
The feed ore and water flow rates are the MVs, and the constraint variables, labelled AV in the figure, are the motor powers, remaining three bearing oil pressure measurements, pebble recycle, and sound level. Finally, the disturbance variable (DV) is a measurement from a visual size analyzer of the fraction of ore particles in the feed that are greater than two inches. Here, a particle size of two inches is of interest because the pebble ports on the discharge grate allow two inch particles to pass, thus particle sizes greater than two inches have a longer residence time in the mill and significantly a↵ects the mill dynamics. Figure 5.1 shows only the most relevant process variables to the operation of the mill using MPC. There are numerous additional measurements monitoring the condition of the mill such as bearing temperatures and vibration measurements. Additionally, the mineral processing facility lacks important measurements such as discharge flow to the pump box; however, some missing measurements can be inferred from available measurements. Based upon the nature of SAG mill overloading, we believe a process monitoring method that utilizes available and inferred measurements can accurately detect the onset of overloading and also provide an estimate of the severity of the overload. The selection of process variables for process monitoring is discussed in Chapter 5.2.1.  5.1.1  Control objective  The objective of the nominal control strategy at the mineral processing facility in British Columbia is to control the feed-end west bearing pressure and charge density to target values. The feed-end west bearing pressure is the highest bearing pressure because the load is distributed mostly towards the feed chute and, during mill rotation, is lifted up onto the west side of the mill. By controlling this bearing pressure to a target value, it is expected that the throughput of the mill is maximized. This is true only when the mill is not overloaded to any extent. Charge density is used as the second 77  CONTROLLED VARIABLES (CV’s) CV1. FEED END WEST BEARING OIL PRESSURE CV2. SAG MILL DENSITY  MANIPULATED VARIABLES (MV’s) MV1. ORE FEED RATE MV2. FEED WATER  CONSTRAINT VARIABLES (AV’s) AV1. MOTOR A POWER AV2. MOTOR B POWER AV3. PEBBLE RECYCLE CONVEYOR LOAD AV4. SOUND LEVEL AV5. FEED END EAST BEARING OIL PRESSURE AV6. DISCHARGE END EAST BEARING OIL PRESSURE AV7. DISCHARGE END WEST BEARING OIL PRESSURE  DISTURBANCE VARIABLES (DV’s) DV1. ORE PARTICLE SIZE > 2"  NOTATION C. CONTROLLER D. DENSITY F. FLOW I. INDICATOR J. POWER M. MOTOR P. PRESSURE T. TRANSMITTER W. WEIGHT X. SOUND  AV3 conveyer  conveyer  FIC  XT  FEED WATER  DISCHARGE-END TRUNNION BEARING OIL (EAST / WEST) PT  AV6  PT  AV7  FT SAG MILL  WT  f(x)  DI CV2  DENSITY CALCULATION  DIC  CASCADE  FIC MV2  er  M MOTOR B  e ye r  ey  DV1  AV4  JT  nv  AV2  WT  co  78  co n v  PARTICLE SIZE ANALYZER  CV1  PT  AV5  PT  FEED-END TRUNNION BEARING OIL (EAST / WEST)  screen  M MOTOR A JT  AV1  TO PUMP BOX  Figure 5.1: Overview of existing SAG mill control at the mineral processing facility in British Columbia  CV to ensure the slurry water content is sufficient to allow adequate discharge and to prevent excessive cohesion of clay minerals that may lead to overloading. However, the density measurement is seldom accurate because the calculation is rudimentary and based upon the feed and water rate measured at a single sample instance in the past. 
Although not discussed in this work, a more practical approach to SAG mill control would be to maximize the throughput of the mill. To maximize throughput, the discharge flow to the pump box would need to be measured for use as a CV in the MPC.

The input and output variables of the process shown in Figure 5.1 are listed in Table 5.1 with their proper control notation: $x$ is a state variable to be controlled; $u$ is a manipulated input variable; $w$ is an input disturbance; and $\boldsymbol{y}$ is a vector of output measurements. In the case of a SAG mill overload, the constraint variables will not be violated until the overload becomes extremely severe. Instead, many of these measurements are included in the process monitoring technique in order to estimate the magnitude of the mill overload. PM1 denotes the process monitoring output variable in the table, and it represents the estimated mill overload magnitude as a single variable.

Table 5.1: Input and output variables critical to the recovery control strategy

| Outputs | Variable | Inputs | Variable |
| --- | --- | --- | --- |
| CV1 | $x_1$ | MV1 | $u_1$ |
| CV2 | $x_2$ | MV2 | $u_2$ |
| PM1 | $x_3$ | DV1 | $w_1$ |
| AV1–7 | $\boldsymbol{y}$ | | |

The control objective of the SAG mill process is expressed in the following optimization problem:

$$\begin{aligned}
\underset{u_1, u_2}{\text{minimize}} \quad & \varepsilon(u_1, u_2) = \operatorname{abs}(x_1 - x_{1,sp}) + \operatorname{abs}(x_2 - x_{2,sp}), & (5.1) \\
\text{subject to} \quad & x_{1,lower} \le x_1 \le x_{1,upper}, & (5.2) \\
& x_{2,lower} \le x_2 \le x_{2,upper}, & (5.3) \\
& \boldsymbol{y}_{lower} \le \boldsymbol{y} \le \boldsymbol{y}_{upper}, & (5.4) \\
& x_3 \le x_{3,upper}, & (5.5) \\
& \begin{bmatrix} \dot{\boldsymbol{x}} \\ \boldsymbol{y} \end{bmatrix} = f(\boldsymbol{x}, \boldsymbol{u}, \boldsymbol{w}). & (5.6)
\end{aligned}$$

Here, the control objective is formulated to minimize the absolute value of the error between the control variables, $x_1$ and $x_2$, and their SPs, $x_{1,sp}$ and $x_{2,sp}$. In addition, the process monitoring output variable, $x_3$, has an upper constraint limit. If the constraint in Equation (5.5) is violated, then the process is considered to be overloaded and it is desirable to recover controller performance; otherwise, we will be unable to control $x_1$ and $x_2$ to their target SPs, resulting in poor minimization of Equation (5.1) and possible violations of the constraints in Equations (5.2) to (5.4).

The greatest challenge in satisfying Equations (5.1) to (5.6) is the equality constraint imposed in Equation (5.6). Because the dynamics of $f(\boldsymbol{x}, \boldsymbol{u}, \boldsymbol{w})$ are nonlinear, the nominal linear controller used to satisfy the SAG mill control objective will be insufficient in the event of a mill overload. In particular, a process shift from the nominal operating region to the overload operating region is nonlinear, and the nominal controller is not tuned for the overload operating region. In Chapter 5.4, we introduce Equation (5.5) as a constraint variable in the existing nominal MPC controller for the SAG mill. When the upper limit of Equation (5.5) is violated, the MPC switches to the constraint variable models, which are tuned specifically for the overload operating region, in order to recover controller performance.

5.2 Process monitoring

The first task in the application of the overall control strategy to the SAG mill is to train the process monitoring method to be sensitive to the overloaded operating region. The SAG mill overload is a complex and nonlinear abnormal situation to detect, because the onset of overloading can result from any of several phenomena, each affecting the process outputs in different ways. In addition, the effects of overloading in the early stages are subtle, with only small deviations from the linear characteristic of the nominal range of operation.
In this section, we show that traditional PCA using $Q$ and $T^2$, Method 1, is unable to detect the onset of overloading. Next, we show that PCA with LDA classification, Method 2, successfully detects the onset of overloading when monitoring only a specific selection of process variables. Furthermore, PCA Method 2 was implemented on site at the mineral processing facility in British Columbia in January 2013. Finally, we also show that SLLEP with LDA classification is able to successfully detect the onset of overloading regardless of the selection of process variables, and its classification results are superior to those of PCA Method 2. Unlike PCA, SLLEP was not implemented on site due to limitations of the MPC tool available at the mineral processing facility. The results from the implementation of PCA demonstrate the first successful implementation of a process monitoring method at a mineral processing facility to both detect SAG mill overloading and estimate the magnitude of the overload.

5.2.1 Selection of measured and inferred process variables

Numerous input and output variables were considered when training the feature extraction methods for process monitoring. There are nearly 100 measurements available for the SAG mill. However, only a handful of these variables are needed to estimate a process overload; the other process variables are either not sensitive to the changes within the mill relating to overloading, or they provide redundant information that would bias the feature extraction results.

The candidate process variables considered for process monitoring are listed in Table 5.2. This table provides a summary of the original units, filtering time constant, upper and lower ranges for scaling, and the standard deviation of the scaled variables. All 10 candidate variables are scaled from 0 to 1, and the upper and lower ranges for scaling are manually selected based upon the following heuristics: the variables are scaled considering their upper and lower operational constraints at the mineral processing facility, and the scale range is adjusted to create relatively similar standard deviations between all candidate variables.

Table 5.2: Properties of the candidate process variables

| Process Variable | Units | Lower Range | Upper Range | Filter Time Constant (min) | Standard Deviation |
| --- | --- | --- | --- | --- | --- |
| Bearing Pressure | psi | 850 | 1200 | 1 | 0.1316 |
| Motor Power | kW | 4000 | 6200 | 1 | 0.0701 |
| Sound Level | n/a | -40 | 40 | 1 | 0.1099 |
| Center of Gravity | n/a | 0.459 | 0.464 | 4 | 0.0907 |
| Difference of Slopes | n/a | -0.0027 | 0.0027 | 11 | 0.1135 |
| Recycle | STPH | 0 | 600 | 1 | 0.0599 |
| Discharge to Pump Box | STPH | 2200 | 4800 | 1 | 0.0870 |
| Specific Energy | kW/ton | 1 | 12 | 1 | 0.1291 |
| Vibration | n/a | -0.15 | 0.15 | 5 | 0.1132 |
| Size > 2 inch | % | 0 | 100 | 4 | 0.1017 |

The filtering time constants listed in Table 5.2 are also determined heuristically. The filter time constant needs to be as small as possible to minimize the detection delay of the overload in process monitoring; conversely, it needs to be large enough to filter stochastic noise from the data in order to help minimize false alarms. A first-order filter is used to filter the data and has the difference equation:

$$y[k] = \alpha_f\, y[k-1] + (1 - \alpha_f)\, x[k], \quad (5.7)$$

where $k$ is the sample instance, $x$ is the filter input, $y$ is the filter output, and $\alpha_f$ is given by the following expression:

$$\alpha_f = \exp\left(-\frac{\Delta t}{\tau}\right). \quad (5.8)$$

Here, $\tau$ is the filter time constant, and $\Delta t$ is the elapsed time in minutes between samples. For the SAG mill case study, we believe it is ideal to detect overloading within at most 20 minutes; otherwise, the mill may severely overload, resulting in excessive stress on the equipment and poor process economy. A sketch of the scaling and filtering is given below.
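The scaling and first-order filtering applied to the candidate variables reduce to a few lines of NumPy; this is a minimal sketch of Equations (5.7)–(5.8) under the assumption of a fixed sample interval, with illustrative function names.

```python
import numpy as np

def scale(x, lower, upper):
    """Scale a raw measurement to the 0-1 range used in Table 5.2."""
    return (np.asarray(x, dtype=float) - lower) / (upper - lower)

def first_order_filter(x, tau, dt=1.0):
    """First-order filter of Eqs. (5.7)-(5.8).

    x   : sequence of scaled measurements.
    tau : filter time constant (minutes); dt : sample interval (minutes).
    """
    alpha_f = np.exp(-dt / tau)
    y = np.empty(len(x))
    y[0] = x[0]                                  # initialize at the first sample
    for k in range(1, len(x)):
        y[k] = alpha_f * y[k - 1] + (1 - alpha_f) * x[k]
    return y

# Example: bearing pressure scaled with the Table 5.2 range, filtered at 1 min
bp = first_order_filter(scale([900, 950, 1000], 850, 1200), tau=1.0, dt=1.0)
```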
Although some variables may have a large detection delay due to filtering, the advantage of a multivariate process monitoring method is that the variables complement each other, and the disadvantages of one variable, such as detection delay, can be offset by other variables with low detection delay. Similarly, no single variable alone is suitable for monitoring SAG mill overloading; thus, several variables are needed to provide sufficient evidence of an overload.

It is important to note in Table 5.2 that there is only one input candidate variable, the feed rate with size greater than two inches; the rest are process outputs. Time-plot studies of the data revealed that the other process inputs, such as the total feed rate and water flow, could take any arbitrary value during an overload. In other words, there is a poor correlation between mill overloading and these inputs. This is expected because the overload is the result of complex changes to the internal states of the mill that need to be indirectly measured from the process outputs. Each of the 10 candidate variables and their relevance to a SAG mill overload are briefly discussed in the following sections.

Bearing pressure, motor power, difference of slopes

Bearing pressure and motor power are two variables that are closely monitored by operators when they are concerned about an overload. During an overload, they expect to see high values for bearing pressure and motor power. Furthermore, they also expect bearing pressure to continually increase as a mill overloads, while the motor power may not increase, or may even decrease, during severe overloading. Therefore, based upon operator experience, we believe that the difference in the rates of change between bearing pressure and motor power is also important to monitor. A variable transformation of the scaled bearing pressure and scaled motor power is used to create the difference of slopes variable:

$$\text{Difference of Slopes} = \frac{\Delta \text{BP}}{\Delta t} - \frac{\Delta \text{MP}}{\Delta t}, \quad (5.9)$$

where the slopes are determined from first-order finite difference calculations:

$$\frac{\Delta \text{BP}}{\Delta t} = \frac{\text{BP}[k] - \text{BP}[k-1]}{\Delta t}, \quad (5.10)$$

$$\frac{\Delta \text{MP}}{\Delta t} = \frac{\text{MP}[k] - \text{MP}[k-1]}{\Delta t}. \quad (5.11)$$

Here, BP is the feed-end west bearing pressure, MP is the motor power for motor A, $k$ is the sample instance, and $\Delta t$ is the time between sample instances, chosen to be one minute; one minute is reasonable since the time constant of the process is estimated to be around 10 minutes.

Sound level

Another important variable to monitor is the sound level. Experienced operators insist that an overloaded mill is quieter than normal. Moreover, the review of SAG mill overloading in Chapter 2.3 identified that mill acoustics should decrease during an overload as a result of the reduced potential energy, or dampening, of the cataracting charge. A time plot of the sound level measurement of the SAG mill revealed that the sound level has a moving average, likely a result of changes in the steel ball charge and ore type. The condition of the mill liner and auxiliary equipment where the SAG mill is located may also affect the sound level. Consequently, the sound level was transformed into the difference between a 0.4-day moving average of the sound level and the raw measurement filtered at 40 seconds; both transformations are sketched below.
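The two variable transformations just described, the difference of slopes of Equations (5.9)–(5.11) and the sound-level deviation from its moving average, can be sketched as follows; the pandas rolling-mean window and the function names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def difference_of_slopes(bp, mp, dt=1.0):
    """Eqs. (5.9)-(5.11): slope of scaled bearing pressure minus slope of
    scaled motor power, from one-minute finite differences."""
    bp, mp = np.asarray(bp, dtype=float), np.asarray(mp, dtype=float)
    return (np.diff(bp) - np.diff(mp)) / dt

def sound_level_indicator(raw_sound, dt_minutes=2.0):
    """Deviation of the fast-filtered sound level from a 0.4-day moving average;
    formulated so that a quiet (possibly overloaded) mill gives a high value."""
    s = pd.Series(raw_sound, dtype=float)
    window = int(0.4 * 24 * 60 / dt_minutes)            # 0.4 days of samples
    moving_avg = s.rolling(window, min_periods=1).mean()
    # ~40 s first-order filter expressed as an exponentially weighted mean
    alpha = 1.0 - np.exp(-dt_minutes * 60.0 / 40.0)
    fast = s.ewm(alpha=alpha, adjust=False).mean()
    return (moving_avg - fast).to_numpy()
```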
A time plot of the raw sound level and its moving average is shown in Figure A.1 in Appendix A.1, where the sample instances when the mill was overloaded are highlighted to show the effect of overloading on the raw sound level compared to the moving average.

Center of gravity

The review in Chapter 2.3 also revealed that load distribution may be another key variable for process monitoring. If the load distribution shifts to the east side of the mill, where the charge is not lifted, it may indicate that ore or slurry is accumulating on this side and reducing the grinding efficiency of the mill. To measure load distribution, the change of total mass between the east and west sides of the mill can be determined from the bearing pressure measurements. The center of gravity of the mill can be used to estimate the load distribution, and it is calculated from the expression:

$$\text{Center of Gravity} = \frac{\text{FE}_{\text{East}}\,\text{BP} + \text{DE}_{\text{East}}\,\text{BP}}{\text{FE}_{\text{West}}\,\text{BP} + \text{DE}_{\text{West}}\,\text{BP} + \text{FE}_{\text{East}}\,\text{BP} + \text{DE}_{\text{East}}\,\text{BP}}, \quad (5.12)$$

where FE is the feed-end of the mill, DE is the discharge-end of the mill, and BP refers to the bearing pressure at the specified location of the SAG mill. Since the total force on the bearings is largely governed by the weight of the mill itself, it is expected that the load distribution estimate will not be sensitive to small changes in charge distribution inside the mill. However, the estimate can be appropriately scaled to emphasize the expected subtle changes in this measurement.

Recycle rate, discharge to pump box

During an overload, it is also expected that the recycle flow and the discharge to the pump box will decrease even though bearing pressure increases and the feed rate may be constant. The discharge flow to the pump box is inferred from the feed rate, water flow addition, and change of tonnage in the mill, calculated from the mass balance:

$$\frac{\Delta m_{charge}}{\Delta t} = (\text{Feed} + \text{Water} + \text{Recycle}) - (\text{Discharge to Pump Box} + \text{Recycle}), \quad (5.13)$$

and rearranged to solve for the discharge to the pump box:

$$\text{Discharge to Pump Box} = \text{Feed} + \text{Water} - \frac{\Delta m_{charge}}{\Delta t}. \quad (5.14)$$

Here, feed is given in short tons per hour (STPH), and water is also given in STPH after being converted from gallons per minute (GPM), where 1 GPM is 0.25 STPH of water at 10 °C. The discharge to pump box variable is also given in STPH, and $\Delta m_{charge}/\Delta t$ is the estimated change in the mass of the charge over a time interval of one minute. At the mineral processing facility in British Columbia, the mass of the total charge in the mill is an available measurement inferred from the four bearing pressure measurements, with a scaling and bias determined from a calibration performed on the empty mill.

Specific energy

Specific energy is a measure of the amount of energy consumed per ton of ore processed in the mill. In the midst of an overload, the specific energy of the charge should decrease because mass is accumulating while motor power does not increase. Specific energy is calculated from the expression:

$$\text{Specific Energy} = \frac{\text{Total Motor Power}}{\text{Total Bearing Pressure}}, \quad (5.15)$$

where the total motor power is the combination of the motor power from both motors A and B, and the total bearing pressure is the combination of all four bearing cells. These inferred variables are sketched in code below.
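The inferred variables of Equations (5.12)–(5.15) reduce to simple arithmetic on the available measurements; the following is a minimal sketch with illustrative argument names. The unit conversion of the charge-mass term is an assumption, since Equation (5.14) leaves the per-minute to per-hour conversion implicit.

```python
def center_of_gravity(fe_west, fe_east, de_west, de_east):
    """Eq. (5.12): fraction of total bearing force carried on the east side."""
    return (fe_east + de_east) / (fe_west + de_west + fe_east + de_east)

def discharge_to_pump_box(feed_stph, water_gpm, dm_charge_per_min):
    """Eq. (5.14): inferred discharge flow (STPH).

    water_gpm is converted at roughly 0.25 STPH per GPM (water at 10 C);
    dm_charge_per_min is the change in charge mass per minute (short tons),
    converted here to an hourly rate.
    """
    return feed_stph + 0.25 * water_gpm - 60.0 * dm_charge_per_min

def specific_energy(motor_a_kw, motor_b_kw, total_bearing_pressure):
    """Eq. (5.15): total motor power over total bearing pressure."""
    return (motor_a_kw + motor_b_kw) / total_bearing_pressure
```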
Bearing vibration, feed rate of size greater than two inches

The final two variables considered for monitoring are bearing vibration and the feed rate with a size greater than two inches. The effects of overloading on bearing vibration are poorly understood in the industry. However, we believe bearing vibrations may initially increase upon overloading due to the increased overall inertia and total kinetic energy of the charge as the mass of the charge increases. Eventually, vibrations may decrease upon significant overloading as the impact energy of the cataracting charge decreases. Even though feed with a particle size greater than two inches is considered as a candidate process variable, a time plot of this variable shows no easily observable correlation between an overload and the particle size disturbance. Time plots of each of the scaled process variables are provided in the following section.

Visualization of the training data

A time plot of the training data-set for PCA and SLLEP feature extraction is given in Figure 5.2, showing data sampled at two minutes from the mineral processing facility on February 29th, 2012.

[Figure 5.2: Candidate process variables of the training data-set for process monitoring]

Figure 5.2 shows that the bearing pressure and motor power time trends are two excellent indicators of the overload condition. Normally, bearing pressure and motor power are correlated; however, during an overload, bearing pressure increases while motor power holds steady. Upon severe overloading, shown just before sample instance 250, motor power and bearing pressure are inversely correlated. The figure also shows that the sound level variable, which is formulated so that low mill acoustics result in a high fault indicator, is high during an overload. This corresponds to a much lower sound level than the moving average of the sound level. The figure also shows that the center of gravity increases, implying that the load distribution has shifted slightly to the east side of the mill. The third subplot of the figure shows that the difference of slopes is positive when the mill is overloaded, which is apparent from the trend lying above the dashed black line at the center of the scale. In this subplot, it is also observable that the recycle flow decreases once the mill is severely overloaded. The fourth subplot shows that the discharge to the pump box is high during the mill overload; however, it does not increase enough to offset the consistent rise in bearing pressure that indicates an accumulation of ore in the mill. The specific energy is observed to decrease during the overload. Finally, the vibration measurement appears to temporarily increase during an overload, and the feed rate with size greater than two inches is unexpectedly lower than normal.

From the training data shown in Figure 5.2, the first five candidate process variables listed in Table 5.2 appear to be the most important indicators of the overload situation. Specific energy provides somewhat redundant information to the bearing pressure, recycle indicates the overload only during severe overloading, and vibration only indicates an overload at the onset of overloading.
It is unlikely that a linear feature extraction method could utilize the nonlinear characteristics of variables such as the vibration measurement, since their correlation with the other variables is poor. However, nonlinear feature extraction may appropriately utilize these nonlinear measurements. Although it appears that five variables are the most important for process monitoring, all 10 candidate process variables will be considered when comparing PCA to SLLEP in Chapter 5.2.4.

Figure 5.3 shows a pair-wise scatter plot between the five most relevant candidate process variables for the training data-set (the other five variables are not shown for brevity). The scatter plots show that the blue data points, representing nominal operating data, are scattered in an approximately elliptical fashion, which is most visible in the plots of bearing pressure against the other variables. However, when considering both the nominal and overload operating data together, the data no longer scatter elliptically for the mill overload. This is a clear indication of the nonlinear nature of the SAG mill operation, particularly near the boundary between nominal and overloaded operation. The non-elliptical scattering of the data between nominal and overloaded operation shows that, at the boundary between these operating states, multivariate normality of the data does not hold. This implies that linear feature extraction techniques such as PCA may perform poorly when used in process monitoring to detect the onset of overloading. However, in PCA, the accuracy of fit between the PC model and the measured values can provide a useful indication of overloading if the model fit is poor.

Visualization of the test data

Figure 5.4 shows the time trends of the 10 candidate process variables for the test data-set. This data-set is developed from data sampled at two minutes from the mineral processing facility on May 27th, 2012, when three consecutive overloads occurred. This data-set is excellent for validating the performance of the process monitoring technique because the mill is in a double-fault situation. In this case, the engineers overestimated the steel ball charge in the mill. As a result, the grinding efficiency was depressed and the mill could only be operated at a much lower total charge mass than normal. The effects of the lower charge mass can be observed from the unusually low bearing pressure. Because the bearing pressure was so low, the operators attempted to increase the bearing pressure in order to increase the mill throughput. However, the mill was already nearly overloaded. Consequently, the mill easily overloaded, and it was later discovered that the deficiency of the steel ball charge indirectly led to the overloading.

The test data shown in Figure 5.4 have mean and standard deviation values so different from the training data that all attempts to train the feature extraction methods on standardized training data failed during model validation on the test data.
[Figure 5.3: Pair-wise scatter plot of the selected process variables for a wide range of data including the training set, with nominal and overloaded operating data distinguished]

[Figure 5.4: Candidate process variables for a segment of the test data-set for process monitoring]

Only after scaling the data between 0 and 1 was it possible to mitigate the drastic changes to the mean and standard deviation of the variables and design a feature extraction method effective for both the training and test data-sets. Scaling the variables rather than standardizing them is one of the features that makes the process monitoring models developed in this chapter robust to changing process conditions.

5.2.2 PCA

Two methods of training the PC model and classifying the feature space were investigated; the procedure for both was discussed in Chapter 4.1.1. First we show that PCA Method 1 is not satisfactory for process monitoring of a complex process such as a SAG mill. Then, we demonstrate that PCA Method 2 is more suitable for these processes if careful consideration is given to the selection of process variables in the model.

PCA Method 1

To use PCA as a feature extraction technique, Method 1 was evaluated first. Since this method requires that the training data consist entirely of nominal operating data, a training data-set different from Figure 5.2 was used. The training data-set used to train PCA Method 1 is shown in Figure 5.5. This data-set was taken at a two-minute sampling from the mineral processing facility on February 29th, 2012, and it includes some of the training data in Figure 5.2 up until the point where the mill became overloaded. PCA Method 1 was trained on this data using the first five variables in Table 5.2. The test statistics $Q$ and $T^2$ were calculated for the test set in Figure 5.4 to classify abnormal and normal process behaviour; one common formulation of these statistics is sketched below.
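For reference, this is a minimal NumPy sketch of one common formulation of the $Q$ (squared prediction error) and Hotelling $T^2$ statistics computed from a trained PC model; the exact formulation used in Chapter 4.1.1 may differ in details such as scaling, so treat this as an assumption-laden illustration.

```python
import numpy as np

def fit_pca(X, n_components=2):
    """Train a PC model on scaled data X (n samples x D variables)."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalues first
    P = eigvecs[:, order[:n_components]]       # loadings (D x k)
    lam = eigvals[order[:n_components]]        # retained eigenvalues
    return mean, P, lam

def q_and_t2(x, mean, P, lam):
    """Q and T^2 for one observation x against the trained PC model."""
    xc = x - mean
    t = P.T @ xc                               # scores in the PC subspace
    residual = xc - P @ t                      # part not explained by the model
    Q = float(residual @ residual)             # squared prediction error
    T2 = float(t @ (t / lam))                  # Mahalanobis distance in scores
    return Q, T2
```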
Figure 5.6 shows the results for the Q and T² statistical classifiers from Method 1 on the test data-set from May 27th, 2012, sampled every two minutes at the mineral processing facility. In these results, we see that neither Q nor T² is sensitive enough to the mill overload to detect the early onset of overloading. This is likely because the overload results in only subtle changes to the correlation matrix used to train the PC model. Moreover, these statistics appear to be more sensitive to the highly nonlinear underloading abnormal situation. However, mill underloading does not require a recovery control strategy like overloading does.

Figure 5.6: Q and T² results on the test data-set for PCA Method 1

Table 5.3 summarizes the classification results for PCA Method 1. Two PCs were retained in the PC model and used to calculate the Q and T² results. The performance of the Q and T² test statistics was evaluated using sensitivity and specificity, and the average of the sensitivity and specificity appears in a column labelled 'Metric'. Sensitivity is a measure of the proportion of true overload operating data that are correctly identified by the classifier. Sensitivity is defined by the following expression:

Sensitivity = (# of true overload) / (# of true overload + # of false nominal),   (5.16)

where a true overload corresponds to data we labelled as overloaded before training the classifier and that are correctly classified, and a false nominal corresponds to data classified as nominal by the classifier when we had instead labelled that data as belonging to the overloaded class. The low sensitivity of both Q and T² reveals that these classifiers are poor indicators of the SAG mill overload.

The other metric used to evaluate the performance of the classifier is specificity. Specificity is a measure of the proportion of true nominal operating data that are correctly identified by the classifier. Specificity is defined by the following expression:

Specificity = (# of true nominal) / (# of true nominal + # of false overload),   (5.17)

where a true nominal corresponds to data we labelled as nominal before training the classifier and that are correctly classified, and a false overload corresponds to data classified as overloaded by the classifier when we had instead labelled that data as belonging to the nominal class.

Table 5.3: Classification results for PCA Method 1

  Feature Space   Data-set   Q_α     Q Sens.   Q Spec.   T²_α   T² Sens.   T² Spec.   Metric
  2D              Train      0.058   0%        97.7%     6.05   0%         89.4%      46.8%
  2D              Test       0.058   0%        74.7%     6.05   9.1%       93.1%      44.2%

Sensitivity and specificity are used instead of an overall classification error rate because we are most interested in knowing whether an overload was incorrectly classified as normal. Also, more nominal data than overload data are typically available, and this imbalance biases an overall error rate towards measuring the successful classification of only the nominal data.
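These two definitions are straightforward to compute. A small Python sketch (our own helper, with labels 1 = overloaded and 0 = nominal) implementing Equations (5.16) and (5.17) along with the averaged 'Metric' column is:

import numpy as np

def sensitivity_specificity(y_true, y_pred):
    # Counts follow the definitions of Equations (5.16) and (5.17).
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))   # true overloads
    fn = np.sum((y_true == 1) & (y_pred == 0))   # false nominals
    tn = np.sum((y_true == 0) & (y_pred == 0))   # true nominals
    fp = np.sum((y_true == 0) & (y_pred == 1))   # false overloads
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec, 0.5 * (sens + spec)       # the 'Metric' column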
For complex processes, we do not expect satisfactory process monitoring results from PCA Method 1. This case study is one example providing evidence that Method 1 is not sensitive enough to subtle changes in the correlation structure of the data in the event of a process shift. However, Method 1 is included in this thesis to demonstrate the advantages of PCA Method 2 and the SLLEP method proposed in this work. As a general rule, if it is known that the process dynamics are complicated, we recommend against using Method 1 for process monitoring.

PCA Method 2

PCA Method 2 gives more satisfactory feature extraction and classification results than Method 1. In Method 2, the PC model was trained on the training data-set in Figure 5.2. This data-set has enough overload data included with the nominal data that, when the PC model was trained, correlations existed between the first five candidate variables in Table 5.2. In fact, the inclusion of the overload data allowed variables such as the center of gravity to develop a correlation structure that never existed in the nominal operating region alone. Figure 5.2 shows that the center of gravity only has a strong correlation with the other variables once the mill is overloaded. This is a key reason why overload data are needed in training the PC model. The trade-off of incorporating overload training data arises when the overload is severe and the correlation structure differs significantly from that of nominal operation. However, as long as only a limited amount of data from the severe operating region is used, the PC model will not significantly lose performance as a feature extractor.

The PC model parameter results from training are shown in Table 5.4, which lists the five PCs and the coefficients of the five process variables used in the model.

Table 5.4: PCA modeling results

                          PC1      PC2      PC3      PC4      PC5
  Bearing Pressure        0.651    0.427    0.073    -0.339   -0.523
  Motor Power             0.313    0.128    -0.232   -0.483   0.774
  Sound Level             0.456    -0.186   -0.688   0.533    -0.027
  Centre of Gravity       0.329    0.300    0.577    0.585    0.355
  Difference of Slopes    0.403    -0.823   0.367    -0.158   -0.016
  Eigenvalue              0.0360   0.0088   0.0071   0.0034   0.0001
  % Variance Explained    64.90    16.00    12.80    6.10     0.30

From the table, we see that nearly all of the variance in the data-set can be explained with one, two, or three PCs. Scores from each PC were plotted against each other in 2D plots to inspect which PCs were sensitive to overloading. It was found that PC1 and PC2 were most sensitive to the overload. The 2D plot of PC1 and PC2 scores is shown in Figure 5.7a, which shows that the nominal and overloaded operating data occupy their own distinct regions of the PC score space. A linear classification boundary is drawn in the figure to separate the data into binary classes of nominal and overload operating data. The equation for the classification boundary is determined by 2D LDA and is given in Table 5.5.

Table 5.5: PCA-LDA classification results for the 2D and 1D feature spaces

  Feature Space   Classifier            Data-set   Sens.   Spec.   Metric   APER
  2D              y = −1.464x + 1.641   Train      97.8%   99.4%   98.6%
                                        Test       18.8%   100%    59.4%    31.2%
  1D              x = 1.195             Train      100%    95.5%   97.8%
                                        Test       65.4%   100%    82.8%    13.0%

Table 5.5 summarizes the classification results for the PCA-LDA process monitoring method using sensitivity, specificity, a cumulative performance metric labelled 'Metric', and the apparent error rate (APER). The column labelled 'Metric' is simply the average of the sensitivity and specificity metrics, and it provides a concise summary of the classification results. APER is the estimated overall error rate of the classifier, and a low APER value is favourable. APER is determined from the classification results on the test set, and it is calculated as follows:

APER = (# of false overload + # of false nominal) / (# of samples in the test set).   (5.18)
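A minimal Python sketch of the Method 2 pipeline, written with scikit-learn and synthetic stand-in data (the shapes, class sizes, and values below are illustrative assumptions, not the plant data), is:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
# Stand-ins: five scaled variables; class 1 = overloaded, class 0 = nominal
X_nom = rng.normal(0.45, 0.05, size=(800, 5))
X_ovl = rng.normal(0.60, 0.05, size=(120, 5))
X = np.vstack([X_nom, X_ovl])
y = np.r_[np.zeros(len(X_nom)), np.ones(len(X_ovl))]

pca = PCA(n_components=5).fit(X)
print(pca.explained_variance_ratio_)   # cf. the % variance row of Table 5.4
scores = pca.transform(X)

# 1D LDA on the first PC score only, to avoid over-fitting the 2D space
lda = LinearDiscriminantAnalysis().fit(scores[:, :1], y)
y_hat = lda.predict(scores[:, :1])
aper = np.mean(y_hat != y)              # Equation (5.18) applied to this data
print(f"APER = {aper:.1%}")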
Figure 5.7: LDA classification results on the PC score space of the training data-set for a) 2D LDA, and b) 1D LDA

Figure 5.7a demonstrates the importance of being able to visually interpret high-dimensional data in a low-dimensional feature space that is easily plotted. In this case, we see the linear classifier has separated the nominal and overloaded data across two dimensions. However, it appears the data are almost entirely separable across one dimension, the scores for PC1. As a consequence, the 2D linear classifier over-fits the training data, and the classifier will not generalize well to new data sampled online at the mineral processing facility. To mitigate over-fitting, the classifier is trained on the PC1 score data in 1D LDA, and the classification results are shown in Figure 5.7b and summarized in Table 5.5. The figure and table show that the classification results between 2D and 1D LDA on the feature space are nearly identical. However, when the 2D and 1D classifiers are applied to the test data-set, the results indicate that the 1D classifier has higher classification accuracy than the 2D classifier. The classification results for 2D and 1D LDA on the feature space of the test set are given in Figure 5.8a, Figure 5.8b, and Table 5.5.

Figure 5.8: LDA classification results on the PC score space of the test data-set for a) 2D LDA, and b) 1D LDA

5.2.3 SLLEP

In this section, we show that SLLEP in combination with LDA can also successfully detect the onset of SAG mill overloading. In addition, in this case study we demonstrate that this technique achieves equivalent or better process monitoring performance than PCA Method 2, and that it is suitable for both low- and high-dimensional process monitoring. PCA Method 2, on the other hand, is limited to monitoring the few variables with a reasonably linear response during a process shift.

The method for training SLLEP as a feature extractor is discussed in Chapter 4.1.2. In this case study, we found that we needed approximately 2000 data points from both nominal and overloaded training data to properly train SLLEP. This training data-set was obtained from the mineral processing facility for February 25th to February 29th, 2012, and includes the data shown in Figure 5.2. All nominal operating data were sampled every five minutes, while the overload shown in Figure 5.2 and a partial overload on February 27th were sampled every 30 seconds. In addition, nominal data from May 26th, 2012 were included in the training set with a sampling of five minutes to include data from different operating regions. Finally, all data where the SAG mill was not in operation were removed. This results in a training data-set with 1919 data points spanning six days and including two identified overloads.
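For orientation, a minimal Python sketch of the embedding step is given below using scikit-learn's LocallyLinearEmbedding. Note that this library implementation is unsupervised; the supervised variant (SLLE) used in this thesis additionally inflates distances between points of different classes by an amount governed by the tuning parameter α before the K-nearest-neighbour graph is built, which is why α appears alongside K in Table 5.7. The data below are synthetic stand-ins.

import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

rng = np.random.default_rng(2)
# Stand-ins for the 10 scaled candidate variables of Model 4
X_nom = rng.normal(0.45, 0.06, size=(1700, 10))
X_ovl = rng.normal(0.62, 0.06, size=(219, 10))
X = np.vstack([X_nom, X_ovl])

# K = 26 neighbours, as reported for Model 4 in Table 5.7
lle = LocallyLinearEmbedding(n_neighbors=26, n_components=2,
                             method="standard")
Y = lle.fit_transform(X)   # 2D coordinates for plotting and classification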
Four candidate models covering several combinations of the candidate process variables were considered when training and testing SLLEP. The four candidate models are shown in Table 5.6. Model 1 is the same model used in training the PC model in Chapter 5.2.2. Three other models with increasing numbers of variables were also considered.

Table 5.6: Summary of the variables retained in each candidate model

  Variables                Model 1   Model 2   Model 3   Model 4
  Bearing Pressure            X         X         X         X
  Motor Power                 X         X         X         X
  Sound Level                 X         X         X         X
  Centre of Gravity           X         X         X         X
  Difference of Slopes        X         X         X         X
  Recycle                                         X         X
  Discharge to Pump Box                           X         X
  Specific Energy                                           X
  Vibration                             X         X         X
  Size > 2 inch                                             X

The results for SLLE on the training data-set mapped to 2D space are shown in Figure 5.9 for Model 4. The figure shows that the nominal operating data appear to scatter elliptically, and the nominal and overload operating data are well-separated with some misclassifications near the classification boundary. A 2D visual interpretation of all 10 process variables would not be possible without a method such as SLLE, and the overload data are separated enough from the nominal operating data that it is possible to use LDA to classify the 2D space into regions of nominal and overload operation. Finally, the figure also shows that QDA over-fits the region and is unable to classify some severely overloaded data within the quadratic classification boundary. Classification of severely overloaded data needs to be perfect due to the potential safety risks and equipment damage resulting from an overloaded mill.

Figure 5.9: LDA and QDA classification on the training data-set mapped into 2D space by SLLE

For both the LDA and QDA classifiers on the feature space in Figure 5.9, there are misclassified data points. However, these misclassifications are acceptable, and they likely tell us two things. First, since we manually assigned a class label to data ranges that appeared faulty, there is subjectivity in this process. Only the most obvious overload data were classified as overloaded in order to reduce bias towards classifying data as overloaded. Second, it is nearly impossible to perfectly classify data as nominal or overloaded when the process shift between the two operating regions is continuous. In other words, there is a 'fuzzy' operating region in the middle that can be considered both slightly overloaded and nominal at the same time. This presents a challenge when assigning a binary class label to a non-binary abnormal situation. Given these points, we expect to see misclassifications, and the linear classification boundary in the figure shows a good compromise between classifying data as nominal or overloaded.
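The LDA and QDA comparison on the embedded coordinates can be sketched as follows in Python; the 2D data here are synthetic stand-ins for the SLLE coordinates of Figure 5.9, and the cluster locations are assumptions chosen only to make the example run.

import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)

rng = np.random.default_rng(3)
Y_nom = rng.normal([0.5, 0.0], 0.8, size=(1700, 2))    # nominal cluster
Y_ovl = rng.normal([-1.5, -1.0], 0.8, size=(219, 2))   # overload cluster
Y = np.vstack([Y_nom, Y_ovl])
labels = np.r_[np.zeros(len(Y_nom)), np.ones(len(Y_ovl))]

for name, clf in [("LDA", LinearDiscriminantAnalysis()),
                  ("QDA", QuadraticDiscriminantAnalysis())]:
    clf.fit(Y, labels)
    pred = clf.predict(Y)
    sens = np.mean(pred[labels == 1] == 1)
    spec = np.mean(pred[labels == 0] == 0)
    print(f"{name}: sensitivity {sens:.1%}, specificity {spec:.1%}")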
Figure 5.10 shows the 2D SLLEP mapping of the test data-set using Model 4.

Figure 5.10: LDA and QDA classification on the test data-set mapped into 2D space by SLLEP

The results from this plot show that the linear classifier is superior to the quadratic classifier because it misclassifies fewer observations. The figure also shows numerous data points where the overload data were classified incorrectly as nominal by the linear classifier. However, an inspection of these points revealed that they lie on the fringes of the ranges where manual class labels were added to the overloads in the test set; in other words, we most likely manually classified some data as overloaded when in reality the mill was not necessarily overloaded.

Table 5.7 summarizes the SLLEP classification results for the four candidate models. Both LDA and QDA were tested for each model. The performance metric used to compare the models is simply the average of the four training and test sensitivity and specificity values. The training sensitivity and specificity were determined from 10-fold cross-validation of the training data-set; the test data were then mapped to the feature space and the classification results were determined.

Table 5.7: SLLEP classification results of four candidate models

  Model   Classifier   K    α        Train Sens.   Train Spec.   Test Sens.   Test Spec.   Metric
  1       Quadratic    28   0.0016   91.3%         93.7%         66.2%        99.5%        87.7%
  1       Linear       28   0.0016   91.1%         93.7%         70.0%        99.1%        88.5%
  2       Quadratic    33   0.0022   91.5%         87.1%         81.5%        94.9%        88.8%
  2       Linear       34   0.0020   97.8%         90.1%         85.4%        94.0%        91.8%
  3       Quadratic    31   0.0004   96.2%         93.7%         70.0%        96.8%        89.2%
  3       Linear       31   0.0004   96.2%         93.7%         80.0%        98.1%        92.0%
  4       Quadratic    26   0.0012   99.1%         89.0%         86.9%        94.9%        92.5%
  4       Linear       26   0.0012   98.9%         89.0%         90.8%        93.5%        93.1%

This table shows that the linear classifier consistently scores better than the quadratic classifier for these four models. This is because the quadratic classifier is more susceptible to over-fitting the feature space than the linear classifier. Table 5.7 also shows that the classification performance increases as the number of variables used to train the model increases. This is not expected to always be the case; however, these results demonstrate that SLLEP works well with high-dimensional data-sets. In addition, SLLEP-LDA classification performance was not hindered by the addition of the particle size data, even though this variable does not correlate well with the overload in the test data-set.

The equations of the linear and quadratic classification boundaries, determined from all solutions to f(y_i) = 0, for SLLEP on Model 4 are given in Table 5.8. The APER is calculated from the results for the test data and is provided in the table to give an estimate of the classification error rate expected for the classifiers.

Table 5.8: Classification boundaries for SLLE and the classification error rate

  Classifier   f(y_i)                                             APER
  Linear       2.8 + 2.42x − 4.95y                                8.7%
  Quadratic    1.81 + 0.75x − 5.88y + 1.96x² + 3.3xy + 1.13y²     13.3%
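Evaluating these boundaries online reduces to checking the sign of f(y_i) at the embedded coordinates. A Python sketch is given below; note that the minus signs in the coefficients above are our reconstruction (the extracted text had lost them), and which sign of f corresponds to the overloaded class must be fixed from the training data.

def f_linear(x, y):
    # Linear boundary from Table 5.8 (signs reconstructed; treat as assumed)
    return 2.8 + 2.42 * x - 4.95 * y

def f_quadratic(x, y):
    # Quadratic boundary from Table 5.8 (same caveat on signs)
    return 1.81 + 0.75 * x - 5.88 * y + 1.96 * x**2 + 3.3 * x * y + 1.13 * y**2

def classify(x, y, f=f_linear):
    # Points with f > 0 fall on one side of the boundary, f < 0 on the other;
    # the magnitude of f serves as a continuous measure of process shift.
    return 1 if f(x, y) > 0 else 0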
5.2.4 Comparison between PCA-LDA and SLLEP-LDA

In Chapters 5.2.2 and 5.2.3 we showed that PCA-LDA and SLLEP-LDA both successfully classify the overload abnormal operating situation in the training and test data-sets. The combined classification results of the two methods are summarized in Table 5.9 for the four models identified in Table 5.6.

Table 5.9: Comparison of the classification results between PCA-LDA and SLLEP-LDA on the test set

  Model   PCA Sens.   LLE Sens.   PCA Spec.   LLE Spec.   PCA Metric   LLE Metric
  1       65.4%       70.0%       100%        99.1%       82.7%        84.5%
  2       53.1%       85.4%       100%        94.0%       76.5%        89.7%
  3       15.4%       80.0%       100%        98.1%       57.7%        89.1%
  4       0.0%        90.8%       100%        93.5%       50.0%        92.1%

Table 5.9 shows that SLLEP-LDA is superior to PCA-LDA as a process monitoring method for the test data-set. For each model, SLLEP-LDA is more sensitive than PCA-LDA to an overload. On the other hand, PCA-LDA consistently has a higher specificity than SLLEP-LDA. However, a high specificity alone is not indicative of a good classifier. Therefore, a comprehensive performance metric is included for the two methods that takes the average of the sensitivity and specificity for each method. The best model chosen for PCA-LDA is Model 1, and the model chosen for SLLEP-LDA is Model 4. Based upon 'Metric' for these two models, SLLEP-LDA outperforms PCA-LDA by 9.4%.

Figure 5.11 provides a visual comparison of the process monitoring output, also defined as the measure of process shift, on the test data-set for Model 1 of PCA-LDA and Model 4 of SLLEP-LDA. Bearing pressure and motor power are included in the figure to help orient these results against the complete test data shown in Figure 5.4.

Figure 5.11: Output from process monitoring on the test set for both PCA-LDA and SLLEP-LDA

Figure 5.11 demonstrates that the output of the two methods is nearly identical. In fact, SLLEP-LDA is slightly more sensitive to the overload than PCA-LDA, and this has resulted in a misclassification around sample 70. False alarms such as these reduce the specificity measurement, and SLLEP-LDA could easily be tuned to reduce these misclassifications by adjusting the classification boundary, that is, by increasing the cost of misclassifying nominal data as overloaded in Equation (4.30). This is one example where, if the assumptions of PCA are valid and the dimensionality of the observation space is low, PCA can almost match the performance of SLLEP. However, because SLLEP does not require the assumptions of linearity and multivariate normality that PCA does, SLLEP as a feature extractor will always be equivalent to or better than PCA if trained properly (De Ridder and Duin, 2002).

Table 5.9 demonstrates the power of SLLEP-LDA as a superior process monitoring tool for high-dimensional data-sets. PCA-LDA shows deteriorating performance on the test set as the number of variables in the model increases. This observation is likely explained by the fact that the extra variables included in Models 2 to 4 are more nonlinear than the five variables carefully selected for Model 1. The classification performance of SLLEP-LDA, by contrast, improves as the number of observation variables increases because the method does not require linearity of the variables in the model; thus more information about the process operation is mapped to the feature space. In addition, since PC scores are mapped to the feature space using a linear equation, the information about the fault in the PCs becomes masked by the new variables. Consequently, a linear model is not a useful tool to extract features pertaining to the overload from nonlinear variables. Moreover, PCA-LDA with Model 1 is only successful as a process monitoring method in this case study because of the large effort invested in selecting and transforming the variables to allow PCA to perform well.
On the other hand, SLLEP-LDA does not require as large an investment in the selection of process variables because highly nonlinear variables do not need to be pre-screened. SLLEP-LDA has proven to have equivalent or greater classification accuracy than PCA-LDA in this case study. However, SLLEP-LDA has some disadvantages too. Although less effort is required to select process variables, more effort is required to perform a grid search on the tuning parameters K and α in SLLEP that give the best classification results in the feature space by LDA. However, if a fast computer is available during the offline training of SLLEP-LDA to find K and α, the grid search can be performed in a reasonable amount of time.

Another disadvantage of SLLEP-LDA is its online computational complexity. PCA-LDA requires only a handful of calculations, whereas SLLEP-LDA requires the real-time solution of an unconstrained QP. In Chapter 4.1.2 we demonstrated a new approach that allows SLLEP-LDA to be utilized in industry for process monitoring. Consequently, if MPC is available in the control software at a specific site, it is possible to solve the unconstrained QP in real-time with minimal controller loading and programming. In this case study, the mineral processing facility uses a modern DCS that is equipped with MPC. However, in their version of the commercial DCS product, the MPC consists of an optimizer that solves an LP to provide set-points to the MPC controller. Because the optimizer cannot be bypassed in this version, the QP of the MPC controller cannot be directly accessed. As a result, SLLEP-LDA can only be solved with the LP, which results in poor mapping of new data to the feature space.

Because of this limitation of the control software available at the mineral processing facility, we were not able to implement the SLLEP-LDA process monitoring tool, and the modeling and simulation portion of this case study is conducted using the PCA-LDA process monitoring method. Despite this, we have successfully demonstrated to a major control systems manufacturer the benefit of bypassing the optimizer to access the MPC controller directly in order to use the QP solver. Moreover, we submitted a technology request with the advanced control engineers at the manufacturer, and they have agreed to incorporate these changes into the software in a future release. To put it another way, this work has helped to advance one of the industry's best commercial control products, allowing some of the most promising techniques in the literature to be implemented in industry.
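To convey the size of the online problem, the following Python sketch maps a new observation into the feature space using the standard locally linear embedding out-of-sample construction: the reconstruction-weight problem is a small QP whose equality constraint can be handled by a single linear solve and normalization. On a DCS without a general-purpose solver, this is the problem the thesis poses to the MPC's QP engine; the function below is our own illustrative implementation, not the thesis code.

import numpy as np

def map_new_point(x, X_train, Y_train, K=26, reg=1e-3):
    # Find the K nearest training points, compute the reconstruction
    # weights w that minimize ||x - sum(w_j x_j)||^2 subject to sum(w) = 1,
    # and apply the same weights to the stored 2D coordinates.
    d2 = np.sum((X_train - x) ** 2, axis=1)
    idx = np.argsort(d2)[:K]                 # K nearest neighbours
    Z = X_train[idx] - x                     # centre the neighbours on x
    G = Z @ Z.T                              # local Gram matrix (K x K)
    G += reg * np.trace(G) * np.eye(K)       # regularize for stability
    w = np.linalg.solve(G, np.ones(K))       # solve G w = 1
    w /= w.sum()                             # enforce sum(w) = 1
    return w @ Y_train[idx]                  # embedded coordinates of x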
As a final note on the process monitoring method developed for this case study, this work provides a new technique that is superior to the current best practice. Ko and Shang (2011) demonstrate PCA with Q and T² classification on industrial operating data. However, their study does not validate the process monitoring method on new test data; instead, they performed cross-validation on the training data. We likewise found with 10-fold cross-validation that PCA can successfully classify the training data. By way of contrast, the real test is when the PCA model is applied to new test data where the process operating conditions have likely changed. We found that careful selection of the process variables, classification of the PC score space, and scaling of the observation variables are the best way to ensure that PCA generalizes well to such data. As a result, this work presents two process monitoring methods that are designed to work in industry under realistic operating conditions. Accordingly, we implemented PCA-LDA at the mineral processing facility in January 2013 and have found that this tool gives the expected output for their operating conditions. However, no overloads have occurred in the first half of 2013 with which to fully evaluate the performance of PCA-LDA, because the mineral processing facility has recently added an additional SAG mill to de-bottleneck the original SAG mill.

5.3 Modeling of the overloaded operating region

To model the overload operating region of the SAG mill, we first considered a first-principles model of the internal dynamics of the SAG mill at the mineral processing facility based on the work of Morrell (1993) and Apelt (2002), and the work of the Julius Kruttschnitt Mineral Research Centre at the University of Queensland (Napier-Munn et al., 1996; Valery, 1998). These models are based upon dynamic mass balances for solids, water, ball charge, and shell liner wear. However, we instead chose an empirical modeling approach because it has the advantage of significantly reducing the time required for plant personnel to gather detailed information about the design of the SAG mill. Moreover, many of the first-principles model parameters would have to be estimated from data available at similar plants. Finally, the first-principles models do not account for the rheological nature of the slurry, flow-back problems of the pulp-lifter discharge system, slurry pooling, and freewheeling; as a consequence, they do not provide an accurate simulation of the onset of mill overloading required for this case study (Katubilwa and Moys, 2011; Shi and Napier-Munn, 2002; Morrison and Cleary, 2008; Latchireddi and Morrell, 2003a,b).

The modeling techniques described in Chapter 4.2.1 are used to model the overload operating region of the SAG mill at the mineral processing facility. In this case, there are two MVs, feed ore flow, u1, and water flow, u2, and one CV, the measure of process shift, y. The measure of process shift is the output of the process monitoring method discussed in Chapter 5.2. Since there are two MVs and one CV, we must train two process models to find the response between each input-output pair.

Since we know that the feed input has the strongest effect on the mill behaviour, we first model the response of y to changes in u1. Overload data from February 27th, 2012 were used to train the response between u1 and y because u2 was not initially changed to correct the overload. Consequently, we can use the data from this region to find the response in y that is directly caused by changes in u1 and not u2. Figure 5.12 shows the data, sampled at 30 seconds, used to develop the model relating u1 to y.
In this data, we found 29 samples where the mill is overloaded and u1 is reduced to zero to let the mill grind out the ore charge and recover the process to nominal operation.

Figure 5.12: Model fitting results for the response of the process shift to changes in feed ore rate

Three different models were fitted to the data shown in Figure 5.12 using the MATLAB system identification toolbox (Ljung, 2007a): P1D, P1DZ, and P2D. Here, P1D corresponds to a model with one pole and time delay as given in Equation (4.40). P1DZ corresponds to a model with one pole, a zero, and time delay as given in Equation (4.41). Finally, P2D corresponds to a model with two poles and time delay as given in Equation (4.42). A model with two poles, a zero, and time delay, as given in Equation (4.43), was not considered because there is not enough data to fit a high-order model, as seen with P2D in the figure. The best-fitting model is P1D, as shown in the figure, and the model parameters for P1D and the two other candidate models are shown in Table 5.10 for the data-set labelled Feb 27. In this table, r̄ represents the average model residual per sample; it is simply the total residuals divided by the number of samples in the data-set, and it is used instead of the total residuals so that residuals can be compared across models trained on data-sets with more or fewer samples. In addition, 'Fit' provides a measure that compares how well the model predicts the response relative to a line centered on the mean of the response. Fit is calculated from the expression:

Fit = 100 (1 − ||y − ŷ|| / ||y − mean(y)||),   (5.19)

where ŷ is the array containing the predicted response from the model, and || · || corresponds to the l2 norm.

Table 5.10: Modeling steps and fitting results

  Data-set   Purpose   Name   Model                                                       Fit (%)   r̄ (×10³)
  Feb 27     Train     P1D    Y(s)/U1(s) = 0.638 e^(−116s) / (437s + 1)                   94.8      0.37
  Feb 27     Train     P1DZ   Y(s)/U1(s) = 0.836 (−57.6s + 1) e^(−25.7s) / (731s + 1)     91.7      0.94
  Feb 27     Train     P2D    Y(s)/U1(s) = 0.265 e^(−17.8s) / ((0.03s + 1)(0.03s + 1))    20.6      85.8
  Feb 29     Fit       P1D    Y(s)/U1(s) = 0.638 e^(−116s) / (437s + 1)                   84.3      3.65
  Feb 29     Train     n/a    Y(s)/U2(s) = 0.021 e^(−475s) / (3.57s + 1)                  0.00      323.5
  May 27     Test      P1D    Y(s)/U1(s) = 0.638 e^(−116s) / (437s + 1)                   0.00      109.4

From Table 5.10, we observe that the fit is highest and r̄ is lowest with P1D. Accordingly, it is not necessary to perform an F-test on the model residuals to determine which model has the best fit for the lowest model order, as detailed in Appendix E.
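A minimal Python sketch of how a first-order-plus-time-delay (FOPTD) model such as P1D can be simulated against logged data, and how the fit of Equation (5.19) is computed, is given below. The discretization and input sequence are illustrative assumptions; the parameter values are those reported for P1D in Table 5.10.

import numpy as np

def simulate_foptd(u, dt, K, tau, theta):
    # Simulate y(s)/u(s) = K exp(-theta s) / (tau s + 1) with a
    # forward-Euler discretization at sample time dt (seconds).
    d = int(round(theta / dt))               # delay in samples
    y = np.zeros(len(u))
    for k in range(1, len(u)):
        u_del = u[k - 1 - d] if k - 1 - d >= 0 else u[0]
        y[k] = y[k - 1] + dt / tau * (K * u_del - y[k - 1])
    return y

def fit_percent(y, y_hat):
    # Equation (5.19): normalized fit in percent.
    return 100.0 * (1.0 - np.linalg.norm(y - y_hat)
                    / np.linalg.norm(y - y.mean()))

# P1D from Table 5.10: K = 0.638, tau = 437 s, theta = 116 s; dt = 30 s
dt = 30.0
u = np.r_[np.full(10, 2000.0), np.zeros(60), np.full(30, 1600.0)]  # feed, STPH
y_hat = simulate_foptd(u - u[0], dt, K=0.638, tau=437.0, theta=116.0)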
The next step in modeling the overload operating region is to model the response between u2 and y. In this case, data are available for the overload on February 29th, 2012, where both u1 and u2 were reduced to recover from the overload. To find the response between u2 and y, we must first estimate the response of y from u1; the remainder between the estimated response and the true response is then used to fit the model between u2 and y. The data from February 29th are shown in Figure 5.13. The second subplot of the figure shows the estimated response calculated by P1D, and the third subplot shows the remainder of the response that is unexplained by P1D.

Figure 5.13: Fitting P1D to a new training data-set in order to fit a model for u2 on the fit remainder

The model remainder shown in Figure 5.13 was used to identify the FOPTD model response between u2 and y; however, the figure shows that the model fit is poor. One explanation for the poor fit between u2 and y is that there is not enough data to properly identify the model. Alternatively, there is possibly a poor correlation between u2 and y, or the correlation is very nonlinear. However, there is not enough data in the overload operating region to fit a nonlinear model, and the model fit from P1D alone satisfactorily explains the response. In fact, Figure 5.13 can be thought of as a test set for P1D, and Table 5.10 shows that the fit is reasonable at 84.3%. As a final point, the engineers at the mineral processing facility were also unable to fit a model between u2 and motor load for the nominal operating range, and the effects of water on mill dynamics are poorly understood in both industry and academia.

Table 5.11 summarizes the final model fitting results for the u1–y and u2–y responses. The table shows that the parameter variances of P1D are only a fraction of a percent of the parameter magnitudes. Although this is desirable, it is also unlikely that the real parameter variances are so low, because P1D was trained on a data-set of limited size. Moreover, the parameter variances for the response between u2 and y were negligible; however, the data-set used to train this model was not large enough to appropriately estimate the model or its parameter variances.

Table 5.11: Best fit models and their parameter variances

  Model                                        Parameter   Unit                 Value   Variance
  Y(s)/U1(s) = 0.638 e^(−116s) / (437s + 1)    K           Process Shift/STPH   0.638   0.001
                                               τ           seconds              437     1.7
                                               θ           seconds              116     0.03
  Y(s)/U2(s) = 0.021 e^(−475s) / (3.57s + 1)   K           Process Shift/GPM    0.021   n/a
                                               τ           seconds              3.57    n/a
                                               θ           seconds              475     n/a

As a result, the overall process model for the overload operating region is P1D, and the model between u2 and y is not included in the overall model.
The final step, now that the overall model has been chosen, is to validate it on a test data-set. Overload data from May 27th, 2012 were used to validate the overall process model, P1D. There are 290 data points sampled every 30 seconds in this data-set. The data and the estimated model response are shown in Figure 5.14.

Figure 5.14: P1D fitting results on a test data-set to validate the model

Figure 5.14 shows that the estimate of the process shift is accurate in certain regions and inaccurate in others. This is expected because the process is nonlinear and P1D is a linear model. Correspondingly, P1D was trained to estimate the response of y only when the system is in an overload operating region. When the system is in the nominal operating region, the model for y is not needed because the nominal controller models will be used instead. In this figure, however, we see from the measure of process shift that the system shifts three times from the overloaded to the nominal operating region. As a result, P1D has an excellent fit on the shift from overload to nominal, yet a poor fit once the process has recovered to the nominal operating region. Only when the process finally returns to the overload does P1D once again show a strong fit for the recovery from overload to nominal. The fitting results for the test set are also included in Table 5.10. However, these results are ambiguous because the model fit will only be accurate for changes to the MVs in the overload region.

The models determined in this chapter are used in the recovery MPC as the process models for the response in process shift to changes in the feed rate MV. Moreover, the process model between the process shift measurement and water addition is set to a zero-gain model to eliminate changes in water in the recovery response. Since a recovery response will involve reducing the feed while maintaining water flow, it is expected that the response will decrease the density of the charge in the mill sufficiently to help offset mill overloading.

Expert rules

The final modeling procedure required is to determine the expert rules and the limit on the magnitude of process shift at which the expert rules take control of the MVs. From the data in Figure 5.14, we see that the operators reduce the feed to zero until the fault has corrected, and then return the feed to roughly the same level as prior to their intervention. However, no consistent action is taken with the water addition. Based on these actions, the expert rules should automatically reduce the feed to zero until the recovery controller can finish the process recovery, and then return the feed to a rate similar to the level before the expert system intervened. In addition, since the action to take with water is uncertain from the data, the water will be held constant, because the ratio between feed and water will help reduce the density and viscosity of the charge during an overload. With simulation and plant trials, these rules may be adjusted. However, to start, based on our observations from the data of the operators' actions during an overload, we recommend using the following rules (a sketch of this logic follows the rules):

Rule 1 If the magnitude of process shift is greater than 1400, override control of the feed and water MVs, and reduce the feed rate to zero.

Rule 2 If Rule 1 is active and the magnitude of process shift is less than 1300, increase feed to 80% of the feed rate before Rule 1 was active and return control to the override or nominal controller.
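The two rules amount to a small hysteresis state machine. A minimal Python sketch, with our own function and state names, is given below; water is simply held at its current value while the override is active, as recommended above.

def expert_override(shift, feed_sp, state):
    # Rules 1 and 2: engage above 1400, release below 1300 at 80% of the
    # saved feed rate. `state` carries whether the override is active and
    # the feed set-point recorded when Rule 1 fired.
    if not state["active"] and shift > 1400:
        state.update(active=True, saved_feed=feed_sp)
        return 0.0, state                          # Rule 1: cut feed to zero
    if state["active"] and shift < 1300:
        state["active"] = False
        return 0.8 * state["saved_feed"], state    # Rule 2: restore 80%
    return (0.0 if state["active"] else feed_sp), state

# Example: walk a few process-shift samples through the rules
state = {"active": False, "saved_feed": 0.0}
for shift in [1350, 1420, 1390, 1310, 1280]:
    feed, state = expert_override(shift, feed_sp=2000.0, state=state)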
5.4 Recovery from the SAG mill overload

The last part of the case study is a simulation that studies the behaviour of the control strategy in the event of an incipient or abrupt disturbance to the process. Because the mineral processing facility has recently added a second SAG mill to their process, comminution is no longer the bottleneck at their facility and they no longer experience overloading with either of their SAG mills. Consequently, we were unable to commission the control strategy at the mineral processing facility, and a simulation was performed instead.

The process models used in the simulation are the linear process models identified by Spartan Controls during commissioning of the MPC. However, the time constant of the models was reduced from roughly 10 minutes to approximately one minute, and the dead time was also reduced by a factor of 10 in order to speed up the simulation; nonetheless, the sample rate is taken to simulate a two-minute sampling of the real process. Moreover, nonlinearity was added to the process outputs to account for the nonlinear nature of the process when the mill has overloaded. The behaviour of the process in this simulation is rudimentary and will not be nearly as accurate as the true response of the SAG mill. However, the models in the simulation do not necessarily need to be accurate, because the purpose of the simulation is to demonstrate the behaviour of the control strategy. A more accurate test performed at the mineral processing facility would only allow us to demonstrate how well we modelled the abnormal operating region, and this was already demonstrated in Chapter 5.3 on a large test data-set.

In each of the three simulations examined in this section, MPC is used as the nominal controller with two MVs, feed rate and water flow, and two CVs, feed-end west bearing pressure and motor power A. In Scenario 1, an incipient disturbance enters the system where no control strategy is commissioned to recover the process from an overload. Scenario 2 demonstrates the effectiveness of the control strategy in recovering the process from an overload following the same input disturbance given in Scenario 1. Finally, in Scenario 3, an abrupt disturbance leads to significant overloading that the expert system mitigates before returning control to the MPC.

The effects of the disturbance in these simulations are most visible from the estimated process shift output representing the simulated PCA-LDA output. Additionally, the disturbance affects the bearing pressure and motor load. Bearing pressure rises during an overload because the grinding efficiency within the mill is depressed, causing ore to accumulate in the mill. Furthermore, motor power initially increases during an overload; however, when the overload becomes severe, the motor power decreases, indicating the presence of freewheeling or slurry pooling.

Scenario 1: Incipient disturbance without recovery control. Scenario 1 demonstrates the behaviour of the controller when no control strategy is commissioned to recover the process from an overload. In this case, the overload results in a positive feedback in which ore accumulates in the mill, decreasing the grinding efficiency of the mill and worsening the overload. The results of this simulation are shown in Figure 5.15.

Figure 5.15: Scenario 1: control strategy has not been incorporated to mitigate the incipient disturbance

In Figure 5.15 we see that the incipient disturbance entering at sample 15 causes the process shift to increase, indicating the process is moving closer to an overload. However, when the process is classified as overloaded around sample 44, the controller reacts slowly because it is only responding to the increase in bearing pressure from the target SP of 950 psi.
Even when the disturbance stops increasing at sample 55, the positive feedback mechanism of the overload causes the process shift to gradually increase. As the overload worsens, the bearing pressure increases and the motor power decreases. However, the controller is tuned to decrease feed when the bearing pressure is high and to increase feed when motor power is low. As a result, the nonlinearity of the process produces an incorrect response between motor power and feed rate. This conflicting response within the MPC causes the changes to the feed to be slower and smaller in magnitude than required to prevent the overload from worsening. Only at sample 134 do the operators recognize the process is overloaded and manually make corrective actions before the bearing pressure exceeds its upper constraint of 1200 psi. However, the operators' response arrives 90 samples after the process monitoring method first detected that the mill was overloaded.

Scenario 2: Incipient disturbance with recovery control. Scenario 2 demonstrates the same disturbance entering the system as in Scenario 1; however, Scenario 2 treats the measure of process shift as a constraint variable in the MPC with a constraint limit of 1195 as determined from PCA-LDA. The constraint variable allows the MPC to switch models from controlling the bearing pressure and motor power to controlling the measure of process shift as a priority. If the constraint variable exceeds the constraint limit, or if the MPC predicts the constraint limit will be exceeded within a 120-sample prediction horizon, the MPC will act to reduce the feed so as to bring the process shift back to the constraint limit. Figure 5.16 shows the results for this simulation.

Figure 5.16: Scenario 2: process shift constraint variable is exceeded causing the MPC to correct for the incipient disturbance before severe overloading results

In Figure 5.16, we observe that the overload was detected by the process monitoring method around sample 44, the same as in Scenario 1. This tells us that the MPC did not predict the constraint variable would exceed the limit within its prediction horizon. This result makes sense, because the prediction of the constraint variable is determined from the models developed in Chapter 5.3 between the measure of process shift and the feed and water MVs. The overall model response assumed water flow had zero gain on the process shift, and feed had a positive gain. In other words, when the feed decreases in the simulation, the MPC predicts that the measure of process shift should decrease. However, the MPC does not observe the unmeasured disturbance entering the system and cannot compensate until the constraint limit on the constraint variable is violated. Once the constraint limit is violated, the MPC reduces the feed rate, and this corrective action is faster than in Scenario 1: the overload stops worsening by sample 70.
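The constraint-variable prediction can be illustrated with a simplified stand-in for the MPC's internal check: roll the P1D model forward over the prediction horizon for a planned sequence of feed moves and report whether the predicted process shift exceeds the limit. The Python sketch below is our own simplification; it uses the Table 5.10 parameters and a two-minute (120 s) sample time, and it ignores the MPC's full optimization.

import numpy as np

def predicts_violation(shift_now, planned_du, limit=1195.0, horizon=120,
                       dt=120.0, K=0.638, tau=437.0, theta=116.0):
    # planned_du: planned feed moves as deviations (STPH) from the
    # current feed rate; y_dev tracks the predicted deviation in shift.
    d = int(round(theta / dt))               # delay of about one sample
    y_dev = 0.0
    for k in range(horizon):
        u = planned_du[k - d] if 0 <= k - d < len(planned_du) else 0.0
        y_dev += dt / tau * (K * u - y_dev)
        if shift_now + y_dev > limit:
            return True, k                   # violation predicted at sample k
    return False, None

# Example: a sustained 100 STPH feed increase from a shift of 1150
print(predicts_violation(1150.0, [100.0] * 120))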
Moreover, the MPC was able to keep the process shift at the constraint limit of the constraint variable until the disturbance subsided. We expect the process to have the optimal throughput at the constraint variable limit because the process is maximally loaded without being overloaded. Therefore, the MPC maintains optimal operation of the process in the presence of the large disturbance that entered the system.

Scenario 3: Abrupt disturbance with recovery control. Scenario 3 demonstrates the response of the control strategy in the presence of a large and abrupt disturbance that rapidly overloads the SAG mill. In this case, the MPC is unable to compensate for the unmeasured disturbance alone. Figure 5.17 shows the results for this simulation.

Figure 5.17: Scenario 3: abrupt disturbance leads to significant overloading that the expert system mitigates before returning control to the MPC

In Figure 5.17, we see that by sample 20 the process shift has exceeded the limit, set at a process shift value of 1300 in this simulation, at which the expert system is designed to act. The rule in the expert system is to decrease the feed to zero when the process shift exceeds 1300.
The limit of 1300 is for simulation purposes only, and a different value may be used on-site. A limit of 1300 was used instead of 1400 because the recovery controller was aggressive enough that it was difficult in the simulation to bring the process shift above 1400. The expert system maintains control of the feed and keeps the water constant until the process shift returns to below 1230 in this simulation. When the process shift is below 1230, the next rule in the expert system is to increase the feed to 80% of the original feed rate prior to returning control of the MVs to the MPC. Once the expert system activates, we see that the overload is completely corrected after approximately 30 samples. The process recovery is smooth; however, there is a slight overshoot of the process shift around sample 100. The overshoot is small, and it is possible to correct it by adjusting the expert rules slightly. For example, the expert system could return control of the MVs to the MPC earlier, or the feed rate could be recovered closer to 100% of the original feed rate prior to returning control of the MVs.

Key performance measures

We have selected three key performance indices to measure the potential benefits of the controller performance recovery algorithm. These three measures are: the time to detect an overload, the time until the overload is fully corrected, and the amount of extra ore, and its value, that the mineral processing facility is able to process. We will look at the benefits of two potential cases: the difference in responses between Scenario 1 and Scenario 2, and the ability to increase the throughput of the SAG mill given that the risks of an overload are reduced. The benefits of the control strategy for an event such as Scenario 3 are not considered because of the uncertainty of the costs of such an event. Without a control strategy in this scenario, the abrupt disturbance would cause the bearing pressure limit to be exceeded and the mill would have to be immediately stopped. The damage to the mill would then need to be assessed, and it would be difficult to estimate the potential equipment damage and downtime of the mill.

Table 5.12 summarizes the three key performance measures for the difference in response between Scenario 1 and Scenario 2. Dollar values are given in U.S. dollars (USD) because these metals are usually traded in USD on the global market. The assumptions for these estimates are provided in Table 5.13.

Table 5.12: Key performance indices to measure the potential savings between Scenario 1 and Scenario 2

  Key performance measure                             Value    Unit
  Reduction in time to detect overload                3        hours
  Reduction in time to fully recover from overload    1.5      hours
  Additional ore processed                            1816     tons
  Profit from additional ore processed                25 257   USD

To determine the amount of additional ore processed between Scenario 1 and Scenario 2, we assumed that the operators reduced the feed to zero for 30 samples, or one hour, after the overload was detected at sample 134 in Figure 5.15; reducing the feed to zero for one hour was observed in historical data for a similar overload on February 29th, 2012. We then compare the total throughput of the mill over a period of 164 samples, or 5.4 hours, for both scenarios. The profit from the additional ore processed is calculated using the equation:

Profit = quantity of ore × copper grade × copper price / 2 + quantity of ore × molybdenum grade × molybdenum price / 2 − quantity of ore × cost per ton,   (5.20)

where the grades, prices, and costs are given in Table 5.13, and the minerals in the concentrate are assumed to sell for half the market value.
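A Python sketch of Equation (5.20) with the Table 5.13 assumptions is given below. The lb-to-ton conversion factor is our assumption, since the unit conventions behind the tabulated figure are not stated; the computed value will therefore differ somewhat from the 25 257 USD reported in Table 5.12.

def profit_usd(tons, cu_grade=0.0032, mo_grade=0.0001,
               cu_price=8811.0, mo_price=15.49 * 2204.62,
               cost_per_ton=3.48):
    # Equation (5.20): grades as mass fractions, copper price in USD/ton,
    # molybdenum converted from USD/lb to USD/ton (assumed 2204.62 lb/t),
    # and concentrate assumed to sell at half the market value (the /2).
    return (tons * cu_grade * cu_price / 2
            + tons * mo_grade * mo_price / 2
            - tons * cost_per_ton)

print(f"{profit_usd(1816):,.0f} USD")  # compare with Table 5.12's 25 257 USD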
Table 5.13: Assumptions for calculating the key performance measures

  Assumptions                           Value   Unit
  Copper grade of ore (1)               0.32    %
  Molybdenum grade of ore (1)           0.01    %
  2011 average copper price (2)         8811    USD/ton
  2011 average molybdenum price (2)     15.49   USD/lb
  Operating days in a year              300     days/year
  Cost per ton of ore processed (3)     3.48    USD/ton

  (1) Provided by the mineral processing facility.
  (2) London Metal Exchange.
  (3) An estimate from Wills and Napier-Munn (2006) of operating expenses including labour, energy, and water usage for a typical copper concentrating facility.

The second return-on-investment opportunity for the mineral processing facility is the ability to slightly increase the throughput of the SAG mill. As we mentioned at the beginning of this chapter, the mineral processing facility operates their process conservatively because of their concern that the SAG mill will overload. However, based upon evidence from historical data, we estimate that they could increase their average throughput by 10 STPH, since they would have the capacity to detect the early onset of an overload and automatically recover from it. This leads to an estimated increase in profit of one million USD per year based on an assumed 300 days of operation per year. Moreover, we believe a 10 STPH throughput increase is a conservative estimate, as it represents a 0.5% increase on the average throughput of approximately 2000 STPH. The assumptions and calculations for this estimate are also given in Table 5.13, with the additional assumption that the SAG mill is the bottleneck of the facility.

Chapter 6

Conclusion

Effective management of abnormal situations in a chemical or mineral processing facility is vital to maximizing process economy and safety. Traditionally, statistical process monitoring has been utilized to alert operators when a process has shifted to an abnormal operating region. The operators then manually make the corrective control action to recover the process to the nominal operating range. This thesis bridges the gap between statistical process monitoring and the DCS. Consequently, the DCS is provided with enough information about the process shift to the abnormal operating range, and about the necessary control actions, to recover the process to nominal operation. It is in these specific areas that this work has made the following contributions:

1. A general control strategy has been developed in Chapter 4 that proposes two statistical process monitoring techniques that may be used for either linear or nonlinear processes to detect a process shift into an abnormal operating situation. In addition, the control strategy includes two linear control techniques designed to recover a process from the nonlinear abnormal operating region to the nominal operating region.

2. A technique to allow SLLEP to be performed in real-time in an industrial DCS has been developed and described in Chapter 4.1.2. Since most modern industrial DCS include MPC, the MPC may be used as a tool to solve a QP in real-time with minimal programming. This allows us to map new data to the feature space for classification in the SLLEP process monitoring algorithm.

3. The control strategy was demonstrated on a SAG mill at the mineral processing facility in British Columbia, where it detects a mill overload and uses the DCS to recover the process without operator intervention. Moreover, the causes and mechanics of a SAG mill overload have been thoroughly studied and comprehensively discussed in Chapter 2.3 for, to the best of our knowledge, the first time in the literature.

6.1 Summary of contributions

The general control strategy proposes using PCA or SLLEP as a feature extractor and then Q, T², LDA, or QDA as a classifier in process monitoring to identify abnormal operating situations. Specifically, SLLEP is proposed as a new feature extractor for processes that behave nonlinearly. The output of the classifiers is a continuous variable that provides an estimate of the magnitude of the process shift to an abnormal operating region. This measure of process shift can be used in two ways. The first is as a CV in an override MPC controller that overrides the existing nominal controller's MVs. The second applies when the nominal controller is MPC: the measure of process shift is used as a constraint variable in the MPC, and the constraint variable models in the MPC are identified for the nonlinear abnormal operating region. In addition, if the measure of process shift indicates a severe overload, an expert system based on an operator's best recovery control actions is proposed to recover the process from a severely nonlinear abnormal operating region to the less-severe abnormal operating region for which the linear recovery controller is designed.

We have developed a method to use MPC to solve the QP in the SLLEP algorithm in real-time in a DCS. SLLEP is a new and advanced process monitoring technique that this work has shown to be effective for nonlinear processes, or processes with many variables to be monitored.
Through this work, we have demonstrated to one of the largest control software developers in the industry the value of SLLEP and the necessity of directly accessing the QP of the MPC controller. As a result, the developers have agreed to incorporate a manual override of the LP optimizer in their DCS for advanced users to gain direct access to the QP. In other words, we have helped advance the industry enough to allow the commissioning of a highly advanced, emerging technique such as SLLEP.

The control strategy in this work was demonstrated on a SAG mill at the mineral processing facility. We successfully demonstrated that the process monitoring method is robust to changes in process operating conditions and can accurately detect the early onset of a mill overload, even after significant changes to the process over the year in which this study was carried out. In addition, the PCA-LDA process monitoring technique has been installed at the mineral processing facility and is providing information to the operators on the status of their SAG mill.

Although the mineral processing facility was unable to incorporate the full control strategy developed in this thesis due to staff limitations, we have demonstrated that the control strategy has the potential for significant savings in the operation of their SAG mill. The control strategy is capable of optimally recovering their process from an overload before it becomes severe. As a result, process stability and safety are significantly increased. Likewise, the control strategy allows the process to be operated closer to the maximum throughput of the SAG mill without suffering the consequences of an increased risk of overloading the mill.

6.2 Future research

The following is a list of potential research subjects based on the work described in this thesis:

1. The improvement of modeling of the abnormal operating region in cases where data availability is limited, or where the excitation of the process inputs is so large that the response in the data is nonlinear.

2. The improvement of locally linear embedding to map new observations to the feature space for classification. Specifically, the mapping errors need to be reduced, especially for regions of data sparsity in the training set.

3. The training of a classifier using fuzzy class data rather than binary class data.

4. The application of this control strategy to other industries.

Bibliography

Apelt, T. A. Inferential measurement models for semi-autogenous grinding mills. Ph.D. Thesis, University of Sydney, Australia, 2002.

Bailey, S. J. From desktop to plant floor, a CRT is the control operators window on the process. Control Engineering, 31(6):86–90, 1984.

Basseville, M. Detecting changes in signals and systems–A survey. Automatica, 24(3):309–326, 1988.

Bequette, B. W. Process Dynamics: Modeling, Analysis, and Simulation. Prentice-Hall, 1998.

Bianchi, F., De Battista, H., and Mantz, R. Wind Turbine Control Systems: Principles, Modelling and Gain-scheduling Design. Springer, 2006.

Bly, M. Deepwater Horizon Accident Investigation Report. DIANE Publishing, 2011.

Boser, B. E., Guyon, I. M., and Vapnik, V. N. A training algorithm for optimal margin classifiers. In Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pages 144–152, 1992.

Boyd, S. and Vandenberghe, L. Convex Optimization. Cambridge University Press, 2004.

Camacho, E. and Bordons, C. Model Predictive Control. Springer, 2nd edition, 2007.
Disturbance observer based control for nonlinear systems. IEEE/ASME Transactions on Mechatronics, 9(4):706–710, 2004.
Chen, X. S., Li, Q., and Fei, S. Constrained model predictive control in ball mill grinding process. Powder Technology, 186:31–39, 2008.
Chen, X. S., Yang, J., Li, S. H., and Li, Q. Disturbance observer based multi-variable control of ball mill grinding circuits. Journal of Process Control, 19:1205–1213, 2009.
Cheng, H., Nikus, M., and Jämsä-Jounela, S.-L. Evaluation of PCA methods with improved fault isolation capabilities on a paper machine simulator. Chemometrics and Intelligent Laboratory Systems, 92(2):186–199, 2008.
Chisci, L., Falugi, P., and Zappa, G. Gain-scheduling MPC of nonlinear systems. International Journal of Robust and Nonlinear Control, 13(3-4):295–308, 2003.
Choi, S. W., Lee, C., Lee, J.-M., Park, J. H., and Lee, I.-B. Fault detection and identification of nonlinear processes based on kernel PCA. Chemometrics and Intelligent Laboratory Systems, 75(1):55–67, 2005.
Chou, H. H. Fault diagnosis of a heat exchanger system using unknown input observers. M.A.Sc. Thesis, University of Toronto, Canada, 2000.
Cleary, P. W., Sinnott, M., and Morrison, R. Prediction of slurry transport in SAG mills using SPH fluid flow in a dynamic DEM based porous media. Minerals Engineering, 19:1517–1527, 2006.
Coetzee, L. C. Robust nonlinear model predictive control of a closed run-of-mine ore milling circuit. Ph.D. Thesis, University of Pretoria, South Africa, 2009.
Craig, I. K. and MacLeod, I. M. Specification framework for robust control of a run-of-mine ore milling circuit. Control Engineering Practice, 3(5):621–630, 1995.
De Ridder, D. and Duin, R. P. W. Locally linear embedding for classification. Pattern Recognition Group, Dept. of Imaging Science & Technology, Delft University of Technology, Delft, The Netherlands, Tech. Rep. PH-2002-01, 2002.
Dean, D. R. Memo to Jim Dunstan, Rosebud Joint Venture, Nevada, January 1999.
Doyle, J. C., Francis, B. A., and Tannenbaum, A. Feedback Control Theory, volume 1. Macmillan Publishing Company, New York, 1992.
Draper, N. R., Smith, H., and Pownell, E. Applied Regression Analysis. John Wiley & Sons, 3rd edition, 1998.
Fisher, R. A. The use of multiple measurements in taxonomic problems. Annals of Human Genetics, 7(2):179–188, 1936.
Frank, P. M., Ding, S. X., and Marcu, T. Model-based fault diagnosis in technical processes. Transactions of the Institute of Measurement and Control, 22(1):57–101, 2000.
Friedberg, S. H., Insel, A. J., and Spence, L. E. Linear Algebra. Pearson, 4th edition, 2002.
Galán, O., Barton, G. W., and Romagnoli, J. A. Robust control of a SAG mill. Powder Technology, 124(3):264–271, 2002.
García, C. E., Prett, D. M., and Morari, M. Model predictive control: Theory and practice–A survey. Automatica, 25(3):335–348, 1989.
García-Álvarez, D., Fuente, M. J., Vega, P., and Sainz, G. Fault detection and diagnosis using multivariate statistical techniques in a wastewater treatment plant. In 7th IFAC International Symposium on Advanced Control of Chemical Processes, pages 952–957, 2009.
García-Álvarez, D., Fuente, M., and Sainz, G. Fault detection and isolation in transient states using principal component analysis. Journal of Process Control, 22(3):551–563, 2012.
Garnier, H., Mensler, M., and Richard, A. Continuous-time model identification from sampled data: Implementation issues and performance evaluation. International Journal of Control, 76(13):1337–1357, 2003.
Gill, D. A., Picou, J.
S., and Ritchie, L. A. The Exxon Valdez and BP oil spills: A comparison of initial social and psychological impacts. American Behavioral Scientist, 56(1):3–23, 2011.
Goodwin, G. C. A brief overview of nonlinear control. Technical report, Centre for Integrated Dynamics and Control, Department of Electrical and Computer Engineering, The University of Newcastle, Australia, 2002.
Gupta, A. and Yan, D. Mineral Processing Design and Operations. Elsevier, Amsterdam, The Netherlands, 2006.
Hespanha, J. P. and Morse, A. S. Switching between stabilizing controllers. Automatica, 38(11):1905–1917, 2002.
Hespanha, J. P., Liberzon, D., and Morse, A. S. Hysteresis-based switching algorithms for supervisory control of uncertain systems. Automatica, 39(2):263–272, 2003.
Himmelblau, D. M. Fault Detection and Diagnosis in Chemical and Petrochemical Processes. Elsevier Scientific Publishing Company, 1978.
Horn, R. A. and Johnson, C. R. Matrix Analysis. Cambridge University Press, 1990.
Isermann, R. Model-based fault-detection and diagnosis–Status and applications. Annual Reviews in Control, 29(1):71–85, 2005.
Jackson, E. J. and Mudholkar, G. S. Control procedures for residuals associated with principal component analysis. Technometrics, 21(3):341–349, 1979.
Jackson, J. E. A User's Guide to Principal Components, volume 244. Wiley-Interscience, 1991.
Johnson, R. A. and Wichern, D. W. Applied Multivariate Statistical Analysis. Pearson, 6th edition, 2007.
Kalman, R. E. A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82:35–45, 1960.
Katubilwa, F. M. and Moys, M. H. Effects of filling degree and viscosity of slurry on mill load orientation. Minerals Engineering, 24:1502–1512, 2011.
Kim, B. S., Calise, A. J., and Kam, M. Nonlinear flight control using neural networks and feedback linearization. Journal of Guidance, Control, and Dynamics, 20(1):26–33, 1997.
Ko, Y.-D. and Shang, H. SAG mill system diagnosis using multivariate process variable analysis. The Canadian Journal of Chemical Engineering, 89(6):1492–1501, 2011.
Koivo, H. N. Artificial neural networks in fault diagnosis and control. Control Engineering Practice, 2(1):89–101, 1994.
Kosanovich, K. A., Dahl, K. S., and Piovoso, M. J. Improved process understanding using multiway principal component analysis. Industrial & Engineering Chemistry Research, 35(1):138–146, 1996.
Kourti, T., Lee, J., and MacGregor, J. F. Experiences with industrial applications of projection methods for multivariate statistical process control. Computers & Chemical Engineering, 20:S745–S750, 1996.
Kramer, M. A. Nonlinear principal component analysis using autoassociative neural networks. AIChE Journal, 37(2):233–243, 1991.
Krohling, R. A. and Rey, J. P. Design of optimal disturbance rejection PID controllers using genetic algorithms. IEEE Transactions on Evolutionary Computation, 5(1):78–82, 2001.
Ku, W., Storer, R. H., and Georgakis, C. Disturbance detection and isolation by dynamic principal component analysis. Chemometrics and Intelligent Laboratory Systems, 30(1):179–196, 1995.
Latchireddi, S. and Morrell, S. Slurry flow in mills: Grate-only discharge mechanism (Part-1). Minerals Engineering, 16:625–633, 2003a.
Latchireddi, S. and Morrell, S. Slurry flow in mills: Grate-pulp lifter discharge systems (Part-2). Minerals Engineering, 16:635–642, 2003b.
Lee, J.-M., Yoo, C., Choi, S. W., Vanrolleghem, P. A., and Lee, I.-B. Nonlinear process monitoring using kernel principal component analysis.
Chemical Engineering Science, 59(1):223–234, January 2004.
Leith, D. J. and Leithead, W. E. Survey of gain-scheduling analysis and design. International Journal of Control, 73(11):1001–1025, 2000.
Li, B. and Zhang, Y. Supervised locally linear embedding projection (SLLEP) for machinery fault diagnosis. Mechanical Systems and Signal Processing, 25(8):3125–3134, 2011.
Liu, C. S. and Peng, H. Disturbance observer based tracking control. Journal of Dynamic Systems and Control Division, 122:332–335, 2000.
Liu, Y., Weisberg, R. H., Hu, C., and Zheng, L. Tracking the Deepwater Horizon oil spill: A modeling perspective. 92(6):45, 2011.
Ljung, L. Estimating linear time-invariant models of nonlinear time-varying systems. European Journal of Control, 7(2-3):203–219, 2001.
Ljung, L. The System Identification Toolbox: The Manual. The MathWorks Inc., Natick, MA, USA, 7th edition, 2007a.
Ljung, L. System Identification. Prentice-Hall, 2nd edition, 2007b.
Ljung, L. and Wills, A. Issues in sampling and estimating continuous-time models with stochastic disturbances. In IFAC World Congress, Seoul, Korea, July 2008.
Ljung, L. Experiments with identification of continuous time models. In Proceedings of the 15th IFAC Symposium on System Identification, Saint-Malo, France, pages 90–95, 2009.
MacGregor, J. F. and Kourti, T. Statistical process control of multivariate processes. Control Engineering Practice, 3(3):403–414, 1995.
Mahadevan, S. and Shah, S. L. Fault detection and diagnosis in process data using one-class support vector machines. Journal of Process Control, 19(10):1627–1639, 2009.
Manohar, C. S. and Roy, D. Monte Carlo filters for identification of nonlinear structural dynamical systems. Sādhanā, 31(4):399–427, 2006.
Mariño, R. Nonlinear Control Design: Geometric, Adaptive, & Robust. Prentice Hall, New York, 1995.
Maybeck, P. S. Stochastic Models, Estimation, and Control, volume 1. Academic Press, New York, 1979.
McLachlan, G. J. Discriminant Analysis and Statistical Pattern Recognition. John Wiley & Sons, New Jersey, 2nd edition, 2004.
Mhaskar, P., Gani, A., and Christofides, P. D. Fault-tolerant control of nonlinear processes: Performance-based reconfiguration and robustness. International Journal of Robust and Nonlinear Control, 16(3):91–111, 2005.
Mokken, A. H., Blendulf, G. K. I., and Young, G. J. C. A study of the arrangements for pulp discharge on pebble mills, and their influence on mill performance. Journal of the South African IMM, pages 257–289, 1975.
Morrell, S. The prediction of power draw in wet tumbling mills. Ph.D. Thesis, University of Queensland, Australia, 1993.
Morrison, R. D. and Cleary, P. W. Towards a virtual comminution machine. Minerals Engineering, 21:770–781, 2008.
Morse, A. S. Supervisory control of families of linear set point controllers. In Proceedings of the 32nd Conference on Decision and Control, Texas, 1993.
Moys, M. H. Slurry rheology–The key to a further advance in grinding mill control. Proceedings: Advances in Autogenous and Semi-Autogenous Grinding Technology, 1989.
Moys, M. H., Van Nierop, M. A., and Smit, I. Progress in measuring and modelling load behaviour in pilot and industrial mills. Minerals Engineering, 9(12):1201–1214, 1996.
Najim, K., Hodouin, D., and Desbiens, A. Adaptive control: State of the art and an application to a grinding process. Powder Technology, 82(1):59–68, 1995.
Napier-Munn, T. J., Morrell, S., Morrison, R. D., and Kojovic, T. Mineral Comminution Circuits: Their Operation and Optimization.
Julius Kruttschnitt Mineral Research Centre, Australia, 1996.
Neogi, D. and Schlags, C. E. Multivariate statistical analysis of an emulsion batch process. Industrial & Engineering Chemistry Research, 37(10):3971–3979, 1998.
Nijmeijer, H. and Van der Schaft, A. Nonlinear Dynamical Control Systems. Springer, New York, 1990.
Nimmo, I. Adequately address abnormal operations. Chemical Engineering Progress, 91(9):36–45, 1995.
Ohran, B. J., de la Peña, D. M., Christofides, P. D., and Davis, J. F. Enhancing data-based fault isolation through nonlinear control: Application to a polyethylene reactor. In American Control Conference, 2008, pages 3299–3306, 2008.
Ohran, B. J., Liu, J., de la Peña, D. M., Christofides, P. D., and Davis, J. F. Data-based fault detection and isolation using feedback control: Output feedback and optimality. Chemical Engineering Science, 64(10):2370–2383, 2009.
Packard, A. and Doyle, J. The complex structured singular value. Automatica, 29:71–109, January 1993.
Patton, R. J. Fault-tolerant control systems: The 1997 situation. In IFAC Symposium on Fault Detection Supervision and Safety for Technical Processes, volume 3, pages 1033–1054, 1997.
Patton, R. J., Chen, J., and Siew, T. M. Fault diagnosis in nonlinear dynamic systems via neural networks. In International Conference on Control '94, volume 2, pages 1346–1351, 1994.
Primbs, J. A. and Nevistic, V. Optimality of Nonlinear Design Techniques: A Converse HJB Approach. Technical report CIT-CDS 96022, California Institute of Technology, Pasadena, CA 91125, 1996.
Quine, B. M. A derivative-free implementation of the extended Kalman filter. Automatica, 42(11):1927–1934, 2006.
Ramesh, T. S., Shum, S. K., and Davis, J. F. A structured framework for efficient problem solving in diagnostic expert systems. Computers & Chemical Engineering, 12(9):891–902, 1988.
Rao, R. C. The utilization of multiple measurements in problems of biological classification. Journal of the Royal Statistical Society, 10(2):159–203, 1948.
Roweis, S. T. and Saul, L. K. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.
Rugh, W. J. and Shamma, J. S. Research on gain scheduling. Automatica, 36(10):1401–1425, 2000.
Russell, E. L., Chiang, L. H., and Braatz, R. D. Fault detection in industrial processes using canonical variate analysis and dynamic principal component analysis. Chemometrics and Intelligent Laboratory Systems, 51(1):81–93, 2000.
Saha, N. and Roy, D. Two-stage extended Kalman filters with derivative-free local linearizations. Journal of Engineering Mechanics, 137(8):537–546, 2011.
Saul, L. K. and Roweis, S. T. An introduction to locally linear embedding. Unpublished. Available at: http://www.cs.nyu.edu/~roweis/lle/publications.html, 2000.
Saul, L. K. and Roweis, S. T. Think globally, fit locally: Unsupervised learning of low dimensional manifolds. The Journal of Machine Learning Research, 4:119–155, 2003.
Schölkopf, B., Smola, A., and Müller, K. R. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299–1319, 1998.
Serpas, M., Chu, Y., and Hahn, J. Fault detection approach for systems involving soft sensors. Journal of Loss Prevention in the Process Industries, July 2012.
Shi, F. N. and Napier-Munn, T. J. Effects of slurry rheology on industrial grinding performance. International Journal of Mineral Processing, 65:125–140, 2002.
Skogestad, S. and Postlethwaite, I. Multivariable Feedback Control: Analysis and Design.
John Wiley & Sons, 2nd edition, 2005.
Tong, S. and Koller, D. Support vector machine active learning with applications to text classification. The Journal of Machine Learning Research, 2:45–66, 2002.
Valery, W. A model for dynamic and steady-state simulation of autogenous and semi-autogenous mills. Ph.D. Thesis, University of Queensland, Australia, 1998.
van Drunick, W. I. and Penny, B. Expert mill control at AngloGold Ashanti. 105(7):497, 2005.
Venkatasubramanian, V. and Chan, K. A neural network methodology for process fault diagnosis. AIChE Journal, 35(12):1993–2002, 1989.
Venkatasubramanian, V., Rengaswamy, R., Kavuri, S. N., and Yin, K. A review of process fault detection and diagnosis: Part I: Quantitative model-based methods. Computers & Chemical Engineering, 27(3):293–311, 2003a.
Venkatasubramanian, V., Rengaswamy, R., Kavuri, S. N., and Yin, K. A review of process fault detection and diagnosis: Part II: Qualitative models and search strategies. Computers & Chemical Engineering, 27(3):313–326, 2003b.
Venkatasubramanian, V., Rengaswamy, R., Kavuri, S. N., and Yin, K. A review of process fault detection and diagnosis: Part III: Process history based methods. Computers & Chemical Engineering, 27(3):327–346, 2003c.
Wan, E. A. and Van Der Merwe, R. The unscented Kalman filter for nonlinear estimation. In Adaptive Systems for Signal Processing, Communications, and Control Symposium 2000, pages 153–158, 2000.
Wang, J. Locally linear embedding. In Geometric Structure of High-Dimensional Data and Dimensionality Reduction, pages 203–220. Springer Berlin Heidelberg, 2011.
Wei, D. and Craig, I. K. Grinding mill circuits–A survey of control and economic concerns. International Journal of Mineral Processing, 90(1):56–66, 2009.
Wills, B. A. and Napier-Munn, T. J. Mineral Processing Technology. Elsevier, Oxford, 7th edition, 2006.
Yang, J., Li, S., Chen, X. S., and Li, Q. Disturbance rejection of ball mill grinding circuits using DOB and MPC. Powder Technology, 198(2):219–228, 2010.
Yoon, S. and MacGregor, J. F. Fault diagnosis with multivariate statistical models Part I: Using steady state fault signatures. Journal of Process Control, 11(4):387–400, 2001.
Zhang, Y. and Jiang, J. Bibliographical review on reconfigurable fault-tolerant control systems. Annual Reviews in Control, 32(2):229–252, 2008.
Zhou, Y., Hahn, J., and Mannan, M. S. Fault detection and classification in chemical processes based on neural networks with feature extraction. ISA Transactions, 42(4):651–664, 2003.
Zumoffen, D. and Basualdo, M. From large chemical plant data to fault diagnosis integrated to decentralized fault-tolerant control: Pulp mill process application. Industrial & Engineering Chemistry Research, 47(4):1201–1220, 2008.

Appendix A Selection of Process Variables

This section acts as a guideline to help select process variables to be used in PCA. To select the most relevant process variables as inputs to a PC model, we need an understanding of the process from both a physics and an operating point of view. With a basic understanding of the process, we can strategically select process variables, or transformations of process variables, to make the PC model robust to extraneous information, noise, or drifting. Moreover, we can replace highly nonlinear process variables with transformations that behave more linearly. As a result, it is possible to use PCA as a statistical process monitoring technique for some nonlinear or time-varying processes.
The disadvantage of a careful selection of process variables is that building a PC model takes much more time than the traditional method of building the model from all of the main process inputs and outputs.

From the physics perspective, we need a fundamental understanding of the dynamics inside the process unit. For example, it is important to know how mass and energy flow from the inputs to the outputs of the process boundary. Furthermore, we need to know how the accumulation of mass or energy in the process unit affects the outputs or the economic objective. It is also important to understand how the physics of the process change when the process shifts from a nominal to an abnormal operating region. By gaining a fundamental understanding of the process, we may be able to transform certain process variables, based on physical intuition, in such a way that the transformation more clearly indicates that the process has shifted. Some examples of variable transformations, based on a physical understanding of the process, that can be used in PCA to detect a process shift are:

• transform bearing pressures to reflect the load distribution in the process unit;

• design a mass or energy balance to determine whether mass or energy is accumulating inside a process;

• develop a model that compares the expected throughput or economics of the process to measured values;

• develop a model that compares the expected acoustic, light, or vibration intensity output of a process to measured values.

From the operational perspective, the operators and the process data offer a lot of insight into the behaviour of the nominal and abnormal operation of the process. It is important to have a discussion with the operators to learn how they identify nominal and abnormal process operation. Specifically, we need to know which key variables they monitor, and how these variables change following a process shift. Almost certainly, the variables the operators monitor to detect a process shift will also be included in some form in the PC model. Operators can also provide valuable insight on which sensors are poorly calibrated and drift. Once time-varying variables are identified, they can either be excluded from the PC model, if they are not critical to the model, or transformed into the difference between the raw measurement and a moving average. However, as few variables as possible should be transformed into the difference from a moving average: taking the difference from a moving average may desensitize the process monitoring method if the process remains stable near the undesired operating region for a long time.

Operating data can also provide insight into the selection of key process variables. For example, a correlation analysis, such as a correlation plot, between process variables may help to eliminate variables that provide redundant information during a process shift. In addition, plotting time series trends or frequency spectra of various process variables together may show which variables best indicate that a process shift has occurred. A time series plot may also reveal whether particular measurements are significantly corrupted with noise and need to be filtered. Moreover, a time series plot may reveal that the rate of change of a variable, when compared to other variables, offers another form of indication of the process shift. To incorporate the rate of change of process variables into a PC model, the derivative of each variable needs to be calculated and subsequently filtered.
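As a sketch of the two filter-based transformations just described (the function and variable names are illustrative, not from the facility's DCS), a discrete first-order filter with time constant tau and sample period Ts, both in the same time units, can be implemented as follows:

    import numpy as np

    def first_order_filter(x, tau, Ts):
        # Discrete first-order (exponential) filter: the real-time DCS
        # analogue of a moving average.
        alpha = Ts / (tau + Ts)
        y = np.empty(len(x))
        y[0] = x[0]
        for k in range(1, len(x)):
            y[k] = y[k - 1] + alpha * (x[k] - y[k - 1])
        return y

    def deviation_from_moving_average(x, tau_slow, Ts):
        # Drift-robust PC-model input: raw measurement minus its slow average.
        return x - first_order_filter(x, tau_slow, Ts)

    def filtered_rate_of_change(x, tau_fast, Ts):
        # Finite-difference derivative, then filtered to suppress noise.
        return first_order_filter(np.gradient(x, Ts), tau_fast, Ts)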
A.1 Transformation of sound level

Figure A.1 shows sound level data sampled every two minutes at the mineral processing facility from February 25th, 2012 to February 29th, 2012. Data points where the SAG mill was not operational have been removed from this data-set. The measured sound level is filtered in the DCS at the mineral processing facility with a first-order filter with a time constant of 40 seconds. The moving average of the sound level is calculated with a first-order filter with a time constant of 0.4 days. A moving average computed with a first-order filter is desirable because it is computationally easy to implement and compute in real-time in a DCS. The deviation of the measured sound level from the estimated moving average is used to replace the measured sound level because it is more robust to changes in the mean of the sound level due to the operation of the SAG mill and auxiliary equipment. The red lines in the figure depict the region on February 29th, 2012 where the engineers officially recognized the process as being overloaded. From the figure, we see that the sound level is less than the moving average during the overload. This is expected because the energy of the falling ore in the mill is either reduced or dampened due to the presence of slurry pooling or a reduced distance over which the ore can cataract.

[Figure A.1: Difference between the measured sound level and a 0.4 day moving average of the sound level. Sound level (no units) versus sample number, with the overload region marked.]

Appendix B Locally Linear Embedding

B.1 Minimization of reconstruction error

The reconstruction error in Equation (4.12) is minimized in three steps:

1. Evaluate the inner products between the K nearest neighbors of x_i to find the neighborhood correlation matrix C. Let the K-nearest neighbors of x_i be represented by X_\eta, where

    X_\eta = [x_{\eta_1}, x_{\eta_2}, \ldots, x_{\eta_K}] \in \mathbb{R}^{D \times K}.    (B.1)

Then C is computed as

    C_{jk} = x_{\eta_j}^T x_{\eta_k},    (B.2)

where 1 \le j \le K and 1 \le k \le K.

2. Compute the Lagrange multiplier \lambda_L = \alpha_L / \beta_L that enforces the sum-to-one constraint in Equation (4.14), where for each x_i the parameters \alpha_L and \beta_L are

    \alpha_L = 1 - \sum_{j=1}^{K} \sum_{k=1}^{K} C_{jk}^{-1} (x_i^T x_{\eta_k}),    (B.3)

    \beta_L = \sum_{j=1}^{K} \sum_{k=1}^{K} C_{jk}^{-1}.    (B.4)

3. Compute the reconstruction weights W_j corresponding to the ith reconstruction:

    W_j = \sum_{k=1}^{K} C_{jk}^{-1} (x_i^T x_{\eta_k} + \lambda_L).    (B.5)

If the correlation matrix C is nearly singular, it can be conditioned before inversion by adding a small multiple of the identity matrix; this is known as regularization and essentially penalizes large weights.

B.2 Minimization of embedding cost

The embedding cost in Equation (4.15), subject to the constraints in Equations (4.16) and (4.17), can be rewritten as the minimization over Y of

    \Phi(Y) = \sum_{i=1}^{n} \sum_{j=1}^{n} M_{ij}\, y_i^T y_j = \mathrm{tr}(Y M Y^T),    (B.6)

where M is a sparse, symmetric, and positive semidefinite n \times n matrix given by

    M = (I - W)^T (I - W).    (B.7)

By the Rayleigh-Ritz theorem, Equation (B.6) can be minimized by finding the eigenvectors corresponding to the smallest d + 1 eigenvalues of M (Horn and Johnson, 1990). The eigenvector corresponding to the smallest eigenvalue of this matrix is the unit vector with all equal components, and it is discarded. It corresponds to a free translation mode with eigenvalue zero: discarding it enforces the constraint that the embeddings have zero mean.
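The following is a minimal sketch of the two minimizations above (ours, not the thesis implementation). It uses the equivalent centered-neighbor form of the weight problem, which yields the same weights as Equations (B.1)-(B.5) because of the sum-to-one constraint:

    import numpy as np

    def lle_embed(X, K=10, d=2, reg=1e-3):
        # X: (n, D) training data, one observation per row.
        n, D = X.shape
        dist2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        np.fill_diagonal(dist2, np.inf)
        nbrs = np.argsort(dist2, axis=1)[:, :K]   # K nearest neighbors of each x_i

        W = np.zeros((n, n))
        for i in range(n):
            Z = X[nbrs[i]] - X[i]                 # neighbors centered on x_i
            C = Z @ Z.T                           # K x K correlation matrix
            C = C + reg * np.trace(C) / K * np.eye(K)  # regularization
            w = np.linalg.solve(C, np.ones(K))
            W[i, nbrs[i]] = w / w.sum()           # enforce the sum-to-one constraint

        M = (np.eye(n) - W).T @ (np.eye(n) - W)   # embedding cost matrix, Eq. (B.7)
        _, eigvecs = np.linalg.eigh(M)            # eigenvalues in ascending order
        return eigvecs[:, 1:d + 1]                # drop the constant eigenvector

Each row of the returned matrix is the embedding y_i of the corresponding training observation.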
B.3 Least squares regression to map new samples

If X_\eta^T has full column rank, then a_j can be solved as

    a_j = (X_\eta^T)^{+} y_\eta^j = (X_\eta X_\eta^T)^{-1} X_\eta y_\eta^j,    (B.8)

where (X_\eta^T)^{+} is the Moore-Penrose pseudo-inverse of X_\eta^T, and y_\eta^j is defined as

    y_\eta^j = [y_{\eta_1}^j, y_{\eta_2}^j, \ldots, y_{\eta_K}^j]^T.    (B.9)

If X_\eta^T is column rank deficient, such as in the case where K < D, then Equation (4.23) is ill-posed and there are infinitely many solutions for a_j. In this case, an effective method to condition the problem so that the solution for a_j is unique is ridge regression (Draper et al., 1998), which minimizes a regularized least squares problem. With the introduction of a regularization parameter \delta > 0, the regularized least squares problem is posed as

    a_j = \arg\min_{a} \sum_{i=1}^{K} \left( a^T x_{\eta_i} - y_{\eta_i}^j \right)^2 + \delta \|a\|_2^2,    (B.10)

where \|\cdot\|_2 denotes the l_2 norm. Each basis vector a_j can then be solved as

    a_j = (X_\eta X_\eta^T + \delta I)^{-1} X_\eta y_\eta^j,    (B.11)

where I is the D \times D identity matrix. The parameter \delta can be determined from the variance of the noise, \sigma^2, since the objective is to project the neighborhoods linearly into a d-dimensional space and discard the noise. Therefore, \delta can be estimated by

    \delta = \frac{1}{K - d} \sum_{i=d+1}^{K} \lambda_i,    (B.12)

where the \lambda_i are the non-zero eigenvalues of the matrix X_\eta X_\eta^T, sorted from largest to smallest.

Appendix C Cross Validation

The purpose of cross-validation is to train a predictive model that has one or more unknown parameters. Cross-validation optimizes the model parameters to find the model that best fits the training data-set. Moreover, cross-validation assesses how well the trained model will perform in practice.

Consider a model f(x, \theta_m) with one unknown model parameter \theta_m that is to be trained on the training data-set X, consisting of n observations x. The following procedure, called K_f-fold cross-validation, finds \theta_m so that f(x, \theta_m) fits X as well as possible while also generalizing well to new observations.

Step 1 Randomly partition X into K_f mutually exclusive data-sets:

    X = (X_1, X_2, \ldots, X_{K_f}).    (C.1)

Step 2 For each k = 1, 2, \ldots, K_f, train the model on the entire training set excluding the kth fold, X_k.

Step 3 Fit the observations in X_k to the model trained in the previous step.

Step 4 Compute the prediction error sum of squares (PRESS) statistic for the kth fold:

    \mathrm{PRESS}_k = \sum_{x_i \in X_k} \| x_i - \hat{x}_i \|_2^2,    (C.2)

where \hat{x}_i is the estimate of x_i from the model trained in Step 2.

Step 5 Compute the overall cross-validation error, XV_{error}:

    XV_{error} = \frac{1}{K_f} \sum_{k=1}^{K_f} \mathrm{PRESS}_k.    (C.3)

Step 6 Repeat Steps 1-5 with several \theta_m values and choose the best one, \theta_m^*, that results in the lowest XV_{error}.

Step 7 Train f(x, \theta_m^*) on X using \theta_m^*.

In Step 6, a grid search can be used to select \theta_m. First, use a coarse test grid of roughly 10 equidistant \theta_m values within a reasonable range. Then, for the range of \theta_m values where XV_{error} is smallest, perform a fine grid search until satisfied that XV_{error} has been minimized to a desired tolerance. In the case where there are multiple parameters to optimize, perform a grid search that exhausts all combinations of the gridded model parameters. A sketch of this procedure follows.
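In the sketch below, the fit and predict callables are placeholders to be supplied by the user; they are not defined in this thesis:

    import numpy as np

    def xv_error(X, fit, predict, theta, Kf=10, seed=0):
        # Steps 1-5: Kf-fold cross-validation error for one candidate theta.
        rng = np.random.default_rng(seed)
        folds = np.array_split(rng.permutation(len(X)), Kf)  # Step 1
        press = []
        for k in range(Kf):
            train = np.concatenate([folds[j] for j in range(Kf) if j != k])
            model = fit(X[train], theta)             # Step 2: train without fold k
            X_hat = predict(model, X[folds[k]])      # Step 3: fit the held-out fold
            press.append(((X[folds[k]] - X_hat) ** 2).sum())  # Step 4: PRESS_k
        return np.mean(press)                        # Step 5: XV_error

    # Step 6: coarse grid search over roughly 10 equidistant candidate values.
    # thetas = np.linspace(theta_low, theta_high, 10)
    # theta_best = min(thetas, key=lambda t: xv_error(X, fit, predict, t))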
Once f(x, \theta_m^*) has been trained on X, it should be tested on a new data-set, also known as a test set, to further validate the model performance before it is used in practice.

In the cross-validation procedure presented in this section, K_f was chosen to be 10 for balance. A large K_f results in a better-trained classifier, since more training data are available; however, variances may exist between different partitions of the training set that are not captured if K_f is large. On the other hand, if K_f is small, the classifier may be poorly trained. Therefore, K_f = 10 strikes a balance between these competing objectives.

Appendix D Model Predictive Control

The objective of dynamic matrix control MPC is to reduce the error between the measured process outputs and their SPs in a least squares sense, with the inclusion of a penalty term on the input moves to the process. Therefore, the manipulated variables are determined such that the cost function J is minimized:

    \min_{\Delta u} J(\Delta u) = \sum_{j=1}^{n_y} \sum_{k=1}^{p} \| \hat{y}_j(t+k|t) - w_j(t+k) \|_R^2 + \sum_{l=1}^{n_u} \sum_{q=1}^{m} \| \Delta u_l(t+q-1) \|_Q^2,    (D.1)

where \hat{y} is a predicted output for the kth prediction and jth output variable; \Delta u_l is a future control signal for the qth prediction and lth control variable; n_y is the number of control variables; n_u is the number of manipulated variables; p is the length of the prediction horizon; m is the length of the control horizon; R \in \mathbb{R}^{n_y p \times n_y p} is the diagonal weight matrix penalizing error; and Q \in \mathbb{R}^{n_u m \times n_u m} is the diagonal weight matrix penalizing moves. Equation (D.1) can also be written in matrix form:

    \min_{\Delta u} J(\Delta u) = \| \hat{y} - w \|_R^2 + \| \Delta u \|_Q^2.    (D.2)

Here, \hat{y} \in \mathbb{R}^{p n_y \times 1} is the vector of predicted outputs and \Delta u \in \mathbb{R}^{m n_u \times 1} is the array of future control signals, where \hat{y} is defined as

    \hat{y} = [\hat{y}_1(t+1|t), \ldots, \hat{y}_1(t+p|t), \ldots, \hat{y}_{n_y}(t+1|t), \ldots, \hat{y}_{n_y}(t+p|t)]^T,    (D.3)
and \Delta u is

    \Delta u = [\Delta u_1(t), \ldots, \Delta u_1(t+m-1), \ldots, \Delta u_{n_u}(t), \ldots, \Delta u_{n_u}(t+m-1)]^T.    (D.4)

Future output predictions are a function of future control moves; in other words, \hat{y} is a function of \Delta u, and \hat{y} can be computed by the expression

    \hat{y} = G \Delta u + f,    (D.5)

where f is the vector for the free response of the system, that is, the part of the system response that does not depend on future control moves, given by

    f = [f_1(t+1|t), \ldots, f_1(t+p|t), \ldots, f_{n_y}(t+1|t), \ldots, f_{n_y}(t+p|t)]^T.    (D.6)

Here, f_1, \ldots, f_{n_y} are the free responses corresponding to the n_y process outputs, given by

    f_j(t+k) = y_{m_j}(t) + \sum_{i=1}^{N_{j,1}} (g_{k+i}^{j,1} - g_i^{j,1}) \Delta u_1(t-i) + \cdots + \sum_{i=1}^{N_{j,n_u}} (g_{k+i}^{j,n_u} - g_i^{j,n_u}) \Delta u_{n_u}(t-i), \quad j = 1, \ldots, n_y,    (D.7)

where y_{m_j} corresponds to the jth measured output; N_{j,l} is the number of samples until steady state is reached in y_j following a step change in u_l, with l = 1, \ldots, n_u; k = 1, \ldots, p; and g_i^{j,l} is the ith coefficient of the response in y_j following a unit step change in u_l. Furthermore, the g_i^{j,l} coefficients are used to construct the system's dynamic matrix between y_j and u_l, G_{j,l} \in \mathbb{R}^{p \times m}, given by

    G_{j,l} =
    \begin{bmatrix}
    g_1    & 0       & \cdots & 0 \\
    g_2    & g_1     & \cdots & 0 \\
    \vdots & \vdots  & \ddots & \vdots \\
    g_m    & g_{m-1} & \cdots & g_1 \\
    \vdots & \vdots  &        & \vdots \\
    g_p    & g_{p-1} & \cdots & g_{p-m+1}
    \end{bmatrix};    (D.8)

furthermore, for multiple-input and multiple-output systems, the G_{j,l} are assembled into the combined matrix G containing all the system dynamic matrices:

    G =
    \begin{bmatrix}
    G_{1,1}   & G_{1,2}   & \cdots & G_{1,n_u} \\
    G_{2,1}   & G_{2,2}   & \cdots & G_{2,n_u} \\
    \vdots    & \vdots    & \ddots & \vdots \\
    G_{n_y,1} & G_{n_y,2} & \cdots & G_{n_y,n_u}
    \end{bmatrix}.    (D.9)

Since R \succ 0 and Q \succeq 0, Equation (D.2) is a convex quadratic program (Boyd and Vandenberghe, 2004), and an analytical solution for \Delta u is found by setting the derivative to zero. The solution for \Delta u computed at each time instant t is

    \Delta u = (G^T R G + Q)^{-1} G^T R (w - f).    (D.10)

For each t, only (w - f) needs to be recalculated, since the system is assumed to be LTI. Therefore, we define K = (G^T R G + Q)^{-1} G^T R and calculate it offline prior to the implementation of MPC. Moreover, since only the first control move is executed at each t, K can be reduced to the rows corresponding to the first calculated control move for each \Delta u_l. Equation (D.10) then simplifies to

    \Delta u = K(w - f),    (D.11)

where each element of \Delta u contains the control move for each \Delta u_l to be executed.
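As an illustrative sketch (not the facility's DCS code; the function name is ours), the offline computation of the reduced K and the online control law of Equation (D.11) can be written as:

    import numpy as np

    def dmc_gain(G, R, Q, m, nu):
        # Offline: K = (G^T R G + Q)^-1 G^T R, Eq. (D.10), reduced to the rows
        # that give the first control move of each of the nu inputs.
        K_full = np.linalg.solve(G.T @ R @ G + Q, G.T @ R)
        first_move_rows = [l * m for l in range(nu)]  # ordering of Eq. (D.4)
        return K_full[first_move_rows, :]

    # Online, at each control interval, only the free response f changes:
    # K_red = dmc_gain(G, R, Q, m, nu)   # computed once, offline
    # du = K_red @ (w - f)               # Eq. (D.11): first move of each input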
Appendix E Model Selection

Model selection is important when modeling a system because there is a trade-off between prediction accuracy and model complexity. The greater the order of the model, the more likely it is that the model overfits the data, resulting in poor generalization to new data. To select the process model that best fits a training data-set, a method known as the extra sum of squares method can be used. Extra sum of squares is a simple technique that allows us to measure whether increasing the order of the model results in a statistically significant increase in prediction accuracy at a confidence level of \alpha.

The first step in extra sum of squares is to find the sum of squares of the model residuals, Q_j, for the jth model, where j = 1, 2, \ldots, n_m, and n_m is the number of models to be compared against each other. The n_m models are sorted in ascending order of the number of fitted parameters, p_j, in the jth model. Q_j is computed from the expression

    Q_j = \sum_{i=1}^{n} (y_i - \hat{y}_{i,j})^2,    (E.1)

where n is the number of observations in the training data-set, y_i is the ith observation of the output in the data-set, and \hat{y}_{i,j} is the ith estimate of y_i from the jth model. The next step is to compute the F-statistic between each pair of consecutive models, giving a total of k = 1, \ldots, n_m - 1 comparisons:

    F_k = \left( \frac{Q_j - Q_{j+1}}{p_{j+1} - p_j} \right) \left( \frac{n - p_{j+1}}{Q_{j+1}} \right).    (E.2)

The next step is to find F_\alpha: the value in the F-distribution table for df_1 = p_{j+1} - p_j and df_2 = n - p_{j+1} at a confidence level of \alpha, where df corresponds to the degrees of freedom of the F-distribution. The final step is to compare the F values and find the first case where F_k < F_\alpha. Once this condition is satisfied, the jth model is chosen as the best candidate, where j = k.

With this method, it is possible that a comparison between two models where p_j increases by more than one would reveal a higher-order and better-fitting model. However, it is unlikely in this case that the increase in model order is significantly more beneficial to the predictive accuracy. If it appears that a model that was not selected may be more beneficial for modeling the process, the predictive accuracy of the unselected model should be compared to that of the selected model on a test data-set to see whether the prediction accuracy substantially improves.

This section is meant to act as a guideline for model selection; numerous other methods are available. For example, a cost function incorporating the cost of prediction errors and parameter costs could be minimized to find the best model. The final prediction-error criterion discussed by Ljung (2007b) is one example of model selection based upon the minimization of a cost function.
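A minimal sketch of the extra sum of squares procedure (the function name is ours) is:

    import numpy as np
    from scipy.stats import f as f_dist

    def select_model_order(Q_res, p, n, conf=0.95):
        # Q_res: residual sums of squares (Eq. E.1), one per candidate model,
        # sorted by ascending parameter count p; n: number of observations.
        # Returns the 0-based index of the selected model.
        for k in range(len(Q_res) - 1):
            df1, df2 = p[k + 1] - p[k], n - p[k + 1]
            F_k = ((Q_res[k] - Q_res[k + 1]) / df1) / (Q_res[k + 1] / df2)  # Eq. (E.2)
            if F_k < f_dist.ppf(conf, df1, df2):
                return k          # first model whose extension is not significant
        return len(Q_res) - 1     # otherwise keep the highest-order candidate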
