UBC Theses and Dissertations


Adaptive model-predictive control and its applications in paper-making processes (Lu, Qiugang, 2018)


Full Text

Adaptive Model-Predictive Control and Its Applications in Paper-Making Processes

by

Qiugang Lu

B.Sc., Harbin Institute of Technology, 2011
M.Sc., Harbin Institute of Technology, 2013

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in The Faculty of Graduate and Postdoctoral Studies (Chemical and Biological Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

January 2018

© Qiugang Lu 2018

Abstract

Model-based controllers such as model-predictive control (MPC) have become the dominant control strategy for many industrial applications, including sheet and film processes such as the machine-directional (MD) and cross-directional (CD) processes of paper machines. However, many industrial processes have dynamics that vary over time, and model-based controllers may suffer significant performance loss under such circumstances due to the presence of model-plant mismatch (MPM). We propose an adaptive control scheme for sheet and film processes consisting of performance assessment, MPM detection, optimal input design, closed-loop identification, and controller adaptive tuning.

In this work, four problems are addressed for the above adaptive control strategy. First, we extend conventional performance assessment techniques based on minimum-variance control (MVC) to the CD process, accounting for both spatial and temporal performance limitations. A computationally efficient algorithm is provided for large-scale CD processes. Second, we propose a novel closed-loop identification algorithm for the MD process and then extend it to the CD process. This identification algorithm gives consistent parameter estimates asymptotically even when the true noise model structure is unknown. Third, we propose a novel MPM detection method for MD processes and then extend it to the CD process. This approach is based on routine closed-loop identification over moving windows followed by classification of the identified process models. A one-class support vector machine (SVM) is used to characterize normal process models from training data and to detect MPM by classifying models identified from test data. Fourth, an optimal closed-loop input design is proposed for the CD process based on noncausal modeling, to address the complexity arising from high-dimensional inputs and outputs. Causal-equivalent models can be obtained for the noncausal CD models, so closed-loop optimal input design can be performed on the causal-equivalent models.

The effectiveness of the proposed algorithms is verified on industrial examples from paper machines. It is shown that the developed adaptive controllers can automatically tune controller parameters to account for changes in process dynamics, without user intervention or recommissioning of the process. The proposed methodology can therefore greatly reduce controller maintenance costs in the process industry.

Lay Summary

A good model of an industrial process is an important prerequisite for high-quality products, since most industrial controllers depend on a model. When the quality of the model degrades because process characteristics change, a new model must be identified to restore control performance. In this research, we develop data-based performance metrics to assess control performance and the quality of the employed models online. Once degradation in the quality of the process model is detected, an experiment is designed to trigger identification of the underlying process model. The experiment is designed optimally so that the subsequent identification yields a model that is as accurate as possible. Based on the updated model, the controller is retuned to improve control performance.
With this scheme, the controller accounts for process changes automatically, without user intervention, and this can significantly reduce the maintenance cost of model-based controllers.

Preface

The results presented in this thesis are based on a close collaboration with Mr. Johan Backstrom and Dr. Michael Forbes from Honeywell Process Solutions in North Vancouver, BC, Canada. In addition, the routine closed-loop identification for MD processes in Chapter 2 was developed in collaboration with Lee Rippon from the University of British Columbia. The main algorithms, theoretical verifications, simulations, and literature reviews in Chapters 3-6 are based on my original ideas and joint discussions with my supervisors, Prof. Bhushan Gopaluni and Prof. Philip Loewen of the University of British Columbia, as well as our industrial partners.

• The results in Chapter 2 have been submitted as a US patent: Q. Lu, L. Rippon, B. Gopaluni, M. Forbes, P. Loewen, J. Backstrom and G. Dumont, "Closed-loop model parameter identification for industrial model-based process controllers," Application number H0057287-0108, 2016. This work has also been summarized in a paper to be submitted to a peer-reviewed journal: Q. Lu, L. Rippon, B. Gopaluni, M. Forbes, P. Loewen, J. Backstrom and G. Dumont, "A closed-loop ARX output-error identification method for industrial routine operating data".

• The results in Chapter 3 have been submitted as a US patent: Q. Lu, B. Gopaluni, M. Forbes, P. Loewen, J. Backstrom and G. Dumont, "Model-plant mismatch detection using model parameter data clustering for paper machines or other processes", Application number H0057510-0108. This work has also been published as a conference paper: Q. Lu, B. Gopaluni, M. Forbes, P. Loewen, J. Backstrom and G. Dumont, "Model-plant mismatch detection with support vector machines", in Proceedings of the IFAC World Congress, 50(1): 7993-7998, 2017.

• The results in Chapter 4 have been published as a journal paper: Q. Lu, M. Forbes, B. Gopaluni, P. Loewen, J. Backstrom and G. Dumont, "Performance assessment of cross-directional control for paper machines", IEEE Transactions on Control Systems Technology, 25(1): 208-221, 2017.

• The results in Chapter 5 have been submitted as a US patent: Q. Lu, B. Gopaluni, M. Forbes, P. Loewen, J. Backstrom and G. Dumont, "Model-plant mismatch detection with support vector machines for cross-directional process behavior monitoring", Application number H0059870-0108. This work has also been summarized in a paper to be submitted to a peer-reviewed journal: Q. Lu, B. Gopaluni, M. Forbes, P. Loewen, J. Backstrom and G. Dumont, "Model-plant mismatch detection with support vector machines: An application to cross-directional processes".

• The results in Chapter 6 have been published as a US patent: Q. Lu, B. Gopaluni, M. Forbes, P. Loewen, J. Backstrom and G. Dumont, "Optimal closed-loop input design for identification of flat sheet process models", Application number 62305412. It has also been published as a conference paper: Q. Lu, B. Gopaluni, M. Forbes, P. Loewen, J. Backstrom and G. Dumont, "Noncausal modeling and closed-loop optimal input design for cross-directional processes of paper machines", in American Control Conference, pp. 2837-2842, 2017.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Nomenclature
Acknowledgments

1 Introduction
  1.1 The paper machine
  1.2 Research objectives
  1.3 Process models
  1.4 Literature review
    1.4.1 Performance assessment
    1.4.2 Model-plant mismatch detection
    1.4.3 Optimal input design
    1.4.4 Closed-loop identification
  1.5 Outline of this thesis
  1.6 Significance of the research

2 Routine Closed-loop Identification
  2.1 Introduction
  2.2 Preliminaries
  2.3 The closed-loop ARX-OE identification method
    2.3.1 First step: high-order ARX identification
    2.3.2 Second step: OE modeling with filtered input and output data
  2.4 Asymptotic analysis of the ARX-OE method
    2.4.1 Consistency analysis
    2.4.2 Asymptotic distribution
  2.5 Identification based on routine operating data
    2.5.1 Estimate of the controller inverse
    2.5.2 Sufficient excitation
    2.5.3 The role of noise
  2.6 Examples
    2.6.1 Case I: univariate MD control of the paper machine
    2.6.2 Case II: multivariate MD control of the paper machine
  2.7 Conclusion

3 Model-plant Mismatch Detection for MD Processes
  3.1 Introduction
  3.2 Drawbacks of several MPM detection methods
  3.3 The MPM detection idea
  3.4 Closed-loop identification
  3.5 MPM detection
    3.5.1 One-class learning support vector machines
    3.5.2 Resampling
    3.5.3 MPM detection with SVM
  3.6 Examples of MPM detections for MD processes
  3.7 Conclusion

4 Control Performance Assessment for CD Processes
  4.1 Introduction
  4.2 Preliminaries
    4.2.1 Variance partition
    4.2.2 Process model
  4.3 The MVC benchmark for CD processes
    4.3.1 MVC benchmark for the steady-state profile
    4.3.2 MVC benchmark for the residual profile
  4.4 User-specified benchmark
  4.5 Performance monitoring
    4.5.1 Vector autoregressive modeling
    4.5.2 Performance monitoring algorithm
  4.6 Examples
    4.6.1 Simulation results
    4.6.2 Industrial example
  4.7 Summary

5 Model-plant Mismatch Detection for CD Processes
  5.1 Introduction
  5.2 Preliminaries
    5.2.1 CD process model
    5.2.2 CD noise model
    5.2.3 High-order ARX approximation of the CD process model
    5.2.4 The presence of feedback
  5.3 Routine CD closed-loop identification
    5.3.1 Model parameterization
    5.3.2 Parameter estimation
    5.3.3 Convergence and consistency analysis
  5.4 Application of one-class SVM to CD mismatch detection
    5.4.1 SVM training
    5.4.2 SVM prediction
  5.5 Examples
    5.5.1 Example 1: Iterative CD closed-loop routine identification
    5.5.2 Example 2: One-class SVM model-plant mismatch detection
  5.6 Summary

6 Closed-loop Optimal Input Design for CD Processes
  6.1 Introduction
  6.2 Preliminaries
    6.2.1 Open-loop CD process model
    6.2.2 Closed-loop steady-state CD process model
    6.2.3 Spatial optimal input design for the CD process
  6.3 Causal scalar transfer function representation of the CD process
    6.3.1 Noncausal scalar model of the closed-loop CD process
    6.3.2 Causal equivalent closed-loop models
    6.3.3 Covariance matrix equivalence of the causal and noncausal model parameter estimates
  6.4 Closed-loop optimal input design
  6.5 Example
  6.6 Summary

7 CD Iterative Identification and Control
  7.1 Case I: No model-plant mismatch
  7.2 Case II: Gain mismatch
  7.3 Case III: Width, divergence, and attenuation mismatches
  7.4 Case IV: Time constant mismatch
  7.5 Discussion of the simulation results
  7.6 Summary

8 Conclusions

Bibliography

Appendices
  Appendix A Proofs for Chapter 2
    A.1 Proof of Theorem 2.4.1
    A.2 Proof of Theorem 2.4.3
  Appendix B Derivations for Chapter 4
    B.1 Variance partition
    B.2 Derivation of (4.12)
  Appendix C Proofs for Chapter 5
    C.1 Proof of Theorem 5.3.1

List of Tables

2.1 Tuning parameters of the SISO MD MPC
2.2 Tuning parameters of the MIMO MD MPC
3.1 Parameters setup of the MPM detection algorithm
4.1 Variance partition for each simulated case
5.1 The implementation of routine CD closed-loop iterative identification
5.2 Tuning parameters of the CD MPC
5.3 Simulation and MPM detection parameters for Example 1 and Example 2
7.1 Summary of parameters setup in adaptive control simulation

List of Figures

1.1 Diagram of a typical paper machine [1]
1.2 The overall adaptive control structure applicable to both MD and CD processes
1.3 Two-dimensional response of a CD actuator under a bump signal
1.4 Temporal step response and spatial impulse response of a CD actuator
2.1 Simulated input and output data for the closed-loop MD process
2.2 IR coefficients of the estimated noise model inverse with different ARX orders
2.3 Step response of the estimated process model with different ARX orders
2.4 Histogram of parameter b estimates in 1000 Monte-Carlo simulations. Green: direct identification method with wrong noise model structure. Red: direct identification method with correct noise model structure. Blue: closed-loop ARX-OE method. Red dashed line: the true parameter value.
2.5 Histogram of parameter a estimates in 1000 Monte-Carlo simulations. Green: direct identification method with wrong noise model structure. Red: direct identification method with correct noise model structure. Blue: closed-loop ARX-OE method. Red dashed line: the true parameter value.
2.6 The simulated output profiles (CVs) for the MIMO MD process of a paper machine
2.7 The simulated input profiles (MVs) for the MIMO MD process of a paper machine
2.8 IR coefficients of the true and estimated inverse noise model for each output channel
2.9 Step response of the true (red solid line) and estimated (blue dashed line) process models
3.1 The IMC structure
3.2 Illustration of the training and testing data (ID stands for identification)
3.3 Illustration of the SVM training. Each curve shows the IR coefficients of the estimated models from training data. The upper and lower bounds define the width of the cluster.
3.4 Illustration of the MPM detection idea. Here the training and testing models refer to the process model estimates from training and testing data sets.
3.5 Flow chart of the MPM detection scheme
3.6 Simulated input and output profiles
3.7 MPM and noise change detection results
4.1 Illustration of the variation separation. (a): The steady-state profile plot; the steady-state profile is replicated to have the same scan number as the residual profile and CD profile. (b): The residual profile plot; each CD bin has zero mean. (c): The overall CD profile, which combines the steady-state and residual profiles.
4.2 Spatial impulse steady-state response and the temporal step response of a single actuator. The negative peak of the spatial response is due to the negative gain of the actuator spatial model.
4.3 Performance indices for different levels of gain mismatch. γ0 is the nominal gain value used by the controller; γ is the actual gain value of the process plant.
4.4 Performance indices for different levels of width mismatch. ξ0 is the nominal width value used by the controller; ξ is the actual width value of the process plant.
4.5 Performance indices for different levels of divergence mismatch. β0 is the nominal divergence value used by the controller; β is the actual divergence value of the process plant.
4.6 Performance indices for different levels of attenuation mismatch. α0 is the nominal attenuation value used by the controller; α is the actual attenuation value of the process plant.
4.7 Performance indices for different levels of time constant and time delay mismatch. τ0 and τd0 are the nominal time constant and time delay values used by the controller; τ and τd are the actual values of the process plant.
4.8 Comparison of performance indices η1 and η2 with high-frequency spatial disturbance
4.9 Three-dimensional plot of input-output profiles from the industrial data set. (a): The dry weight profile (g/m2) with 376 measurement bins and the MD trend removed; (b): The actuator profile (%) of 114 actuator zones.
4.10 Analysis of the industrial data. (a): The moving-window performance indices for the measured data. Blue solid line: η2,user; red dash-dotted line: η2; black dashed line: η1. (b): The steady state of the entire dry weight profile. (c): The steady state of the entire actuator profile with lower and upper bounds (red dash-dotted lines). (d): The spectrum of the averaged dry weight profile and the approximated spatial bandwidth (red dash-dotted line).
5.1 Simulated input-output data for the closed-loop CD process
5.2 CD closed-loop iterative identification results: noise model is a high-pass filter (left); noise model is a low-pass filter (right)
5.3 The colormap of simulated input-output data for CD MPM detection
5.4 Process parameter estimates over moving windows
5.5 MPM and noise model change detection results
6.1 The closed-loop optimal input design configuration
6.2 The impulse response of a single actuator (red solid line) and the impulse response of the estimated noncausal transfer function (blue dash-dotted line)
6.3 Spectrum of the optimal input based on the causal-equivalent model of the CD process
6.4 The impulse responses of the estimated process model in closed loop under the optimally designed input (upper plot), the bumped input (middle plot), and the white-noise input (bottom plot) in 100 Monte-Carlo simulations
7.1 No mismatch: The collected output and input profiles
7.2 No mismatch: Estimates of parameters in each moving window
7.3 No mismatch: Adaptive CD MPC control results
7.4 Gain mismatch: Simulated output and input profiles
7.5 Gain mismatch: Estimates of parameters in each moving window
7.6 Gain mismatch: Adaptive CD MPC control results
7.7 Width mismatch: The collected output and input profiles
7.8 Width mismatch: Estimates of parameters in each moving window
7.9 Width mismatch: Adaptive CD MPC control results
7.10 Divergence mismatch: The collected output and input profiles
7.11 Divergence mismatch: Estimates of parameters in each moving window
7.12 Divergence mismatch: Adaptive CD MPC control results
7.13 Attenuation mismatch: The collected output and input profiles
7.14 Attenuation mismatch: Estimates of parameters in each moving window
7.15 Attenuation mismatch: Adaptive CD MPC control results
7.16 Time constant mismatch: The collected output and input profiles
7.17 Time constant mismatch: Estimates of parameters in each moving window
7.18 Time constant mismatch: Adaptive CD MPC control results

Nomenclature

ARX  Autoregressive exogenous
AsN  Asymptotically normal
CD  Cross Direction
CV  Controlled Variable
⊕  Direct sum between sets
δ(·)  Dirichlet function
E  Expectation of a random variable
E  Generalized expectation of a random variable
Ei  The i-th basis matrix
FIR  Finite impulse response
FOPTD  First-order plus time-delay
G†  Pseudo-inverse of matrix G
i.i.d.  Independent and identically distributed
IMC  Internal model control
IR  Impulse response
κ(x,y)  Kernel function between two vectors x, y
LQG  Linear quadratic Gaussian
λ−1  Unit backward shift operator in spatial direction
LMI  Linear matrix inequality
MD  Machine Direction
MIMO  Multi-input-multi-output
MISO  Multi-input single-output
MPC  Model-predictive control
MPM  Model-plant mismatch
MV  Manipulated Variable
MVC  Minimum-variance control
N(µ,P)  Multivariate Gaussian distribution with mean µ and covariance P
OE  Output-error
PDF  Probability density function
PEM  Prediction-error method
PRBS  Pseudorandom binary sequence
q−1  Unit backward shift operator in temporal direction
SISO  Single-input-single-output
sup  Supremum of a set
SVM  Support vector machine
→  Convergence of a sequence of numbers or functions
w.p.1  With probability one

Acknowledgments

I would first like to express my sincere gratitude to my supervisor, Professor Bhushan Gopaluni. He introduced me to this university-industry collaborative research project and gave me invaluable guidance in addressing practical problems arising from industry. This work would not have been possible without his support, patience, and inspiring discussions throughout my PhD study. Working with Prof. Gopaluni and his group has been one of the most enjoyable experiences of my life.

I am greatly thankful to my co-supervisor, Professor Philip Loewen. He helped me enormously in my studies, from the initial literature review all the way to the paper writing.
I received extensive training in mathematics from working with him on the proofs of our ideas, and I believe that being rigorous, in both research and life, will benefit me enormously in my future career.

My sincere appreciation also goes to my industrial supervisors, Dr. Michael Forbes and Mr. Johan Backström. Michael gave me countless forms of assistance in realizing our algorithms in industry. I appreciate the hundreds of productive weekly meetings that Michael organized to implement our algorithms on a real paper machine; these meetings were one of the most important reasons for the success of this project. I also thank Johan for his insightful feedback on our progress, which made our work practically significant. Completing this project would have been much more difficult without his guidance and steering.

I am also thankful to Prof. Richard Braatz for hosting me at the Department of Chemical Engineering at MIT. He introduced me to a new research area in fault detection and data analytics, and I benefited greatly from his broad knowledge and expertise in control and optimization. I would also like to thank Prof. Benben Jiang from Beijing University of Chemical Technology, China, who was visiting Prof. Braatz's lab at the same time, for his help during my visit to MIT and for fruitful discussions on the new research area.

I would also like to thank all the collaborators and colleagues in my PhD project and studies: Lee Rippon, Cristian Gheorghe, Greg Stewart, Yiting Tsai, Mahdi Yousefi, and Prof. Guy Dumont. I have been fortunate to work with you over the last four years.

Finally, I offer special thanks to my parents and my wife, Hui Tian, for their unyielding support, unconditional love, and tolerance. To them I dedicate this dissertation.

Chapter 1

Introduction

Stiff global competition in reducing energy use and improving product quality has put extensive pressure on the pulp and paper industry to increase its overall efficiency. As one of the major sectors of the pulp and paper industry, sheet and film processes face strong demands for highly efficient, autonomous, and environmentally sustainable operation to maintain their competitive edge. In this work, we focus on developing novel methods that improve efficiency through automation and adaptive information processing for the paper machine, a typical sheet and film process.

1.1 The paper machine

Paper machines transform a slurry of water and wood cellulose fibres into sheets of paper with high efficiency. A typical Fourdrinier paper machine is illustrated in Figure 1.1. A paper machine contains four sections: the wet end, the press section, the dryer section, and the post-drying section. In the wet end, diluted fibre (a mixture of water and fibre with about 0.5% fibre concentration) is pumped into the headbox. An array of actuators controlling the openings of the slice lips at the headbox, across the paper sheet, adjusts the amount of pulp distributed onto the drainage belt. The drainage belt moves at high speed, with various suction devices underneath removing most of the water from the fibre. In the press section, steam boxes and pressing rolls further dewater the paper sheet, ending with approximately 40% fibre concentration. In the dryer section, a series of steam-heated cans further evaporates the water content of the fibres, reducing the water concentration to about 5-9% [1]. In the post-drying section, paper properties such as sheet thickness (caliper) and surface properties (gloss) are controlled by calender stacks, and the paper sheet is finally wound onto the reel at the end.

The most important paper properties are sheet weight, moisture content, and thickness.
These properties are measured by traversing scanners located at the end of the paper machine, as shown in Figure 1.1. These scanning sensors travel back and forth across the paper sheet to measure paper properties, and thus the scanned points on the paper product form a zig-zag trajectory due to the movement of the paper sheet. The objective in controlling paper machines is to reduce variations in paper properties due to disturbances by increasing the controllable bandwidth, such that disturbances with frequency less than the bandwidth are attenuated.

For control purposes, there are two important directions that divide a paper machine into two parts. The first is known as the machine direction (MD) and refers to the direction in which the paper sheet moves. The other is called the cross direction (CD) and is perpendicular to the MD. The primary objective in the control of a paper machine is to make the actual paper properties as close to the desired values as possible by adjusting the actuators [2].

MD control is concerned with controlling the average values of the measurement points, and MD processes are usually represented as single-input-single-output (SISO) or low-dimensional multi-input-multi-output (MIMO) systems. For example, MD basis weight is controlled by adjusting the fibre concentration of the pulp in the headbox. The overall moisture is controlled by the average amount of steam pumped into the dryer section. Note that Figure 1.1 only shows the structure of the CD processes discussed below.

Cross-directional control is a finer-resolution control on top of MD control. Five types of common CD actuators are shown in Figure 1.1. An array of slice lip actuators is mounted after the headbox, and controlling the opening of each slice lip actuator locally can affect the amount of pulp pumped out from the slice lip, thus impacting the local basis weight of fibres on the sheet.
Note that slice lip actuators can also influence other properties such as the moisture and caliper of the paper sheet. In the press section, an array of steam box actuators is used to facilitate the dewatering by spraying hot steam onto the paper sheet. Rewet showers spray water droplets onto the paper sheet to prevent over-drying. Similarly, the calender stacks, an array of induction heating rolls, are used to change the thickness of the paper sheet. All these arrays of actuators can be manipulated individually to modify local properties. Although there exist strong interactions between different arrays of actuators and controlled variables (CVs), in this work, for the cross direction, we mainly focus on the single-array CD process, since the methods developed can be easily extended to multiple-array cases.

Compared with MD control, CD control is much more complex due to the characteristics associated with CD processes [3, 4]. First, a typical industrial paper machine may be as wide as 10 meters with hundreds of actuator and measurement bins. If modeled as a multivariable system, the large dimension makes the controller design a challenging problem [5]. Second, due to the spatially distributed nature, most CD process models are ill-conditioned, and consequently, a large portion of the eigenvector directions with small eigenvalues are in fact uncontrollable [6]. Third, model uncertainty, especially the gain-sign uncertainty associated with the uncontrollable eigenvector directions, makes robust stability difficult to achieve [7, 8].

Figure 1.1: Diagram of a typical paper machine [1]

1.2 Research objectives

With the development of model-based control, especially model-predictive control (MPC), most MD and CD controllers are based on models identified a priori from experiments such as bump tests [9, 10].
Taking CD processes as an example, classical CD control strategies include two-dimensional loop shaping [11], CD model predictive control (MPC) [12], and robust CD control [13]. In these conventional CD control techniques, the quality of the CD process model plays a vital role in determining the closed-loop performance. However, as the process operating conditions change, the quality of the process model may deteriorate and consequently the control performance will degrade [14]. In this case, a new model must be identified for the process. Therefore, it is of great interest to develop techniques to evaluate the control performance using input-output data. A straightforward idea is to find benchmarks that represent satisfactory control performance, against which we can assess the currently implemented controller. Performance indices can then be put forward to quantitatively measure the closeness of the current control performance to the benchmark.

As various factors may lead to control performance deterioration, such as poor model quality, inappropriate controller tuning, and disturbance changes, process model re-identification is necessary only when the root cause is diagnosed as model quality degradation. Hence it is attractive to find techniques that are able to detect the existence of significant model-plant mismatches. Since we would like to perform the mismatch detection task online, it is desirable that the proposed technique can be applied to daily routine operating data. The problem here is that routine industrial data generally lack excitation and thus it is difficult to precisely identify the process model in real time. In this work, we will propose an effective method to resolve the mismatch detection problem.

Once poor model quality has been diagnosed as the root cause of performance degradation, a model identification scheme can be initiated to obtain an updated model of the process.
A typical model identification scheme involves three stages: optimal input design, model identification and model validation [15]. Among these stages, the optimal input signal (or dithering signal) plays an important role in determining the quality of the process model estimates. For a parametric model, the minimum requirement on the dithering signal is that it contain enough frequency components to excite all necessary modes. Furthermore, a good input signal should also guarantee a small variance for the identified parameters. It is desirable that the perturbation to the process from excitation signals be minimal, so as to prevent economic losses and to meet physical process constraints. The objective of optimal input design can be formulated as that of minimizing a scalar function of the covariance matrix of the parameter estimates with respect to the input sequence (or spectrum), subject to a set of constraints. The literature is replete with algorithms for optimal input design in both the time and frequency domains, and for both open- and closed-loop data [16]. Most current optimal input design techniques can be applied to MD processes. However, for CD processes, the literature on input design focuses only on the open-loop case, where the controller has to be turned off, which would result in profit loss for paper mills. In this work, the optimal input signal will be designed in closed loop, and the large input-output dimensions of CD processes will be taken into account.

Model identification is a mature research area in which a variety of techniques have been developed and applied in industry. Broadly, identification methods can be grouped into open-loop and closed-loop identification. Open-loop identification is known to be simple and flexible, as there is no feedback or controller that correlates the manipulated variables (MVs) with the output noise.
The main advantage of closed-loop identification is that the system identification is performed during closed-loop operation, which is particularly important for open-loop unstable systems. However, the strong correlation between the input signal and the output disturbance complicates closed-loop identification and often leads to biased model estimates [15]. For the MD process, a novel closed-loop identification method will be proposed in this work, aiming at providing unbiased process model estimates. This method will then be extended to the CD process. For the high-dimensional inputs and outputs associated with CD processes, we will present an iterative approach to simplify the identification.

In particular, we will develop (a) a performance monitoring mechanism to evaluate the quality of MD and CD process models using routine operating data; (b) a model-plant mismatch detection scheme to examine the occurrence of mismatch in real time; (c) a closed-loop optimal input design method tailored to the papermaking process; and (d) a closed-loop system identification technique. The outcome of this project will be a "quasi" adaptive control scheme¹, as shown in Figure 1.2, which will automatically monitor the performance, identify new models, and re-tune controllers online without user intervention, for both MD and CD processes.

Figure 1.2: The overall adaptive control structure applicable to both MD and CD processes.

1.3 Process models

In this section, we present typical process models employed in the industry for MD and CD processes. As mentioned in previous sections, single-array MD processes can be well represented by SISO models.
For example, a typical MD process can be modeled as

y(t) = G(q)u(t) + H(q)e(t),    (1.1)

where G(q) is a scalar transfer function, usually with a first-order-plus-time-delay (FOPTD) structure. H(q) is a stable, inversely stable and monic transfer function representing the dynamics of the measurement noise e(t); {e(t)} is usually assumed to be a Gaussian sequence. u(t) and y(t) are the input (e.g., stock flow) and output (e.g., dry weight) profiles. It shall be pointed out that a multiple-array MD process is usually represented as a lower-triangular MIMO system to capture the coupling between different arrays of actuators and paper properties.

¹ Here the adaptive control is in a general sense that resembles iterative identification and control, in contrast to conventional adaptive control. Thus, standard problems related to the stability of traditional adaptive control do not occur in the proposed scheme.

Figure 1.3: Two-dimensional response of a CD actuator under a bump signal

Compared with MD processes, CD processes are much more complicated due to the large input-output dimensions. Typical CD processes involve hundreds of actuators and measurement boxes in an array. Essentially, the responses of CD actuators are two-dimensional, in both the spatial and temporal directions. Fig. 1.3 shows the response (spatial and temporal) curve of a single actuator under a bump signal. Note that the temporal response is similar to that of a first-order model, while the spatial response is highly nonlinear and is modeled by a parameterized nonlinear function. The CD measurement sensors can be viewed as a discretization of the nonlinear response curve. When modeling a CD process, a widely used assumption is the separability of the spatial and temporal responses. Under this assumption, the temporal response is described by a FOPTD model with the time constant and time delay as parameters.
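As a minimal illustration of the FOPTD structure shared by the MD model (1.1) and the CD temporal response, the following sketch simulates a discretized FOPTD model; the gain, time constant, delay and noise settings are hypothetical, chosen only to show the structure.

```python
import numpy as np

def simulate_foptd(u, K=1.0, T=20.0, delay=3, dt=1.0, noise_std=0.0, seed=0):
    """Simulate y(t) = G(q)u(t) + e(t), with G a discretized FOPTD model.

    K: steady-state gain, T: time constant (same units as dt),
    delay: dead time in integer samples, noise_std: white-noise level.
    """
    rng = np.random.default_rng(seed)
    a = np.exp(-dt / T)      # discrete pole of the first-order lag
    b = K * (1.0 - a)        # input gain of the discretized model
    y = np.zeros(len(u))
    for t in range(1, len(u)):
        u_del = u[t - 1 - delay] if t - 1 - delay >= 0 else 0.0
        y[t] = a * y[t - 1] + b * u_del
    return y + noise_std * rng.standard_normal(len(u))

# Step response: the output stays at zero during the dead time and then
# settles near the steady-state gain K.
u = np.ones(200)
y = simulate_foptd(u, K=2.0, T=20.0, delay=3, noise_std=0.0)
```

For a unit step, the simulated output remains zero for the delay samples and approaches K as t grows, which is the qualitative shape of the temporal bump response described above.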
The spatial response is depicted by a nonlinear function parameterized by several spatial parameters:

\[
b(x) = \frac{\gamma}{2}\left\{ e^{-\frac{\alpha (x+\beta\xi)^2}{\xi^2}} \cos\!\left[\frac{\pi (x+\beta\xi)}{\xi}\right] + e^{-\frac{\alpha (x-\beta\xi)^2}{\xi^2}} \cos\!\left[\frac{\pi (x-\beta\xi)}{\xi}\right] \right\}, \tag{1.2}
\]

where x is the spatial coordinate and b(x) is the spatial response curve. The gain γ, width ξ, divergence β and attenuation α are the four parameters of the nonlinear function. The temporal step response and the spatial steady-state impulse response of a single actuator are shown in Fig. 1.4. The four spatial parameters, together with the two temporal parameters, form the parameter vector that we wish to estimate in CD model identification.

The nonlinear model (1.2) is not convenient for CD controller design. An additional assumption is therefore made that all actuators have identical spatial and temporal response behaviors. Substantial industrial evidence supports the validity of this assumption for the CD actuators in an array, except those near the edges. With this assumption, the spatial model is simplified to a highly structured gain matrix, and all actuators share the same temporal model. A generic CD process model is represented as

y(t) = g(q)Gu(t) + v(t),    (1.3)

where y(t) ∈ R^m stands for the measured CD profile, such as basis weight, and u(t) ∈ R^n is the manipulated variable, such as the openings of the slice lip actuators. m and n are the output and input dimensions, respectively. v(t) ∈ R^m represents the output measurement noise. g(q) is a scalar FOPTD transfer function describing the common temporal dynamics of all actuators in the array. G is a Toeplitz-structured gain matrix representing the spatial responses of the actuators at steady state.
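For illustration, the spatial response (1.2) can be evaluated numerically; samples of this curve are what populate the columns of the gain matrix G. The parameter values below are hypothetical.

```python
import numpy as np

def spatial_response(x, gain=1.0, width=2.0, divergence=0.0, attenuation=1.2):
    """Evaluate the CD spatial response b(x) of (1.2).

    gain = gamma, width = xi, divergence = beta, attenuation = alpha.
    """
    xi, beta, alpha = width, divergence, attenuation

    def lobe(z):
        # Damped cosine lobe shared by the two terms of (1.2)
        return np.exp(-alpha * z**2 / xi**2) * np.cos(np.pi * z / xi)

    return 0.5 * gain * (lobe(x + beta * xi) + lobe(x - beta * xi))

# For a centered actuator the response is symmetric in x, with its main
# peak at the actuator center and negative side lobes further out.
x = np.linspace(-6.0, 6.0, 241)
b = spatial_response(x, gain=1.0, width=2.0, divergence=0.3, attenuation=1.2)
```

The two exponential-cosine terms produce the characteristic center peak with negative side lobes that Fig. 1.4 depicts for the steady-state spatial impulse response.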
In particular, the (i, j)-th entry of G is determined by the value b(d_s i − c_j) in (1.2), where d_s is the spatial distance between neighboring measurement bins and c_j is the spatial response center of actuator j.

Note that in the following chapters, we will elaborate on different aspects of the MD and CD models according to the specific needs of the context.

Figure 1.4: Temporal step response and spatial impulse response of a CD actuator.

1.4 Literature review

1.4.1 Performance assessment

Controller performance assessment for univariate systems has been extensively studied in the literature. Excellent surveys on performance assessment for univariate and multivariate processes can be found in [17–19] and the references therein. Most performance assessment techniques aim at finding a benchmark that characterizes a theoretically best control performance and comparing it with the actual performance achieved by the implemented controllers. For a typical control system, there exist a variety of factors that prevent the controller from achieving 'ideal' control performance. For example, we cannot expect the controller to track a setpoint change perfectly without any tracking error, since the inherent lag of a physical system in response to an abrupt setpoint change cannot be overcome by any controller. Such constraints on the achievable performance of a controller are called performance limitations; they include time delays, time constants, non-minimum-phase zeros, etc. [20]. Thus the performance of a controller cannot be evaluated simply by comparing it with the ideal control performance; rather, it should be assessed by taking into account the corresponding performance limitations existing in the underlying system.

Among these performance limitations, time delays are the most fundamental and have the most direct influence on controller performance.
Harris [21] first proposed the minimum variance control (MVC) benchmark, which specifies the minimum variance that a closed-loop system can achieve when the time delay is the fundamental performance limitation. Due to its straightforward intuition and easy deployment, this benchmark has become the most popular tool in the process monitoring area. A time series model is fitted to the measured output data, and the first few coefficients form the controller-invariant part (the benchmark). The filtering and correlating (FCOR) algorithm was proposed in [18], in which only the output data and the time delay are required to calculate the minimum variance benchmark. Furthermore, the MVC benchmark has been extended from SISO systems to MIMO systems [19, 22, 23]. Note that for MIMO systems, the interactor matrix has to be factored out, as an analogue of the time delay in the SISO case, in order to apply the MVC benchmark. Moreover, it has been proved that the multivariate MVC is able to achieve minimal variance in each individual output channel provided that the interactor matrix is simple or diagonal [18]. A variety of industrial applications using the MVC benchmark have emerged [24], such as paper machines [25] and refinery processes [26].

Due to the aggressiveness of the MVC benchmark, a generalized minimum variance controller (GMVC) benchmark was proposed in [27], which enables the incorporation of input signals into the benchmark. A weight selection strategy for the GMVC benchmark was presented in [28]. The LQG benchmark for performance monitoring was proposed and investigated in [18, 29, 30], where a trade-off curve between the output variance and input variance is obtained and the performance of the implemented controller is evaluated against this curve. Furthermore, due to the wide use of MPC in industry, performance assessment techniques specific to MPC systems were investigated in [31, 32].
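The time-series calculation behind the MVC benchmark described above can be sketched as follows; the AR model order, delay and data-generating disturbance are hypothetical, used only to illustrate the mechanics.

```python
import numpy as np

def harris_index(y, delay, ar_order=10):
    """Estimate the MVC (Harris) performance index from routine output data.

    An AR model is fitted to y; the first `delay` impulse response
    coefficients are controller-invariant, so they fix the minimum
    achievable variance, which is compared with the actual variance.
    """
    y = np.asarray(y, float) - np.mean(y)
    p = ar_order
    # Least-squares AR fit: y[t] = a1*y[t-1] + ... + ap*y[t-p] + e[t]
    Phi = np.column_stack([y[p - k:len(y) - k] for k in range(1, p + 1)])
    a, *_ = np.linalg.lstsq(Phi, y[p:], rcond=None)
    e = y[p:] - Phi @ a
    sigma2 = np.var(e)
    # Impulse response of 1/A(q): h[0] = 1, h[k] = sum_j a[j] * h[k-1-j]
    h = np.zeros(delay)
    h[0] = 1.0
    for k in range(1, delay):
        h[k] = sum(a[j] * h[k - 1 - j] for j in range(min(k, p)))
    sigma2_mv = sigma2 * np.sum(h**2)
    return sigma2_mv / np.var(y)

# Hypothetical routine data: an AR(1) disturbance with pole 0.8, delay of 2.
rng = np.random.default_rng(1)
e = rng.standard_normal(20000)
y = np.zeros_like(e)
for t in range(1, len(e)):
    y[t] = 0.8 * y[t - 1] + e[t]
eta = harris_index(y, delay=2)
```

An index near 1 means the loop is close to minimum-variance performance; for this disturbance the theoretical value is (1 + 0.8²)(1 − 0.8²) ≈ 0.59, indicating room for improvement within the delay limitation.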
However, most MIMO systems under consideration have small input and output dimensions. A suitable extension of the MVC benchmark to spatially distributed systems, such as the cross direction of paper machines, has received relatively little attention. The high dimensionality of the measurements as well as the poorly conditioned nature of the CD process model presents significant challenges for performance monitoring [14].

For CD performance monitoring, in [33] the deviation of an implemented controller from the MVC was estimated in terms of actual input-output data as well as the plant model (known a priori). More practical considerations, such as actuator constraints, were taken into account in the monitoring process. In [34], the original CD model was decoupled into a family of SISO systems and Harris' MVC benchmark was applied to each subsystem. A minimum variance performance index for CD processes was proposed in [35] that accounts for both time-delay performance limitations in the temporal direction and spatial bandwidth performance limitations in the cross (spatial) direction. A Bayesian method was employed to impose a Toeplitz structure on the coefficient matrices when estimating the time series model. However, it is not straightforward to obtain a quantitative assessment of CD processes from these methods. Thus it is desirable to develop an intuitive performance index that directly measures the control performance of the CD process. Moreover, the performance index must be reasonably easy to compute so that it can be implemented online. Note that process monitoring for MD processes, with MVC-based and user-specified benchmarks, has been thoroughly studied in [36].
In this work, our focus is on performance assessment for CD processes.

1.4.2 Model-plant mismatch detection

Although process monitoring provides a quantitative way of measuring the control performance, control-loop performance may deteriorate for various reasons, such as model-plant mismatch (a poor-quality model), changes in disturbance characteristics, or improper tuning of controllers [30]. Among these factors, model-plant mismatch (MPM) is vital since, when the mismatch degrades the control performance, system re-identification is required to produce an updated model and deploy it to the controller. Note that model re-identification is not necessary for the other causes of performance degradation. Thus there is an increasing demand for an automated and highly reliable scheme to directly detect the presence of significant mismatches.

MPM often arises from the susceptibility of industrial processes to changes in operating conditions over time, such as a drift in a chemical process caused by catalyst deactivation [37]. Significant MPM can lead to suboptimal control decisions, production loss, or even closed-loop instability, in which case a process model re-identification procedure is necessary [38]. Traditionally, the MPM detection problem is considered a component of the diagnostic part of the overall process monitoring hierarchy [39]. A correlation-based method was proposed in [40] that relies on examining the significance of the cross-correlations between the setpoint and the model error to separate MPM from other causes of performance degradation. A multivariate chi-squared test on the output and prediction error was proposed to diagnose the presence of MPM in [41]. A hypothesis test method was proposed in [42] to distinguish the effects of process and disturbance changes on control performance drops.
As pointed out in [39], during this early stage of research on MPM detection, most efforts concentrated on performance monitoring and little work was reported on separating MPM from other causes. Another category of methods on MPM detection during that period constructs statistics to detect abrupt changes in the process parameters. A variety of statistical tools, such as time series modeling, principal component analysis (PCA), generalized likelihood ratio tests and several new statistics, were presented in [43–46]. The issue of deciding when to perform model re-identification, based on PCA and the Akaike information criterion, was investigated in [47]. Periodic injection of external excitation signals was required to perform simple tests to detect the model degradation. More recently, attempts to directly identify the presence, or even the model, of MPM have become the mainstream in the literature. A method based on partial correlation was illustrated in [48] to address the correlations among MVs. The correlation between each MV and each model residual, after de-correlating the MVs, was employed to examine the occurrence of MPM. In [49], an important method was proposed to discriminate noise covariance mismatch from process model mismatch. It was shown that both mismatches affect the Kalman filter design, leading to non-white innovations. The two kinds of mismatch differ in the order of the innovation sequences, and from this the MPM can be separated from noise model changes. The impact of MPM on the sensitivity function was thoroughly explored in [50], and a resulting performance index was put forward for the diagnosis.

Over the last few years, interest in MPM detection has grown significantly, pushing the frontier towards realistic industrial contexts with closed-loop routine operating data.
Several approaches have been reported based on model residual analysis [51], the output autocovariance function [37], the plant-model ratio [52] and the nominal sensitivity function [52]. Despite these achievements, the MPM detection problem is not yet completely solved. The remaining key issues can be summarized into two categories: the separability between MPM and noise model changes [38, 51], and the usage of external excitations [48, 50]. First, most MPCs perform predictions based solely on a process model, or together with a presumed simple noise model, e.g., a step or random-walk disturbance. Although changes in the true noise model may cause drifts in the variance of process variables, the resulting degradation in control performance should not be attributed to the MPC. In other words, operators in industry are more concerned with a quality drop in the underlying process model than in the noise model. An ideal MPM detection approach should not mistake changes in the noise model for MPM, and should be robust to such noise changes [38]. Most approaches built on variance-based metrics, such as the minimum-variance benchmark [21], are susceptible to this issue. Second, most present approaches to directly identify MPM depend on certain external excitations, such as dither signals or setpoint changes. Nevertheless, external excitations bring additional perturbations to the system and inevitably cause profit loss to the industry. Although it is possible to identify MPM during setpoint changes (e.g., grade changes for paper machines), we prefer to monitor MPM during the routine operation stage, given that most of the time the system operates at steady state, which may be free of any type of external excitation. For CD processes, the problem of MPM detection still remains open.
Extending previous approaches to CD processes can be expected to face extensive difficulties due to the complexity of the large-scale inputs and outputs.

1.4.3 Optimal input design

Input design answers the question of how to design an excitation signal optimally so that the identified system model is as accurate as possible. The objective is often formulated as minimizing the variance of the parameter estimates by carefully choosing the excitation signal [15]. A default assumption is that the employed estimator is consistent or unbiased, so that the covariance of the estimator is the only concern. So far, a number of approaches have been put forward, including time-domain, frequency-domain, open-loop and closed-loop approaches.

The optimal input design problem has been investigated extensively since the 1970s. During that period, the research focused on open-loop input design and the design variable was the input signal. Both time-domain and frequency-domain approaches were explored [53]. In the time-domain approaches, the problem was formulated as that of maximizing a scalar function of the Fisher information matrix by selecting the input sequence from an admissible compact set. The time-domain approaches suffered from the complexity of nonlinear optimization [54]. In the frequency-domain approach, the information matrix is replaced by an asymptotic per-sample information matrix, which can be expressed as an affine function of the input spectrum [53]. The optimal input spectrum can then be obtained by optimizing a scalar function (e.g., A-optimal, D-optimal, or E-optimal) of the asymptotic per-sample covariance matrix subject to constraints [55].
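The affine dependence of the per-sample information matrix on the input spectrum can be made concrete with a toy two-tap FIR model; the model and the candidate spectra below are hypothetical.

```python
import numpy as np

def fir2_information(freqs, powers):
    """Per-sample information matrix (up to a 1/sigma^2 factor) of a
    two-tap FIR model y(t) = th1*u(t-1) + th2*u(t-2) + e(t) under a
    multisine input.

    Affinity in the spectrum: each spectral line at frequency w with
    power P contributes P * [[1, cos w], [cos w, 1]].
    """
    M = np.zeros((2, 2))
    for w, P in zip(freqs, powers):
        M += P * np.array([[1.0, np.cos(w)], [np.cos(w), 1.0]])
    return M

# Two candidate spectra with the same total power:
M_low = fir2_information([0.1], [1.0])        # all power near DC
M_mid = fir2_information([np.pi / 2], [1.0])  # all power at mid-band
d_low, d_mid = np.linalg.det(M_low), np.linalg.det(M_mid)
```

Under a D-optimality criterion the mid-band spectrum is preferred here (larger determinant): a nearly-DC input makes the two regressors almost collinear and cannot distinguish the two FIR taps, which is exactly the kind of trade-off the spectral optimization resolves.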
Later on, the results on parameter covariance were extended directly to transfer function estimates, where asymptotic variance formulas were derived under the assumption that both the model order and the number of samples tend to infinity [56, 57]. The results have been shown to be effective not only for high-order models but also for model orders that do not exceed that of the true system. In the nineties, the concept of identification for control emerged, where the identification focus was more on the intended use of the model [58]. The identification criteria were chosen to be as close to the control performance as possible. The iterative identification and control design scheme was proposed to achieve these goals [59]. More recently, the focus shifted to the influence of the variance error of the estimated model on the set of robust stabilizing controllers. Semidefinite programming with linear matrix inequality (LMI) constraints [60] makes the problem solvable. More specifically, during this period the research significantly expanded the realm of optimal design criteria and constraints. For instance, the design criteria are no longer restricted to the classical criteria or the variance of transfer function estimates [61]. More practical constraints, such as frequency-by-frequency constraints, can be included as well, in addition to the conventional power constraints [62]. In conjunction with these new results are developments in the optimization and parameterization techniques associated with the input design problem. It has been shown that most input design problems can be recast as LMI optimization problems, and several parameterization techniques have been proposed to reduce the infinite-dimensional LMIs to finite dimensions. Two well-known parameterizations are the finite-dimensional parameterization and the partial correlation parameterization [63].
More recent advances include using graph theory to address the input design problem for closed-loop MPC systems in the time domain with probabilistic bound constraints on the input and output [64, 65]. With regard to optimal input design for MIMO and ill-conditioned systems, classical results can be found in [66–70]. However, for the CD process, a high-dimensional MIMO and ill-conditioned process, most existing input design methods were developed for the open-loop case; see [71]. The main drawback of open-loop input design is the resulting production loss, as normal process operations risk being interrupted. Results on closed-loop optimal input design for CD processes are scarce, the current industrial practice being the use of spatial-bump-temporal-PRBS signals as excitations for closed-loop identification (termed "bump excitations" in this work) [72]. How to design an optimal input excitation signal in closed loop for large-scale spatially distributed systems is still an unsolved problem.

1.4.4 Closed-loop identification

Identifying process models using closed-loop data has received extensive attention in the last few decades [73]. There are a number of situations where open-loop identification may not be suitable and closed-loop identification is more desirable. For example, when the underlying system is unstable, the controller has to be in the loop in order to perform identification experiments [59]. Furthermore, if the objective of the identification is to supply a model for robust controller design, it has been shown that closed-loop identification is advantageous relative to open-loop identification [74]. A more recent but important application of closed-loop identification is in process performance monitoring, particularly for industrially relevant model-based controllers (e.g., MPC).
Many industrial processes collect a large amount of routine operating data, and an important aspect of industrial process assessment is to extract metrics in real time to monitor the performance of control loops. Most of these data are collected in closed loop, and thus closed-loop system identification becomes one of the most powerful tools in industrial data analytics [75]. One important feature of such closed-loop identification is that the data are collected during the routine operation stage, where external excitations (e.g., setpoint changes or dither signals) may not exist and no a priori knowledge of the noise model is available.

Over the past few decades, closed-loop identification has become a well-established area of research in which a number of effective techniques, such as the prediction error method (PEM), have been proposed and widely used [15]. Broadly speaking, current closed-loop identification methods can be categorized as follows: direct identification, indirect identification, the joint input-output method, the two-stage method and the projection method. The direct identification method treats closed-loop data as if they were generated in open loop and therefore neglects the correlation between the input and the noise [76, 77]. However, for this method complete knowledge of the noise model is required to generate an unbiased estimate of the process model. To avoid significant bias, the direct identification method is restricted to situations where the feedback effect is weak (e.g., the controller is highly detuned or the signal-to-noise ratio is high). The indirect identification method proceeds by identifying a closed-loop transfer function (e.g., the sensitivity function) and then factoring out the open-loop process model, assuming that the regulator model is available and accurate [73].
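The factoring step of the indirect method can be sketched frequency by frequency: with the controller K known, the open-loop plant is recovered from an identified closed-loop response. The plant and controller below are hypothetical, and the closed-loop response is taken as exactly known to isolate the algebra.

```python
import numpy as np

def plant_from_closed_loop(T, K):
    """Recover the open-loop plant response G from the identified
    closed-loop transfer T = GK/(1+GK) and a known controller K,
    frequency by frequency: G = T / (K * (1 - T))."""
    return T / (K * (1.0 - T))

# Hypothetical first-order plant and proportional controller, evaluated
# on a frequency grid.
w = np.linspace(0.01, np.pi, 50)
z = np.exp(1j * w)
G_true = 0.2 / (1.0 - 0.9 / z)        # G(q) = 0.2 / (1 - 0.9 q^-1)
K = 0.5
T = G_true * K / (1.0 + G_true * K)   # the "identified" closed-loop response
G_rec = plant_from_closed_loop(T, K)
```

With a noise-free closed-loop estimate the recovery is exact; in practice the errors in the identified T, and any inaccuracy in the assumed regulator K, propagate directly into the factored-out plant, which is the weakness of the indirect method noted above.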
The joint input-output method [78], the two-stage method [79] and the projection method [80] share the same idea, i.e., identifying the closed-loop transfer functions from the dither signal to the process input and output, respectively. The open-loop process model can then be determined from these two closed-loop transfer functions, although the resulting model is often of high order. An additional model order reduction step is necessary to acquire a parsimonious model for further model-based controller design [73]. The latter four closed-loop identification methods require the injection of a dither signal into the closed-loop system, and in general the dither is independent of the noise. As a consequence, identification of the closed-loop transfer functions with these methods is essentially reduced to the identification of "open-loop processes". Additionally, the latter four methods are generally developed for linear controllers, and applying them when the controller is nonlinear may produce erroneous results. In this case, the direct identification method becomes favorable [75]. The aforementioned closed-loop identification methods can be applied to MD processes. However, the lack of a noise model in MD processes poses a barrier to achieving unbiased estimates of the process models. In this work, we propose a novel closed-loop identification method for SISO or low-dimensional MIMO systems, which is applicable to both experimental data and routine operating data.

For CD processes, the large-dimensional nature brings significant challenges even for open-loop identification. A traditional way of dealing with open-loop CD identification is to estimate the finite impulse response (FIR) coefficients along the cross direction [81]. However, the parameters estimated with this method have large uncertainty due to the large number of free parameters [82]. Other identification tools use nonlinear least squares to estimate the parameters that determine the spatial response curve [83].
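The FIR-style estimate mentioned above can be sketched for a steady-state spatial gain matrix; the dimensions, excitation and true gain used to generate the data are hypothetical.

```python
import numpy as np

# Hypothetical open-loop setup: m measurement bins, n actuators, N samples
# of steady-state data generated by a random true gain matrix.
rng = np.random.default_rng(2)
m, n, N = 30, 10, 400
G_true = 0.1 * rng.standard_normal((m, n))
U = rng.standard_normal((n, N))                    # open-loop excitation
Y = G_true @ U + 0.01 * rng.standard_normal((m, N))

# Unstructured least-squares estimate of G from Y = G U + noise:
# G_hat = Y U^T (U U^T)^{-1}, i.e., one regression per measurement bin.
G_hat = Y @ U.T @ np.linalg.inv(U @ U.T)
err = np.linalg.norm(G_hat - G_true) / np.linalg.norm(G_true)
```

Every one of the m·n entries is a free parameter here, which is precisely why the unstructured FIR estimate has large variance for industrial-scale m and n; parameterizing the columns of G by the spatial response curve b(x) reduces the free parameters to a handful.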
Due to the noncausal nature of the spatial response of CD actuators, noncausal parsimonious models have been put forward to represent CD processes [82, 84]. However, these methods only consider open-loop data; their extension to closed-loop data has not been reported in the literature. In this work, we present a closed-loop CD identification algorithm that can be applied to both CD experimental data and routine operating data.

1.5 Outline of this thesis

This thesis covers the entire adaptive control scheme for CD processes and part of the adaptive control scheme for MD processes. Note that each process contains the components shown in Fig. 1.2. For the MD process, our focus is on model-plant mismatch and closed-loop identification. Detailed simulations on adaptive control for MD processes can be found in [85]. Performance assessment and optimal input design for MD processes are covered in the work of a colleague [36].

Chapter 2 is devoted to the development of a novel closed-loop identification scheme. This closed-loop identification contains two steps, and it can supply an unbiased estimate of the process model even when knowledge of the true noise model structure is not available. Moreover, it can operate with closed-loop data collected during an experiment in which a dither signal is injected, or with data obtained from the routine operating stage in which no external signal is present. Note that this closed-loop identification approach is employed both in the MPM detection stage and in the final closed-loop identification stage.

Chapter 3 introduces a novel MPM detection scheme based on support vector machines (SVM). This method consists of two steps: routine closed-loop identification and SVM classification of the obtained model estimates. The method can be applied to routine operating data that are free of external excitation signals, and it avoids the issue of being sensitive to noise model changes.
This MPM detection scheme runs in parallel with the performance assessment to reduce false alarms in MPM detection.

Chapter 4 demonstrates the extension of classical performance assessment techniques based on the MVC benchmark to the CD process. A detailed analysis of the performance limitations associated with the CD process is provided, based on which a practical CD MVC benchmark is derived. Moreover, to account for the actual tuning status of CD controllers, a user-specified benchmark is proposed to reflect the performance of current controllers. A computationally efficient algorithm is put forward to address the computational difficulties caused by the high dimensionality of CD processes.

Chapter 5 gives an SVM-based MPM detection method for CD processes, which can be viewed as an extension of the idea from Chapter 3. In particular, a novel iterative CD closed-loop identification is proposed, inspired by the identification of Hammerstein models. Convergence and consistency analyses are performed for this novel CD closed-loop identification approach. SVM-based MPM detection is then applied to the identified CD process models and noise models to monitor the occurrence of model-plant mismatch. With this method, we can detect MPM using only routine operating data and can discriminate MPM from noise model changes.

Chapter 6 presents a novel closed-loop input design method for single-array CD processes. The contribution lies in the fact that traditional closed-loop input design approaches, whether time-domain or frequency-domain, cannot be applied to CD processes due to the high dimensionality of the inputs and outputs. Inspired by the noncausal modeling of CD processes, we propose a closed-loop input design based on noncausal models.
The advantage of this idea is that the CD spatial model can be effectively represented by low-order noncausal models, reducing the complexity associated with the input design.

Chapter 7 provides several examples of adaptive control (more precisely, iterative identification and controller re-tuning) for CD processes. These simulations are performed on a simulator from Honeywell Process Solutions. Specifically, we demonstrate the procedure in detail, from introducing an MPM and detecting its existence, through closed-loop input design and identification, to the adaptive tuning of CD controllers. Various spatial and temporal parametric mismatches are provided to verify the effectiveness of the entire adaptive control scheme.

1.6 Significance of the research

The above adaptive control scheme is proposed based on the stringent practical requirements of passive performance monitoring and minimal user intervention during the entire monitoring, identification and controller re-tuning loop. Neither control performance assessment nor MPM detection needs the injection of external excitations or even setpoint changes, avoiding the resultant production loss. Optimal input design and system re-identification are performed in closed loop to prevent the continuous operation of the process from being interrupted. To enhance robustness during the monitoring procedure and reduce false alarms on process deterioration, we combine control performance assessment and MPM detection to decide whether it is necessary to trigger the identification procedures. Although the entire framework is proposed for paper machines, it can easily be extended to other sheet and film processes such as coating, metal rolling and polymer film extrusion. For generic processes that are not spatially distributed, the above scheme is also applicable with minor modifications.
With this scheme, industrial manufacturers can take full advantage of advanced control methods to improve production efficiency and dramatically reduce the expense of the associated controller maintenance.

Chapter 2

Routine Closed-loop Identification

2.1 Introduction

Over the last few decades, model-based controllers (especially MPC) have become the dominant controllers in most industrial processes. One typical task of process monitoring is to assess the quality of the process models used by the controller. This task often involves real-time estimation of the process model based on routine industrial data, which may lack any type of external excitation [86]. Moreover, most industrial MPCs may display nonlinear dynamics, and thus closed-loop identification methods relying on external dither signals or linear regulators may not be suitable in this scenario. The direct identification method appears to be appropriate; however, the lack of knowledge of the true noise model structure may create biased estimates of the process models [87]. Therefore, it is of great significance to develop a new closed-loop identification method that can be applied in the process monitoring situation, i.e., with or without excitation signals and without knowing the noise model structure.

With the above motivation in mind, we propose a novel closed-loop identification technique. The goal of this method is to overcome the bias issue associated with the direct identification method while preserving its versatility and simplicity in dealing with closed-loop data under linear or nonlinear control. A major advantage of the proposed method is its applicability to situations both with and without external excitation. The proposed identification method is referred to as the closed-loop ARX-OE (autoregressive with exogenous input; output error) method. It consists of two steps: high-order ARX modeling in the first step and OE identification in the second step.
In the second step, OE identification is applied to input and output data that have been filtered by the inverse of the noise model estimated in the first step. Some preliminaries and assumptions are presented in the next section, followed by a detailed description of the ARX-OE method in Section 2.3 and a discussion of its asymptotic properties in Section 2.4. An explanation of the usage of this method with routine industrial operating data is provided in Section 2.5. Two examples validating the effectiveness of the ARX-OE method are presented in Section 2.6, with conclusions given in Section 2.7.

2.2 Preliminaries

Consider the following SISO true system with Box-Jenkins structure

S : y(t) = G0(q)u(t−d) + H0(q)e0(t), (2.1)

where y(t) and u(t) represent the measured output signal (CV) and input signal (MV), respectively. The true process model, G0, is a stable minimum-phase rational transfer function that is assumed to have at least one sample of delay. The noise model H0 is a monic, stable and inversely stable filter. The sequence {e0(t)} is independent and identically distributed (i.i.d.) Gaussian white noise with zero mean and variance σ_e².

In closed loop, the input signal u(t) is determined by the following nonlinear mapping

u(t) = k(t, u^{t−1}, y^{t−1}, r(t)), (2.2)

where u^{t−1} = {u(t−1), u(t−2), . . . , u(1)} and y^{t−1} is defined analogously. The external excitation signal, r(t), can be either a dither signal (normally added at the actuator site) or the setpoint signal. Note that (2.2) is a general representation of the controller and includes both linear and nonlinear cases. A typical example of a nonlinear controller is MPC with varying active constraints. From the perspective of system identification, a nonlinear controller can be more conducive to closed-loop identification, as it is likely to break the correlation between input and noise through the feedback path [15].

Assumption 2.2.1.
We assume that all relevant signals in (2.1)-(2.2) are quasi-stationary, i.e., the following conditions hold [15]:

E[s(t)] = m(t), |m(t)| ≤ C, ∀t,
Ē[s(t)s(t−τ)] = R_s(τ), |R_s(τ)| ≤ C, ∀τ,

where s(t) is a signal in (2.1)-(2.2), C is some constant and E is the conventional expectation operator for random variables. The generalized expectation operator, Ē, is applicable to signals consisting of both stochastic and deterministic components:

Ē[s(t)] = lim_{N→∞} (1/N) ∑_{t=1}^{N} E[s(t)].

Furthermore, we define

Z^N = {u(1), y(1), . . . , u(N), y(N)}

to be a collection of sampled closed-loop data. To facilitate the derivations, the following assumption holds by default unless otherwise explicitly stated.

Assumption 2.2.2. The collected closed-loop input-output data Z^N are informative enough for the selected model structures in the relevant closed-loop identification.

Remark 2.2.1. When external excitation exists, Assumption 2.2.2 holds if the external excitation signal r(t) is persistently exciting of sufficient order. For the case without external excitation, we assume that the linear or nonlinear controller is complex enough to make this assumption hold; see Section 2.5.2. Note that for the theoretical derivations, in the sequel we assume that r(t) exists. The corresponding results without external excitation can easily be derived by setting r(t) = 0.

For the prediction-error method (PEM), a class of model structures is constructed to fit the data set, parameterized by θ = [ρ^T γ^T]^T ∈ Ω_θ ⊆ R^{n_θ}:

M : y(t) = G(q,ρ)u(t) + H(q,γ)e(t),

where ρ ∈ Ω_ρ ⊆ R^{n_ρ} and γ ∈ Ω_γ ⊆ R^{n_γ} are the parameter vectors of the process and noise models, respectively. Here Ω_θ, Ω_ρ and Ω_γ are the corresponding convex and compact sets of the parameters θ, ρ and γ. First we introduce the definition of uniform stability for a model structure [15].

Definition 2.2.1. Let G(q,θ) = ∑_{k=1}^{∞} g_k(θ)q^{−k} be a transfer function depending on the parameter θ ∈ Ω_θ.
The model structure G(q,θ), θ ∈ Ω_θ, is said to be uniformly stable on Ω_θ if there exists a sequence {g(k)}, independent of θ, such that ∑_{k=1}^{∞} g(k) < ∞ and |g_k(θ)| ≤ g(k) for each k and every θ ∈ Ω_θ.

We further suppose that the selected model structures G(q,ρ) and H(q,γ) are uniformly stable ('uniformly' is with respect to the parameters ρ and γ) and that H(q,γ) is also inversely uniformly stable. Note that in what follows we may use Gρ, Gρ(q) and G(q,ρ) interchangeably if there is no risk of confusion.

Assumption 2.2.3. It is assumed that the true process model is contained in the set of selected process model structures, i.e.,

G0 ∈ G ≜ {G(q,ρ) | ρ ∈ Ω_ρ}.

Further, we assume that all relevant closed-loop transfer functions formed by the selected family of model structures are uniformly stable under the controller in (2.2). Note that this condition also guarantees closed-loop stability of the true system, since the true system is contained in the selected model structure.

We further stress that Assumption 2.2.3 is valid from a practical point of view, since for most industrial processes the required a priori knowledge of the process is often available. It is well known that if the selected noise structure also contains the true noise model, then the direct identification method provides consistent estimates for both the process and noise model parameters, regardless of whether the experiment is conducted in closed loop or open loop [15]. However, in practice this condition is too stringent to hold: in general, the characteristics of the noise are too complex to analyze or to capture with an appropriate model structure. Thus a discrepancy between the true noise model and the selected noise model structure is inevitable.
Moreover, for process control engineers, reliability of the identified process model is more important than that of the noise model for controller design.

A direct consequence of this noise model mismatch is a biased estimate of the process model when the PEM is applied to closed-loop data (we consider only the direct closed-loop identification method for now). To be more specific, taking a fixed noise model, H∗(q), as an example, the process model parameter estimate is shown to be [15]

ρ∗ = arg min_{ρ∈Ω_ρ} (1/2π) ∫_{−π}^{π} |G0(e^{jω}) + B(e^{jω}) − Gρ(e^{jω})|² Φ_u(ω)/|H∗(e^{jω})|² dω, (2.3)

where B(e^{jω}) is the bias term,

B(e^{jω}) = (H0(e^{jω}) − H∗(e^{jω})) Φ_{ue}(ω)/Φ_u(ω), (2.4)

and Φ_{ue}(ω) is the cross-spectrum between the input and the noise. It is obvious that for open-loop data Φ_{ue}(ω) = 0, and thus the OE structure with a fixed noise model can give an unbiased process model estimate. In this work, we propose a two-step approach to resolve the bias issue while maintaining the other advantages of the direct identification method.

2.3 The closed-loop ARX-OE identification method

In this section we describe the proposed closed-loop ARX-OE identification method in detail. The proposed method, as mentioned above, consists of two consecutive steps: high-order ARX identification followed by closed-loop OE identification using filtered input-output data.

2.3.1 First step: high-order ARX identification

Let us first rewrite (2.1) in the equivalent form

A0(q)y(t) = B0(q)u(t) + e0(t), (2.5)

where

A0(q) = 1/H0(q), B0(q) = G0(q)/H0(q).

Since H0(q) is assumed to be inversely stable, it is clear that A0(q) and B0(q) are also stable. It is worth pointing out that in most cases, the impulse response (IR) coefficients of A0(q) and B0(q) contain infinitely many terms, i.e.,

A0(q) = 1 + ∑_{k=1}^{∞} a0_k q^{−k}, B0(q) = ∑_{k=1}^{∞} b0_k q^{−k}. (2.6)

Thus the original Box-Jenkins model can be represented by an ARX model, but with an infinite number of parameters.
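For intuition, the decay of these IR coefficients can be checked numerically. The sketch below uses an illustrative noise model H0(q) = (1 + 0.4q^{−1})/(1 − 0.6q^{−1}) (hypothetical values, not one used elsewhere in this thesis) and computes the impulse response of A0(q) = 1/H0(q) by iterating the corresponding difference equation:

```python
import numpy as np

# Impulse response of A0(q) = 1/H0(q) for the illustrative noise model
# H0(q) = (1 + 0.4 q^-1) / (1 - 0.6 q^-1), so that
# A0(q) = (1 - 0.6 q^-1) / (1 + 0.4 q^-1), i.e.
# h[k] + 0.4 h[k-1] = x[k] - 0.6 x[k-1], with x a unit impulse.
n = 30
x = np.zeros(n); x[0] = 1.0            # unit impulse
h = np.zeros(n)
for k in range(n):
    h[k] = x[k] - 0.6 * (x[k-1] if k >= 1 else 0.0) \
                - 0.4 * (h[k-1] if k >= 1 else 0.0)

print(h[:5])        # approximately [1, -1, 0.4, -0.16, 0.064]
print(abs(h[-1]))   # |h[k]| decays like 0.4**(k-1): negligible by lag 30
```

For this example, truncating after 20-30 lags loses essentially nothing, which is the practical justification for replacing the infinite-order ARX representation by a finite high-order one.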
Here we propose to fit a high-order ARX model to the closed-loop data:

A(q,η_n)y(t) = B(q,η_n)u(t) + e(t), (2.7)

where n is the selected order and

A(q,η_n) = 1 + ∑_{k=1}^{n} a_k q^{−k}, B(q,η_n) = ∑_{k=1}^{n} b_k q^{−k}, (2.8)

with

η_n = [a_1, . . . , a_n, b_1, . . . , b_n]^T.

The basic idea is that the IR coefficients converge to zero after sufficiently many lags, so the high-order ARX model captures the first few significant coefficients. Based on (2.7) and (2.8), the parameter estimates of the ARX model can readily be obtained by solving a least-squares problem. The estimate of the parameter vector is defined as

η̂_n = [â_1, . . . , â_n, b̂_1, . . . , b̂_n]^T. (2.9)

It is evident that the parameter estimate η̂_n under (2.7) and (2.8) will suffer from large variance due to the large number of parameters. In fact, the accuracy of the parameter estimates from the first step directly affects the second step. One remedy for this issue is to add regularization to the least-squares problem. For the theoretical derivations in this work, we employ the regularization of the form shown in [88]: for small δ > 0,

R^n_reg(N) = R^n(N), if ‖[R^n(N)]^{−1}‖₂ < 2/δ,
R^n_reg(N) = R^n(N) + (δ/2)I, otherwise,

where R^n(N) = (1/N) ∑_{t=n+1}^{N} φ_n(t)φ_n(t)^T with the regressor φ_n(t) = [−y(t−1) u(t−1) . . . −y(t−n) u(t−n)]^T, and N is the sample size. It is proved in that paper that, asymptotically, the first- and second-order properties of the parameter estimate do not depend on the regularization term. Another important point is that the IR coefficients of most practical noise models tend to decay rapidly, so a priori information can be used to choose a reasonable model order for the ARX identification in the first step. For theoretical convenience, we make the following assumption, which states that as the sample size N tends to infinity, the order n(N) (as a function of N) of the selected ARX structure (2.7) is allowed to tend to infinity, but at a much slower rate than N.

Assumption 2.3.1.
For the high-order ARX model (2.7), it holds that

n(N) → ∞, n(N)^{3+δ}/N → 0, as N → ∞, (2.10)

where δ > 0 is some constant.

It is straightforward that the parameters in (2.7) can be estimated with ordinary least squares. Denote

η̂_N = η̂_{n(N)} (2.11)

as the least-squares estimate of the parameter η_n when n is allowed to tend to infinity as a function of N. We also define η_0 as a vector stacking the infinite number of true parameters of the high-order ARX model, i.e.,

η_0 = [a0_1, . . . , a0_n, . . . , b0_1, . . . , b0_n, . . .]^T. (2.12)

In the following sections, we use A0(q) and A(q,η_0) interchangeably, as well as B0(q) and B(q,η_0). Before proceeding to the second step of the ARX-OE method, we present the following lemma [88].

Lemma 2.3.1. Consider the true high-order ARX model in (2.5) and the selected model structure in (2.7). If the preceding assumptions hold, then for the least-squares estimate η̂_N we have

sup_ω |A(e^{jω}, η̂_N) − A0(e^{jω})| → 0, w.p. 1, as N → ∞. (2.13)

Note that a similar statement can be obtained for B(e^{jω}, η̂_N). However, as will be explained below, we are only interested in A(e^{jω}, η̂_N). Lemma 2.3.1 asserts that, asymptotically in both the sample size and the order of the ARX model, the estimate A(q, η̂_N) of A0(q) converges almost surely to the true value. Notice that this lemma holds regardless of whether the data are gathered in open loop or closed loop, as long as the corresponding assumptions are satisfied. As will be seen later, this lemma plays an essential role in the asymptotic analysis of the proposed ARX-OE method.

2.3.2 Second step: OE modeling with filtered input and output data

In the second step of the proposed approach, we perform an OE model identification on filtered input and output signals. Here the filter is chosen as the estimate A(q, η̂_N) from the first step.
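To make the two-step procedure concrete, the sketch below simulates a simple closed loop (an illustrative first-order plant, Box-Jenkins noise and a proportional controller with a binary dither; these choices are hypothetical, not the thesis examples) and then runs the first step, a high-order ARX fit with a simple ridge term standing in for the regularization of [88], followed by the second step, an OE fit on the A(q, η̂_N)-filtered signals:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
N, n, delta = 5000, 20, 1e-6   # samples, ARX order, ridge weight

# --- closed-loop data: G0 = 0.1 q^-1/(1 - 0.9 q^-1),
# --- H0 = (1 + 0.4 q^-1)/(1 - 0.6 q^-1), controller u = r - 0.5 y, binary dither r
e = 0.1 * rng.standard_normal(N)
r = rng.choice([-1.0, 1.0], size=N)
x = np.zeros(N); v = np.zeros(N); y = np.zeros(N); u = np.zeros(N)
for t in range(1, N):
    x[t] = 0.9 * x[t-1] + 0.1 * u[t-1]          # process part G0*u
    v[t] = 0.6 * v[t-1] + e[t] + 0.4 * e[t-1]   # noise part H0*e
    y[t] = x[t] + v[t]
    u[t] = r[t] - 0.5 * y[t]

# --- step 1: high-order ARX, y(t) + sum a_k y(t-k) = sum b_k u(t-k) + e(t)
Phi = np.column_stack([-np.roll(y, k) for k in range(1, n + 1)] +
                      [np.roll(u, k) for k in range(1, n + 1)])[n:]
eta = np.linalg.solve(Phi.T @ Phi + delta * np.eye(2 * n), Phi.T @ y[n:])
A_hat = np.concatenate(([1.0], eta[:n]))        # FIR estimate of A0 = 1/H0

# --- step 2: filter both signals with A_hat, then fit an OE model
yf = np.convolve(y, A_hat)[:N]
uf = np.convolve(u, A_hat)[:N]

def residual(p):                # OE predictor b q^-1 / (1 - f q^-1)
    b, f = p
    yhat = np.zeros(N)
    for t in range(1, N):
        yhat[t] = f * yhat[t-1] + b * uf[t-1]
    return (yf - yhat)[n:]      # drop the filter transient

b_hat, f_hat = least_squares(residual, x0=[0.05, 0.5],
                             bounds=([-1.0, -0.99], [1.0, 0.99])).x
print(b_hat, f_hat)             # close to the true values 0.1 and 0.9
```

The bounds merely keep the OE model stable during the nonconvex search; as noted above, convergence to a global minimum is not guaranteed in general, although for this low-order example the fit is well behaved.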
For ease of notation, from now on we define the operation of filtering a signal s(t) with A(q, η̂_N) as

s(t, η̂_N) = A(q, η̂_N)s(t)

to show this explicit dependence. With this notation, the filtered input and output signals are

y(t, η̂_N) = A(q, η̂_N)y(t), u(t, η̂_N) = A(q, η̂_N)u(t).

To estimate the process model, we fit the following OE model to the filtered input-output data:

y(t, η̂_N) = G(q,ρ)u(t, η̂_N) + e(t), ρ ∈ Ω_ρ, (2.14)

where a priori information on the process model can be imposed, e.g., a FOPTD model for the paper machine example in the simulation section. The one-step-ahead predictor of the above OE model is

ŷ(t|t−1, ρ, η̂_N) = G(q,ρ)u(t, η̂_N), (2.15)

and the resulting prediction error is

ε(t, ρ, η̂_N) = y(t, η̂_N) − ŷ(t|t−1, ρ, η̂_N) = [G0 − G(q,ρ)]u(t, η̂_N) + (A(q, η̂_N)/A0(q)) e(t). (2.16)

For the prediction error method, the optimal parameter is obtained by minimizing the following objective function:

ρ̂_N = arg min_{ρ∈Ω_ρ} V_N(ρ, η̂_N) = (1/N) ∑_{t=1}^{N} (1/2) ε²(t, ρ, η̂_N). (2.17)

Note that solving the OE model identification problem in (2.16)-(2.17) often involves nonconvex optimization, and thus convergence to a global minimum cannot in general be guaranteed. However, in this work we do not intend to develop techniques to overcome this issue; for details on this topic, see [89].

2.4 Asymptotic analysis of the ARX-OE method

In this section, we analyze the consistency and the asymptotic distribution of the proposed closed-loop ARX-OE identification method.

2.4.1 Consistency analysis

We show that the estimate of the process model approaches the true model as the sample size N tends to infinity. First, replacing A(q, η̂_N) in (2.16) with A(q, η_0), let us define the prediction error under the ideal filter A(q, η_0) as

ε(t, ρ, η_0) = y(t, η_0) − ŷ(t|t−1, ρ, η_0) = [G0 − G(q,ρ)]u(t, η_0) + e(t), (2.18)

and the corresponding loss function as

V_N(ρ, η_0) = (1/N) ∑_{t=1}^{N} (1/2) ε²(t, ρ, η_0).
Defining

V(ρ, η_0) ≜ Ē (1/2) ε²(t, ρ, η_0), (2.19)

the following theorem shows that for large sample sizes the loss function (2.17) coincides with (2.19) almost surely.

Theorem 2.4.1. Suppose that Assumptions 2.2.1–2.3.1 hold, and consider the loss functions under the prefilters A(q, η̂_N) and A(q, η_0), as shown in (2.17) and (2.19), respectively. It follows that

sup_{ρ∈Ω_ρ} |V_N(ρ, η̂_N) − V(ρ, η_0)| → 0, w.p. 1, as N → ∞, (2.20)

where V(ρ, η_0) is defined in (2.19).

Proof. See Appendix A.1.

Based on the arguments in Theorem 2.4.1, we are ready to show that the estimate of the process model is consistent, through the following theorem.

Theorem 2.4.2. Consider the Box-Jenkins model (2.1) and the equivalent ARX model (2.5). Assume that the output-error model in (2.14) is uniformly stable. Then under Assumptions 2.2.1–2.3.1, the parameter estimate from the ARX-OE method in (2.17) is consistent, i.e.,

ρ̂_N → ρ∗ = arg min_{ρ∈Ω_ρ} (1/2π) ∫_{−π}^{π} |G0(e^{jω}) − G(e^{jω},ρ)|² Φ_u(ω)/|H0(e^{jω})|² dω, w.p. 1, as N → ∞. (2.21)

Proof. Notice that due to the delay in G0 and G(q,ρ), the term [G0 − G(q,ρ)]u(t, η_0) in (2.18) contains only e(t−s), s ≥ 1, and thus is not correlated with e(t). From Parseval's theorem, it follows that

V(ρ, η_0) = (1/4π) ∫_{−π}^{π} |G0(e^{jω}) − G(e^{jω},ρ)|² |A0(e^{jω})|² Φ_u(ω) dω + σ_e²/2.

Thus, based on Theorem 2.4.1, we conclude that (2.21) in Theorem 2.4.2 holds.

Remark 2.4.1. Theorems 2.4.1 and 2.4.2 verify the consistency of the process model estimate from the proposed ARX-OE method. It should be pointed out that in practice, proper selection of the ARX model order may require trial and error, or may be based on a priori knowledge of the process. Although the above results are derived under the assumption that the ARX model order tends to infinity, our experience shows that a sufficiently high order is enough to give high-quality estimates of the noise model. However, in practice the regularization is vital in the first step to reduce the variance associated with the parameter estimate in the high-order ARX identification.

2.4.2 Asymptotic distribution

In this section, for simplicity, we restrict the asymptotic distribution analysis of the proposed ARX-OE algorithm to the linear feedback case, i.e., the controller in (2.2) reduces to (supposing r(t) enters at the process input)

u(t) = −K(q)y(t) + r(t), (2.22)

where K(q) is the linear feedback controller. We assume, without loss of generality, that r(t) is an excitation signal independent of e(t). Let us suppose that all relevant closed-loop transfer functions are exponentially stable under (2.22). The closed-loop system is described by

y(t) = S0 G0 r(t) + S0 H0 e(t), (2.23)
u(t) = S0 r(t) − S0 K H0 e(t), (2.24)

where S0 is the sensitivity function, i.e.,

S0 = 1/(1 + G0 K). (2.25)

Before proceeding to the main results, the following lemma is necessary for the analysis of the asymptotic covariance of the high-order ARX model parameter estimates [88].

Lemma 2.4.1. Assume that the polynomials A0(q) and B0(q) in the ARX representation (2.5)-(2.6) of the Box-Jenkins model (2.1) are stable. Suppose that Assumption 2.3.1 holds; then

‖E[N(η̂_N − η̄_{n(N)})(η̂_N − η̄_{n(N)})^T] − σ_e² [R^{n(N)}]^{−1}‖₂ → 0, as N → ∞, (2.26)

where η̄_{n(N)} is the expected value of η̂_N and

R^n = Ē[φ^n_t (φ^n_t)^T], φ^n_t = [−y(t−1) . . . −y(t−n) u(t−1) . . . u(t−n)]^T. (2.27)

The above lemma explicitly gives the asymptotic covariance of the parameter estimates. It will be utilized in the following theorem to derive the asymptotic distribution of the parameter estimates from the closed-loop ARX-OE algorithm. Note that in the following we denote G′_ρ(ρ0) = dG/dρ|_{ρ=ρ0}, and so on, to reduce the notational burden. First we need the following lemma.

Lemma 2.4.2. Let {e(t)} be a sequence of independent random variables with zero mean and constant variance.
Suppose that z(t) is a function of {e(k), k ≤ t−1}; then the following relation holds:

Ē[z(t)e(t) · e(s)z(s)] = Ē[z(t)z(s)] · E[e(t)e(s)] = σ_e² Ē[z²(t)]. (2.28)

Proof. Given the condition in this lemma, the signal z(t) can be expressed as z(t) = f(e(t−1), e(t−2), . . .), where f(·) is some nonlinear function. Then it is straightforward that z(t) is independent of e(t), and similarly z(s) is independent of e(s). If t ≤ s−1, then e(s) is independent of the other three terms and, as a result, (2.28) is zero. Analogously, if s ≤ t−1, then (2.28) is also zero. Thus (2.28) is nonzero only when t = s, and the right-hand side of (2.28) is obtained.

Lemma 2.4.2 is useful in the derivation of the asymptotic distribution of the ARX-OE method. Before presenting the relevant results, first note from (2.27) that (with "[x ; y]" denoting vertical stacking)

φ^n_t = [−G0 Γ_n ; Γ_n] u(t) + [−(1/A0) Γ_n ; 0] e(t)
      = [−G0 Γ_n S0 ; Γ_n S0] r(t) + [G0 Γ_n S0 K H0 − Γ_n/A0 ; −Γ_n K H0 S0] e(t)
      = [−G0 Γ_n S0  −Γ_n H0 S0 ; Γ_n S0  −Γ_n K H0 S0] [r(t) ; e(t)], (2.29)

where Γ_n = [q^{−1} q^{−2} . . . q^{−n}]^T. The second equality follows by substituting u(t) using (2.24); the third equality uses A0 H0 = 1 as well as (2.25). Now let us establish the main theorem on the asymptotic distribution of the parameter estimates.

Theorem 2.4.3. Suppose that Assumptions 2.2.1–2.3.1 hold and that the feedback controller is linear, as in (2.22). Consider the parameter estimate ρ̂_N in (2.17) from the closed-loop ARX-OE method. We assume that ρ̂_N → ρ0,
The term Q isQ = σ2eE[ψ(t,ρ0,η0)ψT (t,ρ0,η0)]+Zn(N)2 [Rn(N)]−1(Zn(N)2 )T +Zn(N)2 [Rn(N)]−1σ2e·E[ϕn(N)t ψT (t,ρ0,η0)]+σ2eE[ψ(t,ρ0,η0)(ϕn(N)t )T][Rn(N)]−1(Zn(N)2 )T , (2.31)whereZn(N)2 = −[E[G′ρ(ρ0)S0Ke(t) ·ΓTn(N)H0e(t)] 01×n(N)],and Rn(N) is defined in (2.27) with ϕnt defined in (2.29).Proof. See Appendix A.2. Remark 2.4.2. Unlike open-loop case [89], the proposed closed-loop ARX-OE method cannot supply effi-cient estimates for the system. Specifically, the parameter covariance of the OE step will be dependent onthe parameter covariance Rn from the first high-order ARX modeling step. However, this method providesconsistent estimates and is applicable to routine operating data. Although the ARX-OE method is similarto the asymptotic method (ASYM) approach proposed by Zhu [90], our approach mainly applies to routineclosed-loop data without external excitations, since for process monitoring such data are the most commontype in practice. These scenarios have certain unique problems, such as the informativeness of closed-loopdata, that will be discussed in the next section.Remark 2.4.3. A similar result is reported in Corollary 10 in [73] on asymptotic distribution of the parameterestimates under direct identification with orders of noise model tend to infinity. It is shown that directidentification with current noise model yields minimal covariance on parameter estimates. Investigationson the possible links between these two results will be a promising future direction and this will provide282.5. Identification based on routine operating datavaluable insights to the interpretation of (2.31).2.5 Identification based on routine operating dataMost on-line process monitoring tasks, e.g., MPM detection for MPC, require closed-loop identificationbased on routine operating data where external excitation signals may not exist. 
In this section, we discuss several important aspects of applying the above ARX-OE method in this situation.

2.5.1 Estimate of the controller inverse

It is well known that if there is no external excitation, nonparametric closed-loop identification (e.g., the spectral analysis method) may result in an estimate of the inverse of the controller. This is due to the possibility of relating input-output data through the feedback path instead of through the process and noise models. In particular, when r(t) = 0, (2.23)-(2.24) are equivalent to

y(t) = G0(q)u(t) + H0(q)e(t), (2.32)
u(t) = −K(q)y(t). (2.33)

The modeling error of estimating G0 is always larger than zero, while the modeling error of estimating −1/K(q) as the relation between y(t) and u(t) is zero. Thus nonparametric methods will take the controller inverse as the estimate of the process, since the corresponding prediction error is minimal. One solution to this issue is to impose a time delay in the identification method, where a large time delay helps prevent yielding the inverse of the controller as the process model estimate. A number of delay estimation approaches using closed-loop data are available in the literature (refer to [91], [92]). Time-delay estimation is thus a necessary step before performing the closed-loop ARX-OE identification. For the paper machine examples in the simulation section, the time delay is assumed to be available and large enough to avoid estimating the controller inverse as the process model.

2.5.2 Sufficient excitation

It has been shown in the literature (e.g., [93]) that for a linear closed-loop system without any external excitation, identifiability is related to the order of the regulator and the time delay. Specific conditions for closed-loop identifiability are investigated in [93] and [94]. It is concluded there that a complicated controller and a large time delay enrich the informativeness of closed-loop data. Thus nonlinear controllers are favored from this perspective.
Fortunately, for most industrial processes (e.g., the paper machine) controlled by MPC, routine input-output data are generally sufficiently exciting (i.e., they contain sufficiently many nonzero frequency components in their spectra) to apply the above ARX-OE method, especially when the MPC operates with active constraints.

2.5.3 The role of noise

For most closed-loop identification methods relying on external excitation, such as the indirect identification method, the two-stage method and the projection method, the presence of noise inflates the variance of the associated parameter estimates. However, for the direct identification method, it is proved in [73] that noise can indeed reduce the variance of the parameter estimates (refer to Corollary 10 in [73]). This argument also applies when our ARX-OE identification method is applied to routine operating data, in which noise is the only external signal. Moreover, in this case, large noise may trigger the constraints in the MPC and thus render the controller nonlinear, which in general enriches the informativeness of closed-loop data.

2.6 Examples

In this section, we provide one SISO example and one MIMO example from a paper machine to verify the proposed closed-loop ARX-OE identification method.

2.6.1 Case I: univariate MD control of the paper machine

We begin by studying a SISO process in the machine direction. For this particular MD process, the valve of the thick stock pump is manipulated to alter the amount of pulp slurry provided to the headbox, which ultimately is used to control the basis weight of the paper sheet being produced. The true process has the following first-order model:

G0(s) = 1.0545/(50s + 1) · e^{−80s}.

The sampling interval is 5 seconds, and the discretized process model is

G0(q) = (b/(1 + a q^{−1})) q^{−17},
Table 2.1: Tuning parameters of the SISO MD MPC

  Actuator movement weight: 0.01         Actuator deviation weight: 0
  CV target value: 23.75 lbs/3000 ft²    MV target value: 131 gpm
  Prediction horizon: 40                 Control horizon: 4
  CV target weight: 1                    MV target weight: 1
  CV min-max weight: 1                   MV min-max weight: 1
  Bounds of MV: [0, 3000]                Limit of MV movement: 0.1667

where b = 0.1003 and a = −0.9048. The controller employed is an MPC with the tuning parameters specified in Table 2.1. The true noise model is chosen as

H0(q) = (1 − 0.9q^{−1})/(1 + 0.9q^{−1}),

where e(t) is white noise with zero mean and variance σ_e² = 0.01. In this simulation, we assume that there is no MPM. To verify the effectiveness of the proposed closed-loop ARX-OE algorithm with routine operating data, we leave the setpoint unchanged during the entire simulation, i.e., the noise is the only external signal entering the system. To avoid obtaining the inverse of the controller, we assume that the true process delay is available and has been specified to the identification algorithm. With the above settings, the simulation of the closed-loop system lasts for 15 hours (10800 samples of input-output data). The input and output data from the simulation are shown in Figure 2.1.

To identify the noise model more accurately, we test scenarios with different ARX model orders in the first step of the ARX-OE algorithm. Here we use na and nb to denote the orders of A(q,η_n) and B(q,η_n) in (2.8), respectively. Figure 2.2 shows the comparison between the true and estimated IR coefficients of the inverse noise model. We can see clearly that for this slowly decaying inverse noise model, a high-order ARX model better captures the IR coefficients. Figure 2.3 shows the step responses of the estimated process models under different ARX orders. All three cases give high-quality process model estimates, as seen from the comparison with the true step response.
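As an aside, the discrete-time coefficients b = 0.1003 and a = −0.9048 quoted above, against which the estimates are compared, can be reproduced from the continuous-time model, assuming standard zero-order-hold discretization of a first-order lag (the q^{−17} combines the 16 delay samples with the one-sample inherent delay); a minimal sketch:

```python
import numpy as np

K, tau, Td, Ts = 1.0545, 50.0, 80.0, 5.0   # gain, time constant, delay, sampling interval
p = np.exp(-Ts / tau)                      # ZOH discretization of K/(tau*s + 1)
b, a = K * (1.0 - p), -p
d = int(Td / Ts) + 1                       # 16 delay samples + 1 inherent sample -> q^-17

print(round(b, 4), round(a, 4), d)         # 0.1003 -0.9048 17
```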
The estimated parameter pairs [b̂, â] from these three cases are [0.0975, -0.9114], [0.1033, -0.907], and [0.1036, -0.9084], respectively. We can see that the estimated parameters from the scenario with na = 30, nb = 30 are closer to the true values, even though the step responses in the three situations are almost identical. Notice that in order to acquire smoothly decaying IR coefficients in the first step, we added a regularization term to the least-squares estimation. Another benefit of the regularization term is that it avoids the overfitting caused by the large number of parameters in the first step.

To further illustrate the effectiveness of the proposed closed-loop identification method, we perform 1000 Monte-Carlo simulations with the closed-loop MD process above. For simplicity, we only consider the case na = 30, nb = 30. In each Monte-Carlo simulation we collect 15 hours of data. The proposed closed-loop ARX-OE method is applied to each data set and the estimates of the parameters b and a are recorded. The histograms of the estimates of b and a are shown as blue bars in Figure 2.4 and Figure 2.5, respectively, with the corresponding fitted normal curves shown as blue lines. We can see that both parameter estimates have an approximately normal distribution. The mean values in both histogram plots are close to the true parameter values, which indicates that the proposed method gives consistent estimates. Furthermore, as a comparison, we apply the closed-loop direct identification method with correct and wrong specifications of the noise model order. For the latter case, we choose the noise model to have first order in the denominator and zero order in the numerator (and thus we expect biased estimates of a and b). In Figures 2.4 and 2.5, the green and red bars illustrate the histograms of parameter estimates from the direct identification method with correct and wrong noise model structure, respectively.
The green and red curves are the corresponding fitted normal curves from the histograms. As expected, the direct identification with the correct noise model structure gives an unbiased estimate, whereas with the wrong noise model specification it gives a biased estimate. In practice, a priori knowledge of the noise model structure is often unavailable, and the proposed closed-loop ARX-OE method therefore shows great advantages over the direct identification method under this circumstance. Note that in this Monte-Carlo simulation we did not introduce any external excitation, so the setup resembles the routine operating stage of industrial processes.

2.6.2 Case II: multivariate MD control of the paper machine

In this example, we consider a MIMO MD process in the paper machine. For this process, one stock flow and two steam flows (Steam4 and Steam3) are manipulated to control three properties of the paper sheet, i.e., weight, press moisture and reel moisture. It is important to note that there exist extensive interactions among these variables. Specifically, changing the stock flow will affect all three CVs. Also, the two steam variables have a coupling effect on the press and reel moisture. However, the influence of the two steam flows on the weight can be considered negligible, and thus the internal interactions among these variables can
be modeled by a lower triangular transfer function matrix.

Figure 2.1: Simulated input and output data for the closed-loop MD process

Figure 2.2: IR coefficients of the estimated noise model inverse with different ARX orders

Figure 2.3: Step response of the estimated process model with different ARX orders

Figure 2.4: Histogram of parameter b estimates in 1000 Monte-Carlo simulations. Green: direct identification method with wrong noise model structure. Red: direct identification method with correct noise model structure. Blue: closed-loop ARX-OE method. Red dashed line: the true parameter value.

Figure 2.5: Histogram of parameter a estimates in 1000 Monte-Carlo simulations. Green: direct identification method with wrong noise model structure. Red: direct identification method with correct noise model structure. Blue: closed-loop ARX-OE method. Red dashed line: the true parameter value.
We have the following MIMO MD process model

  [y1(t)]   [ 1.0545/(66.66s+1) e^{-72s}              0                              0                ] [u1(t)]
  [y2(t)] = [ 0.2960/(46.02s+1) e^{-90s}   -0.14/(270.00s+1) e^{-114s}               0                ] [u2(t)]
  [y3(t)]   [ 0.7530/(154.80s+1) e^{-42s}  -0.2380/(211.20s+1) e^{-30s}  -0.0555/(24.84s+1) e^{-90s}  ] [u3(t)]

where the three CVs y1(t), y2(t) and y3(t) are the weight, press moisture and reel moisture, respectively, and the three MVs u1(t), u2(t) and u3(t) correspond to the stock flow, Steam4 and Steam3, respectively. In this simulation, the selected sampling interval is 5 seconds and the above process model is discretized for controller design. We have the discretized process model (denoting the transfer matrix by G0(q))

  G0(q) = [ 0.07627/(1-0.9277q^{-1}) q^{-15}                0                                     0                   ]
          [ 0.03047/(1-0.897q^{-1}) q^{-19}    -0.002569/(1-0.9817q^{-1}) q^{-23}                 0                   ]
          [ 0.02393/(1-0.9682q^{-1}) q^{-9}    -0.005568/(1-0.9766q^{-1}) q^{-7}    -0.01012/(1-0.8177q^{-1}) q^{-19} ]

For the noise model, we suppose that the noise sequences in the three output channels are mutually independent, with the noise model

  d(t) = [ (1-0.5q^{-1})/(1+0.5q^{-1})              0                             0              ] [e1(t)]
         [             0               (1-0.3q^{-1})/(1+0.6q^{-1})                0              ] [e2(t)]
         [             0                           0               (1+0.2q^{-1})/(1+0.7q^{-1})  ] [e3(t)]

where the three noise sequences have the same variance, σ²_e = 0.1. For the ensuing discussion, we use Gij(q) and Hij(q) to represent the scalar transfer functions located at the (i, j)-th positions of G0(q) and H(q), respectively. In this example, a multivariate MPC is used to control this MIMO process. The specific tuning parameters of the controller are given in Table 2.2. The simulation duration of the closed-loop MIMO MD system is set to 1150 minutes (13800 samples). The setpoint stays constant throughout the simulation and no external excitation is introduced. With the current setup, the controller is very likely to operate in a nonlinear mode. Therefore, closed-loop identification methods requiring a linear regulator are not suitable in this situation.
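As a side check, the discretized coefficients in both examples follow from zero-order-hold (ZOH) sampling of each first-order-plus-delay element at T = 5 s. A minimal sketch (the helper name is ours; fractional delays are truncated to whole samples, which reproduces the delay orders listed above):

```python
import math

def zoh_foptd(K, tau, L, T):
    """ZOH discretization of K e^{-Ls}/(tau s + 1) at sampling interval T:
    returns (b, a, d) for G(q) = b q^{-d} / (1 + a q^{-1})."""
    p = math.exp(-T / tau)        # discrete pole
    b = K * (1.0 - p)             # ZOH gain of the first-order part
    a = -p
    d = math.floor(L / T) + 1     # whole delay samples plus one for the ZOH
    return b, a, d

# SISO example: 1.0545 e^{-80s}/(50s + 1) at T = 5 s
b, a, d = zoh_foptd(1.0545, 50.0, 80.0, 5.0)          # ~ (0.1003, -0.9048, 17)
# G11 of the MIMO example: 1.0545 e^{-72s}/(66.66s + 1)
b11, a11, d11 = zoh_foptd(1.0545, 66.66, 72.0, 5.0)   # ~ (0.0762, -0.9277, 15)
```

The remaining entries of G0(q) above can be reproduced the same way from their continuous-time counterparts.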
In the following, we examine the effectiveness of the proposed method with MIMO closed-loop data, assuming that we have no a priori knowledge of the true noise model structure. To identify the process model, we apply the proposed method channel by channel. Each output channel is considered as a closed-loop multi-input single-output (MISO) system. A MISO high-order ARX model is first estimated by

  A_i(q) y_i(t) = Σ_{j=1}^{i} B_{i,j}(q) u_j(t) + e_i(t),  i = 1, 2, 3,   (2.34)

where

  A_i(q) = 1 + a_{i,1} q^{-1} + ... + a_{i,n} q^{-n},  B_{i,j}(q) = b_{i,j,1} q^{-1} + ... + b_{i,j,n} q^{-n},

giving rise to an estimate of the inverse noise model in the i-th output channel. Define Â_i(q) and B̂_{i,j}(q) as the respective estimates of A_i(q) and B_{i,j}(q) from the above high-order ARX modeling step. We then apply the MISO OE identification after filtering the input and output signals with the obtained noise model estimate. Specifically, it follows that

  y_i^f(t) = Σ_{j=1}^{i} G_{i,j}(q, ρ) u_j^f(t) + e_i(t),  i = 1, 2, 3,

where

  y_i^f(t) = Â_i(q) y_i(t),  u_j^f(t) = Â_i(q) u_j(t).

Table 2.2: Tuning parameters of the MIMO MD MPC

  Actuator movement weight: [0.01 0.01 0.01]   Actuator deviation weight: [0 0 0]
  CVs target weight: [1 1 1]                   MVs target weight: [1 1 1]
  CVs min-max weight: [1 1 1]                  MVs min-max weight: [1 1 1]
  Target value for CV1: 174 lbs/3000 ft^2      Target value for MV1: 165 gpm
  Target value for CV2: 2.3 %                  Target value for MV2: 115 psi
  Target value for CV3: 6.2 %                  Target value for MV3: 35 psi
  Prediction horizon: 20                       Control horizon: 2
  Upper bounds of MVs: [250 180 180]           Lower bounds of MVs: [0 0 0]
  Limits of CV movement: [0.1 0.1 0.1]         Limits of MV movement: [0.2 1 1]

The classic PEM can then be applied to identify the above MISO system. As before, we assume that the true time-delay for each channel is available.
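For a single channel, the filter-then-OE step just described can be sketched as below. The helper names are ours, and the OE fit is a toy grid search over a first-order pole with a least-squares gain, standing in for a full PEM/OE solver:

```python
import numpy as np

def fir_filter(coeffs, x):
    """Apply C(q) = c0 + c1 q^-1 + ... to a signal x; used to form the
    prefiltered signals y^f = A_hat(q) y and u^f = A_hat(q) u."""
    return np.convolve(x, coeffs)[:len(x)]

def oe_fit_first_order(yf, uf, delay, a_grid=None):
    """Toy OE step for G(q) = b q^-delay / (1 + a q^-1): grid-search the
    pole a, solve the gain b by least squares on the simulated response."""
    if a_grid is None:
        a_grid = np.linspace(-0.99, 0.99, 199)
    best = None
    for a in a_grid:
        x = np.zeros(len(uf))       # response of q^-delay/(1 + a q^-1) to uf
        for t in range(len(uf)):
            x[t] = -a * (x[t - 1] if t > 0 else 0.0) \
                   + (uf[t - delay] if t >= delay else 0.0)
        denom = float(x @ x)
        if denom == 0.0:
            continue
        b = float(x @ yf) / denom   # least-squares gain for this candidate pole
        sse = float(((yf - b * x) ** 2).sum())
        if best is None or sse < best[0]:
            best = (sse, b, a)
    return best[1], best[2]         # (b_hat, a_hat)
```

In the MISO case, the same prefilter Â_i(q) is applied to y_i and to every input u_j of channel i before the OE fit, exactly as in the filtered equations above.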
This assumption can be relaxed if external excitation signals are injected into the system or if the objective is not pursuing a precise model estimate (e.g., MPM detection).

Figure 2.6 and Figure 2.7 show the simulated CV and MV profiles of the closed-loop MD process, respectively. The estimated and true IR coefficients of the inverse noise model are given in Figure 2.8. Clearly, the estimated coefficients are very close to the true values, ensuring the reliability of using the estimated noise model for the subsequent filtering operations. Figure 2.9 shows the step response comparison between the estimated and true process models. It is obvious from Figure 2.9 that the estimated process models are very close to the true ones. Specifically, we display the estimated process model Ĝ(q) explicitly for further comparison:

  Ĝ(q) = [ 0.0769/(1-0.9270q^{-1}) q^{-15}                 0                                     0                    ]
         [ 0.03062/(1-0.8916q^{-1}) q^{-19}   -0.002467/(1-0.9819q^{-1}) q^{-23}                 0                    ]
         [ 0.02375/(1-0.9679q^{-1}) q^{-9}    -0.005639/(1-0.9760q^{-1}) q^{-7}    -0.009975/(1-0.8206q^{-1}) q^{-19} ]

Comparing Ĝ(q) with the true process model G0(q), one can see that the closed-loop ARX-OE identification provides accurate parameter estimates with routine operating data even though there are no external excitation signals. Therefore, the proposed method can be used for routine process monitoring purposes (e.g., real-time controller performance monitoring and MPM detection) that require online estimation of the
process model during routine operation.

Figure 2.6: The simulated output profiles (CVs) for the MIMO MD process of a paper machine

Figure 2.7: The simulated input profiles (MVs) for the MIMO MD process of a paper machine

Figure 2.8: IR coefficients of the true and estimated inverse noise model for each output channel.

Figure 2.9: Step response of the true (red solid line) and estimated (blue dashed line) process models.

2.7 Conclusion

This chapter presents a novel closed-loop identification method that corrects the bias incurred by the direct identification method when the noise model is insufficiently specified. First, a high-order ARX model is identified to obtain an estimate of the noise model. In the second step, we filter the input and output data with the estimated inverse noise model, and then perform an OE identification on the filtered input-output data to obtain the process model estimate. It is shown that this closed-loop ARX-OE identification approach can give consistent estimates and that the parameter estimator is asymptotically normally distributed. This method is applicable not only when the controller is nonlinear but also when the closed-loop data contain no external excitations.
Therefore, this method exhibits great potential for purposes such as controller performance monitoring and MPM detection that require process model identification based on routine operating data.

Chapter 3
Model-plant Mismatch Detection for MD Processes

3.1 Introduction

Model-plant mismatch (MPM) is the main source of control performance degradation in industrial MPCs. As mentioned in the previous chapters, an MPM detection scheme must not be sensitive to changes in the noise model, since such sensitivity results in increased false alarms, and it should not require external perturbations, as they disturb the normal operation of industrial processes. In this chapter, we propose a novel MPM detection approach that addresses these two challenges. Our idea is inspired by the historical-data-based benchmarks used in controller performance monitoring [95]. In those methods, to assess the control performance, process variable metrics computed from current data are compared with those computed from a set of historical data collected during a period of satisfactory performance. Similarly, in our method, we partition routine operating data into a "training" stage (which we believe is generated with no MPM) and a "testing" stage. The training data serves as a benchmark against which we evaluate the presence of MPM in the test data. Specifically, we propose a novel method, based on closed-loop identification and support vector machine (SVM) classification, that can monitor MPM and noise change independently and thus can directly discriminate MPM from noise model change. The most striking benefit of our method is that it is suitable for situations where external excitations may not exist.

This chapter is outlined as follows. We begin in Section 3.2 by analyzing the drawbacks of several current methods for MPM detection. In Section 3.3, we elucidate the framework of the proposed MPM detection approach.
Section 3.4 is devoted to developing a new closed-loop identification method, based on the ARX-OE method of Chapter 2, that provides consistent parameter estimates for the plant model. In Section 3.5 we elaborate the detailed procedures for training an SVM model on the training data and applying it to the test data to examine MPM and noise model change. An industrial example from the MD process of paper machines is provided to verify the proposed approach in Section 3.6, followed by conclusions in Section 3.7.

3.2 Drawbacks of several MPM detection methods

The minimum variance controller (MVC) benchmark has become one of the most popular tools for assessing controller performance due to its computational simplicity and the little prior knowledge (only the time-delay) of the process that it requires. However, as pointed out in previous sections, one pitfall of using the MVC benchmark to detect MPM is that not only MPM but also disturbance changes affect the performance metrics. It is essentially impossible to separate MPM from disturbance changes using the MVC benchmark. The rationale behind this conclusion is briefly outlined as follows.

Consider a SISO process under regulatory control (the setpoint is constant), where the process is denoted as G(q) = q^{-d} G̃(q). Notice that d is the time delay and G̃(q) is the delay-free plant. Define the controller to be K(q). Suppose that the output noise is filtered white noise with noise model H(q), where e(t) is Gaussian white noise. The closed-loop transfer function from the noise e(t) to the output y(t) is then

  y(t) = H(q)/(1 + q^{-d} G̃(q) K(q)) e(t).   (3.1)

From the Diophantine decomposition of H(q) we have

  H(q) = F(q) + R q^{-d},  with F(q) = h0 + h1 q^{-1} + ... + h_{d-1} q^{-(d-1)},   (3.2)

so the closed-loop response (3.1) can be further written as

  y(t) = F(q) e(t) + L(q) e(t - d),   (3.3)

where L(q) = (R - F G̃ K)/(1 + q^{-d} G̃ K) is a proper transfer function that depends on the controller. Here the argument q is omitted for simplicity of notation.
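The split (3.2), and the controller-invariant variance var[F(q)e(t)] that follows from (3.3), are easy to compute from the impulse response of H(q); a small sketch with names of our own:

```python
import numpy as np

def diophantine_split(h_impulse, d):
    """Split the noise-model impulse response h = [h0, h1, ...] as in (3.2):
    H(q) = F(q) + q^-d R(q), where F collects the first d coefficients."""
    F = np.asarray(h_impulse[:d])
    R = np.asarray(h_impulse[d:])
    return F, R

def mv_lower_bound(h_impulse, d, sigma2_e):
    """Theoretical minimum output variance var[F(q)e(t)] achievable by MVC."""
    F, _ = diophantine_split(h_impulse, d)
    return sigma2_e * float(np.sum(F ** 2))

# example: H(q) = 1/(1 - 0.8 q^-1), so h_k = 0.8**k; delay d = 2
h = [0.8 ** k for k in range(50)]
lb = mv_lower_bound(h, d=2, sigma2_e=1.0)   # (1 + 0.8**2) = 1.64
```

Dividing this lower bound by the measured output variance var[y(t)] yields the MVC performance index.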
The basic idea of the MVC benchmark is to consider F(q)e(t) as a controller-invariant part; the minimum output variance is achieved by choosing a suitable controller (the MVC) such that the second term L(q)e(t - d) vanishes. Thus the theoretically minimal output variance is the same, regardless of the implemented controller, as long as the disturbance model is unchanged.

Figure 3.1: The IMC structure

The actual controller performance is evaluated by the ratio

  η = var[F(q)e(t)] / var[y(t)].   (3.4)

If η is close to 1, it indicates good control performance, as the underlying controller is acting to achieve the minimal output variance. From (3.3) and (3.4) it is straightforward to see that if the disturbance model H(q) changes, then η is affected and is no longer valid for performance monitoring. Therefore, using the MVC benchmark (3.3) to detect MPM may be misleading, and a more reliable approach insensitive to disturbance changes is necessary.

Another family of MPM detection methods is based on system identification theory. A straightforward idea is to identify the process model directly online and compare it with the initial model used by the controller. Any significant discrepancy between them is then regarded as a model-plant mismatch. This idea requires that the identification method give a reliable estimate of the process model in the presence of feedback and with routine operating data in which external dither signals may not exist. Among the well-developed system identification methods, direct identification can fulfill this demand, but the condition for a consistent estimate is that the closed-loop data be informative enough and that the selected process and disturbance model structures be flexible enough to contain the true model.
The informativeness issue can usually be resolved if the controller is complex enough, and most actual MPCs in industrial processes satisfy this requirement. However, the noise model structure issue is difficult to address, since it is not straightforward to acquire a priori information on the structure of a practical noise model, and misspecification of the noise model often leads to a biased estimate of the process model.

Another approach is to use process variables in the internal model control (IMC) structure (see Figure 3.1) to identify the mismatch model ∆(q) = G(q) - Ĝ(q). From IMC, one can easily verify the following closed-loop relationships among the process variables (considering the SISO case):

  ε(t) = ∆K/(1 + ∆K) r(t) + 1/(1 + ∆K) v(t),   (3.5)
  u(t) = K/(1 + ∆K) r(t) - K/(1 + ∆K) v(t),   (3.6)

where ε(t) is the output error (or model residual), u(t) is the process input, r(t) is the dither signal and v(t) is the output disturbance. Note that v(t) is assumed to be filtered white noise, v(t) = H(q)e(t), where e(t) is Gaussian white noise and H(q) is stable, monic and of minimum phase. It is obvious that if the dither signal is sufficiently exciting (notice that {r(t)} and {v(t)} are independent), the mismatch ∆(q) can always be factored out from (3.5)-(3.6). If there is no dither signal, i.e., r(t) = 0, the relation between ε(t) and u(t) reduces to

  u(t) = -(1/K) ε(t).   (3.7)

Thus, without external excitation it is impossible to determine the mismatch model by identifying the transfer function between u(t) and ε(t); indeed, the identified result is always the inverse of the controller. In this work we propose a novel closed-loop identification method that is able to identify the process and noise models using routine operating data. These models are then used by the proposed mismatch detection scheme.

3.3 The MPM detection idea

We mention first that noise model change detection follows the same course as MPM detection in our method.
So in the subsequent sections, our attention is mainly on introducing the MPM detection approach. The proposed method is based on a novel closed-loop identification algorithm which is capable of providing unbiased estimates of the process models from routine operating data, albeit with large variance. The inevitable variance associated with the process model estimates prevents us from directly comparing our estimates with the nominal models to identify the mismatch. In other words, discrepancies between the model estimates and the true plant cannot always be blamed on MPM; they may simply be an artifact of the variance in the estimates. Thus we have to form a reasonable uncertainty bound around the estimated process model to account for the variance of the parameter estimates. Models outside this uncertainty range are regarded as mismatched models.

Figure 3.2: Illustration of the training and testing data (ID stands for identification).

Such an uncertainty range can be naturally captured by the SVM technique. Note that to synthesize all possible mismatch situations (e.g., gain mismatch and time constant mismatch) into an overall metric, we represent the process model estimates in FIR form. With a high-order FIR model we can capture the process dynamics of a model of any order. Now the problem of detecting MPM reduces to checking whether the estimated FIR coefficients are "equal to" the FIR coefficients of the current model. However, comparing the high-dimensional FIR coefficient vectors is non-trivial, and this forms the motivation for using SVMs.

Fig. 3.2 demonstrates the proposed approach to detect MPM. The industrial data is split into training data and testing data.
The training data is collected during a time interval in which MPM is believed absent, e.g., the period during or right after the identification stage, as shown in Fig. 3.2. Notice that our algorithm operates on moving windows, and the amount of training data can be selected according to the window size. For each window, we apply the closed-loop ARX-OE identification to obtain an estimate of the process model. From the set of process models estimated over the moving windows of the training data, a one-class SVM is trained, which can be interpreted as an appropriate boundary encompassing this set (see the blue curves in Figs. 3.3-3.4). Any model inside this boundary is considered normal, indicating the absence of MPM. For the testing data, a similar moving window is applied, and each process model estimate is examined by the SVM model to predict whether it is located inside the boundary. If so, the SVM returns a positive score, implying no MPM; otherwise, it returns a negative score to indicate the presence of MPM in the current moving window. To be cautious in triggering an identification experiment, the MPM alarm is not raised until a large number of negative scores has accumulated. Note that the entire training and testing operations are carried out with routine operating data free of external excitations. In the following sections we focus on the closed-loop identification as well as the SVM training and testing.

Figure 3.3: Illustration of the SVM training. Each curve shows the IR coefficients of the estimated models from training data.
The upper and lower bounds define the width of the cluster.

Figure 3.4: Illustration of the MPM detection idea. Here the training and testing models refer to the process model estimates from the training and testing data sets.

3.4 Closed-loop identification

From the explanations above, closed-loop identification plays a fundamental role in determining the performance of the proposed MPM detection algorithm. In this section, we apply the ARX-OE method proposed in Chapter 2 to routine operating data to obtain process and noise model estimates. Sufficient conditions guaranteeing the informativeness of routine closed-loop data are provided.

Consider the SISO Box-Jenkins model (2.1) and the controller in (2.2). It should be noted that a persistently exciting dither signal can always guarantee that the closed-loop data is informative, regardless of the controller orders. However, without external excitations, for a linear controller to meet the informativeness requirement, the following relationship must be satisfied for Box-Jenkins models [93]:

  max(n_x - n_f, n_y - n_b) ≥ n_d + min(n_x, n_f),   (3.8)

where n_x and n_y are the numerator and denominator orders of the linear controller, respectively, n_b and n_f are the orders of the process model numerator and denominator polynomials, respectively, and n_d denotes the order of the numerator polynomial in the noise model. One observation from (3.8) is that more complex controllers and a larger time-delay often imply richer information in the closed-loop data [86]. Additionally, if the controller is nonlinear, as is the case for MPC, the closed-loop data is generally sufficiently exciting for the relevant system identification [15]. Another benefit of a nonlinear controller is that it prevents the identification algorithm from returning an estimate of the controller inverse.
Moreover, we assume that the true time-delay is known a priori and is specified to the identification algorithm.²

Following the main closed-loop ARX-OE identification algorithm presented in Chapter 2, we have the following theorem regarding the parameter estimate ρ̂N.

Theorem 3.4.1. Consider the true Box-Jenkins model of the plant (2.1) as well as the equivalent high-order ARX form (2.6). Assume that the conditions in Theorem 2.4.1 hold and that the plant model is correctly parameterized. Then the parameter estimate ρ̂N from the closed-loop ARX-OE identification method is consistent, i.e., we have

  ρ̂N → ρ0, w.p. 1, as N → ∞,   (3.9)

where ρ0 is the true parameter value of G0, provided that condition (3.8) is satisfied for the high-order ARX model (2.6). Moreover, the parameter estimate ρ̂N is asymptotically Gaussian distributed with mean value ρ0.

Proof. (Outline) Define

  ε(t, ρ) := [G0 - G(q, ρ)] u(t) + (1/A(q, η0)) e(t),

and

  V_N(ρ, η̂N) := (1/N) Σ_{t=1}^{N} (1/2) [A(q, η̂N) ε(t, ρ)]²,
  V_N(ρ, η0) := (1/N) Σ_{t=1}^{N} (1/2) [A(q, η0) ε(t, ρ)]².

By Theorem 2B.1 in [15] and Lemma 2.3.1, we have

  V_N(ρ, η̂N) → V(ρ, η0), w.p. 1, as N → ∞,

where V(ρ, η0) := E (1/2) [A(q, η0) ε(t, ρ)]². Applying Parseval's theorem yields

  V(ρ, η0) = (1/4π) ∫_{-π}^{π} |G0(e^{jω}) - G(e^{jω}, ρ)|² Φu(ω)/|H0(e^{jω})|² dω + σ²/2.

Therefore, if G(q, ρ) is correctly parameterized and the closed-loop data is informative enough for the selected model structures, we can conclude that (3.9) is valid. The proof of the asymptotic Gaussian distribution of ρ̂N follows the lines of Theorem 9.1 in [15] and is thus omitted here. □

² It will be shown shortly that knowing the true time-delay is an excessively strong requirement for the proposed MPM detection method. In fact, a priori knowledge of the time-delay (which might be wrong) is enough for the purpose of MPM detection.

Remark 3.4.1. Despite the premise of correct parameterization of G0(q) in Theorem 3.4.1, this is not a restrictive limitation on the proposed closed-loop identification method. As will be shown
As will be shownin the following section, the SVM in MPM detection is trained and tested on the FIR form of G(q, ρˆN). Thususing an FIR model in the OE identification step, if a priori information about G0(q) is not accessible, issuggested in such cases to eliminate the bias.Note that the explicit expression of the variance of ρˆN is non-trivial. It is thus recommended to use a setof training data from which we can obtain a collection of process model estimates as an approximation tothe variance of transfer function estimates.3.5 MPM detectionBefore delving into the MPM detection algorithm, let us briefly review the SVM technique.3.5.1 One-class learning support vector machinesSVM is a well-known binary classification technique. The idea behind a two-class SVM is to choose ahyperplane in the data space to separate two distinct classes of data. However, for linearly separable data,there are typically infinitely many hyperplanes that are able to discriminate the two classes. The SVM seeksthe one that not only separates the two groups but also maximizes its geometric distance (known as themargin) to either class [96]. Therefore, the SVM is essentially an optimal separating hyperplane in the senseof robustness by significantly reducing the false classifications if other separating hyperplanes were used.Suppose we are given a set of training data{x1, . . . ,xl}, xi ∈X ⊂ Rr, (3.10)and the corresponding (binary) labels {y1, . . . ,yl}, where l stands for the number of data points. r is thedimension of input space X that the training data are located in. As the data are grouped into two classes,for convenience, we use yi = 1 to denote the first class and yi = −1 to denote the second class. A generic463.5. MPM detectiontwo-class SVM training problem is formulated as [97]minw,b,ζ12‖w‖2+Cn∑i=1ζi (3.11)s.t. yi(wT xi+b)≥ 1−ζi, i = 1, . . . , l, (3.12)ζi ≥ 0, i = 1, . . . , l, (3.13)where w ∈ Rr and b ∈ R are two parameters (slope and offset) characterizing a hyperplane. 
ζi, i = 1, ..., l, are nonnegative slack variables and C is a weight parameter that trades off maximizing the margin against minimizing the training errors. Note that the presence of slack variables allows local violations of the separating boundary determined by the hyperplane, which dramatically enhances the SVM's flexibility in treating nonseparable data sets. A major advantage of the SVM in classification, relative to other techniques, is the ease with which it generalizes to incorporate kernel tricks. This is achieved by solving the dual problem,

  min_α (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} αi αj yi yj κ(xi, xj) - Σ_{i=1}^{l} αi   (3.14)
  s.t. 0 ≤ αi ≤ C, i = 1, ..., l,   (3.15)
       Σ_{i=1}^{l} αi yi = 0,   (3.16)

where the αi are Lagrange multipliers and κ(·, ·) is the kernel function that will be explained in (3.18). The corresponding prediction function is

  p(x) = Σ_{i=1}^{l} α̂i yi κ(xi, x) + b̂,   (3.17)

where α̂ and b̂ are the obtained Lagrange multipliers and offset, respectively. It is easy to verify from the KKT conditions that the solution α̂ is sparse. Those xi corresponding to nonzero α̂i are known as support vectors. The sparsity of α̂ significantly simplifies the prediction in (3.17), as the summation involves only a few terms. The inclusion of a kernel function enables the SVM to handle nonlinear classification.

By convention, the SVM is developed for the binary classification problem. However, for the specific MPM detection problem, we use the set of process models from the training data as a reference group representing the behavior of the "no mismatch" process model cluster. The other group of data is ordinarily not
Thus the MPM detection is a one-class learning problem, which is alsoknown as the “novelty detection problem”.The one-class learning SVM is depicted in the feature space, i.e., a space the data is mapped into.Consider the set of training data samples in (3.10). Prior to one-class SVM training it is necessary to map thedata through Φ :X 7→ F into a (higher-dimensional) feature space F . The kernel function κ(x,y) is suchthat the inner product in the feature space can be evaluated in the input space asκ(x,y) =<Φ(x),Φ(y)>, ∀x,y ∈X . (3.18)A well-known kernel function that will be used hereafter is the Gaussian kernelκ(x,y) = e−‖x−y‖2/c, (3.19)where c is a parameter that is used to tune the sharpness of the Gaussian kernel function. It should be pointedout that with the Gaussian kernel function all data points in the feature space are located in the same orthantsince κ(x,y)> 0, ∀x,y ∈X . Thus it is possible to find a hyperplane to separate the origin from the trainingdata in the feature space with maximized margin. With this idea the one-class SVM training problem isformulated as [98]minw,ξ ,b12‖w‖2+ 1vll∑i=1ξi−b (3.20)s.t. wTΦ(xi)≥ b−ξi, ξi ≥ 0, (3.21)where w and b represent the slope and offset of the hyperplane in feature space. The term v ∈ (0,1] isa parameter tuning the upper bound of the fraction of outliers and lower bound of the fraction of supportvectors. ξ is a slack variable allowing for local violations of the hard boundary determined by the hyperplane.Solving the optimization problem (3.20)-(3.21) can be converted into solving the following dual problem,minα12l∑i, j=1αiα jκ(xi,x j) (3.22)s.t. 0≤ αi ≤ 1vl ,l∑i=1αi = 1. (3.23)483.5. MPM detectionIt is obvious that while the primal problem is formulated in the feature space, its dual problem can be resolvedin the input space by resorting to the kernel function. Thus we can avoid the intense computation arisingfrom large dimensions of the feature space. 
Efficient algorithms are available in the literature to solve this dual problem. In the simulations presented in Section 3.6, we use sequential minimal optimization (SMO) [98] to solve the dual problem above. A feature of the solution \hat{\alpha} of the dual problem is its sparsity, with most optimal dual variables \hat{\alpha}_i equal to 0. Data points corresponding to nonzero optimal dual variables are known as support vectors, and the optimal w and b (denoted \hat{w} and \hat{b}, respectively) are completely determined by those nonzero dual variables. Furthermore, with the kernel function, the decision (or score) function can also be expressed in the input space, instead of the high-dimensional feature space, as

p(x) = \sum_{i=1}^{l}\hat{\alpha}_i \kappa(x_i, x) - \hat{b}, \qquad (3.24)

where x is a test example. Note that the sum in (3.24) typically involves n_\alpha \ll l nonzero terms, where n_\alpha is the number of nonzero dual variables; this allows efficient evaluation. For a given test example x, the value |p(x)| represents the distance from x to the separating hyperplane. If p(x) > 0, then x is classified into the initial class; otherwise x does not belong to that class. We note that the introduction of kernel functions significantly expands the flexibility of the SVM in constructing separating boundaries, enabling it to generate a nonlinear classifier in the input space.

A critical issue in applying the one-class SVM training strategy to MPM detection is the limited amount of training data available in industrial processes. Taking the paper machine as an example, grade changes (setpoint changes) often take place on a daily basis, and thus training data has to be collected after each grade change to represent the current operating condition before carrying out MPM detection. Consequently, only a few process model estimates from training data are available to build an SVM model.
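The training and scoring steps above can be sketched with an off-the-shelf one-class SVM. This is a minimal illustration, not the SMO solver referenced in the text, and the toy FIR-coefficient data is invented for the example; `nu` plays the role of v in (3.20) and `gamma` corresponds to 1/c in the Gaussian kernel (3.19).

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Toy stand-in for FIR coefficient vectors estimated from training windows:
# samples clustered around a nominal impulse response (hypothetical values).
nominal = np.array([0.10, 0.09, 0.08, 0.07])
X_train = nominal + 0.005 * rng.standard_normal((50, 4))

# One-class SVM with Gaussian (RBF) kernel: nu bounds the outlier fraction,
# as v does in (3.20), and gamma = 1/c in the kernel (3.19).
clf = OneClassSVM(kernel="rbf", nu=0.05, gamma=10.0).fit(X_train)

# Score p(x): positive -> consistent with the "no mismatch" cluster,
# negative -> possible model-plant mismatch.
x_ok = nominal + 0.005 * rng.standard_normal(4)
x_mismatch = 2.0 * nominal          # e.g., a doubled process gain
print(clf.decision_function([x_ok, x_mismatch]))
```

A negative score for `x_mismatch` mirrors the sign convention of (3.24): points outside the trained cluster fall on the far side of the separating hyperplane.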
In order to overcome this issue, we use a resampling technique to enlarge the cluster of no-mismatch models estimated from the training data before performing the SVM training.

3.5.2 Resampling

The main principle adopted here is to fit a probability density function (PDF) to each impulse response coefficient of the estimated process model. A large number of samples can then be generated by sampling randomly from the estimated density function. More specifically, denote the FIR form of the estimated process model G(q, \hat{\rho}_N) in the k-th moving window as

G(q, \hat{\rho}_N^k) = \hat{g}_0^k q^{-d} + \hat{g}_1^k q^{-d-1} + \ldots + \hat{g}_m^k q^{-d-m}, \qquad (3.25)

where m is a pre-specified number and k = 1, 2, \ldots, N_k indexes the moving windows in the training data. The FIR coefficients \hat{g}_i^k, i = 0, \ldots, m, are asymptotically Gaussian distributed, given that \hat{\rho}_N^k has an asymptotically Gaussian distribution (cf. Theorem 3.4.1). For each coefficient \hat{g}_i^k, several estimated values are obtained from the moving windows in the training data. We can then construct rough estimators for the mean and variance of each FIR coefficient,

\hat{\mu}_i = \mu(\hat{g}_i^1, \hat{g}_i^2, \ldots, \hat{g}_i^{N_k}), \quad i = 0, \ldots, m,
\hat{\sigma}_i^2 = \sigma(\hat{g}_i^1, \hat{g}_i^2, \ldots, \hat{g}_i^{N_k}), \quad i = 0, \ldots, m,

where \mu(\cdot) and \sigma(\cdot) are suitable functions; one choice is the sample mean and sample variance. Due to the limited amount of training data (N_k is normally small), the estimated PDF of each FIR coefficient is much more conservative than the true PDF. We therefore use a parameter \alpha to tune the width of the PDF to avoid this problem. The guidelines for selecting \alpha are:

• If we have plenty of training data, \alpha is small;
• If we have very little training data, \alpha is large.

The rationale behind these guidelines is that more training data gives a more precise and reliable estimate of the PDF, and vice versa.
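The fit-then-resample principle stated at the start of this subsection can be sketched as follows. The per-coefficient Gaussian fit, the width factor α, and the toy window estimates are illustrative assumptions (the choice of μ(·) and σ(·) is left open in the text; sample mean and standard deviation are used here).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical FIR coefficient estimates from N_k = 5 training windows,
# one row per window, one column per coefficient g_0..g_m.
g_hat = (np.array([0.100, 0.090, 0.081, 0.073])
         + 0.003 * rng.standard_normal((5, 4)))

alpha = 1.5          # widens the estimated PDF (few training windows)
n_samples = 1000     # size of the enlarged "no mismatch" cluster

# Fit a Gaussian to each coefficient (sample mean / std), widen by alpha,
# then draw resampled FIR vectors for one-class SVM training.
mu = g_hat.mean(axis=0)
sigma = alpha * g_hat.std(axis=0, ddof=1)
resampled = mu + sigma * rng.standard_normal((n_samples, 4))

print(resampled.shape)   # (1000, 4)
```

The `resampled` array would then serve as the enlarged training set for the one-class SVM of the previous subsection.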
The next step uses the resampling idea to randomly generate a large number of samples of each FIR coefficient according to the corresponding estimated PDF. A one-class SVM model can then be developed from these enhanced samples for the initial cluster of "good" process models.

Remark 3.5.1. Note that the proposed SVM MPM detection approach relies on the training data and then detects changes on the testing data. In practice, there is no prior knowledge of how severe an MPM might be. The value of the tuning parameter \alpha should therefore be chosen based on the sensitivity anticipated from practical demands. For example, if we require the approach to detect only significant MPM while remaining robust to minor MPM, \alpha should be set to a relatively large value, and vice versa. Moreover, the approach can detect both abrupt and slowly drifting mismatches, since the training data is fixed once selected.

3.5.3 MPM detection with SVM

With the trained one-class SVM, we first estimate the FIR coefficients of the process model identified from each moving window in the testing data, following the same procedure as in the previous sections. We then apply the SVM model to predict whether the estimated FIR coefficients belong to the initial cluster. If so, the SVM returns a positive score, indicating that the current testing window does not display any sign of mismatch; otherwise the SVM returns a negative score to signify a mismatch. However, to be cautious about starting an identification experiment, the MPM alarm is not triggered until a sufficiently large number of mismatch reports has accumulated. Specifically, define I_t as the sign of the score value at time instant t,

I_t = \mathrm{sign}(p(x_t)), \qquad (3.26)

with x_t being the FIR coefficient vector of the plant model estimate for the window data at time t. Denote T_t = \{t - n_T, \ldots, t-1, t\}, where n_T is a detection interval, i.e., the number of previous moving windows under inspection to determine the existence of MPM.
We further define an MPM indicator

s = \frac{|I_-|}{n_T}, \qquad (3.27)

where I_- := \{I_i = -1 : i \in T_t\} and |I_-| is the number of elements in the set I_-. The user can specify a threshold s_T for the MPM indicator to raise an MPM alarm. We suggest a conservative s_T (e.g., s_T = 0.95) to be circumspect in raising the MPM alarm.

Remark 3.5.2. Note that the MPM detection method presented above can also be applied to the noise model estimate A(q, \hat{\eta}_n) from the ARX-OE method to find noise model mismatch. In this manner we can monitor the process and noise models separately, to distinguish MPM from noise model changes. We comment that while the controllers considered in this chapter are assumed to be tuned based solely on the plant model, in cases where controller tuning also depends on the noise model (such as minimum-variance control and some MPCs), detection of a noise model change can also be used to trigger an identification experiment. Moreover, if a priori information about the true process model structure is not available, we can specify an FIR structure to acquire consistent model estimates, and the subsequent mismatch detection scheme is still applicable. The entire logic flow of MPM detection is shown in Figure 3.5.

Figure 3.5: Flow chart of MPM detection scheme

3.6 Examples of MPM detections for MD processes

In this section we demonstrate the MPM detection algorithm through a SISO example for the MD process of a paper machine. The CV is dry weight and the MV is stock flow. The sampling interval is set to 5 seconds. After discretization we obtain the true plant model

G_0 = \frac{0.1003}{1 - 0.9048 q^{-1}} q^{-17}.

In the simulation of the MD process, the true noise model is selected as

H_0 = \frac{1 - 0.3 q^{-1}}{1 + 0.6 q^{-1}}.

We use an MPC as the MD controller. To reflect the reality of a paper machine's operating conditions, we set the standard deviation of the noise to \sigma = 0.05. The entire simulation lasts 15 hours without any setpoint change.
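The open-loop part of this setup can be sketched as below, with G0 and H0 implemented as difference equations and the 17-sample delay handled by shifting the filtered input. This is a simplified illustration with a stand-in excitation signal; the MPC feedback layer of the actual simulation is omitted.

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(2)
Ts = 5.0                       # sampling interval, seconds
N = int(15 * 3600 / Ts)        # 15-hour simulation -> 10800 samples

u = 0.1 * rng.standard_normal(N)    # stand-in input (MPC omitted)
e = 0.05 * rng.standard_normal(N)   # white noise with sigma = 0.05

# G0 = 0.1003 q^-17 / (1 - 0.9048 q^-1): first-order filter, then 17-sample delay.
yg = lfilter([0.1003], [1.0, -0.9048], u)
yg = np.concatenate([np.zeros(17), yg[:-17]])

# H0 = (1 - 0.3 q^-1) / (1 + 0.6 q^-1)
v = lfilter([1.0, -0.3], [1.0, 0.6], e)

y = yg + v    # dry weight deviation (CV)
print(y.shape)   # (10800,)
```

Doubling the numerator coefficient of G0 at a chosen time index would reproduce the gain mismatch introduced later in the example.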
During this simulation, there is initially no mismatch between the true plant and the process model employed in the MPC. After 7 hours we change the true noise model to

H_0 = \frac{1 + 0.3 q^{-1}}{1 - 0.6 q^{-1}},

to create a noise model mismatch. Furthermore, after 11 hours we double the plant gain to introduce an MPM. The objective is to examine whether the proposed MPM detection algorithm is able to detect the process and noise model changes separately, using the collected routine operating data. The configuration of the parameters relevant to the MPM detection algorithm is summarized in Table 3.1. Note that we use the same window size and step size for both the training and testing stages.

Table 3.1: Parameter setup of the MPM detection algorithm

  Parameter | Value   | Note
  Wsize     | 2 hours | Moving window size
  Wstep     | 5 min   | Moving window step size
  n         | 9       | Order of ARX model
  Ttrain    | 3 hours | Duration of training stage
  nT        | 2 hours | Mismatch inspection interval
  sT        | 0.95    | Mismatch threshold
  α         | 1.5     | Tuning the width of estimated PDF

Fig. 3.6 depicts the simulated CV and MV profiles. Note that the first vertical red dash-dotted line indicates the time instant at which the noise change is introduced, and the second vertical line shows the time at which the MPM is created. The mean has been removed from the profiles in this plot. Both the noise change and the MPM bring significant variations to the profiles, which is undesirable, since the control objective is to keep the CV profile as flat as possible. However, it is stressed that poor control performance arising merely from a noise change should not trigger an identification experiment. We use the first 180 minutes of data as training data and the rest as testing data.

Fig. 3.7 demonstrates the detection results for both the noise model change and the MPM. Specifically, the first and third plots display the predicted SVM scores (cf. (3.24)) for the noise and process model estimates, respectively.
Clearly, the SVM scores drop to negative values after the corresponding noise model change occurs. The second and fourth plots track the mismatch indicator value s in (3.27) for the noise and process models, respectively. The red dash-dotted line highlights the specified threshold for raising an alarm. Ideally, a system identification experiment is triggered once the s value of the process model exceeds the threshold s_T. In this example, however, we omit the subsequent identification step; it will be demonstrated more explicitly in the following adaptive control chapters. From Figure 3.7 it is clear that the noise change at the 420th minute does not affect the prediction of MPM. We may thus conclude that the proposed mismatch detection scheme is capable of monitoring the noise model change and the MPM separately, and is therefore able to discriminate MPM from noise model changes using only routine operating data.

Figure 3.6: Simulated input and output profiles (dry weight and stock flow versus time)

3.7 Conclusion

This chapter presents a novel MPM detection algorithm that can separate MPM from noise model changes, relying only on routine operating data. To this end, we proposed a new closed-loop identification method that gives consistent parameter estimates for the process model without requiring any a priori information about the noise model. We split the mismatch detection problem into a training stage and a testing stage; in the training stage, an SVM model is developed based on the process model estimates. The trained SVM model is then used to detect MPM in the testing data. Following the same procedure, another SVM model can be trained for the noise models to detect noise model changes. This technique is tailored to meet industrial demands for MPM monitoring. An example on paper machines is presented to illustrate the effectiveness of the proposed method.
Figure 3.7: MPM and noise change detection results (SVM scores and mismatch indicators for the noise and plant models, updated every 5 min)

Chapter 4

Control Performance Assessment for CD Processes

4.1 Introduction

Like typical control loops, the CD process suffers from the performance limitation caused by the time delay in the dynamic model in the temporal direction. In addition, the CD process has unique performance limitations in the spatial direction, which arise from the spatially-distributed nature of this large-scale process. An obvious fact regarding CD control is that the number of CD measurement bins is much greater than the number of CD actuators, which makes the CD process model matrix non-square. As a result, it is generally not possible to find a CD controller that lets the actuator array completely control all CD measurement bins. This non-square nature of CD process models forms the fundamental spatial performance limitation. Moreover, most CD processes are ill-conditioned, and the gains corresponding to high spatial frequency modes are small [12]. Thus, for the sake of robust stability when model uncertainty is present, the controller normally does not exert any control action to compensate for high-frequency spatial disturbances, leaving them uncontrolled. In this sense it is appropriate to select the benchmark CD controller as one that allows for these CD performance limitations.

In this chapter we first derive the MVC benchmark for the CD process by studying both spatial and temporal fundamental performance limitations.
However, the MVC benchmark is rather aggressive, since it represents an ideal performance limit for attenuating variations. In order to better reflect the performance of a practical controller, our next step is to build a user-specified benchmark that not only accounts for the above physical performance limitations but also considers the tuning status of the implemented controller. We may expect such a user-specified benchmark to be a more suitable choice for monitoring the health of industrial processes.

Figure 4.1: Illustration of the variation separation. (a): The steady-state profile plot. Note that the steady-state profile is replicated to have the same scan number as the residual profile and CD profile; (b): The residual profile plot. Each CD bin has zero mean; (c): The overall CD profile. Note that the CD profile is the combination of the steady-state and residual profiles.

4.2 Preliminaries

4.2.1 Variance partition

A given data set Y \in R^{m \times N} from a paper machine can always be separated into machine-direction (MD), cross-direction and residual components, where m is the number of measurement bins and N is the number of scans in the data set. For details on the calculation and partitioning of variance, see Appendix B.1. For CD control and CD performance monitoring, the MD variation is not taken into account. The data set without MD variation at time t is denoted y(t) \in R^m (the CD profile), and we have

y(t) = y_{ss} + y_r(t), \qquad (4.1)

where y_{ss} \in R^m is the steady-state profile, which is constant over all scans, and y_r(t) \in R^m is the residual profile, which changes over time.
Figure 4.1 shows typical graphs of the steady-state, residual and CD profiles. The process models for the steady-state and residual profiles are given in the following subsection, and the motivation for treating them separately is presented below.

4.2.2 Process model

In traditional CD control, the steady-state performance is of great importance, since most paper machines operate in regulatory mode at steady state. The static steady-state model of a CD process is expressed as

y_{ss} = G u_{ss} + v_{ss}, \qquad (4.2)

where u_{ss} \in R^n is the steady-state manipulated variable, n is the number of actuator zones, and G \in R^{m \times n} is the steady-state gain matrix. v_{ss} is the steady-state disturbance, i.e., a deterministic disturbance persistently acting on the output, e.g., a spatial sinusoidal disturbance that does not change over time. For the CD process, the overall disturbance is assumed to be a combination of the steady-state disturbance v_{ss} and filtered white noise v_r(t), as shown in (4.3) below.

When considering only the residual profile, we have the following model,

y_r(t) = g(q) G u_r(t) + v_r(t), \qquad (4.3)

where y_r(t) = y(t) - y_{ss} can be considered the deviation of the process output from its steady-state value due to stochastic disturbances. Similarly, u_r(t) = u(t) - u_{ss} and v_r(t) = v(t) - v_{ss} are the deviations of the manipulated variable and the disturbance from their steady-state values, respectively. The output disturbance v_r(t) \in R^m is generally assumed to be filtered white noise. Note that the subscript r in (4.3) stands for residual. The scalar transfer function g(q) in (4.3) can further be expressed as

g(q) = q^{-d} \frac{B(q)}{A(q)}, \qquad (4.4)

where d stands for the time delay and B(q) and A(q) are scalar polynomials.
Similarly, the stochastic disturbance v_r(t) in (4.3) is filtered white noise, assumed to be temporally and spatially separable, denoted as

v_r(t) = \frac{C(q)}{A(q)} \phi e(t), \qquad (4.5)

where C(q) and A(q) are scalar polynomials describing the temporal filter, while the constant matrix \phi represents the spatial filtering of the white noise vector e(t). The covariance of the white noise vector e(t) is assumed to satisfy E[e(t) e^T(t_0)] = \Sigma_e \delta(t - t_0), where E is the expectation operator, \Sigma_e is the covariance matrix and \delta is the Dirac delta function.

Remark 4.2.1. From the above description of the steady-state and residual profiles, one can interpret each entry of the steady-state profile y_{ss} as the mean of the corresponding output channel (measurement bin). Each entry of the residual profile y_r(t) is the deviation of the profile from the corresponding mean value.

4.3 The MVC benchmark for CD processes

In this section, the performance limitations for both the steady-state model (4.2) and the residual dynamic model (4.3) are illustrated. The MVC benchmarks for both models are developed analogously, and new performance indices are proposed based on these benchmarks.

4.3.1 MVC benchmark for the steady-state profile

For the steady-state model (4.2), the optimal control input u_{ss} that minimizes the output variance has the structure

u_{ss} = -(G^T G)^{-1} G^T v_{ss}, \qquad (4.6)

involving the pseudo-inverse of the G matrix. It has been proved in [2] that if a controller has the structure (4.6) and an integrator in the dynamic part, then the steady-state output profile y_{ss} will contain no components in the column space of G, which is regarded as the controllable subspace. If the G matrix is square and invertible, then the optimal input achieves zero steady-state output. Therefore, the structure of the G matrix limits the performance of the steady-state model (4.2).
For a non-square G matrix with full rank, i.e., rank{G} = min{m, n}, the output y_{ss} with minimal variance is

y_{ss} = [I - G(G^T G)^{-1} G^T] v_{ss}. \qquad (4.7)

In order to demonstrate the controller form that achieves minimal steady-state variance, we assume that the controller K(q) has the following structure (refer to [2]),

K(q) = k(q) (G^T G)^{-1} G^T, \qquad (4.8)

where k(q) is the scalar dynamic part of K(q). From (4.8), the closed-loop sensitivity relation is obtained as

y(t) = [I + g(q) G K(q)]^{-1} v(t). \qquad (4.9)

Using the singular value decomposition (SVD) of G, (4.9) is further simplified to

\begin{bmatrix} y_c(t) \\ y_u(t) \end{bmatrix} = \begin{bmatrix} \frac{1}{1 + g(q)k(q)} I & 0 \\ 0 & I_{(m-n)\times(m-n)} \end{bmatrix} \begin{bmatrix} v_c(t) \\ v_u(t) \end{bmatrix}, \qquad (4.10)

where the subscripts c and u refer to spatially controllable and spatially uncontrollable signals, respectively. If k(q) has an integrator, then at steady state (4.10) becomes

\begin{bmatrix} y_{ss,c} \\ y_{ss,u} \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & I_{(m-n)\times(m-n)} \end{bmatrix} \begin{bmatrix} v_{ss,c} \\ v_{ss,u} \end{bmatrix}. \qquad (4.11)

It is clear that the MVC for the steady-state model (4.2) completely removes all disturbance components within the controllable subspace, while disturbance components within the uncontrollable subspace are unaffected by the controller. Therefore, the spatially uncontrollable components y_{ss,u} can be used to develop the MVC benchmark. Note that the actual steady-state profile under a CD controller such as CD-MPC (not spatial MVC) may have components left in the controllable subspace.

4.3.2 MVC benchmark for the residual profile

For the residual profile in the spatial direction, as in the steady-state case, because there are more CD bins than actuator zones (G is not square), not all directions of G are controllable. This means that it is impossible to design a controller that reaches zero error for an arbitrary disturbance. Therefore, the structure of the G matrix contributes the spatial performance limitation of the CD controller.
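Equations (4.6)-(4.7) can be illustrated numerically: the residual of the orthogonal projection onto the column space of G is exactly what no steady-state controller of form (4.6) can remove. The G matrix and disturbance below are invented toy values (m = 6 bins, n = 2 actuators).

```python
import numpy as np

rng = np.random.default_rng(3)

m, n = 6, 2                       # toy sizes: 6 bins, 2 actuator zones
G = rng.standard_normal((m, n))   # hypothetical steady-state gain matrix
v_ss = rng.standard_normal(m)     # steady-state disturbance

# MVC input (4.6) and resulting minimal-variance output (4.7)
u_ss = -np.linalg.solve(G.T @ G, G.T @ v_ss)
y_ss = G @ u_ss + v_ss
P_u = np.eye(m) - G @ np.linalg.solve(G.T @ G, G.T)   # uncontrollable projector

# y_ss equals the uncontrollable part of v_ss and is orthogonal to range(G)
print(np.allclose(y_ss, P_u @ v_ss), np.allclose(G.T @ y_ss, 0))   # True True
```

The check confirms that under (4.6) nothing in the column space of G survives at steady state, which is the content of (4.11).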
In addition, the residual profile (4.3) suffers a temporal performance limitation due to the actuator dynamics. In the temporal direction, the time delay forms a fundamental limitation on the controller design, for which various types of delay compensators have been proposed, such as the dead-beat controller, the minimum-variance controller, the Dahlin controller, and so on.

For the disturbance model (4.5), from the Diophantine identity, we can decompose the prediction for time t+d (at time t) as

y_r(t+d|t) = \hat{y}_{r,c}(t+d|t) + \underbrace{\hat{y}_{r,u}(t+d|t) + F(q)\phi e(t+d)}_{\text{controller-invariant}}, \qquad (4.12)

where the subscripts c and u stand for the spatially controllable and uncontrollable parts of y_r(t+d|t), respectively. Here the controllable subspace is defined as the column space of G, as in the steady-state case. F(q) consists of the first d terms of the decomposition of C(q)/A(q). The derivation of (4.12) is given in Appendix B.2. Observe in (4.12) that the last two terms are the controller-invariant parts of the profile, both spatially and temporally, and hence can be used to define a benchmark for performance assessment. If the currently implemented controller is MVC, then the first term on the right-hand side of (4.12) disappears.

Based on the proposed MVC benchmarks for the steady-state profile in (4.7) and the residual profile in (4.12), a new MVC performance index for the CD process can be defined as

\eta_1 = \frac{\mathrm{trace}\left[\sum_{i=0}^{d-1} F_i \Sigma_e F_i^T + \Sigma_{\hat{y}_{r,u}} + \mathrm{diag}(y_{ss,u} y_{ss,u}^T)\right]}{\mathrm{trace}(\Sigma_{y,mse})}, \qquad (4.13)

where F_i = f_i \phi, \Sigma_{\hat{y}_{r,u}} is the covariance matrix of the uncontrollable predicted profile \hat{y}_{r,u}, and diag(·) is the operator that extracts the diagonal entries of a matrix into a diagonal matrix. The term in the denominator, \Sigma_{y,mse}, is defined as

\Sigma_{y,mse} = \Sigma_{y_r} + \mathrm{diag}(y_{ss} y_{ss}^T), \qquad (4.14)

where \Sigma_{y_r} is the covariance matrix of the residual profile; each element on its diagonal represents the variance of an individual output channel of y_r.
For the term diag(y_{ss} y_{ss}^T), each element stands for the corresponding mean deviation from zero of each individual output channel. In (4.13), the first term \sum_{i=0}^{d-1} F_i \Sigma_e F_i^T in the numerator represents the covariance of the unpredictable components in the residual profile. The second term \Sigma_{\hat{y}_{r,u}} is the covariance of the spatially uncontrollable predicted residual profile. The third term diag(y_{ss,u} y_{ss,u}^T) stands for the spatially uncontrollable portion of the steady-state profile. Thus the numerator of (4.13) measures both the residual MVC benchmark (4.12) and the steady-state MVC benchmark (4.7). The denominator of (4.13), on the other hand, represents the overall mean square error (MSE) of the output profile. Hence, the new MVC performance index \eta_1 is the ratio between the covariance of the benchmark of y(t) and its total variance (see Appendix B.1). If the implemented controller is MVC, the index \eta_1 equals 1, as the measured output y contains only uncontrollable components, which are exactly the terms in the numerator of (4.13). Otherwise, \eta_1 is less than 1. In general, a smaller value of \eta_1 implies worse control performance.

From the performance index (4.13), we see that one has to separate the spatially uncontrollable components \hat{y}_u(t) from \hat{y}(t) in (4.12) in order to evaluate the benchmark. Moreover, this performance index takes the spatially uncontrollable components into account. In industrial CD control systems, the spatially uncontrollable components of disturbances remain almost untouched. Therefore, we expect the performance index (4.13) to be sensitive to high-frequency spatial disturbances. If a large amount of spatially high-frequency disturbance (beyond the spatial bandwidth) is present, the performance index (4.13) will be inflated by these high-frequency components, and the index will always be close to one (this will be illustrated in the simulation part).
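The structure of (4.13)-(4.14) can be checked numerically with invented population quantities: when the implemented controller is MVC, the output covariance consists of exactly the benchmark terms, so η1 evaluates to 1. All matrices below are toy stand-ins, not data from the thesis.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, d = 6, 2, 3                        # toy sizes: bins, actuators, delay

G = rng.standard_normal((m, n))
P_u = np.eye(m) - G @ np.linalg.solve(G.T @ G, G.T)

# Invented population quantities: Markov matrices F_i = f_i * phi, noise
# covariance, covariance of the uncontrollable predicted residual profile,
# and the uncontrollable steady-state profile.
F = [0.5 ** i * rng.standard_normal((m, m)) for i in range(d)]
Sigma_e = np.eye(m)
A = rng.standard_normal((m, m))
Sigma_yru = P_u @ A @ A.T @ P_u.T        # PSD, confined to uncontrollable subspace
y_ss_u = P_u @ rng.standard_normal(m)

bench = sum(Fi @ Sigma_e @ Fi.T for Fi in F)            # unpredictable part
num = np.trace(bench + Sigma_yru) + y_ss_u @ y_ss_u     # numerator of (4.13)

# Under MVC the output contains exactly the benchmark terms, so the
# denominator (4.14) reduces to the same quantity and eta_1 = 1.
Sigma_yr = bench + Sigma_yru
den = np.trace(Sigma_yr) + y_ss_u @ y_ss_u
eta_1 = num / den
print(eta_1)   # 1.0 for an MVC-controlled process
```

Any additional controllable variance left by a non-MVC controller enlarges the denominator only, pushing η1 below 1.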
In this case, the performance index \eta_1 becomes incapable of detecting the performance drop. An alternative is to separate the white noise e(t) into spatially uncontrollable components e_u(t) and controllable components e_c(t). Then (4.12) can be rewritten as

y_r(t+d|t) = \hat{y}_{r,c}(t+d|t) + F(q)\phi e_c(t+d) + \underbrace{F(q)\phi e_u(t+d) + \hat{y}_{r,u}(t+d|t)}_{y_{r,u}(t+d|t)}, \qquad (4.15)

where F(q)\phi e_u(t+d) and \hat{y}_{r,u}(t+d|t) are combined into y_{r,u}(t+d|t), the uncontrollable part of y_r(t+d|t). Specifically, define the projection operators P_c and P_u, which project the profile into the spatially controllable and uncontrollable subspaces, respectively. For the column-space framework, P_c and P_u are defined as

P_c = G(G^T G)^{-1} G^T, \quad P_u = I - G(G^T G)^{-1} G^T. \qquad (4.16)

Then we have

y_{r,u}(t+d|t) = P_u y_r(t+d|t), \quad y_{ss,u} = P_u y_{ss}. \qquad (4.17)

From (4.17) one can see that the spatially uncontrollable components of both the steady-state and residual profiles can be extracted using the operator P_u. To address the sensitivity of the performance index (4.13) to spatially high-frequency disturbances, the following modified performance index is suggested,

\eta_2 = \frac{\mathrm{trace}\left[\sum_{i=0}^{d-1} F_i \Sigma_{e_c} F_i^T\right]}{\mathrm{trace}(\Sigma_{y_c,mse})}, \qquad (4.18)

where \Sigma_{e_c} = E[e_c e_c^T] is the covariance matrix of the white noise within the spatially controllable subspace. The covariance matrix in the denominator is further expressed as

\Sigma_{y_c,mse} = \Sigma_{y_{r,c}} + \mathrm{diag}(y_{ss,c} y_{ss,c}^T), \qquad (4.19)

where \Sigma_{y_{r,c}} is the covariance matrix of the spatially controllable residual profile. The performance index (4.18) compares the covariance of the unpredictable disturbance within the controllable subspace with the mean square error of the overall spatially controllable output profile. This performance index is not sensitive to high-frequency spatial disturbances, since the spatially uncontrollable components have been removed before the index is calculated.

Remark 4.3.1.
Note that the numerator of (4.18) includes only the residual part. This makes sense because, if the implemented controller has the spatial MVC structure (4.6) and an integrator, the components of the steady-state profile within the controllable subspace (the steady-state benchmark) are zero. However, since most implemented controllers are not MVC, components remain in the spatially controllable subspace, which explains the steady-state terms in (4.19).

4.4 User-specified benchmark

It is well known that MVC yields aggressive control actions and lacks robustness to model uncertainties. Consequently, MVC is not widely used in the process industry. In practice, to guarantee robust stability and performance, the controllers actually implemented are much more sluggish than MVC. If MVC is still used as the benchmark, most industrial controllers will show a very low performance index even though the underlying control loop is operating with satisfactory performance. In such cases, a low performance index based on the MVC benchmark does not necessarily imply poor controller design. It is therefore important to develop practical benchmarks based on the specific controller implemented in the process. The user-specified benchmark is an outcome of this idea: a filter is defined as the desired closed-loop behavior, and a parameter in the filter can be tuned to adjust the aggressiveness or conservativeness of the benchmark. In this section, the user-specified benchmark is adapted to the CD process to make our benchmark more realistic.

For the CD process, we note that the spatial part of the MVC, G^{\dagger} in (B.12), removes the components of the disturbance profile within the subspace spanned by the columns of G. However, due to its spatially-distributed nature, the G matrix for most CD processes has a large condition number. This ill-conditioning implies that some of the singular values are vanishing.
Therefore, the corresponding singular-vector directions are considered uncontrollable and are avoided by the CD controller so as to ensure robust stability and acceptable actuator action. It is therefore more realistic to select those (pseudo) singular-vector directions (or spatial frequencies, from the perspective of the Fourier matrix transform) with significant mode gains and without wrong signs as the controllable directions [99]; these typically correspond to the low spatial frequency range (specified by the spatial bandwidth in this work).

The choice of the desired spatial benchmark is not fixed; it depends on the specific CD controller being used. For instance, with CD-MPC we may choose a spatial-frequency-dependent sensitivity function as the benchmark, which can be obtained from the steady-state weighting matrices in the objective function when no constraints are active. In this thesis, for simplicity, we choose the estimated (closed-loop) spatial bandwidth as the spatial benchmark, which can be approximated from the width of the spatial response [100].

For the spatial bandwidth, the operators separating the spatially controllable and uncontrollable components, P_{c,user} and P_{u,user}, are constructed as

P_{c,user} = P_c^T P_c, \quad P_{u,user} = I - P_c^T P_c, \qquad (4.20)

where³ P_c = [P(1:r,:); 0; P(m-r+2:m,:)], P is the m-dimensional Fourier matrix, which in Matlab can be defined as P = fft(eye(m))/sqrt(m), and r is the selected spatial bandwidth. Notice that the selection of P can affect the performance index, but this effect is so small that decisions (e.g., as to the presence or absence of MPM) based on the performance index are not influenced. Choosing P as the Fourier matrix keeps the benchmark consistent with the definition of spatial bandwidth, which is used in the tuning of CD controllers and expressed in the frequency domain.
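The user-specified projector (4.20) can be sketched as below. For the complex Fourier matrix, the conjugate transpose is used here so that the resulting projector is real-valued; this is an implementation choice layered on top of the notation in (4.20), and the sizes m and r are toy values.

```python
import numpy as np

m, r = 16, 3                              # bins and spatial bandwidth (toy values)
P = np.fft.fft(np.eye(m)) / np.sqrt(m)    # m-dimensional Fourier matrix

# Keep the first r rows and the last r-1 rows (low spatial frequencies,
# including their conjugate counterparts); zero out the rest.
P_c = np.zeros((m, m), dtype=complex)
P_c[:r] = P[:r]
P_c[m - r + 1:] = P[m - r + 1:]

# Conjugate transpose keeps the projector real (implementation choice).
P_c_user = (P_c.conj().T @ P_c).real
P_u_user = np.eye(m) - P_c_user

# P_c_user is idempotent, i.e., a genuine projector
print(np.allclose(P_c_user @ P_c_user, P_c_user))   # True
```

Applying `P_c_user` to a profile vector retains only spatial components below the bandwidth r, exactly what (4.21) does for the steady-state and residual profiles.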
Moreover, with the Fourier matrix it is more intuitive for users to specify the desired spatial bandwidth. According to the rule of thumb proposed in [101], r can be determined directly from knowledge of the spatial response width. The spatially controllable components, for both the residual and steady-state profiles, are

y_{ss,user} = P_{c,user} y_{ss}, \quad y_{r,user}(t) = P_{c,user} y_r(t). \qquad (4.21)

³Note that the colon follows Matlab's notation: P(1:r,:) represents the first r rows of P, and P(m-r+2:m,:) represents the last r-1 rows of P.

In the temporal direction, when the implemented controller is not MVC, the controllable (from either (4.16) or (4.20)) residual profile y_{r,c}(t) can be expressed in the impulse response form

y_{r,c}(t) = f_0 e_c(t) + f_1 e_c(t-1) + \ldots + f_{d-1} e_c(t-d+1) + f_d e_c(t-d) + f_{d+1} e_c(t-d-1) + \ldots. \qquad (4.22)

Under temporal MVC, no terms remain after the first d terms of the time series model (4.22). The temporal user-specified term (a scalar) G_R(q) can be used to define a desirable form for the remaining terms, such that

y_{r,c}(t) = f_0 e_c(t) + \ldots + f_{d-1} e_c(t-d+1) + G_R(q) e_c(t-d). \qquad (4.23)

The user-specified term G_R(q) can be selected as [18]

G_R(q) = [1 - G_F(q)] R(q), \qquad (4.24)

where G_F(q) is the desired complementary sensitivity function with the first-order form

G_F(q) = \frac{1 - \alpha_R}{1 - \alpha_R q^{-1}}, \qquad (4.25)

and \alpha_R is calculated via the desired closed-loop time constant \tau_{des},

\alpha_R = e^{-T_s/\tau_{des}}, \qquad (4.26)

where T_s is the sampling time. R(q) comes from the dynamic part of the disturbance model (4.5) via the Diophantine decomposition (B.7), R(q) = H(q)/A(q).

By combining the spatial operators (4.21) and the temporal term (4.24), the user-specified counterpart of \eta_2 can be obtained as follows,

\eta_{2,user} = \frac{\mathrm{trace}\left[\sum_{i=0}^{d-1} F_i \Sigma_{e_{user}} F_i^T + \Sigma_{user}\right]}{\mathrm{trace}(\Sigma_{y_{user},mse})}, \qquad (4.27)

where \Sigma_{e_{user}} = E[e_{user}(t) e_{user}^T(t)], e_{user}(t) = P_{c,user} e(t), and \Sigma_{user} = \mathrm{Var}[G_R(q) e_{user}(t)]. The denominator of \eta_{2,user}
Performance monitoringis,Σyuser,mse = Σyr,user +diag(yss,useryTss,user), (4.28)where Σyr,user = E[yr,user(t)yTr,user(t)]. It can be seen that compared with η2, in the spatial direction, the onlydifference in η2,user is that the controllable projector is replaced by a user-specified projector Pc,user, which isapplied to both steady-state and residual profiles. In the temporal direction, an additional term GR(q) whichrepresents the desired sensitivity function is included to the residual benchmark. Note that this term is notapplicable to the steady-state profile.Remark 4.4.1. Note that (4.24) implies that in order to obtain a user-specified benchmark, the disturbancemodel has to be available, which is not realistic as the disturbance model may change from time to time.However, for simplicity, we assume the disturbance model to be known. This assumption is valid sincethere have been extensive methods proposed on the identification of disturbance models using closed-loopinput-output data [51, 102].Remark 4.4.2. If the user-specified term is selected to be the same as the nominal closed-loop response (whenthe tuning parameters of controller are available), then the highest achievable user-specified performanceindex will be 1. In this case, the value of user-specified performance index will make more sense andprovide better indication of the control performance.4.5 Performance monitoringIn order to compute previous performance indices, the residual profile has to be fitted into a moving averagemodel (refer to (4.15) and (4.22)) to obtain the estimates of impulse response coefficient matrices and whitenoise covariance. However, due to the high input-output dimensions of CD processes, the computationalburden plays an essential role in multivariate time series estimation. 
In this section, a novel technique is proposed to reduce the computations in performance monitoring.

4.5.1 Vector autoregressive modeling

As illustrated in previous sections, to proceed with performance monitoring we need to perform the following multivariate time series identification:

y_r(t) = Θ_1 y_r(t−1) + ... + Θ_p y_r(t−p) + e(t),   (4.29)

where e(t) ∈ R^m is the white noise vector and Θ_i ∈ R^{m×m}, i = 1, 2, ..., p, are the coefficient matrices to be estimated for the vector autoregressive (VAR) process, with p the temporal order selected by the user. If the Θ_i, i = 1, 2, ..., p, are chosen to be full matrices, estimating the VAR model (4.29) is computationally expensive. However, we can assume that the Θ_i are Toeplitz-structured, because in industry most CD controllers have limited spatial response width, which means that the CD multivariate controller will be band-diagonal [11, 103, 104]. Furthermore, the plant G is in general Toeplitz-structured, and as a result the closed-loop sensitivity function will be approximately band-diagonal [105]. By taking advantage of the special structure of Θ_i, the estimation problem can be greatly simplified through the basis-matrix method described below.

We construct basis matrices to decompose each Toeplitz-structured coefficient into a sum of scalars multiplied by simple basis matrices. Each Toeplitz-structured VAR coefficient matrix Θ_i has the form

Θ_i = toeplitz{θ_{i,1}, ..., θ_{i,q}, 0, ...} ∈ R^{m×m},   (4.30)

where q is the spatial order selected by the user, so there are only q unknown scalars to be estimated in each coefficient matrix. The unknown scalars θ_{i,j}, j = 1, ..., q, are extracted from Θ_i by rewriting the large-dimensional matrix as a sum of simple terms,

Θ_i = Σ_{j=1}^{q} θ_{i,j} E_j,   (4.31)

where E_j, j = 1, ..., q, are basis matrices with the entries of the j-th superdiagonal and the j-th subdiagonal equal to 1, while all other entries are 0.
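The decomposition (4.30)-(4.31) can be sketched as follows. Since θ_{i,1} sits on the main diagonal of the Toeplitz matrix in (4.30), E_1 is taken here as the identity and E_j (j ≥ 2) carries the (j−1)-th super- and sub-diagonals; this off-by-one indexing is our reading of the text and is an assumption of the sketch:

```python
import numpy as np

def var_basis(m, q):
    """Basis matrices for the symmetric banded Toeplitz coefficients of
    (4.31): E_1 = I (main diagonal), and E_j for j >= 2 has ones on the
    (j-1)-th super- and sub-diagonals."""
    E = [np.eye(m)]
    for j in range(2, q + 1):
        E.append(np.eye(m, k=j - 1) + np.eye(m, k=1 - j))
    return E

def assemble(theta_i, E):
    """Theta_i = sum_j theta_{i,j} E_j, so only q scalars parameterize
    the full m x m coefficient matrix."""
    return sum(t * Ej for t, Ej in zip(theta_i, E))

Theta = assemble([0.5, 0.2, 0.1], var_basis(6, 3))
```

The same q basis matrices are reused for every temporal lag i, which is what reduces the VAR estimation from p·m² unknowns to p·q scalars.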
Thus the i-th term of the VAR model (4.29) can be written as

Θ_i y_r(t−i) = Σ_{j=1}^{q} θ_{i,j} E_j y_r(t−i) = Σ_{j=1}^{q} θ_{i,j} ỹ_{r,ij}(t−i),

where ỹ_{r,ij}(t−i) = E_j y_r(t−i). The overall VAR model becomes

y_r(t) = θ_{11} ỹ_{r,11}(t−1) + ... + θ_{1q} ỹ_{r,1q}(t−1) + θ_{21} ỹ_{r,21}(t−2) + ... + θ_{2q} ỹ_{r,2q}(t−2) + ... + θ_{p1} ỹ_{r,p1}(t−p) + ... + θ_{pq} ỹ_{r,pq}(t−p) + e(t) ≜ Ỹ_r(t−1) θ + e(t),   (4.32)

where

Ỹ_r(t−1) = [ỹ_{r,11}(t−1)  ỹ_{r,12}(t−1)  ...  ỹ_{r,pq}(t−p)],
θ = [θ_{11}  θ_{12}  ...  θ_{pq}]^T.

The remaining identification steps are similar to those for a scalar autoregressive model, for which various well-developed techniques exist, e.g., the least-squares method. The residuals ê(t) resulting from the identification are taken as estimates of the innovations e(t), and the estimate of the coefficient vector θ is denoted θ̂. The Toeplitz coefficients Θ̂_i, i = 1, ..., p, can then be determined by reconstructing them from θ̂. By using the basis matrices, the estimation of the VAR model (4.29) is significantly simplified.

In order to calculate the performance indices, the VAR model above is transformed into the following vector moving-average (VMA) model using the technique in [106]:

y_r(t) = Φ̂_0 ê(t) + ... + Φ̂_{d−1} ê(t−d+1) + Φ̂_d ê(t−d) + ...,   (4.33)

where

Φ̂_0 = I,   (4.34)
Φ̂_i = Σ_{j=1}^{i} Φ̂_{i−j} Θ̂_j,   i = 1, ..., d, ...,   (4.35)

with Θ̂_j = 0 for j > p. Although the order of the VMA model (4.33) is infinite, we are only interested in the first d terms since these are the coefficients required to evaluate the benchmark. When applying this algorithm, d can be selected as the time delay in the model, as an approximation of the true time delay in the process. On the other hand, due to the spatial performance limitation resulting from the structure of G, only the controllable components of the estimated residuals, ê_c(t), and of the output profile, y_{r,c}(t), are considered.
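The recursion (4.34)-(4.35) is straightforward to implement; a short sketch (with the convention Θ̂_j = 0 for j > p):

```python
import numpy as np

def var_to_vma(Theta, d):
    """First d+1 VMA coefficient matrices Phi_0..Phi_d from the VAR
    coefficients Theta = [Theta_1, ..., Theta_p], via (4.34)-(4.35)."""
    p, m = len(Theta), Theta[0].shape[0]
    Phi = [np.eye(m)]                           # Phi_0 = I
    for i in range(1, d + 1):
        Phi.append(sum(Phi[i - j] @ Theta[j - 1]
                       for j in range(1, min(i, p) + 1)))
    return Phi

# Scalar sanity check: y(t) = a y(t-1) + e(t) has MA weights a^i.
Phi = var_to_vma([np.array([[0.7]])], 5)
```

Only the first d matrices feed into the benchmark, so the infinite VMA expansion is truncated exactly where the minimum-variance term ends.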
The covariance matrix of the output under temporal MVC within the column space of G is thus expressed as

Σ̂_mv = Σ_{i=0}^{d−1} Φ̂_i Σ_{ê_c} Φ̂_i^T,   (4.36)

where Σ_{ê_c} = E[ê_c(t) ê_c^T(t)] is the covariance of the spatially controllable residual, ê_c(t). The overall estimated performance index η̂_2 is obtained as

η̂_2 = trace(Σ̂_mv) / trace(Σ_{y_c,mse}),   (4.37)

where η̂_2 is the estimate of the performance index η_2.

4.5.2 Performance monitoring algorithm

As illustrated in previous sections, any given CD data set (without MD variations) can be separated into a sum of steady-state and residual profiles. The steady-state profile is obtained by averaging the data for each CD bin over all scans. In order to calculate the performance indices (4.13), (4.18) or (4.27), the steady-state profile y_ss has to be separated into controllable parts y_{ss,c} and uncontrollable parts y_{ss,u} according to the column space of G or the spatial bandwidth. A VAR model (4.29) is applied to the residual profile y_r(t) to obtain the controller-invariant variation due to time-delay and spatial bandwidth limitations. The uncontrollable parts of the steady-state variation, y_{ss,u}, and the controller-invariant variations of the residuals, y_{r,u}, are combined to obtain the overall benchmark in (4.13). The algorithm we propose to calculate (4.18) is as follows:

1. For a given data set, Y, remove the MD variations (i.e., the mean value of each scan).
2. Calculate the average CD profile by averaging the data set across all scans and record it as y_ss. Remove the mean of each CD bin to obtain the residual profile y_r(t).
3. Perform the Toeplitz-structured VAR estimation (4.29) on the residual profile y_r(t) with the selected spatial order q and temporal order p. Check the whiteness of the noise estimate ê(t). If the residual is not white, one can increase the orders of the VAR model to allow a more flexible model structure.
4. Transform the VAR model into the VMA model (4.33) and obtain the coefficient matrices Φ̂_i, i = 0, ..., d−1.
5. Construct the operator P_c based on the column space (4.16), and compute the controllable components: ê_c(t) = P_c ê(t), y_{r,c}(t) = P_c y_r(t), y_{ss,c} = P_c y_ss.
6. Calculate the estimated minimum covariance Σ̂_mv in (4.36), Σ_{y_c,mse} in (4.37), and the performance index in (4.37).

For the user-specified benchmark η_{2,user}, the operator P_c in Step 5 can be constructed based on the selected spatial bandwidth (4.20). In addition, the temporal user-specified term can be obtained from knowledge of the desired closed-loop time constant together with (4.24) and (4.26).

Remark 4.5.1. In order to obtain the performance index, one has to fit the CD profile to the VAR model (4.29), which requires each output channel to be zero-mean. Otherwise, an offset term shows up in (4.29), as illustrated in [106]. This fact motivates the profile partition in Sections 4.2 and 4.3. After splitting the output measurements into a steady-state profile and a residual profile, we can simply fit only the residual profile to (4.29).

4.6 Examples

It has been mentioned that various factors, such as a poorly tuned controller, model-plant mismatch, or a change in disturbance dynamics, can cause a drop in the performance index. From the perspective of industrial engineers, when the model quality deteriorates, a new model has to be identified. Therefore, it is of interest to determine whether the proposed performance benchmarks are sensitive to model-plant mismatches. In this section, the validity of the proposed performance benchmark for the CD process is tested using data sets from both paper machine simulators and paper mills.

4.6.1 Simulation results

In this subsection, scenarios with various types of model-plant mismatch are created to test the sensitivity of the performance index. In this simulation, the employed control strategy is the two-dimensional loop-shaping technique.
The number of actuators in the CD array is 238, and the number of CD bins downstream at the scanner side is 714.

The continuous steady-state spatial response shape of a single actuator is determined by four parameters: gain γ, width ξ, divergence β and attenuation α in (1.2) (see [12] for more details). In the temporal direction, the actuator dynamic model g(s) is assumed to be FOPTD with unit steady-state gain,

g(s) = e^{−τ_d s} / (τ s + 1),   (4.38)

where τ is the time constant and τ_d is the time delay. In the simulator, the nominal values of these parameters in both the plant and the process model are set initially as γ_0 = −0.03, ξ_0 = 164 mm, β_0 = 0.15, α_0 = 7.0, τ_0 = 17.34 s, τ_{d0} = 21 s. Figure 4.2 illustrates the spatial steady-state impulse response and the temporal step response of one actuator with these nominal parameter values. This process is relatively easy to control since there are no negative side lobes in the spatial response shape.

Figure 4.2: Spatial steady-state impulse response and temporal step response of a single actuator. The negative peak of the spatial response is due to the negative gain of the actuator spatial model.

Figure 4.3: Performance indices for different levels of gain mismatch. Note that γ_0 is the nominal gain value used by the controller, γ is the actual gain value of the process plant.
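With the nominal values above, the FOPTD dynamics (4.38) reduce to a simple discrete first-order recursion at the 20 s sampling time used in this simulation. A small sketch (rounding the 21 s delay to whole samples is a simplification made here, not part of the simulator):

```python
import numpy as np

# Nominal temporal parameters from the simulator and the sampling time.
Ts, tau0, tau_d0 = 20.0, 17.34, 21.0
f = np.exp(-Ts / tau0)          # discrete pole of the first-order model
d = int(round(tau_d0 / Ts))     # delay in samples (rounded here)

# Unit-step response of y(t) = f*y(t-1) + (1-f)*u(t-d), u = 1 for t >= 0.
n = 30
step = np.zeros(n)
for t in range(d, n):
    step[t] = f * step[t - 1] + (1 - f)
```

The response rises to its unit steady-state gain within a handful of scans, which is why the temporal direction is comparatively benign next to the spatial response shape.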
The controllers are properly tuned using the two-dimensional loop-shaping technique based on the nominal process model.

In order to investigate the sensitivity of the proposed performance index in detecting model-plant parametric mismatch, scenarios with different levels of mismatch in these spatial and temporal parameters are created. For convenience, we manually increase and decrease the parameter values in the plant while the parameter values in the process model used by the controller remain unchanged. In the following simulations, we denote by γ, ξ, β, α, τ, τ_d (without subscripts) the plant parameters. In addition, positive mismatches indicate that the plant parameters are greater than those in the model. For instance, a positive gain mismatch means that the plant gain is greater than the model gain used by the controller.

Figures 4.3-4.6 illustrate the simulation results for various levels of parametric mismatch in each parameter of the spatial model. Note that in this simulation, the sampling time is 20 seconds and the data are selected after the initial transient behavior. The number of scans in the selected portion of data is 500. The left graph in each figure compares the spatial response shape of each mismatched plant with that of the nominal case, from which the significance of the distortion caused by the corresponding parametric mismatch can be seen. The right graph of each figure shows the corresponding calculated performance index for each case, based on both the MVC benchmark η_2 and the user-specified benchmark η_{2,user}. One can see that in general the user-specified performance indices are higher than those based on the MVC benchmark, which agrees well with the previous analysis, as the user-specified benchmark is more practical and less aggressive. These figures show that for most MPM, the performance indices based on both the MVC benchmark and the user-specified benchmark decrease as the degree of MPM increases.

Figure 4.4: Performance indices for different levels of width mismatch. Note that ξ_0 is the nominal width value used by the controller, ξ is the actual width value of the process plant.

Figure 4.5: Performance indices for different levels of divergence mismatch. Note that β_0 is the nominal divergence value used by the controller, β is the actual divergence value of the process plant.

Figure 4.6: Performance indices for different levels of attenuation mismatch. Note that α_0 is the nominal attenuation value used by the controller, α is the actual attenuation value of the process plant.

Figure 4.7: Performance indices for different levels of time constant and time delay mismatch. Note that τ_0 and τ_{d0} are the nominal time constant and time delay values used by the controller, τ and τ_d are the actual time constant and time delay values of the process plant.
However, the performance indices for the gain and width mismatches γ = 0.75γ_0 and ξ = 0.75ξ_0 show better performance than the case without MPM. This can be explained by the fact that the degree of mismatch is not severe and lies within the tolerance of the implemented robust controller, and these mismatched processes are closer to the models used by the controller. For the cases with large mismatches, however, all performance indices drop. Besides, the performance index is sensitive to the divergence mismatch but not so sensitive to the attenuation mismatch. Figure 4.7 shows the performance indices for the time constant and time delay mismatches. One can see that the performance indices are able to detect the drop in performance due to time constant or delay mismatch as well, but they are not as sensitive as for the spatial parameters. Note that for some cases (e.g., the width mismatch with ξ = 0.75ξ_0), the indices show slightly better performance compared with the nominal case. Table 4.1 shows the partitioned variances for each simulated case, in which σ_T, σ_CD and σ_Res refer to the total variance, the CD variance (variance of the steady-state profile) and the residual variance.

Table 4.1: Variance partition for each simulated case

γ MPM         σ_T     σ_CD    σ_Res  | ξ MPM         σ_T     σ_CD    σ_Res
γ = 0.50γ_0   0.2575  0.0687  0.1888 | ξ = 0.50ξ_0   0.3715  0.1812  0.1903
γ = 0.75γ_0   0.2454  0.0523  0.1930 | ξ = 0.75ξ_0   0.2391  0.0453  0.1937
γ = 1.50γ_0   0.2511  0.0453  0.2059 | ξ = 1.50ξ_0   0.2621  0.0643  0.1979
γ = 2.00γ_0   0.2739  0.0485  0.2253 | ξ = 2.00ξ_0   0.2979  0.1017  0.1961

β MPM         σ_T     σ_CD    σ_Res  | α MPM         σ_T     σ_CD    σ_Res
β = 0.50β_0   0.3804  0.1861  0.1943 | α = 2.3       0.2496  0.0505  0.1991
β = 0.75β_0   0.2434  0.0459  0.1975 | α = 1.5       0.2595  0.0543  0.2052
β = 1.50β_0   0.2571  0.0459  0.1975 | α = 1.0       0.2756  0.0594  0.2162
β = 2.00β_0   0.3497  0.1579  0.1918 | α = 0.5       1.8128  0.2307  1.5758

τ MPM         σ_T     σ_CD    σ_Res  | τ_d MPM       σ_T     σ_CD    σ_Res
τ = 0.5τ_0    0.2428  0.0480  0.1948 | τ_d = 41 s    0.2448  0.0473  0.1976
τ = 2.0τ_0    0.2406  0.0472  0.1933 | τ_d = 61 s    0.2480  0.0491  0.1989
τ = 2.5τ_0    0.2447  0.0480  0.1967 | τ_d = 81 s    0.3093  0.1060  0.2033
τ = 3.0τ_0    0.3007  0.1059  0.1948 | τ_d = 101 s   0.4271  0.2224  0.2047

Note: For the normal case, σ_T = 0.2439, σ_CD = 0.0476, σ_Res = 0.1962.

Note that there is no MD variance since the MD profile has been removed beforehand. By comparing these results with the previous performance indices, one can see that, in most cases, a larger CD or residual variance than in the nominal case corresponds to worse control performance. Therefore we can conclude that the proposed performance indices are affected not only by the CD variance but also by the residual variance.

To demonstrate the advantage of performance index η_2 over η_1, another simulation is carried out with gain mismatch γ = 2.0γ_0 and a spatial sinusoidal disturbance (with frequency greater than the closed-loop spatial bandwidth) added to the output. Sinusoidal disturbances are commonly encountered in practice when there are malfunctions in process devices. The performance index is expected to drop relative to the normal case due to the presence of the gain mismatch; however, it is desirable for the high-frequency spatial disturbance to affect the index as little as possible.
Simulation results for this situation are illustrated in Figure 4.8. The performance index η_1 with both positive gain mismatch and high-frequency spatial disturbance remains almost the same as in the normal case, while it drops a little for the case with gain mismatch only. Thus it is not straightforward to observe the performance deterioration in the presence of the high-frequency spatial disturbance if we use performance index η_1. For performance index η_2, however, the presence of the high-frequency spatial disturbance has almost no effect, and the drop in the performance index is due to the gain mismatch. Moreover, one may find from Figure 4.8 that index η_1 is not as sensitive to the mismatch as index η_2, since its drop due to the gain mismatch is much smaller. Therefore, we can achieve reliable assessment based on performance index η_2. As mentioned in the previous section, the misleading conclusion from η_1 is a result of the deterministic high-frequency disturbance inflating both the covariance of the benchmark and that of the actual output in (4.13); the ratio between the two covariance matrices will be very close to one.

Figure 4.8: Comparison of performance indices η_1 and η_2 with a high-frequency spatial disturbance.

4.6.2 Industrial example

In this subsection, data sets from a paper mill are used to validate the effectiveness of the proposed techniques. The numbers of actuators and CD bins in this example are 114 and 402, respectively. The width of each actuator zone is 60 mm and the width of each CD bin is 16.79 mm. Figure 4.9a illustrates the measured dry weight profile without MD variations. Note that the edges of the profile that were not controlled have been removed. Figure 4.9b shows the corresponding actuator (ProFlow) profile with lower bound 10% and upper bound 90%.
The sampling interval is 16 seconds, the time delay is 45 seconds, and the number of scans is 551. The implemented controller is a multivariate CD-MPC. In this data set, an unknown spatial disturbance is acting on the output profile and is attenuated by the controller through manipulation of the actuators. Performance monitoring with a moving window of 200 scans is applied to the measured output profile, and the corresponding performance indices over the moving windows are shown in Figure 4.10a. Both the user-specified benchmark (η_{2,user}, blue solid line) and the MVC benchmark (η_2, red dash-dotted line, and η_1, black dashed line) are used in the calculation of these performance indices. The VMA model proposed in Section 4.5 is estimated repeatedly for each window in order to compute these performance indices. For each window, on average, it takes only 1.68 seconds to identify the VMA model, which is much less than the sampling interval and thus fast enough for online monitoring. Note that performance indices are calculated only after the 200th scan since the width of each window is 200 scans. It can be observed that all these performance indices show consistent patterns, differing only in their levels. The performance index η_2 indicates the worst performance since it is based on the aggressive MVC benchmark. The high level of η_1 is due to the fact that the spatially uncontrollable components dominate both the denominator and the numerator of (4.13). The user-specified benchmark η_{2,user} also gives a higher performance index than η_2, which makes sense since η_{2,user} is based on a more practical and less aggressive benchmark.

For this industrial example, the user-specified benchmark is specified based on the implemented controller. Thus good control performance is expected to yield η_{2,user} close to one. However, from Figure 4.10a we find that all these performance indices are less than 0.7, which is not satisfactory. In order to investigate the root causes, we generated the steady-state profiles of both the dry weight and the actuators, as well as the power spectrum of the averaged dry weight profile, shown in Figure 4.10b-d. From the spectrum plot, it is obvious that some low-frequency components are left in the controllable range, which contributes to the poor performance. These low-frequency components are due to the large fluctuations of the dry weight profile in the first 60 bins and around the 280th bin. From the averaged actuator profile, we find that actuator saturation in the first few zones and around the 85th zone explains why those fluctuations in the output profile were not rejected. Therefore, actuator saturation due to the spatial disturbance is one root cause of the low performance index. However, further diagnosis or data pre-processing techniques are required to determine whether model-plant mismatch is also a root cause. For instance, in order to reduce false positives in model-plant mismatch diagnosis, we may apply the performance monitoring algorithm to a portion of data in which there is no severe actuator saturation, irregular disturbance or poorly tuned control. Alternatively, approaches for deriving performance lower bounds for constrained control (such as the work by Wang and Boyd [107]) could potentially be used to improve the MVC and user-specified benchmarks. In this work, we combine the MVC and user-specified benchmarks with our MPM detection approach to address this issue.

Note that the computational complexity associated with the performance index is an important concern for online performance monitoring. For both the simulator and industrial examples studied in this work, the computation speed is fast enough (less than 3 seconds for each window) compared with the sampling interval (more than 15 seconds for most paper machines). Thus the proposed algorithms for computing the performance indices are efficient enough for online performance monitoring.

Figure 4.9: Three-dimensional plot of input-output profiles from the industrial data set. (a): The dry weight profile (g/m²) with 376 measurement bins, MD trend removed; (b): The actuator profile (%) of 114 actuator zones.

4.7 Summary

In this chapter, spatial and temporal MVC benchmarks for both steady-state and residual profiles are analyzed for the CD process of paper machines. Performance indices are proposed based on the steady-state and residual MVC benchmarks. The sensitivity of these performance indices with respect to large spatial high-frequency disturbances is analyzed and compared. Furthermore, the corresponding user-specified benchmark is put forward by taking into account the desirable closed-loop dynamics in both the temporal and spatial directions. A novel technique is employed to improve the efficiency of the computations associated with the multivariate time series model identification. The proposed technique decomposes the coefficient matrices into a series of simple multiplications between scalars and basis matrices; these basis matrices are constructed based on the special structure of the coefficient matrices. Data sets from a CD simulator are used to test the sensitivity of the proposed performance indices with respect to various types of mismatch existing in the CD process as well as to spatial high-frequency disturbances. Finally, an industrial data set is used to test the effectiveness of the proposed performance monitoring technique. It is observed that in this data set, actuator saturation due to the large amplitude of spatial disturbances is a cause of the low control performance.

Figure 4.10: Analysis of the industrial data. (a): The moving-window performance indices for the measured data (blue solid line: η_{2,user}; red dash-dotted line: η_2; black dashed line: η_1); (b): The steady-state of the entire dry weight profile; (c): The steady-state of the entire actuator profile with lower and upper bounds (red dash-dotted lines); (d): The spectrum of the averaged dry weight profile and the approximated spatial bandwidth (red dash-dotted line).

Chapter 5
Model-Plant Mismatch Detection for CD Processes

5.1 Introduction

In this chapter, we focus on extending the MD MPM detection method to CD processes. As shown in Chapter 3, routine closed-loop identification forms one of the main building blocks of the proposed MPM detection algorithm. Due to the large number of actuators and measurement sensors, closed-loop identification of CD processes has not been extensively explored. However, two assumptions widely employed in industry, separability between spatial and temporal responses and identical models for all CD actuators, can greatly facilitate CD identification. Under the first assumption, a CD process model is essentially a high-dimensional (linear version of a) Hammerstein model with a static (spatial) part in connection with a dynamic part [108]. The various identification toolboxes developed for Hammerstein models are thus relevant to CD process identification. With the second assumption, the spatial static model is further simplified to a sparse Toeplitz-structured matrix.
Based on these two observations, we will present a closed-loop identification method for CD processes by extending our previous ARX-OE method to high-dimensional models. In addition, we will show a CD MPM detection framework that is based on routine operating data and can discriminate MPM (we use MPM exclusively to refer to mismatches in the spatial and temporal models) from changes in the unmeasured disturbance models. The MPM detection idea is similar to that for MD processes, differing in that for CD processes we have more models to monitor.

This chapter is organized as follows. A preliminary description of the closed-loop CD process is given in Section 5.2. Section 5.3 is devoted to the development of the routine CD closed-loop identification method, including convergence and consistency analysis of the proposed algorithm. We demonstrate the procedure for applying the one-class SVM to MPM detection in Section 5.4. Two illustrative examples are provided in Section 5.5 to show the effectiveness of the proposed CD closed-loop identification approach, as well as the advantages of our MPM detection scheme. The chapter ends with a conclusion in Section 5.6.

5.2 Preliminaries

5.2.1 CD process model

We focus on the following single-array CD process that is widely employed in paper machine control:

S1 : y(t) = g^o(q) G^o u(t−d) + v(t),   (5.1)

where y(t) ∈ R^m and u(t) ∈ R^n represent the CV and MV profiles, respectively, and v(t) ∈ R^m is a colored measurement noise vector. g^o(q) is a FOPTD filter with unit gain, i.e.,

g^o(q) = (1 − f^o) / (1 − f^o q^{−1}),   (5.2)

where f^o = exp(−T_s/T_p), with T_p and T_s being the time constant and the sampling interval, respectively. G^o ∈ R^{m×n} is a steady-state gain matrix which represents the spatial responses of the actuator array at steady state, and d is the discrete time delay. Note that in (5.1) we use the superscript "o" to denote the true process. The spatial model G^o is assumed to be Toeplitz-structured.
As a result, G^o is decomposed as

G^o = Σ_{k=1}^{q^o} c_k^o E_k,   (5.3)

where c_k^o is a scalar standing for the k-th element of G^o and E_k is the corresponding k-th basis matrix. Taking m = 6, n = 3, q^o = 3 as an example, we have

G^o = [ c_1^o  c_3^o  0
        c_2^o  c_2^o  0
        c_3^o  c_1^o  c_3^o
        0      c_2^o  c_2^o
        0      c_3^o  c_1^o
        0      0      c_2^o ],

E_1 = [ 1 0 0; 0 0 0; 0 1 0; 0 0 0; 0 0 1; 0 0 0 ],
E_2 = [ 0 0 0; 1 1 0; 0 0 0; 0 1 1; 0 0 0; 0 0 1 ],
E_3 = [ 0 1 0; 0 0 0; 1 0 1; 0 0 0; 0 1 0; 0 0 0 ].

With the above decomposition we can identify the spatial model G^o by estimating the parameters c^o = [c_1^o ... c_{q^o}^o]^T instead of those in the nonlinear function (1.2). Note that the separability between the temporal dynamic model g^o(q) and the spatial static model G^o connects the typical CD process with Hammerstein models [109], which inspires the closed-loop identification method proposed in this chapter.

5.2.2 CD noise model

Most literature on CD system identification only considers situations in which v(t) is Gaussian white noise [83]. However, in practice, the physical nature of the measurement devices (e.g., traveling back and forth in the cross direction) makes v(t) colored in both the spatial and temporal directions. Researchers have made a few attempts to model the CD measurement noise in ways that reflect the correlations in these two directions [81, 110, 111]. Several noise models, varying from simple to complex, are available in the literature to represent the realistic noise encountered in industry. A common practice is to use a multivariate band-diagonal AR or ARMA structure, so that the temporal correlation is modeled through the filters on the diagonal and the spatial correlation is represented by the interactions among the off-diagonal entries [111]. An alternative is to choose the noise model as diagonal while allowing the innovation sequence to have a non-diagonal covariance matrix [81].
For the latter method, the spatial correlation of the colored noise v(t) is reflected by the covariance matrix. From the viewpoint of the system identification method proposed in this chapter, the latter approach is favored since it admits inverting the noise model matrix without the invertibility issues that often arise in the former method. Moreover, analogous to the temporal dynamic model, we assume that all output channels possess the same noise model in the temporal direction.⁴ In this manner,

v(t) = H^o(q) I e^o(t),   (5.4)

where H^o(q) is a scalar monic transfer function that is stable and inversely stable, I ∈ R^{m×m} is the identity matrix, and e^o(t) ∈ R^m is a zero-mean Gaussian white noise vector with covariance

E[e^o(t) e^o(s)^T] = δ_{t,s} Σ ∈ R^{m×m}.   (5.5)

Note that here Σ can be non-diagonal and structured so as to represent the spatial correlations of the CD measurement noise. In general, it is difficult to acquire prior information about the true noise model structure H^o(q) in (5.4). For closed-loop identification, especially the direct identification approach, incorrect specification of the noise model often leads to bias in the process model estimate [15]. In this chapter, we propose a novel closed-loop identification method for the CD process model that addresses this issue.

⁴ This assumption can easily be relaxed to allow different noise models in each output channel, and the identification method presented here is still applicable with slight modifications.

5.2.3 High-order ARX approximation of the CD process model

It is well known that any stable linear transfer function can be approximated arbitrarily well by a high-order FIR model [15, 112]. Inspired by this fact, we can represent the CD process model (5.1)-(5.4) with a sufficiently high-order ARX structure, in order to avoid the bias issue in direct closed-loop identification stemming from misspecification of the noise model structure.
Specifically, given the diagonal noise model in (5.4) and the decomposition of $G^o$ in (5.3), we can rewrite the CD model as

$$\mathcal{S}_2:\quad A^o(q,a^o)\, y(t) = B^o(q,b^o) \sum_{k=1}^{q^o} c^o_k E_k\, u(t-d) + e^o(t), \qquad (5.6)$$

where $A^o(q,a^o) = 1/H^o(q)$ is a scalar polynomial giving the FIR representation of the inverse of the noise model,

$$A^o(q,a^o) = 1 + \sum_{k=1}^{n^o_a} a^o_k q^{-k}, \qquad a^o = [a^o_1 \ \ldots \ a^o_{n^o_a}]^T. \qquad (5.7)$$

The polynomial $B^o(q,b^o) = A^o(q,a^o)\, g^o(q)$ is also parameterized in FIR form,

$$B^o(q,b^o) = \sum_{k=0}^{n^o_b} b^o_k q^{-k}, \qquad b^o = [b^o_0 \ \ldots \ b^o_{n^o_b}]^T. \qquad (5.8)$$

We further define the parameter vector $\theta^o = [a^{oT}\ b^{oT}\ c^{oT}]^T \in \mathbb{R}^{n^o_a+n^o_b+1+q^o}$. Strictly speaking, $A^o(q,a^o)$ and $B^o(q,b^o)$ are of infinite order, as they are infinite series expansions of rational functions. However, under the stability assumption on $A^o(q,a^o)$ and $B^o(q,b^o)$, their coefficients decay to negligible values after sufficiently many lags. Therefore, in practice, it makes sense to use a finitely truncated form in the corresponding identifications. One can refer to [88, 89] for a rigorous treatment of the more general case with infinite impulse response coefficients. With the above manipulations the CD process model is transformed into an ARX-Hammerstein structure, which is much easier to handle than the original Box-Jenkins-Hammerstein structure. However, the price to pay is the increased number of parameters to estimate in the high-order representation, which places more stringent requirements on the informativeness of the closed-loop data.

5.2.4 The presence of feedback

The MPC considered here is based on the quadratic programming formulation proposed in [12]. Typical constraints in the MPC algorithm include actuator limits, maximum change between successive control actions, constraints on the averaged actuator profile in an array, and bounds on bending limits.
According to [113], when some of these constraints are active and varying, the MPC displays piecewise-linear or even nonlinear behavior, depending on the formulation of the objective function. Hence, we denote the feedback as

$$u(t) = k(u^{t-1}, y^t, t), \qquad (5.9)$$

where $u^{t-1} = \{u(1), \ldots, u(t-1)\}$ and $y^t$ is defined analogously. Note that for closed-loop identification with routine operating data, where external excitations and setpoint changes are absent, nonparametric identification methods often yield the inverse of the controller as the process model estimate [94]. One remedy to prevent this is to impose the true time-delay on (5.6) when performing the high-order ARX identification. We therefore assume that the true time-delay is available throughout the derivations in this chapter. We stress, however, that in practice this stringent assumption can be relaxed: our mismatch detection scheme works even when the true time-delay is unavailable, in which case we simply incorporate a priori knowledge of the time-delay (e.g., a nominal estimate) into the identification algorithm.

Another important concern in routine closed-loop identification is identifiability. It has been discovered in [86, 93, 94], and in previous chapters, that for linear feedback control, higher orders in the regulator and larger time-delay in the process generally enhance the informativeness of closed-loop data. The specific relationships among these factors have been fully investigated in those references. However, as noted in [15] (p. 432), time-varying or nonlinear regulators in (5.9) are usually enough to guarantee the informativeness of routine closed-loop data. The detailed conditions ensuring CD closed-loop identifiability will be presented later in this chapter.

5.3 Routine CD closed-loop identification

In this section we present a novel routine closed-loop CD identification approach that gives convergent and consistent estimates from routine closed-loop data.
Our focus in this section is on the high-order ARX representation (5.6) of the original CD process. The basic technique adopts an idea similar to separable least-squares [114]: alternately identify the spatial model $G^o$ and the temporal model $\{A^o(q), B^o(q)\}$ until the parameters converge. When identifying the spatial (or temporal) parameters, we fix the temporal (or spatial) parameters at their latest values.

5.3.1 Model parameterization

Based on the previous analysis of $g^o(q)$, we parameterize the temporal model with a first-order model structure,

$$g(q,\theta_T) = \frac{h}{1 - f q^{-1}}, \qquad \theta_T \in \Omega_T, \qquad (5.10)$$

where $\theta_T = [h\ f]^T$ is the temporal parameter vector and $\Omega_T$ is a compact set. Note that the time-delay $d$ is absorbed into the input signal in (5.1). For the iterative identification algorithm in this work we mostly deal with the high-order ARX representation (5.6) of the original CD process. This model is parameterized as

$$\mathcal{M}:\quad A(q,a)\, y(t) = B(q,b)\, G(c)\, u(t-d) + e(t), \qquad (5.11)$$

with

$$A(q,a) = 1 + \sum_{k=1}^{n_a} a_k q^{-k}, \qquad B(q,b) = \sum_{k=0}^{n_b} b_k q^{-k}, \qquad G(c) = \sum_{k=1}^{p} c_k E_k, \qquad (5.12)$$

where $a = [a_1 \ \ldots\ a_{n_a}]^T$, $b = [b_0\ \ldots\ b_{n_b}]^T$ and $c = [c_1\ \ldots\ c_p]^T$. Here $n_a$ and $n_b$ are the orders of the ARX model and $p$ is the selected spatial order. The temporal models $\{A(q,a), B(q,b)\}$ are assumed to be scalar transfer functions.

Now let us derive the predictor form of (5.11). We start with the $i$-th output channel $y_i(t)$ and then generalize the result to the overall output. Define $\bar{u}^k(t) = E_k u(t) \in \mathbb{R}^m$, $k = 1, \ldots, p$. With the high-order ARX parameterization (5.11), the one-step-ahead prediction for the $i$-th output is

$$\hat{y}_i(t|t-1) = -[A(q,a)-1]\, y_i(t) + B(q,b) \sum_{k=1}^{p} c_k \bar{u}^k_i(t-d), \qquad i = 1,\ldots,m. \qquad (5.13)$$

Straightforward calculations yield

$$\hat{y}_i(t|t-1) = \psi_{y_i}(t)\, a + \psi_{\bar{u}_i}(t-d)\, C b, \qquad i = 1,\ldots,m, \qquad (5.14)$$

where

$$C = \mathrm{diag}\{c, c, \ldots, c\},$$
$$\psi_{y_i}(t) = [-y_i(t-1)\ \ \ldots\ \ -y_i(t-n_a)],$$
$$\psi_{\bar{u}_i}(t) = [\bar{u}^1_i(t)\ \ldots\ \bar{u}^p_i(t)\ |\ \ldots\ |\ \bar{u}^1_i(t-n_b)\ \ldots\ \bar{u}^p_i(t-n_b)].$$

It then follows that the predictor form of the overall output is

$$\hat{y}(t|t-1) = [\psi_y(t)\ \ \psi_{\bar{u}}(t-d)] \begin{bmatrix} a \\ C b \end{bmatrix}, \qquad (5.15)$$

where

$$\psi_y(t) = \begin{bmatrix} \psi_{y_1}(t) \\ \vdots \\ \psi_{y_m}(t) \end{bmatrix}, \qquad \psi_{\bar{u}}(t) = \begin{bmatrix} \psi_{\bar{u}_1}(t) \\ \vdots \\ \psi_{\bar{u}_m}(t) \end{bmatrix}.$$

5.3.2 Parameter estimation

Consider a set of input-output data generated according to $\mathcal{S}_2$ under the controller (5.9),

$$Z^N = \{y(1), u(1), \ldots, y(N), u(N)\}. \qquad (5.16)$$

Denote by $\theta = [a^T\ b^T\ c^T]^T \in \mathbb{R}^{n_a+n_b+1+p}$ the vector stacking all temporal and spatial parameters. We can formulate the loss function of the parameter estimation as

$$V_N(\theta) = \frac{1}{N} \sum_{t=1}^{N} \varepsilon^T(t,\theta)\, \varepsilon(t,\theta), \qquad (5.17)$$

where $\varepsilon(t,\theta) \in \mathbb{R}^m$ is the prediction error,

$$\varepsilon(t,\theta) = y(t) - \hat{y}(t|t-1). \qquad (5.18)$$

The optimal parameter estimate $\hat{\theta}_N$ is obtained by

$$\hat{\theta}_N = \arg\min_{\theta\in\Omega} V_N(\theta), \qquad (5.19)$$

where $\Omega = \Omega_a \oplus \Omega_b \oplus \Omega_c$ is a compact and convex set, with $\Omega_a$, $\Omega_b$ and $\Omega_c$ compact and convex sets containing $a$, $b$ and $c$, respectively. Note that solving the optimization problem (5.19) directly is not straightforward, since the coupling $Cb$ in (5.15) results in a nonconvex optimization. However, the separable structure of $C$ and $b$ motivates the use of the separable least-squares technique: fixing one factor of $Cb$, the optimization with respect to the other is convex, and this scheme leads to the iterative optimization approach. We will show later that under weak conditions this iterative identification scheme is convergent.

Another important observation about (5.18)-(5.19) is the non-identifiability issue characteristic of Hammerstein models, due to the fact that any pair $(b/l,\ lc)$ yields the same model for all $l \neq 0$. To address this problem, a normalization of $b$ or $c$ has to be in place. In fact, for open-loop Hammerstein models that can be transformed into separable linear least-squares form, the iterative identification algorithm can guarantee convergence to stationary points, provided that a normalization is performed after each iteration [115]. This statement, with modifications, holds in our situation if certain conditions are satisfied.
We will elaborate on this argument in Theorem 5.3.1.

Our iterative identification algorithm to solve (5.19) is as follows. Denote the initial values of $a$ and $b$ by $\hat{a}^i$ and $\hat{b}^i$, respectively, and let $\hat{a}^k$ denote the estimate of $a$ in the $k$-th iteration (and likewise for the estimates of $b$ and $c$). Define the maximum iteration number as $K$. In each iteration $k$, $k = 1, \ldots, K$, first fix the spatial parameter at $\hat{c}^{k-1}$ and perform a high-order ARX identification of (5.11) by solving

$$\{\hat{a}^k, \hat{b}^k\} = \arg\min_{a\in\Omega_a,\, b\in\Omega_b} V_N(a, b, \hat{c}^{k-1}). \qquad (5.20)$$

Note that the above optimization is an ordinary least-squares problem, which is easy to solve. Before identifying the spatial parameters, we normalize $\hat{b}^k$ after solving (5.20) by

$$\rho_k = \mathrm{sign}(\hat{b}^k(1)), \qquad \hat{b}^k \leftarrow \rho_k \frac{\hat{b}^k}{\|\hat{b}^k\|}, \qquad (5.21)$$

to eliminate the parameter non-identifiability due to the coupling of $b$ and $c$. Then fix the temporal parameters at $\{\hat{a}^k, \hat{b}^k\}$ and estimate the spatial parameter in (5.11) with another linear least-squares problem,

$$\hat{c}^k = \arg\min_{c\in\Omega_c} V_N(\hat{a}^k, \hat{b}^k, c). \qquad (5.22)$$

We then move to the next iteration and carry out the same procedure. The estimated parameters after $K$ iterations are defined as $\hat{a} = \hat{a}^K$, $\hat{b} = \hat{b}^K$, $\hat{c} = \hat{c}^K$, and we denote $\hat{\theta}_N = [\hat{a}^T\ \hat{b}^T\ \hat{c}^T]^T$. Note that to obtain estimates of the true temporal and spatial parameters, some additional identification steps have to follow. To this end, we first filter the input-output data by

$$\tilde{y}(t) = A(q,\hat{a})\, y(t), \qquad \tilde{u}(t) = A(q,\hat{a})\, G(\hat{c})\, u(t). \qquad (5.23)$$

One can see that ideally, i.e., if $A(q,\hat{a}) = A^o(q)$ and $G(\hat{c}) = G^o$, it follows from (5.1), (5.6) and (5.11) that $\tilde{y}(t) = g^o(q)\, \tilde{u}(t-d) + e^o(t)$. As will be shown in Theorem 5.3.1, $\hat{a}$ and $\hat{c}$ converge to the true parameter values asymptotically in the sample number $N$. Thus it is reasonable to estimate the temporal model $g^o(q)$ with the filtered input $\tilde{u}(t)$ and output $\tilde{y}(t)$.
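The alternating steps (5.20)-(5.22) can be sketched in a few lines of numpy. The sketch below is a minimal illustration under simplifying assumptions: the delay d is known, the constraint sets and regularization are omitted, and all names (function, variables) are ours rather than the thesis notation; it is not the production implementation.

```python
import numpy as np

def alternating_arx_hammerstein(y, u, E, na, nb, d, n_iter=5):
    """Alternating LS for A(q) y(t) = B(q) G(c) u(t-d) + e(t):
    fix c and solve an ordinary LS for (a, b); normalize b as in
    (5.21); fix (a, b) and solve an ordinary LS for c."""
    m, N = y.shape
    p = len(E)
    c = np.ones(p)                              # initial spatial guess
    ubar = np.stack([Ek @ u for Ek in E])       # (p, m, N) basis-filtered inputs
    t0 = max(na, nb + d)
    lags = np.arange(nb + 1)
    for _ in range(n_iter):
        # Step 1 (cf. (5.20)): c fixed -> linear in (a, b)
        Phi, Y = [], []
        for i in range(m):
            for t in range(t0, N):
                ry = [-y[i, t - j] for j in range(1, na + 1)]
                ru = [float(c @ ubar[:, i, t - d - j]) for j in range(nb + 1)]
                Phi.append(ry + ru)
                Y.append(y[i, t])
        th = np.linalg.lstsq(np.array(Phi), np.array(Y), rcond=None)[0]
        a, b = th[:na], th[na:]
        b = np.sign(b[0]) * b / np.linalg.norm(b)   # normalization (5.21)
        # Step 2 (cf. (5.22)): (a, b) fixed -> linear in c
        Phi, Y = [], []
        for i in range(m):
            for t in range(t0, N):
                z = y[i, t] + sum(a[j - 1] * y[i, t - j] for j in range(1, na + 1))
                Phi.append([float(b @ ubar[k, i, t - d - lags]) for k in range(p)])
                Y.append(z)
        c = np.linalg.lstsq(np.array(Phi), np.array(Y), rcond=None)[0]
    return a, b, c
```

On noise-free data generated by a first-order example, e.g. y(t) = 0.5 y(t-1) + 2 u(t-2) with a single identity basis matrix, the scheme recovers a = [-0.5], the normalized b = [1], and pushes the gain 2 into c, mirroring how the normalization resolves the (b/l, lc) ambiguity.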
We stress that if a priori information about the true temporal model structure is available, as in the CD process, the parsimonious model (5.10) can be efficiently estimated by an output-error identification; otherwise, we can estimate an FIR structure for the temporal model to eliminate the bias. For the CD process, we perform a multiple-experiment output-error identification of $g(q,\theta_T)$ in (5.10),

$$\tilde{y}(t) = g(q,\theta_T)\, \tilde{u}(t-d) + e(t), \qquad (5.24)$$

and denote the parameter estimate by $\hat{\theta}_T = [\hat{h}\ \hat{f}]^T$. The next step is to re-scale the spatial parameters by

$$\hat{c} \leftarrow \hat{c}\, \hat{h}/(1-\hat{f}). \qquad (5.25)$$

The rationale behind the re-scaling step (5.25) is that $g(q,\theta_T)$ is discretized from a continuous first-order transfer function and shall have a unit step response at steady state (cf. (5.2)); the steady-state gain $\hat{h}/(1-\hat{f})$ is therefore absorbed into the spatial model. With the re-scaled $\hat{c}$, we can easily identify the spatial parameter $\theta_S$ in (1.2) by standard nonlinear least-squares. The entire algorithm is summarized in Table 5.1.

Table 5.1: The implementation of routine CD closed-loop iterative identification

Input: Set $\hat{a}^0 \leftarrow a^i$, $\hat{b}^0 \leftarrow b^i$ and $\hat{c}^0 \leftarrow c^i$. $K \leftarrow$ maximum iteration number.
For $k = 1, \ldots, K$, do
  1: Fix the spatial parameter $\hat{c}^{k-1}$, and estimate the parameters of the high-order ARX part of (5.11) by solving the least-squares problem (5.20);
  2: Normalize $\hat{b}^k$ as in (5.21);
  3: Fix the temporal parameters $\{\hat{a}^k, \hat{b}^k\}$, and estimate the spatial parameter in (5.11) by solving the linear least-squares problem (5.22);
End for
  4: $\hat{a} = \hat{a}^K$, $\hat{b} = \hat{b}^K$, $\hat{c} = \hat{c}^K$. Filter the input-output data as in (5.23);
  5: Estimate the temporal model $g(q,\theta_T)$ from $\tilde{y}(t)$ and $\tilde{u}(t)$ by the output-error identification (5.24);
  6: Re-scale $\hat{c}$ based on (5.25) and identify the spatial parameter $\theta_S$ in (1.2);
Output: Parameter estimates $\hat{\theta}_S$, $\hat{\theta}_T$, $\hat{a}$ and the noise covariance.

5.3.3 Convergence and consistency analysis

Now let us consider the convergence and consistency of the proposed iterative identification algorithm.
Before presenting the main results, we need certain restrictions on the quality of the closed-loop data. We discussed the closed-loop identifiability problem in Section 5.2 for a generic process free of external excitation; there it was shown that high-order linear, time-varying or nonlinear regulators benefit direct identification by enriching the information content of routine operating data. We are now in a position to spell out the detailed requirements on the CD closed-loop data arising from these conditions. Specifically, we make the following assumptions.

Assumption 5.3.1. The input-output data $Z^N$ are bounded and generated according to the stable closed-loop system $\mathcal{S}_1$ (or equivalently $\mathcal{S}_2$) and (5.9), where $N \gg n^o_a + n^o_b$ and $e^o(t)$ is a Gaussian white noise vector. In addition, the model structure (5.11)-(5.12) is assumed to be uniformly stable for all $\theta \in \Omega$.

Assumption 5.3.2. The parameterized model (5.11)-(5.12) has the same structure as the true model (5.6)-(5.8), i.e., $n_a = n^o_a$, $n_b = n^o_b$ and $p = q^o$, and the true model is contained in the selected model structure, $\theta^o \in \Omega$. Moreover, we assume the polynomial pairs $\{A^o(q), B^o(q)\}$ and $\{A(q,a), B(q,b)\}$ are coprime.

Assumption 5.3.3. The closed-loop data $Z^N$ are informative enough for the relevant closed-loop identification. This assumption involves the following aspects:

(a) The closed-loop input data $u^N$ are strongly persistently exciting of order at least $n_b$ over the basis matrices $E_k$, $k = 1, \ldots, p$. In other words,

$$\mathrm{rank}\ \Phi_u = p(n_b+1), \qquad \Phi_u = \begin{bmatrix} \psi_{\bar{u}}(1) \\ \vdots \\ \psi_{\bar{u}}(N) \end{bmatrix}, \qquad (5.26)$$

that is, $\Phi_u$ has full column rank for any large $N$. This is similar to the persistent-excitation requirement on input signals in open-loop identification.

(b) There does not exist a common linear time-invariant feedback relationship between inputs and outputs over all channels.
It is formally stated as

$$\bar{E}\, \| R(q)\, G(c)\, u(t-d) + S(q)\, y(t) \|^2 > 0, \qquad \forall c \in \Omega_c, \qquad (5.27)$$

where $R(q)$ and $S(q)$ are arbitrary scalar linear filters and $\bar{E}$ is the generalized expectation operator. Note that this condition equivalently states that not all output channels share the same feedback regulator.

The above assumptions are fairly loose. In particular, for Assumption 5.3.3(a), the persistent-excitation requirement (5.26) is easy to meet, since in closed loop the input signal is filtered white noise, which contains enough excitation to make $\Phi_u$ full column rank, especially when $N$ is large. For Assumption 5.3.3(b), all input-output channels would have to share the same regulator for (5.27) to fail. This assumption is all the more plausible given that most CD MPCs exhibit complex dynamics due to the complexity of the associated optimization and constraints [112].

We present the following theorem showing the convergence of the proposed iterative CD closed-loop identification and the asymptotic properties of the parameter estimator.

Theorem 5.3.1. Consider the data $Z^N$ generated according to the stable closed-loop system (5.6) and (5.9). Suppose that Assumptions 5.3.1-5.3.3 hold, and further assume that the parameter estimates satisfy $\hat{a}^k \neq 0$, $\hat{b}^k \neq 0$, $\hat{c}^k \neq 0$ during all iterations. Then the following statements about the routine CD closed-loop identification algorithm of Table 5.1 hold:

(i) For the iterative identification algorithm with normalization (5.20)-(5.22), the parameter estimate sequence $\{\hat{\theta}^k_N\}$, if it converges, converges to a stationary point of $V_N(\theta)$ for any large $N \gg n_a + n_b$, i.e.,

$$\hat{\theta}^k_N \to \hat{\theta}_N, \quad \text{as } k \to \infty, \qquad (5.28)$$

where $\nabla V_N(\hat{\theta}_N) = 0$.

(ii) The loss function $V_N(\theta)$ converges uniformly to the limit function $\bar{V}(\theta)$ asymptotically in the sample number, namely,

$$\sup_{\theta\in\Omega} |V_N(\theta) - \bar{V}(\theta)| \to 0, \quad \text{w.p. 1 as } N \to \infty, \qquad (5.29)$$

where $\bar{V}(\theta) = \bar{E}[\varepsilon^T(t,\theta)\, \varepsilon(t,\theta)]$.
As a result,

$$\hat{\theta}_N \to \theta^*, \quad \text{w.p. 1 as } N \to \infty, \qquad (5.30)$$

where $\theta^* = \arg\min_{\theta\in\Omega} \bar{V}(\theta)$.

(iii) The parameter estimate $\hat{\theta}_N$ is consistent, that is,

$$\hat{\theta}_N \to \theta^o, \quad \text{w.p. 1 as } N \to \infty, \qquad (5.31)$$

where $\theta^o$ is the true parameter value.

Proof. See Appendix C.1.

Remark 5.3.1. Note that the convergence in (i) of Theorem 5.3.1 still holds even when the assumptions on the informativeness of the closed-loop data are not satisfied [116]. Moreover, it is shown in [115] that the accumulation points of $\{\hat{\theta}^k_N\}$ are stationary points of $V_N(\theta)$ if convergence of the sequence $\{\hat{\theta}^k_N\}$ itself cannot be established.

5.4 Application of one-class SVM to CD mismatch detection

With the iterative closed-loop identification approach presented above, we can obtain consistent parameter estimates for the spatial, temporal and noise models. This serves as the major building block for our SVM-based mismatch detection framework. Specifically, we implement the mismatch detection method on each of these models simultaneously to examine the corresponding model changes. With this in mind, the following presentation is devoted exclusively to detecting mismatch in the temporal model. All procedures below apply to the detection of mismatch in the noise model and the spatial model as well. In practice, we monitor these three models in parallel so as to distinguish noise model changes from MPM and raise mismatch alarms appropriately.

5.4.1 SVM training

We use the temporal models estimated from moving windows in the training data as the training models, denoted $\{x_1, \ldots, x_l\}$, where $l$ is the number of moving windows in the training data set. Each $x_i$ is the FIR coefficient vector of the temporal model estimated from the $i$-th moving window,

$$x_i = [\hat{x}^1_i \ \ldots\ \hat{x}^{n_g}_i]^T, \qquad (5.32)$$

where $\hat{x}^k_i$, $k = 1, \ldots, n_g$, is the $k$-th FIR coefficient and $n_g$ is the order. Applying (3.20)-(3.21) to the training data $\{x_1, \ldots, x_l\}$ yields a one-class SVM prediction model (3.24). Notice that, in principle, a larger training data set provides a better description of the boundary of the nominal cluster. In practice, however, we may encounter situations where only very limited training data are available, as in the MD MPM detection case. One remedy is to use historical data; this idea fails, however, if the historical data are not accessible or were never saved, which is not uncommon in industry. Another solution is to enlarge the training set by re-sampling according to the probability distribution of the parameter estimates. To this end, we propose a simple re-sampling technique. It has been shown in [117] that parameter estimates from the separable nonlinear least-squares method are Gaussian distributed (if the noise is Gaussian), and this argument extends to our iterative closed-loop identification algorithm with minor modifications. Based on this, we can construct rough estimators of the mean $\mu_k$ and variance $\sigma_k$ of each FIR coefficient $\hat{x}^k$,

$$\hat{\mu}_k = \mu(\hat{x}^k_1, \ldots, \hat{x}^k_l), \qquad \hat{\sigma}_k = \sigma(\hat{x}^k_1, \ldots, \hat{x}^k_l), \qquad k = 1, \ldots, n_g, \qquad (5.33)$$

where a common choice for $\mu(\cdot)$ and $\sigma(\cdot)$ is the sample mean and sample variance. It is apparent that if the number of training models is small due to limited training data, the variance estimator in (5.33) is conservative relative to the true variance. A proper scaling factor $\alpha_T$ (the subscript $T$ stands for temporal) applied to $\hat{\sigma}_k$ is therefore necessary. The rule of thumb in selecting $\alpha_T$ is: if we have a large set of training data, $\alpha_T$ is small; otherwise, $\alpha_T$ is large. One can also expect that a larger scaling factor makes the mismatch detection algorithm less sensitive to mismatch, and vice versa. After scaling the variance estimator, we can re-sample from the resulting density function to generate a large number of training models and then train a one-class SVM on these augmented data.
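The re-sampling step just described can be sketched as follows. This is a minimal illustration under our own naming and with illustrative numbers: per (5.33), estimate the per-coefficient sample mean and standard deviation over the window estimates, inflate the standard deviation by the scaling factor, and draw synthetic training models from the resulting Gaussians.

```python
import numpy as np

def augment_training_models(X, alpha, n_samples=500, seed=0):
    """Re-sample synthetic FIR coefficient vectors per (5.33).
    X: (l, ng) array of window estimates (one row per moving window).
    alpha: scaling factor inflating the conservative sample std."""
    rng = np.random.default_rng(seed)
    mu = X.mean(axis=0)                      # sample mean per coefficient
    sigma = alpha * X.std(axis=0, ddof=1)    # scaled sample std per coefficient
    return rng.normal(mu, sigma, size=(n_samples, X.shape[1]))

# e.g., l = 3 window estimates of a 2-tap FIR model -> 1000 synthetic models
X = np.array([[1.0, 2.0],
              [1.2, 2.2],
              [0.8, 1.8]])
X_aug = augment_training_models(X, alpha=2.0, n_samples=1000)
# a one-class SVM (e.g., sklearn.svm.OneClassSVM) would then be fit on X_aug
```

The synthetic cloud keeps the per-coefficient means of the original estimates while widening their spread by the factor alpha, which is exactly why a larger alpha makes the resulting detector less sensitive to mismatch.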
Note that when plenty of historical data are available, the re-sampling strategy may not be necessary.

5.4.2 SVM prediction

Similar to the SVM training stage, we slide moving windows along the test data and perform a closed-loop identification in each window. The trained SVM model is then used to predict whether the currently estimated model can be classified into the initial cluster of normal model estimates. Given a test data point $x$ (an FIR coefficient vector), $p(x)$ in (3.24) computes the functional distance of this point to the initial cluster; this distance is known as a score. We use the sign of this score to classify $x$: positive scores mean the underlying test points can be grouped into the initial cluster and thus indicate no mismatch; otherwise, they are seen as indications of mismatch. Define $I_t$ as the sign of the score for the test moving window at time $t$. In practice, users tend to be very cautious in raising an MPM alarm, since the subsequent closed-loop identification is expensive. Therefore, the accumulated number of negative scores over the last few moving windows is a more reasonable metric for indicating the occurrence of MPM. This accumulated metric is defined as

$$s_{\mathrm{MPM}} := \frac{|I_-|}{n_T}, \qquad (5.34)$$

where $I_- := \{I_i = -1 : i \in T_t\}$ with $T_t := \{t-n_T, \ldots, t-1, t\}$. The user-defined term $n_T$ means that the last $n_T$ moving windows are used to compute $s_{\mathrm{MPM}}$. Users can specify a conservative threshold on $s_{\mathrm{MPM}}$, e.g., 0.95, to be circumspect about raising MPM alarms.

5.5 Examples

In this section, we provide examples from a single-array CD process of a paper machine to verify the proposed iterative CD closed-loop routine identification algorithm and the one-class SVM mismatch detection approach. These examples are conducted on Honeywell paper machine simulators, which closely resemble practical paper machines.
Table 5.2: Tuning parameters of the CD MPC

Tuning parameter               Value      Tuning parameter                    Value
CV target weight               0.40       MV temporal movement weight         0.25
MV target weight               0.17       MV spatial picketing weight         0.14
CV target value                42         MV target value                     0
Prediction horizon             25         Control horizon                     4
Actuator bend limit            30         Actuator upper/lower average        ±1
Actuator change rate limit     15         Actuator upper/lower limit          ±20

5.5.1 Example 1: Iterative CD closed-loop routine identification

For this particular CD process, the opening of the autoslice (MV) is manipulated to alter the amount of pulp slurry distributed across the sheet, which ultimately affects the dry weight (CV) of the paper sheet being produced. There are 222 measurement boxes and 74 actuators along the cross direction. The selected sampling interval is 12 seconds and the continuous time constant is 126.4352 seconds. After discretization, the true CD process has the following temporal and spatial parameters:

$$f^o = 0.9095, \qquad \theta^o_S = [0.3802\ \ 268.6414\ \ 0.10\ \ 1.5]^T, \qquad d = 2.$$

The controller used in the simulation is a CD MPC; please refer to [12] for details of its explicit formulation. Note that four types of constraints are involved in the adopted MPC: bend limits for neighboring actuators, bound limits for the actuator average profile, upper and lower limits for the actuator profile, and limits on the rate of change of the actuators. The tuning parameters specified for the MPC are shown in Table 5.2. The true noise model is chosen as the high-pass filter

$$v^o(t) = \frac{1 - 0.6q^{-1} + 0.3q^{-2} - 0.1q^{-3}}{1 + 0.4q^{-1} + 0.1q^{-2} + 0.05q^{-3}}\, I e^o(t),$$

where $e^o(t)$ is Gaussian white noise with zero mean and non-diagonal covariance matrix $\Sigma$; the variance at each output channel is 0.01. The relevant simulation parameters are listed in Table 5.3 (note that Table 5.3 also contains several MPM detection parameters that will be used in Example 5.5.2). We further assume that there is no MPM in this simulation.
To verify the effectiveness of the proposed closed-loop CD identification algorithm with routine operating data, we leave the setpoint unchanged during the entire simulation, i.e., the noise is the only external signal entering the system. We also suppose that the true process delay is available and incorporate it into the identification algorithm, to avoid estimating the inverse of the controller as the process model.

Table 5.3: Simulation and MPM detection parameters for Example 1 and Example 2

Parameter                      Value           Parameter                 Value
Actuator zone width            62.5194 mm      CD bin width              20.8398 mm
Sampling interval              12 seconds      Iteration number          5
Initial temporal θ_T^i         [0.82 0.18]     Initial spatial θ_S^i     [0.3 200 0.2 4.0]
Window size                    80 min          Window step size          20 min
Training data size             800 min         Temporal α_T              3
Spatial α_S                    1.5             Noise α_N                 3
Time noise model changes       1200th min      Time MPM occurs           1600th min

With the above settings, the closed-loop simulation lasts 120 minutes (600 samples of input-output data). Plots of the input and output data from the simulation are shown in Figure 5.1 below. Initially, the controller manipulates the actuator array to attenuate the steady-state disturbances acting on the output. After this transient the whole closed-loop system enters a steady-state mode, and we therefore select the last 400 samples of data to perform the closed-loop identification.

The initial spatial and temporal parameters for the identification algorithm are shown in Table 5.3. We find that, similar to most open-loop Hammerstein model identifications, our algorithm converges very quickly to a local minimum, in just a few iterations. The maximum iteration number is set to $K = 5$. For the temporal model, we choose $n_a = 20$, $n_b = 50$. The spatial order is selected as $p = 30$. In practice, users may require some trial and error in implementing our algorithm, and empirical insight into the noise characteristics is valuable in choosing these orders.
In addition, proper regularization is necessary when executing this algorithm, in order to smooth the estimated FIR coefficients. The regularization also ensures numerical stability of the high-order least-squares problem (5.20) when the regressor matrix has a large condition number. Table 5.3 summarizes the relevant parameters chosen for the two examples studied in this section.

Figure 5.1: Simulated input-output data for the closed-loop CD process

The top-left plot of Fig. 5.2 shows the impulse response of the estimated inverse noise model versus that of the true inverse noise model. As expected, the two lines are fairly close to each other, which indicates that the estimated noise model is accurate. After steps 4-6 of the algorithm in Table 5.1, the estimated temporal and spatial parameters are

$$\hat{f} = 0.9060, \qquad \hat{\theta}_S = [0.3442\ \ 277.0566\ \ 0.0694\ \ 1.7889]^T.$$

One can see that these parameter estimates are very precise. In the SVM-based mismatch detection, we are more interested in the FIR coefficients of these models. The next two plots on the left side of Fig. 5.2 show the computed impulse responses of both the spatial and the temporal model against those of the true models. All the estimated impulse response coefficients approximate the true ones with high accuracy. More importantly, the noise model and the process model can be estimated independently, which paves the way to discriminating MPM from noise model changes. As an additional test, we change the noise model to the low-pass filter

$$v^o(t) = \frac{1 + 0.7q^{-1} + 0.4q^{-2}}{1 - 0.5q^{-1} + 0.1q^{-2}}\, I e^o(t),$$

and repeat the closed-loop identification procedure above. The corresponding identification results are shown in the right plots of Fig. 5.2. Once again, the estimated impulse responses are very close to the true responses.
These results verify the effectiveness of our proposed CD closed-loop identification method.

5.5.2 Example 2: One-class SVM model-plant mismatch detection

In this example, we continue with the previous closed-loop CD simulation but shift our attention to mismatch detection. The main simulation parameters stay the same as in Example 5.5.1; Table 5.3 lists the additional parameters specific to the CD MPM detection algorithm. It is worth pointing out that we choose a relatively small spatial scaling factor $\alpha_S$ to increase the sensitivity of our MPM detection algorithm to spatial mismatch. During steady-state operation the spatial mismatch is relatively important, as it directly determines the steady-state control performance; in this example we therefore put our emphasis on spatial mismatch detection.

Figure 5.2: CD closed-loop iterative identification results: noise model is a high-pass filter (left); noise model is a low-pass filter (right)

Figure 5.3: The colormap of simulated input-output data for CD MPM detection

The simulation logic for this example is as follows. Initially there is neither MPM nor a noise model change, and the noise model is chosen as $H^o(q) = (1-0.6q^{-1})/(1+0.4q^{-1})$. After discarding the initial transient caused by attenuating the steady-state disturbance, we use the first 4000 samples of data (cf. Table 5.3) as the training data. We apply moving windows to the training data and identify spatial, temporal and noise models from each moving window; each moving window contains 400 samples, with a step size of 100 samples. Once this step is complete, we train one-class SVMs separately on these models and obtain the corresponding prediction functions. Notice that this procedure involves scaling the initial cluster and re-sampling. From then on, the SVMs start to predict mismatches. After 6000 samples, we gradually switch the noise model to $H^o(q) = (1+0.7q^{-1})/(1-0.5q^{-1})$. At the 8000th sample, the width parameter of the true plant begins to increase, reaching 1.5 times its original value after 300 samples and settling at the new value thereafter. For comparison, we add an online user-specified performance index to monitor the output variance [118]. Note that throughout this simulation there is no setpoint change or any other external excitation.

The colormaps of the simulated input and output data are shown in Fig. 5.3, with the times at which the noise model change and the spatial MPM are introduced highlighted. It is clear that the output variance inflates after the noise model change and increases further after the MPM is added. Fig. 5.4 illustrates the spatial and temporal parameter estimates over all moving windows, where the red dash-dotted line in each plot shows the true parameter value over time and the blue lines display the estimated parameter values in each moving window. Although these parameter estimates are subject to large variance due to the low excitation level in the routine closed-loop data, they still provide valuable insight for the operators into which parameter may have drifted.
Moreover, this closed-loop identification algorithm carries over directly to the data collected during a closed-loop identification experiment. This means that if a closed-loop identification becomes necessary, we only need to start injecting external excitation into the system, with the closed-loop identification algorithm continuing to operate.

Fig. 5.5 displays the detailed mismatch detection results. As expected, the user-specified performance index is extremely sensitive to changes in the output variance, whether caused by a noise model change or by MPM: it starts to drop noticeably after the noise change, even before the MPM is added. This is the main drawback of using a user-specified or other variance-based performance index to detect MPM: such indices cannot distinguish a noise model change from MPM. In contrast, the scores obtained from the SVM predictions clearly indicate whether a noise model change or an MPM has occurred. Since the scheme monitors the three models independently, we can significantly reduce the number of false MPM alarms due to noise model changes.

Figure 5.4: Process parameter estimates over moving windows (gain, width, divergence, attenuation, and time constant estimates)

Figure 5.5: MPM and noise model change detection results (user-specified performance index; SVM scores for the noise, temporal, and spatial model estimates)

Notice that occasionally there may be several negative scores, implying a mismatch in the underlying moving window. These outliers shall not affect our decisions regarding the MPM. As pointed out in Section 3.5.3 and Section 5.4.2, we use the accumulated score metric (5.34) to raise the mismatch alarm, rather than any single negative score observed occasionally.

5.6 Summary

In this chapter, we presented a novel closed-loop identification method that provides consistent parameter estimates for the CD process. It is applicable to routine operating data lacking external excitation, provided that a set of weak conditions on the informativeness of the closed-loop data is satisfied. For CD MPM detection, we split the routine operating data into training and test stages. The closed-loop identification method is implemented in moving windows continuously over both stages. We then develop one-class SVM models based on the clusters of model estimates from the training data. These SVM models are used to predict the classification of models estimated from the test data; from the predictions, we are able to detect the presence of MPM. Moreover, this approach enables us to monitor changes in the process and noise models separately, so our MPM detection results are robust to noise model changes. Two simulation examples are presented to validate the effectiveness of the proposed methods.

Chapter 6

Closed-loop Optimal Input Design for CD Processes

6.1 Introduction

For CD processes, most existing input design results focus on the open-loop case; see [71]. The main drawback of open-loop input design is the resulting production loss, as the normal operation of the process risks being interrupted. Results on closed-loop optimal input design for CD processes are scarce, with current industrial practice using spatial-bump, temporal-PRBS signals as excitation signals for closed-loop identification (denoted "bump excitation" in this work) [72].

In this chapter, we design excitation signals for the CD process with optimal input design techniques. In particular, we focus primarily on input design for the steady-state CD process model, in light of the fact that the CD process mostly operates at steady state. The major challenge lies in how to represent the closed-loop CD process with a parsimonious parametric model so as to avoid large input-output dimensions. Inspired by [82], we propose to develop a scalar noncausal model of the closed-loop CD process to resolve this issue. Furthermore, we demonstrate that a causal model can be obtained that is equivalent to the noncausal model in the sense of having the same output spectrum.
It is further shown that the maximum likelihood estimate and the parameter covariance matrix of the causal-equivalent model converge to those of the noncausal model asymptotically with probability one. In this sense, the optimal excitation signal can be designed based solely on the causal model.

This chapter is outlined as follows. In Section 6.2, we present the CD steady-state model. In Section 6.3, we illustrate the detailed procedures to develop a scalar causal-equivalent model for the closed-loop CD process. In Section 6.4, the optimal input design procedures are demonstrated based on the causal-equivalent model, followed by a simulation example in Section 6.5 to verify the proposed results.

[Figure 6.1: The closed-loop optimal input design configuration.]

6.2 Preliminaries

6.2.1 Open-loop CD process model

In this work, we consider the following classical CD process model,

y(t) = g(q) G u(t) + v(t),   (6.1)

where y(t) ∈ R^m and u(t) ∈ R^m are the CV and MV profiles. Note that here we assume, without loss of generality, that the dimensions of the output and input profiles are equal. The results in this chapter can be easily extended to the more general case in which their dimensions are different. v(t) is the colored measurement noise. G ∈ R^{m×m} is the steady-state gain matrix. In this chapter, we pose the following assumption on the structure of the G matrix. The diagram of closed-loop optimal input design for steady-state CD processes is shown in Fig. 6.1.

Assumption 6.2.1. All actuators of the CD process have the same symmetric impulse response in the spatial direction at steady state. The columns of the G matrix are sampled versions of these responses. Thus the G matrix is Toeplitz-structured.

For the purpose of spatial optimal excitation signal design, the following steady-state CD process model is of interest,

y_ss = G_ss u_ss + v_ss,   (6.2)

where y_ss ∈ R^m is the steady-state measured CV profile, and u_ss ∈ R^m is the steady-state MV profile. G_ss = G is the steady-state process gain.
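As a concrete illustration of the steady-state model (6.2) under Assumption 6.2.1, the sketch below builds a Toeplitz gain matrix from a symmetric spatial impulse response and simulates one steady-state profile. The Gaussian-shaped response and all numbers are illustrative placeholders, not the thesis's actual four-parameter actuator response of (1.2).

```python
import numpy as np

m = 20                                   # number of bins/actuators (illustrative)
idx = np.arange(m)
# Hypothetical symmetric, decaying spatial impulse response of one actuator.
col = 0.1 * np.exp(-0.5 * (idx / 2.0) ** 2)

# Toeplitz-structured gain matrix (Assumption 6.2.1): entry (i, j) depends on |i - j|.
G_ss = np.array([[col[abs(i - j)] for j in range(m)] for i in range(m)])

rng = np.random.default_rng(0)
u_ss = rng.standard_normal(m)            # steady-state MV profile
v_ss = 0.1 * rng.standard_normal(m)      # spatial disturbance (here phi = I)
y_ss = G_ss @ u_ss + v_ss                # Eq. (6.2)
```

Each column of `G_ss` is a shifted copy of the same response, which is exactly the structure exploited later to extract a scalar spatial model.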
v_ss = φ e ∈ R^m is the steady-state output disturbance, where e ∈ R^m is the spatial noise. For convenience, we suppose that the spatial filter φ is also Toeplitz-structured and sparse, as is G_ss.

6.2.2 Closed-loop steady-state CD process model

Designing closed-loop dither signals requires an explicit expression of the controller to formulate the objective function of the input design. It is well known that CD MPC may display time-varying or even nonlinear dynamics if any constraint is active. Thus input design for closed-loop MPC systems involving active constraints is non-trivial. In this work, to simplify this problem, we introduce the following assumption.

Assumption 6.2.2. The MPC is assumed to operate in a linear mode with no active constraints.

A typical MPC cost for the CD process is expressed as [12]

min_{Δu(k)} J(k) = Σ_{j=1}^{H_p} [ŷ(k+j) − y_sp(k+j)]^T Q_1 [ŷ(k+j) − y_sp(k+j)]
                 + Σ_{j=0}^{H_u−1} [Δu(k+j)^T Q_2 Δu(k+j) + u(k+j)^T Q_3 u(k+j)],   (6.3)

where H_p and H_u are the prediction and control horizons, y_sp(k) is the target value for the output at time k, ŷ(k) is the predicted output value at time k, Δu(k) is the change in the actuator profile at time k, and u(k) is the input profile. Q_1, Q_2 and Q_3 are diagonal weighting matrices for the controlled properties, the changes in the actuator array, and the actuator array, respectively.

With Assumption 6.2.2, the closed-form expression of the steady-state MPC is shown to be [12]

K_ss = −Q_3^{−1} α_K G_ss Q_1,   (6.4)

where Q_1 is the weight matrix in the MPC objective function penalizing the deviation of the CV profile from its setpoint, Q_3 is the corresponding weight matrix penalizing the offset of the steady-state MV from its target, and α_K is a constant determined from the dynamic model of the actuators. In practice the weighting matrices Q_1 and Q_3 are often chosen to be diagonal. Assuming that Q_1 and Q_3 are diagonal, the controller K_ss then has a matrix structure similar to that of the gain matrix G_ss. The structural similarity between K_ss and G_ss will be used in the derivations below.

Remark 6.2.1.
While Assumption 6.2.2 may seem restrictive for the practical implementation of our algorithm, industrial experience reveals that a well-tuned CD MPC normally operates without active constraints. Extension of our study to incorporate active constraints will be a consideration in our future work.

Combining (6.2) and (6.4), we can easily arrive at the following closed-loop CD process (cf. Fig. 6.1):

y_ss = (I + G_ss K_ss)^{−1} G_ss K_ss r + (I + G_ss K_ss)^{−1} v_ss,   (6.5)
u_ss = (I + K_ss G_ss)^{−1} r − (I + K_ss G_ss)^{−1} K_ss v_ss,   (6.6)

where r ∈ R^m is the spatial excitation signal to be designed.

6.2.3 Spatial optimal input design for the CD process

When it comes to spatial optimal input design, the parameters of interest here are those in the gain matrix G_ss (or, more specifically, the parameters in a column of G_ss). However, optimal input design directly based on the closed-loop model (6.5)-(6.6) is non-trivial due to the large input-output dimensions as well as the large number of parameters in G_ss. To circumvent this problem, we propose to use a scalar transfer function along the spatial coordinate to represent the steady-state response of the CD actuators. In this sense, the optimal input design originally aimed at the MIMO CD model can be re-formulated as one for the scalar spatial model, which significantly reduces the complexity. The disadvantage, however, is that this scalar spatial transfer function has to be noncausal to capture the responses of actuators on both sides (see Fig. 6.2), analogous to the 'past' and 'future' of the conventional time coordinate.
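Before moving to the scalar representation, the matrix-level controller (6.4) and the closed-loop maps (6.5)-(6.6) can be formed numerically once G_ss is available. The sketch below uses an illustrative Gaussian-shaped gain column and a placeholder α_K, not the tuned values of Section 6.5; it also checks the standard push-through identity (I + G K)^{−1} G = G (I + K G)^{−1} relating the two sensitivity matrices.

```python
import numpy as np

m = 20
idx = np.arange(m)
col = 0.1 * np.exp(-0.5 * (idx / 2.0) ** 2)   # illustrative symmetric response
G_ss = np.array([[col[abs(i - j)] for j in range(m)] for i in range(m)])  # Toeplitz

Q1 = 0.4 * np.eye(m)        # CV deviation weight (diagonal, as in practice)
Q3 = 0.1667 * np.eye(m)     # MV offset weight
alpha_K = 1.0               # placeholder; set by the actuator dynamics in the thesis
K_ss = -np.linalg.inv(Q3) @ (alpha_K * G_ss @ Q1)   # Eq. (6.4)

I = np.eye(m)
S_out = np.linalg.inv(I + G_ss @ K_ss)   # output sensitivity, Eq. (6.5)
S_in = np.linalg.inv(I + K_ss @ G_ss)    # input sensitivity, Eq. (6.6)
T_ry = S_out @ G_ss @ K_ss               # map from excitation r to y_ss
```

Because Q1 and Q3 are scalar multiples of the identity here, K_ss inherits the Toeplitz structure of G_ss, which is the structural similarity noted after (6.4).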
We provide a theoretical basis showing that the noncausal model can be converted to a causal-equivalent model such that the input design can be performed based on this causal model.

6.3 Causal scalar transfer function representation of the CD process

6.3.1 Noncausal scalar model of the closed-loop CD process

From the aforementioned structure of G_ss as well as Assumption 6.2.1, one can readily extract a scalar noncausal FIR model from any single column of G_ss to represent the spatial impulse response of an actuator:

g(λ, λ^{−1}) = g_{−n} λ^{−n} + ... + g_0 + ... + g_n λ^n,   (6.7)

where n < m is a truncation index retaining the significant coefficients. The positive and negative powers of λ denote the anti-causal and causal shifts, respectively. The g_i, i = −n, ..., n, are the spatial impulse response coefficients, and in general symmetry of these coefficients is enforced, i.e., g_i = g_{−i}. Since in most cases the noncausal FIR model (6.7) has a high order (i.e., n is large), a parsimonious noncausal transfer function is necessary to simplify this model. First, the following assumption is presented.

Assumption 6.3.1. The complex trigonometric polynomial (6.7) has real and symmetric coefficients. Moreover, this polynomial is positive at every point on the unit circle |λ| = 1.

Remark 6.3.1. Assumption 6.3.1 is posed to facilitate the conversion of the noncausal model into a causal-equivalent model in the following sections. Although this assumption does not admit a physical motivation or interpretation, it is a fairly loose condition, as most practical actuator impulse response shapes in the spatial direction satisfy this condition or can be approximated by a curve satisfying it.

With this assumption it follows from the Fejér-Riesz Theorem that we can factorize g(λ, λ^{−1}) as

g_{−n} λ^{−n} + ... + g_0 + ... + g_n λ^n = M(λ) M(λ^{−1}), ∀ω,   (6.8)

where λ = e^{jω}. Here M(λ^{−1}) has the following expression:

M(λ^{−1}) = m_0 + m_1 λ^{−1} + ... + m_n λ^{−n},

where m_i, i = 0, ..., n, are the coefficients. An immediate observation is that the frequency response of the left-hand side of (6.8) is real and non-negative at every frequency, which places certain restrictions on the scope of possible spatial impulse response shapes that we may investigate. However, as commented in Remark 6.3.1, industrial experience shows that most actual actuator response shapes satisfy or approximately satisfy this condition. The relationship between G_ss and K_ss in (6.4) affirms that if G_ss satisfies (6.8) then so does K_ss.

After obtaining the causal FIR model M(λ^{−1}), the next step is to find a parsimonious transfer function model (e.g., an output-error model) to represent M(λ^{−1}). This can be accomplished with the system identification toolbox in Matlab, and the original noncausal g(λ, λ^{−1}) is approximated by ḡ(λ, λ^{−1}) as follows:

ḡ(λ, λ^{−1}) = B(λ) B(λ^{−1}) / [A(λ) A(λ^{−1})],   (6.9)
B(λ^{−1}) = b_0 + b_1 λ^{−1} + ... + b_{nb} λ^{−nb},   (6.10)
A(λ^{−1}) = 1 + a_1 λ^{−1} + ... + a_{na} λ^{−na},   (6.11)

where nb and na are the orders of B(λ^{−1}) and A(λ^{−1}), respectively. In a similar fashion, the noncausal transfer function form of the controller is assumed to be

k̄(λ, λ^{−1}) = F(λ) F(λ^{−1}) / [E(λ) E(λ^{−1})],   (6.12)
F(λ^{−1}) = f_0 + f_1 λ^{−1} + ... + f_{nf} λ^{−nf},   (6.13)
E(λ^{−1}) = 1 + e_1 λ^{−1} + ... + e_{ne} λ^{−ne},   (6.14)

where nf and ne are the orders of F(λ^{−1}) and E(λ^{−1}), respectively. From (6.9)-(6.14), the original high-dimensional MIMO steady-state closed-loop model (6.5)-(6.6) can be replaced by scalar but noncausal transfer functions⁵:

y(x) = [ḡ / (1 + ḡ k̄)] r(x) + [1 / (1 + ḡ k̄)] v(x),   (6.15)
u(x) = [1 / (1 + ḡ k̄)] r(x) − [k̄ / (1 + ḡ k̄)] v(x),   (6.16)

where x stands for the spatial coordinate.
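The Fejér-Riesz factorization (6.8) underlying these scalar models can be computed numerically by rooting the symmetric Laurent polynomial and collecting the roots inside the unit circle. The sketch below uses short illustrative coefficients (not an identified CD response) that are verified to satisfy Assumption 6.3.1 before factoring.

```python
import numpy as np

# Symmetric coefficients g_{-n},...,g_0,...,g_n (highest power first), n = 2.
g = np.array([0.2, 0.6, 1.0, 0.6, 0.2])              # g_i = g_{-i}

# Check Assumption 6.3.1: the Laurent polynomial is positive on |lambda| = 1.
w = np.linspace(-np.pi, np.pi, 512)
spec = np.polyval(g, np.exp(1j * w)) * np.exp(-2j * w)   # sum_k g_k e^{jwk}
assert np.all(spec.real > 0)

# Roots of lambda^n * g(lambda, lambda^{-1}) come in reciprocal pairs (r, 1/r);
# the n roots inside the unit circle define the causal factor M(lambda^{-1}).
roots = np.roots(g)
inside = roots[np.abs(roots) < 1]
M = np.real(np.poly(inside))                         # monic causal factor
# Rescale so that M(lambda) M(lambda^{-1}) matches g exactly (leading term).
M = M * np.sqrt(g[0] / (M[0] * M[-1]))
recon = np.convolve(M, M[::-1])                      # coeffs of M(l) M(1/l)
print(np.allclose(recon, g))                         # True
```

The positivity check is essential: a root on the unit circle would make the inside/outside split, and hence the factorization, ill-defined.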
Note that the input and output sensitivity functions have the same noncausal transfer function representation, as shown above.

6.3.2 Causal-equivalent closed-loop models

The closed-loop scalar noncausal model of the CD process (6.15)-(6.16) is still not in a form convenient for traditional optimal input design algorithms. In this subsection we develop a method to find causal-equivalent models for noncausal transfer functions such as ḡ(λ, λ^{−1}). First, the following lemma is necessary.

Lemma 6.3.1. Suppose that ḡ_1(λ, λ^{−1}) and ḡ_2(λ, λ^{−1}) satisfy the conditions in Assumption 6.3.1. Then the sum ḡ_1(λ, λ^{−1}) + ḡ_2(λ, λ^{−1}) also has a factorization as in (6.8).

Proof. Since the conditions in Assumption 6.3.1 apply to the polynomials ḡ_1(λ, λ^{−1}) and ḡ_2(λ, λ^{−1}), we have

ḡ_1(e^{jω}, e^{−jω}) ≥ 0, ∀ω,
ḡ_2(e^{jω}, e^{−jω}) ≥ 0, ∀ω.

Thus it follows that

ḡ_1(e^{jω}, e^{−jω}) + ḡ_2(e^{jω}, e^{−jω}) ≥ 0, ∀ω.   (6.17)

Besides, the coefficient sequence of (6.17) is real and symmetric. From the Fejér-Riesz Theorem there always exists an M(λ) such that (6.8) is satisfied.

Defining S̄ = 1/(1 + ḡ k̄), from (6.15)-(6.16), we have

S̄ = A(λ)A(λ^{−1})E(λ)E(λ^{−1}) / [A(λ)A(λ^{−1})E(λ)E(λ^{−1}) + B(λ)B(λ^{−1})F(λ)F(λ^{−1})].   (6.18)

From Lemma 6.3.1, it follows that the denominator of (6.18) can be factorized into the product of a causal FIR filter and its anti-causal form. Therefore, the closed-loop transfer functions (6.15)-(6.16) can be simplified as

y(x) = S̄_1(λ, λ^{−1}) r(x) + S̄_2(λ, λ^{−1}) v(x),   (6.19)
u(x) = S̄_2(λ, λ^{−1}) r(x) − S̄_3(λ, λ^{−1}) v(x),   (6.20)

where each S̄_i(λ, λ^{−1}), i = 1, 2, 3, has a structure similar to that of (6.9) and (6.12). Further notice that φ can also be represented by a noncausal transfer function, as assumed in previous sections.

⁵ In the sequel, we will omit the subscript and use the argument x to indicate the steady-state input and output sequences.
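The key fact exploited in the causal-equivalent construction is that swapping an anti-causal factor for its causal mirror is an all-pass operation: it leaves the magnitude response, and hence the spectrum, unchanged. This can be checked numerically with small illustrative first-order factors (placeholders, not identified CD models):

```python
import numpy as np

# Illustrative minimum-phase causal factors (roots inside the unit circle).
M = np.array([1.0, 0.5])        # M(l^{-1}) = 1 + 0.5 l^{-1}
N = np.array([1.0, -0.3])       # N(l^{-1}) = 1 - 0.3 l^{-1}

w = np.linspace(-np.pi, np.pi, 1024)
z = np.exp(1j * w)              # lambda = e^{jw} on the unit circle

def poly_eval(c, x):
    # c[0] + c[1] x + c[2] x^2 + ...  (x plays the role of lambda or lambda^{-1})
    return sum(ck * x ** k for k, ck in enumerate(c))

# Noncausal response M(l) M(l^{-1}) / [N(l) N(l^{-1})]  (the (6.22)-type factors).
H_noncausal = (poly_eval(M, z) * poly_eval(M, 1 / z)) / (
               poly_eval(N, z) * poly_eval(N, 1 / z))
# Causal-equivalent response M^2(l^{-1}) / N^2(l^{-1})  (the (6.23)-type factors).
H_causal = poly_eval(M, 1 / z) ** 2 / poly_eval(N, 1 / z) ** 2

# Equal magnitude at every frequency => identical output spectra.
print(np.allclose(np.abs(H_noncausal), np.abs(H_causal)))   # True
```

For real coefficients, M(e^{jω}) is the complex conjugate of M(e^{−jω}), so the noncausal product is exactly |M(e^{−jω})|², matching the squared causal factor in magnitude.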
In other words, the spatial noise v_ss has the following expression:

v(x) = h̄(λ, λ^{−1}) e(x) = D(λ)D(λ^{−1}) / [C(λ)C(λ^{−1})] e(x),   (6.21)

where {e(x)} is a spatial white noise sequence. To find a causal-equivalent transfer function for (6.19)-(6.21), we establish the following theorem.

Theorem 6.3.1. Consider a stochastic process with the output sequence {y(x), x = 1, ..., m} generated according to the following noncausal Box-Jenkins model:

y(x) = M(λ)M(λ^{−1}) / [N(λ)N(λ^{−1})] r(x) + R(λ)R(λ^{−1}) / [S(λ)S(λ^{−1})] e(x),   (6.22)

where {e(x), x = 1, ..., m} is a Gaussian white noise sequence. The polynomials with arguments λ^{−1} and λ are the causal and anti-causal parts, respectively. Assume that all polynomials have no zeros on the unit circle and are minimum phase. Then there exist causal polynomials M̃_y(λ^{−1}), Ñ_y(λ^{−1}), R̃_y(λ^{−1}), S̃_y(λ^{−1}), a white noise sequence {ẽ_y(x)}, and a stochastic sequence {ỹ(x)} having the same spectrum as {y(x)}, such that

ỹ(x) = M̃_y(λ^{−1}) / Ñ_y(λ^{−1}) r(x) + R̃_y(λ^{−1}) / S̃_y(λ^{−1}) ẽ_y(x).   (6.23)

Proof. Multiplying both sides of (6.22) by N(λ)N(λ^{−1})S(λ)S(λ^{−1}), we obtain

N(λ)N(λ^{−1})S(λ)S(λ^{−1}) y(x) = M(λ)M(λ^{−1})S(λ)S(λ^{−1}) r(x) + N(λ)N(λ^{−1})R(λ)R(λ^{−1}) e(x).   (6.24)

Define the roots of the causal polynomials M(λ^{−1}), N(λ^{−1}), R(λ^{−1}), S(λ^{−1}) to be, respectively, α_i, β_i, γ_i and δ_i. Let

π_M = ∏_i (λ^{−1} − α_i)/(λ − α_i),  π_N = ∏_i (λ^{−1} − β_i)/(λ − β_i),
π_R = ∏_i (λ^{−1} − γ_i)/(λ − γ_i),  π_S = ∏_i (λ^{−1} − δ_i)/(λ − δ_i).

Notice that N(λ)N(λ^{−1}) π_N = N²(λ^{−1}), and the same also holds for M(λ), R(λ), and S(λ). Multiplying both sides of (6.24) by π_M π_S, after some manipulations, we have

N²(λ^{−1}) S²(λ^{−1}) ỹ(x) = M²(λ^{−1}) S²(λ^{−1}) r(x) + N²(λ^{−1}) R²(λ^{−1}) ẽ_y(x),   (6.25)

where ỹ(x) = (π_M/π_N) y(x) and ẽ_y(x) = [(π_M π_S)/(π_N π_R)] e(x). Since π_M, π_N, π_R and π_S are all-pass filters, {ẽ_y(x)} is a Gaussian white noise sequence with the same spectrum as {e(x)}, though it may correspond to a different realization. Likewise, {ỹ(x)} has the same spectrum as {y(x)}.
Therefore, (6.23) is verified by pairing M̃_y(λ^{−1}) = M²(λ^{−1}), and so on, with (6.25).

Remark 6.3.2. From Theorem 6.3.1 one may interpret the equivalence between {ỹ(x)} and {y(x)} in terms of the spectra, although the realizations might be different. However, this equivalence greatly facilitates maximum likelihood estimation for the original noncausal model by reducing it to a causal-equivalent form. The rationale for performing identification in this manner has been explained in [82] for an ARX model. The conclusion is that the log-likelihood function of the noncausal model converges to that of the causal model with probability one as the sample number tends to infinity. This result can also be extended to the noncausal Box-Jenkins model in (6.22).

Similarly, the input signal u(x) in (6.20) can also be represented through causal filters:

ũ(x) = M̃_u(λ^{−1}) / Ñ_u(λ^{−1}) r(x) + R̃_u(λ^{−1}) / S̃_u(λ^{−1}) ẽ_u(x),   (6.26)

where {ũ(x)} and {u(x)} have the same spectrum. The equations (6.23) and (6.26) are necessary for the optimal input design in the sequel.

6.3.3 Covariance matrix equivalence of the causal and noncausal model parameter estimates

It is well known that if the white noise is Gaussian distributed, the prediction error method with a properly chosen criterion coincides with maximum likelihood estimation. In [82], it is shown that the log-likelihood function of the noncausal ARX model and that of the corresponding causal ARX model converge to the same value as the sample number tends to infinity. In this subsection we demonstrate a similar statement for Box-Jenkins models with closed-loop data.

Theorem 6.3.2. Consider the following noncausal process model (θ is the parameter in a compact set Ω):

y(x) = ḡ(λ, λ^{−1}, θ) u(x) + h̄(λ, λ^{−1}, θ) e(x),   (6.27)

where ḡ is defined in (6.9)-(6.11), h̄ is defined in (6.21), and e(x) is Gaussian white noise.
Suppose that the data are generated in closed loop with the controller (6.12)-(6.14) and that all relevant transfer functions are uniformly stable. Denote by L_y^m(y) the log-likelihood function of the noncausal model (6.27) and by L_ỹ^m(ỹ) the log-likelihood function of the causal-equivalent model of (6.27), obtained in the same manner as (6.23). Then, as m → ∞ (m is the spatial sample number, i.e., the number of measurement bins),

sup_{θ∈Ω} |L_y^m(y) − L_ỹ^m(ỹ)| → 0, w.p.1,   (6.28)
sup_{θ∈Ω} ‖dL_y^m(y)/dθ − dL_ỹ^m(ỹ)/dθ‖ → 0, w.p.1.   (6.29)

Proof. The proof of (6.28) follows along the lines of Proposition 3 in [82]. We outline the proof of (6.29) below. First, the following lemma is necessary.

Lemma 6.3.2. Consider a set of uniformly stable causal or noncausal filters G(λ, θ), θ ∈ Ω, and H(λ, θ), θ ∈ Ω. Define u(x), x = 1, ..., m, as a bounded signal sequence, and e(x), x = 1, ..., m, as a sequence of Gaussian white noise with zero mean and variance σ². The signal s_θ(x), x = 1, ..., m, is generated via

s_θ(x) = G(λ, θ) u(x) + H(λ, θ) e(x).   (6.30)

Then as m → ∞, the sample variance of s_θ(x) converges uniformly in probability to the ensemble variance:

sup_{θ∈Ω} ‖ (1/m) Σ_{x=1}^{m} s_θ(x) s_θ^T(x) − (1/m) Σ_{x=1}^{m} E s_θ(x) s_θ^T(x) ‖ → 0, w.p.1.   (6.31)

Note that (6.31) is an extension of Theorem 2B.1 in [15] to noncausal models. The proof follows a similar line and is thus omitted here. Based on Lemma 6.3.2, as m → ∞, the following statements hold:
• dL_y^m(y, θ)/dθ converges uniformly w.r.t. θ in probability;
• dL_ỹ^m(ỹ, θ)/dθ converges uniformly w.r.t. θ in probability.
The reason is that both dL_y^m(y, θ)/dθ and dL_ỹ^m(ỹ, θ)/dθ can similarly be considered as generated from uniformly stable filters. Thus from Lemma 6.3.2 the above statements hold.
On the other hand, from the proof of (6.28) in [82], one can see that both L_y^m(y) and L_ỹ^m(ỹ, θ) converge uniformly to the same value, denoted as σ²(θ). Based on Theorem 7.17 in [119], with the above statements, we have, as m → ∞,

sup_{θ∈Ω} ‖dL_y^m(y, θ)/dθ − dσ²(θ)/dθ‖ → 0, w.p.1,   (6.32)
sup_{θ∈Ω} ‖dL_ỹ^m(ỹ, θ)/dθ − dσ²(θ)/dθ‖ → 0, w.p.1.   (6.33)

From the triangle inequality, the result (6.29) follows. It should be pointed out that in this proof the uniformity of the convergence in probability is a necessary condition for the results to hold.

Remark 6.3.3. Theorem 6.3.2 implies that the log-likelihood function and its derivative with respect to the parameter θ obtained from the noncausal and causal-equivalent models are asymptotically identical. Therefore, we can conclude that the parameter covariances from the two schemes coincide, and hence we may perform the optimal input design based solely on the causal-equivalent model.

6.4 Closed-loop optimal input design

In this section closed-loop optimal input design for the steady-state CD process is investigated. As mentioned above, we can accomplish this task with the causal-equivalent representation of the CD process. Note that in practice the noise model parameters are of less interest, and thus we split the parameter θ as θ = [ρ^T η^T]^T, where ρ is the process model parameter and η is the noise model parameter. For optimal input design our objective is to minimize the covariance of ρ by selecting the optimal excitation signal. From Theorem 6.3.2 the parameter covariance of ρ, P_ρ, is expressed as

P_ρ ∼ (1/m) [ (1/(2π λ_0)) ∫_{−π}^{π} |h̃(e^{jω}, η_0)|^{−2} (∂g̃(e^{jω}, ρ_0)/∂ρ) Φ_ũ(ω) (∂g̃^T(e^{−jω}, ρ_0)/∂ρ) dω ]^{−1},   (6.34)

where λ_0 is the variance of the noise ẽ_y(x), and g̃ and h̃ are the causal-equivalent forms of ḡ and h̄, respectively. The input spectrum Φ_ũ(ω), according to (6.26), is related to the excitation spectrum Φ_r(ω) via

Φ_ũ(ω) = |M̃_u(e^{−jω}) / Ñ_u(e^{−jω})|² Φ_r(ω) + |R̃_u(e^{−jω}) / S̃_u(e^{−jω})|² λ_0.   (6.35)

The closed-loop optimal input design can be formulated as minimizing a function of the parameter covariance P_ρ subject to a set of constraints, e.g., input and output power constraints:

min_{Φ_r(ω)} f_0(P_ρ(Φ_r(ω)))   (6.36)
s.t. (1/(2π)) ∫_{−π}^{π} Φ_u(ω) dω ≤ c_u,   (6.37)
     (1/(2π)) ∫_{−π}^{π} Φ_y(ω) dω ≤ c_y,   (6.38)

where c_u and c_y are the power limits on the input and output signals. The constraints (6.37)-(6.38) can be written in terms of the design variable Φ_r(ω) by (6.35) and (6.23), respectively. As this optimization problem is still infinite-dimensional (since Φ_r(ω) is a continuous function of ω), a technique known as finite-dimensional parameterization [62] can be employed to reduce it to a finite-dimensional one. Specifically, Φ_r(ω) can be parameterized, by the definition of a spectrum, as

Φ_r(ω) = Σ_{k=−m_c}^{m_c} c_k e^{−jωk} ≥ 0, ∀ω,   (6.39)

where c_k, k = −m_c, ..., m_c, are the parameters and m_c is the selected number of parameters. With (6.39) the original optimization problem can be cast into one with a finite number of parameters. Note that the non-negativity constraint on the parameterized spectrum (6.39) has to be satisfied at every frequency while searching for the optimal c_k. This requirement can be fulfilled by using the KYP lemma and constructing a controllable and observable state-space realization for the spectrum [62]. After these modifications the resulting optimization problem is convex (choosing f_0(·) to be a convex function, such as the trace of the inverse, the negative log-determinant, or the negative maximum eigenvalue) and can be readily solved by off-the-shelf solvers such as the CVX toolbox. In Example 6.5 below, we choose the negative log-determinant as f_0(·).

Remark 6.4.1. Note that the aforementioned optimal input design only considers the power constraints on the input and output, (6.37)-(6.38). However, in practice, hard constraints on the CVs and MVs make more sense, and handling them is still an open problem for frequency-domain optimal input design.
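The finite parameterization (6.39) and the power constraint (6.37) can be prototyped on a frequency grid before handing the full problem to a solver such as CVX. The sketch below uses illustrative coefficients c_k; note that checking nonnegativity on a grid only approximates the exact KYP-based constraint used in [62].

```python
import numpy as np

# Eq. (6.39): Phi_r(w) = sum_{k=-mc}^{mc} c_k e^{-jwk}, with c_{-k} = c_k
# so that the spectrum is real. Coefficients here are illustrative only.
mc = 2
c = np.array([1.0, 0.4, 0.1])                 # c_0, c_1, c_2

w = np.linspace(-np.pi, np.pi, 1000)
phi_r = c[0] + 2 * sum(c[k] * np.cos(k * w) for k in range(1, mc + 1))

# Grid check of the nonnegativity constraint (the KYP lemma enforces it exactly).
assert np.all(phi_r >= 0)

# Excitation power (1/2pi) * integral of Phi_r over [-pi, pi]; by Parseval this
# equals c_0, so the power constraint (6.37) is linear in the c_k.
power = phi_r.mean()
print(abs(power - c[0]) < 1e-2)               # True
```

That the power is simply c_0 (and more generally linear in the c_k) is what makes the parameterized problem convex once f_0 is chosen convex.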
Besides, specific to the CD process, the second-order bending constraints preventing 'picketing' of the actuators are also important. These practical constraints are beyond the scope of this work and will be investigated in the future.

Remark 6.4.2. A common issue for optimal input design is that the covariance matrix depends on the true parameter values, as shown in (6.34), which may not be accessible in practice. One remedy is an adaptive input design scheme: specify an initial parameter value to design an optimal input signal, update the parameter estimates via identification, and use the updated parameter value to design a new optimal input. This process is performed iteratively until it converges.

Remark 6.4.3. The optimal input design approach proposed above is only meant for the CD steady-state model. In fact, the practically implemented input design for CD identification is two-dimensional, involving both spatial and temporal variations. For the design of the optimal input in the temporal direction, the mature techniques developed in recent years for traditional dynamic systems can be applied directly. In this work, for simplicity, in the integrated simulations of the next chapter we use a temporal PRBS signal as the time-domain variation and the spatial optimal signal obtained above to compose the actual CD excitation signal.

6.5 Example

In this section we use a simulation example to validate the proposed CD process modeling and closed-loop optimal input design methods. In particular, we compare the effect of the optimally designed input on the parameter estimates with that of the bump excitation signal currently employed in industry [72].

In practice, the spatial response shape of a single actuator is assumed to satisfy the nonlinear equation in (1.2) with four spatial parameters [12]. In this example, these parameters are specified with the values γ = 0.3802, ξ = 268.6414, β = 0.10, α = 3.5.
The response shape under an impulse signal of amplitude 5 is illustrated as the red curve (upper plot) in Fig. 6.2. For convenience we assume that the CD process has 222 actuators and measurement bins. The controller is chosen to be CD-MPC with a prediction horizon of 25 samples and a control horizon of 5 samples (the sampling interval is 12 seconds). The weighting matrices in the cost function are selected to be Q_1 = 0.4I and Q_3 = 0.1667I. The parameter α_K in (6.4) is computed to be 12.3212. From Section 6.3 one is able to obtain noncausal scalar models for the CD process and the controller, respectively. The impulse response curves of these noncausal models are shown in Fig. 6.2 as the blue dash-dotted curves. Note that for simplicity we have chosen nb = nf = 1, na = ne = 2. Higher-order models improve the quality of the estimates but also increase the computational cost of designing the optimal input. The noise variance is chosen to be 0.1 with noise model φ = I (output-error structure). Fig. 6.3 shows the optimal spectrum based on the causal-equivalent models with c_u = 50. Notice that a small process gain (the causal-equivalent model has an even smaller gain), as in this case, requires a large excitation signal to achieve a good signal-to-noise ratio.

To make a fair comparison between the optimal and bump excitations, we set a hard constraint of ±10 on the amplitude of the excitation signals. For the optimally designed input, if any part of its amplitude violates this constraint, we set that part at the corresponding bound. For the bump signal, the amplitudes of the bumps alternate between −10 and 10. To further show the optimality of the designed input, we generate another excitation signal that is a white noise sequence with the same variance as the optimal input. For each excitation signal we perform 100 Monte-Carlo simulations, and a process model is identified in each simulation.

[Figure 6.2: The impulse response of a single actuator (red solid line) and the impulse response of the estimated noncausal transfer function (blue dash-dotted line), for both the process and the controller.]

Fig. 6.4 shows the impulse responses of the estimated models under these three excitation signals. It can be observed that the estimates under the optimal input have the smallest variance, while the estimates under the bump signal show the largest variance. Specifically, the averaged errors of the estimated impulse responses relative to the true response are 0.0643, 1.3344 and 0.4479, respectively, for the optimal input, the bump input and the white noise input. Thus we can conclude that our optimal input outperforms the bump excitation signal and the white noise signal (with the same variance) in terms of identification performance.

[Figure 6.3: Spectrum of the optimal input based on the causal-equivalent model of the CD process.]

[Figure 6.4: The impulse responses of the estimated process model in closed loop under the optimally designed input (upper plot), the bump input (middle plot), and the white noise input (bottom plot) in 100 Monte-Carlo simulations.]

6.6 Summary

We developed an approach to represent the closed-loop steady-state CD process model with noncausal scalar transfer functions.
The advantage of using noncausal models is that they circumvent the problem of large dimensions associated with MIMO representations of CD processes. We then showed that these noncausal transfer functions can be further represented by spectrally equivalent causal transfer functions. A closed-loop optimal input design framework is proposed based on these causal-equivalent models. An example is provided to validate the proposed approaches and to demonstrate the advantage of the optimal input in system identification performance over the industrial practice of bump excitation, as well as over white noise excitation.

Chapter 7
CD Iterative Identification and Control

In this chapter we integrate controller performance assessment, model-plant mismatch detection, optimal input design and closed-loop identification. The controller performance assessment and mismatch detection combine into the monitoring block, and users can make overall decisions by observing the respective detection results. Notice that not all model-plant mismatches significantly degrade the controller performance, and also that the controller performance assessment technique is only applicable to steady-state data. In other words, the performance index from the controller performance assessment may not give much information about the transient performance. Therefore, our strategy is to use the diagnosis results from the MPM detection as the main factor in deciding whether to inject closed-loop excitations. The controller performance assessment results serve as a side metric on how well the controller is performing at steady state. Users can modify this strategy based on their own preferences or demands.

The process in this example is a single-array CD process with dry weight as the measured profile and autoslice as the actuators.
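The decision strategy just described — MPM detection as the main trigger for excitation and re-identification, with the performance index kept as a side metric — can be sketched as a small supervisory loop. Every callable below is a hypothetical placeholder for a block developed in the earlier chapters, not an actual implementation.

```python
# Supervisory logic sketch for the integrated adaptive scheme; all function
# arguments are hypothetical placeholders for the monitoring, design,
# identification, and tuning blocks described in the text.
def adaptive_loop(window, assess_performance, detect_mpm,
                  design_input, identify_closed_loop, retune_mpc):
    perf_index = assess_performance(window)     # side metric (steady-state only)
    if detect_mpm(window):                      # main trigger for adaptation
        r = design_input(window)                # optimal spatial excitation
        model = identify_closed_loop(window, r) # routine algorithm keeps running
        retune_mpc(model)                       # controller adaptive tuning
    return perf_index
```

Keeping the performance index out of the trigger condition reflects the point made above: a variance-based index cannot by itself distinguish an MPM from a noise model change.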
There are 222 measurement bins in the cross direction and 74 actuator zones. The parameters selected in this simulation example are shown in Table 7.1.

Table 7.1: Summary of the parameter setup in the adaptive control simulation

    Parameter                  Value         Parameter                     Value
    Moving window size         40 min        Actuator zone width           62.5194 mm
    Window step size           5 min         Measurement bin width         20.8398 mm
    Sampling interval          12 seconds    # of measurement bins         222
    Training stage duration    4.17 hours    # of actuator zones           74
    Identification duration    1.67 hours    Nominal spatial parameters    [0.38 268 0.1 1.5]
    MPM inspection interval    1 hour        Nominal temporal parameters   [126 35.98]
    Mismatch threshold         0.8           Spatial alpha ratio           1.3 ~ 4
    # of ID iterations         10            Temporal alpha ratio          3

Note that the MPM inspection interval determines how many of the last few SVM scores are used to discover the existence of a mismatch. If the percentage of negative SVM scores is greater than the mismatch threshold,
Initially there is a steady-state disturbance acting on the CD output profile, and it is rapidly attenuated by the CD controller. We apply the whole adaptive control algorithm from the beginning of the simulation; however, the first few parameter estimates are discarded since they may not be reliable due to the non-stationary transient behavior during the rejection of the steady-state disturbance. The same logic is applied in the following case studies where model-plant mismatches exist. Fig. 7.1 shows the sampled output and input data during normal, mismatch-free operation. The system clearly enters a steady-state mode after the initial transient that compensates for the steady-state disturbance, and there is no significant increase in the output and input variations at steady state. Fig. 7.2 displays the parameter estimates in each moving window, for both the spatial and temporal models, from which we observe no apparent trend of a parameter drifting away from its true value (shown as the red dash-dotted line). There are some biases in several parameter estimates, such as the gain and time constant; these come from the ridge regularization in the high-order ARX modeling step of the iterative identifications. However, as remarked above regarding our mismatch detection method, it is the changes in the parameter estimates that matter, rather than the accuracy of the estimates relative to the true values. The adaptive control simulation results are illustrated in Fig. 7.3. One can see that the user-specified benchmark stays close to one all the time, and the SVM scores for both the spatial and temporal models are positive most of the time, which implies that no mismatch exists in the profiles. The 2σ values of the output and input remain almost flat after the initial drops due to the rejection of the steady-state disturbance.
[Figure 7.1: No mismatch: The collected output and input profiles]

[Figure 7.2: No mismatch: Estimates of parameters in each moving window (panels: gain, width, divergence, attenuation, time constant, and delay estimates)]

7.2 Case II: Gain mismatch

In this case study we concentrate on the effect of a gain model-plant mismatch. For this purpose we let the simulator run for a while without any mismatch and then manually introduce a gain mismatch after 2000 samples. Specifically, the true plant gain and the model gain are initially equal, i.e., γp = γm, where γp is the gain of the true plant and γm = 0.38 is the process model gain. After obtaining 2000 samples (approximately 6.67 hours) of input-output data, we create a model-plant mismatch by gradually changing the true plant gain to γp = 2γm while keeping the model gain unchanged. The gathered input-output data and the parameter estimates from the moving windows are shown in Fig. 7.4 and Fig. 7.5, respectively. From the input-output profile plot we can see that the mismatch detection algorithm finds the mismatch shortly after it is introduced. The identification experiment starts with injecting
excitation signals into the system and ends with identifying a new process model.

[Figure 7.3: No mismatch: Adaptive CD MPC control results (panels: DW12 profile, user-specified performance index, SVM scores for the temporal dynamic and steady-state models, and 2σ of DW12 and autoslice)]

The mismatch is seen more clearly in the gain estimates of Fig. 7.5, in which the estimates move up to a new level after the mismatch is created. Fig. 7.6 presents some key metrics over the entire adaptive control simulation. One can see that the user-specified benchmark starts to drop after the mismatch takes place, and recovers once the moving window no longer covers the non-stationary data from the closed-loop identification. Note that if a moving window contains data from both the excitation and routine operation periods, the performance index is expected to be low, since stationary data is required for the performance assessment algorithm to perform well. The closed-loop identification, however, remains applicable as long as the data is quasi-stationary. After the mismatch is introduced, both the temporal and spatial SVMs return negative scores, indicating a possible mismatch. In fact, we obtain a series of negative scores, especially from the spatial SVM, and the percentage of negative scores in the last hour of moving windows exceeds the threshold. Therefore, a mismatch is declared and an identification experiment is triggered.
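The decision rule just described can be sketched as follows. With a 5 min window step, the 1 h MPM inspection interval corresponds to roughly the twelve most recent SVM scores; that count is our derived assumption, and the sketch is illustrative rather than the thesis implementation.

```python
# Minimal sketch of the mismatch-alarm rule: raise an alarm when the fraction
# of negative SVM scores among the most recent windows exceeds the mismatch
# threshold of 0.8 (Table 7.1). The inspection count of 12 is an assumption
# derived from the 1 h inspection interval and 5 min window step.

def mismatch_alarm(svm_scores, threshold=0.8, inspection_count=12):
    """Return True when the recent fraction of negative scores exceeds threshold."""
    recent = svm_scores[-inspection_count:]
    if len(recent) < inspection_count:
        return False                     # not enough history yet
    neg_frac = sum(s < 0 for s in recent) / len(recent)
    return neg_frac > threshold
```

With a threshold of 0.8 and twelve scores, at least ten of the last twelve windows must report a negative score before an identification experiment is triggered.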
Note that after the identification experiment we collect an extra portion of routine operating data in order to resume SVM training. The updated process model is obtained by averaging the parameter estimates over the moving windows within the identification stage and is then deployed to the CD MPC (termed controller retuning). Comparing the 2σ of the input and output before the identification experiment and after controller retuning reveals the improvement in control performance obtained with the updated CD controller.

[Figure 7.4: Gain mismatch: simulated output and input profiles]

[Figure 7.5: Gain mismatch: Estimates of parameters in each moving window]

7.3 Case III: Width, divergence, and attenuation mismatches

Similar to the gain mismatch case, in this subsection we study the influence of the other three parametric mismatches on CD MPC performance. For the width mismatch we set ξp = 1.3ξm after 2000 samples while keeping ξm unchanged. The input-output data under this scenario is shown in Fig. 7.7. These plots show the same pattern as the gain mismatch case, i.e., the adaptive control scheme passes through five stages: monitoring, mismatch detection, closed-loop identification, controller retuning, and monitoring again at the new operating condition. The parameter estimates and adaptive control simulation results are shown in Fig. 7.8 and Fig. 7.9, respectively. For the divergence mismatch, we set βp = 3βm after 2000 samples.
The input-output data, parameter estimation results, and adaptive control metrics are shown in Fig. 7.10, Fig. 7.11, and Fig. 7.12, respectively. Finally, for the attenuation mismatch scenario, we change the attenuation to αp = 0.2αm. Note that the attenuation parameter is not significant in producing CD MPM, so we create a large mismatch here for illustrative purposes. The corresponding simulation results are shown in Fig. 7.13, Fig. 7.14, and Fig. 7.15, respectively.

[Figure 7.6: Gain mismatch: Adaptive CD MPC control results]

In summary, for these three scenarios, our adaptive control algorithm is able to detect the mismatch even with routine operating data. Closed-loop excitation signals are injected into the system once the mismatch is observed. Careful examination of the parameter estimates reveals that during the identification stage the parameters can be accurately identified thanks to the high-quality excitation signal. We average the parameter estimates within that period to obtain a relatively reliable estimate of the true parameters and deploy it to the CD controller. In most cases the input and output variations are reduced after retuning, which implies better control performance.

7.4 Case IV: Time constant mismatch

In the last case study we focus on the temporal parametric mismatch, i.e., a time constant mismatch. Throughout this study we assume that the time delay is known a priori and is specified to the identification algorithm.
The true time constant of the plant remains unchanged for the first 2000 samples. After that we increase it to twice the true parameter value while keeping the time constant of the process model unchanged. Fig. 7.16, Fig. 7.17, and Fig. 7.18 show the corresponding simulation results. As in the other simulations above, the mismatch detection algorithm is able to detect the mismatch and subsequently correct it. The increase in the performance index and the reduction of the input and output variance show that the re-tuned controller improves the control performance.

7.5 Discussion of the simulation results

From the case studies above, one can see that in the normal case both the user-specified benchmark and the SVM MPM detection method give consistent indications of good control performance, without any false MPM alarms. In the gain mismatch case, since the mismatch is significant, the user-specified benchmark detects the deterioration in control performance; correspondingly, the MPM detection method finds the change in the process gain. Both indications suggest triggering an identification experiment, during which the output shows large fluctuations. Closed-loop identification is thus disruptive to continuous operation, but with our adaptive control scheme the operator does not need to intentionally initiate a batch identification experiment and instead only introduces a dither signal, avoiding the cost of a batch identification experiment. After re-tuning the controller, the control performance recovers to an excellent level. For the other three spatial mismatches, the SVM MPM detection approach is rather sensitive to the mismatches, while the performance index is not evidently affected.
This suggests that despite the presence of these mismatches, the inherent robustness of the CD MPC handles them to a great extent, and the control performance remains satisfactory. For illustrative purposes, in our examples we let these mismatches trigger the subsequent identifications. In practice, however, since the control performance is still high, it is not necessary to start an identification experiment under such scenarios. These examples imply that combining the performance index with the MPM detection results is more advisable in practice. For the last case study on the time constant mismatch, notice that we intentionally introduce a large mismatch, yet the control performance index still remains high. This coincides with observations in the literature that the impact of a time constant mismatch on control performance is much smaller than the impacts of the other mismatches. A plausible explanation is that the performance index is based on steady-state operating data, where the effect of the time constant on control performance is not significant.

7.6 Summary

This chapter presents several examples from CD and MD paper machine simulators to demonstrate the adaptive control framework. It is observed that the proposed control performance assessment and MPM detection approaches are applicable to routine operating data. Specifically, the user-specified performance index can detect significant mismatches, while the SVM technique is sensitive to parametric mismatches. Therefore, a combination of these two techniques is recommended to raise alarms and trigger the subsequent closed-loop identification experiment. In addition, it is shown that the proposed closed-loop identification can be applied continuously in both the routine operating stage and the experimental stage.
The control performance can be recovered after re-tuning the MPC based on the updated process models.

[Figure 7.7: Width mismatch: The collected output and input profiles]

[Figure 7.8: Width mismatch: Estimates of parameters in each moving window]

[Figure 7.9: Width mismatch: Adaptive CD MPC control results]

[Figure 7.10: Divergence mismatch: The collected output and input profiles]

[Figure 7.11: Divergence mismatch: Estimates of parameters in each moving window]
[Figure 7.12: Divergence mismatch: Adaptive CD MPC control results]

[Figure 7.13: Attenuation mismatch: The collected output and input profiles]

[Figure 7.14: Attenuation mismatch: Estimates of parameters in each moving window]
[Figure 7.15: Attenuation mismatch: Adaptive CD MPC control results]

[Figure 7.16: Time constant mismatch: The collected output and input profiles]

[Figure 7.17: Time constant mismatch: Estimates of parameters in each moving window]

[Figure 7.18: Time constant mismatch: Adaptive CD MPC control results]

Chapter 8

Conclusions

This dissertation has detailed an adaptive MPC framework for both MD and CD processes.
This entire adaptive control scheme consists of four building blocks, namely performance assessment, MPM detection, optimal input design, and closed-loop identification. For MD processes, which can be represented by SISO or low-dimensional models, we concentrate mainly on the proposed MPM detection and closed-loop identification methods. For CD processes, we extend the closed-loop identification and MPM detection methods to account for the high dimensions and structural features of the CD process. In addition, we propose a novel performance assessment technique and an optimal input design approach that are computationally efficient and effective for adaptive control.

Specifically, for the closed-loop identification of MD processes, we present a novel ARX-OE method that is suitable for both routine operating data and experimental data. The proposed ARX-OE method avoids the bias caused by misspecification of the noise model in the direct identification method. The method consists of two major steps. In the first step, the original model is approximated by a high-order ARX model and closed-loop identification is performed for this ARX model. In the second step, an OE identification is conducted on input-output data filtered with the ARX model estimated in the first step. Theoretical analysis shows that the proposed ARX-OE method provides consistent parameter estimates. We also present a thorough analysis of the informativeness of closed-loop data for this method.

For MD MPM detection, we apply the proposed ARX-OE method to routine operating data (split into training and test sections) to obtain rough estimates of the process and noise models. Model estimates from the training section are regarded as the normal, mismatch-free cluster, and a one-class SVM model is formed to describe this cluster. Model estimates from the test data are then examined by the SVM model to forecast the occurrence of MPM.
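The one-class SVM scheme just summarized can be sketched as follows, using scikit-learn (an assumed library choice; the thesis does not prescribe one). A simple bootstrap with jitter stands in for the proposed resampling step, and the toy gain values are hypothetical.

```python
# Illustrative sketch: per-window parameter estimates from the training
# section form the normal cluster; a bootstrap with small jitter mitigates
# the small-sample problem; test-window estimates are then scored, with
# negative decision-function values suggesting MPM.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

def train_mpm_detector(train_estimates, n_resampled=200, nu=0.05):
    """Bootstrap-resample the (small) training set, then fit a one-class SVM."""
    X = np.asarray(train_estimates, dtype=float)
    X_boot = X[rng.integers(0, len(X), size=n_resampled)]
    X_boot = X_boot + 1e-3 * rng.standard_normal(X_boot.shape)  # small jitter
    return OneClassSVM(nu=nu, kernel="rbf", gamma="scale").fit(X_boot)

# Toy usage: gain estimates clustered near 0.38; a doubled gain scores lower.
detector = train_mpm_detector(rng.normal(0.38, 0.01, size=(20, 1)))
scores = detector.decision_function(np.array([[0.38], [0.76]]))
```

A point near the training cluster receives a higher score than one far from it, so a run of negative scores on test windows plays the role of the MPM indicator described above.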
This scheme not only applies to routine data, but also discriminates MPM from noise model changes, thus significantly reducing false alarms. A resampling technique is applied prior to the one-class SVM training to address the small-sample problem associated with the training data.

For the CD process, we first demonstrate a practical approach to controller performance assessment as an extension of the MVC benchmark to large-scale systems. This CD MVC approach accounts for both the spatial and temporal performance limitations associated with CD processes. Furthermore, to overcome the aggressiveness of this benchmark, we incorporate information about the tuning status of the CD controller, in both the spatial and temporal directions, into a user-specified benchmark. An algorithm based on structured multivariate time series modeling is put forward to estimate the proposed CD performance index. The simple computations involved in this algorithm make the proposed method applicable to online performance monitoring.

We further extend the MD MPM detection technique to CD processes. The main contribution is a novel CD closed-loop identification method based on separable least squares and the closed-loop ARX-OE method. Conditions on the informativeness of closed-loop data are illustrated, showing that most practical situations (e.g., complicated MPC constraints, the presence of various disturbances, and large time delays) provide sufficiently informative data during the routine operating stage. Furthermore, asymptotic analysis of the CD closed-loop identification shows that the proposed algorithm converges to local minima and supplies consistent parameter estimates.
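The two-step ARX-OE idea referenced above can be sketched as follows for a first-order example; the plant, model orders, and use of numpy/scipy are illustrative assumptions rather than the thesis implementation.

```python
# Schematic sketch of the two-step ARX-OE method: (1) fit a high-order ARX
# model by linear least squares, (2) filter u and y by the estimated ARX
# A-polynomial and fit a low-order output-error (OE) model to the filtered
# data. The first-order system below is a toy example.
import numpy as np
from scipy.optimize import least_squares
from scipy.signal import lfilter

rng = np.random.default_rng(1)
N, b0, f0 = 2000, 0.5, 0.8                   # true G(q) = b0*q^-1 / (1 - f0*q^-1)
u = rng.standard_normal(N)
y = lfilter([0.0, b0], [1.0, -f0], u) + 0.1 * rng.standard_normal(N)

# Step 1: high-order ARX, y(t) = -a1*y(t-1) - ... + b1*u(t-1) + ..., via lstsq.
na = nb = 10
Phi = np.array([np.concatenate([-y[t - na:t][::-1], u[t - nb:t][::-1]])
                for t in range(na, N)])
theta = np.linalg.lstsq(Phi, y[na:], rcond=None)[0]
a_poly = np.concatenate([[1.0], theta[:na]])  # A(q) = 1 + a1*q^-1 + ...

# Step 2: prefilter both signals by A(q), then fit a first-order OE model.
yf = lfilter(a_poly, [1.0], y)
uf = lfilter(a_poly, [1.0], u)

def residual(p):
    b, f = p
    return lfilter([0.0, b], [1.0, -f], uf) - yf  # simulated output minus data

b_hat, f_hat = least_squares(residual, x0=[0.1, 0.1],
                             bounds=([-2.0, -0.99], [2.0, 0.99])).x
```

Because both u and y are filtered by the same estimated A(q), the input-output relation is preserved while the noise is approximately whitened, which is what makes the second-step OE estimates consistent.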
The SVM-based CD MPM detection method is then proposed; it can monitor the spatial, temporal, and noise models separately and operate on routine data in the absence of external excitation.

For CD closed-loop optimal input design, we propose a method based on noncausal modeling of CD processes. This method effectively addresses the complexity of input design for spatially distributed systems arising from the large number of parameters. With noncausal modeling, a low-order scalar noncausal transfer function can be used to approximate the original high-dimensional CD model. It is shown that the covariance matrix of the parameter estimates for this noncausal model is equivalent to that of a causal model. With this observation, a closed-loop optimal input design framework is proposed based on the causal-equivalent model, which greatly reduces the difficulty of the design.

Finally, the proposed performance assessment, MPM detection, optimal input design, and closed-loop identification methods are extensively tested on the CD and MD simulators. These simulators provide high-fidelity simulations of the practical operation of paper machines. The overall adaptive control scheme is also examined through examples simulating various spatial and temporal parametric mismatches. The simulation results illustrate that the proposed framework can automatically account for process changes without human intervention. With this adaptive control scheme, industry can thus save enormous costs on the maintenance of model-based controllers. It is noted that the proposed scheme can also be easily transferred to other industrial processes.

Recommendations for future studies include further improvements to the closed-loop identification techniques for routine operating data, especially when the controller is simple, e.g., a PID controller.
A promising direction is to find more advanced and succinct representations than high-order ARX models to relax the requirements on the informativeness of closed-loop data. For example, generalized orthonormal basis filter (GOBF) ARX models may serve as a better alternative to high-order ARX models. Practical measures such as using a faster sampling rate to increase the discrete time delay can also enhance the quality of routine data. For closed-loop input design of noncausal CD processes, time-domain approaches, especially recent techniques based on graph theory, can be explored to satisfy physical constraints on the inputs and outputs. Although there is potential for further improvement, our adaptive control scheme has reached the stage of automatically monitoring the process and re-tuning the controller with feedback control in the loop and without user intervention.
IEEE Transactions on Control SystemsTechnology, 7(1):12–21, 1999.[8] Sigurd Skogestad, Manfred Morari, and John C. Doyle. Robust control of ill-conditioned plants:High-purity distillation. IEEE Transactions on Automatic Control, 33(12):1092–1105, 1988.[9] Apostolos Rigopoulos and Yaman Arkun. Reduced order cross-directional controller design for sheetforming processes. IEEE Transactions on Control Systems Technology, 11(5):746–756, 2003.130Bibliography[10] Dimitry Gorinevsky, Robert Vyse, and Michael Heaven. Performance analysis of cross-directionprocess control using multivariable and spectral models. IEEE Transactions on Control Systems Tech-nology, 8(4):589–600, 2000.[11] Gregory E. Stewart, Dimitry M. Gorinevsky, and Guy A. Dumont. Feedback controller design fora spatially distributed system: The paper machine problem. IEEE Transactions on Control SystemsTechnology, 11(5):612–628, 2003.[12] Junqiang Fan. Model predictive control for multiple cross-directional processes: Analysis, tuning andimplementation. PhD thesis, The University of British Columbia, Vancouver, Canada, 2003.[13] Andrew P. Featherstone, Jeremy G. VanAntwerp, and Richard D. Braatz. Identification and Controlof Sheet and Film Processes. Springer Science & Business Media, 2000.[14] Jeremy G. VanAntwerp, Andrew P. Featherstone, Richard D. Braatz, and Babatunde A. Ogunnaike.Cross-directional control of sheet and film processes. Automatica, 43(2):191–211, 2007.[15] Lennart Ljung. System Identification: Theory for the User. Springer, 1999.[16] Michel Gevers, Xavier Bombois, Roland Hildebrand, and Gabriel Solari. Optimal experiment de-sign for open and closed-loop system identification. Communications in Information and Systems,11(3):197–224, 2011.[17] S. Joe Qin. Control performance monitoring – A review and assessment. Computers & ChemicalEngineering, 23(2):173–186, 1998.[18] Biao Huang and Sirish L. Shah. 
Performance Assessment of Control Loops: Theory and Applications.Springer Science & Business Media, 1999.[19] T.J. Harris, C.T. Seppala, and L.D. Desborough. A review of performance monitoring and assessmenttechniques for univariate and multivariate control systems. Journal of Process Control, 9(1):1–17,1999.[20] Sigurd Skogestad and Ian Postlethwaite. Multivariable Feedback Control: Analysis and Design, vol-ume 2. Wiley New York, 2007.131Bibliography[21] Thomas J. Harris. Assessment of control loop performance. The Canadian Journal of ChemicalEngineering, 67(5):856–861, 1989.[22] Thomas J. Harris, F. Boudreau, and John F. MacGregor. Performance assessment of multivariablefeedback controllers. Automatica, 32(11):1505–1518, 1996.[23] Jie Yu and S. Joe Qin. Statistical MIMO controller performance monitoring. Part II: Performancediagnosis. Journal of Process Control, 18(3):297–319, 2008.[24] C.B. Lynch and Guy A. Dumont. Control loop performance monitoring. IEEE transactions on ControlSystems Technology, 4(2):185–192, 1996.[25] Biao Huang, Sirish L. Shah, K. Ezra Kwok, and Jim Zurcher. Performance assessment of multi-variate control loops on a paper-machine headbox. The Canadian Journal of Chemical Engineering,75(1):134–142, 1997.[26] N.F. Thornhill, M. Oettinger, and P. Fedenczuk. Refinery-wide control loop performance assessment.Journal of Process Control, 9(2):109–124, 1999.[27] M.J. Grimble. Generalized minimum variance control law revisited. Optimal Control Applicationsand Methods, 9(1):63–77, 1988.[28] M.J. Grimble and P. Majecki. Weighting selection for controller benchmarking and tuning. IndustrialControl Centre, University of Strathclyde, Technical Report PAM-12-TN-1-V1, Glasgow, 2004.[29] Andrzej Ordys, Damien Uduehi, and Michael A. Johnson. Process Control Performance Assessment:from Theory to Implementation. Springer Science & Business Media, 2007.[30] Mohieddine Jelali. 
Control Performance Management in Industrial Automation: Assessment, Diag-nosis and Improvement of Control Loop Performance. Springer Science & Business Media, 2012.[31] Byung-Su Ko and Thomas F. Edgar. Performance assessment of constrained model predictive controlsystems. AIChE Journal, 47(6):1363–1371, 2001.[32] Ashraf AlGhazzawi and Barry Lennox. Model predictive control monitoring using multivariate statis-tics. Journal of Process Control, 19(2):314–327, 2009.132Bibliography[33] Stephen R. Duncan, Guy A. Dumont, and Dimitry M. Gorinevsky. Evaluating the performance ofcross-directional control systems. In Proceedings of the 1999 American Control Conference, vol-ume 5, pages 3092–3096, 1999.[34] Stephen R. Duncan, Guy A. Dumont, and Dimitry M. Gorinevsky. Performance monitoring for cross-directional control systems. Proceedings of the Control Systems Conference, pages 173–177, 2000.[35] A.R. Taylor and Stephen R. Duncan. Bayesian methods for control loop performance assessment incross-directional control. IFAC Proceedings Volumes, 38(1):434–439, 2005.[36] Mahdi Yousefi. Performance assessment and online input design for closed-loop identification of ma-chine directional properties on paper machines. Master Thesis, The University of British Columbia,Vancouver, Canada, 2014.[37] Siyun Wang and Michael Baldea. Autocovariance-based MPC model mismatch estimation for SISOsystems. In 2015 the 54th IEEE Conference on Decision and Control (CDC), pages 3032–3037, 2015.[38] Viviane Botelho, Jorge Ota´vio Trierweiler, Marcelo Farenzena, and Ricardo Duraiski. Perspectivesand challenges in performance assessment of model predictive control. The Canadian Journal ofChemical Engineering, 94(7):1225–1241, 2016.[39] Rohit S. Patwardhan and Sirish L. Shah. Issues in performance diagnostics of model-based controllers.Journal of Process Control, 12(3):413–427, 2002.[40] Nives Stanfelj, Thomas E. Marlin, and John F. MacGregor. 
Monitoring and diagnosing process controlperformance: The single-loop case. Industrial & Engineering Chemistry Research, 32(2):301–314,1993.[41] Parthasarathy Kesavan and Jay H. Lee. Diagnostic tools for multivariable model-based control sys-tems. Industrial & Engineering Chemistry Research, 36(7):2725–2738, 1997.[42] Fredrik Gustafsson and Stefan F. Graebe. Closed-loop performance monitoring in the presence ofsystem changes and disturbances. Automatica, 34(11):1311–1326, 1998.[43] Biao Huang. Multivariable model validation in the presence of time-variant disturbance dynamics.Chemical Engineering Science, 55(20):4583–4595, 2000.133Bibliography[44] Michele Basseville. On-board component fault detection and isolation using the statistical local ap-proach. Automatica, 34(11):1391–1415, 1998.[45] Biao Huang, Ashish Malhotra, and Edgar C. Tamayo. Model predictive control relevant identificationand validation. Chemical Engineering Science, 58(11):2389–2401, 2003.[46] Qiang Zhang and Shao-Yuan Li. Performance monitoring and diagnosis of multivariable model pre-dictive control using statistical analysis. Chinese Journal of Chemical Engineering, 14(2):207–215,2006.[47] Jeremy S. Conner and Dale E. Seborg. Assessing the need for process re-identification. Industrial &Engineering Chemistry Research, 44(8):2767–2775, 2005.[48] Abhijit S. Badwe, Ravindra D. Gudi, Rohit S. Patwardhan, Sirish L. Shah, and Sachin C. Patwardhan.Detection of model-plant mismatch in MPC applications. Journal of Process Control, 19(8):1305–1313, 2009.[49] Christopher A. Harrison and S. Joe Qin. Discriminating between disturbance and process modelmismatch in model predictive control. Journal of Process Control, 19(10):1610–1616, 2009.[50] Abhijit S. Badwe, Rohit S. Patwardhan, Sirish L. Shah, Sachin C. Patwardhan, and Ravindra D.Gudi. Quantifying the impact of model-plant mismatch on controller performance. Journal of ProcessControl, 20(4):408–425, 2010.[51] Zhijie Sun, S. 
Joe Qin, Ashish Singhal, and Larry Megan. Performance monitoring of model-predictive controllers via model residual assessment. Journal of Process Control, 23(4):473–482, 2013.
[52] Suraj Yerramilli and Arun K. Tangirala. Detection and diagnosis of model-plant mismatch in MIMO systems using plant-model ratio. IFAC-PapersOnLine, 49(1):266–271, 2016.
[53] Graham C. Goodwin and Robert L. Payne. Dynamic System Identification: Experiment Design and Data Analysis. Academic Press, 1977.
[54] Raman K. Mehra. Optimal input signals for parameter estimation in dynamic systems–Survey and new results. IEEE Transactions on Automatic Control, 19(6):753–768, 1974.
[55] Martin B. Zarrop. Optimal experiment design for dynamic system identification. PhD thesis, Imperial College London, London, UK, 1977.
[56] Lennart Ljung and Zhen-Dong Yuan. Asymptotic properties of black-box identification of transfer functions. IEEE Transactions on Automatic Control, 30(6):514–530, 1985.
[57] Lennart Ljung. Asymptotic variance expressions for identified black-box transfer function models. IEEE Transactions on Automatic Control, 30(9):834–844, 1985.
[58] Michel Gevers and Lennart Ljung. Optimal experiment designs with respect to the intended model application. Automatica, 22(5):543–554, 1986.
[59] Håkan Hjalmarsson, Michel Gevers, and Franky D. Bruyne. For model-based control design, closed-loop identification gives better performance. Automatica, 32(12):1659–1673, 1996.
[60] Stephen P. Boyd, Laurent E.I. Ghaoui, Eric Feron, and Venkataramanan Balakrishnan. Linear Matrix Inequalities in System and Control Theory. SIAM, 1994.
[61] Xavier Bombois, Håkan Hjalmarsson, and Gérard Scorletti. Identification for robust H2 deconvolution filtering. Automatica, 46(3):577–584, 2010.
[62] Henrik Jansson and Håkan Hjalmarsson. Input design via LMIs admitting frequency-wise model specifications in confidence regions. IEEE Transactions on Automatic Control, 50(10):1534–1549, 2005.
[63] Henrik Jansson.
Experiment design with applications in identification for control. PhD thesis, Royal Institute of Technology (KTH), 2004.
[64] Afrooz Ebadat. On application oriented experiment design for closed-loop system identification. PhD thesis, Kungliga Tekniska högskolan, 2015.
[65] Patricio E. Valenzuela, Cristian R. Rojas, and Håkan Hjalmarsson. A graph theoretical approach to input design for identification of nonlinear dynamical models. Automatica, 51:233–242, 2015.
[66] R. Isermann and M. Münchhof. Identification of Dynamic Systems: An Introduction with Applications. Springer, Berlin and Heidelberg, 2011.
[67] Yucai Zhu and Przemyslaw Stec. Simple control-relevant identification test methods for a class of ill-conditioned processes. Journal of Process Control, 16(10):1113–1120, 2006.
[68] Osmel R. Vaillant, Andre S.R. Kuramoto, and Claudio Garcia. Effectiveness of signal excitation design methods for identification of ill-conditioned and highly interactive processes. Industrial & Engineering Chemistry Research, 52(14):5120–5135, 2013.
[69] Mark L. Darby and Michael Nikolaou. Identification test design for multivariable model-based control: An industrial perspective. Control Engineering Practice, 22:165–180, 2014.
[70] Andrea Micchi and Gabriele Pannocchia. Comparison of input signals in subspace identification of multivariable ill-conditioned systems. Journal of Process Control, 18(6):582–593, 2008.
[71] Mohammed Ammar and Guy A. Dumont. Input design minimizing the µ-gap in cross-directional models of paper machines. In Proceedings of the 2006 American Control Conference, pages 3795–3800, 2006.
[72] Danlei Chu, J.U. Backström, C. Gheorghe, A. Lahouaoula, and C. Chung. Intelligent closed loop CD alignment. In Proceedings of Control Systems, pages 161–166, 2010.
[73] Urban Forssell and Lennart Ljung. Closed-loop identification revisited. Automatica, 35(7):1215–1241, 1999.
[74] Wee S. Lee, Brian D. Anderson, Iven M. Mareels, and Robert L. Kosut.
On some key issues in the windsurfer approach to adaptive robust control. Automatica, 31(11):1619–1636, 1995.
[75] Jiandong Wang, Tongwen Chen, and Biao Huang. Closed-loop identification via output fast sampling. Journal of Process Control, 14(5):555–570, 2004.
[76] Michel Gevers, Lennart Ljung, and Paul Van den Hof. Asymptotic variance expressions for closed-loop identification. Automatica, 37(5):781–786, 2001.
[77] Ali Esmaili, John F. MacGregor, and Paul A. Taylor. Direct and two-step methods for closed-loop identification: a comparison of asymptotic and finite data set performance. Journal of Process Control, 10(6):525–537, 2000.
[78] Ivar Gustavsson, Lennart Ljung, and Torsten Söderström. Identification of processes in closed loop–identifiability and accuracy aspects. Automatica, 13(1):59–75, 1977.
[79] Paul M.J. Van den Hof and Ruud J.P. Schrama. An indirect method for transfer-function estimation from closed-loop data. Automatica, 29(6):1523–1527, 1993.
[80] Urban Forssell and Lennart Ljung. A projection method for closed-loop identification. IEEE Transactions on Automatic Control, 45(11):2101–2106, 2000.
[81] Luis G. Bergh and John F. MacGregor. Spatial control of sheet and film forming processes. The Canadian Journal of Chemical Engineering, 65(1):148–155, 1987.
[82] R. Bhushan Gopaluni, Philip D. Loewen, Mohammed Ammar, Guy A. Dumont, and Michael S. Davies. Identification of symmetric noncausal processes: cross-directional response modelling of paper machines. In Proceedings of the 45th IEEE Conference on Decision and Control, pages 6744–6749, 2006.
[83] Dimitry M. Gorinevsky and Cristian Gheorghe. Identification tool for cross-directional processes. IEEE Transactions on Control Systems Technology, 11(5):629–640, 2003.
[84] Mohammed E. Ammar and Guy A. Dumont. Automatic tuning of robust constrained cross-direction controllers. International Journal of Adaptive Control and Signal Processing, 30(11):1550–1567, 2016.
[85] Lee Rippon.
Sheet profile estimation and machine direction adaptive control. Master's thesis, The University of British Columbia, Vancouver, Canada, 2017.
[86] Yuri A.W. Shardt and Biao Huang. Closed-loop identification condition for ARMAX models using routine operating data. Automatica, 47(7):1534–1537, 2011.
[87] Abhijit S. Badwe, Sachin C. Patwardhan, and Ravindra D. Gudi. Closed-loop identification using direct approach and high order ARX/GOBF-ARX models. Journal of Process Control, 21(7):1056–1071, 2011.
[88] Lennart Ljung and Bo Wahlberg. Asymptotic properties of the least-squares method for estimating transfer functions and disturbance spectra. Advances in Applied Probability, 24(2):412–440, 1992.
[89] Yucai Zhu and Håkan Hjalmarsson. The Box–Jenkins Steiglitz–McBride algorithm. Automatica, 65:170–182, 2016.
[90] Yucai Zhu. Multivariable process identification for MPC: the asymptotic method and its applications. Journal of Process Control, 8(2):101–115, 1998.
[91] Svante Björklund and Lennart Ljung. A review of time-delay estimation techniques. In Proceedings of the 42nd IEEE Conference on Decision and Control, volume 3, pages 2502–2507, 2003.
[92] S. Babji and Arun K. Tangirala. Time-delay estimation in closed-loop processes using average mutual information theory. Control & Intelligent Systems, 37(3):176–182, 2009.
[93] Michel Gevers, Alexandre Sanfelice Bazanella, and Xavier Bombois. Identification and the information matrix: How to get just sufficiently rich? IEEE Transactions on Automatic Control, 54(12):2828–2840, 2009.
[94] Torsten Söderström and Petre Stoica. System Identification. Prentice-Hall, Inc., 1988.
[95] Q. Li, J.R. Whiteley, and R.R. Rhinehart. A relative performance monitor for process controllers. International Journal of Adaptive Control and Signal Processing, 17(7-9):685–708, 2003.
[96] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
[97] Ingo Steinwart and Andreas Christmann.
Support Vector Machines. Springer Science & Business Media, 2008.
[98] Bernhard Schölkopf, John C. Platt, John Shawe-Taylor, Alex J. Smola, and Robert C. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 13(7):1443–1471, 2001.
[99] Osvaldo J. Rojas, Graham C. Goodwin, and André Desbiens. Study of an adaptive anti-windup strategy for cross-directional control systems. In Proceedings of the 41st IEEE Conference on Decision and Control, volume 2, pages 1331–1336, 2002.
[100] G.E. Stewart, J.U. Backstrom, P. Baker, C. Gheorghe, and R.N. Vyse. Controllability in cross-directional processes: Practical rules for analysis and design. In The 87th Annual Meeting, PAPTAC, pages 1–8, 2001.
[101] G.E. Stewart, J.U. Backström, P. Baker, C. Gheorghe, and R.N. Vyse. Controllability in cross-directional processes. Pulp & Paper Canada, 103(8):32–38, 2002.
[102] Zhijie Sun, Yu Zhao, and S. Joe Qin. Improving industrial MPC performance with data-driven disturbance modeling. In Proceedings of the 50th IEEE Conference on Decision and Control and European Control Conference (CDC-ECC), pages 1922–1927, 2011.
[103] Junqiang Fan, Gregory E. Stewart, and Guy A. Dumont. Two-dimensional frequency analysis for unconstrained model predictive control of cross-directional processes. Automatica, 40(11):1891–1903, 2004.
[104] Dimitry Gorinevsky, Stephen Boyd, and Gunter Stein. Design of low-bandwidth spatially distributed feedback. IEEE Transactions on Automatic Control, 53(1):257–272, 2008.
[105] Bassam Bamieh, Fernando Paganini, and Munther A. Dahleh. Distributed control of spatially invariant systems. IEEE Transactions on Automatic Control, 47(7):1091–1107, 2002.
[106] Helmut Lütkepohl. New Introduction to Multiple Time Series Analysis. Springer Science & Business Media, 2005.
[107] Yang Wang and Stephen Boyd. Performance bounds and suboptimal policies for linear stochastic control via LMIs.
International Journal of Robust and Nonlinear Control, 21(14):1710–1728, 2011.
[108] K.S. Narendra and P.G. Gallman. An iterative method for the identification of nonlinear systems using a Hammerstein model. IEEE Transactions on Automatic Control, 11(3):546–550, 1966.
[109] Fouad Giri and Er-Wei Bai. Block-oriented Nonlinear System Identification, volume 1. Springer, 2010.
[110] James B. Rawlings and I-Lung Chien. Gage control of film and sheet-forming processes. AIChE Journal, 42(3):753–766, 1996.
[111] Apostolos Rigopoulos, Yaman Arkun, and Ferhan Kayihan. Identification of full profile disturbance models for sheet forming processes. AIChE Journal, 43(3):727–739, 1997.
[112] Yucai Zhu. Estimation of an N–L–N Hammerstein–Wiener model. Automatica, 38(9):1607–1614, 2002.
[113] Alberto Bemporad, Manfred Morari, Vivek Dua, and Efstratios N. Pistikopoulos. The explicit linear quadratic regulator for constrained systems. Automatica, 38(1):3–20, 2002.
[114] Gene Golub and Victor Pereyra. Separable nonlinear least squares: the variable projection method and its applications. Inverse Problems, 19(2):R1–R26, 2003.
[115] Er-Wei Bai and Duan Li. Convergence of the iterative Hammerstein system identification algorithm. IEEE Transactions on Automatic Control, 49(11):1929–1940, 2004.
[116] Gene H. Golub and Victor Pereyra. The differentiation of pseudo-inverses and nonlinear least squares problems whose variables separate. SIAM Journal on Numerical Analysis, 10(2):413–432, 1973.
[117] Kaushik Mahata and Torsten Söderström. Large sample properties of separable nonlinear least squares estimators. IEEE Transactions on Signal Processing, 52(6):1650–1658, 2004.
[118] Qiugang Lu, Michael G. Forbes, R. Bhushan Gopaluni, Philip D. Loewen, Johan U. Backström, and Guy A. Dumont. Performance assessment of cross-directional control for paper machines. IEEE Transactions on Control Systems Technology, 25(1):208–221, 2017.
[119] Walter Rudin. Principles of Mathematical Analysis.
New York, NY: McGraw-Hill, Inc., 1976.
[120] Stephen R. Duncan and K.W. Corscadden. Mini-max control of cross-directional variations on a paper machine. IEE Proceedings-Control Theory and Applications, 145(2):189–195, 1998.
[121] Lennart Ljung. Convergence analysis of parametric identification methods. IEEE Transactions on Automatic Control, 23(5):770–783, 1978.

Appendix A

Proofs for Chapter 2

A.1 Proof of Theorem 2.4.1

Let us begin by defining an auxiliary term, $\varepsilon(t,\rho)$, as follows:
$$\varepsilon(t,\rho) = \left[G_0 - G(q,\rho)\right]u(t) + \frac{1}{A(q,\eta_0)}e(t).$$
Notice that, in the presence of feedback, one may consider $u(t)$ to be generated by passing the reference signal $r(t)$ and the external noise $e(t)$ through a set of uniformly stable filters. Note, from Assumptions 2.2.2 and 2.2.3, that $A(q,\eta_0)$ is inversely stable, and thus $\varepsilon(t,\rho)$ meets the conditions of Theorem 2B.1 in [15]. Therefore,
$$\varepsilon(t,\rho,\hat\eta_N) = A(q,\hat\eta_N)\varepsilon(t,\rho), \qquad \varepsilon(t,\rho,\eta_0) = A(q,\eta_0)\varepsilon(t,\rho),$$
and
$$V_N(\rho,\hat\eta_N) = \frac{1}{N}\sum_{t=1}^{N}\frac{1}{2}\left[A(q,\hat\eta_N)\varepsilon(t,\rho)\right]^2, \qquad V_N(\rho,\eta_0) = \frac{1}{N}\sum_{t=1}^{N}\frac{1}{2}\left[A(q,\eta_0)\varepsilon(t,\rho)\right]^2.$$
Furthermore, we can decompose the loss function as
$$V_N(\rho,\hat\eta_N) = \frac{1}{N}\sum_{t=1}^{N}\Big\{\frac{1}{2}\left[A(q,\eta_0)\varepsilon(t,\rho)\right]^2 + \frac{1}{2}\left[\big(A(q,\hat\eta_N)-A(q,\eta_0)\big)\varepsilon(t,\rho)\right]\left[A(q,\eta_0)\varepsilon(t,\rho)\right] + \frac{1}{2}\left[\big(A(q,\hat\eta_N)-A(q,\eta_0)\big)\varepsilon(t,\rho)\right]\left[A(q,\hat\eta_N)\varepsilon(t,\rho)\right]\Big\}.$$
Due to Theorem 2B.1 in [15], we have
$$\sup_{\rho\in\Omega_\rho}\left\|\frac{1}{N}\sum_{t=1}^{N}\frac{1}{2}\left[A(q,\eta_0)\varepsilon(t,\rho)\right]^2 - V(\rho,\eta_0)\right\| \to 0, \quad \text{w.p.1, as } N\to\infty,$$
where
$$V(\rho,\eta_0) = \mathrm{E}\,\frac{1}{2}\left[A(q,\eta_0)\varepsilon(t,\rho)\right]^2.$$
Since $\varepsilon(t,\rho)$ is bounded uniformly, from Lemma 2.3.1 it follows that
$$\sup_{\rho\in\Omega_\rho}\left\|\frac{1}{N}\sum_{t=1}^{N}\left[\big(A(q,\hat\eta_N)-A(q,\eta_0)\big)\varepsilon(t,\rho)\right]\left[A(q,\eta_0)\varepsilon(t,\rho)\right]\right\| \to 0, \quad \text{w.p.1, as } N\to\infty,$$
$$\sup_{\rho\in\Omega_\rho}\left\|\frac{1}{N}\sum_{t=1}^{N}\left[\big(A(q,\hat\eta_N)-A(q,\eta_0)\big)\varepsilon(t,\rho)\right]\left[A(q,\hat\eta_N)\varepsilon(t,\rho)\right]\right\| \to 0, \quad \text{w.p.1, as } N\to\infty.$$
Applying the triangle inequality yields
$$\sup_{\rho\in\Omega_\rho}\left\|V_N(\rho,\hat\eta_N)-V(\rho,\eta_0)\right\| \le \sup_{\rho\in\Omega_\rho}\left\|\frac{1}{N}\sum_{t=1}^{N}\frac{1}{2}\left[A(q,\eta_0)\varepsilon(t,\rho)\right]^2 - \mathrm{E}\,\frac{1}{2}\left[A(q,\eta_0)\varepsilon(t,\rho)\right]^2\right\| + \sup_{\rho\in\Omega_\rho}\left\|\frac{1}{2N}\sum_{t=1}^{N}\left[\big(A(q,\hat\eta_N)-A(q,\eta_0)\big)\varepsilon(t,\rho)\right]\left[A(q,\eta_0)\varepsilon(t,\rho)\right]\right\| + \sup_{\rho\in\Omega_\rho}\left\|\frac{1}{2N}\sum_{t=1}^{N}\left[\big(A(q,\hat\eta_N)-A(q,\eta_0)\big)\varepsilon(t,\rho)\right]\left[A(q,\hat\eta_N)\varepsilon(t,\rho)\right]\right\| \to 0, \quad \text{w.p.1, as } N\to\infty.$$
Hence the conclusion of Theorem 2.4.1 in (2.20) is achieved.

A.2 Proof of Theorem 2.4.3

Assume that in (2.14) the model structure $G(q,\rho)$ contains the true process model $G_0(q)$. Following a similar line to the proof of Theorem 9.1 in [15], the optimum $\hat\rho_N$ in (2.17) satisfies
$$V'_N(\hat\rho_N,\hat\eta_N) = 0.$$
Rewriting the above equation using a first-order Taylor expansion at $\rho_0$, we have
$$0 = V'_N(\rho_0,\hat\eta_N) + V''_N(\xi_N,\hat\eta_N)(\hat\rho_N-\rho_0), \quad \xi_N\in[\hat\rho_N,\rho_0].$$
Thus the asymptotic distribution of $\sqrt{N}(\hat\rho_N-\rho_0)$ can be determined by
$$\sqrt{N}(\hat\rho_N-\rho_0) = -\left[V''_N(\xi_N,\hat\eta_N)\right]^{-1}\sqrt{N}\,V'_N(\rho_0,\hat\eta_N). \tag{A.1}$$
In what follows we study the distribution of $\sqrt{N}\,V'_N(\rho_0,\hat\eta_N)$ first, followed by an analysis of the asymptotic behavior of $V''_N(\xi_N,\hat\eta_N)$.

(1) Asymptotic analysis of $\sqrt{N}\,V'_N(\rho_0,\hat\eta_N)$

From the definition (2.17) we have
$$-V'_N(\rho_0,\hat\eta_N) = \frac{1}{N}\sum_{t=1}^{N}\psi(t,\rho_0,\hat\eta_N)\varepsilon(t,\rho_0,\hat\eta_N), \tag{A.2}$$
where $\psi(t,\rho,\hat\eta_N)$ is the derivative of the predictor (2.15) with respect to the parameter $\rho$, i.e.,
$$\psi(t,\rho_0,\hat\eta_N) = G'_\rho(\rho_0)\,u(t,\hat\eta_N). \tag{A.3}$$
The specific expression of $\varepsilon(t,\rho_0,\hat\eta_N)$ in (A.2) is
$$\varepsilon(t,\rho_0,\hat\eta_N) = \frac{A(q,\hat\eta_N)}{A_0(q)}e(t). \tag{A.4}$$
Instead of studying $-\sqrt{N}\,V'_N(\rho_0,\hat\eta_N)$ in (A.2), we can analyze its asymptotic distribution by studying
$$\frac{1}{\sqrt{N}}\sum_{t=1}^{N}\psi(t,\rho_0,\eta_0)\varepsilon(t,\rho_0,\hat\eta_N), \tag{A.5}$$
since the mean-squared error between them approaches zero as $N$ goes to infinity. This can be verified based on Lemma 2.3.1.
From (2.24), the signal $u(t,\eta_0)$ can be represented by
$$u(t,\eta_0) = A_0(q)S_0(q)r(t) - S_0(q)K(q)e(t), \tag{A.6}$$
where $\{r(t)\}$ and $\{e(t)\}$ are independent sequences. It can be observed from (A.3) and (A.4) that
$$\frac{1}{\sqrt{N}}\sum_{t=1}^{N}\psi(t,\rho_0,\eta_0)\varepsilon(t,\rho_0,\hat\eta_N) = Z_1(N)+Z_2(N),$$
where
$$Z_1(N)=\frac{1}{\sqrt{N}}\sum_{t=1}^{N}\psi(t,\rho_0,\eta_0)e(t), \qquad Z_2(N)=\frac{1}{\sqrt{N}}\sum_{t=1}^{N}\psi(t,\rho_0,\eta_0)\frac{A(q,\hat\eta_N)-A_0(q)}{A_0(q)}e(t).$$
Note that the derivation of (9A.25) in [15] applies to both open-loop and closed-loop data. Thus we have
$$\mathrm{Cov}\left[\sqrt{N}\,V'_N(\rho_0,\hat\eta_N)\right] = Q, \qquad Q=\lim_{N\to\infty}N\cdot \mathrm{E}\left\{V'_N(\rho_0,\hat\eta_N)\left[V'_N(\rho_0,\hat\eta_N)\right]^T\right\}. \tag{A.7}$$
With the above notions, the $Q$ in (A.7) is shown to be
$$Q = \lim_{N\to\infty}\mathrm{E}\left[Z_1(N)+Z_2(N)\right]\left[Z_1(N)+Z_2(N)\right]^T = \lim_{N\to\infty}\mathrm{E}\left[Z_1(N)Z_1^T(N)\right]+\mathrm{E}\left[Z_2(N)Z_2^T(N)\right]+\mathrm{E}\left[Z_1(N)Z_2^T(N)\right]+\mathrm{E}\left[Z_2(N)Z_1^T(N)\right]. \tag{A.8}$$
For the first term of (A.8), substituting (A.6) into (A.3) gives
$$\psi(t,\rho_0,\eta_0)=\phi_r(t)-\phi_e(t), \tag{A.9}$$
where
$$\phi_r(t)=G'_\rho(\rho_0)S_0A(\eta_0)r(t), \tag{A.10}$$
$$\phi_e(t)=G'_\rho(\rho_0)KS_0e(t). \tag{A.11}$$
Notice that $\phi_r(t)$ is a deterministic signal. Furthermore, from our previous assumption, the process model $G_0$ always contains at least one delay, and so does $G'_\rho(\rho_0)$. Therefore, $\psi(t,\rho_0,\eta_0)$ is independent of $e(t)$. From Lemma 2.4.2, it is easy to see that
$$\lim_{N\to\infty}\mathrm{E}\left[Z_1(N)Z_1^T(N)\right] = \lim_{N\to\infty}\frac{\sigma_e^2}{N}\sum_{t=1}^{N}\mathrm{E}\left[\psi(t,\rho_0,\eta_0)\psi^T(t,\rho_0,\eta_0)\right] = \sigma_e^2\,\mathrm{E}\left[\psi(t,\rho_0,\eta_0)\psi^T(t,\rho_0,\eta_0)\right] = \sigma_e^2\,\mathrm{E}\left[\phi_r(t)\phi_r^T(t)\right]+\sigma_e^2\,\mathrm{E}\left[\phi_e(t)\phi_e^T(t)\right]. \tag{A.12}$$
For the second term of (A.8), based on a similar argument to that in [89], we consider the auto-covariance of
$$\tilde Z_2(N) = Z_2^{n(N)}\sqrt{N}\left(\hat\eta_N-\bar\eta_{n(N)}\right), \tag{A.13}$$
where
$$Z_2^{n(N)} = \mathrm{E}\left[\psi(t,\rho_0,\eta_0)\cdot\left[\Gamma^T_{n(N)}\;\;0_{1\times n(N)}\right]\frac{1}{A_0(q)}e(t)\right],$$
and $\hat\eta_N \to \bar\eta_{n(N)}$, w.p.1, as $N\to\infty$, since the mean-squared error between (A.13) and $Z_2(N)$ tends to zero as $N\to\infty$. Note that the matrix $Z_2^{n(N)}\in\mathbb{R}^{n_\rho\times n(N)}$ is deterministic and satisfies the condition of Theorem D.3 in [89]; thus Theorem D.3 can be immediately applied to (A.13). Before demonstrating the main result, let us first explore the specific expression of $Z_2^{n(N)}$.
From (A.9) we have
$$Z_2^{n(N)} = \mathrm{E}\left[\big(\phi_r(t)-\phi_e(t)\big)\cdot\left[\Gamma^T_{n(N)}\;\;0_{1\times n(N)}\right]\frac{1}{A_0(q)}e(t)\right] = -\mathrm{E}\left[\phi_e(t)\cdot\left[\Gamma^T_{n(N)}\;\;0_{1\times n(N)}\right]\frac{1}{A_0(q)}e(t)\right] = -\left[\mathrm{E}\left[G'_\rho(\rho_0)S_0Ke(t)\cdot\Gamma^T_{n(N)}H_0e(t)\right]\;\;0_{1\times n(N)}\right]. \tag{A.14}$$
Then from Theorem D.3 in [89], the asymptotic auto-covariance of $Z_2(N)$ is
$$\lim_{N\to\infty}\mathrm{E}\left[Z_2(N)Z_2^T(N)\right] = \lim_{N\to\infty}Z_2^{n(N)}\left[\bar R_{n(N)}\right]^{-1}\left(Z_2^{n(N)}\right)^T, \tag{A.15}$$
where $\bar R_n$ was defined in (2.27).
For the third term of (A.8), to simplify the analysis, we consider the cross-covariance between $Z_1(N)$ and
$$Z_2(N) = Z_2^{n(N)}\left[\bar R_{n(N)}\right]^{-1}\frac{1}{\sqrt{N}}\sum_{t=1}^{N}\varphi_t^{n(N)}e(t),$$
where $\varphi_t^{n(N)}$ was defined in (2.29). Therefore, the cross-covariance between $Z_2(N)$ and $Z_1^T(N)$ gives
$$\lim_{N\to\infty}\mathrm{E}\left[Z_2(N)Z_1^T(N)\right] = \lim_{N\to\infty}Z_2^{n(N)}\left[\bar R_{n(N)}\right]^{-1}\frac{1}{N}\sum_{t=1}^{N}\sum_{s=1}^{N}\mathrm{E}\left[\varphi_t^{n(N)}e(t)\cdot e(s)\psi^T(s,\rho_0,\eta_0)\right] = Z_2^{n(N)}\left[\bar R_{n(N)}\right]^{-1}\sigma_e^2\,\mathrm{E}\left[\varphi_t^{n(N)}\psi^T(t,\rho_0,\eta_0)\right]. \tag{A.16}$$
For the last term of (A.8), following the same line of derivation we have
$$\lim_{N\to\infty}\mathrm{E}\left[Z_1(N)Z_2^T(N)\right] = \lim_{N\to\infty}\frac{1}{N}\sum_{t=1}^{N}\sum_{s=1}^{N}\mathrm{E}\left[\psi(t,\rho_0,\eta_0)e(t)\cdot e(s)\left(\varphi_s^{n(N)}\right)^T\right]\left[\bar R_{n(N)}\right]^{-1}\left(Z_2^{n(N)}\right)^T = \sigma_e^2\,\mathrm{E}\left[\psi(t,\rho_0,\eta_0)\left(\varphi_t^{n(N)}\right)^T\right]\left[\bar R_{n(N)}\right]^{-1}\left(Z_2^{n(N)}\right)^T. \tag{A.17}$$
Adding the results of (A.12), (A.15), (A.16) and (A.17) yields the expression for $Q$ in (2.31). Following the procedure used to prove Theorem 9.1 in [15], it can be observed that, asymptotically, $\sqrt{N}\,V'_N(\rho_0,\hat\eta_N)$ is normally distributed, i.e.,
$$\sqrt{N}\,V'_N(\rho_0,\hat\eta_N) \sim \mathrm{As}\mathcal{N}(0,Q). \tag{A.18}$$

(2) Asymptotic analysis of $V''_N(\xi_N,\hat\eta_N)$

From (A.2)–(A.3) we have the following expression:
$$V''_N(\rho,\hat\eta_N) = \frac{1}{N}\sum_{t=1}^{N}\psi(t,\rho,\hat\eta_N)\psi^T(t,\rho,\hat\eta_N) - \frac{1}{N}\sum_{t=1}^{N}\frac{\partial}{\partial\rho}\psi(t,\rho,\hat\eta_N)\,\varepsilon(t,\rho,\hat\eta_N). \tag{A.19}$$
Expanding the first term in the above equation at $\eta_0$, similarly to the proof of Theorem 2.4.1, we obtain
$$\frac{1}{N}\sum_{t=1}^{N}\psi(t,\rho,\hat\eta_N)\psi^T(t,\rho,\hat\eta_N) = \frac{1}{N}\sum_{t=1}^{N}\psi(t,\rho,\eta_0)\psi^T(t,\rho,\eta_0) + \frac{1}{N}\sum_{t=1}^{N}\left\{\left[A(\hat\eta_N)-A(\eta_0)\right]\psi(t,\rho)\right\}\left\{\psi^T(t,\rho,\eta_0)\right\} + \frac{1}{N}\sum_{t=1}^{N}\left\{\psi(t,\rho,\eta_0)\right\}\left\{\left[A(\hat\eta_N)-A(\eta_0)\right]\psi^T(t,\rho)\right\}.$$
It follows that
$$\sup_{\rho\in\Omega_\rho}\left\|\frac{1}{N}\sum_{t=1}^{N}\psi(t,\rho,\hat\eta_N)\psi^T(t,\rho,\hat\eta_N) - \mathrm{E}\,\psi(t,\rho,\eta_0)\psi^T(t,\rho,\eta_0)\right\|\to 0, \quad \text{w.p.1, as } N\to\infty. \tag{A.20}$$
In an analogous way we have
$$\sup_{\rho\in\Omega_\rho}\left\|\frac{1}{N}\sum_{t=1}^{N}\frac{\partial}{\partial\rho}\psi(t,\rho,\hat\eta_N)\,\varepsilon(t,\rho,\hat\eta_N) - \mathrm{E}\,\frac{\partial}{\partial\rho}\psi(t,\rho,\eta_0)\,\varepsilon(t,\rho,\eta_0)\right\|\to 0, \quad \text{w.p.1, as } N\to\infty, \tag{A.21}$$
since $\xi_N\in[\hat\rho_N,\rho_0]$ and $\hat\rho_N\to\rho_0$, w.p.1, as $N\to\infty$ if the selected model structure $G(q,\rho)$ in (2.14) contains the true model $G_0(q)$. Therefore, we may conclude that $\xi_N\to\rho_0$ almost surely as the sample number tends to infinity. Combining (A.19)–(A.21) and noticing that $\varepsilon(t,\rho,\eta_0)=e(t)$ (thus $\mathrm{E}\,\frac{\partial}{\partial\rho}\psi(t,\rho,\eta_0)\varepsilon(t,\rho,\eta_0)=0$), we have
$$V''_N(\rho,\hat\eta_N)\to \mathrm{E}\left[\psi(t,\rho,\eta_0)\psi^T(t,\rho,\eta_0)\right], \quad \text{w.p.1, as } N\to\infty. \tag{A.22}$$
Combining the results of (A.1), (A.18) and (A.22), we obtain (2.30), where $Q(\hat\eta_N)$ is given by (2.31), and thus this theorem is proved.

Appendix B

Derivations for Chapter 4

B.1 Variance partition

The measured two-dimensional data set $Y\in\mathbb{R}^{m\times N}$ has the following structure:
$$Y = \begin{bmatrix} y_{11} & \cdots & y_{1j} & \cdots & y_{1N} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ y_{i1} & \cdots & y_{ij} & \cdots & y_{iN} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ y_{m1} & \cdots & y_{mj} & \cdots & y_{mN} \end{bmatrix}, \tag{B.1}$$
where each row of $Y$ refers to the $N$ measurements of one data box, while each column of $Y$ represents the measured profile at one scan across all the data boxes. The overall sample mean of the data set $Y$ is defined as
$$\bar Y = \frac{\sum_{i=1}^{m}\sum_{j=1}^{N} y_{ij}}{Nm}. \tag{B.2}$$
The total sample variance $\sigma_T^2$ can be calculated as
$$\sigma_T^2 = \frac{\sum_{i=1}^{m}\sum_{j=1}^{N}\left(y_{ij}-\bar Y\right)^2}{Nm-1}. \tag{B.3}$$
The CD sample variance $\sigma_{CD}^2$ can be calculated as
$$\sigma_{CD}^2 = \frac{\sum_{i=1}^{m}\left(\tilde Y_i-\bar Y\right)^2}{m-1}, \tag{B.4}$$
where $\tilde Y_i = \frac{1}{N}\sum_{j=1}^{N}y_{ij}$. The MD sample variance $\sigma_{MD}^2$ can be calculated as
$$\sigma_{MD}^2 = \frac{\sum_{j=1}^{N}\left(\tilde Y_j-\bar Y\right)^2}{N-1}, \tag{B.5}$$
where $\tilde Y_j = \frac{1}{m}\sum_{i=1}^{m}y_{ij}$. The residual sample variance $\sigma_{Res}^2$ is calculated as
$$\sigma_{Res}^2 = \sigma_T^2-\sigma_{CD}^2-\sigma_{MD}^2. \tag{B.6}$$

B.2 Derivation of (4.12)

For the disturbance model (4.5), from the Diophantine identity we have
$$\frac{C(q)}{A(q)} = F(q)+q^{-d}\frac{H(q)}{A(q)}, \tag{B.7}$$
where $F(q)$ and $H(q)$ are scalar polynomials, i.e.,
$$F(q) = f_0+f_1q^{-1}+\ldots+f_{d-1}q^{-d+1}, \tag{B.8}$$
$$H(q) = h_0+h_1q^{-1}+\ldots+h_{n_h}q^{-n_h}. \tag{B.9}$$
Considering the profile at time $t+d$ (supposing the current time is $t$), from (B.7) we have
$$y_r(t+d|t) = \hat y_r(t+d|t)+F(q)\phi e(t+d), \tag{B.10}$$
where $\hat y_r(t+d|t)$ represents the $d$-step-ahead prediction [120] at $t$, namely,
$$\hat y_r(t+d|t) = \frac{B(q)F(q)}{C(q)}Gu_r(t)+\frac{H(q)}{C(q)}y_r(t). \tag{B.11}$$
Note that the second term in (B.10) is the future profile that is unpredictable due to the time delay, which is controller invariant. The first term of (B.10) is controller-dependent, and by minimizing $\mathrm{E}\left[\hat y_r(t+d|t)\hat y_r^T(t+d|t)\right]$ we achieve the minimum variance of the residual profile. If the transfer matrix $G$ in (B.11) were square and invertible, it would be possible to find an input sequence such that $\hat y_r(t+d|t)=0$. However, due to its special structure, $G$ is often non-square, and hence the minimum variance of $\hat y_r(t+d|t)$ is achieved by setting
$$u_r(t) = -G^{\dagger}\frac{H(q)}{B(q)F(q)}y_r(t), \tag{B.12}$$
where $G^{\dagger} = (G^TG)^{-1}G^T$ is the pseudo-inverse of $G$. It should be noted that $G^{\dagger}$ indeed represents the MVC in the spatial direction due to the special structure of $G$ (refer to (4.6) and (4.11)), while $\frac{H(q)}{B(q)F(q)}$ denotes the temporal MVC due to the time delay. Therefore, (B.12) stands for the temporal and spatial MVC for the residual model (4.3).

Appendix C

Proofs for Chapter 5

C.1 Proof of Theorem 5.3.1

Defining
$$\Phi_y = \begin{bmatrix}\psi_{\bar y}(1)\\ \vdots \\ \psi_{\bar y}(N)\end{bmatrix},$$
it then follows from (5.15) that
$$\hat Y = \left[\Phi_y\;\;\Phi_u\right]\begin{bmatrix}a\\ Cb\end{bmatrix} = \left[\tilde\Phi_y\;\;\tilde\Phi_u\right]\begin{bmatrix}a\\ Bc\end{bmatrix}, \tag{C.1}$$
where $B = \mathrm{diag}\{b,\ldots,b\}$, and $\tilde\Phi_y$ and $\tilde\Phi_u$ are easily obtained by rearranging $\Phi_y$ and $\Phi_u$, respectively. The loss function (5.17) is expressed in a more compact form:
$$V_N(\theta) = \left\|Y-\hat Y\right\|_2^2 = \left\|Y-\left[\Phi_y\;\;\Phi_u\right]\begin{bmatrix}a\\ Cb\end{bmatrix}\right\|_2^2. \tag{C.2}$$
(i) The proof is an extension of Theorem IV.1 in [115]. Note that, to ease the notation, we will drop the hat on the parameter estimates from (5.20)–(5.22), but add a subscript "N" to stress the fact that these estimates are obtained from $N$ samples of data.
It is easy to see that through the iterative identification steps (5.20)–(5.22),
$$V_N(\theta_N^k) = V_N(a_N^k,b_N^k,c_N^k) \le V_N(a_N^k,b_N^k,c_N^{k-1}) \le V_N(a_N^{k-1},b_N^{k-1},c_N^{k-1}) = V_N(\theta_N^{k-1}).$$
In other words, $V_N(\theta_N^k)$ is nonincreasing. Despite the nonincreasing $V_N(\theta_N^k)$, without the normalization (5.21) it is possible that $b_N^k\to\infty$ and $c_N^k\to 0$ as $k\to\infty$, or vice versa, while the product $C_N^kb_N^k$ remains finite. The presence of the normalization step (5.21) eliminates this undesirable scenario. As a result, the sequence $\{b_N^k\}$ is bounded due to the normalization, and thus $\{c_N^k\}$ is bounded, since otherwise $V_N(\theta_N^k)$ would become unbounded, which contradicts the facts that $Z^N$ is bounded, all model structures in $\Omega$ are stable, and $V_N(\theta)$ is continuous in $\theta$ and nonincreasing.

Now let us concentrate on the convergence analysis of the iterative identification algorithm for large $N$. First we establish the statement that, when fixing the spatial parameter $c$, $V_N(\theta)$ is convex in the temporal parameters $\{a,b\}$, and vice versa. Based on (C.2), we have
$$V_N(a,b,c) = Y^TY - 2Y^T\Phi\begin{bmatrix}a\\Cb\end{bmatrix} + \begin{bmatrix}a\\Cb\end{bmatrix}^T\Phi^T\Phi\begin{bmatrix}a\\Cb\end{bmatrix}, \qquad \Phi=\left[\Phi_y\;\;\Phi_u\right]. \tag{C.3}$$
When the parameter $c$ is fixed, it is easy to derive that for $\lambda\in[0,1]$,
$$V_N\big(\lambda a_1+(1-\lambda)a_2,\;\lambda b_1+(1-\lambda)b_2,\;c\big) = \lambda V_N(a_1,b_1,c)+(1-\lambda)V_N(a_2,b_2,c) - \lambda(1-\lambda)\left(\begin{bmatrix}a_1\\Cb_1\end{bmatrix}-\begin{bmatrix}a_2\\Cb_2\end{bmatrix}\right)^T\Phi^T\Phi\left(\begin{bmatrix}a_1\\Cb_1\end{bmatrix}-\begin{bmatrix}a_2\\Cb_2\end{bmatrix}\right). \tag{C.4}$$
We mention that, due to Assumption 3, for any large $N$, $\Phi$ has full column rank, which renders $\Phi^T\Phi>0$. Therefore, with the fact that $a\neq 0$, $b\neq 0$, $c\neq 0$, (C.4) implies that
$$V_N\big(\lambda a_1+(1-\lambda)a_2,\;\lambda b_1+(1-\lambda)b_2,\;c\big)\le \lambda V_N(a_1,b_1,c)+(1-\lambda)V_N(a_2,b_2,c).$$
This verifies the convexity of $V_N(a,b,c)$ with respect to $\{a,b\}$. A similar conclusion can be reached for the convexity of $V_N(a,b,c)$ with respect to $c$. An immediate consequence of these statements is that each optimization in the iterative identification algorithm has a unique, closed-form solution. It will be shown below that, even though $V_N(\theta)$ may not be convex in $\theta$, our algorithm always converges to a local minimum.

Now define $\theta^k$ to be the parameter estimate at the $k$-th iteration. When the iterative identification algorithm (5.20)–(5.22) converges, i.e., $\theta^k\to\theta$ for some $\theta$, we need to show that $\nabla V_N(\theta)=0$. Suppose that $\nabla V_N(\theta)\neq 0$; then there always exists a direction (e.g., the negative gradient) along which $V_N(\theta)$ decreases. Because the solutions to (5.20) and (5.22) are unique, we can always minimize $V_N(\theta)$ sequentially to reach another point $\theta'$ such that $V_N(\theta')\le V_N(\theta)$. This contradicts the facts that $V_N(\theta)$ is nonincreasing and $\theta^k$ converges to $\theta$. Thus the statement in (i) holds.

(ii) Combining (5.6) and (5.9), one can see that the closed-loop system has the form
$$y(t) = f_S\big(t,y^{t-1},u^{t-1}\big)+e_o(t). \tag{C.5}$$
According to Assumption 1, the nonlinear closed-loop system (C.5) is exponentially stable, which satisfies S1–S3 in [121]. Moreover, with the parameterization (5.12), the one-step-ahead predictor (5.15) of the model is differentiable with respect to the parameter $\theta$, which implies condition M1 in [121]. Our selected quadratic criterion also meets the regularity condition C1 in that paper. As a result, Lemma 3.1 in [121] applies to our scenario, which indicates the validity of (5.29). As the convergence in (5.29) is uniform in $\theta\in\Omega$, (5.30) follows directly from (5.29).

(iii) From Assumptions 1 and 2, it follows that $\varepsilon(t,\theta^o)=e_o(t)$. Therefore,
$$V(\theta)-V(\theta^o) = \mathrm{E}\left[\varepsilon(t,\theta)-\varepsilon(t,\theta^o)\right]^T\varepsilon(t,\theta^o) + \mathrm{E}\left[\varepsilon(t,\theta)-\varepsilon(t,\theta^o)\right]^T\left[\varepsilon(t,\theta)-\varepsilon(t,\theta^o)\right]. \tag{C.6}$$
Note that $\varepsilon(t,\theta)-\varepsilon(t,\theta^o) = \hat y(t|\theta)-\hat y(t|\theta^o)$, which only depends on past input-output data and thus is not correlated with the current noise $\varepsilon(t,\theta^o)$. Hence, the first term in (C.6) is zero. The second term is always nonnegative, which means that $V(\theta)$ is always greater than $V(\theta^o)$ unless $\hat y(t|t-1,\theta)=\hat y(t|t-1,\theta^o)$. We now show that, under the informativeness condition on the closed-loop data in Assumption 3, $\hat y(t|\theta^*)=\hat y(t|\theta^o)$ implies that $\theta^*=\theta^o$. From Assumption 2 and (5.15) we know
$$\hat y(t|\theta^*)-\hat y(t|\theta^o) = \psi(t)\begin{bmatrix}a^*-a^o\\Cb^*-Cb^o\end{bmatrix}, \qquad \psi(t)=\left[\psi_y(t)\;\;\psi_{\bar u}(t-d)\right]. \tag{C.7}$$
Plugging this into the limit loss function (C.6) yields
$$\mathrm{E}\left\|\hat y(t|\theta^*)-\hat y(t|\theta^o)\right\|^2 = \begin{bmatrix}a^*-a^o\\Cb^*-Cb^o\end{bmatrix}^T\mathrm{E}\left[\psi^T(t)\psi(t)\right]\begin{bmatrix}a^*-a^o\\Cb^*-Cb^o\end{bmatrix}. \tag{C.8}$$
From Assumption 3a, we know that (5.26) holds for any large $N$. Thus, asymptotically, $\mathrm{E}\left[\psi_{\bar u}(t)\right]$ has full column rank. Moreover, according to (5.27), the columns of $\mathrm{E}\left[\psi_{\bar u}(t-d)\right]$ are linearly independent of those in $\mathrm{E}\left[\psi_{\bar y}(t)\right]$. This leads to the statement that $\mathrm{E}\left[\psi^T(t)\psi(t)\right]$ has full rank. Therefore, the only situation making $V(\theta^*)-V(\theta^o)=0$ is $a^*=a^o$, $Cb^*=Cb^o$. Note that from the rescaling (step 6) of the algorithm, we arrive at $C^*=C^o$, $b^*=b^o$, which ends the proof of (5.31).

