UBC Theses and Dissertations
Sheet profile estimation and machine direction adaptive control. Rippon, Lee. 2017.

Sheet Profile Estimation and Machine Direction Adaptive Control

by

Lee Rippon

B.A.Sc., The University of British Columbia, 2014

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in The Faculty of Graduate and Postdoctoral Studies (Chemical and Biological Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

April 2017

© Lee Rippon 2017

Abstract

Sheet and film process control is often structured such that separate controllers and actuators are dedicated to either the temporal (i.e., machine direction) variations or the spatial (i.e., cross direction) variations. The dedicated machine direction (MD) and cross direction (CD) controllers require separate measurements of the MD and CD sheet property profiles, respectively. The current industrial standard involves a traversing sensor that acquires a signal containing both MD and CD property variations. The challenge then becomes how to extract separate MD and CD profiles from the mixed signal. Numerous techniques have been proposed, but ultimately the traditional exponential filtering method continues to be the industrial standard. A more recent technique, compressive sensing, appears promising, but previous developments do not address the relevant industrial constraints. In the first part of this thesis the compressive sensing technique is developed further, specifically with regard to feasibility of implementation. A comparative analysis is performed to determine the benefits and drawbacks of the proposed method.

Model-based control has gained widespread acceptance in a variety of industrial processes. To ensure adequate performance, these model-based controllers require a model that accurately represents the true process. However, the true process changes over time as a result of the varying operating conditions and physical characteristics of the process.
In part two of this thesis an integrated adaptive control strategy is introduced for the multi-input multi-output MD process of a paper machine. This integrated framework consists of process monitoring, input design and system identification techniques developed in collaboration with multiple colleagues. The goal of this work is to unify these efforts and exhibit the integrated functionality on an industrial paper machine simulator.

Preface

This thesis consists of the following two distinct parts:

• Part I: Sheet Profile Estimation,
• Part II: Machine Direction Adaptive Control.

In Part I the discussion is relevant to various sheet and film processes, whereas in Part II the proposed techniques have widespread applicability to a larger variety of industrial processes that implement model-based control systems. For the purposes of this thesis, the techniques in both Part I and Part II are developed and tested specifically in the context of application on a paper machine. Focusing on a specific industrial process allows for the important and necessary consideration of realistic industrial constraints. Furthermore, it allows for enhanced testing and result verification using the Honeywell paper machine simulator. Incorporating the expertise and resources of Honeywell, our industrial collaborator, is crucial to the practical significance of the resulting analyses.

Part I begins with introductory material on the paper machine, the scanning sensor, signal processing and the problem of MD aliasing. Various MD-CD separation techniques that have been proposed in the literature are then reviewed, and some of the more promising methods are selected for further development. Specifically, the power spectral analysis and compressive sensing techniques are developed further in this work to enable implementation with the sampling structure of the scanning sensor. A simulated comparative analysis is performed that incorporates a realistic CD profile and the sampling geometry of the scanning sensor.
Analyzing the experimental results helps to reveal the benefits and drawbacks of the various MD-CD separation methods. Concluding statements and recommendations for future work are provided in the final chapter of Part I.

The second part of this thesis involves unifying a series of developments made by multiple colleagues throughout our collaborative research initiative with Honeywell. Specifically, model-plant mismatch detection techniques developed by Qiugang Lu and closed-loop system identification techniques developed in collaboration with Qiugang Lu are implemented in this work. Combining these techniques with input design methods developed in collaboration with Mahdi Yousefi facilitates the development of a unified adaptive control framework. Contributions made in this work include integrating these developments into a functional adaptive control architecture and demonstrating their effectiveness on a Honeywell simulator of the machine direction process of a paper machine.

This work was made possible thanks to the valuable advice of my academic supervisors Bhushan Gopaluni and Philip Loewen at the University of British Columbia as well as my industrial supervisors Michael Forbes and Johan Backstrom at Honeywell Process Solutions. Furthermore, my research colleagues Qiugang Lu and Mahdi Yousefi were instrumental in the development of this research, both in terms of technical developments and as mentors. Finally, I would like to acknowledge my family and friends for helping to support me throughout this work.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Part I: Sheet Profile Estimation

1 Introduction
1.1 Sheet and film processes
1.2 The paper machine
1.3 The scanning sensor
1.4 Characterizing paper sheet variations
1.5 Preliminary scanner signal processing
1.6 Variance partitioning analysis
1.7 MD aliasing

2 MD-CD Separation
2.1 Background
2.2 Exponential filtering
2.3 Power spectral analysis
2.3.1 Fourier analysis and orthogonal basis functions
2.3.2 Power spectral analysis for MD-CD separation
2.4 Compressive sensing
2.4.1 Theory
2.4.2 Previous work
2.4.3 Compressive sensing for MD-CD separation

3 Experimental Setup

4 Experimental Results
4.1 Trial 1: Slow moving sheet with 100 CD bins
4.2 Trial 2: Fast moving sheet with 60 CD bins
4.3 Trial 3: Compressive sensing with two scanning sensors
4.3.1 200 CD bins
4.3.2 100 CD bins
4.3.3 60 CD bins
4.4 Results summary and computational considerations

5 Conclusion
5.1 Recommendations for future work
5.2 Recommendations for industrial progress

Part II: Machine Direction Adaptive Control

6 Introduction

7 System Overview
7.1 Simulating the MD process
7.2 Adaptive control framework

8 System Identification
8.1 Background
8.2 Prediction error method
8.3 Direct identification
8.4 ARX-OE identification
8.5 Implementation

9 Model Plant Mismatch Detection
9.1 Background
9.2 Implementation

10 Input Design
10.1 Background
10.2 Implementation

11 Experimental Setup
11.1 Experimental time-line and general setup
11.2 Experimental configurations
11.3 Experimental analysis

12 Experimental Results
12.1 Trial 1: PRBS excitation
12.1.1 Test 1.1: Tuning with ARX-OE identification
12.1.2 Test 1.2: Tuning with direct identification
12.2 Trial 2: Input design, first order noise
12.2.1 Test 2.1: Tuning with ARX-OE identification
12.2.2 Test 2.2: Tuning with direct identification
12.3 Trial 3: Input design, sixth order noise model
12.3.1 Test 3.1: Tuning with ARX-OE identification
12.3.2 Test 3.2: Tuning with direct identification

13 Conclusion

Bibliography

Appendices
A Additional Sheet Estimation Results

List of Tables

1.1 MD basis weight variance distribution (vsheet = 11.7 m/s) [19]
1.2 Causes of MD basis weight variations (vsheet = 11.7 m/s) [19]
1.3 Two-dimensional representation of scanner signal
4.1 Simulation parameters for Trial 1
4.2 Simulation parameters for Trial 2
4.3 Simulation parameters for Trial 3, 200 CD bins
4.4 Summary of experimental results for Trial 1
4.5 Summary of experimental results for Trial 2
4.6 Summary of experimental results for Trial 3
4.7 Compressive sensing computation requirements
5.1 Comparison of MD-CD separation methods
7.1 Continuous MD Process Parameters
11.1 General experimental setup parameters
11.2 Various experimental configurations
12.1 Percent error of DI and ARX-OE estimates for Test 1.1
12.2 Percent error of DI and ARX-OE estimates for Test 1.2
12.3 Percent error of DI and ARX-OE estimates for Test 2.1
12.4 Percent error of DI and ARX-OE estimates for Test 2.2
12.5 Percent error of DI and ARX-OE estimates for Test 3.1
12.6 Percent error of DI and ARX-OE estimates for Test 3.2
A.1 Simulation parameters for Trial 3, 100 CD bins
A.2 Simulation parameters for Trial 3, 60 bins
A.3 Simulation parameters for Trial 3, 60 CD bins, 4 scans

List of Figures

1.1 Various sheet and film processes [34][39][53]
1.2 Schematic of a Fourdrinier paper machine [16]
1.3 Schematic of a scanning sensor
1.4 MD variation at frequency 1/(2tscan)
1.5 MD variation at frequency 1/(tscan)
1.6 Four scans with MD variation at frequency 1/(2tscan)
2.1 Four scans with MD frequencies near 1/(2tscan) and 1/(tscan)
2.2 Exponential filtering after 2, 40 and 80 scans
2.3 Two scan speeds with noisy sinusoidal MD and CD profiles
2.4 Spectral content of scans at different scan speeds
2.5 CD profiles after MD-CD separation with PSA
2.6 MD profiles after MD-CD separation with PSA
2.7 Simulated sheet with 64 CD bins and 4 scans
2.8 Simulated sheet reconstruction with compressive sensing
3.1 Realistic CD profile provided by Honeywell
4.1 Simulated sheet for Trial 1
4.2 MD-CD separation results for Trial 1
4.3 MD-CD separation after scaling for Trial 1
4.4 Simulated sheet with noise for Trial 1
4.5 MD-CD separation results with noise for Trial 1
4.6 Simulated sheet for Trial 2
4.7 MD-CD separation results for Trial 2
4.8 MD-CD separation results with noise for Trial 2
4.9 Simulated sheet and CS estimate with 200 CD bins
4.10 MD-CD separation results for Trial 3, 200 CD bins
4.11 Simulated sheet and CS estimate with 100 CD bins
4.12 MD-CD separation results for Trial 3, 100 CD bins
4.13 Simulated sheet and CS estimate with 60 CD bins
4.14 MD-CD separation results for Trial 3, 60 CD bins
7.1 Closed-loop MD control system
7.2 Adaptive framework for closed-loop MPC
8.1 Impulse response of true and estimated inverse noise models
9.1 Illustration of high dimensional SVM classification [42]
11.1 Illustrative time-line of the general setup
12.1 Raw input profiles for Test 1.1
12.2 Raw output profiles for Test 1.1
12.3 Dynamic response with MPM for Test 1.1
12.4 Dynamic response after tuning for Test 1.1
12.5 Raw input profiles for Test 1.2
12.6 Raw output profiles for Test 1.2
12.7 Dynamic response with MPM for Test 1.2
12.8 Dynamic response after tuning for Test 1.2
12.9 Raw input profiles for Test 2.1
12.10 Raw output profiles for Test 2.1
12.11 Dynamic response with MPM for Test 2.1
12.12 Dynamic response after tuning for Test 2.1
12.13 Raw input profiles for Test 2.2
12.14 Raw output profiles for Test 2.2
12.15 Dynamic response with MPM for Test 2.2
12.16 Dynamic response after tuning for Test 2.2
12.17 Raw input profiles for Test 3.1
12.18 Raw output profiles for Test 3.1
12.19 Dynamic response with MPM for Test 3.1
12.20 Dynamic response after tuning for Test 3.1
12.21 Raw input profiles for Test 3.2
12.22 Raw output profiles for Test 3.2
12.23 Dynamic response with MPM for Test 3.2
12.24 Dynamic response after tuning for Test 3.2
A.1 Simulated sheet with 100 CD bins and 4 scans
A.2 Compressive sensing estimate with 100 CD bins and 4 scans
A.3 MD-CD separation results with 100 CD bins and 4 scans
A.4 Simulated sheet and CS estimate with 60 CD bins and 4 scans
A.5 MD-CD separation results with 60 CD bins and 4 scans

Part I: Sheet Profile Estimation

Chapter 1

Introduction

Improved sheet, web and film property estimation is motivated by the ongoing desire for higher quality, more consistent products and the constant desire for maximum production efficiency. In a competitive market customers are able to set strict constraints, and producers are often at a strategic advantage if they can offer a premium product. In this chapter the property estimation problem will be described for a broad set of relevant industries as well as for the specific industry of study.

1.1 Sheet and film processes

Collectively, sheet and film processes such as paper making, metal rolling, coating and polymer film extrusion constitute a vast global market. As such, improving the operational efficiency, and thus the control, of sheet and film processes such as those shown in Figure 1.1 is of significant industrial importance. Broadly speaking, optimal operational efficiency involves maximizing the rate of on-grade product (i.e., product that conforms to customer specifications) while minimizing operational expenses (e.g., energy, raw materials, human intervention).
In reality, the objective is much more complex, requiring adherence to a variety of safety, environmental, and economic constraints. A common control objective is to minimize the variation of a measured output around a specified target value (i.e., set-point). This allows for the production of a high quality sheet that meets customer specifications [57].

What makes sheet and film processes similar is that property variations are minimized along both the length and the width of the sheet, also known as the machine direction (MD) and the cross direction (CD), respectively [22]. In applications where the sheet is moving, the MD profiles represent temporal variations and the CD profiles represent spatial variations. Specific MD actuators are controlled by dedicated MD controllers that require an accurate profile of the MD variations. Similarly, the CD variations are controlled by a dedicated CD controller that uses a CD profile of the sheet to control spatially distributed actuators.

Figure 1.1: Various sheet and film processes [34][39][53]

In order to accurately control the MD and CD variations it is necessary to reconstruct accurate MD and CD output profiles from the measured outputs. Outputs are often measured by a scanning sensor that traverses the moving sheet and samples a mixture of MD and CD variations [15][57]. The problem of reconstructing accurate MD and CD profiles from this scanner signal is commonly referred to as MD-CD separation, and it constitutes the primary research problem addressed in Part I of this thesis.

Improved estimation of sheet and film profiles is motivated from the following two perspectives:

1. From a control perspective: in order to minimize output variations around a set-point, it is important to observe where the output is relative to the set-point. From a control perspective, the focus is on producing MD and CD output profiles that help increase the performance of the respective controllers.

2. From a quality reporting perspective: improved estimation of sheet and film properties can help provide a more accurate account of product quality to the customer. From a reporting perspective, the focus is on quantifying an accurate representation of the sheet/film properties for the batch of product sold to the customer.

Although these two perspectives share a large overlap in their motivations, the constraints and requirements involved in each problem are unique. The majority of this thesis is motivated from the control perspective, although quality reporting is also addressed. For thorough consideration of the practical industrial complexities, the remainder of this thesis narrows in scope and focuses specifically on application to a paper machine. However, it is important to note that the MD-CD separation analysis is relevant to all sheet and film processes where the sheet is moving in the MD while being measured by a sensor traversing in the CD.

1.2 The paper machine

As a staple of the Canadian economy, the multibillion dollar pulp and paper industry employs thousands of people and is a major source of Canada's manufactured goods and exports. Increased global competition, reduced growth in newsprint demand and volatile commodity prices have created challenges for the industry [45][61]. Moreover, decreasing profitability margins and increasing customer constraints further intensify the need for consistent product quality and improved control of operations [50]. Perhaps most importantly, the pulp and paper industry is an integral component of the larger forest products industry, as the profitability of saw mills depends on revenue generated through selling chips to pulp mills [49].
Thus, it is imperative for the Canadian economy to ensure the pulp and paper industry is a global leader in efficiency and technological innovation.

A Fourdrinier paper machine, as depicted in Figure 1.2, consists of a series of dewatering and pressing operations that transform a slurry of pulp fiber into a uniform sheet of paper. These complex and expensive machines can be over 100 m long in the MD and produce a sheet of paper over 10 m wide at rates exceeding 30 m/s [5]. In the forming section the pulp fiber flows from the thick stock pump to the headbox, where it is diluted and sprayed through a slice lip onto a moving wire or felt. In the pressing and drying sections the wet sheet of fiber is compressed and dried by passing through a series of metal rollers, some of which are heated with high pressure steam. Finally, the sheet is smoothed into a final product before collecting on a reel at the end of the machine [59].

Figure 1.2: Schematic of a Fourdrinier paper machine [16]

The objective in paper-making, from a quality control perspective, is to efficiently produce a uniform sheet of paper with consistent properties that match customer specifications. Measured outputs such as caliper, moisture content and basis weight are used to determine actuator movements that correct deviations from the set-point [29]. Control of sheet properties is separated into orthogonal temporal and spatial components. Dedicated MD and CD control systems are used to address temporal and spatial variations, respectively. Set-point tracking changes are primarily addressed with the MD control system, and the spatial variations across the sheet are tuned with the CD control system. The MD control system uses dedicated MD actuators such as thick stock valve positioning and dryer steam pressure to minimize the temporal variations. These actuators have a relatively uniform effect on the spatial properties of the sheet [60]. Alternatively, the spatial variations across the sheet are minimized by manipulating spatially distributed CD actuators such as headbox dilution valves and induction profilers [21]. To implement feedback control the MD and CD controllers require measured output profiles, which are typically acquired using a scanning sensor.

1.3 The scanning sensor

Although various sampling configurations exist, a prevalent method in paper-making is to use a scanning sensor. Scanning sensors were first introduced around 1980, when they began to replace the single-point sensors, which had a fixed CD position [58]. The scanning sensor framework consists of a scanner head mounted on a frame that spans the width of the sheet. The scanner head is filled with a variety of sensors that can measure sheet properties such as basis weight, moisture content, and caliper (i.e., thickness) [4]. Sometimes paper machines use multiple scanners at different locations along the MD to obtain additional samples, as shown in Figure 1.2. However, the cost associated with adding multiple scanners is generally prohibitive due to the expensive sensor technology [47].

As the sheet moves at vsheet ≈ 30 m/s in the MD, the scanner traverses the sheet at vscan ≈ 1 m/s in the CD. The resulting sampling trajectory forms a diagonal zig-zag pattern across the sheet, as shown in Figure 1.3. The scanner frame and scanner head are depicted in grey, while the blue arrows indicate the directions in which the scanner and sheet are moving. The resulting sampling trajectory is described by the dotted red line, although it is important to note that this depiction is not to scale.

Figure 1.3: Schematic of a scanning sensor

In reality, the angle θ from the sampling trajectory to the MD axis is much smaller, typically θ ≈ 1°. Specifically, θ is a function of the scan time (tscan), the sheet speed (vsheet), and the sheet width (wsheet). Although varying scan speeds can be used, it is typical in industry for the scan speed (vscan) to be fixed for long periods of operation. Ignoring occasional stationary periods for scanner calibration, the scan time can be determined as simply tscan = wsheet / vscan. Assuming realistic industrial values of wsheet = 10 m and vscan = 0.5 m/s, the time required for one traverse of the sheet is tscan = 20 s. Furthermore, using a realistic sheet speed of vsheet = 30 m/s, the scanning angle θ from the MD can be determined using [2]

    θ = 90° − arctan(vsheet · tscan / wsheet),    (1.1)

to be θ = 0.95°. However, this value is dependent on application and operation; e.g., an 8 m wide sheet moving at 20 m/s with a 1 m/s scanner results in θ = 2.86°. Assuming a constant scan speed, the expression on the right hand side (RHS) of (1.1) simplifies to 90° − arctan(vsheet / vscan). This detailed analysis of the sampling trajectory provides important background information for the simulated experimental setup described in Chapter 3.

Due to the diagonal sampling trajectory, the signal acquired by the scanner contains a mixture of MD and CD property variations. From this signal, separate MD trends and CD profiles need to be determined and provided to the MD and CD control systems, respectively. One of the more straightforward proposals to address the need for MD-CD separation is to use an array of stationary sensors that spans the width of the sheet. A sensor array would likely result in a straightforward acquisition of enhanced MD trends and CD profiles as well as a vastly improved determination of the overall product quality. However, as mentioned before, the sensor technology can be prohibitively expensive, and the economic justification based on improved control and product quality reporting is not straightforward.
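As a concrete check on the sampling geometry above, the following sketch (an illustration only; the function name and structure are not from the thesis) evaluates the scan time and equation (1.1) for the example values given in the text:

```python
import math

def scan_geometry(v_sheet, w_sheet, v_scan):
    """Return (t_scan in s, theta in degrees) for a scanner on a moving sheet.

    t_scan = w_sheet / v_scan is the time for one traverse, and
    theta = 90 - arctan(v_sheet * t_scan / w_sheet) is equation (1.1),
    which simplifies to 90 - arctan(v_sheet / v_scan) for a fixed scan speed.
    """
    t_scan = w_sheet / v_scan
    theta = 90.0 - math.degrees(math.atan(v_sheet * t_scan / w_sheet))
    return t_scan, theta

# Example values from the text: 10 m wide sheet, 0.5 m/s scanner, 30 m/s sheet
t_scan, theta = scan_geometry(v_sheet=30.0, w_sheet=10.0, v_scan=0.5)
print(t_scan)           # 20.0 s per traverse
print(round(theta, 2))  # 0.95 degrees

# An 8 m wide sheet moving at 20 m/s with a 1 m/s scanner
_, theta2 = scan_geometry(v_sheet=20.0, w_sheet=8.0, v_scan=1.0)
print(round(theta2, 2))  # 2.86 degrees
```

Note that for a fixed scan speed the angle depends only on the speed ratio vsheet/vscan, which is why the RHS of (1.1) simplifies as stated above.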
Until sheet spanning sensing mechanisms becomecommonplace in industry, the problem of MD-CD separation with a scanning sensor configurationremains relevant [37].1.4 Characterizing paper sheet variationsUnderstanding the nature of paper machine disturbances and sheet variations is critical when de-ciding which scanner signal processing techniques are appropriate. This is particularly true whendistinguishing between techniques that are suitable for either control or product quality reporting.As alluded to earlier, the requirements for each of these perspectives are not equivalent. For ex-ample, while a higher resolution of the estimated profile might be beneficial for quality reporting,it might not improve the control of sheet properties. To further appreciate the MD-CD separationstudy, some background information on paper sheet variations is provided here.Paper sheet variability is quantified primarily along the MD and CD axes, with the remaining71.4. Characterizing paper sheet variationsvariability referred to as residual variability. A more detailed discussion of paper machine variancepartitioning is provided in Section 1.6. Variability within a paper sheet ranges over a large spectrumfrom the level of individual fibers to the length of an entire reel of paper. A study of MD basisweight variation on a paper machine with a sheet speed of 11.7 m/s using a single-point sensor ledto the data shown in Table 1.1 below [19]. 
These MD basis weight variations are categorized based on their relationship to the process, as shown in Table 1.2.

Table 1.1: MD basis weight variance distribution ($v_{sheet} = 11.7$ m/s) [19]

  Frequency (Hz)   Range (s)     Percent of total variance (1)
  25-250           0.004-0.04    15
  2.5-25           0.04-0.4      40
  0.25-2.5         0.4-4.0       10
  0.006-0.25       4.0-167       20
  0.0016-0.006     167-600       15

  (1) Average of eight separate records.

Table 1.2: Causes of MD basis weight variations ($v_{sheet} = 11.7$ m/s) [19]

  Category      Frequency (Hz)   Potential sources
  Formation     > 117            fiber properties, turbulence in headbox
  Short term    1-117            hydraulic pulsations, equipment vibration
  Medium term   0.005-1          blending, flow stability, fast control loops
  Long term     < 0.005          stock prep, pulp handling, slow control loops

Unfortunately, this work does not provide the same analysis for CD basis weight variations. Instead, the common assumption that CD profiles are essentially constant over time was studied and found to be technically incorrect. However, the maximum variation in consecutive CD basis weight profiles was found to be small (i.e., approximately 6%) when 10 plies were tested. It was necessary to use multiple plies because otherwise the analysis would be incorrectly dominated by residual variations [19]. So while the CD profile is technically not independent of time, for control purposes assuming it is constant for short periods of time (e.g., a few scans) is reasonable.

Of particular interest for MD-CD separation is estimating the MD and CD variations that are within the range of controllable frequencies. For a sheet moving at 16.7 m/s, the spectrum of variations related to process control starts at approximately 1 Hz and extends to slower variations. The highest cutoff frequency for MD control loops is for the headbox total head, at around 0.05 Hz. Other control loops, such as basis weight, have a much lower cutoff frequency [8]. The spatially
distributed CD actuators may have up to 300 individual actuators, and the actuator spacing (for a slice lip system) is anywhere from 7.0 cm to 20 cm with a bandwidth of around 15 cm [16][54]. The MD and CD controllable frequencies provide a lower bound on the required resolution for MD-CD separation methods to provide adequate MD and CD profiles. For the purpose of this thesis, the target resolution in the developed MD-CD separation methods is conservatively higher than the aforementioned lower bounds. This allows for both improvements in the MD and CD control system bandwidths as well as for improvements in the quality reporting statistics.

1.5 Preliminary scanner signal processing

From both a control perspective and a quality reporting perspective, the acquired scanner signal, $y \in \mathbb{R}^n$, needs to be processed such that it represents the two-dimensional reality of the process, i.e., $Y \in \mathbb{R}^{n_{CD} \times n_{MD}}$. Here $n$ is equal to the total number of measurements taken by the scanning sensor. In an ideal setting, such as with a high density sensor array, $n_{CD}$ would represent the number of sensors distributed along the CD and $n_{MD}$ would represent each sample time. Unfortunately, with a scanning sensor some simplifying assumptions are often relied upon instead.

Each time the scanner traverses the moving sheet of paper is considered one scan, during which up to 10,000 locations along the CD profile can be sampled [57]. An initial data verification step is performed to remove any illogical sample values. Next, these measurements are down-sampled into evenly spaced CD bins at a lower common resolution (e.g., 500 bins) that is more computationally tractable for the model-based control systems [33]. Therefore, if $y \in \mathbb{R}^n$ represents the scanner signal after this initial processing, then the total number of measurements, $n$, is equal to the number of CD bins multiplied by the number of scans being analyzed.
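The binning step just described can be sketched as follows. This is an illustrative NumPy sketch: the bin-averaging rule, sample counts, and function name are assumptions for demonstration, not a vendor's actual algorithm, and it assumes every bin receives at least one sample.

```python
import numpy as np

def bin_scan(samples, positions, n_bins, w_sheet):
    """Down-sample one scan's raw measurements into evenly spaced CD bins
    by averaging the samples that fall into each bin."""
    edges = np.linspace(0.0, w_sheet, n_bins + 1)
    idx = np.clip(np.digitize(positions, edges) - 1, 0, n_bins - 1)
    # assumes each bin is hit by at least one raw sample
    return np.array([samples[idx == b].mean() for b in range(n_bins)])

# e.g. 10,000 raw samples across a 10 m sheet collapsed into 500 CD bins;
# stacking successive binned scans column-wise then yields the matrix Y.
rng = np.random.default_rng(0)
pos = np.linspace(0.0, 10.0, 10_000, endpoint=False)
raw = 50.0 + rng.normal(0.0, 1.0, 10_000)
binned = bin_scan(raw, pos, 500, 10.0)
print(binned.shape)            # (500,)
```

With 10,000 samples and 500 bins, each bin averages 20 raw measurements, which is roughly the compression ratio implied by the text.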
For both control purposes and quality reporting, the standard method for achieving $\mathbb{R}^n \mapsto \mathbb{R}^{n_{CD} \times n_{MD}}$ is to simply collapse the measurements of each scan into a column vector where the rows represent the common resolution CD bins. The resulting matrix of scanner values, $Y$, is depicted in Table 1.3.

Table 1.3: Two-dimensional representation of scanner signal

                                <-- MD -->
               Scan 1         ...   Scan j         ...   Scan n_MD
  Bin 1        Y_{1,1}        ...   Y_{1,j}        ...   Y_{1,n_MD}
  ...          ...                  ...                  ...
  Bin i        Y_{i,1}        ...   Y_{i,j}        ...   Y_{i,n_MD}
  ...          ...                  ...                  ...
  Bin n_CD     Y_{n_CD,1}     ...   Y_{n_CD,j}     ...   Y_{n_CD,n_MD}

  (bins run along the CD, scans along the MD)

In Table 1.3, $n_{MD}$ is equal to the number of successive scans being analyzed and $n_{CD}$ is equal to the number of common resolution CD bins. While the rows correctly represent the approximate spatial positioning of each measurement, the columns instead treat the temporal positioning of each measurement during a scan as if the sheet was stationary and all measurements occurred at the same time. In fact, for control purposes the MD trend is often determined by simply taking the average of each scan and ignoring the MD variations observed within the scan [21]. Sometimes subsequent scans are averaged or temporal filtering is applied to determine the CD control profile (discussed further in Chapter 2). For product quality reporting the matrix $Y$ is analyzed further in a procedure referred to as variance partitioning.

1.6 Variance partitioning analysis

The sheet property matrix $Y$ can be broken down into the following form:

$$Y = \bar{Y}^{MD} + \bar{Y}^{CD} + Y^{R}, \qquad (1.2)$$

where $\bar{Y}^{MD}$, $\bar{Y}^{CD}$ and $Y^{R}$ represent data associated with MD variations, CD variations and residual variations, respectively [44]. More specifically, if the average of all data points across scan $j$ is

$$\bar{Y}^{MD}_j = \frac{1}{n_{CD}} \sum_{i=1}^{n_{CD}} Y_{i,j} \qquad (1.3)$$

then

$$\bar{Y}^{MD} = q \left[\, \bar{Y}^{MD}_1 \;\cdots\; \bar{Y}^{MD}_{n_{MD}} \,\right], \qquad (1.4)$$

where $q$ is a column vector of length $n_{CD}$ filled with all ones.
In other words, $\bar{Y}^{MD}$ represents a matrix where the data points in each column are the average of all the data points for the scan of that particular column. Values along the rows of a specific column are fixed, as the matrix has a flat CD profile. Therefore, when $Y - \bar{Y}^{MD}$ is computed it is equivalent to removing the scan average from every data point along that particular scan, i.e., removing the MD variations. Similarly, if the average of all data points across bin $i$ is

$$\bar{Y}^{CD}_i = \frac{1}{n_{MD}} \sum_{j=1}^{n_{MD}} Y_{i,j} \qquad (1.5)$$

then

$$\bar{Y}^{CD} = \begin{bmatrix} \bar{Y}^{CD}_1 \\ \vdots \\ \bar{Y}^{CD}_{n_{CD}} \end{bmatrix} p, \qquad (1.6)$$

where $p$ is a row vector of length $n_{MD}$ filled with all ones. Here, $\bar{Y}^{CD}$ represents a matrix with each data point in each row equal to the average of data points from all scans for the CD bin corresponding to that particular row. In other words, the values along a particular row are fixed as the average over all scans, and the matrix has a flat MD profile. Therefore, when $Y - \bar{Y}^{CD}$ is computed it is equivalent to removing the CD bin average from every data point along that particular CD bin, i.e., removing the CD variations. Understanding $\bar{Y}^{MD}$ and $\bar{Y}^{CD}$ provides important context for interpreting the MD and CD profiles estimated in the simulated experiments. It is also worth noting that the matrices $\bar{Y}^{MD}$, $\bar{Y}^{CD}$ and $Y^{R}$ are useful for controller performance monitoring [44][57].

Variance partitioning is concerned with obtaining scalar values for describing the amount of total MD, CD and residual variation in a reel of paper. These statistics can be used to assess the quality of the paper and the performance of the paper machine relative to other machines. The average across a scan is defined in (1.3), the average across a bin is defined in (1.5) and the overall average is defined as

$$\bar{Y} = \frac{1}{n_{CD} \cdot n_{MD}} \sum_{i=1}^{n_{CD}} \sum_{j=1}^{n_{MD}} Y_{i,j}. \qquad (1.7)$$

The total variance of the paper sheet data is then expressed as

$$\sigma^2_T = \frac{1}{(n_{CD} \cdot n_{MD}) - 1} \sum_{i=1}^{n_{CD}} \sum_{j=1}^{n_{MD}} (Y_{i,j} - \bar{Y})^2. \qquad (1.8)$$
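The rank-one structure of the average matrices, and the scalar variance statistics built from the scan and bin averages, can be sketched numerically. This is an illustrative NumPy sketch; the random test matrix and variable names are our own.

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.normal(size=(128, 20))                  # n_CD x n_MD data matrix

# Rank-one average matrices, Eqs. (1.4) and (1.6):
q = np.ones((Y.shape[0], 1))                    # column of ones, length n_CD
p = np.ones((1, Y.shape[1]))                    # row of ones, length n_MD
Y_md = q @ Y.mean(axis=0, keepdims=True)        # flat CD profile, scan averages
Y_cd = Y.mean(axis=1, keepdims=True) @ p        # flat MD profile, bin averages

# Removing Y_md zeroes every scan average; removing Y_cd zeroes every bin average:
assert np.allclose((Y - Y_md).mean(axis=0), 0.0)
assert np.allclose((Y - Y_cd).mean(axis=1), 0.0)

# Scalar variance partitioning: total variance about the grand mean, plus the
# sample variances of the scan averages (MD) and the bin averages (CD); the
# residual variance is whatever is left over.
n_cd, n_md = Y.shape
y_bar = Y.mean()
var_t = ((Y - y_bar) ** 2).sum() / (n_cd * n_md - 1)
var_md = ((Y.mean(axis=0) - y_bar) ** 2).sum() / (n_md - 1)
var_cd = ((Y.mean(axis=1) - y_bar) ** 2).sum() / (n_cd - 1)
var_r = var_t - var_md - var_cd
print(round(var_t, 3), round(var_md, 3), round(var_cd, 3), round(var_r, 3))
```

The two assertions make concrete the statement that subtracting the average matrices removes the MD (respectively CD) variations from the data.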
Similarly, the MD and CD variances are calculated as

$$\sigma^2_{MD} = \frac{1}{n_{MD} - 1} \sum_{j=1}^{n_{MD}} (\bar{Y}^{MD}_j - \bar{Y})^2, \qquad (1.9)$$

and

$$\sigma^2_{CD} = \frac{1}{n_{CD} - 1} \sum_{i=1}^{n_{CD}} (\bar{Y}^{CD}_i - \bar{Y})^2, \qquad (1.10)$$

respectively. The residual variance is determined by taking the difference of the total variance and the combined MD and CD variances, i.e.,

$$\sigma^2_R = \sigma^2_T - \sigma^2_{MD} - \sigma^2_{CD}. \qquad (1.11)$$

Instead of reporting the actual variance, it is common in industry for these values to be reported as '2-Sigma' values (e.g., $2\sigma_{MD}$) [1].

As mentioned before, this treatment of scanner data ignores the geometry of the sampling trajectory and instead treats each scan as if it were taken on a stationary sheet. However, due to the shallow sampling angle from the MD axis, $\theta$, over 600 m of sheet can pass in the MD during a single 10 m traverse in the CD. This MD variation leakage into the CD profile can significantly distort the scalar values determined by variance partitioning. The CD variance is likely to be overestimated due to the inclusion of intra-scan MD variations. Arguably more problematic, however, is the distorting effect of these MD variations when they alias into the CD profile used by the CD control system. This distortion of the estimated CD profile due to the MD variations is referred to here as MD aliasing, and it represents one of the important challenges to overcome in MD-CD separation.

1.7 MD aliasing

While variance partitioning is primarily concerned with determining scalar values of sheet variations, MD-CD separation involves obtaining profiles of the separate variations that exist either along the MD or the CD. As with variance partitioning, the CD profiles can become distorted by MD variations leaking into the CD profile, i.e., MD aliasing. Here the effects of MD aliasing are displayed with a scanner signal that has been processed to use $\bar{Y}^{CD} - \bar{Y}$ as the CD profile.

To show the effects of MD aliasing, some simplifying assumptions are made.
Moreover, the following assumptions are common in the literature, and some of them are applied in the experimental analysis as well.

Assumption 1. The scanner traverses the moving sheet at a constant and equivalent speed in both directions.

Assumption 2. The scanner reverses exactly at the edges of the sheet.

Assumption 3. The CD profile is independent of time.

As mentioned before, Assumption 1 is essentially true for long periods of time in many paper machine operations. However, more recent work has sparked interest in variable rate scanning, which will be discussed in Chapter 2. In reality, the scanner does not follow such an ideal trajectory as described by Assumption 2. Instead, the scanner comes off the sheet briefly between scans, and occasionally it will even stop for a short period while off the sheet in order to calibrate. As discussed earlier, while Assumption 3 is not technically correct, it is true for the vast majority of CD variability over short periods of time.

According to the assumptions described above, the time series of the scanner data forms an even, periodic function with respect to the CD variations (i.e., MD and residual variations notwithstanding). Each time the scanner completes two scans back and forth across the sheet represents one period ($2t_{scan}$) in this even periodic time series function. Therefore, if one were to take the Fourier transform of the scanner signal time series, the CD variations would contribute to coefficients at integer multiples of $(2t_{scan})^{-1}$. Consequently, the misinterpretation of MD variations in the CD profile (i.e., MD aliasing) is most apparent when the frequency of the MD variations occurs at integer multiples of $(2t_{scan})^{-1}$ [27]. To show the effect of MD aliasing, a simulated paper sheet with a flat CD profile and a single MD variation of frequency $(2t_{scan})^{-1}$ is sampled by a simulated scanning sensor, and the CD profile $\bar{Y}^{CD} - \bar{Y}$ is shown in red in Figure 1.4.
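A minimal simulation of this worst case shows the bias directly. This is an illustrative Python sketch, not the thesis's actual simulator: the bin count, scan time, and the cosine phase of the MD variation are our own assumptions.

```python
import numpy as np

t_scan, n_cd, n_scans = 20.0, 128, 4
f_md = 1.0 / (2.0 * t_scan)                      # MD variation at 1/(2*t_scan)
s = np.arange(n_cd) / (n_cd - 1)                 # CD position, 0 (near) to 1 (far)
Y = np.empty((n_cd, n_scans))
for j in range(n_scans):
    frac = s if j % 2 == 0 else 1.0 - s          # alternate forward/reverse scans
    t = j * t_scan + frac * t_scan               # sampling time at each CD bin
    Y[:, j] = np.cos(2.0 * np.pi * f_md * t)     # flat CD sheet, MD sinusoid only

observed_cd = Y.mean(axis=1) - Y.mean()          # bin averages minus grand mean
# The sheet's true CD profile is perfectly flat, yet every scan samples the
# same cosine shape, so the estimated profile is pure MD alias:
print(float(np.abs(observed_cd).max()))          # 1.0, instead of the true 0.0
```

With this cosine phase, every back-and-forth traverse samples the identical spatial shape, so no amount of scan averaging removes the bias; replacing the cosine with a sine (a phase shift of $\pi/2$) makes consecutive forward/reverse pairs cancel instead, which is exactly why the phase of the MD variation decides whether averaging helps.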
The reason the observed CD profile extends into negative values is that the average of the scanned samples is removed according to standard industrial processing. If the frequency of the MD variation is doubled such that it equals $(t_{scan})^{-1}$, the resulting observed CD profile takes the form shown in Figure 1.5.

[Figure 1.4: MD variation at frequency $1/(2t_{scan})$]

[Figure 1.5: MD variation at frequency $1/(t_{scan})$]

Observing these examples, one might conclude that simply averaging over two consecutive scans would alleviate the bias in the CD profile. While this may be true for the current phase of the MD variations, consider if the MD frequencies were phase shifted by $\pi/2$. In Figure 1.6, four consecutive scans are simulated on a sheet with a flat CD profile and an MD variation at a frequency of $(2t_{scan})^{-1}$. By phase shifting the MD variations by $\pi/2$, the peaks and troughs of the MD variation consistently occur at the close and far side of the sheet, respectively. In this situation, no amount of averaging consecutive scans will eliminate or even alleviate the bias in the CD profile.

The aliasing of MD variations into the CD profile can have serious implications for CD control. These perceived CD variations do not actually exist spatially, but nevertheless the CD control system may attempt to correct for them, which can further exacerbate the problem. By attempting to mitigate fictitious CD variations, the CD control system could introduce new disturbances to the detriment of product quality. In some cases, averaging or temporal filtering can successfully reduce the effect of MD aliasing; however, this usually comes to the detriment of profile resolution and controller response [37]. Until recently, the MD aliasing problem was scarcely addressed in the proposed MD-CD separation methods.
In this thesis, MD aliasing will be a primary focus in evaluating the effectiveness of various MD-CD separation methods.

[Figure 1.6: Four scans with MD variation at frequency $1/(2t_{scan})$]

Chapter 2

MD-CD Separation

Over the previous thirty years, a variety of different techniques have been proposed to separate MD and CD profiles from the scanning sensor signal. Unfortunately, to date none of the proposed methods has succeeded in gaining widespread industrial implementation. In this chapter, the previous developments in the MD-CD separation literature are reviewed and a few of the more promising methods are selected for further development and comparison. Next, the typical industrial method known as exponential filtering (EF) is presented with a detailed review of its various benefits and limitations. One of the more recently proposed methods, power spectral analysis (PSA), is considered, with simulated examples provided. Finally, the chapter concludes with the introduction and development of the compressive sensing (CS) method for MD-CD separation.

2.1 Background

The majority of MD-CD separation literature involves the industrially supported assumptions that MD variations are faster and more dynamic than their CD counterparts, which are considered relatively time invariant. Furthermore, MD variations are assumed to have a uniform effect across the width of the sheet. These assumptions allow the control of MD and CD variations to be treated as independent problems, which motivates the need to estimate separate MD and CD profiles.

The well-established industrial method for MD-CD separation is referred to as exponential filtering. Essentially, the signal processing described previously to obtain $Y$ is carried out and then a temporal averaging is applied along each CD bin to determine the CD profile. Each scan is averaged into a single point to determine the MD trend.
Among other problems, this method suffers from very limited MD resolution and slow upset detection in the CD profile, because each new CD profile update includes a weighted average of previous CD profiles. Traditional exponential filtering ignores the CD position of the scanned data and therefore the sampling geometry itself. Up until recently, disregarding the sampling geometry has been fairly common in the MD-CD separation literature.

In the 1980s and early 1990s, developments in MD-CD separation focused primarily on modeling the process outputs such as basis weight and moisture. In 1986, Lindeborg proposes a nonlinear moisture model that uses scanning sensor measurements to estimate MD moisture variations at all times [40]. A dual Kalman filter approach is introduced by Chen in 1988 that combines both temporal and spatial models estimated from scanning sensor data [15]. Lindeborg's moisture model is leveraged by Dumont et al. in 1993 to develop an on-line estimation and identification of moisture content (EIMC) algorithm. The EIMC algorithm uses a recursive least squares exponential forgetting and resetting algorithm (EFRA) to estimate the CD profile, while the MD profile is estimated with a Kalman filter [21]. Following along this line of work, Wang et al. apply a similar approach for estimation of basis weight to develop the identification of basis weight profile and auto-regressive moving average (ARMA) MD model (IBPAM) algorithm. The IBPAM algorithm uses recursive least squares EFRA to estimate the CD profile, while an extended Kalman filter (EKF) is used to estimate the MD basis weight profile [60]. The EIMC and IBPAM techniques are consolidated by Wang et al. and tested with both simulated and industrial data [59].

Although the industrial and simulated tests presented are encouraging, these model-based MD-CD separation methods have not gained widespread industrial implementation.
This might be due to the fact that model-based methods rely on the specification of parameters that are dependent on the particular machine and vary with operating conditions. Furthermore, to ensure convergence of the individual EKF and EFRA algorithms, the tuning parameters must meet certain constraints. Even if said constraints are met, the convergence of the composite EKF and EFRA algorithm cannot be guaranteed [59]. Finally, the ability of the EIMC algorithm to provide unbiased estimates has not been proven [21].

In the mid-to-late 1990s, wavelet compression and filtering techniques were introduced to approach the MD-CD separation problem. Wavelet basis functions such as Daubechies, Haar and Symmlet are tested and shown to have potentially useful compression ratios on industrial data [46]. Early efforts failed to recognize the geometry of the scanner, but in 2004 Duncan and Wellstead develop a technique that considers the sampling geometry and applies the generalized sampling theorem to reconstruct MD and CD profiles. However, a non-causal filter is required that depends on the use of both past and future data [22]. Aslani et al. propose an algorithm in 2009 that recognizes the sampling geometry with a causal approximation of the non-causal filter based on periodic sampling theory. Their algorithm combines this causal filter with recursive wavelet denoising to estimate MD and CD profiles [5]. Unfortunately, this technique does not sufficiently address the issue of MD aliasing.

The issue of MD aliasing has drawn increased explicit attention in the MD-CD separation literature. The MD aliasing problem is discussed in detail by Gendron in 2005 [27]. Gendron's work is similar in many aspects to a 2008 thesis by Karimzadeh, which also exploits the fact that the CD variations form an even periodic function in the time domain.
By exploiting the fact that the discrete cosine transform (DCT) coefficients of the CD variations appear only at integer multiples of $(2t_{scan})^{-1}$, the CD profile can be estimated from the raw scanner data. Specifically, taking the DCT of the scanner signal and setting all of the coefficients that are not at integer multiples of $(2t_{scan})^{-1}$ to zero, the CD profile is estimated by taking the inverse DCT of the remaining frequency content. The MD profile is then estimated by subtracting the CD profile from the raw scanner data. This technique is compared to the wavelet and exponential filtering methods and is shown through simulations to have better performance [36]. One disconcerting fact about this DCT technique is that it does not distinguish between the CD variations and the MD variations that occur at integer multiples of $(2t_{scan})^{-1}$. Therefore, it does not offer a solution for handling the MD variations that have the most detrimental aliasing effect on the CD profile.

Recent attempts to address MD aliasing have explored the use of variable speed scanning, or even random scanning speeds, to minimize the impact of MD variations leaking into the CD profile. In 2009, Aslani et al. show that varying the scan speed can drastically alter the perception of MD and CD variations. Assuming the CD profile does not have any fast variations, it is shown that the MD frequency components will shift if the scan speed shifts. Unfortunately, a systematic way to exploit the benefits of variable speed scanning is not proposed in this work [4]. In 2011, Chen (with ABB) introduces the use of power spectral analysis (PSA) with a fast Fourier transform (FFT) to improve on traditional variance partitioning analysis (VPA) by extracting dominant variation patterns from MD, CD and residual profiles [14]. A formal procedure for using the PSA with two different scan speeds to separate MD and CD variations is patented by Chen and ABB in 2012 [13].
A variation of this technique will be one of the primary subjects of research in this thesis and thus will be covered in more detail in Section 2.3. Work by Lang and Fu with Metso in 2013 also confirms the benefits of variable speed scanning and the potential for separating MD and CD components by observing a spectral frequency shift in the MD components with two different scan speeds [37].

In a 2011 thesis by Towfighi, compressive sensing (CS) for MD-CD separation is introduced. Although initial results are promising, the practical considerations of implementation on a scanning sensor are not sufficiently considered [55]. In Section 2.4, the CS technique is developed further while taking into consideration the practical constraints of the scanning sensor framework. This new CS method is compared to the industrial standard (exponential filtering) and a variation of the PSA technique. A comparative analysis is conducted through simulations with these techniques to determine potential benefits and limitations associated with each.

2.2 Exponential filtering

Recall, exponential filtering (EF) is the longstanding industrial technique used to help estimate separate MD and CD profiles. The average of each scan is stored in a vector, $\bar{Y}^{MD}$, and this is used as the input profile for the MD controller. The CD profile is calculated as a weighted average of current and past values across each CD bin by passing through an exponential filter, i.e.,

$$Y^{CD,f}_{i,j} = \alpha\,(Y_{i,j} - \bar{Y}^{MD}_j) + (1 - \alpha)\,Y^{CD,f}_{i,j-1}. \qquad (2.1)$$

Here, $Y^{CD,f}_{i,j}$ is the exponentially filtered data point for CD position $i$ and MD position $j$. The CD position is $i = 1, \cdots, n_{CD}$, where $n_{CD}$ is the number of CD bins, and the MD position is $j = 1, \cdots, n_{MD}$, where $n_{MD}$ is the number of scans in this particular context. After initial processing (as described in Section 1.5), the data matrix $Y$ is created from individual output measurements $Y_{i,j}$. When data from scan $j$ is collected, the average $\bar{Y}^{MD}_j$ is determined.
For EF, the scan average is subtracted from the measured value at each CD bin position and multiplied by a filter factor, $\alpha$. The exponential filter factor, $\alpha \in [0, 1]$, typically has a nominal value of 0.33, although in some arrangements it is dynamic according to the number of consecutive scans considered. The second term on the RHS of (2.1) shows $(1-\alpha)$ being multiplied by the filtered value from the same CD bin but the previous scan [2][5].

To summarize, EF removes the scan average from the values in the most recent scan, multiplies these zero-average bin values by 0.33, and then adds 0.67 multiplied by the filtered value from the same CD position but the previous scan. Essentially, EF works along each CD bin, making the updated value a function of all of the previous values. This creates a form of temporal averaging across each CD bin where the most recent measurement is given a weight of $\alpha$ and the previous filtered measurement is given a weight of $(1-\alpha)$. The hope is that averaging over consecutive scans will remove high frequency MD variations from the CD profile and thus reduce the effect of MD aliasing. Employing the exponential filter requires the assumptions that MD variations are generally much faster than CD variations and that the CD profile is relatively stable.

To demonstrate the effects of EF, a sheet of paper is simulated with MD variations at 0.05 Hz and 0.10 Hz, while a scanner traverses at a speed such that $(2t_{scan})^{-1} = 0.0568$ Hz. This difference might seem trivial, but it is important because the slight difference between the MD variations and integer multiples of $(2t_{scan})^{-1}$ allows the EF technique to reduce some of the aliasing over time. On the other hand, the more unique situation shown in Figure 1.6, where the MD variation is exactly $(2t_{scan})^{-1}$, renders EF entirely ineffective.
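The recursion (2.1) is simple to implement; the sketch below is a minimal Python version, in which the zero initial CD profile and the default $\alpha = 0.33$ are assumptions for illustration.

```python
import numpy as np

def exponential_filter(Y, alpha=0.33):
    """Apply Eq. (2.1): remove each scan's average, then run a first-order
    filter along each CD bin across successive scans."""
    dev = Y - Y.mean(axis=0)                      # Y_ij minus its scan average
    Y_f = np.zeros_like(dev)
    Y_f[:, 0] = alpha * dev[:, 0]                 # assume a zero initial CD profile
    for j in range(1, Y.shape[1]):
        Y_f[:, j] = alpha * dev[:, j] + (1.0 - alpha) * Y_f[:, j - 1]
    return Y_f

# For a static zero-mean CD profile repeated over many scans, the filtered
# estimate converges to that profile (weight on the start-up transient decays
# as (1 - alpha)^j).
profile = np.sin(np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False))
Y = np.tile(profile[:, None], (1, 200))
print(np.allclose(exponential_filter(Y)[:, -1], profile, atol=1e-8))   # True
```

The same code also exhibits the drawbacks discussed in the text: when the CD profile changes, the filtered estimate lags behind with a time constant set by $\alpha$.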
Again, the sheet is simulated with no noise, but this time two additive MD variations exist along with a realistic CD variation profile provided by Honeywell, our industrial sponsors. The CD profile is static, as the first four scans show in Figure 2.1. It is worth noting that the observed CD profile shown in red is actually the average of the four separate zero-mean scanned CD profiles. From Figure 2.1 it is apparent that averaging over the four scans has itself helped to dampen the MD variations in the perceived CD profile.

With fairly severe MD aliasing, as in Figure 2.1, EF may require many scans before it can effectively diminish the bias in the CD profile. To demonstrate this, the exact same sheet and scanner setup as in Figure 2.1 is simulated for a total of eighty scans. The effect of EF with $\alpha = 0.33$ is compared at the second, fortieth and eightieth scans, as shown in Figure 2.2, with scans increasing from the top to bottom plots, respectively. The green line shows the true CD profile with no MD aliasing, the red line is the observed CD profile with the scan average removed, and the blue line shows the CD profile after EF is conducted. As the number of scans increases, EF is able to reduce more of the MD aliasing, because the CD profile is static and the MD variations are not exact integer multiples of twice the scan time. Figure 2.2 shows that as the scans increase, the difference in aliasing between the observed CD profile (red) and the exponentially filtered CD profile (blue) decreases.

[Figure 2.1: Four scans with MD frequencies near $1/(2t_{scan})$ and $1/(t_{scan})$]

[Figure 2.2: Exponential filtering after 2, 40 and 80 scans]

The number of scans required to average out the high frequency MD variations can be quite significant, especially considering the necessary assumption of a static CD profile during this time. Since the most recent scan still receives $\alpha$ proportion of the weight, EF never fully dampens the aliased MD variations.
Another significant drawback of EF is that if the CD profile is not static, the averaging effect can mask the change for long periods of time, ultimately preventing detection of the upset and the necessary control response. Furthermore, the use of the scan average for the MD profile is not ideal, as it provides unnecessarily low resolution, which results in failure to acknowledge faster (potentially controllable) variations and ultimately slow upset detection by the controllers. Finally, exponential filtering does not address limitations set by the Nyquist sampling theorem, which states that for accurate signal reconstruction the sampling frequency must be at least twice the highest frequency in the original band-limited signal (i.e., the Nyquist rate) [27][51].

2.3 Power spectral analysis

The power spectral analysis (PSA) technique patented by Chen and ABB in 2012 involves comparing the Fourier transform of two consecutive scans at alternating scan speeds. By varying the scan speed and observing the shift in the support of the spectral contents, dominant MD frequencies can be distinguished from dominant CD frequencies [13]. Prior to discussing specifically how this technique is implemented for MD-CD separation, some background material on Fourier series expansion is provided.

2.3.1 Fourier analysis and orthogonal basis functions

In short, Fourier analysis involves representing a signal, $f(x)$, as a combination of sine and cosine waves at various frequencies and magnitudes. Here, some background information on this representation process is provided. Please note that attempts are made to use standard notations, which may result in conflicts with other notation used throughout this thesis. Efforts are made to help clarify the use of notation, such as for the complex number, $i = \sqrt{-1}$, here as opposed to the CD bin index, $i$, from before.
Furthermore, representation basis functions are traditionally denoted by $\phi$ (as seen in the references provided in this section), but to be consistent with the compressive sensing literature discussed later they will be denoted here as $\psi$ instead. Notation is presented such that reader consideration of the current context should be sufficient to avoid confusion.

The concept of orthogonality is often described as two perpendicular lines that form a 90° angle with one another. Extending this to higher dimensions, orthogonality can be understood as two vectors having an inner product that is equal to zero. A classic set of orthogonal functions is $\sin(x)$ and $\cos(x)$ on the interval $[-\pi, \pi]$, i.e.,

$$\langle \sin, \cos \rangle = \int_{-\pi}^{\pi} \sin(x)\cos(x)\,dx = 0. \qquad (2.2)$$

This orthogonal property is critical to the application of Fourier analysis and, more generally, to signal representation through basis functions. In Fourier analysis the orthogonal basis functions are $\sin(nx)$ and $\cos(nx)$; however, other types of basis functions exist, such as wavelets, to represent signals. For generality, in what follows the set of basis functions will be denoted $\{\psi_i(x)\} = \Psi$.

A set of basis functions, $\Psi$, is orthogonal over the interval $[a, b]$ with respect to the weighting function $w(x)$ if, for $i \neq j$,

$$\int_a^b \psi_i(x)\psi_j(x)w(x)\,dx = 0. \qquad (2.3)$$

However, in the case where $i = j$, the scalar product of the two vectors does not vanish, i.e.,

$$\int_a^b \psi_i^2(x)w(x)\,dx = N_i^2. \qquad (2.4)$$

Now, if the set of basis functions, $\Psi$, is normalized such that each $\psi_i(x)$ is multiplied by $N_i^{-1}$, then the resulting set of basis functions will satisfy

$$\int_a^b \psi_i(x)\psi_j(x)w(x)\,dx = \delta_{ij} \equiv \begin{cases} 0, & \text{for } i \neq j \\ 1, & \text{for } i = j \end{cases} \qquad (2.5)$$

where $\delta_{ij}$ is the Kronecker delta. A set of basis functions satisfying (2.5) is considered a set of orthonormal basis functions, i.e., both orthogonal and unit normalized [3][11].

Another important property to consider when using orthogonal functions to estimate a signal is completeness.
A set of orthogonal functions is complete if a piecewise continuous function, $f(x)$ for $x \in [a, b]$, can be represented by the series

$$f(x) = \sum_{n=1}^{\infty} a_n \psi_n(x) \qquad (2.6)$$

such that the limit of the integrated squared error vanishes [3], i.e.,

$$\lim_{m \to \infty} \int_a^b \left[ f(x) - \sum_{n=1}^{m} a_n \psi_n(x) \right]^2 w(x)\,dx = 0. \qquad (2.7)$$

From (2.7) it is apparent that orthogonal completeness is an important property for ensuring the accuracy of the approximation of $f(x)$.

To demonstrate the importance of the orthonormal property for basis functions, suppose the function $f(x)$ is to be estimated as in (2.6). If the vector space is complex, as it is with Fourier expansion, some review of complex inner products is required. First, if $z$ is a complex variable with both real and imaginary parts, i.e., $z = x + iy$, then the complex conjugate of $z$, denoted $z^*$, is $z^* = x - iy$. For a complex matrix, $A$, the complex conjugate, $A^*$, is formed by taking the complex conjugate of each element [3]. For two vectors $u$ and $v$ in a complex vector space $\mathbb{C}^n$, their Euclidean inner product is given by [38]

$$\langle u, v \rangle = u_1 v_1^* + u_2 v_2^* + \cdots + u_n v_n^*. \qquad (2.8)$$

Now, given (2.8) and the inner product defined in (2.5), consider taking the inner product of the signal we wish to estimate, $f(x)$, with the orthonormal basis function $\psi_m$ over the interval $[a, b]$, i.e.,

$$\langle f, \psi_m \rangle = \int_a^b f(x)\psi_m^*(x)w(x)\,dx. \qquad (2.9)$$

Plugging in the series expansion for $f(x)$ from (2.6), i.e.,

$$\langle f, \psi_m \rangle = \int_a^b \sum_{n=1}^{\infty} a_n \psi_n(x)\psi_m^*(x)w(x)\,dx, \qquad (2.10)$$

and then extracting the summation of coefficients outside of the integral yields

$$\langle f, \psi_m \rangle = \sum_{n=1}^{\infty} a_n \int_a^b \psi_n(x)\psi_m^*(x)w(x)\,dx. \qquad (2.11)$$

Since the set of basis functions $\Psi$ is orthonormal as defined in (2.5), expression (2.11) simplifies to

$$\langle f, \psi_m \rangle = \sum_{n=1}^{\infty} a_n \delta_{nm} \qquad (2.12)$$

and, by the definition of the Kronecker delta,

$$\langle f, \psi_m \rangle = a_m. \qquad (2.13)$$

Therefore, by relating back to (2.6), the function to be estimated can be written directly in the orthogonal basis, i.e.,

$$f(x) = \sum_{n=1}^{\infty} \langle f, \psi_n \rangle \psi_n(x). \qquad (2.14)$$
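These relations are easy to verify numerically. The sketch below uses a trapezoid-rule inner product on $[-\pi, \pi]$ with $w(x) = 1$; the grid size and the particular test coefficients are arbitrary choices for illustration.

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 200_001)
dx = x[1] - x[0]

def inner(f, g):
    """Trapezoid-rule approximation of the inner product on [-pi, pi]."""
    h = f * g
    return float((h[:-1] + h[1:]).sum() * dx / 2.0)

# Normalized trig basis functions: N_i = sqrt(pi) for sin(nx) and cos(nx).
psi1 = np.sin(2.0 * x) / np.sqrt(np.pi)
psi2 = np.cos(3.0 * x) / np.sqrt(np.pi)
print(round(inner(psi1, psi1), 6))    # ~1 for i = j, as in Eq. (2.5)
print(round(inner(psi1, psi2), 6))    # ~0 for i != j

# Inner products recover the expansion coefficients, as in Eq. (2.13):
f = 2.5 * psi1 - 0.7 * psi2
print(round(inner(f, psi1), 6), round(inner(f, psi2), 6))   # ~2.5 and ~-0.7
```

The recovered coefficients match the ones used to build $f$, which is precisely the coefficient-recovery property that both spectral analysis and compressive sensing rely on.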
This property of orthogonal basis functions is critical for both spectral analysis and compressive sensing.

The expansion coefficients a_m, given by

\[
a_m = \int_a^b f(x)\,\psi_m^*(x)\,w(x)\,dx, \tag{2.15}
\]

are commonly referred to as the generalized Fourier coefficients [3][11]. As mentioned before, the usual Fourier series uses the orthogonal functions sin(nx) and cos(nx) on the interval [−π, π]. The Fourier series expansion for a function f(x) is given by

\[
f(x) = \tfrac{1}{2} a_0 + \sum_{n=1}^{\infty} a_n \cos(nx) + \sum_{n=1}^{\infty} b_n \sin(nx), \tag{2.16}
\]

where, for n = 0, 1, 2, ..., we have

\[
a_n = \frac{1}{\pi} \int_0^{2\pi} f(x)\cos(nx)\,dx, \tag{2.17}
\]

\[
b_n = \frac{1}{\pi} \int_0^{2\pi} f(x)\sin(nx)\,dx. \tag{2.18}
\]

Note that a_0 is covered by (2.17), which explains the coefficient of 1/2 in (2.16).

The Fourier series expansion can also be extended to complex coefficients [3]. Recall from Euler's formula that trigonometric functions can be related to the complex exponential function for any real number x as [18]

\[
e^{ix} = \cos(x) + i\sin(x). \tag{2.19}
\]

Therefore, the Fourier expansion from (2.16) can be expressed as

\[
f(x) = \sum_{n=-\infty}^{\infty} c_n e^{inx} \tag{2.20}
\]

where

\[
c_n =
\begin{cases}
\tfrac{1}{2}(a_{-n} + i b_{-n}), & \text{for } n < 0 \\
\tfrac{1}{2} a_0, & \text{for } n = 0 \\
\tfrac{1}{2}(a_n - i b_n), & \text{for } n > 0.
\end{cases} \tag{2.21}
\]

If f(x) is periodic on [−L/2, L/2] the Fourier expansion can be expressed as

\[
f(x) = \sum_{n=-\infty}^{\infty} c_n e^{i(2\pi n x / L)}, \tag{2.22}
\]

which forms the basis for the Fourier transform F(ω).

The Fourier transform of a continuous band-limited signal f(x) is given by

\[
F(\omega) = \int_{-\infty}^{+\infty} f(x)\,e^{-i\omega x}\,dx. \tag{2.23}
\]

A data set f(nT), sampled at uniform intervals of T, can be used to estimate f(x) with the periodically extended spectrum F_T(ω), i.e.,

\[
F_T(\omega) = \sum_{n=-\infty}^{+\infty} f(nT)\,e^{-i\omega n T}, \tag{2.24}
\]

where

\[
|F(\omega)| = 0 \quad \text{for } |\omega| > \tfrac{1}{2}\,[2\pi/T], \tag{2.25}
\]

\[
F_T(\omega) = F(\omega) \quad \text{for } |\omega| \leq \tfrac{1}{2}\,[2\pi/T]. \tag{2.26}
\]

With F_T(ω) the estimate of f(x) becomes

\[
f(x) = \frac{1}{2\pi} \int_{-\pi/T}^{+\pi/T} F_T(\omega)\,e^{+i\omega x}\,d\omega, \tag{2.27}
\]

but for practical purposes a finite data set of length N is used and the finite approximation is
performed with the discrete Fourier transform (DFT), i.e.,

\[
F_D(\omega_k) = \sum_{n=0}^{N-1} f(nT)\,e^{-i\omega_k n T}, \tag{2.28}
\]

where

\[
\omega_k = \frac{2\pi}{NT}\,k, \quad k = 0, 1, \cdots, N-1. \tag{2.29}
\]

Other finite variations of the Fourier transform exist as well, such as the finite Fourier transform and the discrete cosine transform [31].

In this work the fast Fourier transform (FFT), a more computationally efficient version of the DFT, is used for PSA. The FFT uses symmetries between the sine and cosine transforms to reduce the number of operations for a signal of N samples from O(N²) operations to O(N log N) operations [17][18]. Here the notation O(·) describes how the number of operations grows as a function of the signal length N. In the following section the use of the FFT is described with an example on a simulated paper sheet.

2.3.2 Power spectral analysis for MD-CD separation

The PSA technique requires the comparison of spectral components of two scans with different scan speeds. Here the simulated scanner is set up such that the scan speed alternates every scan, resulting in different forward and reverse scanning trajectories. This allows each new scan to be compared to the most recent previous scan with the PSA method to perform MD-CD separation. To demonstrate the use of PSA for MD-CD separation, consider a simulated sheet of paper that is 8 m wide and moving at 10 m/s in the MD while a scanning sensor traverses the sheet at alternating speeds of 0.4762 m/s and 0.9091 m/s. The simulated sheet has one sinusoidal MD variation at a frequency of 0.10 Hz and one sinusoidal CD variation at a frequency of 0.5 m⁻¹. Furthermore, zero-mean normally distributed random noise with a standard deviation of σ_e = 0.1 is added to the sheet. The simulated sheet has a total of n_CD = 128 CD bins and is scanned twice, as shown in Figure 2.3. It is clear from Figure 2.3 that the slower speed of the first scan results in a longer MD trajectory and a smaller scanning angle θ.
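Returning briefly to (2.28): the FFT used throughout this example computes exactly the DFT, only faster. A quick sketch (Python/NumPy here; the sample count, period and test frequencies are arbitrary) verifying the direct sum against a library FFT:

```python
import numpy as np

# Direct evaluation of (2.28), F_D(w_k) = sum_{n=0}^{N-1} f(nT) e^{-i w_k n T},
# with w_k = 2*pi*k/(N*T) from (2.29), compared against np.fft.fft.
N, T = 64, 0.1
n = np.arange(N)
f = np.sin(2 * np.pi * 1.25 * n * T) + 0.5 * np.cos(2 * np.pi * 2.5 * n * T)

k = n.reshape(-1, 1)
wk = 2 * np.pi * k / (N * T)                        # frequency grid (2.29)
F_direct = (f * np.exp(-1j * wk * n * T)).sum(axis=1)

assert np.allclose(F_direct, np.fft.fft(f), atol=1e-9)
```

The direct sum costs O(N²) operations, while `np.fft.fft` realizes the same transform in O(N log N), which is what makes per-scan spectral processing cheap.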
The scanning angles of the two scans are θ₁ = 2.7263° and θ₂ = 5.1944°, respectively.

Figure 2.3: Two scan speeds with noisy sinusoidal MD and CD profiles

According to the Shannon-Nyquist theorem, in order to perfectly reconstruct a signal with maximum bandwidth ω_B one must take equispaced samples of that signal at a rate of at least 2ω_B, i.e., the Nyquist rate [20]. Likewise, the FFT of a scan of length n_CD collected with a constant sampling rate of 1/n_CD contains n_CD/2 frequency bins that range from frequencies as low as 1/n_CD (cycles/sample) to frequencies as high as 0.5 (cycles/sample). In other words, the FFT applies sinusoids with periods as large as the scanning length and as small as twice the sample spacing.

The output of the FFT contains both real and imaginary components and, for real data, is mirrored at the center: the second half of the output is the complex conjugate of the first half. For a real-valued signal, as we have here, the second half of the FFT output therefore offers no additional information. The first value of the FFT is commonly referred to as the DC component, and it represents the average value of the scanned signal multiplied by the number of samples (for the MATLAB implementation, which does not normalize by 1/n_CD as others do). Furthermore, the bin at n_CD/2 where the FFT becomes mirrored represents the signal energy at the Nyquist frequency. To identify the dominant temporal and spatial frequency components, the magnitudes of the complex FFT outputs are calculated.

The magnitude of the spectral content for both scans is shown in Figure 2.4. The top profile shows the FFT magnitude plotted with respect to spatial frequency, whereas the bottom profile shows the FFT output magnitude plotted with respect to temporal frequency. Notice in the top plot that both the first and second scans have frequency content over the same spatial frequency range because both scans span the 8 m wide paper sheet.
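The dual spatial/temporal frequency axes come from simple bookkeeping: bin k corresponds to k/w cycles/m across the sheet width w, and to k/t_scan Hz in time, where t_scan = w/v_scan depends on the scan speed. A minimal sketch (Python/NumPy) for the example scan parameters:

```python
import numpy as np

# Mapping FFT bin index k to spatial (cycles/m) and temporal (Hz) frequencies
# for the example: 128 CD bins across an 8 m sheet, two alternating speeds.
n_cd, width = 128, 8.0
k = np.arange(n_cd // 2)                 # bins up to the Nyquist frequency
f_spatial = k / width                    # cycles/m, independent of scan speed
f_temporal = {v: k * v / width           # Hz, since k/t_scan = k*v_scan/width
              for v in (0.4762, 0.9091)}

assert np.isclose(f_spatial.max(), 63 / 8)
assert f_temporal[0.9091].max() > f_temporal[0.4762].max()
```

This is exactly why the spatial axes of the two scans coincide while the temporal axes do not: the faster scan compresses the same bins into a shorter time, reaching higher temporal frequencies.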
However, in the bottom plot the frequency content of the first scan does not cover the same temporal frequency range as the second scan because the first scan takes the same number of samples over a longer period of time. The second scan is effectively sampling at a faster rate, so it is able to measure higher MD frequency content.

Figure 2.4: Spectral content of scans at different scan speeds

The benefit of alternating scan speeds is apparent from Figure 2.4 by observing the difference in alignment of the spectral content between the top and bottom plots. As expected, each scan has two dominant frequencies. In the top plot the higher dominant frequency of each scan aligns at the same spatial frequency, 0.5 m⁻¹, which is equivalent to the frequency of the CD variation from Figure 2.3. When the alternating scan speed is accounted for in order to represent the temporal frequency content, the peaks of the spectral content shift. In the bottom plot the lesser dominant frequency of each scan aligns at a temporal frequency of 0.1 Hz, which coincides with the frequency of the MD variation from Figure 2.3. This shift in the spectral content of each scan, when presented with respect to both spatial and temporal frequency, provides the basis on which MD-CD separation can be conducted with PSA.

To determine the dominant frequency components, a threshold is applied such that only the magnitudes that exceed the threshold are retained and the rest are set to zero. Of the remaining values, the ones that have intersecting spatial frequencies are stored as dominant CD frequencies and the ones that have intersecting temporal frequencies are stored as dominant MD frequencies. Next, the original FFT output from each scan is processed such that the values with a magnitude below a second threshold are set to zero (to remove the noise).
For the MD reconstruction the identified CD dominant frequencies are set to zero, and for the CD reconstruction the identified MD dominant frequencies are set to zero. Finally, to perform the reconstruction, the inverse of the FFT is determined for both the MD and CD frequency content. The resulting CD profiles are shown in Figure 2.5 and the reconstructed MD profiles are shown in Figure 2.6.

Figure 2.5: CD profiles after MD-CD separation with PSA

The estimated CD profiles of the first and second scans are shown in the top and bottom plots of Figure 2.5, respectively. The green signal represents the true CD variation without noise, the red profile represents the unprocessed observed scanner signal, and the blue profile represents the PSA estimate. It is clear from Figure 2.5 that the PSA method helped to reduce the noise and, more importantly, the MD aliasing in the CD profile, particularly for the second scan, which traverses the CD at a faster rate. In Figure 2.6 the reconstructed MD profiles of the first and second scans are shown in the top and bottom plots, respectively. Here the red line shows the scan average, which would likely be used for both control and quality reporting purposes in industrial practice. The PSA estimate (blue) of the first scan is very close to the true MD variation (green), but for the second scan (bottom plot) the PSA MD estimate is out of phase with the true MD variation. Since the dominant MD and CD spectral contents have similar frequencies, it is possible that some MD and/or CD frequency content leaked into neighboring frequency bins and was lost during the processing. The reason the two plots appear to have different true MD variation frequencies is that the bottom plot spans a shorter time frame due to the faster scan rate.

Figure 2.6: MD profiles after MD-CD separation with PSA

The PSA method for MD-CD separation is summarized in Algorithm 1.
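The core thresholding and support-intersection steps of this procedure can be sketched compactly (Python/NumPy; a deliberately simplified toy with one CD sinusoid, a fixed seed and an illustrative threshold constant, not the full two-speed implementation):

```python
import numpy as np

# Two scans of the same CD sinusoid (plus noise) share spatial frequency
# content, so their thresholded FFT supports intersect at the CD bins.
rng = np.random.default_rng(0)
n = 128
bins = np.arange(n)
cd = np.sin(2 * np.pi * 4 * bins / n)          # spatial variation at bin 4

supports = []
for _ in range(2):                              # two consecutive scans
    y = cd + 0.1 * rng.standard_normal(n)       # scanner signal with noise
    mag = np.abs(np.fft.fft(y))
    T = 3 * mag.mean()                          # threshold T = c * mean(|x|)
    supports.append(set(np.flatnonzero(mag > T)))

common = supports[0] & supports[1]              # intersecting supports
assert 4 in common and (n - 4) in common        # peak and its mirrored bin
```

In the full algorithm the same intersection test is run on both the spatial and the temporal frequency axes, and the intersecting bins are zeroed out of one reconstruction or the other before inverting the FFT.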
In this example the constant for the first threshold is set as c₁ = 10 and the constant for the second threshold is set as c₂ = 3. Unfortunately, one of the drawbacks of the PSA technique is the need to select the thresholding constants so as to optimally remove noise and unwanted variations while maintaining all of the valuable information. It is also worth noting that the inverse FFT (IFFT) contains both real and imaginary components, so for the purpose of this work only the real components are used to reconstruct the MD and CD profiles.

Algorithm 1 Power spectral analysis for MD-CD separation
Input: Two consecutive scans y₁, y₂ at alternating scan speeds.
Output: MD trends y₁,MD, y₂,MD and CD profiles y₁,CD, y₂,CD.
for each scan do
    Take x = FFT(y) and compute |x|
    Set a threshold T₁ = c₁ · mean(|x|) and set values of |x| ≤ T₁ to zero
    Find the support of the remaining nonzero coefficients
    Create separate spatial and temporal frequency supports
end for
for the spatial and temporal frequency supports do
    Find and store supports on consecutive scans that intersect
    Set values of x below the noise threshold T₂ = c₂ · mean(|x|) to zero
    Set values of x at the intersecting supports to zero
    Reconstruct yMD = IFFT(x) if the intersecting spatial supports were removed
    Reconstruct yCD = IFFT(x) if the intersecting temporal supports were removed
end for

2.4 Compressive sensing

The goal of compressive sensing (CS) is to exploit signal structure (specifically sparsity) in order to accurately reconstruct a signal at a much lower sampling rate than is dictated by the Shannon-Nyquist theorem. Compressive sensing has already been shown to successfully reconstruct paper profiles, although implementing the structured sampling of the scanning sensor was not sufficiently addressed [55]. Random sampling is often necessary when applying compressive sensing, but due to the nature of the sheet forming process it is not practically achievable. Instead, random sampling needs to be approximated.
In this section the CS theory is first presented, followed by a summary of the previous work applying CS to MD-CD separation. Finally, the proposed technique for applying CS to data from a scanning sensor for both MD and CD profile estimation is introduced.

2.4.1 Theory

Shannon's theorem states that for one to reconstruct a signal without error, the sampling rate must be at least twice the maximum frequency in the signal (i.e., the Nyquist rate). Although signals such as paper sheet variations may not have natural bandlimits, low-pass filters are often used to bandlimit these signals, so Shannon's theorem still plays a role. Compressive sensing claims that as long as certain conditions pertaining to sparsity, incoherence and the restricted isometry property (RIP) are met, the signal can be perfectly reconstructed with high probability using fewer samples than is required by the Nyquist rate [10]. A detailed analysis of these conditions is beyond the scope of this thesis, but they are presented briefly in what follows. The CS theory is developed with respect to the notation introduced in Section 1.5 for the scanner signal processing.

Recall from Section 1.5 the matrix Y ∈ R^(n_B × n_S) of scanner signal data, with n_B rows where n_B is equal to the number of CD bins and n_S columns where n_S is equal to the number of MD scans. The vector form y = vec(Y) with y ∈ R^m, where m = n_B × n_S, represents the values from each scan stacked upon one another. Please note the change in matrix and vector size notation from the previous sections, which is due to the fact that the estimated paper sheet has larger matrix dimensions than the measured data matrix. Specifically, with CS the geometry of the scanning trajectory is retained, and instead of stacking the data from a single scan at the same MD position, the actual (or approximate) MD position of the scan data is represented.
In other words, with CS the matrix of the estimated paper sheet is F ∈ R^(n_CD × n_MD), with n_CD rows where n_CD is the desired CD resolution of the estimated sheet, which in this work is the number of CD data points measured across the sheet (i.e., here n_CD = n_B). The number of columns n_MD represents the desired MD resolution of the reconstruction, which is defined here such that the scanning geometry is considered in the reconstruction (n_MD ≫ n_S). The estimated sheet profile is originally reconstructed in vector form f = vec(F) with f ∈ R^n, where n = n_CD × n_MD and n ≫ m.

Sparsity refers to the number of non-zero coefficients a signal has in a given representation (or sparsifying) basis Ψ. Sparse signals have relatively few non-zero coefficients, which allows them to be compressed and reconstructed accurately without storing the large number of small coefficients. A signal f is considered S-sparse in the representation basis Ψ if the coefficient sequence x ∈ R^n has S ≪ n non-zero values and f = Ψx [20]. For example, if the Fourier transform of a sinusoidal signal is taken, the resulting spectrum will have large coefficients at the frequencies present in the signal while the rest of the spectrum will be noise around zero magnitude. If there are relatively few large coefficients then the Fourier transform is referred to as a sparsifying representation basis of the signal.

Consider the representation basis Ψ ∈ R^(n×n) as an orthonormal basis (e.g., a Fourier basis or wavelet basis) that provides a sparse representation of f. Recall from equation (2.13) that for an orthonormal basis the coefficients x_i of the original signal can be determined by the inner product, i.e.,

\[
x_i = \langle f, \psi_i \rangle \tag{2.30}
\]

where

\[
f = \Psi x = \sum_{i=1}^{n} x_i \psi_i. \tag{2.31}
\]

Here x is the sparse coefficient sequence of f in the orthonormal representation basis Ψ [10].

In addition to sparsity, another condition of CS, referred to as coherence, is related to the projection matrix Φ ∈ R^(n×n).
The projection matrix describes how samples are taken from the domain of the original signal, i.e.,

\[
y = R\Phi f + \nu, \tag{2.32}
\]

where R ∈ R^(m×n) is the measurement matrix that contains the sampling coordinates and ν represents additive noise that corrupts the measured data y. The measurement matrix contains the information regarding the geometry of the scanner trajectory. The measured signal can be expressed in terms of the representation basis coefficient sequence by inserting (2.31) into (2.32) to obtain

\[
y = R\Phi\Psi x + \nu = Ax + \nu, \tag{2.33}
\]

where A ∈ R^(m×n) is referred to as the compressive sensing matrix.

Compressive sensing is concerned with minimizing the coherence µ of the sampling, which is determined by the maximum correlation between Φ and Ψ as

\[
\mu(\Phi, \Psi) = \sqrt{n} \max_{1 \leq i, j \leq n} |\langle \varphi_i, \psi_j \rangle|. \tag{2.34}
\]

If Φ and Ψ are orthonormal bases then µ(Φ, Ψ) ∈ [1, √n]. For CS it is desirable to select incoherent basis pairs for Φ and Ψ, as this reduces the number of measurements required for accurate reconstruction. An example of a low-coherence pair is the identity or spike basis as the projection matrix (Φ) and the Fourier basis as the representation matrix (Ψ), since spikes and sinusoids are maximally incoherent (i.e., µ = 1) [9][10]. The importance of sparsity and low coherence is shown in the following condition, under which the probability of exact reconstruction exceeds 1 − δ:

\[
m \geq C \cdot \mu^2(\Phi, \Psi) \cdot S \cdot \log(n/\delta). \tag{2.35}
\]

For maximally incoherent projection and representation bases the number of samples required for exact reconstruction is on the order of S · log(n).

Finally, in order to guarantee efficient and robust reconstruction with CS it is desirable to satisfy the restricted isometry property (RIP). The CS matrix A is loosely said to obey the RIP if δ_S, the smallest number such that

\[
(1 - \delta_S)\,\|x\|_2^2 \leq \|Ax\|_2^2 \leq (1 + \delta_S)\,\|x\|_2^2, \tag{2.36}
\]

is not too close to one [10].
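The coherence definition (2.34) is easy to evaluate for the spike/Fourier pair mentioned above. A small sketch (Python/NumPy; the dimension is arbitrary) confirming that this pair is maximally incoherent:

```python
import numpy as np

# Coherence (2.34) of the spike (identity) basis with the orthonormal DFT
# basis: every DFT entry has magnitude 1/sqrt(n), so mu = sqrt(n)*(1/sqrt(n)).
n = 64
Phi = np.eye(n)                                # spike / identity basis
Psi = np.fft.fft(np.eye(n)) / np.sqrt(n)       # unitary (orthonormal) DFT basis
mu = np.sqrt(n) * np.abs(Phi.conj().T @ Psi).max()
assert np.isclose(mu, 1.0)
```

Since µ = 1 here, the bound (2.35) reduces to the favorable m ≳ S·log(n) sample count quoted in the text.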
Although the RIP is NP-hard (non-deterministic polynomial-time hard) to verify, it is generally satisfied by ensuring that a sufficient number of samples are taken for a given sparsity level S, coherence µ and randomness of sampling specified by R [26]. Selecting completely random sampling locations is usually used to satisfy the RIP and improve reconstruction for a given number of samples. However, in the paper machine application completely random sampling is not practically achievable due to the structured sampling scheme of the traversing scanning sensor.

The ℓ₂-norm applied in (2.36) is one example of the more general ℓ_p-norm. Since other types of ℓ_p-norms are used in what follows, it is useful to define the ℓ_p-norm as [52]

\[
\|x\|_p = \Big( \sum_{i=1}^{n} |x_i|^p \Big)^{1/p}. \tag{2.37}
\]

The objective in CS is to find the sparsest vector x that yields the measurement vector y, which involves minimizing the ℓ₀-norm, i.e.,

\[
\min \|x\|_0 \quad \text{s.t.} \quad Ax = y, \tag{2.38}
\]

where A is the CS matrix described earlier. Unfortunately (2.38) is NP-hard, therefore an equivalent ℓ₁ relaxation of the problem is solved instead. Furthermore, to account for the measurement noise ν, the equality constraint in (2.38) is changed into an inequality constraint, resulting in the following optimization:

\[
\min \|x\|_1 \quad \text{s.t.} \quad \|Ax - y\|_2 \leq \sigma, \tag{2.39}
\]

where σ is related to the noise level. In statistics this optimization is known as the Least Absolute Selection and Shrinkage Operator (LASSO) [26]. It is also known as the basis pursuit denoising (BPDN) problem [30]. By duality, an equivalent version of (2.39) is given by

\[
\min \|Ax - y\|_2 \quad \text{s.t.} \quad \|x\|_1 \leq \tau, \tag{2.40}
\]

where τ is a sparsity parameter. For future reference, (2.39) is referred to as BPDN and (2.40) is referred to as LASSO. Here LASSO is a convex program that can be recast as a quadratic program and solved via interior point methods or proximal gradient methods for non-smooth convex optimization [26].
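The noise-free ℓ₁ relaxation of (2.38) can be demonstrated on a toy problem by recasting it as a linear program with the standard split x = u − v, u, v ≥ 0. This is only an illustrative sketch (Python/SciPy, random data, not the SPGL1 solver used in this work, which scales far better):

```python
import numpy as np
from scipy.optimize import linprog

# Toy basis pursuit: min ||x||_1  s.t.  Ax = y, as an LP over z = [u; v].
rng = np.random.default_rng(1)
m, n, S = 20, 40, 3
A = rng.standard_normal((m, n))                 # random sensing matrix
x_true = np.zeros(n)
x_true[rng.choice(n, S, replace=False)] = rng.standard_normal(S)
y = A @ x_true                                  # noise-free measurements

c = np.ones(2 * n)                              # sum(u) + sum(v) = ||x||_1
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=y, bounds=(0, None))
x_hat = res.x[:n] - res.x[n:]

assert res.status == 0                          # LP solved successfully
assert np.abs(A @ x_hat - y).max() < 1e-6       # measurements reproduced
assert np.abs(x_hat).sum() <= np.abs(x_true).sum() + 1e-6  # l1-optimal
```

The last assertion holds because x_true is itself feasible, so the ℓ₁-minimal solution can never have a larger ℓ₁-norm; with enough random measurements it typically coincides with x_true exactly.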
In this work the spectral projected gradient for ℓ₁ (SPGL1) minimization algorithm is used, which obtains the solution to the BPDN problem by solving a sequence of LASSO problems. The SPGL1 method is chosen because it is reported to be a fast method for large-scale sparse reconstruction [30].

2.4.2 Previous work

As mentioned before, previous work by Towfighi in 2011 tested the concept of CS for sheet profile reconstruction on both simulated and industrial data. The study tested the Fourier representation basis in addition to the Haar, Symlet and Daubechies wavelet representation bases. For both the simulated and industrial trials the Fourier basis provided the lowest reconstruction error. This result was expected for the simulated trial because the simulated sheet was based on sinusoidal variations. For industrial data the Fourier basis also makes intuitive sense if one considers the physical characteristics of operations such as pressing and drying rollers, which may result in roughly sinusoidal variations [55].

One problem with the simulated testing was that the dimensions of the simulated sheet fall woefully short of representing the true industrial problem. Furthermore, the sampling ratio relied upon is unrealistically high and is not close to what is practically achievable given the scanning geometry. Some tests were conducted with completely random sampling, which is not possible with a scanning sensor. Another test involved two random-speed scanners, but the traversing angle of these scanners is significantly greater than what is attainable in reality. Testing the CS technique on industrial data provided encouraging results, although once again realistic structured sampling was not considered.
For a high-speed paper machine to satisfy the RIP condition it was proposed that either a sensor array or a large number of scanning sensors be used [55].

2.4.3 Compressive sensing for MD-CD separation

Based on Towfighi's results, the Fourier basis is used as the representation basis for CS in this work. While a sensor array spanning the width of the sheet would be ideal, here the focus is on performing CS with a scanning sensor. Achieving the number of samples necessary to meet the RIP is not realistic given the high speed of the paper machine and the relatively slow speed of the scanning sensor. Furthermore, in industrial settings it is likely impractical to assume that the sparsity of the moving paper sheet can be known in order to verify the RIP and determine the minimum number of samples required. Therefore, instead of attempting to guarantee perfect reconstruction according to the CS conditions, this work focuses on maximizing the information obtained to reconstruct the sheet profile through CS. The absence of CS theoretical guarantees should not be a deterrent, especially considering that CS is successfully used in many applications (e.g., tomography and spectroscopy) without such guarantees [7].

For industrial model predictive control (MPC) applications it is common for the scanner signal to be averaged into as many as 500 CD bins. Given a somewhat realistic sheet speed and scan speed, the shallow traversing angle θ ≈ 2° results in a very large amount of unmeasured data. For example, consider a scanner traversing the paper sheet at 2° from the MD, averaging measurements over 500 CD bins per scan. Now consider that the 'true' MD resolution of the paper sheet is limited to that which is determined by the CD bins and the geometry of the scanning trajectory. That is to say, the 'true' sheet has a width of 500 equally spaced CD bins, and the 2° traversing angle determines that approximately 14,500 MD data positions are covered by a single scan.
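The scale of the problem follows from simple geometry: at traversing angle θ, one scan of n_CD bins covers roughly n_CD/tan(θ) MD positions, and a dense CS matrix of 8-byte doubles for two such scans overwhelms ordinary memory. A back-of-envelope sketch (Python; this reproduces the order of magnitude of the figures quoted in the text):

```python
import math

# Scanning geometry and dense CS-matrix storage for 500 CD bins at 2 degrees.
n_cd, theta = 500, math.radians(2.0)
n_md_per_scan = n_cd / math.tan(theta)     # MD positions covered by one scan
n = n_cd * round(2 * n_md_per_scan)        # sheet size for two scans
m = 2 * n_cd                               # 1000 measured samples
bytes_A = m * n * 8                        # dense double-precision A (m x n)
print(round(n_md_per_scan), bytes_A / 1e9) # 14318 114.544
```

The ~14,300 MD positions and ~115 GB agree, to rounding, with the ~14,500 positions and ~116 GB cited for the MATLAB implementation.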
This massive sheet size can be problematic with regard to simulating CS tests due to storage limitations in MATLAB. Recall the CS matrix A ∈ R^(m×n), where m is the number of measured samples and n is the total size of the simulated sheet, i.e., m = n_B × n_S and n = n_CD × n_MD. Simulating two scans with 500 CD bins and a 2° traversing angle yields n = 14.5 million and m = 1000, which results in the CS matrix A requiring roughly 116 GB of storage in MATLAB. Even with a sparse matrix representation the necessary storage requirements far exceed the memory limitations of a standard personal computer.

To build on Towfighi's work it is desirable to provide a more realistic representation of the sampling ratio and the scanning structure determined by the geometry of the scanning sensor. Furthermore, it is desirable to determine whether or not CS can provide effective MD-CD separation, especially with respect to the effects of MD aliasing. Unfortunately, due to the memory limitations available for this work, some compromise is necessary with regard to representing the true dimensions of the industrial problem. To balance these desires and limitations, a smaller number of CD bins is used while maintaining a fairly accurate reflection of the sampling geometry and traversing angle (i.e., the 'true' simulated sheet still contains a large MD dimension). This is demonstrated by Figure 2.7, where four scans are performed on a simulated sheet with 64 CD bins that is 8 m wide and moving at 6 m/s. The simulated sheet contains single MD and CD sinusoidal variations with ω_MD ≈ 1/(2t_scan) and additive zero-mean normally distributed random noise with standard deviation σ_e = 0.1. Although Figure 2.7 is not to scale, the scanner moves at 0.86 m/s, creating a traversing angle of 8.1° that results in a total of n_MD = 1792 MD data points. The effect of MD aliasing is evident
The effect of MD aliasing is evidentby comparing the observed CD profile averaged over four scans (in red) to the true CD profile (ingreen).Even with the slow moving sheet and relatively simplistic MD and CD profiles the samplingratio achieved is still only 0.2% of the total sheet. To improve this sampling ratio for CS a simpleassumption is proposed here that allows for an increased sampling ratio.Assumption 4. The CD profile variations are slow enough to be assumed static over a short period(e.g., ≈10%) of a single scan.382.4. Compressive sensingFigure 2.7: Simulated sheet with 64 CD bins and 4 scansNote that this assumption is reasonably justifiable based on the characterization of paper sheetvariations presented in Section 1.4. Furthermore, it is much less assertive than the assumption usedby most MD-CD separation methods (including EF) which assume the CD profile is essentiallyindependent of time. In practice this allows the scanner signal to represent a band of measurementsacross the paper sheet as opposed to just a line composed of single points. The band is createdby using the measurement at a given CD bin and applying it to neighboring MD columns (alongthe same bin) that were missed by the scanner. No new information is taken from the true papersheet but instead the near static behavior of the CD profile is used to better leverage the obtainedmeasurements and increase the sampling ratio.For example, from the first scan in Figure 2.7 the value measured at the second bin is Y(2,1) =F0(2,8) = -0.2668 where Y is the matrix of scanned data and F0 is the matrix of the full sheet.Assumption 4 is applied in this example to four neighboring values, i.e., it is assumed that F0(2,6) =F0(2,7) = F0(2,8) = F0(2,9) = F0(2,10) = -0.2668. This is performed for each measurement fromeach scan providing a band of measurements across the sheet where the measurements from aparticular CD bin of a particular scan all share the actual measured value. 
This assumption is applied to all of the scans, and then the measurement vector y and the CS matrix A are updated to reflect these new values. Next, CS is performed by solving the BPDN problem shown in (2.39) using the SPGL1 technique described before. The sheet is estimated from the coefficients, i.e., f = Ψx, and reshaped into the estimated sheet F shown in Figure 2.8 below. This simple demonstration precedes the more thorough comparative analysis of CS, PSA and EF described in the following chapter.

Figure 2.8: Simulated sheet reconstruction with compressive sensing

Chapter 3
Experimental Setup

In this chapter the experimental setup for comparing the exponential filtering (EF), power spectral analysis (PSA) and compressive sensing (CS) techniques is described. During industrial operation only the measurements obtained from the scanning sensor are typically available. Unfortunately, since the true sheet profile is not measured, there is no way to measure the reconstruction accuracy. To determine the true sheet, post-production laboratory tests can be performed, but these tests are limited in scope and expensive to perform [5]. Consequently, the analysis presented in this work is limited to simulated experiments.

To compensate for the lack of industrial data, Honeywell has provided a CD profile that replicates a realistic industrial profile. This profile is shown in Figure 3.1, and a compressed version of it (i.e., with fewer CD bins) is used in the experiments to represent the CD variation on the simulated paper sheet.

Figure 3.1: Realistic CD profile provided by Honeywell

Since the primary focus of this work is to study the ability of MD-CD separation methods to reduce MD aliasing, the sinusoidal MD variations are specifically chosen to have frequencies near integer multiples of (2t_scan)⁻¹.

To generate the simulated sheet, the number of CD bins is specified along with the number of scans, the scan speed, the sheet speed and the sheet width.
From this information the traversing angle θ is calculated as shown in equation (1.1). The MD length of the simulated sheet is based on the number of discrete MD data points necessary to approximate θ for the specified number of CD bins. With the necessary dimensions of the simulated sheet determined, the CD profile is repeated at each MD position and the MD profile is repeated at each CD bin to construct F̄₀^CD and F̄₀^MD, respectively. As mentioned before, zero-mean normally distributed random noise with standard deviation σ_e is used to generate the residual variation matrix F₀^R. Finally, the simulated sheet is constructed similarly to equation (1.2), i.e.,

\[
F_0 = \bar{F}_0^{MD} + \bar{F}_0^{CD} + F_0^{R}. \tag{3.1}
\]

The simulated scanner extracts measurements of F₀ at each CD bin, at the MD positions determined by the traversing angle. When the scanner reaches the edge of the sheet it stops and reverses, ultimately taking two measurements at that CD bin such that each scan has a total of n_B measurements.

Comparing the EF, PSA and CS methods for MD-CD separation is not straightforward, since the favorable implementation of each technique is unique. For example, the EF method is typically more effective once a large number of scans have been processed, whereas the PSA method compares only two consecutive scans. Furthermore, the PSA method requires alternating scan speeds, whereas the CS method achieves a higher sampling ratio if the fastest scan speed is maintained. Instead of trying to compare the methods with the exact same implementation, each method is implemented so as to allow a favorable reconstruction given the same simulated sheet and scanning setup. This prevents improper implementation of an MD-CD separation technique from being the reason why one reconstruction is better than another.
The advantages and disadvantages of the implementation of each technique are discussed in Chapter 5.

The sheet profile estimates obtained from each MD-CD separation method are compared based on how well they match the true MD and CD variation profiles. Specifically, the root mean square error (RMSE) is calculated for the estimated profiles relative to the true MD and CD profiles as follows:

\[
\mathrm{RMSE} = \sqrt{\frac{1}{k} \sum_{i=1}^{k} (p_i - \hat{p}_i)^2}, \tag{3.2}
\]

where k is the number of samples in the profile, p_i is the true property value and p̂_i represents the estimated value. The smaller the RMSE, the better the estimated profile represents the true profile, with an RMSE of 0 indicating a perfect fit. For EF the scan average is used as the MD profile, so a zero-order hold is placed on the average value in order to extend it to the true length of the sheet. Similarly, for PSA the reconstructed MD profile is only the length of the scanned signal, so interpolation is performed to generate a signal of the desired length. In situations where a different number of scans is used for each method, the specific reconstructed scans are typically from the end of the sheet, but for clarity this is indicated along with the results.

Finally, with regard to the specific set of tests conducted to compare each method, the simulation parameters are provided with the results, but the general experimental outline is as follows:

Trial 1: MD aliasing with a slow-moving sheet and 100 CD bins,
Trial 2: MD aliasing with a fast-moving sheet and 60 CD bins,
Trial 3: compressive sensing with two scanning sensors.

For each of the trials multiple tests are performed in which the simulation configurations are altered slightly to study additional factors such as the effect of noise. Generally, the major simulation parameters within a trial remain constant, and the ones that change are stated explicitly before new results are presented.
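The error metric (3.2) and the zero-order hold used to extend scan averages are both one-liners. A minimal sketch (Python/NumPy; the profiles and hold length are toy values):

```python
import numpy as np

# RMSE of (3.2) between a true profile p and its estimate p_hat.
def rmse(p, p_hat):
    return np.sqrt(np.mean((np.asarray(p) - np.asarray(p_hat)) ** 2))

p_true = np.array([1.0, 2.0, 3.0, 4.0])
p_est = np.array([1.0, 2.0, 3.0, 6.0])
assert np.isclose(rmse(p_true, p_est), 1.0)    # sqrt((0+0+0+4)/4) = 1

# Zero-order hold: extend one average value per scan to full sheet length.
scan_avgs = np.array([0.2, -0.1])
md_profile = np.repeat(scan_avgs, 3)           # hold each value for 3 samples
assert np.allclose(md_profile, [0.2, 0.2, 0.2, -0.1, -0.1, -0.1])
```

The hold (and, for PSA, interpolation) simply ensures every method's MD estimate has the same length as the true profile before (3.2) is evaluated.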
This allows for a more condensed representation of the experimental results without having to repeat similar setup parameters for each new test.

Chapter 4: Experimental Results

The results presented in this chapter consist of a series of tests that have been categorized into three major trials. The first trial compares the MD-CD separation methods with a fairly large CD profile, nCD = nB = 100, but a very slow sheet speed, vsheet = 1 m/s. This unrealistic sheet speed is required due to memory limitations posed by constructing the CS matrix, A. This trial is performed to showcase that the CS method is capable of handling a larger CD dimension if enough memory is provided. The second trial is conducted with a smaller CD dimension but a faster sheet speed. Increasing the sheet speed allows the MD-CD separation problem to be studied with a fairly realistic scanning geometry. The focus of the first two trials is comparing the various MD-CD separation methods, particularly with regards to alleviating the MD aliasing problem. Finally, the third trial provides some insight into the potential benefits of using compressive sensing with multiple scanning sensors.

4.1 Trial 1: Slow moving sheet with 100 CD bins

The simulation parameters for Trial 1 are summarized in Table 4.1.

Table 4.1: Simulation parameters for Trial 1
Parameter     Description                        Value
wsheet        sheet width (m)                    8
vscan         scan speed(s) (m/s)                0.33, 1
tscan         scan time (s)                      24, 8
vsheet        sheet speed (m/s)                  1
θ             traversing angle from MD (°)       18.4, 45
nCD = nB      number of CD bins                  100
nEFS, nCSS    number of scans for EF and CS      20, 8
ωMD           frequency of MD sinusoid(s) (Hz)   0.062, 0.12

To build memory into the exponential filter the sheet is simulated for 20 scans, but the CS method is only performed on the final 8 scans and the PSA technique only compares the final two scans.
Since the PSA technique requires alternating scan speeds, a separate scanning trajectory is performed over the final eight scans, as the blue line in Figure 4.1 demonstrates. In order to compare the sheet profile estimates of the various MD-CD separation methods, the reverse scan speed of the PSA method is equivalent to the regular scan speed of the CS and EF methods. This results in the final scan from all three methods sharing the same trajectory, which allows the profile reconstructions to be compared directly. The sheet is constructed with the realistic industrial CD profile and two MD sinusoidal variations with frequencies near (2tscan)^(-1) to create MD aliasing.

Figure 4.1: Simulated sheet for Trial 1

All 20 scans of the original sheet are passed through an exponential filter with a filter factor of α = 0.33 to generate an EF estimate of the final scan. The PSA method is performed as described by Algorithm 1 with c = 4 and c = 3 to reconstruct the PSA estimate for the final scan. The 8 scans shown in Figure 4.1 are used to reconstruct the entire sheet via compressive sensing with the help of Assumption 4. After each of these MD-CD separation methods is performed, the resulting CD profile and MD trends are presented together with the true profile in Figure 4.2. The RMSE of each profile with respect to the true profile is presented in the legends, which are located in different positions on the plot to provide the reader with the clearest possible view of the estimated profiles.

Figure 4.2: MD-CD separation results for Trial 1

Of the three methods the CS method provides the best reconstruction, followed by the PSA technique and lastly the EF method. Although the PSA method seems to have removed part of the MD aliasing, the profile still overestimates some of the CD variations, which is potentially due to MD contributions that could not be distinguished from the CD frequencies.
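The recursive exponential filter used for the EF estimate can be sketched as follows. This is a generic EF update with the filter factor weighting the newest scan against the running average; the array shape and initialization are illustrative assumptions:

```python
import numpy as np

def exponential_filter(scans, alpha=0.33):
    """Recursive EF estimate over a stack of scans.

    scans: (n_scans, n_bins) array of CD measurements (illustrative shape).
    alpha: filter factor weighting the newest scan against the running average.
    """
    est = np.array(scans[0], dtype=float)  # initialize with the first scan
    for scan in scans[1:]:
        est = alpha * np.asarray(scan, dtype=float) + (1.0 - alpha) * est
    return est
```

With α = 0.33 each new scan contributes about a third of the updated estimate, which is why many scans are needed before MD aliasing averages out of the CD profile.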
On the other hand, the CS estimate captures the essential shape of the true CD profile, but the variation amplitude appears underestimated. The soft thresholding technique used by the SPGL1 method sparsifies the solution by removing small coefficients at the expense of reducing the amplitude of the remaining coefficients. This is also apparent in the MD profile, where the CS method once again gives the best reconstruction and the PSA method is equivalent to the EF method, meaning that only the scan average was recovered. The low-frequency MD variation was likely either removed by the PSA threshold or leaked into the CD frequency bins.

To address the bias in the CS method caused by the soft thresholding, the following scaling constant is proposed:

\beta = \frac{1}{2}\big((\max(Y) - \max(F)) + |\min(Y) - \min(F)|\big).    (4.1)

Here the scalar β is multiplied by the CS reconstructed profile to increase the amplitude of the variations, as shown in Figure 4.3. In terms of the RMSE, the CS MD trend (bottom) is improved after scaling but the CD profile (top) has a slightly larger error. The scaling of the CS CD profile does, however, help visualize that the original CS reconstruction captures the general shape of the true CD profile, albeit with lower-magnitude variations.

Figure 4.3: MD-CD separation after scaling for Trial 1

The second part of this trial involves adding zero-mean normally distributed random noise with standard deviation σe = 0.15 to the simulated sheet. For reference, the noisy simulated sheet is shown in Figure 4.4. Again, the EF uses all 20 scans, CS uses the final 8 scans and the PSA method compares only the final 2 scans. The results of the MD-CD separation for all three methods with noise are shown in Figure 4.5, where the CS method provides the most accurate CD profile and MD trend reconstructions. In this trial the inclusion of noise appears to only slightly degrade the CD profile and MD trend estimates from each MD-CD separation method.
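A literal transcription of the scaling constant in equation (4.1) can be sketched as below. The function name and argument conventions are hypothetical; Y stands for the observed data and F for the CS reconstruction, as in the equation:

```python
import numpy as np

def debias_scale(Y, F_est):
    """Scaling constant of equation (4.1): beta compares the extremes of the
    observed data Y against those of the reconstruction F_est, and beta * F_est
    is the rescaled profile intended to recover amplitude lost to thresholding."""
    beta = 0.5 * ((np.max(Y) - np.max(F_est)) + abs(np.min(Y) - np.min(F_est)))
    return beta, beta * np.asarray(F_est, dtype=float)
```

For example, with Y ranging over [-4, 4] and a shrunken reconstruction over [-1, 1], this gives β = 3 and a rescaled profile over [-3, 3].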
Both the PSA and CS methods are effective in removing the noise with only a small number of scans, and the EF method is able to reduce the effect of the noise by averaging over 20 separate scans.

Figure 4.4: Simulated sheet with noise for Trial 1

Figure 4.5: MD-CD separation results with noise for Trial 1

4.2 Trial 2: Fast moving sheet with 60 CD bins

The second trial involves reducing the number of CD bins to 60 in order to extend the sheet speed up to 8 m/s and study a more realistic traversing angle. The simulation parameters presented in Table 4.2 show that the regular traversing angle for the CS and EF scans is 7.1° from the MD, which is significantly closer to a realistic 2° value than the 45° trajectory presented in the first trial.

Table 4.2: Simulation parameters for Trial 2
Parameter     Description                        Value
wsheet        sheet width (m)                    8
vscan         scan speed(s) (m/s)                0.5, 1
tscan         scan time (s)                      16, 8
vsheet        sheet speed (m/s)                  8
θ             traversing angle from MD (°)       3.6, 7.1
nCD = nB      number of CD bins                  60
nEFS, nCSS    number of scans for EF and CS      20, 6
ωMD           frequency of MD sinusoid(s) (Hz)   0.062, 0.12

The simulated sheet shown in Figure 4.6 is very similar to the one from Trial 1 except for the reduced CD dimension and the increase in the MD dimension from nMD = 800 to nMD = 2880. Furthermore, only 6 scans are used for the CS reconstruction in this trial in order to reduce the required size of the CS matrix. Once again the PSA scan speeds are designed such that the final scan follows the same trajectory as the CS scans, allowing a direct comparison of the profile estimates from the final scan. The MD-CD separation methods are performed as before, with the only difference being the thresholding coefficients of the PSA method in Algorithm 1, which are decreased to c = 2 and c = 1.5.
The reconstructed CD profiles are compared in the top part of Figure 4.7 and the estimated MD trends are compared in the lower plot, with CSS indicating the scaled CS reconstruction.

Figure 4.6: Simulated sheet for Trial 2

Figure 4.7: MD-CD separation results for Trial 2

Comparing the RMSE of each reconstruction, displayed in the legends of Figure 4.7, shows that the scaled CS method provides the most accurate reconstruction for both the CD profile (top) and MD trend (bottom). For the CD profile the next best estimate is the regular CS method, followed by the PSA technique and finally the EF method, which suffers from severe MD aliasing. The MD trend estimates follow the same general ordering, except that now the PSA MD estimate is worse than the EF estimate, which simply uses the scan average. It is clear that the PSA MD trend contains a significant amount of CD variation, likely due to frequency bin overlap caused by the relatively small number of CD bins in combination with the lower threshold values. In fact, the PSA MD trend estimate closely resembles the CD profile estimate, which strongly suggests that the MD variations were not sufficiently distinguished in the spectrum of the final scan.

Once again zero-mean normally distributed random noise with standard deviation σe = 0.15 is added to the sheet. For concise presentation and less redundancy the noisy simulated sheet is not presented here, but Figure 4.4 can be referred to because the same level of noise was added. The effect of noise is less apparent in the EF CD profile when there are fewer CD bins, as is clear from comparing Figure 4.5 and Figure 4.8. The MD-CD separation results with additive noise are very similar to those without noise, i.e., the scaled CS estimate still provides the most accurate reconstruction for both the CD profile and MD trend.
The only significant difference is that a high-frequency MD variation appears in the CS estimates of the MD trend as a result of the noise.

Figure 4.8: MD-CD separation results with noise for Trial 2

4.3 Trial 3: Compressive sensing with two scanning sensors

One of the major differences between CS and the EF and PSA MD-CD separation methods is that CS actually reconstructs the entire sheet of paper at the specified resolution. In simulations the desired resolution can be set as the resolution of the true simulated sheet, but in reality the true paper sheet does not have a specific resolution. The desired resolution of the reconstruction must be specified based on the control requirements and the available measurements. This is often accomplished by compressing the large amount of single-scan data into a common resolution of CD bins and a single average for the MD value.

With the CS method the long MD dimension of the scan can be split into nB equal-size sections and the paper sheet can be reconstructed with an equal number of MD and CD values per scan. This can be interpreted in simulations as having a slow moving sheet with θ = 45°, but in reality the longer MD dimension has been compressed and the data for each MD position represents a larger spatial dimension than for each CD position. Even though the MD resolution is artificially low, this type of MD binning can allow for better consideration of MD variations than is currently provided by averaging all of the scanned values into a single point.

To better observe the potential of CS, this trial focuses solely on the CS method with two scanning sensors traversing in opposing directions. Two scanners represent a realistically achievable sensing framework and provide insight into the benefit of additional samples with CS. The following tests begin with a large CD dimension of 200 bins and proceed to reduced CD dimensions of 100 and 60 bins, respectively.
The CS reconstructed sheet is compared visually to the true simulated sheet, and the average CD and MD profiles are compared with both the true profiles and the observed scanner profiles.

4.3.1 200 CD bins

Due to memory limitations posed by constructing the CS matrix, A, the simulated sheet speed is restricted to 1 m/s and only two scans (per scanner) are considered, as shown in Table 4.3. As with the other trials, the MD profile frequencies are specifically chosen such that significant MD aliasing results. The same zero-mean normally distributed random noise with standard deviation σe = 0.15 is added to the sheet. Furthermore, the same industrial CD profile provided by Honeywell is used to create the simulated sheet. In Figure 4.9 the simulated sheet and the trajectory of the two scanners are shown on the left and the CS reconstruction of the sheet is presented on the right.

Table 4.3: Simulation parameters for Trial 3, 200 CD bins
Parameter     Description                        Value
wsheet        sheet width (m)                    8
vscan         scan speed(s) (m/s)                1
tscan         scan time (s)                      8
vsheet        sheet speed (m/s)                  1
θ             traversing angle from MD (°)       45
nCD = nB      number of CD bins                  200
nS            number of scans (per scanner)      2
ωMD           frequency of MD sinusoid(s) (Hz)   0.062, 0.12

Figure 4.9: Simulated sheet and CS estimate with 200 CD bins

A larger CD dimension results in a lower overall sampling ratio because the number of samples per MD position is limited to the number of scanners. Although the CS reconstruction on the right side of Figure 4.9 has obvious imperfections, it still provides valuable additional information relative to traditional scanner signal processing and exponential filtering.
By leveraging the geometry of the scanning sensors the CS method makes efficient use of the available scanner information to provide an informative estimate of the sheet.

To determine the results of MD-CD separation, averages are taken along each MD position of the CS reconstruction for the MD profile and along each CD bin for the CD profile. The same averaging is conducted for the raw scanner data to generate the profile comparison presented in Figure 4.10. In the average CD profiles (top plot) the CS estimate (red) reduces some of the aliasing exhibited by the CD profile of the scan average (blue), which is evident numerically from the RMSE values presented in the legend. While the CS estimate provides a clear improvement, some aliasing is still present and the magnitudes of the estimated CD variations are smaller than in the true profile (green). It is worth noting that the MD aliasing observed in the raw scan data is reduced due to the consideration of both scanners. The bottom plot compares the average MD trends, where the scanned values (blue) are the average value over each scan. Again, the CS estimate contains obvious imperfections but still provides a clear improvement over simply taking the scan average.

Figure 4.10: MD-CD separation results for Trial 3, 200 CD bins

4.3.2 100 CD bins

Reducing the CD dimension to 100 CD bins allows an increase in the sheet speed to 3 m/s with θ = 18.4° for two scans per scanner. Otherwise the sheet is simulated with essentially the same parameters as before, including the additive noise of standard deviation σe = 0.15. Again the MD variation frequencies are chosen to intentionally create severe MD aliasing in the observed measurements. The simulated true sheet is presented with the scanning trajectory on the left side of Figure 4.11 and the CS reconstruction is displayed on the right. In this test the CS reconstruction provides a fairly accurate representation of the true sheet.
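The profile averaging used throughout this trial, averaging the full-sheet reconstruction along each MD position for the MD trend and along each CD bin for the CD profile, reduces to two axis averages over the reconstruction matrix. A minimal sketch, with a hypothetical small reconstruction in place of the actual CS output:

```python
import numpy as np

# Hypothetical CS reconstruction: rows indexed by MD position, columns by CD bin
F_cs = np.arange(12, dtype=float).reshape(4, 3)

md_trend = F_cs.mean(axis=1)    # average across CD bins at each MD position
cd_profile = F_cs.mean(axis=0)  # average along the sheet at each CD bin
```

The same two averages applied to the raw scan data give the benchmark profiles that the CS estimates are compared against.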
Note that the change in increment on the z-axis is indicative of the reduced magnitude of the variations in the CS reconstruction.

Figure 4.11: Simulated sheet and CS estimate with 100 CD bins

To observe the results of the MD-CD separation, the same averaging as before is performed to generate the CD profile and MD trend for both the CS reconstruction and the raw scan data. Observing the average CD profiles displayed in the top part of Figure 4.12 reveals a CS estimate that has removed the majority of the MD aliasing present in the raw scan data. Furthermore, the RMSE of the CS estimate for the CD profile is much smaller than that of the scanned data and also smaller than the RMSE of the previous CS reconstruction with 200 CD bins. Similarly, the CS estimate of the MD trend in the bottom part of Figure 4.12 provides a significant improvement over the scan average values and over the estimate from the previous test with 200 CD bins. As before, the CS reconstruction of the sheet is not perfect, but it clearly provides potentially important additional information about the paper sheet. It is also clear from these results that, given the right circumstances, the CS method is able to successfully reduce the detrimental effects of MD aliasing. This test demonstrates the strong potential of CS for sheet profile estimation with a slightly improved sensing framework.

Figure 4.12: MD-CD separation results for Trial 3, 100 CD bins

4.3.3 60 CD bins

The final test involves two scans per scanner over 60 CD bins with a sheet speed of only 1 m/s. The reduced CD dimension allows for an increased MD dimension (i.e., sheet speed) with regards to the memory limitations posed by constructing the CS matrix.
However, the slow sheet speed of 1 m/s is chosen here to study the computation requirements with the smallest CS reconstruction. In Section 4.4 the results are summarized and the troublesome computation requirements of the CS method are considered. Other than the reduced CD dimension and sheet speed, the simulated sheet is constructed with essentially the same parameters as in the previous test, as is evident from the simulated sheet presented on the left side of Figure 4.13. Interestingly, even with the increased sampling ratio due to fewer CD bins, the CS reconstruction on the right side of Figure 4.13 is actually less accurate than the previous CS estimate with 100 CD bins. One possible explanation is that a trade-off exists between the complexity (i.e., sparsity) of the CD profile and the resolution of measurements available to estimate it.

The same averaging as before is performed to evaluate the resulting MD-CD separation by comparing the average CD profile and MD trend. The CD profile estimates are compared in the top part of Figure 4.14 and the MD trends are compared in the bottom plot. Although the CS estimates for both the CD profile and MD trend outperform the raw scan estimates, they both fall short of the accuracy of the previous CS estimates with 100 CD bins. Even though the MD aliasing is only partially removed, the CS estimate still provides a valuable improvement over the traditional scanner signal processing.

The full tables of simulation parameters for the 100 CD bin and 60 CD bin tests have been omitted here for concise presentation and limited redundancy. These parameters are presented along with additional simulation results in Appendix A. The additional simulation results include reconstructions using 4 scans for both 100 CD bins and 60 CD bins.
Figure 4.13: Simulated sheet and CS estimate with 60 CD bins

Figure 4.14: MD-CD separation results for Trial 3, 60 CD bins

4.4 Results summary and computational considerations

The results of the various simulations are summarized here and consideration is given to the computational requirements of the CS method. In Table 4.4 the RMSE results for Trial 1 with 100 CD bins are summarized for each method. Table 4.5 portrays the summarized RMSE results for Trial 2 with 60 CD bins and a sheet speed of 8 m/s. The results from Trial 3 with two scanning sensors are summarized in Table 4.6. In the first two trials the CS and scaled CS methods provide the lowest RMSE results, followed by the PSA technique, with the industrial standard EF technique generally providing the worst MD and CD profile estimates. Additive noise appears to cause a relatively minor increase in the RMSE of each MD-CD separation method. The CS method also fares well in the third trial relative to the raw scan averages, which serve as the benchmark because not enough scans are considered for proper implementation of the EF method.
Comparing all three trials reveals that the best CS reconstruction is achieved with 100 CD bins and two scanning sensors.

Table 4.4: Summary of experimental results for Trial 1
Description   Estimation Method   CD RMSE   MD RMSE
σe = 0        CS                  0.30      0.81
σe = 0        PSA                 0.48      1.33
σe = 0        EF                  0.78      1.33
σe = 0        Scaled CS           0.36      0.52
σe = 0.15     CS                  0.32      0.85
σe = 0.15     PSA                 0.49      1.33
σe = 0.15     EF                  0.80      1.33
σe = 0.15     Scaled CS           0.41      0.50

Table 4.5: Summary of experimental results for Trial 2
Description   Estimation Method   CD RMSE   MD RMSE
σe = 0        CS                  0.45      0.92
σe = 0        PSA                 0.64      1.42
σe = 0        EF                  0.77      1.32
σe = 0        Scaled CS           0.43      0.51
σe = 0.15     CS                  0.46      0.93
σe = 0.15     PSA                 0.66      1.42
σe = 0.15     EF                  0.77      1.32
σe = 0.15     Scaled CS           0.42      0.55

The results presented thus far have demonstrated that the compressive sensing method performs favorably relative to the exponential filtering and power spectral analysis methods. However, one significant disadvantage of the CS method is the increased computation time, which could render online implementation impractical. The required computation time varies significantly depending on the sampling ratio and the size of the desired reconstruction, as demonstrated in Table 4.7.

Table 4.6: Summary of experimental results for Trial 3
Description   Estimation Method   CD RMSE   MD RMSE
200 CD bins   CS                  0.43      0.42
200 CD bins   Scan average        0.68      0.93
100 CD bins   CS                  0.20      0.24
100 CD bins   Scan average        0.68      0.93
60 CD bins    CS                  0.46      0.47
60 CD bins    Scan average        0.69      0.93

Table 4.7: Compressive sensing computation requirements
Test Description        Approximate computation time (s)
Trial 1, σe = 0.15      1358
Trial 2, σe = 0         2986
Trial 3, 200 CD bins    2477
Trial 3, 100 CD bins    320
Trial 3, 60 CD bins     21

It is only with the smallest CS reconstruction, i.e., 60 CD bins with two scans, that the CS method provides a practical computation time for online implementation.
This is an important consideration for the MD-CD separation control problem, which strongly favors rapid computation, i.e., in less than one minute. Another important observation is that an increased sampling ratio drastically improves the computation time. Unless a much faster CS method or processor is used, the implementation of CS for online control purposes is not practical with only a single scanning sensor. In Chapter 5 the benefits and limitations of each MD-CD separation method are summarized and recommendations for application and future studies are provided.

Chapter 5: Conclusion

The industrial MD-CD separation problem has proven surprisingly difficult to overcome. Of all the proposed MD-CD separation methods presented in Chapter 2, the long-standing EF technique is still the generally preferred industrial method. However, as the MD and CD control systems become capable of removing higher-frequency variations, the limitations of the EF method may become restrictive to improved control. Furthermore, finding means to better address the problem of MD aliasing is necessary for further progress in the control of sheet properties. The standard use of a single scanning sensor creates significant challenges for proposing an improved MD-CD separation technique. While new sensing strategies such as sensor arrays and multiple scanners have been developed, the prohibitive cost of this sensor technology has limited widespread adoption.

In this work a comparative analysis is conducted between the EF method and two recent techniques, i.e., CS and PSA. These methods are compared with a single scanning sensor, and the CS method is studied further using two scanning sensors. The qualitative results of the simulation studies (summarized in Section 4.4) indicate the strength of the compressive sensing method under the given simulation conditions. However, the CS method has limitations that are not adequately represented by the results of the experimental analysis.
A more extensive consideration of the benefits and limitations associated with the studied MD-CD separation methods is summarized in Table 5.1.

Table 5.1: Comparison of MD-CD separation methods
Feature                    CS       PSA      EF
Computation speed          Slow     Fast     Fast
User complexity            Medium   High     Low
Required memory            High     Low      Low
Tuning importance          Minor    Crucial  Minor
MD aliasing attenuation    Good     Fair     Poor
CD profile estimation      Good     Good     Poor
MD profile estimation      Good     Fair     Poor
Reconstruction resolution  Good     Fair     Poor
Required number of scans   Few      Few      Many
Assumption reliance        Low      Medium   High

The comparison of features in Table 5.1 stems primarily from the literature review and from experience implementing each method in simulations. Both the PSA and EF methods have computation speeds that are easily suitable for online implementation, but the computation requirements of the CS method are problematic. As seen in Section 4.4, the CS method only becomes potentially suitable for online implementation if a large sampling ratio and a small sheet dimension are applied. With the large common CD resolution of modern industrial control systems, the CS method is likely only suitable for online implementation if a sampling strategy significantly better than a single scanning sensor is used.

With respect to user complexity the PSA method is the most demanding, particularly because of the importance of proper selection of the threshold tuning parameters. If these parameters are selected too high, some of the dominant frequencies might be removed from the estimate; if they are chosen too low, MD aliasing and unwanted noise might not be removed. After the desired MD resolution is selected and the CS matrix is constructed, the CS method is not very demanding on the user. Only the parameter σ from (2.39) requires specification, and for all of the tests conducted in this report it was simply set as σ = 0.001.
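The soft-thresholding operation discussed in connection with the SPGL1 solver is the source of the amplitude bias noted in the CS estimates: coefficients below the threshold are zeroed and the survivors are shrunk by the same amount. A generic sketch of the operator (the function name is illustrative, not the solver's internal API):

```python
import numpy as np

def soft_threshold(x, t):
    """Generic soft-thresholding operator: zeroes coefficients smaller than t
    and shrinks the surviving coefficients toward zero by t."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)
```

The shrinkage of the surviving coefficients is what motivates the debiasing step, whether the multiplicative β scaling used here or a support-based rescaling.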
Since σ is related to the noise level and the fit of the data, it can be increased to reduce the number of necessary computations or decreased to increase them. Here σ = 0.001 is chosen because it was determined that a lower value did not provide substantive increases in estimation accuracy. In this sense the CS method was tuned for accuracy as opposed to speed, which likely contributed to some of the slow computation times. The EF method also has a single tuning parameter, i.e., the filter factor, which balances the weighting of the most recent measurement against the accumulated average. Only the CS method has difficulty with required memory, due to the construction of the large CS matrix.

The attenuation of MD aliasing is most successful with the CS method, although optimal tuning of the PSA method could potentially provide improved results. This reflects directly on the CD profile estimation, which naturally favored the CS and PSA methods. The PSA method struggles to provide consistent estimates of the MD profile that improve upon the scan average benchmark, whereas the CS method was more consistent in providing valuable MD information that is otherwise not available. Another feature that strongly favors the CS method is its ability to increase the resolution of the sheet reconstruction beyond what is generally possible under the Shannon-Nyquist sampling theorem. This ability to accurately estimate a high-resolution reconstruction from relatively limited samples is arguably the largest strength of CS.

The final features covered in Table 5.1 relate to the number of scans that must be considered by each technique as well as the reliance of each method on assumptions. For accurate reconstruction the CS method was found to require at least two consecutive scans. More scans often resulted in an improved estimate, but once again memory limitations become restrictive.
The PSA method requires at least two scans at alternating speeds to compare spectra. Depending on the value of the filter factor and the degree of MD aliasing, the EF method requires many scans in order to build up an average of previous scans that can help eliminate the effect of MD aliasing. Reliance on assumptions is primarily evaluated based on the common assumption of a static CD profile. The EF method requires the CD profile to be static for the entirety of the filter memory, i.e., potentially many scans. If the CD profile changes and the EF is not reset, the changes in the CD profile can be masked by the filter and may take a long time to be recognized. The PSA method only requires the CD profile to be static over the two scans being compared, and the CS method only requires the CD profile to be static over a small portion of a single scan. In the following two sections these features and the experience acquired are used to provide recommendations for future researchers as well as for industrial practitioners.

5.1 Recommendations for future work

The comparative analysis conducted in this work provides valuable insight into the use of PSA and CS for MD-CD separation. The preliminary implementation of both methods allows for further development and potential improvement beyond the scope of this research. For example, one recommended area of development is the selection of the threshold for PSA. A more ideal threshold would better adapt to the noise level and the number of primary frequency components in the scanned signal instead of relying on the specification of the tuning constants. Another potentially useful threshold could simply allow the user to select the maximum number of dominant MD frequencies to be identified and removed from the CD profile. Since the PSA method has already been patented, it is recommended that a similar technique be developed that leverages alternating scan speeds with a different basis function representation. Finally, instead of simply setting a particular frequency component to zero, it is recommended that the MD contribution and the CD contribution at that particular frequency bin be quantified and reduced accordingly.

With respect to the CS method there are many areas for suggested improvement. It is recommended to study the CS method on a more powerful computing system with a larger memory in order to evaluate performance with a larger CD dimension. Similarly, it would be ideal to develop a CS methodology that does not require such a large memory. With regards to the biasing of CS estimates caused by soft thresholding, it is recommended that a more robust debiasing technique be explored as an alternative to the multiplicative scaling used in this work. A better debiasing technique could identify the supports of the large coefficients and scale the magnitudes of these coefficients to their values in the original signal. Another important area for further development is improving the speed of the CS technique. A simple suggestion to improve the computation speed would be to use a more powerful computing system. Additionally, the parameter σ could be increased to relax the required accuracy of the CS reconstruction and reduce the number of required computations. Ultimately, it would be ideal to develop a recursive CS method for online implementation. Finally, it is recommended that a robust examination of both the PSA and CS techniques with industrial data be conducted. Unfortunately such a study is difficult to perform, as comparison to the true sheet properties requires expensive offline testing.

5.2 Recommendations for industrial progress

Many MD-CD separation techniques have been explored in the literature and proposed as potential improvements to the EF technique regularly employed by industry.
Furthermore, alternate sampling schemes such as sensor arrays, multiple scanners and variable-speed scanners have been proposed to improve upon the typical constant-speed single scanning sensor. Unfortunately, these proposed changes have not succeeded in gaining widespread industrial implementation. For the alternative sensing frameworks the challenge is to provide clear justification that the improved property estimation and control justifies the high cost of the additional sensor technology. Understanding the continued dominance of the EF method for MD-CD separation, in light of the developments in the literature, requires consideration of the user complexity and reliability of each method.

Relative to alternative techniques proposed in the literature, EF offers very simple implementation in addition to high reliability up to a certain accuracy. It is speculated that the complex specification of tuning parameters, which may vary across different operating conditions and different paper machines, provides a strong deterrent to industrial adoption. To compete with EF, an MD-CD separation technique needs to maintain straightforward implementation and reliable estimation while also providing improved resolution and accuracy of the paper sheet properties. To date the literature has focused predominantly on the latter aspects of improved performance and has fallen short of considering the former aspects of user-friendliness and reliability. This likely explains much of the lack of industrial acceptance of proposed MD-CD separation methods.

The PSA and CS techniques described and implemented in this work are insufficiently developed to provide robust estimation for industrial control purposes. Although neither is ready for standalone implementation, both are capable of offering valuable insight not provided by the traditional EF.
The PSA method is recommended for online implementation to supplement the EF method by identifying MD variations and removing them from the CD profile. To be conservative, only the largest identified MD variations can be removed. Similarly, CS could be used to retroactively adjust the CD profile and improve the resolution of the MD trend. Furthermore, CS could help identify whether the CD profile has changed, which would indicate that the exponential filter should be reset.

As a final consideration, the offline implementation of CS could offer valuable information for quality reporting purposes. Accounting for the geometry of the scanning sensor allows the CS method to generate valuable information about sheet variations not available with the standard variance partitioning analysis. Even if the true MD dimension needs to be condensed due to memory limitations, the CS method could identify high frequency MD variations that are ignored by simply taking the scan average. This increased resolution could provide important information for performing fault detection and determining better operating practices to reduce high frequency MD variations. Ultimately, while further development is required before the CS and PSA techniques can replace EF, both methods offer potential sources of insight into achieving a more accurate representation of the true sheet property variations.

Part II
Machine Direction Adaptive Control

Chapter 6
Introduction

In the second part of this work the focus shifts to the development of an adaptive control strategy for the MD process of a paper machine. Model-based control methods such as model predictive control (MPC) have gained widespread implementation in many industries, including sheet and film processes such as paper-making. The use of MPC for MD control has become increasingly popular since it was introduced by Honeywell in 1994 [16].
One of the difficulties associated with model-based control methods is maintaining an accurate model of the process when the true plant is changing according to varying operating conditions and physical characteristics. A unified adaptive control framework is introduced and implemented to automatically update the controller model while maintaining closed-loop control when the true plant deviates from the previous model. This work unifies collaborative efforts made by various members of our research group into an integrated MD adaptive control framework. The adaptive control strategy consists of techniques developed for MD process monitoring (PM), input design (ID) and system identification (SI). By integrating these separate procedures, an adaptive tuning strategy for the MD process is developed.

In this work the integrated framework is tested using an industrial simulator of an MD control system provided by our industrial collaborators at Honeywell. Specifically, the presented case studies involve a simulated multi-input, multi-output (MIMO) MD process with an MPC control system that undergoes model-plant mismatch (MPM). The adaptive framework algorithms automatically identify the MPM, trigger an identification experiment, design the excitation signal, identify a new process model and update the controller. Using this experimental setup, a series of tests is presented to examine alternative configurations of the adaptive control framework.

The objective of this work is to briefly familiarize the reader with each of the separate tasks in the framework before exhibiting the integration and functionality of the unified adaptive control technique. First, the simulated plant and control system for this case study are described and the outline of the adaptive framework is introduced. Next, the specific techniques currently implemented for mismatch detection (also referred to here as process monitoring), input design and system identification are described.
Afterwards, the experimental setup for the various tests is reviewed, followed by a discussion of the experimental results. Concluding remarks and recommendations for future work are provided in the final chapter.

Chapter 7
System Overview

The industrial MD system simulated in this work involves both the true process dynamics and the MPC control system. The adaptive framework integrates with the closed-loop MD system and performs various tasks such as monitoring the input-output (IO) data and exciting the process input. In this chapter the simulated MD process is first presented before discussing how the adaptive control sub-functions interface with the closed-loop system.

7.1 Simulating the MD process

The MD paper machine process is modeled as a MIMO lower triangular system composed of six first order plus deadtime (FOPDT) transfer functions with three manipulated variables (MVs) and three controlled variables (CVs). The process output, including an additive disturbance, is fed back to a model predictive controller which determines the new process input values. A simple schematic of this closed-loop control system is shown below in Figure 7.1.

[Figure 7.1: Closed-loop MD control system]

From Figure 7.1 we obtain the following expression for the process outputs:

[y_1(t); y_2(t); y_3(t)] = G(q) [u_1(t); u_2(t); u_3(t)] + H(q) [e_1(t); e_2(t); e_3(t)],   (7.1)

where r(t) is the reference signal (or setpoint) and each e_i(t) is an independent and identically distributed (IID) Gaussian white noise sequence with zero mean and variance σ_{e_i} [41]. Process outputs y_1(t), y_2(t) and y_3(t) represent the measured basis weight (lbs/3000ft²), press moisture (%) and reel moisture (%), respectively. Furthermore, process inputs u_1(t), u_2(t) and u_3(t) represent the thick stock flow (gpm), press section steam pressure (psi) and reel section steam pressure (psi), respectively.
Apart from some user-specified tuning parameters, the details of the MPC controller, K(Ĝ), presented in this chapter are limited to the update of Ĝ. Both the nominal model used by the controller, Ĝ(q), and the true plant transfer function, G(q), have the following form:

G(q) = [G_11(q)    0        0
        G_21(q)  G_22(q)    0
        G_31(q)  G_32(q)  G_33(q)].   (7.2)

Each G_ij in (7.2) represents an FOPDT transfer function, defined in continuous time as

G_ij(s) = b_ij / (a_ij s + 1) · e^{−d_ij s},   (7.3)

where b_ij, a_ij and d_ij represent the continuous gain, time constant and time delay parameters, respectively [23]. The nominal model used by the controller, Ĝ(q), takes the exact same form except that the sub-transfer functions are denoted Ĝ_ij with parameters b̂_ij, â_ij and d̂_ij. Initially there is a 10% negative MPM in each of the process parameters, i.e., each parameter in the nominal model is 10% lower than the corresponding true process parameter in Table 7.1.

Table 7.1: Continuous MD process parameters

G_ij    b_ij      a_ij      d_ij
G       1.1600     73.26     77
G       0.3256     50.622    99
G      -0.1540    297.0     121
G       0.8283    170.28     43
G      -0.2618    232.32     33
G      -0.0611     27.324    99

The noise model, H(q), is a diagonal matrix of stable and inversely stable filters, i.e.,

H(q) = [H_1(q)   0      0
         0     H_2(q)   0
         0       0    H_3(q)],   (7.4)

where each H_i(q) can be either a high pass or low pass filter of varying order. The specific noise models used in this case study vary, so these are defined along with the noise variances in the experimental setup. Next, a discussion of how the various tasks in the adaptive framework interface with the main control loop is provided.

7.2 Adaptive control framework

The adaptive control framework involves a large number of sub-functions that integrate into the main control loop. The specific details of this integration are described along with a review of these methods in the following chapters.
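The FOPDT structure in (7.3) is straightforward to simulate once discretized. The sketch below is illustrative only: it assumes a zero-order-hold discretization at the 5 s sample time used later in this work, and steps a single sub-model using the parameters from the first row of Table 7.1.

```python
import math

def fopdt_step_response(b, a, d, Ts=5.0, n=400):
    """Unit-step response of G(s) = b/(a*s + 1) * exp(-d*s), discretized with a
    zero-order hold at sample time Ts (seconds)."""
    alpha = math.exp(-Ts / a)      # discrete pole of the first-order lag
    nd = round(d / Ts)             # deadtime expressed in whole samples
    y = [0.0] * n
    for k in range(1, n):
        # unit step applied at t = 0, seen by the output only after the deadtime
        u_delayed = 1.0 if k - 1 - nd >= 0 else 0.0
        y[k] = alpha * y[k - 1] + (1 - alpha) * b * u_delayed
    return y

# first row of Table 7.1: gain 1.1600, time constant 73.26 s, delay 77 s
y = fopdt_step_response(b=1.1600, a=73.26, d=77)
```

The response stays at zero for the deadtime and then settles to the steady-state gain, which gives a quick sanity check on any discretization used in the simulator.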
First, however, a general overview of how the adaptive control framework integrates with the closed-loop system is provided. The adaptive control framework implemented in this work consists of three major elements: process monitoring, input design and system identification. It is worth mentioning that although the term process monitoring is used here to preserve generality, the current implementation of the PM element is equivalent to MPM detection. An illustrative schematic outlining the adaptive framework on a closed-loop system is provided in Figure 7.2. For simplicity, the specific IO data sent to the PM, ID and SI blocks is clarified in the detailed explanations of each of these blocks.

[Figure 7.2: Adaptive framework for closed-loop MPC]

During normal operation the model predictive controller, K(Ĝ^(i)), controls the process, G, and the process monitoring algorithm, PM, collects process input data, u_t, and output data, y_t. The current plant model estimate, Ĝ^(i), is also provided to the PM algorithm for reference. When G deviates from the estimated process model, Ĝ^(i), the PM algorithm detects the MPM and triggers an excitation experiment by sending the start time, t_s, to the ID algorithm. An optimally designed excitation signal, r*_t, is then calculated by the ID algorithm and added to the actuator movements, u_t, to provide additional process excitation for a specified experiment duration. A pseudo-random binary sequence (PRBS) can also be used to excite the system, as is discussed in the experimental setup. Once the end of the experiment, t_f, is reached, the SI algorithm collects the informative IO data from the experiment duration (u_t and y_t between t_s and t_f) to determine a new process model estimate. The new estimate of the process, Ĝ^(i+1), is then implemented in the controller and normal operation resumes.
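The monitor-excite-identify-update cycle just described can be sketched as a simple supervisory loop. Everything in this sketch is hypothetical scaffolding rather than the framework implementation: the `pm`, `id_design` and `si_fit` callables stand in for the PM, ID and SI blocks of Figure 7.2, and the controller is reduced to a proportional law carrying a model attribute.

```python
class SimpleController:
    """Stand-in for K(G^(i)): a proportional law plus the model it carries."""
    def __init__(self, model, gain=-0.5):
        self.model, self.gain = model, gain

    def control(self, y):
        return self.gain * y

def adaptive_loop(plant_step, controller, pm, id_design, si_fit, n_steps, expt_len):
    """One pass of the Figure 7.2 cycle: monitor IO data, trigger an excitation
    experiment when MPM is flagged, then identify and install a new model."""
    log_u, log_y = [], []
    expt_start = None                                   # t_s once an experiment is triggered
    for t in range(n_steps):
        r_star = 0.0
        if expt_start is not None and t < expt_start + expt_len:
            r_star = id_design(t - expt_start)          # excitation sample r*_t
        u = controller.control(log_y[-1] if log_y else 0.0) + r_star
        y = plant_step(u)
        log_u.append(u)
        log_y.append(y)
        if expt_start is None and pm(log_u, log_y, controller.model):
            expt_start = t                              # PM raised an MPM alarm
        elif expt_start is not None and t == expt_start + expt_len:
            # SI block: fit a new model from the informative experiment data
            controller.model = si_fit(log_u[expt_start:], log_y[expt_start:])
            expt_start = None                           # resume normal operation
    return controller.model
```

The point of the sketch is the sequencing, not the component internals: feedback is never removed, and the excitation is simply superimposed on the controller output for the experiment duration.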
Successful implementation of this framework allows the controller to automatically respond to changes in the process while maintaining feedback control, without requiring operator intervention. In what follows, each of the major elements of the adaptive control framework is reviewed.

Chapter 8
System Identification

In addition to re-tuning the process model, system identification is of critical importance to the MPM detection aspect of the adaptive control framework. A detailed description of the system identification methods is presented here in order to provide necessary context for the following sections. First, some general background and motivation is provided regarding the particular selection of identification methods. This is followed by a theoretical review of how these methods are implemented on the closed-loop system. The chapter concludes with a user-focused description of the use of these methods in the simulated closed-loop MD system.

8.1 Background

System identification methods can be broadly categorized as follows:

(a) The prediction error family.
(b) Subspace approaches.
(c) Nonparametric correlation and spectral analysis methods.

An important difference between open-loop and closed-loop data is the correlation between the noise and the input in closed-loop data as a result of feedback control. This correlation can be problematic when extending open-loop identification methods such as subspace and nonparametric methods to closed-loop data [24]. For this reason, the analysis presented here focuses on category (a), i.e., identification procedures that fit under the framework of the prediction error method (PEM). Closed-loop identification methods can be classified based on their respective feedback assumptions as follows [28]:

(i) Direct identification: neglects the effect of feedback and the correlation between input and noise.
Directly identifies the open-loop system from input and output measurements.

(ii) Indirect identification: uses knowledge of the controller to determine the open-loop system from an identified closed-loop transfer function (e.g., the sensitivity function).

(iii) Joint input-output identification: identifies closed-loop transfer functions from an external excitation signal (e.g., a dither signal or set-point change) to both the process input and output, and determines the open-loop system from these closed-loop transfer functions. This class includes the projection method and the two-stage approach.

For process monitoring, closed-loop identification must be performed on routine operating data (i.e., without external excitation). Joint input-output identification methods are thus ruled out as they depend on some form of external excitation. Furthermore, both indirect and joint input-output identification techniques can become overly complex in the presence of nonlinear feedback [24][56]. As it is not uncommon for industrial MPC to display nonlinear dynamics, the indirect and joint input-output identification techniques both fall outside the scope of this report. Instead, the focus of this report is constrained to comparing the direct identification (DI) method to a newly proposed method that utilizes both an autoregressive exogenous (ARX) model and an output-error (OE) model. This newly proposed method is referred to as the closed-loop ARX-OE identification method. Implementation of both of these techniques is discussed in the following sections.

8.2 Prediction error method

The closed-loop identification methods introduced in (i)-(iii), as well as the ARX-OE method, can all be seen as varying parameterizations of the PEM. The PEM minimizes a cost function of the squared 2-norm of the prediction error, i.e.,

V_N(G, H) = (1/N) Σ_{t=1}^{N} ε̂²(t, θ),   (8.1)

where ε̂(t, θ) represents the difference between the measured and predicted outputs, i.e.,

ε̂(t, θ) = y(t) − ŷ(t|θ).   (8.2)
Please note the distinction made between the actual process white noise sequence, e_i(t), and the prediction error ε̂(t, θ) [25]. For plant and noise models defined by the parameter vector θ, the predicted output, ŷ(t|θ), is given by [41]

ŷ(t|θ) = Ĥ^{-1}(q, θ) Ĝ(q, θ) u(t) + [1 − Ĥ^{-1}(q, θ)] y(t),   (8.3)

and after some algebraic manipulation the following expression for the prediction error is obtained:

ε̂(t, θ) = Ĥ^{-1}(q, θ) [y(t) − Ĝ(q, θ) u(t)].   (8.4)

Ultimately the goal is to obtain the parameter vector θ such that the parameterized models are equal to the true system, i.e.,

Ĝ(q, θ) = G(q),   Ĥ(q, θ) = H(q),   (8.5)

where G(q) and H(q) represent the true plant and noise systems, respectively. Minimizing (8.1) with respect to θ results in the following PEM estimate of the parameter vector:

θ̂_N = arg min_θ V_N(θ, Z^N),   (8.6)

where Z^N represents the provided input and output data, i.e.,

Z^N = {u(1), y(1), . . . , u(N), y(N)},   (8.7)

for a given time span t = 1, . . . , N [24][25].

The PEM implemented here uses a polynomial representation of the transfer functions Ĝ(q, θ) and Ĥ(q, θ) for the MIMO MD system. As an example, the expression for y_1 in (7.1) becomes

A_1(q) y_1(t) = (B_11(q)/F_11(q)) u_1(t − nk_1) + (B_12(q)/F_12(q)) u_2(t − nk_2) + (B_13(q)/F_13(q)) u_3(t − nk_3) + (C_1(q)/D_1(q)) e_1(t),   (8.8)

where the nk_j represent the delays of each input. Expressions for y_2 and y_3 follow in the same fashion. The polynomials presented in (8.8) are defined as follows:

A(q, θ) = 1 + â_1 q^{-1} + · · · + â_{na} q^{-na},   (8.9)
B(q, θ) = b̂_0 + b̂_1 q^{-1} + · · · + b̂_{nb} q^{-nb},   (8.10)
C(q, θ) = 1 + ĉ_1 q^{-1} + · · · + ĉ_{nc} q^{-nc},   (8.11)
D(q, θ) = 1 + d̂_1 q^{-1} + · · · + d̂_{nd} q^{-nd},   (8.12)
F(q, θ) = 1 + f̂_1 q^{-1} + · · · + f̂_{nf} q^{-nf},   (8.13)

where the model orders na, nb, nc, nd, nf and the delays nk must be specified.
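The prediction error in (8.4) can be evaluated directly for a single channel. The following is a scalar illustration only, not the MIMO implementation: it simulates a first-order model b/(1 + f q^{-1}) with an input delay and applies the inverse of a first-order noise model H = (1 + c q^{-1})/(1 + d q^{-1}); with a perfect model and no noise the prediction error is identically zero.

```python
def sim_fopdt(u, b, f, nk):
    """Simulate y(t) = b/(1 + f*q^-1) u(t - nk): a first-order discrete model
    with an integer input delay of nk samples."""
    y = [0.0] * len(u)
    for t in range(len(u)):
        prev = y[t - 1] if t > 0 else 0.0
        u_d = u[t - nk] if t - nk >= 0 else 0.0
        y[t] = -f * prev + b * u_d
    return y

def prediction_error(y, u, b, f, nk, c, d):
    """eps(t) = H^-1(q)[y(t) - G(q)u(t)] with H = (1 + c*q^-1)/(1 + d*q^-1),
    so H^-1 is realized by eps(t) = v(t) + d*v(t-1) - c*eps(t-1)."""
    v = [yt - gt for yt, gt in zip(y, sim_fopdt(u, b, f, nk))]
    eps = [0.0] * len(v)
    for t in range(len(v)):
        eps[t] = v[t] + (d * v[t - 1] if t > 0 else 0.0) - (c * eps[t - 1] if t > 0 else 0.0)
    return eps
```

Minimizing the sum of squares of this sequence over the model parameters is exactly the scalar version of (8.1) and (8.6).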
In this work the time delays, nk_i, are estimated by comparing a series of ARX models with different delays and determining which delay value corresponds to the ARX estimate that provides the best fit to the IO data.

8.3 Direct identification

The closed-loop DI method begins by defining the model orders specified above and proceeds by solving (8.6) directly with process input and output data. It is assumed that the plant model is known a priori to have a first order numerator and denominator, i.e., nb = nf = 1. Although the noise model is unknown, an estimate of the noise model order is necessary for DI. In this report the noise model is estimated to have both a first order numerator and denominator, i.e., nc = nd = 1. For SISO simulations, it has been shown that improper specification of the noise model order can result in increased bias and variance of the parameter estimates. Although early MIMO simulations failed to consistently exhibit this behavior, it is considered as part of our experimental investigation. Finally, A(q) is assumed to be zeroth order, which leads to the following output expressions for the direct identification method:

y_1(t) = (b̂_11 / (1 + f̂_11 q^{-1})) u_1(t − nk_1) + ((1 + ĉ_1 q^{-1}) / (1 + d̂_1 q^{-1})) e_1(t),   (8.14)

y_2(t) = (b̂_21 / (1 + f̂_21 q^{-1})) u_1(t − nk_1) + (b̂_22 / (1 + f̂_22 q^{-1})) u_2(t − nk_2) + ((1 + ĉ_2 q^{-1}) / (1 + d̂_2 q^{-1})) e_2(t),   (8.15)

and

y_3(t) = (b̂_31 / (1 + f̂_31 q^{-1})) u_1(t − nk_1) + (b̂_32 / (1 + f̂_32 q^{-1})) u_2(t − nk_2) + (b̂_33 / (1 + f̂_33 q^{-1})) u_3(t − nk_3) + ((1 + ĉ_3 q^{-1}) / (1 + d̂_3 q^{-1})) e_3(t).   (8.16)

Comparing the above output expressions to the original Box-Jenkins system, it is easy to see that for the DI method

Ĝ_ij(q, θ) = B_ij(q, θ) / F_ij(q, θ)   (8.17)

and

Ĥ_i(q, θ) = C_i(q, θ) / D_i(q, θ).   (8.18)

As is apparent from (8.14) - (8.16), each transfer function is modeled with the same FOPDT structure as the true process, G_ij, shown in (7.3). The values of b̂_ij and f̂_ij in (8.14) - (8.16) represent the estimated gain and time constant parameters, respectively.
Although the presentation here is focused on the more complex MIMO system, it is worth noting that the output expression for the SISO implementation of DI is equivalent to (8.14).

An additional variation of the DI method specified above has been investigated in previous simulations. This variation is referred to as direct output error (OE) identification, as the noise model order is specified such that (8.14) - (8.16) take the form of output error models, i.e., nc = nd = 0 and thus Ĥ_i(q, θ) = 1. Although a detailed analysis of this technique is beyond the scope of this report, it is worth noting that the results obtained using the MIMO industrial MD simulator were comparable in quality to those obtained by applying the true noise model order. Further investigation may be warranted.

8.4 ARX-OE identification

The ARX-OE method is a recent technique developed by Q. Lu et al. in 2016 for performing closed-loop identification using routine operating data without a priori knowledge of the noise model order. A thorough documentation of this technique is currently pending finalization and submission, but a brief introduction is provided in what follows. The ARX-OE method is carried out through the following two step procedure [43]:

1. A high order ARX estimate of the noise model is generated and the IO data is filtered with the inverse of the noise model estimate.
2. An OE model is fit to the filtered IO data via the PEM.

After introducing the ARX-OE identification method for the SISO system, a straightforward extension to the MIMO case is presented. First, the Box-Jenkins system from (7.1) is represented with the following ARX model structure:

A(q, θ) y(t) = B(q, θ) u(t − nk) + e(t),   (8.19)

where A(q, θ) and B(q, θ) are defined as before. Dividing both sides of (8.19) by A(q, θ) yields

y(t) = (B(q, θ)/A(q, θ)) u(t − nk) + (1/A(q, θ)) e(t),   (8.20)

and by relating back to (7.1), it is apparent that

Ĝ(q, θ) = B(q, θ)/A(q, θ)   and   Ĥ(q, θ) = 1/A(q, θ).   (8.21)
The order of A(q, θ) is selected sufficiently high to capture the dynamics of the true noise model (e.g., na ≈ 15). This has been made evident by conducting tests that compare the impulse response (IR) coefficients of the true and estimated inverse noise models, such as the example shown in Figure 8.1.

Upon estimating the inverse noise model, A(q, θ), the next step is to filter the IO data by A(q, θ). The filtered input and output data are defined respectively as

u_f(t) = A(q, θ) u(t)   and   y_f(t) = A(q, θ) y(t).   (8.22)

[Figure 8.1: Impulse response of true and estimated inverse noise models]

Inserting the IO expressions from (8.22) into (8.20) yields

y_f(t)/A(q, θ) = (B(q, θ)/A(q, θ)) · (u_f(t − nk)/A(q, θ)) + (1/A(q, θ)) e(t),   (8.23)

and multiplying both sides of (8.23) by A(q, θ) results in the following output error model:

y_f(t) = (B(q, θ)/A(q, θ)) u_f(t − nk) + e(t).   (8.24)

Finally, this output error model can be solved via the PEM and equation (8.1) as described before. For the MIMO system, (8.24) extends to the following system of equations:

y_1f(t) = (B_11(q, θ)/A_11(q, θ)) u_1f(t − nk_1) + e_1(t),   (8.25)

y_2f(t) = (B_21(q, θ)/A_21(q, θ)) u_1f(t − nk_1) + (B_22(q, θ)/A_22(q, θ)) u_2f(t − nk_2) + e_2(t),   (8.26)

and

y_3f(t) = (B_31(q, θ)/A_31(q, θ)) u_1f(t − nk_1) + (B_32(q, θ)/A_32(q, θ)) u_2f(t − nk_2) + (B_33(q, θ)/A_33(q, θ)) u_3f(t − nk_3) + e_3(t).   (8.27)

The model orders of A(q, θ) and B(q, θ) for the second part of the ARX-OE method are specified as na = nb = 1. The final filtered output-error expressions are obtained by simply substituting

B_ij(q, θ)/A_ij(q, θ) = b̂_ij / (1 + â_ij q^{-1})   (8.28)

into (8.25) - (8.27). The values of b̂_ij and â_ij obtained from this identification procedure provide estimates of the process gain and time constant, respectively. Furthermore, these estimates are used to evaluate the accuracy of the identification, as is described in the experimental setup. Ultimately, the purpose of identifying the process model is to be able to re-tune the controller and obtain improved controller performance.
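The two-step ARX-OE procedure can be sketched for a single SISO channel as follows. This is an illustrative reconstruction rather than the thesis implementation: the high-order ARX step is an ordinary least-squares fit, the first-order OE step is solved by scanning the pole and fitting the gain in closed form instead of by a general PEM solver, and for brevity the sketch is exercised on open-loop data.

```python
import numpy as np

def fit_arx(u, y, na, nb):
    """Step 1: high-order ARX by least squares; returns A(q) = [1, a1, ..., a_na]."""
    start = max(na, nb)
    rows = [[-y[t - i] for i in range(1, na + 1)] +
            [u[t - j] for j in range(1, nb + 1)] for t in range(start, len(y))]
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(y[start:]), rcond=None)
    return np.concatenate(([1.0], theta[:na]))

def filt(A, x):
    """Apply the FIR filter A(q) = sum_i A[i] q^-i to the sequence x."""
    return [sum(A[i] * x[t - i] for i in range(len(A)) if t - i >= 0)
            for t in range(len(x))]

def fit_oe_first_order(uf, yf, nk):
    """Step 2: fit yf(t) = b/(1 + a*q^-1) uf(t - nk) + e(t) on the filtered
    data by scanning the pole a and solving for b in closed form."""
    best = (np.inf, 0.0, 0.0)
    for a in np.linspace(-0.95, 0.95, 381):
        x = [0.0] * len(uf)                    # x(t) = 1/(1 + a*q^-1) uf(t - nk)
        for t in range(len(uf)):
            ud = uf[t - nk] if t - nk >= 0 else 0.0
            x[t] = -a * (x[t - 1] if t > 0 else 0.0) + ud
        denom = sum(v * v for v in x)
        b = sum(v * w for v, w in zip(x, yf)) / denom if denom > 0 else 0.0
        sse = sum((w - b * v) ** 2 for v, w in zip(x, yf))
        if sse < best[0]:
            best = (sse, a, b)
    return best[1], best[2]                    # (a_hat, b_hat)
```

Filtering both u and y with the estimated A(q) is what turns the Box-Jenkins problem into the output error model of (8.24); the OE step then recovers the plant parameters without any explicit noise model order.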
The implementation of these closed-loop identification techniques in the MD industrial simulator is described in the following section.

8.5 Implementation

Closed-loop system identification is fundamental to both the PM and SI aspects of the adaptive control framework. The PM algorithm is set up such that only the ARX-OE method is used, whereas the SI algorithm applies both the DI method and the ARX-OE method. When the SI stage is complete, the user can select whether to update the controller model with the DI estimate or the ARX-OE estimate. In this work both are selected in order to compare the effect of the two methods on control performance. Here the specific implementation of closed-loop identification is described for both the PM and SI stages.

For PM, a series of ARX-OE identifications is performed using overlapping moving windows of IO data. The moving window has a length of 1440 samples and takes steps of length 60 samples with a sample time of 5 s. In other words, the identifications are performed every 5 minutes on the previous 2 hours of IO data. In the first stage of the ARX-OE method the inverse noise model estimate A(q, θ) is estimated with an order of na = 18 and the polynomial B(q, θ) is estimated with an order of nb = 10. The time delay estimates used by the controller are also provided for the ARX-OE identification.

The large orders of the first stage ARX model require estimation of many parameters, and since there is only a limited amount of data the resulting estimates can exhibit high variance. Errors in an estimated model generally take the form of either bias or variance. Bias can occur as a result of having a model structure that does not capture the complexities of the true process, whereas variance is typically due to disturbances in the measured data.
Minimizing the cost function, V_N(θ, Z^N), involves a trade-off between bias and variance that is determined by the selection of model order. Increasing the order of a model generally decreases the bias while simultaneously increasing the variance. To address the large model order and reduce the variance of the estimates, regularization constants Λ and R are determined and the regularized ARX model is estimated, i.e.,

V_N(θ, Z^N) = (1/N) Σ_{t=1}^{N} ε̂²(t, θ) + Λ (θ − θ*)^T R (θ − θ*),   (8.29)

where θ* is the mean of the assumed Gaussian prior distribution of θ [12].

Since the PM algorithm uses small amounts of routine operating data, the signal to noise ratio is improved by filtering out the high frequency noise. This denoising step is performed after the IO data is filtered with the inverse noise model estimate. The final stage of the ARX-OE method involves solving the OE model. To improve the OE estimation, the range of possible estimates is bounded using a priori knowledge of acceptable values. Finally, the results of the ARX-OE identification are sent to the next stage of the PM algorithm, as will be discussed in detail in the following chapter.

The primary closed-loop identification used to re-tune the controller model is performed after the excitation experiment conducted by the ID algorithm. In this work the excitation experiment has a duration of 3000 samples (i.e., ≈ 4.2 hours). The IO data from the duration of the experiment is collected by the SI algorithm to perform both DI and ARX-OE identification. First the time delays are estimated by comparing a series of ARX models with varying time delays and determining which model provides the best fit. The ARX-OE method proceeds by solving the regularized ARX model as described before. The IO data is then filtered with the high order inverse noise model approximation and the constrained OE model is estimated to determine the gain and time constant parameters. The DI method is performed with similar constraints on the acceptable parameter values. Both the OE model in the ARX-OE method and the DI model are estimated with first order numerator and denominator polynomials for the process model estimate. Furthermore, the DI method uses a first order numerator and denominator structure to estimate the noise model. After the results of both identifications are determined, the user specifies which method to use for controller tuning. Finally, the controller is re-initialized with the new process model estimates and regular operation resumes until another MPM is detected. This concludes the discussion of the use of closed-loop ARX-OE and DI in the adaptive control framework. Next, a brief introduction to the MPM detection technique is provided.

Chapter 9
Model Plant Mismatch Detection

The process monitoring (PM) aspect of the adaptive control framework, shown in Figure 7.2, determines whether or not the controller model estimate is adequately representative of the true process. In order to determine whether this mismatch exists, a novel model-plant mismatch (MPM) detection algorithm is employed. In this chapter some general background on the MPM detection algorithm is provided, followed by a discussion of the implementation of this technique in the adaptive control framework. A thorough treatment of the theory involved with the MPM detection technique is beyond the scope of this work, so the interested reader is referred to the development of the technique by Q. Lu et al. in 2017 [42].

9.1 Background

For model-based control methods such as MPC, the existence of MPM can be detrimental to controller performance. Failure to acknowledge and correct significant MPM can result in sub-optimal control and production losses. It is estimated that up to 60% of industrial controllers exhibit controller performance problems arising from issues such as poor tuning, lack of maintenance and inappropriate control structure, among others [35].
Controller performance monitoring is a well established field that has drawn increasing interest since the development of the minimum variance benchmark by Harris in 1989 [32]. Since monitoring is performed almost constantly, it is beneficial to employ a technique that passively monitors the process. Furthermore, since both the true noise model and the true process model can change during operation, it is necessary to distinguish process model changes from noise model changes. If the noise model changes and the process model does not, it is unlikely that conducting an identification experiment will improve controller performance [6]. The MPM detection implemented here uses the ARX-OE identification method on routine operating data in combination with a support vector machine (SVM) in order to distinguish between changes in the process or noise models and ultimately determine whether MPM has occurred.

To determine whether or not MPM has occurred, a series of closed-loop ARX-OE identifications is performed as described in the previous chapter. Due to the presence of noise and the lack of significant excitation in routine operating data, the variance of the ARX-OE estimates can be considerable even if the true process behavior has not changed. In other words, directly comparing the routine ARX-OE estimates to the current process model does not provide a reliable MPM detection procedure, since the ARX-OE estimates have a natural range of variation. Instead, a range of acceptable variation of process model estimates is established as a benchmark and new IO data is compared to this benchmark.
Specifically, an SVM is used to compare new process model estimates to the benchmark, and if these estimates fall outside of the acceptable range they are regarded as MPM indications.

For the purpose of this work the SVM can be understood as a binary classifier that indicates whether ARX-OE estimates belong to the model implemented in the controller or whether MPM has occurred. In order to classify new data, the SVM model must first be trained to establish the acceptable range of variation associated with the current model. Immediately after a new controller estimate is determined, a training stage begins during which a series of overlapping moving-window ARX-OE identifications is performed. During the training stage it is assumed that no MPM exists. For PM it is desirable to be able to detect mismatch in the gain, time constant and time delay parameters of the process and controller. Therefore the SVM model is trained using the IR coefficients of the ARX-OE estimated process models, since the IR coefficients capture changes in all of the aforementioned parameters. The SVM is used to map the IR coefficients into a high dimensional feature space for comparison and ultimately classification. An illustration of the high dimensional SVM classification is shown in Figure 9.1. This mismatch classification is performed for both process and noise model estimates in order to confirm whether identifying a new process model will improve controller performance. Further details on the implementation of the MPM algorithm are described in the following section; for a more theoretical description of the SVM please refer to the work by Q. Lu et al. [42].

[Figure 9.1: Illustration of high dimensional SVM classification [42]]

9.2 Implementation

To implement SVM classification, the SVM model must first be trained with data that does not contain MPM in order to establish a benchmark for comparison.
When the controller is initialized, or a new model is estimated, the PM algorithm collects data over a period of operation that is assumed to have a relatively static true process, G ≈ Ĝ. Recall, the PM algorithm uses routine overlapping moving-window closed-loop ARX-OE identification estimates to train the SVM with IR coefficients. For this particular study the duration of the training period is 3500 samples with a sample time of 5 s (i.e., ≈ 4.9 hours of training). During the training period the closed-loop ARX-OE identifications are performed with a window length of 1440 samples and a step size of 60 samples (i.e., a 2 hour window length and 5 minute step size). The IR coefficients of the resulting process and noise model estimates are provided to the SVM algorithm to establish an acceptable range of estimates that are considered free of MPM. Once the SVM is trained, the MPM detection algorithm is considered active and begins comparing new IO data to the benchmark established by the SVM model.

The period of operation where the MPM detection algorithm is active is referred to as the testing period. As with the training period, the testing period involves routine ARX-OE identifications that are performed with an equivalent moving-window procedure. The SVM model constructed from the training data is illustrated by the blue boundary line in Figure 9.1. When new process and noise model estimates from the testing data are mapped into the high dimensional feature space, they are compared to the SVM model. The testing models that fit with the SVM model (marked in green in Figure 9.1) are considered free of MPM and those that violate the SVM model boundary (marked in red in Figure 9.1) are considered to exhibit MPM.
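The train-then-classify pattern described above can be imitated with an off-the-shelf one-class SVM (scikit-learn assumed available). This is only a rough stand-in for the SVM formulation of [42]: the features are impulse-response coefficients of hypothetical first-order model estimates, training uses no-mismatch data only, and the sign of the decision function plays the role of the plant model indicator.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def impulse_response(b, pole, n=20):
    """IR coefficients of the discrete model b*q^-1/(1 - pole*q^-1)."""
    return [b * pole**k for k in range(n)]

def train_mpm_detector(ir_samples):
    """Train a one-class SVM on impulse responses from the no-mismatch period."""
    clf = OneClassSVM(nu=0.05, gamma="scale")
    clf.fit(np.array(ir_samples))
    return clf

def mpm_indicator(clf, ir):
    """Positive: consistent with the benchmark; negative: suspected MPM."""
    return float(clf.decision_function(np.array([ir]))[0])
```

Training on the natural scatter of routine estimates, rather than on a single nominal model, is what gives the detector its tolerance to the identification variance discussed above.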
The SVM algorithm returns plant and noise model indicators that take positive values if no MPM is suspected and negative values if the estimate is classified as exhibiting MPM.

To avoid triggering unnecessary identification experiments, the PM algorithm requires a large number of consecutive negative indications. In this work the plant and noise model indicators are observed over a duration of one hour (i.e., 720 samples) and if over 95% of the values are negative it is determined that MPM has occurred. The noise model indicators are analyzed first; if the alarm threshold of negative values is reached, the PM algorithm resets and mismatch detection is stopped so that the SVM can be retrained. If the noise model does not exhibit mismatch, the process model indicators are analyzed and, if the alarm threshold is reached, an identification experiment is triggered. Once MPM has been identified, the current sample time marks the beginning of the identification experiment, during which a series of steps are performed to calculate an optimal excitation signal, r*_t, as described in the following chapter.

Chapter 10

Input Design

Input design (ID) is used to calculate an optimal excitation signal, r*_t, that is added to the process input, u_t, to create more informative process IO data. Informative IO data has a relatively high signal-to-noise ratio, which enables improved system identification and more accurate parameter estimation. Industrial identification experiments are often performed in open loop, where feedback control is removed and the set-point values can be manipulated. One example of an open-loop identification experiment is the bump test, where the set-point is altered and the output response to the set-point change is observed. A common closed-loop industrial technique for exciting the process is to simply inject a pseudo-random binary sequence (PRBS) as a perturbation signal.
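A PRBS is typically generated with a linear-feedback shift register. The sketch below is a generic illustration (not the excitation used in this thesis): an 11-bit Fibonacci LFSR built on the primitive trinomial x^11 + x^2 + 1, with the binary output scaled to ±1.

```python
import numpy as np

def prbs(n, seed=0b10101010101):
    # +/-1 pseudo-random binary sequence from an 11-bit Fibonacci LFSR.
    # The feedback taps realize x^11 + x^2 + 1, a primitive polynomial,
    # so the sequence is maximal length with period 2^11 - 1 = 2047.
    state = seed & 0x7FF
    out = np.empty(n)
    for i in range(n):
        out[i] = 1.0 if (state & 1) else -1.0
        fb = (state ^ (state >> 2)) & 1      # feedback from bits 0 and 2
        state = (state >> 1) | (fb << 10)
    return out

excitation = 2.0 * prbs(2047)                # e.g. a +/-2.0 input perturbation
```

Over one full period a maximal-length sequence is nearly balanced (1024 samples of one symbol, 1023 of the other), which gives the flat, wide-band spectrum that makes a PRBS attractive for excitation.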
Disrupting the process with perturbation sequences or set-point changes and removing feedback control can lead to lost production time by failing to meet stringent product specifications [63]. It is therefore imperative to determine an optimal excitation signal that can be implemented in closed loop and minimizes the cost of performing an identification experiment. This chapter proceeds by presenting some background material on ID before describing the specific implementation of ID in the simulated case studies.

10.1 Background

Typically, for identification purposes, a higher quality estimate corresponds to a lower covariance of the estimated parameters. Thus, traditional approaches to ID often involve minimizing some function of the covariance matrix, such as the determinant, trace or largest eigenvalue [48]. In this work, instead of minimizing the trace of the covariance matrix, an adaptation of the trace of its inverse, i.e., the partial Fisher information matrix, is maximized. Specifically, a convex quadratic function is maximized while enforcing explicit constraints on the inputs and outputs. A moving horizon predictive (MHP) framework is used to compute a sequence of future input perturbations at each sample time. The length of this excitation signal is referred to as the prediction horizon.

Outputs are predicted using the most recent process model estimate, and the predicted outputs are used to construct the partial Fisher information matrix. As will be seen in the experimental results, it is difficult (perhaps even impossible) to guarantee that the inputs and outputs always satisfy their respective constraints. This is due to the presence of output disturbances in addition to inevitable discrepancies between the true process model and the current estimate. As the prediction horizon increases, the potential for constraint violation also increases, which is why this calculation is repeated at each sample time.
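The receding-horizon idea can be sketched generically: the full perturbation sequence is redesigned at every sample, but only its first move is implemented before the design is repeated, so a constraint violation predicted further out can be corrected at the next sample. The design function below is a hypothetical placeholder, not the thesis algorithm.

```python
import numpy as np

def mhp_excitation(design_fn, n_samples):
    # Moving-horizon loop: design_fn(t) returns a full perturbation sequence
    # over the prediction horizon; only the first move is applied before the
    # design is recomputed at the next sample time.
    applied = []
    for t in range(n_samples):
        sequence = design_fn(t)
        applied.append(sequence[0])
    return np.array(applied)

# Toy placeholder design: alternate the sign of a +/-3.0 move each sample.
moves = mhp_excitation(lambda t: [3.0 * (-1) ** t] * 4, 6)
```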
The ID technique implemented in this thesis is adapted from the developments made by M. Yousefi et al. in 2015, which can be referred to for a more thorough background and theoretical review [62].

The Fisher information matrix is defined as the inverse of the parameter covariance matrix as follows:

F = cov(θ_N)^{-1} = ΨΨ'. (10.1)

The matrix Ψ in this part of the thesis is composed of a series of vectors ψ, i.e.,

Ψ = [ψ_t ... ψ_{t-N+1}]. (10.2)

Each vector ψ contains the relevant IO data determined by the orders of the ARX polynomials A(q, θ) and B(q, θ), i.e.,

ψ_t = [y'_{t-1} ... y'_{t-na}  u'_{t-d} ... u'_{t-d-nb}]', (10.3)

such that the vector ψ_t and the parameter vector θ model the process output as follows:

y_t = θψ_t. (10.4)

For a process described by an ARX model the parameter vector θ is given by the following expression:

θ = [a_1 ... a_{na}  b_1 ... b_{nb}]. (10.5)

The trace of the Fisher information matrix, R_u = trace(F), is given by

R_u = Σ_{p=1}^{n} Σ_{j=1}^{na} Σ_{i=j}^{N} y_p(t-i)^2 + Σ_{q=1}^{m} Σ_{j=1}^{nb} Σ_{i=j}^{N} u_q(t-nk+1-i)^2, (10.6)

where n is the number of output variables, m is the number of input variables, na is the order of A(q, θ), nb is the order of B(q, θ), nk is the input delay and N is the number of IO samples collected [62].

The ID technique applied in this work can be summarized as follows:

max_{u_t} R̃_u
subject to  y_L ≤ y(t) ≤ y_H,
            u_L ≤ u(t) ≤ u_H,
            |u(t) - u(t-1)| ≤ ∆u_H, (10.7)

where the objective function R̃_u is defined as

R̃_u = Σ_{q=1}^{m} (1/∆u_q^H) var(u_q*). (10.8)

The input sequence u_q* contains all possible combinations of input movements over the specified prediction horizon, where each movement is limited to a binary choice of ±∆u_q^H. Originally, the trace of the Fisher information matrix defined in (10.6) was maximized subject to the convex input and output constraints.
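To make the quantities in (10.1)-(10.6) concrete, the sketch below builds the ARX regressor vectors of (10.3) for a SISO example and evaluates trace(F) = trace(ΨΨ') directly as a sum of squared regressor entries; the orders, delay and data are arbitrary illustration values.

```python
import numpy as np

def arx_regressor(y, u, t, na, nb, d):
    # psi_t = [y(t-1) ... y(t-na), u(t-d) ... u(t-d-nb)]', as in (10.3),
    # specialized to a single-input single-output data record.
    past_y = [y[t - i] for i in range(1, na + 1)]
    past_u = [u[t - d - i] for i in range(0, nb + 1)]
    return np.array(past_y + past_u)

def fisher_trace(y, u, na=2, nb=1, d=3, N=50):
    # trace(F) = trace(Psi Psi'): the sum of squared regressor entries over
    # the last N samples, which is what (10.6) spells out term by term.
    T = len(y)
    Psi = np.column_stack([arx_regressor(y, u, t, na, nb, d)
                           for t in range(T - N, T)])
    return np.trace(Psi @ Psi.T)
```

As a sanity check, for constant unit signals each regressor has na + nb + 1 entries equal to one, so the trace is N(na + nb + 1) = 50 × 4 = 200.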
Since this optimization involves maximizing a high dimensional convex quadratic function, it can be difficult for quadratic solvers to handle and there is no guarantee that a global optimum is found.

Since R_u is convex, its maximum over the compact convex domain defined by the constraints in (10.7) occurs at one of the vertices, i.e., where the constraints are active. The number of vertices grows exponentially with the dimension of the problem, so it was originally thought that a solution based on visiting each vertex individually would be computationally infeasible [62]. However, in this work the number of input variables is m = 3, so if the specified prediction horizon is not too large the dimensionality of the problem is small enough to solve by evaluating the objective at all of the vertices. The input sequence u_q* is selected such that only combinations of inputs that obey the IO boundaries while taking the maximum allowed step size are considered. In other words, of all the possible combinations of binary movements of ±∆u_q^H over the prediction horizon, only those that satisfy the IO boundaries are evaluated in (10.8). The output boundaries are checked using the outputs predicted to result from the proposed set of input movements, and the predicted outputs are generated from the current process model estimates. The implementation of this new ID method in the experimental case studies is described further in the following section.

10.2 Implementation

After the PM algorithm triggers the start of an identification experiment, the ID algorithm begins determining a perturbation sequence to excite the process. Excitation signals are determined during each of the 3000 samples of the identification experiment. Previous input movements extending over a length greater than the sum of the largest time delay and the largest time constant are provided to the ID algorithm.
Combinations of all possible future input sequences are determined by computing binary permutations of ±∆u_q^H over the prediction horizon. To ensure computational tractability, the prediction horizon is limited to a length of 4 steps, after which the input value is held constant. This limited prediction horizon is sustainable because the ID calculation is repeated at each new sample. The previous model estimates are supplied to the ID algorithm, which uses the impulse response models in combination with past and future input values to predict future outputs.

In the experimental simulations presented in this thesis the maximum input movements are specified as

∆u_q^H = [∆u_1^H, ∆u_2^H, ∆u_3^H] = [3.0, 1.4, 0.6]. (10.9)

The input upper and lower bound deviations from the set-point are restricted to

u_q^H = -u_q^L = [u_1^H, u_2^H, u_3^H] = [30, 15, 8], (10.10)

and similarly the output upper and lower bound deviations from the set-point are defined as

y_p^H = -y_p^L = [y_1^H, y_2^H, y_3^H] = [15, 1.0, 2.0]. (10.11)

To conservatively account for both the time delay and the process time constant, a total of the n_up = 111 most recent input values are provided to the ID algorithm at each iteration. This value is determined by taking the sum of the maximum time constant estimate, the maximum time delay estimate and a contingency factor of 35.

Combinations of future input movements are determined over a 4 step prediction horizon, after which the input values are held constant for the remainder of the horizon. The total prediction horizon has a length of 27, defined by the maximum time delay estimate plus a contingency factor of 5. Previous plant model estimates are used to generate an impulse response model of the process, which is convolved with both the past and future input values to determine the predicted future outputs.
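A minimal sketch of this enumeration for a single input-output pair, in deviation variables: every ±∆u move sequence over the short horizon (a vertex of the input-rate constraint set) is held constant afterwards, simulated through an assumed impulse-response model, checked against the IO bounds, and scored with a variance objective akin to (10.8). The first-order model is hypothetical; the step size and bounds mirror the first-channel values in (10.9)-(10.11).

```python
import numpy as np
from itertools import product

def predict_outputs(h, u_past, u_future):
    # Convolve the impulse response with past + candidate future inputs and
    # return the predictions over the future horizon only.
    u = np.concatenate([u_past, u_future])
    y = np.convolve(u, h)[:len(u)]
    return y[len(u_past):]

def design_excitation(h, u_past, du_max, u_bound, y_bound, horizon=4, hold=23):
    # Enumerate every +/-du_max move sequence over the short horizon, hold the
    # input constant afterwards, discard candidates that violate the IO bounds,
    # and keep the feasible candidate with the largest variance objective.
    best, best_score = None, -np.inf
    for moves in product((-du_max, du_max), repeat=horizon):
        du = np.concatenate([moves, np.zeros(hold)])
        u = u_past[-1] + np.cumsum(du)
        if np.any(np.abs(u) > u_bound):
            continue                       # input bound violated
        y = predict_outputs(h, u_past, u)
        if np.any(np.abs(y) > y_bound):
            continue                       # predicted output bound violated
        score = np.var(u[:horizon]) / du_max
        if score > best_score:
            best, best_score = u[:horizon], score
    return best

# Illustrative first-order impulse response with unit steady-state gain.
h = 0.1 * 0.9 ** np.arange(30)
u_star = design_excitation(h, np.zeros(10), du_max=3.0, u_bound=30.0, y_bound=15.0)
```

With these loose bounds the winner is a monotone ramp of maximum-size steps, which is the highest-variance vertex; tightening `u_bound` or `y_bound` forces the search onto direction-reversing sequences instead.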
Input sequences whose predicted future inputs and outputs meet the input and output upper and lower bounds are stored, and the corresponding values of R̃_u in (10.8) are determined. After all of the possible combinations of input sequences have been evaluated, the sequence that provides the maximum value of R̃_u is selected as the perturbation sequence.

This concludes the description of the various elements of the adaptive control framework and their integration in the closed-loop system. System identification, performance monitoring (i.e., MPM detection) and input design algorithms have all been presented. The reader should now have a grasp of how the various stages of the adaptive framework are implemented and unified. In the following chapter, the focus shifts to describing the setup of the experimental simulations.

Chapter 11

Experimental Setup

To provide the necessary context for understanding the experimental results, the details of the experimental setup are described here. First, an illustration is provided to describe the general simulation time-line and common setup parameters are defined. This is followed by a review of the distinguishing configuration features for the various experiments. Finally, this chapter concludes by describing the techniques used to analyze the resulting simulation data.

11.1 Experimental time-line and general setup

The experiments presented in this work represent only a small subset of the large number of experiments conducted to integrate the various aspects of the adaptive control framework. The MPM detection algorithm is capable of detecting gain, time constant and time delay mismatch. However, the simulations presented here focus on detecting and resolving a mismatch in the gain of the process. Limiting the focus to a single type of mismatch provides more stringent control of experimental variables to better compare various experimental configurations, such as tuning the controller with DI or ARX-OE identification estimates.
Since the MIMO process is highly coupled, any identification experiment in a single IO pairing will excite all of the process inputs and outputs. For this reason, when MPM is detected the identification algorithm simply re-identifies the gain, time constant and time delay values for all of the process models.

A generalized illustration of the experimental simulation time-line is provided in Figure 11.1. This figure is used to guide the discussion of the general simulation setup. A small inaccuracy in Figure 11.1 is that the controller model gain is initially 10% less than the plant gain. One important common feature of all of the experimental simulations is that a 2× gain MPM exclusively in b11 is introduced at sample number n_MPM = 4000. The remaining gain and time constant values of the true process are held constant along with the controller model. Another common element is that the SVM is trained for a period of n_SVM = 3500 samples, at which point MPM detection becomes active. The sample at which MPM is detected depends on the IO data and SVM models, and is denoted n_ID. A user specified delay n_ID+∆ - n_ID = 1600 separates the MPM detection point and the actual ID excitation experiment. Typically the ID experiment would start at n_ID; however, the delay provides the time necessary to perform set-point changes and collect response data.

Figure 11.1: Illustrative time-line of the general setup

After a pre-specified wait period n_B - n_MPM = 500, a set-point change at sample n_B increases the set-point of y1 by 10% of its nominal value. This set-point is held for 300 samples before it is decreased by twice as much, held for another 300 samples and then returned to the original set-point. This series of set-point changes provides information on the dynamic response of the closed-loop system under MPM, which is used to evaluate controller performance.
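The set-point staircase described above is straightforward to generate; a sketch in engineering units, using the nominal basis-weight set-point of 174 from Table 11.1:

```python
import numpy as np

def setpoint_staircase(y_bar, frac=0.10, hold=300):
    # +10% of the nominal set-point, then down by twice as much (to -10%),
    # then back to the original set-point; each level is held for `hold` samples.
    levels = [y_bar * (1 + frac), y_bar * (1 - frac), y_bar]
    return np.concatenate([np.full(hold, level) for level in levels])

profile = setpoint_staircase(174.0)
```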
An identical procedure is used at sample n_SI after the controller has been re-tuned. This allows the dynamic closed-loop responses to be compared both with and without adaptive tuning.

A summary of the general setup parameters is provided in Table 11.1 below. Controller parameters α_sp and α_d are used to tune the MPC for reference tracking performance and disturbance rejection performance, respectively. The ID algorithm is compared against PRBS excitation signals of magnitude u_q^PRBS = ±[2.0, 1.0, 0.5]. Since the PRBS does not constrain the input and output values, the magnitudes of the PRBS are conservatively less than ∆u_q^H. The remaining experimental setup parameters depend on the various test configurations and are outlined in the following section.

Table 11.1: General experimental setup parameters

Symbol           Value                Description
n_F              14000                Simulation duration in samples
n_MPM            4000                 Sample to trigger MPM
α_sp             1.5                  Controller reference tracking
α_d              1.5                  Controller disturbance rejection
ȳ_p              [174, 2.3, 6.2]      Nominal output set-points
ū_q              [165, 115, 35]       Nominal input set-points
∆y_p%            [10, 0, 0]           Magnitude of set-point change
n_ID+∆ - n_ID    1600                 Delay after MPM detection
n_B - n_MPM      500                  Wait period before set-point change
n_∆y             300                  Samples to hold each set-point change
H1, H2, H3       1st or 6th order     High pass noise models
u_q^PRBS         ±[2.0, 1.0, 0.5]     Magnitude of PRBS excitation
σ_e,p            [0.08, 0.02, 0.01]   Standard deviation of noise

11.2 Experimental configurations

As mentioned before, the primary interest in conducting the experiments is to determine how the controller performs before and after adaptive tuning. Analyzing the dynamic response of the closed-loop system helps to evaluate the effectiveness of the adaptive control framework. For a particular configuration, this effectiveness can be observed with a single simulation.
However, it is also desirable to determine which configurations maximize the effectiveness of the adaptive framework. Specifically, it is important to analyze the effectiveness of the ID algorithm relative to a PRBS sequence and to determine whether the DI method or the ARX-OE identification method provides the best controller performance after re-tuning.

In this work six separate experiments are performed. The first four experiments are designed to investigate whether the ID sequence is more or less effective than a simple PRBS. The last two tests are conducted to evaluate the effect of incorrect noise model order specification on the DI method. Another way to view the series of tests is as a set of 3 trials, each of which tests both the DI method and the ARX-OE method. The first trial is conducted with PRBS excitation, the second trial uses the ID algorithm and the third trial uses the ID algorithm but with a different type of noise model. A sixth order noise model is used in the third trial while the DI method still assumes that the noise model is first order. The noise models in all three trials are high pass (HP) filters. A general description of each test is provided in Table 11.2 below.

Table 11.2: Various experimental configurations

Trial/test No.
Description
1.1   PRBS excitation, ARX-OE controller tuning, 1st order HP noise
1.2   PRBS excitation, DI tuning, 1st order HP noise
2.1   ID excitation, ARX-OE tuning, 1st order HP noise
2.2   ID excitation, DI tuning, 1st order HP noise
3.1   ID excitation, ARX-OE tuning, 6th order HP noise
3.2   ID excitation, DI tuning, 6th order HP noise

The first order noise models used in the first four tests are all HP filters defined as follows:

H1(q) = (1 - 0.5q^{-1}) / (1 + 0.9q^{-1}),
H2(q) = (1 + 0.3q^{-1}) / (1 + 0.6q^{-1}),
H3(q) = (1 + 0.2q^{-1}) / (1 + 0.7q^{-1}). (11.1)

Similarly, the sixth order noise models in the final two tests are all HP filters defined as

H1(q) = (1 - 0.9q^{-1} + 0.4q^{-2} - 0.3q^{-3} - 0.2q^{-4} + 0.1q^{-5} + 0.1q^{-6}) / (1 + 0.9q^{-1} + 0.6q^{-2} + 0.3q^{-3} + 0.2q^{-4} + 0.1q^{-5} - 0.1q^{-6}),
H2(q) = (1 - 0.9q^{-1} + 0.4q^{-2} - 0.2q^{-3} - 0.1q^{-4} + 0.1q^{-5} + 0.1q^{-6}) / (1 + 0.9q^{-1} + 0.5q^{-2} + 0.2q^{-3} + 0.2q^{-4} + 0.1q^{-5} - 0.1q^{-6}),
H3(q) = (1 - 0.7q^{-1} + 0.3q^{-2} - 0.2q^{-3} - 0.1q^{-4} + 0.1q^{-5} + 0.1q^{-6}) / (1 + 0.6q^{-1} + 0.4q^{-2} + 0.3q^{-3} + 0.2q^{-4} + 0.1q^{-5} - 0.1q^{-6}). (11.2)

These noise models were selected based on their frequency response characteristics and the fact that they are all stable and inversely stable. The techniques used to analyze the experimental data are described in what follows.

11.3 Experimental analysis

As mentioned before, the dynamic output response of the closed-loop system is compared for each test, before and after adaptive tuning. Furthermore, for each test the DI and ARX-OE identification estimates are compared to the true plant values. Beyond the comparisons within each test, it is also desirable to compare the responses and parameter estimates across the various testing configurations. The dynamic response data is evaluated both qualitatively and quantitatively. The qualitative assessment involves observing the shape of the output response to the set-point changes, and the quantitative assessment involves comparing the observed response to the ideal response. The ideal response shape is generated based on the true process model.
To quantify the fit of the observed response to the ideal response, the mean square error (MSE) is calculated as follows:

MSE = (1/n) Σ_{i=1}^{n} (y_i - ỹ_i)^2, (11.3)

where n is the number of data points collected during the dynamic response test and ỹ represents the values of the ideal response shape. Finally, the parameter estimates are reported in terms of their percent error (PE), i.e.,

PE_ij(%) = 100 × |b̂_ij - b_ij| / |b_ij|, (11.4)

for some parameter estimate b̂ and true value b.

Chapter 12

Experimental Results

This chapter contains three trials, each of which has two tests. The first trial explores the use of a PRBS excitation signal with both DI and ARX-OE identification for controller tuning. The second trial applies the same two tests but with the ID algorithm replacing the PRBS. Finally, the third trial changes the noise model order to test whether an incorrectly specified noise model order is detrimental to the DI method.

12.1 Trial 1: PRBS excitation

The first trial applies PRBS excitation signals for the identification experiment. Using a PRBS for excitation provides a reference for evaluating the ID algorithm. A first order high pass filter is implemented as the noise model. Two tests are conducted to evaluate the difference between tuning the controller with the ARX-OE method or the DI method.

12.1.1 Test 1.1: Tuning with ARX-OE identification

The 2× gain MPM is detected by the PM algorithm after only 860 samples, i.e., n_ID = 4860. Since a delay of 1600 samples is used, the PRBS excitation experiment begins at sample number n_ID+∆ = 6460. After the PRBS experiment an ARX-OE identification is performed and the estimated parameters are used to tune the controller. The raw input profiles for this test are shown below in Figure 12.1. It is apparent from these input profiles that the set-point change in y1 causes undesirable behavior in u2 and u3. This input behavior ultimately degrades the output responses of y2 and y3, as seen in Figure 12.2.
From Figure 12.2 it is clear that a generous amount of PRBS excitation is used to excite the system. For y1 the response improves after the controller is re-tuned, but the responses of y2 and y3 appear to degrade. Once the controller is re-initialized, y2 and y3 are unable to stabilize before the start of the second set-point change.

Figure 12.1: Raw input profiles for Test 1.1

Figure 12.2: Raw output profiles for Test 1.1

Zooming in on the process outputs during the first set-point change yields Figure 12.3. It is immediately clear that the closed-loop dynamic response under MPM exhibits undesirable overshoot and oscillatory behavior. In fact, the responses of both y2 and y3 also exhibit oscillations even though they do not undergo set-point changes.

Figure 12.3: Dynamic response with MPM for Test 1.1

After the controller is tuned using the ARX-OE model estimates, another set-point change is performed. The output response to this set-point change is shown in Figure 12.4, and clear differences exist between the dynamic responses before and after adaptive tuning.

Figure 12.4: Dynamic response after tuning for Test 1.1

For output y1 the response after tuning exhibits essentially no overshoot or oscillation, in stark contrast to the response with MPM. One potential downside is that the response after adaptive tuning appears more sluggish than the response with MPM. Furthermore, it appears that re-identifying the process models that did not contain MPM has had a detrimental effect on the responses of outputs y2 and y3. However, quantitatively, as shown by the MSE values in the legend, the adaptive tuning generally improves control. Only for the response of y3 does the MSE increase after re-tuning.
The MSE for y which is the variable of particularinterest is reduced dramatically.To shed some light on the dynamic response behavior the percent error (PE) of the DI andARX-OE identification parameter estimates are presented in Table 12.1 and listed as (G, G,G, G, G, G). It is immediately clear that the ARX-OE time constant estimates for the GTable 12.1: Percent error of DI and ARX-OE estimates for Test 1.1Method Parameter PE (%)Both Time Delay (0, 0, 4.3, 11.1, 14.3, 21.1)Direct Gain (13.9, 25.5, 100.5, 100.4, 3.8, 33.3)Direct Time Constant (11.6, 26.2, 67.5, 134.3, 94.3, 480.7)ARX-OE Gain (28.7, 36.8, 117.5, 113.2, 34.7, 537.5)ARX-OE Time Constant (26.7, 36.8, 67.5, 117.6, 114.1, 1720.7)transfer functions are very poorly estimated. In fact, the ARX-OE gain estimates and the DI timeconstant and gain estimates for the G transfer functions are all significantly biased. Note that thepoorly estimated parameters generally correspond to transfer functions that have incorrect timedelay estimates. It is likely that part of the error in the time constant estimates is compensating forthe incorrect time delay specification. Ultimately the majority of these estimates are too inaccuratefor reliable controller tuning. This helps to explain the strange behavior observed in Figure 12.2for outputs y and y after the controller is re-initialized. It is important to note that although theestimated parameters exhibit considerable inaccuracy it is entirely possible that multiple arrange-ments of parameter values for the complex process model can yield similar response behaviors. Thisis why it is important to compare the dynamic response of the process and not simply base theanalysis off of the parameter bias. Next a similar experiment is performed except now the controlleris updated using the DI estimates.9912.1. 
12.1.2 Test 1.2: Tuning with direct identification

The MPM detection algorithm successfully identifies the mismatch at the same point as in the previous test, i.e., after 860 samples at n_ID = 4860. Once again the PRBS excitation experiment begins at n_ID+∆ = 6460. After the PRBS experiment the DI parameter estimates are used to re-tune the controller. The raw input profiles for this test are shown below in Figure 12.5. Again, these input profiles show the set-point change in y1 causing undesirable behavior in u2 and u3. Observing the excitation experiment between 550 and 800 minutes in Figure 12.5 shows a significant amount of PRBS excitation that becomes problematic as the dryer steam pressure input clearly exceeds acceptable boundaries.

Figure 12.5: Raw input profiles for Test 1.2

As shown in Figure 12.6, once the controller is tuned the reel moisture output is unable to stabilize before the second set-point change. Recall, y1 is the basis weight (BW) measured in lbs/(3000 ft²), y2 is the press moisture (%) and y3 is the reel moisture (%). The responses of the basis weight and press moisture improve after adaptive tuning, but the reel moisture response to the set-point change is significantly worse after tuning.

Figure 12.6: Raw output profiles for Test 1.2

The output responses from the first set-point change are presented more clearly in Figure 12.7. During this set-point change, 2× gain MPM exists in G11 and the other controller parameters are 10% less than the true plant values. As before, the BW response to the set-point change displays significant overshoot and oscillatory behavior, resulting in a large MSE of 24.4. Due to the MIMO coupling, the BW set-point change disrupts the press and reel moisture values as well.

Figure 12.7: Dynamic response with MPM for Test 1.2

After the controller is tuned using the DI estimate, another set-point change is performed.
The output response to this set-point change is shown in Figure 12.8. Comparing the responses before and after adaptive tuning shows once again that the adaptive tuning improves the BW response to set-point changes: the MSE of the BW dynamic response is reduced from 24.4 to 1.8 as a result of the adaptive tuning. Furthermore, the MSE of the press moisture is reduced after tuning, but the MSE of the reel moisture increases substantially. The MSE values are presented in the legends of Figure 12.7 and Figure 12.8.

Figure 12.8: Dynamic response after tuning for Test 1.2

Comparing Figure 12.8 to Figure 12.4 shows the effect of switching from ARX-OE identification to DI. It is clear, both visually and by comparing the MSE values, that all three of the dynamic output responses are better with DI than with ARX-OE identification for this trial.

Finally, the PE of the DI and ARX-OE identification parameter estimates, listed in the same transfer-function order as before, are presented in Table 12.2.

Table 12.2: Percent error of DI and ARX-OE estimates for Test 1.2

Method    Parameter       PE (%)
Both      Time Delay      (0, 0, 0, 11.1, 57.1, 26.3)
Direct    Gain            (13.7, 27.6, 107.8, 95.7, 26.5, 3.2)
Direct    Time Constant   (11.3, 29.5, 67.5, 149.0, 114.1, 688.4)
ARX-OE    Gain            (22.1, 34.1, 118.7, 58.0, 14.4, 75.2)
ARX-OE    Time Constant   (19.9, 34.2, 67.5, 79.5, 114.1, 1324.6)

As before, these PE values indicate that the DI estimates are generally more accurate than the ARX-OE estimates, which helps to explain the reduction in MSE observed with the DI method. Again, the PE of the time constant estimates is generally much higher where the time delay was incorrectly identified, and two of the transfer functions exhibit particularly poor identification accuracy.
In the following trial the PRBS excitation signal is replaced by the ID algorithm to determine whether or not better results can be obtained.

12.2 Trial 2: Input design, first order noise

The only difference between this trial and the previous one is that the PRBS excitation signal is replaced by the ID algorithm described in Chapter 10. Once again, first order HP filter noise models are used and two tests are performed to compare the DI and ARX-OE identification methods.

12.2.1 Test 2.1: Tuning with ARX-OE identification

Only 740 samples after the gain mismatch is introduced, the MPM detection algorithm triggers the identification experiment, which begins after a delay of 1600 samples at n_ID+∆ = 6340. After the ID experiment an ARX-OE estimate is used to tune the controller. The raw input profiles for this test are shown in Figure 12.9. The input values during the excitation period (between 500 and 800 minutes) can be compared to the raw inputs from the previous trial to see the difference between using the ID algorithm and the PRBS excitation. The thick stock flow (u1) variation during the excitation experiment ranges primarily between 160 gpm and 170 gpm and thus satisfies the required constraints of 165 ± 30 gpm. Variation of the press steam pressure (u2) is restricted to 115 ± 15 psi, which is approximately satisfied for the duration of the excitation experiment. Finally, the dryer steam pressure (u3) is constrained to 35 ± 8 psi; these constraints are violated a few times during the excitation experiment, particularly around time 550 min. Still, relative to using a PRBS as in the previous trial, the dryer steam pressure is well-behaved. In general, the PRBS excitation created larger and more frequent input constraint violations than the ID algorithm.

Similarly, each of the process outputs conforms to the specified upper and lower bound constraints during the ID excitation experiment.
This is in contrast to the PRBS excitation, which occasionally resulted in violations of the output constraints. The raw output profiles for this test are shown in Figure 12.10.

Figure 12.9: Raw input profiles for Test 2.1

Figure 12.10: Raw output profiles for Test 2.1

Comparing the press moisture (y2) profile before and after adaptive tuning shows that the magnitude of the disruption due to the set-point change is reduced after adaptive tuning.

Zooming in on the process outputs during the first set-point change allows for a clearer assessment of the dynamic output responses. Figure 12.11 shows the typical oscillatory and overshoot behavior seen when MPM exists.

Figure 12.11: Dynamic response with MPM for Test 2.1

After the controller is tuned using the ARX-OE model estimate, another set-point change is performed. The output response to this set-point change is shown in Figure 12.12. The BW response is much closer to the ideal response (shown in green) after the controller is tuned. Both the BW and press moisture (y1 and y2) have a lower MSE after tuning, whereas the MSE of the reel moisture is slightly increased. The MSE of the reel moisture is nonetheless much smaller using the ID algorithm than it was in the previous trial with PRBS excitation.

The PE of the DI and ARX-OE parameter estimates are listed in Table 12.3 in the same transfer-function order as before. Once again, some of the time delays are incorrectly estimated, which appears to affect the corresponding time constant estimates. Two of the ARX-OE gains are poorly estimated, but otherwise the majority of the gain estimates for both the DI and ARX-OE methods are reasonably accurate. Comparing these values to those in Table 12.2 demonstrates that the ID algorithm provides significantly better estimates (in terms of PE) than the PRBS excitation.
So far it appears that the ID algorithm provides better identification estimates and improved controller performance relative to the PRBS excitation.

Figure 12.12: Dynamic response after tuning for Test 2.1

Table 12.3: Percent error of DI and ARX-OE estimates for Test 2.1

Method    Parameter       PE (%)
Both      Time Delay      (0, 0, 4.3, 22.2, 71.4, 0)
Direct    Gain            (6.9, 7.9, 7.6, 41.8, 37.1, 28.3)
Direct    Time Constant   (6.2, 13.3, 2.4, 87.1, 16.8, 11.9)
ARX-OE    Gain            (12.7, 58.2, 116.1, 97.7, 23.2, 1.5)
ARX-OE    Time Constant   (11.5, 72.9, 67.5, 192.2, 27.0, 17.6)

12.2.2 Test 2.2: Tuning with direct identification

The 2× gain MPM is detected by the PM algorithm 860 samples after it is introduced, and the identification experiment begins at sample number n_ID+∆ = 6460. As before, the ID technique is used to perform the excitation, but this time the DI method is used to tune the controller. The input profiles for this test are shown below in Figure 12.13. For the most part the excitation experiment appears very similar to the previous test; roughly speaking, the input constraints are approximately met with the ID excitation. The only significant constraint violations are due to the set-point changes in y1. Throughout the excitation experiment each of the output variables satisfies the upper and lower bound constraints. The raw output profiles are presented in Figure 12.14. Once again, the press moisture (y2) is much less disrupted by the set-point change after the adaptive tuning is performed.

Figure 12.13: Raw input profiles for Test 2.2

Figure 12.14: Raw output profiles for Test 2.2

A closer look at the dynamic response to the first set-point change (i.e., under MPM) is provided in Figure 12.15. The typical oscillatory response with overshoot is observed once again, particularly for the BW profile. In each of the last four tests the output responses under MPM have been very
similar, which is expected considering the simulation is configured identically up until the excitation experiment.

Figure 12.15: Dynamic response with MPM for Test 2.2

After the controller is tuned using the DI estimates, another set-point change is carried out. The output response to this set-point change is shown in Figure 12.16 and is significantly better than before adaptive tuning. Specifically, the BW response is essentially equivalent to the ideal BW response, represented by the green profile. Comparing the BW response to the previous test (i.e., tuning with ARX-OE estimates) shows that the DI estimate has resulted in a slightly better dynamic response. The press moisture is also slightly improved with the DI estimate, but the reel moisture is slightly worse. The difference between tuning with either the DI or ARX-OE identification estimates is fairly negligible, but if anything the DI method appears slightly better.

Finally, the PE values of the DI and ARX-OE identification parameter estimates are listed for each transfer function in Table 12.4. Once again the time delays for some of the transfer functions are incorrectly estimated, which appears to affect the corresponding time constant estimates. The DI gain estimates are fairly accurate and so are most of the ARX-OE identification estimates. So far the DI estimates have generally had lower PE values than the ARX-OE identification estimates, and the ID excitation has resulted in lower PE values than the PRBS excitation. The following trial studies the effect of changing the order of the noise models.
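For reference, the PRBS excitation these trials compare against is conventionally generated with a linear feedback shift register. A sketch, where the register length and the ±1 amplitude are illustrative rather than the values used in the trials:

```python
import numpy as np

def prbs(n, nbits=7, seed=1):
    """+/-1 PRBS from a 7-bit Fibonacci LFSR (x^7 + x + 1, period 127)."""
    reg = [(seed >> i) & 1 for i in range(nbits)]  # nonzero initial state
    out = np.empty(n)
    for k in range(n):
        out[k] = 1.0 if reg[0] else -1.0
        fb = reg[0] ^ reg[1]       # feedback taps for the primitive polynomial
        reg = reg[1:] + [fb]       # shift and append the new bit
    return out

u = prbs(300)  # excitation sequence switching between -1 and +1
```

Because the feedback polynomial is primitive, the sequence repeats only every 2^7 - 1 = 127 samples, giving a flat excitation spectrum over a wide band.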
Figure 12.16: Dynamic response after tuning for Test 2.2

Table 12.4: Percent error of DI and ARX-OE estimates for Test 2.2

Method   Parameter       PE (%)
Both     Time Delay      (0, 0, 0, 22.2, 42.9, 15.8)
Direct   Gain            (8.6, 6.2, 16.3, 35.1, 52.6, 13.3)
Direct   Time Constant   (10.4, 11.7, 12.5, 192.2, 31.4, 193.3)
ARX-OE   Gain            (13.2, 59.2, 108.4, 97.7, 19.2, 81.8)
ARX-OE   Time Constant   (12.1, 77.7, 67.5, 192.2, 114.1, 447.0)

12.3 Trial 3: Input design, sixth order noise model

Once again the ID technique is applied in this trial; however, the prediction horizon is now changed from a length of four to a length of three. From the previous two trials it was determined that the ID technique provides a more controlled excitation and better system identification results than a simple PRBS excitation. Furthermore, the DI method so far appears to have slightly outperformed the ARX-OE identification method. The previous trial implemented a first order noise model, so the DI method was expected to perform well because it correctly assumed the noise model was first order. In this trial the true noise models are changed to sixth order high-pass filters while the DI method still assumes that the noise model is first order. Two tests are performed to tune the controller with both the DI method and the ARX-OE identification method.

12.3.1 Test 3.1: Tuning with ARX-OE identification

The PM algorithm successfully identifies the MPM after 800 samples, and after a delay of 1600 samples the identification experiment begins at sample number 6400. After the ID experiment an ARX-OE estimate is used to tune the controller. The input profiles for this test, shown in Figure 12.17, demonstrate a relatively small magnitude of excitation relative to the previous tests. Apart from a brief disruption at around 540 minutes, the inputs conform to their upper and lower bound constraints during the excitation experiment.
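The experiment timing quoted above composes additively: the start sample is the sample at which the mismatch was introduced, plus the detection time, plus the fixed waiting period. Working backwards from the stated numbers (the introduction sample is an inference, not a value stated in the text):

```python
n_detect = 800   # samples from mismatch introduction to PM detection (Test 3.1)
n_delay = 1600   # fixed waiting period before the excitation begins
n_start = 6400   # stated start of the identification experiment
n_mismatch = n_start - n_delay - n_detect
print(n_mismatch)  # 4000

# Consistency check against Test 2.2: detection took 860 samples, start was 6460
assert n_mismatch + 860 + 1600 == 6460
```

The same inferred introduction sample is consistent across both tests, which suggests the simulations differ only in the detection time.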
The primary reason the excitation magnitude is lower is the shorter prediction horizon used by the ID algorithm in this trial. The responses of both inputs to the set-point change are clearly better after adaptive tuning than under MPM.

Figure 12.17: Raw input profiles for Test 3.1

Observing the raw output profiles in Figure 12.18 shows that the output variation during the excitation experiment is significantly less than in the previous tests. Clearly, the output upper and lower bound constraints are satisfied during the excitation experiment. The press moisture is far less disrupted by the set-point change after the adaptive tuning is conducted. Just before both 550 minutes and 800 minutes the process outputs experience a slight disturbance, which corresponds with the beginning and end of the excitation experiment. The remainder of the excitation experiment exhibits only a small amount of variation in each of the outputs.

Figure 12.18: Raw output profiles for Test 3.1

Figure 12.19: Dynamic response with MPM for Test 3.1

The dynamic response of the process outputs before adaptive tuning is shown in Figure 12.19. Again, this response has the same oscillatory and overshoot behavior as before. After the controller is tuned using the ARX-OE identification estimate, another set-point change is performed. The output response of the re-tuned closed-loop system to this set-point change is shown in Figure 12.20. The same differences as before are observed between the dynamic responses before and after adaptive tuning. The response after tuning is very smooth with essentially zero overshoot, whereas before tuning the response has large overshoot and many oscillations. Although the legend shows an MSE of zero for one of the outputs, this is simply because the true value is rounded to two decimal places.
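The MSE values in the figure legends are presumably mean squared tracking errors over the response window, displayed to two decimal places, which is why a small but nonzero error can show as 0.00. A sketch with hypothetical response samples:

```python
import numpy as np

def mse(y, r):
    """Mean squared tracking error between output y and set-point r."""
    y, r = np.asarray(y, float), np.asarray(r, float)
    return float(np.mean((y - r) ** 2))

r = np.ones(5)                              # unit set-point
y = np.array([0.0, 0.6, 0.95, 1.01, 1.0])   # hypothetical response samples
print(round(mse(y, r), 2))          # 0.23
print(round(mse(r + 0.001, r), 2))  # 0.0 -- nonzero error hidden by rounding
```

The second line illustrates the legend artifact: an MSE of 1e-06 rounds to 0.00 even though the tracking error is not exactly zero.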
Relative to the previous trial, the output responses after tuning all have a lower MSE. It is possible that the sixth order noise models introduce more excitation to the process than the previous first order noise models, ultimately resulting in a more accurate identification of the process. The PE values are analyzed for further investigation.

Figure 12.20: Dynamic response after tuning for Test 3.1

The PE values for the time delays and the DI and ARX-OE estimates are presented for each transfer function in Table 12.5. Once again some of the time delay values are incorrectly estimated, which appears to result in further inaccuracies in the time constant estimates. Relative to the previous tests, both the DI and the ARX-OE identification gain estimates are significantly improved. The DI gain estimates are particularly accurate, with the largest PE being 22.9%. This is surprising considering the DI method is incorrectly specified with a first order noise model. From the PE results it appears as if incorrect noise model order specification has not had a detrimental effect on the DI method for the MIMO MD process. The final test aims to confirm this observation by analyzing the controller response after tuning with the DI estimates.

Table 12.5: Percent error of DI and ARX-OE estimates for Test 3.1

Method   Parameter       PE (%)
Both     Time Delay      (0, 0, 4.3, 0, 57.1, 5.3)
Direct   Gain            (8.6, 3.6, 0.4, 5.3, 19.3, 22.9)
Direct   Time Constant   (1.8, 3.9, 4.8, 62.6, 66.4, 126.4)
ARX-OE   Gain            (9.2, 11.6, 21.5, 2.2, 44.4, 38.5)
ARX-OE   Time Constant   (3.2, 6.9, 21.4, 59.4, 108.4, 116.1)

12.3.2 Test 3.2: Tuning with direct identification

For the final test the sixth order high-pass filter noise models are used with ID excitation, but now the DI estimates are used to tune the controller. Just 800 samples after the mismatch is introduced it is successfully detected by the PM algorithm, which triggers an identification experiment to begin at sample number 6400.
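The exact noise filters used in the simulations are not restated here, but the qualitative difference between a first order and a sixth order high-pass noise model can be sketched by cascading simple discrete high-pass stages (the pole location a is an illustrative assumption):

```python
import numpy as np

def hp_stage(x, a=0.9):
    """One first order discrete high-pass stage: y[k] = a*y[k-1] + a*(x[k] - x[k-1])."""
    y = np.zeros_like(x)
    for k in range(1, len(x)):
        y[k] = a * y[k - 1] + a * (x[k] - x[k - 1])
    return y

def hp_noise(e, order=1, a=0.9):
    """Pass white noise e through `order` cascaded stages -> order-th order high-pass noise."""
    v = np.asarray(e, dtype=float)
    for _ in range(order):
        v = hp_stage(v, a)
    return v

rng = np.random.default_rng(0)
e = rng.standard_normal(4000)
v1 = hp_noise(e, order=1)   # first order noise model (Trials 1-2)
v6 = hp_noise(e, order=6)   # sixth order noise model (Trial 3)
```

A high-pass noise model concentrates the disturbance energy at high frequencies, so a higher order filter yields a disturbance whose spectrum rolls off more sharply toward low frequencies; this richer high-frequency content may be what helps the identification in this trial.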
The plant input profiles from the closed-loop system are shown in Figure 12.21. Once again the smaller prediction horizon has resulted in a lower magnitude of input variation during the identification experiment. Furthermore, both inputs are significantly less disrupted by the set-point change after adaptive tuning than before.

Figure 12.21: Raw input profiles for Test 3.2

Similarly, the outputs presented in Figure 12.22 show only a small amount of variation during the excitation experiment and thus clearly meet their respective constraints. At first glance it appears that all of the output responses to the set-point change are improved after adaptive tuning is conducted.

Figure 12.22: Raw output profiles for Test 3.2

The various output responses to the first set-point change are presented more clearly along with their MSE values in Figure 12.23. By now these dynamic responses under MPM have become redundant, since the behavior is essentially the same for each test. The second set-point change is conducted after the controller is tuned using the DI estimate. The output response to this set-point change is shown in Figure 12.24. Now both moisture outputs portray an MSE of zero, although this is only because the values are rounded to two decimal places. Comparing the dynamic responses before and after adaptive tuning demonstrates an improvement in each of the output profiles after the controller is re-tuned. The BW response, which is the only one under significant MPM, is significantly improved after the identification experiment is conducted. Comparing with the previous test demonstrates that the BW response was slightly better with the ARX-OE controller tuning and the reel moisture response was slightly better with the DI controller tuning.
Ultimately, the difference between tuning with the DI or ARX-OE identification is fairly negligible for this trial in terms of controller performance.

Figure 12.23: Dynamic response with MPM for Test 3.2

Figure 12.24: Dynamic response after tuning for Test 3.2

As a final analysis, the PE values for the DI and ARX-OE estimates are presented in Table 12.6. Once again some of the time delay values are incorrectly estimated. Failure to correctly identify the time delays has been a recurring theme throughout these results. Typically the estimates are close and most are correct, but the time delays certainly represent a significant source of error within the current setup. Surprisingly, the DI gain estimates and all but one of the DI time constant estimates are fairly accurate. It appears as if changing the noise model from first to sixth order did not have any detrimental effect on the DI estimates. The ARX-OE estimates are also fairly accurate, with the exception of a couple of poorly estimated time constants (likely compensating for the incorrect time delay estimates).
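Since incorrect delay estimates are a recurring problem, one standard remedy is to pre-estimate each delay by scanning the input-output correlation over candidate lags. A sketch on synthetic data (this is a generic estimator, not the method used in the thesis):

```python
import numpy as np

def estimate_delay(u, y, max_delay):
    """Return the lag that maximizes the correlation between u and the shifted y."""
    best_d, best_c = 0, -np.inf
    for d in range(max_delay + 1):
        c = np.corrcoef(u[:len(u) - d], y[d:])[0, 1]
        if c > best_c:
            best_c, best_d = c, d
    return best_d

rng = np.random.default_rng(0)
u = rng.standard_normal(500)
true_delay = 7
# Output is the input delayed by 7 samples plus measurement noise
y = np.r_[np.zeros(true_delay), u[:-true_delay]] + 0.1 * rng.standard_normal(500)
print(estimate_delay(u, y, max_delay=20))  # 7
```

For dynamic processes the correlation peak is smeared by the plant dynamics, so in practice such a pre-estimate would seed, rather than replace, the delay search inside the identification.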
From the three different trials, the experimental results demonstrate that the ID algorithm is superior to a PRBS excitation and that the DI method provides slightly better results than the ARX-OE method even if the noise model order is incorrectly specified.

Table 12.6: Percent error of DI and ARX-OE estimates for Test 3.2

Method   Parameter       PE (%)
Both     Time Delay      (13.3, 0, 4.3, 22.2, 42.9, 10.5)
Direct   Gain            (17.4, 3.0, 1.0, 0.8, 8.4, 4.4)
Direct   Time Constant   (27.3, 5.6, 4.7, 8.3, 15.1, 102.8)
ARX-OE   Gain            (25.1, 13.9, 20.6, 2.0, 43.7, 37.6)
ARX-OE   Time Constant   (35.3, 6.1, 18.7, 49.0, 74.3, 145.4)

Chapter 13

Conclusion

To summarize the results from the previous chapter, the adaptive control framework is capable of consistently identifying MPM, generating an excitation signal that meets IO constraints, and identifying a new process model. The third aspect of the adaptive control framework, i.e., system identification, is likely the part that could be improved upon the most. For the most part the parameter estimates are fairly accurate, but there are certain transfer functions that are consistently more difficult to estimate than others. Furthermore, the SI technique correctly identifies most of the time delay values, but inaccuracies in a few of the transfer function time delay estimates are problematic. One potential solution is to further analyze the scores from the MPM detection algorithm and only update the parameters of the transfer functions that exhibit MPM.

In all of the presented tests the output response to a set-point change of the transfer function that underwent MPM is significantly improved after the adaptive tuning. One fairly surprising result is that the DI method provides slightly better results than the ARX-OE method, even when the noise model order is incorrectly specified. Overall the results of the DI and ARX-OE identification estimates are very comparable, particularly the output response profiles.
It is recommended that further studies be carried out on this adaptive control framework to determine ways to improve the closed-loop system identification. One potential solution could be to use a longer duration for the excitation experiment. Determining a better time delay estimation method, such as one incorporating available process data, could also provide improved identification results.

Comparing the results of the first two trials demonstrates that the ID method enables more accurate identification of the closed-loop system than a simple PRBS excitation. Both the dynamic response of the outputs and the PE values of the parameter estimates were improved by using the ID algorithm. Furthermore, the ID technique is able to provide better control of the input and output variation during the excitation experiment than the PRBS excitation. Particularly in the final trial, where the prediction horizon is set to three, the ID algorithm is able to facilitate accurate SI results with only a small amount of IO deviation from the set-point. However, the change in noise model order from first to sixth could have influenced the excitation experiment as well. To follow up, the prediction horizon could be changed to four while using the sixth order noise models.

Each aspect of the adaptive control framework is capable of online implementation. Only the ID algorithm is computationally cumbersome, which is why the prediction horizon is limited to values of four or less. Lowering the prediction horizon to three in the third trial appears to result in a lower magnitude of excitation, but the parameter estimates are just as accurate regardless. It is important to note that online implementation of the ID technique is limited to MIMO systems of relatively small dimension, since the number of computations scales exponentially with the dimension of the MIMO system.
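The exponential growth can be made concrete: if the input design searches exhaustively over a finite set of candidate moves for each input at each step of the horizon, the number of combinations is levels^(inputs × horizon). The numbers below are a hypothetical illustration of the scaling, not the actual problem sizes solved in the trials:

```python
def n_candidates(n_inputs, horizon, levels=2):
    """Exhaustive candidate count for a combinatorial input-design search."""
    return levels ** (n_inputs * horizon)

# Hypothetical two-input system with binary candidate moves per step
print(n_candidates(2, 3))  # 64
print(n_candidates(2, 4))  # 256
```

This is why shortening the horizon from four to three shrinks the search so sharply, and why the search grows quickly with either the number of inputs or the horizon length.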
One beneficial quality of the ID algorithm is that it can be solved through parallel computation. Dividing the expensive search over candidate combinations between multiple processors could substantially increase the speed of each computation.

Ultimately, this report has successfully demonstrated the functionality and performance of the current implementation of the adaptive control framework. Further testing of the various adaptive control aspects is recommended to better understand the potential limitations and develop improvements. Recommendations for future studies include testing the MPM detection algorithm with different amounts of mismatch in each of the parameters of each of the transfer functions. Furthermore, it is recommended to study different variations of the excitation experiment, such as altering the duration and the prediction horizon. Although there is room for further development, the contributions made in collaboration with fellow colleagues have resulted in a framework that can automatically detect MPM and re-tune an MPC controller while maintaining feedback control. This work provides a substantial contribution in developing a framework for the adaptive tuning of a MIMO model-based control system without requiring expensive open-loop identification experiments or operator intervention.
Appendix A

Additional Sheet Estimation Results

Table A.1: Simulation parameters for Trial 3, 100 CD bins

Parameter   Description                        Value
wsheet      sheet width (m)                    6
vscan       scan speed(s) (m/s)                1
tscan       scan time (s)                      6
vsheet      sheet speed (m/s)                  3
θ           traversing angle from MD (°)       18.4
nCD = nB    number of CD bins                  100
nS          number of scans (per scanner)      2
ωMD         frequency of MD sinusoid(s) (Hz)   0.08, 0.16

Table A.2: Simulation parameters for Trial 3, 60 CD bins

Parameter   Description                        Value
wsheet      sheet width (m)                    8
vscan       scan speed(s) (m/s)                1
tscan       scan time (s)                      8
vsheet      sheet speed (m/s)                  1
θ           traversing angle from MD (°)       45
nCD = nB    number of CD bins                  60
nS          number of scans (per scanner)      2
ωMD         frequency of MD sinusoid(s) (Hz)   0.062, 0.12

Table A.3: Simulation parameters for Trial 3, 60 CD bins, 4 scans

Parameter   Description                        Value
wsheet      sheet width (m)                    8
vscan       scan speed(s) (m/s)                1
tscan       scan time (s)                      8
vsheet      sheet speed (m/s)                  8
θ           traversing angle from MD (°)       7.1
nCD = nB    number of CD bins                  60
nS          number of scans (per scanner)      4
ωMD         frequency of MD sinusoid(s) (Hz)   0.062, 0.12

Figure A.1: Simulated sheet with 100 CD bins and 4 scans

Figure A.2: Compressive sensing estimate with 100 CD bins and 4 scans

Figure A.3: MD-CD separation results with 100 CD bins and 4 scans

Figure A.4: Simulated sheet and CS estimate with 60 CD bins and 4 scans

Figure A.5: MD-CD separation results with 60 CD bins and 4 scans
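The tabulated traversing angles are consistent with simple scan geometry: while the head crosses the sheet at vscan the sheet advances at vsheet, so the scan path makes an angle θ = arctan(vscan / vsheet) with the machine direction. A quick check against the three tables (the relationship is inferred from the tabulated values):

```python
import math

def traverse_angle(v_scan, v_sheet):
    """Scan-path angle from the machine direction, in degrees."""
    return math.degrees(math.atan(v_scan / v_sheet))

print(round(traverse_angle(1, 3), 1))  # 18.4  (Table A.1)
print(round(traverse_angle(1, 1), 1))  # 45.0  (Table A.2)
print(round(traverse_angle(1, 8), 1))  # 7.1   (Table A.3)
```

A smaller angle (faster sheet relative to the scanner) means each scan line is more nearly aligned with the MD, which changes how MD and CD variations mix in the scanned signal.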