UBC Theses and Dissertations

Applications of spectroscopy and chemometrics in the pulp and paper industry Christy, Ashton 2019

Full Text

Applications of Spectroscopy and Chemometrics in the Pulp and Paper Industry

by

Ashton Christy

M.Sc., The University of British Columbia, 2016
B.Sc. Hons., The University of British Columbia, 2012

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
Doctor of Philosophy
in
THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Chemistry)

The University of British Columbia
(Vancouver)

October 2019

© Ashton Christy, 2019

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the dissertation entitled:

Applications of Spectroscopy and Chemometrics in the Pulp and Paper Industry

submitted by Ashton Christy in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Chemistry.

Examining Committee:
Edward R. Grant, Chemistry & Physics, Supervisor
David Chen, Chemistry, Supervisory Committee Member
Keng-Chang Chou, Chemistry, Supervisory Committee Member
Alex Wang, Chemistry, Supervisory Committee Member
Roman Krems, Chemistry, University Examiner
Mark MacLachlan, Chemistry, University Examiner
Alex Brolo, Chemistry, University of Victoria, External Examiner

Abstract

The pulp and paper industry stands to benefit immensely from the development of automated process control technologies that provide real-time feedback about the quality of in-process product. Current methods are destructive and labor-intensive wet-chemical assays, which cannot be implemented in an on-line setting. Rapid on-line alternatives to these methods hold great promise for improving efficiency and reducing costs, as well as providing the opportunity to make product quality guarantees based on data collected from in-process samples.

This thesis presents the progress made on the development of two such automated methods. The first, principal method couples Raman spectroscopy with chemometric analysis to model and predict value-critical properties of pulp products, with a focus on strength properties.
The second method implements machine vision in the detection of contaminants in in-process pulp, the presence of which has a deleterious effect on product strength, and therefore on value. In both cases, we have taken these techniques from academic proofs-of-concept to industrial trials, one in a pilot plant and the other in a pulp mill. This is a significant milestone in any academic-industrial collaboration.

The first two chapters provide overviews of the nature of pulp and the current state of analytics in the industry, followed by a theoretical discussion of the methods used in this project. Following this are three chapters documenting the progress made towards the development of the Raman probe system, and a chapter presenting the machine vision system used to detect pulp contaminants. Finally, there is a discussion of some of the ongoing challenges, as well as the future steps that will be undertaken to bring these technologies to full-scale on-line implementation in a working pulp mill.

Lay Summary

The pulp and paper industry is very interested in developing automated quality control technologies, so that it can save time and money during the manufacturing process. This thesis presents progress towards the development of two such technologies. The first uses Raman spectroscopy and chemometrics to predict important properties of pulp products, with the goal of making traditional measurements, which are lengthy and destructive, unnecessary. The second uses machine vision to find contaminants in pulp products. In both cases, we have taken these techniques from academic proofs-of-concept to trials in an industrial setting, a very important step along the path to implementing them in a working pulp mill.

Preface

This work is based on foundational theoretical work performed by former group members Alison Bain and Najmeh Tavassoli, published in their respective theses. [1, 2] The progress of this project has been guided by the input of Dr. Paul Bicho of Canfor Pulp Innovation.

Chapter 3 was adapted from a technical report to Canfor Pulp Innovation, incorporating core work from Alison’s thesis, and expanded upon by me. [3] Coding, data collection, and data analysis were performed by the author. Assistance and guidance with some aspects of data collection and analysis were provided by various collaborators, including Elias Sundvall, Otto Lindeberg, Flora Iranmanesh, Kiara Grant, and Michelle Li. All figures and photos in this work were produced by the author.

The pulp and paper samples used throughout this work were provided by Canfor Pulp Innovation and Domsjö Fabriker AB. PulpEye AB provided us with the model brightness chamber discussed in Chapter 4.

Table of Contents
Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
List of Equations
List of Acronyms
Glossary
Acknowledgments
1 Introduction
1.1 Industry 4.0
1.2 “Smart Manufacturing” in the Pulp and Paper Industry
1.2.1 Scope of the Project
1.3 Chemical and Physical Attributes: Pulp as a Biomaterial
1.3.1 Hemicellulose
1.3.2 Lignin
1.4 Industrial and Commercial Attributes: Pulp as a Commodity
1.4.1 Trees to Bales: The Pulping Process
1.5 Properties of Cellulose Fibers
1.5.1 Chemical Properties
1.5.2 Morphological Properties
1.5.3 Characterizations of Pulp
2 Process Control: Theory and Methods
2.1 Machine Vision
2.1.1 Image Enhancement: Histogram Equalization
2.1.2 Image Analysis
2.1.3 Image Classification
2.2 Vibrational Spectroscopy
2.2.1 General Theory
2.2.2 Mid-Infrared and Near-Infrared Spectroscopy
2.2.3 Raman Spectroscopy
2.3 Chemometrics and Multivariate Analysis
2.3.1 Discrete Wavelet Transform
2.3.2 Partial Least-Squares Regression
3 Modelling Dissolving Pulp Viscosity
3.1 Introduction
3.2 Materials and Methods
3.2.1 Data Treatment
3.3 Results: The Laboratory Proof-of-Concept
3.4 Discussion
3.5 Conclusion
4 Incorporating Raman Spectroscopy into an On-Line Analysis Module: A Laboratory Model
4.1 Introduction
4.2 A Benchtop Brightness-Raman Chamber: Design and Development
4.3 Materials and Methods
4.3.1 Pad Formation and Analysis
4.3.2 Data Treatment
4.3.3 Gaussian Process Regression
4.4 Results: The Mini-Pulp Factory
4.4.1 Comparing PLS and GPR
4.4.2 Comparing Matlab and ExtractEye
4.5 Discussion
4.6 Conclusion
5 Fiber Probes and Data Fusion: Implementation in a Pilot Plant
5.1 Introduction
5.2 Materials and Methods
5.2.1 Probe Design: Fiber vs. Free-Space
5.2.2 Data Treatment
5.2.3 Data Fusion: What the PulpEye Tells Us
5.3 Results
5.3.1 Comparison between Free-Space and Fiber-Coupled Probes
5.3.2 Data Fusion
5.3.3 Comparison between Fiber-Coupled Probes
5.4 Discussion
5.5 Conclusion
6 Automated Detection of Contaminants in Pulp Using Machine Vision
6.1 Introduction
6.2 Materials and Methods
6.2.1 UV-Vis Image Acquisition
6.2.2 NIR Image Acquisition
6.2.3 Sample Mounting Systems
6.2.4 Image Processing using MATLAB
6.2.5 Image Processing using ENVI
6.3 Results
6.3.1 UV-Vis Imaging
6.3.2 Laboratory Tests of NIR Imaging
6.3.3 Mill Test of NIR Imaging
6.4 Discussion and Conclusions
7 Future Work: Advancing to the Mill
7.1 NIR Imaging
7.2 Raman Hardware Integration
7.3 Calibration Transfer
7.4 Facilitating Data Analysis
7.5 Ongoing Modelling Refinements
7.5.1 Extending Data Fusion
7.5.2 Accounting for Refining Energy
7.6 Unbleached Pulp: A Way Forward with Raman?
7.7 Concluding Remarks
Bibliography
A Scripts and Functions
A.1 Raman Data Processing - MATLAB: TOGA, PLS, and GPR
A.1.1 Main Processing Script
A.1.2 Load Data Set Function
A.1.3 Load Independent Validation Set Function
A.1.4 Partial Least-Squares Regression Modelling Functions
A.1.5 PLS Latent Variable Optimization Function
A.2 Raman Data Processing - Python, SQL, and ExtractEye
A.3 Near-Infrared Image Processing
A.3.1 Feature Detection Script
A.3.2 Edge Detection Script
A.3.3 ENVI Script
B Empirical Classification Rule Determination Results

List of Tables
Table 2.1 Comparison of spectroscopic techniques.
Table 3.1 PLS model results for viscosity, built with dissolving pulp data.
Table 3.2 Viscosity predictions for reserved samples from Domsjö Fabriker AB. All values in cm³/g.
Table 4.1 PLS and GPR model results from brightness pad data.
Table 5.1 List of PulpEye data fusion parameters.
Table 5.2 PLS model results from dissolving and Northern Bleached Softwood Kraft (NBSK) pulp samples, comparing free-space and fiber-coupled probe performance.
Table 5.3 PLS model results from NBSK pulp samples, comparing spectral (DWT coefficients) and fused (DWT coefficients and PulpEye data) datasets, gathered with the fiber-coupled Raman probe.
Table 5.4 PLS model results from NBSK pulp samples, gathered with our custom-built fiber-coupled Raman probe (Fig. 5.2).

List of Figures
Figure 1.1 The basic structure of cellulose.
Figure 2.1 An example of Histogram Equalization and CLAHE.
Figure 2.2 Harmonic (left) and anharmonic (Morse) (right) oscillators. Arrow thickness indicates the relative probability of each transition.
Figure 2.3 Jabłoński diagram illustrating IR, NIRS, and Raman spectroscopies.
Figure 2.4 A basic Raman spectrometer.
Figure 2.5 A basic illustration of the Discrete Wavelet Transform algorithm, decomposing a source spectrum (left) into background (right, top), noise (right, bottom), and signal (right, center) components.
Figure 2.6 Schematic of the Partial Least-Squares Regression algorithm, shown in Eqs. 2.13 and 2.14.
Figure 2.7 Randomized target vector plots for PLS models built with Raman spectra of dissolving pulps. Left: Overfit model; A = 10. Right: Well-fit model; A = 4. These results are discussed in Ch. 5.
Figure 3.1 Factor optimization charts for DWT-treated (left) and TOGA feature-selected (right) data. The minimum value for C1 represents the optimal number of components, 5 and 3 respectively.
Figure 3.2 650 Raman spectra of dissolving pulp samples used to build PLS models. Above: Normalized spectra. Below: Spectra reconstructed from DWT coefficients, filtered by removing the highest two detail levels and the approximation level.
Figure 3.3 Left: Prediction plot of a PLS model built with the SIM algorithm showing 477 training points (red) and 204 validation points (green). Right: A similar model, built instead with averaged spectra, showing 159 training points (red) and 68 validation points (green). Blue diamonds in both plots indicate the predictions for a specially reserved test set of 5 spectra whose viscosities are unknown. All axes in cm³/g.
Figure 3.4 Left: Distribution plot for PLS-predicted viscosities, showing measured (red) and predicted (blue) values. Right: Error plot for the same model, ordered by value.
Figure 4.1 A brightness chamber.
Figure 4.2 Schematic showing the integration of Raman measurements with a PulpEye analysis module.
Figure 4.3 Left: CAD rendering of the Raman probe mount for the brightness module. Right: The probe mount, as constructed.
Figure 4.4 Drawing of the modified brightness chamber.
Figure 4.5 Our Raman-equipped model brightness module: the “mini-pulp factory”.
Figure 4.6 705 Raman spectra of dissolving pulp pads, produced with the brightness module. Above: Normalized spectra. Below: Spectra reconstructed from DWT coefficients, filtered by removing the highest two detail levels and the approximation level.
Figure 4.7 PCA plot of 132 Raman spectra of NBSK samples, averaged and sorted by suspension. Green: first suspension. Red: second suspension (first redispersion). Blue: third suspension (second redispersion).
Figure 4.8 Models of dissolving pulp viscosity. Top Row: Prediction plots. Bottom Row: Distribution plots. Left Col.: PLS model, with 12 components. Right Col.: GPR model, with 5-fold cross-validation. Values in cm³/g.
Figure 4.9 Prediction plots for dissolving pulp viscosity, from Matlab (left) and ExtractEye (right). Values in cm³/g.
Figure 4.10 First two loadings (p1, p2) and weights (w1, w2) of dissolving pulp viscosity models from Matlab (red) and ExtractEye (blue).
Figure 4.11 Histogram of viscosities of dissolving pulp samples.
Figure 5.1 Raman spectrometer (free-space) mounted to a brightness chamber in PulpEye’s workshop, without the watertight equipment cabinet.
Figure 5.2 Fiber-coupled Raman probe with spatial filter.
Figure 5.3 Comparison of free-space (left) and fiber-coupled (right) Raman spectra of dissolving pulp samples collected in PulpEye’s workshop. x-axis shows pixel number, ranging from 265 to 2005 cm⁻¹. Note the characteristic sapphire peaks at 90, 200, and 420 px, corresponding to 376, 414, and 641 cm⁻¹.
Figure 5.4 Weighted loading plot (w1 vs. w2) for a tensile breaking length model built using fused Raman (red) and PulpEye (green) datasets. PulpEye parameters are labelled.
Figure 5.5 Comparison of models for tensile breaking length, built using free-space spectral reconstructions (left) and fiber-coupled DWT coefficients fused with PulpEye data (right). All NBSK samples were used in Raman data collection and PLS modelling.
Figure 5.6 Comparison of Raman spectra of NBSK pulp samples, collected with the WP785 (left) and our custom-built (right) fiber probes. x-axis shows pixel number, ranging from 265 to 2005 cm⁻¹.
Figure 5.7 Flowchart illustrating the Raman data acquisition and processing pipeline.
Figure 6.1 The NIR flashlight setup for small pulp samples.
Figure 6.2 Left: CAD rendering of the light box. Right: The light box, as constructed. Adjacent monitor shows a live NIR view of the back-illuminated pulp sample.
Figure 6.3 A: Raw (blue) and moving average (MA; red) filtered intensity histograms of an NIR shive image. B: Illustration of image binarization threshold determination using the triangle algorithm, applied to the data in A. Hypotenuse (dotted) and maximum distance (solid) lines shown; threshold value = 0.4604.
Figure 6.4 Shive feature, before and after trimming. Left: Cropped grayscale subimage showing feature. Right: Trimmed feature image.
Figure 6.5 Two-dimensional filter masks used during edge detection. Refer to Appendix A.3.2 for their numerical representations. Left: Gaussian mask, broad (type 1). Right: Sobel mask, semi-coarse (type 4).
Figure 6.6 Flowchart illustrating Near-Infrared (NIR) shive detection process.
Figure 6.7 Comparison of UV-Vis shive images. A: Ultraviolet reflectance illumination (385 nm LED). B: Ultraviolet transillumination (385 nm LED). C: Visible-light reflectance illumination (white LED).
Figure 6.8 Stages of NIR shive image edge detection, using MATLAB. A: Unprocessed JPEG image. B: Cropped, contrast-enhanced Raw image. C: Edge-detected Raw image. D: Largest binary region in edge-detected image, filled, cropped, and enlarged.
Figure 6.9 Stages of NIR shive image feature detection, using MATLAB. A: Histogram-equalized Raw image. B: Binarized Raw image. C: Feature-processed Raw image. D: Detected shive feature, circled, on original Raw image.
Figure 6.10 NIR shive image feature detection, using ENVI. Red filled areas are probable shives, green filled areas are possible shives.
Figure 6.11 Mill and imaging shive counts for handsheet samples - lab test.
Figure 6.12 NIR image of an NBSK pulp sheet, detecting dirt particles using ENVI. Blue circled regions indicate detected particles.
Figure 6.13 Mill and imaging shive counts for handsheet samples - mill test (using ENVI and Python).
Figure 6.14 Results of NIR shive image feature detection, using ENVI and MATLAB. Red areas are probable shives, green areas are possible shives. Field of view is approximately 19.4 × 12.9 cm. Shive count: 7.
Figure 6.15 Results of NIR shive image feature detection, using ENVI and Python 3.7. Red areas are probable shives, green areas are possible shives. Field of view is approximately 21.5 × 14.4 cm. Shive count: 13.
Figure 6.16 A highly creased unbleached pulp sample. Field of view is approximately 2.2 × 1.4 cm.
Figure 7.1 Ultraviolet fluorescence spectrum of an unbleached kraft pulp sample. Excitation wavelength = 190 nm.
Figure B.1 Raw results of empirical classification rule determinations; see previous page for detailed information.

List of Equations
1.1 Page Equation
1.2 Bonding Index
2.1 Normalized Image Histogram
2.2 Cumulative Distribution Function
2.3 Histogram Equalization Algorithm
2.4 Merge Algorithm Threshold
2.5 Frequency of a harmonic oscillator
2.6 Potential energy of a harmonic oscillator
2.7 Quantum energies of a harmonic oscillator
2.8 Morse potential of an anharmonic oscillator
2.9 Quantum energies of an anharmonic oscillator
2.10 DWT wavelet translation
2.11 DWT filter decomposition
2.12 General PLS Model
2.13 Orthogonal Bases Used in PLS
2.14 PLS Regression
2.15 PLS Algorithm
2.16 PLS statistics: R²Y (cumulative)
2.17 PLS statistics: Q² (cumulative)
2.18 Kalivas’ C1 Parameter for PLS Optimization
2.19 PLS statistics: RMSE
3.1 Monte Carlo-Uninformative Variable Elimination
4.1 General GPR Regression
4.2 Squared Exponential Covariance Kernel Function
4.3 Probabilistic GPR Model
4.4 Gaussian Process Regression (GPR): Definition of α
4.5 GPR: Log-Likelihood Maximization Expression
4.6 GPR: Expected Values of New Data
4.7 GPR: Probability Distribution of New Data
6.1 Image Moment
6.2 Eccentricity
6.3 Compactness
6.4 Texture Mean
6.5 Roundness

List of Acronyms

ATR-FTIR Attenuated Total Reflectance-Fourier Transform Infrared Spectroscopy, a widely used suite of spectroscopic techniques commonly bundled into a single instrument. The ATR portion of the instrument uses a crystal with a high refractive index to send an infrared evanescent wave into a sample. The reflected light is then collected by the FTIR portion of the instrument, where the data are converted from the time domain (the recorded interferogram) to the frequency domain, and are collected with very high sensitivity. [4]

CCD Charge-Coupled Device, a type of high-sensitivity photodetector that consists of an array of metal-oxide-semiconductor pixels, and that allows accumulated photoelectric charge to be shifted between pixels to an external amplifier. A more specialized, sensitive, and expensive alternative to a CMOS sensor.

CDF Cumulative Distribution Function, a mathematical expression representing the probability that a random variable will take a value less than or equal to an arbitrary evaluation point. The CDF of an image's intensity histogram is commonly used in machine vision applications.

CLAHE Contrast-Limited Adaptive Histogram Equalization, an algorithm to enhance image contrast that calculates histograms for various image regions (tiles) and linearizes the histograms’ cumulative distribution functions. The contrast-limited algorithm caps the histograms at an arbitrary value (the clip limit), so as to avoid amplifying pixel noise in homogeneous regions.

CMOS Complementary Metal-Oxide-Semiconductor sensor, a type of photodetector that consists of an array of metal-oxide-semiconductor pixels, each with an integrated amplifier unit. A cheaper and less sensitive alternative to a CCD.

DBS Dichroic Beamsplitter, an optical band-rejection filter that uses a dichroic optical coating to split light of a certain frequency range away from its incident beam path.

DWT Discrete Wavelet Transform, a method of multivariate analysis that decomposes a signal into a set of orthogonal wavelets. Some wavelet levels (the highest and lowest frequencies) are discarded, and the signal is reconstructed from the remaining levels.

GPR Gaussian Process Regression, a nonparametric probabilistic statistical technique used to build a predictive (regression) model for complex sets of variables, where correlations may not be readily apparent. It does not require a linear relationship between variables. Also called kriging.

HNF Holographic Notch Filter, an optical component that uses a holographically etched surface to block a very narrow range of frequencies.

IDL Interactive Data Language, a programming language developed for data analysis, used in conjunction with image processing software.

MCUVE Monte Carlo-Uninformative Variable Elimination, an algorithm to build a possibility template for use with TOGA. The template is determined by calculating hundreds of PLS models with randomized target vectors. [5]

NBSK Northern Bleached Softwood Kraft, a standard type of pulp produced from boreal softwood (i.e. coniferous) trees, used to reinforce writing paper and to manufacture kraft and tissue paper. [6]

NIRS Near-Infrared Spectroscopy, a spectroscopic technique using light in the near-infrared region, typically between 700 and 2500 nm. Its main advantage over conventional (mid-range) infrared spectroscopy is its greater penetration depth; NIR is commonly used in medicine.

PCA Principal Component Analysis, a multivariate statistical technique used to rank orthonormal components of a transformed data set by their contribution to the overall variance.

PLS Partial Least-Squares Regression, a multivariate statistical technique used to build a predictive (regression) model for complex sets of variables, where correlations may not be readily apparent. It requires a linear relationship between variables.

RMSEC Root-Mean-Square Error of Calibration, a measure of the uncertainty of a regression model’s training results; as such, the lowest possible value is preferred. [7] It differs from the RMSEP in that it considers the training (or calibration) data only.

RMSEP Root-Mean-Square Error of Prediction, a measure of the uncertainty of a regression model’s predictions; as such, the lowest possible value is preferred. [7] In this respect it behaves inversely to the more commonly used coefficient of determination (R²), for which higher values are preferred. It differs from the RMSEC in that it considers the validation data only.

SBI Shive Branch Index, a quantification of the shape of a shive, as proposed by Corscadden et al. [8]

SERS Surface-Enhanced Raman Spectroscopy, a Raman technique that employs surface adsorption to enhance the Raman effect of a sample by up to ten orders of magnitude, although the exact mechanism is still debated in the literature.

TOGA Template-Oriented Genetic Algorithm, a type of evolutionary multivariate processing technique that uses a fixed set of predictor variables to guide its iterative calculations, in order to minimize their variance. [5]

Glossary

As with any industry, the pulp and paper industry has its own specialized lexicon that may not be readily understandable by outsiders. This glossary collates and defines some of the industry jargon that is relevant to this thesis. For acronyms, see the previous section.

Product-Related Terminology

Dissolving Pulp
High-quality pulp destined for use in synthetic or reconstituted fiber products, such as rayon and acetate film base (used after nitrocellulose and before contemporary polyester film base). The pulp is typically derivatized by acetylation or xanthation. [9] The viscosity of the derivatized pulp solutions is a critical processing parameter that underlies the quality of the final product.

Northern Bleached Softwood Kraft (NBSK)
Northern Bleached Softwood Kraft (NBSK) is Canada’s premier pulp export product, known for its fiber uniformity, thickness, and length. It is typically marketed as a reinforcing pulp. Canadian NBSK consists mainly of Pinus contorta var. latifolia and Picea glauca, with minor amounts of Tsuga heterophylla, Pseudotsuga menziesii var. glauca, Thuja plicata, and Larix laricina. [6]
Eurasian NBSK consists mainly of Pinus sylvestris, with smaller amounts of Pinus sibirica, Picea abies, Picea obovata, Larix gmelinii, and Larix sibirica. Because of the large percentage of Larix-derived fibers, it is considered to be of inferior quality.
In the southern hemisphere, introduced Pinus radiata plantations produce NBSK pulp. It is also called Radiata Pine Softwood Kraft (RPSK).

Reinforcing Pulp
Softwood pulp, typically NBSK, destined for use as a reinforcement agent in newsprint, cardstock, and tissue.
High tensile strength is valuable for these applications.

Industry-Standard Measurements

Basis Weight / Grammage
Standard measure of product weight per unit area.
Reported in g/m² (grams per square meter; GSM).
Standards: TAPPI T 410; ISO 536.

Brightness (B)
Standard measure of a product’s reflectivity.
Reported as the percentage of blue (457 nm) light that is reflected.
Standards: TAPPI T 452, T 525; ISO 2470-2.
Note: the standards listed above are substantially different from one another. The ISO scale can exceed 100 due to the use of fluorescent whitening agents during pulp bleaching.

Burst Strength
Standard measure of a product’s resistance to pressure.
Reported in kPa (kilopascals).
Standards: TAPPI T 403, T 807; ISO 2758.

Degree of Polymerization (DP)
Standard measure of the length of cellulose chains.
Reported as a scalar, i.e. the number of individual monosaccharides comprising the average cellulose chain.

Freeness or Canadian Standard Freeness (CSF)
Standard measure of a product’s resistance to water flow (technically, hydrodynamic specific volume).
Reported in mL (milliliters).
Standards: TAPPI T 227; ISO 5267/2.

Kappa Number
Standard measure of the degree of delignification of pulp, related to pulp hardness or “bleachability”.
Reported as a scalar between 1 and 100.
Calculated as: $\kappa = (\%\ \text{lignin content}) / 0.15$.
Standards: TAPPI T 236; ISO 302.

L*, a*, and b*
Standard measures of a product’s color or shade, using the CIE L*a*b* color space.
L*: black-to-white scale (luminance; L* ∈ [0, 100])
a*: green-to-red scale (negative to positive)
b*: blue-to-yellow scale (negative to positive)
Standards: TAPPI T 1216.

Opacity
Standard measure of a product’s light absorption.
Reported as percentage of white light absorbed, measured in terms of diffuse reflectance.
Standards: TAPPI T 425, TAPPI T 519; ISO 2471.

Porosity (Air Resistance)
Standard measure of the air flow through a product, at a given pressure differential.
Reported in Gurley seconds, defined as the time required for 100 cm³ of air to pass through one square inch of product with a pressure difference of 0.176 PSI.
Standards: TAPPI T 460; ISO 5636/5.

R18
A measure of the amount of insoluble material in dissolving pulp samples.
Measured by dissolving pulp in 18% NaOH solution.
Reported as a mass percentage (% m/m).

Refining Energy
The energy applied to a pulp batch during refining. Unrefined pulps are typically reported as having a refining energy of 0, rather than being categorized separately from refined pulps.
Pulp is often refined to achieve a particular freeness (CSF) value.
Reported in kWh (kilowatt-hours).
Often also reported as the Specific Refining Energy (SRE), defined as the refining energy applied per ton of pulp (kWh/t).

Stiffness
Standard measure of a product’s resistance to being bent.
Reported in MN m/kg. Calculated as: $S = \frac{\max(dF/dl)\,(l/w)}{\text{Basis Weight (g/m}^2)}$, where $F$ is force, $l$ is sample length, and $w$ is sample width.
Standards: TAPPI T 489; ISO 2491.

Stretch
Standard measure of a product’s elongation at point of rupture.
Reported as a percent.
Standards: TAPPI T 494.

Tear Index
Standard measure of a product’s resistance to being torn.
Reported in mN m²/g.
Calculated as: $TI = \frac{\text{Tear Resistance (mN)}}{\text{Basis Weight (g/m}^2)}$.
Standards: TAPPI T 414; ISO 1974.

Tensile Breaking Length
Standard measure of a product’s resistance to stretching.
Reported in km (kilometers), representing the minimum length of a sample at which it would break under its own weight, if held vertically. Calculated as: $BL = \frac{\text{Tensile Strength (N)}}{\text{Sample Width (m)} \times \text{Basis Weight (g/m}^2) \times 9.807\ \text{m/s}^2}$.
Measured with a force probe, with its prongs initially placed a fixed distance from one another (thus different from Zero-Span Breaking Length).
Standards: TAPPI T 494.

TEA (Tensile Energy Absorption)
Standard measure of the work required to break a product.
TEA is the integral of the tensile force-stretch percent curve.
Reported in J/kg. Calculated as: $TEA = \frac{\text{Energy Applied (J/m}^2)}{\text{Basis Weight (g/m}^2)}$.

Viscosity
Standard measure of the average Degree of Polymerization (DP) of the cellulose fibers of sulfite pulps.
Reported in cm³/g (cubic centimeters per gram).
Measured with a capillary viscometer.
Standards: TAPPI T 230; ISO 5351-1.
Note: with respect to sulfite pulps, the industrial definition of “viscosity” is equivalent to the academic definition of intrinsic viscosity; this definition will be used herein without distinction.
Kraft pulps are characterized using dynamic viscosity, measured in cP.

Yellowness
Standard measure of a product’s reflection of yellow light.
Reported as a scalar normalized between 0 and 100.
Standards: TAPPI T 1216; ASTM E313.

Zero-Span Breaking Length
Standard measure of a product’s resistance to rupture.
Reported in km (kilometers), representing the minimum length of a sample at which it would break under its own weight, if held vertically.
Measured with a force probe, with its prongs initially placed 0 mm from one another (thus different from Tensile Breaking Length).
Often measured and reported for both wet and dry samples.
Standards: TAPPI T 231; ISO 15361.

Acknowledgments

I would like to thank the following individuals and organizations for their support, moral and material.

In academia:
• Dr. Ed Grant
• The members of the Grant group: Luke Melo, Matt Kowal, Mahyad Aghigh, Kevin Marroquín Madera, Ruoxi Wang
• The past members of the Grant group, especially those who laid the foundations for my work: Najmeh Tavassoli, Allison Bain, and Zhiwen Chen
• The coterie of undergraduates and Co-op students who helped me with data collection: Flora Iranmanesh, Kiara Grant, Michelle Li, and Keanson Phanvan

In industry:
• Paul Bicho, and the employees of Canfor Pulp Innovations
• Elias Sundvall, and the employees of PulpEye AB
• Otto Lindeberg
• Mike Doucette, and the employees of Prince George Pulp Mill

Elsewhere:
My family and friends, who know who they are and require no further introduction or mention.

I would also like to acknowledge the generous financial support afforded me by:
• Canfor Pulp LP
• Mitacs
• The Natural Sciences and Engineering Research Council of Canada

Chapter 1
Introduction

1.1 Industry 4.0

The year 1750 is the baseline against which the effects of modern technological progress are compared: it defines the beginning of industrialized society.
Since that time, we have radically transformed the way we produce goods, which has had far-ranging ramifications on our daily lives. This has happened in a number of waves.

In the late eighteenth and early nineteenth centuries, the Industrial Revolution was the first such wave. It was primarily predicated on the exploitation of coal, steel, and rubber, bolstered by the widespread adoption of hardy and high-yield American food crops in Eurasia.

The second wave of industrial development occurred in the late nineteenth century. This wave, at the time called the “American system”, introduced the mass production of interchangeable parts along an assembly line, a concept that is second nature in the present day. This drastically improved both productivity and product quality by more effectively dividing labor between workers.

In the mid-twentieth century, the development of assembly line automation further revolutionized production. Putting routine manufacturing under the control of computers allows for far more precise and rapid assembly; automated machinery usually outperforms its human counterparts, and has allowed for further miniaturization of components.

We are fast approaching a fourth wave of industrial development, which has been termed “Industry 4.0” or “smart manufacturing”. This involves the application of the Internet of Things concept to the production line, so that the control systems generate continuous in-process product quality information, and can auto-adjust manufacturing parameters to correct any deviations. In some ways this is nothing but a modification of the third-wave automation revolution; the control systems are made able to communicate with one another directly, instead of relying on human input for process control. However, this has rather profound implications, and presents a host of new challenges.

Immediately, one might think that “Industry 4.0” would eliminate the need for any human involvement.
This is not quite true, as industrial labor will shift from operating control systems to coding, monitoring, and maintaining operational systems. But this stage is still in the future; for now, we are in the design and testing phase of these “smart” systems.

1.2 “Smart Manufacturing” in the Pulp and Paper Industry

There is great interest in the pulp and paper industry in pursuing the development of automated diagnostic and control technologies. Aside from savings due to reduced labor costs, such technologies would provide a number of benefits specific to this industry. The manufacturing process, discussed below, is highly energy-intensive; the ability to receive immediate feedback about the process provides room for real-time optimization, which has the potential to reduce energy consumption (and consequently, cost).

The pulping process is also greatly affected by natural variations in the feedstock, often in ways that cannot be easily accounted for. Monitoring resource streams generally requires meticulous record-keeping and expert assessment of feedstock quality. Although reliable, such methods are very labor-intensive, and cannot provide concrete predictions of their downstream effects on product quality. Algorithmic assessment of resource streams, coupled with multivariate prediction models, could do so.

Likewise, on-line product quality monitoring is currently inhibited by the nature of the standard analytical techniques employed by the industry. These usually involve removal of a sample of in-process pulp from the production line, which is then subjected to time-intensive and destructive testing. Oftentimes, the wet-chemical analytical tests require the samples be refined in a pilot plant, generally located offsite from the mill and requiring yet more energy. Introducing “smart” methods entails the development of alternatives to such traditional testing techniques that provide rapid and reliable feedback and do not require the removal of pulp from the production line.
Common approaches include imaging and spectroscopy; however, advanced data processing is required to translate these raw data into useful information. [10, 11]

With data predicting the properties of refined pulps, a manufacturer can make quality guarantees about their unrefined product when they sell it without having to engage in wet-chemical testing and small-scale refining, thus greatly reducing their energy, time, and labor demands. This work presents several steps along the path to automating this process.

1.2.1 Scope of the Project

The main goal of this project is to develop a methodology that applies Raman spectroscopy and chemometrics to model important characterizations of in-process pulp samples, with an emphasis on predicting strength properties. We have made significant progress towards this end: we have developed a Raman probe system and a suite of multivariate analysis techniques that allow us to accurately predict these properties, effectively bringing this methodology from an academic laboratory setting to a working industrial pilot plant. This transition of technology is significant, as the gulf between academia and industry is often quite wide; this is generally due to the high opportunity costs associated with untested technology, as well as a general lack of communication between the two sectors. Successfully bridging this gap pushes the technology forward to the point where industrial players take note and re-assess the opportunity costs, greatly increasing the chances that it will find its way into full-scale implementation.

The next chapter will introduce process control in general terms, and then provide a theoretical background to the methods described herein. This is followed by discussions of the various research stages (Chapters 3, 4, and 5), moving from a benchtop proof-of-concept Raman probe to a full-scale demonstration of hardware and software implementation in a pilot plant setting.
Future work, to be conducted during a three-year postdoctoral fellowship,¹ will be outlined in Chapter 7, highlighting the remaining steps towards mill integration. Chapter 6 discusses a different application of process control to the pulp industry: using machine vision to identify contaminants in in-process unbleached pulp.

As the focus of this thesis is on analyzing in-process pulp products, a basic discussion of cellulosic materials and their production is warranted before delving into the methods that are employed in their analysis.

1.3 Chemical and Physical Attributes: Pulp as a Biomaterial

Figure 1.1: The basic structure of cellulose.

Cellulose, the fundamental component of pulp, is a naturally occurring organic polymer consisting of long chains of D-glucose units, strung together by β-1,4-glycosidic bonds (see Fig. 1.1). Cellulose fibers are the primary structural component in the cell walls of all green plants. These fibers are composed of many individual cellulose chains bundled together by inter-chain hydrogen bonding; smaller bundled side chains, termed fibrils, are also common. Cellulose in fibers takes one of two forms, crystalline or amorphous, the former of which is fundamental to the material’s strength. Cellulose fibers may reach several millimeters in length, and are bundled together into networks, which form the fundamental component of pulp. [9, 12–15]

However, like all natural products, cellulose fibers are almost never chemically pure. Fibers isolated from woody plant material contain lignin and hemicellulose, in addition to cellulose. These inclusions, each of which can make up as much as one quarter of the overall mass of the fiber, serve important evolutionary purposes for the plants, but can affect the human uses of fibers.
[13, 14]

¹ Mitacs Accelerate award #IT13700, in continued partnership with Canfor Pulp LP.

1.3.1 Hemicellulose

The primary chemical difference between cellulose and hemicellulose is that the latter may contain monosaccharides besides glucose, which are sometimes acidified. Hemicellulose is polymeric and found in green plant cell walls, though because of its heterogeneous (and often branched) structure, it is often physically weaker and more susceptible to chemical alteration than is cellulose. Two of the most common hemicelluloses are O-acetylgalactoglucomannan and glucuronoxylan, typically comprising 20-30% of the material in softwood and hardwood fibers, respectively. [13, 15] These are commonly referred to by the shorthands mannan and xylan, and are substantially shorter than pure cellulose chains (80-120 monosaccharide units long, compared with ∼1,500). [13]

Hemicellulose is typically hydrogen-bonded to cellulose in fibers. When located at the surface of fibers, it can promote inter-fiber bonding, which to some extent reinforces the structural integrity of the cellulosic material; the mechanism will be discussed later (see Ch. 1.5.1). However, hemicellulose in higher concentrations stiffens the cellulose fibers to the extent that it weakens the strength of pulp products. [9, 13, 15, 16]

1.3.2 Lignin

Lignin is a vastly complex category of cross-linked phenolic biopolymers that are closely associated with cellulose. Chemically, its hydrophobicity helps cells resist water infiltration, and physically, it structurally reinforces cells by binding with cellulose microfibrils. Photooxidation of lignin is responsible for the age-related yellowing effect in some paper products.
[13, 14, 17, 18] Since this is an undesirable outcome for most applications, lignin removal is of utmost importance to, and in fact forms the backbone of, the pulping process.

1.4 Industrial and Commercial Attributes: Pulp as a Commodity

We have devised many uses for cellulose fibers, the most basic being textiles and paper. The earliest of these were created by mechanically extracting and processing fibers from wild plants. This is, of course, a labor-intensive process, and thus was considered to be highly artisanal. Early paper was made from fibers derived from easy-to-obtain sources, including bark, hemp, flax, and cotton rags. In the 19th century, inventors in Canada and Germany simultaneously developed machines for mechanically extracting cellulose fibers from wood.² These, combined with chemical bleaching processes, produced the first modern paper, what we today recognize as newsprint. Today, pulp is either classified as paper-grade, to be used in paper, cardboard, and other such products, or as dissolving-grade, to be derivatized and reconstituted as other materials, such as rayon or cellulose acetate film.

1.4.1 Trees to Bales: The Pulping Process

There are a number of different pathways to manufacturing paper-grade pulp. Besides mechanically-produced newsprint, most paper-grade pulp today is produced chemically, through either the sulfite process or the kraft process. This pulp can be sourced from either hardwood or softwood, i.e. broadleaf or coniferous trees. Many combinations of these can be made. [9, 19]

As previously noted, the main intent of the pulping process is the extraction of cellulose fibers from wood, and subsequent delignification. The sulfite process does so under acidic conditions, at high temperatures over long timescales. The kraft (or sulfate) process, on the other hand, operates under alkaline conditions (but also at high temperatures over long timescales).
These reactions, converting raw wood chips into cellulose pulp, then solubilizing and removing lignin, are referred to as digestion.³ Other pulping methods, such as the organosolv process, have been proposed, typically in an attempt to reduce the environmental impact of the process by using sulfur-free chemicals during digestion. [9, 20, 21]

Comparing the two main processes, sulfite pulps are easier to bleach and refine, while kraft pulps are stronger and better resist aging. The kraft process dominates the paper-grade pulp market today, although the sulfite process is still used in the production of dissolving-grade pulps. [20] Neither process completely delignifies the pulp; bleaching, typically with chlorine dioxide, often follows digestion and washing in an attempt to remove any residual traces of lignin. Pulp destined for products such as cardboard or bag material, where white color is not required, does not undergo bleaching.

² Charles Fenerty and Friedrich Gottlob Keller, respectively. Both inventions were made in the late 1830s.
³ In this context, digestion is not an enzymatic degradation, but a chemical one. It takes its name from the pressurized reaction vessel (a digester) in which the process occurs.

Canada, and especially B.C., is best known for Northern Bleached Softwood Kraft (NBSK). The abundance of slow-growing conifers in the province’s interior provides a huge supply of pulp, whose fibers are exceptionally long, uniform, and strong. [6] At market, it is typically sold unrefined to paper manufacturers, who refine it and use it to reinforce their products.

1.5 Properties of Cellulose Fibers

The parameters that determine the market value of pulp products depend on a wide range of interrelated factors. These include fiber morphology, pulp chemistry, and process conditions, but also more disparate factors such as source species, local climate, and oil prices.
Attempting to incorporate such an extended range of information presents a classic “Big Data” challenge (finding hidden trends and patterns in vast datasets), perhaps to be solved at some point in the future. Currently, however, it is more economical to focus on factors that are easier to account for.

1.5.1 Chemical Properties

Chemically speaking, pure cellulose is rather monotonous: nothing but chains of glucose. Of interest is the Degree of Polymerization (DP): the polymeric length of the cellulose chains. A pulp’s DP is heavily dependent on process parameters; usually, the refining process degrades the cellulose chains (thus lowering their DP). [22] DP determines the viscosity of pulp solutions, which is an important parameter for dissolving-grade pulps, directly affecting their market value. [23] As such, DP is frequently assessed, albeit indirectly, by wet-chemical viscometry. DP, along with the degree of cellulose crystallinity, is also correlated with strength properties, though the mathematical nature of these relationships is difficult to assess. [12, 24]

More interesting is the content and structure of hemicellulose, lignin, and so-called extractives (a catch-all term for residual organic products from the raw materials that have escaped digestion). All of these affect the inter-fiber hydrogen-bonding networks, which in turn have a downstream effect on the mechanical and strength properties of pulp. Of particular interest to the pulp industry is the carboxylation of hemicellulose, particularly xylans, during the kraft pulping process.⁴ In NBSK pulp, the most common xylan is arabino-4-O-methylglucuronoxylan; one of its component monosaccharides is 4-O-methyl-α-D-glucuronic acid.
During the kraft process, this acid unit is partially decomposed to form hexenuronic acid. [9, 13] The bleaching process reduces the concentration of carboxyl groups in a pulp due to further delignification and removal of hemicellulose from the pulp; the magnitude of the reduction depends on the bleaching agent used. As well as being indicative of strength properties (xylan concentration is correlated with tensile strength, via increased inter-fiber bonding), the presence of carboxyls on the surface of fibers increases their hydrophilicity, thereby affecting moisture uptake and retention. [9, 12, 16, 25] Thus, probing the surface carboxyl content of a pulp sample may serve to predict important properties.

The chemistry of pulp samples also determines their various attendant optical properties, including brightness, whiteness, and opacity. These properties are most directly affected by residual lignin content, and are most relevant for bleached pulps.

1.5.2 Morphological Properties

Even if cellulose itself is chemically simple, cellulose fibers have a number of important morphological properties that affect the final commercial value of pulps. These include fiber length, fiber wall thickness (coarseness), fiber/fibril shape parameters (curl, kink, angle), and the degree of fibrillation. Of course, the morphology is predicated on the chemistry, but it remains much simpler and more intuitive to, for example, study fiber length directly than to attempt to predict it from DP. The latter presents another “Big Data” challenge. Nonetheless, these fiber morphology properties can be easily correlated with mechanical and strength properties, such as tensile or tear strength. In fact, fibril morphology is known to be a key factor in determining strength. [6, 9, 12, 25] Though fiber morphology is affected by the pulping process, it is fundamentally determined by plant genetics.

⁴ Carboxylation is also a result of oxidative processes affecting cellulose and lignin in the presence of certain additives.
Some extractives may also contain carboxyl groups.

1.5.3 Characterizations of Pulp

Characterizing pulp draws upon chemical, morphological, and process information, and the various analyses involved are typically time-consuming wet-chemical assays. Several important characterizations relevant to this work are outlined in the Glossary, though the list is by no means comprehensive. These are mostly mechanical or optical characterizations, though some are chemical. These characterizations are typically made on unrefined pulp, but can also be made on refined pulp. The refining process has a direct effect on most of these characterizations, since it alters the surface chemistry and morphology of fibers: it shortens the fibers and increases fibrillation. [2, 22]

The Page Equation relates several important morphological properties to tensile strength:

$$\frac{1}{T} = \frac{9}{8Z} + \frac{1}{B} \qquad (1.1)$$

where $T$ is the tensile index (ratio of tensile strength to basis weight), $Z$ is the zero-span tensile index, and $B$ is the bonding index. The latter two terms represent the contributions of fiber and bond strength to the overall tensile strength $T$. $B$ can be expressed as follows, taking the units N m/kg:

$$B \equiv \frac{b\,\lambda\,P}{12\,A\,\rho}\,\mathrm{RBA} = \frac{b\,\lambda\,R}{12\,C} \qquad (1.2)$$

The left-hand expression is the conventional definition of $B$, and the right-hand expression is the simplified form proposed by Anson et al.
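Because Eqs. 1.1 and 1.2 are closed-form, they are simple to evaluate once their inputs are known. The following minimal Python sketch is an illustration only, not part of the methodology of this work. The function names are hypothetical; all inputs are assumed to be in mutually consistent units (so that $Z$, $B$, and $T$ share the same units, e.g. N m/kg), and RBA and $R$ are assumed to be entered as fractions rather than percentages. The symbols follow Eq. 1.2: $b$ is inter-fiber bond shear strength, $\lambda$ mean fiber length, $P$ fiber perimeter, $A$ fiber cross-sectional area, $\rho$ fiber density, $R$ contact ratio, and $C$ coarseness.

```python
"""Sketch of the Page equation (Eq. 1.1) and the bonding index (Eq. 1.2).

Assumptions (not from the source): inputs are in mutually consistent units,
and RBA and R are given as fractions (0-1) rather than percentages.
"""


def page_tensile_index(Z: float, B: float) -> float:
    """Tensile index T from Eq. 1.1: 1/T = 9/(8Z) + 1/B."""
    if Z <= 0 or B <= 0:
        raise ValueError("Z and B must be positive")
    return 1.0 / (9.0 / (8.0 * Z) + 1.0 / B)


def bonding_index_conventional(b: float, lam: float, P: float, RBA: float,
                               A: float, rho: float) -> float:
    """Conventional bonding index (left-hand form of Eq. 1.2)."""
    return b * lam * P * RBA / (12.0 * A * rho)


def bonding_index_anson(b: float, lam: float, R: float, C: float) -> float:
    """Simplified bonding index of Anson et al. (right-hand form of Eq. 1.2)."""
    return b * lam * R / (12.0 * C)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to illustrate the arithmetic.
    Z, B = 160.0, 120.0
    T = page_tensile_index(Z, B)
    print(f"T = {T:.1f}")                   # T = 65.1
    # With perfect bonding (B -> infinity), T approaches the fiber-strength limit 8Z/9.
    print(f"limit 8Z/9 = {8 * Z / 9:.1f}")  # limit 8Z/9 = 142.2
```

The sketch makes one consequence of Eq. 1.1 explicit: $T$ is always smaller than both $8Z/9$ and $B$, so improving inter-fiber bonding alone cannot raise the tensile index beyond the ceiling set by fiber strength.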
These expressions contain the following variables, representing morphological properties of fibers: [26]
• $b$ is the area-normalized shear strength of inter-fiber bonds (mN/m²)
• $\lambda$ is the mean fiber length (mm)
• $P$ is the mean 2-D perimeter length of fibers (mm)
• RBA is the relative bonded surface area of fibers (%)
• $A$ is the mean cross-sectional area of fibers (mm²)
• $\rho$ is the fiber density (g/cm³)
• $R$ is the contact ratio, the fractional surface area of fibers that is available to bond with adjacent fibers (%)
• $C$ is the Clarke coarseness: the mass per unit length of fibers (g/mm)

$b$ and RBA are challenging to measure; RBA is estimated by measuring light scattering or nitrogen adsorption of handsheets, while $b$ is estimated by linearizing Eq. 1.1 once all the other parameters have been determined. As proposed by Anson et al., the ratio of $R$ to $C$ provides an easier-to-measure estimate of the bonding area; the values are obtained through image analysis of the fibers in handsheets. [26]

The Page equation fundamentally assumes that a sample has low fiber kink and curl, i.e. straight fibers, as part of its dependence on fiber length ($\lambda$). This is not the case under process conditions, and can only be ensured by careful preparation of handsheets, thus limiting the equation’s applicability to in-process pulp.

Furthermore, for reasons previously discussed, it can be deduced that the fiber properties involved in Eqs. 1.1 and 1.2 are predicated on fiber chemistry, which can be characterized by other means, though the precise nature of the relationship is difficult to assess. These chemical characterizations, including DP, cellulose/hemicellulose ratio, functional group analysis, and trace inorganic analysis, are often the most difficult and expensive values to obtain due to the complex wet-chemical analysis methods that are required.
For example, determining the relative carboxyl content of fibers involves titrations, which are very time-consuming and not especially sensitive to the surface carboxyl group concentration. This is, of course, the quantity of most interest to the industry, as surface carboxyls have the most influence on mechanical properties, especially strength. [9, 25] Needless to say, very few, if any, of these characterizations can be performed in an on-line or at-line setting.

Chapter 2
Process Control: Theory and Methods

Process control, as a concept, was introduced in the twentieth century alongside the automation of assembly lines. It can be easily understood in terms of cooking; after all, preparing food is conceptually identical to an industrial production line, reforming raw ingredients into a final consumable product.

We can define a process trajectory as, for instance, how flour evolves through time to become a cake. This trajectory evolves in a multi-dimensional space, which accounts for every variable that affects the process: temperatures, ingredient quantities, whisking force and time, baking temperature and time, etc. Consequently, there exist process boundaries: points of no return beyond which the process will fail, and the chef will be left with a blackened lump.

Naturally, monitoring the evolution of dozens of variables through time is a monumental task. Instead, one must apply attrition: the key variables must be determined, methods must be developed to monitor these, and strategies must be devised to correct the trajectory if it starts to deviate. The ratio of yolk to white within an egg may have a downstream effect on the consistency of the cake; but, to ensure the correct ratios are met, it is much more useful to monitor the color and consistency of the batter while it is being prepared than it is to perform a volumetric analysis of the eggs. Such determinations, namely trajectory resolution (developing a recipe), dimensional reduction (determining which factors are most important to monitor), sensor analysis (assessing taste, color, consistency, etc.), and corrective action, are the essence of process analysis and control. [27]

In an industrial setting, there are a number of strategies in place to monitor and enforce process boundaries. Some process controls, such as temperature and time control, are simple to automate. However, many controls that are designed to monitor complex parameters are, as previously mentioned, time- and labor-intensive, and require the removal of product from the line for analysis. Automating these controls involves replacing the off-line analyses with machine-readable sensors, usually involving imaging or spectroscopy. The raw sensor output must then be interpreted using multivariate analysis and chemometrics. [2, 10, 11, 28, 29] These data can then be compared to historical data and to process boundaries, and corrective actions can be undertaken to ensure optimal process (and product) quality. Fully automating this cycle is the end goal of “smart manufacturing” research.

Sensor-based controls have several advantages, even in off-line applications: higher throughput, non-destructive analysis, higher sensitivity, and multimodality, for instance. Of those, multimodality is perhaps the most interesting; what to humans would appear to be discrete data can be fused together to produce an ensemble characterization of a sample. Oftentimes this involves combining multiple types of spectroscopy, each yielding a specific set of information, into a single hyperspectral map. This can then be analyzed with chemometrics for covariance that would otherwise remain undetected.
[11, 29]

The following sections will outline the basic theory of the sensor methodologies used in this work, as well as the multivariate analysis methods needed to interpret the data generated by the sensors.

2.1 Machine Vision

One of the most basic ways to implement “smart manufacturing” is to equip control systems with machine vision. Wide-field thermal imaging is a good example of this; instead of monitoring dozens of temperature sensors, a single video feed from a mid-infrared camera is all that is necessary. Frames are analyzed for pixel regions exceeding some threshold value, which has been calibrated to match a threshold temperature, and if hot regions are detected, the system can trigger an alarm, or deploy fire suppression measures, as may be warranted.

Such time-intensity image analysis is relatively straightforward. Image classification, on the other hand, presents more of a challenge. Instead of looking for simple time-resolved trends, classification algorithms must pattern-match shapes. An application of image classification to the pulp industry is discussed in detail in Ch. 6; this application uses Near-Infrared (NIR) imaging, which provides information that is not available by simple visual inspection (or conventional visible-light imaging).
Because of this, and because of the relative simplicity of the instrumentation, machine vision is increasingly applied across many industries. [11, 27]

2.1.1 Image Enhancement: Histogram Equalization

Highly inhomogeneous images can be difficult to analyze, especially if features of interest are divided between lighter and darker areas; Figure 2.1 illustrates this. The left-hand column shows how an image with an underexposed foreground (A) can be contrast-enhanced (B, C) using histogram equalization.

If we consider an 8-bit grayscale image $a$ with $n$ pixels, ranging in intensity from 0 to 255,¹ we can calculate the normalized histogram of $a$ as:

$$h_a(i) = \frac{n_i}{n}, \qquad i \in [0, 255] \qquad (2.1)$$

where $n_i$ is the number of pixels with intensity $i$. The Cumulative Distribution Function (CDF) (or cumulative normalized histogram) for $a$, representing the probability that a pixel has an intensity between 0 and $i$, can then be written as:

$$p_a(i) = \sum_{k=0}^{i} h_a(k) \qquad (2.2)$$

¹ In general, the bit depth can be denoted $D$; in such cases, $0 \le i \le 2^D - 1$.

[Figure 2.1: An example of Histogram Equalization and CLAHE. Rows: raw (A, D, G), histogram-equalized (B, E, H), and CLAHE (C, F, J). Columns: image, histogram, and cumulative distribution function.]

The idea behind histogram equalization is to flatten $h_a$, so that the values of the corresponding histogram $h_b$ for the contrast-enhanced image $b$ are uniformly distributed. This is accomplished by defining a monotonic transformation function $T$, mapping each intensity $i$ of $a$ to an intensity $T(i)$ of $b$, such that $p_b(T(i)) \approx p_a(i)$. $T$ is determined empirically by minimizing the following expression for each value of $i$:

$$\left| p_b(T(i)) - p_a(i) \right| \qquad (2.3)$$

Thus, $T$ linearizes $p_a$, so that $p_b(i) \approx i/255$. Since $p_a$ is discrete, this will always be an approximation. Returning to the example in Fig. 2.1, we can see the results of histogram equalization in the middle row. Note how the equalized histogram (E) is far more dispersed than the raw histogram (D), and that the CDF (H) is partially linear, mostly in the middle region.
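Eqs. 2.1 to 2.3 translate directly into array operations. The sketch below is an illustration only, not the implementation used in this work: it assumes an 8-bit image stored as a NumPy `uint8` array, takes the uniform target CDF to be $p_b(j) = (j+1)/256$, and solves Eq. 2.3 by brute force, choosing for each input level $i$ the output level $T(i)$ that minimizes $|p_b(T(i)) - p_a(i)|$. The function names are hypothetical.

```python
import numpy as np


def normalized_histogram(a: np.ndarray, levels: int = 256) -> np.ndarray:
    """Eq. 2.1: h_a(i) = n_i / n, the fraction of pixels at each intensity i."""
    counts = np.bincount(a.ravel(), minlength=levels)
    return counts / a.size


def cumulative_distribution(h: np.ndarray) -> np.ndarray:
    """Eq. 2.2: p_a(i) = sum of h_a(k) for k = 0..i."""
    return np.cumsum(h)


def equalize(a: np.ndarray, levels: int = 256) -> np.ndarray:
    """Global histogram equalization of an 8-bit image via Eq. 2.3."""
    p_a = cumulative_distribution(normalized_histogram(a, levels))
    # Target CDF of a perfectly flat histogram (assumption): p_b(j) = (j + 1) / levels.
    p_b = (np.arange(levels) + 1) / levels
    # For every input level i, pick the output level T(i) minimizing
    # |p_b(T(i)) - p_a(i)|. Because p_a is non-decreasing, T is monotonic.
    T = np.abs(p_b[np.newaxis, :] - p_a[:, np.newaxis]).argmin(axis=1)
    # Apply T as a lookup table to every pixel.
    return T[a].astype(np.uint8)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic "underexposed" image: intensities confined to 0..63.
    a = rng.integers(0, 64, size=(128, 128), dtype=np.uint8)
    b = equalize(a)
    print(a.min(), a.max(), "->", b.min(), b.max())
```

The brute-force search only serves to make Eq. 2.3 explicit. In practice, library routines such as scikit-image's `exposure.equalize_hist` (and `exposure.equalize_adapthist` for CLAHE) or OpenCV's `cv2.equalizeHist` would normally be preferred.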
Comparing the images A and B, it is apparent that although the foreground is more visible with higher contrast, the background has lost some contrast.

Adaptive histogram equalization calculates localized transformation functions over tiles of predetermined size, instead of one for the overall image. This has the advantage of improving local contrast enhancement, but with the drawback that its application to homogeneous tiles will result in the amplification of noise. [30]

To avoid this pitfall, the Contrast-Limited Adaptive Histogram Equalization (CLAHE) algorithm has been developed. CLAHE applies a maximum slope to $T$ in each tile, so that the contrast cannot be enhanced beyond a certain value. This is done by defining a threshold for each histogram $h_a$; any bins in $h_a$ above the threshold will be truncated, and the truncated pixels will be “redistributed” equally to the other bins. When $p_a$ is calculated, its slope is limited, which limits the local transformation function as well. Thus, the limiting applies mostly to homogeneous tiles. [31]

The bottom row in Fig. 2.1 shows the results of CLAHE applied to the image in A. Compared with simple histogram equalization, CLAHE more smoothly distributes the histogram (F), and the CDF is more even (J). The enhanced image (C) shows better contrast in the foreground and background, though there are some edge effects visible as darkening along the top right. Typically, CLAHE addresses this by applying a bilinear interpolation between the edges of tiles. [31]

2.1.2 Image Analysis

One way of extracting information from an image is to partition it into regions, a step often called segmentation. One common way of doing this is the watershed algorithm, which takes its name from the hydrographical feature dividing drainage basins. The fundamental principle of the algorithm is to treat a grayscale image as a topographic map, where dark regions are considered valleys and light regions peaks.
The watershed algorithm “fills” regions in the topography, starting from the scale level ($s$). In essence, the bottom $s$ percent of pixels are collapsed into one region, and the filling proceeds from there. $s$ is a user-input parameter, and the specific pixel intensity it corresponds to is calculated from the image's CDF (see Eq. 2.2). [32]

The watershed algorithm typically produces a large number of highly similar segments. Therefore, it is usually followed by a merge algorithm to reduce the number of segments. This is a dimensional reduction step, which, as previously mentioned, is a key step in process control (see also Ch. 2.3). Merge algorithms examine the spatial and spectral features of adjacent segments, and join them if they are within a threshold $\lambda$. One approach to thresholding is to calculate the merge cost ($t$) for each pair of adjacent segments $i$ and $j$:

$$t_{i,j} = \frac{A_i A_j}{A_i + A_j} \times \frac{\left\| \mu_i - \mu_j \right\|^2}{\mathrm{length}(\partial_{i,j})} \tag{2.4}$$

where $A$ is the segment area, $\mu$ is the mean pixel value in a segment, and $\mathrm{length}(\partial_{i,j})$ is the length of the two segments' shared boundary. For a predefined $\lambda$, the adjacent segments are merged if $t_{i,j} \le \lambda$. [33] The merge algorithm proceeds from the most similar adjacent segments to the most dissimilar, up to the merge level; this is analogous to the scale level, and is again based on the CDF.

2.1.3 Image Classification

Once segmented, an image's features can be easily studied and classified. The most basic form of classification is rules-based, which compares the attributes of image segments to a preset list of thresholds and makes determinations accordingly. In doing so, the rules assign a class score to each segment, based on its attributes. The attributes for all segments are compiled into a CDF. Binary rules assign a class score of either 0 or 1 when comparing the segment attribute to the appropriate threshold. Linear or quadratic rules instead apply a tolerance window to the CDF, and segments falling within it are scored according to their distance from the threshold.
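The merge criterion of Eq. 2.4 can be expressed directly. The following is a hedged sketch: the function names, the treatment of $\mu$ as a per-band vector for multi-band images, and the example values are illustrative assumptions, not the implementation used in this work.

```python
import numpy as np


def merge_cost(area_i, area_j, mean_i, mean_j, boundary_length):
    """Merge cost t_ij of two adjacent segments (Eq. 2.4).

    area_i, area_j  -- segment areas A_i, A_j (pixels)
    mean_i, mean_j  -- mean pixel values mu_i, mu_j (scalars or per-band vectors)
    boundary_length -- length of the shared boundary (pixels)
    """
    if boundary_length <= 0:
        raise ValueError("segments must share a boundary")
    size_term = area_i * area_j / (area_i + area_j)
    diff = np.atleast_1d(np.asarray(mean_i, dtype=float) - np.asarray(mean_j, dtype=float))
    spectral_term = float(diff @ diff)  # squared Euclidean norm ||mu_i - mu_j||^2
    return size_term * spectral_term / boundary_length


def should_merge(cost, lam):
    """Merge the pair when t_ij <= lambda."""
    return cost <= lam


t = merge_cost(100, 300, 10.0, 14.0, 20)  # 75 * 16 / 20 = 60.0
```

For these hypothetical segments (100 and 300 pixels, mean intensities 10 and 14, a 20-pixel shared boundary) the pair would be merged for any $\lambda \ge 60$. The size term favours absorbing small segments, while the boundary term favours merging segments that share long borders.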
The rule set must be determined empirically, or can be based on a reference library and determined algorithmically. Unsupervised classification algorithms, requiring no a priori knowledge of the features to be classified, have been developed, though these are beyond the scope of this work.

In a process setting, feature data generated using these machine vision algorithms can be compared with historical trends to provide valuable process trajectory information. Multiple imaging modes and spectral bands can be exploited simultaneously to provide a wealth of data, entirely non-invasively. Ultimately, however, the amount of information that can be ascertained from examining images is limited, and other techniques must be used to render a more complete characterization of a process trajectory.

2.2 Vibrational Spectroscopy

To reprise a previous example, the standard analytical methods for determining carboxyl content in pulp samples are not useful for automated process control. Spectroscopic methods, however, can be just as sensitive to chemistry as titrations. With spectroscopic instrumentation becoming more common (and, notably, cheaper), research is being undertaken to deploy such methods in industrial applications. [10, 29]

2.2.1 General Theory

Fundamentally, vibrational spectroscopy studies how light interacts with matter on a molecular level; changes in the light after the interaction, indicating a transfer of energy from light to chemical bonds, provide valuable information about molecular structure. This energy transfer occurs in the form of vibrational excitation of the bonds, and the nature of the excitation is specific to each type of bond.

To elaborate upon this, we must first consider the simplest case: a diatomic molecule. We can imagine various ways in which this molecule can be noticeably altered, without actually changing its components.
Translating the molecule through three-dimensional space does not alter it in any perceptible way; nor does rotating it about its linear axis. From a fixed observation point, rotation about the other two axes will alter the molecule, though this takes very little energy to accomplish. Oscillation along the molecule's linear axis, as if the bond were a spring, also alters the molecule; this vibration is what is of interest.

We can infer that some energy must be applied to the system in order to induce oscillation, and that this energy depends on the masses of the two atoms. To extend the spring analogy to the harmonic oscillator, we can invoke Hooke's law, balance forces, and solve the resulting differential equation to obtain the frequency of oscillation:

$$F = -kr = \mu a \;\Rightarrow\; -kr = \mu \frac{d^2 r}{dt^2} \;\Rightarrow\; \nu = \frac{1}{2\pi}\sqrt{\frac{k}{\mu}} \tag{2.5}$$

where $\mu$ is the reduced mass $\frac{m_1 m_2}{m_1 + m_2}$, which accounts for the masses of both atoms. We can also express the potential energy of the system as follows:

$$V_{HO} = -\int F\,dr \;\Rightarrow\; \int k r\,dr \;\Rightarrow\; V = \frac{1}{2}k\left(r - r_0\right)^2 \tag{2.6}$$

where $r$ is the bond length, $r_0$ is the bond length at equilibrium, and $k$ is a characteristic force constant of the bond. Using this, the Schrödinger equation for our diatomic molecule can be solved as follows:

$$E_{HO} = h\nu\left(n + \frac{1}{2}\right) = \hbar\sqrt{\frac{k}{\mu}}\left(n + \frac{1}{2}\right) \tag{2.7}$$

where $\nu$ is the previously described oscillation frequency, in terms of $k$ and $\mu$, and $n = 0, 1, 2, \ldots$ is the vibrational quantum number.² It is clear from this that the energy levels of the harmonic oscillator are quantized (based on $n$), and that there is a minimum energy level: the zero-point energy. It is worth noting that the potential energy in Eq. 2.6 is a classical approximation of the chemical bond, which will of course dissociate if the interatomic distance becomes too great. Accounting for this real-world behavior introduces the anharmonic (or Morse) oscillator. Figure 2.2 illustrates both.

For the harmonic oscillator (green curve, left), the potential energy well around $r_0$ is parabolic, and the energy levels are evenly spaced, per Eq. 2.7.
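As a worked check of Eqs. 2.5 and 2.7, the sketch below computes the harmonic frequency and the first few energy levels of a diatomic oscillator. The force constant of 500 N/m for a C–H bond is an assumed, order-of-magnitude value chosen for illustration, and the function names are hypothetical.

```python
import math

AMU = 1.66053906660e-27  # kg per atomic mass unit
C_CM = 2.99792458e10     # speed of light, cm/s
H = 6.62607015e-34       # Planck constant, J s


def reduced_mass(m1_amu, m2_amu):
    """Reduced mass mu = m1 m2 / (m1 + m2), returned in kg."""
    return (m1_amu * m2_amu) / (m1_amu + m2_amu) * AMU


def harmonic_frequency(k, m1_amu, m2_amu):
    """nu = (1 / 2 pi) sqrt(k / mu) in Hz (Eq. 2.5); k in N/m."""
    return math.sqrt(k / reduced_mass(m1_amu, m2_amu)) / (2 * math.pi)


def harmonic_levels(k, m1_amu, m2_amu, n_max=3):
    """E_n = h nu (n + 1/2) in J for n = 0..n_max (Eq. 2.7)."""
    nu = harmonic_frequency(k, m1_amu, m2_amu)
    return [H * nu * (n + 0.5) for n in range(n_max + 1)]


# Illustrative C-H stretch; k = 500 N/m is an assumed value, not measured data.
nu = harmonic_frequency(500.0, 12.0, 1.0)
wavenumber = nu / C_CM  # frequency expressed in cm^-1
```

With these inputs the stretch lands close to 3000 cm⁻¹, and the gap between successive levels is exactly $h\nu$, as Eq. 2.7 requires for the harmonic case.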
The transitions between adjacent energy levels are therefore all equal, each spanning $h\nu$. Red arrows indicate vibrational transitions; the harmonic oscillator model permits only $\Delta n = \pm 1$. However, as previously noted, this model does not account for the real behavior of chemical bonds, nor for observed higher-order transitions. To do so, Eq. 2.6 must be recast as the Morse potential:

$$V_M = D_0\left(1 - e^{-a(r - r_0)}\right)^2 \tag{2.8}$$

where $D_0$ is the well depth (equal to the sum of the bond dissociation energy $E_D$ and the zero-point energy $E_0$), and $a$ is analogous to $k$ in the classical representation; it can be shown that $a = \sqrt{k_0 / 2D_0}$. The blue curve (right) in Fig. 2.2 illustrates the anharmonic oscillator. Re-solving the Schrödinger equation using the Morse potential introduces an additional term to Eq. 2.7, as shown below:

$$E_{AO} = h\nu\left(n + \frac{1}{2}\right) - q\,h\nu\left(n + \frac{1}{2}\right)^2 \tag{2.9}$$

where $q$ is an anharmonicity constant, again specific to each vibrational mode. Thus, the energy levels become more closely spaced as $n$ increases, which accounts for experimental observations of smaller energy differences between higher-order vibrational states. The red arrows indicate transitions where $\Delta n = 1, 2, 3$. As denoted by the arrow thickness, a fundamental transition ($\Delta n = 1$) is the most probable, and overtone transitions become successively less probable as the change in $n$ increases. [4]

² Although convention dictates that the vibrational quantum number be represented by the Latin letter v, in this work it will be represented with n so as to avoid confusion with the Greek letter ν.

[Figure 2.2: Harmonic (left) and anharmonic (Morse) (right) oscillators. Arrow thickness indicates the relative probability of each transition.]

2.2.2 Mid-Infrared and Near-Infrared Spectroscopy

Light in the near- and mid-infrared range oscillates at a frequency near that of most molecular vibrational modes.
If the resonant vibrational mode causes a change in the molecule's dipole moment, thereby creating its own oscillating electric field, the light is absorbed and the molecule undergoes a vibrational transition. This forms the fundamental principle of both mid-infrared and Near-Infrared Spectroscopy (NIRS). The presence of chemical bonds in a sample can be inferred by recording which wavelength ranges from a broadband emitter source are absorbed by the sample and which are not. Mid-Infrared (IR) spectroscopy probes fundamental modes ($\Delta n = 1$), while NIRS probes vibrational overtones ($\Delta n > 1$) or combination modes, where radiation is absorbed by two coupled vibrational modes rather than just one. [4] The instrumentation required by mid-IR spectroscopy is typically much more complex than that for NIRS; this is because most molecules have some IR-active fundamental bands, so pains must be taken to isolate the analyte signal contributions from those of the matrix. With NIRS this is less of a concern, since fewer molecules have accessible overtones and combination bands; thus NIRS instrumentation is much simpler, and sample preparation can be minimal.

A good deal of research has been published over the last few decades on the applications of mid-IR and NIR spectroscopy to wood, pulp, and paper. These applications often involve characterizations of pulp or wood chemistry, and in several cases, NIR spectra have been used to model various pulp properties or to predict the species of origin. [10, 15, 18, 21, 22, 34–48]

Attenuated Total Reflectance-Fourier Transform Infrared Spectroscopy (ATR-FTIR) and other mid-IR techniques have been applied to characterize wood and pulp samples, especially where surface coatings or surface analysis are of interest, but the instrumentation limits their applicability. [14, 35, 37, 40, 47, 48] Specifically, the sample handling required for mid-infrared measurements is very challenging in a process environment.
Pulp suspensions in water are difficult to probe because of water's intense absorbance; solid samples are easily probed using an ATR crystal, but such measurements are most suitable for surface characterization. [15]

With the development of chemometric methods, to be discussed later, researchers have investigated methods using NIRS to predict the properties of pulp samples. NIRS offers some advantages over mid-IR spectroscopy, though it has some drawbacks as well. Most biomaterials have low absorptivities in the NIR region, which allows NIRS to easily probe into the bulk of samples. [49, 50] Furthermore, the instrumentation required is much simpler than that for mid-IR techniques, though Fourier Transform NIRS has also found use. [18, 43] As such, minimal sample preparation is required for NIRS, which is distinctly advantageous in process control settings.

However, since NIRS probes broad vibrational combinations and overtones, rather than fundamental bands as in mid-IR, it lacks the chemical specificity that mid-IR provides. This can be overcome to a degree with the application of chemometrics (as discussed below). Researchers have used NIRS alongside methods including Principal Component Analysis (PCA) and Partial Least-Squares Regression (PLS) to build predictive models for important pulp properties. An early example was published by Antti et al., who modelled a variety of softwood pulp properties. [34] A number of groups have published similar work with Eucalyptus kraft pulps, predicting, for example, pulp yield, strength properties, viscosity, and lignin content. [21, 38, 42, 43, 51] Hauksson et al. did the same for Norway spruce (Picea abies), as did Tavassoli et al. for Canadian Northern Bleached Softwood Kraft (NBSK) samples. [36, 46]

These infrared techniques suffer from one major drawback when applied in industrial settings: their sensitivity to water. This is especially important for mid-IR techniques, but also for NIRS.
Moisture content in in-process samples can vary widely, which easily overwhelms other chemical and morphological variance. This, in turn, limits the predictive ability of infrared techniques, unless moisture can be controlled and accounted for. [34]

2.2.3 Raman Spectroscopy

In applications where variable humidity or temperature make infrared techniques impractical, Raman spectroscopy is a promising alternative. Like mid-IR spectroscopy, it is chemically specific, and like NIRS, its instrumentation is fairly simple and easily adaptable. It is also largely insensitive to water, an important advantage for process applications.

[Figure 2.3: Jabłoński diagram illustrating IR, NIRS, and Raman spectroscopies. Labels: electronic states S0 and S1, virtual states, mid-IR absorption, NIRS, Raman scattering (Stokes and anti-Stokes), Rayleigh scattering.]

Raman spectroscopy relies upon the Raman effect, whereby light occasionally undergoes an inelastic scattering process when interacting with matter. The vast majority of light-scattering events are elastic (also called Rayleigh scattering), whereby there is no energy transfer between the photon and the molecule. In some cases, though, the incident photon excites the molecule to a virtual energy state; relaxation from such a state shifts the wavelength of the scattered photon relative to the incident photon. In most cases, the ground-state molecule gains energy and relaxes to an excited vibrational state; the scattered photon is then redshifted. This is referred to as Stokes scattering. Anti-Stokes scattering occurs when the incident photon interacts with a vibrationally excited molecule, which relaxes to the ground state; the energy lost by the molecule is carried away by the scattered photon, which is therefore blueshifted. In accordance with Boltzmann statistics, which predict that most molecules occupy the ground vibrational state under ambient conditions, anti-Stokes scattering is much less likely than Stokes scattering.
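The Boltzmann argument can be made quantitative. Assuming a non-degenerate mode in thermal equilibrium, the ratio of molecules in the first excited vibrational state to those in the ground state is $\exp(-hc\tilde{\nu}/k_BT)$, where $\tilde{\nu}$ is the mode's wavenumber. The sketch below evaluates it. This population ratio is only the leading factor: the measured anti-Stokes/Stokes intensity ratio also carries a frequency-dependent factor that is not modelled here. The function name and example wavenumbers are illustrative.

```python
import math

HC_OVER_KB = 1.438776877  # second radiation constant hc/k_B, in cm K


def population_ratio(wavenumber_cm, temperature_k=298.0):
    """Boltzmann ratio N(n=1)/N(n=0) for a non-degenerate vibrational mode."""
    return math.exp(-HC_OVER_KB * wavenumber_cm / temperature_k)


for shift in (200.0, 1000.0, 1600.0):
    print(f"{shift:6.0f} cm^-1: {population_ratio(shift):.4f}")
```

For a 1000 cm⁻¹ mode at 298 K the ratio is roughly 0.008, so fewer than 1 % of molecules are available for anti-Stokes scattering from that mode, whereas a low-frequency 200 cm⁻¹ mode gives about 0.38.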
This inelastic scattering effect depends on oscillations in the molecule's polarizability tensor; thus, Raman spectroscopy is chemically specific, as different functional groups create different wavelength shifts in inelastically scattered photons. [4]

Figure 2.3 illustrates the transitions exploited by Raman spectroscopy (red/orange arrows), along with mid-IR (green arrow) and NIR spectroscopies (blue arrow). The latter two are simple absorptions of the incident radiation, promoting the molecule to an excited vibrational state while it remains in the same electronic state $S_0$. The Raman lines show both Stokes (left) and anti-Stokes (right) scattering. As previously mentioned, the latter is less probable. In both cases, the excitation wavelength is the same, indicated by the red arrow. The black arrows indicate an elastic (Rayleigh) scattering process, where no transfer of energy occurs.

[Figure 2.4: A basic Raman spectrometer. Components: laser, beamsplitter, objective lens, sample, spectrograph.]

A basic Raman probe is illustrated in Figure 2.4. Monochromatic light from a laser (red) passes directly through a Dichroic Beamsplitter (DBS) and is focused into the sample. Backscattered light follows the same path to the DBS; Raman-shifted light (purple) is diverted into a spectrograph and dispersed across a Charge-Coupled Device (CCD) sensor. There are numerous modifications and extensions that can be made to this design to improve collection efficiency, boost signal-to-noise, or mitigate sample fluorescence; for instance, adding laser line filters and spatial filters decreases the amount of incident laser light entering the spectrograph, thereby increasing the Signal to Noise Ratio (SNR). The region in Fig. 2.4 between the DBS and the objective lens can also be modularized as a probe head, allowing the laser and spectrograph to be remotely located and coupled to the probe via fiber optic cables.
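Raman spectra are conventionally reported as a shift in wavenumbers (cm⁻¹) from the excitation line rather than as absolute wavelengths, which makes spectra comparable across lasers. A minimal sketch of the conversion follows; the 785 nm excitation and the 1000 cm⁻¹ band are illustrative choices, and the function names are hypothetical.

```python
def raman_shift_cm(laser_nm: float, scattered_nm: float) -> float:
    """Raman shift in cm^-1 between excitation and scattered wavelengths (nm).

    Positive values are Stokes (red-shifted) bands; negative values are anti-Stokes.
    """
    return 1e7 / laser_nm - 1e7 / scattered_nm  # 1 nm = 1e-7 cm


def scattered_wavelength_nm(laser_nm: float, shift_cm: float) -> float:
    """Wavelength (nm) at which a band with the given Raman shift (cm^-1) appears."""
    return 1e7 / (1e7 / laser_nm - shift_cm)


stokes_nm = scattered_wavelength_nm(785.0, 1000.0)        # about 851.9 nm
anti_stokes_nm = scattered_wavelength_nm(785.0, -1000.0)  # about 727.9 nm
```

With 785 nm excitation, a Stokes band shifted by 1000 cm⁻¹ is recorded near 851.9 nm; the same vibration on the anti-Stokes side appears at a shorter wavelength, and `raman_shift_cm` returns a negative shift for it.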
The spectrograph usually records wavelength-shifted light in the 200–2000 cm⁻¹ range, relative to the incident laser wavelength.

Raman spectroscopy's two main disadvantages are its relatively weak signal and its sensitivity to fluorescence interference. A probe like the one in Fig. 2.4 records spontaneous Raman scattering events, which are orders of magnitude less probable than elastic scattering events, themselves much less probable than simple absorption. Numerous enhancement methods have been developed, such as Surface-Enhanced Raman Spectroscopy (SERS), but these require either extensive sample modification or much more complex instrumentation. For this reason, spontaneous Raman spectroscopy is preferred for study in process environments.

Interference from sample autofluorescence is a serious challenge facing all Raman techniques; fluorescence can easily overwhelm any Raman signal, and its spectral profile is typically broad, which impedes its suppression. Longer excitation wavelengths ($\lambda$) can reduce the intensity of sample autofluorescence, at the cost of decreased Raman intensity as well, since the scattering cross-section is proportional to $\lambda^{-4}$. Typically, the falloff in fluorescence intensity is more pronounced; longer wavelengths also reduce the chance of sample damage, so Raman spectroscopy is often conducted using red to NIR lasers (638 nm or 785 nm). [52, 53]

Raman spectroscopy provides complementary information to mid-IR spectroscopy; a comparison of the techniques is outlined in Table 2.1. As mentioned previously, IR absorption requires oscillation in the molecule's dipole moment, which interacts with the incident radiation. Raman scattering, meanwhile, requires a change in the molecule's polarizability tensor. Vibrational modes typically cause one or the other of those effects. As O–H stretching modes involve large oscillations in dipole moment but not in polarizability, Raman spectroscopy is largely insensitive to water, which is highly IR-active. [4, 54] This is particularly useful for pulp samples.

Because of this, and because of its instrumental simplicity and chemical specificity, Raman spectroscopy is generally well-suited to probing cellulosic materials. Cellulose, hemicellulose, lignin, and other components have a variety of Raman-active modes; [55] Raman spectroscopy is likely sensitive to cellulose crystallinity, and may also be sensitive to the Degree of Polymerization (DP), both of which, as previously mentioned, underlie many important mechanical and physical properties. There have been a number of studies using Raman spectroscopy to predict such properties.

Several groups have used Raman spectroscopy to probe cellulose structure, distinguishing amorphous and crystalline regions, and types of crystallinity [56–60], as well as to examine lignin content. [18, 61, 62] Agarwal et al. also used FT-Raman spectroscopy to monitor and characterize the bleaching process of spruce pulp. [63] The use of Raman spectroscopy to model and predict end-product properties of pulp samples has been relatively limited. [1, 2, 28, 64, 65] In most cases, FT-Raman spectroscopy has been used, since it improves SNR without requiring extra sample preparation. [37]

Table 2.1: Comparison of spectroscopic techniques.

Spectroscopic Technique | Mid-IR                  | NIR                             | Raman
Principle               | Change in dipole moment | Change in dipole moment         | Transient change in polarizability
Probes                  | Fundamental modes       | Overtones and combination modes | Fundamental modes
Chemical Specificity    | High                    | Low                             | High
Signal Strength         | High                    | High                            | Low
Signal Type             | Sharp peaks             | Broad bands                     | Sharp peaks
Primary Interference    | Water                   | Water                           | Fluorescence
Illumination Source     | Nernst rod              | Quartz lamp                     | Laser
Detector Type           | Thermal (HgCdTe etc.)   | Photodetector or camera         | CCD and spectrograph
Collection Mode         | Absorption              | Absorption                      | Backscattering
Sample Preparation      | Moderate                | None                            | None
On-line Applicable?     | Difficult               | Yes                             | Yes

As with the NIRS-based predictions, these works used chemometrics to construct their models.
The methods most relevant to the present work are outlined in the next section.

2.3 Chemometrics and Multivariate Analysis

Data collection is of little use without the tools to interpret the data. A typical Raman spectrum consists of over 1,000 variables, each a binned stack of pixels read out from the CCD. These variables cannot be assumed to be independent, for a number of reasons: the sensor's limited resolution, the intrinsic linewidth of Raman peaks, complex chemical environments leading to overlapping Raman and autofluorescence bands, and so on. Extracting information beyond peak assignments from such a highly covariant dataset requires more advanced processing techniques.

Chemometrics is designed to do just this. As a discipline, it consists of a suite of multivariate analysis techniques to classify, model, and predict information from complex, multivariate datasets, which are typically spectral. These techniques have proven invaluable to contemporary practical science, and are ripe for application to industrial process control. [66, 67]

2.3.1 Discrete Wavelet Transform

One of the fundamental principles of chemometrics is the elimination of uncorrelated variance; that is to say, variance in the data that is unrelated to the target or goal. Another is dimensional reduction: cutting down the amount of information that must be processed. Often these go hand in hand, and great care must be taken not to reject important information. Given that the important information is usually hidden, this can be an especially challenging step. [66, 68]

Identifying and rejecting uninteresting information requires either a priori knowledge about the signal, or that a set of assumptions be made.
In practical terms, it is generally safer to make assumptions about the nature of data based on well-established knowledge, since these can be easily generalized.

For instance, it can be assumed that a Raman spectrum contains no features finer than a certain width, regardless of the material from which the spectrum was collected. This is because of the inherent linewidth of Raman peaks, which is due primarily to inhomogeneous broadening caused by variations in sample morphology. Thus, any and all structure finer than that linewidth should be safely rejectable, as it is probably nothing more than readout noise from the CCD. Likewise, extremely broad features in a Raman spectrum can almost certainly be attributed to sample autofluorescence rather than Raman scattering, given the physical nature of those processes.

[Figure 2.5: A basic illustration of the Discrete Wavelet Transform algorithm, decomposing a source spectrum (left) into background (right, top), noise (right, bottom), and signal (right, center) components. Panels: Approximation (Background), Detail (Noise), Signal; all plotted against Raman shift (200–2000 cm⁻¹), intensity in arbitrary units.]

The Discrete Wavelet Transform (DWT) is a signal processing technique that can be used to make those rejections easily, without loss of core data. The DWT algorithm is outlined graphically in Figure 2.5. The plot on the left is a representative Raman spectrum of a pulp sample, after normalization. The DWT algorithm dilates and translates a model wavelet $\psi(t)$ across the spectral range, so that it approximates the shape of the original spectrum: [5, 28]

$$\psi_{j,k}(t) = \psi\left(\frac{t}{2^j} - k\right) \tag{2.10}$$

where $j$ is the decomposition step and $k$ is the position in the spectrum.
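A multi-level decomposition of this kind can be sketched with the Haar wavelet, the simplest orthonormal wavelet. It is swapped in here only because its filters are two taps long; the pretreatment settings reported with Fig. 2.7 name a sym5 wavelet instead. This is a minimal sketch, assuming a spectrum whose length is divisible by $2^s$ and ignoring the boundary handling that wavelet libraries provide; the function names are hypothetical. Each level splits the current approximation into detail coefficients $d_j$ and a half-length approximation, and `dwt_filter` zeroes the finest detail levels (noise) and the final approximation (background) before reconstructing.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_dwt(signal, levels):
    """Multi-level Haar DWT: returns (a_s, [d_1, ..., d_s]), finest detail first."""
    a = np.asarray(signal, dtype=float)
    if a.size % (2 ** levels):
        raise ValueError("length must be divisible by 2**levels")
    details = []
    for _ in range(levels):
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / SQRT2)  # detail coefficients d_j
        a = (even + odd) / SQRT2              # approximation, half the length
    return a, details


def haar_idwt(approx, details):
    """Invert haar_dwt, working from the coarsest level back to the finest."""
    a = np.asarray(approx, dtype=float)
    for d in reversed(details):
        out = np.empty(2 * a.size)
        out[0::2] = (a + d) / SQRT2
        out[1::2] = (a - d) / SQRT2
        a = out
    return a


def dwt_filter(signal, levels, drop_fine=2):
    """Zero d_1..d_drop_fine (noise) and a_s (background), then reconstruct."""
    approx, details = haar_dwt(signal, levels)
    kept = [np.zeros_like(d) if j < drop_fine else d for j, d in enumerate(details)]
    return haar_idwt(np.zeros_like(approx), kept)
```

Because each level halves the length, $d_1$ and $d_2$ together hold three quarters of all coefficients, which is why discarding them reduces the data's dimensionality so strongly.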
The choice of model wavelet depends on the nature of the signal; typically one chooses a wavelet that resembles or approximates the features of interest in the signal. [69] The modified wavelet is then convolved with the spectrum, decomposing it according to the following expression, with $j$ initially equal to 1:

$$f(t) = \sum_k a_{j',k}\,\varphi_{j',k}(t) + \sum_k d_{j,k}\,\psi\left(\frac{t}{2^j} - k\right) \tag{2.11}$$

where $j' = j + 1$ and $\varphi(t)$ is the scaling function onto which the spectrum is projected. This generates two sets of wavelet coefficients: the detail coefficients $d_{j,k}$, containing the fine noise around the same frequency as the model wavelet, and the approximation coefficients $a_{j',k}$, containing the remainder of the spectral information after the detail has been removed. This process is re-applied to the approximation coefficients a preset number of times (termed the scale), with ascending model wavelet dilation. When the desired scale $s$ is reached, the result is one set of approximation coefficients representing the signal background, and $s$ sets of detail coefficients. [5]

Typically the first one or two sets of detail coefficients, $d_{1,k}$ and $d_{2,k}$, are discarded, as they contain fine noise but no useful information; an example can be seen on the bottom right of Fig. 2.5. Likewise, the approximation coefficients $a_{s,k}$ are discarded due to lack of useful information; this is illustrated on the top right of Fig. 2.5. [5, 28, 70]

The algorithm can then recombine the remaining detail coefficients to reconstruct the filtered signal (Fig. 2.5, center right). It can also simply output the remaining detail coefficients; this is advantageous due to the reductive nature of the decomposition in Eq. 2.11. Each time it is applied, it essentially cuts the length of the data in half, as the filter downsamples the data by a factor of two (represented by $2^{-j}$ in Eq. 2.10). Rejecting the first two sets of detail coefficients thus represents a significant decrease in the data's dimensionality. [5]

DWT holds a few advantages over other filtering methods.
Smoothing filters such as the Savitzky-Golay or moving average filters cause an unacceptable amount of signal loss, as they operate indiscriminately, and the removed information is not preserved. The Fourier Transform (FT) is another commonly applied denoising technique, but unlike DWT it does not preserve any positional information. Thus, it can be challenging to determine whether the frequencies being removed contain useful information, which may be expressed through localization along the spectrum (e.g., sharp peaks). [69]

With a substantially reduced dataset of wavelet coefficients, free from most uncorrelated variance, it becomes much easier to accurately model the parameters that the spectral dataset will hopefully predict.

2.3.2 Partial Least-Squares Regression

Partial Least-Squares Regression analysis, also termed “Projection to Latent Structures”, is one of the primary chemometric methods used herein. The algorithm was initially developed for basic statistical analysis. In its simplest form, it is termed Principal Component Analysis (PCA), and it serves to classify a single data block by uncovering hidden variance. PCA has been elaborated upon and adapted for use in chemistry, engineering, social science, and other fields. When analyzing more than one data block at once, the algorithm is referred to as PLS. [66, 71] PLS is used to build regression models based on hidden covariance between several data blocks; in this work, between spectral data and target properties.

Generally speaking, a linear regression takes the familiar form of $y = ax + b$. When dealing with matrices rather than scalars, this can be generalized as follows:

$$Y = Xb + \varepsilon \tag{2.12}$$

$X$ is an $(m \times n)$ data matrix, and $Y$ is an $(m \times 1)$ target vector, consisting of known, measured values associated with each row of $X$.³ $b$ then represents an $(n \times 1)$ vector of regression coefficients, analogous to $a$ in the linear model, weighting each of the $n$ variables according to its covariance with $Y$.
$\varepsilon$ is the error term.

The PLS algorithm projects $X$ onto an $A$-dimensional orthogonal basis set, whose components are linearly uncorrelated. These are denoted scores. The X-scores ($T$) are estimated from weighted linear combinations of the original variables; the weights matrix is expressed as $W$. This can be represented as follows:

$$T = XW\left(P^\top W\right)^{-1}, \qquad X = TP^\top + E, \qquad Y = UQ^\top + F \tag{2.13}$$

³ It should be noted that the target $Y$ could in fact be sized $(m \times p)$, containing $p$ target properties, but in this work only one is evaluated at a time.

[Figure 2.6: Schematic of the Partial Least-Squares Regression algorithm, shown in Eqs. 2.13 and 2.14. Blocks: X, X', Y (lab), Y (pred); matrices T, P, W, E, U, Q, F, and b.]

The scores $T$, $U$ are estimates of the latent structures within the original data, $X$ and $Y$ respectively. Likewise, the loadings $P$, $Q$ are reduced representations of the original data, such that the error terms $E$, $F$ are minimal. The X-score vectors $t_a$ are predictors of $Y$, and the X-loading vectors $p_a$ are the eigenvectors of the covariance matrix $X^\top X$. Likewise, the weight vectors $w_a$ are the eigenvectors of the combined covariance matrix $X^\top Y\,Y^\top X$. [66, 71] Whereas the weights are orthonormal and the scores are orthogonal, the loadings are not orthogonal (or orthonormal). The regression itself can be expressed as: [7, 71, 72]

$$Y = Xb + \varepsilon, \qquad b = W\left(P^\top W\right)^{-1} Q^\top \;\Rightarrow\; Y = X\,W\left(P^\top W\right)^{-1} Q^\top + \varepsilon \tag{2.14}$$

Figure 2.6 shows a schematic diagram of the PLS regression, as represented in the previous equations. The $X$ and $Y$ data blocks are decomposed into scores and loadings, as per Eq. 2.13; these are then used to calculate $b$, which is used to predict $Y$ values for new data $X'$, as per Eq. 2.14. The red box around the $X$ data indicates the limited extent of the PCA algorithm in comparison to PLS.

Determining the Y-weights, represented as $W(P^\top W)^{-1}$, is mathematically challenging due to the matrix inversion of the loadings-weights product. For this reason, algorithms typically estimate the regression coefficients with the following expressions, involving singular value decomposition of the covariance matrix: [7, 72]

$$b \approx (SQ)\,Q^\top, \qquad \text{where } S = X^\top Y \text{ and } Q = \text{dominant right eigenvector of } S^\top S \tag{2.15}$$

[Figure 2.7: Randomized target vector plots for PLS models built with Raman spectra of dissolving pulps. Left: Overfit model; A = 10. Right: Well-fit model; A = 4. These results are discussed in Ch. 5. Each panel plots R2Y(cum) and Q2(cum) against the correlation coefficient between randomized and original target vectors for 100 permutations (data pretreatment: range normalized, symmetric DWT sym5, 7 levels, reconstructed by excluding A[7]). Viscosity intercepts: left, R2Y(cum) = 1.00, Q2(cum) = 0.90; right, R2Y(cum) = 0.19, Q2(cum) = 0.040.]

The algorithm is repeated $A$ times, accounting for $A$ latent structures in the data; these are called components, or latent variables. This is represented by the number of columns in the score matrix $T$ and in the loading matrix $P$. The general assumption underlying the PLS algorithm is that $A$ should be small; that is to say, that there are few latent variables that account for the covariance between $X$ and $Y$. [71] Determining $A$ is a very important step, as it represents a trade-off between model under- and over-fitting. Underfit models do not incorporate enough information to accurately predict covariance between $X$ and $Y$: they are highly biased. Overfit models incorporate so much information that they cannot generalize to new data: they incorporate too much variance.

Figure 2.7 is a simple illustration of overfitting.
The PLS models were calculated using 10 (left) and 4 (right) components, and the plots were produced by randomizing the target vector $Y$ and calculating statistics for these randomized models. The x-axis shows the correlation coefficient between the randomized and original target vectors. Red points show the cumulative $R^2Y$ values for each randomized model, representing the explained variance in $Y$; the red horizontal line at $y = 0.35$ indicates a fitness threshold. The cumulative $R^2Y$ value for a given number of components $A$ can be calculated as:

$$R^2Y = 1 - \frac{\sum_{A} \left(Y - \bar{y}\right)^2}{\sum_{A'} \left(Y - \bar{y}\right)^2} \tag{2.16}$$

where $\bar{y}$ is the mean value of $Y$, and $A'$ is the total number of components, usually 20. In Fig. 2.7, $A$ is 10 (left) or 4 (right).

Blue points in Fig. 2.7 show the cumulative $Q^2$ values, representing the predictive ability of the model as determined through cross-validation; the blue horizontal line at $y = 0.05$ is also a fitness threshold. The cumulative $Q^2$ value for a given number of components $A$ can be calculated as:

$$Q^2 = 1 - \prod^{A} \left( \frac{\sum_{A} \left(Y - Y_{pred}\right)^2}{\sum_{A-1} \left(Y - \bar{y}\right)^2} \right) \tag{2.17}$$

where $Y_{pred}$ are the model predictions for $Y$.

In a well-fit model (Fig. 2.7, right), the $R^2Y$ and $Q^2$ values should be much lower in the randomized models than in the original model, as the randomization step should minimize the covariance between $X$ and $Y$. Thus, $R^2Y$ and $Q^2$ values that do not change (or that increase) after randomization indicate that the PLS model is incorporating too much uncorrelated variance.

Kalivas and Palmer describe a method for algorithmically optimizing the selection of $A$. They proposed the following parameter: [73]

$$C_1 = \frac{\left\|\hat{b}_A\right\| - \left\|\hat{b}\right\|_{min}}{\left\|\hat{b}\right\|_{max} - \left\|\hat{b}\right\|_{min}} + \frac{RMSEC_A - RMSEC_{min}}{RMSEC_{max} - RMSEC_{min}} \tag{2.18}$$

where $\|\hat{b}\|$ is the vector two-norm of the regression coefficient vector $b$, and $A$ is the selected number of components with which the model is constructed. Thus, a list of $C_1$ values is calculated, one for each value of $A$, and the minimum is selected, determining the optimal number of components to use.
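Given the coefficient norms and RMSEC values of a series of models, Eq. 2.18 reduces to the sum of two range-scaled terms. The sketch below assumes those values have already been computed for $A = 1, 2, \ldots$ and are passed in that order; the function name and the example numbers are hypothetical.

```python
import numpy as np


def kalivas_c1(coef_norms, rmsec):
    """C1 criterion (Eq. 2.18) for models with A = 1, 2, ... components.

    coef_norms[i] is ||b_hat|| and rmsec[i] is RMSEC for the model with A = i + 1.
    Returns the array of C1 values and the A that minimizes C1.
    """
    b = np.asarray(coef_norms, dtype=float)
    e = np.asarray(rmsec, dtype=float)
    b_scaled = (b - b.min()) / (b.max() - b.min())  # range-scaled model size
    e_scaled = (e - e.min()) / (e.max() - e.min())  # range-scaled calibration error
    c1 = b_scaled + e_scaled
    return c1, int(np.argmin(c1)) + 1


# Hypothetical sequence: ||b|| grows with A while RMSEC levels off.
norms = [1.0, 2.0, 3.0, 5.0, 9.0]
errors = [10.0, 6.0, 4.0, 3.5, 3.4]
c1, best_a = kalivas_c1(norms, errors)  # best_a == 3 for these inputs
```

Because both terms are scaled to [0, 1], the criterion balances model complexity (a growing coefficient norm) against calibration error; it is undefined if either quantity is constant across all candidate models.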
Both Kalivas' algorithm and the statistical method described previously (see Fig. 2.7) require that a number of models be built, with A typically ranging from 1 to 20.

With an appropriate number of components, a PLS model can be constructed with confidence. Typically, models are trained with a subset of the input data, with the remainder left out for model validation. There are a number of cross-validation strategies for partitioning the dataset into training and validation sets, such as leave-one-out cross-validation and Monte Carlo cross-validation. [5, 71, 74, 75] To avoid selection bias, cross-validation steps can be repeated until each data object has been excluded at least once. This provides statistics, such as the aforementioned Q², that give insight into the model's quality.

Another important measure of model quality is the Root-Mean-Square Error (RMSE), which represents the average accuracy of the model's predictions. It can be calculated as:

$$\mathrm{RMSE} = \sqrt{\frac{\sum (\mathbf{Y}_{\mathrm{pred}} - \mathbf{Y})^2}{n}} \tag{2.19}$$

where n is the number of predicted objects, and Y and Y_pred are as previously described. The Root-Mean-Square Error of Calibration (RMSEC) and the Root-Mean-Square Error of Prediction (RMSEP) differ only in whether the values of Y are drawn from the calibration or the validation set, respectively.

The best way to assess the predictive ability of a cross-validated PLS model is to feed it new data for which the target value is unknown or withheld. This simulates the real-world application of the model. Of course, a PLS model in a process control setting is fed with data nearly constantly; ensuring the long-term stability of such a model requires some upkeep.
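Eq. 2.19 and the calibration/validation distinction can be illustrated with a single random 70/30 partition, i.e. one Monte Carlo cross-validation draw. In this sketch, ordinary least squares via `np.linalg.lstsq` stands in for a PLS model purely to keep it short; the function names and synthetic data are hypothetical.

```python
import numpy as np


def rmse(y_pred, y_true):
    """Root-mean-square error (Eq. 2.19)."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


def random_split(n, val_fraction=0.3, rng=None):
    """One Monte Carlo partition: returns (calibration, validation) index arrays."""
    rng = np.random.default_rng(rng)
    order = rng.permutation(n)
    n_val = int(round(val_fraction * n))
    return order[n_val:], order[:n_val]


# Synthetic example; least squares stands in for a PLS model here
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 3))])
y = X @ np.array([500.0, 10.0, -5.0, 2.0]) + rng.normal(scale=20.0, size=100)
cal, val = random_split(len(y), val_fraction=0.3, rng=rng)
coef = np.linalg.lstsq(X[cal], y[cal], rcond=None)[0]
rmsec = rmse(X[cal] @ coef, y[cal])  # error on the calibration set
rmsep = rmse(X[val] @ coef, y[val])  # error on the held-out validation set
print(f"RMSEC = {rmsec:.1f}, RMSEP = {rmsep:.1f}")
```

Repeating `random_split` with fresh draws and collecting RMSEP each time is the Monte Carlo cross-validation strategy mentioned above.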
A deployed model should periodically be revalidated with new and historic data to ensure its continued accuracy.

Thus, even though implementing automated process control systems as part of a transition to "smart manufacturing" will obviate the man-hours currently spent on quality control procedures, these new systems will create new opportunities, and the reduction in process down-time will free up resources that can be reinvested in research and development to further optimize the production process.

Chapter 3
Modelling Dissolving Pulp Viscosity

As a laboratory-scale proof-of-concept demonstrating the application of Raman spectroscopy and multivariate analysis as a process control method in the pulp industry, we built a set of calibration models predicting the viscosity of dissolving pulp samples provided by Domsjö Fabriker AB (Örnsköldsvik, Sweden), a sulfite mill.

3.1 Introduction

Viscosity is an ideal parameter to model, as it is a key quality measure of high-value dissolving pulp. It depends on the molecular weight of the pulp's component cellulosic fibers, and on the strength of inter-fiber interactions within the pulp. These in turn depend on the Degree of Polymerization (DP) and the hemicellulose fraction, respectively. Since the presence of lignin within the pulp seriously affects inter-fiber bonding, and thus viscosity, dissolving pulps are bleached to remove any residual lignin before they are derivatized. Mills producing dissolving pulps monitor the viscosity of their in-process product using wet-chemical analytical procedures. Standard practice involves dissolving the in-process pulp in 0.5 M cupriethylenediamine and measuring the solution's viscosity with a capillary viscometer. [12, 23, 24]

Needless to say, this is an expensive and labor-intensive analysis method, and its long timescale prohibits it from being implemented in an on-line quality control setting.
Raman spectroscopy, however, is sensitive to chemical characteristics of pulp, including hemicellulose content and cellulose end-group concentrations, the latter of which is related to DP. Since these are determining factors for dissolving pulp viscosity, it stands to reason that calibration models trained with spectroscopic data could accurately predict the viscosity of pulp samples.

A previous student in our group conducted a preliminary set of measurements and modelling on a subset of our dissolving pulp samples as part of her M.Sc. thesis. [1] Since her graduation, Domsjö has sent us more samples, along with their attendant viscosity data. We combined these datasets to construct the models discussed herein.

3.2 Materials and Methods

We collected Raman spectra of the dissolving pulp samples using a five-axis translation/rotation/tilt mount that coupled a backscattering Raman spectrometer with a chamber designed to hold dried dissolving pulp samples (called brightness pads; see Ch. 4 for more information about these and the chamber). [1] The sample set consists of 236 bleached and dried dissolving pulp sheets, with viscosity values measured by Domsjö's on-site laboratory, which also measured the R18 value of each sample.

The sample mount uses a motor and pulley system to rotate the samples, and the spectrometer's laser is focused off-center relative to this rotation; this allows the spectrometer to probe a spatially-averaged area, in effect tracing a circle around the sample. Such spatial averaging minimizes the effect of sample inhomogeneity on data quality. This design works only with dried pulp sheets, which are representative of the final product at the end of the production line in a mill; these pulp sheets are routinely tested for quality assurance purposes in a mill's on-site laboratory. Details of the probe mount design can be found in Alison Bain's M.Sc. thesis.
[1]

The spectrometer we use is a backscattering free-space Raman probe manufactured by Wasatch Photonics (Model WP785L; Durham, NC, USA). The self-contained unit has an incorporated 785 nm laser, with a maximum power of 100 mW and a focal length of 32 mm. Its readout ranges from 270 to 2000 cm⁻¹, with a resolution of 10 cm⁻¹ (50 µm slit width). We performed five replicate measurements on each sample, each with an integration time of 500 ms, to further minimize the effects of spatial inhomogeneity and selection bias.

3.2.1 Data Treatment

Refer to Appendix A.1.

Dimensional reduction is a critical step in any type of multivariate calibration process, as outlined in Ch. 2.3. We take several steps to reduce the dimensionality of the Raman dataset. The first and simplest is to produce a secondary dataset by averaging the replicate data, generating a single representative Raman spectrum per pulp sheet.

We next decompose these representative spectra into multi-resolution coefficient space using the Discrete Wavelet Transform (DWT). [46, 75] We use the symlet-5 (sym5) wavelet to decompose the spectra with a scale of 7, and reject the highest- and lowest-frequency wavelets to eliminate noise and the broad fluorescence background from the data, respectively. The resulting set of wavelet coefficients is approximately one quarter the size of the original spectral dataset; in this case, the dataset is reduced from over 1,000 points to several hundred.

We also explored the application of feature selection to this reduced dataset, using the Template-Oriented Genetic Algorithm (TOGA). [5] TOGA uses a 100-generation genetic algorithm to search for highly covariant features within a dataset, using an initial probability template based on that same dataset.
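The replicate-averaging and wavelet-filtering steps described above (applied before any TOGA feature selection) can be sketched as follows. To keep the example dependency-free, it swaps in the orthonormal Haar wavelet for sym5; this is an assumption, although the multi-level structure and the idea of zeroing the approximation and the finest detail levels are the same. With PyWavelets installed, `pywt.wavedec(x, "sym5", level=7)` and `pywt.waverec(coeffs, "sym5")` would give the sym5 analogue. All function names here are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def average_replicates(spectra):
    """Average replicate spectra (one per row) into a representative spectrum."""
    return np.asarray(spectra, dtype=float).mean(axis=0)


def haar_wavedec(x, level):
    """Multi-level orthonormal Haar DWT: returns [cA_level, cD_level, ..., cD_1]."""
    approx, details = np.asarray(x, dtype=float), []
    for _ in range(level):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / SQRT2)  # detail (high-frequency) part
        approx = (even + odd) / SQRT2         # approximation (low-frequency) part
    return [approx] + details[::-1]


def haar_waverec(coeffs):
    """Inverse of haar_wavedec."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + detail) / SQRT2
        out[1::2] = (approx - detail) / SQRT2
        approx = out
    return approx


def dwt_filter(spectrum, level=7, drop_finest=1):
    """Zero the approximation (broad background) and the finest detail level(s)."""
    x = np.asarray(spectrum, dtype=float)
    pad = (-x.size) % (2 ** level)        # Haar needs a length divisible by 2**level
    coeffs = haar_wavedec(np.pad(x, (0, pad), mode="edge"), level)
    coeffs[0] = np.zeros_like(coeffs[0])            # remove fluorescence background
    for i in range(1, drop_finest + 1):
        coeffs[-i] = np.zeros_like(coeffs[-i])      # remove high-frequency noise
    return haar_waverec(coeffs)[: x.size]
```

Setting `drop_finest=2` mirrors the filtering later shown in Fig. 3.2, where the two highest detail levels and the approximation level are removed.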
TOGA's probability template is calculated using Monte Carlo-Uninformative Variable Elimination (MCUVE), which builds 400 Partial Least-Squares Regression (PLS) models with Monte Carlo-randomized target vectors and compiles their coefficient vectors to determine which variables are most likely to be selected by the genetic algorithm. This likelihood is measured by a stability parameter s:

$$s_{j,k} = \frac{\operatorname{mean}(b_{j,k})}{\operatorname{std}(b_{j,k})} \tag{3.1}$$

where b are the PLS coefficient vectors, and j, k are wavelet and position indices, as defined in Eq. 2.10. For each j, k, the mean and standard deviation are measured across the 400 randomized PLS models, and the 150 largest values of s ("highly stable" features) are weighted and passed on as the probability template p.

The genetic algorithm within TOGA itself works by minimizing a fitness function, which in this case is the Root-Mean-Square Error of Prediction (RMSEP; see Eq. 2.19) of PLS models built with 50 combinations of variables, called chromosomes, each composed of 15 features. When input into the TOGA algorithm, the probability template p directs the genetic algorithm to initially use certain "highly stable" features, as determined by MCUVE in Eq. 3.1, rather than purely random ones. After the minimization is completed, the chromosomes are permuted into a new generation: some are directly transferred (elite children, i.e. those with the best fitness values), some are mixed with other features (crossover children), and some are mutated (in other words, randomly re-selected). With the new generation of features, the optimization process is repeated. This process iterates up to 100 times, or until some arbitrary convergence criterion is reached.

Ultimately, the TOGA algorithm selects the 10 features that represent the areas of the input data most highly correlated with the target, in this case viscosity.
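The MCUVE stability parameter of Eq. 3.1, and its use as a sampling template for the genetic algorithm's first generation, can be sketched in NumPy. Several details are assumptions, not taken from the TOGA code: features are ranked by |s| (common MCUVE practice), template weights are proportional to |s| and zero outside the stable set, and the hypothetical `initial_population` helper draws each chromosome's features from that template.

```python
import numpy as np


def mcuve_template(coef_matrix, n_stable=150):
    """Stability s = mean(b) / std(b) per feature (Eq. 3.1) and a template p.

    coef_matrix is (n_models, n_features): one PLS coefficient vector per
    Monte Carlo model. Features are ranked by |s| (an assumption).
    """
    B = np.asarray(coef_matrix, dtype=float)
    s = B.mean(axis=0) / B.std(axis=0, ddof=1)
    stable = np.argsort(-np.abs(s))[:n_stable]   # most stable features first
    p = np.zeros(B.shape[1])
    p[stable] = np.abs(s[stable])                # weight by stability (assumed)
    return s, p / p.sum()


def initial_population(p, n_chromosomes=50, n_genes=15, rng=None):
    """Hypothetical helper: draw feature subsets biased towards the template."""
    rng = np.random.default_rng(rng)
    return np.array([rng.choice(p.size, size=n_genes, replace=False, p=p)
                     for _ in range(n_chromosomes)])


# Synthetic coefficients: features 0-2 are consistently non-zero, the rest are noise
rng = np.random.default_rng(0)
coefs = rng.normal(scale=0.1, size=(400, 20))
coefs[:, :3] += 1.0
s, p = mcuve_template(coefs, n_stable=3)
population = initial_population(p, n_chromosomes=5, n_genes=2, rng=1)
```

Features whose coefficients keep the same sign and size across resampled models get large |s|, so they dominate the initial chromosomes; later generations are then free to drift away from them through crossover and mutation.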
The TOGA algorithm is run ten times with randomized probability templates, and the results are collated; the features are then ranked by the number of times they are selected across runs, which reveals the most important ones.

With the reduced datasets described above, we explored two separate PLS algorithms for modelling viscosity: SIM and DCV. [72, 76] In both cases, we randomly select 30% of the data for leave-out validation, and construct models using the remaining 70%. Our SIM algorithm, adapted from de Jong, [72] constructs 1,000 independent models and averages their results. Our DCV algorithm uses a double cross-validation method, as outlined by Li et al. [76]

One of the most important steps in any PLS process is selecting an appropriate number of components. For this we used the C₁ parameter as calculated in Eq. 2.18. [73] Figure 3.1 shows representative optimization charts for the different treatment approaches.

[Figure 3.1 panels: C₁ (y-axis) plotted against the coefficient norm ‖b̂‖ (x-axis, ×10⁴). Left panel title: "PLS Factor Optimization - Best: 5". Right panel title: "PLS Factor Optimization - Best: 3".]

Figure 3.1: Factor optimization charts for DWT-treated (left) and TOGA feature-selected (right) data. The minimum value for C₁ represents the optimal number of components, 5 and 3 respectively.

On the y-axis is the C₁ parameter, and on the x-axis is the norm of the PLS coefficient matrix, ‖b̂‖. Each blue diamond represents a PLS model constructed using an iterated number of components (A); the data point with the minimum C₁ value represents a model built with the optimal number of components. For DWT-treated data, the optimal number of components was A = 5, and for TOGA-treated data it was A = 3.
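The SIM procedure described above (many models, each fitted on a random 70% of the data and checked on the remaining 30%, with predictions averaged) can be sketched as below. To stay short, the sketch uses a compact NIPALS PLS1 rather than de Jong's SIMPLS, and all names and defaults other than the 70/30 split and the 1,000 models are assumptions.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """NIPALS PLS1; returns (b, x_mean, y_mean) so that y_hat = (X - x_mean) @ b + y_mean."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)          # weight vector
        t = Xk @ w                      # score vector
        tt = t @ t
        p = Xk.T @ t / tt               # X loading
        c = (yk @ t) / tt               # y loading
        Xk = Xk - np.outer(t, p)        # deflate X
        yk = yk - c * t                 # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)  # b = W (P'W)^-1 q
    return b, x_mean, y_mean


def pls1_predict(model, X):
    b, x_mean, y_mean = model
    return (X - x_mean) @ b + y_mean


def split_ensemble(X, y, X_new, n_components, n_models=1000, val_fraction=0.3, seed=0):
    """Average predictions for X_new over models fitted on repeated random splits.

    Returns (mean prediction, std of predictions, mean RMSEP over validation sets).
    """
    rng = np.random.default_rng(seed)
    n_val = int(round(val_fraction * len(y)))
    preds, rmsep = [], []
    for _ in range(n_models):
        order = rng.permutation(len(y))
        val, cal = order[:n_val], order[n_val:]
        model = pls1_fit(X[cal], y[cal], n_components)
        err = pls1_predict(model, X[val]) - y[val]
        rmsep.append(np.sqrt(np.mean(err ** 2)))
        preds.append(pls1_predict(model, X_new))
    preds = np.array(preds)
    return preds.mean(axis=0), preds.std(axis=0), float(np.mean(rmsep))


# Synthetic usage (hypothetical data, fewer models for speed)
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
coef = rng.normal(size=8)
y = 500.0 + X @ coef + 0.01 * rng.normal(size=60)
X_new = rng.normal(size=(5, 8))
mean_pred, std_pred, mean_rmsep = split_ensemble(X, y, X_new, n_components=8, n_models=50)
```

The spread `std_pred` across re-randomized models is the same kind of diagnostic used later in this chapter to judge the stability of predictions for reserved samples.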
The lower value for TOGA-treated data is to be expected, as TOGA reduces the dimensionality of the input dataset.

3.3 Results: The Laboratory Proof-of-Concept

Figure 3.2 shows the complete spectral dataset (650 individual spectra) after routine averaging and normalization (above), and as reconstructions from the DWT wavelets (below). Excluding the two highest levels of detail coefficients is an effective, though not easily observable, denoising step, while excluding the approximation level totally removes the large fluorescence background present in the untreated data, visible as the pronounced diagonal slant. As is evident from the upper band of spectra, there is considerable variation in this fluorescence background; its exclusion from the data removes a major source of uncorrelated variance, improving the quality of the downstream PLS models.

Figure 3.2: 650 Raman spectra of dissolving pulp samples used to build PLS models. Above: Normalized spectra. Below: Spectra reconstructed from DWT coefficients, filtered by removing the highest two detail levels and the approximation level.

The three main feature-containing regions in the spectra can be assigned as follows. The low-wavenumber peaks (200-450 cm⁻¹) are generally due to skeletal bending modes in the cellulose chains. Variance in this region could indicate hemicellulose, and is also linked to amorphous cellulose. [1, 55, 59] The prominent peaks around 1000-1150 cm⁻¹ arise from C–C and C–O stretching modes, which are prevalent in cellulose chains. The peaks around 1200-1450 cm⁻¹ are related to various skeletal modes, including C–O–H and C–C–H bending. [55]

Glucose monomers polymerize to form cellulose chains through condensation, whereby two C–OH groups are united into a C–O–C glycosidic linkage. Thus, with increasing chain length (DP, and thence viscosity), the prevalence of C–OH bending modes should decrease, with a corresponding increase in C–O and C–O–C modes; these changes should be observed spectrally.
[1, 55] In her M.Sc. thesis, Alison Bain correlated various peak heights of dissolving pulp Raman spectra with the samples' viscosity. Though she met with some success, it is ultimately more appropriate to uncover such complex correlations algorithmically; by looking at the whole spectrum instead of just certain peaks, we avoid introducing human bias (what we think should be important, but may not necessarily be) into our analysis. [1]

When building the PLS models, we used a variety of data representations in addition to the two aforementioned PLS algorithms. These included wavelet coefficients, coefficient-based spectral reconstructions, feature-selected wavelet coefficients, and feature-selected spectra. An example of one such model, built using the SIM algorithm, can be seen in Fig. 3.3. The prediction plots show the results of the PLS regression fitting the covariance between a matrix of filtered DWT coefficients and a vector of viscosities measured by Domsjö for each of the 681 pulp samples. The left-hand plot shows a model built with no spectral averaging, while the right-hand plot shows a model built after spectral averaging.

Figure 3.3: Left: Prediction plot of a PLS model built with the SIM algorithm, showing 477 training points (red) and 204 validation points (green). Right: A similar model, built instead with averaged spectra, showing 159 training points (red) and 68 validation points (green). Blue diamonds in both plots indicate the predictions for a specially reserved test set of 5 spectra whose viscosities are unknown. All axes in cm³/g.

Red points on the plots show the predicted viscosity vs. the measured viscosity of the training set, a randomly-selected 70% subset of the data input. Green points show the same for the left-out independent validation set, the remaining 30% of the data input. The blue line is a least-squares line of best fit, added as a visual guide to show the general deviation from the ideal y = x (black) line.
The blue diamonds in each plot represent the predictions for a set of 5 samples whose attendant viscosity data were withheld by Domsjö, as a truly independent verification of our models. They are plotted along the y = x axis (for want of a proper "measured" value), and are shown with error bars indicating the Root-Mean-Square Error of Calibration (RMSEC).¹ In both cases, there is clear alignment along the ideal line, indicating that the models performed reasonably well.

Table 3.1 shows the results of models built using the various data representations, both with and without replicate measurement averaging. The models in Fig. 3.3 are represented on the second and sixth lines of the Spectral Reconstructions section of the table. In general, the DCV models yielded somewhat lower RMSE values than the SIM models. Using an optimized number of factors, determined with the C₁ parameter as in Eq. 2.18, yielded slightly poorer results in terms of Root-Mean-Square Error (RMSE), but exhibited much less evidence of overfitting.

We ultimately found that TOGA provided little in the way of improved model performance, while greatly increasing computational expense; in essence, it was computational overkill. RMSE values for TOGA models were slightly higher than those for DWT models, and the TOGA-SIM models built with 13 factors failed outright, as they generated near-singular covariance matrices. Even with code parallelization, the TOGA algorithm in its entirety takes several hours to execute. Thus we decided that applying TOGA to this dataset was not ultimately necessary. Were the dataset much larger (in terms of number of samples), it could be worthwhile to revisit TOGA.

The RMSE values in general indicate good predictive ability across the various model types, comparable to the measurement reproducibility of Domsjö's on-site laboratory. Though these figures are instructive insofar as model building goes, it is more useful to look at how the models perform with unknown samples if one is to gauge their performance.
Looking at the predictions generated by the models listed in Tab. 3.1, we determined that those built with spectral reconstructions yielded the most reasonable results.

¹ RMSEC was chosen for the error bars instead of RMSEP, since it best represents the model error along the x-axis, i.e. the measured values, which are unknown in this case.

Table 3.1: PLS model results for viscosity, built with dissolving pulp data. RMSEC and RMSEP in cm³/g.

| Method | Data | Algorithm | Components | RMSEC | RMSEP | Rel. Error |
|---|---|---|---|---|---|---|
| Normalized Spectra | Full Dataset | SIM | C₁ = 9 | 21.0 | 26.2 | 4.9 % |
| | | SIM | 13 | 11.4 | 23.1 | 4.9 % |
| | | DCV | C₁ = 7 | 19.9 | 24.5 | 4.8 % |
| | | DCV | 13 | 20.1 | 24.5 | 4.8 % |
| | Replicates Averaged | SIM | C₁ = 6 | 27.5 | 31.6 | 6.0 % |
| | | SIM | 13 | 6.9 | 25.1 | 4.7 % |
| | | DCV | C₁ = 6 | 19.5 | 25.5 | 4.8 % |
| | | DCV | 13 | 21.3 | 25.5 | 4.8 % |
| Spectral Reconstructions | Full Dataset | SIM | C₁ = 5 | 29.4 | 30.9 | 5.8 % |
| | | SIM | 13 | 18.6 | 29.8 | 5.6 % |
| | | DCV | C₁ = 5 | 20.6 | 27.8 | 5.3 % |
| | | DCV | 13 | 19.8 | 25.6 | 4.8 % |
| | Replicates Averaged | SIM | C₁ = 6 | 26.0 | 31.2 | 5.9 % |
| | | SIM | 13 | 10.7 | 27.2 | 5.1 % |
| | | DCV | C₁ = 6 | 20.1 | 25.7 | 4.9 % |
| | | DCV | 13 | 19.0 | 24.0 | 4.5 % |
| DWT Coefficients | Full Dataset | SIM | C₁ = 5 | 29.9 | 31.5 | 5.9 % |
| | | SIM | 13 | 18.4 | 30.6 | 5.8 % |
| | | DCV | C₁ = 5 | 19.9 | 25.3 | 4.8 % |
| | | DCV | 13 | 19.6 | 25.3 | 4.8 % |
| | Replicates Averaged | SIM | C₁ = 5 | 29.3 | 33.1 | 6.2 % |
| | | SIM | 13 | 10.8 | 31.0 | 5.8 % |
| | | DCV | C₁ = 5 | 19.2 | 24.7 | 4.7 % |
| | | DCV | 13 | 19.0 | 24.7 | 4.7 % |
| TOGA Reconstructions | Full Dataset | SIM | C₁ = 5 | 27.4 | 29.1 | 5.5 % |
| | | SIM | 13 | Model failed. | | |
| | | DCV | C₁ = 5 | 22.6 | 29.6 | 5.6 % |
| | | DCV | 13 | 22.4 | 29.6 | 5.6 % |
| | Replicates Averaged | SIM | C₁ = 5 | 32.5 | 33.1 | 6.2 % |
| | | SIM | 13 | Model failed. | | |
| | | DCV | C₁ = 3 | 20.9 | 27.2 | 5.1 % |
| | | DCV | 13 | 24.3 | 27.2 | 5.1 % |
| TOGA Coefficients | Full Dataset | SIM | C₁ = 5 | 32.0 | 32.4 | 6.1 % |
| | | SIM | 13 | Model failed. | | |
| | | DCV | C₁ = 3 | 23.7 | 31.0 | 5.8 % |
| | | DCV | 13 | 23.8 | 31.0 | 5.8 % |
| | Replicates Averaged | SIM | C₁ = 5 | 26.2 | 27.9 | 5.3 % |
| | | SIM | 13 | Model failed. | | |
| | | DCV | C₁ = 3 | 23.8 | 31.3 | 5.9 % |
| | | DCV | 13 | 24.3 | 31.4 | 5.9 % |

Even though the models built with normalized spectra (without DWT treatment) had lower RMSE values, the predictions they produced were unreasonably low given the normal range of viscosity values.
We also determined that the models built with C₁-optimized numbers of components performed much better, as they were much less overfit.

Table 3.2 summarizes the predicted viscosities for the reserved samples, calculated with the various types of spectral reconstruction-based models, and the average RMSEP values for each. The Measured Viscosity column lists the values provided to us by Domsjö, with their lab-measurement reproducibility. The All and Avg. models are those illustrated in Fig. 3.3. For the prediction columns, the error listed is RMSEC. The right-most column shows the values predicted with the DCV algorithm; the rest show the results of various SIM algorithms. We found that the DCV algorithm produced poorer predictions for these reserved samples, so we focused on developing our application of the SIM algorithm.

Table 3.2: Viscosity predictions for reserved samples from Domsjö Fabriker AB. All values in cm³/g.

| Sample Name | Measured Viscosity | SIM: All | SIM: Avg. | SIM: Random | SIM: Split 1 | SIM: Split 2 | DCV |
|---|---|---|---|---|---|---|---|
| A - D1747 | 505 | 498.8 | 507.6 | 508.2 | 511.6 | 508.2 | 527.4 |
| B - D1744 | 540 | 484.7 | 477.0 | 476.8 | 487.5 | 472.3 | 504.7 |
| C - D1727 | 516 | 504.6 | 493.7 | 494.0 | 495.8 | 501.0 | 508.3 |
| D - D1721 | 573 | 587.7 | 594.1 | 594.8 | 575.3 | 589.3 | 553.2 |
| E - D1637 | 447 | 459.1 | 435.9 | 436.4 | 438.4 | 443.5 | 461.5 |
| Error | 25 | 20.1 | 13.2 | 13.2 | 10.8 | 12.2 | 29.7 |

To try to minimize potential overfitting, we calculated 50 re-randomized PLS-SIM models (each with a unique validation set) based on the averaged spectral data, and averaged the results. If the results from these re-randomizations were highly dissimilar despite having narrow RMSEC error bars, it would indicate that the models are unreliable. However, near-identical results would also be problematic, indicating that the models are overfit. The standard deviations of the predictions for the reserved samples were in the 0.38-0.45 range, indicating that the models likely performed well. Averaged predictions from the re-randomizations are listed in Tab.
3.2 under the Random column.

To further validate our results, we split the dataset into two independent halves of equal size. Once again, the predictions for the reserved samples were compared for discrepancy; these are listed in Tab. 3.2 under the Split 1 and Split 2 columns. Agreement between these results, i.e. predictions that fall within each other's margins of error, would indicate that the models are robust. This is, in fact, what we observed.

3.4 Discussion

One thing that becomes immediately apparent upon examining Fig. 3.3 is the concentration of samples with viscosities between 500 and 560 cm³/g. This has the effect of skewing the models in favor of samples within that range, at the expense of samples outside it; in other words, the models would have difficulty predicting the viscosities of outlying samples, which would be all the more important given the goal of implementing these models in a process monitoring environment.

[Figure 3.4 panels: Left, "PLS Predictions (For Dried Pads)": target value vs. sample number, showing true response and PLS predictions. Right, "PLS Error Plot (For Dried Pads)": relative error (%) vs. sample number, with target value on a secondary axis.]

Figure 3.4: Left: Distribution plot for PLS-predicted viscosities, showing measured (red) and predicted (blue) values. Right: Error plot for the same model, ordered by value.

This effect is demonstrated in Fig. 3.4. The left-hand figure is a recasting of the averaged model in Fig. 3.3 (right). It is instructive in that it shows the distribution of viscosity values, measured and predicted, clearly illustrating the concentration of samples in the 500-560 cm³/g range and the relative paucity of "extreme" samples. The right-hand figure shows the relative error for each of these predictions, and how the predictions for the "extreme" samples are much less accurate; in some cases, the relative error is several times higher than that for "normal" samples. The blue line is a quadratic best-fit line, added as a visual guide to the trend.

The best remedy for this issue would be to collect more samples with outlying viscosities. However, here we are subject to some practical limitations. We cannot ask a mill to purposefully produce product they would consider to be "bad"; since this purpose-made pulp would not be marketable, it would incur unacceptable financial losses on their part. So, we have no choice but to accept the samples they provide. We can, however, request that they limit the number of 500-560 cm³/g samples they send, in order to provide a more even distribution of viscosity values.

Another issue lies in the RMSE itself. As previously noted, the RMSE values are comparable to the mill's reported reproducibility. But since all the training data necessarily incorporate this measurement variance, it gets folded into our PLS models. The result is that our models cannot be expected to predict viscosity more accurately than the mill can measure it. However, the Error row in Tab. 3.2 seems to contradict this: the SIM errors (10.8-20.1 cm³/g) fall below the reported reproducibility of 25 cm³/g. This suggests some degree of self-consistency in our models, calling into question their reliability when dealing with new data. Again, the best remedy for this problem is to process more samples.

3.5 Conclusion

Overall, our laboratory proof-of-concept for predicting the viscosity of dissolving pulps based on their Raman spectra was successful, yielding results whose margins of error were comparable to those produced by the mill's on-site laboratory.

Ultimately, the PLS algorithm works best when the number of rows in the X matrix exceeds the number of columns. Even after dimensional reduction cuts down the number of columns, the sample set of 236 is quite limited in number, especially considering the volume at which the mill operates. At some point, acquiring new samples becomes a logistical challenge, due to the disconnect between the mill and the laboratory environment.
Therefore, our next steps would be to begin integrating our hardware with the mill's production environment, to facilitate the collection and processing of samples, and to progress towards the goal of full integration with the process line.

Chapter 4
Incorporating Raman Spectroscopy into an On-Line Analysis Module: A Laboratory Model

Like most mill towns, Örnsköldsvik, in northern Sweden, was dominated by a single company, in this case Mo och Domsjö (MoDo) AB. And like many mill-town monopolies, this company was broken up for economic reasons after over a century of operations. One result of this breakup was the transformation of one of its mill sites into something of an industrial research and development campus, involving the still-operating sulfite pulp mill, various spin-offs and start-ups, and local universities. These include Domsjö Fabriker AB, our source of dissolving pulp samples, and PulpEye AB, which develops automated testing equipment for use in mills. A Raman probe would be a logical addition to such equipment.

4.1 Introduction

A persistent challenge in moving technology and expertise from academia into industry is the need to package things in an accessible manner. Making such a transfer within an established infrastructure, if at all possible, greatly facilitates the process. We recognized PulpEye's on-line pulp analysis unit as a perfect example of such "established infrastructure," insofar as it is already designed to be deployed in a mill (and in fact can be found in dozens of mills worldwide). The PulpEye unit can be plumbed into the product line, sampling pulp suspensions at a given interval, and can also analyze manually-introduced sample suspensions.

Furthermore, the unit presents an easy spot for the integration of a Raman probe. It has a modular design, and its components and capabilities are added per the needs of the customer.
Their brightness-measuring module, specifically, is an ideal location for a Raman probe, thanks both to its design and to the measurement process it employs.

Figure 4.1: A brightness chamber.

In essence, the brightness module takes in an aqueous pulp suspension and dries it with compressed air, forming a circular pad approximately 2 mm thick. The brightness measurement is made on the top of this pad; if there is any moisture left in the pad, the measurement of the pad's reflectivity (and thus its brightness) cannot be considered reliable. Likewise, pads made in the module should be superficially uniform, so as to ensure repeatability. The transparent plastic sleeve (visible in Fig. 4.1, though covered in condensation) allows users to observe this entire process. Introducing a Raman probe to this module would require basic modification of the plastic sleeve, as well as the design of a mounting platform that couples the probe with the chamber; this is detailed in the next section.

The general goal of integrating Raman measurement with a PulpEye analysis module is illustrated in Figure 4.2. PulpEye units generate large quantities of data, which are then fed into a database to be analyzed in real time by a software suite incorporating chemometrics and process analysis (ExtractEye; Extract Information AB, Norrköping, Sweden).

[Figure 4.2 schematic, with blocks labelled: Product Stream; PulpEye Module; Data from Established Methods; Raman Data; ExtractEye; "Calculated" Properties; Modeled Properties; DWT+PLS; Classification.]

Figure 4.2: Schematic showing the integration of Raman measurements with a PulpEye analysis module.

In the words of PulpEye's C.E.O., PulpEye and ExtractEye together calculate (i.e. model and predict) many important fiber properties in an automated setting.
Given the similarity between this process and modelling properties based on spectral data, setting up a parallel Raman analysis system would be straightforward.

4.2 A Benchtop Brightness-Raman Chamber: Design and Development

Lacking uninterrupted access to a full PulpEye unit in our laboratory, we built a fully functional benchtop version of the brightness chamber (though without the brightness-measuring camera), and designed a Raman probe system around it. We tested this design by drawing upon our library of pulp samples to make brightness pads for Raman measurement.

We continued to use the backscattering free-space Raman probe with integrated objective optics, manufactured by Wasatch Photonics (WP785L), mentioned in Ch. 3.2. To recall, its incorporated 785 nm laser has a maximum power of 100 mW, and its readout ranges from 270 to 2000 cm⁻¹ with a resolution of 10 cm⁻¹ through its 50 µm entrance slit. The laser's spot size is approximately 250 µm in diameter, providing a sampling area approximately 675 µm in diameter (including diffuse scattering).

Figure 4.3: Left: CAD rendering of the Raman probe mount for the brightness module. Right: The probe mount, as constructed.

One of the challenges facing the construction of an appropriate mounting platform for the spectrometer was the requirement that it fit inside a PulpEye cabinet. The spectrometer itself is 12.7 cm long, and taking into account its working distance, the unit requires 16 cm of clearance. Because the spectrometer's objective is off-center, we designed the mounting platform to couple it to the brightness chamber at an angle. To facilitate alignment of the probe, we mounted it on a 5-axis translation/rotation/tilt stage. We designed the platform to be strong enough to hold the combined weight of the spectrometer and stage (several kilograms) with a single point of attachment at the brightness chamber's steel drain.

Figure 4.3 illustrates the design of the mounting platform and its attachment to the brightness chamber.
The left-hand image is a rendering, showing the mounting angle and side plate for dry pad examination. The right-hand image shows the platform assembled on the benchtop, with the brightness chamber holding a dry dissolving pulp pad.

Whereas previously we had probed the surface of dissolving pulp samples with our spectrometer, in this instance it was more practical to probe the side of the brightness pads, though as can be seen in Fig. 4.3 (left), we also designed a side plate for probing the surface of dried brightness pads. This would provide us with a basis of comparison between our previous methodology (see Ch. 3.2) and the new system. We also reasoned that the sides of brightness pads may be less subject to the morphological alterations that occur when the surface of the dried pulp sheets is formed; these processes typically involve heavy mechanical compression. Thus, the sides might provide a "purer" look at the bulk morphology of the fibers. A rendering of the modified sleeve is shown in Figure 4.4.

[Figure 4.4 drawing labels: Pulp sample; Sapphire window; Spectrometer; 785 nm; scale bars of 8 mm (inset scale) and 32 mm (full scale).]

Figure 4.4: Drawing of the modified brightness chamber.

As organic polymers, such as the plastic of the sleeve, often have strong and highly characteristic Raman spectra, we modified the transparent plastic sleeve to incorporate a sapphire (α-Al₂O₃) window. Sapphire is preferred over glass due to its hardness and exceptional resistance to scratching, as well as its very broad optical transparency. Sapphire has a number of Raman peaks in the 200-800 cm⁻¹ range, some of which overlap with peaks in pulp spectra (recall the latter from Fig. 3.2). [55, 77] Focusing the laser spot within the sample, so that it is out of focus when passing through the sapphire window as in Fig. 4.4, minimizes the sapphire signal intensity (though it cannot entirely eliminate it).
Furthermore, since the sapphire signal should be constant regardless of the sample, it should be easily rejected by chemometric techniques.

Incorporating this window into the brightness chamber presented a number of challenges. The first and most important was maintaining the chamber's watertightness; since the chamber is designed to handle aqueous suspensions of pulp, any leakage would severely limit the repeatability and reliability of measurements, not to mention cause catastrophic damage to sensitive electronics should it come into contact with the spectrometer. The best way to maximize watertightness is to minimize the number of possible escape points, as well as their area.

With this in mind, we designed the modification as shown in Fig. 4.4. At the end of a wider channel, we bored a 1 mm diameter hole through the end of the plastic sleeve for the laser to pass through. We then slid the sapphire window into the channel (Fig. 4.4, blue), and sealed it in place with an O-ring and a brass locking ring; thus, the laser could pass relatively uninterrupted to the sample, while the small aperture and the O-ring minimize water leakage. Another challenge with the design was the 1 mm aperture itself; specifically, suspended pulp fibers could become stuck inside it or, worse, between the plastic sleeve and the window (visible as white space in Fig. 4.4). Fortunately, this did not become an issue in practice, due to the module's mode of operation (detailed in the next section).

Of further concern was the position of the aperture relative to the pad-forming screen, that is, where the pulp samples would rest. In a normal brightness chamber, this screen sits well below the lip of the steel base, rendering the sample invisible to the Raman probe.
We built an insert that elevated the screen flush with the window, but this changed the hydrodynamics inside the chamber, requiring minor adjustments to the mode of operation.

With our modified brightness chamber and the remaining components lent to us by PulpEye, we built a functional benchtop version of the full brightness module, shown in Figure 4.5. The module uses several pneumatically controlled switches and a vacuum pump to introduce air, water, and pressure to the chamber during the course of operations. We added a funnel to the upper part of the module for sample introduction. We used a Teflon ring inserted into the chamber to align the spectrometer through the aperture; Teflon (polytetrafluoroethylene), like most polymers, has a very distinctive Raman spectrum, with several well-defined peaks ranging from 200 to 1400 cm⁻¹, covering much of the same range as pulp spectra. [78] Therefore, it makes an ideal standard for aligning Raman spectrometers, as well as for comparing the calibrations of multiple spectrometers (see Ch. 7.3).

Figure 4.5: Our Raman-equipped model brightness module: the “mini-pulp factory”.

4.3 Materials and Methods

After constructing the benchtop brightness module and mounting the Raman probe as in Fig. 4.5, we performed several tests to optimize the instrumentation using 236 dry dissolving pulp brightness pads, analyzed previously (see Ch. 3.2). We also analyzed 43 never-dried Northern Bleached Softwood Kraft (NBSK) pulp samples, refined to various energies, provided by Canfor. We built prediction models from these samples to assess the modified module’s performance with both pulp types, using the results discussed in Ch. 3 as a baseline for comparison.

4.3.1 Pad Formation and Analysis

We rehydrate 15 g portions of the dissolving pulp sheets overnight, before dispersing them in 1 L of water. This requires 10 minutes of vigorous stirring.
The NBSK samples require no rehydration, as they are originally clumps of wet pulp, but are similarly dispersed to form a smooth aqueous suspension. We then pour 300 mL of the dispersed sample into the brightness chamber and vacuum-aspirate the suspension through a screen until it forms a sludge. We then apply a compressed air flow over 60 seconds, while maintaining vacuum, to dry the sludge into a brightness pad.¹ At this point, we collect a Raman spectrum with an integration time of 500 ms, and proceed to redisperse the pulp within the chamber. We repeat the pad formation and Raman collection cycle three times for each of three 50 mL dispersed samples per pulp sheet, collecting 9 spectra in total per sheet.

4.3.2 Data Treatment

Refer to Appendix A.1.

Treatment of the spectral data proceeded in a similar fashion to what was outlined in Ch. 3.2, but without applying TOGA. As before, we applied the Discrete Wavelet Transform (DWT) to the spectra using the sym5 wavelet with a scale of 7, rejecting the highest- and lowest-frequency wavelets. Using Partial Least-Squares Regression (PLS), we modelled viscosity for the dissolving pulps, and tensile breaking length, burst index, and tear index for the NBSK pulps. We also compared PLS models built using MATLAB and ExtractEye.

4.3.3 Gaussian Process Regression

In addition to building models based on PLS regression, we investigated the use of Gaussian Process Regression (GPR) for model building. Recalling the basic form of a regression from Eq. 2.12 ($Y = Xb + \varepsilon$), we can consider the error term $\varepsilon$ to be normally distributed with mean 0 and variance $\sigma^2$, such that $\varepsilon \sim \mathcal{N}(0, \sigma^2)$. PLS is a deterministic method, as it numerically estimates $b$ and $\varepsilon$ from the data $X, Y$ (refer to Ch. 2.3.2).

GPR, on the other hand, is a probabilistic method that estimates a set of $n$ random latent variables $F = f(x_i)$, as well as a set of $n$ basis functions $H = h(x_i^\top)$, for each of the $n$ observations in $X$.
Thus, the GPR regression takes the following general form:

$$Y = Hb + F(X) \tag{4.1}$$

¹ Normal brightness operations require much longer drying periods, on the order of 5-10 minutes.

The basis functions $H$ are explicitly chosen at the outset; we used a linear basis, such that $H = [1, X]$. Like $\varepsilon$, the latent variables $F(X)$ are normally distributed with mean 0, but their variance is represented by a covariance matrix $K(X, X \mid \theta)$, such that $F \sim \mathcal{N}(0, K(X, X \mid \theta))$. The elements of $K$ represent the covariance between each unique pair of observations in $X$, and $\theta$ is a set of hyperparameters that define the kernel function used to construct $K$. We used a squared exponential kernel function for our covariance matrix:

$$k(x_i, x_j \mid \theta) = \sigma_f^2 \exp\left[-\frac{1}{2}\,\frac{(x_i - x_j)^\top (x_i - x_j)}{\sigma_l^2}\right] \tag{4.2}$$

where $\sigma_f$ is the signal standard deviation and $\sigma_l$ is the characteristic length scale. Since both of these parameters must be greater than 0, we can express the hyperparameters for this kernel function as $\theta = [\log \sigma_f, \log \sigma_l]$. From the basic regression expression, a set of target variables $Y$ can be modeled with the following probability distribution:

$$P(Y \mid F, X) \sim \mathcal{N}(Y \mid Hb + F, \sigma^2 I_n) \tag{4.3}$$

where $\sigma^2$ is the noise variance and $I_n$ is the $n \times n$ identity matrix. At this stage it is useful to define the following parameter $\alpha$, representing the total covariance of $Y$ about $Hb$ (that of the latent variables $F$ plus the noise):

$$\alpha \equiv K(X, X \mid \theta) + \sigma^2 I_n \tag{4.4}$$

Building a GPR model requires knowledge of $b$ and $\sigma^2$, as well as a kernel function for the construction of $K$, which itself depends on $\theta$. These parameters can be estimated from the training data. The GPR algorithm first maximizes a log-likelihood function for $P(Y \mid X)$, dependent on $b(\theta, \sigma^2)$, using initial estimates for $\theta$ and $\sigma^2$. It then re-maximizes the log-likelihood function using the new estimates for $b$, and finds new estimates for $\theta$ and $\sigma^2$. The log-likelihood expression is: [68]

$$\log P(Y \mid X, b, \theta, \sigma^2) = -\frac{1}{2}(Y - Hb)^\top \alpha^{-1} (Y - Hb) - \frac{1}{2}\log|\alpha| - \frac{n}{2}\log 2\pi \tag{4.5}$$

With $b$, $\sigma^2$, and $\theta$ estimated, the latent variables $F$ in Eq. 4.3 are well-defined.
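To make Eqs. 4.2-4.5, and the prediction step of Eqs. 4.6 and 4.7 that follows, more concrete, here is a minimal NumPy sketch. It is not the software used in this work. It assumes the hyperparameters and the noise variance $\sigma^2$ are already fixed (the alternating maximization described above is omitted), estimates $b$ by generalized least squares for those fixed values, and takes $\sigma_a^2$ in Eq. 4.7 to be the same $\sigma^2$. All function names are hypothetical.

```python
import numpy as np


def se_kernel(A, B, sigma_f, sigma_l):
    """Squared exponential kernel (Eq. 4.2) between every row of A and every row of B."""
    # Pairwise squared distances (x_i - x_j)^T (x_i - x_j)
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    sq = np.maximum(sq, 0.0)  # clip tiny negative values caused by round-off
    return sigma_f**2 * np.exp(-0.5 * sq / sigma_l**2)


def linear_basis(X):
    """Linear basis H = [1, X]: a column of ones followed by the variables."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def fit_gpr(X, y, sigma_f, sigma_l, sigma2):
    """Estimate b by generalized least squares for FIXED sigma_f, sigma_l, sigma^2."""
    H = linear_basis(X)
    alpha = se_kernel(X, X, sigma_f, sigma_l) + sigma2 * np.eye(len(y))   # Eq. 4.4
    # b = (H^T alpha^-1 H)^-1 H^T alpha^-1 y; needs more rows than columns in H
    b = np.linalg.solve(H.T @ np.linalg.solve(alpha, H),
                        H.T @ np.linalg.solve(alpha, y))
    resid = y - H @ b
    return {"X": X, "b": b, "alpha": alpha, "resid": resid,
            "w": np.linalg.solve(alpha, resid),                 # alpha^-1 (Y - Hb)
            "params": (sigma_f, sigma_l, sigma2)}


def log_likelihood(model):
    """Log-likelihood of Eq. 4.5 at the fitted b and the fixed hyperparameters."""
    r, alpha = model["resid"], model["alpha"]
    _, logdet = np.linalg.slogdet(alpha)
    return float(-0.5 * r @ np.linalg.solve(alpha, r)
                 - 0.5 * logdet - 0.5 * len(r) * np.log(2.0 * np.pi))


def predict_gpr(model, Xa):
    """Expected values (Eq. 4.6) and predictive variances (Eq. 4.7) for new rows Xa."""
    sigma_f, sigma_l, sigma2 = model["params"]
    Ka = se_kernel(Xa, model["X"], sigma_f, sigma_l)              # K(x_a, X)
    mean = linear_basis(Xa) @ model["b"] + Ka @ model["w"]
    # Diagonal of K(x_a, X) alpha^-1 K(X, x_a); K(x_a, x_a) = sigma_f^2 on the diagonal
    quad = np.sum(Ka * np.linalg.solve(model["alpha"], Ka.T).T, axis=1)
    return mean, sigma2 + sigma_f**2 - quad


# Usage on synthetic data (40 observations, 3 variables)
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = 3.0 + X @ np.array([1.0, -2.0, 0.5]) + 0.05 * rng.normal(size=40)
model = fit_gpr(X, y, sigma_f=1.0, sigma_l=2.0, sigma2=0.01)
mean, var = predict_gpr(model, X[:5])
print(log_likelihood(model))
```

The generalized least-squares step needs $H^\top \alpha^{-1} H$ to be invertible, which requires more observations than columns in $H = [1, X]$; with 43 samples and far more spectral variables this cannot hold, in line with the rank-deficiency problem reported for the NBSK data in Ch. 4.4.1.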
The GPR algorithm uses the entire training dataset, unlike the training subset used with the PLS algorithm. [68] Recalling Eq. 4.3, the expected values for a new set of data $x_a$ are calculated as: [79]

$$E(y_a \mid Y, X, x_a) = h(x_a^\top)\,b + K(x_a^\top, X)\,\alpha^{-1}(Y - Hb) \tag{4.6}$$

The terms of this expression can be described as follows. $h(x_a^\top)\,b$ represents the new dataset $x_a$ transformed onto the basis defined by $H$ and weighted according to the coefficients $b$. $K(x_a^\top, X)$ represents the covariance between the new data and the training data $X$. Finally, $\alpha^{-1}(Y - Hb)$ represents the training model. The values of $E(y_a)$ are the mean values of $P(y_a)$: [68, 79]

$$P(y_a \mid Y, X, x_a) \sim \mathcal{N}\left(y_a \mid E(y_a),\; \sigma_a^2 + K(x_a, x_a) - K(x_a, X)\,\alpha^{-1} K(X, x_a)\right) \tag{4.7}$$

Thus, the GPR model predicts a specific expected value and a probability distribution for each new data object, which can make outlier detection much easier. Given its non-linearity, the GPR model essentially models the training data with a spline-like function, which can improve local prediction accuracy. In theory, this should mitigate some of the issues described in Ch. 3.4, where PLS models had difficulty predicting outlying (but valid) samples. [68]

Figure 4.6: 705 Raman spectra of dissolving pulp pads, produced with the brightness module. Above: Normalized spectra. Below: Spectra reconstructed from DWT coefficients, filtered by removing the highest two detail levels and the approximation level. Intensities in arbitrary units.

4.4 Results: The Mini-Pulp Factory

Figure 4.6 shows 705 dissolving pulp Raman spectra from the “pad factory” (Fig. 4.5), typical of those produced with this setup. Compared with the previously recorded spectra of the same pulps, shown in Fig. 3.2, several things are apparent. The most obvious is that the new spectra are substantially noisier before DWT, but that the broad background is less prevalent.
The noise can be attributed to the more complex nature of the probe’s optical path; collecting a backscattered spectrum from a newly formed brightness pad through the sapphire window shown in Fig. 4.4 involves far more sources of variance than simply probing the surface of a dried sheet. Applying DWT does remove some of this variance, as can be seen in the lower band of spectra in Fig. 4.6.

The difference in background signal is likely due to differences in sample morphology between the sides of the brightness pads and the tops of the dried pulp sheets. Since the pulp sheets probed in Fig. 3.2 were dried and compressed, their surface density was much higher than that of the newly formed brightness pads in Fig. 4.6. This would increase the reflectivity of the samples, as well as the concentration of fluorescent contaminants, thus creating a more intense broad background signal.

Figure 4.7: PCA plot of 132 Raman spectra of NBSK samples, averaged and sorted by suspension (axes: Principal Component 1 and Principal Component 2). Green: first suspension. Red: second suspension (first redispersion). Blue: third suspension (second redispersion).

There are also a number of sharp peaks in Fig. 4.6 that do not appear in Fig. 3.2. These are mostly due to the out-of-focus sapphire window; sapphire’s most prominent peaks are at 376, 414, and 641 cm⁻¹, all three of which are clearly visible in Fig. 4.6. [77] Note, however, that the peaks around 800-850 cm⁻¹ are not from sapphire, but are likely due to H–C–C and H–C–O bending modes about C-6 (which falls outside the ring structure). [55]

We applied Principal Component Analysis (PCA) to the DWT-treated Raman spectra collected from the NBSK samples to determine whether repeated resuspension and reformation of the brightness pads had any effect on data quality; Figure 4.7 shows the results.
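The resuspension check behind Fig. 4.7 can be sketched in a few lines of NumPy: project the treated spectra onto their first two principal components (computed here by singular value decomposition of the mean-centered data) and compare the score centroid of each suspension with the overall spread of the scores. The synthetic data, array sizes, and helper names below are illustrative assumptions, not the thesis data or code.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """PCA via SVD of the mean-centered data: returns scores and explained-variance fractions."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # same as Xc @ Vt[:n_components].T
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


def group_centroids(scores, labels):
    """Mean score vector of each group, e.g. each suspension 1, 2, 3."""
    return {int(g): scores[labels == g].mean(axis=0) for g in np.unique(labels)}


# Synthetic stand-in: 44 samples x 256 treated variables, measured in 3 suspensions
rng = np.random.default_rng(1)
base = rng.normal(size=(44, 256))
spectra = np.vstack([base + 0.05 * rng.normal(size=base.shape) for _ in range(3)])
labels = np.repeat([1, 2, 3], 44)

scores, explained = pca_scores(spectra)
centroids = group_centroids(scores, labels)
spread = scores[:, 0].std()
for g, c in centroids.items():
    print(g, np.round(c, 3), "vs. PC1 spread", round(float(spread), 2))
```

If resuspension had a systematic effect, the suspension centroids would sit apart by a distance comparable to the score spread; in this synthetic example the three groups differ only by small random noise, so their centroids stay near the origin.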
Recall that PCA uncovers variance in a data block, along the lines of Eq. 2.13 and the red box in Fig. 2.6, without incorporating any target information. The scores (or principal components) represent orthogonal contributions to the overall variance, and are ranked by the percentage of the overall variance they account for. If the repeated resuspensions had a substantial effect on the Raman spectra, one would expect a plot of the first and second principal components to show distinct clusters of data, illustrating that effect. In Fig. 4.7, the data form something of a continuum along the first principal component (the x-axis). Considering the indistinct nature of the separation, though, we can conclude that the repeated resuspension of the pulp samples does not have a substantial effect on the data.

Table 4.1: PLS and GPR model results from brightness pad data.

| Target parameter | Units | Samples | RMSEC | RMSEP | Rel. error |
|---|---|---|---|---|---|
| Viscosity (PLS) | cm³/g | 236 Diss. | 18.6 | 32.0 | 6.0% |
| Tensile breaking length | km | 43 NBSK | 1.34 | 1.90 | 27% |
| Burst index | kPa m²/g | 43 NBSK | 1.05 | 1.61 | 29% |
| Tear index | mN m²/g | 43 NBSK | 2.70 | 4.10 | 23% |
| Viscosity (GPR) | cm³/g | 236 Diss. | 18.3 | 40.1 | 7.7% |

We built our PLS models along the same lines as outlined in the previous chapter (Ch. 3.3), using a randomly selected 70% subset of the spectral data to train the models with the SIM algorithm, and the remaining 30% of the data to validate them. As previously mentioned, we selected a linear basis function for our GPR model, as well as a squared exponential kernel function for calculating covariances (see Eq. 4.2). To assess the performance of the models, we used 5-fold cross-validation, which splits the data into 5 subsets, reserves one subset, and re-trains the model using the remaining four, repeating this until each subset has been held out once.

Table 4.1 summarizes the results of the modelling of the wet NBSK samples and the initial 236 dissolving pulp samples. The top four rows are the results of PLS models, and the bottom row is from GPR.
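The calibration and validation workflow described above (a random 70/30 split, PLS, RMSEC and RMSEP), together with the k-fold splitting used for cross-validation, can be sketched as follows. This is a compact textbook single-response SIMPLS written for illustration, not the Matlab or ExtractEye implementation. Reading the “SIM algorithm” as SIMPLS, the synthetic data, and the relative-error definition (RMSEP divided by the mean measured value) are all assumptions.

```python
import numpy as np


def simpls(X, y, n_comp):
    """Single-response SIMPLS; returns (coefficients, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    X0, y0 = X - x_mean, y - y_mean                # mean-center both blocks
    S = X0.T @ y0                                  # cross-product X0' y0
    R, V = [], []
    for _ in range(n_comp):
        r = S / np.linalg.norm(S)                  # weight vector (dominant direction of S)
        t = X0 @ r
        norm_t = np.linalg.norm(t)
        t, r = t / norm_t, r / norm_t              # scale so the score vector has unit length
        p = X0.T @ t                               # X loading
        v = p.copy()
        for vj in V:                               # Gram-Schmidt against earlier loadings
            v -= vj * (vj @ v)
        v /= np.linalg.norm(v)
        S = S - v * (v @ S)                        # deflate the cross-product
        R.append(r)
        V.append(v)
    R = np.column_stack(R)
    coef = R @ (R.T @ (X0.T @ y0))                 # B = R Q' with Q' = T' y0 and T = X0 R
    return coef, x_mean, y_mean


def predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))


def split_70_30(n, rng):
    """Random 70% calibration / 30% validation split of row indices."""
    idx = rng.permutation(n)
    n_cal = int(round(0.7 * n))
    return idx[:n_cal], idx[n_cal:]


def kfold(n, k, rng):
    """Yield (train, held_out) index pairs; each of the k subsets is held out once."""
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        yield np.concatenate(folds[:i] + folds[i + 1:]), folds[i]


# Synthetic stand-in for 236 samples x 40 treated spectral variables
rng = np.random.default_rng(2)
X = rng.normal(size=(236, 40))
y = 530.0 + X[:, :5] @ rng.normal(scale=10.0, size=5) + rng.normal(scale=5.0, size=236)

cal, val = split_70_30(len(y), rng)
model = simpls(X[cal], y[cal], n_comp=12)
rmsec = rmse(y[cal], predict(model, X[cal]))
rmsep = rmse(y[val], predict(model, X[val]))
rel_error = rmsep / y.mean()      # ASSUMED definition of "Rel. error" in Table 4.1
print(f"RMSEC {rmsec:.1f}  RMSEP {rmsep:.1f}  rel. error {100 * rel_error:.1f}%")
```

With as many components as variables, this SIMPLS reproduces ordinary least squares, which makes a convenient sanity check; in practice, limiting the number of components (12 for the Matlab viscosity model in Fig. 4.8) is what guards against overfitting.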
It is immediately apparent that the PLS-based predictions of strength properties (tensile breaking length, burst index, and tear index) are not outstanding: their Root-Mean-Square Error (RMSE) values are high relative to the measurements’ laboratory reproducibilities. This can be explained by the highly limited number of NBSK samples: 43. Of those 43, 28 were refined samples that had corresponding unrefined samples; in those cases, the refined samples can still be considered independent of the unrefined samples. PCA plots of the dataset showed no significant grouping, either by refining point or by sample, demonstrating that the differences between samples refined to different energies were on the same scale as the differences between the samples themselves.

The PLS viscosity results are slightly worse with the brightness pad data than with our standard of comparison (see Tab. 3.1, Fig. 3.3), with the relative error increasing from 5 to 6%. This is most likely due to the more complex optical path, as previously discussed, and to the lack of replicate averaging in the new models. We investigated the usefulness of averaging across each of the three resuspensions, as well as across each of the three individual batches (see Ch. 4.3.1); both were found to have a marginal effect on model quality.

4.4.1 Comparing PLS and GPR

GPR was entirely unable to produce models for the NBSK samples, for the same reason that the PLS models performed poorly: with only 43 rows but far more spectral variables as columns, the data matrix X was rank-deficient, and the GPR algorithm could not determine the full set of coefficients b for the basis H (see Eq. 4.1). For viscosity, GPR produced a model that performed better than PLS in terms of Root-Mean-Square Error of Calibration (RMSEC), but worse with respect to Root-Mean-Square Error of Prediction (RMSEP). This indicates that the GPR model is somewhat overfit.

Figure 4.8 shows prediction (top row) and distribution (bottom row) plots for the viscosity models.
Subplot A shows the training (red) and validation (green) data for the PLS model. Subplot B shows the training data (red) for the GPR model, as well as the predictions from the 5-fold cross-validation (green). Both plots show predicted viscosity on the x-axis and measured viscosity on the y-axis; this inversion is a convention used in ExtractEye (to be discussed in the next section), and is used here for consistency.² Subplot C shows the distribution of predictions (blue) and true values (red) for the PLS model, and Subplot D shows the same for the GPR model.

² The rationale is as follows: because the errors in the measured values are not accounted for in the model, while the errors of the predicted values are well-defined by the model, the measured values constitute the traditional “unknown” and should therefore be placed on the y-axis.

Figure 4.8: Models of dissolving pulp viscosity. Top row: Prediction plots (predicted vs. measured). Bottom row: Distribution plots (target value vs. sample number). Left column: PLS model, with 12 components. Right column: GPR model, with 5-fold cross-validation. Values in cm³/g.

For the GPR plots (right column), the predictions are in fact the expected values (per Eq. 4.6), and the distribution plot (D) shows the upper and lower limits of prediction as the gray area. The plots show that for GPR, the range of expected values is narrower than for PLS (as evidenced by the lower RMSEC in Tab. 4.1). However, the 5-fold cross-validation in the prediction plot (B) shows a wide distribution of values.
In fact, the GPR prediction plot is very typical of an overfit model.

To avoid such overfitting with PLS models, we select an appropriate number of components, but this is not possible with GPR because of the algorithm’s probabilistic nature. Choosing a constant basis function ($H = 1$) radically underfits the model, while a quadratic basis function ($H = [1, X, X^2]$) radically overfits it. Thus, a linear basis ($H = [1, X]$) is the only viable option, despite its moderate overfitting, as shown in Fig. 4.8 (B). Given these challenges (and our limited dataset), we decided to continue using PLS as our primary modelling algorithm.

Figure 4.9: Prediction plots for dissolving pulp viscosity, from Matlab (left; panel title "PLS Prediction Model (12 Factors; From Pad Factory)") and ExtractEye (right; 6 components, DWT sym5 with 7 levels, reconstructed excluding A[7], D[2], D[1]; fit y = a + bx with a = 33.27, b = 0.9350, r = 0.6272; count 164; bias −1.395; RMSEP 23.80; SEP 23.84). Axes: predicted vs. measured viscosity, in cm³/g.

4.4.2 Comparing Matlab and ExtractEye

To further the integration of our Raman probe with the PulpEye system, we had to address software integration as well as hardware integration. This involved building PLS models with the aforementioned ExtractEye chemometrics software, and comparing them to our Matlab-built models. Figure 4.9 shows such a comparison: models predicting viscosity, built with the same DWT-treated data (reconstructions) in each program; the Matlab model shown is similar, but not identical, to the one in Fig. 4.8. In both plots, training set data are red. Validation data are green (Matlab) or blue (ExtractEye).

Though the clustering of points appears superficially quite similar, there are a few notable differences.
First is the number of components used for each; the Matlab model uses 12, while the ExtractEye model uses 6. This is a significant difference, considering that Matlab models built with fewer components perform worse (see Tab. 3.1). The ExtractEye model has an RMSEP value of 23.8 cm³/g (4.5%), while the Matlab model’s RMSEP is substantially higher, at 41.0 cm³/g (7.7%). This is contrary to what might be expected based on prior data, indicating that ExtractEye performs some intermediate steps to build the models that our Matlab algorithm does not.

Figure 4.10: First two loadings (p1, p2) and weights (w1, w2) of dissolving pulp viscosity models from Matlab (red) and ExtractEye (blue). Left column: X-loadings (P); right column: X-weights (W); top row A = 1, bottom row A = 2. x-axis: pixel number; y-axis: arbitrary units.

To investigate this, it is instructive to look at the PLS loadings and weights. We can see from the first and second X loading and weight vectors, plotted in Figure 4.10, that there are some differences between the models. The blue lines are from the ExtractEye model, and the red from the Matlab model.

Recalling Eq. 2.16, we can calculate a similar value, $R^2_X$, using the X sum of squares, which represents each component A’s explained variance in X. For the first component (A = 1, top row), the $R^2_X$ values are 0.998 for Matlab and 0.952 for ExtractEye. Clearly, p1 and w1 contain primarily spectral data, and the differences between the red and blue lines (Matlab and ExtractEye, respectively) are likely due to differences in the DWT algorithms: the DWT algorithm we use in Matlab performs baseline pinching, while ExtractEye’s does not. However, these differences are minor; both p1 and w1 are plotted on the same y-scale.

The second component (A = 2, bottom row) is more telling.
The $R^2_X$ values for this component are 0.084 for Matlab and 0.016 for ExtractEye. This is reflected in the difference between p2 and w2; note that w2 is plotted with a y range of ±3, whereas p2 (like p1 and w1) is plotted with y between −0.2 and 0.25. There is a negligible difference in magnitude between P and W in the ExtractEye model, but a substantial difference in the Matlab model: the magnitude of w2 is much higher than that of p2, while w1 is slightly smaller than p1.

This difference is likely due in part to the mean-centering step that ExtractEye applies to the X data when building PLS models. Though this would be relatively straightforward to implement in the Matlab code, we decided to shift to using ExtractEye as our primary modelling software, both for its improved model performance and for its ease of use and tight integration with PulpEye systems.

4.5 Discussion

During routine data collection with the “pad factory”, we encountered some unforeseen challenges. For one, the quality of the newly formed brightness pads had more of an effect on spectral data than we expected. This mostly had to do with the density of fibers around the laser spot; loosely packed fibers had a very deleterious effect on the spectral signal-to-noise ratio (SNR). Since the pad formation steps (vacuum, water, valves, etc.) were controlled manually, and given our limited supply of compressed air, this was a somewhat frequent occurrence. The best way to counteract this was to use more standard operating conditions for pad formation, i.e., 5-10 minutes of drying air. Resuspending and reforming the pad was also an effective way to fix any pad formation issues.

Routine brightness chamber operations are entirely automated, so pad formation is much more uniform; however, this also limits our ability to directly assess pad quality.
The best way to ensure uniform quality would be to continue to repeatedly resuspend and reform the brightness pad, collecting spectra each time. Spectra of these resuspensions could then be compared using PCA to detect outliers, and averaged accordingly, to produce a single representative spectrum for that sample.

We also continued to struggle with predicting outlying viscosities, as can be seen in Fig. 4.8. Figure 4.11 shows a histogram of the samples used to produce these models; the clustering around 500-570 cm³/g is evident. We investigated how well our models would perform if we restricted the data to samples with viscosities in that range; this improved RMSE values somewhat, decreasing the relative error from 6% to 4.5%. This indicates that Raman spectra have good predictive ability for viscosity, and that the main limitation is the narrow distribution of samples (few lie outside the 500-570 cm³/g cluster), as we expected. Such “narrow” models cannot be used to predict lower-viscosity samples with accuracy. Nonetheless, “broad” models covering the full range of viscosity values can still be considered accurate; recall that Domsjö’s on-site laboratory reproducibility for viscosity measurements was 4.7% (25 cm³/g). An accuracy near 6%, derived from spectra from a limited sample set, represents a good result.

Figure 4.11: Histogram of viscosities of dissolving pulp samples (measured viscosity vs. relative frequency; N = 680, min 379, max 605, mean 529, median 534, std. dev. 35.5, bin width 10.0, in cm³/g).

4.6 Conclusion

After having successfully built and tested our benchtop “pad factory”, our need for more samples became clear. The best way to acquire enough samples would be to bypass the transportation logistics and start collecting spectra on-site in a mill or, if not on-site, at least at a location where access to hundreds of mill samples is easy.
To this end, as well as to advance our hardware and software integration, we relocated our Raman probes closer to the mills, in collaboration with our industrial partners. The next chapter will detail the progress made during these projects.

Chapter 5
Fiber Probes and Data Fusion: Implementation in a Pilot Plant

Having successfully demonstrated proofs of concept for both the hardware and software aspects of applying Raman spectroscopy and multivariate calibration as a process control method in the pulp industry, we progressed towards integrating our technology with existing systems by working closely with our industrial partners: one project phase took place at PulpEye’s workshop in Örnsköldsvik, Sweden, and a second at Canfor Pulp Innovation’s pilot plant in Burnaby, BC.

5.1 Introduction

A major part of transitioning technology from academia to industry is the development of a Standard Operating Procedure (SOP) that outlines routine (i.e., non-experimental) use of the technology. In the case of Raman spectroscopy, such an SOP would be an established, regular procedure for data acquisition that ensures repeatability and reproducibility of measurement. Of course, an SOP is only useful if it is followed; without delving into a discussion of human psychology, it is nonetheless apparent that a successful new technology must be accessible, easy to integrate with existing systems and procedures, and above all, valuable to those who are using it.

Easing through this transition requires taking all of these concepts into account. Recognizing that the two main aspects of our Raman probe system, hardware and software, represent two parallel tracks, we pursued development along both.
Our ultimate goal was to set up our instrumentation in a pilot plant setting, and to develop an SOP so that our industrial partners could begin routine data collection themselves, thereby obviating the logistical difficulties of acquiring hundreds of pulp samples.

Along the way, we refined our hardware designs to make the Raman probe easier to integrate with existing infrastructure, such as PulpEye’s brightness chamber. We also developed a new software approach that, while still performing all the same functions as before, is easier to use and interfaces directly with our industrial partners’ on-site databases.

As previously mentioned, we performed this work in two phases, one with PulpEye and a second with Canfor. With PulpEye, we focused on hardware refinement, availing ourselves of their considerable expertise in this area. With Canfor, we were able to begin field tests of our instrumentation, and to work with their technicians to start developing an SOP for our Raman probe. Canfor also provided a library of samples to analyze, with more arriving regularly from the mills.

5.2 Materials and Methods

At PulpEye’s research and development workshop in Örnsköldsvik, we analyzed 132 dissolving pulp samples from Domsjö Fabriker AB, predicting viscosity. We used their brightness chamber to form brightness pads from the aqueous pulp samples, with a drying time of 3 minutes. Raman spectra were integrated for 1000 ms.

At Canfor Pulp Innovation’s pilot plant in Burnaby, we analyzed 285 Northern Bleached Softwood Kraft (NBSK) samples from Canfor’s mills in Prince George, BC (Intercontinental and Northwood Pulp Mills). We used these data to model a variety of parameters. The key strength parameters are burst strength, tear index, tensile breaking length, tensile energy absorption (TEA), and dry and wet zero-span breaking length. Other properties include physical parameters such as density, porosity/air resistance, stretching length, stiffness, and freeness, as well as various optical properties.
These NBSK samples were dried, unrefined pulp sheets; we collected Raman spectra from 4 points on each sheet and averaged them. Spectra were integrated for 500 ms.

In both cases, comparisons between spectrometers were made; these are elaborated upon in the following sections.

Figure 5.1: Raman spectrometer (free-space) mounted to a brightness chamber in PulpEye’s workshop, without the watertight equipment cabinet.

5.2.1 Probe Design: Fiber vs. Free-Space

Working in PulpEye’s workshop provided us with the chance to use our spectrometer mounting platform and free-space Raman spectrometer with their brightness chamber, the way it would be set up in one of their commercial analysis units, as seen in Figure 5.1. We added a plastic bag around the spectrometer itself to ensure watertightness, though with automated sample inlet, this was less of a concern than it was in our lab. However, this was only an interim solution; all the electronics within a PulpEye unit are hermetically sealed and located away from wet areas, as no assumptions can be made about the conditions that may be encountered in a mill.

One practical consideration we had to take into account was the impracticality of switching off the lights every time data were to be recorded; indeed, this is an impossibility in a mill. Mercury-containing fluorescent lighting severely interferes with Raman data collection, as does blackbody incandescent lighting; both easily drown out any signal of interest. LED and metal halide lamps, commonly found in industrial settings, do the same (albeit to a lesser degree). Therefore, we “light-proofed” the spectrometer by blacking out the plastic sleeve and sealing the optical path between the spectrometer and the sapphire window.

These adaptations were cumbersome and, combined with the spectrometer’s size, drove us to pursue the implementation of a fiber-optic-coupled Raman probe in its stead.
The brightness probe is itself fiber-optic, so all the necessary infrastructure is already in place; it would simply be a matter of exchanging the probes (as well as what they attach to at the other end).

We used a Wasatch Photonics fiber-coupled probe (RP785), with a remote 785 nm, 225 mW diode laser source and spectrometer.¹ We attached the Raman probe to the brightness probe armature, rather than coupling it to the chamber through the sapphire window. This meant that the fiber probe would be analyzing the surface of brightness pads, as opposed to their sides. Part of the adaptation of the existing infrastructure was a probe holder that fixed the objective lens of the Raman probe at its focal length (11 mm) above the surface of the sheet to be analyzed.

¹ The spectrometer is also made by Wasatch Photonics (WP785); it has the same specifications as the WP785L free-space probe. The laser is made by Innovative Photonic Solutions (Monmouth Junction, NJ, USA).

When analyzing the 132 dissolving pulp samples in PulpEye’s workshop, we collected one spectrum per sample with the free-space probe, and multiple spectra with the fiber-coupled probe. The latter were collected at different points on the surface of the pad and averaged; this was done by rotating the probe armature, though its range was limited by spatial and pneumatic constraints.

In our lab, we designed a second fiber probe, with two main differences from the Wasatch RP785. The first was the incorporation of a spatial filter into the probe design. Spatial filters focus light through a small pinhole, usually with the intent of improving a laser’s beam profile. In this case, a spatial filter serves to exclude out-of-focus light backscattered from the sample, thereby increasing the relative intensity of the sample signal while excluding contributions from other sources (such as sapphire windows or other optical components). [80, 81] True confocality may be too academic to implement in a process control setting, but nonetheless, the introduction of a spatial filter should improve the signal-to-noise ratio (SNR) of the spectra collected by the probe, thereby improving the downstream data processing steps. The second difference between our probe design and the Wasatch probe was our use of half-inch (Ø12.7 mm) optics, as opposed to Ø9 mm. With careful optical alignment, this doubles the light-collecting capacity of the probe, since the aperture area scales with the square of the diameter: (12.7/9)² ≈ 2.0.

A rendering of the probe is shown in Figure 5.2. The 785 nm excitation beam is introduced off-axis (left-hand side) through the delivery fiber, and coupled into the objective with a Dichroic Beamsplitter (DBS). The optical path for backscattered light collected by the objective is straight (right-hand side); it passes through the DBS and the spatial filter, and is focused into the collection fiber leading to the spectrometer. Both optical paths, between the fiber-optic ports and the DBS cavity, are isolated. We compared our probe with the Wasatch probe and the free-space spectrometer at Canfor’s pilot plant. There, we collected spectra from the 285 NBSK pulp sheets, averaging across 4 points on each.

Figure 5.2: Fiber-coupled Raman probe with spatial filter.

5.2.2 Data Treatment

Refer to Appendix A.2.

To further our software integration with ExtractEye and other existing systems, we translated our data processing script from Matlab into Python. Spectra must still be collected and exported with the OEM software specific to the spectrometer. We use Python to collate these files, extract the spectral data, and treat them with the Discrete Wavelet Transform (DWT), before uploading them to a Structured Query Language (SQL) database.
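A minimal sketch of this treatment-and-upload step is shown below. To stay dependency-free it swaps in a hand-written Haar wavelet for the sym5 wavelet used in this work (with PyWavelets, `pywt.wavedec(x, "sym5", level=7)` and `pywt.waverec(coeffs, "sym5")` are the corresponding calls), and, following the filtering described for Figs. 4.6 and 4.9, it zeroes the approximation level and the two finest detail levels before reconstructing. The table name, columns, and in-memory SQLite database are hypothetical stand-ins for the production SQL database.

```python
import json
import sqlite3

import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_wavedec(x, level):
    """Multilevel Haar DWT, returned as [cA_level, cD_level, ..., cD_1] (PyWavelets order)."""
    a = np.asarray(x, dtype=float)
    if len(a) % 2**level:
        raise ValueError("signal length must be divisible by 2**level")
    details = []
    for _ in range(level):
        a, d = (a[0::2] + a[1::2]) / SQRT2, (a[0::2] - a[1::2]) / SQRT2
        details.append(d)                       # finest level is appended first
    return [a] + details[::-1]


def haar_waverec(coeffs):
    """Inverse of haar_wavedec."""
    a = coeffs[0]
    for d in coeffs[1:]:
        out = np.empty(2 * len(a))
        out[0::2], out[1::2] = (a + d) / SQRT2, (a - d) / SQRT2
        a = out
    return a


def band_pass(x, level=7, n_fine=2):
    """Zero the approximation A[level] and the n_fine finest details, then reconstruct."""
    coeffs = haar_wavedec(x, level)
    coeffs[0] = np.zeros_like(coeffs[0])        # broad, slowly varying content
    for i in range(1, n_fine + 1):
        coeffs[-i] = np.zeros_like(coeffs[-i])  # D[1], D[2]: pixel-scale noise
    return haar_waverec(coeffs)


def upload(conn, sample_id, spectrum):
    """Store one treated spectrum as JSON text (hypothetical table layout)."""
    conn.execute("CREATE TABLE IF NOT EXISTS spectra (sample_id TEXT PRIMARY KEY, data TEXT)")
    conn.execute("INSERT OR REPLACE INTO spectra (sample_id, data) VALUES (?, ?)",
                 (sample_id, json.dumps(np.asarray(spectrum).tolist())))
    conn.commit()


# Usage with a synthetic 1,024-pixel spectrum: sloped background, one peak, and noise
rng = np.random.default_rng(3)
pixels = np.arange(1024)
raw = (100 + 0.05 * pixels
       + 20 * np.exp(-0.5 * ((pixels - 400) / 6) ** 2)
       + rng.normal(size=1024))
treated = band_pass(raw)
conn = sqlite3.connect(":memory:")
upload(conn, "sample-001", treated)
```

Because every Haar detail function sums to zero, the band-passed spectrum has zero mean, which gives a quick check that the approximation level was actually removed.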
The Python script also reads each sample's attendant target data, as provided by Canfor, and uploads it to the database as well. This database is then read in by ExtractEye, where we use Principal Component Analysis (PCA) to exclude spectral outliers, and then Partial Least-Squares Regression (PLS) to model and predict the target data.

5.2.3 Data Fusion: What the PulpEye Tells Us

A PulpEye unit does much more than just measure brightness; most of the data it produces are fiber characterizations. We recognized that some of these data may be correlated with the target properties (especially strength properties) that we are attempting to predict from the Raman spectra. There is some precedent for this: Marklund et al. used PLS to model tensile and tear indices based on fiber properties. [82] Fusing these data with our standard spectral dataset could provide a valuable additional source of information. Table 5.1 lists the primary data fusion parameters that we used.

For this step, we decided to build PLS models using the DWT coefficients themselves rather than the spectral reconstructions, so as to minimize the difference in dataset widths. Alongside the full 1,024 spectral variables, the 15 fusion parameters could be dismissed as irrelevant by the covariance determination; reducing the spectral data block to one quarter of its original size makes this less likely.

One concern when fusing datasets such as these is that one might "drown out" the other during PLS. Specifically, the least-squares minimization used to estimate the regression coefficients will be dominated by data with large numeric values, as the wavelet coefficients produced by DWT often are, at the expense of the fusion parameters in Table 5.1.
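A minimal sketch of the fusion step, together with the unit-variance (auto)scaling used to counter this imbalance, using only the Python standard library. The function names and example values are hypothetical placeholders, not Canfor or PulpEye data; each row is one sample, the spectral block holds DWT coefficients, and the fusion block holds PulpEye parameters such as those in Table 5.1. Without scaling, a coefficient column spanning hundreds of units would carry far more weight in the least-squares fit than a fiber-width column spanning a few micrometres.

```python
import statistics


def fuse_blocks(spectral_rows, fusion_rows):
    """Concatenate each sample's DWT coefficients with its PulpEye parameters."""
    if len(spectral_rows) != len(fusion_rows):
        raise ValueError("both blocks must list the same samples in the same order")
    return [list(s) + list(f) for s, f in zip(spectral_rows, fusion_rows)]


def autoscale(rows):
    """Mean-center every column and divide it by its sample standard deviation."""
    scaled_columns = []
    for column in zip(*rows):
        mean = statistics.fmean(column)
        sd = statistics.stdev(column)
        # A constant column carries no information; map it to zeros instead of dividing by 0.
        scaled_columns.append([(v - mean) / sd if sd > 0 else 0.0 for v in column])
    return [list(row) for row in zip(*scaled_columns)]


if __name__ == "__main__":
    # Hypothetical values: two DWT coefficients and two PulpEye parameters per sample.
    dwt_coeffs = [[1200.0, -340.0], [1500.0, -310.0], [900.0, -400.0]]
    pulpeye = [[2.3, 27.1], [2.5, 28.4], [2.1, 26.5]]  # fiber length (mm), fiber width (um)
    for row in autoscale(fuse_blocks(dwt_coeffs, pulpeye)):
        print([round(v, 3) for v in row])
```

After scaling, every column has mean 0 and variance 1, so the fusion parameters enter PLS on the same numeric footing as the spectral coefficients.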
To mitigate this, the data are typically scaled so that each variable has unit variance. We perform this step after the datasets have been fused, and upload the results to a separate SQL table.

Table 5.1: List of PulpEye data fusion parameters.

Property                            Units            Typical range
Freeness (CSF)                      mL               450-800
Length-weighted fiber length        mm               2.1-2.6
Length-weighted fiber width         µm               26-30
Length-weighted curl                percent          16-20
Fines fraction                      percent          6.3-9.2
Scaled coarseness                   mg m⁻¹           0.05-0.20
Kinks per fiber                     counts           0.50-0.90
Kinks per millimeter                counts           0.20-0.40
Kinks angle                         degrees          31-38
Shive content (> 0.1-0.3 mm)        counts per gram  100-600
Shive content (> 150 µm)            counts per gram  100-600
Shive content (> 1.5 mm)            counts per gram  0-30
Total crill (microfibrils)          scalar           61-85
Kappa number                        scalar           11-23
Average fiber wall thickness (FWT)  µm               1.8-2.0
Average fiber width                 µm               23-24
Average width to FWT ratio          scalar           120-130
Raw coarseness                      mg m⁻¹           185-200
Average fibril angle                degrees          23.0-24.5

5.3 Results

At PulpEye's workshop, we collected 129 free-space and 144 fiber-coupled spectra spanning 132 dissolving pulp samples. At Canfor's pilot plant, we collected 1,172 spectra each with the free-space and fiber-coupled probes, four replicates for each of our 285 NBSK samples. We collected an additional 500 spectra with our custom-built fiber probe.

Tables 5.2 and 5.3 list the results of the various models built during this phase of work, using spectral data collected from both the dissolving and NBSK pulp samples. Root Mean-Square Error (RMSE) values have been withheld due to intellectual property concerns; instead, the cumulative Q² statistic (see Eq. 2.17) and the relative error are presented for each target, alongside its lab reproducibility value for comparison.

We found that the dissolving pulp models were improved by collecting spectra at 5 points on the brightness pad (as opposed to 2), and by drying the pads for longer. Averaging spectra across more collection points reduces model sensitivity to sample inhomogeneity; drying the pads for 5 minutes rather than 3, so that they are fully dry, reduces the morphological inhomogeneity that may be present on the surface of the pads, and removes any variance that may be caused by moisture in the sample.

5.3.1 Comparison between Free-Space and Fiber-Coupled Probes

Aside from the typical small differences between spectrometer Charge-Coupled Device (CCD) sensors, there was one major difference between the dissolving pulp spectra collected by the free-space and fiber-coupled Raman probes at PulpEye's workshop: the fiber-coupled spectra exhibited a large, broad peak at the low end of the spectrum (approx. 200-600 cm⁻¹). This difference can be seen in Figure 5.3; both plots show Raman spectra of the same set of dissolving pulp samples, treated with DWT. The left-hand plot shows the free-space spectra; these are in accordance with Figs. 3.2 and 4.6, though as this example lacks DWT baseline pinching, some of its values are negative.

The right-hand plot shows the fiber-coupled spectra. Though somewhat altered in shape by the DWT treatment, the broad new peak at 390 cm⁻¹ is clearly visible, while the normally prominent features in the 1000-1200 cm⁻¹ region appear quite small. This peak was visible regardless of the sample (we also tested Teflon and polystyrene), and was still present when the laser was switched off and the spectrometer was only collecting ambient light.

The anomalous peak can most likely be attributed to autofluorescence of the fiber optic material.
Fused silica (SiO₂), the core material used in fiber optic cables, does exhibit a large, broad peak near 400 cm⁻¹, but this is usually optically filtered with a laser cleanup filter (bandpass) on the delivery fiber, and a Holographic Notch Filter (HNF) (bandstop) on the collection fiber. [83–85] This autofluorescence peak was not observed at all during testing at Canfor's pilot plant, either with the Wasatch RP785 probe or with our probe design (see Fig. 5.6), indicating that the problem may have been specific to the fiber probe we used at PulpEye's workshop.

[Figure 5.3 plots: normalized counts vs. pixel no.; pretreatment for both panels: range normalized, symmetric DWT (sym7, 7 levels), reconstructed excluding A[7], D[2], D[1].]
Figure 5.3: Comparison of free-space (left) and fiber-coupled (right) Raman spectra of dissolving pulp samples collected in PulpEye's workshop. The x-axis shows pixel number, ranging from 265 to 2005 cm⁻¹. Note the characteristic sapphire peaks at 90, 200, and 420 px, corresponding to 376, 414, and 641 cm⁻¹.

Table 5.2 shows the results for the PLS models built with free-space and fiber-coupled Raman spectra. Each model is represented by the number of components used (A), the relative prediction error, and the cumulative Q² value (see Eq. 2.17). We found that slightly different DWT treatment steps were needed to produce optimal results for the dry NBSK samples. For both datasets, we decomposed the spectra with the sym5 wavelet to a scale of 7, and rejected the approximation coefficients. For the free-space data, we rejected the top two detail coefficients (D[1] and D[2]), while for the fiber-coupled data, we rejected the top three (D[1] to D[3]).

The free-space models outperformed the fiber-coupled models for the NBSK samples, but the opposite was true for the viscosity model built with the dissolving pulp samples.
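The two summary statistics reported in Table 5.2 can be sketched as follows. This is an illustration under stated assumptions, not the ExtractEye implementation: the cumulative Q² is taken in the SIMCA-style form Q²cum = 1 − Π_a (PRESS_a / SS_(a−1)), which may differ in detail from Eq. 2.17, where PRESS_a is the cross-validated prediction error sum of squares with a components and SS_(a−1) is the residual sum of squares of y after a − 1 components (SS_0 being the total sum of squares about the mean). The relative error is assumed to be the root-mean-square prediction error as a percentage of the mean reference value, since its exact definition is not restated here. Function names and example numbers are hypothetical.

```python
import math


def cumulative_q2(press, ss_previous):
    """SIMCA-style cumulative Q2: 1 - product over components of PRESS_a / SS_(a-1)."""
    if not press or len(press) != len(ss_previous):
        raise ValueError("need one PRESS and one SS value per component")
    remaining = 1.0
    for p, ss in zip(press, ss_previous):
        # Fraction of the remaining y variation that component a fails to predict.
        remaining *= p / ss
    return 1.0 - remaining


def relative_error(measured, predicted):
    """Prediction RMSE as a percentage of the mean measured value (assumed definition)."""
    n = len(measured)
    rmse = math.sqrt(sum((p - m) ** 2 for m, p in zip(measured, predicted)) / n)
    return 100.0 * rmse / (sum(measured) / n)


if __name__ == "__main__":
    print(cumulative_q2([50.0, 30.0], [100.0, 40.0]))
    print(relative_error([10.0, 10.0, 10.0, 10.0], [11.0, 9.0, 11.0, 9.0]))
```

For example, two components that leave 50 % and then 75 % of the remaining variation unpredicted give Q²cum = 1 − 0.5 × 0.75 = 0.625.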
The viscosity result in the first row of Table 5.2 is divergent, most likely because of the substantial difference in how the spectra were collected. Recall from Ch. 5.2.1 and Fig. 5.1 that the free-space probe collected measurements from the side of the brightness pads, whereas the fiber-coupled probe collected data at multiple points on the top of the pads. Given the challenges posed by the side-mounting system, and the ability to collect spatially averaged data with the fiber probe, this difference in method may account for why the fiber-coupled viscosity model proved superior.

With regard to the NBSK data, however, the free-space and fiber-coupled methodologies were identical. The models built using the free-space probe generally performed similarly to the fiber-coupled models, judging by both the relative error and the cumulative Q² value. In the case of stiffness, for example, although the relative error of the fiber-coupled model was lower, so was its Q², indicating that the model accounted for less covariance than the free-space model did. The converse is true for porosity/air resistance and tear index, though the overall quality of these models was poor.

With respect to the pilot plant reproducibility values, the relative errors for both types of models are relatively close in most cases; an overview of individual cases will be presented below.

5.3.2 Data Fusion

Data fusion is not without its pitfalls. One of the most significant is that it complicates the model by purposefully introducing new sources of uncorrelated variance, which runs counter to all the pretreatment steps meant to minimize that occurrence. We can gain some insight into the correlation between fusion parameters and spectral data by examining the PLS model loadings, as shown in Figure 5.4.

The red points are each of the 1,024 spectral datapoints (one per pixel on the spectrometer's CCD), and the green triangles are the fusion parameters from the PulpEye unit.
A number of them (blue inset) are very closely grouped with the spectral data, and a few (red inset) are very distant; others lie along an orthogonal axis relative to the spectral data. It is expected that shive counts² should not correlate meaningfully with strength properties. Freeness, however, is less easily explained. It is known to correlate with strength properties, though prior research has encountered difficulty predicting freeness using PLS models built with Near-Infrared (NIR) spectra. [20, 46] Preliminary attempts to model freeness using Raman spectra also failed to produce reliable results, but since this value can be accurately determined by the PulpEye analyzer, predicting it spectrally is not necessary from a process control perspective.

² Shives are contaminants in pulp that have escaped digestion; see Ch. 6.1 for a detailed discussion.

[Figure 5.4 plot: "Breaking Length Loading Plot (WP785, Fused data)"; axes w*1 (67.4 %, 15.2 %) vs. w*2 (15.5 %, 21.7 %); PulpEye parameters from Table 5.1 are labelled; freeness and shives are not shown.]
Figure 5.4: Weighted loading plot (w1 vs. w2) for a tensile breaking length model built using fused Raman (red) and PulpEye (green) datasets. PulpEye parameters are labelled.

Table 5.3 shows the results for the PLS models predicting the properties of the NBSK pulp samples, built using DWT coefficients from the fiber-coupled spectral dataset alone, and using the fused coefficients and PulpEye data.
Dissolving pulp samples were not included, as Domsjö did not provide the attendant PulpEye data for their samples.

Comparing Tables 5.2 and 5.3, it is clear that building models from DWT coefficients makes no substantial difference in quality relative to using the reconstructed spectra. When the coefficients are fused with the PulpEye data, however, the models are generally improved. This is especially the case for the important strength properties (burst, tensile breaking length, TEA, and zero-span), as illustrated by the higher cumulative Q² values and the lower relative errors.

[Figure 5.5 plots: measured vs. predicted tensile breaking length ('BL All'), with points in and not in the model. Left: 5 components; y = a + bx with a = −5.219×10⁻², b = 1.013; r = 0.6501, r² = 0.4227; count 70; bias −6.677×10⁻³; RMSEP 0.3258; SEP 0.3281. Right: 4 components; a = 0.1525, b = 0.9555; r = 0.8126, r² = 0.6603; count 72; bias 6.327×10⁻⁴; RMSEP 0.2608; SEP 0.2626.]
Figure 5.5: Comparison of models for tensile breaking length, built using free-space spectral reconstructions (left) and fiber-coupled DWT coefficients fused with PulpEye data (right). All NBSK samples were used in Raman data collection and PLS modelling.

To take the example of tensile breaking length, the quality of the fused model is similar to that of the free-space spectral model. Looking forward, we considered it prudent to incorporate all data (including spectral outliers, skipping the PCA step) into a final model for tensile breaking length, using these two data treatment approaches. Figure 5.5 shows the results as two PLS plots: the left-hand plot shows the free-space model (A = 5, rel. error = 9.53 %, Q² = 0.351), and the right-hand plot shows the fused fiber-coupled model (A = 4, rel. error = 7.63 %, Q² = 0.775).
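The summary statistics printed in the panels of Figure 5.5 (bias, RMSEP, SEP, r) can be reproduced from paired measured and predicted values. The sketch below assumes the usual conventions, which the figure does not state explicitly: bias is the mean residual (predicted minus measured), RMSEP is the root-mean-square residual, and SEP is the standard deviation of the residuals about the bias with n − 1 in the denominator, so that SEP² = n(RMSEP² − bias²)/(n − 1). With the left-hand panel's values (n = 70, RMSEP = 0.3258, bias = −6.677×10⁻³), that identity gives SEP ≈ 0.3281, matching the panel. The function name is hypothetical.

```python
import math


def prediction_stats(measured, predicted):
    """Bias, RMSEP, SEP and Pearson r for paired reference and predicted values."""
    n = len(measured)
    if n < 2 or n != len(predicted):
        raise ValueError("need at least two paired values")
    residuals = [p - m for m, p in zip(measured, predicted)]
    bias = sum(residuals) / n
    rmsep = math.sqrt(sum(e * e for e in residuals) / n)
    # SEP: spread of the residuals after removing the systematic offset (bias).
    sep = math.sqrt(sum((e - bias) ** 2 for e in residuals) / (n - 1))
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    ss_m = sum((m - mean_m) ** 2 for m in measured)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(ss_m * ss_p)
    return {"count": n, "bias": bias, "rmsep": rmsep, "sep": sep, "r": r}


if __name__ == "__main__":
    # Hypothetical breaking lengths (km): reference values and model predictions.
    print(prediction_stats([3.0, 3.5, 4.0, 2.5, 3.2], [3.1, 3.3, 4.2, 2.6, 3.0]))
```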
In Figure 5.5, red points are the training set and blue points are the validation set.

The difference between the plots illustrates the advantage presented by data fusion, especially in cases where samples have outlying tensile breaking length values, beyond the "normal" 3.0-4.0 km region. The points in the fused model plot are tightly aligned along the ideal y = x line, unlike those of the free-space model. Random permutations to test for overfitting, as described in Ch. 2.3.2, showed that both models were well fit, supporting the validity of their predictions.

[Figure 5.6 plots: measured values vs. variable no.; pretreatment for both panels: range normalized, symmetric DWT (sym5, 7 levels), reconstructed excluding A[7], D[3], D[2], D[1].]
Figure 5.6: Comparison of Raman spectra of NBSK pulp samples, collected with the WP785 (left) and our custom-built (right) fiber probes. The x-axis shows pixel number, ranging from 265 to 2005 cm⁻¹.

5.3.3 Comparison between Fiber-Coupled Probes

Because the WP785 and our custom-built fiber probes shared the same laser and spectrometer, there were some logistical constraints when it came to testing them concurrently at Canfor's pilot plant. As a result, we were able to collect fewer spectra with our probe (about half as many), though they span the same set of samples. For this reason, we encountered some difficulty when constructing models with our probe.

Figure 5.6 compares the spectra from the two probes. They are evidently quite similar; note the absence of fiber autofluorescence (compare Fig. 5.3, right). Table 5.4 lists the results of these PLS models, using the same processing steps as with the other fiber-coupled probe: DWT coefficients fused with PulpEye parameters. Compared with the WP785 results in Table 5.3, the models for strength and optical properties built with our probe are slightly poorer, and the WP785 models have markedly higher cumulative Q² values. With more samples, we expect that the two probes would perform equally well. Careful refinement of the optical path alignment in our custom-built probe would likely improve its performance as well.

5.4 Discussion

The NBSK model results in Tables 5.2 and 5.3 showed a few trends across the various processing steps: certain targets were much more easily predicted than others. Burst strength, tensile breaking length, TEA, and zero-span generally had the best-performing models, while porosity/air resistance and tear index consistently had the poorest. This is in line with prior research. [28, 46]

Tear strength depends heavily on fiber length and kink angle, and it has also been suggested that tear strength is positively correlated both with the presence of amorphous regions within cellulose fibrils and with xylan content. [9, 16, 19, 20, 25] Tensile and burst strength, on the other hand, depend mostly on interfiber bonding (recall the Page Equation, Eqs. 1.1 and 1.2), which is not as important a factor in tear strength. This difference manifests itself as an inverse correlation between tear and tensile strength. [25]

As previously mentioned, interfiber bonding is largely based on chemical factors (hemicellulose content, surface carboxyl content), to which Raman spectroscopy is highly sensitive. It is perhaps tear strength's greater dependence on fiber length that limits Raman's predictive ability in this case.

Recall from Ch. 3 that viscosity is an indirect measure of the average Degree of Polymerization (DP) of fibers, i.e. their polymeric chain length. DP is itself, however, only indirectly correlated with fiber length. [24] Raman's sensitivity to DP is predicated on probing cellulose chain end-groups; a higher concentration of end-groups indicates lower DP, and thus shorter chains.
As this is not necessarily reflective of fiber length, our relative success in predicting viscosity from Raman spectra does not automatically translate into success in predicting tear strength.

Nonetheless, the data in Tables 5.2 and 5.3 demonstrate that we have developed a sound methodology for data collection and processing with a fiber-coupled Raman probe. Figure 5.7 shows a flowchart illustrating how data from such a probe could be integrated into routine operations in a setting such as Canfor's pilot plant. The mill produces samples, which are analyzed with the PulpEye unit and the Raman probe. Training samples are also refined in the pilot plant, and tested using traditional wet-chemical means. Data from the training set are uploaded to the SQL databases (one for PulpEye data, one for spectral data); ExtractEye then reads in these databases and builds a PLS model with them. New samples from the mill are then analyzed and fed into the databases, at which point the ExtractEye model predicts their properties (Results).

[Figure 5.7 flowchart; elements: mill output, training samples, new samples, CPI refiner and testing, Raman probe, PulpEye, DWT + autoscaling, SQL tables (batchinfo, raman, targets, pulpeye) in the extracteye database, ExtractEye online model, PLS modelling, results document.]
Figure 5.7: Flowchart illustrating the Raman data acquisition and processing pipeline.

Such a process has the capacity to operate in real time. ExtractEye is, in fact, geared towards this; it can update its models and calculations as new data become available in the SQL database. It also provides real-time information about the process trajectory, showing a time-resolved PCA plot of the samples, deviation and gap plots, and tracking individual variables over time.
All of these are powerful tools for determining when, and if, a sample's predicted value should be considered unreliable.

5.5 Conclusion

The last three chapters have detailed our successful development of instrumental and modelling approaches for transitioning Raman spectroscopic technology from an academic to an industrial research laboratory. Our proofs of concept, described in Chs. 3 and 4, illustrated the utility of our spectroscopic methodology as a process control method. This chapter outlined refinements to our instrumentation and multivariate analysis as we moved forward along this transition with our industrial collaborators.

Many of the instrumental refinements we made arose from practical considerations we encountered specifically because of these collaborations, where the instrumentation was required to operate in non-ideal environments. On the software side, we incorporated information from a pre-existing data source, the PulpEye unit, to enhance our modelling power, and prepared for further integration with existing systems by aligning our software practices with industry standards.

As a final step to push our technology out of the nest, as it were, we set up our Raman probe system in Canfor's pilot plant, and worked with their staff to develop an SOP. That way, they can begin to collect Raman spectra as part of their routine analysis. As they accumulate data, we will continue to update and refine our PLS models accordingly, and address any unforeseen challenges that may arise. A number of further steps in this project will be outlined in Ch. 7.

The next chapter will describe a different process quality problem facing the pulp and paper industry, and the methods we have developed to address it. That problem shares a common undercurrent with the one we have begun to address by coupling Raman spectroscopy with multivariate analysis (predicting critical end-product properties): the problem-solving approach itself.
In general, we pair existing technology with data analysis methods and apply them to novel situations, where they might facilitate access to valuable but hidden information.

Table 5.2: PLS model results from dissolving and NBSK pulp samples, comparing free-space and fiber-coupled probe performance.

Target property                 Pilot plant      Free-space probe            Fiber-coupled probe
                                reproducibility  A  Rel. error  Cum. Q²      A  Rel. error  Cum. Q²
Viscosity                       4.72 %           4  8.97 %      0.196        4  3.33 %      0.275
Density                         1.51 %           2  2.38 %      0.143        5  2.93 %      0.315
Porosity                        4.12 %           2  12.3 %      0.113        4  14.4 %      0.378
Opacity                         0.30 %           4  1.33 %      0.432        5  0.93 %      0.550
Brightness                      0.13 %           4  0.59 %      0.286        5  0.52 %      0.247
Yellowness                      -                4  13.7 %      0.222        4  15.4 %      0.296
Scattering                      1.10 %           4  4.80 %      0.481        3  3.00 %      0.580
Absorption                      2.09 %           4  6.49 %      0.320        3  4.88 %      0.191
L*                              -                4  0.23 %      0.386        3  0.14 %      0.399
a*                              -                4  12.7 %      0.115        4  13.6 %      0.272
b*                              -                4  13.5 %      0.216        4  15.3 %      0.312
Tear index                      6.84 %           3  10.2 %      0.114        4  9.14 %      0.246
Burst strength                  6.72 %           3  8.09 %      0.462        5  9.02 %      0.473
Tensile breaking length         5.99 %           4  5.12 %      0.611        5  7.40 %      0.493
Stretch                         1.61 %           3  5.71 %      0.250        4  6.66 %      0.349
Stiffness                       6.37 %           3  6.97 %      0.356        4  6.91 %      0.343
TEA                             7.20 %           3  8.85 %      0.348        5  10.7 %      0.424
Wet zero-span breaking length   1.80 %           3  5.32 %      0.406        4  6.31 %      0.590
Dry zero-span breaking length   4.53 %           3  7.27 %      0.555        4  8.53 %      0.495

Table 5.3: PLS model results from NBSK pulp samples, comparing spectral (DWT coefficients) and fused (DWT coefficients and PulpEye data) datasets, gathered with the fiber-coupled Raman probe.

Target property                 Pilot plant      Spectral data               Fused data
                                reproducibility  A  Rel. error  Cum. Q²      A  Rel. error  Cum. Q²
Density                         1.51 %           5  2.86 %      0.339        4  2.82 %      0.452
Porosity                        4.12 %           6  14.5 %      0.476        4  14.5 %      0.480
Opacity                         0.30 %           4  0.92 %      0.580        4  0.82 %      0.574
Brightness                      0.13 %           3  0.50 %      0.240        3  0.50 %      0.354
Yellowness                      -                3  15.0 %      0.483        3  14.5 %      0.478
Scattering                      1.10 %           3  3.29 %      0.607        3  2.72 %      0.625
Absorption                      2.09 %           3  4.63 %      0.319        3  4.34 %      0.358
L*                              -                3  0.14 %      0.492        3  0.13 %      0.509
a*                              -                3  13.1 %      0.405        3  12.1 %      0.417
b*                              -                3  14.7 %      0.486        3  14.2 %      0.474
Tear index                      6.84 %           5  9.00 %      0.265        3  9.05 %      0.256
Burst strength                  6.72 %           5  9.04 %      0.506        4  7.47 %      0.714
Tensile breaking length         5.99 %           5  7.48 %      0.496        4  5.49 %      0.740
Stretch                         1.61 %           5  6.48 %      0.393        3  7.52 %      0.348
Stiffness                       6.37 %           6  6.78 %      0.337        3  5.78 %      0.506
TEA                             7.20 %           6  10.9 %      0.432        3  11.4 %      0.526
Wet zero-span breaking length   1.80 %           4  7.86 %      0.591        2  4.34 %      0.775
Dry zero-span breaking length   4.53 %           4  7.70 %      0.471        2  6.02 %      0.803

Table 5.4: PLS model results from NBSK pulp samples, gathered with our custom-built fiber-coupled Raman probe (Fig. 5.2).

Target property                 Pilot plant
                                reproducibility  A  Rel. error  Cum. Q²
Density                         1.51 %           2  2.38 %      0.143
Porosity                        4.12 %           2  15.7 %      0.348
Opacity                         0.30 %           3  0.85 %      0.299
Brightness                      0.13 %           3  0.48 %      0.330
Yellowness                      -                3  14.9 %      0.470
Scattering                      1.10 %           3  2.54 %      0.247
Absorption                      2.09 %           2  6.53 %      0.182
L*                              -                2  0.15 %      0.408
a*                              -                3  14.6 %      0.314
b*                              -                3  14.7 %      0.417
Tear index                      6.84 %           Model failed.
Burst strength                  6.72 %           2  6.76 %      0.489
Tensile breaking length         5.99 %           2  6.03 %      0.412
Stretch                         1.61 %           2  6.05 %      0.161
Stiffness                       6.37 %           2  9.34 %      0.147
TEA                             7.20 %           2  9.30 %      0.193
Wet zero-span breaking length   1.80 %           Model failed.
Dry zero-span breaking length   4.53 %           Model failed.

Chapter 6
Automated Detection of Contaminants in Pulp Using Machine Vision

6.1 Introduction

For the pulp and paper industry, micro- and macro-contaminants represent a major threat to end-product value. Contaminants disrupt the normal fiber network and have a deleterious effect on several value-critical qualities, especially measures of strength.
[8, 86] Thus, to maximize product value, and perhaps more importantly to avoid costly customer complaints, the industry is very concerned with minimizing the number of contaminants present in its pulp products. [86, 87]

In-process pulps are routinely filtered for contaminants. Typically, a product stream will pass through a physical size barrier, excluding the largest and easiest-to-separate contaminants. The stream will next pass through a continuous probability screen, which separates out contaminants by exploiting product-contaminant interactions. [88–90] These filters attempt to maximize efficiency (the percentage of contaminants that are successfully excluded) while minimizing the rejection rate (the percentage of desired product that is mistakenly excluded); in practice, however, such systems typically operate at around 50% efficiency and a 20% rejection rate. [89]

Therefore, process operators must monitor outputs from filtration systems. If the rejection rate exceeds a threshold value, the product stream may be redirected to further filtration steps to improve quality. Accurate and automated determination of the rejection rate, or of contaminant concentration in general, is thus an area of interest for the pulp and paper industry.

One of the more common types of pulp contaminant is bundles of fibers adhered together, called shives, that have escaped the pulp digestion process. Shives are typically of the same order of length as pulp fibers (3-6 mm for Northern Bleached Softwood Kraft (NBSK) pulp) but are substantially thicker, often hundreds of µm wide, whereas pulp fibers are typically 10-50 µm wide. Shives may be solid masses or branched.

The most basic method to detect shives in in-process pulp is simple visual inspection and counting. This requires that product, in the form of a pulp sheet, be removed from the production line, then sampled, thinned, dried, and manually inspected.
[87] The inspector compares non-pulp objects in the sample to a standard sizing sheet to classify them. If more than a set (arbitrary) number of shives is observed, the pulp batch can be sent back for re-digestion or re-filtration; otherwise, the number of shives can simply be recorded for time-resolved process quality information. In terms of process control, the arbitrary maximum permissible number of shives per sample represents a boundary on the process trajectory, with an appropriate corrective action in place. Enforcing this boundary requires substantial process downtime and personnel training, which prevents this process from being fully automated. [86, 87] Since each pulp sheet weighs about one kilogram as it comes off the baler, and pulp is fed into the baler at high speed to produce about 10 sheets per second, roughly 18 t of pulp is baled every half hour; the standard one-gram mill sample collected every half hour therefore represents only about one part in 18 million of what is being produced.

Alternatives to visual inspection have been developed over the past several decades. One approach is to use a laminar hydrodynamic flow cell coupled to one or more photodetectors. In general, a photodetector is placed opposite a visible-light Light-Emitting Diode (LED); shives in a pulp solution passing through the flow cell obstruct the light beam, causing a corresponding dip in the photodetector's output voltage. The magnitude and duration of such responses depend directly on contaminant width and length, respectively, so response events can be segregated and counted accurately according to the expected dimensions of contaminants. [87]

This method has downsides, however. One major issue is that it relies upon shives in solution being oriented lengthwise; that is to say, in order to be accurately counted, a shive's major axis must be parallel to the pulp flow direction. In real solutions, this is rarely the case. To overcome this, two perpendicular LED-photodetector pairs can be coupled to the flow chamber.
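The single-detector event logic described above can be sketched as follows: a shive passing the LED-photodetector pair produces a run of samples in which the voltage drops below a baseline, the depth of the dip standing in for contaminant width and its duration (multiplied by the flow speed) for contaminant length. The baseline, thresholds, sampling period, flow speed, and size limits are hypothetical parameters rather than values from [86, 87], as are the function names; a real system would calibrate dip depth against width instead of thresholding it directly.

```python
from dataclasses import dataclass


@dataclass
class DipEvent:
    start: int      # index of the first sample below the threshold
    duration: int   # number of consecutive samples below the threshold
    depth: float    # largest drop below the baseline (V), a proxy for width


def find_dips(voltage, baseline, threshold):
    """Return runs of samples in which the signal falls more than `threshold` below `baseline`."""
    events, start, depth = [], None, 0.0
    for i, v in enumerate(voltage):
        drop = baseline - v
        if drop > threshold:
            if start is None:
                start, depth = i, drop
            else:
                depth = max(depth, drop)
        elif start is not None:
            events.append(DipEvent(start, i - start, depth))
            start = None
    if start is not None:  # dip still open at the end of the record
        events.append(DipEvent(start, len(voltage) - start, depth))
    return events


def shive_like(events, sample_period_s, flow_speed_mm_s, min_length_mm, min_depth_v):
    """Keep events whose implied length (duration x flow speed) and depth exceed size limits."""
    return [
        e for e in events
        if e.duration * sample_period_s * flow_speed_mm_s >= min_length_mm
        and e.depth >= min_depth_v
    ]


if __name__ == "__main__":
    trace = [5.0, 5.0, 4.0, 3.5, 4.1, 5.0, 5.0, 4.8, 5.0, 3.0, 3.0]  # hypothetical volts
    dips = find_dips(trace, baseline=5.0, threshold=0.5)
    print(shive_like(dips, sample_period_s=0.001, flow_speed_mm_s=1000.0,
                     min_length_mm=2.5, min_depth_v=1.0))
```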
By combining the voltage responses of both photodetectors, the average number of shives in a given volume of pulp can be determined. [86]

Another major issue with this approach is the color of the pulp solution, which varies depending on the raw feedstock, the origin of the batch, and the processing steps. This can be problematic for an LED-photodetector setup operating in the visible range: different-colored pulps have different absorbance spectra, which attenuate the photodetector's voltage responses to different degrees and thereby decrease measurement reliability.

Since absorptivity in the Near-Infrared (NIR) region is generally low for biomaterials, [49, 50] altering the LED-photodetector setup to operate in the NIR region bypasses the problem of different pulp absorptivities due to varying color. [86]

A different approach to automated shive detection is the use of optical imaging. This is distinct from the above photodetector-based methods, as it relies on wide-field detection, necessitating very different data extraction techniques. The resulting images can either be visually inspected for shives, or processed with machine vision algorithms to locate shives and other impurities.

One approach is to collect images using a flow cell similar to that described above, though this substantially restricts the field of view. [8] This approach examines shives (and fibers) instantaneously in two dimensions, whereas the previously described flow cell observes shives continuously in one dimension. An alternative approach is to capture full-color images of contaminants separated from a pulp sample by hydrodynamic focusing; machine vision algorithms then process the images and apply a detection threshold for shives. [91]

NIR imaging presents an alternative to these visible-light imaging techniques. Because of the increased penetration depth of NIR light, sample thinning, whether by handsheet formation or by dilution through a flow cell, is not necessary.
Likewise, as the entirety of a sample can be examined at once, and since the detection algorithm relies on image feature detection, there is no need to apply size discrimination to the sample. The novel combination of high-resolution NIR imaging with automated image processing algorithms holds great promise for the automated detection of shives.

6.2 Materials and Methods

We mount unbleached 1 mm thick pulp sheet samples, approximately 3′ square, on a custom-built light box (described below; see Fig. 6.2). We affix smaller pieces of these sheets, as well as bleached pulp sheets of similar thickness, to the light box with an insert that blocks stray light. We also vertically affix thin handsheet samples, approx. 0.3 mm thick and 10″ in diameter, to a three-dimensional translation stage using a microscope slide holder, for both reflectance and transmission imaging.

In total, Canfor provided us with 142 unbleached pulp sheets, in both thick and handsheet formats, and 28 small thick bleached pulp sheets (for dirt detection). We also tested our light box at Prince George Pulp and Paper Mill, where we sampled 196 full-size unbleached pulp sheets.

Given the opacity and non-uniform brown color of unbleached pulp samples, at least at visible wavelengths, we investigated two imaging modes using different spectral ranges. These are outlined below.

6.2.1 UV-Vis Image Acquisition

Cellulose pulp has a relatively low and constant absorbance between 300 and 600 nm. It also fluoresces broadly under ultraviolet light, centered near 420 nm. [17, 92] Shives (and other contaminants), on the other hand, absorb ultraviolet and visible light, due to both their opacity and their chemistry. Thus, shives appear as visibly dark patches or fibers on an optical image. Near-ultraviolet imaging provides higher contrast than visible-light imaging, due to the more pronounced differences in absorbance between pulp fibers and shives. Reflected-light Ultraviolet/Visible (UV-Vis) images capture both backscattered and fluorescent light, while transilluminated images capture only transmitted light, with some forward-scattered fluorescence.

We acquire UV-Vis images at fourfold magnification using a commercial Canon EOS 5D equipped with a macro lens (MP-E 65mm f/2.8 1-5x, Canon Inc.). For ultraviolet images, samples are illuminated using high-intensity 385 nm LED flashlights; for visible-light images, samples are illuminated with a white-light LED ring light affixed to the lens aperture.

6.2.2 NIR Image Acquisition

NIR imaging is an alternative to UV-Vis imaging that provides better sample penetration, for the reasons previously mentioned. [49, 50, 86] Pulp samples typically have an absorbance minimum near 875 nm; [86] thus, NIR transillumination allows for the imaging of the bulk material of thick samples.

NIR image acquisition typically requires exposure times on the order of several seconds, due to the relatively low light levels penetrating the pulp sheets. This can be mitigated by raising the camera sensor's sensitivity (as represented by the ISO speed); however, a higher ISO means that recorded images will be more speckled (i.e. noisier), which can confound feature detection.

We acquire NIR images using a modified version of a commercial Canon EOS 1200D Digital Single-Lens Reflex (DSLR) camera.¹ The Complementary Metal-Oxide-Semiconductor (CMOS) sensors in such consumer cameras are filtered so as to record visible light only, with NIR light filtered out. However, aftermarket users can disassemble the camera and swap the NIR exclusion filter for a visible-light filter, so that the sensor records only NIR light near 850 nm. It would be inaccurate to call this "solar-blind", as the emission spectrum of the sun extends far into the infrared region;² however, the camera is blind to the mercury-vapor fluorescent and LED lights typically found in laboratory or industrial environments. [50]

¹ Canon Inc., Ohta-ku, Tokyo, Japan; the EOS 1200D is known in the North American market as the EOS Rebel T5.
² The same is true for incandescent and metal-halide lights.

6.2.3 Sample Mounting Systems

We developed two sample mounting systems: a preliminary benchtop setup for use with small samples, and a full-size light box.

Figure 6.1: The NIR flashlight setup for small pulp samples.

Small Samples: Tabletop Setup

We define "small samples" to include thin handsheets, as well as pieces of thick pulp sheets approximately 4″ square. We back-illuminate these using a 300 mW tactical LED flashlight emitting NIR light at 835 nm. The camera is positioned on the opposite side of the sample from the NIR flashlight, so as to acquire transilluminated images; Figure 6.1 shows this setup. The camera is fixed directly to the optical table, and we acquire images using the previously mentioned macro lens, set at fourfold magnification. The camera stores images simultaneously as Raw (.CR2) and .JPEG files.

Large Samples: Light Box

In order to maximize sample throughput, an instrumentation setup should ideally be designed for samples that have undergone minimal processing. Unbleached pulp leaves the production line as approximately 3′ square sheets, stacked into bales. Testing these sheets at their full size is ideal for high-throughput data collection, and brings us one step closer to on-line or at-line implementation.

Figure 6.2: Left: CAD rendering of the light box. Right: The light box, as constructed. The adjacent monitor shows a live NIR view of the back-illuminated pulp sample.

With that in mind, we use a custom-built light box for NIR imaging that is designed to accommodate the full-size pulp sheets.
We also made an insert to adapt the light box to accommodate handsheets as well as 1’ square pulp samples. These sheets are back-illuminated by an LED light bank consisting of 80 individual LEDs emitting 835 nm light, for a total of 240 W.³ The same modified DSLR camera images the transilluminated pulp, using a wide-angle lens (EF 24-70mm f/2.8L, Canon Inc.). We constructed a prototype light box out of cardboard; the finalized light box was built from plywood and fitted with cooling fans to dissipate heat generated by the LEDs (see Fig. 6.2).

³ Mfd. by Larson Electronics LLC, Kemp, TX, USA.

6.2.4 Image Processing using MATLAB

Refer to Appendix A.3.

For processing in MATLAB, Raw images must be converted from Canon’s proprietary .CR2 format to the standard .TIFF format, which can be read universally. Once the images are imported into MATLAB, we denoise them using a Wiener filter and adjust their contrast using Contrast-Limited Adaptive Histogram Equalization (CLAHE) (see Ch. 2.1.1). We next crop the images; this step is necessary because peripheral regions tend to be darker, and the region of interest (where the shive is located) is in the center. Because shives absorb NIR light to a much greater degree than the surrounding pulp, the images are then filtered to equalize regions that are brighter than a cutoff value.

The next step is binarization of the image, to isolate darker regions. We determine the binarization threshold using a triangle algorithm applied to a one-dimensional intensity histogram of the grayscale image, as described by Zack et al. [93] The algorithm smooths the image’s histogram using a moving-average filter (Fig. 6.3-A), then constructs a triangle by drawing a line (the “hypotenuse”) between the histogram’s highest peak and the minimum nonzero bin (not necessarily 0% black; dotted line, Fig. 6.3-B). The algorithm then locates the maximum x-distance between the histogram and this “hypotenuse” (solid line, Fig. 6.3-B).
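The smoothing-and-triangle procedure just described can be sketched in Python with NumPy. This is a minimal illustration, not the MATLAB script of Appendix A.3; it assumes a grayscale image scaled to [0, 1], a 256-bin histogram, a 35-point moving-average window (the value used by the feature detection script), and dark features lying on the low-intensity side of the histogram peak. It normalizes the threshold by the number of bins, which is an assumption, and the function names are hypothetical.

```python
import numpy as np


def moving_average(values, window=35):
    """Centered moving-average smoothing of a 1-D histogram (assumes len(values) >= window)."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="same")


def triangle_threshold(hist):
    """Triangle threshold (after Zack et al.) for dark objects; returns a level in [0, 1].

    The "hypotenuse" runs from the lowest nonzero bin up to the histogram maximum.
    For a fixed line, the vertical gap between line and histogram is proportional
    to the perpendicular distance, so both are maximized at the same bin.
    """
    hist = np.asarray(hist, dtype=float)
    peak = int(np.argmax(hist))                 # highest histogram peak
    low = int(np.flatnonzero(hist > 0)[0])      # minimum nonzero bin
    if peak == low:
        return low / (hist.size - 1)
    bins = np.arange(low, peak + 1)
    line = hist[low] + (hist[peak] - hist[low]) * (bins - low) / (peak - low)
    gap = line - hist[low:peak + 1]
    return (low + int(np.argmax(gap))) / (hist.size - 1)


def binarize_dark(gray, bins=256, window=35):
    """Mark pixels darker than the triangle threshold as foreground (True)."""
    gray = np.asarray(gray, dtype=float)
    hist, _ = np.histogram(gray, bins=bins, range=(0.0, 1.0))
    level = triangle_threshold(moving_average(hist, window))
    return gray < level
```

For example, a uniform 0.8-intensity image containing a 10 × 20 px patch at 0.1 yields a mask covering exactly that patch. The later median filtering, border clearing, and connected-component filtering could be layered on top (for instance with `scipy.ndimage`), but that choice of library is likewise an assumption.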
The position of this maximum, expressed as a fraction of the nonzero bins, is output as the threshold value. [93]

Once the binarization threshold has been calculated, the pixel intensities in the image are set to 0 or 1, depending on their relation to the threshold. From there, the feature detection and edge detection scripts diverge; each is described individually in the following sections.

Feature Detection

Refer to Appendix A.3.1.

The feature detection algorithm exploits the image region analysis functions contained in MATLAB’s Image Processing Toolbox. Before calculating the binarization threshold (using the triangle algorithm), a 35-point moving-average filter smooths the image intensity histogram. [93] We apply the triangle threshold value when binarizing the image, and we smooth the result again using a median filter to remove high-frequency binary noise. We discard features touching the image’s border, since they are almost always artifacts caused by uneven illumination or uneven sample thickness.

MATLAB’s image processing functions isolate and analyze connected components of the binary image.

Figure 6.3: A: Raw (blue) and moving-average (MA; red) filtered intensity histograms of an NIR shive image (pixel count vs. intensity). B: Illustration of image binarization threshold determination using the triangle algorithm, applied to the data in A. Hypotenuse (dotted) and maximum distance (solid) lines shown; threshold value = 0.4604.

The script filters out features smaller than one standard deviation above the mean area of all connected regions; the regions that are filtered out are typically single-pixel noise artifacts. To be detected as a likely shive, a feature from this new list must meet four criteria, outlined below.

Image Moment Ratio: Corscadden et al. proposed several criteria for the detection of shives using image analysis; they termed one of them the Shive Branch Index (SBI), defined as the ratio of the second moment of an “idealized image” to that of a cropped feature image. [8] Their “idealized image” is a thin rectangle with a perimeter equal to that of the feature image. In essence, the SBI represents the degree to which the spatial distribution of the feature image deviates from an “ideal” linear distribution. They assert that the SBI effectively quantifies the shape of a shive, which may or may not be branched, and this figure has been incorporated into commercial software.⁴ In the present algorithm (see Appendix A.3.1), the inverse of the SBI is used; this is more practical for reasons discussed below.

We algorithmically define the “ideal” image as a rectangle with a width of 10 pixels and a perimeter equal to that of the shive feature being analyzed, as determined by MATLAB’s regionprops function. The second moments of the images I are calculated as follows:

$$M_{22} = \sum_x \sum_y x^2 y^2 \, I(x,y) \qquad (6.1)$$

We then calculate the moment ratio as $M_\text{feature}/M_\text{ideal}$. A smaller moment ratio (i.e. a higher SBI) indicates a higher degree of branching. [8] In the context of transilluminated imaging of thick pulp sheets, images are typically noisy due to the necessarily high ISO speeds.⁵ Binarized features are typically speckled; dark noisy regions are binarized into features with very complex morphologies. These noisy regions typically have very small moment ratios. Since they absorb NIR light, shives appear as very dark, non-noisy patches on an image, and consequently have larger moment ratios. Thus, in this regime, the desired feature quality for shive detection is a high moment ratio; as an inverse measure, the SBI is less practical in this context. Further, the algorithm is focused on the detection of shives, rather than on the analysis of the shives themselves.

The minimum moment ratio threshold for shive detection is $M_\text{feature}/M_\text{ideal} > 15$.
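Eq. 6.1 and the ideal-rectangle comparison can be made concrete with a short NumPy sketch. It is an illustration under stated assumptions, not the Appendix A.3.1 code: coordinates are measured from the feature's intensity-weighted centroid, the ideal rectangle is axis-aligned, the perimeter is passed in directly (the thesis obtains it from regionprops), and all function names are hypothetical.

```python
import math

import numpy as np


def second_moment_m22(image):
    """Eq. 6.1, M22 = sum_x sum_y x^2 y^2 I(x, y).

    Assumption: x and y are measured from the intensity-weighted centroid,
    so the value does not depend on where the feature sits in the image.
    """
    img = np.asarray(image, dtype=float)
    total = img.sum()
    if total == 0:
        return 0.0
    ys, xs = np.indices(img.shape)
    xc = (xs * img).sum() / total
    yc = (ys * img).sum() / total
    return float(((xs - xc) ** 2 * (ys - yc) ** 2 * img).sum())


def ideal_rectangle(perimeter, width=10):
    """Binary rectangle `width` px wide with perimeter 2 * (width + length) == perimeter.

    Returns None when perimeter <= 2 * width, since no such rectangle exists.
    """
    length = perimeter / 2 - width
    if length <= 0:
        return None
    return np.ones((width, math.ceil(length)))


def moment_ratio(feature, perimeter, width=10):
    """M_feature / M_ideal (the inverse of the SBI); None when it is undefined."""
    ideal = ideal_rectangle(perimeter, width)
    if ideal is None:
        return None
    m_ideal = second_moment_m22(ideal)
    if m_ideal == 0:  # a 1-px-long rectangle has no spread along one axis
        return None
    return second_moment_m22(feature) / m_ideal


def passes_moment_test(feature, perimeter, threshold=15.0):
    """Shive criterion: the moment ratio exists and exceeds the threshold."""
    ratio = moment_ratio(feature, perimeter)
    return ratio is not None and ratio > threshold
```

As a sanity check, a solid 10 × 30 px bar compared against its own ideal rectangle (perimeter 80 px) gives a ratio of exactly 1. Note that M22 is not rotation-invariant: a thin bar lying diagonally has a much larger x²y² moment than the same bar lying along an image axis, so how the ideal rectangle is oriented matters; this sketch simply leaves it axis-aligned.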
For some features, the moment ratio may be undefined; this is the case when the feature’s perimeter is less than 20 px, since no 10 px wide rectangle can then have the same perimeter. Given the typical 4× magnification of the DSLR setup, these features are on the order of tens to hundreds of microns in diameter, and therefore very unlikely to be shives.

⁴ The software is marketed by OpTest Equipment Inc., Hawkesbury, ON, Canada; Corscadden et al. are employees of the aforementioned company.
⁵ ISO speed is a measure of camera sensitivity.

Eccentricity: Eccentricity, in the context of feature analysis, is defined in terms of an ellipse with the same second moments as the image feature (cf. Eq. 6.1). As such, eccentricity is always between 0 and 1. It can be calculated as follows, where a is the length of the ellipse’s semimajor axis and b is the length of its semiminor axis:

$$e = \sqrt{1 - \frac{b^2}{a^2}} \qquad (6.2)$$

Shives are almost always oblong, with a correspondingly high eccentricity. The threshold used in this algorithm is e > 0.8, meaning the feature’s width-to-length ratio must be less than 3:5 (b/a < 0.6).

Compactness: Compactness measures the ratio of the square of a feature’s perimeter to its filled area, as per the equation below.

$$C = \frac{P^2}{A_\text{filled}} \qquad (6.3)$$

The minimum compactness for a shive is C = 15. This may seem counterintuitive, but in reality it filters out highly noisy features that are the result of the binarization step. Such features have long perimeters and correspondingly small compactness values; shives, on the other hand, should also have small compactness values due to their elongation.
The other rules described herein provide an effective ceiling to the compactness threshold.

Darkness: Recognizing that shives absorb NIR light to a much greater degree than normal pulp, the relative darkness of a feature can be used as a detection parameter. Of course, for a binarized image, “darkness” is meaningless; therefore, we determine the “darkness” value by averaging the pixel intensities from the grayscale image, cropped to the feature’s bounding box (as determined by analyzing the binarized image). More specifically, the algorithm trims the cropped grayscale subimage by multiplying it by the corresponding cropped binary subimage, and then calculates the pixel intensity average using only nonzero elements. Figure 6.4 illustrates this effect. The trimming step eliminates outlying regions that would distort the feature’s average intensity; only the pixels in the feature itself are counted when calculating its darkness. The value ranges between 0 and 65,535 (for 16-bit images).

Figure 6.4: Shive feature, before and after trimming. Left: Cropped grayscale subimage showing feature. Right: Trimmed feature image.

We then compare the feature darkness to the average intensity of the overall grayscale image. If the feature is darker than two standard deviations below the average ($D < \mu - 2\sigma$; i.e. darker than roughly 97.7% of the image, assuming normally distributed intensities), then it is considered to be a shive candidate. The feature in Fig. 6.4 would fail this test without the illustrated trimming step.

After processing, the algorithm counts the features it has detected, and circles their positions on the original input image.

Edge Detection

Refer to Appendix A.3.2.

The edge detection algorithm used for Raw images is a modified version of a Canny edge detection algorithm, tailored for use in this context. The triangle threshold scales a low-pass brightness filter; pixels with intensities greater than this cutoff are set to the image’s maximum intensity.

Next, we apply two-dimensional Gaussian and Sobel masks (represented in Fig. 6.5).
Each two-dimensional mask is convolved across the image area: the Gaussian mask smooths the image, removing high-frequency noise and making edge detection more reliable, while the Sobel mask accentuates edges by locating the image’s most pronounced intensity gradients. We then binarize the filtered image and apply a median filter to remove erroneous speckles that escaped the Gaussian filter but were detected by the Sobel filter.

Figure 6.5: Two-dimensional filter masks used during edge detection. Refer to Appendix A.3.2 for their numerical representations. Left: Gaussian mask, broad (type 1). Right: Sobel mask, semi-coarse (type 4).

MATLAB’s bwconncomp function isolates connected regions in the binary image. Only those connected regions that are larger than two standard deviations above the mean region size ($\mu + 2\sigma$) are kept. These large regions are filled and their eccentricity is calculated. Other rules previously described can also be applied to isolate shives.

6.2.5 Image Processing using ENVI

Refer to Appendix A.3.3.

ENVI⁶ is an image analysis software package that is commonly used in remote sensing applications. Its functionality can be adapted to detect shives. However, ENVI has some limitations that require its use alongside another platform. For instance, MATLAB is much better suited than ENVI to handling large data sets, such as stacks of images.
Additionally, ENVI is geared toward visual workflows requiring frequent user input, as opposed to automated command-line functionality.

⁶ Harris Geospatial, Boulder, CO, USA.

Figure 6.6: Flowchart illustrating the NIR shive detection process. Flowchart nodes: UBK sheets from baler; NIR light box and camera (2-4 s exposure); convert raw images; images loaded in Python; crop, brighten, contrast enhance; segmentation and classification with ENVI (via Python); classification rules, based on characteristics of shives (aspect ratio, size, etc.), whose values can be tailored to suit mill needs; image statistics (size, FOV, etc.); raw shive count; shive image; shive statistics; shives per sheet; shives per m².

ENVI’s API can be called directly from MATLAB or Python, thereby obviating some of the challenges facing the automation of image processing using ENVI. The API requires input in IDL, which is quite distinct from the native languages of Python or MATLAB; this limits crosstalk between ENVI and other platforms. Nonetheless, the ENVI API can output files that can be read by MATLAB or Python; in this case, shapefiles are used.

Figure 6.6 outlines the process of detection using Python and ENVI, as deployed in the mill test. The details of the algorithm are outlined below.

Contrast Enhancement and Image Segmentation

We contrast-enhance images using a CLAHE algorithm, provided in Python by the OpenCV package. This step is necessary for subsequent image segmentation: segmentation is in essence an edge detection method, and localized contrast enhancement allows dark features such as shives to be easily distinguished from the lighter background.

Specifically, the contrast enhancement step applies OpenCV’s CLAHE algorithm over a small window, which is incrementally rastered over the entire image. After this, ENVI segments the image by applying a Sobel filter (see Fig. 6.5) followed by a watershed algorithm, and then merges highly similar regions (see Ch. 2.1.2 for details).

Feature Detection

We classify segmented shive images using a set of rules similar to those used in the MATLAB script described previously (Ch. 6.2.4). ENVI calculates a number of attributes for each image segment, and uses a set of rules to sort segments into classes, each with a tolerance threshold. Each class is defined by one or more weighted rules, each of which is in turn defined by one or more weighted attributes. The classes are uniquely named (alongside “Unclassified” for segments that fail to pass any rule), and each segment’s class name is stored in the shapefile output. In the shive application, ENVI generates output images that show the original picture overlaid with the detected shives from the shapefile.

The classes and rules can be either user-defined or determined through machine learning based on a set of user-supplied example images. We employ the former approach, since developing a library of cropped shive images was infeasible, given our limited sample size and the general variability in the appearance of shives.

We determined appropriate rules empirically by manually selecting the potential shive segments and comparing their attributes to those of all segments within an image. Some attributes were found to predict shives, while for others no correlation was found. Appendix B shows the results of this empirical determination.

Spectral Attributes: Image data used in remote sensing applications are often multi- or hyperspectral. In such applications, examining or comparing individual spectral bands can yield a great deal of useful information. For shive detection, however, the images are monochrome and thus have a single spectral band (at 835 nm). Attributes of this single band therefore reflect pixel intensities.
These attributes, as calculated by ENVI, consist of the spectral mean, minimum, maximum, and standard deviation.

As discussed previously, shives appear as dark spots on a light background. Shive candidates should therefore have low spectral means and minima. Since these attributes are calculated over each image segment, the values for shive candidates are typically much lower than the mean for the overall image. Values can range between 0 and 255 (8-bit images); according to empirical determination, the spectral mean should be less than 95 and the spectral minimum should be less than 75.

Texture Mean: Texture attributes relate to the spatial uniformity of grayscale image segments, and are calculated for each spectral band. A small moving window, rastered across the image segment, calculates the normalized occurrence probability of each pixel value within the window; using these data, local attributes are then calculated according to various formulas. The texture attributes ENVI calculates are range, mean, variance, and entropy. As shives absorb NIR light to a greater degree than cellulose fibers, shive candidates should have relatively low spatial noise, and correspondingly low texture means, calculated as per the equation below:

$$M_\text{tex} = \sum_{i=0}^{N_g - 1} i \, P(i) \qquad (6.4)$$

where $N_g$ is the number of distinct pixel values in the window and $P(i)$ is the normalized occurrence probability of pixel value $i$ within the window. Empirically, the texture mean for a shive candidate should be less than 111.

Area: Segment area is a self-explanatory attribute; given knowledge of typical shive dimensions, one can estimate that a shive candidate’s area should lie between 75 and 1,000 pixels. In terms of real area, this equates to image segments with cross-sections between approximately 4.5 and 45 mm².

Convexity: The convexity of an image segment refers to the ratio of the segment’s perimeter to the perimeter of its convex hull. This is, in some ways, a similar measure to the aforementioned SBI (see Ch. 6.2.4), [8] inasmuch as it measures a segment’s deviation from a convex shape such as an ideal rectangle.

Because of the geometry of the imaging setup, branched shives are not as easy to detect as unbranched shives. A shive’s branches may be too small to be detected at the camera’s distance, or may be out of focus due to the non-trivial thickness of the pulp sheet; thus, a branched shive may appear quite similar to an unbranched shive, as only its core may be detected. That is to say, the core region of a branched shive may be the only region that absorbs enough NIR light to show up on an image, which would make it practically indistinguishable from an unbranched shive.

In this context, then, shive candidates should have low convexities; empirically, a maximum value of 1.5 was determined to be an adequate threshold.

Roundness: Segment roundness is another self-explanatory geometric attribute. ENVI calculates roundness as follows:

$$R = \frac{4A}{\pi L_\text{major}^2} \qquad (6.5)$$

where A is the segment area and $L_\text{major}$ is the length of the major axis of the bounding box encasing the segment, both measured in pixels. Thus, roundness measures the deviation of the segment’s shape from a perfect circle by comparing their areas. A perfect circle has a roundness value of 1, while a perfect square reaches the maximum value of $4/\pi$. Image segments with low roundness values tend to be elongated, which is desirable for shive detection. Empirically, roundness values should be less than 0.4.

Elongation: The elongation of an image segment is simply the ratio of the major and minor axis lengths of the segment’s bounding box. There is a fairly high degree of (negative) correlation between roundness and elongation: during empirical rule determination, the Pearson correlation coefficient between the two was calculated as −0.85 for the population of shive candidates. This correlation decreases when all segments are considered.
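Taken together, the empirical thresholds in this section (including the elongation threshold of 1.85 stated next) amount to a rule set that can be sketched in plain Python. This is a crisp, unweighted simplification assumed for illustration: ENVI’s rule-based classification uses weighted rules and attributes with a tolerance threshold, and in practice the attribute values would come from ENVI rather than from this code. The `Segment` type and the function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical container for per-segment attributes (8-bit intensities, pixel units)."""
    spectral_mean: float
    spectral_min: float
    texture_mean: float
    area: float        # px
    convexity: float   # perimeter / convex-hull perimeter
    major_len: float   # bounding-box major-axis length, px
    minor_len: float   # bounding-box minor-axis length, px


def roundness(area, major_len):
    """Eq. 6.5: 1 for a circle, 4/pi (the maximum) for a filled square."""
    return 4.0 * area / (math.pi * major_len ** 2)


def elongation(major_len, minor_len):
    """Ratio of bounding-box major to minor axis lengths."""
    return major_len / minor_len


def is_shive_candidate(s):
    """True only if every empirically determined threshold holds (crisp AND, no weights)."""
    return (
        s.spectral_mean < 95
        and s.spectral_min < 75
        and s.texture_mean < 111
        and 75 <= s.area <= 1000
        and s.convexity < 1.5
        and roundness(s.area, s.major_len) < 0.4
        and elongation(s.major_len, s.minor_len) > 1.85
    )
```

For instance, a dark, thin segment with an area of 200 px and a 50 × 5 px bounding box (roundness ≈ 0.10, elongation 10) passes, whereas a dark segment of similar darkness but nearly circular outline fails on roundness and elongation.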
According to empirical determination, the elongation should be greater than 1.85.

Figure 6.7: Comparison of UV-Vis shive images. A: Ultraviolet reflectance illumination (385 nm LED). B: Ultraviolet transillumination (385 nm LED). C: Visible-light reflectance illumination (white LED).

6.3 Results

6.3.1 UV-Vis Imaging

Using UV-Vis imaging for shive detection did not produce very useful results, though it might prove useful for imaging so-called stickies, or other contaminants that fluoresce. Figure 6.7 shows an illustrative example of the UV-Vis imaging results. Subfigures A and B show ultraviolet reflectance and transillumination, respectively. In A, some areas are clearly fluorescing (indicating non-cellulosic materials), while the cellulose fibers themselves appear relatively translucent, and shives appear as darker fibers. B shows similar results, though without any visibly fluorescent areas. Subfigure C shows white-light reflectance illumination, similar to what one would see with the naked eye. Shives are again visible, but much less so than in A.

6.3.2 Laboratory Tests of NIR Imaging

Our NIR imaging tests were more conclusive than the UV-Vis imaging tests. We were able to demonstrate a proof of concept, which led to the mill test discussed in the next section.

Comparing Edge and Feature Detection

For initial testing, we used small pieces of thick pulp sheets, imaged using the benchtop setup (see Fig. 6.1). Since the camera images at 4× magnification, only small areas of the samples were imaged at once. Figures 6.8 and 6.9 illustrate example results of these tests, with scale bars.

Subfigure A of Fig. 6.8 shows a particularly large shive, located near the surface of the pulp sample. Notably, the sample itself is quite homogeneous, and the shive is easily observable.
The edge detection algorithm contrast-enhances the input image (B), filters it (C), and outputs an image of the detected feature (D).

Figure 6.8: Stages of NIR shive image edge detection, using MATLAB. A: Unprocessed JPEG image. B: Cropped, contrast-enhanced Raw image. C: Edge-detected Raw image. D: Largest binary region in edge-detected image, filled, cropped, and enlarged.

Figure 6.9: Stages of NIR shive image feature detection, using MATLAB. A: Histogram-equalized Raw image. B: Binarized Raw image. C: Feature-processed Raw image. D: Detected shive feature, circled, on original Raw image.

We realized that this approach would work well only in cases such as these, where features are very well-defined. Shive features located deeper in the structure are not so easily resolved, as seen in Figure 6.9 (A). In this case, the feature in question is buried within the sheet, and the resulting image is much noisier, as it was recorded at a higher ISO speed. Subfigures B and C show the feature detection algorithm’s process, including binarization, denoising, and feature detection. Subfigure D shows the result, where the shive feature has been selected according to the detection criteria.

In all, we collected 87 images of various regions of our dozen test samples. By comparison with visual inspection, our edge detection algorithm had a 46.6% positive rate, while the feature detection algorithm performed better, at 65.8% positive. These results indicated that much work remained to be done before this approach would become viable. However, we reasoned that these benchtop tests might not be representative of a real data set, with wider-field imaging using the full-sized light box.

Light Box Testing

After building the light box (seen in Fig. 6.2), we used it to image 142 thick pulp samples, as well as handsheets that had been made from them. Figure 6.10 shows the visual output from one example, processed using an early version of our ENVI feature detection script (see Ch. 6.2.5).
As indicated by the scale bar, the image covers a much larger area than the previously discussed results. The pulp sheet is of inhomogeneous thickness, and this is easily observable in wide-field images such as this: “cloudy” areas in the image indicate thicker regions in the sheet, and the speckled regions are thinner. The latter look speckled both because of absorbing features in the sheet, and because of how the local CLAHE algorithm operates. White areas in the image are very thin regions in the pulp sheet, where the CLAHE was unable to effectively equalize the image contrast. The edges of the image are noticeably different from the main body; this is because the image is of a small sheet.

Figure 6.10: NIR shive image feature detection, using ENVI. Red filled areas are probable shives; green filled areas are possible shives.

Figure 6.11: Mill and imaging shive counts for handsheet samples, lab test (counts vs. sample index; series: mill count, imaging count).

Also visible is the wire pattern used by the industrial machinery to form the sheets; these are the slightly off-vertical bands visible across the image. The wire pattern physically appears on the sheet surface facing the NIR LED bank, while the smooth surface faces the camera.

The actual results of the algorithm are output and displayed as filled shapes on the image. Red regions are features that pass strict criteria, indicating they are most likely to be shives. Green regions fall into a broader category of possible shives; these are clearly more numerous. In this case, there are 23 likely shives and 31 possible shives.

We discovered that imaging the handsheets provided better correlation between our count and the mill’s shive count than did imaging the thick sheets. Figure 6.11 shows the results of this determination: the shive counts for the 142 handsheet samples (orange), along with their mill-produced manual shive counts (blue).
The counts are typically reported in terms of area, which for the images is calculated using the camera’s EXIF metadata.

However, having to prepare handsheets for shive detection is a laborious process that the light box is designed to obviate. Continued testing of the light box in a mill environment provided further insight into how to progress. This will be discussed in the next section.

Figure 6.12: NIR image of an NBSK pulp sheet, detecting dirt particles using ENVI. Blue circled regions indicate detected particles.

Identifying Dirt in Bleached Pulp Samples

In addition to shives, dirt inclusions are also of concern to the pulp industry, especially in applications where whiteness is integral to product quality. Current standard detection methods are identical to those for shives: infrequent sampling, paper making, and visual inspection. We collected NIR images of 28 NBSK pulp sheet samples to determine whether we could apply our machine vision methodology to this problem.

As with the unbleached pulp sheets, we used the light box to collect images under identical conditions. The different (i.e. round) shape of the dirt particles required some alteration of the feature selection algorithms. An example result is shown in Figure 6.12. The blue regions are the detected “dark round areas”, likely to be dirt. Red and green regions are shive candidates.

Figure 6.13: Mill and imaging shive counts for handsheet samples, mill test (using ENVI and Python). Counts on a logarithmic axis (1 to 1000) vs. sample index (1 to 196); series: mill count, imaging count.

Detection results in general were not particularly encouraging, since the wire pattern (similar to that seen in Fig. 6.10) was extremely prominent and interfered with the feature detection.
This was due to the bleached nature of the pulp sheets, which lowered their absorption.

6.3.3 Mill Test of NIR Imaging

We conducted an on-site test of our light box system at Prince George Pulp and Paper Mill.⁷ In the mill, operators conduct a manual count every half hour: they remove a pulp sheet from the baling machine, form a handsheet from a piece of it, manually count the number of shives based on visual inspection, and then archive the full-sized sheet. These “retains” are stored for 12 months; we were able to test these, plus several sheets taken fresh from the baler (196 in total). After image acquisition, we immediately processed each image using ENVI (as described in Ch. 6.2.5), driven from both MATLAB and Python, to produce a shive count, which we could then compare with the mill’s data.

⁷ Prince George, B.C.; owned by Canfor.

Figure 6.14: Results of NIR shive image feature detection, using ENVI and MATLAB. Red areas are probable shives; green areas are possible shives. Field of view is approximately 19.4 × 12.9 cm. Shive count: 7.

Figure 6.15: Results of NIR shive image feature detection, using ENVI and Python 3.7. Red areas are probable shives; green areas are possible shives. Field of view is approximately 21.5 × 14.4 cm. Shive count: 13.

Figures 6.14 and 6.15 illustrate the difference in results for the same sample. The Python script produced a higher count (13 likely shives, 16 possible shives) than the MATLAB script (7 likely shives, 7 possible shives). There are some differences in what was classified as likely (red) and possible (green), but the results generally agree. Based on visual inspection of the image, the Python results are more accurate.

Figure 6.13 shows the shive count results (orange) along with their mill-produced manual shive counts (blue). Note that the scale is logarithmic to accommodate a spike in the mill count, corresponding to a shive outbreak.
Note also that there are gaps in the data, due to incomplete mill records and to samples for which the feature detection algorithm returned a shive count of zero (which cannot be shown on a logarithmic scale). It is apparent that there is no close correlation between the two counts; there are a number of possible explanations, the foremost of which is sampling bias.

The mill count is done on only a small portion of the sheet (with a total weight of 1 gram), whereas the imaging covers a much larger area. There is no reason to assume that shives should be evenly distributed throughout a pulp sheet sample. As previously described, relative to the production rate, the standard mill testing routine samples a tiny amount at very infrequent intervals, so it is quite possible that shive outbreaks could pass under the radar of the mill operators. The higher throughput of the machine vision algorithms somewhat mitigates this danger.

6.4 Discussion and Conclusions

The sample shown in Figs. 6.14 and 6.15 was removed directly from the baler and immediately tested, whereas most of the other samples had been in storage for months. Figure 6.16 shows part of an older sheet, produced 11 months before the image was recorded. The image is dominated by horizontal creases, which were typical of many of the older pulp sheets. These severely interfered with the feature detection algorithm. Creases were largely absent from the fresh sheets.

The baler lies at the end of the production line, after the pulp is dried in massive ovens; it slices the 15-foot wide product stream into five 3’-by-3’ sheets twice per second, and stacks them. When the stacks reach 50 sheets deep, they are weighed, compressed, wrapped, and prepared for shipping.

Figure 6.16: A highly creased unbleached pulp sample. Field of view is approximately 2.2 × 1.4 cm.

By removing sheets directly from the baler, we avoided the compressing step, and since they had very recently come out of the drying ovens, they still retained some moisture.
Based on the much higher quality of the images of the fresh samples compared with the older samples, we concluded that it would be best to continue testing with fresh sheets.

An on-line adaptation of the NIR imaging system would probably be installed somewhere between the dryers and the baler, as there is in fact another visible-light imaging system there, monitoring other process parameters. Since the production line (the 15-foot wide sheet of pulp moving at 2-3 m/s) is quite taut, and would still retain a good deal of moisture at that point, the crease problem may in fact disappear. If it does not, it could be mitigated by refining the classification rules to better distinguish between shives and creases. This is not a straightforward task, however, since there is little to distinguish them, except perhaps that shives have uniformly high absorption, whereas creases often exhibit adjacent light and dark regions.

Nonetheless, the progress thus far has shown that NIR imaging holds promise as a machine vision system for the automated detection of shives, although much work remains to be done before it is fully ready to be implemented in a production environment.

Chapter 7
Future Work: Advancing to the Mill

Much work remains to be done before spectroscopy and chemometrics find widespread adoption in pulp mills. Before that happens, these methods must be established as parallel analysis techniques to be used in pilot plants, alongside traditional methods. This is the natural continuation of the project presented in this thesis. With the continued support of Canfor Pulp LP and Mitacs, we can envisage pursuing the following avenues of investigation.

7.1 NIR Imaging

As detailed in the previous chapter (Ch. 6.4), we determined that the best practice for shive detection was to image freshly dried sheets, to minimize the algorithm’s confusion between shives and creases.
A possible avenue for mitigating the crease problem, outside of further refining the algorithm, would be to develop an imaging system that can distinguish surface morphology. With the NIR transillumination system still in place to detect shives via absorbance, one can imagine a two-color visible light system that illuminates the surface of a pulp sheet from two oblique angles, thus casting its surface morphology into relief.

Given the geometry of a production line, it can be assumed that creases primarily form laterally with respect to the motion of the product, as it passes over rollers that maintain the line’s tautness. These creases are also generally perpendicular to the wire pattern applied to the bottom of drying pulp; Fig. 6.12 shows an example, with the wire pattern oriented horizontally, and numerous visible creases oriented vertically. Thus, the angles of this two-color illumination system could easily be fixed to throw the creases into optimal relief. A hyperspectral camera could then simultaneously capture both of these visible light bands and the NIR transillumination band. The visible light bands could be analyzed to detect creases, and a machine vision algorithm could apply mathematical transformations to the NIR band that correct for them.

Progressing with such a design, and collecting more data, will require implementing the NIR imaging system as a side-by-side process monitoring step, conducted at the same time as the traditional visual inspection. To this end, we will design a new light box that is better suited to routine use. This new light box will likely be oriented horizontally with a fixed camera mount above, to allow a mill operator to slide a pulp sheet into it and quickly record an image, which can then be fed into the machine vision algorithm.
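The two-color oblique illumination idea above can be sketched with a normalized difference of the two visible bands: on flat regions both sources return similar reflectance, while the two flanks of a crease tilt toward one source or the other. The sketch assumes co-registered 2-D NumPy arrays for the two visible bands and the NIR band. The function names, the 0.2 threshold, and the deliberately crude correction (replacing masked NIR pixels with the median of the unmasked ones, so that creases stop looking like dark features) are hypothetical choices, not a validated design.

```python
import numpy as np


def crease_mask(band_left, band_right, threshold=0.2, eps=1e-9):
    """Flag pixels where the two oblique reflectance images disagree strongly.

    Flat surface: normalised difference near zero. Crease flanks: large
    positive or negative values, depending on which source they face.
    """
    left = np.asarray(band_left, dtype=float)
    right = np.asarray(band_right, dtype=float)
    ndiff = (left - right) / (left + right + eps)
    return np.abs(ndiff) > threshold


def suppress_creases(nir, mask):
    """Replace NIR pixels under the crease mask with the median of the rest.

    A crude stand-in for a proper correction: masked pixels no longer look
    like dark, shive-like features to a downstream detector.
    """
    nir = np.asarray(nir, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    out = nir.copy()
    if mask.any() and not mask.all():
        out[mask] = np.median(nir[~mask])
    return out
```

A real correction would more likely model the crease's contribution to transmitted NIR intensity than overwrite it, since a shive lying under a crease would be erased by this approach.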
Such an orientation would be well suited to developing a surface illumination system for crease mitigation as well.

7.2 Raman Hardware Integration

One limitation of integrating a Raman probe with a PulpEye analysis module, as described in Ch. 4, is its throughput. The PulpEye module is able to sample the product line automatically, and can accept manually introduced samples. But, as the module conducts a battery of tests on each pulp sample, its sampling cycle (several minutes) is long compared with simple Raman analysis (spectral integration in seconds). Thus, used as a standalone method, a Raman probe can collect information with a vastly higher throughput than can a PulpEye module, at the cost of hardware convenience.

A standalone Raman probe, without a pad-forming brightness chamber, becomes susceptible to variations in sample morphology. Therefore, it must be positioned at a point along the production line where the morphology is highly uniform. As with the NIR shive imaging, an ideal point would be after the dryers.

Another critical consideration is ensuring that the probe’s objective lens is held fixed above the product at its focal length. Fortunately, there are other analysis modules used on the post-dryer part of the product line that are ideal for incorporating a Raman probe. Chief among them is the basis weight meter, which is in widespread use thanks to the high importance of this parameter. Basis weight meters continuously measure the absorption of β⁻ particles through a product sheet, and use that information to calculate product weight per unit area; the choice of source radioisotope (e.g. ¹⁴⁷Pm, ⁸⁵Kr, ⁹⁰Sr) depends on the desired dynamic range of measurement.
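As a rough illustration of how a beta gauge turns transmitted counts into basis weight, a common idealization treats attenuation as approximately exponential in mass per unit area, I = I₀ exp(−μw), where I₀ is the open-beam count rate, w the basis weight (g/m²), and μ an effective mass attenuation coefficient (m²/g) set by the source's beta energy spectrum; a lower-energy emitter such as ¹⁴⁷Pm has a larger μ and so suits lighter grades. The sketch below calibrates μ from reference sheets of known basis weight and then inverts the relation. The function names are hypothetical, and real gauges will typically need more elaborate, empirically calibrated response curves.

```python
import math
from typing import Sequence


def fit_mass_attenuation(basis_weights: Sequence[float],
                         counts: Sequence[float],
                         open_counts: float) -> float:
    """Calibrate mu (m^2/g) from reference sheets of known basis weight (g/m^2).

    Fits ln(I0 / I) = mu * w by least squares through the origin.
    """
    logs = [math.log(open_counts / c) for c in counts]
    num = sum(w * y for w, y in zip(basis_weights, logs))
    den = sum(w * w for w in basis_weights)
    return num / den


def basis_weight(count: float, open_counts: float, mu: float) -> float:
    """Invert I = I0 * exp(-mu * w) to get the basis weight w (g/m^2)."""
    return math.log(open_counts / count) / mu
```

Forcing the fit through the origin encodes the physical constraint that a sheet of zero basis weight transmits the open-beam count rate.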
Two closely spaced parallel plates (with a gap as narrow as 10 mm) are rastered back and forth across the production line while continuously detecting transmitted particles, in effect tracing out a triangle-wave pattern across the production line.

One could envisage how such a system could easily be adapted to include a hardened Raman probe, fiber-optically coupled to a fixed laser source and spectrometer. Such a probe could integrate continuously during rastering, producing spatially averaged spectra. As with the PulpEye module, infrastructure would already be in place for data handling and storage. Data fusion could be used to incorporate environmental and process variables (sample temperature and moisture), as well as other process analyses (as in Tab. 5.1), to maximize measurement reliability and account for external variance.

This is a long-term goal, well suited to a full-scale mill trial; as previously mentioned, Raman spectroscopy must first be established as a parallel analysis technique in a pilot plant setting.

7.3 Calibration Transfer

As chemically specific as Raman spectra are, they also contain some characteristics specific to the spectrometer that recorded them. These include contributions from grating dispersion, alignment angle, and Charge-Coupled Device (CCD) fixed-pattern noise.

Partial Least-Squares Regression (PLS) models will discount this extra information, as it is invariant within a dataset, provided the dataset is all collected with the same spectrometer. This presents a serious issue when it comes to deploying a Raman probe in a process environment: new models must be built for every new probe installation, requiring that hundreds of samples be analyzed before any results can be gathered.
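The claim that an instrument-specific signature is invisible within one dataset but damaging across instruments can be checked numerically. The sketch below uses entirely synthetic data and makes a strong simplifying assumption: each spectrometer adds a fixed additive signature (s_A or s_B) to every spectrum, whereas real differences also include dispersion shifts and nonlinear response. Mean-centering cancels s_A when a linear model with regression vector b is built on instrument A, but spectra from instrument B are then biased by (s_B − s_A)·b. As one example of a transfer-function remedy, the sketch also applies direct standardization (DS), which estimates a matrix F from transfer standards measured on both instruments so that X_B F ≈ X_A; piecewise direct standardization (PDS), a windowed variant, is widely used in practice. The names `predict`, `direct_standardization`, and `demo` are hypothetical.

```python
import numpy as np


def predict(x, x_mean, y_mean, b):
    """Linear calibration (e.g. a PLS regression vector b) on mean-centred spectra."""
    return (np.atleast_2d(x) - x_mean) @ b + y_mean


def direct_standardization(standards_primary, standards_secondary):
    """DS transfer matrix F such that standards_secondary @ F ~= standards_primary.

    Rows are transfer standards measured on both instruments; pinv gives the
    minimum-norm solution when there are fewer standards than channels.
    """
    return np.linalg.pinv(standards_secondary) @ standards_primary


def demo(n_train=30, n_std=5, n_channels=50, seed=0):
    """Synthetic illustration only: every number here is randomly generated."""
    rng = np.random.default_rng(seed)
    b = rng.normal(size=n_channels)        # hypothetical regression vector
    sig_a = rng.normal(size=n_channels)    # instrument A's additive signature
    sig_b = rng.normal(size=n_channels)    # instrument B's additive signature

    chem_train = rng.normal(size=(n_train, n_channels))
    x_train = chem_train + sig_a           # all training spectra from instrument A
    y_train = chem_train @ b
    x_mean, y_mean = x_train.mean(axis=0), y_train.mean()

    # Within instrument A's dataset the signature cancels on mean-centring.
    within_error = np.abs(predict(x_train, x_mean, y_mean, b) - y_train).max()

    # Transfer standards measured on both instruments.
    chem_std = rng.normal(size=(n_std, n_channels))
    F = direct_standardization(chem_std + sig_a, chem_std + sig_b)

    # A new sample measured on instrument B (an average of two standards, so it
    # lies in the span DS was fitted on).
    chem_new = 0.5 * (chem_std[0] + chem_std[1])
    x_new_b = chem_new + sig_b
    y_true = chem_new @ b

    bias_raw = (predict(x_new_b, x_mean, y_mean, b) - y_true).item()
    bias_ds = (predict(x_new_b @ F, x_mean, y_mean, b) - y_true).item()
    expected_bias = float((sig_b - sig_a) @ b)
    return within_error, bias_raw, bias_ds, expected_bias
```

In this idealized case the new sample lies in the span of the transfer standards, so DS removes the bias to within rounding error; for real samples outside that span the correction is only approximate, which is one reason the choice and number of standards matter.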
This is highly impractical, and surely represents a barrier to adoption of this technique.

This is, however, a well-recognized problem with Near-Infrared Spectroscopy (NIRS), and a number of approaches have been developed to overcome it. These typically use a transfer function to map information from one spectrometer to another. Such transfer functions can be piecewise or continuous, and are applied to spectral data, regression coefficients, or even the predictions themselves. Another approach develops a standardized PLS model that works with any spectrometer. [7, 94–99]

These approaches have recently been applied to Raman spectroscopy as well, mostly to pharmaceutical samples. In these cases, well-characterized standards were used to assess the differences between spectrometers and develop the transfer functions. The functions typically correct for baseline and dispersion shifts, though these are often nonlinear across a spectrum. [100–108]

For now, calibration transfer methods remain primarily of academic interest, and no agreed-upon standard method exists. Itoh et al. recently illustrated the difficulty of standardizing simple spectra (polystyrene, cyclohexane, and benzonitrile) between 26 Raman spectrometers to within the standard tolerances defined by the pharmaceutical industry. [108] Though this is an active area of research, it remains to be seen how well calibration transfer might work when applied to more complex materials such as cellulose.

7.4 Facilitating Data Analysis

No matter the instrument or sensor, a process control system must have an intuitive user interface. We cannot assume that everyone tasked with operating the system will be well versed in the underlying physics and chemistry that it relies upon, nor in the mathematics and code. Short of a fully autonomous system (as yet, a step too far), the ideal user interface would be a push-button system that triggers the data collection and treatment steps, which then run in the background without further input.
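A push-button pipeline of this kind needs little more than a pair of paths from the user. The sketch below assumes, hypothetically, that each spectrum is stored as a headerless two-column CSV file (Raman shift, intensity) named after its sample ID, and that target values sit in a separate CSV file with `sample_id` and `target` columns; everything is loaded into an SQLite database using only the Python standard library. The file layout, table names, and function names are illustrative assumptions, not the format used by the scripts described in this thesis.

```python
import csv
import sqlite3
from pathlib import Path


def ingest(spectra_dir, targets_csv, db_path=":memory:"):
    """Load every *.csv spectrum in spectra_dir, plus a targets file, into SQLite."""
    con = sqlite3.connect(db_path)
    con.executescript(
        """
        CREATE TABLE IF NOT EXISTS spectrum (
            sample_id TEXT, raman_shift REAL, intensity REAL);
        CREATE TABLE IF NOT EXISTS target (
            sample_id TEXT PRIMARY KEY, target_value REAL);
        """
    )
    # One spectrum per file; the file name (without extension) is the sample ID.
    for path in sorted(Path(spectra_dir).glob("*.csv")):
        with path.open(newline="") as fh:
            rows = [(path.stem, float(s), float(i)) for s, i in csv.reader(fh)]
        con.executemany("INSERT INTO spectrum VALUES (?, ?, ?)", rows)
    # Target values, keyed by sample ID; re-running replaces older values.
    with Path(targets_csv).open(newline="") as fh:
        con.executemany(
            "INSERT OR REPLACE INTO target VALUES (?, ?)",
            [(r["sample_id"], float(r["target"])) for r in csv.DictReader(fh)],
        )
    con.commit()
    return con


def complete_samples(con):
    """Sample IDs that have both a spectrum and a target value."""
    return [
        row[0]
        for row in con.execute(
            "SELECT DISTINCT s.sample_id FROM spectrum s "
            "JOIN target t ON t.sample_id = s.sample_id ORDER BY s.sample_id"
        )
    ]
```

The targets file must live outside the spectra folder here, since every `*.csv` file in that folder is read as a spectrum; a written filename convention would make such assumptions explicit to operators.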
The Python and SQL script for processing Raman data (see Ch. 5.2.2) was designed with this in mind; specifically, the only user input it requires is the selection of a folder containing the spectral data files, and of a file containing the target data; everything else is done programmatically. Likewise, the Python and IDL script for processing NIR shive images only requires folder selection, and one could envisage developing an SQL database for its output. Further simplification of this process would require the creation of a dedicated control interface that would replace the Original Equipment Manufacturer (OEM) software for data collection (imagery or spectroscopic), and then call the appropriate Python script. This could easily be achieved with LabVIEW.¹

Until such time as that happens, though, a Standard Operating Procedure (SOP) must be developed in collaboration with pilot plant technicians for data collection with each system. This was addressed in Ch. 5.1; an SOP must not only cover best practices for data collection, but also more easily overlooked aspects such as filename conventions and logging procedures.

An SOP must also account for longer-term considerations, such as maintenance of the PLS models, as well as basic systems maintenance and troubleshooting steps should something go wrong. PLS models should be routinely updated with new data and revalidated to ensure that they continue to make accurate predictions. As part of a full-scale trial in a pilot plant setting, developing SOPs will facilitate the transfer of knowledge and experience from academia to industry.

7.5 Ongoing Modelling Refinements

In addition to maintaining PLS models as data acquisition progresses, there are a few ways in which our approach to modelling might be refined.

7.5.1 Extending Data Fusion

The data fusion steps detailed in Ch.
5.2.3, where we incorporate information from a PulpEye analysis unit into our spectroscopic models, only scratch the surface of the trove of information that PulpEye units generate. Given our understanding of how fiber morphology affects strength properties (recall the Page Equation, Eq. 1.1, and Ch. 1.5.3), it stands to reason that incorporating more detailed data about pulp fibers than are listed in Table 5.1 would improve prediction accuracy. PulpEye units provide detailed categorical breakdowns of fiber properties including length, width, and curl. With care taken to avoid overfitting, fusing these into a PLS prediction model would likely increase the model’s accuracy substantially.

¹ National Instruments Corp., Austin, TX, USA

7.5.2 Accounting for Refining Energy

One of the key goals of implementing a Raman analysis system is to obviate the need to refine pulp samples in a pilot plant. Since the in-process pulp will be refined during its transformation into final consumer products, these products’ strength properties will differ from those of the unrefined pulp feedstock. To account for this, pilot plants refine pulp samples to various energies and then test them, in addition to testing unrefined pulp samples. [19, 22, 82]

Prior research has demonstrated how the relationship between refining and strength properties can be investigated using spectroscopy and chemometrics. [2, 22] It should therefore follow that spectroscopy can be used to predict how refining affects these properties. Given the requisite data from refined pulp samples alongside the Raman spectra of their unrefined counterparts, this should be a straightforward task.

7.6 Unbleached Pulp: A Way Forward with Raman?

One final avenue of future work concerns unbleached pulp, which retains high concentrations of lignin. This presents a very serious problem for Raman analysis.
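Why a strong fluorescence background overwhelms a Raman signal, rather than merely adding an offset that can be subtracted, follows from counting statistics. Assuming shot-noise-limited detection (ignoring read noise and dark current), a band with S Raman counts on B background counts has SNR = S/√(S + B): the background's mean can be subtracted, but its Poisson noise cannot. For example, 100 Raman counts alone give an SNR of 10, whereas the same 100 counts on 9,900 background counts give an SNR of 1, and recovering the original SNR would need (S + B)/S = 100 times the integration time. The numbers and function names below are illustrative only.

```python
import math


def shot_noise_snr(raman_counts: float, background_counts: float) -> float:
    """Shot-noise-limited SNR of a Raman band sitting on a background.

    Subtracting the background removes its mean but not its Poisson noise,
    so the noise term is sqrt(raman + background).
    """
    return raman_counts / math.sqrt(raman_counts + background_counts)


def integration_time_factor(raman_counts: float, background_counts: float) -> float:
    """Factor by which integration time must grow to recover the background-free SNR.

    Counts scale linearly with time t, so SNR scales as sqrt(t); equating
    sqrt(k) * S / sqrt(S + B) with sqrt(S) gives k = (S + B) / S.
    """
    return (raman_counts + background_counts) / raman_counts
```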
Given lignin’s immensely complex polyphenolic structure, it autofluoresces to an extreme degree, totally overwhelming any spontaneous Raman signal. [18, 109, 110] Lignin spectra have been recorded with alternative Raman techniques, such as Surface-Enhanced Raman Spectroscopy (SERS), FT-Raman, or ultraviolet Resonance Raman spectroscopy, while Larsen and Barsberg approximated lignin by examining typical monophenolic end groups. [18, 62, 109–112] Other groups found that immersion in water or D₂O effectively quenched the fluorescence. [109, 110, 113, 114]

Researchers have successfully used dispersive NIRS to analyze and predict properties of unbleached pulps using multivariate analysis; Brink et al. did so using a PulpEye analysis module, similar to the work described in Ch. 4. [21, 38, 39, 41, 44] This represents an immediately applicable methodology for use in a pilot plant setting, provided enough samples can be collected. Despite a good deal of published research demonstrating the usefulness of this method, NIRS remains sensitive to the variable environmental conditions in a process environment. Thus, it remains unclear whether NIRS could easily be implemented in a mill environment. [44] Nonetheless, we can easily collect NIR spectra from unbleached samples in a pilot plant setting and apply our data treatment expertise to build PLS models as we have with Raman.

Of particular interest for a potential future process control application is the use of Raman spectroscopy operating in the ultraviolet range. As noted, most work done on lignin has used ultraviolet Resonance Raman spectroscopy. [109, 112, 115, 116] Resonance Raman exploits the similarity in energy between the excitation beam and that of an electronic transition in the target molecule; this resonance causes the Raman signal to be enhanced by several orders of magnitude.
However, this presents the limitation that the resonant wavelengths must be known beforehand, and moreover, that the instrument must be able to generate these wavelengths at will. [109, 115, 117] This requires more complex instrumentation than simple spontaneous Raman (recall Fig. 2.4).

For reasons previously discussed, high sensitivity to lignin is not necessarily a requirement for modelling strength properties; instead, what matters is sensitivity to hemicelluloses and carboxyl groups. So, we need not ensure that a resonance effect with lignin occurs in order to observe useful Raman spectra, so long as the fluorescence signal from lignin can be avoided or suppressed.

To this end, it is worth considering spontaneous ultraviolet Raman spectroscopy, which would keep the instrumentation as simple as with 785 nm Raman. Ultraviolet Raman would be advantageous here because of the gap between the fluorescence emission band and the Raman (Stokes-shifted) band: with deep-ultraviolet excitation, the Raman band falls at shorter wavelengths than the onset of fluorescence. This phenomenon has been exploited to analyze catalysts and to map chemical vapor-deposited (CVD) diamond samples. [117, 118]

We analyzed 20 unbleached kraft pulp samples provided by Canfor with an ultraviolet fluorescence spectrometer; a representative example is shown in Figure 7.1. The x-axis shows the emission wavelength, ranging from 200 to 600 nm; the excitation wavelength was 190 nm; the region below 350 nm is largely devoid of fluorescence peaks.

Figure 7.1: Ultraviolet fluorescence spectrum of an unbleached kraft pulp sample (intensity, arb. units, against wavelength, 200–600 nm). Excitation wavelength = 190 nm.

The Raman spectra collected in this work span the range 200–2000 cm⁻¹, which, with our standard excitation wavelength of 785 nm, equates to a range of 797.5–931 nm. However, at an excitation wavelength of 190 nm as in Fig. 7.1, this Raman shift range equates to 191–197.5 nm. This is below the detection window in Fig.
7.1, and therefore should be free of any fluorescence interference.

The main drawback with this approach is expense. Until recently, ultraviolet laser wavelengths required the use of ion lasers (typically Ar⁺), which, due to their size and complexity, would be highly impractical in any non-academic setting. [117, 118] Ultraviolet diode lasers have since been developed; these are ideal for industrial applications due to their small size and internal simplicity. However, they are limited to the near-ultraviolet range (300–400 nm). Mid-ultraviolet lasers, which would be of the most use in this application, are not yet widely available.

Another approach would be to implement a time-resolved spectroscopy system, exploiting the difference in timescales between Raman scattering, which is almost instantaneous, and fluorescence emission, which may be delayed by nanoseconds. [119] Such techniques are often used for depth profiling, and can operate at any excitation wavelength. [120, 121] Time-resolved Raman spectrometers have only recently become commercially available, and are still quite expensive. However, given time and further research, this technique, or spontaneous ultraviolet Raman spectroscopy, may become a viable way to analyze unbleached pulp samples.

7.7 Concluding Remarks

The progress made thus far, as presented herein, clearly demonstrates the utility of Raman spectroscopy and chemometrics as applied to process control in the pulp and paper industry. Starting with a collection of dry pulp sheets and a Raman spectrometer, we have advanced to a parallel implementation of our techniques in Canfor’s pilot plant, and refined our instrumental and data treatment approaches. With the assistance of a co-op student, Canfor will continue to gather Raman data and send it on to us, which will help us improve our PLS models.

This chapter presents some clear and immediate future steps for the continuation of the project, as well as a few avenues of academic interest.
In all, we areconfident that we will be able to successfully transition this project from academiato the industry, and implement spectroscopy and chemometrics as a process controlsystem in a pulp mill. Doing so represents a small step forward along the road to“Industry 4.0”.123Bibliography[1] A. Bain, “Property prediction with Raman spectroscopy in the pulp andpaper industry,” Master’s thesis, The University of British Columbia,Vancouver, June 2016. → pages v, 25, 35, 39, 40[2] N. Tavassoli, Data mining in the spectro-microscopic analysis of complexmaterial. PhD thesis, The University of British Columbia, Vancouver,2017. → pages v, 9, 12, 25, 120[3] A. Christy, A. Bain, and E. Grant, “Spectrochemical prediction of theviscosity of dissolving pulps,” tech. rep., The University of BritishColumbia, Vancouver, BC, Canada, 2016. → pages v[4] D. A. Skoog, F. J. Holler, and S. R. Crouch, Principles of InstrumentalAnalysis. Brooks/Cole, 6th ed., 2007. → pages xviii, 19, 20, 22, 24[5] N. Tavassoli, Z. Chen, A. Bain, L. Melo, D. Chen, and E. R. Grant,“Template-oriented genetic algorithm feature selection of analyte waveletsin the Raman spectrum of a complex mixture,” Analytical Chemistry,vol. 86, pp. 10591–10599, Nov. 2014. → pages xix, xx, 27, 28, 33, 36, 137[6] P. Watson and M. Bradley, “Canadian pulp fibre morphology: Superiorityand considerations for end use potential,” The Forestry Chronicle, vol. 85,pp. 401–408, June 2009. → pages xix, xxi, 7, 8[7] J. H. Kalivas, “Multivariate calibration, an overview,” Analytical Letters,vol. 38, pp. 2259–2279, Nov. 2005. → pages xx, 30, 31, 118[8] K. W. Corscadden, S. Jack, C. Ross, and R. J. Trepanier, “Accurate shiveclassification using image analysis,” Appita Journal: Journal of theTechnical Association of the Australian and New Zealand Pulp and PaperIndustry, vol. 61, pp. 56–59, Jan. 2008. → pages xx, 85, 87, 93, 94, 101124[9] H. Sixta, “Pulp properties and applications,” in Handbook of Pulp(H. Sixta, ed.), ch. 11, pp. 
1009–1067, Wiley-VCH Verlag GmbH & Co.,2006. → pages xxi, 4, 5, 6, 8, 10, 79[10] T. Trung and B. Leblon, “The role of sensors in the new forest productsindustry and forest bioeconomy,” Canadian Journal of Forest Research,vol. 41, pp. 2097–2099, Nov. 2011. → pages 3, 12, 17, 20[11] R. W. Kessler, “Perspectives in process analysis,” Journal ofChemometrics, vol. 27, pp. 369–378, Sept. 2013. → pages 3, 12, 13[12] G. Annergren, “Fundamentals of pulp fiber quality and paper properties,”in TAPPI Pulping Conference, pp. 29–40, 1999. → pages 4, 7, 8, 34[13] J. Sjöberg, Characterization of chemical pulp fiber surfaces with anemphasis on the hemicelluloses. PhD thesis, Kungliga Tekniska högskolan- Royal Institute of Technology, Stockholm, 2003. → pages 4, 5, 8[14] M. Åkerholm, Ultrastructural aspects of pulp fibers as studied by dynamicFT-IR spectroscopy. PhD thesis, Kungliga Tekniska högskolan - RoyalInstitute of Technology, Stockholm, 2003. → pages 4, 5, 20[15] Z. Chen, T. Q. Hu, H. F. Jang, and E. Grant, “Multivariate analysis ofhemicelluloses in bleached kraft pulp using infrared spectroscopy,” AppliedSpectroscopy, vol. 70, no. 12, pp. 1981–1993, 2016. → pages 4, 5, 20[16] J. Pere, E. Pääkkönen, Y. Ji, and E. Retulainen, “Influence of thehemicellulose content on the fiber properties, strength, and formability ofhandsheets,” BioResources, vol. 14, no. 1, pp. 251–263, 2018. → pages 5,8, 79[17] C. Heitner, “Light-induced yellowing of wood-containing papers,” in ACSSymposium Series, pp. 2–25, American Chemical Society (ACS), June1993. → pages 5, 88[18] Z. Sun, A. Ibrahim, P. B. Oldham, T. P. Schultz, and T. E. Conners, “Rapidlignin measurement in hardwood pulp samples by Near-Infrared Fouriertransform Raman spectroscopy,” Journal of Agricultural and FoodChemistry, vol. 45, pp. 3088–3091, aug 1997. → pages 5, 20, 21, 25, 120[19] R. Horn, “Morphology of pulp fiber from hardwoods and influence onpaper strength.,” tech. 
rep., United States Forest Service, Forest ProductsLaboratory, 1978. → pages 6, 79, 120125[20] R. A. Young, “Comparison of the properties of chemical cellulose pulps,”Cellulose, vol. 1, pp. 107–130, jun 1994. → pages 6, 76, 79[21] P. Fardim, M. M. C. Ferreira, and N. Durán, “Determination of mechanicaland optical properties of eucalyptus kraft pulp by NIR spectrometry andmultivariate calibration,” Journal of Wood Chemistry and Technology,vol. 25, pp. 267–279, Oct. 2005. → pages 6, 20, 21, 120[22] L. Wallbäcks, U. Edlund, T. Lindgren, and R. Agnerno, “Multivariatecharacterization of pulp,” Nordic Pulp & Paper Research Journal, vol. 10,pp. 88–93, May 1995. → pages 7, 9, 20, 120[23] P. Strunk, Å. Lindgren, B. Eliasson, and R. Agnemo, “Chemical changes ofcellulose pulps in the processing to viscose dope,” Cellulose Chemistry andTechnology, vol. 46, no. 9-10, pp. 559–569, 2012. → pages 7, 34[24] L. Lapierre, J. Bouchard, and R. Berry, “On the relationship between fibrelength, cellulose chain length and pulp viscosity of a softwood sulfitepulp,” Holzforschung, vol. 60, pp. 372–377, jul 2006. → pages 7, 34, 79[25] N. Wistara and R. A. Young, “Properties and treatments of pulps fromrecycled paper. Part I. Physical and chemical properties of pulps,”Cellulose, vol. 6, no. 4, pp. 291–324, 1999. → pages 8, 10, 79[26] S. L’Anson, A. Karademir, and W. Sampson, “Specific contact area and thetensile strength of paper,” Appita Journal: Journal of the TechnicalAssociation of the Australian and New Zealand Pulp and Paper Industry,vol. 59, no. 4, p. 297, 2006. → pages 9, 10[27] A. Bogomolov, “Multivariate process trajectories: capture, resolution andanalysis,” Chemometrics and Intelligent Laboratory Systems, vol. 108,pp. 49–63, aug 2011. → pages 12, 13[28] D. Chen, T. Trung, H.-F. Jang, D. Francis, and E. Grant, “High-throughputprediction of physical and mechanical properties of paper from Ramanchemometric analysis of pulp fibres,” Canadian Journal of ForestResearch, vol. 41, pp. 
2100–2113, Nov. 2011. → pages 12, 25, 27, 28, 79[29] R. W. Kessler, W. Kessler, and E. Zikulnig-Rusch, “A critical summary ofspectroscopic techniques and their robustness in industrial PATapplications,” Chemie Ingenieur Technik, vol. 88, pp. 710–721, apr 2016.→ pages 12, 17126[30] J. Zimmerman, S. Pizer, E. Staab, J. Perry, W. McCartney, and B. Brenton,“An evaluation of the effectiveness of adaptive histogram equalization forcontrast enhancement,” IEEE Transactions on Medical Imaging, vol. 7,pp. 304–312, Dec. 1988. → pages 15[31] A. M. Reza, “Realization of the contrast limited adaptive histogramequalization (CLAHE) for real-time image enhancement,” The Journal ofVLSI Signal Processing-Systems for Signal, Image, and Video Technology,vol. 38, pp. 35–44, Aug. 2004. → pages 15[32] J. B. Roerdink and A. Meijster, “The watershed transform: Definitions,algorithms and parallelization strategies,” Fundamenta Informaticae,vol. 41, no. 1,2, pp. 187–228, 2000. → pages 15[33] D. J. Robinson, N. J. Redding, and D. J. Crisp, “Implementation of a fastalgorithm for segmenting SAR imagery,” tech. rep., Defence Science andTechnology Organization (Ministry of Defence, Commonwealth ofAustralia), 2002. → pages 16[34] H. Antti, M. Sjöström, and L. Wallbäcks, “Multivariate calibration modelsusing NIR spectroscopy on pulp and paper industrial applications,” Journalof Chemometrics, vol. 10, pp. 591–603, sep 1996. → pages 20, 21[35] N. Durán and R. Angelo, “Infrared microspectroscopy in the pulp andpaper-making industry,” Applied Spectroscopy Reviews, vol. 33,pp. 219–236, Aug. 1998. → pages 20[36] J. B. Hauksson, G. Bergqvist, U. Bergsten, M. Sjöström, and U. Edlund,“Prediction of basic wood properties for Norway spruce. interpretation ofnear infrared spectroscopy data using partial least squares regression,”Wood Science and Technology, vol. 35, pp. 475–485, Dec. 2001. → pages20, 21[37] J. J. 
Workman, “Infrared and Raman spectroscopy in paper and pulpanalysis,” Applied Spectroscopy Reviews, vol. 36, pp. 139–168, June 2001.→ pages 20, 25[38] P. Fardim, M. M. C. Ferreira, and N. Durán, “Multivariate calibration forquantitative analysis of eucalypt kraft pulp by NIR spectrometry,” Journalof Wood Chemistry and Technology, vol. 22, pp. 67–81, June 2002. →pages 20, 21, 120127[39] H. Henriksen, T. Næs, V. Segtnan, and A. Aastveit, “Using near infraredspectroscopy for predicting process conditions. A laboratory study frompulp production,” Journal of Near Infrared Spectroscopy, vol. 13,pp. 265–276, oct 2005. → pages 20, 120[40] V. Hoang, N. Bhardwaj, and K. Nguyen, “A FTIR method for determiningthe content of hexeneuronic acid (hexA) and kappa number of a high-yieldkraft pulp,” Carbohydrate Polymers, vol. 61, pp. 5–9, July 2005. → pages20[41] A. Alves, A. Santos, D. da Silva Perez, J. Rodrigues, H. Pereira, R. Simões,and M. Schwanninger, “NIR PLSR model selection for kappa numberprediction of maritime pine kraft pulps,”Wood Science and Technology,vol. 41, pp. 491–499, mar 2007. → pages 20, 120[42] C. R. Mora and L. R. Schimleck, “On the selection of samples formultivariate regression analysis: application to near-infrared (NIR)calibration models for the prediction of pulp yield in Eucalyptus nitens ,”Canadian Journal of Forest Research, vol. 38, pp. 2626–2634, Oct. 2008.→ pages 20, 21[43] G. Downes, R. Meder, C. Hicks, and N. Ebdon, “Developing andevaluating a multisite and multispecies NIR calibration for the prediction ofkraft pulp yield in eucalypts,” Southern Forests: a Journal of ForestScience, vol. 71, pp. 155–164, June 2009. → pages 20, 21[44] M. Brink, C.-F. Mandenius, and A. Skoglund, “On-line predictions of theaspen fibre and birch bark content in unbleached hardwood pulp, using NIRspectroscopy and multivariate data analysis,” Chemometrics and IntelligentLaboratory Systems, vol. 103, pp. 53–58, aug 2010. → pages 20, 121[45] F. Xu, J. Yu, T. Tesso, F. 
Dowell, and D. Wang, “Qualitative andquantitative analysis of lignocellulosic biomass using infrared techniques:A mini-review,” Applied Energy, vol. 104, pp. 801–809, apr 2013. → pages20[46] N. Tavassoli, W. Tsai, P. Bicho, and E. R. Grant, “Multivariateclassification of pulp NIR spectra for end-product properties using discretewavelet transform with orthogonal signal correction,” Anal. Methods,vol. 6, pp. 8906–8914, July 2014. → pages 20, 21, 36, 76, 79[47] A. J. Hobro, J. Kuligowski, M. Döll, and B. Lendl, “Differentiation ofwalnut wood species and steam treatment using ATR-FTIR and partial least128squares discriminant analysis (PLS-DA),” Analytical and BioanalyticalChemistry, vol. 398, pp. 2713–2722, Sept. 2010. → pages 20[48] G. Zhou, G. Taylor, and A. Polle, “FTIR-ATR-based prediction andmodelling of lignin and energy contents reveals independent intra-specificvariation of these traits in bioenergy poplars,” Plant Methods, vol. 7, no. 1,p. 9, 2011. → pages 20[49] A. M. Smith, M. C. Mancini, and S. Nie, “Bioimaging: Second window forin vivo imaging,” Nature Nanotechnology, vol. 4, pp. 710–711, Nov. 2009.→ pages 21, 87, 89[50] K. Mangold, J. A. Shaw, and M. Vollmer, “The physics of near-infraredphotography,” European Journal of Physics, vol. 34, pp. S51–S71, Oct.2013. → pages 21, 87, 89[51] L. R. Schimleck, P. D. Kube, C. A. Raymond, A. J. Michell, and J. French,“Estimation of whole-tree kraft pulp yield of Eucalyptus nitens usingnear-infrared spectra collected from increment cores,” Canadian Journal ofForest Research, vol. 35, pp. 2797–2805, Dec. 2005. → pages 21[52] A. C. Albrecht and M. C. Hutley, “On the dependence of vibrationalRaman intensity on the wavelength of incident light,” The Journal ofChemical Physics, vol. 55, pp. 4438–4443, nov 1971. → pages 24[53] A. Rzhevskii, “Basic aspects of experimental design in Ramanmicroscopy,” Spectroscopy, 2016. → pages 24[54] Z. Wang, A. Pakoulev, Y. Pang, and D. D. 
Dlott, “Vibrational substructurein the OH stretching transition of water and HOD,” The Journal of PhysicalChemistry A, vol. 108, pp. 9054–9063, Oct. 2004. → pages 24[55] J. Wiley and R. Atalla, “Band assignments in the Raman spectra ofcelluloses,” Carbohydrate Research, vol. 160, pp. 113–129, Feb. 1987. →pages 24, 39, 51, 58[56] S. Fischer, K. Schenzel, K. Fischer, and W. Diepenbrock, “Applications ofFT Raman spectroscopy and micro spectroscopy characterizing celluloseand cellulosic biomaterials,”Macromolecular Symposia, vol. 223,pp. 41–56, Mar. 2005. → pages 24[57] K. Schenzel, S. Fischer, and E. Brendler, “New method for determining thedegree of cellulose I crystallinity by means of FT-Raman spectroscopy,”Cellulose, vol. 12, pp. 223–231, June 2005. → pages 24129[58] K. Schenzel, H. Almlöf, and U. Germgård, “Quantitative analysis of thetransformation process of cellulose I→ cellulose II using NIR FT Ramanspectroscopy and chemometric methods,” Cellulose, vol. 16, pp. 407–415,Feb. 2009. → pages 24[59] U. P. Agarwal, R. S. Reiner, and S. A. Ralph, “Cellulose I crystallinitydetermination using FT-Raman spectroscopy: univariate and multivariatemethods,” Cellulose, vol. 17, pp. 721–733, May 2010. → pages 24, 39[60] U. P. Agarwal, R. R. Reiner, and S. A. Ralph, “Estimation of cellulosecrystallinity of lignocelluloses using near-IR FT-Raman spectroscopy andcomparison of the raman and Segal-WAXS methods,” Journal ofAgricultural and Food Chemistry, vol. 61, pp. 103–113, Jan. 2013. →pages 24[61] J. Vester, C. Felby, O. F. Nielsen, and S. Barsberg, “Fourier transformRaman difference spectroscopy for detection of lignin oxidation productsin thermomechanical pulp,” Applied Spectroscopy, vol. 58, pp. 404–409,Apr. 2004. → pages 25[62] U. Agarwal, “Raman spectroscopic characterization of wood and pulpfibers,” in Characterization of Lignocellulose Materials (T. Hu, ed.), ch. 2,pp. 17–35, Blackwell Publishing, 2008. → pages 25, 120[63] U. Agarwal and L. 
Landucci, “FT-Raman investigation of bleaching ofspruce thermomechanical pulp,” Journal of pulp and paper science,vol. 30, pp. 269–274, Oct. 2004. → pages 25[64] T. Ona, T. Sonoda, K. Ito, M. Shibata, and T. Kato, “Rapid prediction ofnative wood pulp properties by Fourier transform Raman spectroscopy,”Journal of pulp and paper science, vol. 29, pp. 6–10, Jan. 2000. → pages25[65] T. Ona, T. Sonoda, J. Ohshima, S. Yokota, and N. Yoshizawa, “A rapidquantitative method to assess eucalyptus wood properties for kraft pulpproduction by FT-Raman spectroscopy,” Journal of pulp and paper science,vol. 26, pp. 43–47, Feb. 2003. → pages 25[66] S. Wold, K. Esbensen, and P. Geladi, “Principal component analysis,”Chemometrics and intelligent laboratory , vol. 2, pp. 37–52, Aug. 1987. →pages 26, 29, 30130[67] J. B. Cooper, “Chemometric analysis of Raman spectroscopic data forprocess control applications,” Chemometrics and Intelligent LaboratorySystems, vol. 46, pp. 231–247, Mar. 1999. → pages 26[68] C. Cui and T. Fearn, “Comparison of partial least squares regression, leastsquares support vector machines, and gaussian process regression for anear infrared calibration,” Journal of Near Infrared Spectroscopy, vol. 25,no. 1, pp. 5–14, 2017. → pages 26, 56[69] V. J. Barclay, R. F. Bonner, and I. P. Hamilton, “Application of wavelettransforms to experimental spectra: smoothing, denoising, and data setcompression,” Analytical Chemistry, vol. 69, pp. 78–90, Jan. 1997. →pages 27, 29[70] C. M. Galloway, E. C. L. Ru, and P. G. Etchegoin, “An iterative algorithmfor background removal in spectroscopy by wavelet transforms,” AppliedSpectroscopy, vol. 63, pp. 1370–1376, Dec. 2009. → pages 28[71] S. Wold, M. Sjöström, and L. Eriksson, “PLS-regression: A basic tool ofchemometrics,” Chemometrics and Intelligent Laboratory Systems, vol. 58,pp. 109–130, Oct. 2001. → pages 29, 30, 31, 33[72] S. 
de Jong, “SIMPLS: An alternative approach to partial least squares regression,” Chemometrics and Intelligent Laboratory Systems, vol. 18, pp. 251–263, Mar. 1993. → pages 30, 31, 37, 168
[73] J. H. Kalivas and J. Palmer, “Characterizing multivariate calibration tradeoffs (bias, variance, selectivity, and sensitivity) to select model tuning parameters,” Journal of Chemometrics, vol. 28, pp. 347–357, Oct. 2013. → pages 32, 38
[74] Q. Xu and Y. Liang, “Monte Carlo cross validation,” Chemometrics and Intelligent Laboratory Systems, vol. 56, pp. 1–11, Apr. 2001. → pages 33
[75] D. Chen, Z. Chen, and E. Grant, “Adaptive wavelet transform suppresses background and noise for quantitative analysis by Raman spectrometry,” Analytical and Bioanalytical Chemistry, vol. 400, pp. 625–634, Feb. 2011. → pages 33, 36
[76] H. Li, Q. Xu, and Y. Liang, “libpls: An integrated library for partial least squares regression and discriminant analysis,” PeerJ PrePrints, 2014. → pages 37, 168
[77] M. Kadlečíková, J. Breza, and M. Veselý, “Raman spectra of synthetic sapphire,” Microelectronics Journal, vol. 32, pp. 955–958, Dec. 2001. → pages 51, 58
[78] D. Vavlekas, L. Melo, M. Ansari, E. Grant, F. Fremy, J. L. McCoy, and S. G. Hatzikiriakos, “Role of PTFE paste fibrillation on Poisson’s ratio,” Polymer Testing, vol. 61, pp. 65–73, Aug. 2017. → pages 52
[79] M. Kemmler, E. Rodner, P. Rösch, J. Popp, and J. Denzler, “Automatic identification of novel bacteria using Raman spectroscopy and Gaussian processes,” Analytica Chimica Acta, vol. 794, pp. 29–37, Sept. 2013. → pages 56
[80] N. J. Everall, “Confocal Raman microscopy: Performance, pitfalls, and best practice,” Applied Spectroscopy, vol. 63, pp. 245–262, Sept. 2009. → pages 69
[81] J. Kiefer, J. Rueger, and F. M. Zehentbauer, “A priori performance estimation of spatial filtering in Raman backscattering experiments,” Spectroscopy, vol. 32, no. 5, pp. 56–61, 2017. → pages 69
[82] A. Marklund, M. Paper, J. B. Hauksson, U. Edlund, and M.
Sjöström, “Prediction of strength parameters for softwood kraft pulps,” Nordic Pulp & Paper Research Journal, vol. 14, pp. 140–148, May 1999. → pages 71, 120
[83] J. Ma and Y.-S. Li, “Fiber Raman background study and its application in setting up optical fiber Raman probes,” Applied Optics, vol. 35, p. 2527, May 1996. → pages 73
[84] M. G. Shim and B. C. Wilson, “Development of an in vivo Raman spectroscopic system for diagnostic applications,” Journal of Raman Spectroscopy, vol. 28, pp. 131–142, Feb. 1997. → pages 73
[85] M. G. Shim, B. C. Wilson, E. Marple, and M. Wach, “Study of fiber-optic probes for in vivo medical Raman spectroscopy,” Applied Spectroscopy, vol. 53, pp. 619–627, June 1999. → pages 73
[86] J. Hill, “Method and device for examining pulp for the presence of shives,” Jan. 1978. US Patent 4,066,492. → pages 85, 86, 87, 89
[87] H. Hughes, Jr. and R. A. Schilling, “Shive ratio analyzer,” Sept. 1980. US Patent 4,225,385. → pages 85, 86, 87
[88] R. W. Gooding, “The passage of fibres through slots in pulp screening,” Master’s thesis, The University of British Columbia, Vancouver, Sept. 1986. → pages 85
[89] R. W. Gooding and R. J. Kerekes, “Derivation of performance equations for solid-solid screens,” The Canadian Journal of Chemical Engineering, vol. 67, pp. 801–805, Oct. 1989. → pages 85
[90] J. Olson, N. Roberts, B. Allison, and R. Gooding, “Fibre length fractionation caused by pulp screening,” Journal of Pulp and Paper Science, vol. 24, no. 12, pp. 393–397, 1998. → pages 85
[91] G. Dorris, C. Caloca, S. Gendron, M. Ricard, N. Pagé, and D. Filion, “On-line macrocontaminant analyser and method,” Mar. 2016. US Patent 9,280,726. → pages 87
[92] J. A. Olmstead and D. G. Gray, “Fluorescence emission from mechanical pulp sheets,” Journal of Photochemistry and Photobiology A: Chemistry, vol. 73, pp. 59–65, June 1993. → pages 88
[93] G. W. Zack, W. E. Rogers, and S. A. Latt, “Automatic measurement of sister chromatid exchange frequency,” Journal of Histochemistry & Cytochemistry, vol.
25, pp. 741–753, July 1977. → pages 92, 188
[94] C. E. Anderson and J. H. Kalivas, “Fundamentals of calibration transfer through Procrustes analysis,” Applied Spectroscopy, vol. 53, pp. 1268–1276, Oct. 1999. → pages 118
[95] P. Tillmann, T.-C. Reinhardt, and C. Paul, “Networking of near infrared spectroscopy instruments for rapeseed analysis: A comparison of different procedures,” Journal of Near Infrared Spectroscopy, vol. 8, pp. 101–107, Mar. 2000. → pages 118
[96] T. Fearn, “Standardisation and calibration transfer for near infrared instruments: A review,” Journal of Near Infrared Spectroscopy, vol. 9, pp. 229–244, Oct. 2001. → pages 118
[97] R. N. Feudale, N. A. Woody, H. Tan, A. J. Myles, S. D. Brown, and J. Ferré, “Transfer of multivariate calibration models: a review,” Chemometrics and Intelligent Laboratory Systems, vol. 64, pp. 181–192, Nov. 2002. → pages 118
[98] D. V. Poerio and S. D. Brown, “Dual-Domain calibration transfer using orthogonal projection,” Applied Spectroscopy, vol. 72, pp. 378–391, Aug. 2017. → pages 118
[99] J. J. Workman, “A review of calibration transfer practices and instrument differences in spectroscopy,” Applied Spectroscopy, vol. 72, pp. 340–365, Oct. 2017. → pages 118
[100] M. Kompany-Zareh and F. van den Berg, “Multi-way based calibration transfer between two Raman spectrometers,” The Analyst, vol. 135, no. 6, p. 1382, 2010. → pages 118
[101] C. M. Gryniewicz-Ruzicka, S. Arzhantsev, L. N. Pelster, B. J. Westenberger, L. F. Buhse, and J. F. Kauffman, “Multivariate calibration and instrument standardization for the rapid detection of diethylene glycol in glycerin by Raman spectroscopy,” Applied Spectroscopy, vol. 65, pp. 334–341, Mar. 2011. → pages 118
[102] J. D. Rodriguez, B. J. Westenberger, L. F. Buhse, and J. F. Kauffman, “Standardization of Raman spectra for transfer of spectral libraries across different instruments,” The Analyst, vol. 136, no. 20, pp. 4232–4240, 2011. → pages 118
[103] H. Chen, Z.-M. Zhang, L. Miao, D.-J. Zhan, Y.-B. Zheng, Y. Liu, F.
Lu, and Y.-Z. Liang, “Automatic standardization method for Raman spectrometers with applications to pharmaceuticals,” Journal of Raman Spectroscopy, vol. 46, pp. 147–154, Oct. 2014. → pages 118
[104] D. Brouckaert, J.-S. Uyttersprot, W. Broeckx, and T. De Beer, “Calibration transfer of a Raman spectroscopic quantification method from at-line to in-line assessment of liquid detergent compositions,” Analytica Chimica Acta, vol. 971, pp. 14–25, June 2017. → pages 118
[105] S. Guo, R. Heinke, S. Stöckel, P. Rösch, T. Bocklitz, and J. Popp, “Towards an improvement of model transferability for Raman spectroscopy in biological applications,” Vibrational Spectroscopy, vol. 91, pp. 111–118, July 2017. → pages 118
[106] H. Chen, Y. Liu, F. Lu, Y. Cao, and Z.-M. Zhang, “Eliminating non-linear Raman shift displacement between spectrometers via moving window fast Fourier transform cross-correlation,” Frontiers in Chemistry, vol. 6, pp. 1–11, Oct. 2018. → pages 118
[107] D. Liu, B. Hennelly, L. O'Neill, and H. J. Byrne, “Investigation of wavenumber calibration for Raman spectroscopy using a polymer standard,” in Optical Sensing and Detection V, Proceedings of SPIE - The International Society for Optical Engineering (F. Berghmans and A. G. Mignani, eds.), vol. 10680 of SPIE Photonics Europe 2018, SPIE, May 2018. → pages 118
[108] N. Itoh, K. Shirono, and T. Fujimoto, “Baseline assessment for the consistency of Raman shifts acquired with 26 different Raman systems and necessity of a standardized calibration protocol,” Analytical Sciences, vol. 35, pp. 571–576, May 2019. → pages 118
[109] M. Halttunen, J. Vyörykkä, B. Hortling, T. Tamminen, D. Batchelder, A. Zimmermann, and T. Vuorinen, “Study of residual lignin in pulp by UV resonance Raman spectroscopy,” Holzforschung, vol. 55, pp. 631–638, Nov. 2001. → pages 120, 121
[110] S. Barsberg, P. Matousek, and M. Towrie, “Structural analysis of lignin by resonance Raman spectroscopy,” Macromolecular Bioscience, vol. 5, pp. 743–752, Aug. 2005. → pages 120
[111] U. P.
Agarwal and R. S. Reiner, “Near-IR surface-enhanced Raman spectrum of lignin,” Journal of Raman Spectroscopy, vol. 40, pp. 1527–1534, Nov. 2009. → pages 120
[112] K. L. Larsen and S. Barsberg, “Theoretical and Raman spectroscopic studies of phenolic lignin model monomers,” The Journal of Physical Chemistry B, vol. 114, pp. 8009–8021, June 2010. → pages 120, 121
[113] R. H. Atalla and U. P. Agarwal, “Recording Raman spectra from plant cell walls,” Journal of Raman Spectroscopy, vol. 17, pp. 229–231, Apr. 1986. → pages 120
[114] J. S. Bond and R. H. Atalla, “A Raman microprobe investigation of the molecular architecture of loblolly pine tracheids,” in 10th International Symposium on Wood and Pulping Chemistry, vol. 1, pp. 96–101, TAPPI Press, 1999. → pages 120
[115] A.-M. Saariaho, A.-S. Jääskeläinen, M. Nuopponen, and T. Vuorinen, “Ultra violet resonance Raman spectroscopy in lignin analysis: Determination of characteristic vibrations of p-hydroxyphenyl, guaiacyl, and syringyl lignin structures,” Applied Spectroscopy, vol. 57, pp. 58–66, Jan. 2003. → pages 121
[116] A.-M. Saariaho, D. S. Argyropoulos, A.-S. Jääskeläinen, and T. Vuorinen, “Development of the partial least squares models for the interpretation of the UV resonance Raman spectra of lignin model compounds,” Vibrational Spectroscopy, vol. 37, pp. 111–121, Jan. 2005. → pages 121
[117] H. S. Sands, F. Demangeot, E. Bonera, S. Webster, R. Bennett, I. P. Hayward, F. Marchi, D. A. Smith, and D. N. Batchelder, “Development of a combined confocal and scanning near-field Raman microscope for deep UV laser excitation,” Journal of Raman Spectroscopy, vol. 33, no. 9, pp. 730–739, 2002. → pages 121, 122
[118] P. C. Stair and C. Li, “Ultraviolet Raman spectroscopy of catalysts and other solids,” Journal of Vacuum Science & Technology A: Vacuum, Surfaces, and Films, vol. 15, pp. 1679–1684, May 1997. → pages 121, 122
[119] N. Everall, T. Hahn, P. Matousek, A. W. Parker, and M.
Towrie,“Picosecond time-resolved Raman spectroscopy of solids: Capabilities andlimitations for fluorescence rejection and the influence of diffusereflectance,” Applied Spectroscopy, vol. 55, pp. 1701–1708, dec 2001. →pages 122[120] B. Cletus, W. Olds, E. L. Izake, S. Sundarajoo, P. M. Fredericks, andE. Jaatinen, “Combined time- and space-resolved Raman spectrometer forthe non-invasive depth profiling of chemical hazards,” Analytical andBioanalytical Chemistry, vol. 403, pp. 255–263, feb 2012. → pages 122[121] S. K. V. Sekar, S. Mosca, S. Tannert, G. Valentini, F. Martelli, T. Binzoni,Y. Prokazov, E. Turbin, W. Zuschratter, R. Erdmann, and A. Pifferi, “Timedomain diffuse Raman spectrometer based on a TCSPC camera for thedepth analysis of diffusive media,” Optics Letters, vol. 43, p. 2134, apr2018. → pages 122136Appendix AScripts and FunctionsThis appendix contains the various scripts written by the author and employedduring the course of the research presented herein. These are written in MATLAB,Python, Interactive Data Language (IDL), and Structured Query Language (SQL).A.1 Raman Data Processing - MATLAB: TOGA, PLS, andGPRThe Template-Oriented Genetic Algorithm (TOGA) function, toga_mc_ac - calledon lines 168-170, was written by Daniel Da Chen, [5] and modified by the author.A.1.1 Main Processing Script1 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%2 % Processing script for pulp data3 % Ashton Christy4 % 20 June 20165 % Edited: 5 Sep 20176 % Version 2.37 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%8 % To input data correctly:9 % Set a working folder, and store all spectra (x) in a10 % \data\ subfolder.11 % Filenames should be formatted as follows:13712 % DR00-000-000-B0-S0.xyz13 % where:14 % DR00-000 is the sample number15 % (eg. DR15-089; always use leading 0)16 % -000 is the refining energy in kW (eg. 
0, 50, 100)17 % -B0 is the batch number18 % -S0 is the suspension number19 % .xyz is the file extension (MAT, CSV, XLS, or XML)20 % Batch and suspension numbers must be 1-3,21 % i.e. 9 spectra per sample.22 % Store target data (y) in a .MAT file, with one vector23 % named "target" containing the data, and a second named24 % "target_name" containing the corresponding pulp sample25 % names (DR00-000-000; eg. DR15-089-50).26 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%27 % .MAT files produced by the INNO software28 % (RamanIntnalEngFX3.1.2; CTRL+E) contain a spectral matrix29 % named "data", a filename string named "name", and a matrix30 % with wavenumber and pixel number named "waveinfo". The31 % "data" vector may contain one or several spectra (one per32 % row).33 % .CSV files produced by Wasatch's Dash software (CTRL+S or34 % CTRL+7) may contain a header consisting of 2 rows and 1735 % columns. loaddata.m will remove this header, if it exists.36 % The .CSV file may contain one or several spectra (one per37 % row).38 % .XLS files produced by Wasatch's Dash software (CTRL+X)39 % contain four sheets: Summary, Pixel, Wavelength, and40 % Wavenumber. loaddata.m reads spectral data from the Pixel41 % sheet. The header and first column listing pixel numbers42 % are removed. The .XLS file may contain one or multiple43 % spectra (one per column).44 % .XML files produced by the INNO software (CTRL+S) are read45 % using xml2spectra.m. 
The INNO software cannot store more46 % than one spectrum per XML file; spectral library files47 % (.LIB; CTRL+S) storing multiple spectra cannot be read by48 % MATLAB.49 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%50 clear; close all; clc;51 %%%%%%%% SET BASE WORKING FOLDER; change as necessary %%%%%%%%%13852 cd Z:\Ashton\PulpEye_Data\53 %%%%%%%%%% CHOOSE TARGET FILE; c2hange as necessary %%%%%%%%%%%54 targetfile = 'Z:\Ashton\PulpEye_Data\viscosity_data.mat';55 %%%%%%%% LOAD WAVENUMBER INFORMATION FOR SPECTROMETER %%%%%%%%%56 load 'Z:\Ashton\PulpEye_Data\wavenum_red_calib.mat'57 %%%%%%%%%%%% LOAD DATA FOLDER; change as necessary %%%%%%%%%%%%58 cd data_viscpads_final59 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%60 % Set data folder in memory and return to base folder61 datafolder = pwd62 cd 'Z:\Ashton\PulpEye_Data'63 % Choose filetype64 print = 'Enter filetype (1 for MAT, 2 for CSV, 3 for XLS, 4 ...for XML): ';65 variables.filetype = input(print);66 % Test data folder67 try68 [files] = loaddata(datafolder, targetfile, ...variables.filetype);69 if strcmp(files,'No files found.')70 return71 end72 catch; return; end73 % Average replicate spectra74 print = '\nAverage across batches? (0 for no, 1 for yes): ';75 variables.avgbat = input(print);76 if variables.avgbat == 077 print = '\nAverage across suspensions? (0 for no, 1 for ...yes): ';78 variables.avgsusp = input(print);79 else80 variables.avgsusp = 1;81 end82 % Choose data subset83 print = '\nUse data subset? 
(0 for no, 1 for yes): ';84 variables.datasub = input(print);85 % Load data86 [files, sampleinfo, norm_data, target_sorted, ...files_not_loaded, raw_data, target_name_sorted] = ...loaddata(datafolder, targetfile, variables.filetype, ...139variables.avgsusp, variables.avgbat, variables.datasub);87 fprintf('\nData stored in memory.\n');88 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%89 %% Discrete Wavelet Transform %%90 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%91 fprintf('\nStarting filtering...\n');92 % Initialize DWT Parameters93 variables.scale = 7;94 variables.wfilter = 'sym5';95 variables.bg_iterations = 10;96 variables.hfreq_cuts = 2;97 variables.lfreq_cuts = 0;98 variables.bg_remove = 0;99 variables.indval = 1;100 filt_data = zeros(size(norm_data));101 % DWT Background Subtraction102 if variables.bg_remove == 1103 [bg_data, bg_sub_data] = dwt_bg_remove(norm_data, ...variables.scale, variables.wfilter, ...variables.bg_iterations);104 fprintf('\nBackground subtraction complete.\n');105 end106 % DWT filtering107 for ii = 1:size(norm_data, 1)108 % Perform filtering109 if variables.bg_remove == 1110 [filt_data_1d, wcoefs_1d, WL, cuts] = ...dwt_filter_lmelo(bg_sub_data(ii,:), ...variables.scale, variables.wfilter, ...variables.hfreq_cuts, variables.lfreq_cuts, 0);111 noise_1d = waverec(cuts, WL, variables.wfilter);112 else113 [filt_data_1d, wcoefs_1d, WL] = ...dwt_filter_lmelo(norm_data(ii,:), variables.scale, ...variables.wfilter, variables.hfreq_cuts, ...variables.lfreq_cuts, 1);114 end115 % Store filtered data and wavelet variables.use_coefficients116 noise(ii,:) = noise_1d;117 filt_data(ii,:) = filt_data_1d;140118 wcoefs(ii,:) = wcoefs_1d;119 end120 % Plot DWT filtered spectra121 figure; plot(wavenum, filt_data')122 hold on; plot(wavenum, norm_data')123 title('Filtered Data')124 xlabel('Raman Shift (cm^{-1})')125 ylabel('Norm. 
Counts')126 if variables.indval == 1 && iscell(files_not_loaded)127 fprintf('\nLoading independent validation set...\n');128 % Load independent validation set (unknowns)129 [indval_norm, indval_filt, indval_wcoef, indval_WL, ...indval_files] = loadindval(files_not_loaded, ...variables.filetype, datafolder, variables.avgsusp, ...variables.avgbat, variables.scale, variables.wfilter, ...variables.bg_iterations, variables.hfreq_cuts, ...variables.lfreq_cuts, variables.bg_remove);130 end131 fprintf('\nDWT filtering complete.\n');132 %%%%%%%%%%%%%%%%133 %% TOGA model %%134 %%%%%%%%%%%%%%%%135 fprintf('\nInitializing TOGA...\n');136 % Initialize TOGA template parameters137 variables.toga_pls_factors = 8;138 variables.mcuve_runs = 400;139 variables.mcuve_noise = 0;140 train_points = floor(0.65 * length(target_sorted));141 % Generate possibility template (stability)142 [stability] = mcuve(wcoefs, target_sorted, ...variables.toga_plsvariables.pls_factors, ...variables.mc-uve_runs, variables.mc-uve_noise, ...train_points);143 clc; fprintf('\nTOGA initialized.\n');144 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%145 % Initialize TOGA variables %146 % WARNING - This step takes a %147 % substantial amount of time. 
%148 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%149 fprintf('\nBuilding model...\n');141150 % Initialize TOGA variables151 toga_data.cal = wcoefs; % Base data152 toga_data.caltar = target_sorted; % Target data153 toga_param.factor = variables.toga_pls_factors; % PLS ...factors for TOGA model154 toga_param.numVariables = 150; % Number of variables to ...populate155 toga_param.numSelectedVars = 15; % Number of features to select156 toga_param.pretreatment = 1; % Pretreatment: 1 = raw, 2 = ...center, 3 = SNV (normalized), 4 = all157 variables.popsize = 50; % Population size158 variables.stallgenlimit = 30; % Number of generations ...before stall159 variables.stalltimelimit = 15000; % Max time per stall160 variables.timelimit = 16000; % Time limit per generation161 variables.generations = 100; % Number of generations to run162 variables.toga_iterations = 10; % Number of TOGA iterations163 feature_index = zeros(variables.bg_iterations, ...toga_param.numSelectedVars); % Preallocate output164 parfor ii = 1:variables.toga_iterations % Loop TOGA to ...choose top wavelet representation of 10 wavelets total165 % Calculate model166 [toga_index,¬] = toga_mc_ac(toga_data, toga_param, ...stability, variables.popsize, ...variables.stallgenlimit, variables.stalltimelimit, ...variables.timelimit, variables.generations);167 % Build feature matrix168 feature_index(ii, :) = toga_index;169 % Output: row = TOGA iteration, col = generation, val = ...feature no.170 fprintf('\nIteration %g of %g complete.\n', ii, ...variables.toga_iterations);171 end172 clc; fprintf('\nTOGA model built.\n')173 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%174 % Rank variables and choose top for wavelet reconstruction %175 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%176 fprintf('\nRanking and selecting features...\n');177 % Build list of features178 for ii = 1:length(wcoefs)142179 feature_list(ii) = length(find(feature_index == ii));180 % Output: col = feature no., val = number of 
occurrences181 end182 % Rank features based on number of occurrences in list183 [¬,ranked_index] = sort(-abs(feature_list));184 % Get top ten features185 feature_top = ranked_index(1:15);186 fprintf('\nFeatures selected. Reconstructing data...\n');187 % Build new wavelet coefficient matrix using only top 10 ...features188 newwc = zeros(size(wcoefs));189 newwc(:,feature_top) = wcoefs(:,feature_top);190 % Reconstruct wavelet spectrum using only top 10 variables191 [wt_recon] = dwt_reconstructed(newwc, WL, ...variables.wfilter, variables.scale);192 figure; plot(wavenum, wt_recon')193 title('Feature-Selected Data')194 xlabel('Raman Shift (cm^{-1})')195 % Reconstruct independent validation set196 if variables.indval == 1197 indval_newwc = zeros(size(indval_wcoef));198 indval_newwc(:,feature_top) = indval_wcoef(:,feature_top);199 [indval_recon] = dwt_reconstructed(indval_newwc, ...indval_WL, variables.wfilter, variables.scale);200 end201 fprintf('\nData reconstructed and plotted.\n');202 %%%%%%%%%%%%%%%%203 %% PLS models %%204 %%%%%%%%%%%%%%%%205 close all;206 % Initialize Variables207 variables.pls_factors = 0; % Number of LVs to use (0 = use ...optimized)208 variables.optimizeplot = 1; % Show plots of PLS factor ...optimization209 variables.dataset = 2; % 1 = TOGA, 2 = DWT, 3 = Raw data210 variables.use_coefficients = 0; % 0: Use reconstructions, ...1: use coefficients211 variables.valsize = 0.25; % Size of validation set, less ...than 0.5143212 variables.valtype = 5; % 1 = Random, 2 = Center, 3 = Low, 4 ...= High, 5 = Every Fourth Sample213 variables.indval = 0; % 0: Suppress independent validation ...calculations214 variables.pls_indvalreps = 10; % Number of times to repeat ...PLS model for variables.indval (>0)215 % Choose data216 switch variables.dataset217 case 1218 % Initialize validation and calibration sets - TOGA219 if variables.use_coefficients == 1220 pls_data = newwc;221 else222 pls_data = wt_recon;223 end224 case 2225 % Initialize validation and 
calibration sets - DWT226 if variables.use_coefficients == 1227 pls_data = wcoefs;228 else229 pls_data = filt_data;230 end231 case 3232 % Initialize validation and calibration sets - RAW233 pls_data = norm_data;234 end235 fprintf('\nBuilding PLSSIM models...\n');236 if variables.indval == 1 % Model independent validation set237 % Initialize variables238 parfor mm = 1:variables.pls_indvalreps239 % Calculate SIM model240 [B_multi(:,mm), cal_i, val_i, caltar_i, valtar_i, ...yp_cal_sim_i, yp_val_sim_i, rmsec_sim_i, ...rmsep_sim_i] = pls_ac_sim(pls_data, target_sorted, ...variables.valsize, variables.valtype, ...variables.pls_factors, variables.optimizeplot);241 % Calculate standard error242 err_multi(mm,:) = ...(rmsec_sim_i*ones(size(indval_filt,1),1)) / 2;243 % Make prediction144244 switch variables.dataset245 case 1 % Feature-selected data246 if variables.use_coefficients == 1247 yp_ind_multi(:,mm) = variables.indval_newwc * ...B_multi(:,mm);248 else249 yp_ind_multi(:,mm) = variables.indval_recon * ...B_multi(:,mm);250 end251 case 2 % DWT filtered data252 if variables.use_coefficients == 1253 yp_ind_multi(:,mm) = indval_wcoef * B_multi(:,mm);254 else255 yp_ind_multi(:,mm) = indval_filt * B_multi(:,mm)256 end257 case 3 % Raw data258 yp_ind_multi(:,mm) = indval_norm * B_multi(:,mm);259 end260 % Predict independent validation set using SIM model261 [indprediction_multi{mm}] = ...pls_ind_predict(indval_files, yp_ind_multi(:,mm), ...variables.avgsusp, variables.avgbat);262 % Apply trendlines and labels263 fit_sim_i = polyfit(caltar_i,yp_cal_sim_i,1);264 % Store model for averaging265 cal_multi(:,:,mm) = cal_i; val_multi(:,:,mm) = val_i;266 caltar_multi(mm,:) = caltar_i; valtar_multi(mm,:) = ...valtar_i;267 yp_cal_multi(mm,:) = yp_cal_sim_i; yp_val_multi(mm,:) = ...yp_val_sim_i;268 rmsec_multi(mm) = rmsec_sim_i; rmsep_multi(mm) = ...rmsep_sim_i;269 fit_multi(mm,:) = fit_sim_i;270 fprintf('Repetition %g complete.\n',mm);271 end272 else % No independent validation set273 % 
Calculate SIM model274 [B, cal, val, caltar, valtar, yp_cal, yp_val, rmsec, ...rmsep, R2, MSE, stats] = pls_ac_sim(pls_data, ...target_sorted, variables.valsize, variables.valtype, ...145variables.pls_factors, variables.optimizeplot);275 % Apply trendlines and labels276 fit_pls = polyfit(caltar, yp_cal,1);277 fit_pls_inv = [1/fit_pls(1)-fit_pls(2)/fit_pls(1)];278 end279 fprintf('\nPLSSIM models built.\n');280 % Average repetitions281 if variables.indval == 1282 if variables.pls_indvalreps > 1283 B = mean(B_multi,2);284 caltar = caltar_multi(end,:)'; valtar = ...valtar_multi(end,:)';285 yp_cal = cal_multi(:,:,end) * B;286 yp_val = val_multi(:,:,end) * B;287 rmsec = mean(rmsec_multi);288 rmsep = mean(rmsep_multi);289 fit_pls = mean(fit_multi);290 err = mean(err_multi);291 else292 B = B_multi;293 caltar = caltar_multi; valtar = valtar_multi;294 yp_cal = yp_cal_multi; yp_val = yp_val_multi;295 rmsec = rmsec_multi; rmsep = rmsep_multi;296 fit_pls = fit_multi; err = err_multi;297 end298 fit_pls_inv = [1/fit_pls(1)-fit_pls(2)/fit_pls(1)];299 R2cum = cumsum(R2,2);300 end301 % Plot averaged model302 figure; plot(yp_cal,caltar,'r*'); hold on;303 plot(yp_val,valtar,'b*');304 fitline_sim = refline(fit_pls_inv);305 goal = refline(1); goal.Color = 'k';306 title('PLSSIM Model'); ylabel('Measured'); xlabel('Predicted');307 xaxis_min = min(min(yp_cal),min(yp_val))-0.5;308 xaxis_max = max(max(yp_cal),max(yp_val))+0.5;309 yaxis_min = min(target_sorted)-0.5;310 yaxis_max = max(target_sorted)+0.5;311 axis([xaxis_min xaxis_max yaxis_min yaxis_max]);312 % Average and plot independent validation set146313 if variables.indval == 1314 err_sim = mean(err_multi);315 yp_ind_sim = mean(yp_ind_multi,2);316 for jj = 1:length(indprediction_multi{1})317 for ii = 1:variables.pls_indvalreps318 indpredlist(ii) = indprediction_multi{ii}(jj).Prediction;319 end320 samplename = strsplit(indprediction_multi{ii}(jj).File, ...'-B');321 ind_predictions(jj).Sample = samplename(1);322 
ind_predictions(jj).Prediction = mean(indpredlist);323 end324 clear ii indpredlist samplename325 hold on;326 herrorbar(yp_ind_sim(:),yp_ind_sim(:),err(:),'bd')327 end328 % Store predictions329 [files(:).prediction] = deal(0);330 [files(:).err_absolute] = deal(0);331 [files(:).err_relative] = deal(0);332 [sampleinfo(:).Err_Absolute] = deal(0);333 [sampleinfo(:).Err_Relative] = deal(0);334 % Store predictions in files struct335 [y_pls, ysortindex] = sort(vertcat(caltar, valtar));336 yp_sim = vertcat(yp_cal,yp_val);337 for ii = 1:length(yp_sim)338 ypred_pls(ii) = yp_sim(ysortindex(ii));339 files(ii).prediction = yp_sim(ysortindex(ii));340 files(ii).err_absolute = abs(y_pls(ii)-ypred_pls(ii));341 files(ii).err_relative = ...cellstr(num2str((files(ii).err_absolute / ...ypred_pls(ii))*100,'%0.2f%%'));342 end343 % Store predictions in sample info struct344 kk = 1;345 for ii = 1:length(yp_sim)346 if rem(ii-1, round(length(files)/length(sampleinfo))) == 0347 try348 sampleinfo(kk).Prediction = mean(ypred_pls(ii: ...ii+round(length(files)/length(sampleinfo))));147349 catch350 sampleinfo(kk).Prediction = ...mean(ypred_pls(ii:length(files)));351 end352 kk = kk + 1;353 end354 end355 for ii = 1:length(sampleinfo)356 sampleinfo(ii).Err_Absolute = ...abs(sampleinfo(ii).Prediction-sampleinfo(ii).Target);357 sampleinfo(ii).Err_Relative = ...cellstr(num2str((sampleinfo(ii).Err_Absolute / ...sampleinfo(ii).Prediction)*100,'%0.2f%%'));358 end359 % Print output360 if variables.indval == 1 && variables.pls_indvalreps > 1361 fprintf('\nPLSSIM (Average of %g Repetitions):\ny = %.3gx ...+ %.5g\nRMSEC: %g\nRMSEP: %g\nPLS Models ...complete.\n', variables.pls_indvalreps, ...fit_pls_inv(1), fit_pls_inv(2), rmsec, rmsep);362 else363 fprintf('\nPLSSIM:\ny = %.3gx + %.5g\nRMSEC: %g\nRMSEP: ...%g\nPLS Models complete.\n', fit_pls_inv(1), ...fit_pls_inv(2), rmsec, rmsep);364 end365 %%%%%%%%%%%%%%%366 %% GPR model %%367 %%%%%%%%%%%%%%%368 close all;369 if not(exist('variables.dataset','var'))370 
pls_data = filt_data;371 end;372 % Calculate model373 gp_mdl = fitrgp(pls_data, target_sorted, 'Basis','linear', ...'FitMethod','fic', 'PredictMethod','exact', 'Verbose',0);374 % Cross-validate model375 cvgp_mdl = crossval(gp_mdl, 'kfold',5);376 % Calculate predictions377 [ypred_gp,¬,yci_gp] = resubPredict(gp_mdl);378 yp_cvgp = kfoldPredict(cgvp_mdl);148379 if variables.indval == 1380 [ypred_ind,ysd_ind] = predict(gp_mdl, indval_filt);381 end382 % Calculate errors383 loss = resubLoss(gp_mdl);384 losscv = kfoldLoss(cvgp_mdl, 'mode','individual');385 loores = postFitStatistics(gp_mdl); % Leave-one-out residuals386 rmse = sqrt(sum(loores.^2) / length(target_sorted));387 % Plot model388 figure; plot(ypred_gp, target_sorted, 'r.'); hold on;389 plot(yp_cvgp, target_sorted, 'k.');390 fit_gp = polyfit(ypred_gp, target_sorted, 1);391 fitline_gp = refline(fit_gp);392 goal = refline(1); goal.Color = 'k';393 if variables.indval == 1394 plot(ypred_ind, ypred_ind, 'bo');395 herrorbar(ypred_ind, ypred_ind, ...(rmse*ones(size(ypred_ind))/2), 'bd');396 end397 title('GP Model'); ylabel('Measured'); xlabel('Predicted');398 xaxis_min = min(yp_cvgp)-0.5; xaxis_max = max(yp_cvgp)+0.5;399 yaxis_min = min(target_sorted)-0.5;400 yaxis_max = max(target_sorted)+0.5;401 axis([xaxis_min xaxis_max yaxis_min yaxis_max]);402 hold off;403 % Print output404 fprintf('\n\nGP:\ny = %.3gx + %.5g\nRMSE: %g\nGP Model ...complete.\n', fit_gp(1), fit_gp(2), rmse);405 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%406 %% Prediction and error plots %%407 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%408 % Initialize Variables409 variables.predplots = 1; % Show prediction plots410 variables.errplots = 1; % Show error plots411 variables.perspect = 0; % Show PLS plots with all spectra412 variables.persample = 1; % Show PLS plots for each sample413 % PLS Figure (All Spectra)414 if exist('yp_cal', 'var') && variables.perspect == 1415 % Prediction plot416 if variables.predplots == 1149417 figure; plot(y_pls,'r.'); hold on;418 
plot(ypred_pls,'b.','MarkerSize',5);419 legend('True response','PLS ...predictions','Location','Best');420 title('PLS Predictions (All Spectra)');421 ylabel('Target value'); xlabel('Sample No.');422 axis([1 length(y_pls) min(horzcat(ypred_pls,y_pls))-0.5 ...max(horzcat(ypred_pls,y_pls))+0.5])423 hold off;424 end425 % Error plot426 if variables.errplots == 1427 y_pcterr = vertcat(files.err_absolute) ./ ...vertcat(files.prediction)*100;428 figure;429 plot(y_pcterr,'k.'); hold on;430 polyfn = ...polyval(polyfit((1:length(y_pls))',y_pcterr,2), ...1:length(y_pls),'r');431 plot(1:length(y_pls), polyfn); ax1 = gca;432 ax1.XAxisLocation = 'top'; ax1.YAxisLocation = 'left';433 axis([0 length(y_pls) 0 max(y_pcterr)+0.5])434 title('PLS Error Plot (All Spectra)')435 ax2 = axes('Color', 'none');436 axis([min(target_sorted) max(target_sorted) 0 1])437 ax2.Color = 'none'; ax2.YTick = [0 1];438 ax2.XAxisLocation = 'bottom'; ax2.YAxisLocation = 'right';439 ax2.YTickLabel = ''; ax2.YColor = get(gca,'Color');440 xlabel('Target value'); set(gcf,'CurrentAxes',ax1)441 ylabel('Relative error (%)'); xlabel('Sample No.');442 hold off;443 end444 end445 % PLS Figure (per sample)446 if exist('sampleinfo', 'var') && variables.persample == 1447 % Prediction plot448 if variables.predplots == 1449 figure; plot(vertcat(sampleinfo.Target),'r*'); hold on;450 plot(vertcat(sampleinfo.Prediction),'b*');150451 legend('True response','PLS ...predictions','Location','Best');452 title('PLS MatLab Predictions (Free Space)');453 ylabel('Target value'); xlabel('Sample No.');454 hold off;455 end456 % Error plot457 if variables.errplots == 1458 numsamples = length(sampleinfo);459 y_plserr_pers = vertcat(sampleinfo.Err_Absolute) ./ ...vertcat(sampleinfo.Prediction)*100;460 figure;461 plot(y_plserr_pers, 'k.'); hold on;462 polyfn_s = ...polyval(polyfit((1:numsamples)',y_plserr_pers,2), ...1:numsamples,'r');463 plot(1:numsamples, polyfn_s); ax1 = gca;464 ax1.XAxisLocation = 'top'; ax1.YAxisLocation = 'left';465 
        axis([0 numsamples 0 max(y_plserr_pers)+0.5])
        title('PLS Error Plot (Free Space)')
        ax2 = axes('Color', 'none');
        axis([min(target_sorted) max(target_sorted) 0 1])
        ax2.Color = 'none'; ax2.YTick = [0 1];
        ax2.XAxisLocation = 'bottom'; ax2.YAxisLocation = 'right';
        ax2.YTickLabel = ''; ax2.YColor = get(gca,'Color');
        xlabel('Target value'); set(gcf,'CurrentAxes', ax1)
        ylabel('Relative error (%)'); xlabel('Sample No.');
        hold off;
    end
end

A.1.2 Load Data Set Function

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Load data set for TOGA and PLS
% for use with pulp processing script
% Ashton Christy
% 24 Nov 2016
% Edited 24 Nov 2017
% For model PulpEye files
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% INPUT
% datadir = Data directory
% targetfile = Target .MAT file location
% filetype = 1 for MAT, 2 for CSV, 3 for XLS, 4 for XML
% avgsusp = Average across suspensions (1/0 = Y/N)
% avgbat = Average across batches (incl. suspensions) (1/0 = Y/N)
% datasub = Use data subset (1/0 = Y/N)
% To test whether the datadir contains files, omit last
% three inputs
% OUTPUT
% files_out = Struct containing file and target information
% samples = Struct containing sample information
% norm_data = Normalized spectra matrix
% target_sorted = Target vector
% files_not_loaded = Files not loaded
% raw_data = Un-normalized spectra matrix
% target_name_sorted = Sorted target (sample) names
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [files_out, samples, norm_data, target_sorted, ...
    files_not_loaded, raw_data, target_name_sorted] = ...
    loaddata(datadir, targetfile, filetype, avgsusp, avgbat, datasub)
cd(datadir) % Load data folder
if nargin == 3 % Test folder contents
    switch filetype
        case 1
            files = dir('*.mat'); % Get folder contents (for MAT)
            fprintf('\nMAT files selected.\n');
        case 2
            files = dir('*.csv'); % Get folder contents (for CSV)
            fprintf('\nCSV files selected.\n');
        case 3
            files = dir('*.xls'); % Get folder contents (for XLS)
            fprintf('\nXLS files selected.\n');
        case 4
            files = dir('*.xml'); % Get folder contents (for XML)
            fprintf('\nXML files selected.\n');
        otherwise
            fprintf('\nInvalid filetype entered.\n');
            cd .. % Return to base dir before exiting
            return
    end
    cd .. % Return to base dir
    % No files of correct type found
    if isempty(files)
        fprintf('\nNo files found, check data folder.\n');
        files_out = 'No files found.';
        return
    else % Files found; test complete; return to main script
        files_out = [];
        return
    end
else % Load folder contents
    switch filetype % List files in folder
        case 1
            files = dir('*.mat');
        case 2
            files = dir('*.csv');
        case 3
            files = dir('*.xls');
        case 4
            files = dir('*.xml');
    end
    cd .. % Return to base dir
    fprintf('\nFile list made. Loading data...\n');
end
%% Import Files %%
% Load target data
load(targetfile);
load('Z:\Ashton\TOGA_Firstpulp\r18_data.mat');
% Sort target files
[~,TSortIndex] = sort(target);
target_name_sorted = target_name(TSortIndex);
% Prepare variables for loading data
kk = 1;
[files(:).data] = deal(cell(1,1024));
[files(:).batch] = deal(0);
[files(:).suspension] = deal(0);
[files(:).target] = deal(cell(1,1));
% Build list of files to load based on names in target data
for ii = length(files):-1:1
    base_filename = files(ii).name; % Get base file name
    file_number = strsplit(base_filename,{'-B','-S','.'}); % Get file number
    files(ii).batch = str2double(file_number(2)); % Store batch number
    files(ii).suspension = str2double(file_number(3)); % Store suspension number
    filename_idx = strcmp(target_name_sorted, strcat(file_number{1})); % Search for file within target names
    read_idx = find(filename_idx,1); % Find file position within target names
    if isempty(read_idx) % If file(ii) doesn't exist in the target data
        files_not_loaded{kk} = base_filename; % Record filename and skip
        kk = kk+1;
        files(ii) = []; % Delete file(ii) from list to load
        file_idx(ii) = [];
    else
        file_idx(ii) = read_idx; % Store file(ii)'s position
    end
end
% Address multiple spectra (replicates) for single samples
files_sorted = files;
replicates = max([files.batch]) * max([files.suspension]); % Calculate number of replicates per sample
if replicates > 1
    target_rep = repelem(target,replicates); % Replicate targets
    r18_rep = repelem(r18,replicates); % Replicate r18 data
    for ii = 1:length(files)
        files(ii).target = target_rep(ii);
    end
    target_name_rep = repelem(target_name,replicates); % Replicate target names
    file_idx = file_idx.*replicates-(replicates-1); % Replicate file positions
    for ii = 1:length(file_idx)
        if rem(ii-1,replicates) > 0 % Find replicated file positions
            file_idx(ii) = file_idx(ii) + rem(ii-1,replicates); % Add remainder to fill out list
        end
    end
else
    target_rep = target;
    target_name_rep = target_name;
end
% Sort replicated targets and files
[target_sorted,TSortIndex] = sort(target_rep);
r18_sorted = r18;
target_name_sorted = target_name_rep(TSortIndex);
% Determine which files have null target data
del_notgt = find(target_sorted == 0);
% Set up raw data matrix
raw_data = zeros(length(files),1024);
warning('error','MATLAB:load:variableNotFound'); % Missing 'data' variable becomes a catchable error
warning('off','MATLAB:xlsread:ActiveX');
%%
% Load files
for ii = 1:length(files)
    base_filename = files(ii).name; % Get base file name
    full_filename = fullfile(datadir,base_filename); % Generate full file name
    load_idx = file_idx(ii); % Determine which file to load
    if not(isempty(load_idx)) % If file(ii) exists
        if isempty(find(del_notgt == load_idx,1)) % If file(ii) has associated target data
            files_sorted(load_idx) = files(ii); % Add file(ii) to sorted list of files
            switch filetype
                case 1 % MAT
                    try
                        load(full_filename,'data');
                        if size(data,1) > 1 % If there are multiple spectra
                            raw_data(load_idx,:) = mean(data);
                        else
                            raw_data(load_idx,:) = data;
                        end
                    catch % If the MAT file contains no data
                        data = zeros(1,1024);
                        raw_data(load_idx,:) = data;
                        fprintf('\nWARNING: %s contains no data! Skipping...\n',base_filename);
                    end
                    files_sorted(load_idx).data = data;
                    files_sorted(load_idx).target = target_sorted(load_idx);
                case 2 % CSV
                    try % Check for string header
                        data = csvread(full_filename);
                    catch
                        data = csvread(full_filename,2,17);
                        % Subtract blank record (first row) from sample record (second row)
                        blank = textread(full_filename,'%s','whitespace',',');
                        if strcmp(blank(26),'B') || strcmp(blank(26),'"B"')
                            data = data(2,:) - data(1,:);
                        end
                    end
                    if size(data,1) > 1
                        raw_data(load_idx,:) = mean(data);
                    else
                        raw_data(load_idx,:) = data;
                    end
                    files_sorted(load_idx).data = data;
                    files_sorted(load_idx).target = target_sorted(load_idx);
                case 3 % XLS
                    data = xlsread(full_filename,'Pixel');
                    data(1,:) = []; data(:,1) = []; data = data'; % Strip header row/column and transpose
                    if size(data,1) > 1
                        raw_data(load_idx,:) = mean(data);
                    else
                        raw_data(load_idx,:) = data;
                    end
                    files_sorted(load_idx).data = data;
                    files_sorted(load_idx).target = target_sorted(load_idx);
                case 4 % XML
                    [raw_data(load_idx,:),~] = xml2spectra(full_filename);
                    files_sorted(load_idx).data = raw_data(load_idx,:);
                    files_sorted(load_idx).target = target_sorted(load_idx);
            end
        else
            if not(isempty(find(del_notgt==load_idx,1))) % If file(ii) has no target data
                files_not_loaded{kk} = base_filename; % Record filename and skip
                kk = kk+1;
            end
        end
    end
end
% Delete spectra with no target data
if not(isempty(del_notgt))
    raw_data(del_notgt,:) = [];
    files_sorted(del_notgt) = [];
    target_sorted(del_notgt) = [];
    target_name_sorted(del_notgt) = [];
end
% Find blank spectra (rows that are entirely zero)
kk = 1;
for ii = 1:size(raw_data,1)
    if all(raw_data(ii,:) == 0)
        del_nospect(kk) = ii;
        kk = kk+1;
    end
end
% Delete blank spectra
if exist('del_nospect','var')
    if size(raw_data,1) == length(files_sorted) % If blank spectrum has target data
        raw_data(del_nospect,:) = [];
        files_sorted(del_nospect) = [];
        target_sorted(del_nospect) = [];
        target_name_sorted(del_nospect) = [];
    else % If blank spectrum has no target data
        raw_data(del_nospect,:) = [];
    end
end
%%
% Average replicate suspensions
if avgsusp == 1
    kk = 1;
    nsusp = max([files.suspension]); % Suspensions per batch
    for ii = 1:length(files_sorted)
        % Extract filenames
        file_1 = strsplit(files_sorted(ii).name,{'-B','-S','.'});
        file_1 = cellfun(@str2num,file_1(3));
        % Second spectrum
        if ii < length(files_sorted)
            file_2 = strsplit(files_sorted(ii+1).name,{'-B','-S','.'});
            file_2 = cellfun(@str2num,file_2(3));
        end
        % Check for identical targets
        if ii <= length(files_sorted)-nsusp
            targets = range(vertcat(files_sorted(ii:ii+nsusp-1).target));
        else
            targets = range(vertcat(files_sorted(ii:end).target));
        end
        % If there are replicates
        if file_1 == 1 && file_2 ~= 1 && targets == 0
            if ii <= length(files_sorted)-nsusp
                avg_data(ii,:) = mean(raw_data(ii:ii+nsusp-1,:));
            else
                avg_data(ii,:) = mean(raw_data(ii:end,:));
            end
            target_avg(ii) = files_sorted(ii).target;
            avg_list(kk) = ii;
            kk = kk+1;
        end
        % If there are no replicates
        if file_1 == 1 && file_2 == 1
            avg_data(ii,:) = raw_data(ii,:);
            target_avg(ii) = files_sorted(ii).target;
            avg_list(kk) = ii;
            kk = kk+1;
        end
    end
    % Average replicate suspensions
    if avgbat == 0
        raw_data = avg_data(avg_list,:);
        target_sorted = target_avg(avg_list)';
        target_name_avg = target_name_sorted(avg_list);
        files_out = files_sorted(avg_list);
    else
        files_avg = files_sorted(avg_list);
    end
end
% Average replicate batches
if avgbat == 1
    clear avg_list target_avg avg_data
    kk = 1;
    nbat = max([files.batch]); % Batches per sample
    for ii = 1:length(files_avg)
        % Extract filenames
        file_1 = strsplit(files_avg(ii).name,{'-B','-S','.'});
        file_1 = cellfun(@str2num,file_1(2));
        % Second spectrum
        if ii < length(files_avg)
            file_2 = strsplit(files_avg(ii + 1).name,{'-B','-S','.'});
            file_2 = cellfun(@str2num,file_2(2));
        else
            file_2 = 0;
        end
        % Check for identical targets
        if ii <= length(files_avg)-nbat
            targets = range(vertcat(files_avg(ii:ii+nbat-1).target));
        else
            targets = range(vertcat(files_avg(ii:end).target));
        end
        % If there are replicates
        if file_1 == 1 && file_2 ~= 1 && targets == 0
            if ii <= length(files_avg)-nbat
                avg_data(ii,:) = mean(raw_data(ii:ii+nbat-1,:));
            else
                avg_data(ii,:) = mean(raw_data(ii:end,:));
            end
            target_avg(ii) = files_avg(ii).target;
            avg_list(kk) = ii;
            kk = kk+1;
        end
        % If there are no replicates
        if file_1 == 1 && file_2 == 1
            avg_data(ii,:) = raw_data(ii,:);
            target_avg(ii) = files_avg(ii).target;
            avg_list(kk) = ii;
            kk = kk+1;
        end
    end
    % Average replicate batches
    raw_data = avg_data(avg_list,:);
    target_sorted = target_avg(avg_list)';
    target_name_avg = target_name_sorted(avg_list);
    files_out = files_avg(avg_list);
end
% No averaging
if avgsusp == 0 && avgbat == 0
    files_out = files_sorted;
end
% Normalize data (min-max scaling of each spectrum to [0, 1])
norm_data = zeros(size(raw_data));
for ii = 1:size(raw_data,1)
    norm_data(ii,:) = (raw_data(ii,:)-min(raw_data(ii,:))) / ...
        (max(raw_data(ii,:))-min(raw_data(ii,:)));
end
%%
% Use subset of data
if datasub == 1
    % low = mean(target_sorted) - std(target_sorted);
    % high = mean(target_sorted) + std(target_sorted);
    low = 450; % Alt. low filter
    high = 600; % Alt. high filter
    % Enumerate data to be removed
    low_cuts = find(target_sorted < low);
    high_cuts = find(target_sorted > high);
    cuts = vertcat(low_cuts,high_cuts);
    % Remove data
    target_sorted(cuts) = [];
    norm_data(cuts,:) = [];
end
% Remove unneeded information from list of files
fields = {'bytes','isdir','datenum'};
files_out = rmfield(files_out,fields);
if datasub == 1
    files_out(cuts) = [];
end
%%
% Set up struct of sample names
for ii = length(files_out):-1:1
    samples(ii) = struct('Sample',[],'Target',target_sorted(ii),'Prediction',[]);
    % No averaging
    if avgbat == 0 && avgsusp == 0
        if rem(ii-1,replicates) == 0
            samplename = strsplit(files_sorted(ii).name,'-B');
            samples(ii).Sample = samplename(1);
        else
            samples(ii) = [];
        end
    end
    % Average across suspensions
    if avgbat == 0 && avgsusp == 1
        if rem(ii-1,max([files_sorted.suspension])) == 0
            samplename = strsplit(files_sorted(ii).name,'-B');
            samples(ii).Sample = samplename(1);
        else
            samples(ii) = [];
        end
    end
    % Average across batches
    if avgbat == 1
        samplename = strsplit(files_sorted(ii).name,'-B');
        samples(ii).Sample = samplename(1);
    end
end
if not(exist('files_not_loaded','var'))
    files_not_loaded = 'All files loaded.';
else
    files_not_loaded = files_not_loaded';
end
end

A.1.3 Load Independent Validation Set Function

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Load independent (unknown) validation set for PLS models
% for use with pulp processing script
% Ashton Christy
% 31 Oct 2016
% Edited 8 Sep 2017
% For model PulpEye files
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% INPUT
% files: list of files not loaded because they have '0' data
% filetype = 1 for MAT, 2 for CSV, 3 for XLS, 4 for XML
% datafolder: location of files to load
% avgsusp = Average across suspensions (1/0 = Y/N)
% avgbat = Average across batches (incl. suspensions) (1/0 = Y/N)
% scale: DWT scale
% wfilter: DWT wavelet
% iterations: number of times to run DWT
% hfreq_cuts: number of high frequency DWT cuts
% lfreq_cuts: number of low frequency DWT cuts
% bg_remove: 1 for background removal, 0 for none
% OUTPUT
% indvalnorm = Normalized spectra
% indvalfilt = DWT-filtered spectra
% indvalwcoef = Wavelet coefficients of spectra
% indvalWL = Book-keeping vector for coefficients
% indvalname = Names of the (averaged) spectra
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [indvalnorm, indvalfilt, indvalwcoef, indvalWL, indvalname] = ...
    loadindval(files, filetype, datafolder, avgsusp, avgbat, scale, ...
    wfilter, iterations, hfreq_cuts, lfreq_cuts, bg_remove)
warning('off','MATLAB:xlsread:ActiveX');
for ii = 1:length(files)
    fullfilename = fullfile(datafolder,files{ii}); % Generate full file name
    switch filetype
        case 1 % MAT
            load(fullfilename,'data'); % Load spectra
            if size(data,1) > 1 % If there are multiple spectra
                indvalset(ii,:) = mean(data); % Average spectra
            else
                indvalset(ii,:) = data; % Store spectrum
            end
        case 2 % CSV
            try % Check for string header
                data = csvread(fullfilename); % Load spectra, no header
            catch
                data = csvread(fullfilename,2,17); % Load spectra with header
            end
            if size(data,1) > 1 % If there are multiple spectra
                indvalset(ii,:) = mean(data); % Average spectra
            else
                indvalset(ii,:) = data; % Store spectrum
            end
        case 3 % XLS
            data = xlsread(fullfilename,'Pixel'); % Load spectra
            data(1,:) = []; data(:,1) = []; data = data'; % Strip header row/column and transpose
            if size(data,1) > 1 % If there are multiple spectra
                indvalset(ii,:) = mean(data); % Average spectra
            else
                indvalset(ii,:) = data; % Store spectrum
            end
        case 4 % XML
            [indvalset(ii,:),~] = xml2spectra(fullfilename); % Store spectra
    end
end
% Average replicate suspensions
if avgsusp == 1
    kk = 1;
    for ii = length(files):-1:1
        % Extract filenames
        file_1 = strsplit(files{ii},{'-B','-S','.'});
        file_1 = cellfun(@str2num,file_1(3));
        if ii > 1 % First spectrum
            file_2 = strsplit(files{ii-1},{'-B','-S','.'});
            file_2 = cellfun(@str2num,file_2(3));
        end
        if ii > 2 % Second spectrum
            file_3 = strsplit(files{ii-2},{'-B','-S','.'});
            file_3 = cellfun(@str2num,file_3(3));
        end
        if ii < length(files)
            file_4 = strsplit(files{ii+1},{'-B','-S','.'});
            file_4 = cellfun(@str2num,file_4(3));
        end
        % If there are three replicates
        if (file_1 == 3 && file_2 == 2 && file_3 == 1) || ...
                (file_1 == 1 && file_2 == 2 && file_3 == 3)
            avg_data(ii,:) = mean(indvalset(ii-2:ii,:));
            avg_list(kk) = ii;
            kk = kk+1;
        end
        % If there are two replicates
        if (file_1 == 3 && file_2 == 2 && file_3 == 3) || ...
                (file_1 == 3 && file_2 == 1 && file_3 == 3) || ...
                (file_1 == 2 && file_2 == 1 && file_3 == 3 && file_4 ~= 3)
            avg_data(ii,:) = mean(indvalset(ii-1:ii,:));
            avg_list(kk) = ii;
            kk = kk+1;
        end
        % If there are no replicates
        if file_1 == 1 && file_2 == 3 && file_4 == 3
            avg_data(ii,:) = indvalset(ii,:);
            avg_list(kk) = ii;
            kk = kk+1;
        end
    end
    % Average replicate suspensions
    if avgbat == 0
        indvalset = avg_data(flip(avg_list),:);
        indvalname = files(avg_list);
    else
        files_avg = files(flip(avg_list));
    end
end
% Average replicate batches
if avgbat == 1
    clear avg_list avg_data
    kk = 1;
    for ii = length(files_avg):-1:1
        % Extract filenames
        file_1 = strsplit(files_avg{ii},{'-B','-S','.'});
        file_1 = cellfun(@str2num,file_1(2));
        if ii > 1 % First spectrum
            file_2 = strsplit(files_avg{ii-1},{'-B','-S','.'});
            file_2 = cellfun(@str2num,file_2(2));
        end
        if ii > 2 % Second spectrum
            file_3 = strsplit(files_avg{ii-2},{'-B','-S','.'});
            file_3 = cellfun(@str2num,file_3(2));
        end
        if ii < length(files_avg)
            file_4 = strsplit(files_avg{ii+1},{'-B','-S','.'});
            file_4 = cellfun(@str2num,file_4(2));
        end
        % If there are three replicates
        if (file_1 == 3 && file_2 == 2 && file_3 == 1) || ...
                (file_1 == 1 && file_2 == 2 && file_3 == 3)
            avg_data(ii,:) = mean(indvalset(ii-2:ii,:));
            avg_list(kk) = ii;
            kk = kk+1;
        end
        % If there are two replicates
        if (file_1 == 3 && file_2 == 2 && file_3 == 3) || ...
                (file_1 == 3 && file_2 == 1 && file_3 == 3) || ...
                (file_1 == 2 && file_2 == 1 && file_3 == 3 && file_4 ~= 3)
            avg_data(ii,:) = mean(indvalset(ii-1:ii,:));
            avg_list(kk) = ii;
            kk = kk+1;
        end
        % If there are no replicates
        if file_1 == 1 && file_2 == 3 && file_4 == 3
            avg_data(ii,:) = indvalset(ii,:);
            avg_list(kk) = ii;
            kk = kk+1;
        end
    end
    % Average replicate batches
    indvalset = avg_data(flip(avg_list),:);
    indvalname = files(avg_list);
end
if avgsusp == 0 && avgbat == 0
    indvalname = files;
end
% Normalize
indvalnorm = zeros(size(indvalset));
for ii = 1:size(indvalset,1)
    indvalnorm(ii,:) = (indvalset(ii,:)-min(indvalset(ii,:))) / max(indvalset(ii,:));
end
% Background removal
if bg_remove == 1
    [~,indvalbg] = dwt_bg_remove(indvalnorm, scale, wfilter, iterations);
end
% DWT filtering
for ii = 1:size(indvalset,1)
    if bg_remove == 1
        [indvalfilt_1d, indvalwcoef_1d, indvalWL] = dwt_filter_lmelo(indvalbg(ii,:), ...
            scale, wfilter, hfreq_cuts, lfreq_cuts, 0);
    else
        [indvalfilt_1d, indvalwcoef_1d, indvalWL] = dwt_filter_lmelo(indvalnorm(ii,:), ...
            scale, wfilter, hfreq_cuts, lfreq_cuts, 0);
    end
    indvalfilt(ii,:) = indvalfilt_1d;
    indvalwcoef(ii,:) = indvalwcoef_1d;
end
end

A.1.4 Partial Least-Squares Regression Modelling Functions

The partial least-squares (PLS) regression functions plssim and plsdcv were written by S. de Jong and H.-D. Li, respectively [72, 76].

PLSSIM Function

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% PLS-SIM modeling script
% for use with pulp processing script
% Requires plssim.m and optimizepls.m functions
% (the plssim calls are commented out; plsregress is used instead)
% Ashton Christy
% 7 Nov 2016
% Edited 24 Nov 2017
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% INPUT
% data = Data matrix (X)
% target = Result vector (Y)
% valsize = Size of validation set (fraction, less than 0.5)
% valtype = Calibration/Validation type:
%   1 = Random
%   2 = Center
%   3 = Low
%   4 = High
%   5 = Every fourth
% LVs = number of LVs (0 = optimize)
% opt_plot = Display optimization plot
% OUTPUT
% B = Regression coefficients
% cal = Calibration dataset
% val = Validation dataset
% caltar = Calibration target vector
% valtar = Validation target vector
% yp_cal = Predictions for calibration set
% yp_val = Predictions for validation set
% rmsec = Root Mean-Square Error of Calibration
% rmsep = Root Mean-Square Error of Prediction
% PCTVAR, MSE, stats = Additional outputs of plsregress
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [B, cal, val, caltar, valtar, yp_cal, yp_val, ...
    rmsec, rmsep, PCTVAR, MSE, stats] = pls_ac_sim(data, ...
    target, valsize, valtype, LVs, opt_plot)
if nargin < 4
    valtype = 1; % Default to random validation set
end
if valsize > 0.5 % Validation set too large
    fprintf('\nValidation set too large; must be less than 0.5 (50%% of data).\n');
    return
end
cal = data;
val = data;
switch valtype
    case 1 % Random
        split_rnd = randperm(length(target), round(length(target)*valsize));
        split_rnd = sort(split_rnd)';
        % Initialize validation and calibration sets
        val = zeros(length(split_rnd),size(data,2));
        for ii = 1:length(split_rnd)
            val(ii,:) = cal(split_rnd(ii),:);
        end
        cal(split_rnd,:) = [];
        % Initialize validation and calibration targets
        valtar = zeros(length(split_rnd),1); caltar = target;
        for ii = 1:length(split_rnd)
            valtar(ii) = target(split_rnd(ii));
        end
        caltar(split_rnd) = [];
    case 2 % Center
        split_lower = round(length(target)*(0.5-valsize/2));
        split_upper = round(length(target)*(0.5+valsize/2));
        % Initialize validation and calibration sets
        val(split_upper:end,:) = [];
        val(1:split_lower,:) = [];
        cal(split_lower+1:split_upper-1,:) = [];
        % Initialize validation and calibration targets
        valtar = target; caltar = target;
        valtar(split_upper:end) = [];
        valtar(1:split_lower) = [];
        caltar(split_lower+1:split_upper-1) = [];
    case 3 % Low
        split_point = round(length(target)*valsize);
        % Initialize validation and calibration sets
        val = cal(1:split_point,:);
        cal(1:split_point,:) = [];
        % Initialize validation and calibration targets
        valtar = target(1:split_point);
        caltar = target;
        caltar(1:split_point) = [];
    case 4 % High
        split_point = round(length(target)*(1-valsize));
        % Initialize validation and calibration sets
        val = cal(split_point:end,:);
        cal(split_point:end,:) = [];
        % Initialize validation and calibration targets
        valtar = target(split_point:end);
        caltar = target;
        caltar(split_point:end) = [];
    case 5 % Every fourth
        % Initialize validation and calibration sets
        val = cal(1:4:end,:);
        cal(1:4:end,:) = [];
        % Initialize validation and calibration targets
        valtar = target(1:4:end);
        caltar = target;
        caltar(1:4:end) = [];
end
% Determine optimal number of factors (LVs)
if LVs == 0
    factors = optimizepls(cal,caltar,opt_plot);
else
    factors = LVs;
end
if valtype ~= 1 % Non-random validation
    % Calculate Model
    % S = cal' * caltar; XtX = cal' * cal;
    % [B,C,P,T,U,R,R2X,R2Y] = plssim(cal,caltar,factors,S,XtX);
    [~,~,~,~,B,PCTVAR,MSE,stats] = plsregress(cal,caltar,factors);
    % Predict calibration and validation sets
    yp_cal = [ones(size(cal,1),1) cal]*B;
    yp_val = [ones(size(val,1),1) val]*B;
    % Calculate RMSEP
    pls_error = zeros(length(valtar),1);
    for ii = 1:length(valtar)
        pls_error(ii,1) = (valtar(ii,:)-yp_val(ii,:));
    end
    rmsep = sqrt(sum(pls_error.^2)/length(valtar));
    % Calculate RMSEC
    pls_error = zeros(length(caltar),1);
    for ii = 1:length(caltar)
        pls_error(ii,1) = (caltar(ii,:)-yp_cal(ii,:));
    end
    rmsec = sqrt(sum(pls_error.^2)/length(caltar));
else % Random validation
    randvalperms = 1; % Number of random permutations to average
    B_rnd = zeros(size(data,2)+1,randvalperms); % One intercept plus one coefficient per variable
    rmsep_rnd = zeros(1,randvalperms);
    rmsec_rnd = zeros(1,randvalperms);
    pls_errorp_rnd = zeros(length(valtar),randvalperms);
    pls_errorc_rnd = zeros(length(caltar),randvalperms);
    for jj = 1:randvalperms % Build models
        cal = data;
        % Random calibration
        split_rnd = randperm(length(target), round(length(target)*valsize));
        split_rnd = sort(split_rnd)';
        % Initialize validation and calibration sets
        val = zeros(length(split_rnd),size(data,2));
        for ii = 1:length(split_rnd)
            val(ii,:) = cal(split_rnd(ii),:);
        end
        cal(split_rnd,:) = [];
        % Initialize validation and calibration targets
        valtar = zeros(length(split_rnd),1); caltar = target;
        for ii = 1:length(split_rnd)
            valtar(ii,:) = target(split_rnd(ii),:);
        end
        caltar(split_rnd,:) = [];
        % plssim outputs (for reference):
        % B, matrix (p,m), regression coefficients
        % C, matrix (m,h), Y loadings
        % P, matrix (p,h), X loadings
        % T, matrix (n,h), X scores (standardized)
        % U, matrix (n,h), Y scores
        % R, matrix (p,h), X weights
        % R2X, vector (1,h), X-variance
        % R2Y, vector (1,h), Y-variance
        % Calculate model
        % S = cal' * caltar; XtX = cal' * cal;
        % [B_rnd(:,jj),C,P,T,U,R,R2X,R2Y] = plssim(cal,caltar,factors,S,XtX);
        [~,~,~,~,B_rnd(:,jj),PCTVAR,MSE,stats] = plsregress(cal,caltar,factors);
        yp_cal = [ones(size(cal,1),1) cal]*B_rnd(:,jj);
        yp_val = [ones(size(val,1),1) val]*B_rnd(:,jj);
        % Calculate RMSEP
        pls_error_rnd = zeros(length(valtar),1);
        for ii = 1:length(valtar)
            pls_error_rnd(ii) = (valtar(ii,:)-yp_val(ii,:));
        end
        rmsep_rnd(jj) = sqrt(sum(pls_error_rnd.^2) / length(valtar));
        pls_errorp_rnd(:,jj) = pls_error_rnd;
        % Calculate RMSEC
        pls_error_rnd = zeros(length(caltar),1);
        for ii = 1:length(caltar)
            pls_error_rnd(ii) = (caltar(ii,:)-yp_cal(ii,:));
        end
        rmsec_rnd(jj) = sqrt(sum(pls_error_rnd.^2) / length(caltar));
        pls_errorc_rnd(:,jj) = pls_error_rnd;
    end
    % Average models
    B = mean(B_rnd,2);
    rmsep = mean(rmsep_rnd);
    rmsec = mean(rmsec_rnd);
    yp_cal = [ones(size(cal,1),1) cal]*B;
    yp_val = [ones(size(val,1),1) val]*B;
end
fprintf('\nPLSSIM models built.\n');
if LVs == 0
    fprintf('%g factors used (optimized). ',factors);
end
end

PLSDCV Function

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% PLS-DCV model for use with TOGA
% Requires plsdcv.m and optimizepls.m functions
% Ashton Christy
% 9 Nov 2016
% Edited 1 Dec 2016
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% INPUT
% data = Data matrix (X)
% target = Result vector (Y)
% valsize = Size of validation set (fraction, < 0.5)
% opt_plot = Display optimization plot
% OUTPUT
% DCV = Struct containing PLS data:
%   .PLSB = Regression coefficients (column = # of LVs)
%   .method = DCV pretreatment method
%   .RMSECV = Root Mean-Square Error of Cross-Validation
%   .nLV = Number of latent variables per DCV iteration
%   .predError = Prediction error (offset) per sample
%   .ypre = Predicted values
%   .ytest = Target values
% rmsec = Root Mean-Square Error of Calibration
% rmsep = Root Mean-Square Error of Prediction
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [DCV, rmsec, rmsep] = pls_ac_dcv(data, target, valsize, opt_plot)
if valsize > 0.5 % Validation set too large
    fprintf('\nValidation set too large; must be less than 0.5 (50%% of data).\n');
    return
end
% Random calibration for factor (LV) optimization
cal = data;
caltar = target;
split_rnd = randperm(length(target), round(length(target)*valsize));
split_rnd = sort(split_rnd)';
% Initialize calibration set
cal(split_rnd, :) = [];
% Initialize calibration targets
caltar(split_rnd) = [];
% Determine optimal number of factors
factors = optimizepls(cal, caltar, opt_plot);
fprintf('\nBuilding PLSDCV model...\n');
% Build model
[DCV] = plsdcv(data,target,20,floor((1-valsize)*length(target)),'center',0,0);
% Outputs
DCV.PLSB = squeeze(mean(DCV.PLSB,1));
rmsep = mean(DCV.RMSEP);
rmsec = DCV.RMSECV;
for ii = size(DCV.PLSB,2):-1:1
    if mean(DCV.PLSB(:,ii)) == 0 || ii > factors
        DCV.PLSB(:,ii) = [];
    end
end
fprintf('\nPLSDCV model built.\n');
end

A.1.5 PLS Latent Variable Optimization Function

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Optimize PLS model parameters
% Requires plssim.m function
% (the plssim call is commented out; plsregress is used instead)
% Ashton Christy
% 4 Nov 2016
% Edited 24 Nov 2017
% C1 parameter based on:
% Kalivas and Palmer, J. Chemometrics 2014; 28: 347-357
% DOI: 10.1002/cem.2555
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% INPUT
% cal = Calibration matrix (X)
% caltar = Calibration target (Y)
% opt_plot = Show plots? 1 = yes, 0 = no
% OUTPUT
% opt_lv = Optimal number of latent variables (factors)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [opt_lv] = optimizepls(cal, caltar, opt_plot)
if nargin < 3
    opt_plot = 0; % Default to no plot
end
% S = cal' * caltar; XtX = cal' * cal;
overfit = 0;
for hh = 1:20 % Iterate number of factors
    % Calculate models
    % [B,~,~,~,~,~,~,~] = plssim(cal,caltar,hh);%, S, XtX);
    [~,~,~,~,B,R2] = plsregress(cal,caltar,hh);
    yp_cal = [ones(size(cal,1),1) cal]*B;
    pls_error = zeros(length(caltar), 1);
    for ii = 1:length(caltar)
        pls_error(ii,1) = (caltar(ii,:)-yp_cal(ii,:));
    end
    % Calculate ||B||
    Bnorm(hh) = norm(B);
    % Remove overfit models
    if hh > 1 && (Bnorm(hh)/Bnorm(hh-1) > 20)
        Bnorm(hh) = [];
        break
    end
    R2 = cumsum(R2,2);
    R2_base(hh) = R2(2,end);
    rmsec(hh) = sqrt(sum(pls_error.^2)/length(caltar));
    % Randomize target vector to test for overfitting
    if overfit == 0
        for ii = 1:100
            caltar_r = caltar(randperm(length(caltar)));
            correlation(ii) = corr2(caltar,caltar_r);
            [~,~,~,~,~,R2] = plsregress(cal,caltar_r,hh);
            R2 = cumsum(R2,2);
            R2_rnd(hh,ii) = R2(2,end);
        end
        if mean(R2_rnd(hh,:)) > 0.35 % Critical distance
            fprintf('Overfit at %g factors (Base R2Y = %.3f, Random R2Y = %.3f)\n', ...
                hh, R2_base(hh), mean(R2_rnd(hh,:)));
            overfit = 1;
        else
            % fprintf('Rnd R2Y at %g factors = %.3f\n', hh, mean(R2_rnd(hh,:)));
        end
        clear caltar_r R2
    end
end
% Calculate C1 parameter (sum of min-max scaled ||B|| and RMSEC)
for hh = 1:length(Bnorm)
    C1(hh) = ((Bnorm(hh)-min(Bnorm)) / (max(Bnorm)-min(Bnorm))) + ...
        ((rmsec(hh)-min(rmsec)) / (max(rmsec)-min(rmsec)));
end
% Output
[~,opt_lv] = min(C1);
% Plot
if opt_plot == 1
    warning('off','MATLAB:handle_graphics:exceptions:SceneNode')
    warning('off','MATLAB:gui:latexsup:UnsupportedFont')
    figure; plot(Bnorm, C1, 'bd')
    hold on
    title(['PLS Factor Optimization - Best: ',num2str(opt_lv)])
    xlabel('\sffamily $||\hat{\textbf{B}}||$','interpreter','latex')
    ylabel('C_{1}')
    hold off
end
% Calculate constrained fit
corr_plot = [1 abs(correlation)]';
R2_plot = horzcat(R2_base(opt_lv),R2_rnd(opt_lv,:))';
options = optimoptions(@lsqlin,'Algorithm','active-set');
p = lsqlin([corr_plot ones(size(corr_plot))],R2_plot, ...
    [],[],[1 1],R2_plot(1),[],[],[],options);
% Evaluate constrained fit
corrfit = polyval(p,corr_plot);
% Plot
if opt_plot == 1
    figure; plot(corr_plot,R2_plot,'.')
    hold on
    plot(1,R2_plot(1),'kx','linewidth',4)
    plot(corr_plot,corrfit,'r')
    title(['PLS Model Validation - R2Y Intercept: ', num2str(corrfit(2), '%.3g')])
    xlabel('Correlation coeff.')
    ylabel('R2Y')
    plot(linspace(0,1),repelem(0.35,100),'k--');
    hold off
end
fprintf('Optimized number of factors: %g\nCumulative R2Y: %.3f\nRandomized cumulative R2Y: %.3f\n', ...
    opt_lv, R2_base(opt_lv), corrfit(2));
end

A.2 Raman Data Processing - Python, SQL, and ExtractEye

The following Python script collects Raman data, processes it as outlined previously, and stores it in an SQL table for further processing with ExtractEye. The script also collects PulpEye data from a data dump file and collates it with the Raman data.

# -*- coding: utf-8 -*-
"""
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Raman spectra (CSV) input script
% Ashton Christy
% 25 Apr 2018
% Ported to Python 2.7 on 9 Nov 2018
% Ported to Python 3.7 on 18 Mar 2019
% Version 1.6
% Edited 19 Mar 2019
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
"""
# Load packages
import os, csv, numpy as np, pywt, mysqlx, fnmatch, datetime as dt, scipy.io, time, wx
t0 = time.time()

# Differentiate between spectrometers
def spectrometer(name):
    if "WP-" in name:
        spect = 1
        sn = "WP785"
    if "S785LC" in name:
        spect = 2
        sn = "INNO"
    return spect, sn

# Convert Excel dates to a usable format
def xldate(xldate, datemode):
    # datemode: 0 for 1900-based, 1 for 1904-based
    if xldate == 0:
        xldate = 2
    return (
        dt.datetime(1899, 12, 30)
        + dt.timedelta(days=xldate + 1462 * datemode)
    )

# Folder selector dialog box
def get_path(wildcard):
    app = wx.App(None)
    style = wx.DD_DEFAULT_STYLE | wx.DD_DIR_MUST_EXIST
    dialog = wx.DirDialog(None, 'Select data folder', '', style=style)
    dialog.SetPath('C:\\Data\\All data\\')
    if dialog.ShowModal() == wx.ID_OK:
        path = dialog.GetPath()
    else:
        path = None
    dialog.Destroy()
    return path

# Load Raman data from folder
try:
    datadir = get_path("*") + "\\"
except Exception:  # e.g. dialog cancelled, so get_path returned None
    input("Press Enter to exit...")
    exit(0)
# Load data
raw_data = []
raw_data_line = []
spec_stats = []
nir_avg_data = []
nir_time = []
files = [ii for ii in os.listdir(datadir)]
for file in files:
    if file.endswith(".csv"):
        f = open(datadir+file)
        csv_input = csv.reader(f, delimiter=',')
        csv_rows = [ii for ii in csv_input]
        if spectrometer(csv_rows[0][4])[0] == 1:  # WP785
            raw_data.append(np.array(csv_rows[3][17:1041]))
            spec_stats.append(csv_rows[3][0:17])
            wavelength = csv_rows[2][17:1041]
        if spectrometer(csv_rows[0][4])[0] == 2:  # INNO
            raw_data.append(np.mean(np.array(
                [csv_rows[ii][17:] for ii in range(2, len(csv_rows))]).astype(float), axis=0))
            spec_stats.append(csv_rows[2][0:17])
            spec_titles = csv_rows[1]
        f.close()
    if file.endswith(".mat"):
        f = scipy.io.loadmat(datadir+file)
        mat_rows = list(f.values())  # dict views are not indexable in Python 3
        nir_avg_data.append(np.mean(mat_rows[6], axis=0))
        temp_name = mat_rows[4].split(": ")
        nir_time.append(dt.datetime.strptime(
            temp_name[2][0:-8] + temp_name[2][-4:],
            '%a %b %d %H:%M:%S %Y').strftime('%Y-%m-%d %H:%M:%S'))
if (not raw_data) and (not nir_avg_data):
    print("No files containing spectral data found in folder:\n" + datadir
          + "\nDouble check your files and/or folder selection!")
    input("Press Enter to exit...")
    exit(1)
raw_data = np.array(raw_data).astype(float)
nir_avg_data = np.array(nir_avg_data).astype(float)
print("Data loaded from:\n" + datadir + "\n")
# Average data
ii = 0
jj = 0
avg_data = []
files_avg = []
spec_stats_avg = []
# Raman data
if raw_data.size:
    # Average data
    if spectrometer(csv_rows[0][4])[0] == 1:  # WP785
        for file in files:
            try:
                if "1." not in files[ii+1]:
                    jj += 1
                else:
                    avg_data.append(np.mean(raw_data[ii-jj:ii+1,:], axis=0))
                    files_avg.append(files[ii-jj])
                    spec_stats_avg.append(spec_stats[ii-jj])
                    jj = 0
            except IndexError:  # Last file: average the remaining replicates
                avg_data.append(np.mean(raw_data[ii-jj:ii+1,:], axis=0))
                files_avg.append(files[ii-jj])
                spec_stats_avg.append(spec_stats[ii-jj])
                break
            ii += 1
        avg_data = np.array(avg_data).astype(float)
    elif spectrometer(csv_rows[0][4])[0] == 2:  # INNO
        avg_data = raw_data
        files_avg = files
        spec_stats_avg = spec_stats
    # Normalize data
    norm_data = []
    for ii in range(len(avg_data)):
        norm_line = (avg_data[ii,:]-np.min(avg_data[ii,:])) / sum((avg_data[ii,:]-np.min(avg_data[ii,:])) * (avg_data[ii,:]-np.min(avg_data[ii,:])))
        norm_data.append(norm_line)
    norm_data = np.array(norm_data).astype(float)
    # Remove dead pixels
    if spectrometer(csv_rows[0][4])[0] == 1:  # WP785
        norm_data = np.delete(norm_data, np.s_[0:49], axis=1)
        avg_data = np.delete(avg_data, np.s_[0:49], axis=1)
    if spectrometer(csv_rows[0][4])[0] == 2:  # INNO
        norm_data = np.delete(norm_data, np.s_[0:4], axis=1)
        avg_data = np.delete(avg_data, np.s_[0:4], axis=1)
    # Apply DWT
    filt_data = []
    sym5 = pywt.Wavelet('sym5')
    discard = [0,5,6,7]  # Coefficient arrays to zero: approximation and the three finest details
    for ii in range(len(avg_data)):
        coeffs = pywt.wavedec(norm_data[ii,:], sym5, mode='symmetric', level=7)
        for jj in list(discard):
            coeffs[jj] = np.zeros_like(coeffs[jj])
        filt_data_1d = pywt.waverec(coeffs, sym5, mode='symmetric')
        filt_data.append(filt_data_1d)
    filt_data = np.array(filt_data).astype(float)
    # Autoscale data
    autoscaled_data = filt_data/filt_data.std(0)
# NIR Data
if nir_avg_data.size:
    files_avg = files
    # Normalize data
    nir_norm_data = []
    for ii in range(len(nir_avg_data)):
        nir_norm_line = (nir_avg_data[ii,:]-np.min(nir_avg_data[ii,:])) / (np.max(nir_avg_data[ii,:])-np.min(nir_avg_data[ii,:]))
        nir_norm_data.append(nir_norm_line)
    nir_norm_data
= np.array(nir_norm_data).astype(np.float)151 # Apply DWT152 nir_filt_data = []153 sym5 = pywt.Wavelet('sym5')154 nir_discard = [0,7]155 for ii in range(len(nir_avg_data)):156 nir_coeffs = pywt.wavedec(nir_norm_data[ii,:], sym5, ...mode='sym', level=7)157 for jj in list(nir_discard):158 nir_coeffs[jj] = np.zeros_like(nir_coeffs[jj])159 nir_filt_data_1d = pywt.waverec(nir_coeffs, sym5, ...mode='sym')160 nir_filt_data.append(nir_filt_data_1d)161 nir_filt_data = np.array(nir_filt_data).astype(np.float)162 # Autoscale data163 nir_autoscaled_data = nir_filt_data/nir_filt_data.std(0)164 print("Data filtered. Loading raw Excel data...")165 #%% Load target data from CSV file166 pe_csvf = "C:\\Data\\SECURE_Canfor_Data\\PulpEye Data for ...Ashton (NOV 01 2018).csv"167 f = open(pe_csvf)168 pe_csv_input = csv.reader(f, delimiter=',')169 pe_csv = [ii for ii in pe_csv_input]170 f.close()171 # Extract data from CSV172 pe_titles_csv = pe_csv[11][1:]173 pe_samples_csv = [ii[1] for ii in pe_csv[12:]]182174 pe_samplerows = []175 pe_samplenames = []176 pe_fileidx = []177 pe_missing = []178 pe_missing_fileidx = []179 for ii,pe_csvr in enumerate(pe_csv):180 for jj,kk in enumerate(pe_csvr):181 if pe_csvr[jj] == '' or pe_csvr[jj] == '-' or ...pe_csvr[jj] == '#REF!' 
or pe_csvr[jj] == "#N/A":182 pe_csv[ii][jj] = '0'#-32768183 # Match spectral data with CSV data184 for ii,file in enumerate(files_avg):185 # Raman data186 if raw_data.size:187 temp_name = file.split("-")188 temp_name = temp_name[0] + "-" + temp_name[1]189 temp_match = fnmatch.filter(pe_samples_csv, temp_name+'*')190 # NIR data191 if nir_avg_data.size:192 temp_name = file.split(".")193 temp_match = fnmatch.filter(pe_samples_csv, ...temp_name[0]+'*')194 # Perform match195 if temp_match and (fnmatch.fnmatch(temp_match[0], "*-0") ...or fnmatch.fnmatch(temp_match[0], "* 0")):196 pe_samplerows.append(pe_samples_csv.index(temp_match[0]))197 pe_samplenames.append(temp_match[0])198 pe_fileidx.append(ii)199 elif temp_match:200 print(temp_match[0])201 else:202 pe_missing.append(temp_name)203 pe_missing_fileidx.append(ii)204 pe_samplerows = [ii+12 for ii in pe_samplerows]205 #%% Connect to MySQL server -- ADD NIR NOT WORKING!!!!206 print("Connecting to SQL server...")207 session = mysqlx.get_session({208 'host':'localhost', 'port':33060,209 'user':'root', 'password':'root',210 'ssl-mode':'disabled',183211 })212 # Build tables213 sql_file = ["C:\\Data\\SQL Queries\\CreateRamanTables.sql",214 "C:\\Data\\SQL Queries\\CreateExtendedFusionTables.sql"]215 for file in sql_file:216 f = open(file)217 file = f.read()218 f.close()219 sql_commands = file.split(';')220 for jj, command in enumerate(sql_commands):221 if command != "":222 try:223 session.sql(command).execute()224 except:225 print("Command skipped at line",jj)#,"\n",command)226 # Get main tables227 schema = session.get_schema('canfor')228 batchinfo = schema.get_table('batchinfo')229 raman = schema.get_table('raman')230 raman_scaled = schema.get_table('raman_scaled')231 nir = schema.get_table('nir')232 nir_scaled = schema.get_table('nir_scaled')233 targets = schema.get_table('targets')234 pulpeye = schema.get_table('pulpeye')235 # Get further fusion tables236 fusion0 = schema.get_table('fwtwidthresults')237 fusion1 = 
schema.get_table('fwtresults')238 fusion2 = schema.get_table('fibersummary')239 fusion3 = schema.get_table('fiberkinks')240 fusion4 = schema.get_table('fiberdistributions_1')241 fusion5 = schema.get_table('fiberdistributions_2')242 fusion6 = schema.get_table('fiberdistributions_3')243 fusion7 = schema.get_table('fiberdistributions_4')244 print("Appending Excel data to SQL...")245 # Append data to SQL tables246 for ii,jj in enumerate(pe_samplerows):247 if raw_data.size:248 sampletime = spec_stats_avg[pe_fileidx[ii]][2]249 if nir_avg_data.size:250 sampletime = nir_time[pe_fileidx[ii]]184251 # Batchinfo252 batchinfo.insert(['BatchId','SampleTime','ResultTime', ...'LABELTEXT','PulpName']).values(pe_fileidx[ii]+1, ...sampletime, ...xldate(float(pe_csv[jj][71]),0).strftime('%Y-%m-%d ...%H:%M:%S'), pe_csv[jj][48], pe_csv[jj][50]).execute()253 if raw_data.size:254 # Raman data255 raman_row = [ii+1, 1, "ENG", sampletime, 1, ...pe_fileidx[ii]+1, pe_csv[jj][48], ...spectrometer(csv_rows[0][4])[1]] + ...spec_stats_avg[pe_fileidx[ii]][1:]256 if spectrometer(csv_rows[0][4])[0] == 1: # WP785257 raman_row += [float(kk) for kk in list(np.zeros(48))]258 if spectrometer(csv_rows[0][4])[0] == 2: # INNO259 raman_row += [float(kk) for kk in list(np.zeros(4))]260 raman_row += [float(kk) for kk in ...list(filt_data[pe_fileidx[ii]])]261 raman.insert().values(raman_row).execute()262 # Autoscaled Raman data263 raman_row_scaled = [ii+1, 1, "ENG", sampletime, 1, ...pe_fileidx[ii]+1, pe_csv[jj][48], ...spectrometer(csv_rows[0][4])[1]] + ...spec_stats_avg[pe_fileidx[ii]][1:]264 if spectrometer(csv_rows[0][4])[0] == 1: # WP785265 raman_row_scaled += [float(kk) for kk in ...list(np.zeros(48))]266 if spectrometer(csv_rows[0][4])[0] == 2: # INNO267 raman_row_scaled += [float(kk) for kk in ...list(np.zeros(4))]268 raman_row_scaled += [float(kk) for kk in ...list(autoscaled_data[pe_fileidx[ii]])]269 raman_scaled.insert().values(raman_row_scaled).execute()270 if nir_avg_data.size:271 # NIR data272 
        nir_row = [ii+1, 1, "ENG", sampletime, 1, pe_fileidx[ii]+1, pe_csv[jj][48]]
        nir_row += list(nir_filt_data[pe_fileidx[ii]])
        nir.insert().values(nir_row).execute()
        # Autoscaled NIR data
        nir_row_scaled = [ii+1, 1, "ENG", sampletime, 1,
                          pe_fileidx[ii]+1, pe_csv[jj][48]]
        nir_row_scaled += list(nir_autoscaled_data[pe_fileidx[ii]])
        nir_scaled.insert().values(nir_row_scaled).execute()
    # Targets (from Refiner)
    targets.insert().values([ii+1, 1, "ENG", sampletime, 1, pe_fileidx[ii]+1] + pe_csv[jj][1:3] + [0] + pe_csv[jj][4:45]).execute()
    # PulpEye data summary
    pulpeye.insert().values([ii+1, 1, "ENG", sampletime, 1, pe_fileidx[ii]+1, xldate(float(pe_csv[jj][71]),0).strftime('%Y-%m-%d %H:%M:%S')] + pe_csv[jj][48:71]).execute()
    # FWT width results
    fusion0.insert().values([ii+1, 1, "ENG", sampletime, 1, pe_fileidx[ii]+1] + pe_csv[jj][179:199]).execute()
    # FWT results
    fusion1.insert().values([ii+1, 1, "ENG", sampletime, 1, pe_fileidx[ii]+1] + pe_csv[jj][199:239]).execute()
    # Fiber summary
    fusion2.insert().values([ii+1, 1, "ENG", sampletime, 1, pe_fileidx[ii]+1] + pe_csv[jj][239:268] + [0]).execute()
    # Fiber kinks
    fusion3.insert(['ID','VersionID','Language','VersionTime','BranchName','BatchId','KinksPerFiber','KinksPerLen','KinksSegmLen','KinksAngleAvg']).values([ii+1, 1, "ENG", sampletime, 1, pe_fileidx[ii]+1] + pe_csv[jj][175:179]).execute()
    # Fiber distributions
    fusion4.insert().values([ii+1, 1, "ENG",
                             sampletime, 1, 1, 1, pe_fileidx[ii]+1] + pe_csv[jj][268:318]).execute()
    fusion5.insert().values([ii+1, 1, "ENG", sampletime, 1, 1, 2, pe_fileidx[ii]+1] + pe_csv[jj][318:368]).execute()
    fusion6.insert().values([ii+1, 1, "ENG", sampletime, 1, 1, 3, pe_fileidx[ii]+1] + pe_csv[jj][368:418]).execute()
    fusion7.insert().values([ii+1, 1, "ENG", sampletime, 1, 1, 4, pe_fileidx[ii]+1] + pe_csv[jj][418:468]).execute()
session.close()
t1 = time.time()
total = t1-t0
print("\nDone.")
print("Total time elapsed :", total, "\n")
input("Press Enter to exit...")
exit(0)

A.3 Near-Infrared Image Processing

The triangle algorithm function, triangle_th, was written by Dr. Bernard Panneton and adapted from Zack et al. [93].

A.3.1 Feature Detection Script

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Raw image (.CR2) import and feature detection program
% Ashton Christy
% 25 Oct 2016
% Edited 15 Feb 2018
% Version 3.2
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
clear; close all; clc;
% Set data location, get folder contents
cd 'Z:\Ashton\NIR_Shives\Box-PulpEye_Shive_Comparison'
datadir = 'Test_20180206';
%% Convert raw images to TIFF
cd(datadir)
files = dir('*.CR2');
cd ..
fprintf('\nConverting RAW images in folder "%s" to TIFF...\n',datadir);
for ii = 1:length(files)
    imgfile = strcat(' Z:\Ashton\NIR_Shives\Box-PulpEye_Shive_Comparison\',datadir,'\',files(ii).name);
    command = strcat('dcraw -v -w -H 0 -o 0 -b 1 -q 3 -4 -T',imgfile);
    [status,cmdout] = dos(command,'-echo');
    fprintf('\n');
end
fprintf('Done.\n');
clear imgfile command status cmdout
%% Import TIFF files %%
cd(datadir)
files = dir('*.tiff');
for ii = 1:length(files)
    fprintf('Loading image %g of %g...\n',ii,length(files));
    temp = files(ii); % Load current file
    temptiff = Tiff(temp.name,'r'); % Load TIFF
    tempimg = read(temptiff); % Load full image
    images_raw(:,:,ii) = rgb2gray(tempimg); % Store raw image
    close(temptiff); % Close TIFF
end
cd ..
fprintf('\n%g images loaded.\n',length(files));
clear temp tempimg temptiff filenames filename;
%% Detect features %%
% Select image
image_no = 11; % Image number
cropsize = 0.75; % Percentage of image to consider, <1.0
img = images_raw(:,:,image_no); % Select image
close all; clear feature_stats; clc;
% Correct brightness
darkness = mean(mean(img));
img_br = img;
while darkness <= 25000
    img_br = img_br.*1.25;
    darkness = mean(mean(img_br));
end
% Crop image
cropsize = (1-cropsize)/2;
img_br = img_br(ceil(cropsize*size(img_br,1)):ceil((1-cropsize)*size(img_br,1)), ceil(cropsize*size(img_br,2)):ceil((1-cropsize)*size(img_br,2)));
raw_image = figure; % Set up maximized figure window
imshow(img_br)
% set(raw_image, 'Units', 'normalized', 'Position', [0.01,0.04,0.98,0.88]);
pause(0.00001);
frame_raw = get(raw_image,'JavaFrame');
set(frame_raw,'Maximized',1);
% Scan moving window through image
windowsize = 300;
stepsize = windowsize/3;
num_windows = ceil(size(img_br,1)/stepsize-2) * ceil(size(img_br,2)/stepsize-2);
feature_stats = repmat(struct('Features',[],'Offset',[]), num_windows,1);
gridpos = 1;
tic
for xx = windowsize+1:stepsize:size(img_br,1)
    for yy = windowsize+1:stepsize:size(img_br,2)
        fprintf('Analyzing Grid %g of %g (%g, %g)...\n',gridpos,num_windows,xx,yy);
        % Define and filter window
        imgwindow = img_br(xx-windowsize:xx,yy-windowsize:yy);
        imgwindow = wiener2(imgwindow,[5 5]);
        imgwindow_ce = adapthisteq(imgwindow,'NumTiles',[25 25],'ClipLimit',0.01); % Contrast enhance
        imgwindow_proc = imgwindow_ce; % Initialize processed window
        imgwindow_bw = ones(size(imgwindow_ce)); % Initialize binary window
        % Determine darkness threshold
        if mean(mean(imgwindow_ce)) > 32768 % Half of 2^16
            darkness_thresh = 25000;
        else
            darkness_thresh = mean(mean(imgwindow_ce)) - std2(imgwindow_ce);
        end
        darkness(gridpos) = darkness_thresh;
        % Select dark regions
        for ii = 1:size(imgwindow_ce,1)
            for jj = 1:size(imgwindow_ce,2)
                if imgwindow_ce(ii,jj) > darkness_thresh
                    imgwindow_proc(ii,jj) = 0;
                    imgwindow_bw(ii,jj) = 0;
                end
            end
        end
        % figure;imshow(imgwindow_ce)
        % Calculate region properties
        stats = regionprops(im2bw(imgwindow_bw),'all');
        % Remove fine noise
        for ii = length(stats):-1:1
            if stats(ii).FilledArea < 100
                imgwindow_proc(stats(ii).PixelIdxList) = 0;
                stats(ii) = [];
            end
        end
        % Calculate additional feature properties
        [stats.Compactness] = deal(length(stats));
        [stats.Compactness2] = deal(length(stats));
        [stats.Moment] = deal(length(stats));
        [stats.MomentIdeal] = deal(length(stats));
        [stats.MomentRatio] = deal(length(stats));
        [stats.Darkness] = deal(length(stats));
        for ii = 1:length(stats) % Per feature
            % Calculate feature compactness
            stats(ii).Compactness = stats(ii).Perimeter^2 / stats(ii).FilledArea;
            stats(ii).Compactness2 = sqrt(4*(stats(ii).FilledArea/pi)) / stats(ii).Perimeter;
            % Calculate moment of inertia for feature
            img_feature = double(imcrop(imgwindow_proc, stats(ii).BoundingBox));
            x = (1:size(img_feature,2));
            y = (1:size(img_feature,1)).';
            x = x-mean(x); y = y-mean(y);
            stats(ii).Moment = sum(reshape(bsxfun(@times, bsxfun(@times,img_feature,x.^2),y.^2),[],1));
            % Calculate ideal moment of inertia for feature
            img_ideal = ones(10,round((stats(ii).Perimeter-20)/2));
            x = (1:size(img_ideal,2)); y = (1:size(img_ideal,1)).';
            x = x-mean(x); y = y-mean(y);
            stats(ii).MomentIdeal = sum(reshape(bsxfun(@times, bsxfun(@times,img_ideal,x.^2),y.^2),[],1));
            % Calculate moment ratio for feature
            stats(ii).MomentRatio = stats(ii).Moment / stats(ii).MomentIdeal;
            % Calculate feature darkness
            fill_compare = size(stats(ii).FilledImage) ~= size(imcrop(imgwindow,stats(ii).BoundingBox));
            switch num2str(fill_compare)
                case '1 1' % Dimension mismatch
                    img_fill = padarray(stats(ii).FilledImage,[1 1]);
                    img_fill(1,:) = []; img_fill(:,1) = [];
                case '1 0'
                    img_fill = padarray(stats(ii).FilledImage,[1],'post');
                case '0 1'
                    img_fill = padarray(stats(ii).FilledImage,[0 1],'post');
                case '0 0'
                    img_fill = stats(ii).FilledImage;
            end
            img_feature = imcrop(imgwindow_proc,stats(ii).BoundingBox); % Store feature image
            img_feature_crop = img_feature .* uint16(img_fill); % Store cropped feature image
            stats(ii).Darkness = min(min(img_feature_crop(img_feature_crop ~= 0))); % Non-zero areas
        end
        % Select features
        for ii = length(stats):-1:1
            if stats(ii).Eccentricity < 0.95 || ...
                    stats(ii).EulerNumber < 0 || ...
                    stats(ii).Solidity < 0.75 || ...
                    stats(ii).Compactness < 15 || ...
                    not(isempty(find((vertcat(stats(ii).Extrema) == 0.5),1))) || ...
                    not(isempty(find((vertcat(stats(ii).Extrema) == windowsize-0.5),1))) || ...
                    stats(ii).MomentRatio < 10000 || stats(ii).MomentRatio > 40000 || ...
                    not(((stats(ii).Darkness < 22000 && stats(ii).Darkness < mean(mean(imgwindow_ce))-std2(imgwindow_ce)) && ...
                    stats(ii).FilledArea > 100 && stats(ii).Eccentricity > 0.98) || ...
                    (stats(ii).Darkness < 17500 && stats(ii).FilledArea > 10))
                stats(ii) = [];
            end
        end
        % Store detected features
        feature_stats(gridpos,:).Features = stats;
        feature_stats(gridpos,:).Offset = [xx-windowsize+1 yy-windowsize+1];
        clear stats imgwindow imgwindow_bw imgwindow_ce imgwindow_proc fill_compare img_ideal img_feature img_feature_crop
        gridpos = gridpos+1;
    end
end
if gridpos < num_windows
    feature_stats(gridpos:end) = [];
end
fprintf('Analysis complete.\n');
% Show features
img_features = zeros(size(img_br));
% imggr_ce = adapthisteq(img_br,'NumTiles',[25 25],'ClipLimit',0.01);
results_image = figure;imshow(img_br)
hold on
for ii = 1:length(feature_stats)
    % fprintf('Processing Grid %g of %g...\n',ii,length(feature_stats));
    for jj = 1:length(feature_stats(ii).Features)
        feature_hull = feature_stats(ii).Features(jj).ConvexHull; % Get outlines
        feature_img = double(feature_stats(ii).Features(jj).ConvexImage); % Get images
        feature_box = floor(feature_stats(ii).Features(jj).BoundingBox); % Get bounding boxes
        % Calculate window offsets
        feature_hull(:,1) = feature_hull(:,1) + feature_stats(ii).Offset(2);
        feature_hull(:,2) = feature_hull(:,2) + feature_stats(ii).Offset(1);
        feature_box(1) = ceil(feature_box(1) + feature_stats(ii).Offset(2));
        feature_box(2) = ceil(feature_box(2) + feature_stats(ii).Offset(1));
        % Isolate features
        mm=1; nn=1;
        for xx = feature_box(2):feature_box(2)+feature_box(4)-1
            for yy = feature_box(1):feature_box(1)+feature_box(3)-1
                img_features(xx,yy) = img_features(xx,yy) + feature_img(mm,nn);
                nn = nn+1;
            end
            nn = 1;
            mm = mm+1;
        end
        % Circle features on full image
        plot(feature_hull(:,1),feature_hull(:,2),'g');
    end
    clear feature_hull feature_img feature_box
end
% set(results_image, 'Units', 'normalized', 'Position', [0.01,0.04,0.98,0.88]);
pause(0.00001);
frame_res = get(results_image,'JavaFrame');
set(frame_res,'Maximized',1);
features_detected = regionprops(im2bw(img_features));
shive_count = length(features_detected);
fprintf('\nProcessing complete. %g shive features detected.\n',shive_count);
clear ii jj mm nn x xx y yy
toc
for ii = length(feature_stats):-1:1
    if size(feature_stats(ii).Features,2) == 0
        feature_stats(ii) = [];
    end
end
features = vertcat(feature_stats.Features);
% Plot feature-only image showing selection frequency
map = [[0,0,0]
       [1,0,0]
       [0,1,0]
       [0,0,1]
       [1,1,1]];
for kk = 3:5
    for ii = 1:size(img_features,1)
        for jj = 1:size(img_features,2)
            if img_features(ii,jj) < kk
                img_features(ii,jj) = 1;
            end
        end
    end
    results_image = figure; imshow(img_features,map)
    features = regionprops(im2bw(img_features,1));
    set(results_image,'Units','normalized','Position',[0.01,0.04,0.98,0.88]);
    figure;imshow(im2bw(img_features,1))
    shive_count(kk) = length(features);
end
for ii = 1:3
    % Counts for kk = 3:5 are stored at indices 3 to 5
    fprintf('%g shive features detected (%gx).\n',shive_count(ii+2),ii+2);
end

A.3.2 Edge Detection Script
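The edge detection script in this section calls triangle_th, whose source is not listed here. For orientation, the triangle method of Zack et al. [93] draws a straight line from the histogram peak to the far end of the longer tail and takes as the threshold the bin whose histogram point lies farthest below that line. The Python sketch below illustrates the method only; it is not Dr. Panneton's implementation. The function name triangle_threshold, its (index, level) return value, and normalizing the level to [0, 1] by the number of bins minus one are assumptions made for illustration.

```python
import numpy as np


def triangle_threshold(hist):
    """Triangle-method threshold for a 1-D histogram (sketch after Zack et al.).

    Hypothetical helper: returns (index, level), where index is the threshold
    bin and level is that index normalized to [0, 1] by (number of bins - 1).
    """
    h = np.asarray(hist, dtype=float).ravel()
    n = h.size
    nonzero = np.flatnonzero(h)
    if nonzero.size == 0:
        raise ValueError("histogram has no counts")
    peak = int(np.argmax(h))
    first, last = int(nonzero[0]), int(nonzero[-1])

    # Use the longer tail; mirror the histogram if that tail is left of the peak
    flipped = (peak - first) > (last - peak)
    if flipped:
        h = h[::-1]
        peak, last = n - 1 - peak, n - 1 - first

    # Line from the peak (x1, y1) to the end of the tail (x2, y2)
    x1, y1, x2, y2 = peak, h[peak], last, h[last]
    x = np.arange(peak, last + 1)
    # Proportional to the perpendicular distance; positive below the line
    below = (y2 - y1) * (x - x1) - (x2 - x1) * (h[x] - y1)
    idx = int(x[np.argmax(below)])

    if flipped:
        idx = n - 1 - idx  # map back to the original bin order
    return idx, idx / max(n - 1, 1)
```

For a histogram with a sharp peak at bin 1 and a long right tail, such as [0, 10, 2, 2, 2, 2, 1], the sketch places the threshold at bin 2, just past the knee of the peak; the mirrored histogram gives the mirrored bin. How triangle_th scales its output may differ from the normalization assumed here.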
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Raw image (.DNG) import and edge detection program
% Ashton Christy
% 9 Sep 2016
% Edited 24 Oct 2016
% Version 1.4
% Before use, convert all CR2 files to DNG using Adobe DNG Converter
% Make sure DNG files are uncompressed!
% In Adobe DNG Converter, select Change Preferences -> Compatibility ->
% Custom -> check Uncompressed
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
clear; close all; clc;
warning('off','MATLAB:tifflib:TIFFReadDirectory')
warning('off','MATLAB:imagesci:tiffmexutils:libtiffWarning')
warning('off','images:initSize:adjustingMag')
warning('off','MATLAB:tifflib:TIFFReadDirectory:libraryWarning')
warning('off','images:initSize:adjustingMag')
% Set data location
cd Z:\Ashton\Thick_pulp_NIR\Raw_201609
% Import image files %%
files = dir('*.dng'); % Get folder contents
for ii=1:length(files)
    fprintf('\nLoading image %g of %g...\n', ii, length(files));
    temp = files(ii); % Load current file
    temptiff = Tiff(temp.name,'r'); % Load TIFF
    offset = getTag(temptiff,'SubIFD'); % Get sub-image file directory offset for full image
    setSubDirectory(temptiff,offset(1)); % Set offset to full image
    tempimg = read(temptiff); % Load full image
    images_raw(:,:,ii) = tempimg; % Store raw image
    images(:,:,ii) = wiener2(histeq(tempimg),[5 5]); % Store equalized image
    close(temptiff); % Close TIFF
end
clc; fprintf('\n%g images loaded.\n',length(files));
clear j temp offset tempimg temptiff filenames filename
%% Filter images %%
close all Gaussian Sobel
% Set parameters %%
image_no = 22; % Image number
scale = 1.0; % Contrast scale
threshold = 35000; % Edge detection limit
window_size = 9; % Subarea of image to examine for edges (3-99)
gausstype = 1; % Gaussian filter style: 1 (broad), 2 (sharp)
sobeltype = 2; % Sobel filter scale: 1 (fine) - 6 (coarse)
medwindow = 1; % Median filter window (1 = none)
% Basic processing
img = images(:,:,image_no); % Select image
fprintf('\nProcessing image %g: %s\n', image_no, files(image_no).name);
img = img(700:2700,1400:4000); % Crop
img_contrast = img;
% imshow(img); % Cropped image
p = (window_size-1)/2; % Define window coordinates
indices = img > mean(img(:))-(std2(img) / (.85*triangle_th(imhist(img),256)));
img_contrast(indices) = max(img(:)); % Equalize bright areas
% figure; imshow(img_contrast) % High contrast image
% Apply Gaussian filter %%
if gausstype == 1 % Broad
    Gaussian = [2 4 5 4 2; 4 9 12 9 4; 5 12 15 12 5; 4 9 12 9 4; 2 4 5 4 2] / 95;
end
if gausstype == 2 % Sharp
    Gaussian = [2 4 5 4 2; 4 9 12 9 4; 5 12 15 12 5; 4 9 12 9 4; 2 4 5 4 2] / 120;
end
img_gauss = abs(conv2(double(img_contrast),Gaussian,'same'));
% Apply Sobel filter %%
if sobeltype == 1 % Fine
    Sobel = [0.1 1 0.1; 0 0 0; -0.1 -1 -0.1];
end
if sobeltype == 2
    Sobel = [0.5 1 0.5; 0 0 0; -0.5 -1 -0.5];
end
if sobeltype == 3
    Sobel = [1 2 1; 0 0 0; -1 -2 -1];
end
if sobeltype == 4
    Sobel = [3 10 3; 0 0 0; -3 -10 -3];
end
if sobeltype == 5
    Sobel = [6 20 6; 0 0 0; -6 -20 -6];
end
if sobeltype == 6 % Coarse
    Sobel = [12 40 12; 0 0 0; -12 -40 -12];
end
H = conv2(double(img_gauss),Sobel,'same'); % Filter horizontally
V = conv2(double(img_gauss),Sobel','same'); % Filter vertically
img_sobel = sqrt(H.^2+V.^2); % Build edge matrix
clear H V Gaussian Sobel
%% Detect edges %%
img_edge = zeros(size(img_sobel,1)-2*p, size(img_sobel,2)-2*p); % Initialize image matrix
for x = p+1:1:size(img,1)-p
    for y = p+1:1:size(img,2)-p
        if (img_sobel(x,y) > threshold)
            img_edge(x-p,y-p) = 1; % Edge detected
        else
            img_edge(x-p,y-p) = 0; % Zero area with no edge
        end
    end
end
img_edge_filt = medfilt2(img_edge, [medwindow medwindow]); % Apply median filter
% figure;imshow(img_edge_filt);
clear x y window p
% Find connected components in binary image
img_binary = img_edge_filt;
components = bwconncomp(img_edge_filt); % Calculate connected components
numPixels = cellfun(@numel,components.PixelIdxList); % Count number of pixels per component
if length(numPixels) > 6
    scale = 2;
else
    scale = 1;
end
if not(isempty(components.PixelIdxList))
    for ii = 1:length(numPixels)
        if numPixels(ii) >= (mean(numPixels)+scale*std(numPixels)) % Find largest components
            img_binary(components.PixelIdxList{ii}) = 2; % Isolate largest components
        end
    end
    img_binary = img_binary - 1; % Remove small components
    % figure; imshow(img_binary)
    stats = regionprops(img_binary,'all'); % Determine component properties
    img_fill = stats.FilledImage; % Fill in largest components
    fprintf('\nEccentricity: %g\n', stats.Eccentricity);
    % figure; imshow(img_fill)
    figure();
    subplot(2,2,1)
    imshow(images_raw(:,:,image_no),[1 8000])
    subplot(2,2,2)
    imshow(img)
    subplot(2,2,3)
    imshow(img_edge_filt)
    subplot(2,2,4)
    imshow(img_fill)
else
    fprintf('\nNo components detected, check threshold.\n');
end

A.3.3 ENVI Script

This script was originally written for MATLAB and then ported to Python 3.7. The Python version is shown here.
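The script in this section builds its ENVI rule (.RUL) file with one lxml SubElement call per attribute. The same XML structure can be generated from a table of (attribute name, operation, value) entries, which keeps each class's thresholds in one place. The sketch below is an illustration under stated assumptions, not the script used in this work: it swaps in the standard-library xml.etree.ElementTree for lxml so that it runs without extra dependencies, and the helper add_class and the SHIVE_RULES table are hypothetical names. The element names, attributes, and Shives thresholds are copied from the listing.

```python
import xml.etree.ElementTree as ET

# "Shives" rule set copied from the listing: (attribute, operation, value)
SHIVE_RULES = [
    ("Spectral_Mean", "lt", "95"),
    ("Spectral_Min", "lt", "75"),
    ("Texture_Mean", "lt", "111"),
    ("Area", "between", "75, 1000"),
    ("Convexity", "lt", "1.5"),
    ("Roundness", "lt", "0.4"),
    ("Elongation", "gt", "1.85"),
    ("Minor_Length", "lt", "17"),
]


def add_class(root, name, color, rules, weight, threshold="0.93"):
    """Hypothetical helper: append one <class> holding a single <rule>
    with one binary <attribute> element per (name, operation, value) entry."""
    cls = ET.SubElement(root, "class", color=color, name=name,
                        threshold=threshold)
    rule = ET.SubElement(cls, "rule", weight="1.00")
    for attr_name, operation, value in rules:
        ET.SubElement(rule, "attribute", algorithm="binary", band="0",
                      name=attr_name, operation=operation, tolerance="5",
                      value=value, weight=weight)
    return cls


root = ET.Element("classes", name="All classes")
add_class(root, "Shives", "#0000FF", SHIVE_RULES, weight="0.13")
# Serialized rule text; the listing additionally writes an XML declaration
xml_text = ET.tostring(root, encoding="unicode")
```

The SmallShives and Spots classes would be added the same way with their own tables. Whether ENVI is sensitive to attribute order or to the missing XML declaration in this variant has not been checked.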
The section of this script that dynamically writes the IDL script for use with ENVI has been omitted; the generated IDL script is appended after the Python script.

# -*- coding: utf-8 -*-
"""
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Raw image (.CR2) import and feature detection program
% Ashton Christy
% 25 Oct 2016
% Ported to Python 2.7 on 6 Dec 2018
% Ported to Python 3.7 on 18 Mar 2019
% Version 1.6
% Edited 18 Mar 2019
% Requires IDL with ENVI API
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
"""
# Load packages
import os, numpy as np, time, wx, exifread, cv2, shutil, rawpy, imageio
from datetime import datetime as dt
from lxml import etree
from lxml.etree import Element, SubElement
t0 = time.time()
# Folder selector dialog box
def get_path(wildcard):
    app = wx.App(None)
    style = wx.DD_DEFAULT_STYLE | wx.DD_DIR_MUST_EXIST
    dialog = wx.DirDialog(None, 'Select data folder', '', style=style)
    dialog.SetPath('C:\\Data\\PG\\')
    if dialog.ShowModal() == wx.ID_OK:
        path = dialog.GetPath()
    else:
        path = None
    dialog.Destroy()
    return path
# Load images from folder
try:
    datadir = get_path("*") + "\\"
except:
    input("Press Enter to exit...")
    exit(0)
#%%
# Define directories
datadir = "C:\\Data\\PG\\Full_sheets\\"
ENVIdir = "\\ENVI_PY\\"
IDLdir = "C:\\Progra~1\\Harris\\ENVI54\\IDL86\\bin\\bin.x86_64\\idl"
# Preallocate
img_names = []
exif_tags = []
exif_data = []
# Load and process images
files = [ii for ii in os.listdir(datadir)]
for file in files:
    if file.endswith(".CR2"):
        # Load image and EXIF data
        print("Processing "+file.split(".")[0]+"...")
        img_names.append(file.split(".")[0])
        f = open(datadir+file, "rb")
        exif_tags.append(exifread.process_file(f))
        f.close()
        # img = cv2.imread(datadir+file, cv2.IMREAD_GRAYSCALE)
        ######## Update for new OpenCV that can't read raw files directly
        with rawpy.imread(datadir+file) as raw:
            img = raw.postprocess()
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        # Contrast-enhance (CLAHE)
        img_ce = np.ones_like(img)
        xx = 1
        yy = 1
        tilesize = 200
        clahe = cv2.createCLAHE(clipLimit=2.55, tileGridSize=(12,12))
        for ii in range(np.floor(img.shape[0] / tilesize).astype(np.uint8)):
            for jj in range(np.floor(img.shape[1] / tilesize).astype(np.uint8)):
                imgwindow = img[xx:xx+tilesize, yy:yy+tilesize]
                imgwindow_ce = clahe.apply(imgwindow)
                img_ce[xx:xx+tilesize, yy:yy+tilesize] = imgwindow_ce
                yy += tilesize
            xx += tilesize
            yy = 1
        del img, clahe, imgwindow, imgwindow_ce
        # Brighten
        img_bright = img_ce.copy()
        del img_ce
        br_pass = 1
        #print("\tBrightness: " + str("%i" % np.mean(img_bright)))
        while np.mean(img_bright) <= 100:
            print("\tBrightening... (pass " + str(br_pass) + ")")
            img_bright = img_bright * 1.25
            img_bright = img_bright.astype(np.uint8)
            br_pass += 1
        # Crop
        img_crop = img_bright[0:np.floor(img_bright.shape[0] / tilesize*tilesize).astype(np.uint16), 0:np.floor(img_bright.shape[1] / tilesize*tilesize).astype(np.uint16)]
        del img_bright
        # Get rid of dark spots
        th, im_th = cv2.threshold(img_crop, 200, 255, cv2.THRESH_BINARY_INV);
        # Get mask from floodfilling, combine with original image
        mask = np.zeros((im_th.shape[0]+2, im_th.shape[1]+2), np.uint8)
        cv2.floodFill(im_th, mask, (0,0), 255);
        mask = np.logical_not(mask).astype("uint8") * 255
        img_out = mask[:-2,:-2] | img_crop
        # Write
        cv2.imwrite(datadir+file.split(".")[0]+"_adj.tiff", img_out)
        del img_crop, img_out, mask, im_th
# Delete old folders
for fname in os.listdir(os.path.dirname(os.path.dirname(datadir)) + ENVIdir):
    path = os.path.join(os.path.dirname(os.path.dirname(datadir)) + ENVIdir, fname)
    if (os.path.isdir(path)) and ("IMG_" in path):
        shutil.rmtree(path)
print("Images processed in:\n" + datadir + "\n")
#%%
# Define RUL file
RULtree = Element("classes", name="All classes")
# Shives - red str("%i" % (darkness*0.6))
class_shives = SubElement(RULtree, "class", color="#0000FF", name="Shives", threshold="0.93")
rule1 = SubElement(class_shives, "rule", weight="1.00")
attr = SubElement(rule1, "attribute", algorithm="binary", band="0", name="Spectral_Mean", operation="lt", tolerance="5", value="95", weight="0.13")
attr = SubElement(rule1, "attribute", algorithm="binary", band="0", name="Spectral_Min", operation="lt", tolerance="5", value="75", weight="0.13")
attr = SubElement(rule1, "attribute", algorithm="binary", band="0", name="Texture_Mean", operation="lt", tolerance="5", value="111", weight="0.13")
attr = SubElement(rule1, "attribute", algorithm="binary", band="0", name="Area", operation="between", tolerance="5", value="75, 1000", weight="0.13")
attr = SubElement(rule1, "attribute", algorithm="binary", band="0", name="Convexity", operation="lt", tolerance="5", value="1.5", weight="0.13")
attr = SubElement(rule1, "attribute", algorithm="binary", band="0", name="Roundness", operation="lt", tolerance="5", value="0.4", weight="0.13")
attr = SubElement(rule1, "attribute", algorithm="binary", band="0", name="Elongation", operation="gt", tolerance="5", value="1.85", weight="0.13")
attr = SubElement(rule1, "attribute", algorithm="binary", band="0", name="Minor_Length", operation="lt", tolerance="5", value="17", weight="0.13")
# Potential shives - yellow
class_smlshives = SubElement(RULtree, "class", color="#00FF00", name="SmallShives", threshold="0.93")
rule2 = SubElement(class_smlshives, "rule", weight="1.00")
attr = SubElement(rule2, "attribute", algorithm="binary", band="0", name="Spectral_Mean", operation="lt", tolerance="5", value="100", weight="0.13")
attr = SubElement(rule2, "attribute", algorithm="binary", band="0", name="Spectral_Min", operation="lt", tolerance="5", value="80", weight="0.13")
attr = SubElement(rule2, "attribute", algorithm="binary", band="0", name="Texture_Mean", operation="lt", tolerance="5", value="111", weight="0.13")
attr = SubElement(rule2, "attribute", algorithm="binary", band="0", name="Area", operation="between", tolerance="5", value="75, 2000", weight="0.13")
attr = SubElement(rule2, "attribute", algorithm="binary", band="0", name="Convexity", operation="lt", tolerance="5", value="2", weight="0.13")
attr = SubElement(rule2, "attribute", algorithm="binary", band="0", name="Roundness", operation="lt", tolerance="5", value="0.5", weight="0.13")
attr = SubElement(rule2, "attribute", algorithm="binary", band="0", name="Elongation", operation="gt", tolerance="5", value="1.6", weight="0.13")
attr = SubElement(rule2, "attribute", algorithm="binary", band="0", name="Minor_Length", operation="lt", tolerance="5", value="40", weight="0.13")
# Dark spots - blue
class_spots = SubElement(RULtree, "class", color="#FF0000", name="Spots", threshold="0.93")
rule3 = SubElement(class_spots, "rule", weight="1.00")
attr = SubElement(rule3, "attribute", algorithm="binary", band="0", name="Spectral_Mean", operation="lt", tolerance="5", value="78", weight="0.17")
attr = SubElement(rule3, "attribute", algorithm="binary", band="0", name="Texture_Mean", operation="lt", tolerance="5", value="111", weight="0.17")
attr = SubElement(rule3, "attribute", algorithm="binary", band="0", name="Area", operation="between", tolerance="5", value="50, 750", weight="0.17")
attr = SubElement(rule3, "attribute", algorithm="binary", band="0", name="Convexity", operation="lt", tolerance="5", value="1.5", weight="0.17")
attr = SubElement(rule3, "attribute", algorithm="binary", band="0", name="Roundness", operation="gt", tolerance="5", value="0.25", weight="0.17")
attr = SubElement(rule3, "attribute", algorithm="binary", band="0", name="Elongation", operation="lt", tolerance="5", value="2", weight="0.17")
# Correct formatting for IDL
RULstr = etree.tostring(RULtree, pretty_print=True, xml_declaration=True, encoding="utf-8").decode().replace("'",'"')
# Write RUL file
RULfile = os.path.dirname(os.path.dirname(datadir))+ENVIdir + "Shive_rules_PY_autogen.rul"
text_file = open(RULfile, "w")
text_file.write(RULstr)
text_file.close()
#%%
script_time = dt.now().strftime("%Y%m%d_%H%M%S")
# PRO script here
# Write PRO script
PROfile = os.path.dirname(os.path.dirname(datadir)) + "\\Shive_PY_autogen.pro"
text_file = open(PROfile, "w")
text_file.write(PRO_script_looped)
text_file.close()
#%%
# Execute PRO script in IDL
os.system(IDLdir+' '+PROfile+' -quiet')
t1 = time.time()
total = t1-t0
print("\nDone.")
print("Total time elapsed :", total, "\n")
input("Press Enter to exit...")
exit(0)

Below is the IDL .PRO script that is automatically generated by the preceding Python script.

compile_opt IDL2
PRINT, 'Beginning IDL script...'
CD, 'F:\Ashton\PG'
CD, CURRENT = base_dir
CD, 'F:\Ashton\PG\Adaptive_testing\'
CD, CURRENT = data_dir
files = FILE_SEARCH('F:\Ashton\PG\Adaptive_testing\*.CR2')
img_names = LIST()
shives = UINTARR(files.LENGTH)
smlshives = shives
spots = shives
results_filename = FILEPATH('Classification results (Adaptive_testing) - 20190225_151600.csv', ROOT_DIR=base_dir+'\ENVI_PY\')
FOR xx=0, (files.LENGTH-1) DO BEGIN
e = ENVI(/HEADLESS)
temp_name = STRSPLIT(files[xx], '\.', /EXTRACT)
sample_name = temp_name[4]
PRINT, 'Processing '+sample_name+' ('+(STRING(xx+1)).Compress()+' of '+(STRING(files.LENGTH)).Compress()+')...'
img_names.Add, sample_name
working_dir = base_dir+'\ENVI_PY\'+sample_name+'\'
FILE_MKDIR, working_dir
e.LOG_FILE = working_dir+sample_name+'_log.txt'
file = FILEPATH(sample_name+'_adj.tiff', ROOT_DIR=data_dir)
raster = e.OpenRaster(file)
fid = ENVIRastertoFID(raster)
rule_file = FILEPATH('Shive_rules_PY_autogen.rul', ROOT_DIR=base_dir+'\ENVI_PY\')
report_filename = working_dir+sample_name+'_report.txt'
confidence_raster_filename = working_dir+sample_name+'_confidence.dat'
class_raster_filename = working_dir+sample_name+'_class.dat'
seg_raster_filename = working_dir+sample_name+'_segmentation.dat'
vector_filename = working_dir+sample_name+'_vector.shp'
shives_filename = working_dir+sample_name+'_shives.shp'
smlshives_filename = working_dir+sample_name+'_smallshives.shp'
spots_filename = working_dir+sample_name+'_spots.shp'
dims = [-1L, 0, raster.ncolumns-1, 0, raster.nrows-1]
pos = LINDGEN(raster.nbands)
PRINT, 'Beginning classification...'
ENVI_DOIT, 'envi_fx_rulebased_doit', fid=fid, pos=pos, dims=dims, r_fid=r_fid, merge_level=98.5, scale_level=20.0, rule_filename=rule_file, segmentation_raster_filename=seg_raster_filename, report_filename=report_filename, confidence_raster_image=confidence_raster_filename, classification_raster_filename=class_raster_filename, vector_filename=vector_filename, /EXPORT_VECTOR_ATTRIBUTES
vector_shp = OBJ_NEW('IDLffShape', vector_filename)
vector_shp.GetProperty, N_ENTITIES = num_ent
n_shives = num_ent-2
shives_shp = OBJ_NEW('IDLffShape', FILEPATH(sample_name+'_shives.shp', ROOT_DIR=working_dir), /UPDATE, ENTITY_TYPE=5)
smlshives_shp = OBJ_NEW('IDLffShape', FILEPATH(sample_name+'_smallshives.shp', ROOT_DIR=working_dir), /UPDATE, ENTITY_TYPE=5)
spots_shp = OBJ_NEW('IDLffShape', FILEPATH(sample_name+'_spots.shp', ROOT_DIR=working_dir), /UPDATE, ENTITY_TYPE=5)
vector_shp.GetProperty, attribute_info = attr_struct
FOR ii=0, (attr_struct.LENGTH-1) DO BEGIN
shives_shp.AddAttribute, attr_struct[ii].NAME, attr_struct[ii].TYPE, attr_struct[ii].WIDTH, PRECISION=attr_struct[ii].PRECISION
smlshives_shp.AddAttribute, attr_struct[ii].NAME, attr_struct[ii].TYPE, attr_struct[ii].WIDTH,
...PRECISION=attr_struct[ii].PRECISION48 spots_shp.AddAttribute, attr_struct[ii].NAME, ...attr_struct[ii].TYPE, attr_struct[ii].WIDTH, ...PRECISION=attr_struct[ii].PRECISION49 ENDFOR50 pp=020751 qq=052 rr=053 FOR ii=0, (num_ent-1) DO BEGIN54 temp_attr = vector_shp.GetAttributes(ii)55 temp_ent = {IDL_SHAPE_ENTITY}56 CASE temp_attr.ATTRIBUTE_0 OF57 '1': BEGIN58 temp_ent = vector_shp.GetEntity(ii)59 shives_shp.PutEntity, temp_ent60 temp_struct = shives_shp.GetAttributes(/ATTRIBUTE_STRUCTURE)61 FOR jj=0, (N_TAGS(temp_struct)-1) DO BEGIN62 temp_struct.(jj) = temp_attr.(jj)63 ENDFOR64 shives_shp.SetAttributes, pp, temp_struct65 shives_shp.DestroyEntity, temp_ent66 pp++67 END68 '2': BEGIN69 temp_ent = vector_shp.GetEntity(ii)70 smlshives_shp.PutEntity, temp_ent71 temp_struct = smlshives_shp.GetAttributes(/ATTRIBUTE_STRUCTURE)72 FOR jj=0, (N_TAGS(temp_struct)-1) DO BEGIN73 temp_struct.(jj) = temp_attr.(jj)74 ENDFOR75 smlshives_shp.SetAttributes, qq, temp_struct76 smlshives_shp.DestroyEntity, temp_ent77 qq++78 END79 '3': BEGIN80 temp_ent = vector_shp.GetEntity(ii)81 spots_shp.PutEntity, temp_ent82 temp_struct = spots_shp.GetAttributes(/ATTRIBUTE_STRUCTURE)83 FOR jj=0, (N_TAGS(temp_struct)-1) DO BEGIN84 temp_struct.(jj) = temp_attr.(jj)85 ENDFOR86 spots_shp.SetAttributes, rr, temp_struct87 spots_shp.DestroyEntity, temp_ent88 rr++89 END90 ELSE: BREAK20891 ENDCASE92 ENDFOR93 shives[xx] = pp94 smlshives[xx] = qq95 spots[xx] = rr96 OBJ_DESTROY, shives_shp97 OBJ_DESTROY, vector_shp98 OBJ_DESTROY, spots_shp99 OBJ_DESTROY, smlshives_shp100 PRINT, 'Classification complete.'101 e.Close102 ENDFOR103 WRITE_CSV, results_filename, img_names.ToArray(), shives, ...smlshives, spots104 PRINT, 'Result file saved.'105 e = ENVI()106 results_dir = base_dir+'\ENVI_PY\Results (Adaptive_testing ...- 20190225_151600)'107 FILE_MKDIR, results_dir108 FOR xx=0, (files.LENGTH-1) DO BEGIN109 sample_name = img_names[xx]110 PRINT, 'Saving '+sample_name+' results ...('+(STRING(xx+1)).Compress()+' of 
...'+(STRING(files.LENGTH)).Compress()+')...'111 working_dir = base_dir+'\ENVI_PY\'+sample_name+'\'112 file = FILEPATH(sample_name+'_adj.tiff', ROOT_DIR=data_dir)113 shives_filename = working_dir+sample_name+'_shives.shp'114 smlshives_filename = working_dir+sample_name+'_smallshives.shp'115 spots_filename = working_dir+sample_name+'_spots.shp'116 results_image = FILEPATH(sample_name+'_results.tiff', ...ROOT_DIR=results_dir)117 view = e.GetView()118 ras_adj = e.OpenRaster(file)119 layer_adj = view.CreateLayer(ras_adj)120 view.Zoom, /FULL_EXTENT121 IF shives[xx] GT 0 THEN BEGIN122 vec_s = e.OpenVector(shives_filename)123 layer_s = view.CreateLayer(vec_s)124 layer_s.COLOR = 'red'125 ENDIF209126 IF smlshives[xx] GT 0 THEN BEGIN127 vec_ss = e.OpenVector(smlshives_filename)128 layer_ss = view.CreateLayer(vec_ss)129 layer_ss.COLOR = 'yellow'130 ENDIF131 IF spots[xx] GT 0 THEN BEGIN132 vec_sp = e.OpenVector(spots_filename)133 layer_sp = view.CreateLayer(vec_sp)134 layer_sp.COLOR = 'blue'135 ENDIF136 view.Export, results_image, 'TIFF'137 ras_adj.Close138 IF shives[xx] GT 0 THEN vec_s.Close139 IF smlshives[xx] GT 0 THEN vec_ss.Close140 IF spots[xx] GT 0 THEN vec_sp.Close141 view.Close142 ENDFOR143 WIDGET_CONTROL, /DESTROY144 e.Close145 PRINT, 'Result images saved.'146 EXIT147 END210Appendix BEmpirical Classification RuleDetermination ResultsThis appendix, namely Figure B.1, contains the raw results from the empirical de-termination of classification rules to be applied during Near-Infrared (NIR) imageprocessing with ENVI, as described in Chapter 6.2.5. Each graph shows the resultsfor an individual attribute as calculated by ENVI. Green dots illustrate the attributevalues contained in all image segments for a single test image, while yellow dotsillustrate the attribute values for probable shive segments, collected from across allsample images. Black horizontal lines represent some early attempts at rulemak-ing. 
For each graph, the x-axis indicates the region identifier of each image segment, which is assigned sequentially during image segmentation.

The attributes studied are as follows (left to right in Fig. B.1): area¹, convexity¹, spectral minimum, rectangular fit, elongation¹, spectral mean (labeled darkness), compactness, major axis length¹, solidity, roundness, form factor, minor axis length¹, texture range, texture mean, texture variance, and texture entropy.

Note that these results were obtained using 16-bit images, whose pixel values range from 0 to 65,535. Where appropriate, the values were converted for use with 8-bit images (pixel values 0 to 255).

¹ y-axis displayed on a logarithmic scale.

[Figure B.1: sixteen scatter plots, one per attribute (Area, Convexity, Minimum, Rectangular Fit, Elongation, Darkness, Compactness, Major Axis, Solidity, Roundness, Form Factor, Minor Axis, Texture Range, Texture Mean, Texture Variance, Texture Entropy); x-axis: segment region identifier, 0 to about 3600; y-axis: attribute value.]
Figure B.1: Raw results of empirical classification rule determinations; see the text above for details.
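The note on bit depth says that values measured on 16-bit images were converted for 8-bit images where appropriate, but it does not spell out the conversion. A minimal sketch, assuming a plain linear rescaling: because 65,535 = 257 × 255, one 8-bit level spans exactly 257 16-bit levels, so intensity-like attributes (spectral mean and minimum, texture mean and range) divide by 257, while a variance, which scales with the square of the intensity factor, divides by 257². Geometric attributes such as area, elongation, or roundness do not depend on bit depth. The function names and example values are illustrative, not taken from the thesis workflow.

```python
# Assumed linear mapping between the 16-bit (0-65535) and 8-bit (0-255) scales.
# 65535 / 255 == 257 exactly, so one 8-bit level spans 257 16-bit levels.
LEVELS_PER_8BIT_STEP = 65535 // 255  # 257


def intensity_to_8bit(value_16: float) -> float:
    """Rescale an intensity-like attribute (spectral mean/minimum, texture mean/range)."""
    return value_16 / LEVELS_PER_8BIT_STEP


def variance_to_8bit(variance_16: float) -> float:
    """Var(a * X) = a**2 * Var(X), so a variance divides by 257 squared."""
    return variance_16 / LEVELS_PER_8BIT_STEP ** 2


print(intensity_to_8bit(65535))  # 255.0
print(intensity_to_8bit(32896))  # 128.0
print(variance_to_8bit(66049))   # 1.0
```

Going the other way under the same assumption, an 8-bit threshold is multiplied by 257 to obtain its 16-bit equivalent.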
