MATHEMATICAL METHODS FOR AUTOMATED FLOW INJECTION ANALYSIS

by

OLIVER LEE

B.Sc. (Hons.), University of British Columbia, 1987

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES
(Department of Chemistry)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
November 1993
© Oliver Lee, 1993

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

(Signature)

Department of Chemistry
The University of British Columbia
Vancouver, Canada
November 26, 1993

ABSTRACT

Presently, the development of automated flow injection analyzers for use in both chemical research and process control is an active area of research in this laboratory. For these systems to operate for extended periods of time without human supervision, it is imperative that they behave in an intelligent manner. It is also desirable that they produce the best possible data. In this thesis, advanced mathematical strategies have been devised to enhance the robustness, analytical reliability and performance of automated flow injection analyzers.

The mathematical methods developed are based on signal representation by orthogonal polynomials. Any arbitrary function can be expanded as a weighted linear combination of orthogonal functions via a generalized Fourier expansion; together, the weights or coefficients form a spectrum.
The most familiar of these functions is the complex exponential set associated with the classical Fourier series expansion. In addition to this set, the Gram and Meixner polynomial families have been employed here. Since the latter are transients, they are particularly suited for representing flow injection data.

A peak shape analysis strategy is presented for automatic identification or classification of both physicochemical and mechanical faults during analyzer operation. The coefficients from decomposition of the flow injection peak into orthogonal functions were used as general descriptors of peak shape. Each set of functions generated a different spectrum and thus each offered a different view of the peak. The effects of white noise on the reproducibility of the individual coefficients were quantified. Meixner polynomials are orthogonal over a semi-infinite interval. In practice, these functions must be truncated, and orthogonality is lost between all functions in the set. However, a time scale parameter is available to stretch or compress these functions. It was found that for robust identification, the time scale should be set at the highest value possible at which orthogonality was still observed numerically over the subset of functions chosen for identification. An empirical model was formulated to allow direct computation of this time scale. The capability of these descriptors for peak classification was demonstrated with simulated data and principal components analysis. A simple method based on the Fisher weight was developed to optimize the number of coefficients to use for a given classification problem. From the results obtained with this method, 5 to 13 coefficients are recommended for most FIA systems; this applies to each of the three sets of orthogonal functions used.

Orthogonal function representation was also employed for digital filtering.
A comparison between two implementations, the finite impulse response filter and the indirect filter, was conducted with the Gram polynomials. The latter was found to be better suited for filtering typical (highly skewed) flow injection peaks. The efficacy of the three basis functions for indirect filtering of peak-shaped transient signals was subsequently compared. The Meixner filter was found to be best for highly skewed peaks, the Fourier filter was best for more symmetric peaks and the Gram filter performed somewhere in between. The problem of determining the optimal filter cut-off order was cast in terms of hierarchical model selection. Two solutions are presented: the Akaike information criterion and cross-validation. Near-optimal filtering can be achieved with either method and hence, automatic filtering is facilitated. This was demonstrated on both simulated and real data.

TABLE OF CONTENTS

ABSTRACT
LIST OF TABLES
LIST OF FIGURES
GLOSSARY
ACKNOWLEDGMENTS

Chapter 1  INTRODUCTION
  1.1  AUTOMATION
  1.2  HISTORICAL PERSPECTIVE ON AUTOMATION IN ANALYTICAL CHEMISTRY
  1.3  AUTOMATED CHEMICAL ANALYSIS
    1.3.1  Advantages
    1.3.2  Disadvantages
    1.3.3  Automating the process of chemical analysis
    1.3.4  Information handling: the element of intelligence
  1.4  FLOW INJECTION ANALYSIS
    1.4.1  Principles and mechanics
    1.4.2  General theory and models
    1.4.3  Applications
    1.4.4  Automated flow injection analysis
  1.5  SCOPE OF THE THESIS
  REFERENCES

Chapter 2  PEAK SHAPE ANALYSIS I
  2.1  OUTLINE OF PEAK SHAPE ANALYSIS STRATEGY
    2.1.1  Peak capture
    2.1.2  Fault detection
    2.1.3  Peak shape descriptors
  2.2  PATTERN RECOGNITION
    2.2.1  Data representation and structure
    2.2.2  Data preprocessing
    2.2.3  Principal components analysis
  REFERENCES

Chapter 3  PEAK SHAPE ANALYSIS II
  3.1  THEORY
    3.1.1  Definition of orthogonal functions
    3.1.2  General linear least squares approximation
    3.1.3  Generalized Fourier expansion
    3.1.4  Discrete complex exponential (Fourier) functions
    3.1.5  Gram polynomials
    3.1.6  Meixner polynomials
    3.1.7  Computation of expansion coefficients by least squares
  3.2  EXPERIMENTAL
    3.2.1  Simulation of flow injection peaks
    3.2.2  Experimental data
    3.2.3  Data processing
    3.2.4  Computational aspects
  3.3  GRAM AND FOURIER REPRESENTATIONS
    3.3.1  Spectral information
    3.3.2  Numerical studies in peak approximation
    3.3.3  Effect of temporal resolution on expansion coefficients
    3.3.4  Effects of experimental noise
  3.4  MEIXNER REPRESENTATION
    3.4.1  Determination of the critical time scale
    3.4.2  Calibration of the critical time scale
    3.4.3  Numerical studies in peak approximation
    3.4.4  Spectral information
    3.4.5  Effect of temporal resolution
    3.4.6  Effect of the time scale on the spectrum
    3.4.7  Effects of noise
  3.5  PATTERN RECOGNITION STUDIES
    3.5.1  Gram and Fourier representations
    3.5.2  Meixner representation
    3.5.3  Selection of coefficients
    3.5.4  The Fisher weight criterion
    3.5.5  Evaluation on real data
  3.6  SUMMARY
  REFERENCES

Chapter 4  DIGITAL FILTERING OF FLOW INJECTION DATA
  4.1  DIGITAL FILTERS
    4.1.1  Linear filters
    4.1.2  Impulse response
    4.1.3  Frequency response
    4.1.4  Indirect filtering in the spectral domain
    4.1.5  Finite impulse response (FIR) filtering
  4.2  DETERMINING THE OPTIMAL ORDER FOR INDIRECT FILTERS
    4.2.1  Indirect low-pass filtering as hierarchical model selection
    4.2.2  Akaike information-theoretic criterion (AIC)
    4.2.3  Cross-validation
  4.3  EXPERIMENTAL
    4.3.1  Simulation of flow injection data
    4.3.2  Experimental data
    4.3.3  General data processing
    4.3.4  Computational aspects
  4.4  COMPARISON OF FILTER IMPLEMENTATIONS - CASE STUDY WITH GRAM POLYNOMIALS
  4.5  COMPARISON OF BASIS FUNCTIONS FOR INDIRECT FILTERING
  4.6  EVALUATION OF AUTOMATIC NEAR-OPTIMAL SMOOTHING CRITERIA FOR INDIRECT FILTERING
  4.7  APPLICATION TO A GRADIENT CALIBRATION EXAMPLE
  4.8  SUMMARY
  REFERENCES

Chapter 5  CONCLUSIONS AND FURTHER WORK

Appendix A  COMPLEX EXPONENTIAL FUNCTIONS
  A.1  COMPLEX EXPONENTIAL FOURIER REPRESENTATIONS
  A.2  CORRESPONDENCE BETWEEN FOURIER REPRESENTATIONS
  REFERENCE

Appendix B  LEGENDRE POLYNOMIALS
  B.1  GENERAL
  B.2  GENERALIZED FOURIER EXPANSION IN LEGENDRE POLYNOMIALS
  REFERENCE

Appendix C  STATISTICAL MOMENTS OF THE TANKS-IN-SERIES MODEL
  C.1  GENERAL DERIVATIONS
  C.2  DERIVATION FOR SKEWNESS
  REFERENCES

Appendix D  PROGRAM LISTINGS
  D.1  GENERAL
  D.2  PROGRAM FOR COMPUTING THE GRAM POLYNOMIALS
  D.3  PROGRAM FOR COMPUTING THE MEIXNER POLYNOMIALS

LIST OF TABLES

Table 1.1  Technological developments which facilitate chemical automation
Table 1.2  Some generic features of intelligent automated chemical instruments
Table 1.3  Basic models and equations for flow injection analysis
Table 2.1  Some time-domain descriptors used for peak characterization
Table 3.1  Values for T, used and S for selected tanks-in-series curves
Table 3.2  Formulae used for simulating bifurcated FIA peak shapes
Table 3.3  Number of coefficients required to approximate simulated peaks to 12-bit precision with Gram and Fourier functions
Table 3.4  Gram and Legendre coefficients for peak A
Table 3.5  Discrete and continuous Fourier coefficients for peak A
Table 3.6  Optimal expansion orders for SSE curves shown in Figure 3.15
Table 3.7  Results from curve-fitting the critical time scale surface to (3.64)
Table 3.8  Number of coefficients required to approximate simulated peaks to 12-bit precision with Meixner polynomials
Table 3.9  Variability of optimal Meixner time scale parameter with noise level
Table 3.10  Manual class assignments for peaks obtained from the reaction between Fe(II) and 1,10-phenanthroline
Table 4.1  Comparison of Gram filter implementations on peak A
Table 4.2  Comparison of Gram filter implementations on peak C
Table 4.3  Comparison of Gram filter implementations on peak H
Table 4.4  Comparison of Gram filter implementations on peak P
Table 4.5  Indirect filter recommended for use in FIA based on peak skewness
Table 4.6  Comparison of optimization results for the indirect Gram filter
Table 4.7  Comparison of optimization results for the indirect Fourier filter
Table 4.8  Comparison of optimization results for the indirect Meixner filter
Table 4.9  Calibration peak: AIC and cross-validation values for three filters
Table B.1  The first 26 Legendre polynomials

LIST OF FIGURES

Figure 1.1  a) Feedback and b) feedforward schemes for system control. In the former, the output is evaluated and the system parameters are altered by the controller if necessary to achieve the desired output. In the latter, the input is evaluated and the system parameters are adjusted if necessary to suit the given type or quality of input.
Figure 1.2  Wade and Crouch model for an “intelligent” or fifth generation instrument (see text).
Figure 1.3  Basic components of an FIA system.
Figure 1.4  Schematic representation of the effects of convection and diffusion on the injected sample observed at an appropriate point downstream from injection. Stream moves from right to left. Shown are the transport conduit cross-section (top) and concentration profile (bottom) for a) no dispersion, b) dispersion from convection only, c) dispersion from diffusion only, and d) dispersion from both convection and diffusion.
Figure 2.1  Schematic of the approach employed in a descriptor-based strategy for peak qualification.
Figure 2.2  Frequency distributions of two categories a and b for variable x. The Fisher weight uses the distance between the means of the two distributions scaled by the sum of the variance of both distributions as a measure of discrimination.
Figure 2.3  A graphical interpretation of principal components analysis. Two-dimensional data are projected onto a line I such that the maximum variation in the data is preserved.
Figure 3.1  First five exponential functions associated with the Fourier transform. Both real (solid line) and imaginary (dash-dot line) functions are plotted. Circled numbers indicate sequence.
Figure 3.2  a) First five un-normalized Gram polynomials. Circled numbers indicate sequence. b) Change in amplitude of the quintic as a function of the number of points: N = 65 (dotted), 129 (dash-dot) and 1025 (solid). Note that the abscissa for graph (b) is normalized to one.
Figure 3.3  First five Meixner polynomials with a) C = 0.92 and b) C = 0.95. Circled numbers indicate sequence.
Figure 3.4  Selected tank-in-series curves showing the effect of N. Circled numbers identify the value of N used.
Figure 3.5  Flow injection manifold used for the determination of Fe(II) by reaction with 1,10-phenanthroline.
Figure 3.6  Simulated FIA peaks consisting of N points. Uppercase letters are labels which have been assigned in Tables 3.1 and 3.2.
Figure 3.7  a) Gram and b) Fourier magnitude spectra for the synthetic peaks shown in Figure 3.6. Spectra were computed from peaks containing 257 points. The coefficient vector has been normalized to L2-norm. For clarity, only the first 30 coefficients are plotted.
Figure 3.8  Plots of SSE against expansion order for the approximation of peaks in Figure 3.6 with a) Gram polynomials and b) Fourier basis functions. The dotted line indicates quantization error level for 12-bit data representations.
Figure 3.9  Composite correlation profile versus expansion order. Curve derived from the plot of the correlation of all even Gram polynomials with its zeroth order polynomial, interlaced with that of all odd Gram polynomials with its first order polynomial (see text). Data were derived from Gram polynomials consisting of 257 points. The dotted line indicates level of machine precision.
Figure 3.10  Plot of the order n for the Gram polynomials at which orthogonality is (numerically) breached against number of points N used in Gram polynomials (solid line); quadratic fit to data (dotted line).
Figure 3.11  Effect of the number of points used on Gram coefficient values for peak A: a) 3rd, b) 13th, and c) 22nd order coefficients; and for peak 0: d) 3rd, e) 13th, and f) 22nd order coefficients. The value of the Legendre coefficient is shown with a dotted line.
Figure 3.12  Absolute relative range of Gram coefficient values for the first 26 coefficients over the intervals: a) 65-129, b) 129-257, c) 257-513, d) 513-1025, e) 1025-2049, and f) 65-∞ points for peak A.
Figure 3.13  Absolute relative range of Gram coefficient values for each coefficient over the intervals: a) 65-129, b) 129-257, c) 257-513, d) 513-1025, e) 1025-2049, and f) 65-∞ points for peak P. Values in graphs indicate magnitude of off-scale bars.
Figure 3.14  Absolute relative range of Fourier (solid) and Gram (dotted) coefficient values for each coefficient over the intervals: a) 65-129, b) 129-257, c) 257-513, d) 513-1025, e) 1025-2049, and f) 65-∞ for peak A.
Figure 3.15  Effect of noise on approximation. Gram basis on a) peak A, and b) peak Q. Fourier basis on c) peak A and d) peak Q. The noise levels shown in graph (a) apply to other graphs also and are expressed as percentage standard deviations with respect to peak height.
Figure 3.16  Plot of the scalar product of the highest order polynomial in a subset of Meixner functions against time scale. The number of functions in the subset is (from right to left in the graph): 1, 5, 10, 15, 20, 25, 30, 35 and 40.
Figure 3.17  Plot of RCOND against time scale C for a subset of Meixner polynomials. The number of functions in the subset is (from right to left in the graph): 1, 5, 10, 15, 20, 25, 30, 35 and 40.
Figure 3.18  Surface plot of critical time scale computed via RCOND against expansion order and datapoints.
Figure 3.19  Plots of SSE against time scale C for expansion orders of (from top to bottom in the graphs): 1, 5, 10, 15, 20, 25, 30, 35 and 40. Dotted lines indicate location of the critical time scale for that order, as computed in sub-section 3.4.2.
Figure 3.20  Plots of RCSSE against time scale C for expansion orders of (from top to bottom in the graphs): 1, 5, 10, 15, 20, 25, 30, 35 and 40 for a) peak A, and b) peak P.
Figure 3.21  Plots of SSE against expansion order for the approximation of peaks in Figure 3.6 with Meixner polynomials. Dotted line indicates quantization error level for 12-bit data representations.
Figure 3.22  Meixner spectra composed of a) 11 coefficients and b) 21 coefficients for the six synthetic peaks shown in Figure 3.6. Spectra were computed, via the RCSSE method, from peaks containing 257 points. The coefficient vector has been normalized to L2-norm.
Figure 3.23  Meixner spectra composed of a) 11 coefficients (C = 0.8856), and b) 21 coefficients (C = 0.8122) for the six synthetic peaks shown in Figure 3.6. Spectra were computed at the critical time scale from peaks containing 257 points. The coefficient vector has been normalized to L2-norm.
Figure 3.24  Effect of the time scale parameter on coefficient values for a) peak A and b) peak P. The first six coefficients (labeled by the circled numbers) from a tenth order expansion are plotted. The coefficient vector was not normalized to unit length.
Figure 3.25  SSE curves as a function of time scale C for a) peak A, 1% noise level, b) peak A, 10% noise level, c) peak P, 1% noise level, d) peak P, 10% noise level. Expansion orders of 1, 5, 10, 15, 20, 25, 30, 35 and 40 are shown. Circled numbers indicate the sequence.
Figure 3.26  a) Principal components analysis on simulated peaks using the first to fifth order Gram coefficients. b) Principal components analysis on simulated peaks using the first to tenth order Gram coefficients. c) Principal components analysis on simulated peaks using the first to fifteenth order Gram coefficients. d) Principal components analysis on simulated peaks using the first to twenty-fifth order Gram coefficients. Noise level of 3% added to peaks. Scores plot using the i) first and third principal components, ii) first and second principal components (variance accounted for by the principal components is shown in brackets next to the axis labels); iii) first principal component loadings; iv) second principal component loadings; v) third principal component loadings. Noise-free peaks are shown with a ‘+’ in (i) and (ii).
Figure 3.27  a) Principal components analysis on simulated peaks using the first to third order Fourier coefficients. b) Principal components analysis on simulated peaks using the first to fifth order Fourier coefficients. c) Principal components analysis on simulated peaks using the first to tenth order Fourier coefficients. d) Principal components analysis on simulated peaks using the first to fifteenth order Fourier coefficients. Noise level of 3% added to peaks. i) Scores plot using the first two principal components (variance accounted for by the principal components is shown in brackets next to the axis labels), ii) first principal component loadings, iii) second principal component loadings. Noise-free peaks are shown with a ‘+’ in (i). Dotted lines help in showing the membership of points.
Figure 3.28  Principal components analysis on simulated peaks using the zeroth to fifth order Meixner coefficients. Critical time scale method used. Noise level of 3% added to peaks. Scores plot using the i) first and third principal components, ii) first and second principal components (variance accounted for by the principal components is shown in brackets next to the axis labels); iii) first principal component loadings; iv) second principal component loadings; v) third principal component loadings. Noise-free peaks are shown with a ‘+’ in (i) and (ii).
Figure 3.29  Principal components analysis on simulated peaks using the zeroth to fifth order Meixner coefficients. RCSSE method used. Noise level of 3% added to peaks. Scores plot using the i) first and third principal components, ii) first and second principal components (variance accounted for by the principal components is shown in brackets next to the axis labels); iii) first principal component loadings; iv) second principal component loadings; v) third principal component loadings. Noise-free peaks are shown with a ‘+’ in (i) and (ii).
Figure 3.30  Plot of Fisher weight criterion (FWC) against expansion order for a) Gram representation (three principal components), b) Fourier representation (two principal components), c) Meixner representation (critical time scale method, three principal components), and d) Meixner representation (RCSSE method, three principal components), over the noise levels shown on graph (a). The zeroth order is excluded for the Gram and Fourier representations.
Figure 3.31  Matrix of peak shapes from the Fe(II)-1,10-phenanthroline reaction resulting from different combinations of sodium acetate (the pH modifier) and 1,10-phenanthroline (reagent) concentrations. Numbers in panels label the peaks.
Figure 3.32  FWC curves for a) Gram, b) Fourier, and c) Meixner representations of peaks from the Fe(II)-1,10-phenanthroline reaction. Three principal components were used. Note that the zeroth order Gram and Fourier coefficients were excluded from the analysis.
Figure 3.33  Results of PCA on Gram representation of peaks from the Fe(II)-1,10-phenanthroline reaction. Expansion was performed over 8 coefficients with zeroth order coefficient excluded. i) Scores plot using first and second principal components. Position of each peak is at the center of the numbers drawn, except when a pointer is used. Number only: non-bifurcated; circled number: marginal; and boxed numbers: bifurcated. Percentages in brackets on score plot axis labels refer to variance accounted for by that principal component. ii) First principal component loadings. iii) Second principal component loadings.
Figure 3.34  Results of PCA on Fourier representation of peaks from the Fe(II)-1,10-phenanthroline reaction. Expansion was performed over 8 coefficients with zeroth order coefficient excluded. i) Scores plot using first and second principal components. Position of each peak is at the center of the numbers drawn, except when a pointer is used. Number only: non-bifurcated; circled number: marginal; and boxed numbers: bifurcated. Percentages in brackets on score plot axis labels refer to variance accounted for by that principal component. ii) First principal component loadings. iii) Second principal component loadings.
Figure 3.35  Results of PCA on Meixner representation of peaks from the Fe(II)-1,10-phenanthroline reaction. Expansion was performed over 8 coefficients with zeroth order coefficient included. i) Scores plot using first and second principal components. Position of each peak is at the center of the numbers drawn, except when a pointer is used. Number only: non-bifurcated; circled number: marginal; and boxed numbers: bifurcated. Percentages in brackets on score plot axis labels refer to variance accounted for by that principal component. ii) First principal component loadings. iii) Second principal component loadings.
Figure 4.1  Diagram of a filter and its two primary applications.
Figure 4.2  Frequency responses of a) 11-point Savitzky-Golay filter of orders 2, 4 and 6, and b) quadratic filter with 5, 11 and 21 points.
Figure 4.3  Frequency responses of second order Meixner impulse response functions of length 11. A time scale of 0.25 was used. Circled numbers indicate the row of the estimation matrix from whence the impulse responses were taken. The frequency response of the 11-point Savitzky-Golay quadratic midpoint smoother is shown with a dotted line.
Figure 4.4  Simulated peaks used for studies on digital filtering.
Figure 4.5  Flow injection manifold and ICP operating conditions used for the study of the effect of Mg and acetic acid concentration on optical emission intensity.
Figure 4.6  Plot of SSE against Gram filter order for peak C and 20 Gaussian-distributed random noise sequences (S/N = 20). Each curve yields a minimum SSE and thus an optimal filter order.
Figure 4.7  The effect of skewness on filter performance. Gram, Fourier and Meixner filters were compared at signal-to-noise levels of a) 100, b) 20, c) 10, and d) 5.
Figure 4.8  The most normally-distributed noise sequence out of the set generated.
Figure 4.9  Residuals from indirect Gram filtering of simulated peaks (labeled by capital letter). Residuals were computed from the most normal noise sequence used in the study.
Figure 4.10  Residuals from indirect Fourier filtering of simulated peaks (labeled by capital letter). Residuals were computed from the most normal noise sequence used in the study.
Figure 4.11  Residuals from indirect Meixner filtering of simulated peaks (labeled by capital letter). Residuals were computed from the most normal noise sequence used in the study.
Figure 4.12  Plots of a) AIC and b) PRESS against Gram filter order for peak C and 20 Gaussian-distributed random noise sequences. The S/N ratio was 20. Note log scale for ordinate of graph (b).
Figure 4.13  Principle of gradient calibration. On the right is the usual calibration method in which replicate injections of standard solutions are made. This yields a calibration curve of say, absorbance (A) against analyte concentration (middle), which can then be used to obtain a relationship between time and concentration for a single standard injection as shown on the left.
Figure 4.14  Near-optimal indirect digital filtering of a flow injection calibration peak. a) Original calibration peak with noise; b) peak after indirect Fourier smoothing; c) peak after indirect Gram filtering; d) peak after indirect Meixner filtering. Dotted curves in (b), (c) and (d) are reproductions of the curve in (a).

GLOSSARY

a.u.  arbitrary units
ASCFA  air-segmented continuous-flow analysis
ADC  analog-to-digital converter
AIC  Akaike information-theoretic criterion
DFT  discrete Fourier transform
FIA  flow injection analysis
FIR  finite-duration impulse response (filter)
FFT  fast Fourier transform
FWC  Fisher weight criterion
FWHM  full width at half maximum (of a peak)
IIR  infinite-duration impulse response (filter)
IUPAC  The International Union of Pure and Applied Chemistry
PCA  principal components analysis
PRESS  predictive residual error sum of squares
RCOND  reciprocal condition number
RCSSE  reciprocal condition number multiplied by the logarithm of the sum of the squares of the errors
SIMCA  soft independent modeling by class analogy
S/N  signal-to-noise ratio
SSE  sum of the squares of the errors
SVD  singular value decomposition

z  variable (real or complex)
z̄  mean value of variable z
z*  complex conjugate of variable z
ẑ  estimate of a variable z
z(t)  continuous function z in variable t
ẑ(t)  approximation or model of a function z in variable t
z[k]  discrete function z with index k
z_k  digital function z with index k
Re(z)  real part of z
Im(z)  imaginary part of z
|z|  magnitude of z; equal to (z z*)^(1/2) or [Re(z)^2 + Im(z)^2]^(1/2)
z  vector of elements in z; i.e. {z_1, z_2, ..., z_N}
||z||  length of vector z, based on Euclidean norm
Z  matrix of elements in z; i.e. [z_ij]
[z]  matrix consisting of the elements z
Z#  pseudoinverse of Z
Z^T  transpose of Z
Z^-1  matrix inverse of Z
c  concentration
e  natural logarithmic base, 2.71828...
E()  expectation operator
H(ω)  frequency response
j  [definition illegible]
k  time index
log_e  logarithmic operator
min()  minimum operator; i.e. take the minimum value of
max()  maximum operator; i.e. take the maximum value of
L  (1) manifold length, or (2) filter length
L2  Lebesgue domain of square integrable functions
N  number of mixing stages in a tanks-in-series model
r  Pearson’s product-moment coefficient of correlation
S  skewness
t  (1) time (2) t statistic of the t-test
T  time period
[symbol illegible]  mean residence time in one tank for tanks-in-series model
δ  Kronecker delta operator
π  3.14159...
[symbol illegible]  phase (i.e. of z); equal to arctan{Im(z)/Re(z)}
σ  sample standard deviation
σ^2  sample variance
ω  angular frequency
C  Meixner polynomial time scale parameter
[symbol illegible]  critical Meixner polynomial time scale parameter
Δ  forward difference operator
∞  infinity
[symbol illegible]  define as
≈  approximately equal to
⟨ | ⟩  inner product (Dirac notation)
∈  element of (a set)
[symbol illegible]  denotes transform pair relationship
!  factorial operator
(a over b)  binomial coefficient; equal to a!/[b!(a - b)!]
(a)^(b)  generalized factorial function; equal to a!/(a - b)!
[a, b]  closed interval bounded by a and b (includes a and b)
(a, b)  open interval bounded by a and b (excludes a and b)
[a, b)  interval bounded by a and b (includes a, excludes b)

ACKNOWLEDGMENTS

My deepest appreciation goes to Dr. Adrian P. Wade, whose support and encouragement were invaluable. His patience was unshakable even when I (on more than one occasion) pursued a different avenue of research. I must have been quite a “project” for him. His unparalleled enthusiasm for research will always be an inspiration. I would also like to thank our “post-Doc’s”: Dr. Peter D. Wentzell, whose research prowess is unmatched; and Dr. Lawrence E.
Bowman, who has information on just about everything, especially water pumps.

I would like to acknowledge Dr. Terrance D. Hettipathirana for kindly allowing the author use of his FIA-ICP-OES data.

My gratitude goes out to the “bestest”, “funnest” (never boring) bunch of people anyone could hope to have worked with: Adrian “I’m not getting married yet” Cook, who trusted me enough to let me loose on the California highway in his VW Rabbit (he’ll never do that again!); Bruce “I’m a proud Winnipeger, I just don’t want to live there!” Munro, who does the best imitations of anyone I know and who still believes to this very day that he can beat me fairly in squash; Helen “I know something you don’t” Nicolidakis-Soulsbury, who managed to keep me awake towards the end of a marathon 56-hour no-sleep period by relating an endless stream of gossip/stories that cannot be put in print; Julie “coffee?” Homer, who was always quick to inflate my ego; Kevin “...” Soulsbury, who is the only person I know who can carry on a conversation all by himself; Megan “happy” Sterling, for the water puddle fights; Mike “Know what I mean Vern?” Kester, who actually spends more time in the bathroom in the morning than me!; Dr. Paul “FIA god” Shiundu, who has an awesome Darth Vader voice and who took it in stride when I got the two of us busted at U. of W.; and last but not least, Timothy “HA HA HA” Crowther, who can get you to do the silliest things.

A heartfelt thanks goes out to Dr. Donald Yapp for his friendship over the years and who heard most of my gripes, to Dr. Yoshikata Koga, whose help in general is very much appreciated and to Dr. Bruce Todd, for proofreading portions of the thesis.

Several people at the Pulp and Paper Center deserve acknowledgment as well: Dr.
Guy “ticked-off!!!” Dumont, for many helpful ideas and suggestions, for access to his computing facilities and for proofreading this thesis; Rick “Unix god” Morrison, who was always there to handle (and put up with) my computer problems (which were of a type only I could generate); Dr. Ye Fu, for his advice on mathematical/engineering matters; and Kristinn “It’s not a bald spot, it’s a solar panel for a sex machine!” Kristinsen, who did a bit of both. A sincere thank you is also extended to head secretary Lisa “Whad’ya want now?” Brandly, Administrative Assistant Georgina “I didn’t see it...” White, and everyone else there who had to put up with me.

Finally, I would like to express my fondest appreciation to my family and especially Teresa, whose love and support cannot be compared.

If, therefore, anyone wishes to search out the truth of things in serious earnest, he ought not to select one special science; for all the sciences are conjoined with each other and interdependent.
RENE DESCARTES, “The Regulae Ad Directionem Ingenii”

Chapter 1
Introduction

He is extending his eyes with radar; his tongue and his ears through telecommunications; his muscle and body structure through mechanization. He is extending his own energies by the generation and transmission of power and his nervous system and his thinking and decision making faculties through automation.
SIR LEON BAGRIT, “The Age of Automation”

Innovations and discoveries aside, a historian currently reviewing the development of analytical chemistry would likely identify two revolutionary phases. Near the beginning of this century, the adaptation of theory and experimental methods from physical chemistry transformed analytical chemistry from an entirely empirical artform into a reputable science [1]. Then, roughly 45 years later, the advent of instrumental methods greatly increased both the pace and boundaries of chemical analysis.
Nowadays, it is difficult to imagine chemistry conducted without the benefit of, for example, optical or magnetic spectrometers. The progressive use of physical methods continues to enrich and change the face of chemical measurement [1].

Presently, analytical chemistry is in the midst of its third revolution. Over the last two decades, advances in the electronics industry have culminated in the development of inexpensive, versatile integrated circuits, namely, microprocessors and (micro)computers. These are radically changing the operation and functionality of chemical instrumentation [2]. In particular, the microcomputer has become a vital resource for the analytical chemist, who now uses it routinely for such tasks as automatic data acquisition and data processing, experimental design and control, and simulation - some of which can only be effectively conducted with the aid of a computer. Indeed, it has been said that the future of chemical analysis may now be inextricably linked with that of the computer [3]. Already, analytical chemistry is undergoing the next logical step in the evolution of chemical analysis: automation.

1.1 AUTOMATION

Automation has been commonly referred to as a means by which a machine is able to operate with minimal human intervention and maximum efficiency [4]. From a practical point of view, this definition is rather vague. The International Union of Pure and Applied Chemistry (IUPAC) has provided a more precise definition: ‘the use of combinations of mechanical and instrumental devices to replace, refine, extend, or supplement human effort and facilities in the performance of a given process, in which at least one major operation is controlled without human intervention, by a feedback mechanism’ [5]. Implicitly contained within the definition of automation are two key elements: mechanization and the handling of information.
As defined by IUPAC, the former makes ‘use of mechanical devices to replace, refine, extend or supplement human effort’ [5]. In general, it is motivated by reduction in cost and worker skill level, removal of tedium on the part of the worker, rapid turn-around time, reliability and accuracy [4]. The term mechanization has often been incorrectly used interchangeably with that of automation. Handel [6] has given an outline of the fundamental differences between the two. Clearly, automation is a natural evolution from mechanization, which only serves to simplify or ease the tasks that humans need to perform but does not replace them, and where human operators are still required to maintain control over and to operate the machine. Automation, in its purest form, removes the human operator from the process entirely. Hence, some capacity for intelligent handling of information is needed, for example, to adapt a process to account for variability of inputs or to deliberately alter the characteristics of the output [7]. Information handling via feedback results in what control engineers refer to as closed-loop operation. Here, a control computer compares what is happening to what should be happening and issues control signals to bring the process (back) into line; that is, the operational criterion is maintenance of the desired output. Feedback only indicates the direction in which information flows and can involve many strategies of machine adaptation and control. Another scheme is feedforward, in which subsequent processing is modified to suit the current type of input. Since this approach is not closed-loop, it is less reliable, and precise relationships between the input and output must be available. These two schemes are diagrammed in Figure 1.1. Betteridge has referred to information handling as the conceptual aspect of automation [8].
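The contrast between the feedback and feedforward schemes described above can be made concrete with a minimal sketch. It is an illustration only: the toy process (output equals control setting plus a disturbance), the function names, the gain and the numerical values are all assumptions made here and do not come from the thesis.

```python
def process(control, disturbance):
    """Hypothetical process: the output is the control setting plus a disturbance."""
    return control + disturbance


def feedback_control(setpoint, disturbance, gain=0.5, steps=50):
    """Closed loop: measure the output and correct the control from the error."""
    control = 0.0
    output = process(control, disturbance)
    for _ in range(steps):
        error = setpoint - output      # what should happen minus what is happening
        control += gain * error        # nudge the process back into line
        output = process(control, disturbance)
    return output


def feedforward_control(setpoint, input_quality, model):
    """Open loop: choose the control from the input and an assumed input/output model."""
    predicted_disturbance = model(input_quality)
    return setpoint - predicted_disturbance


if __name__ == "__main__":
    true_disturbance = 4.0             # actual effect of an input of quality 2.0
    print(feedback_control(10.0, true_disturbance))
    exact = feedforward_control(10.0, 2.0, lambda q: 2.0 * q)
    wrong = feedforward_control(10.0, 2.0, lambda q: 1.5 * q)
    print(process(exact, true_disturbance), process(wrong, true_disturbance))
```

In this sketch the feedback loop settles on the setpoint without knowing the disturbance, whereas the feedforward result is only as good as the assumed model: an inexact model leaves a permanent offset in the output.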
Automation, in general, is driven by sociological, political and economic forces, and is facilitated by technological developments. More than ever, the analytical chemist is faced with the task of providing fast, reliable chemical information to various sectors of society, private and public. In industry, for example, the challenge of international competition and government regulations means that a company must have ready access to analytical data such that critical business decisions can be made or process optimizations can be performed. This ever-increasing demand for rapid, selective chemical analysis has prompted the need to develop cost-effective solutions. The result is a well-established market for automated instrumentation [9].

Figure 1.1 a) Feedback and b) feedforward schemes for system control. In the former, the output is evaluated and the system parameters are altered by the controller if necessary to achieve the desired output. In the latter, the input is evaluated and the system parameters are adjusted if necessary to suit the given type or quality of input.

Presently, the pace of technological innovation is accelerating at a phenomenal rate. This is partly due to the increasing availability of powerful microcomputers - finally, control and computations that were theoretically possible but previously impractical are now feasible. Research in the laboratories of universities, public and private sectors, and commercial instrument manufacturers has contributed substantially to the progress of automation [10]. The scope of this research crosses disciplinary boundaries and targets both mechanical and conceptual aspects. Consequently, the analytical chemist must be aware of progress in these other fields. Suitable tools may then be adapted for use in automated chemical analysis systems. A list of significant technologies, categorized by scientific discipline, is given in Table 1.1.
These generally show interdependence in terms of development, and the state of the art is determined by the least-developed system component. For example, robotics depends on, amongst other things, progress in microprocessor technology, pattern recognition and actuators. Advances in laboratory robotics are presently hindered by slow technological progress in three-dimensional visual perception.

Unquestionably, the availability of inexpensive digital microcomputers has made automation economically feasible in analytical laboratories. Arguably, the microcomputer is the single most important component since it can i) ultimately perform most of the roles of an operator, ii) incorporate programmable hardware (programmable pumps, detectors, etc.), which enhances versatility, and iii) facilitate the progress and/or feasibility of many other technologies, e.g. pattern recognition. Indeed, digital computers have been largely responsible for the creation of at least one field, digital signal processing, which itself has ramifications for automation. Digital computerized automation may be referred to as “soft automation” since automation is mediated through software.

Perhaps the ultimate form of automation is a robot. Like humans, which they are designed to mimic, robots are extremely versatile and provide for “flexible automation” [11]. Some limited systems (e.g.
Zymark Corporation’s Zymate series, Hewlett-Packard’s ORCA) have already been developed for chemical analysis; they perform well in industrial laboratories but their complexity and cost have thus far precluded widespread use. The development of a sentient, intelligent robot which can move from site to site, collect and analyze samples autonomously, and report the results to a human supervisor is still, (un)fortunately, a thing of the distant future.

Table 1.1 Technological developments which facilitate chemical automation

Scientific discipline             Technology
Chemistry                         Chemical and biochemical sensors, batch methods of analysis, continuous flow methods of analysis
Computer Science                  Software and computing languages, artificial intelligence (including expert systems and natural language processing), database theory
Electrical Engineering            Robotics, micromechanics, actuators, sensors, information theory, digital signal processing, automatic control theory, cybernetics
Electronics                       Microprocessors and computers, actuators, sensors
Materials Science                 Durable materials for construction
Mechanical Engineering            Micromechanics, sensors
Mathematics                       Numerical computation
Physics                           Physical methods of chemical detection
Psychology and Cognitive Science  Pattern recognition, knowledge representation
Statistics                        Statistical (multivariate) methods of analysis

1.2 HISTORICAL PERSPECTIVE ON AUTOMATION IN ANALYTICAL CHEMISTRY

Perhaps the first true applications of automation in analytical chemistry occurred in industrial chemical process control. In March of 1959, after 3 years (and 30 people years) of development, Thompson Ramo Wooldridge (an aerospace company) and Texaco installed a computer-controlled system for a polymerization unit at Texaco’s Port Arthur refinery [12]. The digital computer received temperature, pressure, flow, and chromatographic inputs from 10 reactors and then computed and actuated the desired proportions of raw feed.
It also carried out chromatograph calibration daily, checked chromatograph operation, ran self-diagnostics and checked input ports, logged data, and computed the approximate time for replacement of catalyst. Another example is an “automatic experimenter,” code-named OPCON, developed by Westinghouse Corporation for the optimization of chemical processes [13]. It was employed to perform adaptive optimization of operating parameters for a chemical plant to achieve maximum production. The first application was to a pilot plant of Dow Chemical Company, for the conversion of ethylbenzene to styrene and hydrogen in 1959 [13]. The interval between parameter changes ranged from 45 to 150 minutes and the optimal conditions could be established in 3 to 5 days.

The first widespread use of “automation” in routine chemical analysis began with the introduction of the Technicon AutoAnalyzer™ in 1957 [14]. Dr. Leonard T. Skeggs Jr., a biochemist, devised a method of continuous-flow analysis to perform repetitive biomedical tests “automatically.” Necessity being the mother of invention, his work was in response to the ever-increasing demand for such analyses in clinical laboratories [15, 16]. The sample was aspirated into a stream and subsequently reacted with various reagents - introduced into the stream at specified points - en route to detection. Since the method segmented the analytical stream with air bubbles to minimize sample carryover,¹ it was subsequently referred to as air-segmented continuous-flow analysis (ASCFA) to differentiate it from other continuous flow methods. After several discouraging attempts to interest various companies to manufacture the equipment,² Skeggs finally persuaded Technicon Instruments Corporation to undertake the project [14].
The result of their collaboration, the AutoAnalyzer™, quickly became a fixture in clinical laboratories around the world, and later in academic, government and industrial laboratories as its versatility and general applicability to wet chemical analysis was gradually recognized. A multichannel variant known as SMA™ (or SMAC™ when a computer was included for data processing) was later developed. From the 1960s to the early 1980s, the name AutoAnalyzer™ was synonymous with the phrase “automated chemical analysis.” Although the proliferation of ASCFA methods has recently been slowed by the emergence of flow injection analysis (a dynamic continuous flow method described later), ASCFA is still the method of choice for many applications. The principles, applications, and instrumentation of ASCFA have been documented by Furman [18].

¹ Contamination of subsequent samples by parts of the current sample left behind in the manifold.
² Roughly 20 years later, this same problem was to confront the principal developers of another continuous flow method: flow injection analysis [17].

Finally, references to automation here have been colloquial and followed terminology in the field - hence, the use of quotes. In consideration of the IUPAC definition for automation, these instruments were far from automatic: they merely mechanized wet chemical assay. Technicon designers apparently never outfitted a commercial ASCFA instrument for closed-loop operation.

Beginning in the early 1980s, the Zymark Corporation pioneered and marketed robotic systems for chemical analysis [19]. In contrast to the continuous flow approach employed by the AutoAnalyzer™, this system used batch analysis; that is, chemical analysis was carried out in a reaction vessel or container.
The robot was strategically placed in the middle of a set of stations, each of which was designed to carry out a single function such as sample weighing, decantation, reagent addition, mixing, digestion, temperature-controlled incubation, or detection. The robot was used to transport the sample from one station to another so that successive steps in the analysis could be performed.

1.3 AUTOMATED CHEMICAL ANALYSIS

Automation is a powerful approach for the implementation of analytical techniques and methods. It minimizes or eliminates human intervention in the mechanics of chemical analysis, thereby releasing the chemist for more intellectual pursuits such as methods development, chemical synthesis, modeling, and strategies for data interpretation. In essence, the automaton assumes the role of a technician - a very capable one. The creative and scientific aspects of chemistry (e.g. design, problem solving, etc.) remain, as they always have, with the analytical chemist. For example, a proper understanding of the fundamental relations between chemical species and their chemical and physical properties is imperative if one is to i) select a physicochemical property for measurement, ii) assess its applicability (scope and limitations) for a particular problem, iii) adapt and improve existing methods to cope with difficult samples, to use analogous chemistries, and/or to meet analytical requirements such as speed or sensitivity [20]. Such knowledge is also essential for proper design of an automated instrument. Automation should not be viewed solely as a means to achieve a human-free laboratory (even though that is ideal for analysis in remote sites, or for samples which are dangerous to human operators, for example); it should also be seen as a powerful addition to the chemist’s repertoire of tools.
1.3.1 Advantages

Automation of chemical analysis has obvious economic advantages due to greater work throughput, full utilization of resources, cost reduction per analysis, and reduction in (skilled) labour. The Technicon AutoAnalyzer™ facilitated new types of analysis; the number and variety of analyses done in clinical laboratories greatly increased after its introduction. Manual counterparts of these analyses were either too expensive, unreliable or lengthy [21]. Because automated analyzers can control operating conditions with high precision, they may be used to perform difficult and/or exacting procedures with greater reliability and reproducibility. An example is the determination of very low concentrations of selenium - as the hydride - in water. Here, computer-controlled sequencing made the procedure feasible by improving reproducibility [22]. Several analyses can often be done together. In fact, the feasibility of ASCFA was demonstrated on an analyzer which determined urea and glucose simultaneously [14]. Safety is promoted since the analyst is partially or completely isolated from hazardous chemicals or environments. Automation also encourages standardization. Indeed, the United States Environmental Protection Agency’s standard methods quote Technicon part numbers for their analyzer components. It is fair to say that analyzers of the same make will perform the same analysis with less variation in the result than had the analysis been conducted by different human analysts - this situation becoming more likely as the procedure becomes more complex. Hence, long-term and between-laboratory variations are expected to decrease. The quality of data is also enhanced by increased precision, more frequent calibration and better drift control of instruments. Finally, automation facilitates collection of more data, real-time data processing and extraction of information.
1.3.2 Disadvantages

Although automation can improve the capacity and performance of analytical instruments, it is not without limitations. Processes such as filtration are difficult to automate in a robust manner. Manual analytical methods may involve chemicals which are incompatible with equipment commonly used in automated systems [23]. The necessary hardware may be so expensive that the capital cost of investment cannot be recovered over a reasonable period of time or cannot be justified for the type of work performed. The need for routine maintenance and repair by specialist personnel also adds to the cost.

From a practical point of view, chemists must familiarize themselves with the advantages of automated chemical analysis. They must learn to think in terms of automation. If developmental work on chemical automation is to be undertaken, some understanding of electronics, computer science, mathematics, electrical engineering, etc., is necessary in addition to chemistry. The complexity of automated analytical systems can be quite overwhelming, and this certainly creates much inertia on the part of (analytical) chemists to delve into the engineering aspect of automation, especially if they have been trained exclusively in chemistry. Indeed, Mossotti [24] has compared the marriage between chemistry and electronics to a shotgun wedding. Similar sentiments could be expressed in connection with other disciplines previously unrelated to chemistry. Such a viewpoint is unfounded and may cease to exist in the future if chemists receive training in such disciplines as part of their education.

Perhaps the greatest deficiency in automation at present is the lack of flexibility. This situation was recognized in the 1950s [25] but little ground has been covered since then. Automated systems are designed to operate within a limited domain (e.g. over specified ranges of analyte concentration, pH, etc.)
and must be supplanted by human control outside this domain - hence, the need for alarms that demand human attention to the process. Therefore, in light of current technology, no automaton (used for chemistry or otherwise) can function entirely without supervision. Advances in this area may be linked to those in artificial intelligence.

1.3.3 Automating the Process of Chemical Analysis

Chemical analysis, in the broadest sense, is concerned with the establishment of the partial or complete qualitative (including spatial structure) or quantitative composition of individual chemical species, substances and materials [20]. In general, this process can be decomposed into the following steps: i) sampling, ii) sample preparation, iii) measurement, and iv) information processing. Automation of the entire process, from sample collection to final report generation, would be ideal but has so far been realized in very few situations. Each step is briefly discussed below in connection with automation. Information handling, itself, will be treated in greater detail in the next sub-section since it represents the main thrust of the work described in this thesis.

Sampling

The objective of sampling is to obtain a representative sample of a mass of material for analysis. In certain instances, this involves sample collection from a field site. Methodologies such as continuous in-line or non-invasive analysis of chemical process streams circumvent this step entirely. Typically, however, some pre-treatment of the material is necessary prior to measurement. Usually a sample of the material has to be taken. This is absolutely necessary if the instrument has sample size limitations or the technique employed is destructive. Sampling is then a practical concern. Although sampling theory is highly developed, this step is perhaps the most difficult to automate since i) in general, it is mechanically more complex (e.g.
extreme care may be necessary when a subsample must be physically separated from the bulk), and ii) sample collection remains a strategic problem. To date, little, if any, work has been conducted to automate this step. Note that auto-samplers do not automate the sampling procedure; they merely provide a means by which more than one sample may be automatically introduced sequentially into an instrument.

Sample Preparation

This step may include such treatments as analyte isolation (e.g. extraction, suspension) or modification (e.g. dissolution, derivatization) so that it is suitable for measurement; removal of interferents physically or chemically; and addition of internal standards or controls for monitoring the system. Automation of this step is often best accomplished by departing greatly from corresponding manual methods [21]. This statement is illustrated by approaches which employ continuously (or more recently, intermittently) flowing analytical streams (e.g. ASCFA). Indeed, this concept has led to the development of novel techniques. Some of these will be outlined below in the section on flow injection analysis.

Measurement

Analyte detection and measurement is an extensively studied area of chemical analysis and chemical automation may be in its most advanced state here. Physical methods of analysis and chemical sensors play a vital role since they provide a “clean” interface between the chemistry and the instrument. A chemical measurement system may comprise one or more detectors capable of making a single measurement or simultaneously and independently making many measurements. Automated data acquisition (simultaneously) from one or more detectors is a mature technology.
Many commercial detector systems such as Hewlett-Packard’s HP 8452A photodiode array spectrophotometer and Finnigan MAT’s TSQ-70 triple quadrupole mass spectrometer already operate autonomously, to some extent; the latter incorporates adaptive feedback in its data acquisition sub-system [8].

Information Processing

This step is multifaceted and comprises i) computations in which an experimental response is translated into practical information (e.g. regression analysis to provide analytical concentration data); ii) determination of confidence limits; iii) interpretation of data to yield chemical information about the site or process under study; and iv) report writing. The computer and associated software are prominent throughout this stage of chemical analysis. Some (automatic) data processing capability is already present within detector systems that incorporate a computer (integrated or stand-alone). Automatic interpretation of data, which may involve identification of components and/or provide answers to specific questions such as “Does the total concentration of chlorinated organics in the mill effluent fall below government limits?”, etc., remains a challenge. The consolidation of chemometric methods [26] and laboratory information management systems [27], in particular, database and knowledgebase technology, will be necessary.

1.3.4 Information Handling: The Element of Intelligence

While mechanically-oriented research has enjoyed much attention, research into information handling, beyond that of simple computations, has not been comparable. This situation is of concern. As Webb and Salin [28] have pointed out, current instrumentation, while capable of performing its tasks, has not been exploited to a significant extent for intelligent operation. Denton emphasized the need for intelligent operation when he posed the following question to the analytical community, “Do you want a number or a valid analysis?” [8].
The meaning of “intelligent” has always been nebulous and its use here requires definition. Bell [29] considered the term “intelligent machine” in reference to “a machine which responds to information, and also has some capacity for acquiring and manipulating information.” It is in this context that the term intelligence is being used here; that is, the aspect of intelligence is of a type which is concerned with deduction of conclusions and decision making from specified data, not that associated with the creative aspect, which is responsible for induction and association. The issue of whether the latter can ever be acquired by a machine has been raised by Dreyfus [30]. As mentioned previously, the digital computer is central to information handling and powerful, versatile programs are required to make effective use of it.

In designing an intelligent instrument, it is helpful to identify some useful generic features. Various workers [8, 28] have done exactly this and a compilation has been put together by the author. The result, with features grouped according to role, is given in Table 1.2. It should be stressed that this list is by no means exhaustive nor are all features equally important. To achieve these lofty goals, maximum use should be made of all available information [8]. Some features will of course be more easily achieved than others. Many of these ideas are self-explanatory and have already been touched upon by their originators and thus a discussion here is unwarranted.

A model for an intelligent instrument - based on that of Wade and Crouch [8] - which facilitates the implementation of these features is given in Figure 1.2. Here, the computer and chemical apparatus form an inseparable unit. An instrument based on this model would correspond to what these researchers have referred to as a fifth generation instrument.
At the heart of such an instrument is an expert system, which is a program that embodies the expertise of human experts [31]. It administrates the overall operation of the instrument, which includes such functions as training users, coordinating the distribution of information between sub-systems, determining the sequence of operations, and data interpretation. Some aspects of intelligent chemical analysis are now elaborated upon.

Table 1.2 Some generic features of intelligent automated chemical instruments

Category          Feature
User Interface    Responds to high level commands
Administration    Schedules work program
                  Orders necessary reagents when supply dwindles or on user request
                  Automates aspects of report writing
                  Selects appropriate method, e.g. in consideration of sample type and source
Instructional     Aids operator in effective instrument use by monitoring operator commands and actions and providing context sensitive help
                  Contains an on-line “teach” mode to train new users
Operation         Self-calibration
                  Dynamic self-optimization of data collection and performance
                  Tests its own operation and diagnoses mechanical failure
                  Detects experimental error conditions and provides appropriate alarms, diagnostics and actions
                  Detects user override of machine functions; modifies its operation accordingly
                  Automatically detects interferences and can adapt to deal with difficult samples
                  Won’t execute inappropriate commands
Information       Carries out signal processing and pattern recognition on results
                  Aids operator in interpreting data
                  Uses site- or batch-specific information in interpretation of analytical data
                  Carries out decision analysis, i.e. makes analytical decisions and gives the reasoning behind them
                  Includes experiment or reference databases
                  Remembers past mistakes and avoids them where possible
Maintenance       Indicates time-dependent or periodic degeneration in performance
                  Guides troubleshooting personnel in locating cause of malfunction
                  Carries out limited self-repair work

Control of Chemical Processes

Although the design of automatic instruments requires a compromise between accuracy, speed and robustness, the best results are obtained if the operation of the instrument is adjusted to suit the type of sample and environment (e.g. noise characteristics) encountered at any given time [32]; that is, the instrument should be the subject of closed-loop control. A distinction can be made between operational control, in which the purpose is to ensure the integrity of mechanical operation (e.g. tracking of a robot arm), and process control, in which mechanical operation is used to effect optimal chemical conditions (e.g. correct pH), which enhance results (e.g. maximum reaction). While the former is equally important, the latter is more interesting to an analytical chemist. Knowledge about the chemical process, together with appropriate control strategies and suitable inputs to the control system, secures effective control of instrument operation and ensures a high quality of data. The inputs themselves must faithfully reflect the chemical condition(s) which are to be controlled.

Figure 1.2 Wade and Crouch model for an “intelligent” or fifth generation instrument (see text).

Diagnostics

The integrity of instrument operation is the key to reliability; an instrument that is prone to malfunction discourages acceptance. Although diagnostics generally refer to checks for hardware faults, in chemical analysis instrumentation it is also appropriate to use this term in reference to checks for process or analytical “malfunctions,” (e.g.
chemical/spectral interferences, harsh sample matrix) which can lead to incorrect data analysis [28]. Diagnostics may be conducted directly through hardware (a capability of many commercial detector systems). They may also be accomplished through software via i) “controls,” or test samples of known composition and/or ii) analysis of the shape of the analytical signal (e.g. test for agreement of signal collected with the expected form). The use of controls is quite common, especially in clinical analyses [18]. While peak shape analysis is not new [33], it deserves greater attention: one should expect a “quality index” to indicate measurement quality for an analytical determination. Particular emphasis should be placed on detecting conditions which lead to instrument malfunction or breach of instrumental and analytical integrity.

Chemical Data Processing

As the capabilities of instruments increase, the resulting amount of data (and information) may also be expected to increase. Besides the value of increased selectivity, this may also prompt analytical chemists (other than chromatographers) to be more adventurous and venture into the traditionally less well explored domain of multicomponent analysis. The sheer volume of data (potentially) generated can only be handled effectively if data reduction techniques and/or multivariate methods of analysis such as statistical pattern recognition are employed. Interpretation of such data is effectively conducted only with the aid of a computer. The realm of signal enhancement is becoming more oriented to real-time processing. Automatic optimization of digital filters for smoothing data is feasible [34]. Finally, new quantitation methods may be devised and employed to extend the usable dynamic range of existing chemical methods [35].
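Returning briefly to the software diagnostics discussed above, the idea of testing a collected signal for agreement with its expected form can be illustrated with a minimal sketch. It is an assumption made purely for illustration: the Pearson correlation between a measured peak and a reference shape is used here as a stand-in “quality index”, and the function name and sample data are hypothetical. It is not the peak shape analysis method developed in this thesis.

```python
import math


def shape_quality_index(signal, expected):
    """Pearson correlation between a measured peak and its expected form.

    Values near 1 indicate the same shape regardless of peak height;
    low or negative values indicate a distorted peak.
    """
    if len(signal) != len(expected):
        raise ValueError("signal and expected must have the same length")
    n = len(signal)
    mean_s = sum(signal) / n
    mean_e = sum(expected) / n
    cov = sum((s - mean_s) * (e - mean_e) for s, e in zip(signal, expected))
    var_s = sum((s - mean_s) ** 2 for s in signal)
    var_e = sum((e - mean_e) ** 2 for e in expected)
    if var_s == 0 or var_e == 0:
        return 0.0                      # a flat trace carries no shape information
    return cov / math.sqrt(var_s * var_e)


if __name__ == "__main__":
    expected = [0, 1, 4, 6, 4, 2, 1, 0]             # hypothetical reference peak
    taller = [2 * v for v in expected]              # same shape, higher concentration
    split = [0, 4, 1, 0, 1, 4, 1, 0]                # hypothetical double-humped fault
    print(shape_quality_index(taller, expected))
    print(shape_quality_index(split, expected))
```

Because the correlation ignores overall scale, a change in analyte concentration alone does not lower the index, whereas a change in peak shape does; an alarm threshold on such an index would have to be chosen empirically.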
1.4 FLOW INJECTION ANALYSIS

Flow injection analysis [17] (FIA) is one of the fastest growing fields in analytical chemistry, with over 3700 papers³ published on the subject to date. Conceived as a means for rapid yet precise, cost-effective, “automatic” reagent assay, it has found widespread acceptance for industrial, clinical, biotechnological and environmental applications [17]. The original concept was simple but hardly new: assembly-line processing of a sample in which the “conveyer belt” was a continuous flowing stream and necessary reagents were added at predetermined points along the path to a detector. It is of no surprise then that, during its first decade of growth, much effort was devoted to the development of methodologies which demonstrated its prowess over the more established ASCFA method [36]. FIA was able to transcend this “advantage-through-mechanization” philosophy when the reproducible sample concentration gradient caused by sample dispersion into the carrier stream (aided by flow) was recognized and exploited [37]. This led to the development of different modes of operation which improved performance and enabled novel schemes such as those for dispersion-based calibration, reaction-rate methods of analysis, zone sampling etc. to be implemented [38]. Gradient techniques, such as these, conceptually distinguished FIA from other flow methods of analysis and transformed FIA into a powerful method, not only for solution handling, but also for solution manipulation [36]. This realization has led to the more recent development of sequential injection analysis [39], which paves the way for increased control over aspiration of sample and reagents and consequently, greater control of the gradients produced. Today, FIA is recognized to be an ideal system for the development and testing of sensors and wet chemical analytical procedures [36].

³ Based on the number of flow injection papers catalogued by Chemical Abstracts as of 1992.
Like many other analytical techniques, FIA evolved over a period of time and incorporated ideas from various other disciplines, notably chromatography and chemical engineering. Many have traced the origin of FIA to the work of Nagy et al. [40, 41], who evaluated the analytical performance of a silicone rubber-based graphite electrode on samples injected into flowing media. The concept of continuous flow analysis, however, predated that. Skeggs’ classic work [15, 16] on air-segmented continuous-flow analysis was a revolutionary concept for wet chemical analysis. A short time after Skeggs published his work, Spackman et al. [42] described the method of post-column derivatization for chromatography. These developments laid the foundation for the creation of FIA. The first working definition of flow injection analysis, and the first usage of that term, is linked to the pioneering work of Ruzicka and Hansen [43]. At about the same time, Stewart et al. [44] published similar work but did not define the scope of operation. During this period, much controversy abounded as to what constituted FIA, and arguments based on mechanical aspects only confused the issue [37]. Betteridge [37] has noted, “... As the analytical significance of maintaining the integrity of the sample plug and allowing mixing to be controlled by diffusion processes...” is central to this technique. Work in subsequent years, largely spearheaded by Ruzicka, consolidated this idea and that of reproducible concentration gradients, to couch the concept of FIA in its present form [37].

FIA offers numerous practical advantages for chemical assay. In addition to automation, it is also amenable to miniaturization. Impressive examples are manifolds machined in plastic blocks the size of a credit card [38] and a high precision flow cell machined within the case of a light emitting diode [45].
The capacity for multiple determinations from one sample, sample containment, analyte preconcentration, sample dilution, matrix removal and self-cleaning are well documented [36]. In terms of performance, it is characterized by excellent reproducibility (typically less than 1 % relative standard deviation in peak height), good accuracy and reliability, and high throughput rates (from 60 to 200 samples per hour) [46]. Finally, it is also fairly easy and inexpensive to assemble components into an FIA system, and many components can be purchased “off-the-shelf” or built in-house. Indeed, the first systems reported by Ruzicka and Hansen used Plexiglas™ and polypropylene tubing, all mounted on a LEGO™ board [37]. Such simplicity facilitates fast start-up and shut-down of operation and minimizes maintenance.

FIA can also play a significant role in chemical research. Since analyzer modularity leads to flexibility, adaptability and upgradability, FIA systems can easily be optimized for reaction conditions and resources. More importantly, the facile production of chemical gradients via FIA offers a new approach for the research chemist to perform and study chemical reactions. Whereas optical emission spectroscopists espouse the virtues of simultaneous multielement determination, flow injection analysts may champion those of simultaneous multi-sample, multianalyte capability via reproducible concentration gradients.

The following sections briefly review the principles, mechanics and theory of FIA. For a more comprehensive treatment of the subject, the reader is directed to the literature [17, 47-51]. Some applications are described and the role of automation is discussed. The latter is emphasized, in particular, with respect to “soft automation” since it can transform FIA into a powerful engine for chemical exploration [52].

1.4.1 Principles and Mechanics

The basic components of an FIA system are shown in the simple manifold of Figure 1.3.
It consists of a pump, which is used to propel the carrier/reagent stream through a narrow tube; an injection port through which a sample is introduced into this stream; a reaction manifold (e.g. a coiled tube) where the sample disperses and (possibly) reacts with components of the carrier stream; and a (flow-through) detector which selectively quantifies the chemical species of interest.

[Figure 1.3 Basic components of an FIA system. Labels: sample, carrier, pump, injection port, reaction manifold, flow cell and detector, to waste.]

In operation, a pre-defined volume of sample solution is injected as a plug into a flowing, unsegmented stream of a suitable liquid carrier within a tube. Controlled (and controllable) mutual physical dispersion between the sample and carrier results as the sample is transported downstream to a detector. If the carrier is also a reagent, then reaction between analyte and reagent molecules occurs simultaneously with dispersion. Further addition of reagents and further reactions may take place before the sample bolus arrives at the detector.⁴ The time from injection to detection is determined entirely by the carrier flow rate and the geometry of the FIA system and is therefore identical for each injection. These three concepts (sample injection, controlled dispersion and precise timing) form the principles of FIA. Their application leads to a three-dimensional physicochemical environment which exhibits highly reproducible concentration gradients. The final profile of a typical response curve (or residence time distribution curve), as detected, is determined by both the extent of dispersion and chemical reaction, in addition to detector characteristics. A typical signal can be described as a skewed Gaussian whose height at any position along the curve, width at a specified height, or area may be used for quantitation.
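The three quantitation measures named above (height, width at a specified height, and area) can be illustrated with a short sketch. This is only an illustration under stated assumptions: the function name `peak_metrics`, the choice of half height as the default level, linear interpolation for the crossings, the trapezoidal rule for the area, and the simulated skewed peak are all hypothetical and are not taken from the thesis. The signal is assumed to be baseline-corrected and to contain a single peak.

```python
import math

def peak_metrics(t, y, frac=0.5):
    """Return (height, width at frac*height, area) for one baseline-corrected peak."""
    i_max = max(range(len(y)), key=lambda i: y[i])
    height = y[i_max]
    level = frac * height

    def crossing(i_above, i_below):
        # Linear interpolation between a point at/above the level and its neighbour below it.
        ya, yb = y[i_above], y[i_below]
        return t[i_above] + (level - ya) * (t[i_below] - t[i_above]) / (yb - ya)

    # Walk outward from the maximum until the next point falls below the level.
    i = i_max
    while i > 0 and y[i - 1] >= level:
        i -= 1
    left = crossing(i, i - 1) if i > 0 else t[0]
    j = i_max
    while j < len(y) - 1 and y[j + 1] >= level:
        j += 1
    right = crossing(j, j + 1) if j < len(y) - 1 else t[-1]

    # Trapezoidal rule for the peak area.
    area = sum(0.5 * (y[k] + y[k + 1]) * (t[k + 1] - t[k]) for k in range(len(y) - 1))
    return height, right - left, area

# Hypothetical skewed (tailing) peak, used only to exercise the function.
t = [0.1 * k for k in range(301)]
y = [(tk / 2.0) ** 2 * math.exp(-tk / 2.0) for tk in t]
height, width, area = peak_metrics(t, y)
print(height, width, area)
```

Passing a different `frac` (for example 0.1) gives the width at another specified height, which is the form of measurement used in width-based quantitation.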
It is apparent that FIA is a dynamic technique and that homogeneous mixing and chemical equilibrium are not normally observed (in fact, mixing is highly directional). However, the principles of FIA ensure that these conditions are not necessary for an analysis to be successful, since each sample is treated in exactly the same manner, both spatially and temporally. Indeed, this very idea was originally used to increase sample throughput [43].

⁴ It should go without saying that sufficient time must be allocated for product formation prior to detection.

Two physical processes, convection and diffusion, are responsible for sample dispersion. The former is induced by flow and causes the sample zone to acquire a parabolic head and tail; the latter causes the concentration profile across the zone to take on a Gaussian shape (Figure 1.4). The relative contributions of each depend on the flow rate, radius of the tube, time of sample residence in the tube, temperature, viscosity and the diffusion coefficient [37]. In FIA, longitudinal diffusion is negligible in comparison to convection, but radial diffusion and mixing are important and could be dominant at low flow rates. Secondary flow along the radial axis is observed in nonstraight tubes; coiling the tube enhances radial mixing and so minimizes longitudinal dispersion. Excessive dispersion results in loss of sensitivity. Sufficient dispersion is nevertheless required for mixing and reaction to take place and for the production of a concentration gradient. An empirical measure of dispersion, the dispersion coefficient, was developed by Ruzicka and Hansen as an aid in designing flow injection systems. It is defined as the ratio of sample concentration before and after dispersion has occurred [17]:

$$D = \frac{C^0}{C} \qquad (1.1)$$

where $D$ is the dispersion coefficient, $C^0$ is the original sample concentration and $C$ is the concentration after dispersion (commonly measured at peak maximum).
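As a small worked illustration of Eq. (1.1), the sketch below computes the dispersion coefficient from two concentrations. The function name and the numerical values are hypothetical and chosen only for the example; in practice the two concentrations would usually be inferred from detector responses for the undispersed and dispersed sample.

```python
def dispersion_coefficient(c0, c_peak):
    """Eq. (1.1): D = C0 / C, with C taken here at the peak maximum."""
    if c0 <= 0 or c_peak <= 0:
        raise ValueError("concentrations must be positive")
    return c0 / c_peak

# Hypothetical values: a 1.0 mmol/L sample observed at 0.25 mmol/L at the peak maximum.
d = dispersion_coefficient(1.0, 0.25)
print(d)  # 4.0, i.e. the sample has been diluted four-fold by the carrier
```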
The dispersion coefficient accounts for the physical process of dispersion only and changes with sample residence time. Its usefulness stems from the fact that it has been linked to the dimensions of the tube, flow rate and sample volume; practical expressions have been derived for certain ranges in the magnitude of the dispersion coefficient [17].

[Figure 1.4 Schematic representation of the effects of convection and diffusion on the injected sample observed at an appropriate point downstream from injection. Stream moves from right to left. Shown are the transport conduit cross-section (top) and concentration profile (bottom) for a) no dispersion, b) dispersion from convection only, c) dispersion from diffusion only, and d) dispersion from both convection and diffusion.]

The versatility of FIA is evident from the variety of more complex configurations which can be derived from the basic scheme of Figure 1.3, the multitude of different reagents that can be used for a particular analysis, and the various techniques that have been developed to take advantage of the chemical environment produced. Researchers have reported the use of dual injectors [53] and reagent injection [54]. Reaction manifolds which employ dialysis [55], gas diffusion [56], solvent extraction [57], immobilized enzymes [58], catalytic reactors [59] and ion exchangers [60] have been described. Almost every form of chemical detection, from amperometry [61] to UV-visible spectrometry [62], has been used. Stopped-flow [63], merging zones [64], flow reversal [65], sinusoidal flow [66], sequential injection [39] and combinations thereof are examples of different modes of analyzer operation which have been developed.

1.4.2 General Theory and Models

The flow injection response curve arises from the combined effects of dispersion and chemical reaction.
However, a satisfactory mathematical description of this has thus far eluded theoreticians for all but the simplest manifolds, and several workers [17, 67] have noted how much theory has lagged behind practice. The sheer number of configurations and modes of operation available has contributed to this situation. Nevertheless, some progress has been made, and a number of concepts, models and strategies can be found in [17] and references therein. Usable expressions have been derived for certain types of simple manifolds [68-70]. An excellent review of the kinetic aspects has been given by Hungerford and Christian [71]. Hull et al. [67] have recently reviewed various strategies for describing dispersion. An improved theoretical understanding of the effects of such processes as mutual dispersion between sample plug and carrier, or chemical kinetics within a gradient, which are unique to FIA, assists in defining and quantifying physical constraints and critical factors and thus plays an important role in the development of novel experiments, optimization of manifolds and prediction of peak shape.

Simultaneous dispersion and chemical reaction in a flow system had already been studied in chemical engineering and in chromatography prior to the advent of FIA. These fields developed “classical” descriptions of dispersion based on statistical moments, mixing lengths and mass transfer rates [17]. Some basic analytic models are given in Table 1.3. These equations provide a useful starting point for understanding and quantifying dispersion. However, their predictive capability is limited since they only take into account the physical dispersion of the sample in the manifold while ignoring dispersion which results from injectors, detectors and/or connecting components, mutual dispersion by the reagent, and the effects of chemical kinetics.
Some workers have addressed the latter omission by incorporating reaction kinetics as an additional parameter in these models [17].

One classical description of dispersion for a solute plug in continuous flow systems is provided by the general diffusion-convection equation [67]:

$$\frac{\partial c}{\partial t} = D_m \left( \frac{\partial^2 c}{\partial x^2} + \frac{\partial^2 c}{\partial r^2} + \frac{1}{r} \frac{\partial c}{\partial r} \right) - u \frac{\partial c}{\partial x} \qquad (1.2)$$

where $c$ is concentration (mol/L), $t$ is time (s), $u$ is linear flow velocity (cm/s), $x$ is the axial distance, $r$ is the radial distance and $D_m$ is the molecular diffusion coefficient (cm²/s). The first attempt at solving this equation was conducted by Taylor [72, 73], who derived expressions under two limiting cases. Unfortunately, in both cases, the diffusion-convection regions investigated were not typical of FIA, rendering the expressions useless in practice. However, this work was sufficient to demonstrate that i) mixing can be complete, ii) the concentration gradient is both reproducible and predictable and iii) the peak shape is affected by differences in sample and carrier matrix. Since this work, others [74-78] have made substantial improvements to the solution and increased its general applicability. Vanderslice et al. [79] used this approach to predict the shape of the sample bolus at various flow injection conditions. Their results were later substantiated experimentally by Korenaga et al. [80]. Modifications which incorporated chemical kinetics have also been reported [81, 82].

Table 1.3 Basic models and equations for flow injection analysis

Model | Equation
General equation: the C-curve | $c(t) = \frac{1}{2\sqrt{\pi\delta}} \exp\left[ -\frac{(1 - t/\bar{T})^2}{4\delta} \right]$ (1.3)
Taylor’s equation | $c(x) = \frac{m}{\pi r^2} \cdot \frac{1}{2\sqrt{\pi\delta L^2}} \exp\left[ -\frac{x^2}{4\delta L^2} \right]$ (1.4)
Tanks-in-series (gamma function) | $c(t) = \frac{1}{T_i} \left( \frac{t}{T_i} \right)^{N-1} \frac{1}{(N-1)!} \, e^{-t/T_i}$ (1.5)

where:
$c$ = concentration (mol/L)
$\delta$ = dispersion number (dimensionless)
$L$ = total length of a line (cm)
$m$ = mass of injected material (g)
$N$ = number of mixing stages (dimensionless)
$r$ = tube radius (mm)
$t$ = time (s)
$T_i$ = mean residence time in one tank (s)
$\bar{T}$ = mean residence time (s)
$x$ = distance from peak maximum (cm)
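To make the tanks-in-series entry of Table 1.3 concrete, the sketch below evaluates the gamma-function response, taken here in its usual form c(t) = (1/T_i)(t/T_i)^(N-1) exp(-t/T_i)/(N-1)!, on a time grid. This is a minimal illustration only: the function name, the choice of N = 5 and T_i = 2 s, and the grid spacing are hypothetical and do not describe any particular manifold.

```python
import math

def tanks_in_series(t, n_tanks, t_i):
    """Normalized tanks-in-series response at time t (s).

    n_tanks is the number of mixing stages N and t_i is the mean
    residence time in one tank (s).
    """
    if t < 0:
        return 0.0
    x = t / t_i
    return x ** (n_tanks - 1) * math.exp(-x) / (math.factorial(n_tanks - 1) * t_i)

# Hypothetical manifold: 5 mixing stages, 2 s mean residence time per tank.
dt = 0.05
times = [dt * k for k in range(2001)]            # 0 to 100 s
curve = [tanks_in_series(tk, 5, 2.0) for tk in times]

t_peak = times[curve.index(max(curve))]           # expected near (N - 1) * T_i
area = sum(0.5 * (curve[k] + curve[k + 1]) * dt for k in range(len(curve) - 1))
print(t_peak, area)
```

For this form of the model the peak maximum lies at (N - 1)T_i and the curve integrates to unity, so increasing N at fixed total residence time gives a narrower and more symmetric peak, while N = 1 reduces to a single exponential decay.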
A simplified hydrodynamic model, the axially-dispersed plug flow model:

$$\frac{\partial c}{\partial t} = D_L \frac{\partial^2 c}{\partial x_a^2} - u \frac{\partial c}{\partial x_a} \qquad (1.6)$$

where $D_L$ is the axial dispersion coefficient (cm²/s) which incorporates all factors (molecular diffusion, convection, etc.) that contribute to axial dispersion, $x_a$ is the axial position (cm) and all others are as described above, has also been used [83-85].

A model borrowed from chromatography is the “tanks-in-series” model [37, 86]. It is analogous in concept to the “height equivalent to a theoretical plate” model. The tube between injection and detection is conceptualized as consisting of a series of mixing tanks whose size depends on the tube dimensions and flow rate. The gamma function (see Table 1.3), used to account for the distribution of a sample injected into the first tank, can explain the physical dispersion of the sample as a function of time. A derivation based on the input-output relation for a perfect mixer subject to mass balance is given by Smit and Scheeren [87] and, more recently, Berthod [88] has shown that the model can be generated by (a series of) exponential modification(s) on a rectangular function. The model, however, does not specify a mechanism since, alone, it is not physically meaningful [71]. Application of the model has led to the development of some practical expressions relating the dispersion coefficient, over certain ranges, to flow rate and tube length [37]. It has also been used for predicting the response from a flow injection-atomic absorption spectrometry system [89] and to treat chemical kinetics via a mass balance approach [71, 90].

A different computational approach, the random walk simulation model, was used by Betteridge et al. [91] for a single-line FIA system and later extended by Crowe et al. [92] to merging zones. In this, a “sample plug” of about 1500 solute molecules was injected and subjected to laminar flow.
Each molecule was i) moved through a distance determined by the local flow velocity and then ii) moved in a random direction by a distance proportional to the diffusion coefficient. Chemical reaction was deemed possible if the local sample and reagent concentration exceeded a predefined threshold. The evolution of the mixing and chemical reaction was graphically displayed at discrete time intervals. This technique showed promise for investigating sample size, chemical kinetics and mixing since it modeled changes at the molecular level. Recently, Wentzell et al. [93] extended this work and showed that results from the random walk model compared favorably with those from solution of the general diffusion-convection equation ([76, 80]). This is not surprising since the term describing diffusion in (1.2) can be derived from the random walk concept.

Limitations in predictive theory have prompted some workers to employ empirical descriptive approaches [67]. In general, these are more restrictive: they are primarily developed to achieve optimal performance on a given FIA system. An exception is impulse response deconvolution, employed by Poppe [94] for modeling detector systems and by van Nugteren-Osinga et al. [95] for modeling flow injection components with the goal of predicting peak shapes. Other examples include the use of response surface mapping [52] and statistical moments analysis [96, 97].

1.4.3 Applications

FIA has been successfully applied to a variety of reagent- and immunoassays [17]. It can be used simply as a means for sample introduction (e.g. in atomic absorption spectrometry [98]); in more advanced systems it may incorporate sample handling and pre-treatment steps such as analyte pre-concentration, chemical speciation, selective reactions and matrix modification [50]. A comprehensive review of this area has been given by Clark et al. [99].
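Returning briefly to the random walk model of Section 1.4.2, its two-step update (convection followed by a random diffusive displacement) can be sketched in a few lines. The sketch is not the published implementation of [91-93]. It assumes a two-dimensional channel rather than a tube, a parabolic velocity profile with a hypothetical maximum velocity `u_max`, a fixed diffusive step length of sqrt(4 D dt) in a uniformly random direction, reflecting walls, and no chemical reaction. All names and parameter values are illustrative.

```python
import math
import random

def random_walk_plug(n_molecules=1500, n_steps=200, dt=0.01,
                     radius=0.025, u_max=1.0, diff=1e-5, seed=0):
    """Return [x, y] positions (cm) of molecules after n_steps of convection plus diffusion."""
    rng = random.Random(seed)
    step = math.sqrt(4.0 * diff * dt)  # assumed 2-D diffusive step length per time step
    # The injected plug starts at x = 0, spread uniformly across the channel.
    mols = [[0.0, rng.uniform(-radius, radius)] for _ in range(n_molecules)]
    for _ in range(n_steps):
        for m in mols:
            # i) Convection: displacement set by the local (parabolic) flow velocity.
            m[0] += u_max * (1.0 - (m[1] / radius) ** 2) * dt
            # ii) Diffusion: a fixed-length step in a random direction.
            angle = rng.uniform(0.0, 2.0 * math.pi)
            m[0] += step * math.cos(angle)
            y = m[1] + step * math.sin(angle)
            # Reflect molecules that would otherwise leave through the wall.
            if y > radius:
                y = 2.0 * radius - y
            elif y < -radius:
                y = -2.0 * radius - y
            m[1] = y
    return mols

molecules = random_walk_plug()
mean_x = sum(m[0] for m in molecules) / len(molecules)
print(mean_x)
```

A histogram of the axial positions (or of arrival times at a fixed downstream position) gives the simulated concentration profile; a reaction step of the kind described in the text would be added by testing local sample and reagent counts against a threshold after each move.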
Since the potential of FIA is realized through exploitation of the physicochemical gradient, this consideration has dominated the design of many novel FIA methods. As far back as 1978, Betteridge [37] had compiled an impressive list of applications which highlighted the power and versatility of this approach. Gradient methods take advantage of the gradient (i.e. the continuum of sample-to-carrier concentration ratios) to differentiate chemistries, reaction rates and concentrations. In this way, multielement analysis, titrimetry, potentiometry, reaction rate methods, single-standard calibration, standard addition, dilution (electronic or by zone isolation), zone sampling and zone trapping can be facilitated [17]. Most recently, FIA has been employed to enhance flow cytometry and fluorescence microscopy for the study of cellular processes [100].

1.4.4 Automated Flow Injection Analysis

The assembly-line concept of FIA lends itself to automation, a practice that, as mentioned above, increases efficiency, reproducibility and reliability while reducing the cost of analysis. Although these attributes drive automation in commercial enterprises, more potent reasons encourage its adoption in research-oriented systems. From the preceding sections, it is clear that FIA is a powerful solution management system for chemical processing. Automation affords a way to harness this power: it can enable precise valve timing (necessary, for example, to truncate peaks reproducibly for zone dilution/sampling); facilitate complex injection and flow schemes (sequential injection, stopped flow, flow reversals, etc.); and expedite such tedious, but worthwhile, tasks as empirical performance optimization and multivariate mapping. Though automation can be implemented entirely through hardware, a more attractive (from a research perspective) and flexible approach is to use programmable microprocessors. In most laboratories, this invariably means use of a computer.
With it, data acquisition, data processing and report writing can also be integrated.

It was Betteridge [101] who first developed and demonstrated the advantages of a computer-controlled flow injection system. One system currently used in this laboratory evolved from this work [102]. It has been used to i) rapidly develop and characterize methods [103, 104], ii) compare and contrast different FIA modes of operation (conventional, stop flow and flow reversal) [62] or manifold designs [23], and iii) explore chemical response surfaces to obtain global information on reaction performance (conditions of optimum sensitivity, reproducibility, interference, etc.) [52]. The computer was shown to facilitate experiments which could not be conducted reliably by manual means (e.g. flow-reversal FIA). Other workers have also reported computer-controlled FIA systems [105-107]. Through computerization, the role of FIA can extend beyond routine chemical analysis and into the realm of an efficient, versatile chemical reactor for research purposes [108].

Besides instrumental control, the computer can also be used advantageously by chemists to exercise some measure of control over the data produced by the analyzer. For example, the extent of (digital) filtering, baseline correction and/or method of quantitation may be adjusted to suit the analysis. In more advanced applications, diagnostic routines may be used to increase robustness and thus facilitate analyzer independence. Software will continue to play an increasingly important role in FIA, especially in the area of signal processing and information extraction. The concept of FIA, enhanced by advances in information handling and those in hardware, may perhaps someday result in the achievement of a fifth generation flow injection instrument and the realization of “keyboard-driven chemistry”.
1.5 SCOPE OF THIS THESIS

Our laboratory has a keen interest in the development of computer-controlled automated FIA systems [102] and their application to chemical research and on-line process monitoring and control [23]. In the past, much effort has been expended in the development of novel manifold components, modes of operation, and testing of optimization schemes [108]. It was recognized that further advances in analyzer automation must also include parallel progress in (mathematical) methods for handling information arising from both mechanical and analytical (i.e. physicochemical) sources [108]. The purpose of this dissertation is to initiate research in this area with the intent that it will ultimately lead to the development of a fifth generation analyzer capable of operating completely independently of human supervision. The specific aim of the work presented is to develop strategies (and necessary methods) to enhance the robustness, analytical reliability and performance of automated flow injection analyzers. Two areas have been targeted for study.

The first is concerned with strategies to identify faulty analyzer operation so as to promote analytical integrity. This is of practical importance in an (often harsh) industrial environment where these systems may encounter perturbations from normal operation (e.g. temperature differences between sample and reagent(s), large changes in sample pH, introduction of gases or particulates), and experience accelerated aging or fouling of components. It is also relevant to automated research environments when products are not well characterized and the range of usable operating conditions is unknown. The most obvious approach is to arm the analyzer with an arsenal of sensors, each of which, singly or in combination, is tuned to a specific problem, mechanical or chemical. The result is high specificity and rapid response. However, this could be costly, and suitable (and robust) sensors may not be available.
To bolster the effectiveness of the diagnostic system when only a limited number of sensors is available (as is the case here and in virtually all FIA systems), an appropriate form of “intelligence” could be embedded within the data analysis and control algorithms, a task that can be realized quite economically. In effect, this would allow maximum use to be made of all available information. As outlined above, the shape and magnitude of the flow injection peak are determined by physicochemical factors such as convection, diffusion, mixing and chemical kinetics. As such, direct analysis of peak shape could uncover physical and chemical faults or unexpected processes during a determination. With this approach, the goal is to develop an optimal set of peak parameters or descriptors which would, singly or in combination, be representative of aspects of peak shape. Once developed, they may be used as input to expert systems, neural networks or any one of many other pattern recognition methods.

The second area is concerned with the ubiquitous task of data filtering. Although a variety of filtering methods exist, suitably general methods for optimizing filter parameters are presently lacking. Since quality of data invariably determines quality of results, the importance of filtering data optimally cannot be overstated. In automated flow injection systems, this must be done automatically.

In this dissertation, both areas are tackled via the use of orthogonal polynomials in combination with advanced chemometric methods. A synopsis of each chapter follows.

Chapter 2 outlines a strategy based on the use of peak shape descriptors for the identification/classification of physicochemical problems which manifest themselves in flow injection experiments. A survey of descriptors which have been used in other fields is presented. Two approaches to descriptor computation, numerical and model-based, are assessed.
From these considerations, a class of descriptors based on coefficients from orthogonal polynomial models is proposed for evaluation and use. These have not been exploited previously for peak shape analysis. The potential of these descriptors for peak shape analysis is realized by powerful mathematical methods such as statistical pattern recognition. Therefore, a brief introduction to pattern recognition, and principal components analysis in particular, is included within the chapter and serves as background material for work presented in the next chapter.

In Chapter 3, the proposed descriptors are described, their numerical properties assessed and their potential for peak shape analysis via a pattern recognition approach evaluated. These descriptors are the coefficients from a generalized Fourier expansion of the flow injection signal into a set of orthogonal functions. This, in fact, is the basis of spectrum analysis. Three families of orthogonal functions are employed: complex exponential functions, Gram polynomials and Meixner polynomials. The first is associated with the classical Fourier series and the second is notable in analytical chemistry because it is used to derive the often-used Savitzky-Golay filters. The third is new to analytical chemistry. Numerical properties in approximation for all three are ascertained with simulated data. The effects of noise on the reproducibility of the individual descriptors are discussed. Meixner polynomials contain a time scale parameter, whose value determines the accuracy of representation. Methods for selecting practical values are described and the effects of noise on time scale selection are assessed. The potential of these descriptors as inputs to principal components analysis is evaluated with simulated data.
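The idea of a generalized Fourier expansion into discrete orthogonal polynomials can be illustrated with a minimal sketch. It builds an orthonormal polynomial basis on equally spaced points by Gram-Schmidt orthogonalization of the monomials, which yields polynomials of the Gram type up to sign and normalization, and then obtains the coefficients as inner products with the signal. The function names, the number of points and the test signal are hypothetical, and the normalization used here is an assumption that may differ from the convention adopted in Chapter 3.

```python
def gram_basis(n_points, max_degree):
    """Orthonormal discrete polynomials of degree 0..max_degree on equally spaced points."""
    xs = [2.0 * i / (n_points - 1) - 1.0 for i in range(n_points)]
    basis = []
    for k in range(max_degree + 1):
        v = [x ** k for x in xs]
        # Remove the components along all lower-degree basis functions.
        for b in basis:
            proj = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = sum(vi * vi for vi in v) ** 0.5
        basis.append([vi / norm for vi in v])
    return basis

def expand(signal, basis):
    """Coefficients (the 'spectrum'): inner products of the signal with each basis function."""
    return [sum(s * p for s, p in zip(signal, poly)) for poly in basis]

def reconstruct(coeffs, basis):
    """Weighted sum of basis functions, i.e. the truncated expansion of the signal."""
    n = len(basis[0])
    return [sum(c * poly[i] for c, poly in zip(coeffs, basis)) for i in range(n)]

# Hypothetical signal: a quadratic sampled at 21 points, expanded to degree 4.
basis = gram_basis(21, 4)
xs = [2.0 * i / 20 - 1.0 for i in range(21)]
signal = [3.0 + 2.0 * x - x * x for x in xs]
coeffs = expand(signal, basis)
approx = reconstruct(coeffs, basis)
print(coeffs)
```

Because the test signal is itself a quadratic, only the first three coefficients are appreciably non-zero and the reconstruction matches the signal to rounding error; for a real peak, the leading coefficients serve as compact descriptors of shape, and truncating the expansion acts as a smoothing filter.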
To aid in selection of the optimal number of descriptors to employ for a given classification problem, a criterion based on the Fisher weight is proposed for use in conjunction with principal components analysis. An application to a real chemical system demonstrates the effectiveness of these descriptors for peak shape analysis and the viability of the approach.

The use of the three aforementioned families of orthogonal functions for filtering is presented in Chapter 4. These functions can be implemented as filters in two convenient ways: as finite impulse response filters and as indirect filters. These two filter implementations are compared with simulated flow injection data using the Gram polynomials as a case study. Comparisons are made between the three sets of orthogonal functions, again with simulated data. Two methods are proposed for near optimal indirect filtering: the Akaike information-theoretic criterion and cross-validation. The performance of each is evaluated on both simulated and real data.

REFERENCES

[1] I. M. Kolthoff, in Treatise on Analytical Chemistry, Part I, 2nd edn., Vol. 1, John Wiley and Sons: New York, 1984, 1.
[2] A. P. Wade, S. R. Crouch and D. Betteridge, TrAC Trends Anal. Chem., 7 (1988), 358.
[3] P. Van Dalen, paper presented at the French Scientific and Technical Press Meeting at Mesucora Physique, Paris, December 1985.
[4] P. C. Barker, Computers in Analytical Chemistry, Pergamon Press: Oxford, 1983, Ch. 8.
[5] IUPAC Compendium of Analytical Literature, Pergamon Press: Oxford, 1978, pp. 22-23.
[6] S. Handel, The Electronic Revolution, Penguin Books: New York, 1967.
[7] D. A. Bell, Intelligent Machines: An Introduction to Cybernetics, Blaisdell: New York, 1962, p. 35.
[8] A. P. Wade and S. R. Crouch, Spectrosc., 3 (1988), 24.
[9] P. B. Stockwell, J. Autom. Chem., 12 (1990), 95.
[10] J. K. Foreman and P. B. Stockwell, in Topics in Automatic Chemical Analysis, Vol. 1, J. K. Foreman and P. B.
Stockwell (eds.), Ellis Horwood: Chichester, 1979, p. 13.
[11] M. Crook, Chemom. Intell. Lab. Syst., 17 (1992), 3.
[12] H. R. Karp, Control Engineering, 6 (May, 1959), 40.
[13] J. W. Bernard and F. J. Soderquist, Control Engineering, 6 (November, 1959), 124.
[14] L. T. Skeggs Jr. in preface to Automation in Analytical Chemistry, L. T. Skeggs Jr. (ed.), proceedings of the Technicon Symposia, 1965, Mediad Incorporated: New York, 1966.
[15] L. T. Skeggs Jr., Clin. Chem., 2 (1956), 241.
[16] L. T. Skeggs Jr., Amer. J. Clin. Pathol., 28 (1957), 311.
[17] J. Ruzicka and E. H. Hansen, Flow Injection Analysis, 2nd edn., John Wiley and Sons: New York, 1988.
[18] W. B. Furman, Continuous Flow Analysis: Theory and Practice, Marcel Dekker: New York, 1976.
[19] F. H. Zenie, J. Autom. Chem., 133 (1991), 39.
[20] P. J. Elving and H. Kienitz, in Treatise on Analytical Chemistry, Part I, 2nd edn., Vol. 1, John Wiley and Sons: New York, 1984, 53.
[21] M. Margoshes and D. A. Burns, in Treatise on Analytical Chemistry, Part I, 2nd edn., Vol. 4, John Wiley and Sons: New York, 1984, 413.
[22] D. G. Porter and A. L. Dennis, J. Autom. Chem., 2 (1979), 134.
[23] M. D. Kester, J. A. Homer, H. Nicolidakis, A. P. Wade and J. T. Wearing, Process Control and Quality, 2 (1992), 305.
[24] V. C. Mossotti, in Treatise on Analytical Chemistry, Part I, 2nd edn., Vol. 4, John Wiley and Sons: New York, 1984, 1.
[25] O. Rubinfien, Control Engineering, 1 (September 1954), 64.
[26] M. A. Sharaf, D. L. Illman and B. R. Kowalski, Chemometrics, John Wiley and Sons: New York, 1986.
[27] R. D. McDowall (ed.), Laboratory Information Systems: Concept, Integration and Implementation, Sigma Press: Wilmslow, 1988.
[28] D. P. Webb and E. D. Salin, Intell. Instr. Comp., 9 (1991), 185.
[29] D. A. Bell, Intelligent Machines: An Introduction to Cybernetics, Blaisdell: New York, 1962, p. 1.
[30] H. L. Dreyfus, What Computers Can’t Do: A Critique of Artificial Reason, Harper and Row: New York, 1972.
[31] A.
R. de Monchy, A. R. Forster, J. R. Arretteig, L. Le and S. N. Deming, Anal. Chem., 60 (1988), 1355A.
[32] D. A. Bell, Intelligent Machines: An Introduction to Cybernetics, Blaisdell: New York, 1962, p. 62.
[33] M. A. Blaivas and A. H. Mencz in Automation in Analytical Chemistry, Technicon Symposium, Mediad: New York, 1967, 133.
[34] R. J. Larivee and S. D. Brown, Anal. Chem., 64 (1992), 2057.
[35] M. J. Wintjes, M. D. Kester and A. P. Wade, paper 717 presented at the 19th annual conference of the Federation of Analytical Chemistry and Spectroscopy Societies, Philadelphia, PA, September 20-25, 1992.
[36] J. Ruzicka, Anal. Chim. Acta, 261 (1992), 3.
[37] D. Betteridge, Anal. Chem., 50 (1978), 832A.
[38] J. Ruzicka, Anal. Chem., 55 (1983), 1040A.
[39] J. Ruzicka and G. D. Marshall, Anal. Chim. Acta, 237 (1990), 329.
[40] G. Nagy, Zs. Fehér and E. Pungor, Anal. Chim. Acta, 52 (1970), 47.
[41] E. Pungor, Zs. Fehér and G. Nagy, Anal. Chim. Acta, 51 (1970), 417.
[42] D. H. Spackman, W. H. Stein and S. Moore, Anal. Chem., 30 (1958), 1190.
[43] J. Ruzicka and E. H. Hansen, Anal. Chim. Acta, 78 (1975), 145.
[44] K. K. Stewart, G. R. Beecher and P. E. Hare, Fed. Proc., 33 (1974), 1439.
[45] P. K. Dasgupta, H. S. Bellamy, H. Liu, J. L. Lopez, E. L. Loree, K. Morris, K. Petersen and K. A. Mir, Talanta, 40 (1993), 53.
[46] M. Gisen, C. Thommen and K. F. Mansfield, Anal. Chim. Acta, 179 (1986), 149.
[47] M. Valcarcel and M. D. Luque de Castro, Flow Injection Analysis: Principles and Applications, Ellis Horwood: Chichester, 1987.
[48] B. Karlberg and G. E. Pacey, Flow Injection Analysis: A Practical Guide, Elsevier: New York, 1989.
[49] J. L. Burguera, Flow Injection Atomic Spectroscopy, Marcel Dekker: New York, 1989.
[50] J. Moller, Flow Injection Analysis, Springer-Verlag: Heidelberg, 1988.
[51] K. Ueno and K. Kina, Introduction to Flow Injection Analysis, Kodansha Scientific: Tokyo, 1983.
[52] A. P. Wade, P. M. Shiundu and P. D. Wentzell, Anal. Chim.
Acta, 237 (1990), 361.
[53] A. Izquierdo, P. Linares, M. D. Luque de Castro and M. Valcarcel, Fresenius. J. Anal. Chem., 336 (1990), 490.
[54] J. L. Perez Pavon, B. Moreno Cordero, E. Rodriguez Garcia and J. Hernandez Mendez, Anal. Chim. Acta, 230 (1990), 217.
[55] J. F. van Staden, Anal. Chim. Acta, 261 (1992), 453.
[56] W. Frenzel and C. Y. Liu, Fresenius. J. Anal. Chem., 342 (1992), 276.
[57] C. C. Lindgren and P. K. Dasgupta, Talanta, 39 (1992), 101.
[58] T. Yao, M. Satomura and T. Wasa, Anal. Chim. Acta, 261 (1992), 161.
[59] T. Kawashima and S. Nakano, Anal. Chim. Acta, 261 (1992), 167.
[60] S. Olsen, L. C. R. Pessenda, J. Ruzicka and E. H. Hansen, Analyst, 108 (1983), 905.
[61] K. Matsumoto, K. Sakoda and Y. Osajima, Anal. Chim. Acta, 261 (1992), 155.
[62] P. M. Shiundu and A. P. Wade, Anal. Chem., 63 (1991), 692.
[63] G. D. Christian and J. Ruzicka, Anal. Chim. Acta, 261 (1992), 11.
[64] H. Bergamin, E. A. G. Zagatto, F. J. Krug and B. F. Reis, Anal. Chim. Acta, 101 (1978), 17.
[65] D. Betteridge, P. B. Oates and A. P. Wade, Anal. Chem., 59 (1987), 1236.
[66] J. Ruzicka, G. D. Marshall and G. D. Christian, Anal. Chem., 62 (1990), 1861.
[67] R. D. Hull, R. E. Malick and J. G. Dorsey, Anal. Chim. Acta, 267 (1992), 1.
[68] T. Korenaga, H. Yoshida, F. Shen and T. Takahashi, TrAC Trends Anal. Chem., 8 (1989), 323.
[69] S. R. Bysouth and J. F. Tyson, Anal. Chim. Acta, 261 (1992), 549.
[70] S. D. Kolev and W. E. van der Linden, Anal. Chim. Acta, 268 (1992), 7.
[71] J. M. Hungerford and G. D. Christian, Anal. Chim. Acta, 200 (1987), 1.
[72] G. Taylor, Proc. R. Soc. London, Ser. A, 219 (1953), 186.
[73] G. Taylor, Proc. R. Soc. London, Ser. A, 225 (1954), 473.
[74] V. Ananthakrishnan, W. N. Gill, A. J. Barduhn, AIChE J., 13 (1965), 1063.
[75] H. Bate, S. Rowlands, J. A. Sirs and H. W. Thomas, Br. J. Appl. Phys., 2 (1969), 1447.
[76] J. T. Vanderslice, K. K. Stewart, A. G. Rosenfeld and D. J. Higgs, Talanta, 28 (1981), 11.
[77] M. A. Gomez-Nieto, M. D.
Luque De Castro, A. Martin and M. Valcarcel, Talanta, 32 (1985), 319.
[78] P. L. Kempster, H. R. van Vliet and J. F. van Staden, Talanta, 36 (1989), 969.
[79] J. T. Vanderslice, A. G. Rosenfeld and G. R. Beecher, Anal. Chim. Acta, 179 (1986), 119.
[80] T. Korenaga, F. Shen, H. Yoshida, T. Takahashi and K. Stewart, Anal. Chim. Acta, 214 (1988), 97.
[81] J. T. Vanderslice, G. R. Beecher and A. G. Rosenfeld, Anal. Chem., 56 (1984), 268.
[82] V. P. Andreev and M. I. Khidekel, Anal. Chim. Acta, 278 (1993), 307.
[83] J. H. van den Berg, R. S. Deelder and H. G. M. Egberink, Anal. Chim. Acta, 114 (1980), 91.
[84] S. D. Kolev and E. Pungor, Anal. Chim. Acta, 185 (1986), 315.
[85] S. D. Kolev and E. Pungor, Anal. Chem., 60 (1988), 1700.
[86] J. Ruzicka and E. H. Hansen, Anal. Chim. Acta, 99 (1978), 37.
[87] H. C. Smit and P. J. H. Scheeren, Anal. Chim. Acta, 215 (1988), 143.
[88] A. Berthod, Anal. Chem., 63 (1991), 1879.
[89] J. F. Tyson and A. B. Idris, Analyst, 106 (1981), 1125.
[90] J. M. Reijn, H. Poppe and W. E. Van der Linden, Anal. Chem., 56 (1984), 943.
[91] D. Betteridge, C. Z. Marczewski and A. P. Wade, Anal. Chim. Acta, 165 (1984), 227.
[92] C. D. Crowe, H. W. Levin, D. Betteridge and A. P. Wade, Anal. Chim. Acta, 194 (1987), 49.
[93] P. D. Wentzell, M. R. Bowdridge, E. L. Taylor and C. MacDonald, Anal. Chim. Acta, 278 (1993), 293.
[94] H. Poppe, Anal. Chim. Acta, 114 (1980), 59.
[95] I. C. van Nugteren-Osinga, M. Bos and W. E. Van der Linden, Anal. Chim. Acta, 214 (1988), 77.
[96] K. R. Harris, J. Solut. Chem., 20 (1991), 595.
[97] S. H. Brooks, D. V. Leff, M. A. Hernandez-Torres and J. G. Dorsey, Anal. Chem., 60 (1988), 2737.
[98] J. A. C. Broekaert and F. Leis, Anal. Chim. Acta, 109 (1977), 73.
[99] G. D. Clark, D. A. Whitman, G. D. Christian and J. Ruzicka, CRC Critical Reviews in Anal. Chem., 21 (1990), 357.
[100] J. Ruzicka and W. Lindberg, Anal. Chem., 64 (1992), 537A.
[101] D. Betteridge, T. J. Sly, A. P. Wade and D. G. Porter, Anal.
Chem., 58 (1986), 2258.
[102] P. D. Wentzell, M. J. Hatton, P. M. Shiundu, R. M. Ree, A. P. Wade, D. Betteridge and T. J. Sly, J. Autom. Chem., 11 (1989), 227.
[103] P. M. Shiundu, A. P. Wade and S. B. Jonnalagadda, Can. J. Chem., 68 (1990), 1750.
[104] P. M. Shiundu, P. D. Wentzell and A. P. Wade, Talanta, 37 (1990), 329.
[105] G. D. Clark, G. D. Christian, J. Ruzicka, G. F. Anderson and J. A. van Zee, Anal. Instr., 18 (1989), 1.
[106] G. D. Marshall and J. F. van Staden, Anal. Instr., 20 (1992), 79.
[107] N. C. Sundin, Automated Mapping of Response Surfaces for Continuous Flow Methods of Analysis, Masters Thesis, Dalhousie University, 1992.
[108] P. M. Shiundu, Automated Methods Development in Flow Injection Analysis, Doctoral Thesis, University of British Columbia, 1991.

Chapter 2
Peak Shape Analysis I

The more complex a system becomes, the more open it is to total breakdown.
LEWIS MUMFORD

Flow injection systems usually produce output in the form of a transient signal whose shape resembles a skewed Gaussian. This is also true of other analytical methods such as chromatography and graphite furnace atomic absorption spectrometry. The magnitude of the response, its height or area parameter (also, peak width for flow injection analysis), provides a measure of analyte concentration. With signals such as these, however, additional information exists within the shape of the peak, or more accurately, within parameters that measure aspects of the peak’s shape. In chromatography, for example, it is well known that physicochemical information such as rates of adsorption [1, 2] and diffusion [3] can be derived from the moments of a peak. Such information facilitates quantitative evaluation and optimization of column efficiency and resolution [4, 5]. Other peak parameters have been employed for solute identification [6].
The importance of peak shape in chromatography has led to the proposal of a standard set of peak shape parameters collectively referred to as “chromatographic figures of merit” [4]. In graphite furnace atomic absorption spectrometry, peak shape analysis has been used for evaluation of atomization characteristics in samples with a large matrix/sample ratio [7, 8]. Other workers [9, 10] have shown that the position and shape of the signal can provide insight into the accuracy of a determination.

In contrast, only the standard deviation of the peak has been exploited in FIA for data interpretation so far [11, 12]: it was proposed as an alternative measure of dispersion to the empirical dispersion number for manifold comparison and evaluation. This is somewhat surprising given that peak shape has long served as diagnostic information in FIA [13, 14], alerting the operator to faulty or non-optimal analyzer operation. For example, bifurcated peaks can indicate that insufficient reagent is available for reaction or that a competing reaction is dominating due to departure from the desired chemical environment [13].

The limitation of quantitation by peak height alone is evident in the literature. While developing a method for sulfide determination with an automated flow injection system, Kester et al. [15] reported that precipitation, which occurred at high reagent concentrations, hampered their attempts at automated optimization of chemical conditions with a simplex algorithm: the simplex, set to seek conditions of highest absorbance, drove the system to conditions that favored precipitation. Since precipitation resulted in excessively tailing peaks, this problem could have been detected by measuring the retarded decay of the peak.
While it is true that a symptom, say, a bifurcated peak, does not always enable diagnosis of a unique physicochemical problem (the cause may be insufficient reagent, pH change due to inadequate buffering, a combination of both, etc.), at least the presence of a problem and its type could be identified. This is certainly preferable to no information at all. Moreover, knowledge about the chemistry employed and/or the operational characteristics of the analyzer can then be exploited to eliminate certain possibilities. For simple systems in which only a limited number of faults may exist, unambiguous fault identification may be achieved. Clearly, information is embedded within the shape of the peak and peak shape analysis should be employed routinely on automated analyzers for diagnostic purposes.

2.1 OUTLINE OF PEAK SHAPE ANALYSIS STRATEGY

Diagnosis via peak shape analysis involves post-detection data processing and so problems are not anticipated, but simply recognized and accounted for. A strategy to implement such diagnostics on automated flow injection systems is outlined in Figure 2.1. The problem can be divided into two parts: i) fault detection and ii) fault identification/classification. The first step serves to screen the peaks: there is no need to perform the second when the peak is acceptable. This is desirable on analyzers employed for routine analysis since the occurrence of bad peaks should be relatively infrequent; otherwise one should consider another wet analytical method or redesign the method or analyzer altogether. Given its role, it is important that the fault detection procedure be fast yet reliable. The second step is invoked only when an anomalous peak is flagged. For this work, a distinction is made between fault identification and fault classification on the basis of state-of-knowledge.
The former involves the unequivocal association of a fault with a cause and hence, information that is unique, sufficient and accurate must be available. When the available information fails to meet these criteria, only the type of problem can be ascertained, which leads to the notion of fault classification. Effectively, this amounts to identifying the various types of peak shape, each associated with a class of problems.

[Figure 2.1: flow chart in which FAULT DETECTION is followed by FAULT IDENTIFICATION AND/OR CLASSIFICATION]
Figure 2.1 Schematic of the approach employed in a descriptor-based strategy for peak qualification.

In either case, a preselected set of descriptors containing the relevant information is first extracted. This can be viewed as a data reduction procedure designed to remove unwanted information (i.e. noise). The aim is to obtain a concise and more meaningful description of the data from which the desired information can be accessed more readily. The set of descriptors proposed for a given application should cover all known symptoms, or faults if possible. They can be designed with the intent to identify specific peak attributes which arise as a consequence of a problem (e.g. tailing due to formation of precipitates in the sample zone), or they can be designed to be completely general, reflecting qualities in peak shape without regard to any particular problem. The design of the former type relies heavily on the experience (and imagination) of the analyst to ensure unique representation. For the latter type, it is difficult to predict a priori just which descriptors will distinguish one peak shape from another. It is also possible that more than one (primitive) descriptor will be required, but what combination of descriptors (i.e. derived descriptors) is best may not be obvious.
Consequently, powerful mathematical methods such as statistical pattern recognition or neural networks should be used to expose the relevant, empirical, discriminatory characteristics through a training process. Once the pertinent relationships have been established, the task reduces to polling all descriptors (primitive and derived) to locate deviant descriptor values. The end result is that a practical decision regarding the quality of analysis or instrumental/physicochemical conditions during analysis can then be made (e.g. “switch to quantitation by peak width”, “the extent of the problem calls for a full system shutdown”, etc.).

2.1.1 Peak Capture

The first practical consideration that must be addressed is how the experimental peak is to be acquired. A popular method used in chromatography and other forms of flow analysis is slope sensing, in which the start of a peak is acknowledged by the algorithm when the slope of a finite number of points in a “sliding window” exceeds a preselected threshold. This method, however, is very sensitive to noise. In FIA, peak capture is facilitated by one of the three principles of FIA: reproducible timing. That is, the residence time of the peak depends only on the pump speed, geometry and dimensions of the manifold. Hence, one can always compute or experimentally measure the time from injection to detection and set the window for data acquisition accordingly. The computational method must be employed if different flow speeds are to be used (e.g. in optimization studies).

2.1.2 Fault Detection

The notion of a fault, within the context of peak shape analysis, implies that a deviation in the shape of the peak from that which is obtained under proper analyzer operating conditions (i.e. a reference) has occurred.
Consequently, fault detection reduces to quantifying the extent of this deviation, in which case some measure of dissimilarity (or similarity, depending on the viewpoint) between a given peak and the reference is necessary. The product-moment correlation coefficient, or simply correlation coefficient, is commonly used to measure the degree of association between two sets of data. It has also been used in pattern recognition to signify the degree of similarity [16]. This provides one method of fault detection: to decide whether a given (unknown) peak is anomalous or not, one can simply compute the correlation coefficient r between it and the reference:

$$ r = \frac{\sum_{k} (x_{k,\mathrm{unk}} - \bar{x}_{\mathrm{unk}})(x_{k,\mathrm{ref}} - \bar{x}_{\mathrm{ref}})}{\sqrt{\sum_{k} (x_{k,\mathrm{unk}} - \bar{x}_{\mathrm{unk}})^{2} \, \sum_{k} (x_{k,\mathrm{ref}} - \bar{x}_{\mathrm{ref}})^{2}}} \tag{2.1} $$

where $x$ is the variate and $\bar{x}$ is its mean value. The subscripts “unk” and “ref” signify the “unknown” and “reference” peaks, respectively. As vectors, both the unknown and reference data sets are normalized to unit length. The reference should be based on peaks obtained from the analyzer while an expert verifies that it is operating in an acceptable manner. Since each peak will differ (slightly) due to noise, a large number of peaks should be acquired, from which an average reference peak can be computed. The lower limit of the correlation coefficient may then be estimated and a threshold, some multiple of the lower limit, selected in accordance with performance specifications. Any experimental peak whose value of r, with respect to the reference, falls significantly below this threshold is thus deemed anomalous. The lower the threshold, the greater the chance of accepting an anomalous peak, resulting in a decrease in overall analytical reliability. The higher the threshold, the greater the chance of rejecting suitable peaks, resulting in a need to perform more replicates. The optimal balance can only be determined through extensive experimentation.
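The screening step described above can be sketched in a few lines. The following is a minimal illustration, assuming synthetic Gaussian peaks and a hypothetical threshold of 0.95; the function names, the peak shapes and the threshold value are not taken from this work and serve only to show the mechanics of comparing a peak with a reference.

```python
import numpy as np


def correlation_with_reference(peak, reference):
    """Product-moment correlation coefficient r between a peak and a reference."""
    x = np.asarray(peak, dtype=float)
    y = np.asarray(reference, dtype=float)
    if x.shape != y.shape:
        raise ValueError("peak and reference must have the same number of points")
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.dot(dx, dy) / np.sqrt(np.dot(dx, dx) * np.dot(dy, dy)))


def is_anomalous(peak, reference, threshold):
    """Flag a peak whose correlation with the reference falls below the threshold."""
    return correlation_with_reference(peak, reference) < threshold


# Synthetic data (illustrative only): a Gaussian stands in for the reference peak.
t = np.linspace(0.0, 60.0, 241)
reference = np.exp(-0.5 * ((t - 20.0) / 4.0) ** 2)
scaled = 1.7 * reference + 0.05  # same shape, different height and baseline
bifurcated = (np.exp(-0.5 * ((t - 15.0) / 3.0) ** 2)
              + np.exp(-0.5 * ((t - 30.0) / 3.0) ** 2))

THRESHOLD = 0.95  # hypothetical; in practice estimated from replicate reference peaks
r_scaled = correlation_with_reference(scaled, reference)
r_bifurcated = correlation_with_reference(bifurcated, reference)
print(r_scaled, is_anomalous(scaled, reference, THRESHOLD))
print(r_bifurcated, is_anomalous(bifurcated, reference, THRESHOLD))
```

Because r is unchanged by scaling or offsetting either data set, a peak that differs from the reference only in height or baseline is accepted, whereas the doubled peak is flagged; the test responds to shape, not to concentration.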
It is worthwhile to view the correlation coefficient, as it is used here, as a binary threshold logic unit [17] descriptor. The presence of noise can greatly affect the estimate of r because as the variability of the data increases, the denominator in (2.1) increases and consequently, the value of r decreases. However, the peak itself (i.e. the true signal) may still be acceptable. To minimize the effects of noise, it is important to filter the data prior to computation of r. Methods for automatic near-optimal filtering of peak data are presented in Chapter 4. Finally, by keeping track of the short-term mean correlation with the reference over time, long-term changes can be noticed. This is important in detecting drift (e.g. due to aging) and fouling problems of electrodes.

2.1.3 Peak Shape Descriptors

Peak shape parameters serve as a reduced description of the entire peak and must be designed, singly or in combination, to retain the desired information. They may be classified according to their manner of implementation (model based or numerical), the form of representation (binary, qualitative, quantitative), the type of information obtained (e.g. peak asymmetry, peak bifurcation), and whether they are primitive or derived (i.e. a composite of two or more primitive descriptors). For the work presented in this dissertation, only quantitative descriptors are considered. The ideal descriptor is absolutely specific, invariant to noise and easy to compute. A tabulation of some descriptors which have been used in peak shape analysis is given in Table 2.1. Note that work in chromatography has been extensive in this area and only the most complete reference is given.

A popular approach for describing peak-shaped transients is to treat the profile as a distribution function and describe it with statistical moments.
This is because moments completely specify any peak [18], do not require assumptions regarding peak shape [3], are easy to interpret, and can be linked to fundamental processes (in chromatography [19] and FIA [20, 21]). Descriptors analogous to some moments have also been devised. For example, time of peak median, peak width at height a and empirical asymmetry correspond to time of peak maximum, standard deviation and skewness, respectively. Other descriptors have been designed to measure the time of occurrence of critical peak features and the decay rate of the peak.

The process of descriptor extraction from peak data has followed two approaches: numerical evaluation and curve-fitting. The former is the simpler and more general of the two. However, with few exceptions, the accuracy of descriptor values obtained via this approach strongly depends on the quality of the data. For example, it has been shown that numerical evaluation of statistical moments depends on digitizing rate, integration interval and signal-to-noise ratio [22]; high order moments, i.e. third and above, are especially prone to inaccuracy since they depend increasingly on points furthest from the center of gravity. Hence, the quality of information must often be enhanced first via filtering. To circumvent such problems, some workers have chosen to fit the peak profile to a mathematical model, from which the relevant parameters could then be extracted.
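Numerical evaluation of the moments, the first of the two approaches above, can be sketched as follows. This is a minimal illustration assuming a baseline-corrected, noise-free peak and simple trapezoidal integration; the function names and the two synthetic profiles (a Gaussian and a gamma-shaped tailing peak) are hypothetical and are not data from this work.

```python
import numpy as np


def trapezoid(y, t):
    """Trapezoidal integration of samples y taken at times t."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))


def peak_moments(t, y):
    """Area, centroid, standard deviation, skewness and excess of a peak."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    area = trapezoid(y, t)                    # zeroth moment
    centroid = trapezoid(t * y, t) / area     # first normalized moment

    def central(n):
        # nth central moment, normalized by the peak area
        return trapezoid((t - centroid) ** n * y, t) / area

    variance = central(2)
    return {
        "area": area,
        "centroid": centroid,
        "std": float(np.sqrt(variance)),
        "skewness": central(3) / variance ** 1.5,
        "excess": central(4) / variance ** 2 - 3.0,
    }


t = np.linspace(0.0, 60.0, 601)
symmetric = np.exp(-0.5 * ((t - 30.0) / 4.0) ** 2)  # Gaussian, no tailing
tailed = t ** 2 * np.exp(-t / 3.0)                  # gamma-shaped, tailing peak

m_sym = peak_moments(t, symmetric)
m_tail = peak_moments(t, tailed)
print(m_sym["skewness"], m_tail["skewness"])
```

The higher moments weight the points far from the centroid by the third and fourth power of their distance, which is why noise or a truncated integration window in the tail degrades them first.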
Table 2.1 Some time-domain descriptors used for peak characterization

Descriptor                                       Field (a)    Reference
Primitive
  peak height                                    C, F         [4, 13]
  peak area                                      C, F         [4, 13]
  (time of) peak mean (peak centroid)            C            [4, 8, 10]
  (time of) peak median                          C            [4]
  (time of) peak maximum (b)                     C            [4, 8, 10]
  time of peak appearance                        GA           [7, 8, 10]
  time to one-half of peak maximum               GA           [10]
  time to e⁻¹ of peak maximum                    GA           [10]
  time to e⁻² of peak maximum                    GA           [10]
  (time of) peak end                             GA           [10]
  standard deviation                             C, F, GA     [4, 8, 12]
  peak width at height a                         C, F, GA     [4, 7, 13]
  empirical asymmetry                            C            [4]
  nth central statistical moment (n > 2)         C            [4]
Derived
  skewness                                       C, F, GA     [4, 8]
  specific asymmetry                             C            [4]
  kurtosis or excess                             C, F, GA     [4, 8, 24]
  theoretical plates                             C            [4]
  ratio of peak area to peak height              GA           [7]

a) C = chromatography, F = flow injection analysis, GA = graphite furnace atomic absorption spectrometry
b) Equivalent to retention time for chromatography or residence time for flow injection analysis

The extraction of descriptors from a peak model, also referred to as a shape function [4, 23], is desirable for the following reasons.

i) The procedure is more robust to noise. Indeed, curve-fitting is perhaps the best form of smoothing.
ii) Analytical expressions may be derived for descriptors that are not model parameters [4]. Not only are computations quicker but one can increase the accuracy of the estimation of such descriptors if they were susceptible to noise. This involves finding expressions which relate these descriptors to model parameters that could be obtained with high accuracy and precision [5].

Many shape functions have been used, depending on the type of data [6, 24]. The effectiveness of this approach, however, depends on how well the model can account for the physical factors contributing to the signal. A poorly chosen model can introduce significant bias. The consequences of using the wrong model for peak characterization are well recognized [4, 25, 26].
Furthermore, the models proposed thus far are non-linear (i.e. in the parameters). Hence, the least squares solution cannot be determined analytically and a search must be conducted over the parameter space, a procedure subject to location of false minima. In addition, starting values of the parameters must first be estimated; the usual approach has been to employ simple parameter search routines [3] but this increases computational requirements. For the type of application presented here, namely diagnostics, a single model cannot be expected to cope with all problem peaks. Furthermore, a model that fits the data from one analyzer may not be adequate for another. The use of complex non-linear models is burdensome when every analyzer must be treated case by case.

It is apparent that the difficulties encountered with the model-based approach lie in the type of models used. Often, one can obtain a very good approximation to a non-linear function with one which is linear. Use of linear models simplifies the least squares problem immensely. For peak shape identification, an empirical model is justifiable since the parameters of the model are not to be linked with fundamental processes in FIA. A convenient class of models is the polynomials, which take the following form:

$$ f(t) = \sum_{n} F_{n}\,\phi_{n}(t) \tag{2.2} $$

in which $\phi_{n}(t)$ are polynomial functions of order $n$, and $F_{n}$ are the corresponding parameters. Equation (2.2) implies a series of models successively increasing in complexity. A series of models overcomes the lack of versatility in using only a single model. Use of the parameters of this type of model as (general) descriptors for peak shape analysis is investigated in the next chapter. Further details on this approach are deferred until then.
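To illustrate why a model that is linear in its parameters is convenient, the sketch below fits a polynomial series to a synthetic peak by ordinary least squares, which needs no starting values and no search. It is only an illustration under stated assumptions: Legendre polynomials from NumPy are used as a stand-in basis (they are not the polynomial families examined in this work), and the function name and the synthetic peak are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def fit_orthogonal_series(t, y, order):
    """Least squares coefficients F_0..F_order of y in a Legendre basis."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    # Map the acquisition window onto [-1, 1], the natural Legendre interval.
    u = 2.0 * (t - t[0]) / (t[-1] - t[0]) - 1.0
    basis = legendre.legvander(u, order)      # column n holds phi_n(u)
    coeffs, *_ = np.linalg.lstsq(basis, y, rcond=None)
    return coeffs, basis @ coeffs


# Synthetic tailing peak (illustrative only), normalized to unit height.
t = np.linspace(0.0, 60.0, 241)
peak = t ** 2 * np.exp(-t / 4.0)
peak /= peak.max()

rms_by_order = {}
for order in (3, 7, 11):
    coeffs, fitted = fit_orthogonal_series(t, peak, order)
    rms_by_order[order] = float(np.sqrt(np.mean((peak - fitted) ** 2)))
print(rms_by_order)
```

Raising the order yields the successively more complex models implied by the series, and the coefficient vector returned for a given order is the candidate set of general shape descriptors.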
2.2 PATTERN RECOGNITION

As stated above, descriptors which describe general aspects of peak shape often require powerful data analysis strategies to uncover practical relationships which ideally would uniquely link a peak shape with a range of values for a single descriptor or a pattern (see below) for a composite descriptor. The latter represents the general case. In this instance, the data will be multivariate and a multivariate strategy is thus preferred if the relationships are to be obtained effectively and efficiently. A statistical pattern recognition approach has been adopted for this work but other methods in the artificial intelligence arsenal could also have been selected (e.g. neural networks, expert systems). This section describes the tools used to analyze and evaluate the polynomial model-based general descriptors to be developed in the next chapter.

Pattern recognition is concerned with the discrimination or classification of a set of things, events or processes [27], which collectively will be referred to as objects to simplify discussion. In this dissertation, the objects of interest are FIA peaks and in terms of diagnostics they may fall into such categories as proper, bifurcated, skewed, etc. Pattern recognition is natural to humans and we perform it extremely well when the data is of limited dimensionality. The extraction of descriptors from an object is in fact the first step in the solution of a pattern recognition problem. These are then used to identify two or more categories. As mentioned above, there is no guarantee that the descriptors proposed for a particular identification or classification problem will be the most suitable, nor will they all be equally useful. Once a given set of descriptors has been proposed, however, techniques have been developed to extract a better subset of descriptors from the available set.
Typically, these determine the acceptance of a descriptor based on its ability to discriminate between different categories of objects. Various algorithms are then available to handle the pattern recognition problem with these descriptors. They differ in the way the descriptors are treated (i.e. the algorithm) and in the measurement of “similarity” between objects. One of the more popular of these, principal components analysis, is used in this dissertation.

2.2.1 Data Representation and Structure

A set of M descriptors extracted from an object forms a pattern which is represented in mathematical terms as a vector:

$$ \mathbf{x} = [\, x_{1} \;\; x_{2} \;\; \cdots \;\; x_{M} \,]^{T} \tag{2.3} $$

where $x_{1}$, $x_{2}$, etc. represent individual descriptor values or variables and the superscript T denotes transpose.¹ The term “pattern vector” is associated with it and there is one formed for each object. Geometrically, $\mathbf{x}$ may be viewed as a point in M-dimensional space. With N objects, the pattern vectors are commonly arranged such that each one occupies a single row in a matrix:

$$ \mathbf{X} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1M} \\ \vdots & \vdots & & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{NM} \end{bmatrix} \tag{2.4} $$

A human analyst, in analyzing bivariate data sets, can often gain valuable insight into the association between objects (and variables) by examining the structure of the data via a scatter plot [28]. The proximity of objects when plotted in the plane of the descriptors can be indicative of the similarity or difference between them. Hence, one set of similar objects should cluster in one part of this plane while another set, different from the first, should cluster in another part. In other words, objects with similar patterns will group in a particular region of the descriptor space. The distance between objects is one of many possible measures of similarity. Thus, the variation between patterns carries the information necessary for discrimination. This has intuitive appeal and is precisely the assumption under which many pattern recognition techniques operate.
Object classification then reduces to a partitioning of this space. While such a task is easily accomplished in two or even three dimensions by humans, one often has to deal with scatter plots of (much) higher dimensionality, in which case adequate display is difficult and a human analyst becomes ineffective. This is one of the problems which pattern recognition methods were developed to solve. Lastly, it should be stressed that the information sought must be implicitly contained within the data matrix. No pattern recognition algorithm, however powerful, can extract information that is not there.

¹ Vectors will follow the column convention.

2.2.2 Data Preprocessing

Rarely is the raw data itself suitable for direct input into a pattern recognition method. There may be missing data, differences in measurement scale, redundant variables or outliers. Thus, some form of data preprocessing is necessary. Data preprocessing involves any transformation of the original data [29]. This definition is rather broad and encompasses such functions as filling in missing data, removal of constant descriptors, numerical transformations and descriptor weighting [30]. The last two are touched upon in this section.

One common preprocessing task involves the removal of any inadvertent weighting of descriptors due to differences in the magnitude of measurement scales. For example, a descriptor whose values range from 0 to 1000 units will dominate those whose values range from 0 to 1. This problem arises directly from the fact that variation translates to information content. Each descriptor can be placed on an equal footing by scaling. The most popular technique is autoscaling. Here, descriptors are transformed such that each has a mean of zero and a standard deviation of one (i.e. each is mapped to standard units or z-scores).
Mathematically, this operation is given by [30]:

$$ x'_{km} = \frac{x_{km} - \bar{x}_{m}}{\sqrt{\dfrac{\sum_{j=1}^{N} (x_{jm} - \bar{x}_{m})^{2}}{N-1}}} \tag{2.5} $$

where $x'_{km}$ is the autoscaled descriptor for the kth object and mth descriptor, $x_{km}$ is the raw descriptor, $\bar{x}_{m}$ is the mean value of the mth descriptor and N is the number of objects. Other scaling methods are available [30-32] and the choice will depend on the particular pattern recognition problem. Though not suitable for all types of data, autoscaling is generally recommended in the absence of available information that would preclude its use [30].

Descriptor weighting is a way of assigning importance to variables in an attempt to improve classification. Of course, this means that suitable information (i.e. representative objects of known category) must already be available so that meaningful weights can be assigned. An appropriate measure of descriptor importance would be its discriminating ability in terms of category separation. One such measure is the Fisher weight. For a given variable x and two categories, denoted as a and b, the Fisher weight $w_{F}(a, b)$ is defined as the difference in the category means over the sum of the variances of each category:

$$ w_{F}(a, b) = \frac{|\bar{x}_{a} - \bar{x}_{b}|}{s_{a}^{2} + s_{b}^{2}} \tag{2.6} $$

where $\bar{x}$ and $s^{2}$ denote the mean and variance of x within the category indicated by the subscript. The Fisher weight can assume values from 0 (no separation) to $\infty$ (infinite separation). A pictorial representation is given in Figure 2.2. Where more than two categories exist, an average weight can be computed over all pairwise combinations of categories:

$$ \bar{w}_{F} = \frac{1}{N_{C}} \sum_{i=1}^{N_{C}} w_{F}(i) \tag{2.7} $$

where $N_{C} = K(K-1)/2$, with K being the number of categories. An application of the Fisher weight to measure discrimination is described in Chapter 3. A non-parametric equivalent to the Fisher weight has been proposed and used by Soulsbury et al. [33]: it uses the median and asymmetric spread quantitators.

As stated above, methods exist for pruning a set of descriptors down to a smaller but more discriminating set. The Fisher weight can also be used in this capacity.
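The two preprocessing operations above, autoscaling and the Fisher weight, can be sketched as follows. This is a minimal illustration with made-up numbers; the function names are hypothetical, the sample variance (N - 1 denominator) is assumed for the category variances, and the Fisher weight follows the wording of the text (absolute difference of the means over the sum of the variances), whereas the closely related Fisher ratio squares the difference.

```python
from itertools import combinations

import numpy as np


def autoscale(X):
    """Scale each column (descriptor) of X to zero mean and unit standard deviation."""
    X = np.asarray(X, dtype=float)
    # ddof=1 gives the N - 1 denominator; constant descriptors must be removed first.
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def fisher_weight(xa, xb):
    """Difference in category means over the sum of the category variances."""
    xa = np.asarray(xa, dtype=float)
    xb = np.asarray(xb, dtype=float)
    return float(abs(xa.mean() - xb.mean()) / (xa.var(ddof=1) + xb.var(ddof=1)))


def mean_fisher_weight(groups):
    """Average Fisher weight over all K(K - 1)/2 pairs of categories."""
    pairs = list(combinations(groups, 2))
    return sum(fisher_weight(a, b) for a, b in pairs) / len(pairs)


# Two descriptors on very different scales (illustrative values only).
X = np.array([[1.0, 200.0], [2.0, 900.0], [3.0, 400.0], [6.0, 100.0]])
Z = autoscale(X)

# One descriptor measured on three categories of three objects each.
a, b, c = [1.0, 2.0, 3.0], [5.0, 6.0, 7.0], [9.0, 10.0, 11.0]
w_ab = fisher_weight(a, b)
w_mean = mean_fisher_weight([a, b, c])
print(Z.mean(axis=0), Z.std(axis=0, ddof=1), w_ab, w_mean)
```

Note that the weight is computed one descriptor at a time, so it ranks descriptors individually and carries no information about correlation between them.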
The problem with using this measure alone in selecting descriptors is that certain descriptors may be highly correlated. Therefore, it cannot be used, in general, to find the most discriminating combination of descriptors.

Figure 2.2 Frequency distributions of two categories a and b for variable x. The Fisher weight uses the distance between the means of the two distributions scaled by the sum of the variances of both distributions as a measure of discrimination.

2.2.3 Principal Components Analysis

The power of (bi- and tri-variate) scatter plots for analyzing data was mentioned above. If the dimensionality of the data can be reduced to two or three, the power of human pattern recognition may again be exploited to extract the relevant information. The trick, of course, is to accomplish this without significant loss of information. In doing so, descriptors that are non-discriminating may be identified and removed from analysis. A significant advantage of such an approach is that the underlying structure in a set of inter-related variables can be explored without the constraint of any preconceived structure. Hence, the procedure requires no supervision.

Mathematically, dimensional reduction (of the data matrix) is carried out by projection. To illustrate the approach, suppose the objects initially resided in two dimensions (i.e. in a plane) as illustrated in Figure 2.3. A one-dimensional representation may be derived by projecting them onto a line. In theory, an infinite number of lines exist within the plane and any one could be used. Two such lines, $l_{1}$ and $l_{2}$, are shown in the figure. Clearly, a projection onto $l_{1}$ preserves the integrity of the data structure more so than a projection onto $l_{2}$: the separation of the data into two clusters is still evident in $l_{1}$. Note that a projection on the axis of one descriptor results in the complete removal of the other from the analysis.
Line $l_{1}$, as shown, has the desirable property that it preserves the maximal variation, i.e. information, among the data along its direction (and minimal variation around it). This in fact defines a principal component and it forms the basis of a technique known as principal components analysis (PCA). It is evident that a principal component is formed from a linear combination of the original variables:

$$ l_{1} = \alpha x_{1} + \beta x_{2} \tag{2.8} $$

where $\alpha$ and $\beta$ are the coefficients, whose magnitudes indicate the relative significance of the associated variables. Principal components are sometimes referred to as latent variables.

Figure 2.3 A graphical interpretation of principal components analysis. Two-dimensional data (axes $x_{1}$ and $x_{2}$) are projected onto a line $l_{1}$ such that the maximum variation in the data is preserved; a second line, $l_{2}$, is shown for comparison.

This procedure can be generalized to M dimensions, in which case a principal component is a linear combination of M variables. A second principal component can be constructed if desired. It is chosen to be orthogonal to the first principal component and to explain as much as possible of the variation remaining after extraction of the first. With two such components, a plane is defined. Subsequent production and inclusion of a third principal component allows a three-dimensional space to be defined. It is apparent that the entire procedure amounts to an orthogonal rotation of the coordinate system such that the original variables are transformed into a new set of uncorrelated variables. As many components can be constructed as there are variables, in which case no information is lost. However, in many applications, the majority of the variation can be explained by the first 2 or 3 components; the rest can usually be dismissed as “noise”. Thus, an apparent reduction in the number of descriptors is realized. Clearly, PCA is adept at modeling the data structure.
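The rotation described above can be sketched numerically. The example below is a minimal illustration assuming two synthetic clusters in a plane, loosely in the spirit of Figure 2.3; it obtains the components by diagonalizing the covariance matrix of the data with NumPy, which is one standard route and is not claimed to be the implementation used in this work. The function name and the random data are hypothetical.

```python
import numpy as np


def principal_components(X):
    """Eigen-decomposition of the covariance matrix of X (rows are objects)."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.cov(centered, rowvar=False)         # N - 1 denominator
    eigvals, eigvecs = np.linalg.eigh(cov)       # symmetric matrix, ascending order
    order = np.argsort(eigvals)[::-1]            # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    projected = centered @ eigvecs               # coordinates along the new axes
    return eigvals, eigvecs, projected


# Two synthetic clusters in the (x1, x2) plane (illustrative only).
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
cluster_b = rng.normal(loc=[4.0, 3.0], scale=0.3, size=(50, 2))
X = np.vstack([cluster_a, cluster_b])

eigvals, axes, projected = principal_components(X)
explained = eigvals / eigvals.sum()
print(explained)
```

In this constructed case nearly all of the variation lies along the first component, and the projection of the objects onto that single axis still separates the two clusters.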
Unfortunately, it does not guarantee that different objects would be discriminated [34]. However, a survey of the literature reveals that PCA often succeeds (see [35] and references therein).

The underlying mathematical theory of PCA can be found in the literature [35-38] and only a general account is given here. The distribution of information is contained within the variance-covariance (or the correlation) matrix, which is defined as:

$$C = \begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1M} \\ c_{21} & c_{22} & \cdots & c_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ c_{M1} & c_{M2} & \cdots & c_{MM} \end{pmatrix} \tag{2.9}$$

where $c_{ij}$ is the covariance between variables i and j, given by

$$c_{ij} = \frac{\sum_{k=1}^{N} (x_{k,i} - \bar{x}_i)(x_{k,j} - \bar{x}_j)}{N - 1} \tag{2.10}$$

with the sum taken over the N objects. Principal components are determined by diagonalizing this matrix; in other words, one solves the equation:

$$Cv = \lambda v \tag{2.11}$$

The vector $v$ (of which there are M) determined thus is called the eigenvector and corresponds to a principal component; $\lambda$ is called the eigenvalue and its relative magnitude corresponds to the amount of variation explained by the associated eigenvector. The coefficient of a variable for a principal component is known as the loading of the variable for that component [39]. The value of an object for a principal component is its score for that component [39].

PCA forms the basis of many pattern recognition strategies (e.g. SIMCA [40]) since it removes redundancy and maximizes information content. Indeed some authors regard it as a data preprocessing method [35]. Finally, provided that the distributions of the raw data are not excessively skewed, truncated or multimodal, most kinds of distributions can be used with PCA [41].

REFERENCES

[1] E. Grushka, Methods Protein Sep., 1 (1975), 161.
[2] S. D. Mott and E. Grushka, J. Chromatogr., 148 (1978), 305.
[3] S. N. Chesler and S. P. Cram, Anal. Chem., 45 (1973), 1354.
[4] J. P. Foley and J. G. Dorsey, Anal. Chem., 55 (1983), 730.
[5] W. W. Yau, Anal. Chem., 49 (1977), 395.
[6] S. D. Mott and E. Grushka, J. Chromatogr., 126 (1976), 191.
[7] M. Michaelis, W. Wegscheider and H. M. Ortner, J. Res. Nat.
Bur. Stand., 93 (1988), 467.
[8] W. Wegscheider, L. Jancar, M. T. Phe, M. R. A. Michaelis and H. M. Ortner, Chemom. Intell. Lab. Syst., 7 (1990), 281.
[9] W. B. Barnett and M. M. Cooksey, At. Absorpt. Newsl., 18 (1979), 61.
[10] J. M. Harnly, J. Anal. At. Spectrom., 3 (1988), 43.
[11] M. A. Hernandez-Torres, M. G. Khaledi and J. G. Dorsey, Anal. Chim. Acta, 201 (1987), 67.
[12] S. H. Brooks and J. G. Dorsey, Anal. Chim. Acta, 229 (1990), 35.
[13] J. Ruzicka and E. H. Hansen, Flow Injection Analysis, 2nd edn., John Wiley and Sons: New York, 1988.
[14] W. E. Van der Linden, Anal. Chim. Acta, 179 (1986), 91.
[15] M. D. Kester, P. M. Shiundu and A. P. Wade, Talanta, 39 (1992), 299.
[16] D. L. Massart, B. G. M. Vandeginste, S. N. Deming, Y. Michotte and L. Kaufman, Chemometrics: a Textbook, Elsevier: Amsterdam, 1988, pp. 372-375.
[17] Ibid., p. 395.
[18] E. Grushka, M. N. Myers, P. D. Schettler and J. C. Giddings, Anal. Chem., 41 (1969), 889.
[19] D. A. McQuarrie, J. Chem. Phys., 38 (1963), 437.
[20] K. R. Harris, J. Solut. Chem., 20 (1991), 595.
[21] S. H. Brooks, D. V. Leff, M. A. Hernandez-Torres and J. G. Dorsey, Anal. Chem., 60 (1988), 2737.
[22] S. N. Chesler and S. P. Cram, Anal. Chem., 43 (1971), 1922.
[23] P. A. Aarnio and H. Lauranto, Nucl. Instr. Meth. Phys. Res. A, 276 (1988), 608.
[24] H. C. Smit and P. J. H. Scheeren, Anal. Chim. Acta, 215 (1988), 143.
[25] R. E. Pauls and L. B. Rogers, Sep. Sci., 49 (1977), 625.
[26] J. J. Kirkland, W. W. Yau, H. J. Stoklosa and C. H. Dilks, J. Chromatogr. Sci., 15 (1977), 303.
[27] K. S. Fu, in Digital Pattern Recognition, 2nd edn., K. S. Fu (ed.), Springer-Verlag: New York, 1980, 1.
[28] W. S. Cleveland, The Elements of Graphing Data, Wadsworth: Monterey, 1985.
[29] M. A. Sharaf, D. L. Illman and B. R. Kowalski, Chemometrics, Wiley: New York, 1986, p. 188.
[30] Ibid., pp. 191-194.
[31] D. B. Sibbald, P. D. Wentzell and A. P. Wade, TrAC Trends Anal. Chem., 8 (1989), 289.
[32] I. H. Brock, O. Lee, K.
A. Soulsbury, P. D. Wentzell, D. B. Sibbald and A. P. Wade, Chemom. Intell. Lab. Syst., 12 (1992), 271.
[33] K. A. Soulsbury, A. P. Wade and D. B. Sibbald, Chemom. Intell. Lab. Syst., 15 (1992), 87.
[34] M. A. Sharaf, D. L. Illman and B. R. Kowalski, Chemometrics, Wiley: New York, 1986, p. 239.
[35] M. A. Sharaf, D. L. Illman and B. R. Kowalski, Chemometrics, Wiley: New York, 1986.
[36] E. R. Malinowski and D. G. Howery, Factor Analysis in Chemistry, Wiley: New York, 1980.
[37] I. T. Jolliffe, Principal Components Analysis, Springer-Verlag: New York, 1986.
[38] D. L. Massart, B. G. M. Vandeginste, S. N. Deming, Y. Michotte and L. Kaufman, Chemometrics: a Textbook, Elsevier: Amsterdam, 1988.
[39] Ibid., p. 352.
[40] S. Wold, Pattern Recognition, 8 (1976), 127.
[41] D. Childs, The Essentials of Factor Analysis, 2nd edn., Cassell: London, 1990.

Chapter 3
Peak Shape Analysis II

“... whenever someone thinks of a time domain approach to a problem, one should consider taking a Fourier transform of the time record of the sample and use that”
EMANUEL PARZEN, “Informal Comments on the Uses of Power Spectrum Analysis”

As was discussed in the last chapter, descriptors used to quantify aspects of peak shape have traditionally been developed in the time domain. Duality between time and frequency means that descriptors may also be derived from an equivalent representation of the signal in the frequency domain. In fact, this argument can be extended to other transforms and hence to other domains that can represent the signal equally well or, preferably, with greater efficiency. Such a representation, or spectrum as it is generally known, may be used either directly (in which case an array of descriptors is available) or as a platform for deriving descriptors within that domain. Only the former type of descriptor is explored here. Because such descriptors are not designed with selectivity for any particular peak attribute, they are very general.
The transform approach is justified if the time-domain signal can be cast into a form in which i) a more facile extraction of the desired information is possible; ii) the signal information is contained within a few spectral components so that the volume of data is reduced; iii) enhanced generality/flexibility is provided; and/or iv) the effects of noise are reduced. This chapter is concerned with the use of transforms involving orthogonal functions and the material falls under the realm of spectrum analysis.

The advent of digital technology has brought orthogonal transform methods to the forefront for the processing of digital signals [1]. They have found use in approximation [2], data compression [1], filter design and filtering [3], and signal identification [4]. Indeed, signal approximation with regard to smoothing of (experimental) data is the focus of work described in Chapter 4.

Various classes of orthogonal functions exist. These include trigonometric functions (e.g. sine, cosine and complex exponential [5]), polynomial functions (e.g. the classical Jacobi, Chebyshev, Gegenbauer, Hermite, Laguerre, and Legendre polynomials [6]) and rectangular functions (e.g. Haar and Walsh functions [1]). All examples listed have both continuous and discrete counterparts. The first two classes provide the functions which are studied here: namely, the trigonometric set associated with the Fourier transform, and the Gram and Meixner polynomials, which are discrete analogs of the classical Legendre and Laguerre polynomials, respectively.

The Fourier transform, implemented as the fast Fourier transform (FFT), is truly the workhorse of modern digital signal processing. It is already an established tool for signal analysis and identification and its ubiquity is underscored by the quotation from Parzen. The use of Fourier transform methods in analytical chemistry is summarized in a number of papers and monographs [7-10].
One noteworthy application in FIA is the work of van Nugteren-Osinga et al. [11]. These authors treated each component of the flow injection system as an “electrical filter” acting on the “sample plug signal”. With this approach, a Fourier transform could be used to characterize the “filter’s” impulse response, thereby providing a means to predict the form of the detected peak from a series of convolutions of component impulse responses.

Trigonometric functions, however, may not be the most appropriate, especially when the signal is of finite duration, and other functions may provide better performance [12], i.e. a comparable approximation with fewer terms. Interest in the use of other orthogonal sets, particularly orthogonal polynomials, dates back to the work of Wiener and Lee in the 1930s, and has been summarized in a book by Lee [13]. In analytical chemistry, Gram polynomials have been used to derive general equations for Savitzky-Golay smoothing filter weights for one-dimensional [14] and two-dimensional [15] data arrays. They have also been applied extensively to aid in the spectrophotometric determination of compounds of pharmaceutical interest [16-19]. Hassan and Loux used them to correct for spectral interferences in inductively coupled plasma atomic emission spectroscopy [20]. Debets et al. [21] used an expansion in Hermite polynomials to derive a peak separation quality criterion for chromatography based on the first two coefficients.

In principle, any arbitrary (square integrable) signal can be approximated by a weighted linear combination of orthogonal functions via a mathematical procedure known as generalized Fourier expansion [22]. The weights or coefficients, which together form the spectrum, represent the contribution of their associated functions to the make-up of the signal and form the information-bearing quantities.
In fact, if each function is thought of as a “shape”, then the weights quantify the contribution of that “shape” to the overall signal. This idea was previously recognized and exploited by Glenn [23] and later by Scheeren et al. [24]. Though the weights may not have any direct relationship to physical parameters, the generality of this method makes it convenient and attractive for analyzing physical signals for which no exact or practical mathematical expression is available [25].

A study into the representation of flow injection response curves by a generalized Fourier expansion in discrete orthogonal functions was conducted. The properties of the chosen functions were studied numerically and evaluated with simulated and real data. Their sensitivity to different FIA peak shapes and their robustness to noise were investigated multivariately using principal components analysis. Comparisons were made between the various transforms used.

3.1 THEORY

Orthonormal functions have long been used in approximation theory and signal representation [13]. The best known example is the Fourier series expansion in trigonometric functions. The usefulness of this representation is its natural association with frequency. Because of its prominent position in modern spectral analysis, the Fourier transform is inevitably a standard against which other transforms are measured. For these reasons, it is included in this study. Since digital data are being analyzed, the discrete Fourier transform (DFT) was used.

Another convenient class of functions is the orthogonal polynomials. In fact, the trigonometric set has been referred to as trigonometric polynomials [26]. For consistency with the DFT, discrete orthogonal polynomials were selected for study. Of the many available,¹ the Gram and Meixner polynomials were chosen. Gram polynomials have previously been used in this way although the expansions reported [16-19, 23] were fairly limited (i.e. 5 terms or fewer).
Thus, their potential was never fully determined. In contrast, Meixner polynomials have not been previously applied in analytical chemistry. However, they have recently been used successfully for adaptive control and system identification [27-29]. From a control theory perspective, FIA signals resemble the response from a stable dynamic system when subjected to a pulse [30]. This is apparent when one considers the FIA manifold as a system that reacts to and (exponentially) modifies the sample plug. Such a connection has recently been recognized by some FIA researchers [11, 31]. Meixner polynomials have been shown to be well-suited for approximating transients bounded by a decaying exponential [27-29]. According to proper curve-fitting practice, the basis functions chosen should be similar to the function being approximated so that an efficient approximation is achieved [32]. In this context, the Meixner polynomials are potentially an ideal choice.

¹ Consult Szegő [33] for a comprehensive list.

This section outlines the theory of signal representation by orthogonal functions and describes the three sets of functions used. The general theory will be presented in terms of continuous functions. To obtain the discrete forms, integration simply has to be replaced by summation and the continuous independent variable replaced by an index. We will further require that the function to be approximated is band-limited (i.e. has only a finite number of frequency components) and that sampling (conversion from continuous to discrete form) has been conducted according to the Nyquist theorem [34] such that aliasing is prevented: the discrete representation is then unique. Indeed, the signals constructed or experimentally obtained for this work are heavily over-sampled.

3.1.1 Definition of Orthogonal Functions

A set of functions $\{\phi_n(t)\}$ of order $n = 0, 1, 2, \ldots$
is said to be orthogonal over an interval $[a, b]$ if the inner (or scalar) product

$$\langle \phi_n(t), \phi_m(t) \rangle = \int_a^b \phi_n(t)\,\phi_m^*(t)\,dt = \alpha_{mn}\,\delta_{mn} \tag{3.1}$$

where $\phi_m^*(t)$ is the complex conjugate of $\phi_m(t)$, $\alpha_{mn}$ is a constant and $\delta_{mn}$ is the Kronecker delta operator,

$$\delta_{mn} = \begin{cases} 1 & \text{if } m = n \\ 0 & \text{otherwise} \end{cases} \tag{3.2}$$

When $\alpha_{nn} = 1$ for all n, the functions form an orthonormal set. If the functions are real, the conjugate operation * is dropped. Any orthogonal system can be made orthonormal by replacing $\phi_n(t)$ with

$$\varphi_n(t) = \frac{\phi_n(t)}{\sqrt{\langle \phi_n(t), \phi_n(t) \rangle}} \tag{3.3}$$

Equation (3.1) is defined for those sets $\{\phi_n(t)\}$ which exist in the Lebesgue space of square integrable functions $L^2$, i.e.

$$\|\phi_n(t)\|^2 = \int_a^b \phi_n(t)\,\phi_n^*(t)\,dt < \infty \tag{3.4}$$

The functions of an orthogonal set or any subset are necessarily linearly independent, that is

$$\sum_{n=0}^{N-1} f_n\,\phi_n(t) = 0 \tag{3.5}$$

only if the coefficients $f_n = 0$ for all n. Any linearly independent set of functions can be orthogonalized with respect to the scalar product (e.g. via the Gram-Schmidt orthogonalization method [35]). For example, orthogonalization of the set $\phi_n(t) = t^n$, $n = 0, 1, 2, \ldots$, over $[-1, 1]$ yields the Legendre polynomials.

3.1.2 General Linear Least Squares Approximation

Before a description of function (signal) representation by a generalized Fourier expansion is given, it is beneficial to first examine the process of linear least squares approximation, to which the former is intimately connected. The general form of the linear least squares model of order $(N - 1)$ is given by

$$\hat{f}(t) = \sum_{n=0}^{N-1} f_n\,\phi_n(t) \tag{3.6}$$

where $\hat{f}(t)$ is the approximation to a function $f(t)$, $\{\phi_n(t)\}$ are arbitrary (complex) fixed functions referred to as the basis functions, $f_n$ is a (complex) scalar and $N > 0$ is an integer. The definition of least squares requires that the integral of the squared errors between the model and $f(t)$,

$$E = \int_a^b \left| f(t) - \sum_{n=0}^{N-1} f_n\,\phi_n(t) \right|^2 dt \tag{3.7}$$

be minimized. This occurs when the derivative of $E$ with respect to all parameters $f_n$ vanishes.
The solution yields the so-called normal equations:²

$$\sum_{n=0}^{N-1} \alpha_{mn}\,f_n = \beta_m, \qquad m = 0, \ldots, N-1 \tag{3.8}$$

where

$$\alpha_{mn} = \int_a^b \phi_n(t)\,\phi_m^*(t)\,dt \quad \text{and} \quad \beta_m = \int_a^b f(t)\,\phi_m^*(t)\,dt \tag{3.9}$$

² One derivation can be found in [36].

There are N linear equations in N unknowns which must be solved simultaneously. Provided that the function $f(t)$ is continuous in $[a, b]$, the normal equations will always have a unique solution [37].

It is apparent that if the functions $\phi_n(t)$ are orthogonal, as per (3.1), then $\alpha_{mn} = 0$ for $m \neq n$, and the normal equations reduce to the simple form:

$$f_n = \frac{\beta_n}{\alpha_{nn}}, \qquad n = 0, \ldots, N-1 \tag{3.10}$$

This is precisely the form for a generalized Fourier expansion of a function. It is clear that the least squares problem is greatly simplified by this approach. Furthermore, if the orthogonal basis had been derived from a linearly independent basis, the approximations computed from each are identical. Solution of (3.10) is not prone to the numerical instability which can arise when solving (3.8). This aspect is treated further in sub-section 3.1.7.

3.1.3 Generalized Fourier Expansion

Any arbitrary function $f(t)$ that i) is square integrable³ on $[a, b]$; and ii) contains a finite number of relative minima and maxima, and a finite number of discontinuities, on $[a, b]$, can be approximated by an expansion in orthonormal functions $\varphi_n(t)$ as

$$f(t) = \sum_{n=0}^{\infty} F_n\,\varphi_n(t) \tag{3.11}$$

³ In most texts dealing with Fourier series expansions, the requirement has been that $f(t)$ is absolutely integrable on $[a, b]$. It is clear that this condition holds when $f(t)$ is square integrable.

The coefficients are given by [38]

$$F_n = \int_a^b f(t)\,\varphi_n^*(t)\,dt \tag{3.12a}$$

or

$$F_n = \frac{\int_a^b f(t)\,\phi_n^*(t)\,dt}{\sqrt{\int_a^b \phi_n(t)\,\phi_n^*(t)\,dt}} \tag{3.12b}$$

Note that the coefficients $F_n$ are defined with respect to $\varphi_n(t)$, which are orthonormal, while those in (3.10) are defined with respect to $\phi_n(t)$, which are orthogonal. Unless stated otherwise, the work presented here will use the former.
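The discrete counterpart of this calculation can be sketched in a few lines of Python. The sketch is illustrative only: the test signal, the grid scaling and all function names are assumptions, and the orthonormal basis is obtained by Gram-Schmidt orthonormalization of sampled monomials rather than from any closed-form polynomial expression. Each coefficient is a single inner product, so extending the expansion leaves the earlier coefficients unchanged.

```python
import math

def inner(u, v):
    """Discrete inner product of two real sequences."""
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(n_points, n_terms):
    """Gram-Schmidt orthonormalization of 1, t, t^2, ... sampled at
    n_points equally spaced values of t in [-1, 1]."""
    grid = [2.0 * k / (n_points - 1) - 1.0 for k in range(n_points)]
    basis = []
    for n in range(n_terms):
        v = [t ** n for t in grid]
        for b in basis:                     # remove components along earlier functions
            c = inner(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(inner(v, v))
        basis.append([vi / norm for vi in v])
    return basis

def squared_error(signal, basis, coeffs):
    """Sum of squared differences between the signal and its truncated expansion."""
    approx = [sum(c * b[k] for c, b in zip(coeffs, basis))
              for k in range(len(signal))]
    return sum((s - a) ** 2 for s, a in zip(signal, approx))

# Hypothetical test signal: a skewed transient sampled at 65 points.
N = 65
signal = [k * math.exp(-k / 8.0) for k in range(N)]
basis = orthonormal_basis(N, 8)
coeffs = [inner(signal, b) for b in basis]   # one inner product per coefficient
errors = [squared_error(signal, basis, coeffs[:m]) for m in range(1, 9)]
for m, e in enumerate(errors, start=1):
    print(m, round(e, 4))
```

The printed errors should not increase as terms are added, since each additional orthonormal function can only remove energy from the residual.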
The conditions imposed on $f(t)$ are known as the Dirichlet conditions [39].⁴ As shown above, the coefficients so derived are optimal in the least squares sense. In fact, for orthogonal functions, the error $E$ takes on a convenient closed-form expression [41]:

$$E = \int_a^b |f(t)|^2\,dt - \sum_{n=0}^{N-1} |F_n|^2 \tag{3.13}$$

⁴ The first Dirichlet condition is only a sufficient and not a necessary condition for the existence of a general Fourier expansion. There are functions which do not meet this condition but still have a Fourier representation. However, most are mathematical contrivances which rarely occur in practice [40].

Since $E$ is non-negative by definition, it follows that the second term converges. From this, one arrives at Bessel’s inequality:

$$\sum_{n=0}^{\infty} |F_n|^2 \le \int_a^b |f(t)|^2\,dt \tag{3.14}$$

If the equality, or Parseval’s theorem,

$$\sum_{n=0}^{\infty} |F_n|^2 = \int_a^b |f(t)|^2\,dt \tag{3.15}$$

holds for every square integrable function $f(t)$ in $L^2$, then the set $\{\varphi_n(t)\}$ is said to be closed in $[a, b]$, in which case $E \to 0$ as $n \to \infty$ and the approximation is said to converge in the mean to $f(t)$. If $E$ is in fact equal to zero (in the limit), then $|f(t) - \hat{f}(t)|$ vanishes almost everywhere on $[a, b]$.

In $L^2$, every closed orthogonal system is also complete, that is i) there exists no non-trivial function $f(t)$ such that $F_n = 0$ for all $n = 0, 1, 2, \ldots$; or ii) for any piecewise continuous square integrable function $f(t)$ and an arbitrary value of $\varepsilon > 0$, there exists a positive integer N such that:

$$\int_a^b \left| f(t) - \sum_{n=0}^{N-1} F_n\,\varphi_n(t) \right|^2 dt < \varepsilon \tag{3.16}$$

Some points need to be made now. First, virtually all real-world signals such as flow injection peaks satisfy the Dirichlet conditions [39]. Second, the generalized Fourier expansion is a linear transformation, i.e.

$$a\,x(t) + b\,y(t) \;\leftrightarrow\; a\,X(\alpha) + b\,Y(\alpha) \tag{3.17}$$

where $x(t)$ and $y(t)$ are the time domain sequences, $X(\alpha)$ and $Y(\alpha)$ are their respective transforms (here, the parameter $\alpha$ is taken to be a variable in the domain of the transform), and $a$ and $b$ are constants.
Third, the function can be approximated by the series to any desired degree of accuracy by increasing the number of terms used. In fact, the total error is decreased by an amount $|F_n|^2$ for every $n$th term added. From (3.12), each coefficient $F_n$ is seen to be computed independently of any other. The computational advantage is obvious: if a greater precision is subsequently desired, only the added terms need be evaluated and the work which has gone into meeting the previous accuracy is not wasted. This also means that it is possible to obtain the spectrum over only the region of interest. Finally, convergence in the mean is also less demanding than other forms of convergence such as that of a Taylor series. The former only requires that $E \to 0$ as $n \to \infty$, whereas the latter also requires that the derivatives of the error tend to zero, which is more difficult to achieve.

In theory, any set of orthogonal functions may be used to represent a function provided that they are complete over the range of that function. Those selected for this work meet this condition and are now described. Discrete notation will necessarily be used since the functions are defined over a discrete variable.

3.1.4 Discrete Complex Exponential (Fourier) Functions

As noted above, the complex exponential set is one of the most important of all orthogonal sets. The discrete version is defined as [42]

$$\phi_n[k] = e^{-j 2\pi n k / N}, \qquad n = -\frac{N}{2}, \ldots, \frac{N}{2} \tag{3.18}$$

where $k = 0, 1, \ldots, (N - 1)$ is the time index, N is the number of points and $j = \sqrt{-1}$. Note that the order n of the function is defined over negative as well as positive integer values. Hence, (3.11) takes on a symmetric form, i.e. $f[k] = \sum_{n=-N/2}^{N/2} F_n\,e^{-j 2\pi n k / N}$. The orthogonality property is

$$\langle \phi_n[k], \phi_m[k] \rangle = N\,\delta_{mn} \tag{3.19}$$

By use of Euler’s relationship:

$$e^{jx} = \cos(x) + j\sin(x) \tag{3.20}$$

the basis can be rewritten in terms of cosines (real part) and sines (imaginary part):

$$\phi_n[k] = \cos(2\pi n k / N) - j\sin(2\pi n k / N) \tag{3.21}$$

The sine and cosine terms, individually, are orthogonal.
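The orthogonality of the discrete complex exponentials can be verified numerically with a short Python sketch. It assumes the un-normalized form with the sign convention of (3.21), i.e. cosine minus j times sine; the function names, the choice N = 16 and the cosine test signal are illustrative only.

```python
import cmath
import math

def fourier_basis(n, n_points):
    """Discrete complex exponential of order n: cos(.) - j sin(.)."""
    return [cmath.exp(-2j * math.pi * n * k / n_points) for k in range(n_points)]

def inner(u, v):
    """Inner product with complex conjugation of the second sequence."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

N = 16
cross = inner(fourier_basis(2, N), fourier_basis(3, N))   # different orders
norm2 = inner(fourier_basis(2, N), fourier_basis(2, N))   # same order
print(abs(cross), abs(norm2))

# A real cosine of order 2 projects equally onto the orders +2 and -2.
signal = [math.cos(2 * math.pi * 2 * k / N) for k in range(N)]
F_pos = inner(signal, fourier_basis(2, N)) / N
F_neg = inner(signal, fourier_basis(-2, N)) / N
print(F_pos, F_neg)
```

The division by N when forming the coefficients reflects the use of functions that are orthogonal but not normalized; with orthonormal functions that factor would be absorbed into the basis.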
The first five basis functions (real and imaginary) are shown in Figure 3.1. A description of the continuous form is given in Appendix A.

Figure 3.1 First five exponential functions associated with the Fourier transform (amplitude versus data point). Both real (solid line) and imaginary (dash-dot line) functions are plotted. Circled numbers indicate sequence.

Since the functions are complex-valued, Fourier expansion leads to a set of coefficients $F_n$ which are also complex-valued. It is more practical to use the magnitude and phase (or polar) representation:

$$F_n = |F_n|\,e^{j\theta_n} \tag{3.22}$$

where

$$|F_n| = \sqrt{\mathrm{Re}(F_n)^2 + \mathrm{Im}(F_n)^2} \tag{3.23}$$

is the magnitude of $F_n$, with $\mathrm{Re}(F_n)$ and $\mathrm{Im}(F_n)$ denoting its real and imaginary parts; and

$$\theta_n = \arctan\left(\frac{\mathrm{Im}(F_n)}{\mathrm{Re}(F_n)}\right) \tag{3.24}$$

is the phase of $F_n$. For real functions, $F_{-n} = F_n^*$, so that only half the coefficients are of use for spectral characterization. In fact, the positive and negative frequency components will be combined such that $C_n = \left(|F_n| + |F_{-n}|\right)/\sqrt{2}$ for $n \neq 0$, where $C_n$ are the spectral coefficients used for identification. The factor $1/\sqrt{2}$ is needed so that Parseval’s theorem is still observed. Of course, $C_0 = |F_0|$.

3.1.5 Gram Polynomials

The Gram polynomials arise naturally in the fitting of discrete data by least squares [22]. They are defined as

[Equation (3.25)], $k, n = 0, \ldots, N-1$

where $\Delta$ is the forward difference operator

$$\Delta\phi[k] = \phi[k+1] - \phi[k], \qquad \Delta^n\phi[k] = \Delta\{\Delta^{n-1}\phi[k]\} \tag{3.26}$$

and

$$\binom{a}{b} = \begin{cases} \dfrac{a!}{b!\,(a-b)!} & b \le a \\ 0 & \text{otherwise} \end{cases} \tag{3.27}$$

represents the binomial coefficient. Gram polynomials are orthogonal over the interval $[0, N-1]$. It is convenient to use an odd number of points and redefine the interval such that the midpoint of the range is zero. Thus, with $N = 2M + 1$, the un-normalized expression is given explicitly by [2]

$$\phi_n^M[k] = \sum_{i=0}^{n} \frac{(-1)^{i+n}\,(i+n)^{(2i)}\,(M+k)^{(i)}}{(i!)^2\,(2M)^{(i)}} \tag{3.28}$$

where the superscript M is used to denote the integral half-interval and $(a)^{(b)}$ is the generalized factorial function:

$$(a)^{(b)} = \frac{a!}{(a-b)!} \tag{3.29}$$
The orthogonality property is [2]

$$\langle \phi_n^M[k], \phi_m^M[k] \rangle = \frac{(2M+n+1)!\,(2M-n)!}{(2n+1)\,[(2M)!]^2}\,\delta_{mn} \tag{3.30}$$

The Gram functions form a complete set which spans a space of N dimensions. As $N \to \infty$, they become the Legendre polynomials [22], whose properties are documented in Appendix B. The first five Gram functions are plotted in Figure 3.2a. While those shown all have their amplitude within the range $[-1, 1]$, this is not always the case. Only when a sufficient number of points is used, which depends on and increases with the order of the polynomial, does the amplitude completely reside within $[-1, 1]$. Hence, unlike the discrete trigonometric functions, the shape of the Gram polynomials shows some dependency on the number of points. This is depicted in Figure 3.2b for the quintic. Deviations with respect to the corresponding Legendre polynomial reach a maximum at the peaks and valleys of the Gram polynomials. Given the same number of points, these deviations increase with polynomial order.

Figure 3.2 a) First five un-normalized Gram polynomials (amplitude versus data point). Circled numbers indicate sequence. b) Change in amplitude of the quintic as a function of the number of points: N = 65 (dotted), 129 (dash-dot) and 1025 (solid). Note that the abscissa for graph (b) is normalized to one.

The un-normalized series is conveniently generated with the following recurrence relationship [2]:

$$\phi_n^M[k] = \frac{2(2n-1)}{n(2M-n+1)}\,k\,\phi_{n-1}^M[k] - \frac{(n-1)(2M+n)}{n(2M-n+1)}\,\phi_{n-2}^M[k] \tag{3.31}$$

with $\phi_0^M[k] = 1$. The second term is omitted when computing $\phi_1^M[k]$.

3.1.6 Meixner Polynomials

Gottlieb [43] was the first to treat the following set of Laguerre-like polynomials, which are orthogonal over the interval $[0, \infty)$:

[Equation (3.32)], $k, n = 0, \ldots, \infty$

where $e^{-\lambda k}$ is a weighting function associated with the polynomial and $\lambda > 0$. The explicit expression is given by

[Equation (3.33)]

where $\theta = e^{-\lambda}$.
The parameter $\theta \in [0, 1]$ is known as the discount factor and has the effect of increasing the rate with which the function converges to zero [3]. As $\theta$ goes to one, the functions vanish. The orthogonality property is

[Equation (3.34)]

and the recurrence relation is

[Equation (3.35)]

A set of functions of a type referred to as Meixner polynomials [45] is related to $\phi_n[k]$ as follows [3]:

[Equation (3.36)]

where $\zeta \in [0, 1]$ is referred to as the time scale. These functions form a complete set which spans a space of infinite dimension. The first five functions are shown in Figure 3.3 for two values of $\zeta$. It is seen that $\zeta$ not only serves as a time scale compressor/expander but also affects the shape of these functions.

The Meixner functions can be synthesized from the recurrence relation (3.35). However, King and Paraskevopoulos [3] have developed an alternative based on a discrete ladder filter network composed of a single low pass filter and successive all pass filters, each possessing a fixed phase shift. Each basis function is generated as the impulse response at each node of the network. The computational theory and details are too involved to allow a brief yet adequate description here; consequently, the reader is directed to the original publication. A program is given in Appendix D.

The Meixner polynomials are semi-infinite. Since the data collected are necessarily finite, the Meixner functions must be truncated prior to use. Strictly, the functions will no longer be orthogonal over the complete set and general linear least squares should be used to obtain the spectrum. However, since the time scale affects the rate at which a Meixner polynomial decays to zero, a judicious choice of $\zeta$ can maintain the orthogonality property, in the numerical sense, over a subset of Meixner polynomials for a given number of points (i.e. time duration).
Specifically, the time scale can be used to “stretch” or “compress” these functions such that the highest order function in the set decays to zero at or before the last point. This represents the critical time scale beyond which orthogonality breaks down, i.e. insufficient “time” has been allocated for some high order polynomial(s) to decay back to zero. This boundary depends on both the number of points and the highest order polynomial used. The greater the number of points used to define the Meixner polynomial, the higher is the critical value, since the functions have more time to decay to zero. The higher the order in the subset, the lower the critical value, since each successive function requires more time to decay to zero. Two simple methods to locate the boundary are proposed in this dissertation.

Figure 3.3 First five Meixner polynomials (amplitude versus data point) with a) $\zeta$ = 0.92 and b) $\zeta$ = 0.95. Circled numbers indicate sequence.

For an infinite order expansion and a data array of infinite length, the choice of $\zeta$ is completely arbitrary. However, for a truncated series expansion in M terms over a span of N points (where $M < N$), the accuracy of the approximation and the rate of convergence will depend on the time scale [3]. The search for the optimum time scale for a particular (M, N) pair amounts to minimizing the sum of the squared error (SSE) between the approximation $\hat{f}[k]$ and the given function $f[k]$:

$$\mathrm{SSE} = \sum_{k=0}^{N-1} \left( f[k] - \hat{f}[k] \right)^2 \tag{3.37}$$

where the SSE is a function of $\zeta$ here. In practice, the signal (true function) may be hidden by noise and the raw signal (signal and noise) may have to be substituted, which often yields different results. In either case, this problem can be solved numerically by any of the one-dimensional minimization algorithms available (consult reference [45]).
It should be apparent that the computational advantage mentioned above, namely that increased precision requires only the calculation of higher terms, is absent here. The presence of a scale factor provides an additional degree of freedom which is certainly good for the purpose of approximation, but its usefulness for signal classification requires study.

3.1.7 Computation of Expansion Coefficients by Least Squares

In general, least squares is required to compute the coefficients when the basis functions are not orthogonal. This section serves to explain the computational aspects of least squares and is meant to complement the discussion in sub-section 3.1.2. It was made necessary i) since the Meixner polynomials may not be orthogonal after truncation and ii) to introduce the condition number, which can be used as a criterion for determining roughly the value of $\zeta$ at which orthogonality breaks down numerically.

For an expansion in M basis functions over N data points, the discrete form of the normal equations (3.8) is:

$$\sum_{n=0}^{M-1} \alpha_{mn}\,f_n = \beta_m, \qquad m = 0, \ldots, M-1 \tag{3.38}$$

where

$$\alpha_{mn} = \sum_{k=0}^{N-1} \phi_n[k]\,\phi_m^*[k] \quad \text{and} \quad \beta_m = \sum_{k=0}^{N-1} f[k]\,\phi_m^*[k], \tag{3.39}$$

$f[k]$ is the discrete function to be fitted and $\phi_n[k]$ are the basis functions. Let the design matrix $A$ of dimension N x M be defined such that the basis functions are arranged as its columns, i.e.

$$A_{kn} = \phi_n[k] \tag{3.40}$$

then in matrix formalism, the normal equations become

$$(A^T A)\,a = A^T f \tag{3.41}$$

where superscript T is the transpose operator and $f$ is $f[k]$ expressed in vector form. Since the matrix representation is more compact, it will be adopted, where possible, for the rest of this chapter. Rearrangement of (3.41) yields an expression for the coefficients

$$a = A^{-1} f \tag{3.42}$$

where

$$A^{-1} = (A^T A)^{-1} A^T \tag{3.43}$$

is known as the pseudoinverse (for $N > M$). Direct solution via the normal equations is not generally recommended because it is susceptible to round-off errors. This arises because the matrix product $A^T A$ (i.e. the correlation matrix) may be very close to singular.
Fortunately, an alternative representation for the pseudoinverse may be cast in terms of singular value decomposition. Singular value decomposition (SVD) decomposes an N x M matrix $A$, where $N \ge M$, into three matrices

$$A = U W V^T \tag{3.44}$$

where $U$ is an N x M column orthogonal matrix, $W$ is a diagonal M x M matrix with non-negative elements (the singular values), and $V$ is an M x M orthogonal matrix. Furthermore,

$$U^T U = V^T V = V V^T = I \tag{3.45}$$

where $I$ is the identity matrix. The decomposition can always be done no matter how singular the matrix is. It follows that the inverse of $A$ is simply

$$A^{-1} = V W^{-1} U^T \tag{3.46}$$

which can replace the pseudoinverse in (3.42). Hence,

$$a = V W^{-1} U^T f \tag{3.47}$$

Notice that (3.47) breaks down when one or more of the singular values equals zero. This provides a simple means to quantify the extent of singularity. The condition number of a matrix is defined as the ratio of the largest singular value to the smallest. A matrix is singular if its condition number is infinite and ill-conditioned if the reciprocal of its condition number approaches the floating point precision of the computer. A perfectly conditioned matrix (e.g. the identity matrix) has a condition number of one.

3.2 EXPERIMENTAL

3.2.1 Simulation of Flow Injection Peaks

The performance of descriptors is most readily evaluated initially with simulated data since their characteristics are known. A general account of the models used will now be given; these will also be used for work in the next chapter.

The tanks-in-series model (1.5) provides a useful means for generating idealized flow injection peaks at different levels of dispersion. It is repeated here for convenience:

$$C_{TS}(t) = \frac{1}{T_i\,(N_T - 1)!} \left( \frac{t}{T_i} \right)^{N_T - 1} e^{-t/T_i}$$

where the subscript TS is used to distinguish the concentration function $C(t)$ from other expressions for concentration distributions to be given later.
By changing the number of tanks $N_T$, this model may be used to synthesize a series of typical flow injection peaks. Some of these are shown in Figure 3.4. It is obvious that peak asymmetry decreases and the peak broadens with an increase in $N_T$. When digital data are used, band broadening must be eliminated if a descriptor is to be evaluated as a function of peak asymmetry. In practice, this corresponds to the situation whereby the peaks must all decay to baseline within the (artificially constrained) acquisition time window. The problem can be solved by either i) changing the sampling period or, equivalently, ii) compressing the time scale of the peak prior to sampling. Furthermore, it is desirable to have all the peaks normalized to the same height to facilitate the addition of noise (i.e. when its effects are to be studied). Both these operations can be performed simultaneously if the mean residence time parameter $T_i$ is adjusted such that the model yields curves of equal maximum height for different values of $N_T$. In effect, $T_i$ is the time scale compressor. It is a simple matter to show, via the first derivative followed by back substitution, that the maximum value of a tanks-in-series curve is given by:

$$c_{max} = \frac{(N_T-1)^{N_T-1}}{(N_T-1)!\;T_i\;e^{N_T-1}}$$ (3.48)

Figure 3.4 Selected tanks-in-series curves showing the effect of $N_T$. Circled numbers identify the value of $N_T$ used. (Horizontal axis: time.)

For example, if the curve from two tanks was deemed to be the reference (as it was for this work), then its value of $T_i$ was set, say, equal to 5, which yielded a maximum value of 0.073576. Substitution of this value into (3.48) for curves with other numbers of tanks produced their required $T_i$ values. It was found that peaks treated in this way have comparable full width at half maximum (FWHM) values over the values of $N_T$ used: 2 to 16.
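The height-matching procedure just described can be sketched in a few lines. The function names below are hypothetical and the code is only an illustration of solving (3.48) for $T_i$, assuming the two-tank curve with $T_i = 5$ as the reference; it is not the routine used in this work.

```python
import math


def c_max(n_tanks: int, t_i: float) -> float:
    """Peak maximum of a tanks-in-series curve, following equation (3.48)."""
    k = n_tanks - 1
    return k**k * math.exp(-k) / (math.factorial(k) * t_i)


def t_i_for_height(n_tanks: int, height: float) -> float:
    """Solve (3.48) for T_i so that the curve for n_tanks peaks at `height`."""
    k = n_tanks - 1
    return k**k * math.exp(-k) / (math.factorial(k) * height)


# Reference: two tanks with T_i = 5 (the choice quoted in the text).
reference_height = c_max(2, 5.0)

for n in (2, 3, 4, 8, 16):
    print(n, round(t_i_for_height(n, reference_height), 6))
```

For three tanks this gives $T_i = 10/e \approx 3.678794$, which agrees with the value tabulated for that curve.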
Finally, since (1.5) incorporates a factorial function, this method becomes increasingly unreliable numerically with increasing $N_T$. Full machine precision (here, 64 bits) is retained in (3.48) for values of $N_T$ less than 17.

Footnote: An unsuccessful attempt was made by the author to derive an expression for FWHM as a function of the number of tanks $N_T$.

A measure of the asymmetry for the models above is provided by skewness, which is defined as:

$$S = \frac{\mu_3}{\sigma^3}$$ (3.49)

where $\mu_3$ is the third central statistical moment and $\sigma$ is the standard deviation (see Appendix C). For the tanks-in-series model, this equation takes the simple form (derived in Appendix C):

$$S = \frac{2}{\sqrt{N_T}}$$ (3.50)

A Gaussian peak, obtained in FIA under conditions of high dispersion, may also be synthesized from the tanks-in-series model by setting $N_T = \infty$ (i.e., very large). Obviously, it is more convenient to use the direct formula [46]

$$c_G(t) = A \exp\left\{ -\ln(2)\left[\frac{2(t-\bar{t})}{\delta t}\right]^2 \right\}$$ (3.51)

where $A$ is the amplitude, $\bar{t}$ is the peak position, and $\delta t$ is the FWHM. The value of $\bar{t}$ was set so that symmetry was maintained after sampling, and $\delta t$ was set such that its value was comparable to those for the tanks-in-series curves. Values of $T_i$ and $S$ for a series of tanks-in-series curves are given in Table 3.1.

Bifurcated peaks were simulated by a linear combination of the tanks-in-series model of two tanks with either a modified version of the Cauchy (or Lorentzian) function [46] or with the Fraser-Suzuki asymmetric peak function [46]. These peak shapes can represent the case of incomplete reaction.
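The closed form (3.50) can be checked numerically from the definition (3.49). The following is a minimal sketch, assuming a simple midpoint-rule integration over a window long enough for the curve to decay; the function names, the window length and the step count are illustrative choices, not values taken from this work.

```python
import math


def tanks_in_series(t: float, n_tanks: int, t_i: float) -> float:
    """Tanks-in-series concentration profile, equation (1.5)."""
    k = n_tanks - 1
    return (t / t_i) ** k * math.exp(-t / t_i) / (t_i * math.factorial(k))


def skewness(n_tanks: int, t_i: float = 1.0,
             t_end: float = 80.0, steps: int = 80000) -> float:
    """Skewness mu_3 / sigma^3 of the curve, by midpoint-rule integration."""
    dt = t_end / steps
    ts = [(j + 0.5) * dt for j in range(steps)]
    cs = [tanks_in_series(t, n_tanks, t_i) for t in ts]
    area = sum(cs) * dt
    mean = sum(t * c for t, c in zip(ts, cs)) * dt / area
    var = sum((t - mean) ** 2 * c for t, c in zip(ts, cs)) * dt / area
    mu3 = sum((t - mean) ** 3 * c for t, c in zip(ts, cs)) * dt / area
    return mu3 / var ** 1.5


for n in (2, 4, 16):
    print(n, skewness(n), 2.0 / math.sqrt(n))
```

Because skewness is dimensionless, the result does not depend on the value chosen for $T_i$, which is why rescaling the time axis to equalize peak heights leaves $S$ unchanged.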
The Cauchy function is given by

$$c_C(t) = \frac{A}{1 + \left[\dfrac{2(t-\bar{t})}{\delta t}\right]^2}$$ (3.52)

but has been modified by the author in the following manner for reasons of aesthetics:

$$c_{MC}(t) = \frac{A}{\exp\left(1 + \left[\dfrac{2(t-\bar{t})}{\delta t}\right]^2\right)}$$ (3.53)

The Fraser-Suzuki function is given by

$$c_{FS}(t) = A \exp\left\{ -\ln(2)\left[\frac{\ln\left(1 + \dfrac{2b(t-\bar{t})}{\delta t}\right)}{b}\right]^2 \right\}$$ (3.54)

where $b$ is the asymmetry factor.

Table 3.1 Values for $T_i$ used and $S$ for selected tanks-in-series curves

Label  Tanks  T_i       S
A      2      5.000000  1.414
B      3      3.678794  1.155
C      4      3.045044  1.000
D      5      2.655310  0.894
E      6      2.384849  0.816
F      7      2.183095  0.756
G      8      2.025177  0.707
H      9      1.897178  0.666
I      10     1.790745  0.632
J      11     1.700422  0.603
K      12     1.622516  0.577
L      13     1.554421  0.555
M      14     1.494237  0.535
N      15     1.440542  0.516
O      16     1.392248  0.500
P      (a)    -         0

a) Corresponds to Gaussian peak.

Three bifurcated peaks were generated and the equations for each are given in Table 3.2. Note that the tanks-in-series curve was normalized for peak height prior to the addition of the second term. A total of 19 different peaks can thus be synthesized; these span a range of peak shapes which can be found in FIA. Data lengths were varied from 65 to 8193 points over a 51.2 s time window. Unless stated otherwise, each peak was scaled to a "peak-to-peak" range of one. Experimental noise was simulated by the addition of normally distributed random values which were then scaled to the desired noise standard deviation (with respect to peak range). For ease of discussion, each peak will be referenced by a letter as indicated in Tables 3.1 and 3.2.

Table 3.2 Formulae used for simulating bifurcated FIA peak shapes

Label  Case                                               Model (second term added to $c_{TS}/\max|c_{TS}|$)  Parameter values
Q      weak bifurcation to the left of peak maximum       $c_{MC}$                                             $c_{TS}$: $N_T = 2$, $T_i = 5$; $c_{MC}$: $A = 1$, $\bar{t} = 4$, $\delta t = 3.5$
R      moderate bifurcation to the right of peak maximum  $c_{MC}$                                             $c_{TS}$: $N_T = 2$, $T_i = 5$; $c_{MC}$: $A = 1$, $\bar{t} = 8$, $\delta t = 4$
S      extreme bifurcation at peak maximum                $c_{FS}$                                             $c_{TS}$: $N_T = 2$, $T_i = 5$; $c_{FS}$: $A = 1$, $b = 0.3$, $\bar{t} = 4.5$, $\delta t = 3$

(The weighting coefficients of the model formulae are not legible in this copy.)
3.2.2 Experimental Data

The effectiveness of orthogonal polynomial identification was also evaluated on experimental data from the reaction between Fe(II) and 1,10-phenanthroline [47]. The manifold (Figure 3.5) and the data obtained were previously reported [48]. In addition to reagent concentration, the extent of reaction was also found to be sensitive to carrier pH under the conditions used [48] (i.e. pH < 2). Hence, certain combinations of these two factors will result in bifurcated peaks. The flow rate of 1,10-phenanthroline (reagent) was varied from 0.10 to 1.00 mL/min in steps of about 0.225 mL/min, and that of sodium acetate (the pH modifier) was varied from 0.00 to 1.00 mL/min in steps of 0.25 mL/min. The reaction was monitored at 508 nm and 25 different peaks (consisting of 108 to 214 points) were collected for each flow rate combination. Note that the hydroxylamine hydrochloride stream was used to reduce any Fe(III) to Fe(II).

Figure 3.5 Flow injection manifold used for the determination of Fe(II) by reaction with 1,10-phenanthroline. (Labelled components: Fe sample; 0.01 M 1,10-phenanthroline; water; 0.2 M acetic acid; 0.2 M sodium acetate; 0.01 M hydroxylamine hydrochloride; 100 cm coil; 1 cm pathlength flow cell and detector at 508 nm; to waste.)

3.2.3 Data Processing

To account for differences in peak magnitude, the vector of coefficients from the approximation was either i) normalized to the $L_2$-norm (i.e. Euclidean norm or unit length) over the spectrum,

$$\|F\| = \sqrt{\sum_n |F_n|^2}$$ (3.55)

where $F$ is the vector of coefficients; or ii) normalized to the sum of the values of the peak. The resulting spectrum from each is equivalent within a scale factor. The former has the advantage of simultaneously accounting for data length but will introduce a bias in the coefficient vector when noise is present, in which case the latter should be used, assuming no negative excursions of the peak.
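The two normalization options can be compared on a toy example. The sketch below uses made-up coefficient and peak values (they are not data from this work) and hypothetical function names; it only illustrates that the two normalized spectra differ by a constant factor.

```python
import math


def normalize_euclidean(coeffs):
    """Option i): scale the coefficient vector to unit Euclidean length (3.55)."""
    norm = math.sqrt(sum(c * c for c in coeffs))
    return [c / norm for c in coeffs]


def normalize_by_peak_sum(coeffs, peak):
    """Option ii): divide the coefficients by the sum of the peak values."""
    total = sum(peak)
    return [c / total for c in coeffs]


# Illustrative values only.
coeffs = [0.5, -0.25, 0.125]
peak = [0.0, 1.0, 3.0, 2.0, 1.0, 0.0]

unit_length = normalize_euclidean(coeffs)
sum_scaled = normalize_by_peak_sum(coeffs, peak)

# The element-wise ratio is the same for every coefficient.
ratios = [u / s for u, s in zip(unit_length, sum_scaled)]
print(ratios)
```

The difference between the two options lies in how the scale factor responds to noise: the Euclidean norm accumulates squared noise contributions from every coefficient, whereas the peak sum tends to average zero-mean noise out.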
For peaks with negative values (not arising from noise), a significant bias could result, but at least such bias will be restricted to these peaks. Since one of the aims is to identify this type of peak, the bias is an aid in discrimination, albeit a non-spectral one. The bias introduced by normalization in the presence of noise remains difficult to eliminate. The normalization method used will be explicitly stated at all times.

Since the approximation is only valid over the duration of the signal, the peak must be extracted from the data record. Simulated data were used without further treatment. Leading baseline points were stripped from the raw Fe(II) - 1,10-phenanthroline reaction data; a trailing baseline was precluded by the data acquisition routine. The subsequent data length was made odd if necessary (by dropping the last point) to account for the computational requirement in the implementation of the Gram polynomials. Finally, the data were autoscaled prior to multivariate data analysis by principal components analysis.

3.2.4 Computational Aspects

Gram polynomials were generated via the recurrence formula (3.31). Meixner polynomials were synthesized by the method of King and Paraskevopoulos [3]. A golden section search routine [49] was used to locate the optimal Meixner time scale parameter. PCA was performed via singular value decomposition [50]. Normally distributed random number sequences were generated with MATLAB (version 4.0a, The Mathworks Inc., South Natick, MA) according to the algorithm described in reference [51]. Discrete Fourier analysis was performed with an FFT routine also from MATLAB; it was capable of handling data lengths that were not an integer power of two. Quadrature was conducted with MAPLE V (release 2, Waterloo Maple Software, University of Waterloo, Waterloo, ON), which uses the Clenshaw-Curtis method [52].
Linear least squares curve fitting was accomplished with either MATLAB or TABLECURVE (version 3.1, Jandel Scientific, Corte Madera, CA). All necessary software developed for this work was written in the MATLAB language and ran on a Sun workstation (model SPARCstation 2, Sun Microsystems Inc., Mountain View, CA). Double precision (64-bit) arithmetic was used throughout.

3.3 GRAM AND FOURIER REPRESENTATIONS

The results for the Gram polynomials and complex exponential Fourier functions will be discussed together first. For the Meixner polynomials, the time scale must also be contended with and they are therefore treated separately in the next section. To aid discussion, the term "basis" will be used when either set of functions is referenced, i.e. Gram and Fourier basis; "expansion" will mean generalized Fourier expansion; "expansion coefficients" is equivalent to "coefficients from a generalized Fourier expansion." The trigonometric basis representation will be simply referred to as the Fourier representation, in keeping with tradition. Therefore, only when this term is explicitly qualified with "generalized" will it mean generalized Fourier expansion.

3.3.1 Spectral Information

A plot of the coefficients from a generalized Fourier expansion of a signal against coefficient number produces a spectrum. The Gram spectra (see footnote 6) and Fourier magnitude spectra for selected simulated peaks (shown in Figure 3.6), each consisting of 257 points, are given in Figure 3.7. For clarity, only the first 30 coefficients are plotted. The coefficient vector has been normalized to the Euclidean norm. Due to computational constraints (discussed below), only the first 50 coefficients were taken to represent the complete Gram spectrum; the coefficients beyond this were sufficiently small that the error introduced from normalizing the truncated spectrum was inconsequential.

Footnote 6: An absolute Gram spectrum could have been used instead.
In both cases, the high order terms gain increasing prominence as the peak becomes more complex. Note that the odd Gram polynomials do not contribute to the spectrum for the Gaussian since it is an even function (an arrangement can be made to zero the odd components or any, for that matter, whose values are less than machine precision, so that numerical noise is mitigated). Clearly, the Gram spectrum is also quite capable of distinguishing these peaks and offers a different view of the data. However, while the Fourier spectrum can be interpreted entirely on the basis of frequency, the Gram representation cannot (although it has some correlation with frequency) and its meaning is entirely abstract: what is the physical significance of fitting successively to a constant, a straight line, a quadratic, a cubic, etc.? For the purpose of identification or classification, this is of no consequence. Indeed, for the latter application, a complete spectrum is not required (e.g. the first 20 Gram coefficients can provide adequate discrimination between these simulated peaks).

Figure 3.6 Simulated FIA peaks consisting of N points. Uppercase letters are labels which have been assigned in Tables 3.1 or 3.2. (Horizontal axis: data point, 0 to N-1.)

Figure 3.7 a) Gram and b) Fourier magnitude spectra for the synthetic peaks shown in Figure 3.6 (panels A, F, P, Q, R and S). Spectra were computed from peaks containing 257 points. The coefficient vector has been normalized to the Euclidean norm. For clarity, only the first 30 coefficients are plotted. (Horizontal axis: coefficient number; vertical axis: magnitude.)
The use of a limited expansion, its justification and the problems that arise are discussed in the following sub-sections.

3.3.2 Numerical Studies in Peak Approximation

In theory, the approximation to a signal is exact, in the sense that it will go through every data point, if the number of terms in the expansion equals the number of data points. That is, the spectrum will be an equivalent representation and no information is lost from the transformation. However, there is no need to expend (perhaps considerable) computing power in the quest for such accuracy since an adequate representation can often be met by a lesser number of terms; the coefficients beyond this number will be relatively insignificant and may be safely neglected. Indeed, data reduction was intended. With the Gram basis, this may be an important practical consideration for some applications since a fast algorithm analogous to the fast Fourier transform (FFT) is not available (see footnote 7). In addition, measurement errors and noise, common to empirical data, prevent an exact fit to the true signal. The fit is further limited by the numerical accuracy that can be achieved from a digital representation of the data and the manner in which the orthogonal functions are calculated. As shown below, round-off errors can become a limiting factor in the accuracy of high order functions when a recurrence is employed for computation.

Footnote 7: Computation of the coefficients via the discrete version of (3.12) requires a total of $N^2$ (complex) multiplications and $N(N-1)$ (complex) additions. For the complex exponential basis and $N = 2^m$, the FFT can reduce this to $Nm/2$ complex multiplications and $Nm$ complex additions [53]. Since computing time is often determined by the number of multiplications required, a substantial improvement in speed is achieved, e.g. a factor of 200 for N = 1024. Some consolation exists for the Gram approximation since only real arithmetic is involved.
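The factor quoted in footnote 7 follows directly from the two multiplication counts. A minimal sketch of that arithmetic, with hypothetical function names, is:

```python
def direct_multiplications(n: int) -> int:
    """Complex multiplications for the direct discrete expansion: N^2."""
    return n * n


def fft_multiplications(n: int) -> int:
    """Complex multiplications for a radix-2 FFT with N = 2^m: N*m/2."""
    m = n.bit_length() - 1
    if 2 ** m != n:
        raise ValueError("n must be an integer power of two")
    return n * m // 2


def speedup(n: int) -> float:
    """Ratio of direct to FFT multiplication counts, equal to 2N/m."""
    return direct_multiplications(n) / fft_multiplications(n)


for n in (64, 256, 1024):
    print(n, speedup(n))
```

For N = 1024 the ratio is 2(1024)/10 = 204.8, consistent with the factor of roughly 200 stated in the footnote.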
Because FIA signals do not vary quickly with time, a good approximation can be achieved with a limited expansion. Care must be exercised in deciding where to truncate since a certain level of accuracy in approximation must be met to ensure not only integrity of representation but also the likelihood of spectral uniqueness. This may become crucial when a multitude of peak shapes are to be identified or when one needs to distinguish between peaks that are very similar in shape. Such issues are highly dependent on the particular application at hand. Hence, the process of approximation, for which these orthogonal functions were initially developed, cannot be entirely divorced from that of signal representation. A study into the numerical properties of the basis sets under consideration with regard to approximation is appropriate. Practical limits can then be defined.

The effectiveness of an approximation can be assessed by evaluating the sum of the squared error (SSE, equation 3.35) between the approximation and the true function as a function of the expansion order (i.e. number of terms minus one). The practical value of zero may be designated by machine precision, which for 64-bit floating point numbers is roughly $2.2 \times 10^{-16}$. Given 257-point data records, for example, the SSE is effectively zero when it approaches $10^{-28}$ to $10^{-29}$. Unless otherwise stated, all subsequent references to numerical limits will assume 64-bit floating point precision.

Figure 3.8 shows the plots of SSE against expansion order for both bases for the six synthetic peaks from Figure 3.6. The plots for the Gram basis reveal that an "exact" approximation can be attained for the simpler unimodal peaks (A, F, P) given that a sufficient number of terms is used. They also disclose the numerical limitations of the recurrence, which prevented the more complex peaks (Q, R, S) from being fitted exactly.
The SSE decreases initially as more terms are used, as expected, but the descent is countered by another factor (discussed below) after a certain point. An inspection of the absolute correlation between polynomials provides some insight into the problem.

Figure 3.8 Plots of SSE against expansion order for the approximation of peaks in Figure 3.6 with a) Gram polynomials and b) Fourier basis functions. The dotted line indicates quantization error level for 12-bit data representations. (Horizontal axis: expansion order; vertical axis: SSE, logarithmic scale.)

Figure 3.9 gives a graphical profile for the inner product of the zeroth and the first order polynomials with all the others. The profile is a composite in which correlations of the zeroth order polynomial with the second, fourth, etc. are interlaced with those of the first order polynomial with the third, fifth, etc. This was done because odd-even ordered combinations were always observed to be orthogonal, but odd-odd and even-even ordered combinations yielded correlations which increased with the difference between the two orders. The reason behind this is unclear. A composite circumvents the need for two separate profiles and provides a general picture. The first two polynomials served as references because they are the most accurate. It is obvious from a comparison of Figures 3.8a and 3.9 that the problem is caused by a general loss of orthogonality. As the order increases and/or the number of points decreases, the functions could take on very large values (e.g. $10^{100}$).
Since the word length of the computer is finite, extreme values cannot be represented with sufficient accuracy. Hence, cancellation becomes increasingly poor. Returning to Figure 3.8a, suppose that a low order polynomial contributed to the approximation of a given function but a high order polynomial did not, on the premise that the polynomials were numerically correct. If the latter polynomial became correlated with the former due to (in this case) round-off error such that linear dependence is observed, then the latter will end up making a contribution to the fit to the extent of that correlation and its coefficient will become significant. This component is added as noise and a poorer fit results. If the assertion is made that machine precision is zero, then orthogonality is breached at roughly the [...]st order polynomial, given a resolution of 257 points. Above the 152nd order, all odd polynomials and all even polynomials are highly correlated (r > 0.98).

Figure 3.9 Composite correlation profile versus expansion order. Curve derived from the plot of the correlation of all even Gram polynomials with the zeroth order polynomial, interlaced with that of all odd Gram polynomials with the first order polynomial (see text). Data were derived from Gram polynomials consisting of 257 points. The dotted line indicates level of machine precision.

The order at which orthogonality breaks down numerically, i.e. the cut-off order, depends on temporal resolution. This is shown in Figure 3.10. The plot is quite noisy since the correlation profiles themselves are noisy (the consequence of working near machine precision) but relatively "flat".
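The kind of orthogonality check described above can be sketched as follows. This is not the recurrence (3.31) used in this work, which is not reproduced here: the sketch assumes a generic three-term recurrence for polynomials orthonormal on an equally spaced grid, with renormalization at every step, so the order at which orthogonality degrades (if it does at all over the range shown) will generally differ from the values reported in the text. All names are hypothetical.

```python
import math


def orthonormal_polys(n_points: int, max_order: int):
    """Discrete orthonormal polynomials on an equally spaced grid in [-1, 1].

    Uses x*p_n = b_{n+1}*p_{n+1} + a_n*p_n + b_n*p_{n-1}, normalizing each
    new polynomial to unit length (an assumed scheme, not equation 3.31).
    """
    xs = [-1.0 + 2.0 * k / (n_points - 1) for k in range(n_points)]
    prev = [0.0] * n_points
    curr = [1.0 / math.sqrt(n_points)] * n_points
    polys = [curr]
    beta = 0.0
    for _ in range(max_order):
        alpha = sum(x * p * p for x, p in zip(xs, curr))
        nxt = [(x - alpha) * p - beta * q for x, p, q in zip(xs, curr, prev)]
        beta = math.sqrt(sum(v * v for v in nxt))
        nxt = [v / beta for v in nxt]
        polys.append(nxt)
        prev, curr = curr, nxt
    return polys


def inner(p, q) -> float:
    """Discrete inner product of two sampled polynomials."""
    return sum(a * b for a, b in zip(p, q))


polys = orthonormal_polys(257, 120)
# Correlation of the zeroth-order polynomial with selected even orders.
for order in (2, 20, 60, 120):
    print(order, abs(inner(polys[0], polys[order])))
```

Printing the inner product of the reference polynomial with increasingly high orders gives a profile analogous to the composite one described in the text, and the first order at which the value rises clearly above machine precision can be taken as a cut-off order for that particular recurrence.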
A quadratic curve:

$$n = 21 + 0.16N - (8.5 \times 10^{-5})N^2$$ (3.56)

valid over N = [65, 513] has been drawn in to indicate the trend and may be used to obtain a more representative value for the cut-off order if desired.

Figure 3.10 Plot of the order n for the Gram polynomials at which orthogonality is (numerically) breached against number of points N used in Gram polynomials (solid line); quadratic fit to data (dotted line). (Horizontal axis: data points; vertical axis: order.)

The relationship is obvious: the greater the number of points used, the higher the cut-off order. But the relationship is one of diminishing returns, as an increasing number of points is required to achieve the same increment in the cut-off order. The highest usable order is thus determined by the signal containing the fewest points. In essence, the more points used the better. A rule of thumb over the interval [65, 513] may be 15 orders for every 100 points. Returning to the simulated peaks shown in Figure 3.6, if they had contained, say, 513 points instead of 257, then a smaller SSE for peaks Q, R, and S would be numerically possible but the relative decrease in SSE with expansion order would remain unchanged. Finally, since odd order polynomials do not contribute significantly to a Gaussian, the effect of non-orthogonality is delayed and reduced for it.

For the Fourier basis approximation, only 129 coefficients are shown in Figure 3.8b since the coefficients occur in pairs (with the exception of the first coefficient). That is, for real data, an increment in order involves both the negative and positive parts: when only one of the two is used, the approximation is complex. The Fourier approximation is numerically more stable, as is evident from the continuous decrease in SSE with expansion order.
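The quadratic trend (3.56) is easy to evaluate when choosing a data length. The helper below is a minimal sketch with a hypothetical name; it simply evaluates (3.56) and refuses values of N outside the interval over which the fit is stated to be valid.

```python
def cutoff_order(n_points: int) -> float:
    """Approximate Gram cut-off order from the quadratic trend (3.56)."""
    if not 65 <= n_points <= 513:
        raise ValueError("(3.56) is only stated to hold for 65 <= N <= 513")
    return 21 + 0.16 * n_points - 8.5e-5 * n_points ** 2


# Each doubling of the data length buys a smaller relative gain in order.
for n in (65, 129, 257, 513):
    print(n, round(cutoff_order(n), 1))
```

For N = 257 the trend gives a cut-off of about 56.5, and the gain from each successive doubling of N shrinks relative to the number of points added, which is the diminishing-returns behaviour noted above.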
In comparing the two sets of functions, it may be said that the Fourier basis ranks behind the Gram basis in terms of overall performance (allowing for numerical limits in orthogonality for the latter). However, the Fourier expansion is more efficient when only a few terms are used. The exact number depends on the function to be approximated. For peak A, it is 6 while for peak P, it is 28 (or 14 if the fact that only half the Gram coefficients are actually used in fitting a Gaussian is taken into account).

The results above define the numerical boundaries of approximation under 64-bit floating point precision. In practice, the highest accuracy attainable on a real system is set by the error introduced by the data acquisition component. Present acquisition hardware in our laboratory typically uses 12-bit analog-to-digital converters, and values of SSE will be about $1.5 \times 10^{-5}$ for 257 data points (assuming that the maximum quantization error is 1 bit). Table 3.3 lists the expansion order required to match this value for the entire set of simulated peaks. Note that the required order is independent of the number of points. Beyond the 12-bit SSE value, the decrease is due to the fitting of quantization noise, assuming that this is the only source of noise. It is seen that fewer than 44 Gram terms but up to 127 trigonometric terms (254 if negative order terms are counted) are required to approximate all simulated peaks to within experimental error. Since the profile for peak S is quite drastic (ignoring peaks with discontinuities), these values represent the practical upper limit for most flow injection applications (under conditions of very high signal-to-noise ratio). Of course, for digitizers with a larger word length, e.g. 16- or 20-bit, these limits should be increased accordingly up to the maximum, set here by the word length of the computer.
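Both SSE floors quoted in this sub-section can be reproduced with one line of arithmetic each. The sketch below assumes a peak scaled to unit range and a worst-case error of one least significant bit at every point, as stated in the text; the function name is hypothetical.

```python
import sys


def sse_floor(n_points: int, max_error: float) -> float:
    """Worst-case SSE when every one of n_points is off by max_error."""
    return n_points * max_error ** 2


n = 257

# Floor set by 64-bit floating point precision.
machine_floor = sse_floor(n, sys.float_info.epsilon)
print("64-bit float:", machine_floor)

# Floors set by the digitizer word length (unit-range signal assumed).
for bits in (12, 16, 20):
    print(f"{bits}-bit ADC:", sse_floor(n, 2.0 ** -bits))
```

The 12-bit case gives $257 \times 2^{-24} \approx 1.5 \times 10^{-5}$, matching the value quoted above, and each additional 4 bits of word length lowers the floor by a factor of 256.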
The limit for trigonometric approximation is inflated by its poor performance on highly skewed peaks and consequently, the variation in the number of coefficients required is high. The number of Gram coefficients required shows less variation. This is more desirable from a pattern recognition perspective since most pattern recognition algorithms deal with a fixed number of inputs (in this case coefficients) and thus, if a similar amount of information is contained within this span of inputs, each peak type will be treated in a more equal fashion. Lastly, what is quite apparent from Table 3.3 is that the Gram polynomials may be best used on skewed peaks while the Fourier functions would be best for symmetric peaks.

Table 3.3 Number of coefficients required to approximate simulated peaks to 12-bit precision with Gram and Fourier functions

Peak  Skewness  Gram    DFT
A     1.414     11      127
B     1.155     14      37
C     1.000     16      20
D     0.894     17      15
E     0.816     16      12
F     0.756     18      11
G     0.707     18      10
H     0.666     18      9
I     0.632     19      9
J     0.603     18      9
K     0.577     18      8
L     0.555     19      8
M     0.535     19      8
N     0.516     19      8
O     0.500     19      8
P     0         19 (a)  6
Q     -         37      127
R     -         43      127
S     -         43      127

a) Only 10 of the 19 coefficients (i.e. even order coefficients) are significant.

3.3.3 Effect of Temporal Resolution on Expansion Coefficients

As noted in the theory section, the Gram polynomials become the Legendre polynomials and the discrete Fourier functions become the continuous Fourier functions as the number of points N goes to infinity. At this limit, summation goes over to integration, and with reference to (3.12), the coefficients from either discrete expansion will tend to the values of their continuous counterpart. For any given finite value of N, the discrete-form coefficients will differ from the continuous-form coefficients.
This may be attributed to two causes: i) mathematically, summation follows the rectangular rule for integration; the error of integration over each interval $\Delta t$ for such a formula is proportional to the square of $\Delta t$ times the value of the function's first derivative evaluated at some indeterminate point on that interval [54]; and ii) operationally, the signal is less well defined because information is lost through sampling. It follows that when N is small, the difference between the discrete and continuous coefficients is expected to be relatively large; the difference should, in general, decrease systematically as N increases. Indeed, if the difference was due entirely to summation, then the relationship between coefficient value and N will take the form:

$$N \Delta t^2 = N\left[\frac{b-a}{N}\right]^2 \propto \frac{1}{N},$$

where a and b define the limits.

One should bear in mind data length dependency, where appropriate, when making comparisons between coefficients. In the context of a pattern recognition problem, variation due to changes in data length would be incorrectly interpreted as information. In practice such a problem can arise when signals are of different duration. In flow injection analysis, peak duration is determined by such factors as manifold dimensions and geometry, pumping rate, and molecular diffusion coefficient [55]. These are fixed for a given analyzer. Only when physicochemical effects come into play to alter dispersion characteristics (such as increased viscosity with sample concentration [56]) could differences in peak duration conceivably occur. However, there have been few reports in the literature on such effects and the magnitude of this problem may not be significant. Moreover, there will be a concomitant change in the shape of the peak with a change in dispersion. Thus, it is mainly for the sake of completeness that a study into the magnitude of the differences in the coefficient value as a function of N has been conducted.
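The 1/N behaviour expected from the rectangular rule can be illustrated on a simple integrand. The sketch below uses $e^{-t}$ on [0, 1] purely as an illustrative function (it is not one of the peaks or basis products from this work) and hypothetical function names; it shows that doubling the number of points roughly halves the summation error.

```python
import math


def rectangular_sum(f, a: float, b: float, n: int) -> float:
    """Left-endpoint rectangular-rule approximation to the integral of f on [a, b]."""
    dt = (b - a) / n
    return sum(f(a + k * dt) for k in range(n)) * dt


def summation_error(n: int) -> float:
    """Absolute error of the rectangular rule for exp(-t) on [0, 1]."""
    exact = 1.0 - math.exp(-1.0)
    return abs(rectangular_sum(lambda t: math.exp(-t), 0.0, 1.0, n) - exact)


for n in (64, 128, 256, 512):
    print(n, summation_error(n), summation_error(n) / summation_error(2 * n))
```

The printed ratios stay close to 2, as expected when the total error scales as 1/N. Departures from this behaviour in actual expansion coefficients would then point to the second cause listed above, or to changes in the basis functions themselves with N.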
Figure 3.11 depicts the change in coefficient value for an expansion in Gram polynomials for peaks A and O as a function of the number of data points. These peaks bound a range of typical flow injection peak shapes. Peak O was selected over the Gaussian (peak P) since both odd and even coefficients were desired. Neither peak was scaled to peak height as only the relative effect is of importance. Evaluation of the function was carried out with the following distribution of abscissa values: 65-1025 in steps of 4, 1033-2049 in steps of 8, and 2065-8193 in steps of 16; the point density reflects the gradient in the function. This yielded a total of 753 points. The 3rd, 13th and 22nd order coefficients were chosen for plots of coefficient value against number of points to highlight the key observations. A tabulation of the coefficient values for peak A up to the 25th order for selected numbers of points can be found in Table 3.4. The raw coefficients generated by the summation form of (3.11a) were normalized for data length via division by $\sqrt{N}$, and the Legendre coefficient was computed via quadrature and scaled by a reciprocal square-root factor (see Appendix C). It is evident that only an expansion to the 24th order is necessary: the 25th order coefficient is effectively zero.

In general, the value of the Legendre coefficient is approached asymptotically by the Gram coefficient as N increases, as expected. Most of the change is seen to occur in the first 500 points. Some semblance of a "1/N-type" relationship is observed: it best describes the curves for the low orders but deviations increase with polynomial order. Indeed, this is the dominant factor. The approach may be from either direction (i.e. from higher or lower values) depending on the peak (Figures 3.11a and 3.11d). As
As Peak Shape Analysis Ii 93 d) a) 5.OE-3 8.05E-3 4.5E-3 8.OOE-3 7.95E-3 4.OE-3 7.90E-3 3.5E-3 7.50E-7 e) b) 0.5E-5 = 7.25E-7 0 a) = 7.OOE-7 -0.5E-5 E 6.75E-7 -1.OE-5 6.50E-7 -: I -2.4E-14 I I I t t) c) -1.2E-6 -2.6E-14 -1.3E-6 -2.8E-14 -1 .4E-6 -1 .5E-6 -3.OE-14 -1.6E-6 -3.2E-14 I l 0 2000 4000 6000 8000 0 2000 4000 6000 8000 data points Figure 3.11 th, rd b) I 3 3 Effect of the number of points used on Gram coefficient values for peak A: a) 1 th, and f) 22 nd order and c) 22” order coefficients; and for peak 0: d) 3”, e) 13 coefficients. The value of the Legendre coefficient is shown with a dotted line. Peak Shape Analysis!! Tab!e 3.4 Gram and Legendre coefficients for peak A Polynomial order 0 I 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 c 94 Gram coefficients (tabulated by points)a 257 513 65 129 1.918E-02 -1.993E-02 6.384E-03 5.024E-03 -8.386E-03 6.567E-03 -3.687E-03 1 .653E-03 -6.225E-04 2.027E-04 -5.822E-05 1 .496E-05 -3.480E-06 7.388E-07 -1.442E-07 2.602E-08 -4.365E-09 6.835E-10 -1.003E-10 1.382E-11 -1.797E-12 2.206E-13 -2.566E-14 2.832E-15 -2.984E-16 3.114E-17 1.936E-02 -2.029E-02 6.930E-03 4.394E-03 -7.838E-03 6.199E-03 -3.489E-03 I .566E-03 -5.903E-04 I .925E-04 -5.547E-05 I .432E-05 -3.353E-06 7.181E-07 -1.417E-07 2.593E-08 -4.423E-09 7.063E-10 -1.060E-10 1.502E-11 -2.012E-12 2.558E-13 -3.093E-14 3.568E-15 -3.967E-16 4.416E-17 1.944E-02 -2.047E-02 7.1 96E-03 4.080E-03 -7.559E-03 6.008E-03 -3.383E-03 1.51 8E-03 -5.714E-04 I .862E-04 -5.360E-05 I .383E-05 -3.238E-06 6.937E-07 -1.370E-07 2.511 E-08 -4.293E-09 6.877E-10 -1.036E-10 1.474E-11 -1.986E-12 2.541E-13 -3.096E-14 3.596E-15 -3.998E-16 4.262E-17 9.61 5E-04 9.690E-04 9.728E-04 Legendre coefficients 1025 2049 1.948E-02 -2.055E-02 7.327E-03 3.924E-03 -7.419E-03 5.910E-03 -3.329E-03 I .492E-03 -5.613E-04 I .827E-04 -5.255E-05 I .355E-05 -3.169E-06 6.784E-07 -1.339E-07 2.453E-08 -4.192E-09 6.714E-10 -1.012E-10 1.440E-II -1.940E-12 2.484E-13 -3.029E-14 3.526E-15 -3.931E-16 4.264E-17 
1.950E-02 -2.059E-02 7.392E-03 3.846E-03 -7.349E-03 5.861E-03 -3.301 E-03 I .479E-03 -5.561E-04 I .809E-04 -5.200E-05 I .340E-05 -3.132E-06 6.700E-07 -1.322E-07 2.420E-08 -4.133E-09 6.616E-10 -9.965E-11 1.417E-11 -1.910E-12 2.444E-13 -2.980E-14 3.466E-15 -3.858E-16 4.OIIE-17 1.951E-02 -2.061E-02 7.424E-03 3.808E-03 -7.314E-03 5.836E-03 -3.287E-03 I .473E-03 -5.535E-04 I .800E-04 -5.172E-05 I .332E-05 -3.113E-06 6.656E-07 -1.313E-07 2.402E-08 -4.102E-09 6.563E-10 -9.882E-11 1.405E-11 -1.893E-12 2.422E-I3 -2.951E-14 3.431E-15 -3.805E-16 4.115E-17 1.952E-02 -2.063E-02 7.457E-03 3.769E-03 -7.279E-03 5.811E-03 -3.273E-03 1 .466E-03 -5.508E-04 1.791 E-04 -5.I43E-05 I .324E-05 -3.093E-06 6.611E-07 -1.303E-07 2.384E-08 -4.069E-09 6.508E-10 -9.795E-11 1.392E-11 -1.874E-12 2.397E-13 -2.920E-14 3.395E-15 -3.776E-16 4.031E-17 9.747E-04 9.756E-04 9.761 E-04 9.766E-04 a) Normalized for data length. the order increases, the rate of approach decreases, reflecting the increasing difference between the Gram and Legendre polynomials with order, at a fixed number of points. Hence, changes in the coefficient value with data lengths are not proportional between coefficients. For this reason, normalization of the coefficient vector to unit length does not eliminate the problem. While differences in the first 2 to 4 coefficients have been reduced by up to 50% in this way, this improvement is gained at the expense of the others, which increases slightly. For peak 0, the 8th and the 13th Peak Shape Analysis!! 95 order (Figure 3.1 le) coefficients undergo a change in sign at datapoint 456 and 298, respectively, as the data length increases. With higher order polynomials (above the 15th order), the curves in these plots are seen to change direction (Figure 3.11 c and 3.1 le) and the point at which the reversal occurs increases (mildly) with polynomial order. The exact cause of this is unclear but the author suspects that it is linked to changes in shape of the polynomials as a function of N. 
In any case, this effect is not cause for alarm since only the range of coefficient change is of interest here. A general overview of the magnitude of coefficient change can be had by computing values of the relative change:

$$\frac{\left|\max\left(f_i^{[a,b]}\right) - \min\left(f_i^{[a,b]}\right)\right|}{\left|f_i^{\infty}\right|}$$

where $f_i^{[a,b]}$ denotes the values for the $i$th coefficient in the interval $[a, b]$, and $f_i^{\infty}$ is taken to be the Legendre coefficient, over selected ranges of data length. Results for the first 25 coefficients using data length intervals of [65, 129], [129, 257], [257, 513], [513, 1025], [1025, 2049], and [65, ∞] are shown in Figures 3.12 and 3.13 for peaks A and 0, respectively. Except for the last of these ranges, the intervals chosen represent a "doubling" in the number of points. The relative differences appear to depend on the function to be transformed. The shapes of these histograms become more similar with increasing N, an indication that the changes are becoming more proportional. As the coefficient decreases in magnitude, the relative difference generally increases. When a change in sign occurs, the relative difference (in the two places where it is observed for peak 0) is exceptionally large.

Similar results were obtained with the Fourier basis functions and thus plots corresponding to Figure 3.11 are not given. A reversal in the curves was not observed. The first 26 coefficients for selected numbers of points are listed in Table 3.5. Figure 3.14 contrasts the relative differences over selected intervals of data length for the Fourier coefficients with those of the Gram coefficients for peak A. Again, the relative differences for the former increase with order, which coincides with decreasing coefficient values. For peak A, they also change comparatively less as the number of points goes beyond 257.
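The relative change defined above can be illustrated with a short calculation. The sketch below uses the order-0 Fourier coefficients of peak A as listed in Table 3.5; it is only an approximation of the procedure in the text, since it assumes that the tabulated data lengths are the only ones available inside each interval (the figures were presumably computed from every data length in the interval). The function names are illustrative and do not come from the thesis.

```python
# Order-0 Fourier coefficients of peak A, copied from Table 3.5.
# Keys are the number of data points.
ORDER0 = {65: 1.918e-2, 129: 1.936e-2, 257: 1.944e-2,
          513: 1.948e-2, 1025: 1.950e-2, 2049: 1.951e-2}
ORDER0_CONTINUOUS = 1.952e-2  # reference (continuous) coefficient


def relative_range(values, reference):
    """Return |max(values) - min(values)| / |reference|."""
    return abs(max(values) - min(values)) / abs(reference)


def range_over_interval(table, low, high, reference):
    """Relative range using only the tabulated lengths inside [low, high]."""
    inside = [value for n, value in table.items() if low <= n <= high]
    return relative_range(inside, reference)


# The five "doubling" intervals used in the text.
doubling = [(65, 129), (129, 257), (257, 513), (513, 1025), (1025, 2049)]
ranges = [range_over_interval(ORDER0, lo, hi, ORDER0_CONTINUOUS)
          for lo, hi in doubling]

for (lo, hi), value in zip(doubling, ranges):
    print(f"[{lo}, {hi}]: {value:.5f}")
```

For this coefficient the relative range roughly halves with each doubling of the number of points, which is in line with the observation that the coefficients change comparatively little beyond a few hundred points.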
Figure 3.12 Absolute relative range of Gram coefficient values for the first 26 coefficients over the intervals: a) 65-129, b) 129-257, c) 257-513, d) 513-1025, e) 1025-2049, and f) 65-∞ points for peak A.

Figure 3.13 Absolute relative range of Gram coefficient values for each coefficient over the intervals: a) 65-129, b) 129-257, c) 257-513, d) 513-1025, e) 1025-2049, and f) 65-∞ points for peak P. Values in graphs indicate magnitude of off-scale bars.

Table 3.5 Discrete and continuous Fourier coefficients for peak A (a)
          Discrete Fourier coefficients (tabulated by points) (b)                Continuous
Order     65         129        257        513        1025       2049       Fourier coefficients
0         1.918E-02  1.936E-02  1.944E-02  1.948E-02  1.950E-02  1.951E-02  1.952E-02
1         1.989E-02  1.998E-02  2.002E-02  2.004E-02  2.005E-02  2.005E-02  2.006E-02
2         1.106E-02  1.104E-02  1.103E-02  1.102E-02  1.102E-02  1.102E-02  1.102E-02
3         6.376E-03  6.326E-03  6.307E-03  6.299E-03  6.295E-03  6.293E-03  6.292E-03
4         4.017E-03  3.964E-03  3.945E-03  3.937E-03  3.934E-03  3.932E-03  3.931E-03
5         2.733E-03  2.681E-03  2.663E-03  2.657E-03  2.654E-03  2.653E-03  2.652E-03
6         1.974E-03  1.923E-03  1.907E-03  1.901E-03  1.899E-03  1.898E-03  1.897E-03
7         1.493E-03  1.443E-03  1.428E-03  1.423E-03  1.421E-03  1.420E-03  1.420E-03
8         1.172E-03  1.122E-03  1.108E-03  1.103E-03  1.101E-03  1.101E-03  1.100E-03
9         9.464E-04  8.972E-04  8.835E-04  8.792E-04  8.777E-04  8.771E-04  8.766E-04
10        7.832E-04  7.341E-04  7.207E-04  7.167E-04  7.153E-04  7.148E-04  7.144E-04
11        6.613E-04  6.121E-04  5.990E-04  5.951E-04  5.939E-04  5.934E-04  5.931E-04
12        5.680E-04  5.185E-04  5.057E-04  5.020E-04  5.008E-04  5.004E-04  5.000E-04
13        4.952E-04  4.453E-04  4.326E-04  4.290E-04  4.279E-04  4.275E-04  4.272E-04
14        4.374E-04  3.869E-04  3.744E-04  3.709E-04  3.698E-04  3.694E-04  3.692E-04
15        3.908E-04  3.396E-04  3.272E-04  3.237E-04  3.227E-04  3.224E-04  3.221E-04
16        3.528E-04  3.008E-04  2.884E-04  2.851E-04  2.841E-04  2.837E-04  2.835E-04
17        3.214E-04  2.686E-04  2.562E-04  2.529E-04  2.520E-04  2.516E-04  2.514E-04
18        2.954E-04  2.416E-04  2.292E-04  2.259E-04  2.250E-04  2.247E-04  2.245E-04
19        2.736E-04  2.186E-04  2.063E-04  2.031E-04  2.021E-04  2.018E-04  2.017E-04
20        2.553E-04  1.991E-04  1.867E-04  1.835E-04  1.826E-04  1.823E-04  1.821E-04
21        2.398E-04  1.822E-04  1.698E-04  1.666E-04  1.657E-04  1.655E-04  1.653E-04
22        2.267E-04  1.676E-04  1.552E-04  1.520E-04  1.511E-04  1.508E-04  1.507E-04
23        2.156E-04  1.548E-04  1.424E-04  1.392E-04  1.384E-04  1.381E-04  1.379E-04
24        2.062E-04  1.436E-04  1.311E-04  1.280E-04  1.271E-04  1.269E-04  1.267E-04
25        1.984E-04  1.338E-04  1.212E-04  1.181E-04  1.172E-04  1.170E-04  1.168E-04

a) Only the first 26 coefficients are
tabulated. b) Normalized for data length. Orders above zero are a combination of the negative and positive order Fourier coefficients, i.e. $\sqrt{2}\,|F_n|$.

In circumstances where the time duration of the signals produced varies significantly, the easiest way to deal with this problem is to set the acquisition rate such that at least 500 points are acquired. Ensuring that the number of data points stays relatively constant is more difficult, and is impossible in situations where the time duration of the signal cannot be determined beforehand. The use of this many points increases the computational overhead, but this is more than adequately handled by today's powerful microcomputers. This last concern takes on little significance for FIA when one considers that the time scale of an FIA experiment typically ranges between 20 and 60 s, whereas the computation time for either representation is under a second on an Intel 33-MHz 80486DX-based microcomputer.

Figure 3.14 Absolute relative range of Fourier (solid) and Gram (dotted) coefficient values for each coefficient over the intervals: a) 65-129, b) 129-257, c) 257-513, d) 513-1025, e) 1025-2049, and f) 65-∞ for peak A.

3.3.4 Effects of Experimental Noise

In sub-section 3.3.1 different peak shapes were found to result in different spectra, but those peaks were ideally produced. Real signals are always corrupted by noise and the reproducibility of the spectrum is again of concern. In this study, white noise will be assumed (i.e. the frequency spectrum is constant over all frequencies) along with a Gaussian distribution in the time-domain.
While differences in the coefficients due to different data lengths are systematic, the changes due to noise are stochastic and can only be adequately described by probability theory. That is, one makes use of frequency distributions (and parameters thereof, such as the mean and variance) which define the frequency of an outcome (e.g. peak voltage) over a long period of time. The mean outcome is known as the expected value. Debets et al. [21] have previously demonstrated how coefficient reproducibility depends on the actual noise sequence, the type of noise (e.g. white or 1/f) and the noise level. The first factor is attributable to the fact that a limited noise sequence is, to some extent, biased [57] and this can cause differences between theory and practice. The second is apparent from consideration of the Fourier transform, whose spectrum is commonly used to distinguish the various types of noise. The third is intuitively obvious.

In theory, the expected value for the coefficients will attain the true value as the number of measurements goes to infinity. That is, given that a large population of samples of the same signal corrupted by noise is available, the estimate of the mean value of any coefficient will be unaffected by noise, provided it is white. From a mathematical perspective, a finite discrete noise sequence can be cast as a signal. If the assumption is made that each element of noise consists of independent, Gaussian distributed variates $y_i$ with means $\mu_i$, all having the same variance $\sigma^2$, such that

$$E(\mathbf{y}) = \boldsymbol{\mu} \quad \text{and} \quad E\left((\mathbf{y}-\boldsymbol{\mu})(\mathbf{y}-\boldsymbol{\mu})^{T}\right) = \mathbf{I}\sigma^{2} \tag{3.57}$$

where $E$ denotes the expectation operator and $\mathbf{I}$ is the identity matrix, such that $\mathbf{I}\sigma^2$ is the covariance matrix, then the transform vector

$$\mathbf{x} = \mathbf{A}^{T}\mathbf{y} \tag{3.58}$$

consists also of random elements [58] having mean

$$E(\mathbf{x}) = E(\mathbf{A}^{T}\mathbf{y}) = \mathbf{A}^{T}\boldsymbol{\mu} \tag{3.59}$$

and covariance matrix

$$E\left((\mathbf{x}-E(\mathbf{x}))(\mathbf{x}-E(\mathbf{x}))^{T}\right) = E\left(\mathbf{A}^{T}(\mathbf{y}-\boldsymbol{\mu})(\mathbf{y}-\boldsymbol{\mu})^{T}\mathbf{A}\right) = \mathbf{A}^{T}E\left((\mathbf{y}-\boldsymbol{\mu})(\mathbf{y}-\boldsymbol{\mu})^{T}\right)\mathbf{A} = \mathbf{I}\sigma^{2} \tag{3.60}$$

where the last step uses $\mathbf{A}^{T}\mathbf{A} = \mathbf{I}$ for an orthonormal basis. Equation (3.58) is that of (3.40) simplified for an orthogonal basis, and $\mathbf{A}$ is the design matrix. Thus, the variance of the input function is preserved in the output function. In other words, if the noise treated here is zero mean, the sequences randomly increase or decrease the value of the coefficients for a given signal, and the amount of variation expected is equal to the input noise variance. It follows that the relationship between coefficient variability and noise level will be linear. For the Fourier magnitude coefficient, variability will be reduced since the sign of the coefficient is removed. Nevertheless, the effect of noise will be linear by virtue of the fact that

$$x(t) + a\,y(t) \;\Leftrightarrow\; |X(\omega)|\,e^{i\phi_x(\omega)} + |aY(\omega)|\,e^{i\phi_y(\omega)} \tag{3.61}$$

for any constant $a$; this follows from (3.17). With white noise, all coefficients will be affected equally and consequently, the smaller the coefficient, the more susceptible it is to noise. Hence, the noise level is critical and every effort should be made to minimize it so as to facilitate accurate peak shape identification.

From the foregoing, it is obvious that the noise spectrum masks that of the signal. In general, the signal spectrum dominates the low order coefficients while the noise spectrum dominates the high order coefficients. Thus, there is an order at which a changeover occurs and the signal component cannot be reliably extracted beyond this point. Of course, the signal spectrum itself could be modeled (optimal Wiener filtering [5]) but the limited number of signal components available for modeling again places a limit on the accuracy; the task is also difficult to perform automatically.
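Both the variance result of (3.60) and the changeover between signal-dominated and noise-dominated coefficients can be checked numerically. The following is a minimal sketch under stated assumptions: the orthonormal basis is built by Gram-Schmidt orthogonalization of polynomials on equally spaced points (a stand-in for the Gram polynomials, not the thesis's own routine), the test peak is a hypothetical skewed shape, and all function names are illustrative.

```python
import math
import random


def orthonormal_polynomials(n_points, max_order):
    """Discrete orthonormal polynomials on n_points equally spaced samples.

    Each new function is t * (previous function), orthogonalized twice
    against all earlier functions (Gram-Schmidt) and scaled to unit length.
    """
    t = [2.0 * i / (n_points - 1) - 1.0 for i in range(n_points)]
    basis = []
    candidate = [1.0] * n_points
    for _order in range(max_order + 1):
        for _sweep in range(2):  # second sweep removes rounding residue
            for p in basis:
                proj = sum(c * v for c, v in zip(candidate, p))
                candidate = [c - proj * v for c, v in zip(candidate, p)]
        norm = math.sqrt(sum(c * c for c in candidate))
        newest = [c / norm for c in candidate]
        basis.append(newest)
        candidate = [ti * v for ti, v in zip(t, newest)]
    return basis


def coefficients(basis, y):
    """Expansion coefficients x = A^T y for an orthonormal basis."""
    return [sum(p_k * y_k for p_k, y_k in zip(p, y)) for p in basis]


def coefficient_variance(basis, sigma, trials, rng):
    """Sample variance of each coefficient of pure white Gaussian noise."""
    sums = [0.0] * len(basis)
    squares = [0.0] * len(basis)
    for _ in range(trials):
        noise = [rng.gauss(0.0, sigma) for _ in range(len(basis[0]))]
        for k, c in enumerate(coefficients(basis, noise)):
            sums[k] += c
            squares[k] += c * c
    return [(q - s * s / trials) / (trials - 1) for s, q in zip(sums, squares)]


def optimal_order(basis, signal, noise):
    """Expansion order that minimizes the SSE against the noise-free signal."""
    coefs = coefficients(basis, [s + e for s, e in zip(signal, noise)])
    approx = [0.0] * len(signal)
    best_order, best_sse = 0, float("inf")
    for order, (c, p) in enumerate(zip(coefs, basis)):
        approx = [a + c * v for a, v in zip(approx, p)]
        sse = sum((a - s) ** 2 for a, s in zip(approx, signal))
        if sse < best_sse:
            best_order, best_sse = order, sse
    return best_order, best_sse


rng = random.Random(0)
N = 65
basis = orthonormal_polynomials(N, 15)

# Noise-only coefficients: each variance should be close to sigma**2 = 0.01.
variances = coefficient_variance(basis, sigma=0.1, trials=2000, rng=rng)

# Hypothetical skewed peak of unit height; same noise sequence at two levels.
signal = [(k / 8.0) * math.exp(1.0 - k / 8.0) for k in range(N)]
unit_noise = [rng.gauss(0.0, 1.0) for _ in range(N)]
low_order, low_sse = optimal_order(basis, signal, [0.02 * e for e in unit_noise])
high_order, high_sse = optimal_order(basis, signal, [0.20 * e for e in unit_noise])
print(low_order, low_sse, high_order, high_sse)
```

Because the same noise sequence is scaled to both levels, the optimal order at the higher noise level cannot exceed that at the lower level, and the minimum SSE cannot be smaller. In practice the noise-free signal is not available, so `optimal_order` is only usable in simulation.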
The result is that an optimal expansion order exists. In terms of approximation, expansions beyond the optimal order cause overfitting; that is, the fitting of noise. Ramifications of this to pattern recognition are discussed in a later section. Figure 3.15 shows the error in approximating peaks A and Q with the Gram and Fourier bases as a function of expansion order and noise level. The same noise sequence was used. As noise increases, the optimum expansion order decreases and the minimum SSE increases (see Table 3.6). The change in the optimum expansion order can be quite substantial, as observed for peak Q for the Gram approximation and peak A for the Fourier approximation. Since the true signal characteristics are unknown, the true optimum order cannot be determined exactly, although strategies [57] have been devised to compute a near-optimal result. Clearly, the error in approximation is no longer bound by numerical accuracy. This problem will be treated in the context of digital filtering in Chapter 4. Finally, it should be stressed that overfitting is a problem inherent in modeling and not one that is directly associated with these basis functions.

Figure 3.15 Effect of noise on approximation (SSE against expansion order). Gram basis on a) peak A, and b) peak Q. Fourier basis on c) peak A and d) peak Q. The noise levels shown in graph (a) apply to other graphs also and are expressed as percentage standard deviations with respect to peak height.

Table 3.6 Optimal expansion orders for SSE curves shown in Figure 3.15

              Gram basis                      Fourier basis
Noise level   peak A         peak Q           peak A         peak Q
(% σ)         order  SSE     order  SSE       order  SSE     order  SSE
2             10     0.004   24     0.012     21     0.022   9      0.004
4             9      0.013   20     0.039     19     0.050   9      0.014
7             9      0.036   15     0.104     16     0.111   7      0.031
10            9      0.071   8      0.158     13     0.194   7      0.057
20            9      0.279   7      0.339     9      0.469   6      0.184

3.4 MEIXNER REPRESENTATION

Generalized Fourier expansions in Meixner polynomials are complicated by the fact that these functions are defined on [0, ∞). Since the data to be represented are finite in length, there exists a basic incompatibility. The obvious solution is to truncate the Meixner functions but this results in loss of orthogonality over the entire Meixner space. As stated in sub-section 3.1.6, the time scale parameter provides a means to scale the functions such that a subspace consisting of the first M functions remains orthogonal in the numerical sense. This requires that the highest order function in the chosen subspace converges to zero (or close enough that it can be considered zero) within the specified number of data points. The time scale beyond which orthogonality is no longer in effect is referred to as the critical time scale, herein denoted as $\zeta_c$. The importance of determining the critical time scale with regard to identification is that the spectrum is a function of the time scale. Here, the critical time scale can serve as a reference point to anchor the spectrum. This need not be rigid and there are reasons (see below) for allowing the time scale to change slightly with signal shape.

3.4.1. Determination of the Critical Time Scale

At present, an expression for direct computation of the critical time scale does not exist but empirical methods can be devised. Two such methods are proposed here. The first involves taking the scalar product of the highest order orthonormal function with itself as a function of $\zeta$. The value is equal to one only when the function decays to zero; that is, when the function is completed. As $\zeta$ increases, the Meixner polynomials become stretched out and decrease in amplitude (refer to Figure 3.3). Indeed, with reference to the orthogonality property given by (3.33), the Meixner polynomials vanish as $\zeta$ goes to 1.
For this reason, truncation causes the scalar product to decrease: the greater the extent of truncation, the smaller this value will be. Figure 3.16 shows the scalar product curves for 257 points over the domain $\zeta$ = [0.6, 1.0] in steps of 0.0005, and expansion orders of 1, 5, 10, 15, 20, 25, 30, 35 and 40. The critical time scale is located at the point where the curves first deviate from unity (the threshold being computational precision, for example). As the expansion order increases, the critical time scale decreases to accommodate the next function. A step-like pattern is observed. At the time scale corresponding to these steps, the function was found to be truncated at a peak or valley and as a result the decrease slowed. In fact, a slight dip was observed. If a sharper drop is desired, one may take these curves to higher powers. Finally, it should be stressed that the curves are valid for 257 points only: if more points are used, the curves would shift to a higher time scale, and vice versa.

Figure 3.16 Plot of the scalar product of the highest order polynomial in a subset of Meixner functions against time scale. The number of functions in the subset is (from right to left in the graph): 1, 5, 10, 15, 20, 25, 30, 35 and 40.

The second method is based on the condition number (sub-section 3.1.7) and produces the same end results. When the basis functions are orthonormal, the correlation matrix is the identity matrix and the condition number is equal to one. Truncation reduces the degree of orthonormality, and as the extent of truncation increases the Meixner polynomials become more and more linearly dependent. Consequently, the range of the eigenvalues and the condition number increase. Eventually, the condition number goes to infinity when complete linear dependence is observed between any two or more basis functions.
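The first (scalar product) method can be sketched in a few lines. The Meixner functions themselves are defined earlier in the thesis and are not reproduced in this section, so the sketch below substitutes the discrete Laguerre functions, a related family that is also orthonormal on a semi-infinite interval and is stretched by a parameter between 0 and 1. This substitution, the function names, and the threshold of 1e-6 are all assumptions made for illustration; the numbers produced are therefore not those of Figure 3.16.

```python
import math


def laguerre_functions(zeta, n_points, max_order):
    """First max_order + 1 discrete Laguerre functions, truncated to n_points.

    Stand-in basis (assumption): order 0 is sqrt(1 - zeta^2) * zeta^k, and
    each higher order is the previous one passed through the all-pass
    section (z^-1 - zeta) / (1 - zeta * z^-1).
    """
    scale = math.sqrt(1.0 - zeta * zeta)
    current = [scale * zeta ** k for k in range(n_points)]
    functions = [current]
    for _ in range(max_order):
        following = [0.0] * n_points
        prev_in, prev_out = 0.0, 0.0
        for k, x in enumerate(current):
            # y[k] = zeta * y[k-1] + x[k-1] - zeta * x[k]
            following[k] = zeta * prev_out + prev_in - zeta * x
            prev_in, prev_out = x, following[k]
        functions.append(following)
        current = following
    return functions


def self_product(zeta, n_points, order):
    """Scalar product of the truncated highest order function with itself."""
    top = laguerre_functions(zeta, n_points, order)[-1]
    return sum(v * v for v in top)


def critical_time_scale(n_points, order, tol=1e-6, start=0.6, step=0.0005):
    """Last grid value of zeta before the self product first drops below 1 - tol."""
    critical = None
    k = 0
    while start + k * step < 1.0:
        zeta = start + k * step
        if self_product(zeta, n_points, order) < 1.0 - tol:
            break
        critical = zeta
        k += 1
    return critical


zc_order0 = critical_time_scale(257, 0)
zc_order5 = critical_time_scale(257, 5)
print(zc_order0, zc_order5)
```

As described in the text, the critical value found this way decreases as the order of the highest function in the subset increases, and it applies only to the number of points used in the scan.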
Figure 3.17 is analogous to Figure 3.16; the reciprocal condition number (RCOND) is plotted against $\zeta$ for different expansion orders. The critical time scale is determined as above. The RCOND function assumes a sigmoidal shape that also conveniently ranges from one to zero. The profile is almost identical for all curves. Again, these functions are defined only for the number of points specified and they may be squared to obtain a sharper drop-off.

Figure 3.17 Plot of RCOND against time scale $\zeta$ for a subset of Meixner polynomials. The number of functions in the subset is (from right to left in the graph): 1, 5, 10, 15, 20, 25, 30, 35 and 40.

Both methods are equally adept at locating $\zeta_c$. The scalar product method requires less computation and is thus much faster. However, the RCOND function does not exhibit steps and local minima, and has a sharper natural drop-off. In situations where the entire function is needed, the condition number method is preferred over the scalar product method. One such situation is described later.

3.4.2. Calibration of the Critical Time Scale

In both the scalar product and RCOND schemes, a regularity appears in the value of $\zeta_c$ as a function of polynomial order. To test for linearity, critical values were computed via the RCOND method to an accuracy of 0.0001 (using a threshold of 0.999999) for data points from 65 to 689 in increments of 16, and over expansion orders from 1 to 40 in steps of 1. The critical time scale surface is plotted in Figure 3.18. The relationship is clearly non-linear.

[Figure 3.18: critical time scale surface]
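The RCOND computation behind these critical values can be sketched for the smallest non-trivial case of two basis functions, where the eigenvalues of the 2 x 2 correlation matrix are available in closed form. As an assumption, the first two discrete Laguerre functions stand in for the Meixner functions, and the reciprocal condition number is taken as the ratio of the smallest to the largest eigenvalue; a library RCOND routine may use a different norm, and larger subsets would need a general eigenvalue or condition number solver. The names below are illustrative.

```python
import math


def first_two_functions(zeta, n_points):
    """Orders 0 and 1 of the discrete Laguerre stand-in, truncated to n_points."""
    scale = math.sqrt(1.0 - zeta * zeta)
    f0 = [scale * zeta ** k for k in range(n_points)]
    f1 = [-zeta * scale]
    f1 += [scale * zeta ** (k - 1) * (k * (1.0 - zeta * zeta) - zeta * zeta)
           for k in range(1, n_points)]
    return f0, f1


def rcond_2x2(f0, f1):
    """Smallest over largest eigenvalue of the 2 x 2 correlation matrix."""
    g00 = sum(a * a for a in f0)
    g11 = sum(b * b for b in f1)
    g01 = sum(a * b for a, b in zip(f0, f1))
    mean = 0.5 * (g00 + g11)
    radius = math.sqrt((0.5 * (g00 - g11)) ** 2 + g01 * g01)
    return (mean - radius) / (mean + radius)


def critical_time_scale_rcond(n_points, threshold=0.999999,
                              start=0.6, step=0.0001):
    """Last grid value of zeta before RCOND first falls below the threshold."""
    critical = None
    k = 0
    while start + k * step < 1.0:
        zeta = start + k * step
        if rcond_2x2(*first_two_functions(zeta, n_points)) < threshold:
            break
        critical = zeta
        k += 1
    return critical


zc = critical_time_scale_rcond(257)
print(zc)
```

The threshold of 0.999999 and the step of 0.0001 follow the values quoted in the text; everything else is a placeholder for the thesis's own basis and solver.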
... could be treated separately first. The results from each would then be multiplied together to yield an expression for the surface. A survey of functions of the form: g(x) where maxiMi = IMfIX% = 8, was undertaken with the program TABLECURVE. For the expansion order variable, one of the best overall (linear least squares) approximations, now formulated in practical terms for the given data, was found to be

$$\zeta_c(n) = f_0(N) + f_1(N)\sqrt{n} + f_2(N)\,n + f_3(N)\,n\sqrt{n} + f_4(N)\,n^{2} \tag{3.62}$$

where $n$ is the expansion order and the coefficients $f_i$ are a function of the number of data points $N$. Remarkably, the "inverse" to (3.62) was found to be equally good for the data points variable, i.e.

$$\zeta_c(N) = g_0(n) + \frac{g_1(n)}{\sqrt{N}} + \frac{g_2(n)}{N} + \frac{g_3(n)}{N\sqrt{N}} + \frac{g_4(n)}{N^{2}} \tag{3.63}$$

Here, the coefficients $g_i$ are functions of $n$. This "inverse" relationship would be expected for a theoretical model, if one existed. The product of (3.62) and (3.63) gives the desired expression in both variables:

$$\zeta_c(n, N) = \sum_{i=0}^{4}\sum_{j=0}^{4} a_{ij}\, n^{i/2}\, N^{-j/2} \tag{3.64}$$

The linear least squares curve-fit results are given in Table 3.7. Of course, the fit is only valid for the surface in Figure 3.18. Also shown in the summary is a measure of the adequacy of the fit as provided by the standard error of the fit, $\sqrt{\mathrm{MSE}}$, which is defined as

$$\sqrt{\mathrm{MSE}} = \sqrt{\frac{\sum_{k}\left(y[k]-\hat{y}[k]\right)^{2}}{N-M}} = \sqrt{\frac{\mathrm{SSE}}{N-M}} \tag{3.65}$$

where $M$ is the number of parameters (i.e. coefficients). The value obtained is well below the accuracy sought during direct computation of $\zeta_c$. For evaluation, the model was tested on curves of $\zeta_c$ against expansion order, computed at 151, 449 and 645 data points. As shown, the magnitudes of the residuals are again within the accuracy sought.

Table 3.7 Results from curve-fitting the critical time scale surface to (3.64)

A. Modeling Set

Parameter values:

coefficient  value            coefficient  value            coefficient  value
a_00         1.002972         a_02         -3.246079E-3     a_04         2.545052E2
a_10         -6.313329E-3     a_12         1.311270E1       a_14         4.361778E2
a_20         4.172602E3       a_22         5.278547         a_24         2.708994E2
a_30         -1.137855E-3     a_32         -1.925054        a_34         -5.645649E1
a_40         8.732053E-5      a_42         1.320208E-1      a_44         2.841588
a_01         -2.174628E-1     a_03         -6.790450E1
a_11         4.462146E-1      a_13         1.286522E2
a_21         -2.896922E-1     a_23         -7.967123E1
a_31         7.803969E-2      a_33         2.032516E1
a_41         -5.752757E-3     a_43         -1.167729

Fit standard error √MSE:            3.393E-5
Maximum absolute residual error:    8.829E-5

B. Evaluation Set

Absolute residual for evaluated data points over expansion orders of 1 to 40:

data points  mean       maximum
151          3.192E-5   8.233E-5
449          2.840E-5   6.895E-5
645          2.564E-5   6.123E-5

3.4.3 Numerical Studies in Peak Approximation

As discussed in sub-section 3.1.6, the accuracy of signal approximation depends not only on the expansion order but also on the time scale parameter $\zeta$ for an expansion in Meixner polynomials. SSE curves as a function of $\zeta$ and expansion order for peaks A, F, P, Q, R, and S (each consisting of 257 points) were computed. Only those for peaks A and P are given in Figure 3.19; they serve to highlight the main points. In both graphs, the position of the critical time scale for each expansion order is indicated by a vertical dotted line.

Figure 3.19 Plots of SSE against time scale for expansion orders of (from top to bottom in the graphs): 1, 5, 10, 15, 20, 25, 30, 35 and 40.
Dotted lines indicate location of the critical time scale for that order, as computed in sub-section 3.4.2.

The shape of the error curves depends on the peak. The optimal time scale (i.e. for best fit) does not, in general, coincide with the critical time scale; for the peak types studied, the former generally resides in the region of non-orthogonality. It is obvious that the Meixner functions perform most efficiently on peaks that resemble them. For example, peak A requires only six terms to reach machine precision while peak F requires only ten. Ideally, one would like to be able to determine the minimum expansion order required for a peak, since the most representative spectrum is then obtained. This requires a search for the optimum time scale and calculation of the minimum SSE over each expansion order until the SSE decreases to a pre-specified level, a task that at present is clearly not practical for near real-time data evaluation, especially if that level is located at a very high expansion order. Furthermore, due to overfitting, noise can render such a procedure ineffective (as demonstrated below), unless some additional criterion is available to aid in locating the optimum.

In some instances (e.g. peak A), a single optimal time scale exists (observable at low expansion orders) while in other instances (e.g. peak P) many local minima are detected. The rippling results after orthogonality has been breached. The local minima are suspected to be linked to the oscillations in the Meixner functions, perhaps in the same manner as those found in Figure 3.16. An inspection of the functions at these minima yielded inconclusive results, however. For high expansion orders and high values of $\zeta$, the curves can become noisy. The noise itself is observed to be structured. In this region, the RCOND value has reached machine precision (about $10^{-16}$) for the design matrix: singularity has been reached. The SSE levels off more or less at $10^{-8}$. Some peaks (e.g.
A and F) do not reach their potential minimum at high expansion orders before this problem sets in. Finally, as predicted from theory, the "minimum well" of the error curves broadens with an increase in terms and, with regard to approximation, the time scale becomes essentially immaterial at infinite expansion order.

The region where orthogonality is not observed can be problematic. Local minima and noise pose problems for search algorithms, which commonly assume smooth, single mode functions or surfaces. Though not as critical for approximation, an inability to locate a unique time scale has potentially dire consequences for identification. One way around this is to fix the time scale at the critical time scale (determined in the manner described above). This method is rigid and sacrifices a potentially better representation. However, as shown later, it is more robust to noise. If discrimination is adequate, this route is favored because of its speed, since the coefficients can be computed via (3.12) and $\zeta_c$ via (3.64).

Alternatively, one can relax the rigidity of a fixed time scale to some degree such that the time scale is afforded some latitude. In terms of identification, it could serve as an additional parameter for discrimination since the optimal time scale is a function of peak shape. If strict adherence to orthogonality is desired, the search for the minimum SSE($\zeta$) can be subjected to the following constraint: scalar product > threshold (the scalar product method is faster than the RCOND method). However, only peaks having optimal time scales within the orthogonal region will benefit. The scope can be increased somewhat if the orthogonality constraint is relaxed slightly as well. This could be advantageously used in situations where the peak does not decay to baseline within a reasonable time and must be truncated. Of course, the peak value at the point of truncation must not be too far from zero.
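For the fixed time scale route mentioned above, evaluating an empirical surface of the kind fitted in (3.64) is inexpensive. The sketch below assumes that (3.64) is a 5 x 5 sum of terms in half-integer powers of the expansion order n and inverse half-integer powers of the number of data points N, and it also evaluates the standard error of fit of (3.65). The coefficient matrix `toy` is a hypothetical placeholder, not the fitted values of Table 3.7, and the function names are illustrative.

```python
import math


def critical_time_scale_model(a, n, n_points):
    """Evaluate sum over i, j = 0..4 of a[i][j] * n**(i/2) * N**(-j/2)."""
    return sum(a[i][j] * n ** (i / 2.0) * n_points ** (-j / 2.0)
               for i in range(5) for j in range(5))


def fit_standard_error(observed, predicted, n_parameters):
    """Standard error of fit, sqrt(SSE / (N - M)), as in (3.65)."""
    sse = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    return math.sqrt(sse / (len(observed) - n_parameters))


# Placeholder coefficients (hypothetical, NOT the values of Table 3.7).
toy = [[0.0] * 5 for _ in range(5)]
toy[0][0] = 1.0     # constant term
toy[1][0] = -0.01   # sqrt(n) term
toy[0][2] = -2.0    # 1/N term

zeta_c = critical_time_scale_model(toy, n=4, n_points=100)
error = fit_standard_error([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8], 2)
print(zeta_c, error)
```

With the placeholder coefficients the model returns 1 - 0.01 * 2 - 2 / 100 = 0.96, and the four example residuals give a standard error of about 0.224. Substituting a fitted coefficient matrix would give the critical time scale directly from n and N, without a scan.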
In addition, the Gaussian peak, which shows the most rippling, is less common than skewed peaks in FIA and the danger of finding a local minimum is of less concern. Hence, such an approach is deemed worthy of investigation. The following two-term criterion is proposed for minimization:

$$\mathrm{RCSSE} = \mathrm{RCOND} \times \left[\log(\mathrm{SSE}) - \log(\mathrm{offset})\right] \tag{3.66}$$

The RCOND function is used over the scalar product for the reasons stated in the last section. The offset term accounts for situations where the SSE does not fall below one. Any value which causes max(log(SSE)) to fall below zero could be used and a suggested value, and one used here, is the sum of the squared data vector, which yields a very high SSE. A sharp cut-off at the critical time scale is not expected (from the shape of the RCOND functions) and hence, part of the non-orthogonal region will be included. In Figure 3.20, the SSE error curves are redrawn with RCSSE as the dependent variable. Although a single minimum is not always achieved (e.g. peak P), it is a definite improvement, especially since the noisy region now does not endanger the search.

Figure 3.20 Plots of RCSSE against time scale for expansion orders of (from top to bottom in the graphs): 1, 5, 10, 15, 20, 25, 30, 35, 40 for a) peak A, and b) peak P.

In Figure 3.21, the SSE is plotted as a function of expansion order along the time scale set by min(RCSSE). Since the time scale changes, the step features in some of these curves are not caused by certain polynomials not contributing to the
peak. Table 3.8 lists the expansion order required to meet 12-bit accuracy. It should be kept in mind that with each expansion, a different Meixner basis is used. Not surprisingly, the Meixner polynomials are observed to be exceptionally good at approximating highly skewed, single-mode peaks. Overall, the performance of the Meixner polynomials with regard to these synthetic peaks falls somewhere between that of the Gram polynomials and Fourier basis (compare with Table 3.3).

Figure 3.21 Plots of SSE against expansion order for the approximation of peaks in Figure 3.6 with Meixner polynomials. Dotted line indicates quantization error level for 12-bit data representations.

Table 3.8 Number of coefficients required to approximate simulated peaks to 12-bit precision with Meixner polynomials

Peak  Skewness  Terms      Peak  Skewness  Terms
A     1.414     2          K     0.577     11
B     1.155     3          L     0.555     11
C     1.000     4          M     0.535     12
D     0.894     5          N     0.516     12
E     0.816     7          O     0.500     13
F     0.756     8          P     0         21
G     0.707     8          Q     -         21
H     0.666     9          R     -         27
I     0.632     9          S     -         24
J     0.603     10

3.4.4 Spectral Information

The $L_2$-normalized Meixner spectra for the simulated peaks in Figure 3.6 are shown in Figures 3.22 (RCSSE) and 3.23 (critical time scale), respectively, for 10 and 20 coefficients. For the former, the time scale is given alongside each spectrum. In both cases, the similarity of the spectra to the actual peak profiles is striking. This feature makes the Meixner representation ideal for compression of peak-shaped data. Note that the spectra obtained via the RCSSE method are slightly more compact and resemble the signal shape to a greater extent. The match becomes progressively better as the number of coefficients used is increased.
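The RCSSE criterion of (3.66), used to select the time scale for the spectra in Figure 3.22, can be sketched as follows. The sketch assumes base-10 logarithms (the base is not stated in the text) and takes the offset as the sum of the squared data vector, as suggested above. The RCOND and SSE curves are hypothetical numbers chosen only to show the mechanics, and the function names are illustrative.

```python
import math


def rcsse(rcond, sse, offset):
    """RCSSE = RCOND * [log(SSE) - log(offset)], base-10 logs assumed."""
    return rcond * (math.log10(sse) - math.log10(offset))


def select_time_scale(zetas, rconds, sses, data):
    """Return the time scale with the smallest RCSSE, plus all scores."""
    offset = sum(v * v for v in data)  # sum of the squared data vector
    scores = [rcsse(r, s, offset) for r, s in zip(rconds, sses)]
    best = min(range(len(scores)), key=scores.__getitem__)
    return zetas[best], scores


# Hypothetical curves: the SSE keeps falling with the time scale, but RCOND
# collapses once orthogonality is lost, which pulls the score back to zero.
zetas = [0.80, 0.85, 0.90, 0.95, 0.99]
rconds = [1.0, 1.0, 0.999, 0.5, 1e-6]
sses = [1e-2, 1e-4, 1e-6, 1e-9, 1e-10]
data = [1.0, 3.0]  # placeholder data vector, sum of squares = 10

best_zeta, scores = select_time_scale(zetas, rconds, sses, data)
print(best_zeta, scores)
```

In this example the unconstrained SSE minimum lies at the largest time scale, where the basis is nearly singular, whereas the RCSSE minimum falls at 0.90, just inside the region where RCOND is still close to one.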
Indeed, with reference to the Meixner polynomials themselves (Figure 3.3), there appears to be some rather loose connection between the magnitude of the coefficients and values computed by direct integration of the peak over successive equal time intervals. However, the Meixner coefficients are computed over the entire peak. With the greater number of points they should, in principle, be more robust to noise than segment integration. In addition, this connection accounts for the "enhanced resolution" in the spectrum with expansion order if one recalls that the subset of functions is scaled to fit the range of the signal.

3.4.5 Effect of Temporal Resolution

As with the Gram and Fourier representations, the Meixner coefficients also change with the number of data points. The results are similar, and therefore are not repeated here. As such, the recommendations stated in sub-section 3.3.3 apply to the Meixner case as well.

Figure 3.22 Meixner spectra composed of a) 11 coefficients and b) 21 coefficients for the six synthetic peaks shown in Figure 3.6. Spectra were computed, via the RCSSE method, from peaks containing 257 points. The coefficient vector has been normalized to $L_2$-norm.

Figure 3.23 Meixner spectra composed of a) 11 coefficients ($\zeta$ = 0.8856), and b) 21 coefficients ($\zeta$ = 0.8122) for the six synthetic peaks shown in Figure 3.6.
Spectra were computed at the critical time scale from peaks containing 257 points. The coefficient vector has been normalized to $L_2$-norm.

3.4.6 Effect of the Time Scale on the Spectrum

Earlier, it was mentioned that the time scale has a profound effect on the spectrum. Figure 3.24 depicts the change in the first 6 coefficients (from a tenth order expansion) of peaks A and P. The spectrum was not normalized. As seen, low time scale values weight the higher order coefficients while high values weight the low order coefficients. This is expected from the changes observed in the Meixner functions with time scale. The extent of change is a function of both the peak and the time scale. The changes are not proportional between coefficients and can be substantial. Hence, it is imperative that the time scale remains fairly constant.

Figure 3.24 Effect of the time scale parameter on coefficient values for a) peak A and b) peak P. The first six coefficients (labelled by the circled numbers) from a tenth order expansion are plotted. The coefficient vector was not normalized to unit length.

3.4.7 Effects of Noise

As with the Gram and Fourier representations, the variation of coefficient values with noise is also observed with the Meixner representation. If the truncated Meixner functions are orthogonal and the noise is white, the analysis described in sub-section 3.3.4 is applicable. This would be the only source of variation due to noise if a fixed time scale were used. When the RCSSE function is used, an additional concern is that noise may affect the shape of the SSE curves such that an uncertainty is introduced in the location of the desired time scale.
As discussed above, a changing time scale represents a systematic error, and the amount of error transferred onto the coefficients is difficult to predict since that quantity is a function of the peak (i.e. shape) and the time scale. Thus, the effect of noise on the SSE profiles is of interest.

In practice, when noise is present, the true signal is unknown and comparisons must be made between the model and the raw signal. Figure 3.25 shows the SSE curves for peaks A and P at noise levels of 1% and 10%, with respect to peak height. The SSE curves begin to flatten out when they reach the sum of squared values for the noise sequence, which are 0.0256 and 2.56 at 1% and 10% noise, respectively. The minimum falls very slowly with an increase in expansion order. This flattening makes it difficult for minimum search algorithms to locate the global minimum. Hence, at the very least, the SSE should be combined with RCOND to yield the corresponding RCSSE function. However, even this fails to ensure a constant minimum because the SSE curves vary in shape with noise sequence. Table 3.9 tabulates the mean time scale and its standard deviation for peaks A and P over various noise levels for 10th and 20th order expansions. A total of 50 different Gaussian-distributed noise sequences were used. The amount of variation is seen to increase with noise level and expansion order. A significant bias (decrease in C with increase in noise level) is also observed and is seen to be the main cause of error.
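The noise floor quoted above for the SSE curves can be checked numerically. The sketch below is an illustration only, not the procedure used in the study. It assumes peaks scaled to unit height, 257 points per peak (as in Figures 3.22 and 3.23), and a zero-mean Gaussian sequence rescaled so that its sample standard deviation equals the stated noise level exactly. Under these assumptions the sum of squares is (N - 1)σ², which gives 0.0256 and 2.56. The function name `noise_sum_of_squares` is hypothetical.

```python
import numpy as np

def noise_sum_of_squares(n_points, noise_level, seed=0):
    """Sum of squared values of a Gaussian noise sequence.

    Assumption: the sequence is mean-centred and rescaled so that its
    sample standard deviation (ddof=1) equals `noise_level` exactly,
    with the peak height taken as 1.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n_points)
    noise -= noise.mean()                          # zero-mean sequence
    noise *= noise_level / noise.std(ddof=1)       # enforce the stated level
    return float(np.sum(noise ** 2))               # equals (N - 1) * level**2

floor_1pct = noise_sum_of_squares(257, 0.01)       # 1% noise
floor_10pct = noise_sum_of_squares(257, 0.10)      # 10% noise
print(round(floor_1pct, 4), round(floor_10pct, 2))
```

Without the rescaling step, the sum of squares of a random sequence would only scatter around this value, so an SSE curve should be expected to level off near the floor rather than exactly at it.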
With respect to the RCSSE method, the possible advantage of enhanced discrimination from allowing some degree of freedom in selecting the time scale exposes the representation to this problem. These findings show that unless the noise level is very low (much less than 0.5%) and/or falls within a very small range, the advantage hypothesized in using the RCSSE method (see sub-section 3.4.3) may not be realized.

Figure 3.25 SSE curves as a function of the time scale parameter C for a) peak A, 1% noise level, b) peak A, 10% noise level, c) peak P, 1% noise level, d) peak P, 10% noise level. Expansion orders of 1, 5, 10, 15, 20, 25, 30, 35 and 40 are shown. Circled numbers indicate the sequence.

Table 3.9 Variability of optimal Meixner time scale parameter with noise level (a)

Noise level   Peak A coefficients                   Peak P coefficients
(%)           10                20                  10                20
0             0.9297            0.8589              0.9130            0.8495
0.5           0.9059 ± 0.0021   0.8318 ± 0.0046     0.9115 ± 0.0009   0.8298 ± 0.0020
1             0.9025 ± 0.0033   0.8255 ± 0.0081     0.9051 ± 0.0011   0.8290 ± 0.0032
3             0.8940 ± 0.0078   0.8104 ± 0.0165     0.8890 ± 0.0036   0.8224 ± 0.0099
5             0.8889 ± 0.0110   0.8050 ± 0.0238     0.8913 ± 0.0081   0.8171 ± 0.0117
10            0.8835 ± 0.0162   0.7928 ± 0.0271     0.8929 ± 0.0099   0.8152 ± 0.0134
20            0.8738 ± 0.0219   0.7862 ± 0.0364     0.8958 ± 0.0118   0.8116 ± 0.0161

(a) Optimal time scale parameter computed from RCSSE function. Variability estimated from 50 random Gaussian-distributed noise sequences.

3.5 PATTERN RECOGNITION STUDIES

The study up to this point has focused on mathematical aspects of generalized Fourier representations; in particular, the properties and errors of individual coefficients with respect to noise. This section extends the work into a multivariate domain via a pattern recognition approach.
The aim is to evaluate the performance of these representations in the context of a pattern recognition problem. Such information could then be used in developing a peak shape classification strategy for flow injection analysis (and other analytical techniques that produce similarly shaped output signals).

At the very best, a solution to a classification problem needs only a single descriptor to distinguish between all K different categories. This situation is far from typical and the number of descriptors necessary generally increases with K. While the use of a minimum number of descriptors, optimized according to a selected training set, is most economical, such specificity lessens the ability of the classifier to detect unexpected peak types. For the sake of greater generality, it was decided to use all coefficients of the spectrum up to some “optimal” expansion order, at the cost of performance. In doing so, a problem arises: depending on the peaks to be classified, correlations between coefficients can exist, which results in redundancy. For this reason, the spectrum is first converted into (a smaller number of) uncorrelated variables with principal components analysis (PCA).

Most, if not all, pattern recognition algorithms, including PCA, require that the number of inputs (i.e. the length of the pattern vector) be fixed. In selecting the number of coefficients used for identification, the effects of noise must be accounted for. It was demonstrated, from the approximation studies, that the number of coefficients necessary for representation to a given accuracy is dependent on the peak. In a noise-free environment, one can simply use the highest number of coefficients required for all conceivable peak shapes for a given FIA system, subject to some practically feasible level of accuracy.
However, as shown previously, when noise is present there must necessarily be some trade-off between integrity of signal representation and robustness to noise. In other words, the more coefficients used for identification, the more accurate the representation and the greater the number of peak shapes identifiable, but this is countered by an increased susceptibility of peak identification to noise since the high order coefficients will generally be small for FIA peaks. Peaks which require the fewest coefficients will be particularly sensitive. With the Meixner representation, variations in time scale must also be accounted for if the RCSSE method is used. One could decide to average two or more consecutive coefficients in order to reduce the noise but this is undesirable when only a small number of significant coefficients exists. Moreover, it cannot be done for the Gram spectrum unless the absolute spectrum is used, in which case a potential discriminator in the sign of the coefficient is discarded. It is clear that once a particular expansion order is selected, a constraint is inherently placed on the amount of noise that the pattern recognition procedure can tolerate.

The competition between ability to discriminate and robustness to noise is an important practical consideration. No simple method has yet been devised to overcome it. One may attempt to determine the optimum expansion order for a given peak and set all higher coefficients to zero. However, this approach is suspect since the optimum expansion order may vary over a broad range depending on the noise level (see Figure 3.15), given that the optimum expansion order can be determined in the first place. Thus, by “windowing” the spectrum in this manner, an artificial, noise-dependent difference could be introduced for the peaks that require the most coefficients for adequate representation: the noise problem is now shifted onto such peaks.
Should the windowed spectrum be sufficiently different, this may at best lead to the peak being classified as an unknown and at worst cause it to be identified as another type of peak. However, when signal-to-noise ratio is likely to be very poor, windowing may be an option.

In light of these problems, one wonders why pattern recognition was attempted here at all. The answer is that discrimination between peaks requires only that the systematic variation in the coefficients due to differences in peak shape be greater than the random and systematic variation due to noise. That is, given that noise is present, the spectra, in whole or in part, may still be sufficiently different between peaks to enable category separation. A highly accurate representation in the spectral domain is not always necessary for classification of all anticipated peak shapes and a partial spectrum or select components may be adequate. While this means that different types of peaks, in general, will not be treated equally (i.e. one type may be completely represented while another may be partially represented), this, in itself, has no influence on classification. Furthermore, electrical noise can often be reduced to an acceptable level in a well designed analyzer. It should also be obvious that, in general, no hard and fast rules can be formulated for selecting the most appropriate basis function; every problem must be treated case by case. Practically, the experience of the analyst can play a significant role.

A pattern recognition investigation within a PCA framework is now described. It should be emphasized that the goal here is not to develop a classification procedure, although the results may be used as such. The aim is simply to explore and compare the structure of the representations and, therefore, the magnitude of the effect of noise through the use of simulated data.
In this way, the aforementioned problems can be visualized but, more importantly, perhaps additional insights can be obtained. A total of thirty different Gaussian-distributed noise sequences were added to each of the simulated peaks in Figure 3.6 to obtain a data set of 186 peaks (noise-free peaks inclusive). Noise levels of 0.5%, 1%, 3%, 5% and 10% standard deviation (with respect to peak range) were investigated. The data were normalized to peak area. To isolate the effects of noise, PCA was first performed on the noise-free peaks to obtain the principal components subspace. The complete data set was then projected onto this subspace. The same labels that identified the peaks will also be used to identify the clusters. It is recognized that it is very unlikely that all these peaks would arise from one FIA system; normally, only one of peaks A, F and P would be present. The aim was merely to increase the complexity of the problem to demonstrate the effectiveness of this approach.

3.5.1 Gram and Fourier Representations

Data preprocessing left the zeroth order coefficient constant and therefore, it was removed from the analysis; the number of coefficients quoted hereafter takes this into account. The results of PCA on the Gram representation over the first 5, 10, 15, and 25 coefficients at the 3% noise level are shown in Figure 3.26. Note that in the last three cases, there are more variables (coefficients) than objects (peaks), a situation often viewed as undesirable. However, it is justifiable in the context of principal components (least squares) modeling. For more information, the reader is directed to the tutorial paper of Wold et al. [59].

Three principal components modeled essentially all of the variability in the data. Six clusters are observed in these figures; each corresponds to one of six simulated peak shapes.
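The two-stage procedure described above (PCA on the noise-free peaks, then projection of the complete data set onto that subspace) can be sketched as follows. This is a minimal illustration under stated assumptions: the spectra are random placeholders rather than actual Gram, Fourier or Meixner coefficients, noise is added directly to the placeholder spectra rather than to time-domain peaks, and the function names `fit_pca_subspace` and `project` are hypothetical.

```python
import numpy as np

def fit_pca_subspace(clean_spectra, n_components):
    """PCA on the noise-free spectra only (rows = peaks, columns = coefficients)."""
    mean = clean_spectra.mean(axis=0)
    # Right singular vectors of the centred data are the PC loadings.
    _, _, vt = np.linalg.svd(clean_spectra - mean, full_matrices=False)
    return mean, vt[:n_components].T           # loadings: (n_coeffs, n_components)

def project(spectra, mean, loadings):
    """Scores of any spectra in the subspace fitted on the noise-free peaks."""
    return (spectra - mean) @ loadings

rng = np.random.default_rng(1)
clean = rng.normal(size=(6, 10))               # placeholder: 6 peak types, 10 coefficients
# Thirty noisy replicates per peak type (placeholder noise on the spectra).
noisy = np.repeat(clean, 30, axis=0) + 0.03 * rng.normal(size=(180, 10))

mean, loadings = fit_pca_subspace(clean, n_components=3)
scores = project(np.vstack([clean, noisy]), mean, loadings)
print(scores.shape)                            # 6 noise-free + 180 noisy peaks
```

Fitting the subspace on the noise-free spectra alone keeps the axes fixed, so any spread of the noisy scores around the six noise-free points reflects noise rather than a change in the projection.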
When noise is absent, these clusters reduce to the point indicated by the cross-hairs (due to limitations in resolution, these may not be apparent in the figures). As the noise level increases, the clusters expand outward. Excessive noise results in overlap of clusters, making positive peak type identification difficult. Since the increase in noise level does not affect the relative position of a peak within a category but only the dispersion of the cluster as a whole, figures corresponding to the other noise levels used are not given. By visual inspection, it was discovered that with a favorable choice for the number of coefficients, up to 10% noise level could be tolerated in this simulation.

Figure 3.26a Principal components analysis on simulated peaks using the first to fifth order Gram coefficients. Noise level of 3% added to peaks. Scores plot using the i) first and third principal components, ii) first and second principal components (variance accounted for by the principal components is shown in brackets next to the axis labels); iii) first principal component loadings; iv) second principal component loadings; v) third principal component loadings. Noise-free peaks are shown with a ‘+’ in (i) and (ii). [PC #1: 63.71%]

Figure 3.26b Principal components analysis on simulated peaks using the first to tenth order Gram coefficients. Noise level of 3% added to peaks. Scores plot using the i) first and third principal components, ii) first and second principal components (variance accounted for by the principal components is shown in brackets next to the axis labels); iii) first principal component loadings; iv) second principal component loadings; v) third principal component loadings. Noise-free peaks are shown with a ‘+’ in (i) and (ii). [PC #1: 45.54%]

Figure 3.26c Principal components analysis on simulated peaks using the first to fifteenth order Gram coefficients. Noise level of 3% added to peaks. Scores plot using the i) first and third principal components, ii) first and second principal components (variance accounted for by the principal components is shown in brackets next to the axis labels); iii) first principal component loadings; iv) second principal component loadings; v) third principal component loadings. Noise-free peaks are shown with a ‘+’ in (i) and (ii). [PC #1: 53.61%]

Figure 3.26d Principal components analysis on simulated peaks using the first to twenty-fifth order Gram coefficients. Noise level of 3% added to peaks. Scores plot using the i) first and third principal components, ii) first and second principal components (variance accounted for by the principal components is shown in brackets next to the axis labels); iii) first principal component loadings; iv) second principal component loadings; v) third principal component loadings. Noise-free peaks are shown with a ‘+’ in (i) and (ii). [PC #1: 66.09%]

The balance between integrity of representation and robustness to noise is quite apparent. An inadequate representation which uses just 5 coefficients results in peak A being “erroneously” similar (in consideration of human pattern recognition) to peaks Q and R, and exceptionally dissimilar to peak F. Peaks A, Q and R actually share a common model and this is responsible for their observed similarity here; this will be encountered again later with the Meixner representation. Loss of complete discrimination between peaks A and Q (peaks which are quite similar) is observed at the 5% noise level. At the other extreme, use of 25 coefficients yields peak dissimilarities which are consistent with human pattern recognition, but now noise, rather than inadequate peak representation, clouds separation between peaks A, F and Q even at the 3% noise level. However, adequate separation still exists between this aggregate and the other shapes. From a visual scan, a good and economical balance is observed with roughly ten coefficients. In terms of cluster separation, fewer than this number are actually needed. Indeed, an expansion in five orders is seen to be very discriminatory.
However, unless all possible peak shapes are accounted for, it is beneficial to have the clusters organized in accord with common sense such that if an unknown peak (with a signal-to-noise ratio within the working limits) is encountered, the algorithm has the opportunity to ascertain the characteristics of the unknown by interpolation or (if one must) by extrapolation and hence, perform corrective measures. This cannot be done if the positions of the clusters have no definite relationship with peak shape.

Figure 3.27a Principal components analysis on simulated peaks using the first to third order Fourier coefficients. Noise level of 3% added to peaks. i) scores plot using the first two principal components (variance accounted for by the principal components is shown in brackets next to the axis labels), ii) first principal component loadings, iii) second principal component loadings. Noise-free peaks are shown with a ‘+’ in (i). Dotted lines help in showing the membership of points. [PC #1: 61.77%]

Figure 3.27b Principal components analysis on simulated peaks using the first to fifth order Fourier coefficients. Noise level of 3% added to peaks. i) scores plot using the first two principal components (variance accounted for by the principal components is shown in brackets next to the axis labels), ii) first principal component loadings, iii) second principal component loadings. Noise-free peaks are shown with a ‘+’ in (i). [PC #1: 70.69%]

Figure 3.27c Principal components analysis on simulated peaks using the first to tenth order Fourier coefficients. Noise level of 3% added to peaks. i) scores plot using the first two principal components (variance accounted for by the principal components is shown in brackets next to the axis labels), ii) first principal component loadings, iii) second principal component loadings. Noise-free peaks are shown with a ‘+’ in (i). [PC #1: 75.25%]

Figure 3.27d Principal components analysis on simulated peaks using the first to fifteenth order Fourier coefficients. Noise level of 3% added to peaks. i) scores plot using the first two principal components (variance accounted for by the principal components is shown in brackets next to the axis labels), ii) first principal component loadings, iii) second principal component loadings. Noise-free peaks are shown with a ‘+’ in (i). [PC #1: 82.25%]

Similar results are observed with the Fourier representation, as depicted in Figure 3.27 for 3, 5, 10 and 15 coefficients and 3% noise. Two principal components describe almost all the variation in the data. The organization of the clusters is appropriate even when as few as three coefficients are used but the clusters are relatively large. Unlike the Gram case, however, a better result in terms of discrimination appears to exist beyond the first few expansion orders. As more coefficients are used, the clusters spread but the increased separation of clusters R and S from the others is most apparent. After ten coefficients are used, cluster R stabilizes but cluster S continues to move away from the rest. Also, at ten coefficients, an unexpected effect occurs.
The noisy peaks of clusters F and P just begin to separate from their noise-free counterparts. This effect is quite significant at fifteen coefficients. Over a range of noise levels, the cluster thus assumes a triangular shape. Inspection of the first principal component loadings shows that after the first three coefficients, additional coefficients are added in the same way. Hence, the leverage of the noise-dominated coefficients begins to tilt the first principal component in such an orientation that the separation is produced. This problem is seen first for peak P, then F; these two have the fewest significant coefficients. This bias effect will be met again later with the Meixner representation.

3.5.2 Meixner Representation

The two ways of selecting a suitable time scale, namely the critical time scale and RCSSE methods, were both studied. The results for the former are shown in Figure 3.28 for an expansion to the fifth order on data corrupted by 3% noise. Note that the zeroth order coefficient was kept here. PCA results were also computed for the 3rd, 7th, 10th, 12th, 15th and 20th orders. All these were found to be similar to the fifth order expansion. As evident from Figure 3.28, the patterns for peaks A, F and P defined most of the variability. Some changes in the relative positions of peaks A, Q and R do occur with an increase in expansion order, but these are relatively minor in comparison. Like the Gram representation for small expansion orders, the commonality of model for peaks A, Q and R results in their similarity in the Meixner domain. In fact, the distribution of peaks in the first principal components plane is very similar to that of Figure 3.24a; greater differences are seen in the second principal components plane. The critical time scale method is very robust.
From a visual comparison of Figure 3.28 with Figures 3.26a and 3.27b, the white noise tolerance of the Meixner representation is slightly better than that of the Gram representation and significantly better than that of the Fourier representation. This qualitative assessment is substantiated quantitatively later.

With the RCSSE method, the effects of noise on discrimination were more significant. This was expected from the results in sub-section 3.4.7. Figure 3.29 shows a fifth order expansion, again with 3% noise in the data. Separation between peak R and peaks A and Q is improved, but the latter two clusters now merge. Other problems are also obvious. Clusters F and S, in particular, are fragmented because one of two different minima was returned by the golden search routine; membership of fragments is indicated by dotted lines/curves. This points to the absolute necessity of having the search method find the minimum value of RCSSE reliably, a stipulation that invariably means greater computational effort. Other clusters were affected at other noise levels and expansion orders. Second, the bias effect reported for the Fourier representation is also present. This is eliminated only if the expansion is limited to the second order, but a concomitant loss of separation between peaks R and A ensues. Thus, the RCSSE method is not very attractive for the routine practical work envisaged on an automated flow injection analyzer and the critical time scale method is recommended instead.

3.5.3 Selection of Coefficients

The discussion thus far has assumed that the pattern vector consists of a contiguous set of coefficients which begin with the zeroth order. As noted above, this does not, in general, produce the best discrimination between categories for a given problem. Better performance may be achieved if the coefficients were used in arbitrary combinations. However, the search for the best combination is tedious.
Suppose that a suitable training set (consisting of the peak types to be discriminated) was available. With a naive approach, the selection of the 10 most discriminating coefficients out of a set of 30 requires the evaluation of 30,045,015 combinations with respect to some criterion. Even a sequential tree scheme [60] can be time consuming. In addition, the exclusion of certain spectral components may hinder identification of unknowns. Since adequate separation was found with PCA, a less ambitious approach was adopted here; namely, the principal component results are optimized with respect to expansion order with a category discrimination criterion. That is, a compromise representation, or more accurately, model, over all peak types is selected based on the requirement of maximum discrimination. This method and a criterion for optimization are described next.

Figure 3.28 Principal components analysis on simulated peaks using the zeroth to fifth order Meixner coefficients. Critical time scale method used. Noise level of 3% added to peaks. Scores plot using the i) first and third principal components, ii) first and second principal components (variance accounted for by the principal components is shown in brackets next to the axis labels); iii) first principal component loadings; iv) second principal component loadings; v) third principal component loadings. Noise-free peaks are shown with a ‘+’ in (i) and (ii). [PC #1: 61.43%]

Figure 3.29 Principal components analysis on simulated peaks using the zeroth to seventh order Meixner coefficients. RCSSE method used. Noise level of 3% added to peaks. Scores plot using the i) first and third principal components, ii) first and second principal components (variance accounted for by the principal components is shown in brackets next to the axis labels); iii) first principal component loadings; iv) second principal component loadings; v) third principal component loadings. Noise-free peaks are shown with a ‘+’ in (i) and (ii). [PC #1: 62.27%]

3.5.4 The Fisher Weight Criterion

The PCA results above suggest that, for a particular noise level, there is an optimum expansion order where the best discrimination in principal component space is observed. This arises as a result of two competing factors: the increase (if any) in inter-cluster distances due to an increase in expansion order, and the increase in intra-cluster distances due to addition of noise components. In order to quantify this, one requires some type of performance index. The Fisher weight (2.6) is one such index. To provide an overall measure, the mean Fisher weight (2.7) is used; the resulting performance index is referred to as the Fisher weight criterion (FWC). The aim is to maximize (2.7) with respect to the expansion order.

The equation for the Fisher weight is univariate and consequently, dimension reduction is required if more than one principal component is employed; that is, the scores must be projected onto a line in the principal component subspace used for classification. Intuitively, the desired line is that which goes through the centroids of the two categories under consideration. The objects in both categories are projected onto this line. The distance between the two centroids and the scores are then used for input into (2.6).
This is done for every combination of two categories in a principal component subspace and (2.7) can then be computed. Given that there are N objects and M descriptors $x_{ij}$ ($i = 1, \ldots, N$; $j = 1, \ldots, M$) spanning K categories, the method is now explicitly stated as follows:

1. Perform principal components analysis and select the number of significant components P (e.g. summed variance meets or exceeds 90% of the total).

2. Compute the category centroid:

   $\bar{\mathbf{x}}^{T} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{x}_i^{T}, \qquad \mathbf{x}_i^{T} = [\,x_{i1}\;\; x_{i2}\;\; \cdots\;\; x_{iP}\,]$

   where the sum is taken over the members of the category.

3. Compute the vector direction between the centroids of categories a and b:

   $\mathbf{d} = \bar{\mathbf{x}}(a) - \bar{\mathbf{x}}(b)$

   Note that the sign of the vector is not important.

4. Compute the vector length, which equals the distance between the centroids of categories a and b:

   $\|\mathbf{d}\| = \|\bar{\mathbf{x}}(a) - \bar{\mathbf{x}}(b)\|$

   This is the numerator for the Fisher weight.

5. Compute the mean-centered scores $\mathbf{y}_i$ for categories a and b:

   $\mathbf{y}_i^{T}(a) = \mathbf{x}_i^{T}(a) - \bar{\mathbf{x}}^{T}(a), \quad i = 1, \ldots, N_a$
   $\mathbf{y}_i^{T}(b) = \mathbf{x}_i^{T}(b) - \bar{\mathbf{x}}^{T}(b), \quad i = 1, \ldots, N_b$

   where $N_a$ and $N_b$ are the number of members in categories a and b, respectively.

6. Project the respective mean-centered scores of categories a and b onto vector d to obtain scores z on the line going through the centroids of both categories:

   $z_i(a) = \frac{\mathbf{y}_i^{T}(a)\,\mathbf{d}}{\|\mathbf{d}\|}, \quad i = 1, \ldots, N_a$
   $z_i(b) = \frac{\mathbf{y}_i^{T}(b)\,\mathbf{d}}{\|\mathbf{d}\|}, \quad i = 1, \ldots, N_b$

7. Compute the variance of z(a) and z(b).

8. Compute the Fisher weight according to (2.6).

9. Repeat steps 3 to 8 for all pairs of categories. There will be K(K-1)/2 pairs in total.

10. Compute the mean Fisher weight (2.7) over all pairs.

The advantage of this measure is its simplicity. However, the method is sensitive to the shape of the cluster. The distribution of scores on the line joining the centroids should, strictly speaking, assume some degree of normality. In cases where this condition is poorly met, a non-parametric measure of discrimination [61] may be substituted for the Fisher weight. The FWC is a scalar quantity which summarizes the magnitude of cluster separation and one should bear this in mind when interpreting results.
Finally, the criterion can be applied to the data without first performing PCA but this is not recommended here because of likely descriptor correlations.

FWC curves for the simulation study described above are given in Figure 3.30 for all three generalized Fourier representations. Since the projection plane has been rigidly fixed, the shape of the FWC function is not significantly affected over the noise range studied. However, the decrease in overall performance, i.e. discrimination, with noise level is evident in each case.

The FWC curves (based on the first three principal components) for the Gram representation show a general decrease beginning at the minimum order studied (Figure 3.30a). Cluster tightness obtained by using very few terms is responsible for high performance values. This is not surprising since the FWC, as is, treats all clusters equally and the result is in accord with statements made above.

For the Fourier case, the FWC curve derived from the first two principal components (Figure 3.30b) is also consistent with the observations made before. The optimum is located roughly at the 12th order expansion. Performance drops significantly beyond an expansion in 16 orders.

The FWC curves (based on the first three principal components) for the Meixner case were computed for both the critical time scale method (Figure 3.30c) and the RCSSE method (Figure 3.30d). For the former, the behavior resembles that of the Gram representation. However, high performance is maintained up to about the 14th order. Thereafter, a noticeable drop in performance is evident. In contrast, the FWC curves for the RCSSE method show that it is considerably less robust. The FWC value for a 20th order expansion is roughly an order of magnitude less than the corresponding one for the critical time scale method.

Figure 3.30 Plot of Fisher weight criterion (FWC) against expansion order for a) Gram representation (three principal components), b) Fourier representation (two principal components), c) Meixner representation (critical time scale method, three principal components), and d) Meixner representation (RCSSE method, three principal components), over the noise levels shown on graph (a). The zeroth order is excluded for the Gram and Fourier representations.

The FWC appears to behave as expected. Because it is mathematical, it provides an objective means for selecting the “best” expansion order. Here, at a given noise level, the Meixner representation is slightly better than the Gram representation and both are better than the Fourier representation by an order of magnitude in terms of the FWC. The Gram representation owes its better performance to the fact that its coefficients can also take on negative values. Better performance may be obtained from the Fourier representation by using only the real or imaginary part such that the sign is retained.

Based on these results (which cover some extremes in shape of flow injection peaks), and taking into account the greater variability of real peaks, a recommendation of 5 to 13 coefficients for all three representations should be suitable for most flow injection applications. The exact number must be optimized for a particular system. The FWC is a useful aid for such a task. This is demonstrated in the next sub-section.
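The ten-step FWC procedure of this sub-section can be sketched in a few lines of Python. This is a hedged illustration rather than the implementation used in the study. Equations (2.6) and (2.7) are not reproduced in this section, so the sketch assumes that the Fisher weight is the centroid distance divided by the sum of the two projected variances, and that (2.7) is the plain arithmetic mean over all category pairs. The function name `fisher_weight_criterion` and the test clusters are hypothetical.

```python
import numpy as np
from itertools import combinations

def fisher_weight_criterion(scores, labels):
    """Mean pairwise Fisher weight of PC scores (rows = objects).

    Assumed form of (2.6): distance / (var_a + var_b). Substitute the
    actual definition if it differs.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    weights = []
    for a, b in combinations(np.unique(labels), 2):      # step 9: K(K-1)/2 pairs
        xa, xb = scores[labels == a], scores[labels == b]
        ca, cb = xa.mean(axis=0), xb.mean(axis=0)        # step 2: centroids
        d = ca - cb                                      # step 3: direction
        dist = np.linalg.norm(d)                         # step 4: distance (must be non-zero)
        za = (xa - ca) @ d / dist                        # steps 5-6: centre and project
        zb = (xb - cb) @ d / dist
        var_sum = za.var(ddof=1) + zb.var(ddof=1)        # step 7: variances
        weights.append(dist / var_sum)                   # step 8: assumed Fisher weight
    return float(np.mean(weights))                       # step 10: mean over pairs

# Hypothetical scores: three categories, 30 objects each, in three PCs.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [0.0, 5.0, 0.0]])
labels = np.repeat(["A", "F", "P"], 30)

def make_scores(spread):
    return np.repeat(centres, 30, axis=0) + spread * rng.standard_normal((90, 3))

fwc_tight = fisher_weight_criterion(make_scores(0.1), labels)
fwc_loose = fisher_weight_criterion(make_scores(1.0), labels)
print(fwc_tight > fwc_loose)   # tighter clusters should score higher
```

To select an expansion order, one would evaluate this function on the scores obtained at each candidate order and keep the order with the largest value. Coincident centroids give a zero distance and are not handled in this sketch.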
3.5.5 Evaluation on Real Data

While simulated data provide an ideal test bed for a systematic study of orthogonal polynomial identification, this capability must also be demonstrated on real data. The peaks obtained from the reaction of Fe(II) with 1,10-phenanthroline as a function of reagent concentration and pH are shown in Figure 3.31. They have been normalized to peak area to emphasize the shape of the peak. A high signal-to-noise ratio is observed, so a greater number of coefficients should be usable. In general, peak magnitude increases with both reagent concentration and pH over the ranges considered [48]. This simple example links non-optimal reaction conditions with peak bifurcation. Clearly, blind application of conventional quantitation routines based solely on peak height or peak area could lead to invalid analyses.

The peaks were manually divided into three categories: non-bifurcated, marginal and bifurcated. The assignments are given in Table 3.10. This set may be considered a representative set of peaks, any of which may be encountered by the flow injection system. That is, it is being used as a training set. Again, the zeroth order was dropped for the Gram and Fourier representations. The peaks were normalized to peak area. FWC curves were generated for all three representations using three principal components and plotted in Figure 3.32.

Figure 3.31 Matrix of peak shapes from the Fe(II)-1,10-phenanthroline reaction resulting from different combinations of sodium acetate (the pH modifier) and 1,10-phenanthroline (reagent) concentrations. [Horizontal axis: sodium acetate flow rate (mL/min).] Numbers in panels label the peaks.
Table 3.10 Manual Class Assignments for Peaks Obtained from the Reaction Between Fe(II) and 1,10-Phenanthroline

Class             Peak numbers (a)
non-bifurcated    2, 3, 4, 5, 8, 9, 10, 14, 15, 19, 20, 25
marginal          1, 7, 13, 18, 24
bifurcated        6, 11, 12, 16, 17, 21, 22, 23

(a) Refer to Figure 3.31 for peak assignments.

The FWC curve for the Gram representation exhibits two humps. The smaller one, with a maximum at eight coefficients, was expected from the simulation studies. The other hump, which peaks at thirty coefficients, arises because, beyond twenty coefficients, the cluster of bifurcated peaks was observed to move systematically away from the other two clusters. However, the positions of the clusters for the non-bifurcated and marginal peaks changed very little, and separation between these two is no better than if eight coefficients were used. The result of PCA on the Gram representation using eight coefficients (i.e. from first to eighth order) is shown in Figure 3.33. Only the first principal component plane is depicted. Sufficient separation between the bifurcated and non-bifurcated peaks is observed and only the first principal component is actually needed. Hence, the computational cost involved in using more than 20 coefficients is hard to justify. The marginal peaks hover around the perimeter of the non-bifurcated cluster and are less well separated. Given that the results in Figure 3.33 are satisfactory, separation may then be achieved automatically with, for example, the following simple classification rule:

IF PC#1 < 0 THEN
    Peak is acceptable
OTHERWISE
    Peak is not acceptable - may be bifurcated.

Figure 3.32 FWC curves for a) Gram, b) Fourier, and c) Meixner representations of peaks from the Fe(II)-1,10-phenanthroline reaction. Three principal components were used. Note that the zeroth order Gram and Fourier coefficients were excluded from the analysis. [Each panel plots FWC against expansion order.]

Of course, discriminant analysis [62] would provide a more rigorous rule. Similar rules may also be generated for the other representations. Note that the bias due to having different numbers of data points is not observed here nor with the other two representations.

The FWC curve for the Fourier representation is comparable to that for the Gram representation for orders below fifteen. The portion below the fifteenth order is relatively flat, like the results with the simulated data in the previous sub-section. Again, an eighth order expansion is optimal. The results are shown in Figure 3.34. Complete separation between clusters is observed and the cluster (or more accurately, line cluster) of marginal peaks lies appropriately between the other two. Indeed, a simple linear regression along this cluster may be used as the discriminating function.

With the Meixner case, the critical time scale method was employed. The critical values were computed with (3.64) together with equation coefficients from Table 3.7. The FWC curve obtained is very sharp. Performance decays rapidly after the eighth (optimal) order. From the third to fifteenth order, it is the best of the three representations. The PCA results are shown in Figure 3.35.

In all three cases, adequate to good separation between non-bifurcated and bifurcated peaks is observed. For this particular problem, the Meixner representation gives the best separation for expansions in fewer than 15 orders and the Fourier the worst, as reflected in the FWC curves. In addition, both the Gram and Meixner representations were able to indicate the extremeness of bifurcation in peaks 16 and 21. The choice of representation will also be dictated by the particular information sought.
The availability of three representations provides greater versatility in meeting different performance criteria.

Figure 3.33 Results of PCA on Gram representation of peaks from the Fe(II)-1,10-phenanthroline reaction. Expansion was performed over 8 coefficients with zeroth order coefficient excluded. i) Scores plot using first and second principal component (PC #1 accounts for 43.43% of the variance). Position of each peak is at the center of the numbers drawn, except when a pointer is used. Number only: non-bifurcated; circled number: marginal; and boxed numbers: bifurcated. Percentages in brackets on score plot axis labels refer to variance accounted for by that principal component. ii) First principal component loadings. iii) Second principal component loadings. [Loadings are plotted against coefficient (order).]

Figure 3.34 Results of PCA on Fourier representation of peaks from the Fe(II)-1,10-phenanthroline reaction. Expansion was performed over 8 coefficients with zeroth order coefficient excluded. i) Scores plot using first and second principal component (PC #1 accounts for 63.50% of the variance). Position of each peak is at the center of the numbers drawn, except when a pointer is used. Number only: non-bifurcated; circled number: marginal; and boxed numbers: bifurcated. Percentages in brackets on score plot axis labels refer to variance accounted for by that principal component. ii) First principal component loadings. iii) Second principal component loadings. [Loadings are plotted against coefficient (order).]

Figure 3.35 Results of PCA on Meixner representation of peaks from the Fe(II)-1,10-phenanthroline reaction. Expansion was performed over 8 coefficients with zeroth order coefficient included. i) Scores plot using first and second principal component (PC #1 accounts for 54.14% of the variance). Position of each peak is at the center of the numbers drawn, except when a pointer is used. Number only: non-bifurcated; circled number: marginal; and boxed numbers: bifurcated. Percentages in brackets on score plot axis labels refer to variance accounted for by that principal component. ii) First principal component loadings. iii) Second principal component loadings. [Loadings are plotted against coefficient (order).]

3.6 SUMMARY

Spectral analysis via orthogonal transforms has long been a popular means of signal identification. Three families of discrete orthogonal functions (discrete complex exponential Fourier functions, Gram polynomials and Meixner polynomials) were examined for suitability in identification of flow injection signals. The theory was outlined and a comprehensive study on the numerical properties was undertaken.

When transformed, the flow injection data consisting of 100 to 500 points could be adequately represented by a much smaller set of coefficients, typically ranging from 2 to 30, depending on i) the orthogonal function, ii) peaks to be separated and iii) digital accuracy. Hence, the volume of data (to be subsequently processed or stored) is substantially reduced; this can be accomplished with minimal loss of relevant information. Each transform was found to offer a different view of the data.

The recurrence method for generating the Gram polynomials was found to be susceptible to round-off errors. An equation, valid over 65 to 513 points, for the upper limit of expansion has been developed. A useful rule of thumb over this range is 15 orders for every 100 data points.

The Meixner polynomials are defined over the interval [0, ∞) and in practice, these functions must be truncated.
In general, this leads to a loss of orthogonality over all functions in the set. A parameter is available to match the time scale of these functions with the given data: this enhances their performance for approximation. For robust identification, it was more important to maintain orthogonality, in the numerical sense, over the subset of functions used. Two methods for determining the highest time scale value (i.e. the critical time scale) at which orthogonality was still observed with this subset were proposed. An empirical model was developed to allow direct computation of the critical time scale; the model is valid over expansion orders from 1 to 40 and data points from 65 to 689.

A multivariate study with principal components analysis on simulated data indicated that tolerance of noise levels up to 10% (with respect to peak range) or more may be possible, depending on what is to be identified. A criterion based on the Fisher weight was proposed for determining the optimum span of coefficients to use for a particular application. In general, an expansion in 5 to 13 terms, for the orthogonal families considered, was recommended as suitable for most FIA systems. The approach was demonstrated successfully on peaks arising from the analytical method for determination of Fe(II) by reaction with 1,10-phenanthroline.

REFERENCES

[1] N. Ahmed and K. R. Rao, Orthogonal Transforms for Digital Signal Processing, Springer-Verlag: New York, 1975.
[2] A. Ralston, A First Course in Numerical Analysis, 2nd edn., McGraw-Hill: New York, 1965, Ch. 6.
[3] R. E. King and P. N. Paraskevopoulos, Int. J. Circuit Theory Appl., 5 (1977), 81.
[4] P. R. Clement, J. Franklin Inst., 313 (1982), 85.
[5] W. H. Press, B. P. Flannery, S. A. Teukolsky and W. T. Vetterling, Numerical Recipes: The Art of Scientific Computing, Cambridge University Press: Cambridge, 1987, Ch. 12.
[6] M. Abramowitz, I. A. Stegun, Handbook of Mathematical Functions, National Bureau of Standards: Washington, 1964.
[7] G. M. Hieftje, Anal. Chem., 44 (1972), 81A.
[8] G. Horlick, Anal. Chem., 44 (1972), 943.
[9] P. R. Griffiths (ed.), Transform Techniques in Chemistry, Heyden: London, 1978.
[10] A. G. Marshall (ed.), Fourier, Hadamard and Hilbert Transforms in Chemistry, Plenum: New York, 1982.
[11] I. C. Van Nugteren-Osinga, M. Bos and W. E. Van der Linden, Anal. Chim. Acta, 214 (1988), 77.
[12] R. Deutsch, System Analysis Techniques, Prentice-Hall: Englewood Cliffs, 1969, p. 103.
[13] Y. W. Lee, Statistical Theory of Communication, Wiley: New York, 1960.
[14] P. A. Gorry, Anal. Chem., 62 (1990), 570.
[15] J. E. Kuo, H. Wang and S. Pickup, Anal. Chem., 63 (1991), 630.
[16] M. M. Amer, S. M. Hassan, A. Aboul Kheir and A. A. Mostafa, Pharmazie, 33 (1978), 344.
[17] A. M. Wahbi, S. Belal, H. Abdine and M. Bedair, Talanta, 29 (1982), 931.
[18] E. A. El-Yazbi, M. H. Abdel-Hay and M. A. Korany, Pharmazie, 41 (1986), 630.
[19] M. E. Abdel-Hamid and M. A. Abuirjeie, Analyst, 112 (1987), 895.
[20] S. M. Hassan and N. T. Loux, Spectrochim. Acta B, 45 (1990), 719.
[21] H. J. G. Debets, A. W. Wijnsma, D. A. Doornbos and H. C. Smit, Anal. Chim. Acta, 171 (1985), 33.
[22] A. Erdélyi, Higher Transcendental Functions, Vol. II, McGraw-Hill: New York, 1953, Ch. 10.
[23] A. L. Glenn, J. Pharm. Pharmac., 15 suppl. (1963), 123T.
[24] P. J. H. Scheeren, Z. Klous, H. C. Smit and D. A. Doornbos, Anal. Chim. Acta, 171 (1985), 45.
[25] T. Y. Young and W. H. Huggins, IRE Trans. Circuit Theory, CT-9 (1962), 362.
[26] R. L. Burden, J. D. Faires and A. C. Reynolds, Numerical Analysis, 2nd edn., Prindle, Weber and Schmidt: Boston, 1981, p. 336.
[27] G. A. Dumont, C. C. Zervos and G. Pageau, Automatica, 26 (1990), 781.
[28] C. C. Zervos and G. A. Dumont, Int. J. Control, 48 (1988), 2333.
[29] U. Nurges, Avtomatika i Telemekhanika, 3 (1987), 88.
[30] G. A. Dumont, Chemom. Intell. Lab. Systems, 8 (1990), 275.
[31] J. Ruzicka, Anal. Chim. Acta, 261 (1991), 3.
[32] R. Deutsch, System Analysis Techniques, Prentice-Hall: Englewood Cliffs, 1969, p. 294.
[33] G. Szego, Orthogonal Polynomials, Amer. Math. Soc. Colloquium Publications 23, rev. edn., 1959.
[34] E. O. Brigham, The Fast Fourier Transform, Prentice-Hall: Englewood Cliffs, 1974.
[35] H. Anton, Elementary Linear Algebra, 4th edn., Wiley: New York, 1984.
[36] J. A. Cadzow and H. F. Van Landingham, Signals, Systems and Transforms, Prentice-Hall: Englewood Cliffs, 1985.
[37] R. L. Burden, J. D. Faires and A. C. Reynolds, Numerical Analysis, 2nd edn., Prindle, Weber and Schmidt: Boston, 1981, p. 331.
[38] Y. W. Lee, Statistical Theory of Communication, Wiley: New York, 1960, p. 460.
[39] J. A. Cadzow and H. F. Van Landingham, Signals, Systems and Transforms, Prentice-Hall: Englewood Cliffs, 1985, p. 269.
[40] ibid, p. 273.
[41] A. Erdélyi, Higher Transcendental Functions, Vol. II, McGraw-Hill: New York, 1953, p. 156.
[42] E. O. Brigham, The Fast Fourier Transform, Prentice-Hall: Englewood Cliffs, 1974, p. 97.
[43] M. J. Gottlieb, Am. J. Math., 60 (1938), 453.
[44] J. Meixner, J. London Math. Soc., 9 (1934), 6.
[45] E. Polak, Computational Methods in Optimization, Academic Press: New York, 1971.
[46] R. D. B. Fraser and E. Suzuki, Anal. Chem., 41 (1969), 37.
[47] A. I. Vogel, A Textbook of Quantitative Inorganic Analysis, 3rd edn., Longman: London, 1962, pp. 786-787.
[48] A. P. Wade, P. M. Shiundu, P. D. Wentzell, Anal. Chim. Acta, 237 (1990), 361.
[49] W. H. Press, B. P. Flannery, S. A. Teukolsky, W. T. Vetterling, Numerical Recipes: The Art of Scientific Computing, Cambridge University Press: Cambridge, 1986, Ch. 10.
[50] ibid, Ch. 2.
[51] G. E. Forsythe, M. A. Malcolm and C. B. Moler, Computer Methods for Mathematical Computations, Prentice-Hall: Englewood Cliffs, 1977, Ch. 10.
[52] C. W. Clenshaw and A. R. Curtis, Numer. Math., 2 (1960), 197.
[53] E. O. Brigham, The Fast Fourier Transform, Prentice-Hall: Englewood Cliffs, 1974, p. 151.
[54] W. H. Press, B. P. Flannery, S. A. Teukolsky, W. T. Vetterling, Numerical Recipes: The Art of Scientific Computing, Cambridge University Press: Cambridge, 1986, Ch. 4.
[55] J. Ruzicka and E. H. Hansen, Flow Injection Analysis, 2nd edn., John Wiley and Sons: New York, 1988, p. 36.
[56] D. Betteridge and J. Ruzicka, Talanta, 23 (1976), 409.
[57] R. J. Larivee and S. D. Brown, Anal. Chem., 64 (1992), 2057.
[58] J. Stoer and R. Bulirsch, Introduction to Numerical Analysis, Springer-Verlag: New York, 1980, p. 201.
[59] S. Wold, K. Esbensen and P. Geladi, Chemom. Intell. Lab. Syst., 2 (1987), 37.
[60] D. L. Massart, B. G. M. Vandeginste, S. N. Deming, Y. Michotte and L. Kaufman, Chemometrics: a Textbook, Elsevier: Amsterdam, 1988, p. 408.
[61] K. A. Soulsbury, A. P. Wade and D. B. Sibbald, Chemom. Intell. Lab. Syst., 15 (1992), 87.
[62] D. L. Massart, B. G. M. Vandeginste, S. N. Deming, Y. Michotte and L. Kaufman, Chemometrics: a Textbook, Elsevier: Amsterdam, 1988, Ch. 23.

Chapter 4

Digital Filtering of Flow Injection Data

You should call it “entropy” and for two reasons: first, the function is already in use in thermodynamics under that name; second, and more importantly, most people don’t know what entropy really is, and if you use the word “entropy” in an argument you will win every time!
JOHN VON NEUMANN
Advice to Claude E. Shannon, who was considering calling a function he had derived for the theory of communication, information (as recounted by Myron Tribus [1]).

Filtering is an operation that is often required when electrical measurements, relating to some chemical property, are taken from a chemical system of interest. The goal is to enhance the quality of the data by increasing the signal-to-noise ratio (S/N) such that the desired information can be extracted more readily.
With the incorporation of digital hardware in analytical instrumentation and the availability of affordable, powerful microcomputers, this important aspect of the measurement process has fallen increasingly to digital filters. In analytical chemistry, digital filters have grown in popularity since the pioneering work of Ernst [2], and Savitzky and Golay [3]. Their use has permeated many instrumental methods such as flow injection analysis [4], chromatography [5], optical spectrometry and spectroscopy [6], and nuclear magnetic resonance spectroscopy [7].

In this chapter, the performance of various digital filters on flow injection data will be evaluated in terms of filtering effectiveness. These include time-domain filters of the Savitzky-Golay type and Fourier-, Gram- and Meixner-domain (indirect) filters. The common thread is that they are all based on a (general) polynomial model. Strategies based on a predictive error criterion are described for automatic near-optimal spectral truncation for indirect filters. This represents a step towards “black-boxing” digital filters. Optimal filtering “on the fly” leads naturally to improved performance and greater analytical reliability of automated flow injection systems, e.g. by improving estimates of time-domain peak (quantitation) parameters which must be determined numerically.

There are several reasons for inclusion of such a study in this dissertation. First, the relative noise level in the data generally determines the precision with which a flow injection peak can be quantified and the uniqueness with which it can be identified, in the case of peak shape analysis. In quantifying signal characteristics, signal integrity cannot always be sacrificed for maximum S/N.
While some descriptors such as peak height or peak position benefit considerably from maximum S/N enhancement (via matched filtering [8]), others such as peak width are rendered unusable due to severe signal distortion. Moreover, peak distortion cannot be tolerated when one or more points on the peak other than the maximum are employed. It has been recognized that no single filtering technique can be optimal for every parameter which can be extracted from the data [9]. Consequently, filtering will be treated in the context of “weak” filtering [10] in which noise is reduced to the extent that signal distortion is insignificant.

Second, gradient-based calibration, exclusive to FIA, is currently without a convenient, accurate parametric regression model. Hence a non-parametric model (i.e. one in which a collection of data defines the model) of time versus concentration serves as a template. As such, noise can be a concern. While the typical approach has been to ensemble average [11] over N data records to reduce noise, the reduction factor is only proportional to √N. This is clearly inefficient. Indeed, Savitzky and Golay first proposed the use of polynomial filters to supplement ensemble averaging [3]. Furthermore, as a consequence of instrumental non-ideality (e.g. pump noise, error in synchronization of injection and data acquisition), such averaging does not completely eliminate noise. Filtering thus becomes an attractive alternative. Again, this problem can be treated in the context of weak filtering.

Third, the use of orthogonal transforms for filtering was alluded to in the last chapter. Here, the filtering process involves selective attenuation/removal of basis function components from an exact spectral representation of the given data. The Fourier transform has been applied extensively for digital filtering.
The same cannot be said for the Gram polynomials (apart from their use in deriving the filters popularized by Savitzky and Golay, which are used frequently in analytical chemistry). Meixner polynomials are new to analytical chemistry. Exploitation of these increases the repertoire of filtering tools, thereby providing greater flexibility in dealing with a variety of filtering applications.

Finally, this study is of significance in light of the fact that much of the work in the literature which dealt with digital filters has employed Gaussian- or Lorentzian-shaped peaks only, and little work [12, 13] has been reported on asymmetric peaks, which are common in flow injection analysis. Many researchers have also measured filter performance mainly on the basis of noise reduction; signal distortion was a secondary concern. Both these issues are addressed in this chapter.

4.1 DIGITAL FILTERS

In simple terms, a filter is a device which manipulates a set of inputs, under well-defined rules, to produce a set of outputs which, to justify the cost, are deemed to be more useful [14]. It transforms a signal, for example, from one type to another. As shown in Figure 4.1, there are two main objectives to filtering. First, it can be used to enhance the quality of the inputs. This is often a necessary practice when (electrical) measurements are made or when the inputs are used to control devices [15]. Second, it can be used to extract information, in the form of parameters, from the inputs, a process referred to as estimation [16]. Filters are classified as either digital or analog depending on the nature of the input and output signals. This difference dictates different, though often analogous, design and analysis techniques [17].

Figure 4.1 Diagram of a filter and its two primary applications. [Block diagram: the input passes through the filter to give either an enhanced signal or signal parameters.]
Digital filters have evolved from mere simulations of analog filters to become an attractive, economical alternative for many filtering applications [18]. The advent of very large-scale integration (VLSI) digital circuits has contributed significantly to their widespread use in scientific and engineering research. Though digital filters may place a significant demand on computational power, they offer several advantages over their analog counterparts. Because digital filters can be realized through software (as they usually are), a single general-purpose microprocessor can be used for several different filter applications. This is a prime consideration for an automated flow injection analyzer which is designed to support more than one analytical method, each of which may demand different noise reduction strategies and/or require different types of information to be extracted. Indeed, since the raw signal can be captured, a different filter can be used at a later time if the current filter was inadequate or if additional information was desired. The software approach also accelerates filter implementation and testing. Digital filters do not necessarily require the use of expensive, high precision components. They do not suffer from hardware problems such as component drift and matching, since they are based solely on arithmetic operations [19]. This makes them reliable, predictable and maintenance-free. Moreover, no additional noise, other than computational round-off, is introduced into the signal; in contrast, every component in an analog filter is a potential noise source [20]. Finally, digital filters often exhibit superior performance [19].

Many types of digital filters exist.
They differ in performance (amount of noise reduction, speed and memory requirements), method of implementation (nonrecursive, recursive), generality (general models, specific models) and operating domain (time domain, frequency domain). Of these, digital filters which incorporate system models (e.g. Kalman filter [21]) are perhaps the most effective at reducing (or even eliminating) noise from signals. Unfortunately, not all flow injection systems can be adequately modeled as it is often difficult to account for all experimental, particularly chemical, factors. A change in manifold configuration with analytical method generally requires re-identification of the system. Furthermore, model specificity negates flexibility. For these reasons, methods based on general polynomial models and on conventional (i.e. frequency selective) digital filters [18] are an attractive alternative.

Digital filters may also be divided into real-time or batch filters [22]. With the former, the data are processed while they are being collected. Hence, the output is produced at the rate of data acquisition. In terms of estimation, this represents a flexible means for decision-making since the necessary information is obtained “instantly” so that, for example, appropriate corrections to the process under control can be effected immediately if necessary. With the latter type, the processing occurs after all the data points have been collected. Here, more complex but time-consuming computations can take place. For flow injection analysis, the necessity of real-time filtering is debatable because i) the time scale of a typical flow injection experiment (20 to 60 s) is far greater than the computational time required by the filters considered here (less than 2 s on a 33 MHz Intel 80486DX-based microcomputer), ii) the filter (and analysis) operation could be interlaced with the preparation cycle (e.g. wash/sample priming etc.)
of the flow injection experiment, or passed to a dedicated digital signal processing sub-system if that is available, and iii) with regard to peak shape analysis, a sizable portion of the peak must be recorded and analyzed before fault classification is possible (even when real-time filtering is used): real-time corrective actions on physicochemical problems are not feasible at present. As such, fast batch filtering, or pseudo real-time filtering, is considered sufficient.

When used for reduction of high frequency noise (i.e. noise whose frequency components are above those of the signal), digital filters are often referred to as low-pass filters or smoothers. At present, the predominant forms are based upon direct convolution in the time domain or, equivalently, windowing in the frequency domain [9]. The applicability of all these rests on the premise that the noise varies faster than the signal itself. That is, given that the signal has been properly sampled according to the Nyquist sampling theorem [23], the signal should be confined to a few low frequency components while the noise should be composed mainly of higher frequency components. The greater the difference between the frequencies spanned by signal and noise, the higher the prospect of recovering the true signal. The ideal low-pass filter assumes a rectangular window function in the frequency domain; that is, it has a value of unity for frequencies less than some specified cut-off frequency and a value of zero for all higher frequencies.

In the following sub-sections, the principles of digital filtering will be briefly reviewed and the filtering methods employed for this work described. A more comprehensive treatment may be found in other texts dealing with the subject [18, 24-27]. For this chapter only, time-domain signals and parameters will be denoted with lowercase letters while the transform-domain counterparts will be denoted with uppercase letters.
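The ideal low-pass filter described above can be illustrated with a short numerical sketch. This is not the implementation used in this work: the Gaussian test peak, the noise level, the cut-off index `n_keep` and the function name `ideal_lowpass` are all assumptions chosen for illustration, and NumPy's real FFT stands in for the discrete Fourier transform.

```python
import numpy as np

def ideal_lowpass(x, n_keep):
    """Rectangular window: keep Fourier components 0..n_keep, zero the rest."""
    X = np.fft.rfft(x)            # uppercase: transform-domain counterpart
    X[n_keep + 1:] = 0.0          # unity below the cut-off, zero above it
    return np.fft.irfft(X, n=len(x))

k = np.arange(256)                                   # time index
signal = np.exp(-0.5 * ((k - 100) / 20.0) ** 2)      # hypothetical smooth peak
rng = np.random.default_rng(1)
noisy = signal + rng.normal(0.0, 0.05, k.size)       # added white noise

smooth = ideal_lowpass(noisy, n_keep=12)

rms_before = np.sqrt(np.mean((noisy - signal) ** 2))
rms_after = np.sqrt(np.mean((smooth - signal) ** 2))
print(rms_before, rms_after)
```

For this synthetic peak nearly all of the signal lies in the lowest few components, so removing the higher ones reduces the noise with little distortion. A real flow injection peak, being asymmetric, may need a different cut-off.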
4.1.1 Linear Filters

A filter takes an input $x(t)$ and produces an output $y(t)$. For a digital filter, the input and output are cast in the form of a time series and assume quantized values $x_k$ and $y_k$ respectively,

\[ x_k \rightarrow \boxed{\text{Filter}} \rightarrow y_k \]

where $k$ is the time index. A digital filter is linear if

\[ a x_k + b u_k \rightarrow \boxed{\text{Filter}} \rightarrow a y_k + b v_k \]

where $x_k$ gives output $y_k$, $u_k$ gives output $v_k$, and $a$ and $b$ are arbitrary constants. A linear filter is said to exhibit the properties of superposition (addition) and homogeneity (scalar multiplication) [28]. This definition is consistent with the definition of a linear transform in mathematics. Furthermore, a digital filter is stationary or shift invariant if

\[ x_k \rightarrow \boxed{\text{Filter}} \rightarrow y_k \quad \text{implies} \quad x_{k-i} \rightarrow \boxed{\text{Filter}} \rightarrow y_{k-i} \]

for all shifts $i$. A filter that satisfies both properties is called a linear, shift invariant filter. These types of filters represent the most popular and well studied class of digital filters. The filters used here fall into this category.

4.1.2 Impulse Response

A filter’s impulse response $h_k$ is its output when the input is an impulse. For a digital filter, the impulse takes the form of a unit impulse which is defined as

\[ x_k = \begin{cases} 1 & \text{if } k = 0 \\ 0 & \text{otherwise} \end{cases} \tag{4.1} \]

The impulse response is of particular significance to linear, shift invariant filters because it links the input to the output. Specifically, the input is related to the output through a convolution with the impulse response [29]:

\[ y_k = \sum_{i=-\infty}^{\infty} h_i x_{k-i} \tag{4.2} \]

where $i$ is the shift or lag variable, defined here over the most general range. When the impulse response goes to zero after a finite duration, the filter is referred to as a finite-duration impulse response (FIR) filter; if the duration is infinite, the filter is referred to as an infinite-duration impulse response (IIR) filter. In general, the former are non-recursive while the latter are recursive; some FIR filters have been implemented recursively [30].
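The definitions above can be checked numerically. The sketch below assumes a hypothetical three-point FIR impulse response `h` (not one of the filters studied here) and uses NumPy's `convolve` for the convolution sum (4.2). It verifies that a unit impulse returns the impulse response, and that the filter is linear and shift invariant.

```python
import numpy as np

h = np.array([0.25, 0.5, 0.25])        # hypothetical FIR impulse response

def fir(x):
    """y_k = sum_i h_i x_{k-i}, evaluated over the full overlap."""
    return np.convolve(x, h)

# Unit impulse (4.1): the output is the impulse response itself
impulse = np.zeros(8)
impulse[0] = 1.0
impulse_out = fir(impulse)

rng = np.random.default_rng(2)
x, u = rng.normal(size=8), rng.normal(size=8)
a, b = 2.0, -3.0

# Linearity: superposition and homogeneity
linear_ok = np.allclose(fir(a * x + b * u), a * fir(x) + b * fir(u))

# Shift invariance: delaying the input by two samples delays the output by two
shifted = np.concatenate(([0.0, 0.0], x))
shift_ok = np.allclose(fir(shifted)[2:], fir(x))
print(impulse_out[:3], linear_ok, shift_ok)
```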
While the development has been conducted in the time domain, an equivalent frequency domain representation exists. This is made possible by the convolution theorem [31], which states that convolution in the time domain is equivalent to multiplication in the frequency domain. Application of the (discrete) Fourier transform on (4.2) yields the equivalent relationship

\[ Y_n = H_n X_n \tag{4.3} \]

where $n$ is the discrete frequency index.

4.1.3 Frequency Response

The frequency response is a powerful means for characterizing filters. It describes the amount of amplification or attenuation of a frequency component produced by a filter. This concept plays an important role in the design of both digital and analog filters. In either case, the frequency response is developed through an eigenfunction-eigenvalue argument [32]. Suppose that the input to the digital filter is a complex exponential,

\[ x_k = e^{j\omega k} \tag{4.4} \]

where $\omega$ is angular frequency and $j = \sqrt{-1}$. Substitution of (4.4) into (4.2) yields

\[ y_k = \sum_{i=-\infty}^{\infty} h_i e^{j\omega (k-i)} = \left[ \sum_{i=-\infty}^{\infty} h_i e^{-j\omega i} \right] e^{j\omega k} \tag{4.5} \]

The expression in square brackets is a function of $\omega$. It is the eigenvalue of $x_k$, which makes $x_k$ an eigenfunction of any linear, shift invariant filter. The eigenvalue is in fact the frequency response:

\[ H(\omega) = \sum_{i=-\infty}^{\infty} h_i e^{-j\omega i} \tag{4.6} \]

which describes the change in magnitude and phase at frequency $\omega$. Inspection of (4.6) reveals that it is simply the discrete Fourier transform of the impulse response. It corresponds to $H_n$ in (4.3). This is implied by

\[ \omega = \frac{2\pi n}{N} \tag{4.7} \]

where $N$ is the number of samples. All the properties of the discrete Fourier transform are observed. Consequently, the frequency response of a discrete-time system is periodic with a period equal to the sampling frequency. Since angular frequency is used, this means the frequency response repeats every $2\pi$ radians. By convention, the frequency interval is taken to be from $-\pi$ to $\pi$ (or 0 to $\pi$ if the frequency response is symmetric).
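The eigenfunction argument can be verified with a brief sketch. It again assumes a hypothetical three-point impulse response `h`. The function `freq_response` evaluates the sum in (4.6) directly, the eigenfunction check uses only output samples where the filter fully overlaps the input, and the zero-padded FFT is compared with (4.6) at the frequency given by (4.7).

```python
import numpy as np

h = np.array([0.25, 0.5, 0.25])        # hypothetical FIR impulse response
N = 64                                 # number of samples

def freq_response(h, omega):
    """H(omega) = sum_i h_i exp(-j omega i), as in (4.6)."""
    i = np.arange(len(h))
    return np.sum(h * np.exp(-1j * omega * i))

n = 5
omega = 2 * np.pi * n / N              # (4.7)
k = np.arange(N)
x = np.exp(1j * omega * k)             # complex exponential input (4.4)

# Steady-state output: drop start-up samples and the convolution tail
y = np.convolve(x, h)[len(h) - 1:N]
eigen_ok = np.allclose(y, freq_response(h, omega) * x[len(h) - 1:])

# The DFT of the zero-padded impulse response samples H at omega = 2 pi n / N
H = np.fft.fft(h, n=N)
dft_ok = np.isclose(H[n], freq_response(h, omega))
print(eigen_ok, dft_ok, abs(freq_response(h, 0.0)), abs(freq_response(h, np.pi)))
```

For this particular `h` the magnitude is unity at zero frequency and falls to zero at $\omega = \pi$, i.e. it behaves as a simple low-pass smoother.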
4.1.4 Indirect Filtering in the Spectral Domain

A convenient and powerful technique for data smoothing is to convert the data (consisting of N points) into the frequency domain via the Fourier transform, where signal and noise may be differentiated on the basis of spectral differences, attenuate the components due to noise with some appropriate window function, and then perform the inverse transform to construct the smoothed time domain data [33]. Following Press et al. [34], this approach will be referred to as indirect filtering.

It was shown in the theory section of Chapter 3 that such an operation can be cast in terms of linear least squares polynomial approximation. The rectangular window function, as exhibited by the ideal low pass filter response, is equivalent to an approximation conducted only up to the $M$th (less than N) function in the set, which corresponds to the cut-off point in the frequency domain. Hence, higher order polynomials are effectively "filtered" out. Ideally, the model order is high enough to fit the pure signal but not so high as to fit the noise [35]. Methods to select the "best" model order are described later.

This procedure is not restricted to the use of trigonometric polynomials, nor is the orthogonal property required (although it simplifies computation). Any other set of basis functions could also be used to perform smoothing in this manner. Examples include the Gram and Meixner series of polynomials, in which case signal and noise separation is conducted in the Gram and Meixner domains.

Recall, from sub-section 3.1.7, that the solution to the linear least squares problem was given by the normal equations (3.42), which in matrix form is given by

$$a = A^{\#} y$$

where a is the vector of parameters, y is the vector of data points to be fitted, A of size (N x M) is the design matrix and $A^{\#} = (A^T A)^{-1} A^T$ is the pseudoinverse (for M < N).
The estimated or smoothed values may be generated by left multiplication of the coefficient vector by the design matrix to yield

$$\hat{y} = A a \qquad (4.12a)$$

or

$$\hat{y} = A A^{\#} y \qquad (4.12b)$$

where $\hat{y}$ is the vector of smoothed data points. Following Bialkowski [36], $A A^{\#}$ is referred to as the estimation matrix since each row (or column since it is symmetric) is used to generate an estimate of the corresponding data point. While not obvious, the row (or column) vectors in the estimation matrix are inherently normalized to unity [36]. When M = N, the estimation matrix reduces to the identity matrix: a forward transformation followed by its inverse can be replaced by an identity operator. If the basis functions are orthogonal, the estimation matrix becomes $A A^T$, which is much easier to handle, and as mentioned before, the problem of singularity is also avoided.

4.1.5 Finite Impulse Response (FIR) Filtering

Equation (4.12b) shows that the kth point in the smoothed vector is obtained by taking the scalar product of the kth row of the estimation matrix with the data vector. In other words, each row (or column) can be regarded as the impulse response function of a filter which smooths the kth point [36]. However, the same impulse response can be used to smooth the entire data vector via convolution. From (4.2), one can see immediately that this is characteristic of an FIR filter. Due to the way they operate, FIR filters are also known as moving average filters.

When the Gram polynomials are used to generate the estimation matrix, the FIR implementation is the well-known Savitzky-Golay filter [3]. This type of digital filter is perhaps the most used and studied in analytical chemistry [10, 37-40]. For an odd number of points, the middle row yields the traditional Savitzky-Golay midpoint smoother. It is symmetric and, as a result, exhibits linear phase [41]. This is a desirable characteristic since phase distortion does not exist and positional integrity is maintained.
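The link between the estimation matrix and the Savitzky-Golay midpoint smoother can be sketched in a few lines. The example assumes a plain monomial design matrix rather than Gram polynomials; both span the same polynomial space, so the estimation matrix should be identical, but the thesis code itself is not reproduced here and the helper name `estimation_matrix` is hypothetical.

```python
import numpy as np

def estimation_matrix(n_points, order):
    """Return A A^# for a polynomial model of the given order on n_points equally spaced points."""
    t = np.arange(n_points) - (n_points - 1) / 2.0      # centred abscissa
    A = np.vander(t, order + 1, increasing=True)        # design matrix, N x M (monomial basis)
    return A @ np.linalg.pinv(A)                        # pinv(A) plays the role of A^#

# 5-point quadratic case: the middle row is the midpoint smoother.
E = estimation_matrix(5, 2)
midpoint = E[2]

# Data that already follow a quadratic are reproduced exactly by the smoother.
t = np.arange(5) - 2.0
y_quadratic = 1.0 + 2.0 * t + 3.0 * t**2
y_smoothed = E @ y_quadratic

# With as many basis functions as points (M = N), the estimation matrix is the identity.
E_full = estimation_matrix(5, 4)
```

For this 5-point quadratic case the middle row reproduces the familiar tabulated coefficients (-3, 12, 17, 12, -3)/35, and each row sums to one.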
The two parameters, filter length L (number of points) and polynomial order, provide some flexibility in matching the frequency characteristics to a particular application. Figure 4.2 shows the effect of these two parameters on frequency response for selected midpoint smoothers. The cut-off frequency increases with increasing order and decreasing filter length. The higher the degree of the polynomial, the greater the tangency at $\omega = 0$. Also, the ideal low-pass filter response is far from realized: the transition from low-pass to high-pass, or transition band, is gradual and high frequency components are not eliminated entirely.

Figure 4.2 Frequency responses of a) 11 point Savitzky-Golay filter of orders of 2, 4 and 6, and b) quadratic filter with 5, 11 and 21 points.

The noise rejection of FIR filters can be computed from the impulse response. Specifically, given that the noise is white, the ratio of the variance in the output to that of the input is given by [42]

$$\frac{\sigma^2_{\text{output}}}{\sigma^2_{\text{input}}} = \sum_i b_i^2 \qquad (4.13)$$

where $b_i$ are the filter coefficients. The variance ratio for each impulse response function is simply the corresponding diagonal element of the estimation matrix [36]. The sum of all variance ratios, or trace($A A^{\#}$), is equal to M, the number of basis functions in the design matrix. This can be inferred by recognition of the fact that $A A^{\#}$ reduces to the identity matrix when M = N, as mentioned above. For the Savitzky-Golay filter, the variance ratios increase with polynomial order at a fixed number of points. The minimum variance ratio occurs for midpoint smoothers except when the order is even, in which case, the minimum depends on but moves towards the midpoint smoother with order [36].

Of course, FIR versions of filters can also be obtained for the complex exponential functions and Meixner polynomials.
Indeed, the former type is well known in digital filtering circles where filter design is conducted predominantly in the frequency domain. Some general statements will be made about these two for the sake of completeness but they were not applied in this dissertation. Presentation of Meixner-based FIR filters constitutes new material (to the author's knowledge, these have not been reported in the literature).

For the complex exponential set of functions, the impulse responses all have the same coefficients arranged in the same cyclic sequence. The maximum impulse response value for the kth impulse lies at the kth position. For example, if the first impulse response was {1, 0.75, 0.5, 0.25}, the second would be {0.25, 1, 0.75, 0.5}, and so on. It is clear that identical variance ratios are exhibited by each. Thus, the variance ratio must be the average of the diagonal values of $A A^{\#}$, which is trace($A A^{\#}$)/length($A A^{\#}$) = M/N (i.e. columns of A / rows of A). Of course, each has a different frequency response but the overall attenuation of frequency components is subject to this constraint. The frequency response is also subject to the Gibbs phenomenon [43]: an overshoot is observed at the cut-off frequency. This problem arises whenever the Fourier functions must approximate a discontinuity, which, in this case, is the edge of the ideal filter function. The effect can be reduced, at the cost of a broader transition band, by convolving the impulse function with a suitable window function [44]. In practice, the computational approach outlined above is not used since expressions are available to compute the Fourier FIR filter coefficients directly [44].

The Meixner polynomials yield only asymmetric impulse response functions and hence exhibit non-linear phase characteristics.
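The variance-ratio statements made for the polynomial and complex exponential bases can be verified numerically. The sketch below is illustrative only: it assumes N = 11 points and M = 3 basis functions, uses monomials in place of Gram polynomials (same column space), and takes the three lowest-frequency DFT functions as the complex exponential basis.

```python
import numpy as np

def estimation_matrix(A):
    """A A^# for an arbitrary (possibly complex) design matrix A."""
    return A @ np.linalg.pinv(A)

N, M = 11, 3

# Polynomial basis (orders 0 to 2).
t = np.arange(N) - (N - 1) / 2.0
E_poly = estimation_matrix(np.vander(t, M, increasing=True))
ratios_poly = np.sum(E_poly**2, axis=1)          # sum of squared coefficients, eq. (4.13)

# Complex exponential basis: frequencies 0, +1 and -1 of the N-point DFT.
k = np.arange(N)
freqs = np.array([0, 1, -1])
E_four = estimation_matrix(np.exp(2j * np.pi * np.outer(k, freqs) / N))
ratios_four = np.sum(np.abs(E_four)**2, axis=1)
```

For the polynomial basis the variance ratios coincide with the diagonal of the estimation matrix and sum to M; for the complex exponential basis every row is a cyclic shift of the first, so each ratio equals M/N.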
The frequency response for selected impulse response functions or filters generated from a second order FIR implementation with a filter length of 11 points is shown in Figure 4.3. The time scale of the functions was 0.25 - a practical critical time scale determined as described in sub-section 3.4.2. The 11-point Savitzky-Golay midpoint quadratic is included for comparison. The first Meixner FIR filter is zero (no-pass filter) and is not plotted. Excluding this first filter, the rest of the filters can be grouped into three types. The second filter is essentially an all-pass filter. All filters beyond the third are low-pass filters with successively increasing attenuation. The third, itself, has a response somewhere in between these two cases. For this set, the fourth and fifth filters have perhaps the best low-pass characteristics. But their high frequency attenuation is much inferior to that of the quadratic Savitzky-Golay midpoint filter - one reason for not using Meixner-based FIR filters. As the time scale increases over its range, the frequency response of each low-pass filter sequentially achieves dominance in terms of maximum magnitude.

Figure 4.3 Frequency responses of second order Meixner impulse response functions of length 11. A time scale of 0.25 was used. Circled numbers indicate the row of the estimation matrix from whence the impulse responses were taken. The frequency response of the 11-point Savitzky-Golay quadratic midpoint smoother is shown with a dotted line.

4.2 DETERMINING THE OPTIMAL ORDER FOR INDIRECT FILTERS

A well recognized problem that results when using indirect filters is that of determining the spectral cut-off point [5, 6, 33, 45-48]. When the signal and noise frequency-domain distributions are completely separated, the cut-off point is obvious, but when they overlap, as is often the case, its position is less determinate.
This situation worsens as S/N decreases. Hence, spectral truncation could be a haphazard operation unless accurate a priori knowledge is available on both signal and noise [20]. Unfortunately, such information is usually not readily available.

A number of ad hoc methods have been described for the selection of a cut-off point for Fourier filtering. Horlick [33] has suggested that the data be acquired under conditions of very high S/N such that the separation between the signal and noise components is easily discerned; the cut-off point can then be selected visually. More objective methods were subsequently proposed to avoid the subjectiveness of "filtering-by-eye". Kirmse and Westerberg [45] chose the frequency component which was just above the maximum noise magnitude, as estimated from the final points of the spectrum. Maldacker et al. [5] selected the cut-off point at the frequency whose magnitude has fallen to 0.1% of the maximum non-d.c. component. Bush [6] employed a method in which the standard deviation of a specified number of the last few points was first calculated. Another point was added and the computation repeated. In this manner, the procedure was worked back to low frequencies. The cut-off was selected at the point where a significant increase in the standard deviation (from 5 to 40%) was observed. Brinkley and Dessy [46] used the criterion of Maldacker et al. [5] as a starting point and then computed spectra for a range of frequency cut-offs about that initial point. The results were then compared with theoretical data and the cut-off selected on the basis of best agreement with a particular parameter of interest - in their case, it was peak height. Lam and Isenhour [47] proposed the equivalent width criterion. The equivalent width of a peak corresponds to the width of a rectangle having the same area and height. In fact, it is simply the ratio of these two quantities.
The equivalent width in the time-domain is equal to the reciprocal of that in the frequency-domain, the value of which gives the desired cut-off frequency. Felinger et al. [48] extended Bush's method [6] by dividing the frequency spectrum by a sharpening function, i.e. a function which is the theoretical shape of the peak, to form a ratio function. This was evaluated in the vicinity determined by Bush's method and the cut-off frequency was deemed to be at the point where the function began to increase. They also evaluated some of the other techniques and found that the methods of Lam and Isenhour [47], and Kirmse and Westerberg [45] tended to select cut-offs higher than the optimal while the opposite was true for that of Maldacker et al. [5].

With the exception of the method of Kirmse and Westerberg [45], drawbacks of these techniques include i) strong dependence on prior assumptions and/or ii) the proposed criteria themselves require arbitrary decisions. Maldacker et al. [5] make the implicit assumption that the noise will indeed fall below the given threshold. The specification of an appropriate threshold is as arbitrary as determining the cut-off visually. The method of Brinkley and Dessy [46] depended on the accuracy of their theoretical model. Lam and Isenhour's equivalent width criterion [47] requires computation of the peak height - a parameter which is often sought via filtering! The method of Bush [6] is quite arbitrary in its criterion for significant increase in standard deviation. Felinger et al. [48] made use of a sharpening function based on the theoretical model. Kirmse and Westerberg's [45] method estimates the noise level from a region far removed from the point of interest. Finally, with the exception of the equivalent width criterion, all these methods required the computation of all spectral components - a concern when basis functions besides the classical Fourier set are used.
It is quite evident that none of these techniques are general nor are they well suited for automatic near-optimal smoothing of arbitrary signals. Clearly, other approaches are needed.

Recently, Larivee and Brown [49] proposed the use of Shannon's information entropy as a means to resolve the problem. Rissanen [50] had earlier used this approach for model selection in system identification. As discussed later, the two problems are intimately connected. The use of information theory represents a significant advance from the methods described above. Shannon's information entropy H is defined as

$$H = -\sum_k p_k \log(p_k) \qquad (4.19)$$

where $p_k$ represents the probability of a discrete event. Taken as a whole, the $p_k$ form a probability distribution and so $\sum_k p_k = 1$. The entropy describes a probability distribution, in a manner analogous to the mean or standard deviation. Larivee and Brown [49] proposed (4.19) as a criterion for detecting either the decrease in peak intensity or increase in peak broadening associated with over-filtering. The normalized smoothed data was used to represent a probability distribution and it is this that is maximized with (4.19). This approach appears to be based on the principle of maximum entropy introduced by Jaynes [51] for statistical mechanics. In contrast to the work of Rissanen [50], no constraints were imposed here and so, for (4.19), the maximum occurs when the distribution is the uniform distribution, i.e. when all the $p_k$ are equal. No explanation was given as to how an optimum was guaranteed. Their strategy was to detect peak broadening by scaling the smoothed data to a constant height, but if noise is still present, the required scaling must be estimated - another task for filtering. Hence, the results could be, and were, affected, being subject to the method of peak height estimation. Rissanen's method was also affected by a scaling problem [52]. There is room for improvement.
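The behaviour of (4.19) described above, namely that broadening raises the entropy and that the unconstrained maximum is the uniform distribution, can be illustrated with a short sketch. The Gaussian peaks, their widths and the helper names are assumptions made for the example; they are not the data or the procedure of Larivee and Brown [49].

```python
import numpy as np

def shannon_entropy(values):
    """Entropy of (4.19) after normalizing non-negative data to unit sum."""
    p = np.asarray(values, dtype=float)
    p = p / p.sum()
    p = p[p > 0]                      # 0*log(0) is taken as zero
    return -np.sum(p * np.log(p))

k = np.arange(513)

def gaussian_peak(width):
    """Hypothetical peak centred in a 513-point record, width given in points."""
    return np.exp(-0.5 * ((k - 256) / width) ** 2)

H_narrow = shannon_entropy(gaussian_peak(10.0))
H_broad = shannon_entropy(gaussian_peak(40.0))
H_uniform = shannon_entropy(np.ones(k.size))
```

Without a constraint, nothing stops the criterion from favouring ever broader (more heavily filtered) profiles, which is the concern raised in the text.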
Before describing the approaches adopted for this work, one additional point will be made. The question as to whether an objective method is an essential advantage over a subjective one has been raised by Söderström [53] in the context of model selection for system identification: with either approach, there is always a risk of selecting an inappropriate model. This point will become apparent later. For off-line filtering, the author agrees wholeheartedly with Söderström and the importance of an objective method could very well have been over-emphasized. However, when automatic "black-box" filtering is desired, objective methods, with their consistency, are inherently required.

4.2.1 Indirect Low-Pass Filtering as Hierarchical Model Selection

The filtering problem is first recast in terms of modeling: the indirect filtering approach is a least squares modeling method based on a set of functions which are hierarchical extensions of the (weighted) zeroth order polynomial. As the order increases, the model becomes more complex. In the last chapter, it was shown that in the absence of noise, the "correct" model is the simplest model for which the cost function (e.g. the SSE) equals zero. If there is noise, the correct model will no longer have a minimum value of zero for the SSE since the noise is not explained. Increasing the model complexity makes it easier to follow the noise contribution and lowers the value of the cost function: the given data (signal and noise) are fitted exactly when the most complex model is used. Hence, the highest order model will always be erroneously favored. The parsimony principle [54] was introduced to handle this situation: a model should be as simple as possible, yet sufficiently flexible to cover all aspects of the process itself (see footnote 1). In this context, the problem of determining the spectral cut-off point translates to that of determining the "correct" model.
To avoid over-fitting, it is necessary to use a criterion in which model complexity is taken into account. A variety of criteria have been proposed as an objective means to choose the "correct" model. These include the final prediction error criterion [56], Akaike information-theoretic criterion (AIC) [57], the F-test [58], Parzen's criterion [59] and cross-validation [60]. Except for the last, an overview of these has been given by Söderström [53], who also showed that the first three are asymptotically equivalent. The AIC and cross-validation have been chosen for evaluation here. A link between the two has been established by Stone [61]. Indeed, it has been said that the AIC imitates the idea of cross-validation [62].

Footnote 1: Analogous to this is the criterion of sufficiency proposed by Fisher [56] for statistical estimation: "the statistic chosen should summarize the whole of the relevant information supplied by the sample".

4.2.2 Akaike Information-theoretic Criterion (AIC)

Akaike [52, 57] proposed the AIC to incorporate model complexity:

$$\mathrm{AIC} = \left[ L(\varepsilon(k, \hat{a}), k, \hat{a}) + \dim(\hat{a}) \right] \qquad (4.20)$$

where

$$L(\varepsilon(k, \hat{a}), k, \hat{a}) = -\log p(\varepsilon, k, \hat{a}) \qquad (4.21)$$

is referred to as the log-likelihood function, $p(\varepsilon, k, \hat{a})$ is the probability density function of the model errors:

$$\varepsilon(k, \hat{a}) = y(k) - \hat{y}(k, \hat{a}) \qquad (4.22)$$

y(k) is the data, $\hat{y}(k, \hat{a})$ is the model, $\hat{a}$ is the maximum likelihood estimate of the parameters, and $\dim(\hat{a})$ is the dimensionality of $\hat{a}$. The first part of the expression is the contribution of the error, which decreases with an increasing number of parameters, while the second part represents the model complexity and becomes dominant once an excessive model complexity (i.e. number of parameters) has been used. The criterion calls for the selection of the model for which the AIC is a minimum. For Gaussian-distributed errors, the AIC takes the form [63]:

$$\mathrm{AIC} = \log\!\left( \frac{\mathrm{SSE}}{N} \right) + \frac{2 \dim(\hat{a})}{N} \qquad (4.23)$$

where SSE is the sum of the squared errors defined by (3.37).
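A minimal sketch of order selection with the Gaussian form (4.23) is given below. It assumes that dim(a) equals the number of fitted coefficients (order + 1), builds an orthonormal polynomial basis by QR factorization of a Legendre design matrix as a stand-in for Gram polynomials, and uses a hypothetical quadratic test signal; none of these choices is taken from the thesis code.

```python
import numpy as np

def aic_curve(y, max_order):
    """AIC of (4.23) for nested polynomial models of order 0..max_order."""
    N = len(y)
    x = np.linspace(-1.0, 1.0, N)
    V = np.polynomial.legendre.legvander(x, max_order)
    Q, _ = np.linalg.qr(V)                 # orthonormal columns, nested by polynomial order
    c = Q.T @ y                            # least squares coefficients in the orthonormal basis
    aic = np.empty(max_order + 1)
    for n in range(max_order + 1):
        fit = Q[:, :n + 1] @ c[:n + 1]
        sse = np.sum((y - fit) ** 2)       # error between data and model, as in (4.22)
        aic[n] = np.log(sse / N) + 2.0 * (n + 1) / N   # assumes dim(a) = n + 1
    return aic

# Hypothetical test case: quadratic signal plus Gaussian noise.
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 200)
y = 1.0 + 2.0 * x + 3.0 * x**2 + rng.normal(scale=0.1, size=x.size)

aic = aic_curve(y, max_order=10)
best_order = int(np.argmin(aic))
```

The first term falls steeply until the quadratic component is captured, after which the complexity penalty 2 dim(a)/N begins to matter; the selected order is the location of the minimum.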
Theoretical details of the derivation can be found in the literature [57] and only the most general account will be given here. The AIC was derived from an information-theoretic point of view via the Kullback-Leibler information distance [63]:

$$I(p_0, p_1) = \int p_0(k) \log \frac{p_0(k)}{p_1(k)} \, dk \qquad (4.24)$$

where $p_0(k)$ is the reference probability distribution and $p_1(k)$ is a given probability distribution. The information distance measures the similarity between the two distributions. Although the use of Shannon's entropy for model selection [50] has been criticized by Akaike [57], the Kullback-Leibler information distance is actually a generalization of Shannon's definition of information or negative entropy [64]. This is apparent from a comparison of (4.19) and (4.24). Simply stated, the AIC is based on the following formulation: minimize the average information distance or, equivalently, maximize the average entropy between the probability distributions of the model errors and the true errors. In essence, the errors are matched to an assumed probability distribution, with the model yielding the best match deemed as the "correct" one. In (4.23), the true errors are assumed to have a Gaussian distribution - one that is often valid in practice.

4.2.3 Cross-validation

An alternative to the AIC for model selection is the method of cross-validation [60]. This is a popular technique used in pattern recognition for determining the number of significant components in PCA and related methods [65, 66]. Indeed, there is much similarity between that problem and the one under consideration: both involve the identification of statistically significant variance (information) for noise corrupted data, the former within the framework of a principal components model, the latter within that of a polynomial model (i.e. the given basis functions). Cross-validation has also found use in other areas [67-70].
It has been used as a parameter optimization method for cubic spline smoothing [71]; as such, there is a precedent for this approach.

The procedure is intuitively simple and will be outlined within the scope of the problem at hand. Given a data set y of size N, a subset of data, say q percent, is first removed. In general, the pattern for deletion should be entirely random, but for this work, it is sufficient to delete every qth point. In fact, since the entire peak will be homogeneously represented, this may be better. The remaining data are used to estimate the coefficient of the first basis function. In turn, the resulting model is used to predict the missing points. The error between each observed point and corresponding predicted point, the predictive residual error, is computed:

$$\varepsilon_k = y_k - f(k) \qquad (4.25)$$

where f denotes the model whose parameter(s) was estimated without the qth subset. These errors are stored. To complete the cycle, the removed points are replaced in the data set. This procedure is repeated on another subset of data subject to the condition that no point previously removed shall be removed again; thus, a total of 100/q repeats will be required. When every element has been removed once, the quantity known as the predictive residual error sum of squares (PRESS) may then be calculated:

$$\mathrm{PRESS}(n) = \sum_{k} \varepsilon_k^2 \qquad (4.26)$$

where n is the order of the model, which at this stage is equal to one. The polynomial model is then incremented by one order and the entire procedure reiterated. In this way, the PRESS is evaluated as a function of n. If the model is fitting the signal, each successive increment should improve the prediction and the PRESS decreases. But when the model begins to fit noise, the predictive ability should become poorer and the PRESS increases. Hence, there should be a minimum, at which the corresponding order is optimal with respect to predictive capability.

Cross-validation is computationally intensive.
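The cross-validation cycle outlined above can be sketched as follows. Several details are assumptions made for the example rather than the thesis implementation: `q` is interpreted as the number of interleaved subsets (every qth point is deleted, starting from a different offset each time), a Legendre basis stands in for the Gram polynomials, the order index starts at zero, and the test signal is a hypothetical quadratic.

```python
import numpy as np

def press_curve(y, max_order, q):
    """PRESS of (4.26) for polynomial orders 0..max_order using q interleaved deletion subsets."""
    N = len(y)
    x = np.linspace(-1.0, 1.0, N)
    V = np.polynomial.legendre.legvander(x, max_order)
    press = np.zeros(max_order + 1)
    for n in range(max_order + 1):
        A = V[:, :n + 1]                          # design matrix for the current order
        for start in range(q):
            removed = np.zeros(N, dtype=bool)
            removed[start::q] = True              # delete every qth point from this offset
            # General least squares fit on the retained points only.
            coeffs, *_ = np.linalg.lstsq(A[~removed], y[~removed], rcond=None)
            residual = y[removed] - A[removed] @ coeffs   # predictive residuals, eq. (4.25)
            press[n] += np.sum(residual ** 2)
    return press

# Hypothetical test case: quadratic signal plus Gaussian noise.
rng = np.random.default_rng(2)
x = np.linspace(-1.0, 1.0, 200)
y = 1.0 + 2.0 * x + 3.0 * x**2 + rng.normal(scale=0.1, size=x.size)

press = press_curve(y, max_order=10, q=4)
best_order = int(np.argmin(press))
```

Each point is removed exactly once per order, so the PRESS at every order is a sum over all N predictive residuals. The inner least squares solve is repeated q times per order, which is where the computational cost arises.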
Deletion of points in the design matrix means that the general form of the normal equations (3.42) must be used. This computational load is multiplied q times - ample justification for not using the leave-one-out scheme even though it is best [72]. When speed is important, the value of q should be set as small as possible, with the minimum being two. The trade-off is the benefit of averaging.

The appeal of cross-validation lies in its pragmatic nature: the method makes sense without resorting to probabilistic arguments or assumptions about the true signal.

4.3 EXPERIMENTAL

4.3.1 Simulation of Flow Injection Data

Simulated peaks were drawn from the set described in the experimental section of Chapter 3. Please refer to that section for details of the models. The peaks used for this study are shown in Figure 4.4. In every case, a total of 513 points were used to define the peak such that with a sampling interval of 0.1 s, it spanned a time interval of 51.2 s. When the Savitzky-Golay filter was evaluated, the peak was extended on the front and back by a number of points equal to one-half the filter length used.

Figure 4.4 Simulated peaks used for studies on digital filtering (horizontal axis: data point).

4.3.2 Experimental Data

Experimental data were taken from a study in which the interferent effects of acetic acid on magnesium determination by inductively coupled plasma - optical emission spectroscopy (ICP-OES) were mapped as a function of both analyte and interferent concentrations. Since details of the experiment have been reported previously [73], only a skeletal account is presented here.
Solutions containing 0, 10, 20, 30, 40, 50, 60, 70, 80, 90 and 99% (v/v) acetic acid were prepared by appropriate dilution with deionized water. Only data from the first solution were used in this work. The flow injection manifold is illustrated in Figure 4.5. A 100 ppm Mg solution was injected into the deionized water carrier, allowed to disperse and then merged with the acetic acid stream. This resulting mixture was introduced into the ICP, which used a linear photodiode array for radiation detection. The spectral line was measured over a 9-diode window. Three additional diodes (corresponding to roughly 281.5, 282.7 and 283.9 nm) were chosen for measurement of background. Radiation integration times of 0.543 s were used. The flow injection peak consisted of 150 points taken with a sampling period of 1.543 s. Seven replicate injections were done for each interferent concentration.

Figure 4.5 Flow injection manifold and ICP operating conditions used for the study of the effect of Mg and acetic acid concentration on optical emission intensity. Manifold: 70 µL injection of 100 ppm Mg into a water carrier, merged with the acetic acid stream. ICP operating conditions: r.f. power 1.00 kW; inner gas 0.2 L/min; outer gas 10 L/min; nebulizer gas 1.3 L/min; entrance slit 50 µm wide by 2 mm high; view height 8 mm above load coil.

4.3.3 General Data Processing

The sum of squared error (SSE) between the filtered data and the true, noise-free data expressed by (3.37) was adopted as the criterion of optimality since preservation of peak shape was of importance to this work. When data were smoothed with the Savitzky-Golay filter, the SSE was calculated only over the main peak segment; the extension was not included.

For all studies, a set of 20 different noise sequences was added to each peak. Application of the Kolmogorov-Smirnov test [74] to these with respect to the normal distribution yielded D values (i.e. the test statistic) between 0.019 and 0.037; at the 0.05 level of significance, one can accept these noise sequences to be normally distributed. In each experiment, the average SSE and standard deviation were computed. This minimized the bias arising from use of a single noise sequence.

4.3.4 Computational Aspects

Normally-distributed random number sequences were generated with MATLAB (version 4.0a, The Mathworks Inc., South Natick, MA) according to the algorithm described elsewhere [75]. Except when cross-validation was performed, discrete Fourier analysis employed an FFT routine, also from MATLAB, which was capable of handling data lengths that were not an integer power of two. In the first case, a DFT was used. Implementation of the Gram and Meixner polynomials and linear least squares curve fitting were as described in Chapter 3. Coefficients for the Savitzky-Golay and Meixner FIR filters were calculated from Gram and Meixner polynomials, respectively, via their estimation matrices. The cross-validation and AIC procedures were coded in the MATLAB language. All programs ran on a Sun workstation (model SPARCstation II, Sun Microsystems Inc., Mountain View, CA). Double precision (64-bit) arithmetic was used throughout.

4.4 COMPARISON OF FILTER IMPLEMENTATIONS - CASE STUDY WITH GRAM POLYNOMIALS

Based on the number of citations and papers published on it, the Savitzky-Golay filter is one which has most endeared itself to analytical chemists. Since other filters, real-time or otherwise (e.g. frequency-based FIR filters, IIR filters), generally give better performance, it is likely that this is due to simplicity of implementation, the availability of coefficient values, or equations for them, in the chemical literature [3, 76, 77], and its connection with polynomial least squares fitting.
The Savitzky-Golay filter does possess the following attractive property though: among all FIR filters of the same length which also preserve moments, it gives the most noise reduction [10]. Support for this is provided in Figure 4.3. Consequently, it has been stated [20] that the Savitzky-Golay filter is the standard by which other FIR filters should be compared. When the signal and noise characteristics are unknown, it is the recommended FIR filter to use [20].

For a filter which is designed for real-time operation (see footnote 2), it is quite surprising that it is more often used off-line. This observation is not only apparent from a survey of the analytical chemistry literature but may also be inferred from the importance that many researchers have placed on the inability of this filter (and all FIR filters for that matter) to smooth the ends of batch data [36, 78-80]. The recommended technique for circumventing this is to use the appropriate non-midpoint impulses to fit the ends [36, 78-80]. Hence, the solution is a mixed scheme in which least-squares fitting is added as an afterthought to least-squares smoothing. While this approach does eliminate the loss of end data - a problem that compounds when repeated smoothing is performed - a discontinuity can exist at the transition between smoothed and fitted curve (refer to examples given in references [36, 80]). Furthermore, multi-pass smoothing can generally be replaced by an equivalent single pass operation after a sufficient number of passes has been determined - a luxury afforded by batch filtering.

Footnote 2: It is recognized that technically, the Savitzky-Golay midpoint smoother is not a real-time filter since it is not causal as defined, i.e. to get a new value for point k, L/2 subsequent points must first be acquired. However, a simple shift in the index of convolution is all that is required to make the filter causal.

Instead of employing least squares fitting as an add-on, perhaps it should be used as an alternative. That is, a global least squares model (i.e. indirect filtering) is used rather than multiple overlapping local least squares fitting. There are several reasons for considering this alternative in FIA.

First, only a single analytical peak is observed for any one flow injection experiment. This is in contrast to other methods such as chromatography or optical spectroscopy where multiple peaks are commonly analyzed. This is not to say that doublet peaks do not exist in FIA - they do [81], but they are far less common. From theoretical considerations outlined in Chapter 1, the analytical signal is (typically) a deterministic, continuous, smooth function of time. Although the true form of the signal is not known when noise is present, this very fact means that a global polynomial model can be used to obtain a very good approximation to the true signal, especially if it can be described with a small number of coefficients. Since the position of the sample at any time after injection can be determined from the flow rates and geometry of the analyzer, the data acquisition window can be arranged to cover the expected time interval of peak transit across the detector. Hence, the peak can be captured with minimal baseline data such that the indirect filter can operate effectively.

Second, the two primary practical advantages of the FIR implementation, real-time (continuous) operation and filtering of very large data arrays [34], are not important here: i) the analysis is carried out on discrete samples as opposed to a continuous stream, and the result is a finite data set, typically consisting of 100 to 500 points; ii) the non-essentiality of real-time filtering for the work reported in this thesis has been discussed before. Rather, the FIR implementation requires an additional L/2 points to be taken after the peak has returned to baseline.
This results in longer experimental times.

Finally, there is only one parameter to optimize in indirect filtering: polynomial order. This greatly simplifies the automatic filter problem. Larivee and Brown [49] have reported that optimization of the Savitzky-Golay filter was complicated by the fact that i) a large change in filter parameters was necessary to effect a significant change in the frequency response and ii) several combinations of filter order and filter length gave comparable results.

A comparison of the two Gram filter implementations, FIR and indirect, is presented below. Unlike the complex discrete exponential basis associated with the discrete Fourier transform, these are not linked by the convolution theorem. The comparison was carried out over eight simulated peaks (see Figure 4.4), four S/N levels (100, 20, 10 and 5) and twenty Gaussian-distributed noise sequences. The Savitzky-Golay filter has two parameters: polynomial order and filter length. Polynomial orders from 2 to 20 were examined. For each polynomial order, the filter length was scanned from the lowest (e.g. 5 points for a quadratic filter) up to the maximum filter length of 1025 points. For indirect filtering, the Gram polynomial order was scanned from 0 to 24.

Evaluation of ultimate filter performance, in terms of the SSE, followed the approach of Larivee and Brown [49]. Their strategy is illustrated in Figure 4.6, which shows the results for the optimization of the indirect Gram filter with respect to order on peak C. Each peak was combined with each of the 20 noise sequences of a specified S/N (in this case, 20) to give a data set of 20 records. SSE curves were plotted for each record to generate 20 optimum filter parameters corresponding to 20 minimum SSE values. The mean and standard deviation for both filter order and SSE were then computed over these values.
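The order scan described above can be sketched in a few lines. The following is a minimal illustration rather than the code used in this work. It assumes that an orthonormal polynomial basis over equally spaced points (obtained here by QR factorization, which reproduces the Gram polynomials only up to sign and normalization) is an acceptable stand-in for the Gram basis, that S/N is defined as peak height divided by noise standard deviation, and that a tanks-in-series curve with arbitrarily chosen parameters resembles peak C. All function and variable names are hypothetical.

```python
import numpy as np

def orthonormal_polynomial_basis(n_points, max_order):
    """Columns 0..max_order are orthonormal over n_points equally spaced samples.

    QR factorization of a polynomial design matrix yields discrete orthogonal
    polynomials, i.e. Gram polynomials up to sign and normalization.
    """
    x = np.linspace(-1.0, 1.0, n_points)
    design = np.polynomial.legendre.legvander(x, max_order)
    q, _ = np.linalg.qr(design)
    return q

def indirect_filter(y, basis, order):
    """Global least-squares fit using basis functions 0..order."""
    b = basis[:, :order + 1]
    return b @ (b.T @ y)

def tanks_in_series_peak(n_points, n_tanks, tau):
    """Hypothetical test peak: unit-height tanks-in-series response."""
    t = np.arange(n_points, dtype=float)
    peak = t ** (n_tanks - 1) * np.exp(-t / tau)
    return peak / peak.max()

rng = np.random.default_rng(0)
true_peak = tanks_in_series_peak(300, n_tanks=4, tau=15.0)
noise_sd = 1.0 / 20.0  # assumed S/N definition: peak height / noise s.d.
noisy = true_peak + rng.normal(scale=noise_sd, size=true_peak.size)

# Scan the filter order from 0 to 24 and record the SSE against the true peak.
basis = orthonormal_polynomial_basis(true_peak.size, 24)
sse = np.array([np.sum((indirect_filter(noisy, basis, k) - true_peak) ** 2)
                for k in range(25)])
best_order = int(np.argmin(sse))
raw_sse = np.sum((noisy - true_peak) ** 2)
print(best_order, sse[best_order], raw_sse)
```

Because the SSE is computed against the noise-free peak, a scan of this kind is only possible with simulated data; repeating it over several noise sequences gives the mean and standard deviation of the optimal order.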
The results for peaks A, C, H and P with S/N of 100 and 10 are presented in Tables 4.1 to 4.4, respectively. These results are adequate to illustrate the salient features. The peaks chosen represent the two extremes (A, P) and two cases in between (C, H).

Figure 4.6 Plot of SSE against Gram filter order for peak C and 20 Gaussian-distributed random noise sequences (S/N = 20). Each curve yields a minimum SSE and thus an optimal filter order. (Horizontal axis: filter order, 0 to 25; vertical axis: SSE, logarithmic scale.)

First, some general observations can be made in regard to the results tabulated. In weak filtering, for which the signal is preserved, the frequency components of the signal must lie within the unit response portion of the filter’s frequency response curve [10]. For the Savitzky-Golay filter, this corresponds to the flat portion in Figure 4.2. Bromba and Ziegler have stated that in this region, signal error decreases with increasing order, provided that it is not too high [10]. This is because higher orders give greater tangency at zero frequency [82] and consequently less signal distortion. The counter-balance is that unnecessarily high orders result in overfitting. In using higher orders, the filter length must be increased to maintain the same cut-off frequency. These considerations account for the observed increase in the optimal filter length with filter order. The results reported by Larivee and Brown [49] for the third order midpoint smoother (see footnote 3), in which the optimum filter length was found to vary inversely with the number of points defining the FWHM, are contrary to this. The increase in optimal filter length with noise level is due to the fact that the optimal cut-off frequency must fall as signal components at the higher frequency end become increasingly polluted by noise. The corresponding result for the indirect filter is that the optimal order decreases.
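For reference, the midpoint Savitzky-Golay weights can be obtained directly from a local least-squares fit. The sketch below is illustrative only (the function name is hypothetical, and the weights are computed from a pseudo-inverse rather than from Gram polynomials). It also shows that the quadratic and cubic midpoint smoothers share the same weights, and that plain convolution loses points at both ends of a batch record.

```python
import numpy as np

def savgol_midpoint_weights(length, order):
    """Convolution weights of a midpoint least-squares polynomial smoother."""
    if length % 2 == 0 or order >= length:
        raise ValueError("length must be odd and greater than order")
    half = length // 2
    x = np.arange(-half, half + 1, dtype=float)
    design = np.vander(x, order + 1, increasing=True)
    # The first row of the pseudo-inverse maps the window onto the fitted
    # constant term, which is the fitted value at the window midpoint (x = 0).
    return np.linalg.pinv(design)[0]

w2 = savgol_midpoint_weights(5, 2)  # quadratic, 5 points
w3 = savgol_midpoint_weights(5, 3)  # cubic, 5 points
print(w2 * 35)

# Smoothing a noise-free quadratic leaves the interior points unchanged, but
# (length - 1) / 2 points are lost at each end of the record.
y = np.arange(20, dtype=float) ** 2
smoothed = np.convolve(y, w2[::-1], mode="valid")
print(len(y) - len(smoothed))
```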
The relative standard deviation in the SSE varied depending on the peak and noise level. It was as low as 8% (observed for peak A with SIN = 100) or as high as For mid-point Savitzky-Golay smoothers, the third order filter is equivalent to the second order filter. Order 2 4 6 8 10 12 14 16 18 20 2 4 6 8 10 12 14 16 18 20 100 100 100 100 100 100 100 100 100 100 10 10 10 10 10 10 10 10 10 10 59.6 91.6 121.1 152.8 181.0 213.3 241.0 271.5 302.0 332.2 17.4 27.6 36.3 45.3 55.9 65.6 73.9 83.7 92.4 103.1 0.009 16 0.00924 0.009 31 0.009 35 0.009 35 0.009 35 0.009 38 0.00940 0.009 41 0.00943 0.225 07 0.22889 0.23262 0.23332 0.23440 0.23430 0.234 34 0.23486 0.23447 0.233 75 1.4 1.6 3.1 4.6 5.1 5.2 5.9 6.6 8.4 8.8 6.5 12.8 15.3 22.0 24.7 27.8 27.3 32.5 34.0 34.7 0.053 59 0.05497 0.05522 0.05447 0.05537 0.05592 0.056 02 0.05545 0.05537 0.054 97 0.00076 0.00076 0.000 78 0.000 80 0.000 79 0.000 79 0.00082 0.00082 0.000 82 0.00083 Savitzky-Golay adaptation SSE Optimal filter length mean mean 8.35 10.05 0.59 0.60 0.070 85 0.000 88 0.03201 0.000 33 Indirect filter Optimal order SSE mean mean Comparison of Gram filter implementations on peak A S/N Table 4.1 10.77 10.83 11.05 11.21 11.15 11.06 11.04 11.16 11.15 11.16 43.47 44.03 43.18 42.58 43.32 43.05 41.90 41.85 42.02 41.90 t C,) co 0• 0 CQ Th Order 2 4 6 8 10 12 14 16 18 20 2 4 6 8 10 12 14 16 18 20 100 100 100 100 100 100 100 100 100 100 10 10 10 10 10 10 10 10 10 10 85.2 137.9 190.0 238.7 296.4 346.6 399.0 447.1 495.7 546.0 45.0 77.5 106.7 134.7 162.2 190.8 220.4 249.1 278.0 302.6 8.4 10.8 13.5 21.0 30.5 31.5 35.4 44.9 47.9 56.2 3.0 4.7 7.1 9.3 11.5 14.6 17.8 18.2 18.4 22.6 70 42 33 31 30 30 30 30 30 30 0.132 70 0.123 60 0.121 30 0.12044 0.118 91 0.117 82 0.117 10 0.11640 0.115 84 0.11564 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002 72 63 59 56 54 53 53 53 53 53 0.048 15 0.046 91 0.045 98 0.04497 0.04481 0.045 21 0.045 15 0.04519 0.044 89 0.04447 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 
Savitzky-Golay adaptation Optimal filter length SSE mean mean 9.65 14.75 1.39 0.64 0.087 94 0.001 38 0.03582 0.000 48 Indirect filter Optimal order SSE mean mean Comparison of Gram filter implementations on peak C S/N Table 4.2 3.25 2.63 2.49 2.46 2.35 2.26 2.21 2.15 2.12 2.11 6.69 5.71 5.46 5.56 5.62 5.66 5.68 5.67 5.62 5.61 t I 0 CO ‘.1 CO. Order 2 4 6 8 10 12 14 16 18 20 2 4 6 8 10 12 14 16 18 20 100 100 100 100 100 100 100 100 100 100 10 10 10 10 10 10 10 10 10 10 9.5 11.1 21.4 28.7 34.2 43.2 51.2 57.2 62.8 74.8 105.2 174.6 239.2 302.0 366.5 431.1 493.9 556.3 622.1 683.6 3.8 6.8 10.2 11.8 10.6 13.0 13.8 16.0 18.0 19.0 61.9 114.5 164.6 212.3 262.0 306.7 352.9 397.0 439.5 484.7 0.097 0.088 0.085 0.083 0.082 0.081 0.081 0.081 0.081 0.081 86 85 12 33 12 69 62 76 51 36 0.001 78 0.001 38 0.001 24 0.00121 0.001 18 0.001 18 0.001 17 0.001 17 0.001 18 0.001 18 0.04223 0.040 84 0.04023 0.039 95 0.039 23 0.03867 0.038 00 0.037 93 0.037 79 0.037 64 0.000 58 0.000 52 0.00049 0.000 47 0.000 47 0.000 46 0.000 46 0.000 46 0.000 46 0.000 46 Savitzky-Golay adaptation Optimal filter length SSE mean mean 12.00 15.95 1.21 1.00 0.112 12 0.00152 0.043 32 0.000 46 Indirect filter Optimal order SSE mean mean Comparison of Gram filter implementations on peak H S/N Table 4.3 -1.03 -1.70 -1.99 -2.13 -2.24 -2.28 -2.31 -2.30 -2.32 -2.34 -0.85 -1.78 -2.04 -2.21 -2.28 -2.31 -2.30 -2.28 -2.26 1.53 t (31 CD 0 CD 0 CO 0.039 37 0.037 82 0.037 40 0.036 56 0.036 25 0.03596 0.035 95 0.035 64 0.035 52 0.035 32 0.08733 0.076 82 0.072 64 0.070 28 0.068 68 0.06794 0.06752 0.066 97 0.066 56 0.066 51 9.0 15.7 22.7 27.7 29.8 36.7 39.4 47.8 52.1 60.4 119.1 198.7 278.9 362.3 437.0 512.2 589.4 662.2 739.3 815.3 2 4 6 8 10 12 10 10 10 10 10 10 10 10 10 10 12.10 (13.20) 16.40 (17.20) 1.02 (1.36) 1.05 (1.36) 0.11703 0.042 10 (0.06707) (0.025 82) 0.001 58 0.00048 (0.000 86) (0.000 29) Indirect filtera SSE Optimal order mean mean a) Values In parentheses indicate the case where only the even Gram polynomials were 
used for indirect filtering.

16 18 20 14 0.000 56 0.000 49 0.000 47 0.000 45 0.000 45 0.000 44 0.000 44 0.00043 0.00043 0.000 42 0.001 58 0.001 17 0.001 01 0.000 93 0.000 89 0.000 87 0.000 86 0.00085 0.00085 0.000 84 4.1 8.3 11.1 17.2 21.6 26.5 27.3 34.0 37.2 43.8 69.8 135.5 204.1 268.9 334.9 400.0 465.6 526.4 593.4 655.6 2 4 6 8 10 12 14 16 18 20 100 100 100 100 100 100 100 100 100 100 mean SSE Order Optimal filter length mean Savitzky-Golay adaptation Comparison of Gram filter implementations on peak P S/N Table 4.4 (5.01) (2.37) (1.20) (0.60) (0.28) (0.08) (-0.04) (-0.08) (-0.12) (-0.14) -2.25 (1.88) -3.10 (0.93) -3.44 (0.53) -3.65 (0.31) -3.79 (0.16) -3.86 (0.09) -3.90 (0.04) -3.96 (-0.01) -3.99 (-0.05) -4.01 (-0.06) -0.02 -2.63 -3.72 -4.31 -4.60 -4.78 -4.91 -4.96 -5.02 -5.06 t

50% (observed for peak P with S/N = 5). In some cases, the standard deviation was relatively high and some exceeded the difference in the mean SSE between the two filters; the small sample t test was therefore applied to ascertain whether these differences were statistically significant. The small sample t statistic is given by [83]:

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2}{N_1 + N_2 - 2}\left(\dfrac{1}{N_1} + \dfrac{1}{N_2}\right)}} \qquad (4.27) $$

where $\bar{x}_1$, $\bar{x}_2$ are the two means with standard deviations $s_1$, $s_2$, and $N_1$, $N_2$ are the respective numbers of samples. The t statistic is given in the last column of the tables; here, the sign indicates whether the indirect filter is better (+) or poorer (-). The null hypothesis (that the means were drawn from the same population) was tested at the 0.05 level of significance on a two-tail distribution. If the absolute value of the t statistic was greater than 2.306, then the means were accepted as statistically different.

Table 4.1 shows that for peak A, the Savitzky-Golay filter is outperformed by the indirect filter at all orders and noise levels studied, as reflected by large positive values of t.
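The significance test can be illustrated with a short calculation. The sketch below uses the standard pooled-variance form of the small sample t statistic; the input values are hypothetical and are not taken from Tables 4.1 to 4.4. As an assumption made only for this illustration, the Savitzky-Golay result is passed first so that a positive t indicates a lower SSE for the indirect filter.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Small-sample t statistic using a pooled variance estimate."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))

# Hypothetical mean SSE and standard deviation over 20 noise sequences each.
t_value = pooled_t(0.010, 0.0010, 20,   # Savitzky-Golay filter (assumed first)
                   0.001, 0.0005, 20)   # indirect filter
critical = 2.306                        # critical value quoted in the text
print(t_value, abs(t_value) > critical)
```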
This can be explained by the fact that the leading edge of peak A exhibits a sharp rise from baseline, which makes it difficult for Savitzky-Golay filters to smooth without introducing significant distortion; much of the error is distributed in this area after filtering. Short filter lengths and low orders are favored and the mean SSE rises with filter order. Of the peaks studied, this trend is observed only for this one. The indirect filter is better here because only the points which define the peak are required: the sharp break can be avoided.

Indirect filtering is also superior on peak C, as shown in Table 4.2. Since the lead-in to peak C is much more gradual, longer filter lengths and thus higher filter orders can be used for the Savitzky-Golay filter. Consequently, the difference in mean SSE between the two filter types is reduced. At high S/N, the difference is still quite significant, but at low S/N, significance is lost when a twelfth order (or higher) Savitzky-Golay filter is employed. However, the optimal filter length (L) is about 347 and 545, respectively, for the twelfth and twentieth order Savitzky-Golay filters. The latter is longer than the actual peak data length! As noted above, experimental time increases with filter length. Note that the optimum order for the FIR implementation is beyond the twentieth.

Beyond peak D, a quadratic or quartic Savitzky-Golay filter is competitive with the indirect filter. A changeover in filter dominance is estimated to occur at peak G (eight-tank model). The reversal in filter performance is clearly evident with peak H. A reason for the turn-around is that the odd basis functions of the indirect filter contribute very little since the peak is quite symmetric. Hence, the filter coefficients for these are dominated by noise and do not contribute effectively to the recovery of the signal. This rationale is substantiated by the results for peak P, the Gaussian.
When both odd and even basis functions are used, the SSE for the indirect filter is relatively poor, particularly at low S/N. However, when only the even basis functions are used for fitting, the SSE is seen to be comparable and, in fact, better than that of the quadratic and quartic Savitzky-Golay filters at high S/N.

In our experience, flow injection peaks are usually highly skewed, as represented by peaks A to D. As such, the indirect filter approach is preferable, in practice, to that of the FIR when signal recovery is important. Moreover, only those points that define the peak are required. As shown below, in situations where (nearly) symmetric peaks are observed, the indirect Fourier filter offers performance comparable or superior to that of the Savitzky-Golay filter.

4.5 COMPARISON OF BASIS FUNCTIONS FOR INDIRECT FILTERING

The effectiveness of three basis functions, Gram, Fourier and Meixner, for filtering was evaluated on the set of peaks described above. To maintain a “level playing field”, mathematical tricks such as zero-padding of the data prior to Fourier approximation, and windowing [34], were not used. The former allows the Fourier basis to achieve higher spectral resolution and thus, better performance, in general. The latter is often used to minimize ripples in the smoothed data due to the Gibbs phenomenon, although a preliminary study found little difference in the results when a Hanning window was employed. Similar findings were also reported by Larivee and Brown [49] when they employed a trapezoid window.

Since the simulated peaks exhibit various degrees of skewness, this parameter is a natural focal point for this comparison. A graphical analysis is employed here. Actual values for both the minimum SSE and optimal order will be presented in the next sub-section, where these values will be used as reference for evaluation of automatic filter optimizers.
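A minimal sketch of an indirect Fourier filter, without zero-padding or windowing, is given below. It is an illustration under stated assumptions rather than the implementation used here: the filter order is taken to mean the highest harmonic retained, the Gaussian test peak and noise level are hypothetical, and the function name is invented for this example.

```python
import numpy as np

def indirect_fourier_filter(y, order):
    """Keep the mean and the first `order` harmonics; discard the rest."""
    spectrum = np.fft.rfft(y)
    spectrum[order + 1:] = 0.0
    return np.fft.irfft(spectrum, n=len(y))

n_points = 256
t = np.arange(n_points)
true_peak = np.exp(-0.5 * ((t - 128) / 30.0) ** 2)  # hypothetical Gaussian peak
rng = np.random.default_rng(1)
noisy = true_peak + rng.normal(scale=0.05, size=n_points)  # assumed S/N of 20

filtered = indirect_fourier_filter(noisy, order=5)
sse_raw = np.sum((noisy - true_peak) ** 2)
sse_filtered = np.sum((filtered - true_peak) ** 2)
print(sse_raw, sse_filtered)
```

Truncating the spectrum in this way is equivalent to a least-squares fit with the retained sines and cosines, since these are orthogonal over the full record.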
Note that some values for the Gram basis have already been tabulated in Tables 4.1-4.4. The peaks employed in the last section are also used here.

Figure 4.7 shows the effect of skewness on filter performance in terms of signal recovery. Each graph corresponds to one of the four S/N ratios. The mean and standard deviation of the minimum SSE are plotted. Similar trends are observed regardless of S/N. It is evident that the Fourier filter is best on peaks which are skewed little or not at all and that the Meixner filter performs best on those that are highly skewed. The latter is due to the fact that the Meixner polynomials resemble such peaks. Obviously, the performance of these two filters crosses over somewhere between these two extremes. The transition region occurs between skewness values of about 0.6 and 0.9. In this region, both filters perform equally well but the Fourier filter should be preferred due to faster computational speed. The Gram filter is seen to be a moderate performer in comparison to the other two; one way to improve its performance is described below. Where computational speed is important, however, it is a useful alternative to the Meixner filter for very skewed peaks. Table 4.5 summarizes these observations in the form of general guidelines. The typical flow injection peak will favor the use of the Meixner filter.

Figure 4.7 The effect of skewness on filter performance. Gram, Fourier and Meixner filters were compared at signal to noise levels of a) 100, b) 20, c) 10, and d) 5. (Horizontal axis: skewness.)

The SSE provides an overall measure of performance. Because gradient procedures utilize the entire peak, it is beneficial to examine the residuals (with respect to the true function) to get a feel for the distribution of error. Plotting the residuals for
all 20 noise sequences makes the resulting graphical analysis unwieldy. Hence, only the most normally-distributed of the 20 noise sequences, as determined by the Kolmogorov-Smirnov statistic [74], was used. Since only general observations are desired, this should not be a serious problem. The noise sequence of interest is plotted in Figure 4.8. Note that the noise values at the ends are biased positively. Residual plots are presented in Figures 4.9 - 4.11 for each filter. A S/N of 20 was used and all 8 peaks are depicted.

In general, the most error from Gram filtering occurs at the ends (consisting of about 6 to 8 points) of the data record; the middle portion is fitted best. This is because data at the ends rely more heavily on their own values, and use data further removed for estimation, than do those in the middle, a fact reflected in the rows of the estimation matrix. In effect, there is less averaging at the ends. Notice how the end residuals follow the bias in the end noise values. No peak shape dependency is observed. To minimize the “end errors”, one could acquire, say, 6 to 8 additional points before the peak and after the peak. For reasons stated in the last section, only the post-peak error can be minimized effectively in this way for peaks resembling peak A.

Table 4.5 Indirect filter recommended for use in FIA based on peak skewness

Skewness                 Recommended filter
greater than 0.9         Meixner (Gram if time is a factor)
between 0.6 and 0.9      any of the three
less than 0.6            Fourier

Figure 4.8 The most normally-distributed noise sequence out of the set generated. (Horizontal axis: data point.)
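The remark that the end points rely heavily on their own values can be checked numerically from the diagonal of the estimation matrix. The sketch below is illustrative only: it assumes a polynomial least-squares fit over equally spaced points as a stand-in for the Gram filter, and the record length and order are arbitrary choices.

```python
import numpy as np

n_points, order = 101, 8  # arbitrary illustrative values
x = np.linspace(-1.0, 1.0, n_points)
q, _ = np.linalg.qr(np.polynomial.legendre.legvander(x, order))

# Estimation matrix: filtered = estimation @ raw. Entry (i, i) is the weight
# that filtered point i places on its own raw value.
estimation = q @ q.T
self_weight = np.diag(estimation)

print(self_weight[0], self_weight[n_points // 2], self_weight[-1])
```

The self-weights sum to the number of fitted coefficients, and the values at the two ends are several times larger than those in the middle of the record, which is the reduced averaging described above.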
Figure 4.9 Residuals from indirect Gram filtering of simulated peaks (labeled by capital letter). Residuals were computed from the most normal noise sequence used in the study. (Horizontal axis: data point.)

Figure 4.10 Residuals from indirect Fourier filtering of simulated peaks (labeled by capital letter). Residuals were computed from the most normal noise sequence used in the study. (Horizontal axis: data point.)

Figure 4.11 Residuals from indirect Meixner filtering of simulated peaks (labeled by capital letter). Residuals were computed from the most normal noise sequence used in the study. (Horizontal axis: data point.)

The errors for Fourier filtering are distributed evenly across the time axis, except with peak A, where end errors are noticeable. It was observed that truncation of peak A in the frequency domain was much more drastic than for the other peaks. It is well known that, in such circumstances, the Fourier approximation exhibits the Gibbs phenomenon. The lack of significant end errors with Fourier filtering for most of these peaks is another reason to prefer its use over the Gram filter.
The results for the Meixner filter are quite different from the other two. The distribution of error is asymmetric, being greatest at the first part of the data. The portion of the peak after peak maximum can be recovered extremely well, better than by the other two filters. The fact that the Meixner polynomials are themselves weighted decaying exponentials accounts for the filter's effectiveness here. This can be exploited for gradient methods. If the low magnitude part of the rising portion of these peaks is avoided, good performance is also observed for the leading edge of the peak prior to peak maximum. Note that the zero error observed at the zeroth data point for each case is artificial since the Meixner polynomials are defined to begin at zero and the models were also made to start with zero.

The indirect filter approach together with these three bases should provide enough versatility to handle most if not all types of peak-shaped transient signals encountered in flow injection analysis. In the next section, two mathematical methods, the AIC and cross-validation, are presented to provide a means for the filter operation to be optimized automatically.

4.6 EVALUATION OF AUTOMATIC NEAR-OPTIMAL SMOOTHING CRITERIA FOR INDIRECT FILTERING

In Figure 4.6, it was shown that plots of SSE versus filter order assume an “L” shape with the minimum SSE value coinciding with the break. Analogous graphs for AIC and PRESS criteria are depicted in Figure 4.12.

Figure 4.12 Plots of a) AIC and b) PRESS against Gram filter order for peak C and 20 Gaussian-distributed random noise sequences. The S/N ratio was 20. Note log scale for ordinate of graph (b). (Horizontal axis: filter order, 0 to 25.)

While the SSE against filter order curves exhibit an obvious break at the optimal filter order, the break in the AIC and
PRESS curves is more subtle (and is, in fact, difficult to notice without greater scale expansion). Furthermore, more than one minimum can exist. In such cases, the order corresponding to the first is taken to be correct, provided that a smaller value does not occur within two filter orders. The stipulation imposed enhances robustness; some justification for doing this is given later. When more than one minimum is present for these simulated peaks, the global minimum was observed to correspond to much higher orders. High order global minima are suspected to arise when a high order basis function correlates well with the given noise sequence. This provides an explanation for their sporadic appearances. Close inspection of the plots in Figure 4.12 will confirm that the two types of curves do differ. Ignoring scale, their general shapes are very similar and so it is of no surprise that the AIC and cross-validation are comparable, as shown below.

The ability of the AIC and cross-validation to determine the optimal filter order is summarized in Tables 4.6-4.8 for each filter studied. As above, only the data for peaks A, C, H, and P are presented. Values under the “Best” category refer to the true values. The SSE values listed under the “AIC” and “Cross-validation” categories refer to the SSE obtained at the optimum filter order determined by that criterion. For each filter, there is very close agreement between the optimal filter orders (and minimum SSE) determined via the AIC and cross-validation. In general, good agreement between these and the true optimal filter orders is also observed. Differences between the means of the true SSE and that determined by the two criteria are within the standard deviation of the former. Exceptions are “boxed” in the tables. Only 2 of the 48 tabulated cases deviate; this is quite acceptable. The case for indirect Fourier filtering on peak A at a S/N of 100 is particularly deviant.
The Fourier spectrum for this peak contains small but significant components at relatively high frequencies drawn out over a broad bandwidth (see Figure 3.5). Because this region is also quite shallow, small changes in noise level can greatly affect the optimal cut-off frequency; this is reflected by the high standard deviation in the optimal filter order. Such problems are a reflection of the shortcomings of the indirect filter itself rather than the AIC or cross-validation procedures and they can be avoided simply by choosing a more efficient set of basis functions; in this case, the Meixner polynomials. Hence, the fact that both the AIC and cross-validation yield good results does not diminish the importance of choosing an appropriate indirect filter. It should also be noted that, in practice, both procedures need not be scanned over the span of orders employed above. The procedure should be aborted after the first minimum is encountered, subject to the condition stated above. Further, it is not always necessary to begin the procedure at the zeroth order basis function. 
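For orientation, the two criteria can be computed along a scan of filter orders as sketched below. The forms used here are assumptions made for illustration and are not quoted from this chapter: the AIC is taken as ln(SSE/N) + 2p/N for p fitted coefficients, and cross-validation is taken to be leave-one-out, for which the PRESS of a linear least-squares fit follows from the ordinary residuals and the diagonal of the estimation matrix without refitting. The orthonormal polynomial basis again stands in for the Gram basis, and the test peak is hypothetical.

```python
import numpy as np

def polynomial_basis(n_points, max_order):
    """Orthonormal polynomial basis over equally spaced points (via QR)."""
    x = np.linspace(-1.0, 1.0, n_points)
    q, _ = np.linalg.qr(np.polynomial.legendre.legvander(x, max_order))
    return q

def aic_and_press(y, basis, order):
    """Return (AIC, PRESS) for a fit with basis functions 0..order."""
    b = basis[:, :order + 1]
    residuals = y - b @ (b.T @ y)
    n = y.size
    # Assumed AIC form for least squares with Gaussian errors.
    aic = np.log(np.sum(residuals ** 2) / n) + 2.0 * (order + 1) / n
    # Leave-one-out residuals follow from the leverages without refitting.
    leverage = np.sum(b ** 2, axis=1)
    press = np.sum((residuals / (1.0 - leverage)) ** 2)
    return aic, press

rng = np.random.default_rng(2)
t = np.arange(95, dtype=float)
peak = t ** 3 * np.exp(-t / 8.0)  # hypothetical skewed peak
y = peak / peak.max() + rng.normal(scale=0.05, size=t.size)

basis = polynomial_basis(y.size, 15)
curves = np.array([aic_and_press(y, basis, k) for k in range(16)])
aic_curve, press_curve = curves[:, 0], curves[:, 1]
print(int(np.argmin(aic_curve)), int(np.argmin(press_curve)))
```

In practice the scan would be stopped at the first acceptable minimum rather than carried to the highest order, as noted above.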
For 0.001 58 0.03303 0.11703 0.40271 1.05 1.28 1.02 1.02 16.40 13.50 12.10 10.90 100.00 20.00 10.00 5.00 0.000 48 0.011 50 0.04210 0.13871 0.000 46 0.01138 0.04332 0.14444 0.001 52 0.03141 0.112 12 0.38396 1.00 1.39 1.21 1.47 15.95 13.05 12.00 10.45 100.00 20.00 10.00 5.00 0.000 48 0.009 88 0.03582 0.13853 14.75 11.35 9.65 9.05 100.00 20.00 10.00 5.00 0.001 38 0.027 22 0.08794 0.31368 0.60 0.51 0.59 0.67 10.05 8.95 8.35 7.85 0.64 1.73 1.39 0.51 mean SSE 0.000 33 0.008 65 0.03201 0.124 22 Best 0.000 88 0.019 09 0.070 85 0.26681 Optimal order mean AIC mean 0.91 0.92 0.91 1.04 0.001 0.022 0.085 0.321 07 88 01 95 0.83 1.42 1.26 1.03 0.00153 0.03319 0.10831 0.39080 0.50 0.76 1.67 1.39 0.00159 0.035 40 0.14312 0.46353 15.60 13.10 11.75 10.50 1.05 1.21 1.33 1.57 0.001 73 0.03705 0.13982 0.52469 Peak P (Gaussian) 15.40 12.45 11.20 9.60 Peak H (Tanks-in-series: 9 tanks) 13.80 10.85 9.70 8.70 Peak C (Tanks-in-series: 4 tanks) 10.10 8.70 8.25 7.65 0.000 53 0.01203 0.04591 0.191 24 0.000 53 0.013 63 0.04800 0.191 74 0.000 47 0.010 85 0.04713 0.19584 0.00048 0.011 03 0.033 12 0.14312 SSE Peak A (Tanks-in-series: 2 tanks) Optimal order mean Comparison of optimization results for the indirect Gram filter 100.00 20.00 10.00 5.00 S/N Table 4.6 15.65 12.80 11.45 10.05 15.30 12.40 10.85 9.60 13.90 10.45 9.50 8.80 9.95 8.75 8.35 7.55 1.42 1.36 1.36 1.50 0.73 0.75 1.63 1.39 1.12 1.54 1.10 1.15 0.76 0.97 1.04 1.05 0.001 79 0.03931 0.14482 0.52469 0.00168 0.034 84 0.14580 0.46353 0.001 55 0.03246 0.10728 0.39744 0.001 05 0.022 75 0.08862 0.32253 0.000 53 0.01745 0.04960 0.19330 0.000 63 0.013 93 0.05510 0.191 74 0.000 51 0.01101 0.051 13 0.20296 0.00046 0.011 43 0.04584 0.142 46 Cross-validation Optimal order SSE mean mean CD (Q 0.47 0.66 0.51 0.61 0.31 0.44 0.39 0.50 7.30 5.70 4.95 4.50 5.10 4.25 4.05 3.60 100.00 20.00 10.00 5.00 100.00 20.00 10.00 5.00 1.30 0.99 0.75 0.72 13.00 8.40 6.85 5.75 100.00 20.00 10.00 5.00 5.19 2.09 1.70 1.19 39.85 18.55 12.45 8.95 Optimal order mean 0.00042 
0.009 58 0.038 11 0.12848 0.000 47 0.01064 0.041 12 0.15468 0.001 34 0.02492 0.08868 0.30926 0.000 86 0.018 42 0.070 64 0.25524 0.000 63 0.01177 0.04517 0.168 12 0.002 60 0.04015 0.12922 0.412 75 0.000 99 0.015 62 0.055 41 0.18531 SSE 69 93 30 19 mean 0.010 0.107 0.290 0.789 Best mean 3.60 2.65 1.96 1.39 I 0.012 63 0.123 07 0.335 49 0.91587 1.33 1.23 1.00 0.72 0.002 0.047 0.151 0.475 87 21 34 86 0.69 0.75 0.91 0.95 0.001 48 0.028 15 0.11020 0.393 48 5.20 4.30 3.95 3.35 0.52 0.57 0.76 0.59 0.000 96 0.02102 0.088 94 0.299 53 Peak P (Gaussian) 7.50 5.65 4.75 4.20 Peak H (Tanks-in-series: 9 tanks) 11.75 8.05 5.95 4.75 Peak C (Tanks-in-series: 4 tanks) 30.75 15.80 11.05 8.15 I 0.000 50 0.01102 0.049 95 0.153 19 0.000 56 0.010 11 0.04464 0.203 98 0.000 58 0.015 47 0.05934 0.194 02 0.001 65 0.015 52 0.059 79 0.20317 SSE Peak A (Tanks-in-series: 2 tanks) Optimal order mean AIC Comparison of optimization results for the indirect Fourier filter 100.00 20.00 10.00 5.00 SIN Table 4.7 5.25 4.15 4.00 3.35 7.60 5.55 4.70 4.00 11.65 8.05 6.00 4.75 29.40 15.05 10.40 8.15 0.55 0.37 0.56 0.59 0.82 0.83 0.92 0.79 1.39 1.23 1.03 0.72 3.82 1.93 1.98 1.50 I 0.000 0.019 0.081 0.294 97 62 20 98 0.001 53 0.02957 0.10820 0.38462 0.002 98 0.046 20 0.15033 0.477 73 0.013 35 0.12195 0.342 26 0.91307 I 0.000 0.010 0.043 0.143 49 22 82 32 0.000 57 0.01206 0.04090 0.181 85 0.000 73 0.014 63 0.05796 0.188 26 0.00276 0.016 40 0.05912 0.23345 Cross-validation Optimal order SSE mean mean CD CD - 2. 
‘1 0.00036 0.010 23 0.035 12 0.12028 0.00095 0.023 20 0.089 54 0.33009 0.00161 0.03540 0.13237 0.48464 0.66 0.41 0.88 1.02 0.89 1.00 0.97 1.01 8.30 7.20 6.60 6.25 16.45 14.05 12.00 11.20 100.00 20.00 10.00 5.00 100.00 20.00 10.00 5.00 0.000 44 0.01070 0.04230 0.156 11 0.00031 0.00766 0.02984 0.10227 0.00057 0.01424 0.059 00 0.21998 100.00 20.00 10.00 5.00 0.37 0.31 0.62 0.83 1.40 1.30 1.25 1.35 100.00 20.00 10.00 5.00 4.15 4.10 4.20 3.45 mean SSE 0.000 18 0.00407 0.016 12 0.06394 Best 0.00029 0.006 89 0.027 72 0.111 68 Optimal order mean AIC mean 0.83 0.75 0.83 0.99 I 0.00041 0.009 54 0.03924 0.17864 1.19 0.83 0.86 1.11 0.00082 0.018 55 0.076 52 0.322 18 0.67 0.81 0.99 1.00 0.001 11 0.025 30 0.111 19 0.41586 16.05 13.15 11.65 10.20 1.00 1.14 1.69 1.40 0.00172 0.04054 0.16989 0.558 06 Peak P (Gaussian) 8.15 6.85 6.65 5.55 Peak H (Tanks-in-series: 9 tanks) 4.95 4.50 4.30 3.80 Peak C (Tanks-in-series: 4 tanks) 1.45 1.40 1.45 1.60 I 0.000 43 0.011 37 0.05395 0.17536 0.00051 0.010 94 0.049 37 0.15719 0.00044 0.00849 0.037 54 0.11972 0.000 31 0.007 80 0.03205 0.15658 SSE Peak A (Tanks-in-series: 2 tanks) Optimal order mean Comparison of optimization results for the indirect Meixner filter 0.50 0.47 0.44 0.49 S/N Table 4.8 15.80 13.20 11.15 9.70 7.95 7.00 6.95 5.70 4.75 4.75 4.25 3.60 1.20 1.10 1.20 1.25 1.15 1.36 1.31 1.49 1.00 0.97 1.19 1.38 1.21 1.29 1.16 1.05 0.62 0.45 0.62 0.64 0.001 77 0.04205 0.16944 0.606 59 0.001 14 0.025 11 0.109 80 0.41795 0.00076 0.01944 0.08061 0.31039 0.000 35 0.007 72 0.03288 0.13708 0.000 53 0.01009 0.05617 0.18970 0.00043 0.010 53 0.05353 0.16939 0.00044 0.010 79 0.04377 0.11339 0.00027 0.00624 0.026 15 0.101 82 Cross-validation Optimal order SSE mean mean ‘.3 0 0 E. . . ‘.1 (‘ Digital Filtering of Flow Injection Data 201 example, when the Gram filter is used, one can be sure that functions up to at least the second order will be required to fit a peak-shaped transient. Hence, un-necessary work can often be avoided. 
In terms of effectiveness, these criteria are comparable based on their performance on simulated data. The AIC is simpler to compute and currently is the faster of the two, which leads one to recommend it over cross-validation. However, it is clear from the results above that the two methods are not redundant and can produce different answers. This is important when real data are involved since cyclic noise [20] may also be present and multiple minima could be more prevalent. If time is not a premium commodity, the use of both independent criteria can reduce the risk of choosing the wrong filter order. This is demonstrated in the example below.

4.7 APPLICATION TO A GRADIENT CALIBRATION EXAMPLE

Exploitation of the reproducible concentration gradient in FIA has led to a variety of novel experimental techniques, one of which is gradient calibration. The method is relatively simple. The following description will make references to Figure 4.13. Measurement of analyte concentration is commonly based on maximum peak height (Cmax). However, one could very well choose the height (Ck) at any other point k along the gradient for quantitation because every part of the gradient can be thought of as being composed of an element of fluid exhibiting sample concentrations defined by the amount of dispersion at that instant in time. Gradient calibration utilizes this fact to avoid inefficient, repetitive calibration by sequential dilution and injection of standards since (reproducible) dilution is already effected via sample dispersion. By performing a calibration in the usual manner first, as illustrated on the right side of the figure, the response-time profile of the peak obtained for the most concentrated standard can be mapped into a concentration-time profile, yielding a non-parametric calibration model (left side of figure). After that, all subsequent calibrations can be performed with a single injection.
The empirical nature of the method is necessitated by the lack of general, accurate calibration models linking concentration as a function of time, although simple expressions have been derived for systems in which chemical reaction is absent and for which a one tank model applies [84].

Figure 4.13 Principle of gradient calibration. On the right is the usual calibration method in which replicate injections of standard solutions are made. This yields a calibration curve of, say, absorbance (A) against analyte concentration (middle), which can then be used to obtain a relationship between time and concentration for a single standard injection as shown on the left.

Recently, gradient calibration was employed to rapidly quantify the interferent effects of acetic acid on the determination of magnesium by inductively coupled plasma - optical emission spectroscopy [73]. In that study, emission intensity at 279.55 nm, corresponding to the Mg(II) line, was mapped as a function of both analyte and interferent concentration. With two factors evaluated over N levels each, the number of experiments required would be N^2, excluding replicates. By exploiting the gradient for one factor, say analyte concentration, only N experiments would be needed, one for each interferent concentration. The analyte gradient also allowed a greater number of analyte concentrations to be examined. Such elegance provides much impetus for the adoption of gradient calibration whenever possible. Unfortunately, the flow injection peaks obtained were noisy, even after ensemble averaging of seven replicates. This included the peak for zero interferent concentration, which served as the calibration model. Conventional wisdom would decree the model to be unacceptable, otherwise segment fitting [85] would be the dominant method for calibration rather than least squares fitting.
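The mapping step at the heart of gradient calibration can be sketched as follows. All numerical values are hypothetical and serve only to illustrate the principle: a conventional calibration curve is inverted by linear interpolation (an assumption of this sketch; any monotonic calibration model would do) and applied to the response-time profile of the most concentrated standard to give a concentration-time profile.

```python
import numpy as np

# Hypothetical conventional calibration: replicate-averaged detector
# response for a series of standards (illustrative values only).
std_conc = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 10.0])
std_response = np.array([0.0, 0.21, 0.40, 0.58, 0.74, 0.88])

def response_to_concentration(response):
    """Invert the calibration curve by linear interpolation.

    np.interp needs increasing x-values, so the response must be a
    monotonically increasing function of concentration.
    """
    return np.interp(response, std_response, std_conc)

# Hypothetical response-time profile of the most concentrated standard.
t = np.arange(60, dtype=float)
shape = t ** 2 * np.exp(-t / 6.0)
response_profile = shape / shape.max() * std_response[-1]

# Concentration-time profile: the effective concentration at each point k.
conc_profile = response_to_concentration(response_profile)
k = 30
print(t[k], conc_profile[k])
```

Any noise in the response profile is carried directly into the concentration-time profile, which is why the quality of the calibration peak matters.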
This is one problem in which digital filtering could be used to advantage, especially when an indirect filter is utilized. Figure 4.14a shows the averaged peak used for calibration. This peak differs from the one reported previously [73] in that only one (2759) of the three diodes (2759 - 2761) with the most intense readings was used to define it, rather than an average of all three. The reading from diode 2759 exhibited the best overall S/N of all the diodes used to measure the Mg(II) line. This was visually evident. Use of the other two diodes leads to decreased S/N. The leading and trailing baselines have also been removed so that the peak now consists of 95 points. Noise is clearly seen on this peak and its top is slightly flattened as a result. All three digital filters and both optimization criteria were applied. The results are presented in Table 4.9. The global minimum values for the AIC and PRESS are boxed. The various local minima are underlined. The AIC and PRESS curves show much more noise here than with the simulated peaks. Excellent agreement is observed for the optimal filter orders determined by the two criteria. The optimal filter orders were 11, 7 and 5 for the Gram, Fourier and Meixner filters, respectively. Note that for the Meixner filter, the AIC has a global minimum at filter order 16 and not 5, as indicated. However, the PRESS does not show a global minimum within the same vicinity, only a local minimum. Hence, it can be inferred that the AIC global minimum is false; the second best minimum, at filter order 5, should be accepted as correct. Finally, for the Gram filter, the optimum order determined by cross-validation was selected, for the sake of computational economy. The filtered peaks using the optimal filter orders are shown in Figure 4.14(b-d). Clearly, any one of these serves as a better calibration model.
Over the main portion of the peak, from the 20th to 70th data point, the three are comparable. The largest differences were observed between the Gram and Fourier results, for which the maximum difference was 4.86 ADC units (0.93% of peak maximum) and the mean difference was 2.35 ADC units (0.45% of peak maximum).

Figure 4.14 Near-optimal indirect digital filtering of a flow injection calibration peak. a) Original calibration peak with noise; b) peak after indirect Fourier smoothing; c) peak after indirect Gram filtering; d) peak after indirect Meixner filtering. Dotted curves in (b), (c) and (d) are reproductions of the curve in (a).

Table 4.9 Calibration peak: AIC and cross-validation (PRESS) values for three filters

Filter    Gram filter            Fourier filter         Meixner filter
order     AIC       PRESS        AIC       PRESS        AIC       PRESS
  0       10.4150   3.104E+06    10.4150   3.104E+06    9.8176    1.722E+06
  1        9.6936   1.489E+06     8.7099   5.430E+05    7.3388    1.596E+05
  2        9.6242   1.372E+06     7.1552   1.107E+05    4.0960    7.114E+03
  3        8.4315   4.089E+05     5.7664   2.705E+04    4.0036    6.125E+03
  4        6.5090   6.496E+04     4.6976   9.523E+03    4.0247    6.882E+03
  5        6.4617   6.826E+04     4.1222   5.731E+03    3.8399    4.590E+03
  6        5.9580   4.124E+04     3.9841   5.209E+03    3.8609    4.848E+03
  7        5.1178   1.737E+04     3.9395   4.955E+03    3.8623    4.624E+03
  8        4.1500   5.987E+03     3.9794   5.211E+03    3.8400    4.591E+03
  9        4.0140   5.562E+03     3.9888   5.329E+03    3.8537    5.998E+03
 10        4.0281   5.607E+03     3.9713   5.182E+03    3.8748    5.467E+03
 11        4.0087   5.290E+03     3.9626   5.161E+03    3.8681    4.702E+03
 12        3.9235   5.694E+03     3.9282   5.427E+03    3.8584    2.705E+04
 13        3.9241   6.941E+03     3.9665   5.737E+03    3.8584    4.555E+03
 14        3.9247   6.381E+03     4.0029   5.912E+03    3.8577    4.821E+05
 15        3.9257   6.848E+03     4.0101   6.270E+03    3.8084    1.668E+05
 16        3.9418   7.538E+03     4.0016   6.218E+03    3.7970    2.656E+04
 17        3.9618   9.101E+03     4.0280   6.403E+03    3.8181    3.923E+04
 18        3.9781   1.120E+04     4.0136   6.402E+03    3.8372    2.062E+04
 19        3.9982   1.895E+04     4.0439   6.793E+03    3.8530    4.812E+05
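The practice of cross-checking the AIC minimum against the PRESS minimum can be sketched in C as follows. The selection rule coded here (accept the lowest local AIC minimum that lies within a small window of the PRESS global minimum) is an assumption made for illustration, as are the function names and the `window` parameter; it is not presented as the exact procedure used in this work.

```c
#include <stdlib.h>

/* Index of the smallest element of v[0..n-1]. */
int argmin(const double *v, int n)
{
    int i, best = 0;
    for (i = 1; i < n; i++)
        if (v[i] < v[best]) best = i;
    return best;
}

/* 1 if v[i] is strictly smaller than its existing neighbours. */
static int is_local_min(const double *v, int n, int i)
{
    if (i > 0 && v[i - 1] <= v[i]) return 0;
    if (i < n - 1 && v[i + 1] <= v[i]) return 0;
    return 1;
}

/* Hypothetical rule: among the local minima of the AIC curve, return
   the filter order with the lowest AIC that lies within `window`
   orders of the PRESS global minimum. Returns -1 if there is none. */
int select_order(const double *aic, const double *press, int n, int window)
{
    int p = argmin(press, n), best = -1, i;
    for (i = 0; i < n; i++) {
        if (!is_local_min(aic, n, i)) continue;
        if (abs(i - p) > window) continue;
        if (best < 0 || aic[i] < aic[best]) best = i;
    }
    return best;
}
```

With this rule, an AIC global minimum that has no PRESS minimum nearby is passed over in favour of the next best AIC minimum that the second criterion supports.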
At the tailing end of the peak, the Meixner filtered version appears to maintain a monotonic decrease while both the Gram and Fourier filtered peaks follow the local noise structure to some extent. Hence, if very low concentrations need to be calibrated, the Meixner filter is preferred.

4.8 SUMMARY

A comparison between least-squares smoothing (Savitzky-Golay filtering) and least squares fitting (indirect filtering) using the Gram polynomials demonstrated that the latter is better suited for filtering typical flow injection peaks (peaks which are significantly skewed). Three sets of basis functions, the complex exponentials and the Gram and Meixner polynomials, were evaluated for indirect filtering. The Meixner filter was found to be best for highly skewed peaks while the Fourier filter was best for more symmetric peaks. The Gram filter performed somewhere in between. The indirect filtering problem of determining the optimal filter cut-off order was cast and treated in terms of hierarchical model selection. Two solutions were presented: the AIC and cross-validation. Both methods were able to yield near-optimal results (based on the SSE measure). However, they have not been demonstrated to be foolproof. A verdict can only be reached after they have been evaluated on real and varied experimental data over a long period of time. Until then, heavy reliance on their capabilities is discouraged. As Larivee and Brown [49] have noted, such methods should not be viewed as a panacea for poor data acquisition. Nevertheless, the combination of either criterion with indirect filtering should enhance the reliability of automated flow injection analyzers by increasing the precision of i) quantitative measurements and ii) peak shape descriptors used in assessment of peak quality and whose values must be evaluated numerically. The approach was applied to a gradient calibration peak model which was corrupted by noise.
The resulting filtered peaks exhibited minimal noise.

REFERENCES

[1] M. Tribus, in The Maximum Entropy Formalism, R. D. Levine and M. Tribus (eds.), 1978, 1.
[2] R. R. Ernst, in Advances in Magnetic Resonance, Vol. 2, J. S. Waugh (ed.), Academic Press: New York, 1966, 1.
[3] A. Savitzky and M. E. Golay, Anal. Chem., 36 (1964), 1627.
[4] P. M. Shiundu, Automated Methods Development in Flow Injection Analysis, Doctoral Thesis, University of British Columbia, 1991.
[5] T. A. Maldacker, J. E. Davis and L. B. Rogers, Anal. Chem., 46 (1974), 637.
[6] C. A. Bush, Anal. Chem., 46 (1974), 890.
[7] T. Inouye, T. Harper and N. C. Rasmussen, Nucl. Instrum. Methods, 67 (1969), 125.
[8] M. U. A. Bromba and H. Ziegler, Anal. Chem., 56 (1984), 2052.
[9] G. Horlick and G. M. Hieftje, in Contemporary Topics in Analytical and Clinical Chemistry, Vol. 3, D. M. Hercules, G. M. Hieftje, L. R. Snyder and M. A. Evenson (eds.), Plenum Press: New York, 1977, 153.
[10] M. U. A. Bromba and H. Ziegler, Anal. Chem., 53 (1981), 1583.
[11] G. M. Hieftje, Anal. Chem., 44 (1972), 69A.
[12] B. Szostek and M. Trojanowicz, Anal. Chim. Acta, 261 (1992), 509.
[13] I. D. Brindle and S. Zheng, Analyst, 117 (1992), 1925.
[14] C. S. Williams, Designing Digital Filters, Prentice-Hall: Englewood Cliffs, 1986, p. 2.
[15] R. Saal, U. Mitarb and W. Entenmann, Handbook of Filter Design, AEG-Telefunken, 1979, p. 9.
[16] A. Gelb (ed.), Applied Optimal Estimation, M.I.T. Press: Cambridge, 1974.
[17] C. S. Williams, Designing Digital Filters, Prentice-Hall: Englewood Cliffs, 1986, p. 5.
[18] T. W. Parks and C. S. Burrus, Digital Filter Design, John Wiley and Sons: New York, 1987.
[19] Ibid., p. 3.
[20] S. E. Bialkowski, Anal. Chem., 60 (1988), 355A.
[21] S. D. Brown, Anal. Chim. Acta, 181 (1986), 1.
[22] L. Ljung and T. Söderström, Theory and Practice of Recursive Identification, M.I.T. Press: Cambridge, 1983.
[23] E. O. Brigham, The Fast Fourier Transform, Prentice-Hall: Englewood Cliffs, 1974, pp. 83-87.
[24] R. B. Blackman, Data Smoothing and Processing, Addison-Wesley: Reading, 1965.
[25] A. V. Oppenheim and R. W. Schafer, Digital Signal Processing, Prentice-Hall: Englewood Cliffs, 1975.
[26] R. W. Hamming, Digital Filters, Prentice-Hall: Englewood Cliffs, 1977.
[27] C. S. Williams, Designing Digital Filters, Prentice-Hall: Englewood Cliffs, 1986.
[28] Ibid., pp. 32-35.
[29] Ibid., p. 29.
[30] M. U. A. Bromba and H. Ziegler, Anal. Chem., 51 (1979), 1760.
[31] E. O. Brigham, The Fast Fourier Transform, Prentice-Hall: Englewood Cliffs, 1974, Ch. 4.
[32] C. S. Williams, Designing Digital Filters, Prentice-Hall: Englewood Cliffs, 1986, pp. 45-48.
[33] G. Horlick, Anal. Chem., 44 (1972), 943.
[34] W. H. Press, B. P. Flannery, S. A. Teukolsky and W. T. Vetterling, Numerical Recipes: The Art of Scientific Computing, Cambridge University Press: Cambridge, 1987, Ch. 12.
[35] A. Ralston, A First Course in Numerical Analysis, McGraw-Hill: New York, 1965, p. 250.
[36] S. E. Bialkowski, Anal. Chem., 61 (1989), 1308.
[37] C. C. Enke and T. A. Nieman, Anal. Chem., 48 (1976), 705A.
[38] P. D. Wilson and T. H. Edwards, Appl. Spectrosc. Rev., 12 (1976), 1.
[39] S. Jagannathan and R. C. Patel, Anal. Chem., 58 (1986), 421.
[40] P. D. Wentzell, T. P. Doherty and S. R. Crouch, Anal. Chem., 59 (1987), 367.
[41] T. W. Parks and C. S. Burrus, Digital Filter Design, John Wiley and Sons: New York, 1987, p. 20.
[42] R. W. Hamming, Digital Filters, Prentice-Hall: Englewood Cliffs, 1977, pp. 13-14.
[43] T. W. Parks and C. S. Burrus, Digital Filter Design, John Wiley and Sons: New York, 1987, p. 59.
[44] Ibid., Ch. 3.
[45] D. W. Kirmse and A. W. Westerberg, Anal. Chem., 43 (1971), 1035-1039.
[46] D. P. Binkley and R. E. Dessy, Anal. Chem., 52 (1980), 1335.
[47] R. B. Lam and T. L. Isenhour, Anal. Chem., 53 (1981), 1179.
[48] A. Felinger, T. L. Pap and J. Inczédy, Anal. Chim. Acta, 248 (1991), 441.
[49] R. J. Larivee and S. D. Brown, Anal. Chem., 64 (1992), 2057.
[50] J. Rissanen, in System Identification: Advances and Case Studies, R. K. Mehra and D. C. Lainiotis (eds.), Academic Press: New York, 1976, 97.
[51] E. T. Jaynes, Phys. Rev., 106 (1957), 620.
[52] H. Akaike, in Trends and Progress in System Identification, P. Eykhoff (ed.), Pergamon Press: Elmsford, 1981, 169.
[53] T. Söderström, Int. J. Control, 26 (1977), 1.
[54] J. Schoukens and R. Pintelon, Identification of Linear Systems: A Practical Guideline to Accurate Modelling, Pergamon Press: New York, 1991, Ch. 5.
[55] R. A. Fisher, Proc. Cambridge Philos. Soc., 22 (1925), 700.
[56] H. Akaike, Ann. Inst. Statist. Math., Tokyo, 21 (1969), 243.
[57] H. Akaike, "Information theory and an extension of the maximum likelihood principle," 2nd International Symposium on Information Theory, B. N. Petrov and F. Csáki (eds.), Budapest: Akadémiai Kiadó, 1973, 267.
[58] K. J. Åström and T. Bohlin, "Numerical identification of linear dynamic systems from normal operating records," IFAC Symposium on Self-Adapting Systems, Teddington, England, 1965.
[59] E. Parzen, IEEE Trans. Autom. Control, 19 (1974), 723.
[60] M. Stone, J. R. Stat. Soc., B36 (1974), 111.
[61] M. Stone, J. R. Stat. Soc., B39 (1977), 44.
[62] L. Ljung, System Identification: Theory for the User, Prentice-Hall: Englewood Cliffs, 1987, p. 417.
[63] Ibid., p. 420.
[64] S. Kullback and R. A. Leibler, Ann. Math. Statist., 22 (1951), 79.
[65] M. A. Sharaf, D. L. Illman and B. R. Kowalski, Chemometrics, John Wiley and Sons: New York, 1986, pp. 254-255.
[66] L. Staahle and S. Wold, J. Chemom., 1 (1987), 185.
[67] A. T. Brunger, Acta Crystallogr., D49 (1993), 24.
[68] K. J. Coakley, IEEE Trans. Nucl. Sci., 38 (1991), 9.
[69] F. J. Samper and P. S. Neuman, Water Resour. Res., 25 (1989), 373.
[70] H. Mager, Quant. Struct.-Act. Relat. Pharmacol., Chem. Biol., 3 (1984), 147.
[71] G. Wahba and S. Wold, Comm. Statist., 4 (1975), 1.
[72] S. Wold and M. Sjöström, J. Chemom., 1 (1987), 243.
[73] T. D. Hettipathirana, Effects of Organic Acids in Inductively Coupled Plasma Optical Emission Spectroscopy, Masters Thesis, University of British Columbia, 1989.
[74] R. von Mises, Mathematical Theory of Probability and Statistics, Academic Press: New York, 1964, Ch. IX.
[75] G. E. Forsythe, M. A. Malcolm and C. B. Moler, Computer Methods for Mathematical Computations, Prentice-Hall: Englewood Cliffs, 1977, Ch. 10.
[76] J. Steinier, Y. Termonia and J. Deltour, Anal. Chem., 44 (1972), 1906.
[77] H. H. Madden, Anal. Chem., 50 (1978), 1383.
[78] A. Proctor and P. M. A. Sherwood, Anal. Chem., 52 (1980), 2315.
[79] R. A. Leach, C. A. Carter and J. M. Harris, Anal. Chem., 56 (1984), 2304.
[80] P. A. Gorry, Anal. Chem., 62 (1990), 570.
[81] J. F. Tyson, Analyst, 112 (1987), 523.
[82] R. W. Hamming, Digital Filters, Prentice-Hall: Englewood Cliffs, 1977, p. 35.
[83] J. E. Freund, Modern Elementary Statistics, 7th edn., Prentice-Hall: Englewood Cliffs, 1988, p. 313.
[84] J. Ruzicka and E. H. Hansen, Flow Injection Analysis, 2nd edn., John Wiley and Sons: New York, 1988, pp. 51-52.
[85] J. Workman Jr. and H. Mark, Spectrosc., 7 (1992), 14.

Chapter: Conclusions and Further Work

A mathematician knows how to solve a problem, but he can't do it.
WILLIAM E. MILNE

Modeling methods based on orthogonal transformations have been developed to treat two problems of interest in automation of flow injection systems: peak shape analysis and digital filtering. Three families of discrete orthogonal functions were employed: discrete complex exponential functions, Gram polynomials and Meixner polynomials.

First, a strategy for peak shape quantitation has been devised as a means to identify faulty analyzer operation (analytical and mechanical).
Decomposition of a flow injection peak into a weighted linear combination of orthogonal functions yields a compact set of peak descriptors. The numerical accuracy of representation for all three sets of functions was evaluated using simulated data. The complex exponential functions and Meixner polynomials were found to be most efficient for (nearly) symmetrical and highly skewed peaks, respectively. The effects of noise on the reproducibility of the individual descriptors were discussed. In practice, the Meixner polynomials must be truncated and consequently, there is a loss of orthogonality over all functions in the set. A time scale parameter is available and its value determines the accuracy of representation when Meixner polynomials are truncated. For robust representation, the time scale should be set at the highest value possible at which orthogonality was still observed numerically over the subset of polynomials chosen for identification/classification. An empirical model was developed to allow direct computation of this time scale. The effectiveness of the descriptors for peak classification was demonstrated via principal components analysis using simulated data. A simple method based on the Fisher weight was developed to optimize the number of coefficients to use for a given classification problem. From the results obtained with this method, 5 to 13 descriptors are recommended as suitable for most FIA systems; this applies to each of the three sets of orthogonal functions used. The simulation results were verified experimentally.

Second, the important task of data filtering was examined. Since the flow injection peak is a smoothly-varying continuous function of time, a filtering method based on global linear least squares fitting (indirect filtering) with orthogonal polynomials was investigated.
In a case study with Gram polynomials, this approach was found to be competitive with or better than (when peaks are highly skewed) the popular least squares smoothing method of Savitzky and Golay. Subsequent comparisons between the three sets of orthogonal functions with simulated data indicated that the Meixner filter was best for highly skewed peaks while the Fourier filter was best for more symmetric peaks; the Gram filter performed between the two. The availability of three sets of orthogonal functions provides sufficient flexibility in handling a variety of peak-shaped transients. Indirect filtering also facilitated automatic optimization since only filter order needs to be optimized. The Akaike information theoretic criterion and cross-validation were used for this purpose. The performance of each was evaluated on both simulated and real data; both methods were found to give near-optimal results.

Extensive real-world testing of both methodologies is the next logical step. In addition, future work should explore the practical advantages, if any, in using a combination of descriptors from the three orthogonal functions for peak classification. Signal representation with other functions should also be examined. One family that is immediately obvious is the Chebyshev polynomials. The feasibility of orthogonal function decomposition for systems which produce multiple peaks should be investigated, as well as extensions to two dimensions of information (i.e. measurements at multiple wavelengths). The Fisher weight criterion may be extended to incorporate "category separation" weights; that is, the Fisher weight between two categories which must be separated above all others may be adjusted to contribute more heavily than the others to the Fisher weight criterion. Furthermore, the non-parametric form should be evaluated as it may be less sensitive to cluster shape and category outliers.
Finally, a more efficient method for performing cross-validation needs to be developed to replace the naive computational method used. This would make it computationally competitive with the Akaike information-theoretic criterion.

Appendix A Complex Exponential Functions

A.1 COMPLEX EXPONENTIAL FOURIER REPRESENTATIONS

By far, the most popular and best studied set of complete orthogonal functions is the set of complex exponentials. This set is remarkable in that its members are orthogonal on any interval [a, b]. If the length of the interval is denoted by T, then the functions are given by

$$\phi_n(t) = e^{jn\omega_0 t}, \qquad n = 0, \pm 1, \pm 2, \ldots \tag{A.1}$$

where

$$\omega_0 = \frac{2\pi}{T} \tag{A.2}$$

is the fundamental angular frequency. The orthogonality property is

$$(\phi_m, \phi_n) = \delta_{mn} T \tag{A.3}$$

The exponential Fourier series approximation of a function f(t) is therefore

$$\hat{f}(t) = \sum_{n=-\infty}^{\infty} F_n e^{jn\omega_0 t} \tag{A.4}$$

where the expansion coefficients $F_n$ are computed via (see footnote 1)

$$F_n = \frac{1}{T}\int_a^{a+T} f(t)\, e^{-jn\omega_0 t}\, dt \tag{A.5}$$

Footnote 1: Note that these coefficients are computed with respect to $\phi_n$, which are orthogonal but not orthonormal. This is contradictory to the conventions stated in section 3.1.3. However, it was deemed necessary to avoid confusion with the traditional definition of the Fourier series.

It is seen that (A.4) and (A.5) have an inverse relationship. Since the integrand is complex-valued, it follows that the coefficients are, in general, complex scalars. Hence,

$$F_n = \mathrm{Re}(F_n) + j\,\mathrm{Im}(F_n) = |F_n|\, e^{j\theta_n} \tag{A.6}$$

where $\mathrm{Re}(F_n)$ is the real part of $F_n$, $\mathrm{Im}(F_n)$ is the imaginary part, $|F_n|$ is the magnitude given by $\sqrt{[\mathrm{Re}(F_n)]^2 + [\mathrm{Im}(F_n)]^2}$ and $\theta_n$ is the phase angle given by $\arctan(\mathrm{Im}(F_n)/\mathrm{Re}(F_n))$. The coefficient gives the magnitude and phase of a harmonic component having frequency $n\omega_0$ and collectively, in sequence, they form a harmonic spectrum.

The exponential Fourier series generates a representation $\hat{f}(t)$ that equals f(t) at its points of continuity but not at its points of discontinuity. The approximation produced is periodic, i.e. it repeats itself every T units of t. This is a direct consequence of the fact that the complex exponential functions satisfy the periodic relationship $\phi_n(t + T) = \phi_n(t)$. By careful inspection of (A.1) and (A.2), it is clear that the fundamental angular frequency is simply normalized over the interval T. With periodicity in mind, the lower limit of the integration interval, a, is commonly set to $-T/2$ in (A.5).

While the Fourier series can be used to represent an aperiodic function, one must realize that such a representation is valid only in the specified interval and the two functions generally differ outside this interval. It is not surprising then, that the Fourier series is used almost exclusively to represent periodic waveforms. The aperiodic Fourier series representation is mainly of mathematical interest and plays a prominent role in the development of the Fourier transform, which is now defined.

The Fourier transform is given by

$$F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-j\omega t}\, dt \tag{A.7}$$

Equation (A.7) is also known as the Fourier integral and is defined if the integral exists for every value of $\omega$. As with the Fourier series, the transform has an inverse:

$$f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} F(\omega)\, e^{j\omega t}\, d\omega \tag{A.8}$$

In contrast to the Fourier series, the Fourier transform yields a continuous frequency spectrum. A connection may be drawn between the two: if one assumes that f(t) is zero outside [a, a + T], then (A.7) becomes (A.5) times T, at $\omega = n\omega_0$.

While the Fourier transform (FT) is a powerful analysis tool, it is not amenable to machine computation. However, a discrete version, the discrete Fourier transform (DFT), has been developed and is given by

$$F(\omega_n) = \sum_{k=0}^{N-1} f[k]\, e^{-j\omega_n k}, \qquad n = 0, 1, \ldots, N-1 \tag{A.9}$$

where $\omega_n = 2\pi n/N$ is radian frequency, n is the frequency-domain index, k is the time-domain index and N is the number of points in one period of the time and frequency domain waveforms. The above indexing scheme was used to emphasize that the continuous Fourier transform is actually being sampled.
The corresponding inverse is

$$f[k] = \frac{1}{N}\sum_{n=0}^{N-1} F(\omega_n)\, e^{j\omega_n k}, \qquad k = 0, 1, \ldots, N-1 \tag{A.10}$$

The DFT is periodic, repeating itself every N coefficients.

A.2 CORRESPONDENCE BETWEEN FOURIER REPRESENTATIONS

The DFT is an approximation to the continuous Fourier transform. The validity of this statement depends on the waveform being analyzed. Differences between the two arise from the requirement of sampling (in both the time and frequency domains) and waveform truncation with the DFT. If f(t) is periodic and band-limited, and if the truncation interval equals exactly an integer number of periods, then the DFT coefficients will be the same, within a scale factor, as the samples of $F(\omega)$, and $F_n$. This represents the only case where an exact match exists between the DFT and FT. Specifically, for equivalence between the FT and DFT, the latter must be multiplied by $\Delta t$; for equivalence between the Fourier series and the DFT, the latter must be divided by N. In essence, the continuous-form coefficients are linked to those of the discrete form by relating integration to summation, which itself is simply the rectangular rule for integration. From the foregoing, direct comparisons can only be effectively made between the DFT and the Fourier series. Hence,

$$F_n = \frac{1}{N} F(\omega_n) \tag{A.11}$$

The Fourier series coefficients are in fact normalized to a unit interval of t and, with reference to Parseval's theorem (sub-section 3.1.3), the power of f(t) is simply the sum of the squares of $|F_n|$.

REFERENCE

[1] E. O. Brigham, The Fast Fourier Transform, Prentice-Hall: Englewood Cliffs, 1974.
Appendix B Legendre Polynomials

B.1 GENERAL

Symbol: $P_n(x)$
Differential equation: $(1 - x^2)y'' - 2xy' + n(n+1)y = 0$
Explicit expression: $P_n(x) = \frac{1}{2^n}\sum_{m=0}^{\lfloor n/2 \rfloor} (-1)^m \binom{n}{m}\binom{2n-2m}{n} x^{n-2m}$
Recurrence relation: $(n+1)P_{n+1}(x) = (2n+1)\,x\,P_n(x) - n\,P_{n-1}(x)$
Standardization: $P_n(1) = 1$
Orthogonal interval: [-1, 1]
Orthogonality: $\int_{-1}^{1} [P_n(x)]^2\, dx = \frac{2}{2n+1}$
Orthogonal series: $f(x) = \sum_{n=0}^{\infty} F_n \sqrt{\frac{2n+1}{2}}\, P_n(x)$, with $F_n = \sqrt{\frac{2n+1}{2}} \int_{-1}^{1} f(x) P_n(x)\, dx$

Table B.1 The first 26 Legendre polynomials

$P_0(x) = 1$
$P_1(x) = x$
$P_2(x) = \frac{1}{2}(3x^2 - 1)$
$P_3(x) = \frac{1}{2}(5x^3 - 3x)$
$P_4(x) = \frac{1}{8}(35x^4 - 30x^2 + 3)$
$P_5(x) = \frac{1}{8}(63x^5 - 70x^3 + 15x)$
$P_6(x) = \frac{1}{16}(231x^6 - 315x^4 + 105x^2 - 5)$
$P_7(x) = \frac{1}{16}(429x^7 - 693x^5 + 315x^3 - 35x)$
$P_8(x) = \frac{1}{128}(6435x^8 - 12012x^6 + 6930x^4 - 1260x^2 + 35)$
$P_9(x) = \frac{1}{128}(12155x^9 - 25740x^7 + 18018x^5 - 4620x^3 + 315x)$
$P_{10}(x) = \frac{1}{256}(46189x^{10} - 109395x^8 + 90090x^6 - 30030x^4 + 3465x^2 - 63)$
$P_{11}(x) = \frac{1}{256}(88179x^{11} - 230945x^9 + 218790x^7 - 90090x^5 + 15015x^3 - 693x)$
$P_{12}(x) = \frac{1}{1024}(676039x^{12} - 1939938x^{10} + 2078505x^8 - 1021020x^6 + 225225x^4 - 18018x^2 + 231)$
$P_{13}(x) = \frac{1}{1024}(1300075x^{13} - 4056234x^{11} + 4849845x^9 - 2771340x^7 + 765765x^5 - 90090x^3 + 3003x)$
$P_{14}(x) = \frac{1}{2048}(5014575x^{14} - 16900975x^{12} + 22309287x^{10} - 14549535x^8 + 4849845x^6 - 765765x^4 + 45045x^2 - 429)$
$P_{15}(x) = \frac{1}{2048}(9694845x^{15} - 35102025x^{13} + 50702925x^{11} - 37182145x^9 + 14549535x^7 - 2909907x^5 + 255255x^3 - 6435x)$
$P_{16}(x) = \frac{1}{32768}(300540195x^{16} - 1163381400x^{14} + 1825305300x^{12} - 1487285800x^{10} + 669278610x^8 - 162954792x^6 + 19399380x^4 - 875160x^2 + 6435)$
$P_{17}(x) = \frac{1}{32768}(583401555x^{17} - 2404321560x^{15} + 4071834900x^{13} - 3650610600x^{11} + 1859107250x^9 - 535422888x^7 + 81477396x^5 - 5542680x^3 + 109395x)$
$P_{18}(x) = \frac{1}{65536}(2268783825x^{18} - 9917826435x^{16} + 18032411700x^{14} - 17644617900x^{12} + 10039179150x^{10} - 3346393050x^8 + 624660036x^6 - 58198140x^4 + 2078505x^2 - 12155)$
$P_{19}(x) = \frac{1}{65536}(4418157975x^{19} - 20419054425x^{17} + 39671305740x^{15} - 42075627300x^{13} + 26466926850x^{11} - 10039179150x^9 + 2230928700x^7 - 267711444x^5 + 14549535x^3 - 230945x)$
$P_{20}(x) = \frac{1}{262144}(34461632205x^{20} - 167890003050x^{18} + 347123925225x^{16} - 396713057400x^{14} + 273491577450x^{12} - 116454478140x^{10} + 30117537450x^8 - 4461857400x^6 + 334639305x^4 - 9699690x^2 + 46189)$
$P_{21}(x) = \frac{1}{262144}(67282234305x^{21} - 344616322050x^{19} + 755505013725x^{17} - 925663800600x^{15} + 694247850450x^{13} - 328189892940x^{11} + 97045398450x^9 - 17210021400x^7 + 1673196525x^5 - 74364290x^3 + 969969x)$
$P_{22}(x) = \frac{1}{524288}(263012370465x^{22} - 1412926920405x^{20} + 3273855059475x^{18} - 4281195077775x^{16} + 3471239252250x^{14} - 1805044411170x^{12} + 601681470390x^{10} - 124772655150x^8 + 15058768725x^6 - 929553625x^4 + 22309287x^2 - 88179)$
$P_{23}(x) = \frac{1}{524288}(514589420475x^{23} - 2893136075115x^{21} + 7064634602025x^{19} - 9821565178425x^{17} + 8562390155550x^{15} - 4859734953150x^{13} + 1805044411170x^{11} - 429772478850x^9 + 62386327575x^7 - 5019589575x^5 + 185910725x^3 - 2028117x)$
$P_{24}(x) = \frac{1}{4194304}(8061900920775x^{24} - 47342226683700x^{22} + 121511715154830x^{20} - 178970743251300x^{18} + 166966608033225x^{16} - 102748681866600x^{14} + 42117702927300x^{12} - 11345993441640x^{10} + 1933976154825x^8 - 194090796900x^6 + 10039179150x^4 - 202811700x^2 + 676039)$
$P_{25}(x) = \frac{1}{4194304}(15801325804719x^{25} - 96742811049300x^{23} + 260382246760350x^{21} - 405039050516100x^{19} + 402684172315425x^{17} - 267146572853160x^{15} + 119873462177700x^{13} - 36100888223400x^{11} + 7091245901025x^9 - 859544957700x^7 + 58227239070x^5 - 1825305300x^3 + 16900975x)$

B.2 GENERALIZED FOURIER EXPANSION IN LEGENDRE POLYNOMIALS

A finite generalized Fourier expansion of a function f(x) defined over [-1, 1] in Legendre polynomials is given by:

$$\hat{f}(x) = \sum_{n=0}^{N_0} F_n \sqrt{\frac{2n+1}{2}}\, P_n(x) \tag{B.1}$$

where $\hat{f}(x)$ is the approximation, $N_0$ is an integer, and the expansion coefficients $F_n$ are

$$F_n = \sqrt{\frac{2n+1}{2}} \int_{-1}^{1} f(x) P_n(x)\, dx \tag{B.2}$$

Typically, the function to be approximated is not defined over [-1, 1], and consequently it must be scaled to this range (and made finite if necessary) before the expansion can be carried out. In general, if a function f(t) is defined on $[t_a, t_b]$ and the orthonormal basis functions $\varphi_n(x)$ are defined on $[x_a, x_b]$, this incompatibility can be eliminated by the following change of variable:

$$t = \frac{t_b - t_a}{x_b - x_a}\, x - \frac{t_b x_a - t_a x_b}{x_b - x_a} \tag{B.3}$$

This converts f(t) to f(x). If afterwards, the coefficients for f(t) are desired rather than those for f(x), then the coefficients should be multiplied by

$$\sqrt{\frac{t_b - t_a}{x_b - x_a}} \tag{B.4}$$

Alternatively, one can scale the basis functions to the range of the function, in which case (B.3) is solved for x, and the resulting expression substituted into the basis functions to yield

$$\tilde{\varphi}_n(t) = \sqrt{\frac{x_b - x_a}{t_b - t_a}}\; \varphi_n\!\left(\frac{x_b - x_a}{t_b - t_a}\, t + \frac{x_a t_b - x_b t_a}{t_b - t_a}\right) \tag{B.5}$$

The new basis functions will likewise be orthonormal (over the new interval).
In summary, the coefficients $F_n$ can be computed by

$$F_n = \sqrt{\frac{t_b - t_a}{x_b - x_a}} \int_{x_a}^{x_b} f(x)\, \varphi_n(x)\, dx \tag{B.6a}$$

$$\phantom{F_n} = \int_{t_a}^{t_b} f(t)\, \tilde{\varphi}_n(t)\, dt \tag{B.6b}$$

As an example, suppose that the tanks-in-series model (1.5),

$$C(t) = \frac{t^{N_T - 1} e^{-t/T_i}}{(N_T - 1)!\, T_i^{N_T}}$$

was to be approximated by a generalized Fourier expansion in Legendre polynomials. Since the function is defined over $[0, \infty]$, it must first be truncated. Let the upper bound be denoted by T (i.e. the interval becomes [0, T]). Hence, (B.3) becomes

$$t = \frac{T}{2}(x + 1) \tag{B.7}$$

and

$$C(x) = \frac{\left[\frac{T}{2}(x+1)\right]^{N_T - 1} e^{-T(x+1)/(2T_i)}}{(N_T - 1)!\, T_i^{N_T}} \tag{B.8}$$

Furthermore, from an inspection of the Legendre polynomials in Table B.1, it is obvious that the expansion coefficients are a linear combination of the following form

$$F_n = \sqrt{\frac{T}{2}}\, \sqrt{\frac{2n+1}{2}}\; \sum_m a_m \int_{-1}^{1} x^m\, C(x)\, dx \tag{B.9}$$

where m denotes the order of the variable in a given term of the Legendre polynomial and $a_m$ is the associated coefficient. The integration is best handled with quadrature.

The continuous-form coefficients can be linked to those of the discrete form by relating integration to summation:

$$\int_{t_a}^{t_b} f(t)\, dt \approx \Delta t \sum_{k=0}^{N-1} f[k] = \frac{t_b - t_a}{N} \sum_{k=0}^{N-1} f[k] \tag{B.10}$$

where f[k] is the discrete version of f(t) sampled over N points. For the Legendre/Gram polynomial combination, (B.10), (B.6a) and (3.12b) combine to give

$$F_n = \sqrt{\frac{2n+1}{2}} \int_{-1}^{1} f(x) P_n(x)\, dx\; \sqrt{\frac{t_b - t_a}{2}} \approx \sqrt{\frac{t_b - t_a}{N}} \sum_{k=0}^{N-1} f[k]\, p_n[k] \tag{B.11}$$

where $p_n[k]$ denote the Gram polynomials. Normalization of $F_n$ with respect to T yields

$$\frac{F_n}{\sqrt{t_b - t_a}} = \frac{\sqrt{2n+1}}{2} \int_{-1}^{1} f(x) P_n(x)\, dx \approx \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} f[k]\, p_n[k] \tag{B.12}$$

such that the sum of the squares of these coefficient values will equal the power of f(t) over the interval T. The factor $1/\sqrt{N}$ for the expression on the right hand side of (B.12) is that required to normalize the coefficients $F_n$ for number of data points when the Gram polynomials are used for computation.

REFERENCE

[1] M. Abramowitz, I. A. Stegun, Handbook of Mathematical Functions, National Bureau of Standards: Washington, 1964.
Appendix C Statistical Moments for the Tanks-in-series Model

C.1 GENERAL DERIVATIONS

The ordinary statistical moments $\mu_n$ of a distribution f(t) are defined as [1]

$$\mu_n = \frac{1}{M_0} \int_{\text{all } t} t^n f(t)\, dt \qquad \text{for } n > 0 \tag{C.1}$$

where

$$M_0 = \int_{\text{all } t} f(t)\, dt \tag{C.2}$$

For moments beyond the first, it is more convenient to use the central statistical moments [1],

$$\bar{\mu}_n = \frac{1}{M_0} \int_{\text{all } t} (t - \mu_1)^n f(t)\, dt \tag{C.3}$$

where

$$\mu_1 = \frac{1}{M_0} \int_{\text{all } t} t\, f(t)\, dt \tag{C.4}$$

The central statistical moments can be expanded into a linear combination of the ordinary statistical moments. For example, for n = 3,

$$\begin{aligned}
\bar{\mu}_3 &= \frac{1}{M_0} \int_{\text{all } t} (t - \mu_1)^3 f(t)\, dt \\
&= \frac{1}{M_0} \int_{\text{all } t} t^3 f(t)\, dt - \frac{3\mu_1}{M_0} \int_{\text{all } t} t^2 f(t)\, dt + \frac{3\mu_1^2}{M_0} \int_{\text{all } t} t\, f(t)\, dt - \frac{\mu_1^3}{M_0} \int_{\text{all } t} f(t)\, dt \\
&= \mu_3 - 3\mu_1\mu_2 + 3\mu_1^3 - \mu_1^3 \\
&= \mu_3 - 3\mu_1\mu_2 + 2\mu_1^3
\end{aligned} \tag{C.5}$$

The tanks-in-series model (1.5) can be re-written as

$$C(t) = \frac{t^{N_T - 1} e^{-t/T_i}}{(N_T - 1)!\, T_i^{N_T}}$$

and is restricted to non-negative values of t. Applying the definition of the moments above, one obtains the general equation for the ordinary moments of (1.5):

$$\mu_n\{C(t)\} = \frac{1}{(N_T - 1)!\, T_i^{N_T}} \int_0^{\infty} t^{N_T + n - 1} e^{-t/T_i}\, dt \tag{C.6}$$

The integral in (C.6) is given by [2]

$$\int_0^{\infty} t^m e^{-at}\, dt = \frac{m!}{a^{m+1}} \qquad (a > 0,\ m \text{ is a positive integer}) \tag{C.7}$$

Evaluation of (C.6) according to (C.7) followed by cancellation of common factors leads to

$$\mu_n\{C(t)\} = \frac{(N_T + n - 1)!}{(N_T - 1)!}\, T_i^n \tag{C.8}$$

Following up on the example above, application of (C.8) and (C.5) to the third central statistical moment gives

$$\bar{\mu}_3 = \left[(N_T + 2)(N_T + 1)N_T - 3(N_T + 1)N_T^2 + 2N_T^3\right] T_i^3 = 2N_T T_i^3 \tag{C.9}$$

C.2 DERIVATION FOR SKEWNESS

Skewness is defined as [1]

$$S = \frac{\bar{\mu}_3}{\sigma^3} \tag{C.10}$$

where $\sigma$ is the standard deviation. The third central statistical moment is given by (C.9); in the same manner, the second can be shown to be

$$\bar{\mu}_2 = \sigma^2 = N_T T_i^2 \tag{C.11}$$

Substitution of (C.9) and (C.11) into (C.10) produces the final result:

$$S = \frac{2N_T T_i^3}{(N_T T_i^2)^{3/2}} = \frac{2}{\sqrt{N_T}} \tag{C.12}$$

REFERENCES

[1] R. Porkess, Dictionary of Statistics, Collins: London, 1988.
[2] R. C. Weast (ed.), CRC Handbook of Chemistry and Physics, 68th edn., CRC Press: Boca Raton, 1987, p. A61.
Appendix D Program Listings

D.1 GENERAL

This appendix contains listings of programs written in C for computing the Gram and Meixner polynomials. The routines are written as functions and, to conserve memory, the matrix of polynomials is stored row-wise. Double precision arithmetic is used. For clarity, only a skeleton function is given.

D.2 PROGRAM FOR COMPUTING THE GRAM POLYNOMIALS

    void Gram(double **g, int order, int points)
    /* Computes the un-normalized Gram polynomials (stored in g) up to the
       specified order and number of points. Size of the data matrix must
       be g[order+1][points]. */
    {
        int i, j, pm1 = points - 1;
        double d, n1, n2, pm05 = pm1 / 2.0;

        /* initialize first two polynomials for recurrence */
        for (j = 0; j < points; j++) {
            g[0][j] = 1.0;
            if (order) g[1][j] = (2.0 * j) / pm1 - 1.0;
        }

        /* perform recurrence */
        for (i = 2; i <= order; i++) {
            d  = ((double) points - i) * i;
            n1 = (4.0 * i - 2.0);
            n2 = (i - 1.0) * (pm1 + i);
            for (j = 0; j < points; j++)
                g[i][j] = (g[i-1][j] * n1 * (j - pm05) - g[i-2][j] * n2) / d;
        }
    }

D.3 PROGRAM FOR COMPUTING THE MEIXNER POLYNOMIALS

    void Meixner(double **m, int order, int points, double timeScale)
    /* Computes the orthonormal Meixner polynomials (stored in m) up to the
       specified order, number of points and timeScale in [0, 1). Size of
       the data matrix must be m[order+1][points].
       Calling program must have: #include <math.h> */
    {
        int i, j;
        double a = sqrt(1.0 - timeScale * timeScale);

        /* set first point of polynomials to zero */
        for (i = 0; i <= order; i++)
            m[i][0] = 0.0;

        /* Meixner filter network loop */
        for (j = 1; j < points; j++) {
            m[0][j] = timeScale * m[0][j-1];
            if (j == 1) m[0][j] += a;
            for (i = 1; i <= order; i++)
                m[i][j] = timeScale * m[i][j-1] + m[i-1][j-1]
                          - timeScale * m[i-1][j];
        }
    }
Item Metadata
Title | Mathematical methods for automated flow injection analysis |
Creator |
Lee, Oliver |
Date Issued | 1993 |
Description | Presently, the development of automated flow injection analyzers for use in both chemical research and process control is an active area of research in this laboratory. For these systems to operate for extended periods of time without human supervision, it is imperative that they behave in an intelligent manner. It is also desirable that they produce the best possible data. In this thesis, advanced mathematical strategies have been devised to enhance the robustness, analytical reliability and performance of automated flow injection analyzers. The mathematical methods developed are based on signal representation by orthogonal polynomials. Any arbitrary function can be expanded as a weighted linear combination of orthogonal functions via a generalized Fourier expansion; together, the weights or coefficients form a spectrum. The most familiar of these functions is the complex exponential set associated with the classical Fourier series expansion. In addition to this set, the Gram and Meixner polynomial families have been employed here. Since the latter are transients, they are particularly suited for representing flow injection data. A peak shape analysis strategy is presented for automatic identification or classification of both physicochemical and mechanical faults during analyzer operation. The coefficients from decomposition of the flow injection peak into orthogonal functions were used as general descriptors of peak shape. Each set of functions generated a different spectrum and thus each offered a different view of the peak. The effects of white noise on the reproducibility of the individual coefficients were quantified. The Meixner polynomials are orthogonal over a semi-infinite interval. In practice, these functions must be truncated, and orthogonality is lost between all functions in the set. However, a time scale parameter is available to stretch or compress these functions. It was found that for robust identification, the time scale should be set at the highest value possible at which orthogonality was still observed numerically over the subset of functions chosen for identification. An empirical model was formulated to allow direct computation of this time scale. The capability of these descriptors for peak classification was demonstrated with simulated data and principal components analysis. A simple method based on the Fisher weight was developed to optimize the number of coefficients to use for a given classification problem. From the results obtained with this method, 5 to 13 coefficients are recommended for most FIA systems; this applies to each of the three sets of orthogonal functions used. Orthogonal function representation was also employed for digital filtering. A comparison between two implementations, the finite impulse response filter and the indirect filter, was conducted with the Gram polynomials. The latter was found to be better suited for filtering typical (highly skewed) flow injection peaks. The efficacy of the three basis functions for indirect filtering of peak-shaped transient signals was subsequently compared. The Meixner filter was found to be best for highly skewed peaks, the Fourier filter was best for more symmetric peaks and the Gram filter performed somewhere in between. The problem of determining the optimal filter cut-off order was cast in terms of hierarchical model selection. Two solutions are presented: the Akaike information criterion and cross-validation. Near-optimal filtering can be achieved with either method and hence, automatic filtering is facilitated. This was demonstrated on both simulated and real data. |
Extent | 3907278 bytes |
Genre |
Thesis/Dissertation |
Type |
Text |
File Format | application/pdf |
Language | eng |
Date Available | 2009-04-08 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0059495 |
URI | http://hdl.handle.net/2429/6956 |
Degree |
Doctor of Philosophy - PhD |
Program |
Chemistry |
Affiliation |
Science, Faculty of Chemistry, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 1994-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |