T H E N U M E R I C T H E A L T M E R T E H A O T M D E O N F T C O O F N J I U L G L A - P T O S E D E G P R A R D O I B E L N T By Sally A . Ross B. Sc. (Mathematics) University of Leeds, England. A T H E S I S T H S E U B M I T T E D I N P A R T I A L R E Q U I R E M E N T S M A S T E F R O O F R T F U L F I L L M E N T H E D E G R E E O S C I E N C E in T H E F A C U I N S T I T U T E L T Y O F O F G R A A P P L I E D D U A T E M A T S T U D I E S H E M A T I C S We accept this thesis as conforming to the required standard T H E U N I V E R S I T Y O F B R I T I S H C September 1988 © Sally A . Ross, 1988 O L U M B I A F O E F M S S U S I N G In presenting degree , freely at this the available copying of department publication of in partial fulfilment University of British Columbia, for this or thesis reference thesis by this for his thesis and scholarly or for her of The University of British 1956 Main Mall Vancouver, Canada V6T 1Y3 Date DE-6(3/8D Columbia I further purposes gain the requirements I agree shall that agree may representatives. financial permission. Department study. of be It not is that the Library permission granted by understood be for allowed an advanced shall for the that without make it extensive head of my copying or my written Abstract This thesis examines the use of the method of Conjugate Gradients as an iterative method to be applied to linear system of equations that are ill-conditioned. A n overview of the particular problems associated with the solution of ill-conditioned linear systems is given and the method of Conjugate Gradients described. It is shown that, at each iteration, the method of Conjugate Gradients weights the contribution from the singular vectors using a polynomial that is characterized as the solution to a weighted least squares problem. The form of the polynomials is shown to approximate a series of interpolating polynomials that constitute an efficient filtering technique required for computing solutions to illconditioned systems. The performance of the method of Conjugate Gradients is shown to compare favourably with other accepted methods for ill-conditioned linear systems. An application from the field of image processing is given and the efficient computation of a smooth reconstructed image from a defocussed picture is demonstrated. n Table of Contents Abstract ii List of Tables vi List of Figures vii Acknowledgement viii 1 2 3 Introduction 1 1.1 7 Sensitivity of Linear Systems The N u m e r i c a l Treatment of Integral Equations 12 2.1 The Continuous Problem 13 2.2 The Discrete Problem 17 2.3 Regularization Techniques 23 2.3.1 Regularization of the Problem 24 2.3.2 Regularization of the Solution : Iterative Filtering 27 T h e M e t h o d of Conjugate Gradients 30 3.1 The Method of Conjugate Directions 31 3.2 The Method of Conjugate Gradients 36 3.2.1 The Conjugate Gradients Algorithm 36 3.2.2 Properties of the Method of Conjugate Gradients 39 3.2.3 Conjugate Gradients as an Iterative Method 42 iii 3.2.4 3.3 3.4 4 5 43 Preconditioned Conjugate Gradients 45 3.3.1 Generalized Conjugate Gradients 45 3.3.2 Practical Implementation of Preconditioned Conjugate Gradients 47 3.3.3 Common preconditioners 49 Conjugate Gradients for Linear Least Squares 51 3.4.1 Linear Least Square Algorithm 51 3.4.2 The Preconditioned Algorithm for Least Squares 53 T h e F i l t e r i n g Properties of the M e t h o d of Conjugate G r a d i e n t s 54 4.1 Iterative Filtering 55 4.2 The Filtering Function 56 4.3 Characterization of the Operator 62 4.3.1 Precise: Minimization of the Deviation from Zero 63 4.3.2 Approximate: Interpolation 64 4.3.3 Extension to Later Iterates 68 4.4 Approximation to the Inverse 72 4.5 Numerical Tests 74 Numerical Application 86 5.1 A Description of the Problem 86 5.1.1 The Mathematical Reconstruction of Degraded Pictures 86 5.1.2 The Test Example 89 5.2 6 Convergence Rate Discussion of Results 90 5.2.1 101 Termination Discussion 104 iv Bibliography 106 Appendices m A Appendix A.l HI Kronecker Products Ill v L i s t of T a b l e s 2.1 Comparison between the S V E and the SVD 19 4.2 Coefficients for the first 3 polynomials for the Trapezoidal example 4.3 Coefficients for the first 4 polynomials for the Phillips' example 62 4.4 Comparison of accuracy and work for the Hilbert example 75 4.5 Comparison of accuracy and work for the Trapezoidal example 77 4.6 Comparison of accuracy and work for the Simpson's example 77 4.7 Comparison of accuracy and work for the Phillips' example 78 4.8 Conjugate Gradient analysis for the Hilbert example 82 4.9 Conjugate Gradient analysis for the Trapezoidal example 83 ... 60 4.10 Conjugate Gradient analysis for the Simpson's example 84 4.11 Conjugate Gradients analysis for the Phillips' example 85 5.12 Analysis from the Conjugate Gradients algorithm for the Cross example . 101 5.13 Analysis from the Conjugate Gradients algorithm for the L example . . . 102 5.14 Analysis from the Conjugate Gradients algorithm for the Box example 102 vi . List of Figures 4.1 Examples of filtering polynomials 61 4.2 Examples of the filter values for the Hilbert example 70 4.3 Examples of the filter values for the Trapezoidal example 71 4.4 The singular value spectra for the 4 test examples 76 4.5 Chosen solution for the Simpson and the Phillips' test cases 79 5.6 The digitized sharp test images 91 5.7 The digitized defocussed test images 93 5.8 Numerical representation of the degraded test images 94 5.9 The error for each of the test images 95 5.10 The singular values of the reconstruction matrix 96 5.11 Numerical results from iterations for the Cross 97 5.12 Visual representation of iterates for the Cross 98 5.13 Visual representation of iterates for the L 99 5.14 Visual representation of iterates for the Box vii 100 Acknowledgement I am indebted to my supervisor, Dr James Varah, for his constant patience and expert guidance regarding the preparation of this thesis and throughout my studies in Vancouver. Special thanks go to my family in Belgium, for always being so supportive, but especially for paying the telephone bill. I would like to acknowledge, with particular gratitude, the Gibbins family for giving me a stress-free home to enable this to be completed. This thesis may have been abandoned had it not been for the support and encouragement given by Steve Adams, I am very grateful to him for so many reasons. vin Chapter 1 Introduction The computation of solutions to problems for which the exact solution does not depend continuously on the supplied data is challenging; such problems are said to be ill-posed and encountered frequently in various application areas. Solutions are difficult to find since small changes in the data may alter the solution considerably. The data for many physical problems may be generated as the results gathered from practical experimentation and small errors are inherent in the problem. Any numerical computation may introduce further small errors through the effects of roundoff. These small errors, which are effectively due to "noise" in the system must not be reflected as large oscillations in the solution. It is of great practical importance to be able to efficiently compute a meaningful solution to such problems. This thesis examines the use of the method of Conjugate Gradients for computing the numerical solution of linear systems of equations which arise from ill-posed problems. One source of ill-posed problems is that of the solution to linear integral equations of the first kind (1.1) such equations arise in many applications, e.g., chemistry and image processing. The operator, K(-, •), has the characteristic that it smooths the function, /(•), in the mapping /(') — * 9(')- The function, g, will always be smooth, even in cases where / is not contin- uous. A solution will not exist for arbitrary g. The stronger the effect of the smoothing is, the more difficult it will be to evaluate the solution. The numerical treatment of (1.1) 1 Chapter 1. Introduction 2 is often achieved b y first discretizing t h e e q u a t i o n to f o r m a l i n e a r s y s t e m , Ax=b where A 6 9ft" xn a n d !i E B". (1.2) T h e i l l - p o s e d p r o p e r t y o f the p r o b l e m is reflected i n h e r e n t i l l - c o n d i t i o n i n g o f the coefficient m a t r i x A. n u m e r i c a l solution of i l l - c o n d i t i o n e d l i n e a r systems t h e c o n t i n u o u s p r o b l e m (1.1) in the T h e c o n c e r n o f this thesis is t h e (1.2); t h e t h e o r y for the s o l u t i o n o f is used for u n d e r s t a n d i n g a n d m o t i v a t i o n . T h e p r o b l e m itself is difficult t o solve. E x a c t solutions m a y b e i m p o s s i b l e to find. A s o l u t i o n (if it exists) m a y be n o n - u n i q u e a n d h i g h l y u n s t a b l e , e.g., i n t h e field of i m a g e p r o c e s s i n g m a n y very different o r i g i n a l p i c t u r e s m a y b e i n d i s t i n g u i s h a b l e w h e n the i m a g e is " b l u r r e d " i n some way. T h e strategy is i n s t e a d to c a l c u l a t e t h a t w i l l be acceptable. solution It is therefore necessary to state c o n d i t i o n s u n d e r w h i c h a s o l u t i o n w i l l b e d e e m e d acceptable; t h e n ensure that an approximate s u c h as some degree o f s m o o t h n e s s i n t h e s o l u t i o n . T h i s w i l l the effect o f t h e s m a l l errors, t o cause results, is m i n i m i z e d . A n acceptable the calculation of misleading s o l u t i o n is therefore o n e t h a t is stable a n d t h a t w i l l give a s m a l l r e s i d u a l error. O n c e discretized, t h e vector require t h a t t h e p r o b l e m (1.2) b m a y n o t h e i n t h e range of A. b e stated i n s t e a d as the m i n i m i z a t i o n m i n ||Ax w h i c h is the u s u a l linear least squares p r o b l e m . 6||2 m a y be very different c o m p u t e a stable s o l u t i o n o f (1.3) w i l l have a s o l u t i o n t h a t are e q u a l l y accurate i n terms f r o m each other. T h e need to be able t o has m o t i v a t e d t h e strategy o f i n c l u d i n g i n the p r o b l e m a r e s t r i c t i o n o n the class o f allowable solutions. M o t i v a t i o n f r o m t h e c o n t i n u o u s p r o b l e m , [6], f r o m t h e i n c l u s i o n , as c o m p o n e n t s (1.3) T h e p r o b l e m (1.3) b u t t h e i l l - c o n d i t i o n i n g of A m e a n s that t w o estimates of t h e s o l u t i o n to (1.2) T h e first step is t o demonstrates o f t h e s o l u t i o n , o f vectors t h a t t h e difficulties stem that are h i g h l y oscillatory. Chapter 1. Introduction 3 The "noise" in the system has an exaggerated effect on the contribution of these vectors so that they are given inaccurate weightings that may be disproportionately high. A n approximation is formed so that the contributions from these unstable components is reduced, or excluded. The class of allowable estimates is thus restricted to be functions that are smooth but will (hopefully) still be relatively accurate. The motivation leads to the technique of Regularization. Regularization is used in general to mean the restatement of an ill-posed problem as a well-posed one that is made close to the original by the use of a small parameter. The parameter controls the extent of the regularization. The idea of regularization was first introduced by Tikhonov [37]. The original problem (1.3) is restated as a constrained least squares problem to restrict the solution to be smooth whilst minimizing the residual error. Understanding of the continuous problem demonstrates that the idea of regularization can be seen to be equivalent to including in (1.1) a small contribution from the function / to the right hand side of the equation and thus converting the equation to one of the second kind. The solution of second kind integral equations is a well-posed problem [5, 9]. The illconditioning of the coefficient matrix is now no cause of difficulty and the corresponding discrete problem may be solved by any direct method. In practical applications the system of linear equations generated is often very large and sparse. The use of direct solution methods is restricted. Iterative techniques can be used successfully by setting the parameter to zero and regularizing the solution by terminating the iterative process before the oscillatory components of the solution have converged. The procedure is known as an Iterative Filtering technique and was first studied by Landweber [24]: a series of less and less regularized solutions is generated by filtering the contributions of the unstable vectors. The extent of the regularization is controlled by the choice of the iterative method and the number of iterations performed. Chapter 1. Introduction 4 Various iterative methods have been studied, [17, 36]. Each method generates an associated filtering function. The purpose of this thesis is to examine the filtering properties of the iterative method of Conjugate Gradients. The method of Conjugate Gradients is an efficient numerical iterative algorithm for computing the solution of a linear system of equations, (1.2). At each step the majority of the work effort is found in the formation of a single matrix-vector product with the original coefficient matrix; for this reason the scheme is particularly suited to large sparse systems of linear equations. The original presentation of the method of Conjugate Gradients [22], was as a minimization process that terminated in a finite number of steps; thus as a direct numerical method. It was soon discovered that although in a practical environment the process did not terminate as predicted, a good estimate of the solution is achieved after only a few steps. The basic algorithm may be simply modified to give an efficient solution method for the linear least squares problem (1.3). In [25] it was first shown that the performance of the method could be further improved by the notion of preconditioning. The method of Conjugate Gradients, particularly when preconditioned, was seen to be a very powerful and economical iterative procedure. The method of Conjugate Gradients has been applied to ill-conditioned systems of equations [35] and [8] with little explanation as to the reasons for its success. Recently, Vogel [42] demonstrated that the method of Conjugate Gradients does have the properties of a favourable iterative filtering method. This thesis extends the ideas in [42] by examining the results of the filtering process in more detail. A number of discretized integral equations are provided as numerical examples, including an application from the field of picture processing; these are more general than the illustrations given in [42]. The method of Conjugate Gradients is shown to be an economical method to compute a smooth, accurate solution when the system of equations is ill-conditioned. The method is successful without the need to apply a preconditioner; in fact any preconditioning may Chapter 1. Introduction 5 inhibit the ability of the method to calculate a smooth approximating. Although the integral equation (1.1) is a source of the features of the problem, it is the solution of the resulting linear system which is the topic here. The material in later chapters requires the understanding of the sensitivity of linear systems to small errors in the data. As an introduction the basic notion of conditioning is detailed in §1.1 at the end of this Chapter. The numerical treatment of integral equations is the subject of Chapter 2. The continuous equation (1.1) is examined in §2.1 where Fredholm integral equations of the first kind are shown to be a source of ill-posed problems. The analogies between the continuous and discretized problem are presented in §2.2. With this motivation the technique of regularization is discussed in §2.3. The original method of regularization, presented in §2.3.1, is shown to be inappropriate for solution by iterative procedures. §2.3.2 describes the approach to the problem by using the iteration count as the parameter to control the extent of the regularization. A description of the method of Conjugate Gradients is presented as Chapter 3. The method of Conjugate Gradients is shown to belong to the general class of Conjugate Directions schemes in §3.1, and the theory on which the development of the algorithm is based is discussed. The specific case of the method of Conjugate Gradients is the subject of §3.2. The algorithm is detailed in §3.2.1 and the properties required in later chapters are discussed in §3.2.2. The alteration in the manner in which the algorithm is considered, (from direct to iterative) is discussed in §3.2.3 and the evaluation of the convergence rate presented in §3.2.4. The improvement of the method by preconditioning techniques is given in §3.3. Preconditioning is motivated by the idea of a Generalized Conjugate Gradients method in §3.3.1 and the practical implementation detailed in §3.3.2. Some of the common choices for the preconditioner are described in §3.3.3. §3.4 shows that the method of Conjugate Gradients can be successfully used to solve the linear least squares Chapter 1. Introduction 6 problem (1.3); an efficient algorithm is given in §3.4.1 and the preconditioned version presented in §3.4.2. The use of the method of Conjugate Gradients as an iterative filtering technique is examined in Chapter 4. The features that are required by any iterative method to act as an effective filter are discussed in §4.1. The relations that arise in the algorithm are given and the polynomial that forms the filtering function developed in §4.2. In §4.3 the polynomial is shown to approximate the Lagrangian interpolant through the singular values and thus be an effective filter; a number of numerical illustrations are provided. The polynomial may also be viewed as an expansion of the approximation of the inverse of the coefficient matrix; this approach, to the explanation of the properties, is examined in §4.4. §4.5 discusses the practical application of Conjugate Gradients as an iterative filter. Results are compared with other recognised solution methods for ill-conditioned systems. The method of Conjugate Gradients is shown to be a competitive alternative. An application from the field of Picture Reconstruction is examined in Chapter 5. A description of the problem and its mathematical formulation is detailed in §5.1. The mathematical formation of general problems from image processing is discussed in §5.1.1 and the specific example used for the numerical tests derived in §5.1.2. The results of the numerical tests are presented in §5.2. Chapter 6 is a discussion of the findings of this thesis. As a preliminary to the problems that are the topic of the later chapters the following section defines what is meant by a linear system to be described as ill-conditioned. Results concerning the Kronecker product of matrices, summarized for use in Chapter 5, are detailed in the Appendix. Chapter 1. Introduction 1.1 7 Sensitivity of L i n e a r Systems An important question in the numerical computation of the solution to linear systems of equations is the effect on the solution of small changes in either the elements of the coefficient matrix, or in the data forming the right hand side. Numerical systems such as (1.2) frequently contain small errors which arise either in the way the data is collected or in the numerical calculations performed in the solution process. These errors mean that the original problem is rarely represented exactly in its numerical form and the resulting system is some perturbation of the original problem. There are two considerations: • How the exact solution, x~, changes when the system is perturbed ; • How the computed solution, x, differs from the exact solution. The first issue concerns the sensitivity of the system and is dependent only on the problem itself, but the error in the calculated solution is a feature of the chosen solution method. If a system is non-singular, so that A" 1 can be formed, there is a unique solution to (1.2): x" = A~ b for each vector b. However the matrix may be "almost" singular in x the sense that small perturbations of the elements will result in a matrix that is rank deficient. In this case, although (mathematically) the system has a unique solution, computationally it has the features of a singular system, so that a solution (if one exists) may not be unique. The smaller the alterations to the matrix A that produce a singular matrix A + 8A are, the greater the problems in computing the solution to (1.2). A measure of the extent of the near-singularity of A, and so the relative magnitude of the difficulties, is given by the condition number of the matrix K(A) = ^ = \\A\\ \\A~%. 2 (1.4) Chapter 1. Introduction 8 where o~i > . . . > <r > 0 denote the singular values of the matrix A. (Singular values n are the square roots of the eigenvalues of A A.) If A + 6A is singular then the 2-norm of T the smallest such perturbation matrix is equal to the magnitude of the smallest singular value, i.e., ||£A|]2 = <r [16], so that the shortest (2-norm) distance from A to the set n of singular matrices is given by a . n When this distance is close to zero the resulting condition number (1.4) may be very large. Matrices for which the condition number is high are known as ill-conditioned. The effect on the solution of small variations in the data is illustrated by considering a perturbation of the right hand side to b = b + 8b. The solution x = x + 8x is such that (1.5) (See [16, 26] for details and a similar bound for a perturbation of the matrix A). This means that perturbations of the data in (1.2) are reflected in the solution by a factor of the condition number. Systems that involve a poorly conditioned coefficient matrix may give rise to unstable solutions, i.e., solutions which may show a finite change based on an infinitesimal variation of the data. When a matrix is ill-conditioned it effectively has a finite "near" null-space, i.e., there are non-zero vectors w such that Aw ~ 0. If x solves (1.2) then x + w is also an acceptable solution since the residual \\A(x + w) — 6|| is small. However \\w\\ need not be small, 2 2 so the two solutions may be quite different from each other. The vectors w that form this near-null space are combinations of the singular vectors that correspond to the small singular values. They are often violently oscillatory [6]. The inclusion of these vectors as components of the solution is the cause of the instability. In numerical calculations it is not usually possible to obtain the exact error ||aj" — xi j J 2 of an estimate u to the solution of (1.2). One measure of the accuracy of u is the norm C h a p t e r 1. Introduction 9 of the residual |jT*112 = \\Au — 6||. The residual and the exact error are related as 2 ii n 2 1 j 10 [ 1 2 If the condition number K(A) is small, this is a reasonable bound and the residual is a reliable quantity to test the accuracy of an estimate. However, in the ill-conditioned case, although a large residual is an indication of a poor estimate, a small residual can be misleading. In some cases the particular vector b for the system gives rise to a solution which lies far from the "near" null-space. The matrix is still ill-conditioned but the system itself is less susceptible to errors since the contribution from the oscillatory singular vectors is such that roundoff effects will still be relatively non-destructive. (Equally, a system with the same coefficient matrix may have a solution that is highly oscillatory and will be extremely sensitive to errors even of the same size as the machine precision.) These are special cases; it is generally true that the higher the condition number the greater the difficulties in evaluating an appropriate solution. In summary: the consequence of a matrix being ill-conditioned is that the solution is effectively non-unique in that there are many near-solutions. These near solutions are the exact solutions to slightly perturbed problems, but it is not necessarily true that they are close to the actual solution. In addition it is difficult to determine the accuracy of any chosen estimate. Conditioning is a feature of the matrix and not the method of solution. However an algorithm must succesfully handle the difficulties (of computing a stable solution) that arise when dealing with poorly conditioned systems. The numerical solution of (1.2) introduces errors which effectively result in perturbed problems, since all numbers are represented and computations performed using a finite word length. When the matrix is ill-conditioned, if the data is altered either by the numerical calculations (or in the way that the problem is initialized) any attempt at Chapter 1. Introduction 10 solving the system may result in the wrong solution being selected. Additionally, since the measure of accuracy of the solution is given by the residual, it is often difficult to distinguish between good and bad solutions. Therefore there are two problems : • To select a solution to the problem so that small numerical perturbations do not result in large oscillatory components being exaggerated in the solution. • To be confident that the solution obtained is reasonable. Problems can arise when standard solution methods are applied to ill-conditioned systems. Numerical methods for linear systems effectively form the solution x" = A b. _1 When A is ill-conditioned it may be very close to a matrix which is not invertible and so this method of solution (by effectively forming the inverse) is not advised. Perturbation of the elements of A or 6 even by an amount equivalent to that of the machine precision may adversely affect the exact solution x". This indicates the need to alter the statement of the problem (1.2) before it is solved so as to ensure that some form of pseudo-inverse of A is used instead. In many physical applications the systems are very large and sparse, with the solution approximated by an iterative method. When the coefficient matrix is ill-conditioned, slow convergence rates and instability in the solution process can cause problems, but a more pertinent issue is whether the solution obtained is the one sought (since the residual, which is the usual indicator of convergence, is not reliable). Some modifications to the solution strategy are needed to allow successful resolution of this issue. Insight into the exact nature of the difficulties and guidance on how to overcome them can be gained by considering the continuous analogy to the problem (1.2) ; that of Fredholm integral equations of the first kind. The solution to integral equations has long caused difficulties due to the ill-posed nature of many of the problems. The theory is the Chapter 1. Introduction 11 subject of many dedicated texts, for example [34] and [9] and their numerical treatment is equally well documented, [5, 13, 1]. The next chapter begins by considering the continuous case of Fredholm Integral equations of the first kind. Their discretization results in an ill-conditioned linear system. The similarities between the continuous and discrete cases are examined in §2.2 and the modified solution strategy motivated. The methods used to approach ill-posed problems is the subject of sections §2.3.1 and §2.3.2. (It is noted that the terms ill-posed and ill-conditioned are not strictly interchangable. Ill-conditioned is a term to describe a particular feature of the coefficient matrix whereas ill-posed is used to describe a quality of the entire problem. A problem may be ill-posed because: There is no solution; There are many solutions; The solution is unstable. Linear systems that arise from ill-posed probems are known to have ill-conditioned coefficient matrices. While it is possible for a matrix A to be ill-conditioned and the problem well-posed (in that the particular right-hand-side gives a straightforward stable solution) this is a special case. Throughout this thesis it is assumed that ill-conditioned implies ill-posed and the terms used as synonyms.) Chapter 2 T h e N u m e r i c a l Treatment of Integral Equations The solution of problems formed by discretizations of Fredholm integral equations of the first kind L b K(s,t)f(s)ds = g(t), c<t<d (2.7) are known to require special care. Equations of this form are distinguished from other integral equations because the unknown function /(s) appears only under the integral sign and that the interval t £ [c,d] need not necessarily coincide with the range of integration. We will consider only linear equations where the kernel, K(s,t), is square integrable (or L ) . A n Li kernel is one such that 2 ( f K(s,t) dsdt Ja Ja 2 < oo (2.8) The solution of (2.7) is difficult to calculate because in general it is not continuously dependent of the data. Small alterations in the data may lead to large oscillations in the solution; thus there may be many "near" solutions. The problem is therefore ill-posed in that a solution (if it exists) may be non-unique or highly unstable. Numerical solutions of (2.7) can be obtained by first discretizing the problem to obtain an algebraic system of linear equations of the form (1.2). It is expected that the solution to the discrete problem will in some way reflect that of the continuous one. It is possible therefore to understand the difficulties mentioned in §1.1 in a continuous setting. In §2.1 the continuous problem is detailed as motivation to the solution of the discrete counterpart in §2.2 The motivation from the theory of integral equations suggests a 12 Chapter 2. The Numerical Treatment of Integral Equations 13 modification to the solution strategy for ill-conditioned systems. The original problem is replaced by a well-conditioned problem that is made close to the original by the presence of a small parameter; this technique is known as Regularization. §2.3 examines the technique of Regularization. §2.3.1 discusses the creation of a well-conditioned equivalent system by a constrained minimization problem and §2.3.2 describes how Regularization may be achieved by iterative methods. 2.1 T h e Continuous P r o b l e m Fredholm equations of the first kind, (2.7), frequently result from physical problems (e.g., chemical analysis, image reconstruction, geophysics) [1, 13, 31, 27, 37]. In these situations the function / corresponds to some actual physical phenomenon that is required to be calculated while g is the set of data, recorded by some apparatus, on which the estimation of / is based. The measuring instruments will have some degree of smoothing effect on the data which are represented in the operator K. There are experimental Umitations on the accuracy of the data aquisition, and thus some loss in the details of / . In particular high frequency components become exaggerated. With these problems it is important to avoid treating small oscillations in g, that are actually due to the noise in the system, as if they were caused by large oscillations in / . The solution method becomes important as, during its course, any numerical computation is liable to further complicate the situation by introducing additional errors that resemble noise. There are two important difficulties associated with the solution of (2.7). A n equation of the form (2.7) is not guaranteed to have a solution for arbitrary kernel, K(s,t), data g{t), and a solution (if it exists) need not be unique. or A computationally more troublesome feature is that the problem is not well-posed, in that the solution does not depend continuously on the data. This means that small perturbations of the function Chapter 2. The Numerical Treatment of Integral Equations 14 g(t) can be reflected as arbitrarily large changes in the solution f(s). The problem of existence and uniqueness can be illustrated by simple example. With K(s,t) = cos(s)cos(£) and 0 < s < 2TT (2.9) any given f(s) will produce a function g(t) that is a multiple of cos(s). This means that if an arbitrary g(t) is not of this form it will not be possible to solve for f(s). In addition for any solution f(s) that does exist f(s) -fasin(.s), for any value of a, is also a solution. An illustration of the ill-posed property of the system is as follows. The RiemannLebesgue lemma [5, 34] states that: For any square-integrable kernel, K(s,t), / K(s, t) sm(ns)ds —»• 0 as n —> oo Ja (2-10) This means that if h(s) is a solution to (2.7) then h(s) = h(s) + Csin(ra.s) for any constant C is also an approximate solution since (for large n), r / b K(s,t)[h(s) + C sin(ns)]ds -> g(t) as n -> oo. (2.11) Ja Large values of n = TV that give rise to a slight perturbation of the data f K(s,t)[h(s) b + Csin(Ns)]ds Ja = t K(s,t)h(s)ds Ja = g(t) + = Ja where e < C l m correspond to a solution h(s) where C may be arbitrarily large. = h(s) + ~ h(s) + C + C t Csm(Ns) K{s,t)sm(Ns)ds Chapter 2. The Numerical Treatment of Integral Equations 15 In order to solve (2.7) it is necessary to resolve the problem of how to choose, from the many solutions that are available, a solution that is reasonable. A solution will be accepted as reasonable if it is stable and will give a small residual, i.e., || J K(s,t)f(s)ds-g(t)\\_~0. (2.12) The class of allowable solutions must in some way be restricted. One approach is to require that the solution f(s) be smooth. Whether this smooth choice is the "best" is difficult to assess; it is, however, the exact solution to some small perturbation of the equation (2.7) and will have guaranteed stability, and so is a reasonable solution. The singular value expansion, (SVE), of the system (2.7) can be used to understand the problem of selecting a reasonable solution, [6, 19], and to give an analysis of the existence of a smooth solution. For any square integrable kernel K(s,t) there exist two sets of orthogonal functions {$;(£)} and {^(s)} known as the left and right singular functions respectively. Associated with these functions are scalars pi such that J K(s,t)Vi(s)ds = pi$i{t) and J §i(t)K(s,t)dt The pi are the singular values of the kernel K(s,t) = UiVi(s). (2.13) and since oo K(s,t)=Y,p V (s)<t> (t), i=l i l i (2.14) in order for the kernel to be L the sum (2.14) must converge and therefore zero is the 2 only limit point of the singular values, so that, pi —> 0 as i —> oo. From this oo </(*) = (2.15) t=i where 7/; = ($i,fir) (where (•, •) denotes the usual generalized inner product). The solution can then be expanded as the infinite sum /(0 = £ ^ ( 0 i=l ( - ) 2 16 Chapter 2. The Numerical Treatment of Integral Equations 16 This again illustrates the ill-posed property of the problem. The limit point of pi is zero so the convergence of (2.16) is not guaranteed. The singular functions $i associated with the small singular values tend to be highly oscillatory [6], and so highly sensitive to roundoff effects when forming the expansion (2.16). It is the inclusion of these as components of the solution that causes the instabilty. A condition under which convergence, and with it, stability, is assured, is given by Picard's Condition [34, 39] : The existence of a smooth solution f(x) £ Li to equation (2.7) where K(s,t) E Li is guaranteed provided oo £l^| <°° (2-17) 2 where 7/; = ($i,g)> and pi are the components of the singular value expansion of K(s,t). For a reasonable solution to exist then, the 77; must decrease to zero at a faster rate than the decay of the scalars /z; [1]. This will ensure the convergence of (2.17). The decay of the singular values is a function of the smoothing property of the kernel. The smoother the kernel is, the faster the singular values will decay. This increases the illconditioning and places more severe restrictions on an admissable solution. The integral equation (2.7) can be treated numerically by first discretizing it using some quadrature rule to produce a linear system (1.2). A set of n linear equations are thus created : n $2w K(s t )f{s )=g(t )ioii = l,...,n J i il j i i (2.18) where {wi} are the quadrature weights at the nodes s; £ [c,d]. The points U are chosen from [a,b]. This gives the linear system (1.2) formed by A = (aij) where a^ = WjK(si,tj) Xi = f(si) and hi = g(ti) ^ (2-19) (2.20) In the continuous problem infinitesimal changes in g(t) or K(s, t) may produce a finite Chapter 2. The Numerical Treatment of Integral Equations change in f(s). 17 This is reflected in the discrete version by the inherent ill-conditioning of the matrix A (in the sense detailed in §1.1). 2.2 The Discrete P r o b l e m The linear system Ax = b with A 6 3 R nXn (2.21) and b £ 3? that is equivalent to the continuous problem (2.7) is created n by discretization of the integral equation by a chosen quadrature method. (The linear system has the properties discussed in §1.1.) The purpose here is to investigate the quality of a numerical solver for general ill-posed problems, so the exact solution sought is taken to be that of the discrete problem (2.21). How accurately the solution to the discrete problem x matches the continuous function f(s) is a separate consideration that involves the truncation error introduced by the discretization. Any quadrature method with reasonably small truncation error will be acceptable and it will result in an ill-conditioned system of equations. Different discretizations do not lead to significant differences in the underlying ill-conditioning of the matrix, only in the accuracy with which the discrete solution resembles its continuous counterpart. The use of a quadrature method to discretize the problem may result in a system such that the data vector b is not in the range of the coefficient matrix, i.e., b £ R(A). In this case it is usual to solve the problem by computing the solution of minimum residual norm, or the Linear Least Squares solution. min||Ax -b\\\. (2.22) The singular value decomposition of the matrix A can be used to demonstrate the similarities between the two problems (2.21) and (2.7). The continuous theory can then Chapter 2. The Numerical Treatment of Integral Equations 18 be used to motivate an appropriate solution strategy for all numerical ill-conditioned problems. The existence of singular values and their associated singular vectors in the continuous problem is reflected in the discrete problem in the decomposition of A as UT,V , where T S = diag(a ,... ,o~ ) U = [ui,...,u ] V = [v ...,v ] 1 n n u n and Ui,Vi are the left and right singular vectors of A, respectively. (To avoid scaling difficulties it is assumed that the matrix A is scaled so that o~ = 1.) This is the standard x singular value decomposition, (SVD) (see, e.g., [16, 15]) of a matrix. The vectors U{ and Vi are discrete approximations to the continuous functions $; and $ ;. The singular values {cr;} correspond to the constants /z; of the continuous problem and therefore will decay to zero. The matrix A is thus inherently ill-conditioned as defined in §1.1. The solution x is the finite equivalent of the sum (2.16) x = —Vi i=l (2.23) °* In forming the sum (2.23) numerically, the small magnitude of the singular values may cause roundoff effects to associate incorrect weights with the "later" singular vectors. This means that the troublesome effects of these vectors as components of the solution become increased. The properties of the singular value expansion of (2.7) and the singular value decomposition (2.23) of (2.21) are summarized in Table (2.1) to emphasize the correspondence and similarities between the two. With the strong connection between the analysis of the two problems and the similarity in the difficulties associated with their solution, it is natural to assume a discrete Chapter 2. The Numerical Treatment of Integral Equations SVE SVD K{s,t) = _2Zi H&Vi Pi > P2 > • • • > 0 for i = 1,2,... i CT2 . . . > O > 0 for i = 1, 2, . .. , n ujuj = S^ vfvj = Sij Avi = OiUi A Ui = (TiVi O"! > = SK{s,t)9 {t)dt J K(s,t)$i(s)ds 19 = m$ (t) = Pi^i(s) i n T Table 2.1: Table emphasizing the relation between the S V E and the SVD analogue of the Picard Condition given in (2.17). This condition can be used to specify the existence of a smooth solution to the discretized system of equations. The Picard Condition (2.17) requires that the ratio 1 — 1 decay to zero as the index, i increases. This is assured by the relative decay of r/; and / i ; , so | —| —> 0 as i —> oo. Mi Once discretized the summations, shown in the table, become finite. discrete quantities 7; = ufb In practice the do not decrease to a value below the noise level of the system (unlike the continuous equivalent rji = ($;,5r)). The singular values o";, however, continue to decay to zero. The criterion under which the solution to a system (2.21) will be deemed reasonable can therefore be given as a version of Picard's Condition in discrete form [40]. It is required that the solution depend continuously on the data: i.e., that it be insensitive to noise or small perturbations of the system. A reasonable solution is one such that the residual is small for an approximation that is smooth, which mathematically can be stated as: \\Ax - b\\\ = 0(e) and \\x\\ = 0(1) 2 2 (2.24) for noise level e. Expanding this in terms of the SVD gives \\Ax-b\\l = Y.{oivJx - f 1{ i=l (2.25) Chapter 2. The Numerical Treatment o f Integral Equations and 11as||2 = n J2( i ) v x 2 20 (- ) 2 26 i =l Thus if x is to be a reasonable solution given that for some value m the scalars 7; have a magnitude approximately that of the noise of the system, i.e., J7;j ~ e for i > m, then the previous 7; must be of the order of the associated singular values, i.e., l7i| = O(o-i) for i < m. (2.27) The condition above (2.27) requires that the rate of decrease of 7; to a magnitude of the order of the noise level of the system (i.e., 7; —> e for i < m) be faster than that for the singular values <T;. This is a discrete form of Picard's Condition and its use assures the existence of a computable smooth solution. (A solution may exist even if this does not hold but it is not guaranteed to be smooth, and some other approach is necessary.) The condition (2.27) can be used as a strategy to compute the solution to (2.21) by employing a truncated form of the SVD. Analysing the quantities 7, = ufb and cr; for z = 1, . . . ,71 (which are readily accessible once the SVD has been computed) even in cases where the noise level e is not precisely known, a value m can be selected with |7i| ~ e, m +1< i < n (2.28) i = l,...,m (2.29) t=m+l,...,n (2.30) Then the ratios | —|, are all of "reasonable" magnitude, whereas | — I, grow rapidly in size. In forming the full solution (2.23) large errors are accumulated as the summation proceeds. The division by small numbers (2.30) causes large factors to Chapter 2. The Numerical Treatment of Integral Equations 21 be associated with the (oscillatory) singular vectors. A smooth solution is guaranteed if the expansion is truncated before the destructive components are included x™=£^v . (2.31) l i=l 1 This is the method known as the Truncated SVD see for example, [8, 38]. If the exact solution to the system (2.21) can be expressed in terms of the first few singular vectors only, this method of truncating the SVD expansion is extremely accurate and the ill-conditioning of the coefficient matrix is of no consequence. If the noise level is unreasonably high, or the kernel particularly smooth so that the singular values decrease very rapidly, the value of m may be extremely low. In this case, although the computed solution is smooth, the residual may be unacceptably large. When the magnitude of m is reasonably large the calculation of a smooth, relatively accurate solution (in terms of a small residual) is easily obtained. This will give a solution to a slightly perturbed system Ax=b where A = UTiV T (2.32) and £ = diag(a , . .. cr , 0, .. . , 0). The cost of forming the solution 1 m in this way is n operations for the decomposition plus mn 3 2 operations to perform the summation. The smoothness property is guaranteed by selecting m using the condition (2.27). The expansion of the solution is assumed to be in the singular vectors of the coefficient matrix. The success of this approach is based on how well the solution can be expressed in terms of the first few singular vectors. Other sets of orthogonal vectors may be used if the solution is known to be expressable in some other form; a truncated form of the QR method is simple to apply, see for example [39],[8]. (The QR method is described in most linear algebra texts, e.g., [15, 16].) Chapter 2. The Numerical Treatment of Integral Equations 22 Here the discrete Picard Condition, (2.27), is used both as an indication of the existence of a stable solution and as the strategy for creating it. A n alternative strategy can be derived by using the Picard Condition only to show existence of a reasonable solution. The smooth solution may be calculated by restricting the class of admissible solutions by means of a constrained minimization problem. The stability is guaranteed by either minimizing the residual whilst restricting the solution to be smooth: min || A c — b\\\ such that ||a;||| < £ (2.33) or by minimizing the instability whilst admitting as possible solutions those that result in a small residual: min 11x11| such that ||Ac — i | | | < 8 (2-34) for small values e and 8. These two problems (2.33) and (2.34) can be restated as one; mm(||Ac - 6||| 4- A||x|||) (2.35) where A is a small parameter. In this way the ill-conditioned problem is transformed into a well-conditioned form, and can be solved by standard methods. The analogy with the continuous theory is again evident since (2.35) is, in some sense, the discrete equivalent of the Fredholm integral equation of the second kind g{t) = \f{t)+ I" K(s,t)f(s)ds Ja (2.36) where g(t) =K*g(t) K{s,t) =KK*{s,t) = [ Y(s~J)g(s)ds Ja = [ K{s,t)K{s,t)ds Ja b (2.37) b (2.38) Chapter 2. The Numerical Treatment of Integral Equations 23 In cases where the integral equation is of the second kind (2.36) the problem is well-posed see [5, 6] and [9]. The corresponding algebraic system is of the form (/ — XA)x = b and unless A is very large the near singularity of A is not a source of difficulty. The solution of (2.21) by solving instead (2.35), or (2.32), are examples of the use of the technique of Regularization. The following section outlines the method of Regularization and discusses some of the ways in which it is implemented. §2.3.1 examines the numerical solution of (2.35) and §2.3.2 presents an alternative approach to Regularization which is motivated by the method of truncating the S V D (2.31). 2.3 The R e g u l a r i z a t i o n Techniques term Regularization is used in a broad sense to mean the replacement of the ill- conditioned problem by a well-conditioned system that is made close to the original by means of a small parameter. There are two basic approaches: • To alter the specification of the problem by means of a constraint and then apply standard solution methods to the new well-conditioned system (2.35). The parameter is used to control the contribution of the constraint. • To solve the original least squares problem (2.22) using a modified iterative method to give a sequence of less and less regularized estimates. The regularization parameter is provided by the iteration count. Both methods try to select a smooth solution that gives a small residual. The parameter is required to be small so that the modified system resembles the original problem. However, the larger the value is taken for the parameter, the smoother the computed solution will be. Thus the parameter represents some trade-off between Chapter 2. The Numerical Treatment of Integral Equations accuracy (in terms of the residual) and smoothness. 24 The choice of the value of the parameter for optimum results is non-trivial, see for example [43]. A straight-forward approach is to make the choice interactively by solving the system for various values of the parameter and selecting the "best" solution. The first approach is presented in §2.3.1 and the iterative approach to solving the original, minimum norm, problem is dealt with in §2.3.2. 2.3.1 Regularization of the P r o b l e m The original method of regularization was introduced by Tikhonov, [37], and Phillips, [28]. In its general form a smoothing operator is included within the constraint and is stated as: min(||Aa!-6|| + A||Lx|| ), 2 (2.39) 2 where A is a small parameter and L some chosen operator. L is usually some discrete approximation to a differential operator so that for example, Lx = x,x,x. The above system (2.39) is well conditioned and will have a unique solution provided the intersection of the null-space of A and L is trivial, i.e., N(K) n N(L) = 0. Regularization in Standard Form. With the operator L taken as the identity so that Lx = x the problem (2.39) is often referred to as being in standard form. The solution can then be performed using the associated normal equations (A A T + XI)x = A b. T (2.40) If the S V D of A is obtained (2.40) can then be written as V(t )V x 2 T = VW b T (2.41) Chapter 2. The Numerical Treatment of Integral Equations 25 where U, V and S are the matrices as from the SVD and £ is a diagonal matrix with non-zero entries as {<T,- + A}. The solution is the expressable as the sum «=i i+ u A The expansion (2.42) is given in the same set of vectors as the truncated form of the SVD given above (2.31). The difference is that all the singular vectors contribute to the solution but the contributions are damped (from the exact solution (2.23)) by a factor wU- < » 2 43 This results in a smoother transition of the weights than in the truncated form of the SVD (2.31). Once the singular value decomposition has been performed the solution can be efficiently obtained a number of times for different values of the parameter A, and the best one selected. Regularization in this standard form performs similarly to the method of truncating the singular value decomposition see [39]. The M e t h o d of T i k h o n o v R e g u l a r i z a t i o n The choice above of the operator L as the identity is not necessarily optimal. The stability of x can be further ensured by the inclusion of the smoothing operator as part of the constraint. The associated normal quations are then (A A + XL L)x T T = Ab T (2.44) An expansion solely in terms of the singular vector of the matrix A is no longer possible. Various suggestions have been given to produce an effective solution method to avoid the formation of the factors A A 1 and L L. T Chapter 2. The Numerical Treatment of Integral Equations 26 [8, 16] and [38] demonstrate the use of the generalized singular value decomposition of the matrix pair (A, L) so that A = UD W- x (2.45) L = VD W~ (2.46) d l e where U and V are orthogonal, D and D diagonal and W causes the diagonalization d e of (A, L). This gives the solution as f-\ di + Aei/di In [27], a bidiagonalization algorithm is presented as a more efficient method of performing the solution to (2.39). Use of Iterative M e t h o d s The methods above are effectively direct methods for evaluating a solution to the wellconditioned problem (2.39). Solution of (2.39) may also be performed using iterative techniques. The normal equations may be written in factored form as: (A T XL ) T ( A \ x = A b. T (2.48) V XL J Most iterative techniques can then be applied to this factored form of the equation, see [8, 41, 45]. The performance of the iterative scheme is still very sensitive to the condition number of A. This is in contrast to the direct approaches above where the illconditioning of the coefficient matrix is no longer a source of trouble. The performance of the iterative technique may be improved by some preconditioning of the equations before they are solved. This may be done by the introduction of a scaling matrix to produce a coefficient matrix with an improved singular value spectrum. However, this has the effect Chapter 2. The Numerical Treatment of Integral Equations 27 of accelerating the computation of an accurate solution and in so doing may "miss" the smooth approximation: it is not obvious whether the use of preconditioning is advised for any particular ill-conditioned system [8]. The use of iterative techniques for the regularized problem is also inhibited by the presence of the parameter. For each new value of the parameter the entire solution process must be completed and thus is wasteful in computational effort. The use of an iterative method as a solution strategy for the problem (2.39) is thus limited to very large, sparse systems, where the use of direct methods is prohibitive. 2.3.2 R e g u l a r i z a t i o n of the Solution : Iterative F i l t e r i n g In §2.3.1 the solution of ill-conditioned problems is achieved by constraining the equations by solving instead the equation (2.39) for small positive values of A, i.e., A > 0. The use of iterative solution techniques is limited. If the parameter A is set to zero (thus requiring the solution of the original Least Squares problem) iterative techniques may be used successfully. The regularization is achieved by terminating the iteration before the components corresponding to the small singular values have converged. The effect is to discriminate against the unstable part of the solution by filtering the contributions of the singular vectors. The truncated form of the SVD in §2.2 is, in some sense, an example of this Iterative Filtering technique. As the iteration proceeds a sequence of less and less regularized estimates of the solution is produced. The extent of the regularization is therefore controlled by the choice of iterative method and the (hopefully small) number of iterations performed. In the truncated form of the S V D the singular vectors TJ; appear as components of x m either with their full weighting, as in x* (2.23) (for i < m) or not at all (for i > m). In Chapter 2. The Numerical Treatment of Integral Equations 28 this way the truncated S V D expansion can be expressed as x m = J2H(i)^ i=l *i (2.49) Vi where the weight function H(k) is, { 1 k = 1, . . . ,m 0 k = m + 1, . . . , n Another strategy is to allow all the singular vectors to contribute by a weight function F(o~) that discriminates against the singular vectors associated with the small singular values. The function F(o~) is such that F(cr) 1 for a "large" 0 for cr "small" and has a smooth transition for the intervening values of cr. A general iterative technique is given by Strand [36], x = x_ k k + J\4(A A)A (b T x T — Axk-i) where A4(o~ ) is some polynomial. The initial estimate x 2 0 (2.50) = 0 is assumed but is not restrictive. The solution to (2.50) can be written in terms of the singular value decomposition as: »fc=E^fc(^)-«x, i =l (2.51) 1 where ^ (cr ) = l - ( l - c r A 4 ( c r ) ) . 2 2 2 f c fe (2.52) (2.51) can then be compared with (2.23). The effect is to damp the component along the singular vectors V{ by an amount ^.(cr?). J-(-) is the Filtering Function. Different choices for the filtering function result in different solution schemes. Chapter 2. The Numerical Treatment of Integral Equations 29 A special case of (2.50) is given by Landweber, [24], where a simple iterative strategy is applied to the normal equations so that x = JBfc-i +A (b-Ax ^) (2.53) T k with A normalized so that o 1 k = 1. (2.53) implicitly takes M.(cr ) — 1. This scheme 2 proceeds relatively slowly since after k iterations only the components corresponding to singular values such that o~i > l/y/k have converged. [17] gives a study of various forms of (2.50) where T is taken as some continuous approximation to the Heaviside step function: 0 z =0 1 otherwise A series of possibilities using varying degrees of the best least squares fits and Chebyshev polynomials as the approximating function are investigated. This thesis investigates the "filtering" properties of the iterative method of Conjugate Gradients and thus assesses the success with which it can be used as a method for the solution of ill-posed problems. The next chapter details the method of Conjugate Gradients before examining its use as a regularization procedure in Chapter 4. Chapter 3 T h e M e t h o d of Conjugate G r a d i e n t s The method of Conjugate Gradients is an efficient algorithm for computing the solution to a system of linear equations Ax = b where 6 G 3?" and A £ 3f? nXn (3.54) is assumed to be symmetric positive definite. The method was developed independently by Hestenes and Stiefel [22]. The algorithm is based on the process of locating the minimum of a quadratic function over decreasing dimensional planes and can be easily implemented as a repeated series of simple steps. Each iteration requires the formation of just one matrix-vector product and provides an improved estimate of the required solution. Mathematically, the solution x" = A~ b l will be evaluated in at most n repetitions and so is a direct solver. Unfortunately, computationally the effects of numerical roundoff are unfavourable. Although ineffective as a direct scheme (as it was originally proposed) a good estimate of the solution is obtained early in the process, regardless of roundoff errors and hence the method of Conjugate Gradients can be used as a successful iterative procedure. Considerations of the convergence rate led to further development of the Preconditioned Conjugate Gradients. This chapter gives an outline of the method of Conjugate Gradients. The method of Conjugate Gradients belongs to the general class of Conjugate Directions methods, which is motivated by the theory of function minimization. §3.1 uses the minimization process to develop the method of Conjugate Directions. The development begins with the method of Steepest Descents. A n examination of the drawbacks of this 30 Chapter 3. 31 The Method of Conjugate Gradients simplest method lead to the Conjugate Directions algorithm. The Conjugate Gradients algorithm itself is discussed in §3.2. §3.2.1 derives the scheme as a special instance of the Conjugate Directions algorithm and §3.2.2 outlines some of the features of the method which will be required in Chapter 4. The effects of roundoff errors and the use of the method as an iterative scheme, are discussed in §3.2.3. Finally, an estimate of the convergence rate of the scheme is mentioned in §3.2.4 and used to motivate the idea of a preconditioned algorithm. The preconditioned algorithm is discussed in more detail in §3.3. §3.3.1 presents Conjugate Gradients as a method based on a splitting of the coefficient matrix, which leads to a discussion of the role of the preconditioner. Various choices for the preconditioner are mentioned in §3.3.3. Finally, §3.4 extends the algorithm to cases for which the coefficient matrix is not symmetric positive definite. A n algorithm for the Linear Least Squares problem is discussed in §3.4.1 and the preconditioned algorithm is presented in §3.4.2. 3.1 T h e M e t h o d of Conjugate Directions The solution of the linear system (3.54) and the Conjugate Direction algorithm itself can be motivated by considering the minimization of the quadratic function <f>(x), defined by d>(x) = l/2x Ax T - x b. T (3.55) Necessary conditions for the existence of a local minimum for <j> (3.55) at x~ are given by: 1. V(j)(x*) = 0 2. V d>(x~) is positive definite. 2 Hence with A = V </> symmetric positive definite, locating the minimum of (3.55) is 2 thus equivalent to solving Vd>(x) = Ax — b = 0. The problem of locating the minimum Chapter 3. The Method of Conjugate Gradients 32 of (3.55) is equivalent to the solution of the linear system (3.54). One of the simplest iterative methods for calculating the location of the minimum x* of <p is the method of steepest descents. At the point x , 4> decreases most rapidly along the direction of the k negative gradient of <p - V<j>{x ) = b- Ax . k (3.56) k The vector r = b — Ax is the residual of the system at x . In the method of steepest k k k descents the next estimate x +i is chosen as the point which minimizes <b along r k k) and is determined by choosing a step length parameter a which minimizes <f>(x -\-a r ). This k k k k gives a step length of a k = 4lh- (- ) 3 57 The speed at which this strategy converges to x* can be prohibitively slow. The search directions are along the inward normals of the (n — 1)-dimensional ellipsoids, centred at x" which form the level surfaces of cf>. The singular values of A are precisely the length of the semi-axes of these ellipsoids. For ill-conditioned problems where the ratio of largest to smallest singular value is large, the level surface ellipsoids are very elongated and relatively flat valleys with steep sides. The method of steepest descents requires that at each iteration the new search direction be perpendicular to the last by making a rightangled turn at the point where the search direction becomes tangent to the contour lines. This forces the minimizer to be located by a series of many short steps that proceed back and forth across the valley floor. (The system does not have to be very ill-conditioned before this can be destructive. The presence of one singular value that is much smaller than the rest can produce the elongation and force slow progress.) This slow progress can be overcome by searching instead along a set of directions that do not correspond to the negative gradient of <p at each point. One strategy, used by the method of Conjugate Directions, is suggested by the following result [21]: Chapter 3. The Method of Conjugate Gradients 33 ffx is the unique minimizer of <p on the plane defined by the set of k mutually conjugate vectors i\k — {pi,... ,pk} then the minimum of <p is located at x' which is contained within the (n — k)-plane ir -k defined to lie through x and conjugate to 7Tj.. n This result means that the dimension of the space, in which i is known to lie, can x be reduced. The notion of conjugacy is an extension to the idea of orthogonality. Two vectors u and v are G-conjugate if the vector u is orthogonal to the vector Gv, i.e., u Gv = 0. (This T leads to the term G-orthogonal sometimes being used.) Unless otherwise specified, in the case where the matrix G is the coefficient matrix for the system (3.54), the condition A-conjugate will simply be stated as conjugate. The minization of (f>, by the method of Conjugate Directions, is performed by systematically reducing the dimension of the search space as follows: • Select a point x and a line L\ through x in the direction pi] 0 0 • Find Xi that minimizes (j> on L\ exactly; • Construct the (TI — l)-plane 7r _i through X\ and conjugate to p\. n By the above result, 7r _\ contains x", the location of the minimum of c4 (thus the n dimensionality of the problem has been reduced by one). This process is repeated starting with the point X\ and confining the search to the (n — l)-plane 7r _x. n • Select a line L in 7r _ through Xi in the direction of p G 7r _ ; 2 n 1 2 n 1 • Find x that minimises <f> on L exactly; 2 2 • Construct the (n — 2)-plane 7r _ through x and conjugate to p . n 2 2 2 Chapter 3. The Method of Conjugate Gradients 34 Now x~ is known to be contained in 7r _ . ri 2 In this way an iterative procedure is created. At each step the dimension of the search space is reduced by one. At the n-th step the search space Ti is a line L . The location of minimum of <p on L , n n is exactly x" the location of the global minimum, as required. Hence the termination of the procedure is guaranteed within n steps. The termination of the minimization process within n steps is independent of the initial point x . For convenience it is usual to choose 0 XQ = 0 as the starting point, but this is not necessary. (If an improved estimate of the solution is available this should be substituted.) Constructing the planes 7r as described above does not produce an effective numerical fe algorithm. However, the requirement that the next search direction Pk+i be in the (n — k)— plane 7r _A. (which contains a;") is equivalent to chosing Pk+i to be conjugate to all n the previous search directions {p ... [21]. This suggests the Conjugate Directions ,p } 1; k method. The iterative step for the method of Conjugate Directions can be defined as: • Select pk to be conjugate to • Find Xk ... p -i}; } k that minimizes <f> on the line Xk-\ + ctkPk exactly. Each value of a is found, as in the method of steepest descents, by differentiating k 4>{xk-\ + ctkPk) with respect to ctk and equating the result to zero. This yields the step length as: T a k = g ^ , (3.58) which is always well-defined since A is assumed to be a symmetric positive definite matrix, so by definition [u, Au) j= 0 for any vector u. As presented, there is no explicit specification for the choice of the search directions Pfc, but reducing the magnitude of (j>(xk) at each step imposes a restriction on p - The k Chapter 3. The Method of Conjugate Gradients 35 value of <p at each step is given by <j)(x ) = <j>(x -l + OLkPk) k k = then with the value of a k <i>(xk-i) + a pl(Ax -i k -b) + k l/2a plAp 2 k k as given in (3.58), this gives <t>{ ) = <p{ - ) - i / 2 " v , : Vk Pk Xk Xk x Thus to ensure a reduction in <p, the direction p k k ( - ) 3 59 must be chosen such that it is not k orthogonal to the current residual, i.e., p r _i • A ^ 0. A practical algorithm for the method of Conjugate Directions is given as follows : Conjugate Directions A l g o r i t h m 0 = b — Ax k ^ 0 ^ 0 Set x = 0 and r 0 For i = 1,2,... do while r choose a direction p such that p^r _ k k x x k 0 = x -i k + ct p k k (3.61) The steps above form the basis for the taxonomy of algorithms known as Conjugate Direction Methods. This general class of algorithm does not place any requirement on the search directions, other than requiring that they not be orthogonal to the residual at each step and that together they form a mutually conjugate set. Specific Conjugate Direction schemes are distinguished by the method that is used to evaluate the search directions. The next section details the method of Conjugate Gradients in which each search direction is formed as an orthogonal projection of the residual. Chapter 3. The Method of Conjugate Gradients 3.2 36 T h e M e t h o d of Conjugate Gradients The general method of Conjugate Directions described in the previous section allows some choice over the method of selecting and computing the search directions. Various choices lead to specific methods. The original method of Conjugate Gradients was suggested by Hestenes and Stiefel. They independantly formulated a scheme in which the search directions are formed by A-orthogonalization of the residuals. A full description of this scheme, its properties and variations, is given in [22]. §3.2.1 §3.2.2. presents the algorithm itself, while some of its properties are mentioned in The use of the algorithm as an iterative scheme is outlined in § 3 . 2 . 3 and its convergence rate is examined in § 3 . 2 . 4 . 3.2.1 T h e C o n j u g a t e Gradients A l g o r i t h m The method of Conjugate Directions specifies only the space from which search directions must be selected. Each new search direction is to be conjugate to all previous search directions, which is equivalent to requiring that Pk+i G span{Api, ... Apk} (where S 1 ± [16],[21] is the orthogonal complement of the set S.) Recall that at any point the direction in which the function shows most rapid decrease is that of the negative gradient at that point. The conjugacy requirement for the direction vectors in the the Conjugate Directions method was introduced to reduce the problem of the slow convergence of the method of steepest descents. However it could be advantageous to use as much of the information from the residual direction as possible. One way to achieve this is to make the search direction the orthogonal projection of the residual onto the space containing the admissable search directions. (This is in one sense making the search direction as close as possible to the direction of negative gradient.) Chapter 3. The Method of Conjugate Gradients 37 This causes any new search direction to be conjugate to the residual r%Ap = 0 k+l (3.62) and gives the method of Conjugate Gradients its name. The method of Conjugate Gradients selects the first search direction as the negative gradient and thereafter each new search direction p +i is the orthogonal projection of the k corresponding residual r onto span{Ap\,... , Apn}- . 1 k The Conjugate Gradients algorithm is simply a modification of the general Conjugate Directions algorithm which reflects the above considerations. • Set x = 0; 0 • Compute pi to be the vector of steepest descent; • Obtain the minimum x of <p on the line through x in the direction of p\ . Compute x 0 p to be the orthogonal projection of r on to 2 x span{Ap }^] x • Obtain x = m i n <b(xi -f ap ); 2 a 2 • Iterate until at the rz-th step, x n will be the required minimum x*. The precise calculation of each direction vector is possible from the orthogonality and conjugacy requirements. From (3.62) each new search direction is contained in the space described by the last search direction and the residual so that p £ span{r _ ,p _ } k so without loss of generality p can be represented as p — r _ k k k x k 1 k 1 [16], + /3 pk-i, for some (3 . It k k is required that each new direction be conjugate to any previous search direction so that P*_i4Pfc = o = Pk-i (r -i A k +/3 pfc_i). fe Chapter 3. The Method of Conjugate Gradients 38 Therefore o @ k Vk-\Ark-\ T A = ,3 • v I' ) 63 This formula for Q can be simplified by using the expression for the step length (3.58) K and the recurrence relation for the residual r = r - -a Ap , k k 1 k (3.64) k and the fact that each search vector must be orthogonal to all future residual directions. Together these give IK-i|| 2 = (r -2 - a -i Apk-ifr^ = rl_ r _ = -a _ pf_ Ar _ k (3.65) k 2 k fc x 1 - a ._ipf_ Ar _ A 1 1 fc 1 fe 1 (3.66) (3.67) And PfcVfc-i = {r -i+BkPk-i) Tk-\ (3.68) = ('2>fc-i)- ( -69) T k 3 Then by substituting (3.69) into (3.58) gives (3.71), and substituting (3.67) into (3.63) with (3.71) gives (3.71) The expressions (3.58) and (3.63) can then be rewritten as: X AFC and/3 fc = -PUPT = | ^ ^ . \\ k~2\r ( 3 - 7 0 ) (3.71) r The steps for the Conjugate Gradients algorithm can now be presented in a form that is practical for programming purposes. Chapter 3. The Method of Conjugate Gradients 39 Conjugate Gradients A l g o r i t h m . Set x = 0 r 0 — b then for k — 1, 2, 3, . . . ,n 0 (3 ~ (/3 k-2 k-2 Pk = Tk-i + PkPk-i T k = T l T k T " k x k r k 1 T 1 = (3.72) (3.73) 0) T (3-74) = ( - x ^i + cx p - r _! - a Ap . k fc k k 3 ' 7 5 ) (3.76) k (3.77) k The formulae for a,/3 and r can be written in various forms (see [22] and [29]) but the forms given above are preferred for implementation [29]. The algorithm above appears as it was first presented. Despite its mathematical beauty and computational simplicity the algorithm was not popular, as in practice it proved to be uncompetitive with other available schemes. The next section discusses the source of the problems and shows that when the method is regarded as an iterative scheme (rather than as a direct scheme) it can be considered as a viable solution method. 3.2.2 Properties of the M e t h o d of C o n j u g a t e Gradients Theoretically the Conjugate Gradients method presented above possesses favourable qualities. It involves at most (in exact arithmetic) n repetitions of a series of simple steps. The coefficient matrix A is not involved in any decomposition process and is required only for the single matrix-vector product performed in the calculation. Any special properties of the matrix A (e.g., banded structure) can be fully exploited reducing the computational effort per iteration, and the storage requirements. This is of particular importance in cases where A is large and sparse. Practically, though, the algorithm's performance falls short of these high expectations. Chapter 3. The Method of Conjugate Gradients 40 T h e Residuals and Search Directions The residuals r are the search directions used by the method of steepest descent, but the k Conjugate Gradients algorithm uses the search directions as the conjugate vectors pi to produce an improved performance. In the algorithm (3.77) the particular conjugate vectors, formed by orthogonal projections of the residuals, create a set of relations between the residuals and the search directions. By definition the direction vectors {p , ... ,p } t n form a mutually conjugate set, and the residuals {ro, . . . ,r } are mutually orthogonal. In n addition pfrj = 0, for i < j. We have also that span{pi,... span{b, Ab, . .. , A ~ b}. k 1 ,p } = span{r ,... k 0 ,r -i} k = So in summary: (r rj) =0 i^j ( ,A ) Pj =0 i±j (Pi,rj) =0 i<j (Pi,rj) = ||r\-|| it Pi i>j. 2 T h e Parameters a and 8 The two parameters a and 8 used in the method of Conjugate Gradients are calculated by known formulae (3.58) and (3.63). No a priori knowledge of the system, or other user interaction is required. This is in contrast to many other methods (e.g. SOR and Chebychev iteration) which require parameters to be estimated in order to attain optimal efficiency. The parameter 8 is the square of the ratio of the last two residual L norms. If at k 2 each step the error e is defined as k e = x* - x k k (3.78) Chapter 3. The Method of Conjugate Gradients 41 and the inner product E{e ) = (ek,Ae ), then k k E(e ) k = (aj" - x) A{x' = H^-^IU = x" Ax" - 2(j>(x ) T - x) (3.79) k (3-80) (3.81) T k and the minimization of (3.55) is equivalent to the minimization of the A-norm of the error. This means that the method of Conjugate Gradients ensures the reduction in magnitude of \\x" — Xk\\^ at each step and therefore guarantees that an improved estimate to a;" is obtained at each iteration. The reduction in length of the error vector does not guarantee a monotonic decrease in the magnitude of the residuals, so no bound can be predicted for the /3fc's. The step lengths a k are such that their reciprocal is bounded between the largest and smallest singular values of the matrix A [22]. With the singular-values ordered cr > .. . > a , 1 n — < a < —. (3.82) k For well conditioned systems a reasonable indication of the sizes of the a 0~i k can be es- timated but when — is large the oik's are effectively unbounded. However, since it is assumed that the matrix A is symmetric positive definite, each step length is positive, i.e., a*. > 0 for k = 1,. .. , n. Finite Termination The termination of the method within a finite number of steps is guaranteed using exact arithmetic. The direction vectors {pi, . . . p \ form a set of k linearly independent vectors, k so the full set of n direction vectors is therefore span{pi,... that minimizes (j> o v e r span{pi,... ,p } n = 3?" . An algorithm ,p } must (in exact arithmetic) locate the minimum n Chapter 3. The Method of Conjugate Gradients 42 x". In addition, if A is such that it possesses only t distinct, non-zero, singular values then the solution can be expressed as a combination of t independent vectors and the method will locate the minimum in at most t iterations [22]. 3.2.3 Conjugate Gradients as an Iterative M e t h o d Following the publication of the original paper detailing the Conjugate Gradients algorithm very little interest was shown in the technique. This neglect stemmed from the fact that in a practical situation its major feature, the finite termination property, failed to hold. The algorithm was not competitive with such schemes as Gaussian Elimination with regard to both accuracy and efficiency, and it fell into disuse. However the method's other properties were promising and this led to a later re-evaluation of the scheme [29]. In [29] extensive tests are performed which demonstrate the effects caused by roundoff errors. Errors due to roundoff are propagated in the formation of the vector products (rfc_i,r-fc) and (Apk-i^Pk) [29] so that the orthogonality conditions {ri,Tj) = 0 and (Api.pj) = 0 for * ^ j (3.83) no longer hold, and the guarantee of termination within n steps is lost. The propagation effects result in the accumulation of the errors, so that the component of r in the most k recent direction is accurate but earlier components are less so. This means that the method of Conjugate Gradients is useful as a direct method only for systems of very small order. One way to minimize the effects of roundoff errors is to restart the process once a reasonable estimate has been obtained [22]. The guarantee of the monotonic decrease in the length of the error ||a;* —a; || (from (3.79)) justifies the use of any calculated estimate fc 2 as an improved starting point. However, too many restarts would destroy the advantages of the process over that of steepest descents, and is not in general recommended. Chapter 3. The Method of Conjugate Gradients 43 Even when roundoff affects the performance of the algorithm a reasonably accurate estimate of x' is obtained long before the nth step [22]. Reid found (for relatively wellconditioned systems) that the guaranteed reduction of ||ejt.|| held despite the effects of 2 roundoff. This observation led Reid [29] to propose the scheme's successful implementation as an iterative process to generate approximate solutions, and indeed (3.77) is an efficient and competitive procedure compared with other iterative schemes such as Chebyshev or Jacobi iteration. Using economical procedures to perform vector products makes the method of Conjugate Gradients particularly effective for solutions of large sparse systems of linear equations. Implementing the algorithm as an iterative scheme is particularly ecconomical as it requires 5n + 2 operations and one matrix-vector product (which will involve a maximum of n multiplications) per iteration. The product Ar 2 k can be stored each time to avoid unnecessary multiplications. The storage requirement for the scheme is therefore 4 nvectors (x,r, p and Ap). In the case where A has a favourable sparcity structure the operation count can be reduced considerably. In the evaluation of the Conjugate Gradients algorithm as an iterative scheme the critical issue is the number of steps required to reduce the error by a given amount, and thus the computational effort needed to obtain a reasonable estimate. A n examination of the rate at which the algorithm (3.77) converges follows in the next section. 3.2.4 Convergence R a t e A measure of the efficiency of an iterative solution method is given by its convergence rate. At each step the Conjugate Gradients algorithm produces a new estimate to the solution x" such that the error functional (3.79) is minimized. Various attempts have been made to specify a bound for this error and so give an estimate of the rate of convergence Chapter 3. The Method of Conjugate Gradients 44 see [12] and references therein. In [10] it is shown that - x \\A < Qk\\x'k x \\ 0 A (3.84) where Q depends only on the singular values of the matrix A. For a general matrix k A it is not possible to specify Qk exactly, but different bounds can be placed on the convergence rate by obtaining a value for Qk given some information about the singular value spectrum. A basic bound can be generated [10] assuming that an interval [a, 6] is known to contain all the singular values of A. Then Q, < (3.85) This is a reasonable bound for matrices where K (the condition number of A) is small, but severe ill-conditioning could lead to impractically slow convergence. In cases where more detailed information on the exact structure of the spectrum is available (e.g., clustering of the singular values) this bound can be improved see for example [4] and [30]. The convergence rate of an iterative scheme can often be improved by preconditioning the coefficient matrix. Preconditioning refers to the strategy of transforming the matrix in question to reduce its condition number, or to cluster or otherwise redistribute its eigenvalues. The technique is well understood and widley used for such methods as SSOR [3]. The method of Conjugate Gradients is also well suited to the idea of preconditioning. In exact arithmetic the number of steps required to converge to the solution x" is equal to the number of distinct singular values of the coefficient matrix, so the performance of the Conjugate Gradients Algorithm takes account of the full singular value spectrum and not just those on the periphery. Any clustering of the singular values, which would effectively reduce the number of distinct singular values, would therefore have a favourable effect Chapter 3. The Method of Conjugate Gradients 45 on the efficiency of the scheme. A n implementation of the Preconditioned Conjugate Gradients algorithm is given in the next section. 3.3 P r e c o n d i t i o n e d Conjugate Gradients Recognising the method as an iterative process permits the loss of orthogonality conditions (3.83) to be tolerated, but the question of convergence rate is then much more important. The error bound (3.85) suggests an unfavourable result for poorly conditioned systems. The notion of preconditioning the method of Conjugate Gradients is thus well motivated. There are two approaches to the idea : The process can be regarded as • Using Conjugate Gradients as an acceleration to another iterative process. • Preconditioning the sytem for Conjugate Gradients iteration. Theoretically the views are equivalent but, as is shown below, while the first gives a clearer understanding of the notion the second is used for practical implementation. §3.3.1 derives the preconditioned Conjugate Gradients method as a special case of a generalized two step iterative scheme based on a splitting. §3.3.2 gives a practical preconditioned algorithm and discusses the issues involved in the choice of the preconditioner. Some common preconditioners are mentioned in §3.3.3. 3.3.1 Generalized Conjugate Gradients A generalized two step iterative scheme based on the splitting of the matrix A = M — TY, where M is symmetric positive definite, can be formed by: Set £_i = Xo = 0 and ro Fork = b = 1,2, . . . or until ||r || < TOL fc Chapter 3. The Method of Conjugate Gradients Solve Mz _ = Choose parameters cu k x 46 6 - ( A f - N)xk-i = r_ k x and v -\ and set k k Various iterative methods can be described in this way, e.g., the Jacobi, Gauss-Seidel, and SOR schemes, [16]. Combining the scheme above into one equation gives Mz k = Mz _ k - w {v _ (M 2 k If the parameters u -\ and k - N)z _ x k + M ( z _ - z _!)) x fe are computed so that k zjMZJ 2 = (3.86) fc 0 for i ^ j then 3k < n such that z = 0, and so x — x". This description is equivalent to a generalized two step k k Conjugate Gradients method and will therefore converge in no more than n steps. The required values of the parameters v _ k and u x z _ {M - T k and w fc = (1 k x are: N)z _ k x v -\ zl- Mz _ 1 =—— ). V -1 Z ^2 k-2 k X k x Mz k k Since M is symmetric positive definite, it can be represented as M — CC T for some non-singular matrix C. Choosing a splitting A — M — N is thus equivalent to choosing a scaling matrix C. The system Ax = b then becomes the transformed system Ax=l (3.87) with A = C-'AC- , 7 x = C~ x, T b = C-H (3.88) (note that A is still symmetric positive definite). The Conjugate Gradients algorithm (3.77) can be applied to this preconditioned system. Chapter 3. The Method of Conjugate Gradients 3.3.2 47 P r a c t i c a l Implementation of Preconditioned Conjugate G r a d i e n t s Applying the Conjugate Gradients algorithm to the preconditioned problem (3.87) gives: Set Cx 0 = 0 and Q r = b then for k = 1, 2, 3 l 0 p = C~ r -i + 3 p -i l k k k k ZCC- AC-ip l P k Xk = Xk-l + OLkPk C~ r x k = C-Vi-fliC- ^-^. 1 Computing A is not advised as this would increase the expense and introduce unnecessary roundoff error. Generally, even if A has any sparsity structure (e.g. a banded matrix) A will not, and will require increased storage. The major part of the work per iteration is in performing matrix-vector products. It is possible in such cases that any advantage of routines written to perform the matrix-vector products Av efficiently will be redundant and the overall work per iteration for the preconditioned system may increase substantially. The additional cost of applying the preconditioner must be sufficiently small so as to not negate the savings produced by the increased convergance rate. A practical scheme is achieved by simplifying each line in the algorithm and setting M = C. 2 The extra matrix-vector products are reduced by setting z = M~~ r, which 1 then requires the solution of the system Mz k = r once per iteration. The algorithm k below is produced. P r e c o n d i t i o n e d Conjugate G r a d i e n t s A l g o r i t h m . Set XQ = 0 and r 0 Solve Mz k = b then for k = 1, 2, 3 , . . . and j|r-jt |j < TOL = r k Chapter 3. The Method of Conjugate Gradients f3k = 48 (ft = 0) k-2 k-2 z Pk = r zk-i + QkPk-i vl n A x - rk = k Xk-i + a pk h r _i-a Ap . k k k By preconditioning in this way it is possible to work with only the original matrix, and the sequence of updates is directly for x (and not x). The algorithm is still applied to a symmetric positive definite system so all the properties of the original scheme hold, but with an improved convergence rate. In the case where M = I (the identity matrix) the above algorithm reduces to that of the original scheme (3.77). The algorithm requires one solution of a linear system plus 5n + 2 opertions and one matrix-vector product (with a maximun of n operations) per iteration. Storage is 2 required for the vectors x,r, p, Ap and z (although z can share the storage with Ap). Routines may be written so the it is not necessary to store A or M in full matrix format. It is not at all obvious which choice of the preconditioning matrix C would give the optimum performance. The matrix C is selected so that K(A) < n(A), and any redistribution of the eigenvalues so as to cluster them would also have favourable effects. However, the matrix C is never required in the algorithm as it only appears in the form C 2 = M. As explained above, §3.3.1, preconditioning is equivalent to accelerating an iterative method formed by splitting the original coefficient matrix as A = M — N. The matrix M of the splitting and the matrix M that appears in the algorithm are in fact the same matrices. The question of how best to choose C is thus replaced by the problem of choosing M in the splitting A = M — N. In one sense it is possible to view the above algorithm as a direct method (applied Chapter 3. The Method of Conjugate Gradients 49 to a modified system Mz = r) imbedded within an iterative scheme. Different choices for M will alter the computational effort involved for the direct method. There are two opposing considerations governing the choice of the preconditioner M. • One philosophy is that M should be as close to the identity matrix as possible, ensuring the solution of Mz = r requires minimal extra work. • Alternatively, since the rate of convergence is related to the condition number of M~ A X (from the underlying iterative scheme), the ideal choice for M would be A itself. This choice would reduce the Conjugate Gradient algorithm to that of the method chosen to solve Mz = r, since at the first step r = b so the solution x" would be achieved immediately (but at the cost of solving the full system Az = 6). Thus M should instead be selected to be as close, in some sense, to A as possible. From the splitting A = M - N, A = I + C~ NC-\ (3.89) l so that the number of distinct eigenvalues of A is equal to the number for N. For the number of iterations to be as few as possible N should be like the null matrix and thus M should reflect A. Obviously the optimum choice involves some compromise. A short discussion detailing some of the common choices appears in the following section. 3.3.3 C o m m o n preconditioners A good preconditioning matrix M is one that achieves a favourable compromise between the reduction in number of iterations needed and the increase in the cost per iteration. A successful choice is somewhat problem dependent since the method of Conjugate Gradients is able to make effective use of symmetries and other structural properties inherent e Chapter 3. The Method of Conjugate Gradients 50 in the system. Many preconditioners have been proposed (see for example [25],[11], [12], [32],[44]). A comparison of the various choices is beyond the scope of this thesis, but for completeness some are discussed below. Since Preconditioned Conjugate Gradients has its basis in a splitting A = M — N this can provide some guidance on devising rapidly converging schemes. Any of the classical methods can be used. • Jacobi method Diagonal preconditioning is the simplest iterative method. M is taken as the diagonal of A so the solution of Mz = r at each iteration is just a simple division process. It requires no extra storage and an increase of n operations per iteration. This type of preconditioner is equivalent to accelerating the Jacobi iterative scheme by the method of Conjugate Gradients. • SSOR iteration The method of SSOR can be accelerated by Conjugate Gradients in a similar manner. With A - D + L + U the preconditioner becomes M = (D +u>L)D~ (D 1 +uL). This needs no extra storage since the solution Mz = r is performed directly using the required elements of A. • The method of I C C G Decompositions of A may not retain the sparsity structure in the factors and so there may be a radical increase in storage requirement. It is preferable to keep any banded structures that A may possess. The Incomplete Cholesky factorization was proposed as a preconditioner by Meijerink and Van der Vorst [25]. The strategy is to perform the Cholesky decompostion of A but to retain as zeros preassigned elements of L. The problem of fill-in is thus confined to some known area (e.g., Chapter 3. The Method of Conjugate Gradients 51 within the bandwidth of A). The decomposition is then a splitting A = LL T — R so that the preconditioner is the product of simple triangular matrices M = LL . T Various forms of the incomplete factorization arise from different choices of the elements required to be zero. The preconditioning matrix can also be based on the polynomial expansion of the inverse of A. This technique is particularly applicable to parallel computers see [32] and [44]. In this preconditioned version Conjugate Gradients is one of the most practical iterative solvers. The method is also of historical interest. Although direct and iterative methods generally form two distinct classes of solution schemes, the method of Conjugate Gradients falls into both and emphasizes the link between the two. It was conceived as a direct scheme but was soon rejected. Its properties, once re-evaluated, proved its worth as an iterative approach. The preconditioned version in one sense combines direct and iterative schemes in one algorithm. 3.4 Conjugate G r a d i e n t s for L i n e a r Least Squares The derivation of the algorithm, and the discussion of its properties, is valid only for a symmetric positive definite coefficient matrix. A modification to the original algorithm to deal with the more general case is described here. §3.4.1 presents an algorithm to solve the Linear Least Squares problem while §3.4.2 discusses the preconditioned version. 3.4.1 L i n e a r Least Square A l g o r i t h m For any matrix A the product A A T (A Av,v) T ^ 0 and (A Av,v) T is symmetric positive definite (This is valid since - (Av, Av) > 0 so that (A Av,v) T > 0). Hence in the Chapter 3. The Method of Conjugate Gradients 52 general case where A is not symmetric positive definite (or is overdetermined) the method of Conjugate Gradients can be applied (with A A as the coefficient matrix) to the Normal T equations A Ax = A b and be used to solve the Linear Least Squares problem. T T In the case of the Normal equations the error functional (3.79) is replaced by E(x--x ) k = (af - x ) A A(x~ = r r T T k - x) (3.90) k (3.91) T k k so that the algorithm ensures the monotonic reduction in length of the residual rather than the error vector. When working with the Normal Equations, formation of the product A A T is not ad- vised as it introduces roundoff errors, and increases the ill-conditioning of the system, since = K(A A) T K(A) . 2 The resulting, more efficient algorithm, while algebraically equiv- alent to applying the procedure directly to the normal equations, (with the coefficient matrix A A T and right-hand side A b) is computationally superior. T Conjugate G r a d i e n t s Least Squares A l g o r i t h m Set x = 0 and r 0 0 = Vk = b then for k = 1, 2, 3 , . . . while ||r || < TOL fc Ar_ + (3 p _ T k x k k x rl_ AA r _ T 1 - a k x k r k k 1 UA T P Pk = Zfc-i + ap = T-fe-i - ot Ap . k k k k One additional matrix-vector product is introduced, (so that solution requires 5ra + 2 operations, and two matrix-vector products per iteration) but the scheme updates x directly and retains the advantages of the basic algorithm. It is possible for the vector Chapter 3. The Method of Conjugate Gradients Ar 53 to share storage with Ap, so that space is still required only for the 4 vectors T x,r,p,Ap. The use of the method of Conjugate Gradients for finding the solution of minimum norm is studied in [7] and [33]. 3.4.2 T h e Preconditioned A l g o r i t h m for Least Squares The idea of preconditioning can also be applied to the least squares problem. P r e c o n d i t i o n e d Conjugate Gradients Least Squares Set x = 0 and r 0 0 Solve Ms ^ = b then for k = 1, 2, 3, . . . while llrJI < TOL = A k A r_ k _ s~[_ AA - s{__AATs - = k k k k s -i T 1 Pk = 8 -l Solve Mq x ( a / ? 1 - 0 ) +/3 pk-l k p k sl_ AA s _ T 1 k 1 {ATAq q k Xk = a:jfe-i + T r_ k = k x - OLkM^pk a Aq . k k Notice the algorithm requires the solution of two linear systems per iteration. For this the question of balancing the increased cost against the reduction in number of iterations to produce a decrease in the overall cost is particularly critical. Comparisons of some preconditioners for the least squares problem are given in [33] and [7]. Chapter 4 T h e F i l t e r i n g Properties of the M e t h o d of Conjugate Gradients The solution of ill-conditioned systems, that are large and sparse, by the technique of Tikhonov Regularization in §2.3.1 is prohibitive; direct methods are costly and iterative methods are ineffective due to the small parameter. Iterative methods can be used efficiently by achieving the regularization via an iterative filtering technique, §2.3.2. This chapter examines the use of the method of Conjugate Gradients as the iterative method to be applied. The first section describes the properties of a "favourable" filtering method, extending the ideas from §2.3.2. §4.2 describes the filtering function associated with the method of Conjugate Gradients using the relations from the algorithm (3.77). The similarity of the polynomial filtering function to a step function, is investigated in §4.3. The polynomial at each iteration is characterized as the solution to a weighted least squares problem and is shown to approximate the Lagrange interpolant through the singular values. A detailed examination of the filtering effects is given for the polynomial formed at the second iteration; the theory is extended qualitatively for later iterations and numerical evidence is presented, using four integral equations as test cases. In §4.4 the polynomial, at each iteration, is viewed as the expansion of an approximation to the inverse of A; this gives an alternative approach to the explanation of the filtering properties. The practical implementation is discussed in §4.5 and numerical tests show that the method of Conjugate Gradients gives comparable results to the method of truncating the SVD (2.31). 54 Chapter 4. The Filtering Properties of the Method of Conjugate Gradients 4.1 55 Iterative F i l t e r i n g In Chapter 2 it was shown that the objective when computing the solution to a system of equations that are ill-conditioned is to find a smooth approximation that is close to the exact solution. In many practical situations the system is very large and often sparse. In these cases iterative solution methods are preferred. The approximation may be achieved by the technique of iterative filtering as in §2.3.2. In Chapter 2 some of the common iterative methods that have been used as iterative filters were mentioned. For an effective filtering method the convergence of the components of the solution corresponding to the "larger" singular values should be rapid as these form the smooth solution. As the contributions from the "later" oscillatory singular vectors converge the solution becomes unstable. It is valuable for the rate of convergence of these vectors to be slower, so that they can be damped. A favourable iterative method will give a series of estimates that, when expressed in terms of the singular vectors of the system T *k = b _H<n\k)— i> t=l v ( -92) 4 i a the function, T(o~\ k) filters the contribution of the singular vectors by damping those corresponding to the small singular values. Qualitatively J-(a;k) should have the property that, for the ordered singular values o~i > . .. > cr ( ) > . .. o~ > ... > o p fc 1 T(o; k) = I 0 q n (i.e., for o "large") p(k) >i>l { n > i> q (i.e., for a "small") { s(cr) q > i > p(k) (i-e., otherwise) where s(o) is some non-decreasing function that has values between one and zero, i.e., s'(-) < 0, 0 < s(o-) < 1. (4.93) Chapter 4. The Filtering Properties of the Method of Conjugate Gradients 56 The function J-(cr;k), is known as the filtering function associated with the chosen iterative method. As the iteration process proceeds (so the value of k increases) the value of p and q should increase monotonically so that the components of each of the singular vectors converge in order, (i.e., those corresponding to the larger singular values first). The success of a particular iterative method thus depends on finding a set of polynomials J-(a; k) that give good approximations to the Heaviside step function. The filtering properties of the method of Conjugate Gradients are examined in the next section, by considering the polynomials inherent in the iterative process. 4.2 The Filtering Function In order to examine the feasibility of the method of Conjugate Gradients as an iterative filter it is necessary to express the iterates in terms of the singular vectors as in (4.92). From the algorithm (3.77) the method of Conjugate Gradients gives a series of estimates Xk to the solution of (2.21), formed as Xi = c[ ^b x = c Ab + c[ b x = c<£ A b + c Ab + c b 2 3 l 2) 2) 2 ) 2 3) 2 ( 3) i (4.94) where the cf^ are constants. This means that the iterates are expressible in terms of a polynomial as, x =V (A)b k h (4.95) where Vm(A) is a polynomial of degree (m — 1). Each iterate can then be expressed as an expansion in terms of the singular value decomposition of the coefficient matrix. This Chapter 4. The Filtering Properties of the Method of Conjugate Gradients 57 is ub n T x =T *iM<n)—Vi (4.96) s = E&(o-0— - (- ) k / so that with Q (o) = oV (cr) k k Vi fc 4 97 °i i=l The polynomial G (o~) is the filtering function at the fcth iteration associated with the k method of Conjugate Gradients. (Q (o) is a polynomial of degree m.) It is the properties m of these polynomials that are considered in order to assess the viability of the use of the method of Conjugate Gradients with ill-conditioned linear systems of equations. The algorithm for the method of Conjugate Gradients, (3.77), is based on the recurrence relations for the search directions, the iterates, and the residuals, Pk = ffe-1 + Q p - x x_ k r k = k k k + ap x k X k = Tk-i - ot Ap k (4.98) k It is therefore also possible to express the search directions in terms of a polynomial; p = V {A)b k (4.99) k From the relations, (4.98), the two polynomials V k and V k are formed by the following recurrence formulae V (a) = V {a) = V ^(<r) + a V (o) k k where T> = 1 and V 0 0 (1 - aVk-^a)) k k + BkVk-^ir) k (4.100) (4.101) = 0. By rearranging (4.101) and substituting the result into (4.100), together the relations give the three term recurrence relation for the polynomial Chapter 4. The Filtering Properties of the Method of Conjugate Gradients filtering function. 58 G (<r) k Gk+i{<r) a 3 + (1 + ot k + l a xo-(l - a o-g (cr)) = k+ k+1 k )Gk{o) k + 1 k a with Go = Gi 0 and G -i{o-) a cr. = (4.102) k k The polynomials can thus be expanded in full to give k expressions, in terms of the parameters in the algorithm, for the coefficients in (4.95). The first four polynomials are given as illustrations: Q {a) = 1 (4.103) {a) ai G {a) = -(a a )(cr ) + ( a + a 3 + a )(a) 2 1 2 G {a) = 3 2 a (a^a^o- ) 2 - (a a2 + 3 (4.104) 2 2 aaB x 1 s 3 +aia + a a 3 + a a )(a ) 2 3 3 2 + a3 2 Gi{(r) = J r 2 3 + a + a33 2 2 3 2 + aB 3 -(a a Qi3Q; )(cr ) + ( a 3 a 4 1 2 4 a a OL^ 1 2 + aa a 3 2 — (a a x l A + 3 -f OLIOL 3 + a a 2 3 -f a a ? 4 1 3 4r 3 x 4 2 4 2 2 4 4 3 2 3 3 + Q i C t a a ^ aaaB 2 3 + aia /3 /3 3 4 3 +a a 2 2 + aa3 2 a Q;ia3a4 + + aia + a a 3 +a a /3 /3 + a a 3 2 a 2 4 4 (4.105) + a )(tr) 3 T + a a a:4)(<x ) 3 2 2 3 4 3 + a a 2 4 +a a 3 3 + a a4/#3 + ct a^)(a ) 2 3 4 2 3 3 3 +(ai + a /3 + a + OL B B + a /3 2 + a3 2 2 3 2 3 3 + CK4/32/33/34 + Q4/33/34 + CK4/34 + 3 a4)(o") (4.106) Without loss of generality the theory in this chapter is developed in terms of the original algorithm (3.77). In the case of the solution to the linear least squares problem Chapter 4. The Filtering Properties of the Method of Conjugate Gradients the polynomial, Vk{-) is in terms of the coefficient matrix A A. 59 The coefficients T in the expressions (4.103)—(4.106), etc., have the same form so application to the least squares case is achieved simply by replacing V " by V " so that x 2 From the algorithm (3.77), the parameters and k = Vk((T )b. are all positive. 2 This shows that each polynomial Qk is composed of coefficients cf ' for i = 1 , . . . , k from (4.95), that 5 alternate in sign, j ( +1 i [ - I t "odd" "even" and with £7fc(0) = 0 for all k. It is clear that • Qii ') i 0 • s a straight line with positive gradient. £2(0") is a quadratic that is concave down, and that in general 0 Gk{°~) is a polynomial of degree k that has positive gradient as it passes through the origin. In general (and particularly in the case of an ill-conditioned system) little can be said about the magnitude of the coefficients in each polynomial, as this was discussed in the last chapter. In Tables 4.2 and 4.3 the values of the coefficients for the polynomials from two practical problems are given as illustration. The two test problems are formed by the discretizations of two integral equations. The first example (which will be referred to as the "Trapezoidal" example) is generated from an example given in [17]. The trapezoidal rule, with a step length of 1/99, was used to discretize the integral equation (1.1) where K[a,t) = Vs +t 2 2 Chapter 4. The Filtering Properties of the Method of Conjugate Gradients Gi(x) = 60 l.ox G (x) = -163.Ox + 109.4x 2 2 Q \x) = 3, 664,574.9a: - 2, 458, 562.6a: + 2, 2587.6a; 3 2 z Table 4.2: The first 3 polynomials from Conjugate Gradients for the Trapezoid example for 0 < s,t < 1, which has the exact solution f(t)=t (4.107) Table 4.2 shows the values for the coefficients of the polynomials for the first 3 iterations for the Trapezoidal example. The second test case was generated using an example studied by Phillips in [28]. The trapezoidal rule was used, with a step length of 2/33 to discretize the integral equation given by K(s,t) = TT(S — t) l + cos(-^-—-), |s|,|i|<3 o g(t) = ( -|i|)(l +l/2cos(-))- sin(-li) 6 T which gives a solution of f{t) = l + c o s ( ^ ) , |t|<3 (4.108) Table 4.3 shows the values of the coefficients of the polynomials from the first 5 iterations for the Phillips example. Both tables illustrate how the coefficients in the polynomials may rise rapidly in magnitude. Numerical examination, by plotting the polynomials, demonstrates that when considered over all real numbers 3?, operator Gk( ) may be a wildly oscillating function. (Fig. x 4.1 shows the actual shape of the early polynomials given in the Tables 4.2 and 4.3.) Chapter 4. The Filtering Properties of the Method of Conjugate Gradients 0.0 0.1 0.2 0.3 0.1 0.5 0.6 0.7 L n t e r v o l of scngular values 0 Lnter-val 10 20 of s i n g u l a r 0.0 0.1 0.2 0.3 0.1 0.5 0.6 0.7 unterval of singular values 30 values 61 tnterval 20 of s i n g u l a r 30 values Figure 4.1: Actual polynomials for iterations 2 and 3 for the Trapezoidal test case and for iterations 4 and 5 for the Phillips' test case. Chapter 4. The Filtering Properties of the Method of Conjugate Gradients = Gi{x= = GA{X = G$(x= 62 0.03x -0.0016a; 4- 0.08a; 0.0002a; - 0.014a; + 0.2a; -0.0003a; + 0.018a; - 0.31a; + 1.5a; 0.02a; - 1.2z + 21.5a; - 101.6a; + 69.8a; 2 3 2 4 5 3 4 2 3 2 Table 4.3: The polynomials from the first 4 iterations for the Phillips' example However, the values of Gk(%) i (4.95) are required only at the singular values of the man trix A; the algorithm uses these implicitly. It should be noted that explicit formation of the polynomial is never required. The oscillating behaviour is interesting however since it implies that as roundoff error accumulates, resulting in the use of values close to the true singular values, the effects may be unpredictable. This suggests the importance of performing only a small number of iterations of the method. It is possible to give a clearer idea of the evaluation of the polynomial at the singular values by examining the operator in more detail. In §4.3.1 the polynomial is shown to be characterized as the solution to a weighted least squares problem. The theory is further developed by a detailed algebraic examination of £2(0-) in §4.3.2. In §4.3.3 the observations are generalized for the later polynomials and numerical illustrations are presented. 4.3 Characterization of the Operator This section shows that the method of Conjugate Gradients has filtering properties that give results comparable with the method of truncating the SVD. Chapter 4. The Filtering Properties of the Method of Conjugate Gradients 63 Precise: Minimization of the Deviation from Zero 4.3,1 The polynomial operator V (cr) is precisely the polynomial of degree k that minimizes k the A-norm of the error §3.2.4 since min\\x - x \\ k A = min \\x - V {A)b\\ k A = min((x - V (A)b) A(x = min b (V (A)AV {A) - V (A)b)) T k T k r k k n n k - 2V {A))b k n = ^* (S i) - ^fc(^))J)( _m 7m«m) i=lTiwfXE j =i(°'i^fc( l n U Cr 2 2 U = l where U{ are the singular vectors of the matrix A and o~i the associated singular values and 7; = uf b. Differentiating the last expression and using the orthogonal properties of Ui gives ECtfV*) i=l - ira^)l7i| 2 = 0. (4.109) (4.109) is equivalent to the minimization m i n ^ n ^ ) - ! ) 2 i=i (4-110) -1 The minimization (4.110) is that of a discrete weighted least squares problem with the weights { — } . The polynomial, CTi K (x) k = xV {x)-l k = g (x)-l, (4.111) k is therefore given exactly as the (continuous) polynomial that minimizes the deviation from zero at the (discrete set of) singular values. This is a unique polynomial and gives the exact characterization of the filtering polynomial G (x); this is the basis on k which the evaluation of the filtering properties is made. However, a precise evaluation of G (cri) is required to understand the filtering effects; this is not obvious from the k Chapter 4. 64 The Filtering Properties of the Method of Conjugate Gradients minimization (4.110). generalizations T h e weights d e p e n d o n t h e d a t a of t h e given p r o b l e m a n d therefore are very difficult. It is p o s s i b l e to give a straightforward q u a l i t a t i v e o b s e r v a t i o n . F r o m the t h e o r y i n the existence of a s m o o t h s o l u t i o n requires t h a t the weights { — } as i increases. T h e least squares fit for the filtering p o l y n o m i a l , §2.2 decrease i n m a g n i t u d e Gk{%), w i l l therefore be s u c h t h a t the values c o r r e s p o n d i n g to c^ where i is s m a l l are m o r e closely fitted to one t h a n those for i large. T h i s means that generally p o l y n o m i a l is r e q u i r e d to go t h r o u g h zero, polynomial Gk(°~) = aVk(o-) Gk{°~i) ~ 0 for i for z s m a l l a n d since the large. T h u s qualitatively f 1 x > 0 { 0 x = 0 T h i s is observed b y V o g e l [42]. T h e s i m i l a r i t y o f the p o l y n o m i a l to the step f u n c t i o n is necessary for Gk(%) effective Gk{%) m filter, however it is not sufficient. a y due to the (4.110). A dependence possible to f i n d Gk{^i) more quantitative v i a the m i n i m i z a t i o n following section begins by showing that Gii®) approximates e x p l a n a t i o n is a n o n - t r i v i a l p r o b l e m , of the weights o n t h e d a t a , , exactly to be a n Progress t o w a r d a m o r e precise e v a l u a t i o n of h e a c h i e v e d b y s t u d y i n g a " s i m p l e r " p o l y n o m i a l that s o m e h o w the p o l y n o m i a l f r o m the is a n a p p r o x i m a t i o n to the H e a v i s i d e f u n c t i o n H(x)=\ b y a p o l y n o m i a l of degree k. Gk(o~i) ~ 1 and although (4.110) theoretically it is this is not p r a c t i c a l . The m a y be r e g a r d e d as a n a p p r o x i m a t i o n to a n i n t e r p o l a t i n g p o l y n o m i a l t h r o u g h the first two s i n g u l a r values. W i t h this s i m p l i f i c a t i o n it is p o s s i b l e to g a i n insight into the filtering p r o p e r t i e s . 4.3.2 Approximate: Interpolation T h e precise a l g e b r a i c representation o f the first few c o n s t a n t s a a , a bb WAb 2 and 0 2 is f a i r l y easy. T «i = <" > 4 112 Chapter 4. The Filtering Properties of the Method of Conjugate Gradients (b b)(b A b)-(b Ab) T 2 [ _ fa T 2 T T 3 (Fb)(b*A>b) { b T 1 bb T ' { {VAbf A b T {VAHy T - b Ab 2 (b A b)(b Ab)- 65 y (- ) 4 114 To simplify the algebra it is assumed that the linear system has a diagonal coefficient matrix. The above can then be simplified using bH =6 = ±b\ t=i n b Ab T = = s i=\ bAb T 2 = f>?6? =d i=l bAb T 3 = = * (4-115) i=i The polynomial given at the second iteration (4.104) can therefore be expressed using the symbols in (4.115) as, „ , .. ,bt — sd. ,bd — s . ~ 2 ftW ta '- ir7^ = ) ( (4 - 116) (4.116) may be by expanded using the terms, (bt-sd) = (ai + o- )(ai - a ) blb + (en + o-3)(o-i - asYblbl + 2 2 2 2 2 (bd-s ) = {a -a fb\b\ + {cr ~<T ) b\bl + . . . 2 2 1 (ts - d ) = 2 2 1 aio^o-i - (4.1 3 o- ) b\b\ + 2 2 <ri<r (o-i - cr ) & 6 + . . . 2 3 2 (4.119 2 3 where the eUipsis indicate that the terms continue to be included in the summation for all i,j such that j > i. Substituting these into the polynomial (4.116) gives Q ( x ) as 2 r (,\ - {^+^){(Ti-<r ) b\b ^.. _ ( r - c r ) 6 6 + . . . ^ - - o {* -a yb\b + ... a a {a -<r fb\b 2 2 2 2{X} 2 < 1 2 2 2 2 0 x 2 l 2 l 2 1 2 With the assumption that cr is distinct from 0\ (4.120) may be rewritten as 2 2 X 2 • Chapter 4. The Filtering Properties of the Method of Conjugate Gradients Cl C ^ M l (oi + r)(o- < 3 H — 0"2 + a ) b\bl + 2 1 66 z (*i-<r )'fflj + ... a = , Cr ,(<r -<r ) fflS + ... a J 1 lg ^ 2 3 (a - ^6?63 + ... + a K - cr fb\bl + z ... K-cr ) 6 6I + ... 2 2 2 0"icr + o-o-(o- 1 3 1 <r ) b\b\ j 2 + •.• 2 z X ' (4.121) (<r-<7)26?6!.+ ... 2 1 2 With the expression for G {x) in this form (4.121) the "shape" of this polynomial can be 2 examined by considering the magnitude of the three quotient terms, (<Ti + <r)(o-i 3 o- ) & 6 + • • • (cr -o yb b 2 1 2 + 2 2 [a -o fb b 2 1 (a -o ) b b 2 2 2 3 (o- -o yb b 2 1 2 ... (4.122) +... 2 2 i 2 3 2 2 + ... + ... (4.124) Recall that the system under consideration is diagonal so that the singular vectors are just the unit vectors, thus the quantities 7; = ufb are equivalent to bi. In §2.2 the discrete Picard condition gives assumptions that are made concerning the relative values of the 7; and the o~i in order for a smooth solution to exist. It is therefore valid to assume that the bi will decay to the noise level of the system at least as fast as the decay of the singular values o~i. The singular values will then continue to decay to zero. In an "ideal" case, with little or no noise, the bi will decay very rapidly. Thus the three quotient terms (4.122), (4.123), (4.124) will all be negligible. The polynomial then reduces to G {x) 2 ~ (——l)xcr a 1 2 — o- o~ 1 2 x 2 (4.125) Chapter 4. The Filtering Properties of the Method of Conjugate Gradients 67 This gives clear insight into the behaviour of Qk as the righthand side of the expression (4.125) is exactly the unique Lagrange interpolating polynomial (quadratic) through the points (o !, 1), (cr , 1), (0, 0), which is denoted as L [(o' , 1), (<r, 1), (0, 0)]. - 2 2 2 l This is a representation of the continuous polynomial which obviously may have values very different from one at all continuous points between a and <r. This explains the x 2 behaviour shown in Fig. 4.1. However, the Lagrangian interpolant, when considered evaluated only at the singular values, is a good approximation to the step function, and thus the method of Conjugate Gradients shows favourable filtering properties. The similarity of the polynomial C/ (cr) to the Lagrange interpolant means that after 2 two iterations, the method of Conjugate Gradients gives similar results to regularization in standard form (2.39), i.e., when the magnitude of the regularization parameter is small relative the first two singular values and large relative to the rest. This procedure is also comparable to truncating the SVD expansion with the first two singular vectors. From the point of view of computational effort both the truncated SVD and Regularization require 0(n ) operations and are unsuitable for large sparse systems. This result (4.125) 3 suggests the the method of Conjugate Gradients (which requires 0(n ) operations per 2 iteration) may be a viable alternative. It is interesting to note, by considering again (4.121), that if the singular values are clustered in some way, e.g., if the first two singular values are very close, so that <j ~ o^, 2 then the expansion reduces instead to approximate the interpolating polynomial through ((T!,!),^,!) and (0,0). Although this gives insight into the nature of the filtering properties of the method of Conjugate Gradients in general the polynomial, C/ , is only an approximation to the 2 Chapter 4. The Filtering Properties of the Method of Conjugate Gradients 7J2[(<TI, 1), (02, 1), (0, 0)]. The manner in which Q 2 68 deviates from the Lagrangian inter- polant will depend on the magnitude of the three quotient terms, (4.122)-(4.124). To be able to reliably predict how close Q will be represented by L requires re-consideration 2 2 of the difficulties associated with ill-posed problems. A system with unreasonably high noise level will give significant magnitudes to (4.122)-(4.124) and therefore Q may deviate substantially from the interpolant. However 2 this corresponds to a solution that is very difficult to compute, as it is highly unstable. This can be seen from consideration of the weighted least squares fit (4.110). Recall that the polynomial is formed so that the singular values are close to unity whilst requiring that the polynomial goes through the origin. In a system where the weights — are high Oi for i small and negligible for i large the fit will be accurate for the larger singular values and thus generate a smooth approximation. If all the weights are of a similar magnitude, or they are unreasonably high for the later singular vectors, (as is the case for "noisy" systems) the fit will be very poor as too much emphasis will be given to the unstable vectors. This demonstrates that the precise nature of Q is highly problem dependent. 2 However, in cases where the exact solution is reasonably smooth the polynomial Q will 2 be well represented by 4.3.3 L [(ai, 2 1), (<r , 1), (0, 0)]. 2 Extension to Later Iterates Numerically it is shown that the interpretation given in the last section holds more generally: i.e., at the third iteration, G (x) is an approximation to the interpolating 3 polynomial through the first three singular values, Q_(x) ~ L [(<r 3 1} 1), (<r ,1), 2 ((73,1), (0,0)]. (4.126) 69 Chapter 4. The Filtering Properties of the Method of Conjugate Gradients In general, at the fcth iteration Q is an approximation to the Interpolant through k k singular values, Qk(x)~ (4.127) L [(<r l),...,(o-k,l),(0,0)} k u In Fig. 4.2 the first few iterates are shown where the plots are evaluated at the singular values for another test problem. The close resemblance to the step function is clear. The test problem was generated so that the coefficient matrix was that of the Hilbert matrix with entries = 1 f° r l <i,3< 9 9 ( ) 4128 This is a well known ill-conditioned matrix. The vector b was generated so that the exact solution was given by Xi = a random noise factor was then added. Each successive iterative gives an approximation to the interpolating polynomial through all previous singular values plus one more. Once a singular vector is included in the solution it will remain so for all following iterations. As the noise level is approached the Qk will deviate more from the Lk and the weight for the next singular vector may be too small, or too large. In Fig. 4.3 the polynomials are plotted at the singular values for the first few iterations in the Trapezoidal example detailed in §4.2. The deviation from the step function is evident and corresponds to the stage at which the noise level has been reached. This analysis shows that the method of Conjugate Gradients continues to act like the method of truncating the S V D , but that there is also some damping effect as in the method of regularization in standard form. More accurately, the method of Conjugate Gradients gives results that are similar to performing the technique of regularization in its standard form a number of times with successively smaller values for the parameter; each iterate will correspond to using a new value for the parameter. Chapter 4. The Filtering Properties of the Method of Conjugate Gradients cndex 70 tndex F i g u r e 4.2: T h e p o l y n o m i a l s from i t e r a t i o n s 2,3,4 a n d 5 e v a l u a t e d at the s i n g u l a r values for t h e H i l b e r t m a t r i x Chapter 4. The Filtering Properties of the Method of Conjugate Gradients 71 F i g u r e 4.3: T h e p o l y n o m i a l s from iterations 3,5,7 a n d 9 e v a l u a t e d at the s i n g u l a r values for the T r a p e z o i d a l e x a m p l e Chapter 4. The Filtering Properties of the Method of Conjugate Gradients 4.4 72 A p p r o x i m a t i o n to the Inverse The theory in the last section may be reinforced by considering the polynomial Vk{A) as an approximation to the inverse of the coefficient matrix. The method of Conjugate Gradients solves the linear system Ax = b so that x = A~ b. l If the process were to continue for the full n iterations then x = V (A)b. So that in the x n absence of roundoff error the polynomial is an expansion of the inverse, A- = V (A) 1 (4.129) n This means that each singular value satisfies the identity — x*=V {<Ti)x* (4.130) n and therefore that at the solution x" <TiV {<n) = 1. (4.131) n (4.131) shows that the expansion of the inverse is exactly the interpolating polynomial forced through unity at each of the singular values L [(<ri, 1), . .. , (<x, 1), (0, 0)]. It is n n interesting to note that although the cti and the 8 that contribute to the formation of { the polynomials depend on the right hand side vector b, this dependence is lost and the final polynomial is unique for each matrix. Again qualitatively it can be said that at each stage of the algorithm the polynomial, Vk(A), is some approximation to the inverse of A. This also follows from (4.110) that can be written equivalently as min\\AV (A)k I\\l (4.132) The notion of Vk(A) as the approximation of the inverse of A can be emphasized by extending a point made by Hestenes in [21]. Chapter 4. The Filtering Properties of the Method of Conjugate Gradients 73 T h e a l g o r i t h m (3.77) c a n be r e w r i t t e n i n a m a t r i x f o r m as x = 0 0,r j? = b o k = 1,2,.. PkPk Pk Pk A D k = f-AE Pk = Dlrk-i Xk = Xk-l + r = D r _i k where D k and E are m a t r i c e s . k k k EkTk^ (4.133) k F r o m the above the i t e r a t e x c a n be g i v e n as the m a t r i x k expansion, x k = E Dr = {E +E = X 0 1 + ED D T 0 7 X Q + ... + E D ^ 0 K + ... + 2 K ... k of 0 Bb k i=l representation 0 ri is f o r m e d as the s u m of outer p r o d u c t s 7 k 2 E) k w h e r e the m a t r i x B DDr V (A). k Therefore PiPi 1 '—, a n d is the m a t r i x Pi Pi A r e c a l h n g the c o n j u g a c y p r o p e r t y vectors f r o m t h e a l g o r i t h m (3.77), the o p e r a t o r g i v e n b y B A k o f the d i r e c t i o n is equivalent to the i d e n t i t y for a l l p r e v i o u s d i r e c t i o n s , a n d m a p s a l l future ones to the n u l l - v e c t o r , i.e., BA k = Pj B Apj P j =0 k j < k (4.134) j > k. (4.135) T h i s relates to t h e i d e a t h a t once a s i n g u l a r vector is i n c l u d e d i n the e s t i m a t e i t r e m a i n s i n c l u d e d for a l l f o l l o w i n g i t e r a t i o n s , a n d t h a t one m o r e is i n c l u d e d at each i t e r a t i o n . The above conditions give that (B A-I) =0 k Pj j<k (4.136) Chapter 4. The Filtering Properties of the Method of Conjugate Gradients 74 Bk is thus an approximation to the inverse of A on the span{pi, . .. ,pk} which is equivalent to span{b, ... ,A ~ b}. k 1 This means that as the iterations progress the identity o-iV^i) = 1 (4.137) holds for successively more singular values. (4.137) may be interpreted as a polynomial that is interpolated through the singular values at unity. 4.5 N u m e r i c a l Tests The performance of the method of Conjugate Gradients was tested numerically using a number of test problems. The same test problems were solved using the method of truncating the SVD (2.31) and by Landweber's iteration (2.53). The results are discussed below. In every case the method of Conjugate Gradients shows a significant improvement, in terms of work for a given accuracy, over the method of Landweber. The performance of the method is therefore compared to the method of truncating the SVD. For the tests using the method of truncating the S V D , the truncation was performed at the point where the quantity 7; had reached the noise level, thus using the criterion in [40] to select the regularization parameter. The test cases used are those used earlier in §4.2 and §4.3.3 and detailed again below for clarity. A fourth test case was generated by discretizing the integral equation K(s,t) = e st e y t+1 - 1 t+ 1 W for 0 < s,t < 1 which gives a solution of f(t) = e l (4.138) Chapter 4. The Filtering Properties of the Method of Conjugate Gradients method CG TSVD LAND error .1.977 x 1(T 1.700 x 10~ 1.130 x 10~ 75 itn. 10 7 200(+) 5 5 2 Table 4.4: Comparison of accuracy and work for the Hilbert example. where Simpson's rule was used with a step length of 1/99. This equation was used in [6]. The singular value spectrum for each of the test cases is given in Fig. 4.4; this shows the clustering in each case and thus gives an indication of the number of stable singular vectors that should be included in an appropriate approximation. Table 4.4, which shows the results for the Hilbert case, with coefficient matrix = - , i +j - 1 1 1 (4-139) and solution x = 1/z, the method of Conjugate Gradients compares well, with regard to accuracy, with the truncated SVD. The matrix has four large singular values with the others clustered close to zero. The progress toward the chosen estimate is equivalent to that of the truncated SVD for the first four iterations and then begins to slow. This is expected from the spectrum of the singular values. The computational effort required for this application of the method of Conjugate Gradients is substantially less than that required for the same accuracy to be achieved by the method of truncating the S V D , (approximately 1/10 the number of operations); this is significant when considering larger systems. For the tests performed with the examples set up as the Trapezoidal, Table 4.5 with kernel K(s,t) = Vs 2 +t 2 0 < s , t < l (4.140) and solution /(£) = t, and the Simpson's case (see (4.138)) in Table 4.6 the accuracy Chapter 4. The Filtering Properties of the Method of Conjugate Gradients 10 60 INDEX 10 60 INDEX Figure 4.4: T h e singular values for the test examples. 40 76 60 INDEX 10 60 INDEX From left to right and top bottom they show the Hilbert, Trapezoidal, Phillips and Simpson test cases to Chapter 4. The Filtering Properties of the Method of Conjugate Gradients method CG TSVD LAND error 4.942 x l O " 4.684 x l O " 5.176 x l O 1 1 - 1 77 itn. 5 16 200(+) Table 4.5: Comparison of accuracy and work for the Trapezoidal example. method CG TSVD LAND error 5.894 5.892 5.901 itn. 6 6 200(+) Table 4.6: Comparison of accuracy and work for the Simpson's example. obtained is again favourable. In both cases the convergence is slow but the method of Conjugate Gradients is shown to be an efficient alternative to the method of truncating the SVD. The slow convergence is emphasized since for the Simpson's example another 10 iterations were required to gain the extra accuracy in the third decimal. The Phillips' example with kernel K{s,t) = 1 + c o s ( ^ ~ S ), |a|,|*|<3 (4.141) o Tt and solution f(t) = 14- cos( —) is not as ill-conditioned as the other cases. Many of the o components give no contribution to the solution; of the first 19 singular vectors only 9 have a value for 7; larger than zero. Table 4.7 shows that the accuracy of the method of Conjugate Gradients is again competitive with the method of truncating the S V D , and whilst the number of iterations is "larger" it is noted that this still corresponds to a much smaller amount of computational effort than the 0(n ) ~ 10 operations required 3 6 by the truncated SVD. In the test cases the solution is chosen with reference to the actual error which is Chapter 4. The Filtering Properties of the Method of Conjugate Gradients method CG TSVD LAND error 1.789 x 10" 1.947 x 1(T 1.920 x 10" 2 2 1 78 itn. 14 9 (non-zero) 200(+) Table 4.7: Comparison of accuracy and work for the Phillips' example. available as the exact solution to the integral equations. In Fig 4.5 the chosen solutions are shown for the test example of Simpson and Phillips'. The solution to the Phillips' example shows almost perfect resemblance to the exact solution. example the oscillation is interesting. For the Simpsons It is believed that this is accounted for by an extra singularity in the discretized system and that the approximation is accurate for the discretized system. (It should be noted that the error measure is used a comparison to the continuous solution.) The results from the truncated SVD are comparable and it is noted that if an averaging is performed over neighbouring points an accurate representation of e (the continuous solution) is achieved. f When the exact error is not known the question of when to terminate the iterative process is non-trivial. In some cases the convergence may be very slow and the calculation of many iterations is not justified with regard to the work required. The residual does not give an accurate indication of convergence rate or of oscillations in the solution. The number of iterations is certainly small and so one acceptable approach is simply to study consecutive iterates and terminate the iteration as the oscillatory components appear. In cases where a priori knowledge is available on the singular value spectrum or of the noise in the system a similar criterion for the method of truncating the SVD [40] may be successfuly used as the number of iterations for the method of Conjugate Gradients. Generally this knowledge is not available and although all the analysis in this thesis has been achieved via the singular value decomposition, in practice all that is Figure 4.5: The chosen solution for the Simpson and the Phillips' test examples. Simpson example is the one above The Chapter 4. The Filtering Properties of the Method of Conjugate Gradients 80 available are the a and 3 from the algorithm. It is suggested that an unreasonable change in the magnitude of the parameters a or 3 from one iteration to the next may be an indication that the process should be terminated. A significant alteration in the size of one or more of the parameters a; or 8{ will be an indication that the z'th singular value shows a sharp reduction in magnitude from the previous one; this requires that the constants ujb also reduce rapidly and may suggest that the noise level has been reached. The large value of the parameter will have the effect of placing unecessary emphasis on the weighting of one of the singular vectors and the polynomial will deviate from the Lagrange interpolant. Tables 4.8-4.11 show results for the four test cases and demonstrate the observations above. The results show numerically that there is a correspondence in the sharp increase and the convenient termination point. In Table 4.8 an shows an increase of 10 from a^o and 3\2 is an increase of 10 8 12 from Bn corresponding to the minimum error at iteration 11. There are also increases of 10 and 10 between a j 5 6 6 and 3^,6 that indicate a reasonably accurate solution. In Table 4.9 the error continues to improve but very slowly. a 9 shows an increase of 10 8 from ag and there is an increase of l O between /3 7 and may be used to indicate that 1 0 6| sufficient accuracy has been achieved by iteration 8. (Other large increases, consider for example /3 4)5 and 3 14il5} correspond to relatively good improvements in the solution during the slow convergence.) In Table 4.10 this example again has an error that is monotonically decreasing. However a is 10 s 12 times the size of 07 and corresponds to a convenient termination at iteration 8. In this example the magnitudes of the parameters vary dramatically, compare for example ot ,8 ,8s g iiS 6i6 t and 812,13, a n d is believed to be an indication of the prohibitively slow convergence; suggesting that additional work to improve the accuracy is not recommended. The Phillips' test case is less clear. Table 4.11 shows that no such sharp rise is apparent within the first 15 iterations. The largest Chapter 4. The Filtering Properties of the Method of Conjugate Gradients 81 increase is however at the convenient termination point of iteration 15; ais is 10 times 3 ai . 4 Note also that 3 is an increase of 10 from 8 5 9 S and corresponds to a significant improvement in the solution. The numerical evidence demonstrates the the method of Conjugate Gradients may be used successfully as a solution method for ill-conditioned systems. The results support the theory that each iteration of Conjugate Gradients has a similar effect to increasing the number of singular vectors included in the estimate by one. This means that the results are comparable with those from the truncated form of the SVD and to regularization in its standard form. As with all ill-conditioned systems the performance of any solution method is somewhat problem dependent as it relies heavily on the relative decay of the quantities 7; and the singular values <T;; this is also the case with the method of Conjugate Gradients. Chapter 4. The Filtering Properties of the Method of Conjugate Gradients Itn 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Resid 3 659 X i o - I 2 003 X i o - 2 6 179 X i o - 4 1 368 X 10" 5 1 368 X i o - 5 5 867 X i o - 7 5 856 X i o - 7 5 453 X i o - 7 5 453 X 10" 7 5 453 X i o - 7 5 437 X i o - 7 5 435 X i o - 7 5 435 X i o - 7 5 435 X 10" 7 5 376 X 10" 7 Pnorm a 5.550 2.109 x l O " 1 1.485 x 10° 3.003 x 10"- l -3 2.113 x 10 4.356 x 10" 3.022 x 10"-5 4.179 x 10 1.403 x 10"-7 3.229 x 10° -6 2.084 x 10" 6.608 x 10 2.957 x 10"-8 •1.490 x 10° 3.970 x 10" 10 2.894 x 10 -9 1.633 x 105.899 x 10° -8 3.204 x 10" •7.531 x 10" 5.143 x 10" 12 6.332 x 10 -5 2.618 x 10" 1.692 x 10° 7.890 x 10"-9 4.217 x 10 4.812 x 10--8 2.109 x l O - 1 2.977 x 10" 11 4.548 x 10 P 1 2 5 - 5 2 1 3 5 7 2 7 0 2.919 x l O ' 3 2.111 x l O ' 4 4.813 x 10~ •2.156 x 1 0 1.436 x 10 2.958 x 10~ 1.884 x l O ' 4 3.644 x 10° 1.951 x 10 •2.360 x 10~ 5.092 x 10 3.012 x 1 0 ' 6.098 x 10° 5.676 x l O " 1 1 6 4 4 6 82 Error 4 567 X 10 - l 9 301 X 10 -2 1 271 X 10 -2 1 391 X 10 -3 1 390 X 10 -3 1 308 X 10 -4 1 308 X 10 -4 1 979 X 10 -5 1 979 X 10 -5 1 977 X 10 -5 3 250 X 10 -4 3 692 X 10 -4 3 725 X 10 -4 3 725 X 10 -4 1 705 X 10 -3 Table 4.8: The parameters from the first 15 iterations by Conjugate Cradients for the Hilbert example. The "•" indicates points at which a or 0 show significant increases in magnitude. Chapter 4. The Filtering Properties of the Method of Conjugate Gradients Itn 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Resid Pnorm 1 622 X 10- i 3.622 x 10 0 8 778 X i r r 4 1.562 x 10"-2 7 698 X 10" 5 5.832 x 10"-6 7 680 X 10" 5 8.495 x 10"- 8 7 017 X io- 6 2.150 x 10"-4 6 894 X io- 6 1.676 x 10"-9 2 840 X io- 6 9.051 x 10"-6 2 840 X 10" 6 4.108 xTO"-9 8 919 X io- 7 2.140 x 10" 10 8 918 X 10" 7 2.370 x 10- 11 8 918 X io- 7 3.450 x 10"-9 8 918 X 10"-7 4.223 x 10" 10 3 624 X 10" 7 3.612 x 10--9 3 624 X 10" 7 3.931 x 10- 12 1 589 X 10" 7 1.170 x 10--7 a 1.512 x 10° 1.079 x 10 2.248 x 10 3.845 x 10 3.203 x 10 6.076 x 10 2.603 x 10 •1.524 x 10° 1.625 x 10 3.361 x 10 1.693 x 10° 1.016 x 10 8.126 x 10 4.497 x 10 2.376 x 10 2 4 3 2 5 3 8 5 3 6 4 s 0.000 x 10° 1.860 x 10~ 1.394 x 10" •2.121 x 10" 2.531 x 10 • 1.538 x l O " 5.399 x 10 3.706 x l O " 7.961 x l O " 1.240 x 10~ 1.450 x 10 1.191 x l O " 8.526 x 10° 1.837 x 10~ 2.975 x 10 5 1 4 3 4 3 2 2 1 4 4 Error 1.763 X 10° 5 164 X I O 4 993 X I O 4 993 X i o 4 942 X IO" 4 942 X i o 4 933 X 10~ 4 933 X i o 4 917 X i o 4 917 X i o 4 917 X I O " 4 917 X i o 4 904 X IO" 4 904 X i o 4 891 X i o - 1 7 3 83 1 1 7 1 x 1 1 1 1 1 1 1 1 Table 4.9: The parameters from the first 15 iterations of Conjugate Gradients for the Trapezoidal example. The "•" indicates points at which a or 3 show significant increases in magnitude. Chapter 4. The Filtering Properties of the Method of Conjugate Gradients Itn 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Resid 2 101 X 10 -1 6 221 X 10"-4 4 615 X 10"-6 4 588 X 10 -6 1 143 X 10 -6 1 684 X 10"-7 1 684 X 10"-7 8 363 X 10 -8 8 254 X 10 -8 8 246 X 10 -8 1 266 X 10 - 9 1 266 X 10 - 9 1 266 X 10 - 9 1 266 X 10 - 9 1 266 X 10 - 9 Pnorm a I 3.429 x 10 4.871 x l O " 2.375 x 10"-2 7.824 x 10 -6 2.381 x 10" 6.827 x 10 7.441 x 10"-7 •4.871 x l O " 1.375 x 10 3.790 x 10" 10 -5 4.048 x 108.330 x 10 2.535 x 10- 11 •4.872 x l O " 2.223 x 10" 13 4.323 x 10 3.585 x 10"-7 2.276 x 10 -4 1.060 x 10" 5.732 x 10" 1.155 x 10--5 2.647 x 10 1 7 1.304 x 10 2.160 x 10" 5.627 x 10 5.673 x 10" 16 4.080 x 10" 12 3.604 x 10 1 0 2.286 x 10" 1.500 x 10° 1 1 7 4 8 1 8 1 11 3 1 3 1 5 6 0 0.000 x 10° 4.797 x 10~ 1.005 x 10~ 8.965 x 10•2.826 x 10~ 1.068 x 10 4.170 x l O •7.725 x 10~ 1.613 x 10 2.956 x 10 1.090 x l O " •1.817 x 10~ 2.577 x 10 7.193 x 10 5.601 x 10 2 7 5 1 - 8 5 6 2 1 16 1 3 1 84 Error 6 182 x 10° 5 896 X 10° 5 894 X 10° 5 894 X 10° 5 894 X 10° 5 894 X 10° 5 894 X 10° 5 892 X 10° 5 892 X 10° 5 892 X 10° 5 892 X 10° 5 892 X 10° 5 892 X 10° 5 892 X 10° 5 892 X 10° Table 4.10: The parameters from the first 15 iterations of Conjugate Gradients for the Simpson's example. The "•" indicates points at which a or 0 show significant increases in magnitude. Chapter 4. The Filtering Properties of the Method of Conjugate Gradients Itn Resid Pnorm 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 1.015 x 10 i 3.791 x 10 0 6.926 X io-- l 2.443 X 10"-2 3.065 X 10"-3 2.010 X 10"-3 2.009 X 10"-3 1.846 X 10"-3 6.451 X 10"-4 4.158 X 10"-4 4.157 X io--4 4.156 X io--4 3.156 X io--4 2.131 X 10"-4 1.841 X 10"-4 2.377 x 10 2 4.234 x 10 1 9.891 x 100 6.243 x 10"- l 2.932 x 10"-3 2.425 x 10"-4 6.862 x 10"-4 9.125 x 10"-5 1.388 x 10"-1 7.698 x 10"-2 1.523 x io--5 1.032 x io--5 2.369 x 10"-4 1.077 x 10"-2 6.265 x 10"-6 a 3.197 x IO" Error 2 5.095 x IO" 1.498 x IO" 1.235 x 10° 6.833 x IO 9.159 x IO 3.120 x IO" 8.874 x IO 2.773 x IO" 4.066 x 10~ 2.128 x IO" 1.984 x 10° 4.138 x IO •6.751 x IO" 4.341 x 10 2 0.000 x 10° 3.079 x IO" 5.331 x IO" 4.185 x IO" 2.215 x IO" 6.795 x IO" 2.377 x 10° •5.113 x IO" 1.520 x IO 5.544 x IO" 3.739 x I O 3.497 x IO" 2.258 x IO 4.546 x IO 3.308 x 10~ 2 1 2 3 5 1 1 3 2 1 3 1 1 2 1 - 5 1 1 2 85 1 1 1 4 2 3.026 x 10 0 1.729 x 100 8.011 x 10"- l 2.149 x 10"- l 6.895 x 10"-2 5.929 x 10" 2 5.928 x 10" 2 5.234 x 10" 2 2.907 x 10" 2 2.866 x io-•2 2.865 x 10" 2 2.864 x 10" 2 2.149 x 10" 2 1.789 x 10" 2 1.640 x 10" 2 Table 4.11: The parameters from the first 15 iterations of Conjugate Gradients for the Phillips' example. The "•" indicates points at which a or 3 show significant increases in magnitude. Chapter 5 Numerical Application This chapter comprises a numerical example to demonstrate the practical use of the method of Conjugate Gradients to calcuate a smooth approximation to a linear system of equations that is severely ill-conditioned. The problem is that of retrieving an in focus picture from a blurred image. The method of Conjugate Gradients is applied and a smooth approximate solution is computed with minimal work. The problem is described in §5.1. §5.1.1 discusses the general mathematical interpretation of the problem and §5.1.2 details the problem used for the illustration. The results of a number of test examples are given in §5.2, where both the numerical and visual representation are discussed. 5.1 A D e s c r i p t i o n of the P r o b l e m A full description of the theory of image processing and the mathematical treatment of the problems is available in many texts, see for example [31] and the references therein. This section gives a brief introduction to the ideas of reconstructing an image from a degraded picture and details the particular numerical example used in this thesis. 5.1.1 T h e M a t h e m a t i c a l Reconstruction of Degraded Pictures The reconstruction of a sharp image from an out of focus picture has important practical use in photography. A n image may be blurred due to movement, improper focussing through the lens or inappropriate lighting. The retrieval of the associated in focus image 86 Chapter 5. Numerical 87 Application is a difficult problem. Many original pictures that are quite different can produce a very similar result when blurred : the problem is therefore ill-posed. Reconstruction of the exact picture from a general image is not possible. A calculated estimate must however somehow smooth the error caused by the blurring. The problem can be modeled mathematically. It is assumed that the original picture is a black and white photograph with a continuous gradation for the grey scale. The variation in the grey level, that results in the visualized image, can be represented as a two-dimensional function of position, f(s,t) t which is denned over the finite region of the picture. The affect which causes the degradation of the image may be represented by an operating function K(x,y,s,t), known as a point-spread function. The degraded image g{x,y) g(x,y) is given by the two-dimensional integral equation J J = (5.142) K(x,y,s,t)f(s,t)dsdt (this is the two dimensional equivalent of (2.7)). In many examples the point-spread function is separable, i.e., the effect in the two space dimensions is independent and the kernel is expressable as the product, K(x,y,s,t) (5.143) = Kfaa^tfat). The action of the blurring may depend on position, i.e., be spatially variant (e.g., with motion or lighting the effects may be much worse over one area of the image); these cases occur frequently and are severely ill-posed. One common example of a spatially varying point spread function that is separable is that of a Gaussian blurr, [2]. K(x,y,s,t) = exp{-p (x x - s) 2 - p (y y - t) }. 2 (5.144) where the functions p and p vary linearly from the centre of the image. The more the x y variation in the functions p and p is, the more severe the degradation will be; this will x y result in an increase in the ill-conditioning of the problem. Chapter 5. Numerical Application 88 In order to obtain numerical solutions to problems of the form (5.142) the equation may be first discretized. The image may be digitized into a two-dimensional array by dividing the area into a grid of equally sized picture elements, or pixels. Each pixel is assigned a numerical value that represents the average grey level over its area. The theory of digitization is very complex [31] but a straightforward approach is to divide the grey scale into a number of equal length intervals, Ij, and assign one numerical value (often the integer, j, is used) to represent the "colour" of each interval. The two dimensional array that forms the digitized picture, F, can be simply transformed into a vector by stacking each of the rows of the array end on end, (see §A.l), thus an original n x n image is converted into an n -vector. 2 The discretization of the point spread function is performed by some quadrature rule. In the case where the operator is separable as in (5.143) the discretization results in a system of the form {K ®K )J=g 1 (5.145) 2 where A ® B is the Kronecker product of two matrices [23]. The matrices K\ and K 2 are n X n arrays and the discrete images / and g are n vectors. In the example of the 2 Gaussian blurr (5.144) the entries in the matrices K\ and K are given by 2 {K^n = exp{- (i - j) } {KJn = expi-wti-j) } 2 Wi 2 (5.146) (5.147) where Wi is the weighting function for the quadrature rule. The discretization of problems from image reconstruction frequently give rise to systems (5.145) that are large and sparse. The point spread function generally acts only locally, that is the "blurring" of any particular point affects only the neighbouring pixels and so many of the elements of K Y and K 2 are zero. The number of pixels used in any problem may be substantial since the finer the digitization is, the more accurately Chapter 5. Numerical 89 Application the picture can be represented. The ill-posed nature of the reconstruction problem is reflected in the discretized system by the ill-conditioning of the resulting product of the matrices K\ and K 2 5.1.2 T h e Test Example Using the properties summarized in the appendix §A.l it is possible to transform the equation (5.145) into an (n x n ) linear system; this may be achieved in various ways. 2 2 The example used in the tests is based on an example in [14]. The system is created using the singular value decomposition of Ki as, = f/iEiVf K x where S i = diag(ou, cr , .. . , < 7 i 12 ). n (5.148) Then, using property (3) in §A.l, the coefficient matrix is expanded as ( t f i E i V f ) ®K 2 = ((7i ® /)(Si <g> K ){V? ® /) 2 (5.149) The problem of finding the solution of minimum norm is therefore reduced, by using property (4), to min ||(Si ® K )fi fi - gi\U 2 where // = 9i = ( V f <g> I)J {U?®I)g From the definition of a Kronecker product the coefficient matrix Kf = (Sj <8> K ) is the 2 block diagonal array \ / <^uK 2 o-\ K 2 2 <^lnK J 2 Chapter 5. Numerical Application The 90 vectors, / / and gi are just formed as the reduction of the matrices Fj = V F and T t X Gi — U^G, (using property (2)). The condition number of the coefficient matrix, Kr, formed above is the product of the condition numbers of K and K . The matrices K and K are totally positive [23], and x 2 x 2 thus their Kronecker product is also. The singular vectors i>- therefore have (i — 1) sign t changes and so an increase in the contributions from the singular vectors corresponding to even relatively large singular values may cause instability in the solution. The original [n x n) image has therefore resulted in an (re x re ) system that is sparse and extremly 2 2 ill-conditioned. The numerical example used for the tests is a modification of the Gaussian blurring function given in [14]. The elements of the matrices K~i and K are given as 2 (K^ (K ) 2 13 = (—f^-cn^i-j)') = (c-7i^ ( - ^ ( i - j ) ' ) (5.150) (5.151) 2 with 7 i = 2 - 3.92,ra - + 0.5-t. re 2 (5.152) the constants are set such that c i = 1/2 and c = 1/4 (this corresponds to_ a moderately 2 severe blurring effect, [2, 14]). The three test images are taken as simple 15 x 15 digitized arrays that corresponded to the sharp images shown in Fig.5.6. The digitization is set to have 9 grey levels so that values below zero are represented as white up to values above 8 which are represented as black, which allows a visual (if crude) representation of the solution to be given. 5.2 The Discussion of Results system is very large and sparse. The use of the truncated form of the singular value decomposition (2.31) on this system would be extremely inefficient as it would require Chapter 5. Numerical Application 91 Figure 5.6: The original sharp test images of the Cross,L and Box. For the Cross and the L each test is a binary image composed of 8 s and O's; the Box has an additional middle grey level of 4. The Legend shows the grey scale used for the digitization. r Chapter 5. Numerical Application 92 0 ( n ) operations (where n is the number of pixels in one row). With efficient routines 6 for the multiplication of banded matrices the use of the method of Conjugate Gradients has a work requirement of 0(n ) 3 operations. It is not possible to retrieve the exact images in Fig.5.6 from the degraded images, shown in Fig.5.7, due to the severe ill-posed quality of the problem. (Comparison of Fig.5.7 with Fig.5.6 demonstrates the extent of the retrieval problem.) The requirement for the solution to be smooth, as given by the discrete Picard Condition (2.27), can be interpreted visually. A n approximate solution will be satisfactory provided it gives a reasonable distinction between the area containing the object and the background. Gradual fading from white to black over the boundaries and into the centre of the image will be acceptable. This corresponds to a smoothing of the effect of the defocussing. The numerical representation of the images that form matrix G are given in Fig. 5.8, they demonstrate the severe degredation that has occured. In the test examples the exact error is available. The graphical representation of the exact error for each of the tests is shown in Fig.5.9. The results show that, in each case, there is a marked minimum of this error that corresponds to the best approximation to the exact image. This occurs very quickly; in the region of 4 or 5 iterations for a system of 225 equations! Examination of the singular values (see Fig.5.10) of the coefficient matrix indicates that there are four high values clustered away from the others. In terms of the discrete Picard Condition (2.27) the quantities (uf gi) must decrease very rapidly in order to be assured of a smooth solution. For most examples, therefore, the best smooth approximation will include contributions from only the first four singular vectors. This explains the rapid "convergence" (to a smooth solution) for the method of Conjugate Gradients. The value of the error seems quite substantial in each case. However it is acceptable since it accounts for the fading across the boundaries. The numerical representation of the results of the iteration process is given in Fig.5.11 Figure 5.7: The result of defocussing the test images of the Cross.L and Box. The legend shows the grey scale used for the digitization. Chapter 34 1 9 90 0 1 16 .5 258 0 4 76 .5 530 0 454 .9 314 . 7 110 .2 14 9 2 .3 1 .5 1 9 2 .7 20 . 4 test 31 112 272 466 596 613 7 555 603 717 818 801 840 407 202 10X 1 1 0 2 4 4 0 0 7 7 4 6 5 5 8 587 209 213 364 615. 672 576 404 162 53 45 56 75. 104 201 8 2 6 .4 5 3 8 18. 8 67 . 4 21. 76. 184. 316 404 416 387 375 401 463 514 496 394 250 124 . 1 86 35 47 71 77 66 56 49 56 66 77 71 47 35 86 4 30 123 149. 310 562 623. 535 371. 133. 23 10 11. 15. 21. 55. Application 94 3 3 0 3 0 7 2 2 881 394 339 429. 621. 655. 561 404 203 119 129 163. 220 303. 547 3 1225 1 624 491 8 5 486 578 9 1 575 4 492. 1 369 2 244 205 8 1 237 0 301 7 408 3 561 1 994. 8 1402 7 758 3 572 2 493 7 499 4 467 399 2 313 8 4 257 1 256 8 305 8 388 9 525 9 722 6 1270 2 1317 . 9 1 726 0 541. 5 6 0 443 8 3 418. 9 380. 2 4 1 324 2 1 8 261 9 235 6 5 247 8 1 297 0 377 9 0 7 512 0 3 703 8 5 1235 3 989 530 402 353 387 347 296 231 183 178. 212 269 365 502 883 0 6 9 6 3 4 7 5 9 9 1 7 4 0 7 557 . 256 217 258. 360 375 321 233 124 80 68. 111 151 208 373 7 3 2 9 3 6 8 4 1 3 5 8 5 2 3 354 113 125 238 418 461 395 275 104 25 17 21 28 5 5 7 2 7 2 7 .7 2 6 7 3 8 39 6 82 6 368. 97 125. 277 512. 570. 489 338 118. 16 2. 1. 2 3 22 5 2 6 7 7 3 5 8 6 2 6 8 3 2 6 416 3 108 0 141 5 316 4 586 1 652 4 559 9 387 2 135 0 17 4 1 6 0. 2 0 1 0 .4 20 0 0 0 0 0 0 .0 0 .0 0 0 0. 1 1 .6 12 6 60 1 170 6 300 0 360 .5 295 2 189 7 94 3 3 397 111 1 103 428 145. 325 603 671 576 398 138 5 6 2 4 3 5 9 17 9 1 6 0 2 0 1 0. 3 20 . 4 135 302 559 623 534 369 128 16 1 0. 0 0 6 33a 1 87 .7 1 1 1 14 9 2 257 0 9 2 9 9 9 6 5 2 1 3 IB 9 476 2 530 0 454 9 314 5 109 6 14 1 12 0 2 0. 1 0 2 18 . 1 iaage . 1 0 0 5 .4 0 1 .0 .1 . 3 0 0 .3 . 2 2 .1 t»tt 9 8 4 1 5.4 8 .2 8 9 7 8 6 5 5 .6 8 5 7 .8 8 .9 8 2 5 .4 4. 1 9 8 Numerical t » s t imagm CROSS . 5. 183. 8 281. 359 369. 343 332 356. 411 457 441 350. 222 110. 0 2 2 8 8 3 7 3 .3 8 9 6 . 7 8 4 20. 74 180 309 395 406. 378 388 396 462 520 505. 402. 256 127 6 2 2 1 1 1 2 8 4 . 7 3 6 9 1 1 13. 47 116 199 255 282 244 240 270 332 413 418 336 214. 106 3 9 3 5 0 , 1 5 0 7 6 1 6 .1 2 3 3 9 14 1 34 3 58 9 75 3 77 5 73 . 1 78 .4 115 8 203 .0 301 1 332 3 257 .5 178 5 87 7 0 2 0 8 1 9 3 2 4. 1 4 3 5 1 13 6 SO . 7 137 2 238 3 277 3 233. . 3 148 9 74. ,5 0 0 0 0 0 0 0 0 0. 0 0 1 1 1. 8 7 41 3 117 2 206 1 240 8 202 8 130 3 84 8 0 9 . 4 7 . 7 .6 7 3 7 .6 .7 7 .4 .9 0 147 7 61 6 81 3 123 0 133 3 114 5 97 8 85 1 97 8 114 .5 133 .3 123 0 81 3 81 6 147 7 189 78 104 157 170 148 129 114 129 148 170 157 104 78 189 5 8 0 2 8 .3 7 .2 .7 3 .8 .2 .0 .8 5 187 81 107 181 176 159 151 137 151 159 176 161 107 81 197 167 8 75 7 99 .6 ISO .9 166 .8 160 . 1 170 3 160 .8 170 3 160 . 1 166 .8 150 .9 99 6 75 . 7 187 a 182 72 95 145 161 160 180 172 180 160 161 145 95 72 182 0 7 ,7 9 . 1 5 1 5 . 7 7 6 0. 0 0 0 0 0 0 0 0 0 0 . 1 1 2 9 9 47 5 134 a 237 i 277 0 233 . 3 149 9 74 5 0 .0 0 0 0 0 0 .0 0 .0 0 1 1 .4 11 . 7 55 8 158 6 278 .8 325 .7 274 . 3 178 3 67 6 0. 0 0 .0 0. 0 0 0 0 0 0 . 1 1. 7 13 .6 65 1 184 .9 325 3 360 0 320 .0 205 6 102 2 187 75 99 150 166 160 170 160 170 160 166 197 81 107 161 176 159 151 137 151 159 176 161 107 81 197 189. 78 104 157 170 148 129 114 129 148 170 157 104 78 189 .0 0 0 0 0 0 0 0 1 10 8 3 39 8 112 9 198 5 231 .9 195 3 125 5 62 .4 0 1 6 30 85 150 175 148 95 47 3 0 0 0 0 0.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0. 1 0. 1 3 0 2 2 0 15 . 2 72 . 2 204 .9 360 2 420 7 354 . 3 227 7 113 2 4 14 8 19 5 9 a 4 i 5 4 0 8 1 .5 4 7 0 1 iaage 35 4 14 . 8 19 5 29 5 32 0 27 4 23 3 20. 3 23 3 27 4 32 0 29 5 19 5 14 8 35 4 3 . 1 .0 8 .9 .5 2 .2 .2 .5 .9 8 .0 .1 3 9 9 9 3 6 5 2 .9 2 5 6 3 9 9 9 ISO 8 . 7 6 .9 8 . 1 3 a . 3 . 1 .8 . 9 99 .6 75 . 7 187 a 3 . 1 .0 8 .9 .5 .2 .2 2 5 .9 8 0 . 1 3 5 8 .0 2 8 3 7 2 . 7 3 .8 .2 0 8 5 147 61 81 123 133 114 97 85 97 114 133 123 81 61 147 7 6 . 3 0 . 3 5 86 0 35 9 47 4 1 8 5 3 .0 3 6 7 49 . 3 8 71 7 77 7 66 .6 58 7 58 7 66 6 77 . 7 71 . 7 4 9 86 0 47 35 35 29 32 27 23 20 23 27 32 29 19 5 0 .4 3 3 3 4 0 5 5 14 a 35 4 8 .2 a .9 7 .6 6 5 5 6 6. 5 7 .6 6 9 8 2 5 4 4 1 9 a Figure 5.8: The numerical representation of the defocussed test images of the Cross L and Box. These matrices form the right hand side of the linear system and show the detail the is not available once they are digitized o °4. 6 8 Lterotuon 2 1 number 10 Figure 5.9: The measure of the exact error against the iteration number for the test images of the Cross.L and Box. Chapter 5. Numerical Application 96 index Figure 5.10: The spectrum of singular values for the coefficient matrix in each of the reconstructions. for the cross; this illustrates the actual numerical effects and gives more detail than is available after digitization. The digitized equivalent images are shown in Fig.5.12 using the 9 grey levels. The images at various iterations are shown for the ' L ' Fig.5.13 and for the Box Fig.5.14 In each case good estimates of the original images are retrieved from very degraded pictures. Even using the chosen, elementary, digitization the results are quite dramatic. The results confirm that the method of Conjugate Gradients will compute a smooth approximation to the exact solution. The work requirements to perform the retrieval are extremely favourable. It is interesting to note that in the case of the Box the iterated estimates retain the symmetry of the problem. The elements a; and pi together with the L -norms of the direction and residual 2 are shown for each case in Tables 5.12, 5.13 and 5.14. As anticipated the residual is unreliable as an indication of the accuracy of the estimates. The magnitude of and Pi do demonstrate the rapid increase at the termination point. For the "Cross" there is an increase in 8 by 10 from iteration 3 to iteration 4; the best solution is at iteration 2 Chapter 5. Numerical Application Iteration nuabar2=1 1.1 1.4 1.8 1.5 1.9 2 3 1.5 1.9 2 2 1.1 1. 4 1. 7 1 9 1.1 1 .4 1.6 1 6 1 .0 1 . 2 1 . 3 1 3 0 9 0.9 0 9 0.7 0.4 0.6 0.5 0 6 0.2 0.3 0 3 0 3 0 3 0. 1 0. 1 0.2 0.3 0 5 0. 1 0.2 0 1 0 3 0 5 0 1 1 0 7 0 2 0.4 0 4 0.7 1 2 0.2 0 4 0 7 1 1 0.2 1.1 1.1 8 Itarat ion nuabar -2 0 2 -8 4 -5.0 -2.3 -7.0 -5.5 2 -5.4 -3.9 -1.7 •0.2 2.4 3.86 4 3 5 4 0 4.9 2.6 3.2 0 9 1.3 -0.3 0.1 -1.1 -1.5 -3 0 2 4 -4.4 3.6 -4 1 -5.0 -4 6 -3 - 1.0 2.1 5.4 6.7 5.6 3.7 1.7 0.5 -0.2 -0.8 -1.4 -1.6 -1.5 2 3 3 0 4 6 7 7 0 5 7 3 2 0 1 2 3 1 7 2 1 2 3 2 2 8 8 1 8 Itarat ion nuaber -1.7 2 -8 5 -5.5 -4.8 -2. 1 2 -7.3 -6. 1 -2.6 1. 5 3.0 3 9 2.6 0 7 -0.7 •19 3 4 -4.3 -0.8 3.3 -12 1. 7 5.2 8 5.4 5. 1 6 8.0 3.4 4.0 1 7 1.2 0 3 -0.3 -0 4 •1.4 -2 7 - 10 -4.8 -1.6 •5 4 -3.8 •4.3 -5.0 -1.8 - 1 -4.0 2 4 6 7 8 4 2 1 1 1 1 2 72 ItaratIon nuaber 12 -6. 1 -2 4 -7.9 -4.0 •0.0 •5.0 -1.0 -3.4 0.5 -6 -2.4 1 .0 4.5 1 5 4.6 7 6 3 2 5.6 7 2.5 5.6 4. -0.3 0.6 1.6 -2.2 •16 -0.6 -3.3 •2.4 -1.0 -4.3 •2.9 -1.0 5.3 •3.4 -1.0 -5.7 •3.5 -5.3 •3.2 -0.91.0 •8 6 8 4 3 2 3 7 9 8 8 6 1 • 2 0 1 1 2 2 2 5 3 8 8 8 4 1 2.3 2.5 2 3 2.0 1.6 12 0.1 0.5 0.3 0.4 0.6 1 1 1.5 1.6 1.5 2 2 2 1 1 0 0 0 0 0 0 1 1 1 1 2 6.5 6.9 6 9 7. 1 7.3 6 8 5 0 3.4 2.1 2.0 2.9 4 6 6.1 6 7 6 2 9 9 9 8 6 5 3 2 2 2 4 6 8 9 8 3 6 5 6.8 6.7 7.0 7.4 7 0 2 2 1 2 2 6 9 1 0 2.3 2.1 3.0 4 5 5.9 6.5 6. 1 4 5 4 7 2 6 9 7 9 2 4 9 4 7 9 0 6 0 2 1 97 5.8 3.9 1 6. 6.5 8.4 9.4 8.5 6.3 3.5 2 6 3 4 4.7 8 5 6.2 5.6 6 7 6 5 4 3 2 1 1 1 2 4 a 7 6 9 9 8 7 5 4 2 2 1 2 3 6 a a 8 9 8 5 3 3 5 7 9 10 9 0 6 9 4 7 2 9 1 7 3 9 4 7 9 6 9 0 0 6 1 0 8 4 9 7 0 4 0 4 1 6 1 0 1 9 4 1 3 6 2 5 7 9 6 8 9 8 87 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 9 8 4 3 4 7 2 7 6 7 83 9 9 9 8 7 5 4 3 2 2 4 6 9 9 8 6 9 0 9 5 1 7 5 3 3 4 7 1 5 7 6 1 2 1 1 1 0 0 0 0 0 0 1 1 1 1 3 4 2 5 7 8 1 3 1 0 3 7 6 3 5 3 0 3 9 9 9 7 6 4 3 2 2 2 4 6 1 0 1 1 1 0 9 0 0 6 0 4 0 3 0 2 0 2 0 3 0 4 0 6 0 6 0 a 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 9 0 0 9 9 a a 3 2 1 1 2 2 3 2 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 0 7 5 2 1 0 1 1 1 1 8 a 7 a 5 3 4 4 4 4 3 2 1 1 0 1 2 3 3 3 a i 0 0 0 a 7 0 0 1 3 5 5 4 3 1 0 0 -0 9 -0 0 -0 9 •0 7 6 4 3 3 9 a 0 3 3 0 1 2 2 1 •0 -0 0 3 5 3 1 5 3 1 1 2 3 4 4 3 2 1 0 0 0 0 1 0 3 3 3 3 4 3 3 2 1 1 1 2 2 3 3 5 7 7 9 1 9 1 1 2 0 4 2 9 2 0 1 1 1 3 4 5 4 2 1 0 0 0 0 0 0 5 4 a 0 a 0 i 7 2 4 3 4 5 6 6 0 0 1 3 5 6 5 3 1 0 -0 -0 •0 -0 -0 a a 2 2 6 4 4 5 4 3 1 2 2 2 2 0 0 0 3 6 7 6 4 1 0 -0 -0 1 1 1 2 4 5 5 3 1 1 1 1 1 1 1 a 3 4 6 6 a 0 6 9 1 1 3 4 3 2 8 0 -0 -0 1 6 3 6 2 1 0 2 6 9 5 1 -0 •0 -1 -1 -1 -1 6 2 5 7 3 7 9 7 3 0 8 a a 7 2 7 a 5 8 4 3 8 a 3 2 8 8 4 z 0 1 7 0 5 0 0 5 2 7 9 9 a 9 2 a 8 9 10 2 9 3 1 7 3 6 5 5 4 4 2 3 4 3 9 5 7 8 1 10 2 11 0 10 2 8 4 5 4 1 a a 4 3 2 3 5 a i 2 i 1 1 3 4 6 7 6 5 9 0 8 5 1 7 a 7 9 a a 3 5 5 5 4 3 2 2 3 5 a 7 a 9 8 5 6 5 6 6 9 5 9 4 9 1 8 a 1 a 4 8 9 3 3 4 5 a 3 1 5 4 a •0 • 1 6 • 1 2 1 2 5 0 7 0 6 5 4 5 1 6 0 0 •0 6 • 12 • 18 -2 1 - 19 1 1 1 2 1 3 3 1 3 1 4 1 2 0 9 0 6 0 2 0 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 3 • 14 4 • 1 a a -0 5 2 5 9 5 7 6 5 7 6 3 a 5 1 5 3 0 3 2 -0 2 4 -0 5 a -0 7 6 -0 5 -0 7 -2 -2 -1 1 4 6 5 3 1 0 •0 -0 -0 -0 •0 1 2 1 3 3 1 3 1 3 1 2 0 9 0 5 0 2 0 1 0 0 0 0 0 0 0 1 0 1 11 1 7 9 2 8a 83 83 9 2 6 8 6 1 1 8 8 5 3 8 1 0 -0 -0 •0 -0 -0 a 4 -0 1 •0 -0 2 2 1 5 3 7 1 6 0 4 6 1 2 0 1 -0 2 •0 •0 2 -0 2 -0 -0 -0 2 •0 8 6 1 •0 -0 •0 -0 -0 •0 8 8 4 -1 i 2 2 2 3 1 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 9 0 0 0 0 9 7 4 2 1 0 0 0 0 0 4 7 5 5 7 0 1 3 3 2 2 5 -3 -3 -2 0 3 4 4 2 1 0 -0 -0 -0 9 -0 -0 0 4 3 5 5 8 1 7 1 2 3 6 a 9 8 5 -2 -3 -2 3 -0 4 2 9 4 3 4 5 2 3 1 1 0 2 •0 2 •0 3 -0 3 -0 2 -0 5 0 3 0 9 4 1 7 0 0 3 4 4 5 4 8 5 2 1 0 0 0 0 0 8 a a -2• 102 0 5 7 1 1 0 5 2 1 2 2 2 1 3 2 7 3 1 5 6 0 3 4 1 4 6 5 0 3 7 9 9 7 9 1 5 6 9 1 5 3 •0 5 6 -0 4 3 0 2 2 0 6 1 0 1 0 7 8 1 4 5 5 3 1 0 -0 •0 •0 -0 •0 1 0 •0 4 -1 • 12 2 1 5 1 a 4 3 1 0 -0 7 -0 6 -0 0 0 3 0 4 0 4 8 S 8 • 1 -3 2 -4 5 -4 0 • 10 2 1 3 4 2 a 0 3 -0 9 -0 9 -0 a -0 a -0 a •0 a Figure 5.11: The exact numerical results from iterations 1,2.3, and 4 from the method of Conjugate Gradients for the Cross test image. Figure 5.12: The visual representation for the results of iterations 1.2,3 and 4 for the Cross test image. Comparison with the numerical results in the previous figure shows the digitization effects. F i g u r e 5.13: T h e v i s u a l representation for the results of iterations 1,3.4 a n d 5 for the L test i m a g e . T h e same grey scale is used for the d i g i t i z a t i o n as before. Chapter 5. Numerical Application 100 Figure 5.14: The visual representation for the results of iterations 2,3,4 and 5 for the Box test image. The same grey scale is used for the digitization as before. Chapter 5. Numerical Itn 1 2 3 4 5 6 7 8 9 10 101 Application Pnorm 6.953 X 10 5.414 X 10 8.500 X 10 2.303 X 10 7.209 X 10 8.019 X IO 1.808 X 10 2.844 X IO 3.835 X IO 9.529 X IO 5 5 4 5 4 4 5 4 4 4 a 2.210 x 9.745 x 5.664 x 9.353 x 5.573 x 5.211 x 8.095 x 3.472 x 5.439 x 1.756 x Resi du al 0 5.165 x IO 10" -5 i 4.253 x 102.576 x IO 10" -5 2 3.353 x 102.499 X IO 10" 2.274 2.186 X IO 10"-5 4 1 1.984 x 10" 1.745 X IO 10" -5 8.527 x 10" 1 1.705 X IO 10" 2.057 1.570 X IO 10"-5 -4 2 1.512 X IO 9.461 x 10" 10" -5 1.067 1.502 X IO 10" 4 2.304 1.426 X IO 10" •5 3 3 3 3 3 3 3 3 3 3 Table 5.12: The analysis of the parameters, direction norm and residual for the first 10 iterations of Conjugate Gradients for the Cross test example. 3. In the case of the " L " such an increase does not occur until iteration 7; however the error does not increase as rapidly as for the "Cross". The "Box" is more troublesome as it is not until iteration number 9 that any substantial increase is demonstrated; (10 2 for a). In each case fluctuation in the magnitude of both a and 8 does occur after four iterations. The large magnitude of the singular values of the coefficient matrix accounts for the small values for the a's. 5.2.1 Termination If the iteration process is continued too far the picture becomes unstable. This is the expected feature of solutions to ill-posed problems. The question arises as to when to terminate the process for practical cases where the exact error is not available. A n exact solution to the question (as with other solution methods for ill-posed problems) is not possible; however, there are various guidelines that may be considered. 1. Since each iteration is a relatively uncostly process one option is simply to view each of the images. This is the simplest, possibly least effective method, and involves no Chapter 5. Numerical Application Itn 1 2 3 4 5 6 7 8 9 10 Pnorm 4.969 X 10 2.496 X 10 7.280 X 10 9.610 X 10 4.718 X 10 6.886 X 10 9.709 X IO 1.665 X IO 2.788 X IO 8.449 X IO 5 5 4 4 4 4 4 4 4 4 a 1.828 x 1.190 x 3.057 x 2.382 x 5.249 x 3.678 x 9.987 x 4.228 x 6.556 x 1.280 x 102 3 0 -4 2.087 x 10" i 10" 10"-5 9.248 x 10- 2 10"-4 9.449 x l O 1 10"-4 3.044 x 10- 1 1.184 10"-5 -5 1.249 10" 10"-4 9.486 x 10"-2 1.363 10"-5 -4 2.867 io- -5 - io- Residual 3.038 x IO 1.760 x IO 1.718 x IO 1.371 x IO 1.077 x IO 1.049 x IO 9.475 x 10 9.036 x IO 8.941 x 10 8.386 x IO 3 3 3 3 3 3 2 2 2 2 Table 5.13: The analysis of the parameters, direction norm and residual for the first 10 iterations of Conjugate Gradients for the L test example. Itn 1 2 3 4 5 6 7 8 9 10 Pnorm 1.460 x 10 1.358 x 10 3.222 x 10 3.346 x 10 1.493 x 10 5.567 x 10 1.249 x 10 3.220 x 10 1.022 x 10 2.388 x 10 5 5 4 4 4 3 4 3 4 3 a 2.439 x 9.372 x 8.436 x 2.860 x 1.987 x 8.169 x 7.085 x 2.332 x 1.246 x 3.817 x io--5 io--4 -5 10" 10"-4 10"-4 10"-4 10"-4 10"-3 10 -3 10"-5 3 0 5.560 x 10" i 7.814 x 10" 2 6.841 x 10- 1 2.454 x 10" 1 1.618 x 10" 1 1.875 1.477 x 10- 1 2.857 1.571 x 10" 1 Residu al 1.517 x 10 1.092 x 10 6.413 x 10 4.797 x 10 4.463 x 10 4.227 x 10 3.811 x 10 3.592 x 10 3.228 x 10 3.226 x 10 3 3 2 2 2 2 2 2 2 2 Table 5.14: The analysis of the parameters, direction norm and residual for the first 10 iterations of Conjugate Gradients for the Box test example. Chapter 5. Numerical 103 Application mathematics at all. The decision, as to when to stop, is then at the discretion of the user. It will certainly be only a few iterations and since it was seen that after reaching the minimum the exact error will increase monotonically there is not a high risk of excessive work being performed using this method. 2. The magnitude of the constants from the algorithm may be used. In the previous chapter it was suggested that consideration of the values of the parameters and Bi which are readily available from the algorithm may give a useful guide on the behaviour of the error. A n apparent rapid increase in the size of either the a; or suggests that a convenient termination point has been achieved. 3. The singular value decomposition of the matrices K\ and K 2 only 0(n ) s operations. can be achieved in It is possible then to obtain the singular values of the coefficient matrix Kj and its decay can be used to consider the point at which a truncated SVD may be terminated and thus a convenient number of iterations to be performed by the method of Conjugate Gradients. Chapter 6 Discussion This thesis has demonstrated that the method of Conjugate Gradients may be used for the efficient computation of a smooth approximation to the solution of large sparse illconditioned linear system of equations. Clear reasoning has been given for the successful filtering, of the oscillatory components, at each iteration and the method of Conjugate Gradients has been shown to give similar results to the method of truncating the S V D with significantly less computational effort. The work forms a basis for a more quantitative study for specific applications. Various points are raised : • The decision as to when to terminate the iterative process is a complex problem. In cases where a priori knowledge is available regarding the singular value spectrum of the system a similar criterion over the choise of regularization parameter to that of truncating the SVD [40] is recommended. In most practical applications all that is available are the parameters a:; and 8i from the algorithm. It is felt that these may be used in forming a termination criterion but since they are dependent on the noise in the system any generalization is not possible. • It has been shown that the polynomial at the fcth iteration is the solution to a weighted least squares problem. The weights are specific to the problem, making exact predictions as to the characteristics of the polynomial very difficult. Insight into the behaviour of the polynomial at the /cth iteration was made possible by demonstrating that it approximates the interpolating polynomial through zero and 104 Chapter 6. 105 Discussion the first k singular values with the value one as the ordinate. With this simplification a qualitative discussion of the filtering properties was presented. The similarity of the interpolating polynomial (when considering evaluation only at the singular values) to the step function means that the filtering properties of the method of Conjugate Gradients are favourable for use as an iterative filtering technique. A more precise description of the polynomial is highly complex due to the dependence of the coefficients on the "noise" in the data. Further study into the weighted least squares problem (4.110) would enable a more precise evaluation of the features and may give more guidance on how to set a termination criterion. • Although preconditioning is seen to improve the performance of the method of Conjugate Gradients (see for example [25]) it is not obvious that this is the case for ill-conditioned systems. In many cases the number of iterations required by the basic algorithm to achieve the best smooth solution is acceptably small, and the additional cost that will be incurred by use of a preconditioner is not justified. Preconditioning has the effect of increasing the rate of convergence to an approximation that has a required accuracy in terms of the residual. In ill-conditioned cases, where the residual is a poor estimate of accuracy, this acceleration may cause the algorithm to continue past the best smooth solution without ever computing it, so that the slower convergence without preconditioning is preferable. Systems for which the noise level is very low are effectively well-conditioned and preconditioning, using the algorithm detailed in §3.4.2, may be recommended to improve a prohibitively slow iteration process. Further work is recomended to assess the possibility of effective application of the preconditioned Conjugate Gradients method to ill-conditioned linear systems. Bibliography [1] R.S.Anderssen et. al., Application of Numerical Solution of Integral Equations, Sijthoff & Noordhoff, (1978). [2] E.S. Angel and A . K . Jain, "Restoration of images degraded by spatially varying pointspread functions by a conjugate gradient method", Applied Optics 17, (1978). [3] O.Axelsson, " A generalized SSOR method", BIT 12, (1972) 443-467. [4] O.Axelsson, " A class of iterative methods for finite element equations", Comp. Meth in App. Mech & Eng. 9, (1976) 123-137. [5] C.T.E. Baker, Tiie Numerical Treatment of Integral Equations, Clarendon Press, (1977). [6] C.T.E. Baker, C Fox and C. Miller, "Numerical Solution of Fredholm Integral Equations of the First Kind", Comp. J. 7 (1969), 141-148. [7] A . Bjorck, "Least Squares Methods", Handbook of Numerical Analysis, Ciarlet and Lions, editors, (to appear). [8] A. Bjorck, E. Elden, "Methods in Numerical Algebra for Ill-Posed Problems", LithMat-R-33-1979, (1979). [9] L l . G . Chambers, Integral Equations : A Short Course, International Textbook Company, (1976). [10] R. Chandra, "Conjugate Gradient Methods for Partial Differential Equations", Ph.D. Thesis Yale Univ. (1978). 106 Bibliography 107 [11] C. Concus, G . Golub and Meurant, "Block Preconditioning for the Conjugate Gradient Method", SIAM.J.Sci.Stat.Comp 6, (1985). [12] C. Concus, G . Golub and D. O'Leary, " A Generalized Conjugate Gradient Method for the Numerical Solution of Elliptic Partial Differential Equations", Sparse Matrix Computations, Bunch and Rose, editors, (1976). [13] L . M . Delves and J.Walsh, Numerical Solution of Integral Equations, Clarendon Press, Oxford, (1974). [14] L.Elden and I. Skoglund, "Algorithms for the Regularization of ill-conditioned least squares problems with tensor product structure and application to space-varient image reconstruction", Lith-Mat-R-1982-48, (1982). [15] A.George and J.W. Liu, Computer Solution of Large Sparse Positive Definite Systems, Prentice-Hall Series in Computational Mathematics, (1981). [16] G.Golub and. C F . Van Loan, Matrix Computations, John Hopkins (1985). [17] J . Graves and P . M . Prenter, "Numerical Iterative Filters applied to First Kind Fredholm Integral Equations", NumerMath 3 0 (1978) 281-299. [18] C . W . Groetsch, The theory of Tikhonov Regularization for Fredholm Equations of the First Kind, Springer-Verlag, Berlin, (1984). [19] P.C. Hansen, "Computing the Singular Value Expansion", Computing (to appear). [20] M . R . Hestenes, "Pseudoinverses and Conjugate Gradients" Comm. A.CM. 18, (1975) 40-43. [21] M . R . Hestenes, (1980). Conjugate Direction Methods in Optimization, Springer-Verlag, Bibliography 108 [22] M . R . Hestenes and E.Stiefel, "Methods of Conjugate Gradients for Solving Linear Systems" J.Res.Nat.Bur.St 4 9 (1952) 409-436. [23] P. Lancaster and M . Tismenetsky, The Theory of Matrices with Applications, Computer Science and Applied Mathematics, Academic Press, (1985). [24] L.Landweber "An iterative formula for Fredholm integral equations of the first kind", Am.JMath 73, (1951), 615-624. [25] J . A . Meijerink and H.A. van der Vorst, " A n Iterative Solution Method for Linear Equations Systems of which the coefficient Matrix is a Symmetric M-Matrix", JMath.Comp 31 (1977) 148-162. [26] J.LI. Morris, Computational Methods in Elementary Numerical Analysis, John W i ley & Sons (1984). [27] D. O'Leary and J.Simmons, " A Bidiagonalization-regularization procedure for large scale discretizations of ill-posed problems", SIAM.J.Sci.Stat.Comp 2 (1981). [28] D . L . Phillips, " A technique for the numerical solution of certain integral equations of the first kind", J. ACM. 9 (1964) 84-97. [29] J . K . Reid, "On the method of Conjugate Gradients for the Solution of Large Sparse Systems of Linear Equations", Large Spase Sets of Linear Equations, J . K . Reid editor, Academic Press, New York, (1971), 231-254. [30] J.K.Reid, "The use of conjugate gradients for systems of linear equations possessing Property A " , SIAM. J. Num. Anal. bf9 (1972) 325-332. [31] A . Rosenfeld and A . C . Kak, Digital Picture Processing Vol. 1, Computer Science and Applied Mathematics, Academic Press, (1982). Bibliography 109 [32] Y . Saad, "Practical use of Polynomial Preconditioners for the Conjugate Gradients Method", SIAM.J.Sci.Stat.Comp 6 (1985). [33] M . A Saunders, "Sparse Least Squares by Conjugate Gradients : A comparison of Preconditioning Methods", Rep. Sol. 79-5, Systems Optics Laboratory, Stanford C.A., (1979), 15-20. [34] F. Smithies, Integral Equations, Cambridge University Press, (1958). [35] W. Squire, "Solution for Rl-Conditioned Unear systems arising from Fredholm Equations of the first kind by steepest descent and Conjugate Gradients", Int.J for Num.Meth in Eng 1 0 (1976), 607-617. [36] O.N. Strand, "Theory and Methods related to the singular-function expansion and Landwebers iteration for integral equations of the first kind", SIAM.J.Num.Anal. 1 1 (1974), 798-825. [37] A . N . Tikhonov and V . Y . Arsenes, Solutions of ill-posed problems, John Wiley &; Sons, (1977). [38] J . M . Varah, "On the numerical solution of ill-conditioned linear systems with applications to ill-posed problems", SIAM.J.JVum.Anai. 1 0 (1973), 257-267. [39] J.M.Varah, "A practical examiniation of some Numerical Methods for Linear discrete ill-posed problems", SI AM Review 2 1 (1979), 100-111. [40] J.M.Varah, "Pitfalls in the Numerical solution of linear ill-posed problems", SIAM.J.Sci.Stat.Comp. 4 (1983), 164-176. [41] R.Varga, Matrix Iterative Analysis, Prentice-Hall, (1962). Bibliography 110 [42] C.R.Vogel, "Solving ill-conditioned linear systems using the Conjugate Gradients Method", preprint, Dept of Math Sc., Montana State Univ., (1988). [43] G. Wahba, "Practical approximate solution to linear equations when the data is noisy", SIAM.J.Num.Anal Vol 14 (1979), 651-667. [44] Y.S. Wong and H. Jiang, "Approximate Polynomial Preconditioning Applied to B i harmonic Equations on Vector Supercomputers", NASA Tech Mem 100217 ICOMP87-5 (1987). [45] D . M . Young, Iterative solution of Large Linear Systems, Academic Press, (1971). Appendix A Appendix This appendix details the particular results required from the theory of Kronecker products for the application in Chapter 5. A.l Kronecker Products The formulation of the mathematical image problem used in Chapter 5 requires the use of the Kronecker product of matrices. The following results are summarized from [23]. 1. The Kronecker (tensor) product of two (n x n) matrices results in an ( n x n ) 2 2 matrix and is performed as follows ^ a B n A®B = a \B 2 aB ... a B ... 12 22 \ &n\B aB ln ^ a B 2n aB nn ) 2. The result of multiplying the Kronecker product of two (n X n) matrixes by an (n ) 2 vector is the (n x n) matrix given by (A ® B)x = AXB (A.153) 3. The Kronecker product of two matrix-products is equivalent to the matrix product of two Kronecker products performed in order, i.e., (AB) ® (CD) = {A ® C)(B ® D) 111 (A.154) Appendix A. Appendix 112 4. The transpose of a Kronecker product is equivalent to the Kronecker product of the transpose of both matrices, i.e., (A <g> B) =A T T <g> B T (A.155) 5. If the singular values of the matrix A are p for s = 1, 2 , . . . ,n and the singular a values of the matrix B are A for r = 1,2, ...n r then the singular values of the matrix that is the Kroneker product of A and B (A ® B) are the n values 2 Xp. r 3 6. The expansion of an (n x n) matrix into an n -vector is achieved by stacking the 2 rows of the matrix end on end. ft F ( fi \ fn J f \
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- The numerical treatment of ill-posed problems using...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
The numerical treatment of ill-posed problems using the method of conjugate gradients Ross, Sally A. 1988
pdf
Page Metadata
Item Metadata
Title | The numerical treatment of ill-posed problems using the method of conjugate gradients |
Creator |
Ross, Sally A. |
Publisher | University of British Columbia |
Date Issued | 1988 |
Description | This thesis examines the use of the method of Conjugate Gradients as an iterative method to be applied to linear system of equations that are ill-conditioned. An overview of the particular problems associated with the solution of ill-conditioned linear systems is given and the method of Conjugate Gradients described. It is shown that, at each iteration, the method of Conjugate Gradients weights the contribution from the singular vectors using a polynomial that is characterized as the solution to a weighted least squares problem. The form of the polynomials is shown to approximate a series of interpolating polynomials that constitute an efficient filtering technique required for computing solutions to ill-conditioned systems. The performance of the method of Conjugate Gradients is shown to compare favourably with other accepted methods for ill-conditioned linear systems. An application from the field of image processing is given and the efficient computation of a smooth reconstructed image from a defocussed picture is demonstrated. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-09-09 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0079413 |
URI | http://hdl.handle.net/2429/28329 |
Degree |
Master of Science - MSc |
Program |
Mathematics |
Affiliation |
Science, Faculty of Mathematics, Department of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- 831-UBC_1988_A6_7 R65.pdf [ 7.14MB ]
- Metadata
- JSON: 831-1.0079413.json
- JSON-LD: 831-1.0079413-ld.json
- RDF/XML (Pretty): 831-1.0079413-rdf.xml
- RDF/JSON: 831-1.0079413-rdf.json
- Turtle: 831-1.0079413-turtle.txt
- N-Triples: 831-1.0079413-rdf-ntriples.txt
- Original Record: 831-1.0079413-source.json
- Full Text
- 831-1.0079413-fulltext.txt
- Citation
- 831-1.0079413.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0079413/manifest