THE NUMERICAL TREATMENT OF ILL-POSED PROBLEMS USING THE METHOD OF CONJUGATE GRADIENTS

By Sally A. Ross
B.Sc. (Mathematics), University of Leeds, England.

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES, INSTITUTE OF APPLIED MATHEMATICS

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
September 1988
© Sally A. Ross, 1988

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

The University of British Columbia
1956 Main Mall
Vancouver, Canada
V6T 1Y3

Abstract

This thesis examines the use of the method of Conjugate Gradients as an iterative method applied to linear systems of equations that are ill-conditioned. An overview of the particular problems associated with the solution of ill-conditioned linear systems is given and the method of Conjugate Gradients is described. It is shown that, at each iteration, the method of Conjugate Gradients weights the contribution from the singular vectors using a polynomial that is characterized as the solution to a weighted least squares problem. The form of the polynomials is shown to approximate a series of interpolating polynomials that constitute an efficient filtering technique required for computing solutions to ill-conditioned systems. The performance of the method of Conjugate Gradients is shown to compare favourably with other accepted methods for ill-conditioned linear systems. An application from the field of image processing is given and the efficient computation of a smooth reconstructed image from a defocussed picture is demonstrated.
Table of Contents

Abstract ii
List of Tables vi
List of Figures vii
Acknowledgement viii

1 Introduction 1
  1.1 Sensitivity of Linear Systems 7

2 The Numerical Treatment of Integral Equations 12
  2.1 The Continuous Problem 13
  2.2 The Discrete Problem 17
  2.3 Regularization Techniques 23
    2.3.1 Regularization of the Problem 24
    2.3.2 Regularization of the Solution: Iterative Filtering 27

3 The Method of Conjugate Gradients 30
  3.1 The Method of Conjugate Directions 31
  3.2 The Method of Conjugate Gradients 36
    3.2.1 The Conjugate Gradients Algorithm 36
    3.2.2 Properties of the Method of Conjugate Gradients 39
    3.2.3 Conjugate Gradients as an Iterative Method 42
    3.2.4 Convergence Rate 43
  3.3 Preconditioned Conjugate Gradients 45
    3.3.1 Generalized Conjugate Gradients 45
    3.3.2 Practical Implementation of Preconditioned Conjugate Gradients 47
    3.3.3 Common Preconditioners 49
  3.4 Conjugate Gradients for Linear Least Squares 51
    3.4.1 Linear Least Squares Algorithm 51
    3.4.2 The Preconditioned Algorithm for Least Squares 53

4 The Filtering Properties of the Method of Conjugate Gradients 54
  4.1 Iterative Filtering 55
  4.2 The Filtering Function 56
  4.3 Characterization of the Operator 62
    4.3.1 Precise: Minimization of the Deviation from Zero 63
    4.3.2 Approximate: Interpolation 64
    4.3.3 Extension to Later Iterates 68
  4.4 Approximation to the Inverse 72
  4.5 Numerical Tests 74

5 Numerical Application 86
  5.1 A Description of the Problem 86
    5.1.1 The Mathematical Reconstruction of Degraded Pictures 86
    5.1.2 The Test Example 89
  5.2 Discussion of Results 90
    5.2.1 Termination 101

6 Discussion 104

Bibliography 106

Appendices 111
A Appendix 111
  A.1 Kronecker Products 111

List of Tables

2.1 Comparison between the SVE and the SVD 19
4.2 Coefficients for the first 3 polynomials for the Trapezoidal example 60
4.3 Coefficients for the first 4 polynomials for the Phillips' example 62
4.4 Comparison of accuracy and work for the Hilbert example 75
4.5 Comparison of accuracy and work for the Trapezoidal example 77
4.6 Comparison of accuracy and work for the Simpson's example 77
4.7 Comparison of accuracy and work for the Phillips' example 78
4.8 Conjugate Gradients analysis for the Hilbert example 82
4.9 Conjugate Gradients analysis for the Trapezoidal example 83
4.10 Conjugate Gradients analysis for the Simpson's example 84
4.11 Conjugate Gradients analysis for the Phillips' example 85
5.12 Analysis from the Conjugate Gradients algorithm for the Cross example 101
5.13 Analysis from the Conjugate Gradients algorithm for the L example 102
5.14 Analysis from the Conjugate Gradients algorithm for the Box example 102
List of Figures

4.1 Examples of filtering polynomials 61
4.2 Examples of the filter values for the Hilbert example 70
4.3 Examples of the filter values for the Trapezoidal example 71
4.4 The singular value spectra for the 4 test examples 76
4.5 Chosen solution for the Simpson and the Phillips' test cases 79
5.6 The digitized sharp test images 91
5.7 The digitized defocussed test images 93
5.8 Numerical representation of the degraded test images 94
5.9 The error for each of the test images 95
5.10 The singular values of the reconstruction matrix 96
5.11 Numerical results from iterations for the Cross 97
5.12 Visual representation of iterates for the Cross 98
5.13 Visual representation of iterates for the L 99
5.14 Visual representation of iterates for the Box 100

Acknowledgement

I am indebted to my supervisor, Dr James Varah, for his constant patience and expert guidance regarding the preparation of this thesis and throughout my studies in Vancouver. Special thanks go to my family in Belgium, for always being so supportive, but especially for paying the telephone bill. I would like to acknowledge, with particular gratitude, the Gibbins family for giving me a stress-free home to enable this to be completed. This thesis may have been abandoned had it not been for the support and encouragement given by Steve Adams; I am very grateful to him for so many reasons.

Chapter 1

Introduction

The computation of solutions to problems for which the exact solution does not depend continuously on the supplied data is challenging; such problems are said to be ill-posed and are encountered frequently in various application areas. Solutions are difficult to find since small changes in the data may alter the solution considerably. The data for many physical problems may be generated as the results gathered from practical experimentation, and small errors are inherent in the problem. Any numerical computation may introduce further small errors through the effects of roundoff. These small errors, which are effectively due to "noise" in the system, must not be reflected as large oscillations in the solution. It is of great practical importance to be able to compute a meaningful solution to such problems efficiently.

This thesis examines the use of the method of Conjugate Gradients for computing the numerical solution of linear systems of equations which arise from ill-posed problems. One source of ill-posed problems is the solution of linear integral equations of the first kind,

    ∫ K(s,t) f(s) ds = g(t);    (1.1)

such equations arise in many applications, e.g., chemistry and image processing. The operator K(·,·) has the characteristic that it smooths the function f(·) in the mapping f(·) → g(·). The function g will always be smooth, even in cases where f is not continuous. A solution will not exist for arbitrary g. The stronger the effect of the smoothing is, the more difficult it will be to evaluate the solution. The numerical treatment of (1.1) is often achieved by first discretizing the equation to form a linear system,

    Ax = b,    (1.2)

where A ∈ ℝ^{n×n} and b ∈ ℝⁿ. The ill-posed property of the problem is reflected in the inherent ill-conditioning of the coefficient matrix A. The concern of this thesis is the numerical solution of ill-conditioned linear systems (1.2); the theory for the solution of the continuous problem (1.1) is used for understanding and motivation.
The problem itself is difficult to solve. Exact solutions may be impossible to find. A solution (if it exists) may be non-unique and highly unstable; e.g., in the field of image processing many very different original pictures may be indistinguishable once the image has been "blurred" in some way. The strategy is instead to calculate an approximate solution that will be acceptable. It is therefore necessary to state conditions under which a solution will be deemed acceptable, such as some degree of smoothness in the solution. This ensures that the effect of the small errors, which would otherwise cause the calculation of misleading results, is minimized. An acceptable solution is therefore one that is stable and that gives a small residual error.

Once discretized, the vector b may not lie in the range of A. The first step is to require that the problem (1.2) be stated instead as the minimization

    min ‖Ax − b‖₂,    (1.3)

which is the usual linear least squares problem. The problem (1.3) will have a solution, but the ill-conditioning of A means that two estimates that are equally accurate in terms of the solution to (1.2) may be very different from each other. The need to compute a stable solution of (1.3) has motivated the strategy of including in the problem a restriction on the class of allowable solutions. Motivation from the continuous problem [6] demonstrates that the difficulties stem from the inclusion, as components of the solution, of vectors that are highly oscillatory. The "noise" in the system has an exaggerated effect on the contribution of these vectors, so that they are given inaccurate weightings that may be disproportionately high. An approximation is formed so that the contributions from these unstable components are reduced, or excluded. The class of allowable estimates is thus restricted to functions that are smooth but will (hopefully) still be relatively accurate. This motivation leads to the technique of Regularization.

Regularization is used in general to mean the restatement of an ill-posed problem as a well-posed one that is made close to the original by the use of a small parameter. The parameter controls the extent of the regularization. The idea of regularization was first introduced by Tikhonov [37]. The original problem (1.3) is restated as a constrained least squares problem that restricts the solution to be smooth whilst minimizing the residual error. Understanding of the continuous problem shows that regularization is equivalent to including in (1.1) a small contribution from the function f on the right hand side of the equation, thus converting the equation to one of the second kind. The solution of second kind integral equations is a well-posed problem [5, 9]. The ill-conditioning of the coefficient matrix is then no cause of difficulty and the corresponding discrete problem may be solved by any direct method. In practical applications, however, the system of linear equations generated is often very large and sparse, and the use of direct solution methods is restricted.
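The thesis gives no code, but the effect of Tikhonov's idea on a discrete problem is easy to sketch. The snippet below is an illustration added here, not part of the original text: it builds a small Hilbert-matrix system (a matrix of the kind used as a test problem in Chapter 4), forms the naive SVD solution, and then damps each singular component by the Tikhonov filter factor σᵢ²/(σᵢ² + λ²). The noise level and the value of λ are arbitrary assumptions made for the illustration.

```python
import numpy as np

# Ill-conditioned test system: Hilbert matrix with a known smooth solution.
n = 12
A = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1.0)
x_true = np.ones(n)
b = A @ x_true + 1e-8 * np.random.default_rng(0).standard_normal(n)  # small "noise"

U, s, Vt = np.linalg.svd(A)
gamma = U.T @ b                       # coefficients gamma_i = u_i^T b

x_naive = Vt.T @ (gamma / s)          # unregularized solution: dominated by noise

lam = 1e-6                            # regularization parameter (assumed, problem dependent)
filt = s**2 / (s**2 + lam**2)         # Tikhonov filter factors
x_reg = Vt.T @ (filt * gamma / s)     # contributions from small singular values damped

print("error, no regularization:", np.linalg.norm(x_naive - x_true))
print("error, Tikhonov filter:  ", np.linalg.norm(x_reg - x_true))
```

The point of the sketch is only that damping the components associated with the small singular values turns a wildly oscillatory estimate into a stable one, at the cost of a bias controlled by λ.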
Iterative techniques can be used successfully by setting the parameter to zero and regularizing the solution by terminating the iterative process before the oscillatory components of the solution have converged. The procedure is known as an Iterative Filtering technique and was first studied by Landweber [24]: a series of less and less regularized solutions is generated by filtering the contributions of the unstable vectors. The extent of the regularization is controlled by the choice of the iterative method and the number of iterations performed. Various iterative methods have been studied [17, 36]. Each method generates an associated filtering function. The purpose of this thesis is to examine the filtering properties of the iterative method of Conjugate Gradients.

The method of Conjugate Gradients is an efficient iterative algorithm for computing the solution of a linear system of equations (1.2). At each step the majority of the work is in the formation of a single matrix-vector product with the original coefficient matrix; for this reason the scheme is particularly suited to large sparse systems of linear equations. The original presentation of the method of Conjugate Gradients [22] was as a minimization process that terminated in a finite number of steps, and thus as a direct numerical method. It was soon discovered that, although in a practical environment the process does not terminate as predicted, a good estimate of the solution is achieved after only a few steps. The basic algorithm may be simply modified to give an efficient solution method for the linear least squares problem (1.3). In [25] it was first shown that the performance of the method could be further improved by the notion of preconditioning. The method of Conjugate Gradients, particularly when preconditioned, was seen to be a very powerful and economical iterative procedure.

The method of Conjugate Gradients has been applied to ill-conditioned systems of equations [35, 8] with little explanation as to the reasons for its success. Recently, Vogel [42] demonstrated that the method of Conjugate Gradients does have the properties of a favourable iterative filtering method. This thesis extends the ideas in [42] by examining the results of the filtering process in more detail. A number of discretized integral equations are provided as numerical examples, including an application from the field of picture processing; these are more general than the illustrations given in [42]. The method of Conjugate Gradients is shown to be an economical method for computing a smooth, accurate solution when the system of equations is ill-conditioned. The method is successful without the need to apply a preconditioner; in fact any preconditioning may inhibit the ability of the method to calculate a smooth approximation.

Although the integral equation (1.1) is the source of the features of the problem, it is the solution of the resulting linear system which is the topic here. The material in later chapters requires an understanding of the sensitivity of linear systems to small errors in the data. As an introduction, the basic notion of conditioning is detailed in §1.1 at the end of this chapter. The numerical treatment of integral equations is the subject of Chapter 2. The continuous equation (1.1) is examined in §2.1, where Fredholm integral equations of the first kind are shown to be a source of ill-posed problems.
The analogies between the continuous and discretized problem are presented in §2.2. With this motivation the technique of regularization is discussed in §2.3. The original method of regularization, presented in §2.3.1, is shown to be inappropriate for solution by iterative procedures. §2.3.2 describes the approach to the problem by using the iteration count as the parameter to control the extent of the regularization. A description of the method of Conjugate Gradients is presented as Chapter 3. The method of Conjugate Gradients is shown to belong to the general class of Conjugate Directions schemes in §3.1, and the theory on which the development of the algorithm is based is discussed. The specific case of the method of Conjugate Gradients is the subject of §3.2. The algorithm is detailed in §3.2.1 and the properties required in later chapters are discussed in §3.2.2. The alteration in the manner in which the algorithm is considered, (from direct to iterative) is discussed in §3.2.3 and the evaluation of the convergence rate presented in §3.2.4. The improvement of the method by preconditioning techniques is given in §3.3. Preconditioning is motivated by the idea of a Generalized Conjugate Gradients method in §3.3.1 and the practical implementation detailed in §3.3.2. Some of the common choices for the preconditioner are described in §3.3.3. §3.4 shows that the method of Conjugate Gradients can be successfully used to solve the linear least squares Chapter 1. Introduction 6 problem (1.3); an efficient algorithm is given in §3.4.1 and the preconditioned version presented in §3.4.2. The use of the method of Conjugate Gradients as an iterative filtering technique is examined in Chapter 4. The features that are required by any iterative method to act as an effective filter are discussed in §4.1. The relations that arise in the algorithm are given and the polynomial that forms the filtering function developed in §4.2. In §4.3 the polynomial is shown to approximate the Lagrangian interpolant through the singular values and thus be an effective filter; a number of numerical illustrations are provided. The polynomial may also be viewed as an expansion of the approximation of the inverse of the coefficient matrix; this approach, to the explanation of the properties, is examined in §4.4. §4.5 discusses the practical application of Conjugate Gradients as an iterative filter. Results are compared with other recognised solution methods for ill-conditioned systems. The method of Conjugate Gradients is shown to be a competitive alternative. An application from the field of Picture Reconstruction is examined in Chapter 5. A description of the problem and its mathematical formulation is detailed in §5.1. The mathematical formation of general problems from image processing is discussed in §5.1.1 and the specific example used for the numerical tests derived in §5.1.2. The results of the numerical tests are presented in §5.2. Chapter 6 is a discussion of the findings of this thesis. As a preliminary to the problems that are the topic of the later chapters the following section defines what is meant by a linear system to be described as ill-conditioned. Results concerning the Kronecker product of matrices, summarized for use in Chapter 5, are detailed in the Appendix. Chapter 1. 
1.1 Sensitivity of Linear Systems

An important question in the numerical computation of the solution to linear systems of equations is the effect on the solution of small changes in either the elements of the coefficient matrix or in the data forming the right hand side. Numerical systems such as (1.2) frequently contain small errors which arise either in the way the data is collected or in the numerical calculations performed in the solution process. These errors mean that the original problem is rarely represented exactly in its numerical form, and the resulting system is some perturbation of the original problem. There are two considerations:

• how the exact solution, x*, changes when the system is perturbed;
• how the computed solution, x̂, differs from the exact solution.

The first issue concerns the sensitivity of the system and depends only on the problem itself, but the error in the calculated solution is a feature of the chosen solution method. If a system is non-singular, so that A⁻¹ can be formed, there is a unique solution to (1.2), x* = A⁻¹b, for each vector b. However, the matrix may be "almost" singular in the sense that small perturbations of the elements will result in a matrix that is rank deficient. In this case, although (mathematically) the system has a unique solution, computationally it has the features of a singular system, so that a solution (if one exists) may not be unique. The smaller the alterations to the matrix A that produce a singular matrix A + δA, the greater the problems in computing the solution to (1.2). A measure of the extent of the near-singularity of A, and so of the relative magnitude of the difficulties, is given by the condition number of the matrix

    κ(A) = σ₁/σₙ = ‖A‖₂ ‖A⁻¹‖₂,    (1.4)

where σ₁ ≥ ... ≥ σₙ > 0 denote the singular values of the matrix A. (Singular values are the square roots of the eigenvalues of AᵀA.) If A + δA is singular, then the 2-norm of the smallest such perturbation matrix is equal to the magnitude of the smallest singular value, i.e., ‖δA‖₂ = σₙ.

    ∫ₐᵇ K(s,t) sin(ns) ds → 0 as n → ∞.    (2.10)

This means that if h(s) is a solution to (2.7) then h̃(s) = h(s) + C sin(ns), for any constant C, is also an approximate solution since (for large n)

    ∫ₐᵇ K(s,t) [h(s) + C sin(ns)] ds → g(t) as n → ∞.    (2.11)

Large values n = N that give rise to only a slight perturbation of the data,

    ∫ₐᵇ K(s,t) [h(s) + C sin(Ns)] ds = ∫ₐᵇ K(s,t) h(s) ds + C ∫ₐᵇ K(s,t) sin(Ns) ds = g(t) + ε, with ε small,

correspond to a solution h̃(s) = h(s) + C sin(Ns) in which C may be arbitrarily large.

In order to solve (2.7) it is necessary to resolve the problem of how to choose, from the many solutions that are available, a solution that is reasonable. A solution will be accepted as reasonable if it is stable and gives a small residual, i.e.,

    ‖ ∫ K(s,t) f(s) ds − g(t) ‖ ≈ 0.    (2.12)

The class of allowable solutions must in some way be restricted. One approach is to require that the solution f(s) be smooth. Whether this smooth choice is the "best" is difficult to assess; it is, however, the exact solution to some small perturbation of the equation (2.7), and so has guaranteed stability and is a reasonable solution. The singular value expansion (SVE) of the system (2.7) can be used to understand the problem of selecting a reasonable solution [6, 19], and to give an analysis of the existence of a smooth solution.
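The Riemann–Lebesgue behaviour in (2.10) is easy to observe numerically. The sketch below is an illustration added here (the Gaussian kernel, interval and rectangle rule are convenience assumptions, not the thesis's test kernels): it shows the integral of K(s,t) sin(ns) shrinking as the frequency n grows, which is exactly why the perturbation C sin(Ns) in (2.11) is almost invisible in the data g.

```python
import numpy as np

# Numerical illustration of (2.10): for a smooth kernel K(s,t), the integral of
# K(s,t)*sin(n*s) tends to zero as n grows, so high-frequency components of f
# are almost annihilated by the smoothing operator.
a, b = 0.0, 1.0
m = 400
s = (np.arange(m) + 0.5) * (b - a) / m + a   # midpoint quadrature nodes
w = np.full(m, (b - a) / m)                  # rectangle-rule weights
t = 0.5                                      # a fixed evaluation point

K = np.exp(-(s - t) ** 2 / 0.02)             # smooth (Gaussian) kernel at fixed t

for n in (1, 4, 16, 64, 256):
    integral = np.sum(w * K * np.sin(n * np.pi * s))
    print(f"n = {n:4d}   integral of K(s,t) sin(n*pi*s) ds = {integral: .2e}")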
For any square integrable kernel K(s,t) there exist two sets of orthogonal functions, {Φᵢ(t)} and {Ψᵢ(s)}, known as the left and right singular functions respectively. Associated with these functions are scalars μᵢ such that

    ∫ K(s,t) Ψᵢ(s) ds = μᵢ Φᵢ(t)  and  ∫ K(s,t) Φᵢ(t) dt = μᵢ Ψᵢ(s).    (2.13)

The μᵢ are the singular values of the kernel K(s,t) and, since

    K(s,t) = Σ_{i=1}^{∞} μᵢ Ψᵢ(s) Φᵢ(t),    (2.14)

for the kernel to be L² the sum (2.14) must converge; therefore zero is the only limit point of the singular values, so that μᵢ → 0 as i → ∞. The functions Φᵢ, Ψᵢ and the scalars μᵢ are the components of the singular value expansion of K(s,t). Writing ηᵢ = (Φᵢ, g), the solution can be expressed as

    f(s) = Σ_{i=1}^{∞} (ηᵢ/μᵢ) Ψᵢ(s).    (2.16)

For a reasonable solution to exist, the ηᵢ must decrease to zero at a faster rate than the scalars μᵢ decay [1]; this ensures the convergence of

    Σ_{i=1}^{∞} (ηᵢ/μᵢ)² < ∞,    (2.17)

the Picard Condition. The decay of the singular values is a function of the smoothing property of the kernel. The smoother the kernel is, the faster the singular values will decay. This increases the ill-conditioning and places more severe restrictions on an admissible solution.

The integral equation (2.7) can be treated numerically by first discretizing it using some quadrature rule to produce a linear system (1.2). A set of n linear equations is thus created:

    Σ_{j=1}^{n} w_j K(s_j, t_i) f(s_j) = g(t_i),  i = 1, ..., n,    (2.18)

where {w_j} are the quadrature weights at the nodes s_j ∈ [c,d], and the points t_i are chosen from [a,b]. This gives the linear system (1.2) formed by

    A = (a_{ij}),  a_{ij} = w_j K(s_j, t_i),    (2.19)
    x_i = f(s_i)  and  b_i = g(t_i).    (2.20)

In the continuous problem infinitesimal changes in g(t) or K(s,t) may produce a finite change in f(s). This is reflected in the discrete version by the inherent ill-conditioning of the matrix A (in the sense detailed in §1.1).

2.2 The Discrete Problem

The linear system

    Ax = b,    (2.21)

with A ∈ ℝ^{n×n} and b ∈ ℝⁿ, that is equivalent to the continuous problem (2.7) is created by discretization of the integral equation by a chosen quadrature method. (The linear system has the properties discussed in §1.1.) The purpose here is to investigate the quality of a numerical solver for general ill-posed problems, so the exact solution sought is taken to be that of the discrete problem (2.21). How accurately the solution to the discrete problem x matches the continuous function f(s) is a separate consideration that involves the truncation error introduced by the discretization. Any quadrature method with reasonably small truncation error is acceptable, and it will result in an ill-conditioned system of equations. Different discretizations do not lead to significant differences in the underlying ill-conditioning of the matrix, only in the accuracy with which the discrete solution resembles its continuous counterpart.

The use of a quadrature method to discretize the problem may result in a system such that the data vector b is not in the range of the coefficient matrix, i.e., b ∉ R(A). In this case it is usual to solve the problem by computing the solution of minimum residual norm, or the Linear Least Squares solution,

    min ‖Ax − b‖₂².    (2.22)

The singular value decomposition of the matrix A can be used to demonstrate the similarities between the two problems (2.21) and (2.7). The continuous theory can then be used to motivate an appropriate solution strategy for all numerical ill-conditioned problems.
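As a concrete illustration of (2.18)–(2.20), the sketch below assembles the quadrature matrix a_{ij} = w_j K(s_j, t_i) for an assumed smooth kernel and examines its singular values; the rapid decay and the resulting condition number are the discrete reflection of the smoothing property discussed above. The Gaussian kernel, grid and rectangle rule are assumptions made for this illustration, not the Trapezoidal, Simpson or Phillips test problems used later.

```python
import numpy as np

def discretize_first_kind(kernel, n, a=0.0, b=1.0):
    """Build A_ij = w_j * K(s_j, t_i) as in (2.18)-(2.20) with the rectangle rule."""
    s = (np.arange(n) + 0.5) * (b - a) / n + a   # quadrature nodes s_j
    t = s.copy()                                  # collocation points t_i
    w = np.full(n, (b - a) / n)                   # quadrature weights w_j
    A = w[None, :] * kernel(s[None, :], t[:, None])
    return A, s, t

kernel = lambda s, t: np.exp(-(s - t) ** 2 / 0.02)   # assumed smooth kernel
A, s, t = discretize_first_kind(kernel, 64)

sigma = np.linalg.svd(A, compute_uv=False)
print("condition number sigma_1/sigma_n =", sigma[0] / sigma[-1])
print("leading singular values:", sigma[:5])
print("trailing singular values:", sigma[-3:])
```

Whatever quadrature rule is used, the singular values of A inherit the decay of the μᵢ of the kernel, which is the source of the ill-conditioning discussed in the text.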
The existence of singular values and their associated singular vectors in the continuous problem is reflected in the discrete problem in the decomposition of A as UΣVᵀ, where

    Σ = diag(σ₁, ..., σₙ),  U = [u₁, ..., uₙ],  V = [v₁, ..., vₙ],

and uᵢ, vᵢ are the left and right singular vectors of A, respectively. (To avoid scaling difficulties it is assumed that the matrix A is scaled so that σ₁ = 1.) This is the standard singular value decomposition (SVD) of a matrix (see, e.g., [16, 15]). The vectors uᵢ and vᵢ are discrete approximations to the continuous functions Φᵢ and Ψᵢ. The singular values {σᵢ} correspond to the constants μᵢ of the continuous problem and therefore decay to zero. The matrix A is thus inherently ill-conditioned as defined in §1.1. The solution x* is the finite equivalent of the sum (2.16),

    x* = Σ_{i=1}^{n} (uᵢᵀb/σᵢ) vᵢ.    (2.23)

In forming the sum (2.23) numerically, the small magnitude of the singular values may cause roundoff effects to associate incorrect weights with the "later" singular vectors, so the troublesome effects of these vectors as components of the solution are increased. The properties of the singular value expansion of (2.7) and the singular value decomposition (2.23) of (2.21) are summarized in Table 2.1 to emphasize the correspondence between the two.

    SVE                                          SVD
    K(s,t) = Σᵢ μᵢ Ψᵢ(s) Φᵢ(t)                   A = Σᵢ σᵢ uᵢ vᵢᵀ
    μ₁ ≥ μ₂ ≥ ... > 0                            σ₁ ≥ σ₂ ≥ ... ≥ σₙ > 0
    (Φᵢ, Φⱼ) = δᵢⱼ, (Ψᵢ, Ψⱼ) = δᵢⱼ, i = 1, 2, ...   uᵢᵀuⱼ = δᵢⱼ, vᵢᵀvⱼ = δᵢⱼ, i, j = 1, ..., n
    ∫ K(s,t) Ψᵢ(s) ds = μᵢ Φᵢ(t)                 A vᵢ = σᵢ uᵢ
    ∫ K(s,t) Φᵢ(t) dt = μᵢ Ψᵢ(s)                 Aᵀ uᵢ = σᵢ vᵢ

    Table 2.1: Comparison between the SVE and the SVD

With the strong connection between the analysis of the two problems and the similarity in the difficulties associated with their solution, it is natural to assume a discrete analogue of the Picard Condition given in (2.17). This condition can be used to specify the existence of a smooth solution to the discretized system of equations. The Picard Condition (2.17) requires that the ratio |ηᵢ/μᵢ| decay to zero as the index i increases; this is assured by the relative decay of ηᵢ and μᵢ, so |ηᵢ/μᵢ| → 0 as i → ∞.

Once discretized, the summations shown in the table become finite. In practice the discrete quantities γᵢ = uᵢᵀb do not decrease to a value below the noise level of the system (unlike the continuous equivalent ηᵢ = (Φᵢ, g)). The singular values σᵢ, however, continue to decay to zero. The criterion under which the solution to a system (2.21) will be deemed reasonable can therefore be given as a version of Picard's Condition in discrete form [40]. It is required that the solution depend continuously on the data: i.e., that it be insensitive to noise or small perturbations of the system. A reasonable solution is one such that the residual is small for an approximation that is smooth, which mathematically can be stated as

    ‖Ax − b‖₂² = O(ε)  and  ‖x‖₂² = O(1)    (2.24)

for noise level ε. Expanding this in terms of the SVD gives

    ‖Ax − b‖₂² = Σ_{i=1}^{n} (σᵢ vᵢᵀx − γᵢ)²    (2.25)

and

    ‖x‖₂² = Σ_{i=1}^{n} (vᵢᵀx)².    (2.26)

Thus if x is to be a reasonable solution, given that for some value m the scalars γᵢ have a magnitude approximately that of the noise of the system, i.e., |γᵢ| ≈ ε for i > m, then the earlier γᵢ must be of the order of the associated singular values, i.e.,

    |γᵢ| = O(σᵢ) for i < m.    (2.27)
The condition above (2.27) requires that the rate of decrease of γᵢ to a magnitude of the order of the noise level of the system (i.e., γᵢ → ε for i < m) be faster than that of the singular values σᵢ → 0.

The use of iterative solution techniques is limited. If the parameter λ is set to zero (thus requiring the solution of the original Least Squares problem), iterative techniques may be used successfully. The regularization is achieved by terminating the iteration before the components corresponding to the small singular values have converged. The effect is to discriminate against the unstable part of the solution by filtering the contributions of the singular vectors. The truncated form of the SVD in §2.2 is, in some sense, an example of this Iterative Filtering technique. As the iteration proceeds a sequence of less and less regularized estimates of the solution is produced. The extent of the regularization is therefore controlled by the choice of iterative method and the (hopefully small) number of iterations performed.

In the truncated form of the SVD the singular vectors vᵢ appear as components of x_m either with their full weighting, as in x* (2.23) (for i ≤ m), or not at all (for i > m). In this way the truncated SVD expansion can be expressed as

    x_m = Σ_{i=1}^{n} H(i) (γᵢ/σᵢ) vᵢ,    (2.49)

where the weight function H(k) is

    H(k) = 1 for k = 1, ..., m,  H(k) = 0 for k = m+1, ..., n.

Another strategy is to allow all the singular vectors to contribute through a weight function F(σ) that discriminates against the singular vectors associated with the small singular values. The function F(σ) is such that

    F(σ) ≈ 1 for σ "large",  F(σ) ≈ 0 for σ "small",

and has a smooth transition for the intervening values of σ. A general iterative technique is given by Strand [36],

    x_k = x_{k−1} + M(AᵀA) Aᵀ (b − A x_{k−1}),    (2.50)

where M(σ²) is some polynomial. The initial estimate x₀ = 0 is assumed but is not restrictive. The solution to (2.50) can be written in terms of the singular value decomposition as

    x_k = Σ_{i=1}^{n} F_k(σᵢ²) (γᵢ/σᵢ) vᵢ,    (2.51)

where

    F_k(σ²) = 1 − (1 − σ² M(σ²))^k.    (2.52)

(2.51) can then be compared with (2.23). The effect is to damp the component along the singular vector vᵢ by an amount F_k(σᵢ²). F(·) is the Filtering Function; different choices for the filtering function result in different solution schemes.

A special case of (2.50) is given by Landweber [24], where a simple iterative strategy is applied to the normal equations, so that

    x_k = x_{k−1} + Aᵀ (b − A x_{k−1}),    (2.53)

with A normalized so that σ₁ = 1. (2.53) implicitly takes M(σ²) = 1. This scheme proceeds relatively slowly since after k iterations only the components corresponding to singular values with σᵢ > 1/√k have converged. [17] gives a study of various forms of (2.50) where F is taken as some continuous approximation to the Heaviside step function (zero at z = 0 and one otherwise). A series of possibilities using varying degrees of best least squares fits and Chebyshev polynomials as the approximating function is investigated.

This thesis investigates the "filtering" properties of the iterative method of Conjugate Gradients and thus assesses the success with which it can be used as a method for the solution of ill-posed problems. The next chapter details the method of Conjugate Gradients before examining its use as a regularization procedure in Chapter 4.
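The filter factors (2.52) can be checked directly for the Landweber iteration (2.53). The sketch below is an illustration added here (the Hilbert matrix is a stand-in for an ill-conditioned test problem and is not one of the thesis's examples): it runs k Landweber steps and compares the empirically recovered filter values σᵢ(vᵢᵀx_k)/γᵢ with the formula 1 − (1 − σᵢ²)^k.

```python
import numpy as np

# Landweber iteration x_k = x_{k-1} + A^T (b - A x_{k-1}), as in (2.53), applied
# to a normalized Hilbert-matrix system; recover the filter factors it applies.
n = 16
A = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1.0)
A = A / np.linalg.norm(A, 2)            # normalize so that sigma_1 = 1
x_true = np.ones(n)
b = A @ x_true

U, s, Vt = np.linalg.svd(A)
gamma = U.T @ b                          # gamma_i = u_i^T b

k_steps = 50
x = np.zeros(n)
for _ in range(k_steps):
    x = x + A.T @ (b - A @ x)           # one Landweber step, M(sigma^2) = 1

empirical = s * (Vt @ x) / gamma        # sigma_i * (v_i^T x_k) / gamma_i
predicted = 1.0 - (1.0 - s**2) ** k_steps   # formula (2.52) with M = 1

for i in range(6):
    print(f"i = {i}: empirical {empirical[i]: .6f}  predicted {predicted[i]: .6f}")
```

The leading components are essentially converged after a few steps, while the components belonging to the small singular values are still strongly damped; this is the behaviour an iterative filter is required to have.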
Chapter 3

The Method of Conjugate Gradients

The method of Conjugate Gradients is an efficient algorithm for computing the solution to a system of linear equations

    Ax = b,    (3.54)

where b ∈ ℝⁿ and A ∈ ℝ^{n×n} is assumed to be symmetric positive definite. The method was developed independently by Hestenes and Stiefel [22]. The algorithm is based on the process of locating the minimum of a quadratic function over planes of decreasing dimension and can be easily implemented as a repeated series of simple steps. Each iteration requires the formation of just one matrix-vector product and provides an improved estimate of the required solution. Mathematically, the solution x* = A⁻¹b will be obtained in at most n repetitions, so the scheme is a direct solver. Unfortunately, computationally the effects of numerical roundoff are unfavourable. Although ineffective as a direct scheme (as it was originally proposed), a good estimate of the solution is obtained early in the process, regardless of roundoff errors, and hence the method of Conjugate Gradients can be used as a successful iterative procedure. Considerations of the convergence rate led to the further development of Preconditioned Conjugate Gradients. This chapter gives an outline of the method of Conjugate Gradients.

The method of Conjugate Gradients belongs to the general class of Conjugate Directions methods, which is motivated by the theory of function minimization. §3.1 uses the minimization process to develop the method of Conjugate Directions. The development begins with the method of Steepest Descents; an examination of the drawbacks of this simplest method leads to the Conjugate Directions algorithm. The Conjugate Gradients algorithm itself is discussed in §3.2. §3.2.1 derives the scheme as a special instance of the Conjugate Directions algorithm and §3.2.2 outlines some of the features of the method which will be required in Chapter 4. The effects of roundoff errors, and the use of the method as an iterative scheme, are discussed in §3.2.3. Finally, an estimate of the convergence rate of the scheme is mentioned in §3.2.4 and used to motivate the idea of a preconditioned algorithm. The preconditioned algorithm is discussed in more detail in §3.3. §3.3.1 presents Conjugate Gradients as a method based on a splitting of the coefficient matrix, which leads to a discussion of the role of the preconditioner. Various choices for the preconditioner are mentioned in §3.3.3. Finally, §3.4 extends the algorithm to cases for which the coefficient matrix is not symmetric positive definite. An algorithm for the Linear Least Squares problem is discussed in §3.4.1 and the preconditioned algorithm is presented in §3.4.2.

3.1 The Method of Conjugate Directions

The solution of the linear system (3.54) and the Conjugate Directions algorithm itself can be motivated by considering the minimization of the quadratic function φ(x), defined by

    φ(x) = (1/2) xᵀAx − xᵀb.    (3.55)

Necessary conditions for the existence of a local minimum of (3.55) at x* are: 1. ∇φ(x*) = 0; 2. ∇²φ(x*) is positive definite. Hence, with A = ∇²φ symmetric positive definite, locating the minimum of (3.55) is equivalent to solving ∇φ(x) = Ax − b = 0; that is, the problem of locating the minimum of (3.55) is equivalent to the solution of the linear system (3.54). One of the simplest iterative methods for calculating the location of the minimum x* of φ is the method of Steepest Descents, which exploits the fact that φ

decreases most rapidly along the direction of the negative gradient of

{xk) = b- Axk. (3.56) The vector rk = b — Axk is the residual of the system at xk. In the method of steepest descents the next estimate xk+i is chosen as the point which minimizes (xk -\-akrk). This gives a step length of ak = 4 l h - (3-57) The speed at which this strategy converges to x* can be prohibitively slow. The search directions are along the inward normals of the (n — 1)-dimensional ellipsoids, centred at x" which form the level surfaces of cf>. The singular values of A are precisely the length of the semi-axes of these ellipsoids. For ill-conditioned problems where the ratio of largest to smallest singular value is large, the level surface ellipsoids are very elongated and relatively flat valleys with steep sides. The method of steepest descents requires that at each iteration the new search direction be perpendicular to the last by making a right-angled turn at the point where the search direction becomes tangent to the contour lines. This forces the minimizer to be located by a series of many short steps that proceed back and forth across the valley floor. (The system does not have to be very ill-conditioned before this can be destructive. The presence of one singular value that is much smaller than the rest can produce the elongation and force slow progress.) This slow progress can be overcome by searching instead along a set of directions that do not correspond to the negative gradient of

, by the method of Conjugate Directions, is performed by system-atically reducing the dimension of the search space as follows: • Select a point x0 and a line L\ through x0 in the direction pi] • Find Xi that minimizes (j> on L\ exactly; • Construct the (TI — l)-plane 7rn_i through X\ and conjugate to p\. By the above result, 7rn_\ contains x", the location of the minimum of c4 (thus the dimensionality of the problem has been reduced by one). This process is repeated starting with the point X\ and confining the search to the (n — l)-plane 7rn_x. • Select a line L2 in 7rn_1 through Xi in the direction of p2 G 7rn_1; • Find x2 that minimises on L2 exactly; • Construct the (n — 2)-plane 7rn_2 through x2 and conjugate to p2. Chapter 3. The Method of Conjugate Gradients 34 Now x~ is known to be contained in 7rr i_2. In this way an iterative procedure is created. At each step the dimension of the search space is reduced by one. At the n-th step the search space Ti is a line Ln. The location of minimum of

on the line Xk-\ + ctkPk exactly. Each value of ak is found, as in the method of steepest descents, by differentiating 4>{xk-\ + ctkPk) with respect to ctk and equating the result to zero. This yields the step length as: T ak = g ^ , (3.58) which is always well-defined since A is assumed to be a symmetric positive definite matrix, so by definition [u, Au) j= 0 for any vector u. As presented, there is no explicit specification for the choice of the search directions Pfc, but reducing the magnitude of (j>(xk) at each step imposes a restriction on pk- The Chapter 3. The Method of Conjugate Gradients 35 value of

(xk-l + OLkPk) = (xk-i) + akpl(Axk-i -b) + l/2ak2plApk then with the value of ak as given in (3.58), this gives VkAPk {Xk) = fc-i)- (3-69) Then by substituting (3.69) into (3.58) gives (3.71), and substituting (3.67) into (3.63) with (3.71) gives (3.71) The expressions (3.58) and (3.63) can then be rewritten as: X AFC = -PUPT ( 3 - 7 0 ) and/3 fc = | ^ ^ . (3.71) \\rk~2\r The steps for the Conjugate Gradients algorithm can now be presented in a form that is practical for programming purposes. Chapter 3. The Method of Conjugate Gradients 39 Conjugate Gradients Algorithm. Set x0 = 0 r0 — b then for k — 1, 2, 3, . . . ,n (3.72) (3k = T T ~ l T k - 1 (/31 = 0) (3.73) Tk-2Tk-2 Pk = Tk-i + PkPk-i (3-74) T " k = ( 3 ' 7 5 ) xk - xk^i + cxkpk (3.76) rk - r f c_! - akApk. (3.77) The formulae for a,/3 and r can be written in various forms (see [22] and [29]) but the forms given above are preferred for implementation [29]. The algorithm above appears as it was first presented. Despite its mathematical beauty and computational simplicity the algorithm was not popular, as in practice it proved to be uncompetitive with other available schemes. The next section discusses the source of the problems and shows that when the method is regarded as an iterative scheme (rather than as a direct scheme) it can be considered as a viable solution method. 3.2.2 Properties of the Method of Conjugate Gradients Theoretically the Conjugate Gradients method presented above possesses favourable qualities. It involves at most (in exact arithmetic) n repetitions of a series of simple steps. The coefficient matrix A is not involved in any decomposition process and is re-quired only for the single matrix-vector product performed in the calculation. Any special properties of the matrix A (e.g., banded structure) can be fully exploited reducing the computational effort per iteration, and the storage requirements. This is of particular importance in cases where A is large and sparse. Practically, though, the algorithm's performance falls short of these high expectations. Chapter 3. The Method of Conjugate Gradients 40 The Residuals and Search Directions The residuals rk are the search directions used by the method of steepest descent, but the Conjugate Gradients algorithm uses the search directions as the conjugate vectors pi to produce an improved performance. In the algorithm (3.77) the particular conjugate vec-tors, formed by orthogonal projections of the residuals, create a set of relations between the residuals and the search directions. By definition the direction vectors {pt, ... ,pn} form a mutually conjugate set, and the residuals {ro, . . . ,rn} are mutually orthogonal. In addition pfrj = 0, for i < j. We have also that span{pi,... ,pk} = span{r0,... ,rk-i} = span{b, Ab, . .. , Ak~1b}. So in summary: (ritrj) =0 i^j (Pi,APj) =0 i±j (Pi,rj) =0 ij. The Parameters a and 8 The two parameters a and 8 used in the method of Conjugate Gradients are calculated by known formulae (3.58) and (3.63). No a priori knowledge of the system, or other user interaction is required. This is in contrast to many other methods (e.g. SOR and Chebychev iteration) which require parameters to be estimated in order to attain optimal efficiency. The parameter 8k is the square of the ratio of the last two residual L2 norms. If at each step the error ek is defined as ek = x* - xk (3.78) Chapter 3. 
The Method of Conjugate Gradients 41 and the inner product E{ek) = (ek,Aek), then E(ek) = (aj" - x)TA{x' - xk) (3.79) = H ^ - ^ I U (3-80) = x"TAx" - 2(j>(xk) (3.81) and the minimization of (3.55) is equivalent to the minimization of the A-norm of the error. This means that the method of Conjugate Gradients ensures the reduction in magnitude of \\x" — Xk\\^ at each step and therefore guarantees that an improved estimate to a;" is obtained at each iteration. The reduction in length of the error vector does not guarantee a monotonic decrease in the magnitude of the residuals, so no bound can be predicted for the /3fc's. The step lengths ak are such that their reciprocal is bounded between the largest and smallest singular values of the matrix A [22]. With the singular-values ordered cr1 > .. . > an, — < ak < —. (3.82) For well conditioned systems a reasonable indication of the sizes of the ak can be es-0~i timated but when — is large the oik's are effectively unbounded. However, since it is assumed that the matrix A is symmetric positive definite, each step length is positive, i.e., a*. > 0 for k = 1,. .. , n. Fini te Terminat ion The termination of the method within a finite number of steps is guaranteed using exact arithmetic. The direction vectors {pi, . . . pk\ form a set of k linearly independent vectors, so the full set of n direction vectors is therefore span{pi,... ,pn} = 3?" . An algorithm that minimizes (j> o v e r span{pi,... ,p n } must (in exact arithmetic) locate the minimum Chapter 3. The Method of Conjugate Gradients 42 x". In addition, if A is such that it possesses only t distinct, non-zero, singular values then the solution can be expressed as a combination of t independent vectors and the method will locate the minimum in at most t iterations [22]. 3.2.3 Conjugate Gradients as an Iterative M e t h o d Following the publication of the original paper detailing the Conjugate Gradients algo-rithm very little interest was shown in the technique. This neglect stemmed from the fact that in a practical situation its major feature, the finite termination property, failed to hold. The algorithm was not competitive with such schemes as Gaussian Elimination with regard to both accuracy and efficiency, and it fell into disuse. However the method's other properties were promising and this led to a later re-evaluation of the scheme [29]. In [29] extensive tests are performed which demonstrate the effects caused by roundoff errors. Errors due to roundoff are propagated in the formation of the vector products (rfc_i,r-fc) and (Apk-i^Pk) [29] so that the orthogonality conditions {ri,Tj) = 0 and (Api.pj) = 0 for * ^ j (3.83) no longer hold, and the guarantee of termination within n steps is lost. The propagation effects result in the accumulation of the errors, so that the component of rk in the most recent direction is accurate but earlier components are less so. This means that the method of Conjugate Gradients is useful as a direct method only for systems of very small order. One way to minimize the effects of roundoff errors is to restart the process once a reasonable estimate has been obtained [22]. The guarantee of the monotonic decrease in the length of the error ||a;* —a;fc||2 (from (3.79)) justifies the use of any calculated estimate as an improved starting point. However, too many restarts would destroy the advantages of the process over that of steepest descents, and is not in general recommended. Chapter 3. 
The Method of Conjugate Gradients 43 Even when roundoff affects the performance of the algorithm a reasonably accurate estimate of x' is obtained long before the nth step [22]. Reid found (for relatively well-conditioned systems) that the guaranteed reduction of ||ejt.||2 held despite the effects of roundoff. This observation led Reid [29] to propose the scheme's successful implemen-tation as an iterative process to generate approximate solutions, and indeed (3.77) is an efficient and competitive procedure compared with other iterative schemes such as Chebyshev or Jacobi iteration. Using economical procedures to perform vector products makes the method of Conjugate Gradients particularly effective for solutions of large sparse systems of linear equations. Implementing the algorithm as an iterative scheme is particularly ecconomical as it requires 5n + 2 operations and one matrix-vector product (which will involve a maximum of n2 multiplications) per iteration. The product Ark can be stored each time to avoid unnecessary multiplications. The storage requirement for the scheme is therefore 4 n-vectors (x,r, p and Ap). In the case where A has a favourable sparcity structure the operation count can be reduced considerably. In the evaluation of the Conjugate Gradients algorithm as an iterative scheme the critical issue is the number of steps required to reduce the error by a given amount, and thus the computational effort needed to obtain a reasonable estimate. An examination of the rate at which the algorithm (3.77) converges follows in the next section. 3.2.4 Convergence Rate A measure of the efficiency of an iterative solution method is given by its convergence rate. At each step the Conjugate Gradients algorithm produces a new estimate to the solution x" such that the error functional (3.79) is minimized. Various attempts have been made to specify a bound for this error and so give an estimate of the rate of convergence Chapter 3. The Method of Conjugate Gradients 44 see [12] and references therein. In [10] it is shown that - xk\\A < Qk\\x'- x0\\A (3.84) where Qk depends only on the singular values of the matrix A. For a general matrix A it is not possible to specify Qk exactly, but different bounds can be placed on the convergence rate by obtaining a value for Qk given some information about the singular value spectrum. A basic bound can be generated [10] assuming that an interval [a, 6] is known to contain all the singular values of A. Then Q, < (3.85) This is a reasonable bound for matrices where K (the condition number of A) is small, but severe ill-conditioning could lead to impractically slow convergence. In cases where more detailed information on the exact structure of the spectrum is available (e.g., clustering of the singular values) this bound can be improved see for example [4] and [30]. The convergence rate of an iterative scheme can often be improved by preconditioning the coefficient matrix. Preconditioning refers to the strategy of transforming the matrix in question to reduce its condition number, or to cluster or otherwise redistribute its eigenvalues. The technique is well understood and widley used for such methods as SSOR [3]. The method of Conjugate Gradients is also well suited to the idea of preconditioning. 
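The operation and storage counts just given (5n + 2 operations plus one matrix-vector product per iteration, and storage for the four n-vectors x, r, p and Ap) translate into a very short program. The sketch below is a direct transcription of algorithm (3.72)–(3.77) in NumPy; the tolerance test on ‖r_k‖ and the random symmetric positive definite test matrix are illustrative assumptions, not part of the original algorithm.

```python
import numpy as np

def conjugate_gradients(A, b, tol=1e-10, max_iter=None):
    """Basic Conjugate Gradients for symmetric positive definite A, following
    (3.72)-(3.77): x0 = 0, r0 = b, one matrix-vector product per iteration."""
    n = len(b)
    max_iter = max_iter or n
    x = np.zeros(n)
    r = b.copy()
    p = r.copy()
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p                       # the single matrix-vector product
        alpha = rs_old / (p @ Ap)        # step length (3.75)
        x = x + alpha * p                # update estimate (3.76)
        r = r - alpha * Ap               # update residual (3.77)
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p    # new search direction (3.73)-(3.74)
        rs_old = rs_new
    return x

# Small symmetric positive definite test system (illustrative only).
rng = np.random.default_rng(1)
M = rng.standard_normal((50, 50))
A = M @ M.T + 50 * np.eye(50)
b = rng.standard_normal(50)
x = conjugate_gradients(A, b)
print("residual norm:", np.linalg.norm(A @ x - b))
```

Nothing beyond the matrix-vector product with A is needed, which is why any sparsity or banded structure of A can be exploited directly in the product routine.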
In exact arithmetic the number of steps required to converge to the solution x" is equal to the number of distinct singular values of the coefficient matrix, so the performance of the Conjugate Gradients Algorithm takes account of the full singular value spectrum and not just those on the periphery. Any clustering of the singular values, which would effectively reduce the number of distinct singular values, would therefore have a favourable effect Chapter 3. The Method of Conjugate Gradients 45 on the efficiency of the scheme. An implementation of the Preconditioned Conjugate Gradients algorithm is given in the next section. 3.3 Preconditioned Conjugate Gradients Recognising the method as an iterative process permits the loss of orthogonality con-ditions (3.83) to be tolerated, but the question of convergence rate is then much more important. The error bound (3.85) suggests an unfavourable result for poorly conditioned systems. The notion of preconditioning the method of Conjugate Gradients is thus well motivated. There are two approaches to the idea : The process can be regarded as • Using Conjugate Gradients as an acceleration to another iterative process. • Preconditioning the sytem for Conjugate Gradients iteration. Theoretically the views are equivalent but, as is shown below, while the first gives a clearer understanding of the notion the second is used for practical implementation. §3.3.1 derives the preconditioned Conjugate Gradients method as a special case of a generalized two step iterative scheme based on a splitting. §3.3.2 gives a practical preconditioned algorithm and discusses the issues involved in the choice of the preconditioner. Some common preconditioners are mentioned in §3.3.3. 3.3.1 Generalized Conjugate Gradients A generalized two step iterative scheme based on the splitting of the matrix A = M — TY, where M is symmetric positive definite, can be formed by: Set £_i = Xo = 0 and ro = b Fork = 1,2, . . . or until ||r f c|| < TOL Chapter 3. The Method of Conjugate Gradients 46 Solve Mzk_x = 6 - ( A f - N)xk-i = rk_x Choose parameters cuk and vk-\ and set Various iterative methods can be described in this way, e.g., the Jacobi, Gauss-Seidel, and SOR schemes, [16]. Combining the scheme above into one equation gives Mzk = Mzk_2 - wk{vk_x(M - N)zk_x + M ( z f e _ 2 - zfc_!)) (3.86) If the parameters uk-\ and are computed so that zjMZJ = 0 for i ^ j then 3k < n such that zk = 0, and so xk — x". This description is equivalent to a generalized two step Conjugate Gradients method and will therefore converge in no more than n steps. The required values of the parameters vk_x and uk are: zTk_x{M - N)zk_x vk-\ zl-XMzk_x 1 and wfc = (1 =—— ). Vk-1 Zk^2Mzk-2 Since M is symmetric positive definite, it can be represented as M — CCT for some non-singular matrix C. Choosing a splitting A — M — N is thus equivalent to choosing a scaling matrix C. The system Ax = b then becomes the transformed system Ax=l (3.87) with A = C-'AC-7, x = C~Tx, b = C-H (3.88) (note that A is still symmetric positive definite). The Conjugate Gradients algorithm (3.77) can be applied to this preconditioned system. Chapter 3. The Method of Conjugate Gradients 47 3.3.2 Prac t ica l Implementation of Preconditioned Conjugate Gradients Applying the Conjugate Gradients algorithm to the preconditioned problem (3.87) gives: Set Cx0 = 0 and Q lr0 = b then for k = 1, 2, 3 pk = C~lrk-i + 3kpk-i PZCC-lAC-ipk Xk = Xk-l + OLkPk C~xrk = C - V i - f l i C - 1 ^ - ^ . 
Computing A is not advised as this would increase the expense and introduce unnecessary roundoff error. Generally, even if A has any sparsity structure (e.g. a banded matrix) A will not, and will require increased storage. The major part of the work per iteration is in performing matrix-vector products. It is possible in such cases that any advantage of routines written to perform the matrix-vector products Av efficiently will be redundant and the overall work per iteration for the preconditioned system may increase substan-tially. The additional cost of applying the preconditioner must be sufficiently small so as to not negate the savings produced by the increased convergance rate. A practical scheme is achieved by simplifying each line in the algorithm and setting M = C2. The extra matrix-vector products are reduced by setting z = M~~1r, which then requires the solution of the system Mzk = rk once per iteration. The algorithm below is produced. Precondi t ioned Conjugate Gradients Algor i thm. Set XQ = 0 and r0 = b then for k = 1, 2, 3 , . . . and j|r-jt |j < TOL Solve Mzk = rk Chapter 3. The Method of Conjugate Gradients 48 f3k = (ft = 0) zk-2rk-2 Pk = zk-i + QkPk-i vlAn xk - Xk-i + ahpk rk = rk_i-akApk. By preconditioning in this way it is possible to work with only the original matrix, and the sequence of updates is directly for x (and not x). The algorithm is still applied to a symmetric positive definite system so all the properties of the original scheme hold, but with an improved convergence rate. In the case where M = I (the identity matrix) the above algorithm reduces to that of the original scheme (3.77). The algorithm requires one solution of a linear system plus 5n + 2 opertions and one matrix-vector product (with a maximun of n2 operations) per iteration. Storage is required for the vectors x,r, p, Ap and z (although z can share the storage with Ap). Routines may be written so the it is not necessary to store A or M in full matrix format. It is not at all obvious which choice of the preconditioning matrix C would give the optimum performance. The matrix C is selected so that K(A) < n(A), and any redistribution of the eigenvalues so as to cluster them would also have favourable effects. However, the matrix C is never required in the algorithm as it only appears in the form C2 = M. As explained above, §3.3.1, preconditioning is equivalent to accelerating an iterative method formed by splitting the original coefficient matrix as A = M — N. The matrix M of the splitting and the matrix M that appears in the algorithm are in fact the same matrices. The question of how best to choose C is thus replaced by the problem of choosing M in the splitting A = M — N. In one sense it is possible to view the above algorithm as a direct method (applied Chapter 3. The Method of Conjugate Gradients 49 to a modified system Mz = r) imbedded within an iterative scheme. Different choices for M will alter the computational effort involved for the direct method. There are two opposing considerations governing the choice of the preconditioner M. • One philosophy is that M should be as close to the identity matrix as possible, ensuring the solution of Mz = r requires minimal extra work. • Alternatively, since the rate of convergence is related to the condition number of M~XA (from the underlying iterative scheme), the ideal choice for M would be A itself. 
This choice would reduce the Conjugate Gradient algorithm to that of the method chosen to solve Mz = r, since at the first step r = b so the solution x" would be achieved immediately (but at the cost of solving the full system Az = 6). Thus M should instead be selected to be as close, in some sense, to A as possible. From the splitting A = M - N, A = I + C~lNC-\ (3.89) so that the number of distinct eigenvalues of A is equal to the number for N. For the number of iterations to be as few as possible N should be like the null matrix and thus M should reflect A. Obviously the optimum choice involves some compromise. A short discussion detailing some of the common choices appears in the following section. 3 .3 .3 Common preconditioners A good preconditioning matrix M is one that achieves a favourable compromise between the reduction in number of iterations needed and the increase in the cost per iteration. A successful choice is somewhat problem dependent since the method of Conjugate Gradi-ents is able to make effective use of symmetries and other structural properties inherent e Chapter 3. The Method of Conjugate Gradients 50 in the system. Many preconditioners have been proposed (see for example [25],[11], [12], [32],[44]). A comparison of the various choices is beyond the scope of this thesis, but for completeness some are discussed below. Since Preconditioned Conjugate Gradients has its basis in a splitting A = M — N this can provide some guidance on devising rapidly converging schemes. Any of the classical methods can be used. • Jacobi method Diagonal preconditioning is the simplest iterative method. M is taken as the di-agonal of A so the solution of Mz = r at each iteration is just a simple division process. It requires no extra storage and an increase of n operations per iteration. This type of preconditioner is equivalent to accelerating the Jacobi iterative scheme by the method of Conjugate Gradients. • SSOR iteration The method of SSOR can be accelerated by Conjugate Gradients in a similar man-ner. With A - D + L + U the preconditioner becomes M = (D +u>L)D~1(D +uL). This needs no extra storage since the solution Mz = r is performed directly using the required elements of A. • The method of I C C G Decompositions of A may not retain the sparsity structure in the factors and so there may be a radical increase in storage requirement. It is preferable to keep any banded structures that A may possess. The Incomplete Cholesky factorization was proposed as a preconditioner by Meijerink and Van der Vorst [25]. The strategy is to perform the Cholesky decompostion of A but to retain as zeros preassigned elements of L. The problem of fill-in is thus confined to some known area (e.g., Chapter 3. The Method of Conjugate Gradients 51 within the bandwidth of A). The decomposition is then a splitting A = LLT — R so that the preconditioner is the product of simple triangular matrices M = LLT. Various forms of the incomplete factorization arise from different choices of the elements required to be zero. The preconditioning matrix can also be based on the polynomial expansion of the inverse of A. This technique is particularly applicable to parallel computers see [32] and [44]. In this preconditioned version Conjugate Gradients is one of the most practical it-erative solvers. The method is also of historical interest. 
In this preconditioned version Conjugate Gradients is one of the most practical iterative solvers. The method is also of historical interest. Although direct and iterative methods generally form two distinct classes of solution schemes, the method of Conjugate Gradients falls into both and emphasizes the link between the two. It was conceived as a direct scheme but was soon rejected. Its properties, once re-evaluated, proved its worth as an iterative approach. The preconditioned version in one sense combines direct and iterative schemes in one algorithm.

3.4 Conjugate Gradients for Linear Least Squares

The derivation of the algorithm, and the discussion of its properties, is valid only for a symmetric positive definite coefficient matrix. A modification of the original algorithm to deal with the more general case is described here. §3.4.1 presents an algorithm to solve the Linear Least Squares problem while §3.4.2 discusses the preconditioned version.

3.4.1 Linear Least Squares Algorithm

For any matrix A of full column rank the product AᵀA is symmetric positive definite (this is valid since (AᵀAv, v) = (Av, Av) ≥ 0, with equality only when Av = 0, so that (AᵀAv, v) > 0 for v ≠ 0). Hence in the general case where A is not symmetric positive definite (or is overdetermined) the method of Conjugate Gradients can be applied (with AᵀA as the coefficient matrix) to the Normal equations AᵀAx = Aᵀb and be used to solve the Linear Least Squares problem. In the case of the Normal equations the error functional (3.79) is replaced by

    E(x̄ - x_k) = (x̄ - x_k)ᵀ AᵀA (x̄ - x_k)    (3.90)
               = r_kᵀ r_k    (3.91)

so that the algorithm ensures the monotonic reduction in length of the residual rather than the error vector. When working with the Normal equations, formation of the product AᵀA is not advised as it introduces roundoff errors and increases the ill-conditioning of the system, since κ(AᵀA) = κ(A)². The resulting, more efficient algorithm, while algebraically equivalent to applying the procedure directly to the Normal equations (with the coefficient matrix AᵀA and right-hand side Aᵀb), is computationally superior.

Conjugate Gradients Least Squares Algorithm. Set x_0 = 0 and r_0 = b, then for k = 1, 2, 3, ..., until ||r_k|| < TOL:

    β_k = (r_{k-1}ᵀ A Aᵀ r_{k-1}) / (r_{k-2}ᵀ A Aᵀ r_{k-2})    (β_1 = 0)
    p_k = Aᵀ r_{k-1} + β_k p_{k-1}
    α_k = (r_{k-1}ᵀ A Aᵀ r_{k-1}) / (p_kᵀ AᵀA p_k)
    x_k = x_{k-1} + α_k p_k
    r_k = r_{k-1} - α_k A p_k.

One additional matrix-vector product is introduced (so that the solution requires 5n + 2 operations and two matrix-vector products per iteration), but the scheme updates x directly and retains the advantages of the basic algorithm. It is possible for the vector Aᵀr to share storage with Ap, so that space is still required only for the 4 vectors x, r, p, Ap. The use of the method of Conjugate Gradients for finding the solution of minimum norm is studied in [7] and [33].

3.4.2 The Preconditioned Algorithm for Least Squares

The idea of preconditioning can also be applied to the least squares problem.

Preconditioned Conjugate Gradients Least Squares. Set x_0 = 0 and r_0 = b, then for k = 1, 2, 3, ..., until ||r_k|| < TOL:

    Solve M s_{k-1} = Aᵀ r_{k-1}
    β_k = (s_{k-1}ᵀ A Aᵀ s_{k-1}) / (s_{k-2}ᵀ A Aᵀ s_{k-2})    (β_1 = 0)
    p_k = s_{k-1} + β_k p_{k-1}
    Solve M q_k = p_k
    α_k = (s_{k-1}ᵀ A Aᵀ s_{k-1}) / (q_kᵀ AᵀA q_k)
    x_k = x_{k-1} + α_k M⁻¹ p_k
    r_k = r_{k-1} - α_k A q_k.

Notice the algorithm requires the solution of two linear systems per iteration. For this algorithm the question of balancing the increased cost against the reduction in the number of iterations, so as to produce a decrease in the overall cost, is particularly critical. Comparisons of some preconditioners for the least squares problem are given in [33] and [7].
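As an illustration of the least squares algorithm of §3.4.1, the sketch below works only with products by A and Aᵀ, so that AᵀA is never formed. It is a hypothetical implementation rather than code from the thesis; the function name cgls and the random test problem are assumptions made for the example.

    import numpy as np

    def cgls(A, b, tol=1e-10, max_iter=None):
        # Conjugate Gradients on the normal equations A^T A x = A^T b,
        # using only products with A and A^T so that A^T A is never formed.
        m, n = A.shape
        if max_iter is None:
            max_iter = n
        x = np.zeros(n)
        r = b.copy()                       # residual r_k = b - A x_k
        s = A.T @ r                        # s_k = A^T r_k
        p = s.copy()
        ss = s @ s
        for _ in range(max_iter):
            Ap = A @ p
            alpha = ss / (Ap @ Ap)         # alpha_k = ||A^T r||^2 / ||A p||^2
            x += alpha * p
            r -= alpha * Ap
            s = A.T @ r
            ss_new = s @ s
            if np.sqrt(ss_new) < tol:
                break
            beta = ss_new / ss             # beta_k = ||A^T r_k||^2 / ||A^T r_{k-1}||^2
            p = s + beta * p
            ss = ss_new
        return x

    # An overdetermined least squares example.
    rng = np.random.default_rng(1)
    A = rng.standard_normal((40, 10))
    b = rng.standard_normal(40)
    x = cgls(A, b)
    print(np.linalg.norm(A.T @ (A @ x - b)))   # normal-equations residual, near zero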
Chapter 4

The Filtering Properties of the Method of Conjugate Gradients

The solution of ill-conditioned systems that are large and sparse by the technique of Tikhonov Regularization of §2.3.1 is prohibitive; direct methods are costly and iterative methods are ineffective due to the small regularization parameter. Iterative methods can be used efficiently by achieving the regularization via an iterative filtering technique, §2.3.2. This chapter examines the use of the method of Conjugate Gradients as the iterative method to be applied. The first section describes the properties of a "favourable" filtering method, extending the ideas from §2.3.2. §4.2 describes the filtering function associated with the method of Conjugate Gradients using the relations from the algorithm (3.77). The similarity of the polynomial filtering function to a step function is investigated in §4.3. The polynomial at each iteration is characterized as the solution to a weighted least squares problem and is shown to approximate the Lagrange interpolant through the singular values. A detailed examination of the filtering effects is given for the polynomial formed at the second iteration; the theory is extended qualitatively for later iterations and numerical evidence is presented, using four integral equations as test cases. In §4.4 the polynomial, at each iteration, is viewed as the expansion of an approximation to the inverse of A; this gives an alternative approach to the explanation of the filtering properties. The practical implementation is discussed in §4.5 and numerical tests show that the method of Conjugate Gradients gives comparable results to the method of truncating the SVD (2.31).

4.1 Iterative Filtering

In Chapter 2 it was shown that the objective when computing the solution to a system of equations that is ill-conditioned is to find a smooth approximation that is close to the exact solution. In many practical situations the system is very large and often sparse. In these cases iterative solution methods are preferred. The approximation may be achieved by the technique of iterative filtering as in §2.3.2. In Chapter 2 some of the common iterative methods that have been used as iterative filters were mentioned.

For an effective filtering method the convergence of the components of the solution corresponding to the "larger" singular values should be rapid, as these form the smooth solution. As the contributions from the "later" oscillatory singular vectors converge the solution becomes unstable. It is valuable for the rate of convergence of these vectors to be slower, so that they can be damped. A favourable iterative method will give a series of estimates that, when expressed in terms of the singular vectors of the system as

    x_k = Σ_{i=1}^{n} F(σ_i; k) (u_iᵀ b / σ_i) v_i,    (4.92)

contain a function F(σ; k) that filters the contribution of the singular vectors by damping those corresponding to the small singular values. Qualitatively, F(σ; k) should have the property that, for the ordered singular values σ_1 ≥ ... ≥ σ_{p(k)} ≥ ... ≥ σ_q ≥ ... ≥ σ_n,

    F(σ_i; k) = 1        for p(k) ≥ i ≥ 1 (i.e., for σ_i "large"),
    F(σ_i; k) = 0        for n ≥ i ≥ q (i.e., for σ_i "small"),
    F(σ_i; k) = s(σ_i)   for q > i > p(k) (i.e., otherwise),

where s(σ) is some non-decreasing function that has values between zero and one, i.e.,

    s'(σ) ≥ 0,    0 ≤ s(σ) ≤ 1.    (4.93)

The function F(σ; k) is known as the filtering function associated with the chosen iterative method.
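The filtering viewpoint of (4.92) is easy to demonstrate numerically: given the singular value decomposition of the coefficient matrix, any choice of filter factors F(σ_i; k) yields a filtered solution by damping the components belonging to the small singular values. The sketch below uses a simple cut-off filter of the step-like form (4.93) purely as an illustration; it is not the Conjugate Gradients filter itself (that polynomial is the subject of the following sections), and the Hilbert test matrix, the noise level and the cut-off indices p and q are assumptions chosen for the example.

    import numpy as np

    def filtered_solution(A, b, filter_factors):
        # Assemble x = sum_i F(sigma_i) * (u_i^T b / sigma_i) * v_i, as in (4.92).
        U, sigma, Vt = np.linalg.svd(A)
        coeffs = filter_factors * (U.T @ b) / sigma
        return Vt.T @ coeffs

    def step_filter(sigma, p, q):
        # Keep the p largest singular values, damp the next ones linearly
        # (the transition region s(sigma)), and suppress the rest entirely.
        f = np.zeros_like(sigma)
        f[:p] = 1.0
        f[p:q] = np.linspace(1.0, 0.0, q - p + 2)[1:-1]
        return f

    # Illustration on the (notoriously ill-conditioned) Hilbert matrix.
    n = 12
    A = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1.0)
    x_exact = np.ones(n)
    b = A @ x_exact + 1e-8 * np.random.default_rng(2).standard_normal(n)

    sigma = np.linalg.svd(A, compute_uv=False)
    x_filt = filtered_solution(A, b, step_filter(sigma, p=4, q=8))
    x_naive = np.linalg.solve(A, b)
    print(np.linalg.norm(x_filt - x_exact), np.linalg.norm(x_naive - x_exact))

Even this crude filter illustrates the point of the section: the unfiltered solve is dominated by amplified noise from the smallest singular values, whereas the filtered expansion remains close to the smooth exact solution.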
As the iteration process proceeds (so the value of k increases) the values of p(k) and q should increase monotonically, so that the components of each of the singular vectors converge in order (i.e., those corresponding to the larger singular values first). The success of a particular iterative method thus depends on finding a set of polynomials F(σ; k) that give good approximations to the Heaviside step function. The filtering properties of the method of Conjugate Gradients are examined in the next section, by considering the polynomials inherent in the iterative process.

4.2 The Filtering Function

In order to examine the feasibility of the method of Conjugate Gradients as an iterative filter it is necessary to express the iterates in terms of the singular vectors as in (4.92). From the algorithm (3.77) the method of Conjugate Gradients gives a series of estimates x_k of the solution of (2.21), formed as

    x_1 = c_1⁽¹⁾ b,
    x_2 = c_2⁽²⁾ Ab + c_1⁽²⁾ b,
    x_3 = c_3⁽³⁾ A²b + c_2⁽³⁾ Ab + c_1⁽³⁾ b,    (4.94)

where the c_i⁽ʲ⁾ are constants. This means that the iterates are expressible in terms of a polynomial as

    x_k = P_k(A) b,    (4.95)

where P_m(A) is a polynomial of degree (m - 1). Each iterate can then be expressed as an expansion in terms of the singular value decomposition of the coefficient matrix. This is

    x_k = Σ_{i=1}^{n} P_k(σ_i) (u_iᵀ b) v_i,

so that, comparing with (4.92), the filtering function associated with the method of Conjugate Gradients is G_k(σ) = σ P_k(σ). By rearranging (4.101) and substituting the result into (4.100), together the relations give the three-term recurrence relation for the polynomial filtering function G_k(σ).

For G_k(x) to act as an effective filter it should approximate the Heaviside step function

    H(x) = 1 for x > 0,    H(x) = 0 for x = 0,

by a polynomial of degree k. This is observed by Vogel [42]. The similarity of the polynomial to the step function is necessary for G_k(x) to be an effective filter; however, it is not sufficient. Progress toward a more precise evaluation of G_k(x) may be achieved by studying a "simpler" polynomial that somehow approximates the polynomial from (4.110). A more quantitative explanation is a non-trivial problem, due to the dependence of the weights on the data, and although theoretically it is possible to find G_k(σ_i) exactly via the minimization (4.110) this is not practical. The following section begins by showing that G_2(x) may be regarded as an approximation to an interpolating polynomial through the first two singular values. With this simplification it is possible to gain insight into the filtering properties.

4.3.2 Approximate: Interpolation

The precise algebraic representation of the first few constants α_1, α_2 and β_2 is fairly easy:

    α_1 = (bᵀb) / (bᵀAb),    (4.112)
    α_2 = [(bᵀb)(bᵀA²b) - (bᵀAb)²] (bᵀAb) / { [(bᵀA³b)(bᵀAb) - (bᵀA²b)²] (bᵀb) },    (4.113)
    β_2 = [(bᵀb)(bᵀA²b) - (bᵀAb)²] / (bᵀAb)².    (4.114)

To simplify the algebra it is assumed that the linear system has a diagonal coefficient matrix. The above can then be simplified using

    bᵀb    = Σ_{i=1}^{n} b_i²       = b,
    bᵀAb   = Σ_{i=1}^{n} σ_i b_i²   = s,
    bᵀA²b  = Σ_{i=1}^{n} σ_i² b_i²  = d,
    bᵀA³b  = Σ_{i=1}^{n} σ_i³ b_i²  = t.    (4.115)

The polynomial given at the second iteration (4.104) can therefore be expressed using the symbols in (4.115) as

    G_2(x) = [(bt - sd)/(st - d²)] x - [(bd - s²)/(st - d²)] x².    (4.116)

(4.116) may be expanded using the terms

    (bt - sd) = (σ_1 + σ_2)(σ_1 - σ_2)² b_1² b_2² + (σ_1 + σ_3)(σ_1 - σ_3)² b_1² b_3² + ...,
    (bd - s²) = (σ_1 - σ_2)² b_1² b_2² + (σ_1 - σ_3)² b_1² b_3² + ...,    (4.117)

the sums continuing over all pairs of indices i < j.
Substituting these into the polynomial (4.116) gives G_2(x) as

    G_2(x) = { [(σ_1 + σ_2)(σ_1 - σ_2)² b_1² b_2² + ...] x - [(σ_1 - σ_2)² b_1² b_2² + ...] x² } / { σ_1σ_2(σ_1 - σ_2)² b_1² b_2² + ... },

where the denominator is the corresponding expansion of (st - d²). This relates to the idea that once a singular vector is included in the estimate it remains included for all following iterations, and that one more is included at each iteration. The above conditions give that (B_k A - I)v_j = 0 for j ≤ k.

Chapter 5

Numerical Application

With the singular value decomposition K_1 = U_1 S_1 V_1ᵀ, property (3) gives

    K_1 ⊗ K_2 = (U_1 ⊗ I)(S_1 ⊗ K_2)(V_1ᵀ ⊗ I).    (5.149)

The problem of finding the solution of minimum norm is therefore reduced, by using property (4), to

    min_{f_1} ||(S_1 ⊗ K_2) f_1 - g_1||_2,

where f_1 = (V_1ᵀ ⊗ I) f and g_1 = (U_1ᵀ ⊗ I) g. From the definition of a Kronecker product the coefficient matrix K̃ = (S_1 ⊗ K_2) is the block diagonal array

    K̃ = diag( σ_{11} K_2, σ_{12} K_2, ..., σ_{1n} K_2 ).

The vectors f_1 and g_1 are just formed as the reduction of the matrices F_1 = V_1ᵀ F and G_1 = U_1ᵀ G (using property (2)). The condition number of the coefficient matrix K̃ formed above is the product of the condition numbers of K_1 and K_2. The matrices K_1 and K_2 are totally positive [23], and thus their Kronecker product is also. The singular vectors v_i therefore have (i - 1) sign changes, and so an increase in the contributions from the singular vectors corresponding to even relatively large singular values may cause instability in the solution. The original (n × n) image has therefore resulted in an (n² × n²) system that is sparse and extremely ill-conditioned.

The numerical example used for the tests is a modification of the Gaussian blurring function given in [14]. The elements of the matrices K_1 and K_2 are given as

    (K_1)_ij = c_1 exp( -γ_i (i - j)² ),    (5.150)
    (K_2)_ij = c_2 exp( -γ_i (i - j)² ),    (5.151)

with the width parameter γ_i defined by (5.152). The constants are set such that c_1 = 1/2 and c_2 = 1/4 (this corresponds to a moderately severe blurring effect, [2, 14]). The three test images are taken as simple 15 × 15 digitized arrays that correspond to the sharp images shown in Fig. 5.6. The digitization is set to have 9 grey levels, so that values below zero are represented as white up to values above 8 which are represented as black, which allows a visual (if crude) representation of the solution to be given.

Figure 5.6: The original sharp test images of the Cross, L and Box. For the Cross and the L each test is a binary image composed of 8's and 0's; the Box has an additional middle grey level of 4. The legend shows the grey scale used for the digitization.

5.2 Discussion of Results

The system is very large and sparse. The use of the truncated form of the singular value decomposition (2.31) on this system would be extremely inefficient as it would require O(n⁶) operations (where n is the number of pixels in one row). With efficient routines for the multiplication of banded matrices the use of the method of Conjugate Gradients has a work requirement of O(n³) operations.

It is not possible to retrieve the exact images in Fig. 5.6 from the degraded images, shown in Fig. 5.7, due to the severely ill-posed quality of the problem. (Comparison of Fig. 5.7 with Fig. 5.6 demonstrates the extent of the retrieval problem.) The requirement for the solution to be smooth, as given by the discrete Picard Condition (2.27), can be interpreted visually. An approximate solution will be satisfactory provided it gives a reasonable distinction between the area containing the object and the background. Gradual fading from white to black over the boundaries and into the centre of the image will be acceptable.
This corresponds to a smoothing of the effect of the defocussing. The numerical representations of the images that form the matrix G are given in Fig. 5.8; they demonstrate the severe degradation that has occurred.

Figure 5.7: The result of defocussing the test images of the Cross, L and Box. The legend shows the grey scale used for the digitization.

Figure 5.8: The numerical representation of the defocussed test images of the Cross, L and Box. These matrices form the right-hand side of the linear system and show the detail that is not available once they are digitized.

In the test examples the exact error is available. The graphical representation of the exact error for each of the tests is shown in Fig. 5.9. The results show that, in each case, there is a marked minimum of this error that corresponds to the best approximation to the exact image. This occurs very quickly; in the region of 4 or 5 iterations for a system of 225 equations! Examination of the singular values (see Fig. 5.10) of the coefficient matrix indicates that there are four high values clustered away from the others. In terms of the discrete Picard Condition (2.27) the quantities (u_iᵀ g_1) must decrease very rapidly in order to be assured of a smooth solution. For most examples, therefore, the best smooth approximation will include contributions from only the first four singular vectors. This explains the rapid "convergence" (to a smooth solution) for the method of Conjugate Gradients. The value of the error seems quite substantial in each case. However, it is acceptable since it accounts for the fading across the boundaries.

Figure 5.9: The measure of the exact error against the iteration number for the test images of the Cross, L and Box.

Figure 5.10: The spectrum of singular values for the coefficient matrix in each of the reconstructions.

The numerical representation of the results of the iteration process is given in Fig. 5.11 for the Cross; this illustrates the actual numerical effects and gives more detail than is available after digitization. The digitized equivalent images are shown in Fig. 5.12 using the 9 grey levels. The images at various iterations are shown for the L in Fig. 5.13 and for the Box in Fig. 5.14. In each case good estimates of the original images are retrieved from very degraded pictures. Even using the chosen, elementary, digitization the results are quite dramatic. The results confirm that the method of Conjugate Gradients will compute a smooth approximation to the exact solution. The work requirements to perform the retrieval are extremely favourable. It is interesting to note that in the case of the Box the iterated estimates retain the symmetry of the problem. The elements α_i and β_i together with the L2-norms of the direction and residual are shown for each case in Tables 5.12, 5.13 and 5.14.
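The favourable O(n³) work requirement quoted above comes from never forming the (n² × n²) coefficient matrix explicitly: every matrix-vector product needed by the Conjugate Gradients iteration can be carried out on the (n × n) image arrays using the Kronecker identity summarized in Appendix A. The sketch below is only an illustration of that point; for simplicity the blur width γ is taken constant, whereas (5.152) lets it vary with position, and the routine names are assumptions.

    import numpy as np

    def gaussian_blur_matrix(n, c, gamma):
        # An (n x n) blurring matrix with entries c * exp(-gamma * (i - j)^2),
        # in the spirit of (5.150)-(5.151); gamma is held constant here.
        idx = np.arange(n)
        return c * np.exp(-gamma * (idx[:, None] - idx[None, :]) ** 2)

    def kron_matvec(K1, K2, f):
        # Product (K1 (x) K2) f without forming the n^2 x n^2 matrix:
        # reshape f to the (n x n) image F and use the identity of Appendix A.
        n = K1.shape[0]
        F = f.reshape(n, n)
        return (K1 @ F @ K2.T).reshape(-1)

    n = 15
    K1 = gaussian_blur_matrix(n, c=0.5, gamma=0.1)
    K2 = gaussian_blur_matrix(n, c=0.25, gamma=0.1)

    f = np.zeros(n * n)
    f[(n // 2) * n + n // 2] = 8.0           # a single bright pixel

    g_fast = kron_matvec(K1, K2, f)          # O(n^3) work
    g_full = np.kron(K1, K2) @ f             # explicit Kronecker product, for checking only
    print(np.allclose(g_fast, g_full))       # True

Building np.kron(K1, K2) is done here only to check the structured product; in the actual iteration only K1, K2 and the (n × n) image arrays would ever be stored.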
As anticipated, the residual is unreliable as an indication of the accuracy of the estimates. The magnitudes of α_i and β_i do, however, demonstrate the rapid increase at the termination point. For the Cross there is an increase in β by 10² from iteration 3 to iteration 4; the best solution is at iteration 3. In the case of the L such an increase does not occur until iteration 7; however, the error does not increase as rapidly as for the Cross. The Box is more troublesome as it is not until iteration number 9 that any substantial increase is demonstrated (10² for α). In each case fluctuation in the magnitude of both α and β does occur after four iterations. The large magnitude of the singular values of the coefficient matrix accounts for the small values of the α's.

Figure 5.11: The exact numerical results from iterations 1, 2, 3 and 4 of the method of Conjugate Gradients for the Cross test image.

Figure 5.12: The visual representation of the results of iterations 1, 2, 3 and 4 for the Cross test image. Comparison with the numerical results in the previous figure shows the digitization effects.

Figure 5.13: The visual representation of the results of iterations 1, 3, 4 and 5 for the L test image. The same grey scale is used for the digitization as before.

Figure 5.14: The visual representation of the results of iterations 2, 3, 4 and 5 for the Box test image. The same grey scale is used for the digitization as before.

Itn   ||p||          α             β             Residual
 1    6.953 × 10⁵    2.210 × 10⁻⁵   0             5.165 × 10³
 2    5.414 × 10⁵    9.745 × 10⁻⁵   4.253 × 10⁻¹   2.576 × 10³
 3    8.500 × 10⁴    5.664 × 10⁻⁵   3.353 × 10⁻²   2.499 × 10³
 4    2.303 × 10⁵    9.353 × 10⁻⁵   2.274         2.186 × 10³
 5    7.209 × 10⁴    5.573 × 10⁻⁴   1.984 × 10⁻¹   1.745 × 10³
 6    8.019 × 10⁴    5.211 × 10⁻⁵   8.527 × 10⁻¹   1.705 × 10³
 7    1.808 × 10⁵    8.095 × 10⁻⁵   2.057         1.570 × 10³
 8    2.844 × 10⁴    3.472 × 10⁻⁴   9.461 × 10⁻²   1.512 × 10³
 9    3.835 × 10⁴    5.439 × 10⁻⁵   1.067         1.502 × 10³
10    9.529 × 10⁴    1.756 × 10⁻⁴   2.304         1.426 × 10³

Table 5.12: The analysis of the parameters, direction norm and residual for the first 10 iterations of Conjugate Gradients for the Cross test example.

Itn   ||p||          α             β             Residual
 1    4.969 × 10⁵    1.828 × 10⁻⁵   0             3.038 × 10³
 2    2.496 × 10⁵    1.190 × 10⁻⁴   2.087 × 10⁻¹   1.760 × 10³
 3    7.280 × 10⁴    3.057 × 10⁻⁵   9.248 × 10⁻²   1.718 × 10³
 4    9.610 × 10⁴    2.382 × 10⁻⁴   9.449 × 10⁻¹   1.371 × 10³
 5    4.718 × 10⁴    5.249 × 10⁻⁴   3.044 × 10⁻¹   1.077 × 10³
 6    6.886 × 10⁴    3.678 × 10⁻⁵   1.184         1.049 × 10³
 7    9.709 × 10⁴    9.987 × 10⁻⁵   1.249         9.475 × 10²
 8    1.665 × 10⁴    4.228 × 10⁻⁴   9.486 × 10⁻²   9.036 × 10²
 9    2.788 × 10⁴    6.556 × 10⁻⁵   1.363         8.941 × 10²
10    8.449 × 10⁴    1.280 × 10⁻⁴   2.867         8.386 × 10²

Table 5.13: The analysis of the parameters, direction norm and residual for the first 10 iterations of Conjugate Gradients for the L test example.

Itn   ||p||          α             β             Residual
 1    1.460 × 10⁵    2.439 × 10⁻⁵   0             1.517 × 10³
 2    1.358 × 10⁵    9.372 × 10⁻⁵   5.560 × 10⁻¹   1.092 × 10³
 3    3.222 × 10⁴    8.436 × 10⁻⁴   7.814 × 10⁻²   6.413 × 10²
 4    3.346 × 10⁴    2.860 × 10⁻⁴   6.841 × 10⁻¹   4.797 × 10²
 5    1.493 × 10⁴    1.987 × 10⁻⁴   2.454 × 10⁻¹   4.463 × 10²
 6    5.567 × 10³    8.169 × 10⁻⁴   1.618 × 10⁻¹   4.227 × 10²
 7    1.249 × 10⁴    7.085 × 10⁻⁴   1.875         3.811 × 10²
 8    3.220 × 10³    2.332 × 10⁻³   1.477 × 10⁻¹   3.592 × 10²
 9    1.022 × 10⁴    1.246 × 10⁻³   2.857         3.228 × 10²
10    2.388 × 10³    3.817 × 10⁻⁵   1.571 × 10⁻¹   3.226 × 10²

Table 5.14: The analysis of the parameters, direction norm and residual for the first 10 iterations of Conjugate Gradients for the Box test example.

5.2.1 Termination

If the iteration process is continued too far the picture becomes unstable. This is the expected feature of solutions to ill-posed problems. The question arises as to when to terminate the process for practical cases where the exact error is not available. An exact answer to the question (as with other solution methods for ill-posed problems) is not possible; however, there are various guidelines that may be considered.

1. Since each iteration is a relatively uncostly process, one option is simply to view each of the images. This is the simplest, possibly least effective, method and involves no mathematics at all. The decision as to when to stop is then at the discretion of the user. It will certainly be only a few iterations, and since it was seen that after reaching the minimum the exact error will increase monotonically, there is not a high risk of excessive work being performed using this method.

2. The magnitude of the constants from the algorithm may be used. In the previous chapter it was suggested that consideration of the values of the parameters α_i and β_i, which are readily available from the algorithm, may give a useful guide to the behaviour of the error. An apparent rapid increase in the size of either the α_i or β_i suggests that a convenient termination point has been reached.

3. The singular value decompositions of the matrices K_1 and K_2 can be obtained in only O(n³) operations. It is possible then to obtain the singular values of the coefficient matrix K̃, and their decay can be used to determine the point at which a truncated SVD would be terminated and thus a convenient number of iterations to be performed by the method of Conjugate Gradients.

Chapter 6

Discussion

This thesis has demonstrated that the method of Conjugate Gradients may be used for the efficient computation of a smooth approximation to the solution of large sparse ill-conditioned linear systems of equations. Clear reasoning has been given for the successful filtering of the oscillatory components at each iteration, and the method of Conjugate Gradients has been shown to give similar results to the method of truncating the SVD with significantly less computational effort. The work forms a basis for a more quantitative study for specific applications. Various points are raised:

• The decision as to when to terminate the iterative process is a complex problem. In cases where a priori knowledge is available regarding the singular value spectrum of the system, a criterion similar to that used for the choice of regularization parameter when truncating the SVD [40] is recommended. In most practical applications all that is available are the parameters α_i and β_i from the algorithm.
It is felt that these may be used in forming a termination criterion, but since they are dependent on the noise in the system any generalization is not possible.

• It has been shown that the polynomial at the kth iteration is the solution to a weighted least squares problem. The weights are specific to the problem, making exact predictions as to the characteristics of the polynomial very difficult. Insight into the behaviour of the polynomial at the kth iteration was made possible by demonstrating that it approximates the interpolating polynomial through zero and the first k singular values with the value one as the ordinate. With this simplification a qualitative discussion of the filtering properties was presented. The similarity of the interpolating polynomial (when considering evaluation only at the singular values) to the step function means that the filtering properties of the method of Conjugate Gradients are favourable for use as an iterative filtering technique. A more precise description of the polynomial is highly complex due to the dependence of the coefficients on the "noise" in the data. Further study of the weighted least squares problem (4.110) would enable a more precise evaluation of the features and may give more guidance on how to set a termination criterion.

• Although preconditioning is seen to improve the performance of the method of Conjugate Gradients (see for example [25]), it is not obvious that this is the case for ill-conditioned systems. In many cases the number of iterations required by the basic algorithm to achieve the best smooth solution is acceptably small, and the additional cost that would be incurred by use of a preconditioner is not justified. Preconditioning has the effect of increasing the rate of convergence to an approximation that has a required accuracy in terms of the residual. In ill-conditioned cases, where the residual is a poor estimate of accuracy, this acceleration may cause the algorithm to continue past the best smooth solution without ever computing it, so that the slower convergence without preconditioning is preferable. Systems for which the noise level is very low are effectively well-conditioned, and preconditioning, using the algorithm detailed in §3.4.2, may be recommended to improve a prohibitively slow iteration process. Further work is recommended to assess the possibility of effective application of the preconditioned Conjugate Gradients method to ill-conditioned linear systems.

Bibliography

[1] R.S. Anderssen et al., The Application and Numerical Solution of Integral Equations, Sijthoff & Noordhoff, (1978).
[2] E.S. Angel and A.K. Jain, "Restoration of images degraded by spatially varying pointspread functions by a conjugate gradient method", Applied Optics 17, (1978).
[3] O. Axelsson, "A generalized SSOR method", BIT 12, (1972), 443-467.
[4] O. Axelsson, "A class of iterative methods for finite element equations", Comp. Meth. in Appl. Mech. & Eng. 9, (1976), 123-137.
[5] C.T.E. Baker, The Numerical Treatment of Integral Equations, Clarendon Press, (1977).
[6] C.T.E. Baker, C. Fox and C. Miller, "Numerical Solution of Fredholm Integral Equations of the First Kind", Comp. J. 7, (1969), 141-148.
[7] A. Bjorck, "Least Squares Methods", Handbook of Numerical Analysis, Ciarlet and Lions, editors, (to appear).
[8] A. Bjorck and L. Elden, "Methods in Numerical Algebra for Ill-Posed Problems", Lith-Mat-R-33-1979, (1979).
[9] Ll.G. Chambers, Integral Equations: A Short Course, International Textbook Company, (1976).
[10] R. Chandra, "Conjugate Gradient Methods for Partial Differential Equations", Ph.D. Thesis, Yale Univ., (1978).
[11] P. Concus, G. Golub and G. Meurant, "Block Preconditioning for the Conjugate Gradient Method", SIAM J. Sci. Stat. Comp. 6, (1985).
[12] P. Concus, G. Golub and D. O'Leary, "A Generalized Conjugate Gradient Method for the Numerical Solution of Elliptic Partial Differential Equations", Sparse Matrix Computations, Bunch and Rose, editors, (1976).
[13] L.M. Delves and J. Walsh, Numerical Solution of Integral Equations, Clarendon Press, Oxford, (1974).
[14] L. Elden and I. Skoglund, "Algorithms for the Regularization of Ill-Conditioned Least Squares Problems with Tensor Product Structure and Application to Space-Variant Image Reconstruction", Lith-Mat-R-1982-48, (1982).
[15] A. George and J.W. Liu, Computer Solution of Large Sparse Positive Definite Systems, Prentice-Hall Series in Computational Mathematics, (1981).
[16] G. Golub and C.F. Van Loan, Matrix Computations, Johns Hopkins University Press, (1985).
[17] J. Graves and P.M. Prenter, "Numerical Iterative Filters Applied to First Kind Fredholm Integral Equations", Numer. Math. 30, (1978), 281-299.
[18] C.W. Groetsch, The Theory of Tikhonov Regularization for Fredholm Equations of the First Kind, Springer-Verlag, Berlin, (1984).
[19] P.C. Hansen, "Computing the Singular Value Expansion", Computing, (to appear).
[20] M.R. Hestenes, "Pseudoinverses and Conjugate Gradients", Comm. ACM 18, (1975), 40-43.
[21] M.R. Hestenes, Conjugate Direction Methods in Optimization, Springer-Verlag, (1980).
[22] M.R. Hestenes and E. Stiefel, "Methods of Conjugate Gradients for Solving Linear Systems", J. Res. Nat. Bur. Standards 49, (1952), 409-436.
[23] P. Lancaster and M. Tismenetsky, The Theory of Matrices with Applications, Computer Science and Applied Mathematics, Academic Press, (1985).
[24] L. Landweber, "An iterative formula for Fredholm integral equations of the first kind", Am. J. Math. 73, (1951), 615-624.
[25] J.A. Meijerink and H.A. van der Vorst, "An Iterative Solution Method for Linear Equation Systems of which the Coefficient Matrix is a Symmetric M-Matrix", Math. Comp. 31, (1977), 148-162.
[26] J.Ll. Morris, Computational Methods in Elementary Numerical Analysis, John Wiley & Sons, (1984).
[27] D. O'Leary and J. Simmons, "A Bidiagonalization-Regularization Procedure for Large Scale Discretizations of Ill-Posed Problems", SIAM J. Sci. Stat. Comp. 2, (1981).
[28] D.L. Phillips, "A technique for the numerical solution of certain integral equations of the first kind", J. ACM 9, (1964), 84-97.
[29] J.K. Reid, "On the Method of Conjugate Gradients for the Solution of Large Sparse Systems of Linear Equations", Large Sparse Sets of Linear Equations, J.K. Reid, editor, Academic Press, New York, (1971), 231-254.
[30] J.K. Reid, "The use of conjugate gradients for systems of linear equations possessing Property A", SIAM J. Numer. Anal. 9, (1972), 325-332.
[31] A. Rosenfeld and A.C. Kak, Digital Picture Processing, Vol. 1, Computer Science and Applied Mathematics, Academic Press, (1982).
[32] Y. Saad, "Practical Use of Polynomial Preconditioners for the Conjugate Gradients Method", SIAM J. Sci. Stat. Comp. 6, (1985).
[33] M.A. Saunders, "Sparse Least Squares by Conjugate Gradients: A Comparison of Preconditioning Methods", Rep. SOL 79-5, Systems Optimization Laboratory, Stanford, CA, (1979), 15-20.
[34] F. Smithies, Integral Equations, Cambridge University Press, (1958).
[35] W. Squire, "Solution of ill-conditioned linear systems arising from Fredholm equations of the first kind by steepest descent and Conjugate Gradients", Int. J. for Num. Meth. in Eng. 10, (1976), 607-617.
[36] O.N. Strand, "Theory and methods related to the singular-function expansion and Landweber's iteration for integral equations of the first kind", SIAM J. Numer. Anal. 11, (1974), 798-825.
[37] A.N. Tikhonov and V.Y. Arsenin, Solutions of Ill-Posed Problems, John Wiley & Sons, (1977).
[38] J.M. Varah, "On the numerical solution of ill-conditioned linear systems with applications to ill-posed problems", SIAM J. Numer. Anal. 10, (1973), 257-267.
[39] J.M. Varah, "A practical examination of some numerical methods for linear discrete ill-posed problems", SIAM Review 21, (1979), 100-111.
[40] J.M. Varah, "Pitfalls in the numerical solution of linear ill-posed problems", SIAM J. Sci. Stat. Comp. 4, (1983), 164-176.
[41] R. Varga, Matrix Iterative Analysis, Prentice-Hall, (1962).
[42] C.R. Vogel, "Solving ill-conditioned linear systems using the Conjugate Gradients Method", preprint, Dept. of Math. Sc., Montana State Univ., (1988).
[43] G. Wahba, "Practical approximate solution to linear equations when the data is noisy", SIAM J. Numer. Anal. 14, (1979), 651-667.
[44] Y.S. Wong and H. Jiang, "Approximate Polynomial Preconditioning Applied to Biharmonic Equations on Vector Supercomputers", NASA Tech. Mem. 100217, ICOMP-87-5, (1987).
[45] D.M. Young, Iterative Solution of Large Linear Systems, Academic Press, (1971).

Appendix A

Appendix

This appendix details the particular results required from the theory of Kronecker products for the application in Chapter 5.

A.1 Kronecker Products

The formulation of the mathematical image problem used in Chapter 5 requires the use of the Kronecker product of matrices. The following results are summarized from [23].

1. The Kronecker (tensor) product of two (n × n) matrices results in an (n² × n²) matrix and is performed as follows:

    A ⊗ B = [ a_11 B   a_12 B   ...   a_1n B
              a_21 B   a_22 B   ...   a_2n B
              ...
              a_n1 B     ...          a_nn B ].

2. The result of multiplying the Kronecker product of two (n × n) matrices by an n²-vector x (the row-stacked form of the matrix X, cf. item 6) is the (n × n) matrix given by

    (A ⊗ B)x = AXB.    (A.153)

3. The Kronecker product of two matrix products is equivalent to the matrix product of the two Kronecker products performed in order, i.e.,

    (AB) ⊗ (CD) = (A ⊗ C)(B ⊗ D).    (A.154)

4. The transpose of a Kronecker product is equivalent to the Kronecker product of the transposes of both matrices, i.e.,

    (A ⊗ B)ᵀ = Aᵀ ⊗ Bᵀ.    (A.155)

5. If the singular values of the matrix A are ρ_s for s = 1, 2, ..., n and the singular values of the matrix B are λ_r for r = 1, 2, ..., n, then the singular values of the matrix that is the Kronecker product of A and B, (A ⊗ B), are the n² values λ_r ρ_s.

6. The expansion of an (n × n) matrix F into an n²-vector f is achieved by stacking the rows of the matrix end on end:

    f = ( f_1ᵀ, f_2ᵀ, ..., f_nᵀ )ᵀ,

where f_iᵀ denotes the ith row of F.
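For readers who wish to check these identities numerically, the short sketch below (not part of the original thesis) verifies properties (2) and (5) with NumPy. Note that with NumPy's row-major reshape the product in property (2) reads A X Bᵀ; this coincides with AXB whenever B is symmetric, as it is for the blurring matrices used in Chapter 5.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 6
    A = rng.standard_normal((n, n))
    B = rng.standard_normal((n, n))
    X = rng.standard_normal((n, n))
    K = np.kron(A, B)

    # Property (2): with row-stacking (NumPy's row-major reshape),
    # (A (x) B) vec(X) = vec(A X B^T).
    print(np.allclose(K @ X.reshape(-1), (A @ X @ B.T).reshape(-1)))   # True

    # Property (5): the singular values of A (x) B are the n^2 pairwise
    # products of the singular values of A and of B.
    sv_kron = np.sort(np.linalg.svd(K, compute_uv=False))
    sv_pair = np.sort(np.outer(np.linalg.svd(A, compute_uv=False),
                               np.linalg.svd(B, compute_uv=False)).ravel())
    print(np.allclose(sv_kron, sv_pair))                               # True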