A Levenberg-Marquardt Method for Large-Scale Bound-Constrained Nonlinear Least-Squares

by Shidong Shan
B.Sc. (Hon.), Acadia University, 2006

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science in The Faculty of Graduate Studies (Computer Science)

The University of British Columbia (Vancouver)
July 2008
© Shidong Shan 2008

Abstract

The well-known Levenberg-Marquardt method is used extensively for solving nonlinear least-squares problems. We describe an extension of the Levenberg-Marquardt method to problems with bound constraints on the variables. Each iteration of our algorithm approximately solves a linear least-squares problem subject to the original bound constraints. Our approach is especially suited to large-scale problems whose functions are expensive to compute; only matrix-vector products with the Jacobian are required. We present the results of numerical experiments that illustrate the effectiveness of the approach. Moreover, we describe its application to a practical curve-fitting problem in fluorescence optical imaging.
Contents

Abstract
Contents
List of Tables
List of Figures
Acknowledgements

1 Introduction
  1.1 Overview
  1.2 Definitions and notation

2 Background on Optimization
  2.1 Unconstrained optimization
    2.1.1 Line-search methods
    2.1.2 Trust-region methods
  2.2 Bound-constrained optimization
    2.2.1 Active-set methods
    2.2.2 Gradient projection methods
  2.3 Algorithms for bound-constrained optimization

3 Unconstrained Least Squares
  3.1 Linear least squares
    3.1.1 Normal equations
    3.1.2 Two-norm regularization
  3.2 Nonlinear least squares
    3.2.1 The Gauss-Newton method
    3.2.2 The Levenberg-Marquardt method
    3.2.3 Updating the damping parameter

4 Nonlinear Least Squares with Simple Bounds
  4.1 Motivation
  4.2 Framework
  4.3 Stabilization strategy
  4.4 The subproblem solver
  4.5 The BCNLS algorithm

5 Numerical Experiments
  5.1 Performance profile
  5.2 Termination criteria
  5.3 Benchmark tests with Algorithm 566
  5.4 CUTEr feasibility problems

6 Curve Fitting in Fluorescence Optical Imaging
  6.1 Brief introduction to time domain optical imaging
  6.2 Mathematical models of fluorescence signals
  6.3 Curve fitting
  6.4 Fluorescence lifetime estimation
    6.4.1 Fitting parameters
    6.4.2 Parameter bounds
    6.4.3 Simulated data for lifetime fitting
    6.4.4 Experimental results
  6.5 Fluorophore depth estimation
  6.6 Optical properties estimation
    6.6.1 Simulation for estimating optical properties
    6.6.2 Results for optical properties
  6.7 Discussion

7 Conclusions and Future Work

Bibliography

List of Tables

5.1 Results of fifteen benchmark tests from Algorithm 566
5.2 Summary of the benchmark tests from Algorithm 566
5.3 Summary of CUTEr feasibility tests
6.1 Comparison of lifetime fitting with and without DC component
6.2 Comparison of lifetime fitting using BCNLS and ART_LM
6.3 Comparison of depth fitting results for (A) all data points, (B) data points with depth > 1 mm, (C) data points with SNR > 20, (D) data points with SNR > 20 and depth > 1 mm
6.4 Comparison of depth fitting results for BCNLS and ART_LM for data points with SNR > 20 and depth > 1 mm
6.5 μ values used in simulation
6.6 Comparison of results of optical properties with ART_LM and BCNLS

List of Figures

3.1 Geometrical interpretation of least squares
3.2 The image of function q(p)
5.1 Performance profiles on benchmark problems from Algorithm 566: number of function evaluations
5.2 Performance profiles on benchmark problems from Algorithm 566: CPU time
5.3 Performance profile on CUTEr feasibility problems: number of function evaluations
5.4 Performance profile on CUTEr feasibility problems: CPU time
6.1 Time domain optical imaging
6.2 Fluorescent decay curve
6.3 Fluorophore identification with lifetime
6.4 An example of fitted TPSF curve
6.5 Lifetime fitting results of simulated data with varying depth, without fitting DC component
6.6 Lifetime fitting results of simulated data with varying depth, with fitting DC component
6.7 Estimated depth versus true depth
6.8 Relative accuracy of calculated depth versus signal SNR

Acknowledgements

I am grateful to my supervisor Dr. Michael Friedlander, who made the subject of Numerical Optimization exciting and beautiful.
With his knowledge, guidance, and assistance, I have managed to accomplish this work. I would like to thank Dr. Ian Mitchell for his careful reading of this thesis and his helpful comments that have improved it.

I am fortunate to have had the financial support for this work from the Natural Sciences and Engineering Research Council of Canada (NSERC) in the form of a postgraduate scholarship. Also, I am thankful for the MITACS Accelerate BC internship opportunity at ART Advanced Research Technologies Inc.

It has been a privilege to work with Guobin Ma, Nicolas Robitaille, and Simon Fortier in the Research and Development group at ART Advanced Research Technologies Inc. Their work and ideas on curve fitting in fluorescence optical imaging have formed a strong foundation for Chapter 6 of this thesis. In addition, I wish to thank the three of them for reading a draft of Chapter 6 and for their valuable comments and suggestions for improving the chapter.

The Scientific Computing Laboratory at the University of British Columbia has been a pleasant place to work. I have enjoyed the support and company of my current and past colleagues: Ewout van den Berg, Elizabeth Cross, Marisol Flores Garrido, Hui Huang, Dan Li, Tracy Lau, and Jelena Sirovljevic. Among them, Ewout van den Berg has always been of great help to me.

I am deeply indebted to my parents and family for their unconditional love and understanding. My brother Hongqing has inspired me and set a great example for me to fulfill my dreams. Thanks are also due to many of my friends for their care and friendship over the years.
Above all, I sincerely thank my best friend and partner Bill Hue, whose love, support, and faithful prayers give me strength and courage to cope in difficult times. Without him, I could not have come this far in my academic endeavor.

Finally, I want to dedicate this thesis to my late father, who suddenly passed away while I was working on this thesis. In his life, my father was always passionate that his children would have every educational opportunity. There is no doubt that my father would be very proud if he could have lived to witness another academic accomplishment of his youngest daughter.

To my father, Kunsong Shan

Chapter 1

Introduction

This thesis proposes an algorithm for nonlinear least-squares problems subject to simple bound constraints on the variables. The problem has the general form

    minimize_x  ½‖r(x)‖₂²   subject to  ℓ ≤ x ≤ u,

where ℓ, u ∈ ℝⁿ are vectors of lower and upper bounds on x, and r is a vector of real-valued nonlinear functions

    r(x) = (r₁(x), …, r_m(x));

we assume throughout that each r_i(x): ℝⁿ → ℝ is a twice continuously differentiable nonlinear function.

Nonlinear least-squares problems arise in many data-fitting applications in science and engineering. For instance, suppose that we model some process by a nonlinear function φ that depends on some parameters x, and that we have actual measurements y_i at times t_i; our aim is to determine the parameters x so that the discrepancy between the predicted and observed measurements is minimized. The discrepancy between the predicted and observed data can be expressed as

    r_i(x) = φ(x, t_i) − y_i.

Then the parameters x can be estimated by solving the nonlinear least-squares problem

    minimize_x  ½‖r(x)‖₂².    (1.1)

Very large nonlinear least-squares problems arise in numerous areas of application, such as medical imaging, tomography, geophysics, and economics.
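As a concrete illustration of this data-fitting setup, the following minimal Python sketch builds the residuals r_i(x) = φ(x, t_i) − y_i and the objective (1.1); the model φ, the time points, and all parameter values are hypothetical choices for illustration, not taken from the thesis.

```python
import numpy as np

# Hypothetical two-parameter model: phi(x, t) = x0 * exp(x1 * t).
def phi(x, t):
    return x[0] * np.exp(x[1] * t)

def residuals(x, t, y):
    # r_i(x) = phi(x, t_i) - y_i
    return phi(x, t) - y

def objective(x, t, y):
    # f(x) = (1/2) ||r(x)||_2^2
    r = residuals(x, t, y)
    return 0.5 * r @ r

t = np.array([0.0, 1.0, 2.0])
y = phi(np.array([2.0, -0.5]), t)   # synthetic "measurements"

# At the true parameters the discrepancy vanishes; elsewhere it is positive.
assert objective(np.array([2.0, -0.5]), t, y) == 0.0
assert objective(np.array([1.0, 0.0]), t, y) > 0.0
```

A least-squares solver then varies x to drive this objective toward zero.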
In many instances, both the number of variables and the number of residuals are large. It is also very common that only the number of residuals is large. Many approaches exist for the solution of linear and nonlinear least-squares problems, especially for the case where the problems are unconstrained. See Björck [3] for a comprehensive discussion of algorithms for least squares, including detailed error analyses.

Because of the high demand in industry, nonlinear least-squares software is fairly prevalent. Major numerical software libraries such as SAS, and programming environments such as Mathematica and Matlab, contain nonlinear least-squares implementations. Other high-quality implementations include, for example, DFNLP [50], MINPACK [38, 41], and NL2SOL [27]; see Moré and Wright [44, Chapter 3]. However, most research has focused on nonlinear least-squares problems without constraints. For problems with constraints, the approaches are fewer (also see Björck [3]). In practice, the variables x often have known lower and upper bounds arising from their physical or natural settings. The aim of this thesis is to develop an efficient solver for large-scale nonlinear least-squares problems with simple bounds on the variables.

1.1 Overview

The organization of the thesis is as follows. In the remainder of this chapter, we introduce some definitions and notation used throughout this thesis. Chapter 2 reviews some theoretical background on unconstrained and bound-constrained optimization. On unconstrained optimization, we summarize basic strategies such as line-search and trust-region methods. For bound-constrained optimization, we review active-set and gradient-projection methods.
We also briefly discuss some existing algorithms for solving bound-constrained optimization problems, including ASTRAL [53], BCLS [18], and L-BFGS-B [57].

Chapter 3 focuses on unconstrained least-squares problems. We start with a brief review of linear least-squares problems and the normal equations. We also review two-norm regularization techniques for solving ill-conditioned linear systems. For nonlinear least-squares problems, we discuss two classical nonlinear least-squares algorithms: the Gauss-Newton method and the Levenberg-Marquardt method. We also explain the self-adaptive updating strategy for the damping parameter in the Levenberg-Marquardt method. These two methods and the updating strategy are closely related to the algorithm proposed in this thesis.

Chapter 4 is the main contribution of this thesis. We explain our proposed algorithm, named BCNLS, for solving bound-constrained nonlinear least-squares problems. We describe our algorithm within the ASTRAL [53] framework. The BCNLS algorithm is based on solving a sequence of linear least-squares subproblems; the bounds in the subproblem are copied verbatim from the original nonlinear problem. The BCLS least-squares solver forms the computational kernel of the approach.

The results of numerical experiments are presented in Chapter 5. In Chapter 6, we illustrate one real data-fitting application arising in fluorescence optical imaging. The BCNLS software package and its documentation are available at http://www.cs.ubc.ca/labs/scl/bcnls/.

1.2 Definitions and notation

In this section, we briefly define the notation used throughout the thesis.
The gradient of the function f(x) is given by the n-vector

    g(x) = ∇f(x) = [ ∂f(x)/∂x₁, ∂f(x)/∂x₂, …, ∂f(x)/∂x_n ]ᵀ,

and the Hessian is given by the n-by-n matrix

    H(x) = ∇²f(x) = [ ∂²f/∂x₁∂x₁   ∂²f/∂x₁∂x₂   …   ∂²f/∂x₁∂x_n ]
                    [ ∂²f/∂x₂∂x₁   ∂²f/∂x₂∂x₂   …   ∂²f/∂x₂∂x_n ]
                    [      ⋮            ⋮        ⋱        ⋮      ]
                    [ ∂²f/∂x_n∂x₁  ∂²f/∂x_n∂x₂  …   ∂²f/∂x_n∂x_n ].

The Jacobian of r(x) is the m-by-n matrix

    J(x) = ∇r(x)ᵀ = [ ∇r₁(x)ᵀ; ∇r₂(x)ᵀ; …; ∇r_m(x)ᵀ ].

For nonlinear least-squares problems, the objective function has the special form

    f(x) = ½ r(x)ᵀ r(x),    (1.2)

and, in this case,

    g(x) = J(x)ᵀ r(x),    (1.3)

    H(x) = J(x)ᵀ J(x) + Σ_{i=1}^{m} r_i(x) ∇²r_i(x).    (1.4)

Most specialized algorithms for nonlinear least-squares problems exploit the special structure of the nonlinear least-squares objective function (1.2). Our proposed algorithm also exploits this structure and makes use of the above structural relations.

We use the following abbreviations throughout:
• x_k: the value of x at iteration k;
• x*: a local minimizer of f;
• ℓ_k, u_k: the lower and upper bounds for the subproblem at iteration k;
• r_k = r(x_k), the value of r at x_k;
• f_k = f(x_k), the value of f at x_k;
• J_k = J(x_k), the Jacobian of r(x) at x_k;
• g_k = g(x_k), the gradient of f(x) at x_k;
• H_k = H(x_k), the Hessian of f(x) at x_k.

Chapter 2

Background on Optimization

This chapter reviews some fundamental techniques for unconstrained and bound-constrained optimization. We begin with some basic strategies for unconstrained optimization problems, then we examine some specialized methods for bound-constrained problems.

2.1 Unconstrained optimization

The general unconstrained optimization problem has the simple mathematical formulation

    minimize_{x ∈ ℝⁿ}  f(x).    (2.1)

Hereafter we assume that f(x): ℝⁿ → ℝ is a smooth real-valued function. By Taylor's theorem, a smooth function can be approximated by a quadratic model in some neighbourhood of each point x ∈ ℝⁿ. The optimality conditions for the unconstrained optimization problem (2.1) are well known.
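The structural relation (1.3) can be checked numerically. The sketch below compares the gradient J(x)ᵀr(x) against a finite-difference approximation of f; the residual function is an illustrative choice, not one from the thesis.

```python
import numpy as np

# Illustrative residual r: R^2 -> R^3 and its Jacobian (row i = grad of r_i).
def r(x):
    return np.array([x[0] - 1.0, x[0] * x[1], np.sin(x[1])])

def J(x):
    return np.array([[1.0, 0.0],
                     [x[1], x[0]],
                     [0.0, np.cos(x[1])]])

def f(x):
    # f(x) = (1/2) r(x)^T r(x), cf. (1.2)
    return 0.5 * r(x) @ r(x)

x = np.array([0.3, -0.7])
g = J(x).T @ r(x)   # gradient via the structural relation (1.3)

# Central finite differences of f agree with g to rounding error.
h = 1e-6
g_fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(2)])
assert np.allclose(g, g_fd, atol=1e-6)
```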
We summarize the conditions below without explanation. For more detailed descriptions, see Nocedal and Wright [47, Chapter 2].

First-Order Necessary Conditions: If x* is a local minimizer and f is continuously differentiable in an open neighbourhood of x*, then g(x*) = 0.

Second-Order Necessary Conditions: If x* is a local minimizer of f and ∇²f exists and is continuous in an open neighbourhood of x*, then g(x*) = 0 and H(x*) is positive semidefinite.

Second-Order Sufficient Conditions: Suppose that ∇²f is continuous in an open neighbourhood of x*, g(x*) = 0, and H(x*) is positive definite. Then x* is a strict local minimizer of f.

In general, algorithms for unconstrained optimization generate a sequence of iterates {x_k}, k = 0, 1, 2, …, using the rule

    x_{k+1} = x_k + d_k,

where d_k is a direction of descent. The two main algorithmic strategies are based on different subproblems for computing the updates d_k.

2.1.1 Line-search methods

Line-search methods obtain a descent direction d_k in two stages. First, d_k is found such that

    g_kᵀ d_k < 0.    (2.2)

That is, we require that f is strictly decreasing along d_k within some neighbourhood of x_k. The most obvious choice for the search direction is d_k = −g_k, which is called the steepest-descent direction. However, any direction d_k that solves the equations B_k d_k = −g_k for some positive definite matrix B_k also satisfies (2.2). Second, the distance to move along d_k is determined by a step length α_k that approximately solves the one-dimensional minimization problem

    minimize_{α > 0}  f(x_k + α d_k).

Popular line-search conditions require that α_k give sufficient decrease in f and ensure that the algorithm makes reasonable progress. For example, the Armijo condition imposes sufficient decrease by requiring that

    f(x_k + α_k d_k) ≤ f(x_k) + σ₁ α_k g_kᵀ d_k,    (2.3)

for some constant σ₁ ∈ (0, 1).
The curvature condition imposes "reasonable progress" by requiring that α_k satisfy

    g(x_k + α_k d_k)ᵀ d_k ≥ σ₂ g_kᵀ d_k,    (2.4)

for σ₂ ∈ (σ₁, 1). The sufficient decrease condition (2.3) and the curvature condition (2.4) are collectively known as the Wolfe conditions. The strong Wolfe conditions strengthen the curvature condition and require that, in addition to (2.3), α_k satisfy

    |g(x_k + α_k d_k)ᵀ d_k| ≤ σ₂ |g_kᵀ d_k|.

2.1.2 Trust-region methods

Trust-region methods were first developed for nonlinear least-squares problems [32, 37, 45]. When f(x) is twice continuously differentiable, Taylor's theorem gives

    f(x + d) = f + gᵀd + ½ dᵀH d + O(‖d‖³).

Thus the quadratic model function m_k used at each iterate x_k is

    m_k(d) = f_k + g_kᵀ d + ½ dᵀ H_k d.    (2.5)

The fundamental idea of trust-region methods is to define a trust-region radius for each subproblem. At each iteration, we seek a solution d_k of the subproblem based on the quadratic model (2.5) subject to some trusted region:

    minimize_d  m_k(d)  subject to  ‖d‖₂ ≤ Δ_k.    (2.6)

One of the key ingredients in a trust-region algorithm is the strategy for choosing the trust-region radius Δ_k at each iteration. Given a step d_k, we define the quantity

    ρ_k = ( f(x_k) − f(x_k + d_k) ) / ( m_k(0) − m_k(d_k) ),    (2.7)

which gives the ratio between the actual reduction and the predicted reduction. The actual reduction represents the actual decrease of the objective function for the trial step d_k. The predicted reduction in the denominator of (2.7) is the reduction predicted by the model function m_k. The choice of Δ_k is at least partially determined by the ratio ρ_k at previous iterations. Note that the predicted reduction should always be nonnegative. If ρ_k is negative, the new objective value f(x_k + d_k) is greater than the current value f_k, so the step must be rejected. If ρ_k is close to 1, there is good agreement between the model and the function over the step, so it is safe to expand the trust region for the next iteration.
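The first stage of a line search is often implemented by simple backtracking until the Armijo condition (2.3) holds. The sketch below is a minimal version; the constant σ₁, the initial step, and the halving factor are illustrative choices.

```python
import numpy as np

def armijo_backtrack(f, g, x, d, sigma1=1e-4, alpha0=1.0, shrink=0.5):
    """Backtrack until f(x + alpha*d) <= f(x) + sigma1*alpha*g(x)^T d, cf. (2.3)."""
    fx = f(x)
    slope = g(x) @ d
    assert slope < 0, "d must be a descent direction, cf. (2.2)"
    alpha = alpha0
    while f(x + alpha * d) > fx + sigma1 * alpha * slope:
        alpha *= shrink
    return alpha

# Usage on a simple quadratic with the steepest-descent direction d = -g:
f = lambda x: 0.5 * x @ x
g = lambda x: x
x = np.array([2.0, -1.0])
alpha = armijo_backtrack(f, g, x, -g(x))
assert f(x + alpha * (-g(x))) < f(x)   # sufficient decrease achieved
```

Practical implementations additionally enforce the curvature condition (2.4) to rule out unnecessarily short steps; backtracking from a unit initial step achieves a similar effect in simple settings.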
If ρ_k is positive but significantly smaller than 1, we do not alter the trust region, but if it is close to zero or negative, we shrink the trust region Δ_k at the next iteration. Given some constants η₁, η₂, γ₁, and γ₂ that satisfy 0 < η₁ < η₂ < 1 and 0 < γ₁ < γ₂ < 1, we can update the trust-region radius by

    Δ_{k+1} ∈ { (Δ_k, ∞)          if ρ_k > η₂,
                [γ₂ Δ_k, Δ_k]      if ρ_k ∈ [η₁, η₂],    (2.8)
                (γ₁ Δ_k, γ₂ Δ_k)   if ρ_k < η₁.

See, e.g., Conn, Gould, and Toint [9, Chapter 7].

Note that (2.6) can be solved equivalently as

    minimize_d  g_kᵀ d + ½ dᵀ (H_k + λ_k I) d,

for some nonnegative λ_k that is larger than the negative of the leftmost eigenvalue of H_k, so that H_k + λ_k I is positive semidefinite. Thus, the solution satisfies

    (H_k + λ_k I) d = −g_k.    (2.9)

The relationship between (2.6) and (2.9) is used in Chapter 4 when we describe the BCNLS algorithm. For a summary of trust-region methods, see Nocedal and Wright [47, Chapter 4].

2.2 Bound-constrained optimization

Bound-constrained optimization problems have the general form

    minimize_x  f(x)  subject to  ℓ ≤ x ≤ u,    (2.10)

where ℓ, u ∈ ℝⁿ are lower and upper bounds on the variables x. The feasible region is often called a "box" due to its rectangular shape. Note that both the lower and upper bounds may be optional, and when some components of x lack an upper or a lower bound, we set the appropriate components of ℓ or u to −∞ or +∞, respectively.

We define the set of binding constraints [20, 53] by

    B(x*) = { i :  x*_i = ℓ_i and g(x*)_i ≥ 0,  or  x*_i = u_i and g(x*)_i ≤ 0,  or  ℓ_i = u_i }.

Furthermore, the strictly binding set is defined as

    B_s(x*) = B(x*) ∩ { i : g(x*)_i ≠ 0 }.

The standard first-order necessary condition for a local minimizer x* requires that

    g(x*)_i = 0  for i ∉ B(x*).    (2.11)

The expression (2.11) shows the importance of the binding constraints for bound-constrained optimization problems.
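The radius-update rule (2.8) picks any value from the indicated interval; the sketch below makes one such concrete choice per branch, with illustrative constants satisfying 0 < η₁ < η₂ < 1 and 0 < γ₁ < γ₂ < 1.

```python
def update_radius(delta, rho, eta1=0.25, eta2=0.75, gamma1=0.25, gamma2=0.5):
    """One concrete instance of the trust-region radius update (2.8)."""
    if rho > eta2:
        # very successful step: pick a value in (delta, inf) -- here, double
        return 2.0 * delta
    elif rho >= eta1:
        # successful step: pick a value in [gamma2*delta, delta] -- here, keep
        return delta
    else:
        # poor or failed step: pick a value in (gamma1*delta, gamma2*delta)
        return 0.5 * (gamma1 + gamma2) * delta

assert update_radius(1.0, 0.9) > 1.0            # expand
assert update_radius(1.0, 0.5) == 1.0           # keep
assert 0.25 < update_radius(1.0, -0.1) < 0.5    # shrink (step also rejected)
```

When ρ_k < 0 the trial point is rejected as well, so the next subproblem (2.6) is solved from the same x_k with the smaller radius.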
The first-order necessary condition is also called the first-order Karush-Kuhn-Tucker (KKT) condition. The second-order sufficient condition for x* to be a local minimizer of the bound-constrained problem (2.10) is that it satisfies the KKT condition and also

    wᵀ H(x*) w > 0,

for all vectors w ≠ 0 such that w_i = 0 for each i ∈ B_s(x*). See [47, Chapter 12] for the detailed theory of constrained optimization.

2.2.1 Active-set methods

Active-set methods for constrained optimization partition the variables into active and inactive sets. Those variables whose indices are in the binding set are also classified as fixed variables, while the variables whose indices are not active are classified as free variables. Given any set of free variables, we can define the reduced gradient and the reduced Hessian matrix, respectively, as the gradient and the Hessian matrix of f(x) with respect to the free variables. In this terminology, the second-order sufficient condition requires that the reduced gradient be zero and that the reduced Hessian matrix be positive definite at x*.

In general, active-set algorithms for the solution of bound-constrained problems use unconstrained minimization techniques to explore the reduced problem defined by a set of free variables. Once this exploration is complete, a new set of free variables is chosen with the aim of driving the reduced gradient to zero.

One drawback of the active-set method is that the working set changes slowly [47]. Usually at each iteration at most one constraint is added to or dropped from the working set. Thus the active-set method may require many iterations to converge on large-scale problems.
For example, if there are k₀ active constraints at the starting point x₀, and k_s constraints are active at the solution, then at least |k_s − k₀| iterations will be required to reach the solution.

2.2.2 Gradient projection methods

A gradient-projection method is a special type of active-set method that allows the active set to change rapidly from iteration to iteration. It is a very efficient method when there is an effective method for the projection onto the feasible set. For example, projection onto bound constraints can be done with linear cost. However, there are also other sets that can be efficiently projected onto; see, e.g., Conn, Gould, and Toint [9, Chapter 12].

Gradient projection methods can be used to minimize both convex and nonconvex functions; the feasible set, however, must be convex. One popular approach to gradient projection methods is to implement each iteration by a two-stage process. First, we compute an active face of the box by searching along the steepest-descent direction −g from the current point x, and we compute a generalized Cauchy point by searching along the piecewise-linear path in the feasible region. The working set is then defined to be the set of bound constraints that are active at the Cauchy point. In the second stage, we compute a search direction using the active face by solving a subproblem in which the active components of x are in the working set. This approach is implemented by BCLS [18], TRON [33], Lancelot [8], L-BFGS-B [57], and some others.

Given an arbitrary point x, the projection of x onto the feasible bounded region is defined as follows. The i-th component of the projection of x is given by

    P(x, ℓ, u)_i = { ℓ_i  if x_i < ℓ_i,
                     x_i  if x_i ∈ [ℓ_i, u_i],
                     u_i  if x_i > u_i.
The projected gradient of the objective function f at a point x onto the feasible region is defined by

    ĝ(x) = P(x − g(x), ℓ, u) − x.    (2.12)

It is well known that x* is a first-order stationary point for (2.10) if and only if ĝ(x*) = 0 [6], i.e.,

    x* = P(x* − g(x*), ℓ, u).

If the second-order sufficient condition is satisfied, then the gradient-projection method is guaranteed to identify the active set at a solution in a finite number of iterations. After it has identified the correct active set, the gradient-projection algorithm reduces to the steepest-descent algorithm on the subspace of free variables. As a result, this method is often used in conjunction with a second-order method in order to achieve a faster rate of convergence.

2.3 Algorithms for bound-constrained optimization

Many algorithms have been proposed to solve (2.10). For example, algorithms have been proposed by Byrd et al. [5], Conn et al. [6, 7, 8], Hager and Zhang [23, 24], Lin and Moré [33], Zhu et al. [57], and others. The main framework of recent approaches follows the two-stage process described in Section 2.2.2. However, the algorithms differ in how they identify the active face and how they compute the search direction at each iteration. For example, the method in [6, 7] uses a two-norm trust-region method, and it computes the search step by applying the truncated conjugate-gradient method to the quadratic approximation of the objective function on the active face subject to the trust-region constraints. A similar approach is also used by Lin and Moré [33] in TRON, but the truncated conjugate-gradient method is applied to the quadratic approximation of the objective function on the subspace parallel to the active face.
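The box projection and the projected gradient (2.12) translate directly into code. In the sketch below, the quadratic objective is an illustrative choice used to check the stationarity characterization x* = P(x* − g(x*), ℓ, u).

```python
import numpy as np

def project(x, lo, hi):
    # P(x, l, u): clip each component into [l_i, u_i]
    return np.minimum(np.maximum(x, lo), hi)

def projected_gradient(x, g, lo, hi):
    # ghat(x) = P(x - g(x), l, u) - x, cf. (2.12)
    return project(x - g(x), lo, hi) - x

# Illustrative problem: f(x) = (1/2)||x - c||^2 over the unit box, with c
# outside the box; the minimizer is the projection of c onto the box.
c = np.array([2.0, -3.0])
g = lambda x: x - c
lo, hi = np.array([0.0, 0.0]), np.array([1.0, 1.0])

x_star = project(c, lo, hi)   # the constrained minimizer
assert np.allclose(projected_gradient(x_star, g, lo, hi), 0.0)   # stationary
assert np.linalg.norm(projected_gradient(np.array([0.5, 0.5]), g, lo, hi)) > 0
```

Note that both components of x* sit on bounds here, so the binding set is {1, 2} even though the unconstrained gradient at x* is nonzero.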
The active-set method proposed by Hager and Zhang [23, 24] uses a backtracking line search to identify the active face and applies an unconstrained method to the objective function on the active face. One well-known quasi-Newton method for solving (2.10) is the L-BFGS-B algorithm [57], which computes a search direction by truncating the unconstrained L-BFGS [34] update relative to the subspace parallel to the active face.

More recently, Xu and Burke [53] have proposed the ASTRAL algorithm for solving large-scale nonlinear bound-constrained optimization problems. ASTRAL is an active-set algorithm that uses both active-set identification techniques and limited-memory BFGS updating for the Hessian approximation. At each iteration, ASTRAL uses a gradient projection step to determine an active face and then forms an ℓ∞ trust-region quadratic programming (QP) subproblem relative to the active face. The trust-region subproblems are solved using primal-dual interior-point techniques. The ASTRAL algorithm has a close relationship to the algorithm proposed in this thesis; thus we describe the ASTRAL algorithm in more detail in Chapter 4.

One particular software package for solving bound-constrained problems that plays an important role in this thesis is BCLS [18]. The BCLS algorithm solves the linear least-squares problem with simple bounds on the variables

    minimize_x  ½‖Ax − b‖₂² + ½δ²‖x‖₂² + cᵀx  subject to  ℓ ≤ x ≤ u,    (2.13)

where A is m × n (and can be any shape), b ∈ ℝᵐ, ℓ, u, c ∈ ℝⁿ, and δ is a damping parameter. In Chapter 4, we show that problem (2.13) has the exact form of the subproblems to be solved in our proposed algorithm (BCNLS) for bound-constrained nonlinear least squares.
In fact, the BCNLS algorithm uses the BCLS package to solve the subproblem at each iteration. We explain the BCLS algorithm and describe how to use it effectively within the BCNLS algorithm in Chapter 4.

Chapter 3

Unconstrained Least Squares

This chapter reviews some theoretical background on unconstrained least squares. We start with a preliminary review of linear least squares in Section 3.1, and we describe two-norm regularization techniques for solving ill-conditioned linear systems. In Section 3.2, we describe two classical methods for unconstrained nonlinear least-squares problems: the Gauss-Newton method and the Levenberg-Marquardt method. We also describe a self-adaptive strategy for updating the damping parameter in the Levenberg-Marquardt method.

3.1 Linear least squares

The classical linear least-squares problem has the form

    minimize_x  ½‖Ax − b‖₂²,    (3.1)

where A is an m × n matrix, and b is an m-vector. This problem belongs to a special class of optimization problems of much interest. Firstly, problem (3.1) is a special case of nonlinear least squares. Secondly, the classical methods for nonlinear least squares, such as the Gauss-Newton method and the Levenberg-Marquardt method in Section 3.2, iteratively solve linear least-squares subproblems.

Geometrically, to solve problem (3.1), we want to find a vector x ∈ ℝⁿ such that the vector Ax ∈ ℝᵐ is the closest point in the range of A to the vector b. This geometrical interpretation is illustrated in Figure 3.1. Problem (3.1) has been studied extensively, and well-known numerical algorithms exist. See Lawson and Hanson [29], Golub and Van Loan [21, Chapter 5], and Trefethen and Bau [52, Lecture 11] for more detail.

3.1.1 Normal equations

For a given x, define the residual by r = b − Ax.
A vector x ∈ ℝⁿ minimizes the norm of the residual if and only if r is orthogonal to the range of A:

    Aᵀr = 0.

Figure 3.1: Geometrical interpretation of least squares

This condition is satisfied if and only if

    AᵀAx = Aᵀb.    (3.2)

The equations in (3.2) are called the normal equations. This system is nonsingular if and only if A has full rank. Consequently, the solution x is unique if and only if A has full rank.

3.1.2 Two-norm regularization

Regularization is a common technique for ill-conditioned linear least-squares problems [17]. The solutions of ill-posed problems may not be unique or may be sensitive to the problem data. The regularized least-squares problem has the quadratic form

    minimize_x  ½‖Ax − b‖₂² + ½δ²‖x‖₂²,    (3.3)

for some δ > 0. Solving (3.3) is equivalent to solving the linear least-squares problem

    minimize_x  ½ ‖ [ A ; δI ] x − [ b ; 0 ] ‖₂²,    (3.4)

where the matrix stacks A on top of δI. Note that the matrix in (3.4) necessarily has full column rank. Historically, the study of problem (3.4) as a function of δ has also been called ridge regression, damped least squares, or Tikhonov regularization [51].

Regularization is a technical device that changes an ill-posed problem into an approximate problem whose solutions are well behaved and preferable. For example, it may be preferable to have approximate solutions with small norms. The intent of using δ is to prevent ‖x‖₂ from getting large when A is ill-conditioned. By computing the singular value decomposition A = UΣVᵀ, we can show that the solution x_δ of (3.3) is

    x_δ = Σ_{i=1}^{n}  ( σ_i (u_iᵀ b) / (σ_i² + δ²) ) v_i,

where u_i and v_i are the i-th columns of the orthogonal matrices U and V, and σ_i is the i-th singular value of A. Note that increasing δ causes the norm ‖x_δ‖ to decrease. Thus we can obtain some acceptable compromise between the size of the solution vector x and the size of the norm of the residual by choosing a proper δ.
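The equivalence of (3.3) and (3.4), and the shrinking effect of δ, can be seen in a few lines. The sketch below solves the stacked full-rank problem (3.4) with an arbitrary ill-conditioned matrix chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
A[:, 4] = A[:, 3] + 1e-8 * rng.standard_normal(20)   # nearly dependent columns
b = rng.standard_normal(20)

def ridge(A, b, delta):
    """Solve (3.3) via the stacked least-squares problem (3.4)."""
    m, n = A.shape
    A_aug = np.vstack([A, delta * np.eye(n)])        # [A; delta*I]
    b_aug = np.concatenate([b, np.zeros(n)])         # [b; 0]
    return np.linalg.lstsq(A_aug, b_aug, rcond=None)[0]

# ||x_delta|| decreases monotonically as the damping parameter grows.
norms = [np.linalg.norm(ridge(A, b, d)) for d in (1e-6, 1e-2, 1.0)]
assert norms[0] >= norms[1] >= norms[2]
```

The stacked form is preferable to forming AᵀA + δ²I explicitly, since it avoids squaring the condition number of A.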
For further discussion about regularization and the choice of the parameter δ, refer to Lawson and Hanson [30, Chapter 25].

Two-norm regularization can also be regarded as a trust-region method. The solution x of (3.3) and (3.4) also solves

    minimize_x  ½‖Ax − b‖₂²  subject to  ‖x‖₂ ≤ Δ,    (3.5)

for some Δ > 0. It can be shown that the solution of (3.4) is a solution of the trust-region problem (3.5) if and only if x is feasible and there is a scalar δ ≥ 0 such that x satisfies (3.4) and

    δ(Δ − ‖x‖₂) = 0.

See Nocedal and Wright [47, Chapter 10] for the proof. Instead of setting the trust region Δ directly, two-norm regularization modifies Δ implicitly by adjusting the damping parameter δ. The connection between the trust-region method and two-norm regularization is revisited in Section 3.2.2.

Regularization techniques play a key role in this thesis. The BCLS algorithm in Section 4.4 solves a bound-constrained two-norm regularized linear least-squares problem. The Levenberg-Marquardt method discussed in Section 3.2.2 solves a regularized linear least-squares subproblem at each iteration. Moreover, the algorithm for nonlinear least-squares problems with simple bounds described in Chapter 4 solves a regularized linear least-squares subproblem with bound constraints at each iteration.

3.2 Nonlinear least squares

With the theoretical background on optimization from Chapter 2 and the methods for linear least-squares problems from Section 3.1, we now describe two classical methods for unconstrained nonlinear least-squares problems (1.1), namely, the Gauss-Newton method and the Levenberg-Marquardt method. For a more detailed description of these methods, refer to [47, Chapter 10].
3.2.1 The Gauss-Newton method

Perhaps one of the most important numerical methods is Newton's method. The Newton search direction is derived from the second-order Taylor-series approximation to f(x + d),

    f(x + d) ≈ f + g^T d + (1/2) d^T H d.    (3.6)

Assume that H is positive definite. By minimizing (3.6) at each step k, the Newton search direction is computed as the solution of the linear system

    H d = −g.

Newton's method requires H to be positive definite in order to guarantee that d is a descent direction. Also, H must be reasonably well-conditioned to ensure that ||d|| is small. When H is positive definite, Newton's method typically has quadratic convergence. However, when H is singular, the Newton direction is not defined. Since Newton's method requires computation of second derivatives, it is used only when it is feasible to compute H.

The Gauss-Newton method differs from Newton's method by using an approximation of the Hessian matrix. For nonlinear least-squares problems, instead of using the Hessian in (1.4), the Gauss-Newton method approximates the Hessian as

    H ≈ J^T J.    (3.7)

By (1.3), we have g = J^T r. Thus at each iteration, the Gauss-Newton method obtains a search direction by solving the equations

    J^T J d = −J^T r.    (3.8)

The Hessian approximation (3.7) of the Gauss-Newton method gives a number of advantages over Newton's method. First, computing J^T J involves only the Jacobian, and it does not require any additional derivative evaluations of the individual residual Hessians ∇²r_i(x), which are needed in the second term of (1.4). Second, the residuals r_i(x) tend to be small in many applications, so the first term J^T J in (1.4) dominates the second term, especially when x_k is close to the solution x*.
In this case, J^T J is a close approximation to the Hessian and the convergence rate of Gauss-Newton is similar to that of Newton's method. Third, J^T J is always at least positive semidefinite. When J has full rank and the gradient g is nonzero, the Gauss-Newton search direction is a descent direction and thus a suitable direction for a line search. Moreover, if r(x*) = 0 and J(x*)^T J(x*) is positive definite, then x* is an isolated local minimum and the method is locally quadratically convergent. The fourth advantage of Gauss-Newton arises because the equations (3.8) are the normal equations for the linear least-squares problem

    minimize_d  (1/2)||Jd + r||_2^2.    (3.9)

In principle, we can find the Gauss-Newton search direction by applying the linear least-squares algorithms of Section 3.1 to the subproblem (3.9).

The subproblem (3.9) also suggests another view of the Gauss-Newton method. If we consider a linear model of the vector function r(x),

    r(x_k + d) ≈ r_k + J_k d,

then the nonlinear least-squares objective can be approximated by

    (1/2)||r(x_k + d)||_2^2 ≈ (1/2)||J_k d + r_k||_2^2.

Thus, we obtain the Gauss-Newton search direction by minimizing the linear least-squares subproblem (3.9).

Implementations of the Gauss-Newton method usually perform a line search in the search direction d, requiring the step length to satisfy conditions such as those discussed in Chapter 2, for example the Armijo and Wolfe conditions described in Section 2.1.1. When the residuals are small and J has full rank, the Gauss-Newton method performs reasonably well. However, when J is rank-deficient or nearly rank-deficient, the Gauss-Newton method can experience numerical difficulties.

3.2.2 The Levenberg-Marquardt method

Levenberg [32] first proposed a damped Gauss-Newton method to avoid the weaknesses of the Gauss-Newton method when J is rank-deficient.
Marquardt [37] extended Levenberg's idea by introducing a strategy for controlling the damping parameter. The Levenberg-Marquardt method is implemented in the software package MINPACK [38, 42].

The Levenberg-Marquardt method modifies the Gauss-Newton search direction by replacing J^T J with J^T J + δ²D, where D is diagonal and δ > 0. Hereafter, we use D = I to simplify our description of the method. At each iteration, the Levenberg-Marquardt search direction d satisfies

    (J^T J + δ²I) d = −J^T r.    (3.10)

The Levenberg-Marquardt method can also be described using the trust-region framework of Chapter 2. Let Δ be the two-norm of the solution of (3.10); then d is also the solution of the problem

    minimize_d  (1/2)||Jd + r||_2^2  subject to  ||d||_2 ≤ Δ.

Hence, the Levenberg-Marquardt method can be regarded as a trust-region method. The main difference between the two methods is that the trust-region method updates the trust-region radius Δ directly, whereas the Levenberg-Marquardt method updates the parameter δ, which in turn modifies the value of Δ implicitly. For more discussion of the differences between the two methods, see [42, 43, 55, 56].

Note that the system (3.10) is the normal equations for the damped linear least-squares problem

    minimize_d  (1/2)|| [J; δI] d + [r; 0] ||_2^2.    (3.11)

The damped least-squares problem (3.11) has the same form as (3.4) in Section 3.1.2. Like the Gauss-Newton method, the Levenberg-Marquardt method solves the nonlinear least-squares problem by solving a sequence of linear least-squares subproblems (3.11). The local convergence of the Levenberg-Marquardt method is also similar to that of the Gauss-Newton method. The key strategic decision of the Levenberg-Marquardt method is how to choose and update the damping parameter δ at each iteration.
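A single Levenberg-Marquardt step (3.10), computed through the damped stacked system (3.11), can be sketched as follows. The exponential-fit residual used here is a hypothetical example, not a problem from the thesis; the sketch also checks that for very small δ the step approaches the Gauss-Newton step (3.8):

```python
import numpy as np

# Sketch of the Levenberg-Marquardt step (3.10)/(3.11) on a tiny model
# problem (the data-fitting residuals are chosen for illustration only).
# The step solves the damped linear least-squares problem
#   minimize 0.5*|| [J; delta*I] d + [r; 0] ||^2,
# and as delta -> 0 it approaches the Gauss-Newton step J^T J d = -J^T r.

def residual(x, t, y):
    # r_i(x) = x0 * exp(x1 * t_i) - y_i
    return x[0] * np.exp(x[1] * t) - y

def jacobian(x, t):
    e = np.exp(x[1] * t)
    return np.column_stack([e, x[0] * t * e])

def lm_step(J, r, delta):
    n = J.shape[1]
    A = np.vstack([J, delta * np.eye(n)])
    rhs = -np.concatenate([r, np.zeros(n)])
    d, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return d

t = np.linspace(0.0, 1.0, 10)
y = 2.0 * np.exp(-1.0 * t)
x = np.array([1.5, -0.5])                   # current iterate

J, r = jacobian(x, t), residual(x, t, y)
d_lm = lm_step(J, r, 1e-8)                  # nearly undamped step
d_gn = np.linalg.solve(J.T @ J, -J.T @ r)   # Gauss-Newton step (3.8)

assert np.allclose(d_lm, d_gn, atol=1e-5)
# A strongly damped step is shorter than the (near) Gauss-Newton step.
assert np.linalg.norm(lm_step(J, r, 1.0)) < np.linalg.norm(d_lm)
```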
3.2.3 Updating the damping parameter

Much research has been done on how to update δ; see, for example, [14, 36, 42, 46]. From equations (3.10), if δ is very small, the Levenberg-Marquardt step is similar to the Gauss-Newton step; if δ is very large, then d ≈ −(1/δ²)g, and the resulting step is very close to the steepest-descent direction. Thus, the choice of δ in the Levenberg-Marquardt method affects both the search direction and the length of the step d. If x is close to the solution, we want the faster convergence of the Gauss-Newton method; if x is far from x*, we prefer the robustness of the steepest-descent method.

As in trust-region methods, the ratio ρ_k (2.7) between the actual reduction of the objective function and the reduction predicted by the quadratic model is used to control and update the damping parameter δ_k in the Levenberg-Marquardt method. In general, if the step d_k is successful and ρ_k is satisfactory, we accept d_k and reduce δ; otherwise, we reject d_k and enlarge δ. Naturally, when ρ_k is close to 1 and the step d_k is very successful, we may reduce δ_k to near 0, so that the next step becomes an approximate Gauss-Newton step. On the other hand, when d_k is large and ρ_k ≤ 0, it is reasonable to enlarge δ_k to move the search direction toward a descent direction. We could adopt the strategy (2.8) for updating the trust-region radius Δ_k and modify it to update the damping parameter in the Levenberg-Marquardt method. For example, one possible approach, described by Fan [15], is

    δ_{k+1} =  δ_k/4,   if ρ_k > η₂,
               δ_k,     if ρ_k ∈ [η₁, η₂],    (3.12)
               4δ_k,    if ρ_k < η₁.

The main drawback of this approach is that it is not sensitive to small changes in the value of ρ_k: there are jumps in δ_{k+1}/δ_k across the thresholds η₁ and η₂ [15].
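The discontinuity of the threshold rule (3.12) is easy to see numerically. In the sketch below the threshold values η₁ = 0.25 and η₂ = 0.75 are illustrative choices, not values taken from the thesis:

```python
# Sketch of the classical threshold update (3.12); the eta1, eta2 values
# are illustrative. The ratio delta_{k+1}/delta_k jumps discontinuously
# as rho_k crosses eta1 or eta2.

def update_factor(rho, eta1=0.25, eta2=0.75):
    # delta_{k+1} = delta_k/4 if rho > eta2; delta_k if eta1 <= rho <= eta2;
    # 4*delta_k if rho < eta1.
    if rho > eta2:
        return 0.25
    if rho >= eta1:
        return 1.0
    return 4.0

# Tiny changes in rho near a threshold cause large jumps in the factor.
assert update_factor(0.2499) == 4.0 and update_factor(0.2501) == 1.0
assert update_factor(0.7499) == 1.0 and update_factor(0.7501) == 0.25
```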
Recently, Yamashita and Fukushima [54] showed that if the Levenberg-Marquardt parameter is chosen as δ_k = ||r_k||, and if the initial point is sufficiently close to the solution x*, then the Levenberg-Marquardt method converges quadratically to x*. Dan et al. [10] propose an inexact Levenberg-Marquardt method for nonlinear least-squares problems with the parameter chosen as

    δ_k = μ||r_k||^ν,

where μ is a positive constant. The rationale for using an exponent ν in adjusting the damping parameter is based on the following observations. When the current iterate is too far from the solution and ||r_k|| is too large, δ_k could become too large; the resulting step d_k could then be too small and prevent the sequence from converging quickly. On the other hand, when the current iterate is sufficiently close to the solution, ||r_k|| may be very small, so δ_k could have no effect. Dan et al. [10] show that the Levenberg-Marquardt method converges to the solution superlinearly when ν ∈ (0, 2), and quadratically when ν = 2.

More recently, Fan and Pan [14] propose adjusting the damping parameter by

    δ_k = α_k ||r_k||^ν    (3.13)

for ν ∈ (0, 2], where α_k is a scaling factor. To avoid the jumps in δ_{k+1}/δ_k across the thresholds η₁ and η₂, and to make more use of the information in the ratio ρ_k, Fan and Pan update the scaling factor α_k at each iteration, where α_{k+1} is a function of ρ_k and α_k. Define the continuous nonnegative function

    q(ρ) = max{1/4, 1 − 2(2ρ − 1)³}.    (3.14)

The graph of the function (3.14) is shown in Figure 3.2.

Figure 3.2: The graph of the function q(ρ)
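The behavior of the scaling function (3.14) can be sketched directly. The α_min value in the last two lines is an illustrative safeguard (the role it plays in Algorithm 4.3 of Chapter 4), not a value prescribed here:

```python
# Sketch of the self-adaptive scaling function (3.14) used in the update
# alpha_{k+1} = alpha_k * q(rho_k) of (3.15).

def q(rho):
    # q(rho) = max{1/4, 1 - 2*(2*rho - 1)^3}; continuous in rho, so the
    # scaling factor changes at a variable rate instead of jumping.
    return max(0.25, 1.0 - 2.0 * (2.0 * rho - 1.0) ** 3)

# A very successful step (rho near 1) shrinks alpha; a poor step grows it.
assert q(1.0) == 0.25          # 1 - 2*(1)^3 = -1, clipped to 1/4
assert q(0.5) == 1.0           # neutral ratio leaves alpha unchanged
assert q(0.0) == 3.0           # failed step enlarges alpha

alpha = 1.0
alpha = max(0.1, alpha * q(0.9))   # alpha_min = 0.1 is illustrative
assert alpha < 1.0
```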
The scaling factor is then updated as

    α_{k+1} = α_k q(ρ_k).    (3.15)

By using the function q(ρ), the factor α_k is updated at a variable rate according to the ratio ρ_k, rather than simply enlarging or reducing δ at some constant rate as in (3.12). Combining (3.13), (3.14), and (3.15) gives an updating strategy referred to as the self-adaptive Levenberg-Marquardt method [14]. This technique is similar to the self-adaptive trust-region algorithm proposed by Hei [26]. Under a local error-bound condition, Fan and Pan [14] show that the self-adaptive Levenberg-Marquardt method converges superlinearly to the solution for ν ∈ (0, 1), and quadratically for ν ∈ [1, 2].

Note that the theoretical convergence proof of the self-adaptive updating strategy is for unconstrained nonlinear least-squares optimization. Nevertheless, in Chapter 4, we show that this updating strategy can be adapted and extended to bound-constrained nonlinear least-squares problems.

Chapter 4

Nonlinear Least Squares With Simple Bounds

This chapter presents the main contribution of this thesis. We describe our proposed algorithm (BCNLS) for solving the nonlinear least-squares problem with simple bounds

    minimize_x  (1/2)||r(x)||_2^2  subject to  ℓ ≤ x ≤ u,    (4.1)

where ℓ, u ∈ R^n are vectors of lower and upper bounds on x, and r is a vector of real-valued nonlinear functions. In Chapter 2, we briefly introduced some algorithms for bound-constrained nonlinear optimization problems. In this chapter, we show how we adopt the general framework of the ASTRAL [53] algorithm into the BCNLS algorithm for solving nonlinear least-squares problems. We also describe how we use the BCLS [18] algorithm to solve a sequence of bound-constrained linear least-squares subproblems.
We begin this chapter with a brief description of the motivation for our approach. We then present the general framework of the proposed algorithm in Section 4.2. The strategies and techniques used for stabilizing and enhancing the performance of the algorithm are described in Section 4.3. Section 4.4 describes the BCLS algorithm for solving the subproblems. Finally, in Section 4.5, we give the detailed implementation steps of the BCNLS algorithm and summarize its basic features.

4.1 Motivation

Many algorithms (for example, ASTRAL and L-BFGS-B) exist for general nonlinear optimization with simple bounds, but such algorithms do not make use of any knowledge of the structure of the objective function. On the other hand, efficient methods (for instance, BCLS) exist for solving linear least-squares problems with simple bounds. Hence, we want to develop an efficient algorithm specially designed for large-scale nonlinear least-squares problems with simple bounds on the variables.

In the design of the BCNLS algorithm, we set the following three goals for our approach. First, we want an algorithm that scales well to large problems. Second, the algorithm should be effective for problems whose functions and gradients are expensive to evaluate; that is, the algorithm must be frugal with evaluations of the function and gradient. Third, we want to ensure that the variables are feasible at every iteration, so that intermediate results can be useful in practical applications.

To achieve these goals, we use the principles of the classical Gauss-Newton and Levenberg-Marquardt methods for unconstrained nonlinear least-squares problems.
Our algorithm (BCNLS) solves the bound-constrained nonlinear least-squares problem by iteratively solving a sequence of bound-constrained linear least-squares subproblems. The subproblems are handled by the software package BCLS. To fully exploit the advantages of BCLS, the subproblems are formed using two-norm regularization instead of a trust-region method. At each iteration, the damping parameter of the subproblem is updated using the self-adaptive updating strategy described in Section 3.2.3.

4.2 Framework

The BCNLS algorithm uses the same quadratic model (3.6) of the objective function and the same Hessian approximation (3.7) as the Gauss-Newton method. The algorithm ensures that the variables x ← x + d are always feasible. That is, we superimpose the simple bound constraints

    ℓ ≤ x_k + d_k ≤ u    (4.2)

on each subproblem at step k. The bounds (4.2) can be written as

    ℓ_k ≤ d_k ≤ u_k,    (4.3)

where ℓ_k = ℓ − x_k and u_k = u − x_k are the updated lower and upper bounds on d at iteration k.

As with the Gauss-Newton method, we take advantage of the special structure (4.1) of the nonlinear least-squares objective. By combining (3.9) and (4.3), we extend the classical unconstrained Gauss-Newton method to the nonlinear least-squares problem with simple bounds. The tentative subproblem has the form

    minimize_{d_k}  (1/2)||J_k d_k + r_k||_2^2  subject to  ℓ_k ≤ d_k ≤ u_k.    (4.4)

However, if the subproblem (4.4) were solved without any modification, the proposed method could experience numerical difficulties when J_k is nearly rank-deficient. The numerical solution of (4.4) depends on the conditioning of J, just as in the Gauss-Newton method.
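The bound shift (4.2) → (4.3) is a one-line computation. A minimal sketch with made-up bounds and iterate, showing that any step respecting the shifted bounds keeps the next iterate feasible:

```python
import numpy as np

# Sketch of the bound shift (4.2) -> (4.3): the subproblem bounds on the
# step d_k are ell_k = ell - x_k and u_k = u - x_k, so x_k + d_k stays
# feasible whenever d_k respects the shifted bounds. Data are illustrative.

ell = np.array([0.0, -1.0, 2.0])
u = np.array([5.0, 1.0, 4.0])
x_k = np.array([1.0, 0.0, 3.0])      # current (feasible) iterate

ell_k = ell - x_k                    # shifted lower bounds on d_k
u_k = u - x_k                        # shifted upper bounds on d_k

d_k = np.clip(np.array([2.0, -3.0, 0.5]), ell_k, u_k)  # a projected step
x_next = x_k + d_k

assert np.all(ell <= x_next) and np.all(x_next <= u)
```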
In Chapter 3, we described how the trust-region method and the Levenberg-Marquardt method can be used to remedy the numerical difficulties of the Gauss-Newton method. It is natural for us to adopt these remedies. Recall from Section 2.3 that trust-region methods have been used in several algorithms [6, 7, 53] for solving nonlinear optimization problems with simple bounds on the variables. For bound-constrained problems, it is more convenient for the trust region to be defined by the || · ||_∞ norm, which gives rise to a box-constrained trust-region subproblem. For this reason, the ℓ_∞ trust region is used in ASTRAL [53]. Our first approach is to mimic ASTRAL's approach by adding an ℓ_∞ trust-region constraint to the subproblem (4.4):

    minimize_{d_k}  (1/2)||J_k d_k + r_k||_2^2  subject to  ℓ_k ≤ d_k ≤ u_k,  ||d||_∞ ≤ Δ_k.    (4.5)

The role of ||d||_∞ ≤ Δ_k is to control the norm of d. We instead use two-norm regularization to achieve the same effect. We can express the subproblem (4.5) in the two-norm regularized form

    minimize_{d_k}  (1/2)||J_k d_k + r_k||_2^2 + (1/2)δ_k^2||d_k||_2^2  subject to  ℓ_k ≤ d_k ≤ u_k,    (4.6)

for some δ_k > 0. This subproblem has the benefit that it can be solved directly by the software package BCLS [18]. One remaining challenge lies in how to adjust δ_k at each iteration. For the algorithm to have a rapid convergence rate, it is important to have an effective strategy to update δ_k at each step. Essentially, the algorithm uses δ_k to control both the search direction and the step size.

Algorithm 4.1 outlines the basic framework of our approach. The main computational component of this algorithm is solving the bound-constrained linear least-squares subproblem (4.6). In this way, the bounds are handled directly by the subproblem solver.
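A subproblem of the form (4.6) can be sketched with an off-the-shelf bound-constrained solver. The thesis uses BCLS; here `scipy.optimize.lsq_linear` serves only as a stand-in, and all data are illustrative:

```python
import numpy as np
from scipy.optimize import lsq_linear

# Sketch (the thesis uses BCLS; lsq_linear is a stand-in subproblem
# solver): the regularized bound-constrained subproblem (4.6) becomes an
# ordinary bound-constrained linear least-squares problem by stacking
# [J_k; delta_k*I] against [-r_k; 0]. Data below are illustrative.

rng = np.random.default_rng(2)
m, n = 12, 4
J_k = rng.standard_normal((m, n))
r_k = rng.standard_normal(m)
delta_k = 0.5
ell_k = -0.2 * np.ones(n)            # shifted bounds ell_k = ell - x_k
u_k = 0.2 * np.ones(n)               # shifted bounds u_k = u - x_k

A = np.vstack([J_k, delta_k * np.eye(n)])
b = -np.concatenate([r_k, np.zeros(n)])

res = lsq_linear(A, b, bounds=(ell_k, u_k))
d_k = res.x

# The computed step honors the shifted bounds.
assert np.all(d_k >= ell_k - 1e-10) and np.all(d_k <= u_k + 1e-10)
```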
Thus, we can apply the self-adaptive damping strategy described in Section 3.2.3, and the local and global convergence properties remain the same as in the unconstrained case.

At this point, it is necessary to show that the proposed framework is a reasonable approach for solving the bound-constrained nonlinear least-squares problem (4.1). The key global and local convergence properties of Algorithm 4.1 can be established by showing that the BCNLS algorithm has a framework very similar to that of the ASTRAL algorithm. The ASTRAL algorithm solves a general bound-constrained nonlinear problem (2.10) by solving a sequence of ℓ_∞ trust-region subproblems. The subproblem has the basic template

    minimize_d  (1/2) d^T B_k d + g^T d  subject to  ℓ_k ≤ d ≤ u_k,  ||d||_∞ ≤ Δ_k,    (4.7)

where the objective function is based on the quadratic model (2.5), and B_k is a symmetric positive definite approximation of the Hessian of the objective function at step k.

Algorithm 4.1: Outline of the BCNLS method
 1 initialization: k ← 0, x_0, δ_0 given.
 2 while not optimal do
 3     Solve the bound-constrained subproblem (4.6) for d_k.
 4     Compute the ratio ρ_k (2.7).
 5     if ρ_k is satisfactory then
 6         x_{k+1} ← x_k + d_k
 7     else
 8         x_{k+1} ← x_k
 9     end
10     Update the damping parameter δ_k.
11     k ← k + 1.
12 end

The ASTRAL algorithm solves the quadratic trust-region subproblem (4.7) using a primal-dual interior-point method. The outline of the ASTRAL algorithm is shown in Algorithm 4.2. Note that the gradient-projection restart procedure (line 6) in Algorithm 4.2 is a strategy to assist in the identification of the active constraints at the current solution. Under a standard nondegeneracy hypothesis, Xu and Burke [53] established that the framework of Algorithm 4.2 converges both globally and locally to a first-order stationary point.
Moreover, the sequence {x_k} generated by the ASTRAL algorithm converges quadratically to x* under standard assumptions. The ASTRAL algorithm and its convergence theory follow the pattern established by Powell [49] and used for similar algorithms in the literature [4, 6, 9, 47]. For a detailed proof of the convergence properties, see [53].

After all, the nonlinear least-squares problem (4.1) is a special case of the nonlinear problem (2.10). The similarity between the frameworks of Algorithm 4.1 and Algorithm 4.2 suggests that the convergence properties of the ASTRAL algorithm apply to the BCNLS algorithm for the bound-constrained nonlinear least-squares problem. The main differences lie in the specific techniques used for solving the quadratic subproblems. In this sense, Algorithm 4.1 specializes the ASTRAL algorithm to a special class of nonlinear least-squares optimization problems with simple bounds on the variables. The new approach takes advantage of the special structure of the objective function by exploiting the relationships between its gradient, its Hessian, and the Jacobian. Our approach eliminates the need to build, compute, and possibly modify the Hessian of the objective. Therefore, the BCNLS algorithm can use function and gradient evaluations more efficiently than similar approaches for general nonlinear optimization problems.

Algorithm 4.2: Outline of the ASTRAL method
 1 initialization: k ← 0, x_0 given, δ_0 > 0, Δ_0 ← min{max_i{u_i − ℓ_i}, δ_0};
 2 initialize a symmetric positive definite B_0.
 3 choose a set of controlling constants.
 4 while not optimal do
 5     if ||d_k||_∞ is not acceptable then
 6         Perform a gradient-projection restart step.
 7     end
 8     Solve the trust-region subproblem (4.7) for d_k.
 9     Compute the ratio ρ_k (2.7).
10     if ρ_k is satisfactory then
11         x_{k+1} ← x_k + d_k
12     else
13         x_{k+1} ← x_k
14     end
15     Update the symmetric positive definite matrix B_k.
16     Update the trust-region radius.
17     k ← k + 1.
18 end

4.3 Stabilization strategy

The ASTRAL algorithm makes use of the ℓ_∞ trust region on the search direction d for the subproblems. The ℓ_∞ trust-region constraint

    ||d_k||_∞ ≤ Δ_k

is equivalent to

    −Δ_k e ≤ d_k ≤ Δ_k e,

where e = (1, 1, ..., 1)^T. By superimposing the ℓ_∞ trust-region constraint onto the feasible bounds, the resulting feasible region continues to have a rectangular box shape, and the subproblems can be solved by applying bound-constrained quadratic programming techniques. At each iteration k, the ASTRAL algorithm updates the trust-region radius Δ_k based on the performance of the previous steps. This strategy prevents the norm of d from becoming large and thus has the effect of stabilizing the algorithm.

The BCNLS algorithm uses a strategy very similar to ASTRAL's. Rather than updating the trust-region radius directly, we use the two-norm regularization technique and formulate the subproblem in the form (4.6). The damping parameter δ_k is then updated using the self-adaptive Levenberg-Marquardt strategy of Section 3.2.3. The theoretical equivalence between the trust-region method and the two-norm regularization technique was established in Sections 3.2.2 and 3.2.3. Compared with the trust-region approach, two-norm regularization has the advantages of reducing the condition number of the matrix in the subproblem and preventing ||d|| from getting large.
Furthermore, the two-norm regularization strategy gives rise to a nicely formatted linear least-squares subproblem with simple bounds on the variables (4.6), which can be solved directly by BCLS.

4.4 The subproblem solver

The kernel of the computational task in the BCNLS algorithm is the solution of the bound-constrained linear least-squares subproblems (4.6). The overall efficiency of the algorithm ultimately depends on the efficient solution of a sequence of such subproblems. For large-scale problems, the Jacobian matrices are potentially expensive to compute, and storing them explicitly would require relatively large storage space. To make the BCNLS algorithm amenable to large-scale nonlinear least-squares problems, we must ensure that the optimization algorithm for the subproblem does not rely on matrix factorizations and does not require the explicit computation of the Jacobian or of Hessian approximations. In the remainder of this section, we show that the software package BCLS has the desirable features that make it a viable choice for solving the subproblem in the BCNLS algorithm.

BCLS is a separate implementation for solving linear least-squares problems with bound constraints. Descriptions of the software package BCLS can be found in [18], [19, Section 5.2], and [31, Section 2.4]. To keep the presentation of the BCNLS algorithm self-contained, we reproduce a brief description of the BCLS algorithm here using the generic form of the problem (2.13). Because the linear term c^T x is not used in the BCNLS algorithm, we set c = 0 and eliminate the linear term. Thus, the simplified version of the linear least-squares problem for the BCLS algorithm has the form

    minimize_x  (1/2)||Ax − b||_2^2 + (1/2)δ^2||x||_2^2  subject to  ℓ ≤ x ≤ u.    (4.8)
The BCLS algorithm is based on a two-metric projection method (see, for example, [2, Chapter 2]). It can also be classified as an active-set method. At every iteration, the variables are partitioned into two sets: the free variables (denoted by x_B) and the fixed variables (denoted by x_N), where the free and fixed variables are defined as in the classical active-set methods of Section 2.2.1. Correspondingly, the columns of the matrix A are partitioned into free (denoted by A_B) and fixed (denoted by A_N) components based on the index sets of the free and fixed variables. At each iteration, the two-metric projection method generates independent descent directions Δx_B and Δx_N for the free and fixed variables. The search directions are generated from an approximate solution of the block-diagonal linear system

    [ A_B^T A_B + δ²I    0 ] [ Δx_B ]   [ A_B^T r − δ²x_B ]
    [ 0                  D ] [ Δx_N ] = [ A_N^T r − δ²x_N ],    (4.9)

where r = b − Ax is the current residual and D is a diagonal matrix with strictly positive entries. The right-hand side of equation (4.9) is the negative gradient of the objective in (4.8). The block-diagonal linear system (4.9) is equivalent to the two separate systems

    (A_B^T A_B + δ²I) Δx_B = A_B^T r − δ²x_B,    (4.10a)
    D Δx_N = A_N^T r − δ²x_N.    (4.10b)

Thus, a Newton step (4.10a) is generated for the free variables x_B, and a scaled steepest-descent step (4.10b) is generated for the fixed variables x_N. The aggregate step (Δx_B, Δx_N) is then projected onto the feasible region, and the first minimizer is computed along the piecewise-linear projected-search direction. See [9] for a detailed description of projected-search methods for bound-constrained problems. The linear system (4.9) is never formed explicitly in the BCLS algorithm.
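The two decoupled systems (4.10a) and (4.10b) can be sketched directly. The data, the free/fixed split, and the diagonal scaling D below are all illustrative; the sketch verifies that both right-hand sides are the corresponding components of the negative gradient of (4.8):

```python
import numpy as np

# Sketch of the two-metric projection step (4.10): a damped Newton step
# for the free variables x_B and a scaled steepest-descent step for the
# fixed variables x_N. The data and the free/fixed split are illustrative.

rng = np.random.default_rng(3)
A = rng.standard_normal((15, 6))
b = rng.standard_normal(15)
x = rng.standard_normal(6)
delta = 0.3
free = np.array([True, True, False, True, False, True])  # hypothetical split

r = b - A @ x
A_B, A_N = A[:, free], A[:, ~free]
x_B, x_N = x[free], x[~free]
D = np.diag(np.full((~free).sum(), 2.0))  # positive diagonal scaling

# (4.10a): damped Newton step for the free variables.
nB = A_B.shape[1]
dx_B = np.linalg.solve(A_B.T @ A_B + delta**2 * np.eye(nB),
                       A_B.T @ r - delta**2 * x_B)
# (4.10b): scaled steepest-descent step for the fixed variables.
dx_N = np.linalg.solve(D, A_N.T @ r - delta**2 * x_N)

# Both right-hand sides are the negative gradient components of (4.8).
g = -(A.T @ r) + delta**2 * x
assert np.allclose((A_B.T @ A_B + delta**2 * np.eye(nB)) @ dx_B, -g[free])
assert np.allclose(D @ dx_N, -g[~free])
```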
Instead, Δx_B is computed equivalently as the solution of the least-squares problem

    minimize_{Δx_B}  (1/2)||A_B Δx_B − r||_2^2 + (1/2)δ^2||x_B + Δx_B||_2^2.    (4.11)

The solution of (4.11) can be found by applying the conjugate-gradient-type solver LSQR [48] to the problem

    minimize_{Δx_B}  (1/2)|| [A_B; βI] Δx_B − [r; −βx_B] ||_2^2,    (4.12)

where β = max{δ, ε} and ε is a small positive constant. If δ < ε, the resulting step is effectively a modified Newton step, which has the advantage of safeguarding against rank-deficiency in the current submatrix A_B. The major computational work in the LSQR algorithm is determined by the total number of matrix-vector products with the matrix and its transpose. In order to control the amount of work carried out by LSQR in solving the problem (4.12), the BCLS algorithm employs an inexact Newton strategy [11] to control the accuracy of the computed subproblem solutions Δx_B.

The following notable features make the BCLS algorithm a natural fit for solving the subproblems (4.6) in the BCNLS algorithm. First, the BCLS algorithm uses the matrix A only as an operator: the user needs only to provide a routine to compute matrix-vector products with A and its transpose. Thus the algorithm is amenable to solving large-scale linear least-squares problems. Second, the BCLS algorithm can take advantage of good starting points. The ability to warm-start a problem using an estimate of the solution is especially useful when a sequence of problems needs to be solved, each a small perturbation of the previous one. This feature makes BCLS work well as a subproblem solver in the BCNLS algorithm.
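The operator-only solution of a stacked system like (4.12) can be sketched with SciPy's LSQR, which stands in here for the LSQR routine used inside BCLS; the stacked matrix is supplied only through matrix-vector products, and all data are illustrative:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, lsqr

# Sketch (SciPy's lsqr stands in for the LSQR used inside BCLS): solve the
# stacked subproblem (4.12) with the matrix [A_B; beta*I] available only
# through matrix-vector products, as in the operator-only setting the
# text describes. Data are illustrative.

rng = np.random.default_rng(4)
m, n = 10, 3
A_B = rng.standard_normal((m, n))
r = rng.standard_normal(m)
x_B = rng.standard_normal(n)
beta = 0.4                          # plays the role of beta = max{delta, eps}

def matvec(v):
    # [A_B; beta*I] v
    return np.concatenate([A_B @ v, beta * v])

def rmatvec(w):
    # [A_B; beta*I]^T w
    return A_B.T @ w[:m] + beta * w[m:]

op = LinearOperator((m + n, n), matvec=matvec, rmatvec=rmatvec)
rhs = np.concatenate([r, -beta * x_B])
dx_B = lsqr(op, rhs, atol=1e-12, btol=1e-12)[0]

# The solution satisfies the damped normal equations
# (A_B^T A_B + beta^2 I) dx_B = A_B^T r - beta^2 x_B, matching (4.11).
lhs = (A_B.T @ A_B + beta**2 * np.eye(n)) @ dx_B
assert np.allclose(lhs, A_B.T @ r - beta**2 * x_B, atol=1e-8)
```

Loosening `atol`/`btol` corresponds to the inexact Newton strategy mentioned above: fewer matrix-vector products in exchange for a less accurate Δx_B.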
In general, the solutions of consecutive subproblems are close to each other, and the solution of the previous iteration can serve as a good starting point for the current iteration, speeding up the solution process. Third, the bound constraints are handled directly by the BCLS package, which leaves the BCNLS framework straightforward to understand and easy to implement. Fourth, the formulation of the two-norm regularized linear least-squares problem (4.8) enables the BCLS algorithm to fully exploit good damping strategies. Therefore, the integration of the BCLS algorithm and the self-adaptive damping strategy contributes to the success of the BCNLS algorithm in this thesis.

4.5 The BCNLS algorithm

The implementation of the BCNLS algorithm requires the user to provide two function routines for a given bound-constrained nonlinear least-squares problem. The residual functions are defined by a user-supplied routine called funObj, which evaluates the vector of residuals r(x) for a given x. The Jacobian of r(x) is defined by another user-written function called Jprod, whose essential function is to compute matrix-vector products with the Jacobian and its transpose; that is, it computes Jx and J^T y for given vectors x and y of lengths compatible with J. The initial point x_0 and the lower and upper bounds are optional. If the initial point x_0 is provided but is not feasible, it is first projected onto the feasible region, and the projected feasible point is used as the initial point. The BCNLS algorithm outputs the solution x, the gradient g, and the projected gradient ḡ of the objective function at x.
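The shape of the two user-supplied routines can be sketched as follows. The routine names funObj and Jprod follow the text, but the exact calling signatures and the exponential residual model are hypothetical illustrations, not the actual BCNLS interface:

```python
import numpy as np

# Sketch of the two user-supplied routines the BCNLS interface requires
# (names follow the text; the signatures and the residual model are
# hypothetical): funObj returns r(x), and Jprod applies J or J^T.

t = np.linspace(0.0, 1.0, 8)
y = np.exp(-2.0 * t)

def funObj(x):
    # Residuals of a hypothetical exponential fit: r_i = x0*exp(x1*t_i) - y_i.
    return x[0] * np.exp(x[1] * t) - y

def Jprod(x, v, mode):
    # mode 1: return J v; mode 2: return J^T v.
    e = np.exp(x[1] * t)
    J = np.column_stack([e, x[0] * t * e])
    return J @ v if mode == 1 else J.T @ v

# An infeasible starting point is first projected onto the feasible box.
ell, u = np.array([0.5, -3.0]), np.array([3.0, 0.0])
x0 = np.clip(np.array([5.0, 1.0]), ell, u)

assert np.all(ell <= x0) and np.all(x0 <= u)
assert funObj(x0).shape == (8,)
assert Jprod(x0, funObj(x0), 2).shape == (2,)  # gradient g = J^T r
```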
In addition, it provides the user with the following information: the objective value at the solution, the total number of function evaluations, the number of iterations, the computing time, and the termination status.

Algorithm 4.3 describes our implementation of the BCNLS algorithm. At the initialization stage, steps 1-5 define the damping updating strategy for the given problem. The positive constant α_min at step 5 is a lower bound on the scaling factor of the damping parameter; it is used for numerical stabilization and better computational results, as it prevents the step from becoming too large when the sequence is near the solution. Step 7 sets the step acceptance criterion. Steps 8-10 give the termination criteria for the algorithm. The algorithm is structured around two blocks. Steps 19-22 are executed only when the previous iteration was successful (i.e., when x_k has an updated value). Otherwise, the algorithm repeatedly solves the subproblem in steps 24-39 with an updated damping parameter δ_k until ρ_k is satisfactory.

The BCNLS algorithm uses only one function evaluation and at most one gradient evaluation per major iteration. In this way, the algorithm ensures that the minimum number of function and gradient evaluations is used in the solution process.
This approach may lead to more calls to BCLS for solving the subproblems, and potentially to more matrix-vector products with the Jacobian and its transpose. However, large-scale problems generally require more computing time to evaluate the function and gradient values, so the trade-off favors the BCNLS algorithm for large-scale problems.

Algorithm 4.3: Bound-Constrained Nonlinear Least Squares (BCNLS)
input: funObj, Jprod, x_0, ℓ, u
output: x, g, ḡ, info
 1 Set damping parameter δ_0 > 0
 2 Set scaling factor α_0 > 0
 3 Set updating constant ν ∈ (0, 2]
 4 Define a nonnegative nonlinear function q(ρ)
 5 Set the minimum scaling factor 0 < α_min < α_0
 6 Set the starting point d_0 for BCLS
 7 Set step acceptance parameter 0 < ρ_0 < 1
 8 Set optimality tolerance ε
 9 Set maximum iterations
10 Set maximum function evaluations
11 k ← 0
12 x_k ← x_0
13 d_k ← d_0
14 Compute the residuals r_k ← funObj(x_k)
15 δ_k ← α_0 ||r_k||^ν
16 Set lStepAccepted ← true
17 while ||ḡ_k|| > ε do
18     if lStepAccepted then
19         Compute the gradient g_k ← Jprod(r_k, 2)
20         Compute the projected gradient ḡ_k ← P(x_k − g_k, ℓ, u) − x_k
21         ℓ_k ← ℓ − x_k
22         u_k ← u − x_k
23     end
24     [d_k] ← BCLS(Jprod, −r_k, ℓ_k, u_k, d_0, 0, δ_k)
25     Compute new residuals r_new ← funObj(x_k + d_k)
26     Compute the ratio ρ_k (2.7)
27     if ρ_k > ρ_0 then
28         x_{k+1} ← x_k + d_k
29         r_{k+1} ← r_new
30         d_0 ← d_k
31         lStepAccepted ← true
32     else
33         x_{k+1} ← x_k
34         r_{k+1} ← r_k
35         lStepAccepted ← false
36     end
37     Set α_{k+1} ← max{α_min, α_k q(ρ_k)}
38     Update δ_{k+1} ← α_{k+1} ||r_k||^ν
39     k ← k + 1
40 end
41 x ← x_k, g ← g_k, ḡ ← ḡ_k

Our implementation of Algorithm 4.3 shows that we have achieved our primary goals for the bound-constrained nonlinear least-squares solver. The BCNLS algorithm has the following basic features.
First, two-norm regularization and damping techniques are used in solving the subproblems. Second, Hessian approximations are never formed, and no second-order derivatives need to be computed. Third, the simple bounds are handled directly by the subproblem solver BCLS, and the sequence {xk} is feasible at all times. Fourth, the BCLS algorithm makes use of the conjugate-gradient type method LSQR. By inheriting the significant features of the BCLS and LSQR software packages, the BCNLS algorithm uses the Jacobian only as an operator: it requires the user to provide only matrix-vector products with the Jacobian and its transpose. Thus the BCNLS package is well suited to large-scale bound-constrained nonlinear least-squares problems.

Chapter 5

Numerical Experiments

We give the results of numerical experiments on two sets of nonlinear least-squares problems. The first set is composed of the fifteen benchmark nonlinear least-squares problems from the Netlib collection Algorithm 566 [1, 39, 40]. The second set of test problems is generated from the CUTEr testing environment [22]. Using these two sets of test problems, we compare the performance of BCNLS, L-BFGS-B, and ASTRAL. The L-BFGS-B Fortran subroutines were downloaded from [5, 57]; the ASTRAL Matlab source code was downloaded from [53]. The comparisons are based on the number of function evaluations and CPU times.

5.1 Performance profile

To illustrate the performance, we use the performance profile technique described in Dolan and Moré [12]. The benchmark results are generated by running the three solvers on a set of problems and recording information of interest, such as the number of function evaluations and CPU time.
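Given such a table of recorded measures, the profile computation itself is short. The following Python sketch (array shapes and names are illustrative) computes, for each solver, the fraction of problems it solves within a factor τ of the best solver, with NaN marking a failed run:

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-More performance profiles.

    T is an (n_problems, n_solvers) array of measures (CPU time or number
    of function evaluations), with np.nan marking a failure.  For each tau,
    returns the fraction of problems whose performance ratio
    r_{p,s} = t_{p,s} / min_s t_{p,s} is at most tau.
    """
    T = np.asarray(T, dtype=float)
    best = np.nanmin(T, axis=1, keepdims=True)        # t_best per problem
    ratios = T / best                                 # r_{p,s}; nan stays nan
    profiles = np.empty((len(taus), T.shape[1]))
    for i, tau in enumerate(taus):
        profiles[i] = np.mean(ratios <= tau, axis=0)  # nan <= tau is False
    return profiles
```

Because a NaN ratio never satisfies `ratios <= tau`, failed runs depress the profile at every τ, which is exactly how the profiles below also display the percentage of solved problems.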
Suppose we are interested in using CPU time as a performance measure. We define tp,s as the CPU time required to solve problem p by solver s, and let tbest be the minimum CPU time used by all solvers for problem p. The performance ratio is given by

    rp,s = tp,s / tbest.

The ideas are the same when comparisons are based on the number of function evaluations. The performance profile ρs(τ) for solver s is the percentage of problems for which the performance ratio rp,s is within a factor τ ∈ ℝ of the best possible ratio. A plot of the performance profile reveals all of the major performance characteristics. In particular, if the set of problems is suitably large, then solvers with large probability ρs(τ) are to be preferred. One important property of performance profiles is that they are insensitive to the results on a small number of problems. Also, they are largely unaffected by small changes in results over many problems [12]. In this chapter, all the performance profile figures are plotted on a log2 scale. The Matlab performance profile script perf.m was downloaded from http://www-unix.mcs.anl.gov/~more/cops [12]. If a solver s fails on a given problem p, its corresponding measures, such as the number of function evaluations or CPU time, are set to NaN. In this way, the performance profile not only captures the performance characteristics, but also shows the percentage of solved problems.

5.2 Termination criteria

In our numerical experiments, each algorithm is started at the same initial point. We use the following criteria to terminate the algorithms:

    |fk − fk−1| / max(1, |fk|, |fk−1|)  <  a small multiple of εmach,
    ||ḡk|| < ε,
    fk < ε,        (*)
    nf < 1000,

where εmach is the machine epsilon, ε is the optimality tolerance, ḡk is the projected gradient (2.12) of the objective function, and nf is the number of function evaluations.
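In code, these tests might be arranged as follows. This is only a sketch: the relative-decrease factor (1e7 times the machine epsilon below) is a hypothetical stand-in for the constant used in our experiments, and the function names are illustrative.

```python
import numpy as np

EPS_MACH = np.finfo(float).eps

def should_stop(f_k, f_prev, pg_norm, n_f, eps=1e-6, max_feval=1000):
    """Illustrative version of the stopping tests of Section 5.2.

    Returns a reason string, or None to continue.  The factor 1e7 on the
    relative-decrease test is an assumed placeholder constant.
    """
    if pg_norm < eps:
        return "optimal"        # projected gradient small
    if f_k < eps:
        return "residual"       # least-squares objective small (*)
    if abs(f_k - f_prev) / max(1.0, abs(f_k), abs(f_prev)) < 1e7 * EPS_MACH:
        return "flat"           # objective no longer decreasing
    if n_f >= max_feval:
        return "max_feval"
    return None
```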
Note that the termination criteria are the same as in the ASTRAL and L-BFGS-B solvers, except that (*) is particular to the BCNLS solver. If an algorithm terminates but the optimality condition ||ḡk|| < ε is not satisfied, we consider the algorithm to have failed on the given problem. However, since the objective function is in least-squares form, if an algorithm terminates with fk < ε and xk is feasible, we consider the algorithm to have succeeded on the given problem, regardless of the value of the projected gradient.

5.3 Benchmark tests with Algorithm 566

The benchmark least-squares problems from Netlib Algorithm 566 are unconstrained least-squares problems. We do not include Problems 1-3 from the original Netlib collection, as these are linear least-squares problems. The remaining fifteen nonlinear least-squares problems are used as the benchmark to test the three bound-constrained algorithms. We arbitrarily choose the bounds on the variables to be 0 ≤ x < ∞. The original Netlib problems were implemented as Fortran subroutines; we recoded the problems in Matlab. Furthermore, we use the standard initial points as defined in Algorithm 566 for all the test problems. The results for the fifteen test problems are given in Table 5.1. To summarize and compare the overall performance of the three solvers based on the results in Table 5.1, we use the following criterion to judge the outcome (success or failure) of a solver on any problem: if both f > 10^-5 and ||ḡ|| > 10^-4, then we consider the solver to have failed on the problem; otherwise it is a success. Based on this criterion, we show the overall performance of the three algorithms in Table 5.2.
Table 5.1: Results of fifteen benchmark tests from Algorithm 566

Problem        Solver    f           ||ḡ||       nf    time(sec)
4              BCNLS     2.47e-15    1.30e-07      5   3.00e-02
               LBFGSB    5.08e-06    3.51e-01     22   1.12e-02
               ASTRAL    7.42e-06    5.36e-02     36   5.20e-01
5              BCNLS     3.62e+02    2.50e+02      1   9.70e-01
               LBFGSB    3.62e+02    2.50e+02     22   1.43e-02
               ASTRAL    5.31e+01    1.29e-03     22   3.59e-01
6              BCNLS     9.98e-30    9.99e-15      7   4.00e-02
               LBFGSB    6.82e-06    1.05e-02     21   1.51e-02
               ASTRAL    9.83e-06    1.27e-03     27   5.79e-01
7              BCNLS     6.40e+01    4.62e-07      4   2.00e-02
               LBFGSB    6.40e+01    2.13e-14      5   7.95e-03
               ASTRAL    6.40e+01    1.74e-06      3   5.53e-02
8              BCNLS     4.11e-03    1.28e-05      6   4.00e-02
               LBFGSB    4.11e-03    3.46e-06     37   2.07e-02
               ASTRAL    4.11e-03    3.52e-06     86   1.16e+00
9              BCNLS     1.54e-04    3.29e-06      9   7.00e-02
               LBFGSB    1.54e-04    1.08e-06     34   2.18e-02
               ASTRAL    1.54e-04    6.90e-06     72   6.33e-01
10             BCNLS     1.21e+04    2.69e+05     21   1.40e-01
               LBFGSB    7.30e+08    4.06e+10      4   8.36e-03
               ASTRAL    5.03e+06    2.86e+07     31   5.55e-01
11             BCNLS     6.88e-03    1.53e-06      6   7.00e-02
               LBFGSB    6.88e-03    6.61e-06     33   2.95e-02
               ASTRAL    6.88e-03    2.72e-05     45   6.36e-01
12             BCNLS     5.65e-08    2.66e-04      5   4.00e-02
               LBFGSB    5.17e-06    8.43e-03     21   1.48e-02
               ASTRAL    9.30e-06    3.08e-03     32   5.54e-01
13             BCNLS     6.22e+01    3.52e-03     18   1.00e-01
               LBFGSB    6.22e+01    1.60e-05     21   1.31e-02
               ASTRAL    6.22e+01    1.14e-02     37   5.21e-01
14             BCNLS     1.06e-05    1.14e+00     15   9.00e-02
               LBFGSB    1.06e-05    1.57e-03     30   2.10e-02
               ASTRAL    1.06e-05    1.66e-03     17   3.61e-01
15 (n=150)     BCNLS     7.39e-06    2.33e-04     86   1.01e+02
               LBFGSB    4.81e-02    1.87e+00    145   2.29e+00
               ASTRAL    4.40e-02    2.29e+00    201   5.42e+00
16 (n=2000)    BCNLS     1.22e-14    2.50e-07      3   5.31e+00
               LBFGSB    1.00e+09    2.00e+06      4   4.83e+02
               ASTRAL    1.43e-22    6.90e-10     14   1.64e+01
17             BCNLS     2.54e-02    2.57e-03     11   1.00e-01
               LBFGSB    2.54e-02    2.18e-06     40   3.07e-01
               ASTRAL    2.54e-02    2.52e-02     76   1.37e+00
18             BCNLS     2.01e-02    8.45e-06     15   2.30e-01
               LBFGSB    2.01e-02    4.90e-06     73   3.36e-01
               ASTRAL    2.01e-02    5.35e-05    357   5.65e+00

Solver    Success    Failure    Success rate(%)
BCNLS     10         5          67
LBFGSB    10         5          67
ASTRAL    9          6          60

Table 5.2: Summary of the benchmark tests from Algorithm 566

Overall, the three algorithms have similar success rates on this set of test problems. For Problem 15 (the Chebyquad function), both L-BFGS-B and ASTRAL seem to have difficulty finding the solution, but BCNLS performed reasonably well on the problem, terminating with f = 7.39 × 10^-6 and ||ḡ|| = 2.33 × 10^-4. Also, all three solvers failed on Problem 5 (the Helical Valley function), Problem 10 (the Meyer function), and Problem 14 (the Brown and Dennis function). For Problem 16 (the Brown almost-linear function), both BCNLS and ASTRAL performed well, but L-BFGS-B did not solve it. On the other hand, for Problem 17 (the Osborne 1 function), L-BFGS-B performed better than ASTRAL and BCNLS. For details of these function definitions, refer to [1, 39, 40]. Because both BCNLS and ASTRAL are written in Matlab, whereas L-BFGS-B is written in Fortran, we compare the number of function evaluations among all three algorithms, but computing time only between BCNLS and ASTRAL. Figures 5.1 and 5.2 illustrate the performance profiles for BCNLS, L-BFGS-B, and ASTRAL on the fifteen benchmark nonlinear least-squares problems. Figure 5.1 compares the number of function evaluations among the three solvers. We observe that the performance profile for BCNLS lies above both L-BFGS-B and ASTRAL for τ < 2.6.
That is, for approximately 55% of the problems, the performance ratio of function evaluations for the BCNLS algorithm is within a factor 2.6 of the best possible ratio. Therefore, the performance profile of the number of function evaluations for BCNLS dominates those of L-BFGS-B and ASTRAL, and we conclude that BCNLS is more efficient in terms of function evaluations. Figure 5.2 compares CPU time between BCNLS and ASTRAL. Clearly, the performance profile of CPU time for BCNLS dominates that of ASTRAL, so BCNLS uses less CPU time than ASTRAL.

Figure 5.1: Performance profiles on benchmark problems from Algorithm 566: number of function evaluations.

5.4 CUTEr feasibility problems

The general constrained CUTEr test problems have the form

    minimize    f(x)
    subject to  ℓc ≤ c(x) ≤ uc,                    (5.1)
                ℓx ≤ x ≤ ux,

where x ∈ ℝⁿ, c(x) is a vector of m nonlinear real-valued functions ℝⁿ → ℝᵐ, ℓc and uc are lower and upper bounds on c(x), and ℓx and ux are lower and upper bounds on x. We use problems from the CUTEr set to generate a set of nonlinear least-squares test problems. These problems are cast as that of finding a feasible point of (5.1). Such problems can arise as subproblems in more general optimization packages, where it is important to have a feasible starting point.
We can rewrite the constrained problem (5.1) as

    minimize    f(x)
    subject to  c(x) − s = 0,                      (5.2)
                ℓc ≤ s ≤ uc,
                ℓx ≤ x ≤ ux.

Because we are only interested in finding a feasible point of the CUTEr problem, we can simply ignore the objective function in (5.2) and deal only with the constraints. The feasibility problem can then be expressed as

    minimize    ½ ||c(x) − s||²
    subject to  ℓc ≤ s ≤ uc,
                ℓx ≤ x ≤ ux.

To fit this problem into the standard bound-constrained nonlinear least-squares framework, we define

    x̄ = (x, s),   ℓ̄ = (ℓx, ℓc),   ū = (ux, uc),   c̄(x̄) = c(x) − s.

The feasibility problem then has the standard form

    minimize    ½ ||c̄(x̄)||²                       (5.3)
    subject to  ℓ̄ ≤ x̄ ≤ ū.

Clearly, a solution of (5.3) satisfies all the constraints of the original CUTEr problem (5.1). We refer to the problems generated from CUTEr in this way as the CUTEr feasibility problems.

Figure 5.2: Performance profiles on benchmark problems from Algorithm 566: CPU time.

Solver    Success    Failure    Success rate(%)
BCNLS     183        14         92.9
LBFGSB    151        46         76.7
ASTRAL    166        31         84.3

Table 5.3: Summary of CUTEr feasibility tests

To select the test problems from the CUTEr environment, we used the CUTEr selection utility [16]. The basic criterion for problem selection is that the CUTEr problem must have at least one nonlinear constraint. We do not impose any additional restrictions on the objective function, the number of variables, bound conditions, and so on. We installed both the CUTEr and SifDec environments (large-scale versions) on a Linux Pentium 4 PC with a 3 GHz processor and 1 GB RAM. SifDec can compile all the selected problems. However, inside the Matlab CUTEr testing environment, the call to csetup would fail for large problems due to an "out of memory" error.
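The slack-variable reformulation (5.2)-(5.3) used to build these feasibility problems can be sketched in code as follows; all names are illustrative, and the construction simply stacks x with the slack vector s and the bounds accordingly.

```python
import numpy as np

def make_feasibility_problem(c, n, lx, ux, lc, uc):
    """Recast 'find x with lc <= c(x) <= uc, lx <= x <= ux' as a
    bound-constrained least-squares problem in xbar = (x, s).

    The residual is cbar(xbar) = c(x) - s, so minimizing
    0.5 * ||cbar(xbar)||^2 subject to the stacked bounds solves the
    feasibility problem.  `c` maps R^n -> R^m.
    """
    def cbar(xbar):
        x, s = xbar[:n], xbar[n:]
        return c(x) - s

    lbar = np.concatenate([lx, lc])   # (lx, lc)
    ubar = np.concatenate([ux, uc])   # (ux, uc)
    return cbar, lbar, ubar
```

At any point where cbar vanishes, x satisfies every constraint of the original problem, since s is itself confined to [lc, uc].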
We thus exclude very large test problems from the selected list. As a result, a total of 197 CUTEr feasibility problems were tested with the three solvers BCNLS, L-BFGS-B, and ASTRAL. The maximum number of variables in the tested problems is 700, and the maximum number of constraints c(x) is 284. To compare the performance of the three solvers, we use the same criteria as in Section 5.3 to judge the success or failure of a solver on any problem. The overall outcomes are given in Table 5.3. For this set of test problems, BCNLS has the highest success rate: it solved more problems than either L-BFGS-B or ASTRAL. The outcomes in both Table 5.2 and Table 5.3 suggest that the overall success rate of BCNLS tends to increase as the size of the test set of nonlinear least-squares problems increases. Figure 5.3 illustrates the performance profiles of the number of function evaluations for BCNLS, L-BFGS-B, and ASTRAL. The curve for BCNLS lies above the curves for both L-BFGS-B and ASTRAL; the profile shows that BCNLS uses fewer function evaluations than both L-BFGS-B and ASTRAL in solving bound-constrained nonlinear least-squares problems. Figure 5.4 shows the performance profile of CPU time for BCNLS and ASTRAL. Because the performance profile of BCNLS dominates that of ASTRAL, Figure 5.4 shows that BCNLS is more efficient than ASTRAL in CPU time. For large-scale problems, the cost of function evaluations usually dominates the computing time. The numerical experiments on the two sets of test problems show that BCNLS outperforms its counterparts in solving bound-constrained nonlinear least-squares problems.
As the size of the test set and the number of variables increase, the benefits of BCNLS become more pronounced. The results are not surprising, because BCNLS takes advantage of the special structure of nonlinear least-squares functions and is thus efficient in its use of function evaluations, whereas L-BFGS-B and ASTRAL do not use any structural information about the functions.

Figure 5.3: Performance profile on CUTEr feasibility problems: number of function evaluations.

Figure 5.4: Performance profile on CUTEr feasibility problems: CPU time.

Chapter 6

Curve Fitting in Fluorescence Optical Imaging

This chapter describes one industrial application of the BCNLS solver. In collaboration with ART Advanced Research Technologies Inc., a company that designs molecular imaging products for the medical and pharmaceutical industries, we have applied BCNLS to solving the nonlinear inverse problems in time-domain (TD) fluorescence optical-imaging processes. It is common practice in research and industry to estimate parameters through a curve fitting procedure. By comparing the measured signal with some model data, one can estimate a combination of parameters that minimizes the difference between measured data and the model signals through general curve fitting techniques, such as least squares. In this chapter we describe three curve fitting applications that use BCNLS: estimation of fluorescence lifetime, fluorophore depth, and optical properties of a fluorophore. Our description of the TD fluorescence problem is based on [13].

6.1 Brief introduction to time domain optical imaging

In small-animal fluorescence imaging, fluorophores or labelled biomolecules are injected into small animals such as mice.
When an external light source illuminates a region of interest, the fluorophore transitions into an excited state and then re-emits a low-energy photon upon relaxation. The photon can then be detected by optical detectors to generate fluorescence images. TD optical imaging uses photon-counting technology: it measures the arrival time of each detected photon. Figure 6.1 illustrates a typical configuration of a TD optical imaging system [28].

Figure 6.1: Time domain optical imaging.

The fluorescent decay curve is fundamental to TD fluorescent optical imaging techniques. Figure 6.2 illustrates a typical fluorescent decay curve, which is also commonly called the temporal point-spread function (TPSF) [28].

Figure 6.2: Fluorescent decay curve.

The TPSF curve contains important information about the fluorophore:

• depth: the location of the fluorophore;
• concentration: the fluorophore concentration;
• fluorescence lifetime: the time for the intensity to decay by a factor of e.

Fluorescence lifetime is also defined as the average time a fluorophore remains in its excited state following excitation. Fluorescent lifetime is an intrinsic property of the fluorophore that depends on the nature of the fluorescent molecule and its environment. Fluorophore identification is therefore possible using fluorescence lifetime. The relationship between the lifetime and the slope of the TPSF is shown in Figure 6.3. In general, the shorter the lifetime of a fluorophore, the steeper the falling slope of the TPSF curve. The time-domain approach provides a unique opportunity to distinguish fluorescent signals with identical spectra based on the differences in their fluorescent lifetimes.
With TD technology, fluorophore depth can be estimated from the temporal characteristics of the TPSF, and fluorophore concentration can be calculated from the intensity and photon depth. In addition, TD technology allows 3D reconstruction of fluorophore distributions.

Figure 6.3: Fluorophore identification with lifetime.

6.2 Mathematical models of fluorescence signals

In fluorescence lifetime imaging, lifetime is measured at each pixel and displayed as contrast. In the time domain, the fluorescence signal F0(t) is a sum of several decay curves over time:

    F0(t) = Σi Ai e^(−t/τi) + DC,                  (6.1)

where t represents time, Ai and τi are the amplitude and lifetime of the ith fluorophore component, and DC represents the offset signal in the data. The amplitude is related to many characteristics of a fluorophore, such as quantum yield, extinction coefficient, concentration, volume, excitation and emission spectra, and so forth. In addition, the combination of the temporal profile of the excitation laser pulse and the system impulse response function (IRF), denoted by S(t), also contributes to the signal. Thus, the measured signal F(t) can be modeled as

    F(t) = F0(t) * S(t),

where * denotes the convolution operator. A convolution is an integral that expresses the amount of overlap of one function g as it is shifted over another function f. Mathematically, the convolution of two functions f and g over a finite range [0, t] is given by

    [f * g](t) = ∫₀ᵗ f(τ) g(t − τ) dτ.
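Numerically, the multi-exponential decay and its convolution with a sampled IRF can be sketched as below; the uniform time grid and the truncation to the original signal length are illustrative simplifications.

```python
import numpy as np

def convolved_decay(t, amplitudes, lifetimes, irf):
    """Approximate F(t) = F0(t) * S(t) on a uniform time grid.

    F0 is the multi-exponential decay sum_i A_i exp(-t / tau_i); the
    finite-range convolution [f*g](t) = int_0^t f(tau) g(t - tau) dtau
    is approximated by a discrete convolution scaled by the bin width.
    """
    dt = t[1] - t[0]
    f0 = sum(A * np.exp(-t / tau) for A, tau in zip(amplitudes, lifetimes))
    return np.convolve(f0, irf)[: len(t)] * dt
```

With a narrow (delta-like) IRF the convolution returns the bare decay, while a broad IRF smears the rising edge of the TPSF, which is why the falling slope rather than the peak carries the lifetime information.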
When the fluorophore is embedded inside bulk tissue or a turbid medium, two more terms contribute to the convolution: the propagation of excitation light from source to fluorophore, denoted by H(rs − rf, t), and the propagation of fluorescent light from fluorophore to detector, denoted by E(rf − rd, t); see [13]. Let r represent a three-dimensional location vector. The locations of the fluorophore, light source, and detector are represented by rf, rs, and rd, respectively. The fluorescence signal F(t) inside tissue has the form

    F(t) = H(rs − rf, t) * F0(t) * E(rf − rd, t) * S(t);

see [35]. Both H(rs − rf, t) and E(rf − rd, t) are very complicated functions of many parameters, which require knowledge of tissue optical properties, such as the absorption coefficient μa and the reduced scattering coefficient μ′s, as well as spatial information about the fluorophore. Such information is usually not available in practical applications. For a detailed description of the equations, refer to [25, 35]. However, for a fluorescence signal from small-volume biological tissues, such as those obtained in mice, the light diffusion due to photon propagation, H(rs − rf, t) and E(rf − rd, t), does not significantly change the shape of the temporal profile of the fluorescence signal; photon propagation mainly affects the amplitude and the peak position of the fluorescence decay curve. Under the condition of a relatively short optical path between the excitation and fluorescent signal, such as a fluorophore inside a mouse, the effect of light diffusion in tissues can be simplified as a time delay Δt. That is,

    F(t) ≈ F0(t) * δ(Δt) * S(t).                   (6.2)

In practice, a measured signal usually contains a DC component. By substituting (6.1) into (6.2), we can express the mathematical model for the fluorescence signal as

    F(t) ≈ DC + δ(Δt) * S(t) * Σi Ai e^(−t/τi)
(6.3)

Hereafter in this chapter, all numerical experiments are based on the mathematical model (6.3) for the curve fitting applications.

6.3 Curve fitting

In general, curve fitting is the process of minimizing the difference between the observed data yd(ti) and the data from a mathematical model ym(ti) over some unknown parameters. Let

    ri = (1/σi) (yd(ti) − ym(ti))

be the residual between the observed data and the predicted model, where σi is a weighting term. The objective used for curve fitting is then

    χ² = ½ ||r(x)||² = ½ Σi [ (1/σi)(yd(ti) − ym(ti)) ]².

In practice, the unknown parameters usually have known lower and upper bounds from their physical settings or based on a priori knowledge of the problem. Thus, the bound-constrained curve fitting problem in TD fluorescence optical imaging has the standard general form (4.1). For all experiments on simulated data, we follow ART's research approach and estimate the fluorescence lifetime, depth, and optical properties separately through curve fitting procedures based on the mathematical model of Section 6.2. It would be interesting to combine all three separate curve fitting problems into one and find an optimized combination of all parameters; the optimization problem would remain the same in theory, with the only difference being an increased number of unknown parameters. In Sections 6.4, 6.5, and 6.6, we describe our experiments applying the BCNLS solver to estimate the fluorescence lifetime, depth, and optical properties in TD technology. The results are presented in comparison with ART's implementation of the Levenberg-Marquardt algorithm (ART_LM). To precisely compare two optimization algorithms, both methods must use identical stopping criteria.
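In code, the weighted residual and objective of Section 6.3 take a simple form; this is only a sketch, with illustrative names, of the quantities a least-squares solver such as BCNLS works with.

```python
import numpy as np

def weighted_residual(y_data, y_model, sigma):
    """r_i = (y_d(t_i) - y_m(t_i)) / sigma_i of Section 6.3."""
    return (np.asarray(y_data) - np.asarray(y_model)) / np.asarray(sigma)

def chi_square(r):
    """chi^2 = 0.5 * ||r||^2, the curve fitting objective."""
    r = np.asarray(r)
    return 0.5 * float(np.dot(r, r))
```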
As a bound-constrained optimization solver, BCNLS uses an optimality condition based on the projected gradient of the objective function. Conversely, ART_LM is an unconstrained optimization method with an optimality condition based on the gradient of the objective function. In order to compare computational efficiency, we need to adjust the stopping criteria of the two algorithms so that they have the same termination conditions. This change may slightly affect the reported numerical results in this chapter.

6.4 Fluorescence lifetime estimation

In small-animal fluorescence lifetime imaging, measurements of fluorescence lifetime can yield information on the molecular microenvironment of a fluorescent molecule. Many factors in the microenvironment affect the lifetime of a fluorophore, and measurements of lifetime can be used as indicators of those factors. In in vivo studies, these factors can provide valuable diagnostic information relating to the functional status of diseases. Furthermore, lifetime measurements offer considerable practical advantages, as the measurements are generally absolute, being independent of the fluorophore concentration.

6.4.1 Fitting parameters

In the lifetime fitting procedure, the parameters to be estimated include Ai, τi, Δt, and DC. For a multiple-lifetime fluorescence signal composed of n fluorophores, there are 2n + 2 parameters. However, one can use two strategies to reduce the number of fitting parameters by two. One strategy is to estimate the DC component separately; another is to use relative amplitudes ai instead of absolute amplitudes Ai.
The normalized amplitude ai is given by

    ai = Ai / Σj Aj.

This allows one of the relative amplitudes to be determined by the normalization property of the ai, i.e., Σi ai = 1. Our numerical experiments adopt both of these strategies, and so the total number of fitting parameters is 2n for a fluorescence signal F(t) composed of n fluorophore components.

6.4.2 Parameter bounds

In fluorescence lifetime fitting, the unknown parameters have known bounds:

    τmin ≤ τi ≤ τmax,
    0 ≤ ai ≤ 1,
    0 ≤ Δt ≤ T/6,
    0 ≤ DC ≤ Cmax/3,

where T is the size of the time window of a TPSF, Cmax is the count maximum of a TPSF, τmin = 0.25 ns, and τmax = T/4. The bounds on ai and the lower bound on DC originate from their physical limits; all other bounds are practical limits imposed to increase the reliability of the fitting results.

6.4.3 Simulated data for lifetime fitting

All data used in the experiments were generated and provided by ART. Several sets of simulated data were used to estimate parameters through solving the nonlinear least-squares optimization problem (4.1). Each data set was generated by varying one of the following parameters, with the others held fixed:

• fluorophore depth d;
• signal DC level;
• relative amplitude ai;
• lifetime gap Δτ;
• mean lifetime τ̄;
• maximal signal amplitude Cmax.

In order to obtain statistics of the fitting results, random Poisson noise was generated and added to the fluorescence signal, and multiple trials were performed for each simulation setting.

6.4.4 Experimental results

Three sets of error measurements are defined to characterize the performance of different optimization methods for curve fitting. The figures of merit include the fitting error, standard deviation, and error margin of the fitted values. The fitting error reflects the fitting accuracy.
The standard deviation indicates the repeatability of the fitting. Error margin refers to the maximum error in each data set; it measures the robustness of the fitting method.

Figure 6.4: An example of a fitted TPSF curve (τ = 1.00, 1.84 ns, f = 0.26, 0.74, DC = −2, t0 = 0.16 ns).

Figure 6.4 shows one example of a curve fitted by BCNLS in the fluorescence lifetime fitting process. In this example, the known parameters in the simulated data are set as τ1 = 0.98, τ2 = 1.82, a1 = 0.25, a2 = 0.75, DC = 0; the computed parameters are τ1 = 1.00, τ2 = 1.84, a1 = 0.26, a2 = 0.74, DC = −2, with final objective value χ² = 1.0003. For a data set with about 1000 data points, the fitted errors of the reconstructed values are within reasonable limits for an industry problem. Some typical results for lifetime fitting are illustrated in Figures 6.5 and 6.6, where Figure 6.5 results from not treating DC as a fitting parameter, and Figure 6.6 results from including DC as a fitting parameter. In both cases, the same simulated data file with varying fluorophore depth was used. In Figures 6.5 and 6.6, the computed results for each fitting parameter are presented in terms of the error of the fitted values relative to the true values (errTr), the error margin (errMax), the standard deviation (std), and the objective value at the solution (χ²). Table 6.1 summarizes the effect of the DC component on the fluorophore lifetime fitting results from Figures 6.5 and 6.6. It shows that estimating the DC component separately has the effect of reducing the overall fitting errors and hence improves the accuracy of the fitting results.
This suggests that reducing the number of fitting parameters results in fewer degrees of freedom in the system and increases the likelihood of the fitting parameters being close to the true values.

Figure 6.5: Lifetime fitting results for simulated data with varying depth, without fitting the DC component.

For numerical examples of ART's implementation of the Levenberg-Marquardt algorithm (ART_LM), refer to [13]. Table 6.2 shows a comparison of BCNLS and ART_LM on the same simulated data. Note that BCNLS obtains better results in terms of fitting errors. However, ART_LM takes slightly less computation time than BCNLS; this is likely due to the parameter transformation used in ART_LM to handle the bound constraints indirectly.

6.5 Fluorophore depth estimation

Fluorescent inclusion depth can be estimated from the fluorescent TPSF peak position if the lifetime of the fluorophore and the medium optical properties are known a priori [25]. Due to noise, it is not accurate to simply take the peak position from the TPSF. Instead, a curve fitting procedure is employed to determine the depth. To avoid instability due to noise at the beginning and end of the TPSF, where the count level is low and the relative noise level is high, we estimate fluorophore depth using only the portion of the data near the peak of the TPSF. The depth estimation also permits recovery of the fluorophore concentration.
The fitting parameters in the fluorophore depth estimation consist of the fluorophore depth and the amplitude. The bounds on all the parameters come from their physical limitations. Similar to the lifetime fitting procedure, simulated data were generated with random noise added to the fluorescence signal, and a curve fitting procedure is used to attempt to recover the known fluorophore depth and concentration.

Figure 6.6: Lifetime fitting results for simulated data with varying depth, with fitting of the DC component.

Similarly, the error margin and standard deviation are used to compare the depth fitting results of the different algorithms. Table 6.3 shows the depth fitting results using BCNLS for one set of benchmark data. The abbreviations used in Table 6.3 are:

• d: the calculated depth;
• con: the computed fluorophore concentration;
• mnErrMgn: the average of the error margin over all data sets;
• rMnMgn(%): the average of the ratio of error margin to true value over all statistic data sets;
• rMaxMgn(%): the maximum of the ratio of error margin to true value over all statistic data sets;
• meanStd: the average of the standard deviation over all data sets;
• rMeanStd(%): the average of the ratio of std to true value over all data sets;
• rMaxStd(%): the maximum of the ratio of std to true value over all data sets.
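Under the assumption that each measure is computed over repeated noisy trials of a single true value, these statistics might be computed as follows (the function name and exact definitions are illustrative):

```python
import numpy as np

def fit_error_stats(fitted, true_value):
    """Error measures of the depth-fitting tables: the error margin is
    the maximum absolute error over the trials, std is the standard
    deviation of the errors, and the r* entries are their ratios to the
    true value, in percent.
    """
    err = np.asarray(fitted, dtype=float) - true_value
    margin = float(np.max(np.abs(err)))
    std = float(np.std(err))
    return {
        "errMgn": margin,
        "rMgn%": 100.0 * margin / abs(true_value),
        "std": std,
        "rStd%": 100.0 * std / abs(true_value),
    }
```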
| Parameter | errTr / errMax / std (With DC) | errTr / errMax / std (No DC) |
| τ1 | 0.09 / 0.51 / 0.18 | 0.13 / 0.46 / 0.11 |
| τ2 | 0.09 / 0.38 / 0.12 | 0.23 / 1.15 / 0.22 |
| α1 | 0.10 / 0.38 / 0.15 | 0.16 / 0.42 / 0.13 |
| DC | 0.00 / 0.00 / 0.00 | 1.75 / 5.35 / 0.96 |

Table 6.1: Comparison of lifetime fitting with and without DC component.

Figure 6.7: Estimated depth versus true depth (errCen=0.02, errMax=0.11, std=0.06).

In Table 6.3, we compare the depth fitting results for four sets of data points: (A) all data points; (B) data points with depth greater than 1 mm; (C) data points with SNR greater than 20; (D) data points with SNR greater than 20 and depth greater than 1 mm. The results show that better fits are obtained in the depth fitting procedure by selecting the data points with SNR > 20 and depth > 1 mm. Table 6.4 compares the depth estimation results from BCNLS and ART_LM for the latter case. Notice that all error margins and standard deviations obtained with BCNLS are very close to those obtained with the ART_LM algorithm.

Figure 6.7 illustrates the estimated depth compared with the true depth. Figure 6.8 shows the relationship between the error of the estimated depth and the signal-to-noise ratio (SNR). In general, a signal with a higher signal-to-noise ratio yields a more accurately estimated depth. These results from BCNLS also match the results from ART_LM very closely.

| Parameter | errTr / errMax / std (BCNLS) | errTr / errMax / std (ART_LM) |
| τ1 | 0.07 / 0.17 / 0.09 | 0.07 / 0.19 / 0.11 |
| τ2 | 0.13 / 0.35 / 0.17 | 0.19 / 0.52 / 0.23 |
| α1 | 0.11 / 0.29 / 0.14 | 0.11 / 0.28 / 0.15 |
| DC | 1.53 / 2.59 / 2.96 | 2.92 / 7.71 / 0.83 |
| CPU time (seconds) | 43.00 | 36.00 |

Table 6.2: Comparison of lifetime fitting using BCNLS and ART_LM.

| Data points | Quantity | mnErrMgn | rMnMgn | rMaxMgn | meanStd | rMeanStd | rMaxStd |
| All (A) | d | 0.28 | 25.41 | 259.45 | 0.16 | 15.61 | 139.94 |
| | con | 64.56 | 24.47 | 216.93 | 21.54 | 8.33 | 92.98 |
| depth > 1 mm (B) | d | 0.21 | 3.84 | 30.89 | 0.12 | 2.22 | 15.23 |
| | con | 63.04 | 23.88 | 90.73 | 21.59 | 8.30 | 49.86 |
| SNR > 20 (C) | d | 0.17 | 19.75 | 178.82 | 0.11 | 12.56 | 93.62 |
| | con | 34.75 | 13.15 | 44.91 | 13.48 | 5.12 | 15.19 |
| (B) & (C) | d | 0.11 | 1.96 | 8.63 | 0.06 | 1.17 | 5.29 |
| | con | 34.91 | 13.24 | 44.91 | 13.71 | 5.20 | 15.19 |

Table 6.3: Comparison of depth fitting results for (A) all data points, (B) data points with depth > 1 mm, (C) data points with SNR > 20, (D) data points with SNR > 20 and depth > 1 mm.

6.6 Optical properties estimation

The effective optical properties of biological tissues, the absorption coefficient μa and the reduced scattering coefficient μs', play a critical role in most algorithms for processing optical fluorescence data. For small animals such as mice, there can be large variations of the optical properties within a given region of interest.

In the estimation of optical properties, the curve fitting problem has the same general form as the curve fitting problem in Section 6.3. The unknown parameters are the optical properties μa and μs' and the amplitude A. The lower and upper bounds on the optical properties are based on a priori knowledge of the values of the medium optical properties. In our experiments, these bounds are set as

0.0001 ≤ μa ≤ 0.2,
0.2 ≤ μs' ≤ 3.5,
0.85 ≤ A ≤ 1.15,
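To illustrate how such box constraints enter a least-squares solve, here is a small sketch using SciPy's bound-constrained `least_squares` as a stand-in solver; BCNLS itself is not shown, and the single-exponential forward model below is a hypothetical stand-in, not the diffusion model of Section 6.2:

```python
import numpy as np
from scipy.optimize import least_squares

# Synthetic noise-free measurements from a toy model A * exp(-mu_a * t).
t = np.linspace(0.1, 5.0, 50)
true_A, true_mu_a = 1.0, 0.05            # both inside the bounds quoted above
data = true_A * np.exp(-true_mu_a * t)

def residual(p):
    A, mu_a = p
    return A * np.exp(-mu_a * t) - data

# Box constraints from the text: 0.85 <= A <= 1.15, 0.0001 <= mu_a <= 0.2.
lower = [0.85, 0.0001]
upper = [1.15, 0.2]
sol = least_squares(residual, x0=[1.1, 0.1], bounds=(lower, upper))
```

Any solver for this problem must return iterates that stay inside the box; SciPy's trust-region-reflective method plays the role that the BCLS subproblem solver plays inside BCNLS.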
where the units of μa and μs' are mm⁻¹.

| Quantity | Solver | mnErrMgn | rMnMgn | rMaxMgn | meanStd | rMeanStd | rMaxStd |
| Depth | BCNLS | 0.11 | 1.96 | 8.63 | 0.06 | 1.17 | 5.29 |
| Depth | ART_LM | 0.11 | 2.00 | 8.60 | 0.06 | 1.20 | 5.30 |
| Concentration | BCNLS | 34.91 | 13.24 | 44.91 | 13.71 | 5.20 | 15.19 |
| Concentration | ART_LM | 34.91 | 13.20 | 44.90 | 13.71 | 5.20 | 15.20 |

Table 6.4: Comparison of depth fitting results for BCNLS and ART_LM for data points with SNR > 20 and depth > 1 mm.

Figure 6.8: Relative accuracy of calculated depth versus signal SNR (rErrCen=0.5%, rErrMax=2.0%, rStd=1.2%).

6.6.1 Simulation for estimating optical properties

An IRF curve with a peak count of 1000 and a full-width-at-half-max (FWHM, the time between the 50% of maximum count levels before and after the peak) of 400 picoseconds was numerically generated. The peak count of the TPSFs was fixed at 1500. Different TPSF curves were calculated using each pair of (μa, μs') combinations from the pre-selected sets of values in Table 6.5. For comparison, one set of tests was carried out without noise. For each pair of (μa, μs'), multiple trials were performed by adding Poisson noise (shot noise) to the TPSF and IRF curves. The mean values of the fitted optical properties and the standard deviation over all the trials were then computed. The relative errors, that is, the differences between the fitted values and their corresponding true values, are shown in Table 6.6.
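The simulated IRF can be sketched as follows. Only the peak count, the FWHM, and the Poisson shot noise come from the text; the Gaussian pulse shape, time grid, and random seed are our illustrative assumptions:

```python
import numpy as np

def fwhm(t, curve):
    # Discrete full-width-at-half-max: time between the first and
    # last samples at or above 50% of the peak count.
    above = np.where(curve >= 0.5 * curve.max())[0]
    return t[above[-1]] - t[above[0]]

# Gaussian IRF with peak count 1000 and FWHM of about 400 ps
# (for a Gaussian, FWHM = 2*sqrt(2*ln 2)*sigma).
t = np.arange(0.0, 2000.0, 10.0)                   # picoseconds
sigma = 400.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))
irf = 1000.0 * np.exp(-0.5 * ((t - 1000.0) / sigma) ** 2)

# One noisy trial: Poisson (shot) noise applied to the photon counts.
rng = np.random.default_rng(0)
noisy_irf = rng.poisson(irf)
```

Repeating the last line gives the multiple noisy trials over which the fitted values are averaged.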
| Parameter | Values (mm⁻¹) |
| μa | 0.001, 0.005, 0.01, 0.03, 0.05, 0.1 |
| μs' | 0.2, 0.5, 1, 2, 3 |

Table 6.5: μa and μs' values used in simulation.

| Comparison | Solver | Residual (Mean / STD) | Time (s) (Mean / STD) | Rel μa (%) (Mean / STD) | Rel μs' (%) (Mean / STD) |
| without noise | ART_LM | 0.00 / 0.00 | 0.15 / 0.03 | -0.00 / 0.00 | 0.00 / 0.00 |
| | BCNLS | 0.00 / 0.00 | 0.17 / 0.08 | 0.00 / 0.00 | 0.00 / 0.00 |
| with noise | ART_LM | 65.71 / 1.95 | 0.16 / 0.03 | -0.22 / 4.10 | 0.00 / 0.79 |
| | BCNLS | 65.60 / 1.96 | 0.13 / 0.13 | -1.48 / 4.56 | -0.35 / 0.75 |

Table 6.6: Comparison of results of optical properties with ART_LM and BCNLS.

6.6.2 Results for optical properties

We have compared the performance of the solver BCNLS with ART_LM; note that no parameter transform was used with ART_LM in this case. The comparison criteria for the different algorithms include the relative error, the residual function value at the solution, and the computation time. Table 6.6 shows the results for the optical properties on the simulated data. For the data without Poisson noise, BCNLS performs as well as the ART_LM method. However, for the simulations with Poisson noise, BCNLS outperforms the ART_LM method.

6.7 Discussion

The numerical experiments on estimation of fluorescence lifetime, depth, and optical properties at ART illustrate that BCNLS can be applied in practice to solve curve fitting problems with bounds on the parameters. The main challenge in fluorescence lifetime fitting lies in fitting multiple lifetimes. Although the experimental results presented in this chapter are limited to fluorescent signals with two fluorophore components, BCNLS can easily be adapted to multiple-lifetime curve fitting. Moreover, curve fitting is a process of finding an optimized combination of unknown parameters by some optimization method.
In theory, there are some differences between the solutions of the curve fitting optimization and the reconstruction of the known values in the simulation. Given the mathematical model for the fluorescence signal in Section 6.2, our experiments reveal that a good curve fitting optimization solution may not always result in a satisfactory reconstruction of the unknowns. Nevertheless, the experiments show that the BCNLS solver is able to handle the simple bounds on variables in this curve fitting application.

Chapter 7

Conclusions and Future Work

In this thesis, we have developed the BCNLS algorithm for bound-constrained nonlinear least squares, which solves a sequence of bound-constrained linear least-squares subproblems using techniques motivated by the Levenberg-Marquardt method and ℓ2-norm regularization. The subproblems are solved by BCLS, an existing solver for bound-constrained linear least squares. Our approach differs substantially from previous general bound-constrained optimization methods: it takes advantage of the special structure of the objective and uses the two-norm regularization to formulate the subproblems. BCNLS uses function evaluations efficiently, at the expense of potentially solving more linear least-squares subproblems. The convergence properties of the new method are generally as good as, or better than, those of quasi-Newton methods such as L-BFGS-B or trust-region approaches such as ASTRAL. Preliminary results are promising. The industrial application of curve fitting in the field of fluorescence optical imaging suggests that the new algorithm may prove useful in practice.

There is much scope for further development of the algorithm described in this thesis.
One promising direction would be to allow the user to choose preconditioners to speed up the process of solving the subproblems for large-scale systems. It is well known that conjugate-gradient methods can be accelerated if a nonsingular preconditioner matrix M is available. Because the subproblem solver BCLS uses the conjugate-gradient algorithm LSQR, we believe that preconditioning would be useful in our proposed algorithm. However, good preconditioners are hard to define, especially for large and sparse systems, although BCLS is capable of solving the subproblem with user-supplied preconditioning routines.

Finally, the damping parameter δ plays a critical role in the proposed algorithm BCNLS. The performance of the updating strategy for δ depends on the problem. It is a longstanding challenge to find a perfect updating strategy for different optimization problems. The self-adaptive Levenberg-Marquardt strategy works well in general, but further investigation may lead to better updating strategies.

Bibliography

[1] V. Z. Averbukh, S. Figueroa, and T. Schlick. Remark on Algorithm 566. ACM Transactions on Mathematical Software, 20(3):282-285, 1994.

[2] D. P. Bertsekas. Constrained Optimization and Lagrange Multiplier Methods. Academic Press, 1982.

[3] Å. Björck. Numerical Methods for Least Squares Problems. SIAM, Philadelphia, 1996.

[4] J. Burke, J. J. Moré, and G. Toraldo. Convergence properties of trust region methods for linear and convex constraints. Math. Program., 47(3):305-336, 1990.

[5] R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu. A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific and Statistical Computing, 16(5):1190-1208, 1995.

[6] A. R. Conn, N. I. M. Gould, and P. L. Toint. Global convergence of a class of trust region algorithms for optimization with simple bounds. SIAM J. Numer. Anal., 25:433-460, 1988.

[7] A. R. Conn, N. I. M. Gould, and P. L. Toint. Testing a class of methods for solving minimization problems with simple bounds on variables. Mathematics of Computation, 50(182):399-430, 1988.

[8] A. R. Conn, N. I. M. Gould, and P. L. Toint. LANCELOT: A Fortran package for large-scale nonlinear optimization (Release A). Springer-Verlag, 1992.

[9] A. R. Conn, N. I. M. Gould, and P. L. Toint. Trust-Region Methods. MPS-SIAM Series on Optimization. SIAM, 2000.

[10] H. Dan, N. Yamashita, and M. Fukushima. Convergence properties of the inexact Levenberg-Marquardt method under local error bound. Optimization Methods and Software, 17:605-626, 2002.

[11] R. S. Dembo, S. C. Eisenstat, and T. Steihaug. Inexact Newton methods. SIAM J. Numer. Anal., 19:400-408, 1982.

[12] E. D. Dolan and J. J. Moré. Benchmarking optimization software with performance profiles. Mathematical Programming, pages 201-213, 2002.

[13] G. Ma et al. Fluorescence lifetime estimation of multiple near-infrared dyes in mice. In F. S. Azar and X. Intes, editors, Multimodal Biomedical Imaging III, volume 6850. Proceedings of the International Society of Optical Imaging, February 2008.

[14] J. Fan and J. Pan. Convergence properties of a self-adaptive Levenberg-Marquardt algorithm under local error bound condition. Computational Optimization and Applications, 34:47-62, 2006.

[15] J. Y. Fan. A modified Levenberg-Marquardt algorithm for singular system of nonlinear equations. Journal of Computational Mathematics, 21(5):625-636, 2003.

[16] R. Felkel. Selection of nonlinear programming problems using the CUTE classification scheme, July 1999.

[17] C. Fraley. Solution of Nonlinear Least-Squares Problems. PhD thesis, Stanford University, 1987.

[18] M. P. Friedlander. BCLS: Bound constrained least-squares, March 2007. http://www.cs.ubc.ca/~mpf/bcls/.

[19] M. P. Friedlander and K. Hatz. Computing nonnegative tensor factorizations. Tech. Rep. TR-2006-21, Dept. Computer Science, University of British Columbia, Vancouver, December 2007. To appear in Optimization Methods and Software.

[20] P. E. Gill, W. Murray, and M. H. Wright. Practical Optimization. Academic Press, 1981.

[21] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins, 1983.

[22] N. I. M. Gould, D. Orban, and P. L. Toint. CUTEr and SifDec: A constrained and unconstrained testing environment, revisited. ACM Transactions on Mathematical Software, 29(4):373-394, 2003.

[23] W. Hager and H. Zhang. A new active set algorithm for box constrained optimization. SIAM Journal on Optimization, 17(2):525-557, 2006.

[24] W. Hager and H. Zhang. Recent advances in bound constrained optimization. In F. Ceragioli, A. Dontchev, H. Furuta, K. Marti, and L. Pandolfi, editors, System Modeling and Optimization, volume 199, pages 67-82. IFIP International Federation for Information Processing, Springer, 2006.

[25] D. Hall, G. Ma, F. Lesage, and Y. Wang. Simple time-domain optical method for estimating the depth and concentration of a fluorescent inclusion in a turbid medium. Optics Letters, 29(19):2258-2260, 2004.

[26] L. Hei. A self-adaptive trust region method. Technical report, AMSS, Chinese Academy of Sciences, 2000.

[27] J. E. Dennis Jr., D. M. Gay, and R. E. Welsch. Algorithm 573: NL2SOL - an adaptive nonlinear least-squares algorithm. ACM Transactions on Mathematical Software, 7(3):369-383, September 1981.

[28] J. Lakowicz. Principles of Fluorescence Spectroscopy. Springer, 2006.

[29] C. L. Lawson and R. J. Hanson. Solving Least Squares Problems. Number 15 in Classics in Applied Mathematics. Society for Industrial and Applied Mathematics, 1995. Originally published: Prentice-Hall, Englewood Cliffs, NJ, 1974.

[30] C. L. Lawson and R. J. Hanson. Solving Least Squares Problems. SIAM, 1995.

[31] F. Leblond, S. Fortier, and M. P. Friedlander. Diffuse optical fluorescence tomography using time-resolved data acquired in transmission. In Fred S. Azar, editor, Multimodal Biomedical Imaging II, volume 6431. Proceedings of the International Society of Optical Imaging, February 2007.

[32] K. Levenberg. A method for the solution of certain nonlinear problems in least squares. Quarterly of Applied Mathematics, 2:164-168, 1944.

[33] C. Lin and J. J. Moré. Newton's method for large bound-constrained optimization problems. SIAM Journal on Optimization, 9(4):1100-1127, 1999.

[34] D. C. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming B, 45(3):503-528, 1989.

[35] G. Ma, J. Delorme, P. Gallant, and D. Boas. Comparison of simplified Monte Carlo simulation and diffusion approximation for the fluorescence signal from phantoms with typical mouse tissue optical properties. Applied Optics, 46(10):1686-1692, 2007.

[36] K. Madsen, H. B. Nielsen, and O. Tingleff. Methods for nonlinear least squares problems. Technical report, Technical University of Denmark, April 2004.

[37] D. W. Marquardt. An algorithm for least squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics, 11(2):431-441, June 1963.

[38] J. J. Moré, B. S. Garbow, and K. E. Hillstrom. User guide for MINPACK-1. ANL-80-74, Argonne National Laboratory, Argonne, 1980.

[39] J. J. Moré, B. S. Garbow, and K. E. Hillstrom. Algorithm 566: FORTRAN subroutines for testing unconstrained optimization software. ACM Transactions on Mathematical Software, 7(1):136-140, 1981.

[40] J. J. Moré, B. S. Garbow, and K. E. Hillstrom. Testing unconstrained optimization software. ACM Transactions on Mathematical Software, 7(1):17-41, 1981.

[41] J. J. Moré, D. C. Sorensen, K. E. Hillstrom, and B. S. Garbow. The MINPACK project. In W. J. Cowell, editor, Sources and Development of Mathematical Software, pages 88-111. Prentice-Hall, 1984.

[42] J. J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In G. A. Watson, editor, Lecture Notes in Mathematics 630: Numerical Analysis, pages 105-116. Springer-Verlag, 1978.

[43] J. J. Moré. Recent developments in algorithms and software for trust region methods. In A. Bachem, M. Grötschel, and B. Korte, editors, Mathematical Programming: The State of the Art, pages 258-287. Springer-Verlag, 1983.

[44] J. J. Moré and S. J. Wright. Optimization Software Guide. Society for Industrial and Applied Mathematics, 1993.

[45] D. D. Morrison. Methods for nonlinear least squares problems and convergence proofs, tracking programs and orbital determinations. In Proceedings of the Jet Propulsion Laboratory Seminar, pages 1-9, 1960.

[46] H. B. Nielsen. Damping parameter in Marquardt's method. Technical report, Informatics and Mathematical Modelling, Technical University of Denmark, DTU, Richard Petersens Plads, Building 321, DK-2800 Kgs. Lyngby, April 1999.

[47] J. Nocedal and S. J. Wright. Numerical Optimization. Springer, second edition, 2006.

[48] C. C. Paige and M. A. Saunders. LSQR: An algorithm for sparse linear equations and sparse least squares. ACM Transactions on Mathematical Software, 8(1):43-71, 1982.

[49] M. J. D. Powell. Convergence properties of a class of minimization algorithms. In R. R. Meyer and S. M. Robinson, editors, Nonlinear Programming 2, pages 1-27. Academic Press, 1975.

[50] K. Schittkowski. DFNLP: A Fortran implementation of an SQP-Gauss-Newton algorithm. Report, Department of Computer Science, University of Bayreuth, 2005.

[51] A. N. Tikhonov and V. Y. Arsenin. Solutions of Ill-Posed Problems. Winston and Sons, 1977.

[52] L. N. Trefethen and D. Bau III. Numerical Linear Algebra. SIAM, 1997.

[53] L. Xu and J. Burke. ASTRAL: An active set ℓ∞-trust-region algorithm for box constrained optimization. Optimization Online, July 2007.

[54] N. Yamashita and M. Fukushima. On the rate of convergence of the Levenberg-Marquardt method. Computing (Suppl.), 15:239-249, 2001.

[55] Y. X. Yuan. Trust region algorithms for nonlinear equations. Information, 1:7-20, 1998.

[56] Y. X. Yuan. A review of trust region algorithms for optimization. In J. M. Ball and J. C. R. Hunt, editors, ICIAM 99: Proceedings of the Fourth International Congress on Industrial and Applied Mathematics, pages 271-282. Oxford University Press, 2000.

[57] C. Zhu, R. Byrd, P. Lu, and J. Nocedal. Algorithm 778: L-BFGS-B, Fortran subroutines for large-scale bound-constrained optimization. ACM Transactions on Mathematical Software, 23(4):550-560, 1997.
Item Metadata
Title | A Levenberg-Marquardt method for large-scale bound-constrained nonlinear least-squares
Creator | Shan, Shidong
Publisher | University of British Columbia
Date Issued | 2008
Extent | 7309262 bytes
Genre | Thesis/Dissertation
Type | Text
File Format | application/pdf
Language | eng
Date Available | 2009-03-06
Provider | Vancouver : University of British Columbia Library
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International
DOI | 10.14288/1.0051461
URI | http://hdl.handle.net/2429/5648
Degree | Master of Science - MSc
Program | Computer Science
Affiliation | Science, Faculty of; Computer Science, Department of
Degree Grantor | University of British Columbia
Graduation Date | 2008-11
Campus | UBCV
Scholarly Level | Graduate
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository | DSpace