VARIABLE METRIC METHODS WITHOUT EXACT ONE-DIMENSIONAL SEARCHES

by

Gordon James Fitzpatrick

B.A., University of Alberta, 1965
B.Sc., University of Calgary, 1970

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in the Department of Electrical Engineering

We accept this thesis as conforming to the required standard.

THE UNIVERSITY OF BRITISH COLUMBIA
September 1972

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Electrical Engineering
The University of British Columbia
Vancouver 8, Canada

Date: October 6, 1972

ABSTRACT

The selection of updating formulas for the H matrix and the subproblem of one-dimensional search are considered for variable metric algorithms that do not use exact linear searches in the subproblem. It is argued that the convex class of updating formulas given by Fletcher is the logical choice for such algorithms. It is shown that direct linear searches are more efficient than linear searches using directional derivatives at each point, particularly for objective functions of many variables. Features of algorithms using the convex class of formulas without exact searches are discussed. It is proven that the effects on these algorithms of scaling the variables of the objective function are identical to the effects of transforming the initial H matrix. It is predicted that regions of a function where the Hessian is non-positive definite may be detrimental to the convergence of these algorithms. A function is given for which Fletcher's recent algorithm (which does not use any linear search for a minimum) fails to converge. Attempts were made to devise a test to detect non-convexity of a function, so that an algorithm using no linear search for a minimum in convex regions and an alternative strategy in non-convex regions could be developed to overcome this problem. Algorithms incorporating a test for non-convexity were coded, as well as Fletcher's algorithm, an algorithm using a weak quadratic direct minimizing search, and an algorithm using the weak cubic minimizing search as used in the Fletcher-Powell method. Numerical experiments were performed on functions from the literature and on functions developed by the author.

Fletcher's algorithm was found to be inefficient, in comparison to the weak quadratic search algorithm, on all functions where the initial H matrix had small eigenvalues. Where Fletcher's algorithm was relatively efficient, the weak quadratic search was in all cases competitive. The value of a direct over a derivative linear search is demonstrated.
The algorithms using a test for convexity were not effective, since even the best of them was not generally effective in detecting non-convexity. It is concluded that algorithms without some form of one-dimensional weak minimizing search are not suitable for use on general minimization problems, and that the weak quadratic direct search proposed here is a more efficient and reliable alternative.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENT
I. INTRODUCTION
II. UPDATING FORMULAS FOR H
III. THE ONE-DIMENSIONAL SEARCH SUB-PROBLEM
IV. ALGORITHMS WITH WEAK SEARCHES USING FLETCHER'S FORMULAS
V. NUMERICAL EXPERIMENTS
VI. CONCLUSIONS
REFERENCES
APPENDIX I. Equivalence of Transformation of Variables and Transformation of Initial H Estimate
APPENDIX II. Linear Searches
APPENDIX III. A Family of Radially Symmetric Test Functions

LIST OF TABLES

I. Failure of Fletcher's Algorithm with Comparison to Other Algorithms
II. Performance of Algorithms on Function 4
III. Sensitivity of Fletcher's Algorithm to Scaling
IV. Ineffectiveness of Suspending H Updating

LIST OF FIGURES

1. Comparison of four algorithms - Function 1
2. Comparison of four algorithms - Function 2
3. Comparison of four algorithms - Function 3
4. Comparison of four algorithms - Function 4
5. Comparison of four algorithms - Function 5
6. Comparison of four algorithms - Powell's function
7. Comparison of four algorithms - Woods' function
8. Comparison of four algorithms - Rosenbrock's function
9. Comparison of four algorithms - Helix function

ACKNOWLEDGEMENT

The author wishes to thank Dr. G.F. Schrack for his interest, advice, and encouragement throughout the research and writing of this thesis. Thanks are also due Miss Norma Duggan for typing the thesis. Financial support for this research was provided by the National Research Council of Canada as a Post-Graduate Scholarship to the author.

I. INTRODUCTION

This thesis gives the results of attempts to gain insight into the operation of the unconstrained optimization methods, variously termed variable metric or quasi-Newton, on functions more general than quadratics, and from this to obtain improvements in the efficiency and reliability of algorithms of this type. While these methods have been shown to have conjugate direction properties under suitable conditions on quadratic functions, those properties have not explained their demonstrated advantages in efficiency over simple conjugate direction algorithms on non-quadratic and non-convex minimization problems.

The approaches taken have been to explore the similarities between these methods and Newton's method for minimization, to consider the effects of non-convexity of the objective function, and to treat the sub-problem of search on a line contained in these methods, with particular attention to the elimination of a minimizing search on a line. Modifications are proposed to published techniques, and numerical tests have been made to explore the effectiveness of the modifications.

The unconstrained problem is that of finding a local minimum of a real function f(x), where x is an n-vector.
These methods require that the function value f and the gradient

g(x) = (∂f/∂x₁, ..., ∂f/∂xₙ)^t

be available to the algorithm, and that the Hessian matrix of second partial derivatives exist,

G(x) = {∂²f/∂xᵢ∂xⱼ},  i, j = 1, 2, ..., n.

Most variable metric methods follow this procedure. Given an initial point x₀ and an n x n symmetric positive definite matrix H (denoted by H > 0):

1. Calculate f = f(x₀) and g = g(x₀).

2. Generate the search direction

   d = -Hg    (1.1)

3. Choose a scalar multiplier α (through some kind of search on the line x + αd), giving the new point

   x⁺ = x + αd    (1.2)

   and the new function and gradient values f⁺, g⁺, such that at the least f⁺ < f, and possibly additional criteria such as |d^t g⁺| < ε are satisfied at the new point.

4. Defining

   δ = x⁺ - x,  γ = g⁺ - g,

   add a correction E(H, δ, γ) to H such that the corrected matrix H⁺ satisfies the condition

   H⁺γ = δ    (1.3)

5. If |δ| < ε or |g⁺| < ε, stop; else set x = x⁺, g = g⁺, H = H⁺ and go to 2.

In Newton's method for minimization the steps remain the same, except that H is not updated but is calculated directly from the Hessian G, giving the search direction d = -G^-1 g. Newton's method converges to the minimum of a positive definite quadratic in one step. If the Hessian of a function is non-singular at the minimum, then near the minimum the function is accurately represented by its Taylor series truncated to a quadratic, and convergence is rapid. The condition H⁺γ = δ ensures that the H matrix has some of the properties of G^-1, since for a quadratic Gδ = γ. Variable metric methods update the H matrix so that it approaches the inverse Hessian as the minimum is approached, and the rapid convergence of Newton's method is obtained without the necessity of explicit evaluation of the Hessian and its inversion. The close relation between the two methods has led to the term quasi-Newton being applied to these methods.
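The five-step procedure above can be condensed into a short driver loop. The sketch below is a minimal illustration, not the thesis's own code: the `line_search` and `update` routines (names mine) stand for whichever search and correction E(H, δ, γ) a particular method supplies.

```python
import numpy as np

def variable_metric(f, grad, x0, H0, line_search, update, eps=1e-8, max_iter=200):
    """Generic variable metric loop (steps 1-5 above).

    line_search(f, grad, x, d, f0, g0) must return a step alpha with
    f(x + alpha*d) < f(x); update(H, delta, gamma) must return a matrix H+
    satisfying the quasi-Newton condition H+ @ gamma == delta (equation 1.3).
    """
    x, g, H = x0.copy(), grad(x0), H0.copy()
    for _ in range(max_iter):
        d = -H @ g                           # equation 1.1
        alpha = line_search(f, grad, x, d, f(x), g)
        x_new = x + alpha * d                # equation 1.2
        g_new = grad(x_new)
        delta, gamma = x_new - x, g_new - g
        if np.linalg.norm(delta) < eps or np.linalg.norm(g_new) < eps:
            return x_new
        H = update(H, delta, gamma)          # the correction E(H, delta, gamma)
        x, g = x_new, g_new
    return x
```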
The term variable metric is due to Davidon, who proposed the first algorithm of this type. The term can be understood by considering the interrelation between the distance measure, or metric, in a Euclidean space and the direction of steepest descent. If we consider the problem of determining distance in the space of the independent variables, X, it can readily be appreciated that this measure is arbitrary. For example, in a two-dimensional problem, coordinate x₁ might have units of temperature and x₂ units of pressure. How is distance determined between two points z and y in X? The vector z - y is normally considered to have length ((z - y)^t(z - y))^(1/2), but since the choice of metric is arbitrary, this can be generalized to ((z - y)^t M (z - y))^(1/2), where M > 0.

To determine the direction of steepest descent at a point, we wish to select the unit direction vector p such that the directional derivative p^t g is minimum. The unit direction vectors are defined by the set of directions p such that p^t M p = 1, and the extrema of p^t g are reached when the projection of g onto this constraint is zero, that is,

[I - Mpp^tM / (p^tMMp)] g = 0.

This has solutions p = -αM^-1 g, where a positive α indicates the direction of steepest descent. Comparing with 1.1, using the matrix H as a linear transformation on the gradient is equivalent to assuming a metric H^-1 and taking a "corrected" steepest descent step using this metric rather than the identity matrix metric. In the classical method of steepest descent, M = I, the unit matrix, and in Newton's method, M = G. In the variable metric method, as in Newton's method, the metric changes from step to step, hence the term variable metric.

It should be pointed out that the term variable metric is a misnomer in cases where the H matrix is not positive definite, since the implicit distance measure (d^t H^-1 d)^(1/2) may not be real, or H^-1 may be undefined, violating the requirement that a metric define a positive distance measure. Kowalik and Osborne have pointed out that the term is somewhat misleading inasmuch as no transformation of the variables is actually performed. The metric is changed so that finally the "corrected" direction of steepest descent will hopefully point toward the minimum, but the variable metric viewpoint gives little indication of how changes are to be made to the matrix H to achieve this.
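A small numerical check of the steepest descent relation just derived can be made as follows. The matrix M and gradient g below are arbitrary illustrative values, not examples from the thesis.

```python
import numpy as np

# Direction of steepest descent under the metric M: p = -alpha * M^-1 g.
# Verify that this p satisfies the stationarity condition
# [I - M p p^t M / (p^t M M p)] g = 0 for an assumed M > 0 and g.
M = np.array([[2.0, 0.3], [0.3, 1.0]])   # illustrative metric, M > 0
g = np.array([1.0, -2.0])                # illustrative gradient

p = -np.linalg.solve(M, g)               # steepest descent w.r.t. M (alpha = 1)
Mp = M @ p
residual = g - Mp * (Mp @ g) / (Mp @ Mp)
print(np.allclose(residual, 0.0))        # True: p extremizes p^t g on p^t M p = const
```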
The first definitive form of a variable metric method was given by Fletcher and Powell, using conjugate direction and quasi-Newton conditions to obtain an algorithm which converges to the minimum of a positive definite quadratic in at most n steps, and which was rapidly convergent on a variety of non-quadratic test functions. Subsequently many updating formulas were derived having the same quadratic convergence property, and a unified method to construct conjugate direction algorithms with quadratic convergence has been given by Huang, based on a general updating formula for H and exact linear searches. Since variable metric methods can be conjugate direction algorithms, they have been termed conjugate direction methods, but the term is of questionable accuracy when the function is non-quadratic or when exact searches in the successive directions are not performed.

Numerical experience has indicated that variable metric algorithms using "weak" searches, which find only a first approximation to the minimum on a line, are often more efficient in terms of evaluations of the function and gradient (although the number of linear searches is increased). This has led to the belief that conjugacy of directions may not be necessary to obtain rapid convergence. Indications are that, given a certain number of function evaluations, it is better to perform a number of weak searches, improving the direction of search by updating H more often, than to spend the evaluations performing a smaller number of exact searches in directions which may be far from the most effective.

Fletcher carries this approach to the extreme by not performing a search for a minimum on the line at all, but simply choosing the stepsize multiplier α = 1; if a sufficient reduction in the function value is obtained, H is updated and a new direction generated. The published comparison of his method with the original Fletcher-Powell algorithm shows that his method gives a reduction in the total number of function evaluations of from 40% to 60% over the procedure FLEPOMIN. Barkham has found that Fletcher's method consistently requires only about 10% of the time required by the Fletcher-Powell method on his particular functions, indicating that the published results are not atypical.

Given the demonstrated increases in efficiency possible through discarding the one-dimensional minimization searches, it seemed worthwhile to investigate this approach intensively, by considering what special difficulties might be encountered through abandoning the linear search, and by proposing modifications to overcome these difficulties.

II. UPDATING FORMULAS FOR H

Since the updating of H is central to the success of variable metric methods, we will consider some of the properties of these formulas. The only complete treatment of updating formulas, including convergence properties, has been from the conjugate direction approach. Huang gives a unified derivation of quadratically convergent algorithms which use the general updating formula

H⁺ = ρH + δ(C₁δ + C₂H^tγ)^t / [(C₁δ + C₂H^tγ)^tγ] - Hγ(K₁δ + K₂H^tγ)^t / [(K₁δ + K₂H^tγ)^tγ]    (2.1)

where ρ, C₁, C₂, K₁, K₂ are arbitrarily prescribed real numbers, subject to the restriction that K₁ and K₂ do not vanish simultaneously. In this formula there are three degrees of freedom: the choices of ρ, C₁/C₂, and K₁/K₂. With exact linear searches on a positive definite quadratic function, from the same starting x₀ and H₀, all algorithms of this class produce the same sequence of directions and points and minimize the quadratic in at most n steps. If n steps are taken, the final matrix satisfies Hₙ = ρG^-1, where G is the Hessian of the quadratic. On non-quadratic functions, for those algorithms where ρ = 0, Hₙ = 0, and the matrix H must be reset to H₀ after every nth step. For all algorithms applied to non-quadratic functions, the matrix H is also reset when |g^td| = |g^tHg| < ε, to counter the possibility of singularity in H.
Huang and Levy have demonstrated numerically that algorithms with ρ = 1 produce the same sequence of points on non-quadratic functions, and similarly for ρ = 0. Dixon has proven the same result, provided that the exact search chooses a local minimum unambiguously from the potentially many local minima in a linear search, by the rule that the closest local minimum below the present point is selected. The equivalence of all these algorithms does not apply to the linear search portion, since the algorithms produce the same directions of search but not the same magnitudes of the direction vectors. Therefore, for the same initial choice of stepsize multiplier α in the linear search, differing starting points are obtained for the exact linear search, with resulting different sequences of points within the linear search.

These updating formulas are not equivalent in algorithms that do not use exact linear searches, since the conjugacy of directions is not maintained. It would seem that those algorithms which retain past information by not resetting the matrix H are most useful, at least in the final stages of convergence, and that unless there is some quasi-Newton property that H tend to G^-1, there is no basis for expecting convergence to be rapid. This restricts the choice of the parameter ρ to ρ = 1. In addition, equation 2.1 produces non-symmetric H's, which as approximations to a symmetric matrix are needlessly general and wasteful of both storage and computational capacity. When these two conditions are imposed on the general formula, a one-parameter family of symmetric exact updating formulas results. The term exact refers to the property that Hₙ = G^-1 for exact linear searches on a positive definite quadratic. This family was first noted by Broyden, and may be written, with his parameter β, as

H⁺ = H + δδ^t/(δ^tγ) - Hγγ^tH/(γ^tHγ) + β vv^t,  v = δ/(δ^tγ) - Hγ/(γ^tHγ)    (2.2)

Another parametric form has been given by Fletcher, which exhibits the properties of the family to advantage:

H_φ⁺ = (1 - φ)H₀⁺ + φH₁⁺    (2.3)

where

H₀⁺ = H + δδ^t/(δ^tγ) - Hγγ^tH/(γ^tHγ)    (2.4)

and

H₁⁺ = H - (δγ^tH + Hγδ^t)/(δ^tγ) + (1 + γ^tHγ/(δ^tγ)) δδ^t/(δ^tγ)    (2.5)

Equation 2.4 is the well known Davidon-Fletcher-Powell formula. The properties of equation 2.5 have been explored by a number of authors [12][13]. The updated matrix remains positive definite for φ ≥ 0 and δ^tγ > 0 on any function, an advantage to be discussed later. For 0 ≤ φ ≤ 1, a convex class of formulas results, satisfying a quasi-Newton property without linear searches.
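Read together with the switching rule described in the next paragraph, equations 2.4 and 2.5 can be sketched compactly. This is a minimal illustration under the text's stated rule, not the thesis's own implementation, and it omits the safeguards needed when δ^tγ ≤ 0.

```python
import numpy as np

def fletcher_update(H, delta, gamma):
    """Convex-class update (equations 2.3-2.5) with the switching test below:
    choose H0+ (the DFP formula, 2.4) when delta^t gamma > gamma^t H gamma,
    and H1+ (equation 2.5) otherwise."""
    dg = delta @ gamma                    # delta^t gamma (assumed > 0 here)
    Hg = H @ gamma
    gHg = gamma @ Hg                      # gamma^t H gamma
    if dg > gHg:                          # the test delta^t gamma > gamma^t H gamma
        return H + np.outer(delta, delta) / dg - np.outer(Hg, Hg) / gHg
    return (H - (np.outer(delta, Hg) + np.outer(Hg, delta)) / dg
              + (1.0 + gHg / dg) * np.outer(delta, delta) / dg)
```

Either branch satisfies the quasi-Newton condition H⁺γ = δ, which is easy to confirm numerically with `np.allclose(fletcher_update(H, d, g) @ g, d)` for arbitrary positive definite H and δ^tγ > 0.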
Fletcher has shown that this class satisfies the property that, for a quadratic function with G positive definite, the eigenvalues of K = G^(1/2) H G^(1/2) tend monotonically to 1 for any sequence of δ vectors (though not necessarily strictly monotonically). Furthermore, for any φ outside this range, it is possible to construct a quadratic for which the property does not hold and H diverges from G^-1. Fletcher also shows that the eigenvalues of H_φ'⁺ for φ' > φ, arranged in order, are greater than or equal to the corresponding eigenvalues of H_φ⁺. In his algorithm, the choice between the two formulas is made according to the test δ^tγ > γ^tHγ: the choice H₀⁺ is made if this inequality is satisfied, and the choice H₁⁺ if not. This test measures in some sense whether H is larger or smaller than required, and chooses the formula which will produce the closest alteration in eigenvalues. The use of two updating formulas rather than one overcomes a deficiency of the Davidon-Fletcher-Powell formula discussed by Powell, by countering tendencies toward singularity on the one hand and unboundedness on the other.

Since the convex class of formulas given by Fletcher are the only updating formulas which

a. satisfy a quasi-Newton property with weak searches as well as with exact searches,
b. retain positive definiteness of H on general functions, and
c. possess some control over the conditioning of the H matrices,

this class of formulas, using Fletcher's method of determining φ, is used throughout the numerical tests.

III. THE ONE-DIMENSIONAL SEARCH SUB-PROBLEM

A variable metric / quasi-Newton / conjugate direction method at the point x uses the current gradient g and matrix H to obtain a search direction vector d = -Hg. In order that the algorithm possess the descent property f⁺ < f, some sort of search along the direction d must be employed to satisfy at least this inequality. Choosing the step δ = x⁺ - x = αd consists of a search for the scalar multiplier α: a one-dimensional search.

Let F(λ) = f(x + λd); then F' = dF/dλ = g^t(x + λd) d. At the point x, λ₀ = 0, and we have available F₀ = f(x) and F₀' = g^td. The one-dimensional search problem in F(λ) starts with F(0) and F'(0) known, and we wish to find a λ_A satisfying the conditions for terminating the search, whereupon α = λ_A. For methods guaranteeing H > 0, F₀' < 0, therefore λ_A > 0 will always be chosen. The following searches assume that the search proceeds in the λ > 0 direction.

Exact searches

We will first consider the most stringent conditions for terminating the search. Where the algorithm is desired to be a conjugate direction method, the condition |g⁺^td| < ε << 1 (that is, g⁺^td = 0 approximately) must be satisfied as well. Since F(λ) need not be unimodal even if f(x) is unimodal, our search will be for a local minimum with F(λ_A) < F(0). The searches to be discussed first find an interval containing a minimum lower than F₀, and then reduce this interval iteratively until necessary conditions for a minimum are met. The searches consist of three parts:

1. Choice of the initial step λ₁.
2. Definition of an initial interval containing a minimum.
3. Selection of a new point within the interval; test for necessary conditions of a minimum having been reached; redefinition of a smaller interval, and repeat 3.
1. Choice of initial step

Since it has been shown that the optimum λ tends to 1 as a quasi-Newton method converges, λ₁ = 1 is a reasonable choice. Two alternatives to this general rule have been proposed. Fletcher favors the use of a running stepsize multiplier for the first n steps of the algorithm, that is λ₁ = α_(i-1) for i ≤ n, until the matrix H has a chance to become reasonably accurate. Where an estimate of the minimum function value, f_est, is available, the choice of initial step λ₁ = min(1, 2(f_est - F₀)/F₀') (if F₀ > f_est) may be used to avoid an unreasonably large first step. This choice is obtained by assuming that min F(λ) = f_est and stepping to the minimum location of a parabola fitted using F₀, F₀' and f_est. This choice can result in very small first steps being taken if f_est >> f_opt.

2. Choice of an interval containing a minimum F < F₀

The general idea is to evaluate F₁ = F(λ₁) and, if F₁ < F₀, increase λ until an increase in the function is detected, that is λ_(k+1) = νλ_k where ν > 1. For direct searches using only function values at λ_k, k > 0, the interval is established either with F₁ > F₀ or with F_(k-2) > F_(k-1) < F_k. For derivative searches using both F_k and F_k', either F_(k+1) > F_k or F_(k+1)' > 0 defines the interval.

3. Refinement of the minimum location

The distinction between direct and derivative searches has already been made. In this part of the search the distinction between a minimum estimating search and an interval reducing search is useful.

Minimum estimating - direct: quadratic interpolation

A quadratic is the simplest function having a minimum which can be fitted to function values only, hence it is the simplest example of a minimum estimating search. The basic procedure uses three function values, F_a > F_b < F_c at 0 ≤ λ_a < λ_b < λ_c, to obtain the point λ_a < λ_q < λ_c,

λ_q = (1/2) [(λ_b² - λ_c²)F_a + (λ_c² - λ_a²)F_b + (λ_a² - λ_b²)F_c] / [(λ_b - λ_c)F_a + (λ_c - λ_a)F_b + (λ_a - λ_b)F_c]

the location of the minimum of the interpolating parabola. The interval is redefined so that the position of the lower function value of F_q and F_b becomes the λ_b of the new interval, discarding the end point not adjacent to the resulting new λ_b. The procedure is repeated until |λ_b - λ_q| < ε, or until λ_q falls outside the interval due to errors in calculation, and the lower of λ_b, λ_q is selected as λ_A. If F₁ < F₀, starting points for this basic procedure are immediately available from the direct search in part 2. If the interval is instead defined by F₀' < 0 and F₁ ≥ F₀, let F_c = F₁, λ_c = λ₁, and perform quadratic interpolation using F₀, F₀' and F_c,

λ_b = (1/2) F₀'λ_c² / (F₀'λ_c + F₀ - F_c)

setting λ_c = λ_b, F_c = F_b until F_b < F₀. Then λ_a = 0, F_a = F₀, and the basic procedure begins.

Tests of the basic procedure showed that this method can be extremely slow in converging to the minimum of a function of high order such as -x + eˣ. It is of course possible to do a similar search with a higher degree polynomial.
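The basic procedure is short enough to sketch directly. The code below is an illustration of the three-point formula and interval bookkeeping described above (function names and the iteration cap are mine), exercised on the -x + eˣ example the text cites as a slow case.

```python
import numpy as np

def quad_min(la, Fa, lb, Fb, lc, Fc):
    """Minimum location of the parabola through (la,Fa), (lb,Fb), (lc,Fc);
    assumes the bracket Fa > Fb < Fc with la < lb < lc."""
    num = (lb**2 - lc**2) * Fa + (lc**2 - la**2) * Fb + (la**2 - lb**2) * Fc
    den = (lb - lc) * Fa + (lc - la) * Fb + (la - lb) * Fc
    return 0.5 * num / den

F = lambda l: -l + np.exp(l)        # the slow-convergence example from the text
la, lb, lc = -1.0, 0.5, 2.0         # bracket: F(la) > F(lb) < F(lc)
for _ in range(30):
    lq = quad_min(la, F(la), lb, F(lb), lc, F(lc))
    if abs(lq - lb) < 1e-10:
        break
    lo, hi = sorted((lb, lq))
    # the lower of F(lb), F(lq) becomes the new centre; drop the far end point
    la, lb, lc = (la, lo, hi) if F(lo) < F(hi) else (lo, hi, lc)
print(lb)   # approaches the true minimizer 0 (slowly, as noted above)
```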
A cubic direct search was also tried and was somewhat more resistant to the aforementioned difficulty, but was also found to be extremely slow in similar circumstances.

Interval reducing - direct: golden section search

The golden section search does not attempt to place new points at the location of an estimated minimum. It avoids the possibility of very slow convergence by choosing a new point such that the new interval containing the minimum is always reduced by the fraction q = (-1 + √5)/2 ≈ 0.618 of the old interval. This reduction in interval is independent of the type of function as long as f is strictly unimodal, therefore convergence is always at the same rate. The search is stopped when the interval containing the minimum satisfies λ_c - λ_a < ε.

This search can be much slower than the quadratic interpolation search. If the function is quadratic, the quadratic procedure terminates after two steps, while the golden section search requires n = log(ε/(λ_c - λ_a)) / log q iterations. Since the interval reduction is constant at q, the maximum error in the minimum location is the final interval length. This can be readily computed for any starting interval and given number of iterations, so no numerical trials of the procedure were made for purposes of comparison with the other searches.

Mixed strategy - direct: Davies-Swann-Campey

The two preceding searches use pure strategies, either minimum estimating or interval reducing. In order to obtain the advantages of both, many mixed strategies are possible. One example of this approach is the Davies-Swann-Campey procedure. Part 3 of the procedure begins with equally spaced points λ_a < λ_b < λ_c and F_a > F_b < F_c. The minimum of the interpolating parabola is calculated,

λ_q = λ_b + h(F_a - F_c) / (2(F_a - 2F_b + F_c)),  h = λ_b - λ_a,

and a new interval is defined as in the simple quadratic search, so that the lower of F_b, F_q is bracketed. For the next iteration, a new point λ_b is obtained at the midpoint of λ_a, λ_c before the quadratic interpolation is repeated. In this way the search requires no more than twice the function evaluations of a simple quadratic search, yet can guarantee that the interval is reduced by more than one half every two function evaluations. It is therefore safe, in that excessive function evaluations will never be required, and can give rapid convergence as well. For these reasons it is a good compromise for an exact linear search.
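A sketch of the golden section search described above follows; the routine structure is mine, but the constant reduction factor q ≈ 0.618 per function evaluation is exactly the property the text relies on.

```python
import math

def golden_section(F, la, lc, eps=1e-8):
    """Golden section search on [la, lc] for a strictly unimodal F:
    every iteration shrinks the bracket by q ~= 0.618, independent of F."""
    q = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = lc - q * (lc - la), la + q * (lc - la)    # two interior points
    F1, F2 = F(x1), F(x2)
    while lc - la > eps:
        if F1 < F2:                   # minimum lies in [la, x2]
            lc, x2, F2 = x2, x1, F1
            x1 = lc - q * (lc - la)
            F1 = F(x1)
        else:                         # minimum lies in [x1, lc]
            la, x1, F1 = x1, x2, F2
            x2 = la + q * (lc - la)
            F2 = F(x2)
    return 0.5 * (la + lc)
```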
Minimum estimating - derivative: cubic interpolation

The cubic search of Davidon is the simplest example of a minimum estimating search using derivatives. From an initial interval with λ_a < λ_b such that F_a' < 0, and F_b > F_a or F_b' > 0, the minimum of the interpolating cubic in the interval is easily obtained from

z = 3(F_a - F_b)/(λ_b - λ_a) + F_a' + F_b'

w = (z² - F_a'F_b')^(1/2)

λ_c = λ_b - (F_b' + w - z)(λ_b - λ_a) / (F_b' - F_a' + 2w)

and the two points defining the new interval are chosen such that if F_c' > 0 or F_c > F_a then λ_b = λ_c, else λ_a = λ_c. The search terminates when |λ_c - λ_c,old| < ε or |F_c'| < ε. This search was found to be the most efficient on all unimodal functions of one variable tested, if an evaluation of F' costs the same as an evaluation of F. No difficulty with slow convergence was encountered.

Interval reducing - derivative: bisection search

This search consists of using λ_c = λ_a + (λ_b - λ_a)/2 and applying the same rules for choice of the next interval and termination as in the cubic search. Using the same criterion of efficiency as in the cubic search, this search is not as efficient as the golden section search, since the reduction in interval for m equivalent evaluations of the function is r_bisection = (1/2)^(m/2) > r_gss = (0.618)^m.
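One step of the Davidon cubic interpolation just given can be written directly from the z, w, λ_c formulas. A minimal sketch (function name mine); it assumes the bracketing conditions above hold, so that the square root is real.

```python
import math

def cubic_step(la, Fa, dFa, lb, Fb, dFb):
    """One Davidon cubic interpolation step: the minimum of the cubic fitted
    to F and F' at la < lb, per the z, w, lambda_c formulas above."""
    z = 3.0 * (Fa - Fb) / (lb - la) + dFa + dFb
    w = math.sqrt(z * z - dFa * dFb)    # real when [la, lb] brackets a minimum
    return lb - (dFb + w - z) * (lb - la) / (dFb - dFa + 2.0 * w)

# On a quadratic the step is exact: F = (l - 1)^2 gives the minimizer in one step.
print(cubic_step(0.0, 1.0, -2.0, 2.0, 1.0, 2.0))   # -> 1.0
```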
This measure of comparison has been suggested to compare the e f f i c i e n c y of d i r e c t and d e r i v a t i v e searches and i s used i n t h i s t h e s i s . This consideration e f f e c t i v e l y excludes der i v a t i v e searches as e f f i c i e n t searches for the n dimensional v a r i a b l e metric algorithm. 17 Weak l i n e a r searches Once the condition that a minimum on the l i n e be found i s d i s -pensed with, a more stringent c r i t e r i o n than that the new point be lower i s required to ensure convergence., since f + < f i n an algorithm only im-p l i e s non-increasing function values. Where a weak minimizing search + i s performed, the condition f < f has been s a t i s f a c t o r y i n p r a c t i c e f o r terminating the l i n e a r search a f t e r at l e a s t one i n t e r p o l a t i o n (Fletcher and Powell's algorithm) with a cubic d e r i v a t i v e search, but conditions can a r i s e where a very small a i s chosen i n cases where F, » F i n which b a case the step taken i s very much short of the actual minimum f o r high order functions. A safeguard against t h i s behaviour i s to r e s t r i c t the new A to .1 of the old X. When an a c c e l e r a t i n g search i s necessary to define an i n t e r v a l containing a minimum (the case where F^ < F^), the step i s assured to be not too small since i f = ^ k ' t n e P o i n t : s e l e c t e d must be at l e a s t halfway between X= 0 and X^, the true minimum l o c a t i o n . The c r i t e r i a f o r accepting some X as a, the f i n a l stepsize m u l t i p l i e r f o r that search, becomes important when no attempt i s made to locate the minimum on the l i n e . Fletcher's algorithm requries that the reduction i n function value be at l e a s t as great as some small pro-portion of the reduction expected i f the function were l i n e a r . He chooses = 1 and i f that choice s a t i s f i e s the test f + - f £ sXd t& 0 < e < .5, th i s choice of X i s used. I f the test i s not s a t i s f i e d , a cubic i n t e r -polation,with d e r i v a t i v e s i s used and the l a r g e r of that X and .1 of the old A/is used u n t i l the test i s s a t i s f i e d . This counters cases where X^ i s too large, but assumes that the choice of A i s not too small. As w i l l be shown i n the numerical experiments, the case of very small stepsize 18 can r e s u l t f o r the choice of A = 1 with r e s u l t i n g slow progress even on a quadratic. Goldstein and P r i c e & ^ - v e a n a d d i t i o n a l condition that f + - f ^ (l-e)Ad tg, 0 < e £ .5, which prevents a r i b t r a r i l y small A + t since the r a t i o (f -f)/Ad g approaches unity as A approaches 0, but t h i s has not been tested numerically. In consideration of the r e l a t i v e c o s t l i n e s s of de r i v a t i v e searches, a weak d i r e c t quadratic search i s proposed which uses the same procedure as described f o r the exact quadratic search with the a d d i t i o n of the con-s t r a i n t that A, i n the case where F^ 5 F^, i s not reduced to l e s s than .1 of the o l d A on each i n t e r p o l a t i o n . A non-minimizing search i s also proposed which i s s i m i l a r to Fletcher's search, but replaces cubic d e r i -vative i n t e r p o l a t i o n by quadratic i n t e r p o l a t i o n to decrease the number of gradient evaluations. 19 IV. 
IV. ALGORITHMS WITH WEAK SEARCHES USING FLETCHER'S FORMULAS

Positive definiteness of H

The condition that H remain positive definite is theoretically attractive, since the direction d = -Hg remains non-zero for all non-zero g. This avoids resetting the matrix H, which may be necessary where H can become indefinite or singular (as in Huang's general algorithm on non-quadratics), and always ensures that the direction of search is in the positive λ direction in the linear search along x + λd, since F₀' = d^tg = -g^tHg is always less than zero. The latter is a practical convenience in programming the linear search.

All proofs of convergence have required that H₀ > 0, and if resetting of the matrix H is to be avoided (to save information from earlier steps), all H must be positive definite. The property of square matrices that the trace is the sum of the eigenvalues allows a bound on the largest eigenvalue of a positive definite H to be given by Tr(H). Since H is always non-singular under this condition, updating formulas for T = H^-1 are available, and Tr(T) bounds the least eigenvalue of H above zero. Powell has used these properties of the Davidon-Fletcher-Powell formula to prove convergence of that method on strictly convex functions where the least eigenvalue of the Hessian is greater than some ε. In addition, he shows that if at the minimum the function is three times differentiable, convergence is superlinear. This seems to be the only proof of convergence for a variable metric method which treats functions more general than quadratics.

The opportunity of having T = H^-1 available to an algorithm is of use if a direction of search other than d = -Hg is used. If we assume that T = G, the Hessian of a quadratic, then the choice λ₁ = -d^tg / (d^tTd) is suitable for an initial stepsize, since this choice of λ will minimize F on the line d for the assumed quadratic.

Another consequence of ensuring that H > 0 is that updating of H can be suspended, and the direction d = -Hg may still be used as a direction of search, since for H > 0, H = (H^(1/2))^t H^(1/2) where H^(1/2) is a non-singular square matrix. Suspension of updating and use of the direction d = -Hg is equivalent to performing a steepest descent search on the function f(H^(1/2) x) obtained by a nonsingular transformation of variables of the original function (see Appendix I). Since the properties of unimodality and convexity of functions remain unchanged under such a transformation, convergence of a steepest descent method is still assured, and suspension of updating cannot prevent convergence, although it may be very slow. Non-positive definite H matrices do not produce this equivalence to a transformed steepest descent search, and cannot guarantee convergence if updating is suspended.

The necessary and sufficient condition for H⁺ > 0, namely δ^tγ > 0, is always satisfied for exact searches, since then δ^tγ = -αd^tg > 0. When the condition for an exact linear search is not required, δ^tγ > 0 is not in general satisfied.
When it is not satisfied, that is, when the directional derivative at x⁺ is less than the (negative) directional derivative at x, it can always be satisfied for any function bounded below by increasing the step δ sufficiently, as Fletcher points out. The condition is always satisfied for convex functions, and since a sufficiently small neighborhood of the minimum of a differentiable function is convex, two points in that neighborhood can be found to satisfy the search.

Linear searches and the sensitivity of algorithms to scaling

One aspect of the use of a weak minimizing search using either cubic or quadratic interpolation to estimate a minimum location is that if the function is nearly quadratic, these searches will be effective in obtaining a very close estimate of the minimum on the line. Therefore conjugate directions are produced when they are useful. Where the function is not quadratic, the directions are not conjugate, but conjugacy is of questionable value in that case.

When the search for a minimum on the line is abandoned, the property of quadratic convergence does not hold. This property is intimately involved with a very desirable property of any algorithm: insensitivity to a change of scale of the variables or, more generally, to a non-singular linear transformation of the variables. This property is one of the most powerful features of variable metric methods which use minimizing searches. When applied to any positive definite quadratic they minimize the function in at most n steps, and are therefore completely insensitive to changes of scale, whereas a steepest descent method is very sensitive. Kowalik and Osborne give the following geometric convergence rate for a steepest descent method applied to a positive definite quadratic:

f(x_(i+1)) / f(x_i) = ((λ_n - λ₁)/(λ_n + λ₁))²

where λ₁ and λ_n are the least and greatest eigenvalues of the Hessian. Algorithms which do not perform any kind of minimizing search may therefore be expected to be sensitive to changes of scale, even for quadratics.
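The quoted rate is easy to observe numerically. The sketch below runs exact-line-search steepest descent on an illustrative diagonal quadratic (the matrix and starting point are my choices, picked so both eigen-directions are excited); the per-step contraction of f matches the predicted factor.

```python
import numpy as np

# Steepest descent with exact line search on f(x) = 0.5 x^t A x:
# function values should contract by ((l_n - l_1)/(l_n + l_1))^2 per step.
A = np.diag([1.0, 10.0])             # eigenvalues l_1 = 1, l_n = 10
x = np.array([10.0, 1.0])
rate = ((10.0 - 1.0) / (10.0 + 1.0)) ** 2

f = lambda x: 0.5 * x @ A @ x
for _ in range(5):
    g = A @ x
    alpha = (g @ g) / (g @ A @ g)    # exact minimizer along -g for a quadratic
    x_new = x - alpha * g
    print(f(x_new) / f(x), "predicted:", rate)   # both ~0.669
    x = x_new
```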
As shown in Appendix I, a change of scale is equivalent to transforming the initial estimate H₀. This implies that sensitivity to a change of scale is equivalent to sensitivity to the choice of H₀. Sensitivity to the initial H is considered in the numerical tests.

Convexity of the objective function

Since the quasi-Newton property that H approach G^-1 in some sense is an important part of these methods, let us consider the application of Newton's method itself to non-convex functions. A necessary and sufficient condition for convexity of a twice differentiable function is that G(x) be positive semi-definite for all x. Non-convex functions will therefore have regions where G is non-positive definite. On an indefinite non-singular quadratic, the step d = -G^-1 g is to the stationary point of the quadratic. At some points below x*, the stationary point, this can be detected by the fact that the sign of the directional derivative in the direction of search is positive, indicating that the step would be in a direction of increasing function values, which cannot occur if G^-1 > 0. Above x*, there is no indication from the directional derivatives that a stationary point, rather than a minimum, is being approached; G must be analysed to detect negative or zero eigenvalues. In fact, algorithms for minimization using Newton's method employ a variety of devices to obtain a better direction if G is not positive definite. These include interpolating between the Newton step and the steepest descent step so that the direction of search is always downhill, or using an inversion scheme which ensures that H remains positive definite; in the latter case, if negative or zero eigenvalues of G are encountered during inversion, the inverse of a G' having all eigenvalues positive is calculated.

The direction -G^-1 g is not an estimate of a minimum location if G is not positive definite. At best it represents a downhill step in a direction opposite to a higher stationary point; at worst it could approach non-minimal stationary points when reduction of the function value would be a better action. An indefinite G gives no information about which direction is best in a search for a minimum, since no estimate of a minimum is implied.

These considerations apply to variable metric methods in the following way: since a G that is not positive definite is no advantage in determining a minimum, the quasi-Newton property is not useful where an indefinite G might be inferred from the sequence of points and gradients. Since H remains positive definite in a variable metric method, the quasi-Newton property cannot hold where G is not positive definite, but Powell's proof of convergence for the DFP algorithm indicates that convexity of the function may be necessary to bound the eigenvalues of H. It seems plausible that where G is not positive definite, the H matrix may attempt to approximate negative eigenvalues by driving some of its own eigenvalues very small, causing very small steps, or steps nearly orthogonal to the gradient, with drastic effects on convergence.

Summarizing: since Newton's method is not useful in unmodified form on non-convex regions of a function, it may be unwise to rely wholly on updating of the H matrix to produce convergence in the same situation. A test for non-convexity, using the necessary condition for strict convexity of a differentiable function given by equation 4.1,

f(x_i) > f(x_j) + g_j^t(x_i - x_j)  for all x_i ≠ x_j,    (4.1)

has been implemented and tested, and will be described in Chapter V.
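Condition 4.1 is a pointwise check, so a pair of points can certify non-convexity directly. A minimal sketch (function names and the one-dimensional example are mine, used purely for illustration):

```python
import numpy as np

def convexity_violated(fi, xi, fj, xj, gj):
    """Test equation 4.1: for a strictly convex differentiable function the
    linear extrapolation from x_j must underestimate f(x_i). Returns True
    when the pair (x_i, x_j) proves the function non-convex."""
    return fi <= fj + gj @ (xi - xj)

# Example: f(x) = x^4 - x^2 is non-convex near the origin.
f = lambda x: x[0]**4 - x[0]**2
g = lambda x: np.array([4*x[0]**3 - 2*x[0]])
xi, xj = np.array([0.0]), np.array([0.5])
print(convexity_violated(f(xi), xi, f(xj), xj, g(xj)))   # True
```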
V. NUMERICAL EXPERIMENTS

The approach taken to improve the present algorithms was to try to produce poor behaviour on specially constructed test functions, and then to try to overcome the poor performance with modifications. As mentioned in the previous chapter, convexity of the objective function seems to be closely related to the effectiveness of the variable metric updating. Non-convex functions with one minimum have been proposed, and indeed most common test functions are of this kind, but the type considered initially here was only slightly more general than convex, namely pseudoconvex. A differentiable function is convex over a convex set of points S if for any two points x_i and x_j in S equation 4.1 is satisfied; that is, any linear extrapolation underestimates the function value. A differentiable function is pseudoconvex over a convex set of points S if for any two points x_i and x_j in S,

g_j^t(x_i - x_j) ≥ 0 implies f(x_i) > f(x_j) for x_i ≠ x_j.

That is, when the directional derivative in any direction is non-negative, the function increases in that direction.

The type of function chosen for preliminary tests was f = y/(y + 1), where y = Σᵢ aᵢxᵢ². This function is pseudoconvex and has a minimum of f = 0 at x = 0. It has the property that for y > 1/3, that is f > 0.25, the Hessian has a negative eigenvalue. Preliminary testing of Fletcher's method on the two-dimensional function with y = x₁² + 100x₂² indicated that for certain starting conditions x₀ and H₀, a very large number of steps could be taken without leaving the non-convex region f > 0.25: for example, nearly 10,000 iterations.

To probe the causes of this behaviour, the directional derivatives at each point of the linear search, the eigenvalues of H, and the position and function value were printed for the first 147 iterations of the above sequence. The problem arose from the increasingly erroneous deflection of the gradient in forming the direction of search -Hg. For this particular function, a ratio of 1:10 in the eigenvalues of H would be necessary to yield a direction pointing toward the minimum, yet the eigenvalues of H after the 52nd iteration began diverging, the small eigenvalue decreasing and the large eigenvalue increasing. At the end of the 52nd iteration the eigenvalues were 0.6 and 43; at the 147th iteration, 5 x 10⁻³ and 3 x 10²; and on the last iteration (the 9989th) the eigenvalues of H were 3 x 10⁻⁵ and 3 x 10². This had the effect of producing steps which crossed the contours of the function at a decreasing angle as the direction vector d approached orthogonality to the gradient. The steps produced were not small in displacement, but the reduction in function value was very small for each step, due to the poor direction of search. The reduction in function value during this sequence was of the order of 0.1 of that expected from a linear extrapolation; therefore neither Fletcher's condition f⁺ - f ≤ ε δ^tg nor Goldstein and Price's condition f⁺ - f ≥ (1 - ε) δ^tg would have provoked any additional attempt to get a point closer to the minimum on the line. In fact, given the poor direction, the simple linear search gave a reasonable approximation to the minimum on the line, since the magnitude of the directional derivative at the new point was reduced by around 90% on each search from its starting value on that search.
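The preliminary test function and its indefiniteness threshold can be checked directly. A sketch (implementation details mine; the function, weights and threshold are those quoted above):

```python
import numpy as np

# f = y/(y+1) with y = x1^2 + 100*x2^2: pseudoconvex, minimum 0 at the
# origin, and the Hessian acquires a negative eigenvalue once f > 0.25.
a = np.array([1.0, 100.0])

def f(x):
    y = a @ (x * x)
    return y / (y + 1.0)

def hessian(x):
    y = a @ (x * x)
    dy = 2.0 * a * x                    # gradient of y
    d2y = np.diag(2.0 * a)              # Hessian of y
    # chain rule for f(y) = y/(y+1): f' = 1/(y+1)^2, f'' = -2/(y+1)^3
    return d2y / (y + 1.0)**2 - 2.0 * np.outer(dy, dy) / (y + 1.0)**3

for x in (np.array([0.2, 0.0]), np.array([1.0, 0.0])):   # y = 0.04 and y = 1.0
    print(f(x), np.linalg.eigvalsh(hessian(x)))  # negative eigenvalue once f > 0.25
```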
The poor behaviour of the updating was a t t r i b u t e d to non-convexity 26 of the function, and i t was f e l t that i f t h i s could be detected, measures could be taken to prevent i n c r e a s i n g l y i n e f f i c i e n t behaviour of the non-minimizing algorithm. The condition &*~y > 0 , which i s tested i n Fletcher's method, was not s u i t a b l e , since t h i s condition was r a r e l y detected i n the non-convex region. S i m i l a r l y the somewhat stronger conditions that 6 tg < f"*~-and 6 tg > f + - f were included i n the l i n e a r search procedure, and a simple search devised so that the interval,chosen for updating d i d not v i o l a t e these conditions for convexity. This too was r a r e l y e f f e c t i v e i n detec- . t i n g non-convexity. F i n a l l y a comparatively d r a s t i c t e s t for non-convexity was devised u t i l i z i n g the necessary condition for s t r i c t convexity,, equation 4.1. The function was assumed non-convex unless the previous n p o s i t i o n vectors, gradient vectors, and function values, together with the current point, s a t i s f i e d equation 4.1. A v a r i a b l e metric algorithm using t h i s test was defined to be i n state 2 when the equation was s a t i s f i e d over n pairs of points, and i n state 1 otherwise. The test can be performed over a set of m points, and be per-formed i n a recursive manner so that as a point i s added to the set, only the r e l a t i o n s i n c l u d i n g the new point need be tested, n p a i r s of points were used since a minimum of n steps are necessary to span the space of the n v a r i a b l e s . The non-convexity test i s defined as follows: Given 1 $ m £ n stored points s a t i s f y i n g equation 4.1, and a new point x , the r e l a t i o n s 5.1, 5.2, are tested from r new the most recent stored point to the oldest stored point. f, - f > g1* (x, - x ) k = l,...,m 5.1 k new °new K new f - f, > g5 (x - x. ) new k k new K 27 I f at some k, e i t h e r r e l a t i o n i s not s a t i s f i e d , m i s set to k, the new point i s stored, and the algorithm i s i n state 1. I f the r e l a t i o n s are s a t i s f i e d over m points and m < n, m i s incremented by 1, the new point i s stored, and the algorithm i s i n state 1. I f m = n, the oldest point i s discarded, the new point i s stored, and the algorithm i s i n state 2. Given that non-convexity can be detected, suspension of updating the H matrix was proposed as an appropriate a c t i o n . When updating i s sus-pended i n state 1, i t i s necessary to ensure that stepsizes w i l l be reasonable, since non-minimizing searches depend on the H matrix g i v i n g a reasonable i n d i c a t i o n of the proper step. To t h i s end a v a r i e t y of weak minimizing searches are proposed f o r use i n state 2. The searches were: INT, a weak quadratic i n t e r p o l a t i n g minimi-zing search; BK.T, a search which merely brackets the .minimum to obtain a lower function value; and ARB, a search which doubles the previous step-s i z e m u l t i p l i e r and accepts the r e s u l t i n g step i f i t produced a lower function value, performing quadratic i n t e r p o l a t i o n only i f necessary. In state 2, a simple search RDU, s i m i l a r to Fletcher's non-minimizing search, but using quadratic i n t e r p o l a t i o n with function values only as opposed to cubic i n t e r p o l a t i o n with d e r i v a t i v e s was proposed. The l i n e a r searches are given i n d e t a i l i n Appendix I I . 
Nine algorithms were tested on nine different functions. The choice of initial stepsize multiplier was in all cases λ₁ = 1, except for the first n searches after starting, or in state 1 where non-convexity was tested, in which cases the stepsize multiplier from the previous step was used. All algorithms used the updating formulas for H given by equations 2.4 and 2.5, where the choice of formula was by Fletcher's criterion.

Termination of the algorithms was by the conditions that:

a. the direction vector d was smaller than a specified magnitude and more than n searches had been performed; or
b. the step taken was so small as to result in no change in position; or
c. a specified number of function evaluations was exceeded.

Six algorithms used the non-convexity test to suspend updating in state 1 and, on entry to state 2, updated the H matrix over the stored steps which had not previously been used for updating. The type of linear search used in each state for these algorithms is given below.

| Algorithm | Linear search in state 1 | Linear search in state 2 |
|---|---|---|
| A | INT (quadratic minimum estimating) | INT |
| B | INT | BKT (minimum bracketing) |
| C | INT | RDU (function value reducing) |
| D | BKT | BKT |
| E | BKT | RDU |
| F | ARB (doubles stepsize multiplier, function value reducing) | RDU |

For comparison the following algorithms were also tested:

FLVMM: Fletcher's algorithm, using a non-minimizing search.
INTVM: using the quadratic search INT.
CUVM: using a cubic derivative search, as used in the Fletcher-Powell algorithm.

The test functions considered were:

Woods [20]:

f = 100(x² - y)² + (x - 1)² + (z - 1)² + 90(z² - u)² + 10.1[(y - 1)² + (u - 1)²] + 19.8(y - 1)(u - 1)

minimum f = 0 at x = y = z = u = 1; stationary point f = 7.876 at x = -0.9679, y = 0.9471, z = -0.9695, u = 0.9512; suggested starting point: x = -3, y = -1, z = -3, u = -1.

Powell [21]:

f = (x + 10y)² + 5(z - u)² + (y - 2z)⁴ + 10(x - u)⁴

minimum f = 0 at x = y = z = u = 0; suggested starting points: x = 10, y = 10, z = 10, u = -10 (Huang) and x = 3, y = -1, z = 0, u = 1 (Powell). This function has a singular Hessian at the minimum.

Rosenbrock [22]:

f = 100(x₂ - x₁²)² + (1 - x₁)²

minimum f = 0 at x₁ = x₂ = 1; suggested starting point x₁ = -1.2, x₂ = 1. This function has a steep-sided parabolic valley.

Helix:

f = 100[(x₁ - y₁)² + (x₂ - y₂)²] + 3x₃²/(x₃² + 4),  where y₁ = x₃ sin πx₃, y₂ = x₃ cos πx₃

minimum f = 0 at x = 0; suggested starting points: x₀ = (0, -1, 1), (0, 2, 2), (0, -3, 3), (0, 4, 3). This function has a steep-sided conic helical valley with axis x₃, period 2, and radius x₃. The last term gives the valley bottom a slope which increases as the function value decreases, so that for x₃ sufficiently large the Hessian along the valley bottom has a negative eigenvalue. In addition, the last term gives an increase in the ratio between the steepness of the valley walls and the slope of the valley bottom as x₃ is increased.
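Sketches of three of the literature test functions just quoted follow, for checking the stated minima; the vectorized argument handling and names are mine.

```python
import numpy as np

def woods(v):
    x, y, z, u = v
    return (100*(x**2 - y)**2 + (x - 1)**2 + (z - 1)**2 + 90*(z**2 - u)**2
            + 10.1*((y - 1)**2 + (u - 1)**2) + 19.8*(y - 1)*(u - 1))

def powell(v):
    x, y, z, u = v
    return (x + 10*y)**2 + 5*(z - u)**2 + (y - 2*z)**4 + 10*(x - u)**4

def rosenbrock(v):
    x1, x2 = v
    return 100*(x2 - x1**2)**2 + (1 - x1)**2

print(woods(np.ones(4)), powell(np.zeros(4)), rosenbrock(np.ones(2)))  # 0.0 0.0 0.0
```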
Function 2 (convex)

    f(x) = exp(½ Σᵢ₌₁ⁿ xᵢ²) − 1

    This function has a Hessian whose eigenvalues increase monotonically as a function of the distance from the minimum.

Function 3 (convex)

    f(x) = 2(½ Σᵢ₌₁ⁿ xᵢ² + 1)^(1/2) − 2

    This function has a Hessian whose eigenvalues decrease monotonically as a function of distance from the minimum.

Function 4 (pseudoconvex)

    f(x) = 4y/(y + 4)    where y = ½ Σᵢ₌₁ⁿ xᵢ²

    This function has a Hessian whose eigenvalue corresponding to the eigenvector collinear with a radius vector is negative for y > 4/3. That is, the Hessian is singular at y = 4/3, f = 1, and indefinite for y larger. The function is convex for f < 1.

Function 5 (quasiconvex)

    f(x) = y³/27 − y²/3 + y    where y = ½ Σᵢ₌₁ⁿ xᵢ²

    This function has a contour f = 1 on which g(x) = 0, as well as an annular region where the Hessian has a negative eigenvalue. The function is convex for f < 0.488.

For comparison between algorithms (since some require gradient as well as function evaluations at each point, and others only function evaluations during the linear search), a gradient evaluation was considered to be as costly as n function evaluations, to give a measure of equivalent function evaluations. Algorithms were compared on the basis of equivalent function evaluations required to reduce the function value to 10⁻¹² above the true minimum value. The functions and algorithms were coded in Fortran using double precision arithmetic throughout, and were executed on an IBM 360/67 computer.

The previously mentioned equivalence between scaling of the variables and the initial estimate of H allowed the radially symmetric test functions to be used, giving different diagonal H₀ and x₀ to the algorithms to obtain results equivalent to the corresponding ellipsoidally contoured functions with starting H₀ = I, a convenience in evaluating the effects of scaling of the variables.

Failures of Fletcher's method, FLVMM, performance of the best non-convexity checking algorithm, and performance of the best minimizing search algorithm are shown in Table I. On test function 4, the failures occur when FLVMM is ineffective in leaving the non-convex region f > 1 within the first few iterations. A large number of tests on test function 4 demonstrated that the non-convexity test was not very effective in detecting that condition when used with the non-minimizing searches RDU and BKT (algorithms B, C, D, E and F).
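The scaling device described above is easy to reproduce. The sketch below (Python; the helper names and the choice of test function 4 with an H₀ from Table I are illustrative assumptions) shows the correspondence actually used: a diagonal scaling A = D^(1/2) turns the radially symmetric member F(½xᵗx) into an ellipsoidally contoured function, and by Appendix I a run on the radial function with H₀ = D is step-for-step equivalent to a run on the ellipsoidal one with H₀ = I.

    import numpy as np

    # Radially symmetric member of the family: f(x) = F(0.5 x'x).
    def radial_f(F):
        return lambda x: F(0.5 * (x @ x))

    # Ellipsoidally contoured counterpart: f_bar(x) = F(0.5 (Ax)'(Ax)),
    # with diagonal A = D**0.5.
    def ellipsoidal_f(F, D):
        A = np.sqrt(D)
        return lambda x: F(0.5 * ((A * x) @ (A * x)))

    F4 = lambda y: 4.0 * y / (y + 4.0)           # test function 4
    D  = np.array([1.0, 1.0, 1.0, 1e-3, 1e-3])   # a diagonal H0 from Table I
    x0 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

    # By Appendix I: minimizing radial_f(F4) from x0 with H0 = diag(D) is
    # equivalent to minimizing ellipsoidal_f(F4, D) from x0 / np.sqrt(D)
    # with H0 = I.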
TABLE I: FAILURES OF FLETCHER'S ALGORITHM WITH COMPARISON TO OTHER ALGORITHMS

Number of searches "Iter" and number of equivalent function evaluations shown are those required to reduce the function value to 10⁻¹² above the true minimum value unless otherwise indicated. Entries are given as Iter / Equiv F Evals.

Function 4, n = 5; x₀ = 2, 2, 2, 2, 2; H₀ = 1, 1, 1, .001, .001
    FLVMM: 493 / 3005*    best non-convexity test alg. E: 86 / 685    best minimizing search alg. INTVM: 7 / 90

Function 4, n = 5; x₀ = 1, 2, 3, 4, 5; H₀ = 1, 1, 1, .001, .001
    FLVMM: 497 / 3000*    best non-convexity test alg.: 377 / 3000*    INTVM: 12 / 148

Function 4, n = 5; x₀ = 20, 10, 50, .1, 10⁻⁴; H₀ = .1, .5, 1.0, 1.01, 2
    FLVMM: 468 / 3005*    best non-convexity test alg.: 23 / 202    INTVM: 19 / 212

Function 4, n = 2; x₀ = 10, 10; H₀ = 1, .01
    FLVMM: 483 / 1497*    best non-convexity test alg.: 73 equiv F evals    INTVM: 79 equiv F evals

Woods; x₀ = −3, −1, −3, −1; H₀ = 10⁻⁷, 10⁻⁷, 10⁻⁷, 10⁻⁷
    FLVMM: 498 / 2495*    best non-convexity test alg.: 27 / 258    INTVM: 21 / 213

* Algorithm failed to converge within the allotted number of equivalent function evaluations.

TABLE II: PERFORMANCE OF ALGORITHMS ON FUNCTION 4

This function is convex for f < 1. The effectiveness of the non-convexity test can be judged by observing at what function value state 1 is finally left. Number of searches "Iter" and equivalent function evaluations indicate the number required to reduce the function value to 10⁻¹² above the true minimum value (0.0), unless a final F is given.

x₀ = 1, 2, 3, 4, 5; f₀ ≈ 3.49; H₀ = 1, 10⁻¹, 10⁻³, 10⁻⁵, 10⁻⁷
    CUVM    21 / 576
    INTVM   28 / 246
    FLVMM   435 / 1482*, final F = 3.44
    A       28 / 372, state 1 finally left at iter 25, f = .003
    B       150 / 1504*, final F = 3.34, state 1 left at iter 6, f = 3.45
    C       247 / 1504*, final F = 3.34, state 1 left at iter 6, f = 3.45
    D       151 / 1502*, final F = 3.34, state 1 left at iter 6, f = 3.44
    E       247 / 1503*, final F = 3.34, state 1 left at iter 6, f = 3.44
    F       249 / 1500*, final F = 3.44, state 1 left at iter 6, f = 3.47

x₀ = 1, 2, 3, 4, 5; f₀ ≈ 3.49; H₀ = 1, 10, 10, 10, 1
    CUVM    18 / 348
    INTVM   17 / 171
    FLVMM   213 / 1499*, final F = 2.65
    A       41 / 365, state 1 left at iter 36, f = .142
    B       60 / 496, state 1 left at iter 51, f = .72
    C       98 / 671, state 1 left at iter 27, f = 2.84
    D       30 / 247, state 1 left at iter 23, f = .096
    E       32 / 250, state 1 left at iter 24, f = .216
    F       35 / 264, state 1 left at iter 29, f = .12

x₀ = 1, 2, 3, 4, 5; f₀ ≈ 3.49; H₀ = 10⁻⁷, 10⁻³, 1, 10³, 10⁷
    CUVM    22 / 606
    INTVM   22 / 224
    FLVMM   280 / 1496*, final F = 1.53
    A       43 / 426, state 1 left at iter 32, f = 1.55
    B       122 / 1151, state 1 left at iter 26, f = 2.86
    C       237 / 1502*, final F = 1.54, state 1 left at iter 36, f = 1.81
    D       182 / 1502*, final F = .999, state 1 left at iter 162, f = 1.53
    E       243 / 1501*, final F = 1.53, state 1 left at iter 23, f = 1.54
    F       243 / 1499*, final F = 1.48, state 1 left at iter 30, f = 2.03

x₀ = 1.1, .5, 1, 1.5, .01; f₀ ≈ 1.48; H₀ = 1, 10⁻¹, 10⁻³, 10⁻⁵, 10⁻⁷
    CUVM    15 / 378
    INTVM   16 / 248
    FLVMM   203 / 1494*, final F = .85
    A       17 / 218, state 1 left at iter 6, f = 1.14
    B       47 / 437, state 1 left at iter 6, f = 1.14
    C       248 / 1502*, final F = .845, state 1 left at iter 6, f = 1.15
    D       45 / 414, state 1 left at iter 6, f = 1.15
    E       248 / 1504*, final F = .838, state 1 left at iter 6, f = 1.15
    F       209 / 1202, state 1 left at iter 6, f = 1.16

x₀ = 1.1, .5, 1, 1.5, .01; f₀ ≈ 1.48; H₀ = 10⁻⁷, 10⁻⁵, 1.0, 10⁵, 10⁷
    CUVM    14 / 396
    INTVM   14 / 191
    FLVMM   238 / 1494*, final F = .516
    A       18 / 218, state 1 left at iter 6, f = .95
    B       32 / 296, state 1 left at iter 6, f = .94
    C       249 / 1500*, final F = .516, state 1 left at iter 6, f = .94
    D       35 / 326, state 1 left at iter 6, f = .94
    E       245 / 1501*, final F = .516, state 1 left at iter 6, f = .94
    F       245 / 1498*, final F = .516, state 1 left at iter 6, f = .94

* Algorithm failed to converge within the allotted number of equivalent function evaluations.

TABLE III: SENSITIVITY OF FLETCHER'S ALGORITHM TO SCALING

Number of searches "Iter" and number of equivalent function evaluations shown are those required to reduce the function value to 10⁻¹² above the true minimum unless otherwise indicated. Entries are given as Iter / Equiv F Evals.

Function 3, n = 5; x₀ = 1, 2, 3, 4, 5; H₀ = 1, 10⁻¹, 10⁻³, 10⁻⁵, 10⁻⁷
    FLVMM: 498 / 2994*    best non-convexity test alg.: 22 / 291    INTVM: 21 / 286

Function 3, n = 5; x₀ = 1, 2, 3, 4, 5; H₀ = 10⁻⁷, 10⁻⁵, 1.0, 10⁵, 10⁷
    FLVMM: 490 / 2994*    best non-convexity test alg.: 18 / 210    INTVM: 17 / 211

Function 3, n = 5; x₀ = 1, 2, 3, 4, 5; H₀ = 10³, 10, 1.0, 10⁻¹, 10⁻³
    FLVMM: 198 / 1212    best non-convexity test alg.: 21 / 224    INTVM: 16 / 170

Function 1, n = 5; x₀ = 1, 2, 3, 4, 5; H₀ = 10, 10⁻¹, 10⁻³, 10⁻⁵, 10⁻⁷
    FLVMM: 54 / 330    best non-convexity test alg.: 10 / 136    INTVM: 77 equiv F evals

Function 1, n = 5; x₀ = 1, 2, 3, 4, 5; H₀ = 10⁻⁷, 10⁻⁵, 1.0, 10⁵, 10⁷
    FLVMM: 31 / 198    best non-convexity test alg.: 135 equiv F evals    INTVM: 107 equiv F evals

Function 1, n = 5; x₀ = 1, 2, 3, 4, 5; H₀ = 10³, 10, 1.0, 10⁻¹, 10⁻³
    FLVMM: 23 / 150    best non-convexity test alg.: 99 equiv F evals    INTVM: 67 equiv F evals

* Algorithm failed to converge within the allotted number of equivalent function evaluations.
TABLE IV: INEFFECTIVENESS OF SUSPENDING H UPDATING

Number of searches "Iter" and number of equivalent function evaluations shown are those required to reduce the function to 10⁻¹² above the true minimum value unless otherwise indicated. Entries are given as Iter / Equiv F Evals.

    Test function   x₀                H₀ (diagonal)          FLVMM          Alg. A         INTVM
    Helix           0, −3, 3          1, 1, 1                242 / 1224     226 / 2000*    145 / 1040
    4               1, 2, 3, 4, 5     1, 1, 1, .001, .001    497 / 3000*    374 / 3000*    12 / 148
    4               2, 2, 2, 2, 2     1, 1, 1, .001, .001    493 / 3005*    278 / 2260     7 / 90

* Algorithm failed to converge within the allotted number of equivalent function evaluations.

In Figures 1 to 9 the vertical axis represents the logarithm of the function difference from the minimum value, the horizontal axis has units of equivalent function evaluations, and marked points represent values at the beginning of each linear search.

FIGURE 1. Comparison of four algorithms - Function 1. Initial conditions: x₀ = (1.0, 2.0, 3.0, 4.0, 5.0), H₀ = diag(10³, 10, 1.0, 10⁻¹, 10⁻⁵). Note the quadratic convergence of INTVM and CUVM.

FIGURE 2. Comparison of four algorithms - Function 2. Initial conditions: x₀ = (0.1, 0.2, 0.3, 0.4, 0.5), H₀ = diag(1.0, 10⁻¹, 10⁻³, 10⁻⁵, 10⁻⁷).

FIGURE 3. Comparison of four algorithms - Function 3. Initial conditions: x₀ = (1.0, 2.0, 3.0, 4.0, 5.0), H₀ = diag(10³, 10, 1.0, 10⁻¹, 10⁻³). Note that the final convergence rate of FLVMM compares favorably to the other algorithms, but that FLVMM is very slow in attaining that rate.

FIGURE 4. Comparison of four algorithms - Function 4. Initial conditions: x₀ = (1.1, 0.5, 1.0, 1.5, 0.01), H₀ = diag(10⁵, 10, 1.0, 10⁻¹, 10⁻⁵).

FIGURE 5. Comparison of four algorithms - Function 5. Initial conditions: x₀ = (1.1, 0.5, 1.0, 1.5, 0.01), H₀ = diag(10⁵, 10, 1.0, 10⁻¹, 10⁻³). Note that the final convergence of FLVMM is similar to the other algorithms after the initially slow progress.

FIGURE 6. Comparison of four algorithms - Function Powell. Initial conditions: x₀ = (3.0, −1.0, 0.0, 1.0), H₀ = I. Note that final convergence of the algorithms on this function is less rapid than on the other functions due to the singular Hessian at the minimum.
FIGURE 7. Comparison of four algorithms - Function Woods. Initial conditions: x₀ = (−3, −1, −3, −1), H₀ = I.

FIGURE 8. Comparison of four algorithms - Function Rosenbrock. Initial conditions: x₀ = (−1.2, 1.0), H₀ = I. Note that the more costly search used in CUVM makes this algorithm much less efficient than the others.

FIGURE 9. Comparison of four algorithms - Function Helix. Initial conditions: x₀ = (0.0, −1.0, 1.0), H₀ = I.

The first n or more steps were taken with the non-convexity condition on, and if the non-convex linear search was effective in reaching the convex region f < 1 in those steps, the other linear search usually reached the minimum within the allotted equivalent function evaluations. When the non-convexity condition was turned off in the non-convex region, the condition was almost never detected during RDU, and often was not detected during BKT (see the first three examples in Table II).

It was also found that the non-minimizing searches as used in FLVMM and the algorithms C, E, and F could be very inefficient even within the convex region of the function (see the last two examples in Table II). This inefficiency on convex functions occurred when the initial H had very small eigenvalues, even for a quadratic function. Examples of the sensitivity of FLVMM to small H₀ are shown in Table III for a variety of convex and non-convex functions. Note the extremely poor performance on test function 3.

The effectiveness of suspending the updating of H on detection of non-convexity has not been demonstrated. In fact, Table IV gives instances where this action has been detrimental to convergence: INTVM is identical with A except that A suspends updating on detection of non-convexity, and INTVM is shown to be much superior in these examples. The failures of A are attributed to the sensitivity of steepest descent methods to scaling of the variables, since A is equivalent to a steepest descent method when updating is suspended.
The successes of this algorithm are felt to reflect the value of the minimizing search, rather than the value of conditional updating, since the algorithm INTVM was in all cases successful, whereas the other conditional updating algorithms were prone to failure.

Figures 1 to 9 compare the progress of the four algorithms CUVM, INTVM, FLVMM, and A (the most reliable of the conditional updating algorithms). Figures 6 to 9 confirm that, for a reasonable scaling of the variables, Fletcher's non-minimizing search is more efficient than a cubic minimizing search using gradients during the search, as reported by Fletcher [5]. However, the more economical linear search used in INTVM can in most cases match the efficiency of Fletcher's method without risking the previously noted pathological behaviour.

Suggestions for Further Work

The relative advantage of a non-minimizing search is lessened if the comparable minimizing search is a direct search rather than a derivative search. In its most efficient operation, the cubic search required 2n + 2 equivalent function evaluations per search, the non-minimizing search n + 1, and the quadratic search n + 2. This potential saving of only one equivalent function evaluation per search over a direct search does not give a non-minimizing search the same potential for increased efficiency, and it remains open whether non-minimizing searches should be pursued further. Murtagh and Sargent [23] have reported difficulties with a weak quadratic search, but this may be caused by the lack of a constraint on the reduction of stepsize in the event that F₁ > F₀ (discussed in Chapter III). Assuming that non-minimizing searches are to be considered further, the condition of Goldstein and Price should be used to prevent very small steps and slow convergence on convex functions (a sketch of such a test is given at the end of this section). More tests should be carried out comparing the quadratic and cubic weak searches in algorithms on a variety of test functions.

For general functions, it still seems that the use of a weak minimizing search is necessary for non-convex regions. The property of positive definiteness of H may have to be dispensed with for non-minimizing searches, since this was in part a cause of the approach of the step path toward function contours. The non-convexity test fails because the directions chosen for search were not directions which would have allowed non-convexity to be detected.

A non-minimizing linear search algorithm was suggested by Hestenes [24] which for n steps uses a formula guaranteeing conjugate directions for a quadratic but having no quasi-Newton property (i.e., H₀ = 0), and then calculates the (n+1)st value of H from the stored sequence of steps by a formula having the quasi-Newton property. Powell thinks that this may lead to an effective algorithm for convex functions. The sequence of points and gradients stored could be tested for non-convexity, and a minimizing search used if this condition were satisfied. Use of an H not guaranteed to be positive definite might give more freedom in direction and therefore make non-convexity easier to detect with the test proposed in this thesis.
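The Goldstein-Price acceptance test referred to above can be written as a two-sided inequality. The following is a minimal sketch; the constant c is a free parameter in (0, 1/2) and the function name is an assumption, not a value or name taken from the thesis:

    def goldstein_ok(f0, f_new, alpha, dg, c=0.25):
        # dg = d'g < 0 is the directional derivative at the search origin.
        # The right-hand inequality enforces sufficient decrease; the
        # left-hand one rejects steps so short that almost none of the
        # predicted linear decrease is realized.
        return f0 + (1.0 - c) * alpha * dg <= f_new <= f0 + c * alpha * dg

A weak search built on this test would reduce α when the right-hand inequality fails and increase it when the left-hand one fails, which is precisely the constraint whose absence is conjectured above to explain the reported difficulties.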
In a similar vein, there is no requirement that steps be determined by the H obtained from the convex class of formulas in order to update using these formulas. It may be useful to use a non-positive-definite formula such as the Rank 1 formula [5] to determine an H to produce directions under convex conditions, and to update two estimates of H, one using the convex class and one using the Rank 1 formula. When a non-convex condition is detected, the algorithm would switch to minimizing searches and use the convex class formulas until the non-convexity condition is off. Then the positive definite H would be used as the starting H for both updating schemes.

Although not discussed previously in the thesis, reliable termination criteria are needed for algorithms. The usual criterion of displacements becoming very small gives no warning if the point found is actually a non-minimal stationary point. Powell has suggested the use of small perturbations to a supposed minimum and restarting the algorithm at these points. Given this sort of termination procedure, it would be possible to measure the extent to which the Hessian is constant through the size of the corrections to the H matrix. If corrections to H were vanishing around the given point, this would provide necessary and sufficient conditions for a minimum, since a zero (or very small) gradient together with a positive definite and constant second derivative matrix defines a local minimum.

VI. CONCLUSIONS

The problem of choosing an initial estimate of H has been shown to be equivalent to the choice of scaling of the variables of the objective function, and a family of radially symmetric test functions was derived which allowed results equivalent to various scalings to be obtained through different initial H values.

Test functions have been given which provoke extreme inefficiency of Fletcher's algorithm and of other algorithms using non-minimizing searches in the one-dimensional search problem when updating is performed using formulas with a quasi-Newton property and maintaining H positive definite. For this reason, algorithms of this kind are not suitable for use on general twice differentiable functions of one minimum. Their application should be restricted to convex functions (or regions which are convex), and even there, steps must be taken to ensure that displacements are not too small, since an initial H estimate with very small eigenvalues can cause slow convergence due to very small steps.

It has been demonstrated that even for non-convex functions, updating the H matrix is of use, since suspension of updating leads to inefficiency in those cases.

An algorithm incorporating a weak quadratic interpolating linear search using only function values (except for the gradient available at the origin of the search) has been demonstrated to be more efficient on a number of test functions than a similar algorithm incorporating a weak linear search using both function values and gradients.
The direct linear search algorithm has been shown to be as efficient as Fletcher's recent algorithm in most test cases, and is not prone to the same problems as Fletcher's method.

Variable metric algorithms must have some search which approximates the minimum in successive directions of search for reliable and efficient operation on non-convex functions.

REFERENCES

1. W.C. Davidon, "Variable Metric Method for Minimization", Argonne National Laboratory Report ANL-5990.
2. J. Kowalik and M.R. Osborne, Methods for Unconstrained Optimization Problems, American Elsevier, New York, 1968.
3. R. Fletcher and M.J.D. Powell, "A Rapidly Convergent Descent Method for Minimization", Comp. J. 6, pp 163-168, 1963.
4. H.Y. Huang, "A Unified Approach to Quadratically Convergent Algorithms for Function Minimization", JOTA, pp 405-423, 1970.
5. R. Fletcher, "A New Approach to Variable Metric Algorithms", Comp. J. 13, pp 317-322, 1970.
6. M. Wells, "Algorithm 251: Function Minimization", CACM 8, pp 169-170, 1965.
7. R. Fletcher, "Certification of Algorithm 251", CACM 9, pp 686-687, 1966.
8. P.D.G. Barkham, private communication.
9. H.Y. Huang and A.V. Levy, "Numerical Experiments on Quadratically Convergent Minimization Methods", Comp. J. 13, pp 269-282, 1970.
10. L.C.W. Dixon, "Variable Metric Algorithms: Necessary and Sufficient Conditions for Identical Behaviour on Non-Quadratics", Technical Report No. 26, Numerical Optimisation Centre, The Hatfield Polytechnic, 1970.
11. C.G. Broyden, "Quasi-Newton Methods and Their Application to Function Minimization", Math. Comp. 21, pp 368-381, 1967.
12. D. Goldfarb, "A Family of Variable Metric Methods Derived by Variational Means", Math. Comp. 24, pp 23-26, 1970.
13. D.F. Shanno, "Conditioning of Quasi-Newton Methods for Function Minimization", Math. Comp. 24, pp 647-656, 1970.
14. M.J.D. Powell, "Recent Advances in Unconstrained Optimization", Mathematical Programming 1, pp 26-57, 1971.
15. A.A. Goldstein and J.F. Price, "An Effective Algorithm for Minimization", Num. Math. 10, pp 184-189, 1967.
16. W.H. Swann, "Report on the Development of a New Direct Search Method of Optimization", I.C.I. Ltd., Central Instrument Laboratory Research Note 64/3, 1964.
17. A.V. Fiacco and G.P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques, John Wiley and Sons Inc., New York, 1968.
18. M.J.D. Powell, "On the Convergence of the Variable Metric Algorithm", AERE T.P. 382, 1969.
19. S.M. Goldfeld, R.E. Quandt and H.F. Trotter, "Maximization by Improved Quadratic Hill-Climbing and Other Methods", Econometric Research Memo No. 95, Princeton University, 1968.
20. A.R. Colville, "A Comparative Study on Nonlinear Programming Codes", IBM Tech. Rep. No. 320-2949.
21. M.J.D. Powell, "An Iterative Method for Finding Stationary Values of a Function of Several Variables", Comp. J. 5, p 147, 1962.
22. H.H. Rosenbrock, "An Automatic Method for Finding the Greatest or the Least Value of a Function", Comp. J. 3, p 175, 1960.
23. B.A. Murtagh and R.W.H. Sargent, "Computational Experience with Quadratically Convergent Minimization Methods", Comp. J. 13, pp 185-194, 1970.
24. M.R. Hestenes, "Multiplier and Gradient Methods", in: Computing Methods in Optimization Problems 2, Eds. L.A. Zadeh et al., Academic Press, New York, 1969.
25. M.J.D. Powell, "A Survey of Numerical Methods for Unconstrained Optimization", SIAM Review 12, pp 79-96, 1970.

APPENDIX I - EQUIVALENCE OF TRANSFORMATION OF VARIABLES AND TRANSFORMATION OF INITIAL H ESTIMATE

Given:
    f(x), a twice differentiable function of x;
    g(x), the gradient of f(x);
    H, an n × n symmetric positive definite matrix;
and the following variable metric algorithm:

    d = −Hg
    δ = αd
    x⁺ = x + δ
    γ = g⁺ − g
    H⁺ = H + δδᵗ/(δᵗγ) − HγγᵗH/(γᵗHγ) + φ vvᵗ,  where v = (γᵗHγ)^(1/2) [δ/(δᵗγ) − Hγ/(γᵗHγ)]

(a re-arrangement of Fletcher's one-parameter family).

Define a new function with transformed variables:

    f̄(x̄) = f(Ax̄) = f(x),  x = Ax̄,

where A is a non-singular linear transformation. Then

    ḡ(x̄) = Aᵗ g(Ax̄),

since ∂f̄/∂x̄ᵢ = Σⱼ (∂f/∂xⱼ)(∂xⱼ/∂x̄ᵢ) = Σⱼ Aⱼᵢ (∂f/∂xⱼ).

The algorithm steps are related in the following way:

    d̄(x̄) = −H̄ ḡ(x̄) = −H̄ Aᵗ g(Ax̄)
    x̄⁺ = x̄ + δ̄
    γ̄ = Aᵗ (g(Ax̄⁺) − g(Ax̄))

with the same updating formula, with H̄, δ̄, γ̄ substituted for H, δ, γ.

If we choose starting conditions x̄₀ = A⁻¹x₀ and H̄₀ = A⁻¹H₀(A⁻¹)ᵗ, where the subscript refers to the iteration, then

    f̄(x̄₀) = f(AA⁻¹x₀) = f(x₀)
    ḡ(x̄₀) = Aᵗ g(AA⁻¹x₀) = Aᵗ g(x₀)
    d̄(x̄₀) = −H̄₀ ḡ(x̄₀) = −A⁻¹H₀(A⁻¹)ᵗ Aᵗ g(x₀) = A⁻¹ d(x₀)
    F̄(λ) = f̄(x̄₀ + λd̄₀) = f(A(A⁻¹x₀ + λA⁻¹d₀)) = f(x₀ + λd₀) = F(λ).

Since the function F̄(λ) along the direction d̄₀ is equivalent to the function F(λ) along the direction d₀, an identical choice ᾱ = α will result from any linear search strategy; therefore

    δ̄(x̄₀) = ᾱ d̄₀ = A⁻¹ α d₀ = A⁻¹ δ₀

and

    x̄₁ = x̄₀ + δ̄₀ = A⁻¹(x₀ + δ₀) = A⁻¹ x₁.

Therefore the first step is invariant under the transformation. Now consider the updating:

    γ̄ = Aᵗ(g₁ − g₀) = Aᵗγ
    H̄₀γ̄ = A⁻¹H₀(A⁻¹)ᵗ Aᵗγ = A⁻¹ H₀γ
    δ̄ᵗγ̄ = δᵗ(A⁻¹)ᵗ Aᵗγ = δᵗγ
    γ̄ᵗH̄₀γ̄ = γᵗ A A⁻¹H₀(A⁻¹)ᵗ Aᵗγ = γᵗH₀γ
    v̄ = (γ̄ᵗH̄₀γ̄)^(1/2) [δ̄/(δ̄ᵗγ̄) − H̄₀γ̄/(γ̄ᵗH̄₀γ̄)] = A⁻¹v.

Substituting into the updating formula gives

    H̄₁ = A⁻¹H₀(A⁻¹)ᵗ + A⁻¹δδᵗ(A⁻¹)ᵗ/(δᵗγ) − A⁻¹H₀γγᵗH₀(A⁻¹)ᵗ/(γᵗH₀γ) + φ A⁻¹vvᵗ(A⁻¹)ᵗ
        = A⁻¹ [H₀ + δδᵗ/(δᵗγ) − H₀γγᵗH₀/(γᵗH₀γ) + φvvᵗ] (A⁻¹)ᵗ
        = A⁻¹ H₁ (A⁻¹)ᵗ.

Therefore the next and all subsequent steps are invariant, since the same argument can be repeated for all subsequent steps. That is, the sequence of steps {x̄ₖ} on the transformed function f̄(x̄) = f(Ax̄) may be obtained from the sequence {xₖ} simply by x̄ₖ = A⁻¹xₖ, for any n × n non-singular linear transformation A, provided that x̄₀ = A⁻¹x₀ and H̄₀ = A⁻¹H₀(A⁻¹)ᵗ. The sequence {fₖ} remains unchanged.
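The invariance can be checked numerically. The following is a minimal Python sketch; the quadratic objective, the choice of the DFP member (φ = 0), the exact quadratic line search, and all names are assumptions made for the check, not part of the thesis:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 4
    M = rng.normal(size=(n, n))
    Q = M @ M.T + n * np.eye(n)            # SPD Hessian: f(x) = 0.5 x'Qx
    grad = lambda x: Q @ x

    def run(g, x, H, iters=3):
        # DFP member of the family (phi = 0) with an exact search; on a
        # quadratic, alpha = -g'd / d'Gd and d'Gd = d'(g(x + d) - g(x)).
        xs = [x.copy()]
        for _ in range(iters):
            gr = g(x)
            d = -H @ gr
            alpha = -(gr @ d) / (d @ (g(x + d) - gr))
            delta = alpha * d
            x = x + delta
            gamma = g(x) - gr
            Hg = H @ gamma
            H = (H + np.outer(delta, delta) / (delta @ gamma)
                   - np.outer(Hg, Hg) / (gamma @ Hg))
            xs.append(x.copy())
        return np.array(xs)

    A = np.diag([1.0, 10.0, 0.1, 5.0])
    Ainv = np.linalg.inv(A)
    g_bar = lambda xb: A.T @ grad(A @ xb)  # gradient of f_bar(x_bar) = f(A x_bar)

    x0 = rng.normal(size=n)
    xs = run(grad, x0, np.eye(n))
    xbs = run(g_bar, Ainv @ x0, Ainv @ np.eye(n) @ Ainv.T)
    print(np.abs(xbs - xs @ Ainv.T).max())  # rounding level: x_bar_k = Ainv x_k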
APPENDIX II - LINEAR SEARCHES

In all searches, the following are specified by the calling variable metric algorithm:

    α            initial stepsize multiplier estimate
    x_old        origin of the linear search
    g_old        gradient at x_old
    f_old        function value at x_old
    d            direction of search
    function(x)  supplies the value of the test function at x
    gradient(x)  supplies the value of the gradient at x
    dg = dᵗg_old
    k            the number of searches performed

At the end of each search, the following are returned to the calling variable metric algorithm:

    α            chosen stepsize multiplier
    x            the new point
    g            the new gradient
    f            the new function value

Cubic Search (used in CUVM)

    f_b = f_old
    g_b = dg
    x = x_old
step:
    f_a = f_b
    g_a = g_b
    x = x + αd
    f = function(x)
    f_b = f
    g = gradient(x)
    g_b = gᵗd
    if (g_b > 0 ∨ f_b > f_a) go to interpolate
    α = 4α
    go to step
interpolate:
    z = 3(f_a − f_b)/α + g_a + g_b
    w = (z² − g_a g_b)^(1/2)
    λ = α (g_b + w − z) / (g_b − g_a + 2w)
    g_s = g
    x = x − λd
    f = function(x)
    g = gradient(x)
    if (f ≤ f_a ∧ f ≤ f_b) return
    if (f_b ≥ f_a) go to repeat
    x = x + λd
    f = f_b
    g = g_s
    return
repeat:
    g_b = gᵗd
    α = α − λ
    if (g_b < 0 ∧ α < 10⁻⁶ ∧ k > n) error exit
    go to interpolate

INT (quadratic interpolating)

    x = x_old + αd
    f = function(x)
    if (f ≥ f_old) go to derivative interpolation
    α_a = 0
    f_a = f_old
    α_c = α
    f_c = f
accelerate:
    α_b = α_c
    f_b = f_c
    α_c = 2α_c
    x = x_old + α_c d
    f_c = function(x)
    if (f_c < f_b) go to accelerate
direct interpolation:
    α = ½ [(α_b² − α_c²) f_a + (α_c² − α_a²) f_b + (α_a² − α_b²) f_c]
          / [(α_b − α_c) f_a + (α_c − α_a) f_b + (α_a − α_b) f_c]
    x = x_old + αd
    f = function(x)
    if (f < f_b) go to gradient
    α = α_b
    x = x_old + αd
    f = f_b
    go to gradient
derivative interpolation:
    α = α max[0.1, dg α / (2(dg α + f_old − f))]
    x = x_old + αd
    f = function(x)
    if (f > f_old) go to derivative interpolation
    if (f = f_old ∧ x = x_old) insignificance error exit
gradient:
    g = gradient(x)
    return

BKT (minimum bracketing)

    x = x_old + αd
    f = function(x)
    if (f ≥ f_old) go to interpolate
    α_a = α
accelerate:
    α_a = α_a + α
    x_a = x_old + α_a d
    f_a = function(x_a)
    if (f_a > f) go to gradient
    α = α_a
    f = f_a
    x = x_a
    go to accelerate
interpolate:
    α = α max[0.1, dg α / (2(dg α + f_old − f))]
    x = x_old + αd
    f = function(x)
    if (f > f_old) go to interpolate
    if (f = f_old ∧ x = x_old) insignificance error exit
    if (f = f_old) go to interpolate
gradient:
    g = gradient(x)
    return

RDU (function value reducing)

step:
    x = x_old + αd
    f = function(x)
    if (f = f_old ∧ x = x_old) insignificance error exit
    if (f − f_old < 10⁻⁴ α dg) go to gradient
    α = α max[0.1, min(0.5, dg α / (2(dg α − f + f_old)))]
    go to step
gradient:
    g = gradient(x)
    return

Fletcher's Search (used in FLVMM)

step:
    x = x_old + αd
    f = function(x)
    g = gradient(x)
    if (f − f_old < 10⁻⁴ α dg) return
    dg₂ = dᵗg
    z = 3(f_old − f)/α + dg + dg₂
    w = (z² − dg dg₂)^(1/2)
    α = α max[0.1, 1 − (dg₂ + w − z)/(dg₂ − dg + 2w)]
    go to step

ARD

    α = 2α
step:
    x = x_old + αd
    f = function(x)
    if (f = f_old ∧ x = x_old) insignificance error exit
    if (f − f_old < 10⁻⁴ α dg) go to gradient
    α = α max[0.1, min(0.5, dg α / (2(dg α − f + f_old)))]
    go to step
gradient:
    g = gradient(x)
    return
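Tying the appendices together, the calling algorithm itself can be sketched as follows. This is Python rather than the thesis's Fortran; the search shown is the RDU-style acceptance test, the update is the convex-class formula in the re-arranged form of Appendix I with a fixed φ rather than Fletcher's switching criterion, and all names are assumptions:

    import numpy as np

    def update_h(H, delta, gamma, phi):
        # One-parameter convex-class update (re-arranged form of Appendix I);
        # phi = 0 gives the DFP member, phi = 1 the BFGS member.
        Hg = H @ gamma
        dty = delta @ gamma              # delta' gamma (must be positive)
        yHy = gamma @ Hg                 # gamma' H gamma
        v = np.sqrt(yHy) * (delta / dty - Hg / yHy)
        return (H + np.outer(delta, delta) / dty
                  - np.outer(Hg, Hg) / yHy
                  + phi * np.outer(v, v))

    def variable_metric(f, grad, x, H, phi=1.0, tol=1e-10, max_iter=500):
        fx, g = f(x), grad(x)
        for _ in range(max_iter):
            d = -H @ g
            if np.linalg.norm(d) < tol:
                break
            alpha, dg = 1.0, d @ g       # dg < 0 while H is positive definite
            while True:                  # RDU-style weak reduction search
                x_new = x + alpha * d
                f_new = f(x_new)
                if f_new - fx < 1e-4 * alpha * dg:
                    break                # the 1e-4 acceptance test
                q = dg * alpha / (2.0 * (dg * alpha - f_new + fx))
                alpha *= max(0.1, min(0.5, q))
            g_new = grad(x_new)
            delta, gamma = x_new - x, g_new - g
            if delta @ gamma > 0:        # keep H positive definite
                H = update_h(H, delta, gamma, phi)
            x, fx, g = x_new, f_new, g_new
        return x, fx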
APPENDIX III - A FAMILY OF RADIALLY SYMMETRIC TEST FUNCTIONS

Define

    y = ½ Σᵢ₌₁ⁿ xᵢ² = ½ xᵗx

and, for radial symmetry, let f(x) = F(y). The gradient of f(x) is

    gᵢ(x) = (dF/dy)(∂y/∂xᵢ) = (dF/dy) xᵢ,

that is,

    g(x) = (dF/dy) x,                                                  (8.1)

and the Hessian is

    Gᵢⱼ = ∂gᵢ(x)/∂xⱼ = (d²F/dy²) xᵢxⱼ + (dF/dy) δᵢⱼ    (δᵢⱼ the Kronecker delta).

To simplify the development, we note that the gradient is collinear with a radius vector. Symmetry allows us to let x = (r, 0, ..., 0)ᵗ without loss of generality. The eigenvalue μ_r of G with eigenvector p = (1, 0, ..., 0)ᵗ can be solved for simply, and similarly for the tangential eigenvalue μ_t with eigenvectors τ = (0, ..., 1, ..., 0)ᵗ, by substitution of 8.1 into the eigenvalue equation

    μp = Gp.                                                           (8.2)

This gives μ_r = (d²F/dy²) r² + dF/dy, but r² = 2y, so

    μ_r = 2y (d²F/dy²) + dF/dy                                         (8.3)

and similarly

    μ_t = dF/dy.                                                       (8.4)

We impose the following conditions on f(x), with corresponding conditions on F(y):

1. Unique minimum at x = 0, g(0) = 0: implies dF/dy defined at y = 0 and dF/dy > 0 for y > 0.
2. Minimum value f(0) = 0: implies F(0) = 0.
3. G at the minimum equal to I, the identity matrix: implies dF/dy = 1 at y = 0.

Test functions were derived meeting the above conditions.

Function 1 (quadratic)

    F(y) = y
    μ_r = 1
    μ_t = 1

Function 2 (convex)

    F(y) = e^y − 1
    dF/dy = e^y
    d²F/dy² = e^y

and from 8.3 and 8.4,

    μ_r = 2y e^y + e^y = e^y (1 + 2y)
    μ_t = e^y

Function 3 (convex)

    F(y) = 2(y + 1)^(1/2) − 2
    dF/dy = (y + 1)^(−1/2)
    d²F/dy² = −1 / (2(y + 1)^(3/2))

    μ_r = −y/(y + 1)^(3/2) + 1/(y + 1)^(1/2) = 1/(y + 1)^(3/2)
    μ_t = 1/(y + 1)^(1/2)

Function 4 (pseudoconvex)

    F(y) = 4y/(y + 4)
    dF/dy = [4(y + 4) − 4y]/(y + 4)² = 16/(y + 4)²
    d²F/dy² = −32/(y + 4)³

    μ_r = (−64y + 16y + 64)/(y + 4)³ = 16(4 − 3y)/(y + 4)³

    y = 4/3:      μ_r = 0,      F = 1
    y > 4/3:      μ_r < 0,      F > 1
    0 ≤ y < 4/3:  1 ≥ μ_r > 0,  0 ≤ F < 1

    μ_t = 16/(y + 4)²

Function 5 (quasiconvex)

    F(y) = y³/27 − y²/3 + y
    dF/dy = y²/9 − 2y/3 + 1
    d²F/dy² = 2y/9 − 2/3

    μ_r = (5/9)y² − 2y + 1 = (5/9)(y − 3/5)(y − 3)

    0 ≤ y < 3/5:  1 ≥ μ_r > 0
    y = 3/5:      μ_r = 0,  F = 61/125 = 0.488
    3/5 < y < 3:  μ_r < 0
    y = 3:        μ_r = 0,  F = 1
    y > 3:        μ_r > 0

    μ_t = y²/9 − 2y/3 + 1 = (y/3 − 1)²
    μ_t = 0 at y = 3; μ_t > 0 for y ≠ 3

    g(x) = (dF/dy) x
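The eigenvalue formulas 8.3 and 8.4 are easy to tabulate. The following is a minimal Python sketch; Function 5 uses the reconstruction F(y) = y³/27 − y²/3 + y given above, and all names are illustrative:

    import numpy as np

    # Radial profiles F, F', F'' for the five functions, y = 0.5 x'x.
    FUNCS = {
        1: (lambda y: y,
            lambda y: 1.0,
            lambda y: 0.0),
        2: (lambda y: np.exp(y) - 1.0,
            lambda y: np.exp(y),
            lambda y: np.exp(y)),
        3: (lambda y: 2.0 * np.sqrt(y + 1.0) - 2.0,
            lambda y: (y + 1.0) ** -0.5,
            lambda y: -0.5 * (y + 1.0) ** -1.5),
        4: (lambda y: 4.0 * y / (y + 4.0),
            lambda y: 16.0 / (y + 4.0) ** 2,
            lambda y: -32.0 / (y + 4.0) ** 3),
        5: (lambda y: y**3 / 27.0 - y**2 / 3.0 + y,
            lambda y: y**2 / 9.0 - 2.0 * y / 3.0 + 1.0,
            lambda y: 2.0 * y / 9.0 - 2.0 / 3.0),
    }

    def radial_eigenvalues(k, y):
        # Equations 8.3 and 8.4: mu_r = 2y F'' + F', mu_t = F'.
        F, dF, d2F = FUNCS[k]
        return 2.0 * y * d2F(y) + dF(y), dF(y)

    # Function 4: mu_r changes sign at y = 4/3, where F = 1.
    for y in (1.0, 4.0 / 3.0, 2.0):
        print(y, radial_eigenvalues(4, y))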
