A COMPARISON OF THREE RANDOM SEARCH METHODS by NICK BOROWSKI B.A.Sc., U n i v e r s i t y of B r i t i s h Columbia, 1961 A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE i n the Department of E l e c t r i c a l Engineering We accept t h i s thesis as conforming to the required standard Research Supervisor » Members of the Committee Head of the Department Members of the Department of E l e c t r i c a l Engineering THE UNIVERSITY OF BRITISH COLUMBIA January, 1971 In p r e s e n t i n g t h i s t h e s i s i n p a r t i a l f u l f i l m e n t o f the r e q u i r e m e n t s f o r an advanced deg ree a t t he U n i v e r s i t y o f B r i t i s h C o l u m b i a , I a g r e e t h a t t h e L i b r a r y s h a l l make i t f r e e l y a v a i l a b l e f o r r e f e r e n c e and s t u d y . I f u r t h e r a g r ee t h a t p e r m i s s i o n f o r e x t e n s i v e c o p y i n g o f t h i s t h e s i s f o r s c h o l a r l y p u r p o s e s may be g r a n t e d by t he Head o f my Depar tment o r by h i s r e p r e s e n t a t i v e s . I t i s u n d e r s t o o d t h a t c o p y i n g o r p u b l i c a t i o n o f t h i s t h e s i s f o r f i n a n c i a l g a i n s h a l l not be a l l o w e d w i t h o u t my w r i t t e n p e r m i s s i o n . Depar tment o f The U n i v e r s i t y o f B r i t i s h C o l u m b i a V a n c o u v e r 8, Canada ABSTRACT Three recent random search algorithms are compared on the basis of e f f i c i e n c y , and on the basis of s e n s i t i v i t y to noise, s c a l i n g and the number of v a r i a b l e s . A general d i s c u s s i o n of random search methods points out t h e i r advantages and disadvantages i n r e l a t i o n to d e t e r m i n i s t i c methods. A new random vector generator i s described i n the appendix. TABLE OF CONTENTS Page ABSTRACT II TABLE OF CONTENTS I l l LIST OF ILLUSTRATIONS IV 1. INTRODUCTION 1 Mathematical Programming Problems, 1 Random Search Methods> 1 Scope and Aim of Thesis> 3 2. DESCRIPTION OF RANDOM SEARCH ALGORITHMS 6 Adaptive Random Search, 6 . Adaptive Step Size Random Search, 10 Kj e l l s trout's Random Search, 15 3. RESULTS AND COMPARISONS 22 Average E f f i c i e n c y , 22 E f f e c t of Scaling, 29 Ef f e c t of High Dimensions, 29 E f f e c t of Noise, 32 Summary of Numerical Experiments, 34 4. GENERAL DISCUSSION OF RANDOM SEARCH METHODS 39 Other Random Search Algorithms*39 E f f i c i e n c y of Random Search Methods, 40 Advantages of Random Search Methods, 43 Suggestions for Further Research, 44 5. CONCLUSION 47 APPENDIX 48 REFERENCES • 56 LIST OF ILLUSTRATIONS Page 1. Flowchart of Adaptive Random Search 8 2. Flowchart of Adaptive Step Size Random Search ... 11 3. Performance Check on Adaptive Step Size Random Search Hypersphere of Dimension n 14 4. Flowchart of Kjellstrom 's Random Search 17 5. Performance Check on Kjellstrom's Random Search - Test Functions II and I I I 19 6. Performance Check on Kjellstrom's Random Search -The Moon Problem 20 7. Sample Data - Adaptive Random Search and Test Function I I I 23 8. Comparison of E f f i c i e n c y - Test Function I 25 9. Comparison of E f f i c i e n c y - Test Function II 27 10. Comparison of E f f i c i e n c y - Test Function I I I 28 11. Comparison of E f f i c i e n c y - Test Function IV 30 12. E f f e c t of Scaling - Test Function V 31 13. E f f e c t of High Dimensions - Test Function VI 33 14. E f f e c t of Noise - Test Function VII 35 15. E f f e c t of Noise - Test Function VIII 36 16. D e f i n i t i o n of Variables f o r the Derivation of the P r o b a b i l i t y Density Function 50 17. Generation of Random Numbers by a Refined Acceptance -Rejection Method 50 18. Sample Histogram from the Random Vector Generator 54 ACKNOWLEDGEMENT I wish to aknowledge the encouragement and assistance of my supervisor, Dr. G. F. Schrack. In p a r t i c u l a r , he pointed out to me a number of relevant t e c h n i c a l papers which I might otherwise have missed. Further, I wish to acknowledge the f i n a n c i a l support of the National Research Council of Canada and the Un i v e r s i t y of B r i t i s h Columbia. 1 1. INTRODUCTION Mathematical Programming Problems This thesis deals with a certain class of techniques for the num-e r i c a l solution of a mathematical programming problem, which may be stated as follows: Minimize Q(x) with respect to x, subject to the constraints h (x) = 0 i = 1,2,. . . . , m<n g ±(x) >, 0 i = 1,2,...,p where x denotes a vector i n n-dimensional Euclidean space with components x^, X ^ J - . - J X ^ . In the context of optimization problems Q(x) i s called the quality function, and i s a measure of performance of a system with parameters x.. I t i s i n general nonlinear, as are the functions h_^(x) and g^(x) . This problem occurs i n diverse areas of applied science, such as optimum network design [22,41], adaptive control devices [19,26] and the optimization of chemical processes. The problem of finding the optimal control function may also be formulated i n this manner. For example, i n one such formulation the parameter vector x represents a point i n i n i t i a l co-state space [21], and i n another i t represents the co e f f i c i e n t s of a polynomial approximation to the control function. Random Search Methods Searches for the solution of the mathematical programming problem may be dichotomized i n several ways. Thus a pa r t i c u l a r search method may be c l a s s i f i e d as being either direct or i n d i r e c t , simultaneous or sequential, deterministic or random. A direct search does not require the evaluation of derivatives and sometimes not even thei r existence. Sequential methods use 2 information gained i n previous steps of the search to guide future steps, whereas simultaneous methods make no use of such information. The d e f i n i t i o n of "random search" used i n this thesis i s that of Schumer [38]: "The search technique i s deterministic i f , when the search starts under i d e n t i c a l conditions, i t always follows the i d e n t i c a l route to the i d e n t i c a l outcome. If on the other hand, i t does not always follow i d e n t i c a l paths, the method i s said to be a randomized search method". I t should be noted that two assumptions are made. F i r s t , i f the random numbers required for the search are obtained from a pseudo-random number generator i t must be r e - i n i t i a l i z e d d i f f e r e n t l y for each run. Second, random variables do not enter into the problem i t s e l f , only into the method of solution. Methods of dealing with the case i n which random variables form part of the problem to be solved come under the heading of stochastic programming and are not considered here. Most random methods incorporate various degrees of determinism i n that they use previous information to guide the next t r i a l step. I f no such determinism enters into the search i t i s called pure random search, which means the same as simultaneous random search. On the other hand, i f the results of previous steps completely determine the next t r i a l step the method i s obviously deterministic. Between these extremes l i e s the spectrum of sequential random search methods. Pure random searches are useful for the location of global extrema whereas sequential random searches are gen-e r a l l y more e f f i c i e n t for l o c a l extrema. The large majority of search methods found i n the l i t e r a t u r e at the present time i s deterministic, but a number of random search methods have also appeared i n recent years, beginning with Brook's paper i n 1958 [5]-Since then at least a dozen algorithms have been published and a number of 3 theses have been written on the subject [2,8,9,11,12,15,19,21,22,24,25,26, 31,35,38,41]. However, with some of these algorithms, l i t t l e attempt has been made to ascertain th e i r advantages and disadvantages. Sometimes only convergence to) the solution i s proved, and i t i s then stated that the method i s efficient,without benefit of supporting data. Scope and Aim of Thesis Three sequential random search algorithms w i l l be compared against each other and, i n less d e t a i l , against deterministic methods. They were designed for d i g i t a l computer implementation. The choice of these p a r t i c u l a r methods was not completely arb i t r a r y . They were chosen because they are r e l a t i v e l y recent and exemplify the variety of approaches that may be used. The comparisons w i l l be more ind i c a t i v e than d e f i n i t i v e i n nature because the result of any comparison of search methods depends to a large extent on the p a r t i c u l a r test function that i s used, and on the values assigned to the various parameters that occur i n most algorithms. In fact there i s no generally v a l i d basis of comparison for search algorithms because their intended application i s also an important factor i n deciding on the super-i o r i t y of one algorithm over another. However, even a very rough quantitative comparison i s preferable to no such comparison at a l l . The convergence of an algorithm may be studied i n terms of location convergence or value convergence. The former refers to the convergence of x to x*, the optimum parameter vector, while the l a t t e r refers to the con-vergence of Q(x) to Q(x*), the optimum value of the quality function. I t i s always assumed here that x converges to x* as Q(x) converges to Q(x*)> This assumption evidently holds for unimodal functions i n the absence of noise. Thus, the convergence rate of any one method r e l a t i v e to a given test function i s measured only i n terms of the rate at which Q(x) converges to Q(x*). This rate i s determined by the number of function evaluations required to reduce Q(x) by a given r a t i o . Other properties of the algorithms w i l l also be considered, such as the e f f e c t s of poor s c a l i n g and high dimensions. R e l a t i v e l y simple test functions have been composed for the purpose of i s o l a t i n g these e f f e c t s . Though not s p e c i f i c a l l y designed for the case where noise i s present, the algorithms w i l l also be compared on the basis of such problems. None of these algorithms i s s p e c i f i c a l l y equipped to deal with e q u a l i t y c o n s t r a i n t s . If such were present they would f i r s t have to be con-verted i n t o some other equivalent form. Thus, the above eq u a l i t y constraints may be approximated by the i n e q u a l i t y constraints |h.(x)| < £. where are s u i t a b l y small p o s i t i v e numbers. A l t e r n a t i v e l y , the q u a l i t y function Q(x) may be replaced by the penalty function P(x). a P(x) 2 Q(x) + J h . 2 (x) Both a l t e r n a t i v e s have t h e i r advantages and disadvantages. For example, penalty functions introduce steep curving v a l l e y s into the surface whose minimum i s to be found, but they do not require an i n i t i a l f e a s i b l e point to s t a r t the search. If i n e q u a l i t y constraints are present, however, an i n i t i a l f e a s i b l e point must be given. In the following comparisons no equality constraints are present, or they may be assumed incorporated into the problem as penalty functions. For the i n e q u a l i t y constrained test function an i n i t i a l f e a s i b l e point i s given. A c t u a l l y , two of these algorithms are not s p e c i f i c a l l y equipped to deal with i n e q u a l i t y constraints e i t h e r . A simple common expedient i s used i n such cases. Wherever the i n e q u a l i t y constraints are not s a t i s f i e d Q(x) i s assigned a s u f f i c i e n t l y large value so that a step outside the f e a s i b l e region w i l l always r e s u l t i n a f a i l u r e of that step to produce an improvement. T h i s e f f e c t i v e l y confines the search to the f e a s i b l e region. The comparison of the algorithms i s the primary aim of the t h e s i s . A secondary aim i s to answer the question: Why use random search methods when there i s already a large number of d e t e r m i n i s t i c methods a v a i l a b l e f o r these problems? From the r e s u l t s of these comparisons as w e l l as from a study of the e x i s t i n g l i t e r a t u r e on random search methods some answer to t h i s question w i l l be a r r i v e d at. 6 2. DESCRIPTION OF RANDOM SEARCH ALGORITHMS A l l three algorithms have been reconstructed from t h e i r d e scriptions and s i m p l i f i e d flow charts as published i n the papers by Matyas [26], Schumer and S t e i g l i t z [39], and K j e l l s t r o m [22]. Where parameters have been found to be merely delimited or l e f t altogether unspecified reasonable estimates of t h e i r best values have been used. The e f f i c i e n c y of each reconstructed algorithm i s then checked against that of the o r i g i n a l v e r s i o n by using the same tes t functions as were used by the author of that p a r t i c u l a r algorithm. M u l t i p l e stopping rules are used throughout. In order to guard against unexpectedly large computing times each algorithm i s designed to stop the search a f t e r a predetermined number of function evaluations. The second stopping r u l e that i s common to a l l algorithms depends on the value of the q u a l i t y function Q(x). In a d d i t i o n to these each algorithm contains one or more stopping rules p e c u l i a r to that p a r t i c u l a r method. The stopping rules remain unchanged from problem to problem. Adaptive Random Search (ARS) The e a r l i e s t of the three methods considered, t h i s i s an a r b i t r a r y common sense method. Nevertheless, i t i s capable of adapting both step s i z e and step d i r e c t i o n according to success or f a i l u r e of previous steps. I t s convergence has been proven [26,27], but the rate of t h i s convergence i s open to question. The algorithm has previously been tested or discussed by several w r i t e r s [9,28,29,35]. A stepwise d e s c r i p t i o n follows. S t a r t i n g from u , a f e a s i b l e point i n parameter space, a random step 6 i s generated according to 6 k = b k + T V th k k where k denotes the k i t e r a t i o n , b i s a bias vector and £ i s a normal random vector with covariance matrix equal to the i d e n t i t y matrix. The k transformation matrix T i s an (n x n) matrix, i n t h i s case r e s t r i c t e d to be diagonal with a l l elements equal to a, the standard deviation. T = diag (a) The new t r i a l point i s then x . lc k Pk X = U + 0 The q u a l i t y f unction Q(x) i s evaluated at t h i s point and a comparison with Q*, the best value of Q(x) so f a r , determines success or f a i l u r e of th i s step. When the i n e q u a l i t y Q(x k) < Q* - e i s s a t i s f i e d , and no constraints are v i o l a t e d , the step i s a success and the v a r i a b l e s are updated as follows: b = c b + d 6 s • s T k + 1 «, a T k s k+1 k , .k u = u + o If the i n e q u a l i t y i s not s a t i s f i e d , or i f a constraint i s v i o l a t e d , the step i s a f a i l u r e and the v a r i a b l e s are updated as follows: b k + 1 = c f b k + d f 6 k T k + 1 = a f T k k+1 k u = u The threshold e i s a s u i t a b l y small p o s i t i v e number. The constants must s a t i s f y the i n e q u a l i t i e s : OSc $1 d >0 c + d >1 s s s s 0Sc f<l d f$0 |c + d |<1 input parameters M, u, a, of, e i n i t i a l i z e Q>Q(u) K>0 step i s unsuccessful K>M Yes maximum computing e f f o r t i s exceeded WRITE Q", u EXIT generate t r i a l point <5-HD+T£ x-Hi+S K+-K+1 No improvement Q(x)<Q*-e No V a-K).9 a b ^ b + d ^ f f CKO , Yes Yes 1 step i s successful 0>Q(x) u+x a-KL. l a b-^ -c b+d 6 s s e^O.0001|Q*1 No a<--o, Figure 1 Flowchart of Adaptive Random Search The following values have been assigned i n t h i s implementation of the algorithm c = c = 0.75 s f d = 0.5 d^ = -0.25 s f a = 1.1 a = 0.9 s f As i t was decided to make the standard deviation a v a r i a b l e i t was found necessary to also set a lower l i m i t on the values that a can take i n order to prevent the steps from becoming so small that further improvements become impossible. An upper l i m i t for 6 , as suggested i n the o r i g i n a l paper, has not been incorporated because the algorithm, as implemented, can quickly recover from an overly larp". step s i z e . Intro-duction of a d d i t i o n a l a r b i t r a r y parameters i s therefere unnecessary. The constants c g and c^ have the e f f e c t of reducing the influence of previous steps as they recede farther into the past h i s t o r y of the search. They are therefore c a l l e d r e j e c t i o n c o e f f i c i e n t s . The constants d^ and d^on the other hand, modify the bias b for the next step according to success or f a i l u r e of the present step. Since they incorporate new experience i n t o the search e f f o r t they are c a l l e d detection c o e f f i c i e n t s . The i t e r a t i v e adaptation of step length and d i r e c t i o n makes t h i s i n fact a learning algorithm [29]. Further d e t a i l s are given i n the flowchart ( f i g . 1). In a l l flow-charts the v a r i a b l e K i s used as a counter for the number of function evaluations, and M i s the maximum value of K before the search i s stopped. L i t t l e data i s a v a i l a b l e for checking the performance of t h i s implementation against the o r i g i n a l . In fac t only a quadratic i n two va r i a b l e s i s given [26]: 2 2 Q(x) = 0.26x 1 + 0.26x 2 - 0.48x^2 St a r t i n g from the point (15,30) an average of 49 function evaluations were required by th i s author's version of the algorithm to f i n d a point where Q(x) was les s than 0.2. About 25 of these were s u c c e s s f u l . The r e s u l t s compare favourably with corresponding figures given by Matyas, who required 93 function evaluations, 55 of which were s u c c e s s s f u l . For t h i s test function, then, the present ve r s i o n i s somewhat more e f f i c i e n t than the o r i g i n a l . However, almost any method performs reasonably w e l l on such a simple problem and therefore no conclusion i s warranted on how c l o s e l y the performance of the present version resembles that of the o r i g i n a l i n more d i f f i c u l t problems. Adaptive Step Size Random Search (ASSRS) This algorithm, proposed by Schumer and S t e i g l i t z [39], i s based on a more d e t a i l e d mathematical analysis than ei t h e r of the other two. Even so, the p r a c t i c a l version contains several unspecified parameters to which values had to be assigned somewhat a r b i t r a r i l y . The algorithm was designed for unimodal, w e l l scaled, unconstrained functions with many vari a b l e s . Given an i n t t i a l point u i n n-dimensional parameter space and an i n i t i a l step s i z e s, a t r i a l point x^"^ i s computed according to x ( 1 ) = u + sr Id e a l l y the random vector r i s uniformly d i s t r i b u t e d on the surface of the unit h-sphere, but i n pr a c t i c e an approximation i s used. If the in e q u a l i t y Q(x ( 1 ))<Q(u) -i s not s a t i s f i e d another t r i a l point i s chosen, s t a r t i n g from the same point u. A f t e r I successive f a i l u r e s the step s i z e s i s reduced to m s / ( l + A) and the search i s then continued as before. The parameter A i s a p o s i t i v e number les s than 1. When the i n e q u a l i t y i s s a t i s f i e d a success i s reg i s t e r e d at x^ 1^ and the immediately following t r i a l step 11 ENTRY input parameters M, u, s, s f , O f No check stopping c r i t e r i o n K>M or s<s., Yes — ? 0 No K I m „No Yes I-e-I+l I+-0 s^s/(l+A) i s K some mu l t i p l e of a large, number? Yes t r y a much la r g e r step s i z e , update i f successful i n i t i a l i z e K+-0 1^0 Q*«-Q(u) GENERATE r 6«-sr x : ( 1 )^u +6 K<-K+l LOOP improvement ? q(x' (1W Yes tr y larger step x< 2 )«-x ( 1 ) +A6 K+-K+1 Q ( x < 2 » X Q ( x < 1 ) ) No x^"^ i s better point u ^ ( 1 ) I«-0 Q <Q, Yes LOOP Yes (2) i s better point (2) u-^ x Q ^ ( x u ; ) I-K) s -<-s(H-A) WRITE Q , u EXIT Figure 2 Flowchart of Adaptive Step Size Random Search to x i s made i n the same d i r e c t i o n but with a longer step length s ( l + A). Whichever step length produces the greater improvement becomes the new nominal step length, and the corresponding t r i a l point becomes the new s t a r t i n g point u. A f t e r a large number of steps have been t r i e d a s i n g l e t r i a l i s c a r r i e d out with a much larger step s i z e . Again the current values of s and u are updated i f a success i s r e g i s t e r e d . Since t h i s l a s t mentioned feature has proved to be rather i n e f f e c t i v e during preliminary t r i a l runs i t has been strongly de^emphasized i n t h i s implementation. This was done by choosing the above mentioned large number s u f f i c i e n t l y large. The number 100 was used. One important change has been introduced d e l i b e r a t e l y . As described above, a step of length s ( l + A) i s t r i e d only i f the immediately preceding step was s u c c e s s f u l . In the o r i g i n a l version, both function evaluations were apparently c a r r i e d out simultaneously, and the b e t t e r improvement then determined the new s t a r t i n g point and the new step length. By changing from simultaneous to sequential t r i a l s a saving i n the number of function evaluations has been r e a l i z e d , as w i l l be shown. Ad d i t i o n a l d e t a i l s are given i n the flowchart ( f i g . 2 ) . Those v a r i a b l e s not already mentioned have the following s i g n i f i c a n c e : Variable I counts the number of successive f a i l u r e s ; s^ and are minimum values of s and Q(x) r e s p e c t i v e l y . The parameter I was set to 3 and A was set to 0.618. The m . l a t t e r i s an approximation to the golden section number , which i s the s o l u t i o n of the equation ^ 2 Y +Y =1. Both these values were found by experiment to be approximately optimum, though the value for A was not very c r i t i c a l . The mathematical analysis of the Optimum Step Size Random Search [38], a t h e o r e t i c a l algorithm which i s approximated by ASSRS, predicts that for the case of the n-dimensional hypersphere the number of function evaluations m required to obtain a s p e c i f i e d accuracy f o r the minimum i s asymptotically proportional to the dimensionn, i . e . , n Q m - = ¥ l o g B i s a p o s i t i v e constant such that B/n i s the expected normalized improvement i n Q(x) per function evaluation. Q q and are the i n i t i a l and f i n a l values of Q r e s p e c t i v e l y . This linear r e l a t i o n i n n i s remarkable because f or most d e t e r m i n i s t i c methods m increases as some power of n greater than 1. For example, i n the case of the Newton-Raphson method the corresponding r e l a t i o n i s [39] m = (n + l ) 2 and for the Simplex method of Nelder and Mead [33] 2.11 m=3.16 (n+1) It should be kept i n mind, however, that f o r the simple gradient method i n t h i s case m = n + n^ where n^ i s the number of function evaluations required f o r the l i n e a r search along the gradient. The performance of t h i s algorithm was compared with that of the o r i g i n a l only i n connection with the above mentioned hypersphere (Test Function I with d i f f e r e n t values f o r n). The r e s u l t s of t h i s performance check are shown i n fig u r e 3. Each data point represents a mean of 10 runs. * -8 The e f f e c t i v e stopping c r i t e r i o n i n t h i s case was 0 ^10 . Whereas the r e s u l t s of Schumer and S t e i g l i t z are w e l l described by the r e l a t i o n m ~ 80nDimension,-n Figure 3 Performance Check on ASSRS - Hypersphere of Dimension n 15 the corresponding curve f o r t h i s implementation of the algorithm appears to be asymptotic to m - 68n-100, In t h i s algorithm a random d i r e c t i o n r i s chosen by generating n random numbers, each uniform on the i n t e r v a l (-1,1), and then normalizing. This procedure does not generate a t r u l y uniform density of points over the unit sphere. In order to test i f t h i s non-uniformity a f f e c t s the e f f i c i e n c y of the method a s p e c i a l algorithm was designed to generate these uniform d e n s i t i e s exactly. This random vector generator was then used i n the search algorithm instead of the approximate method and the ^ r e s u l t s were pl o t t e d on the same graph ( f i g . 3). I t i s c l e a r that there i s no s i g n i f i c a n t change. In terms of computer time i t i s much more e f f i c i e n t to use the approximate method, e x p e c i a l l y f o r large n. The random vector generator i s described i n the appendix. Kjellstrom's Random Search (KRS) This algorithm [22], which i s the most recent of the three, e n t a i l s the greatest amount of i n c i d e n t a l computations, i . e . , computations other than those required for evaluation of the q u a l i t y function Q(x). The c h a r a c t e r i s t i c feature of t h i s method i s i t s use of a random walk i n a s u i t a b l y defined region of parameter space i n order to determine the c o r r e l a t i o n between parameters. This information i s then used to guide the random walk i n the succeeding i t e r a t i o n , which i s r e s t r i c t e d to a smaller region about the minimum. The following stepwise d e s c r i p t i o n i s most d e t a i l e d where the o r i g i n a l was most general, thus eliminating uncertainty as to how the algorithm was inte r p r e t e d . The random walk begins from an i n i t i a l f e a s i b l e point u. The allowable region i n parameter space i s the set of a l l points x where Q(x)<q, some s p e c i f i e d constant, and where a l l constraints are s a t i s f i e d . A three-dimensional subspace i s chosen, corresponding to the parameters 16 x., x, and x . A t r i a l point x i n t h i s subspace i s located by varying J k 1 X j , x^ and x^ while keeping a l l other x^ unchanged. These v a r i a t i o n s 5^ are computed according to the r e l a t i o n ( £ r 6 k, 6 1 ) t = A.C where t denotes the transpose, A, i s a (3 x 3) matrix corresponding to th t h i s j — subspace, and £ i s a three-dimensional normal random vector. The n components of x are computed as follows: x. = u. + 5. for i = j , k , l 1 1 u i J • 1 o x. = u. f o r i $ j,k ,1 1 1 If X i s i n the allowable region i t i s stored along with the corresponding value of Q(x). Then the step s i z e parameter D i s increased and the search i s continued i n the next subspace, s t a r t i n g from the newly found point x, which i s also the new value of u. When the t r i a l point i s outside the allowable region, D i s decreased and the search continued i n the same subspace from the same point. A f t e r L points have been located the random walk i s stopped and the r e s u l t s are analyzed. F i r s t the mean, and then the covariance and c o r r e l a t i o n matrices of the stored points are computed. For each parameter x_. the two most c l o s e l y c o r r e l a t e d parameters x^ and x^ are determined, and the t r i p l e t s j , k, 1 are stored. Next the concentration e l l i p s o i d i s computed. I t consists of a l l points x which s a t i s f y the i n e q u a l i t y t -1 ,„ x u x$n+Z where t again denotes the transpose and y i s the covariance matrix j u s t computed. Corresponding to each t r i p l e t a cut i s made i n th i s e l l i p s o i d 17 ENTRY input paramaters M, u, L M , D, D f ) Qf, n i n i t i a l i z e q-K}(u)+0.01 |Q(u).| A.-HJNIT MATRIX, a l l . 3 2 K-K), L+0, j-*-l generate t r i a l point x-Ki+DA C K-HC+1 I x i n allowable region ? Q(x)<q Yes store x L-*-L+l -L x -<-x u-<-x IX-1.115 D EK-0. 933D D-«-D f No random walk complete ? No go to next subspace i f j>n set j-f-1 Yes analysis of random walk COMPUTE y = cov(x) SELECT TRIPLETS j , k, 1 COMPUTE y " 1 , y." 1, y., A. 2 2 2 SORT AND RELABEL POINTS Q(x 1)$Q(x 2)$Q(x 3)^... u+-x , L-<-0 q+-MEDIAN VALUE OF Q(x) No K>M Yes WRITE q,u Figure 4 Flowchart of Kjellstrom's Random Search 18 by s e t t i n g to zero a l l elements o f u " whose double subscripts are not combinations of the numbers j , k and 1. The above inequality i s then scaled by the factor 5/(n+2) r e s u l t i n g i n the new inequality t -1 ,. x 1J • x<;5 which describes the three-dimensional e l l i p s o i d r e s u l t i n g from t h i s cut. This r e l a t i o n also defines u.., the covariance matrix of the cut [ l O ] . Next y i s factored into two triangular matrices A . and 3 3 3 V. = A.Afc 2 3 3 th and A^ i s stored to guide the random walk i n the j — subspace during the next i t e r a t i o n . This means that n such cuts and matrices w i l l be computed. The stored values of Q(x) are then sorted i n ascending order. That value of x which corresponds to the lowest value of Q(x) becomes the new s t a r t i n g point u for the next random walk. The new allowable region i s the set of a l l points x s a t i s f y i n g the inequality constraints and the ine q u a l i t y Q(x) <q, where q i s the median of the stored values of Q(x). When the preassigned number of function evaluations M have been exhausted, or when D f a l l s below a preassigned value D^,, the search i s stopped. The current value of u and any points stored at that time are approximations to x while q i s an approximation to Q(x*). The variable L i n the flow chart ( f i g . 4) counts the current number of points that have been stored i n the current i t e r a t i o n of the random walk. The constants 1.115 and 0.933 were specified by Kjellstroru who obtained these experimentally. Again the performance of t h i s algorithm was checked against that of the o r i g i n a l by using the same test functions as were used by Kj ellstrom (figures 5 and 6). The performance i s only s l i g h t l y below that of the o r i g i n a l algorithm i n a l l cases except that of the H e l i c a l Valley (Test 4 4 Kjellstrom's Results Present Version of KRS 1 I I __J 500 1000 1500 2000 Function Evaluations g, """'"••"Figure 6 Performance Check on KRS - The Moon Problem 21 Function I I ) , where t h i s version i s only about half as e f f i c i e n t . This discrepancy may be due to Kjellstrom's use of a v a r i a t i o n of t h i s algorithm that uses the updated mean of the points located i n the allowable region as the s t a r t i n g point of the search on alternate steps. Kjellstrom.does. not indicate how much weight i s to be attached to h i s r e s u l t s . The re s u l t s of the present version represent averages over 10 runs. A l l test functions are described i n d e t a i l i n the next chapter. 22 3. RESULTS AND COMPARISONS Some of the test functions used i n these comparisons have been taken from the papers o r i g i n a l l y describing these algorithms, and the others have been composed with a view to e x h i b i t i n g s p e c i f i c properties of these algorithms. They were implemented i n Fortran on an IBM/360-67 computer under the Michigan Terminal System. A l l data for comparison are presented g r a p h i c a l l y . As i n the case of the performance checks each l i n e of the graph represents a mean of 10 runs with d i f f e r e n t i n i t i a l i z a t i o n s of the random number generator. The averages were ca l c u l a t e d at regular i n t e r v a l s , usually every 100 function evaluations. The q u a l i t y function i s p l o t t e d on a logarithmic scale (base 10), both for the purpose of d i s p l a y i n g the rate of convergence of the various algorithms over a wide range of function values, and for e x h i b i t i n g c e r t a i n l i n e a r properties of convergence such as were discussed i n connection with ASSRS. A l l te s t functions are unimodal with a minimum value of zero. A sample graph showing the 10 separate runs, from which the mean was c a l c u l a t e d , i s shown i n fig u r e 7. I t in d i c a t e s the order of magnitude of the d i s p e r s i o n of the separate runs. A d e t a i l e d analysis of the dis p e r s i o n was not c a r r i e d out, since the emphasis here i s on the average e f f i c i e n c y of these methods. Average E f f i c i e n c y Four t e s t functions are used as a basis of comparison of the average e f f i c i e n c y of the algorithms. Test Function I has already been r e f e r r e d to i n the previous chapter. Test Function I: The Hypersphere Q(x) = Data Points of I n d i v i d u a l Runs Mean of 10 Runs Figure 7 Sample Data - ARS and Test Function I II 10 24 x° = ' ( 1 , 1 , . . . , 1 ) x* = ( 0 , 0 , . . . , 0 ) o * The search s t a r t s at x and the minimum i s located at x . This function i s very easy to minimize by any method, and serves mainly as a benchmark fo r f u r t h e r comparisons. The r e s u l t s ( F i g . 8) show that ASSRS performs best i n th i s case. ARS i s about equally e f f i c i e n t u n t i l the function value drops below about —6 10 , a f t e r which point i t makes l i t t l e a d d i t i o n a l progress. This behaviour of ARS was believed to be due to the lower l i m i t on the standard deviation a .. In thi s implementation of the algorithm was set to 0.001. To check i f the hypothesis i s r i g h t another set of 10 runs was ca r r i e d out with o"^ set to 0.01. This curve i s also shown i n fi g u r e 8 and confirms the correctness of the guess. I t i s further noted that the convergence of ASSRS i s l i n e a r i n Qo log as well as i n n, p r e c i s e l y as predicted by a n a l y s i s . Somewhat f Q o unexpectedly, the convergence of KRS i s also l i n e a r i n log (jr~)> and Q f even AJIS i s approximately l i n e a r for as long as i t i s making s i g n i f i c a n t progress. Assuming that second d e r i v a t i v e s can be cal c u l a t e d with s u f f i c i e n t accuracy the Newton-Raphson method would require 36 function evaluations —8 to reduce Q(x) to a value l e s s than 10 . The most e f f i c i e n t of these random search methods, ASSRS, requires about 270. Test Function I I : The H e l i c a l V a l l e y Q(x) = 1 0 0[(x 3 - 1 0 t ) 2 + ( r - l ) 2 ] + x 3 2 x^ = r cos 2TT t x 2 = r s i n 2Tft • _ -2.5$x 3$7.5 ' * • 26 x° = (-1,0,0) n* = (1,0,0) Test Function II[13] i s much more d i f f i c u l t to minimize. Its behaviour i s s i m i l a r to that of penalty functions, i . e . , i t has steep curving v a l l e y s i n the performance surface. In t h i s problem KRS i s most e f f i c i e n t while ASSRS i s l e a s t e f f i c i e n t ( F i g. 9). By comparison the Fletcher Powell v a r i a b l e metric method requires 81 function evaluations [32], and although an exact comparison would require s p e c i f i c a t i o n of the l e v e l of accuracy involved i t i s nevertheless c l e a r that the d e t e r m i n i s t i c method i s much more e f f i c i e n t i n terms of function evaluations. Test Function I I I : The Quartic Q(x) = ( x 1 + 1 0 x 2 ) 2 + 5 ( x 3 - x 4 ) 2 + ( x 2 - 2 x 3 ) 4 + 10 ( x ^ ) 4 x° = (3,-1,0,1) x = (0,0,0,0) This test function [13] i s very s i m i l a r to the previous one, and the r e s u l t s are also s i m i l a r (Figure 10). Test Function IV: The Moon Problem n 9 Q(x) = E ( x . - l / 2 ) Z - f i f g(x)>n 1 6 = 10 (stands f o r °°) i f g(x)£n n 2 g(x) = E x 1 . x° = (-1.2,-1.2,...,-1.2) x* = (1,1,.,-.,1) o o 500 1000 1500 2000 Function Evaluations Figure 9 Comparison of E f f i c i e n c y ^.Test Function II .'l\> This function [22] has already been re f e r r e d to i n the d e s c r i p t i o n of K j e l l s trom's algorithm. The value of n i s here set to 6. The search o * proceeds halfway around a hyperspherical forbidden region from x to x , always keeping close to the boundary of the f e a s i b l e region; hence the name. Results (Fig. 11) show that KRS i s again the most e f f i c i e n t method. ARS performs poorly, and ASSRS breaks down completely, as expected, because i t requires the v a l l e y i n the performance surface to be smooth. E f f e c t of Scaling A simple test f u nction has been composed to demonstrate the extent to which the various algorithms depend on the constant-Q surfaces being hyperspherical. Test Function V: The H y p e r e l l i p s o i d Q(x) = O . l x 2 + E x 2 1 2 x x° = (1,1,...,1) x* = (0,0,...,0) This function d i f f e r s from Test Function I only i n the c o e f f i c i e n t 2 f o r x^. The 0.1 c o e f f i c i e n t has the e f f e c t of elongating the hypersphere i n the x^ d i r e c t i o n so that the r a t i o between major and minor axes i s l;v / To(Fig. 12). For ease of comparison the corresponding curves for the hypershpere have been reproduced on the same graph with dotted l i n e s . It i s seen that ASSRS i s by f a r the most s e n s i t i v e to s c a l i n g , the other two being almost unaffected. E f f e c t of High Dimensions The r e s u l t s of Schumer and S t e i g l i t z suggest that random search becomes r e l a t i v e l y more e f f i c i e n t as the number of v a r i a b l e s increases. 4 4 32 To see how the algorithms under i n v e s t i g a t i o n compare i n t h i s respect tests were c a r r i e d out on the hypersphere with n = 10. Test Function VI: The Hypershpere i n Ten Dimensions 10 Q(x) = I x. 1 1 x° = (1,1,...,1) x* = (0,0,...,0) From the r e s u l t s ( F i g . 13) i t i s apparent that the e f f i c i e n c y of KRS decreases most r a p i d l y with an increase i n dimensions. As n doubles the number of function evaluations required f or a predetermined accuracy also doubles approximately f o r ASSRS, and for ASS i n th e ' l i n e a r region, but more than quadruples for KRS. E f f e c t of Noise If the q u a l i t y function must be evaluated by experiment, for example, a c e r t a i n amount of noise i s usually unavoidable due to random errors of measurement and the influence of the environment. I t i s therefore i n t e r e s t i n g to check how these random search algorithms are af f e c t e d by small random errors i n 0(x). Such problems are properly the subject of s t o c h a s t i c programming methods [42]. Test Function VII: The Hypersphere with Gaussian Noise 5 Q(x) =(Z x ) (1+0.01 £ ) 1 x° = (1,1,...,1) x* =' (0,0,'. . . ,0) This i s again Test Function I, except that a noise term has been added. Test Function VI (n-10) Test Function I (n=5) Function Evaluations Figure 13 E f f e c t of High Dimensions - Test Function VI 34 As before, £^ denotes a.normal random v a r i a b l e with mean 0 and standard deviation 1. Results ( F i g . 14) show that t h i s 1% noise l e v e l i s of l i t t l e consequence for a l l three algorithms. Location convergence,not shown here, was not noticeably affected e i t h e r . Yet for any method r e q u i r i n g gradients to be determined by taking f i n i t e d i f f e r e n c e s , t h i s amount of noise would very l i k e l y reduce the e f f i c i e n c y d r a s t i c a l l y [16, 18]. Test Function VIII: The Hypersphere with Uniform Noise ' ' ' 5 2 Q(x) = E x. + 0.05y. 1 1 x° = (1,1,...,1) x* = (0,0,...,0) The random v a r i a b l e y i s uniformly d i s t r i b u t e d on the i n t e r v a l (-1,1), and the maximum noise amplitude i s constant at 1% of the s t a r t i n g value of Q ( x ) . In t h i s test function then, and only i n t h i s one, Q(x) can become negative. The stopping c r i t e r i o n was there-fore set to | Q ( X ) | < Q J . instead of Q ( x ) < Q £ , as i n a l l other problems. This change has the e f f e c t of postponing the moment of breakdown of the algorithm, thus improving l o c a t i o n convergence s l i g h t l y . Results i n d i c a t e ( F i g . 15) that the algorithms break down when the noise i s of about the same magnitude as the value of the q u a l i t y function without the noise. KRS and ASSRS stopped because the step s i z e had become too small, whereas ARS continued on i n e f f e c t u a l l y u n t i l the preassigned computing e f f o r t had been exhausted. Nevertheless, up to the point of breakdown, the e f f i c i e n c y was not appreciably a f f e c t e d . Location convergence, as expected, was extremely poor. Summary of Numerical Experiments On the basis of these comparisons i t i s concluded that KR.S Test Function VII Test Function I 500 1000 1500 2000 Function Evaluations Figure-14 E f f e c t of Noise - Test Function VII Ln 4 2 Test Function VIII Test Function I Method Breaks Down 2000 Figure 15 E f f e c t of Noise - Test Function VIII i s the best general random search method when there are fewer than about 10 parameters to be optimized. This l i m i t a t i o n to low dimensions would become even more apparent i f computation times were taken into account. This i s due to the large number of matrix m u l t i p l i c a t i o n s and inversions that are involved. The next best method appears to be ARS. I t performs poorly, however, when there are sharp, discontinuous v a l l e y s i n the performance surface, and i t i s very s e n s i t i v e to parameter changes. The l a t t e r c h a r a c t e r i s t i c might be an advantage i f the e f f e c t s of the parameter changes were known i n advance for then the algorithm could be adjusted to the problem at hand. ASSRS i s l e a s t desirable as a general purpose minimization algorithm because i t i s severely r e s t r i c t e d i n scope, being able to handle e f f i c i e n t l y only r e l a t i v e l y w e l l scaled problems, such as Test Functions I, V, VI and VII. In t h i s narrow f i e l d , however, i t i s the most e f f i c i e n t method. A l l three random search methods have shown themselves to be immune to r e l a t i v e l y small amounts of noise, which corroborates the findings of Brooks, Weinberger, Pensa, and others. In this respect they have a d e f i n i t e advantage over most de t e r m i n i s t i c methods, e s p e c i a l l y those r e q u i r i n g computation of d e r i v a t i v e s by taking f i n i t e d i f f e r e n c e s [16, 18, 41.]. In those problems not contaminated with noise, however, de t e r m i n i s t i c methods were found to be more e f f i c i e n t than random methods. Results of Test Functions I, V and VII show that KRS, l i k e ASSRS, has l i n e a r convergence for quadratic q u a l i t y functions, with or without small amounts of noise. In other words, the r e l a t i o n between log Q(x) and K, the number of function evaluations, i s very nearly a s t r a i g h t l i n e . Even Matyas' ARS e x h i b i t s t h i s property f o r as long as the standard deviation -O i s far from i t s lower l i m i t a . I t suggests 38 the l i n e a r i t y may be c h a r a c t e r i s t i c of a large c l a s s of random search methods when applied to quadratics. 39 4- GENERAL DISCUSSION OF RANDOM SEARCH METHODS Other Random Search Algorithms As stated above, the three algorithms compared here are only a sampling of the available random search methods. A few additional methods are l i s t e d below, i n chronological order, further i l l u s t r a t i n g the variety of approaches that have been used. Brooks [5,6,17] experimented with a number of simple random search methods, both simultaneous and sequential. Matyas' ARS may be considered an elaboration of hi s "creeping" or sequential random search. Karnopp's [l9,20] "narrow range scan with moving mean" and Gall's [ l 5 ] "exponential random search" are two more sequential methods which may be considered descendents of creeping random search. Karnopp has further suggested use of histograms for gathering global information about the q u a l i t y function, and he investigated the problem of choosing between global and l o c a l search methods. Weinberger's [41] "multistage random search" uses the concept of the hypervolume of a suitably defined allowable region i n parameter space. As the hyparvolume i s reduced i n stages the allowable region con-verges to a small neighborhood i n parameter space which has a certain p r o b a b i l i t y of containing the optimum. Other methods are the "Variable Structure Method" [28], "Gradient Biased Random Search" [35] and. a method due- to Lawson and Lavi that may be described as a sophisticated version of ASSRS [24]. .Certain universal adaptive optimization codes make use of one or more of the simpler random search methods [14]. A number of random search algorithms have appeared in.foreign language journals, especially Russian technical journals. Some of the authors are Rastrigin, Chichinada^ Zakharov, Vaysbord, Yudin and Matyas. 40 One of the l a t e s t papers on random search methods proposes a new method of finding optimal control functions. The time function i s approxi-mated by a f i n i t e , ordered sequence of independent parameters. These para-meters are then optimized by using a random search method i d e a l l y suited for the hybrid computer. I t i s not the only method of finding optimal control functions by random search, but i t i s the most dir e c t and the least r e s t r i c t i v e method [23]. The E f f i c i e n c y of Random Search Methods. Random search methods are frequently described as being e f f i c i e n t [ l 9 , 15> 41, 38]. Unless the reader pays close attention to the context i n which such statements are made he may very e a s i l y conclude that, i n some cases at lea s t , random search methods are more e f f i c i e n t than the best available deterministic method. Such a case has not been proven to e x i s t . Karnopp [19] gives a lengthy proof to show that h i s proposed random search method i s more e f f i c i e n t than a certain deterministic method. This "deterministic" method i s actually a complicated extension of the gradient method so as to make i t applicable to multimodal functions. For t h i s purpose random s t a r t i n g points are used. S t r i c t l y speaking, therefore, he i s simply comparing one random search method with another. Gradient methods by themselves simply do not apply to multimodal functions. I t i s therefore not a v a l i d comparison of random and deterministic methods. Gal l describes his exponential random search as simple and e f f i c i e n t , but does not make a quantitative comparison with any deterministic a l t e r n a t i v e . In the same work [15] he then shows that the number of function evaluations required by his method varies with the dimension approximately as 2n~"'", provided n i s less than about 5. This exponential dependence on n does 41 not indicate e f f i c i e n c y f o r high n. Furthermore, he concludes that "the e f f i c i e n c y of the simpler methods could stand improvement". E f f i c i e n c y i n t h i s context, therefore, means the method i s not so i n e f f i c i e n t as to render i t impracticable. Weinberger compares the e f f i c i e n c y of h i s multistage random search with two deterministic methods, the dichotomous method and a form of gradient method. The dichotomous method i s known to be i n e f f i c i e n t f o r multivariable problems, and the pa r t i c u l a r gradient method considered requires the evaluation of t h i r d derivatives, which again makes t h i s method quite i n e f f i c i e n t [41]. Current conjugate d i r e c t i o n methods require only f i r s t derivatives to be evaluated. This comparison therefore proves only that some random search methods are more e f f i c i e n t than some i n e f f i c i e n t deterministic methods. Schumer [38,39] has compared the ef f i c i e n c y of ASSRS to the ef f i c i e n c y of the Newton-Raphson method. When comparison was r e s t r i c t e d to problems with hyperspherical constant-Q surfaces, he found that f o r high dimensions and for l i m i t e d accuracy ASSRS i s the more e f f i c i e n t algorithm. The most e f f i c i e n t method i n that case, however, i s the gradient method. Following Rastrigin, he has also shown that fixed-step size random search i s more e f f i c i e n t than fixed step size gradient search, provided that consideration i s r e s t r i c t e d to the hypersphere and the step size i s chosen small enough. This r e s u l t , however, i s not very surprising nor p r a c t i c a l l y very s i g r i f i c a n t . For such w e l l scaled problems a gradient method would be designed to take the optimum step s i z e , i . e . , i t would converge after one li n e a r search i n the gradient d i r e c t i o n . Other authors [4,36] consider random methods i n e f f i c i e n t and 42 evidently also unimportant. They relegate such methods to i n c i d e n t a l aspects of the search, such as applying dither to gradient methods and exploration around an apparent optimum to determine whether or not i t i s indeed an optimum. S t i l l others concede the general i n e f f i c i e n c y of random methods, but f i n d that they have certain compensating advantages [22]. I t i s also the conclusion reached by t h i s author. In addition to the evidence of numerical experiments there i s a good p l a u s i b i l i t y argument to support t h i s point of view. I f the q u a l i t y function can be described by a r e l a t i v e l y simple mathematical expression i t means that a great deal i s known about the function. Methods that make e f f i c i e n t use of t h i s information may therefore be expected to be more e f f i c i e n t than those making very l i t t l e or no use of such information. Random methods become " e f f i c i e n t " only when the function i s so badly behaved that deterministic methods become very i n e f f i c i e n t or break down altogether. 43 Advantages of Random Search Methods. The main advantage of random methods i s that they are applicable to a larger class of functions than any other method. This i s due to the fact that they require a minimum of r e g u l a r i t y conditions on the q u a l i t y function, which may be discontinuous, multimodal, moderately noisy, slowly time-varying [ l 9 ] , or otherwise badly behaved. This makes possible the use of more r e a l i s t i c , hence more complicated, performance c r i t e r i a , such as Gall's max-ranklng matrix [ l 5 ] . Simultaneous random search makes possible the i n v e s t i g a t i o n of global properties of functions. For example, t h i s method may be used to locate regions i n parameter space which are l i k e l y to contain the global optimum and to provide some measure of the r e l a t i v e extent of these regions. Sometimes t h i s i s the most d i f f i c u l t part of the optimization problem, the l o c a l search being merely a refinement of the approximate resu l t given by the global method [ l 9 J . Another advantage of random methods i s t h e i r f l e x i b i l i t y . Random search methods may be designed for digital,, analogue or hybrid computers, depending upon the problem at hand. The search method may be r e l a t i v e l y complicated [22], as when a large computer i s available, or i t may be simple enough to be suitable for an inexpensive r e a l time optimizer such as described by Matyas. Various kinds of a p r i o r i knowledge about the q u a l i t y function may be incorporated into the algorithm. Thus, the d i s -t r i b u t i o n of t r i a l points may be chosen so that certain regions of parameter space are searched more int e n s i v e l y than others. In fact, 44 the very absence of any inherent structure of random methods allows the designer of the algorithm to structure i t according to the available hardware, the available a p r i o r i information, and the intended application of the. solution. Random methods also have a number of r e l a t i v e l y minor advantages. The boundaries of the search space generally need not be specified exactly, most random search methods are r e l a t i v e l y easy to program, and some methods can solve inequality constrained problems with l i t t l e more d i f f i c u l t y than unconstrained problems [22]. Suggestions f o r Further Research This study suggests that ARS i s the algorithm most amenable to further improvement. I t may be possible to establish guidelines for the choice of the c^ and d^ parameters according to the properties of the function to be minimized. The transformation matrix T may be computed i n such a way as to make more e f f i c i e n t use of the previous hist o r y of the search. Computation of T from the e l l i p s o i d of concentration of successful t r i a l points may be investigated i n t h i s connection. There i s considerable s i m i l a r i t y between ASSRS and gradient methods. Each method i s the most e f f i c i e n t i n i t s class when the constant-Q surfaces are hyperspherical, and both deteriorate ra p i d l y i n e f f i c i e n c y as these surfaces become progressively less spherical. A closer comparison between these methods may y i e l d additional insight into t h e i r convergence properties. The e f f i c i e n c i e s of these methods could be compared on the basis of test functions s i m i l a r to Test Function V. The relevant parameters to be varied are the dimension and the e c c e n t r i c i t y of hyperellipsoid. Such a comparison may show that ASSRS i s always less e f f i c i e n t than a good 45 p r a c t i c a l version of the gradient method when i t applies. " I t i s an exercise of the imagination to devise schemes for using e s s e n t i a l l y l o c a l information for speeding convergence to a l o c a l maximum"[19]• More comparisons are desirable f o r new or e x i s t i n g random search algorithms and t h e i r prospective competitors, both deterministic and random. Such comparisons can guide the user i n the choice of an algorithm f o r a given optimization problem. Though i t i s easy to design an algorithm that works, i t i s more d i f f i c u l t to design one that works e f f i c i e n t l y . Random search methods may be considered as strategies f o r using a machine to do by brute force what cannot be accomplished by the more economical a n a l y t i c a l methods, or methods derived from these. Because the c a p a b i l i t i e s of the machine are l i m i t e d as w e l l , analysis i s s t i l l required to devise e f f i c i e n t strategies. Various passages i n the l i t e r a t u r e [ l 9»34]> as well as p l a u s i b i l i t y arguments, suggest that p r i n c i p l e s and methods of other branches of science, such as Pattern Recognition and Information Theory, may be brought to bear on the problem of designing e f f i c i e n t random search algorithms. Such approaches may be p a r t i c u l a r l y useful i n connection with the d i f f i c u l t problems presented by performance surfaces which are m u l t i -modal or have steep, curving va l l e y s . In t h i s connection i t i s i n t e r e s t i n g to note that close analogies have been drawn between sequential random search and the theory of evolution [3, .5,12,19]. Each species may be thought to be a l o c a l optimum, each mutation Being a new t r i a l point, and the s u r v i v a l of l i f e being the performance index. Analogies have also been found between random search, knowledge processes and the evolution of s o c i a l i n s t i t u t i o n s [ 3 ,7 ] . Such analogies indicate that random search i s a natural method of solving problems with l i t t l e a p r i o r i knowledge about the qua l i t y function. One could possibly obtain useful ideas f o r the construction of random search algorithms by a study of these natural random search processes. 47 .5. CONCLUSION Among the three reconstructed algorithms compared in this study Kjellstrom's Random Search performed best provided that the number of variables was small. In the special case of spherical constant-Q surfaces ASSRS was most effic i e n t . ARS was neither noticeably superior nor i n -ferior i n any case, but i t has the greatest potential for further improve-ment. The ASSRS algorithm used here i s somewhat more efficient than the original because of a small change that has been introduced deliberately. It was verified that use of a truly uniform random vector distribution produced no appreciable improvement i n the efficiency of the method. A special algorithm was designed to generate these uniformly distributed random vectors. ASSRS and KRS were found to have certain linear convergence properties. A l l three algorithms were found to be immune to small amounts of noise. But i n relation to deterministic methods they were a l l rather ine f f i c i e n t . A general discussion of random search methods points out the variety of approaches that have been used i n the construction of such algorithms. The advantages of random search methods were discussed and suggestions were made for further research. The conclusion reached is that random search methods are more widely applicable than deterministic methods and more flexible to the problem at hand. As computing machines become progressively faster, and as more efficient algorithms are developed, i t may be expected that random search methods w i l l be more widely used, especially where regularity conditions are minimal. 48 Appendix Random Vector Generator This i s an algorithm for the generation of random vectors uniformly d i s t r i b u t e d over the surface of the unit n-sphere. Such vectors are assumed to be a v a i l a b l e i n the OSSRS algorithm upon which ASSRS i s based. However, i n ASSRS the components of th i s random vector are approximated by making each component a random v a r i a b l e uniform on the i n t e r n a l (-1,1) and then normalizing, which does not r e s u l t i n a uniform d i s t r i b u t i o n . The f i r s t step i s to derive the p r o b a b i l i t y density f u n c t i o n of the components r_^ of the random vector r . Because of the uniformity of the d i s t r i b u t i o n a l l n components have the same density function. The following d e r i v a t i o n proceeds by analogy to three dimensional geometry and i s simpler than that of R a s t r i g i n [37]. A ring-shaped element of surface area dS^ on a sphere of radius P 3 and at a distance r^ on the x^-axis i s given by the formula ds 3 = s ^ where S 2 = 2 ITP 2 P 2 = p 3 s i n (j) r3 = p3 C O S ^ S 2 i s inte r p r e t e d as the "surface area" of the two dimensional "sphere" which r e s u l t s from the i n t e r s e c t i o n of the sphere 3 2 2 Zx = p 1 and the plane X 3 = r3 49 In t h i s case the "surface area" i s a c t u a l l y the circumference of the c i r c l e of radius p^. Generalizing to n dimensions, a s i m i l a r element of surface area d S n on a hypersphere of radius p n and at a distance r n on the x n ~ a x i s i s defined by the formula dS - S , p d A n n-1 n v where S , i s again int e r p r e t e d as the surface area of the (n-1) -dimensional n-1 ° hypersphere which r e s u l t s from the i n t e r s e c t i o n of the hypersphere 2 i i n £ x_. = p1 and the hyperplane x = r n n The formula f o r the surface of an n-dimensional hypersphere i s given by [10] s . , 2 r n ( i / 2 ) f t . n - 1 n r (n/2) To complete the analogy one requires the d e f i n i t i o n s p n - l =Pn s i n *• r n = p n coscb In f i g u r e 16, which i l l u s t r a t e s the d e f i n i t i o n of these v a r i a b l e s , only two coordinate axes are shown. Any other axes, for n^3, w i l l be orthogonal to each other and to the plane of the page. If random points are uniformly d i s t r i b u t e d over the surface of t h i s hypersphere of radius then the p r o b a b i l i t y P (<!>) °f a n v point f a l l i n g i n the element of surface area dS i s p r e c i s e l y the r a t i o dS , i . e . , n n S n p (cb) dd> = dS n n S n S _ p deb n-1 n r S n x , - AXIS n-1 Figure 17 Generation of Random Numbers by a Refined Acceptance-Rejection Method. 51 therefore (1) Pn(<i>) = G n s i n 1 1 <j> 0 £<J>£TT , n>l where G = rin/2) n r d / 2 ) r ( — ^ This r e s u l t agrees with that of R e s t r i g i n . In order to conserve computer time f o r the implementation of the algorithm i t i s convenient at th i s point to change v a r i a b l e s from d i r e c t i o n angles to d i r e c t i o n cosines. Let X = cos <j> then dX = -sin<j>dcf> It i s desired to express p (X), the p r o b a b i l i t y density function of n d i r e c t i o n cosines, i n terms of Pn(<J>), the p r o b a b i l i t y density function of d i r e c t i o n angles. Reasoning from f i r s t p r i n c i p l e s one may set P n ( » |d<}>| = P n ( x ) j d x | This r e l a t i o n expresses the condition that the p r o b a b i l i t y of f i n d i n g a random d i r e c t i o n angle i n the i n f i n i t e s i m a l l y small i n t e r v a l (<j) ,<j>+d<j)) i s equal to the p r o b a b i l i t y of f i n d i n g the corresponding d i r e c t i o n cosine i n the corresponding i n t e r v a l (X, X + dX) , The absolute value signs are required because X decreases as <j) increases. Substituting f o r dX and so l v i n g for p (X), n p (X) = p 0)/sin<j> n n (2) P n ( x ) = G n ( l - X 2 ) ( n ~ 3 ) / 2 - U X * + - - 1 , n>l A more general and rigorous procedure f o r changing the v a r i a b l e s of p r o b a b i l i t y density functions i s given i n [ 1 0 ] . The problem now i s to generate random vectors r the components of which have density function p n ( X ) . A s i m i l a r analogy w i l l now be used i t e r a t i v e l y . When generating a uniformly d i s t r i b u t e d random vector i n three dimensions, r ^ i s chosen from a uniform d i s t r i b u t i o n on the i n t e r v a l (-1,1), which amounts to s e t t i n g n=3 i n formula (2). Then another point i s chosen on the circumference of that c i r c l e , with radius P 2 > which has been determined by the choice of r ^ . The d i s t r i b u t i o n of points on the circumference of t h i s c i r c l e i s again uniform. Choice of the second component amounts to a r e p e t i t i o n of the same process as choice of the t h i r d component r ^ , except that the dimension has been reduced by one and the radius of the sphere has been reduced by the 2 1/2 fa c t o r sincb = (1-X ) I t i s noted also that the repeated process i s c a r r i e d out i n the subspace again defined by the i n t e r s e c t i o n of the sphere of radius and the plane x ^ r ^ , j u s t as i n the f i r s t analogy. Choice of also determines r ^ from the requirement of unit length f or Generalizing to n dimensions, the algorithm for computing the n components r ^ i s as follows: 1) Set p = 1 n • 2) Set i = n 3) Choose X. from p. I l (X) A) Scale r. = p.X. l i i 5) Proj ect p. n = p. J i - l l ( i - x2 ) 1 / 2 l 6) Decrement i 7) If i > l go to step (3) 8) Set X = +1, with equal p r o b a b i l i t y 9) Set r x = p 1 X 1 In t h i s context X^ denotes a r e a l i z a t i o n of the random v a r i a b l e with p r o b a b i l i t y density function p^(X). 53 This procedure y i e l d s random vectors of u n i t length, independently of the functions p^(X), provided only that | ] $ 1 . I N I ^ r 2 1 2 2 2 = X +X \ (1-X )+ n n-1 n + x J ( l - X 2 , ) ( l - X 2 , ) . . . ( l - X 2 ) „ n-1 _ n = X + Z [XT n (1-X7)] J=l J 1=J+1 Taking into account the d e f i n i t i o n of X^ the second term may be telescoped 2 to 1-X , r e s u l t i n g i n n l l r l l - 1 . As t h i s procedure by analogy does not c o n s t i t u t e a proof, the question about whether or not r i s uniformly d i s t r i b u t e d on the hypersphere was v e r i f i e d experimentally. A histogram was constructed of the projections of these random vectors onto some unit reference vector. As the number of sample random vectors increases without l i m i t , t h i s histogram must tend to the above density function when correspondingly scaled. Experiments with several randomly chosen reference vectors i n 5,6 and 7-dimensional spaces indicated that the d i s t r i b u t i o n s were indeed uniform. A sample histogram i s shown i n f i g u r e 18, i n t h i s case the random vectors are projected onto one of the coordinate axes. During the implementation of t h i s algorithm the t h i r d step, choosing X_^ from p^(X), proved troublesome. Since p_^(X), cannot be integrated and solved f o r X the Inverse Method of random number generation could not be applied [1]. Instead, a r e f i n e d version of the Acceptance-Rejection method has been used. The area under the curve p^(X) was covered with non-overlapping rectangles, which i n turn were uniformly covered with randomly chosen points. Those points f a l l i n g under the curve were accepted and the rest were rejected XXX Data Points of Histogram Density Function p 7(X) X RXIS Figure 18 Sample Histogram, of 100 000 7-Dimensional Random Vectors Projected onto x^-Axis i_ri -P-[ f i g . 17]. For large n t h i s method i s more e f f i c i e n t than a simple Acceptance-Rejection method because P n(X) tends to the impulse function as n tends to i n f i n i t y . Also, f o r n$3 the whole procedure was abandoned i n favour of one of the simpler methods a v a i l a b l e f or three dimensional d i s t r i b u t i o n s , thus avoiding the problems posed by P2(X), which i s i n f i n i t e at +1. The algorithm i s somewhat les s e f f i c i e n t than the method des-cribed by Muller [30], which simply scales a random vector from a r a d i a l l y symmetric n-dimensional Gaussian d i s t r i b u t i o n for unit length. The algorithm was used i n t h i s work to generate random vectors i n up to 30 dimensions. No analysis of the propagation of truncation errors was c a r r i e d out, however. These may be s i g n i f i c a n t for high n. The r e l a t i v e l y simple geometry of the hypersphere suggests that i t may be pos s i b l e to s i m p l i f y t h i s algorithm. 56 REFERENCES 1. M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions, Dover Publications, Inc. (1965). 2. G. A. Bekey, M. H. Gran, A. E. Sabroff and A. Wong, "Parameter Op-timizat i o n by Random Search Using Hybrid Computer Techniques", Proc. AFIPS 1966 F a l l Joint Computer Cong., v o l . 29, pp. 191-200. 3. H. J. Bremermann, M. Rogson and S. Salaff, "Search by Evolution", Proc. Second Cybernetic Sciences Symposium: Biophysics and Cybernetic Systems, pp. 157-167 (1965). 4. N. S. Bromberg, "Maximization and Minimization of Complicated Multivariable Functions", AIEE Trans. Com. and E l e c , v o l . 58, pp. 725-730 (1962). 5. S. H. Brooks, "Discussion of Random Search Methods for Locating Surface Maxima", Op. Res., v o l . 6, pp. 244-251 ( l 9 5 8 ) . 6. "A Comparison of Maximum Seeking Methods", Op. Res., v o l . 7, pp. 430-457 (1959). 7. D. T. Campbell, "Blind Variation and Selective Survival as a General Strategy i n Knowledge Processes"-, Self-Organizing Systems (M.G. Yovits and S. Cameron, eds.), Pergamon Press, I960. 8. V. K. Chichinadze, "Random Search to Determine the Extremum of Functions of Several Variables", Eng. Cybernetics, 1967, pp. 115-123. 9. L. D. Cockrell, "On Adaptive Random Search Techniques", IEEE Proc. Seventh Symp. Adaptive Processes> UCLA, 1968. 10. H. Cramer, Mathematical Methods of S t a t i s t i c s , Princeton Univ. Press, 1945. 11. B.M.E. De S i l v a , "The Application of Nonlinear Programming to the Automated 57 Minimum Weight Design of Rotating Discs", Optimization (R. Fletcher, ed.), Academic Press Inc., 1969« 12. R. E. Favreau and R. Franks, "Random Optimization by Analog Techniques", Proc. Second Int. Analog Computation Meeting, Presses Academiques Europeenes, 1959-13. R. Fletcher and M. J. D. Powell, "A Rapidly Convergent Descent Method for Minimization", Comp. J., v o l . 6, pp. 163-168 (1963). 14. M. M. Flood and A. Leon, "A Universal Adaptive Code for Optimization", Recent Advances i n Optimization Techniques (A. Lavi and T. P. Vogl, eds.), Wiley and Sons, Inc., 1966. 15. D. A. G a l l , "A P r a c t i c a l Approach to the Optimal Control of Systems with Applications to the Control of a Submarine i n a Random Sea", Ph. D. Thesis, MIT, 1964. 16. L. S. Gurin and L. A. Rastrigin, "Convergence of the Random Search Method i n the Presence of Noise", Automation and Remote Control, v o l . 26, pp. 1505-1511 (1965). 17. R. Hooke and J. A. Jeeves, "Comments on Brooks' Discussion of Random Search Methods", Op. Res., v o l . 6, pp. 881-882 ( 1958) . 18. J. M. Idelsohn, "10 Ways to. Find the Optimum", Control Eng., v o l . 11, pp. 97-102 (1964). 19. D. C. Karnopp, "Search Theory Applied to Parameter Scan Optimization Problems", Ph.D. Thesis, MIT, 1961. 2 0 . "Random Search Techniques for Optimization Problems", Automatica, v o l . 1, pp. 111-121 (1963). 21 . W. P. Kavanaugh, E. C. Stewart and D. H. Brocker, "Optimal Control of S a t e l l i t e A l t i t u d e Acquisition by a Random Search Algorithm on Hybrid Computers", Proc. AFIPS 1968, pp. 443-452. 58 22. G. K j e l l s t r o m , "Network Optimization by Random V a r i a t i o n of Component Values", Ericson Technics, no. 3, 1969. 23. G.A. Korn and H. Kosako, "A Proposed Hybrid-Computer Method for Functional Optimization", IEEE Trans. Computers, Feb. 1970. 24. G.P. Lawson and A. L a v i , "Monte Carlo Programming: Random Searching i n the Solution of Parameter Optimization Problems',' 1970. 25. A. Leon, "A C l a s s i f i e d Bibliography on Optimization", Recent Advances i n Optimization Techniques (A. Vavi and T.P. Vogl. eds.), 1966. 26. J . Matyas, "Random Optimization", Automation and Remote Control, v o l . 26, pp. 244-251 (1967). 27. "Das z u f a e l l i g e Optimierungsverfahren und seine Konvergenz", F i f t h Intern. Analogue Computation Meetings, Proceedings,' v o l . 1, pp. 540-544 (1967). 28. J.M. Mendel and K.S. Fu (eds.), Adaptive, Learning and Pattern Recognition Systems (chapter 7), Mathematics i n Science and Engineering, v o l . 66, Acedemic Press, 1970. 29. J.M. Mendel and J . J . Zapalac, "The A p p l i c a t i o n of Techniques of A r t i f i c i a l I n t e l l i g e n c e to Control System Design", Advances i n Control Systems, v o l . 6, Academic Press, 1968. 30. M.E. Muller, "A Note on a Method for Generating Points Uniformly on n-Dimensional Spheres", Comm. ACM, v o l . 2, pp. 19-20 (1959). 31. J.K. Munson and A.I. Rubin, "Optimization by Random Search on the Analog Computer", IRE Trans. EC-8, pp. 200-203 (1959). 32. B.A. Murtagh and R.W.H. Sargent, "Computational Experience with Qu a d r a t i c a l l y Convergent Minimization Methods", Comp. J . , v o l . 43, no. 2 (1970). 33. J.A. Nelder and R. Mead, "A Simplex Method"for Function Minimization", 59 Comp. J . v o l . 7, pp. 308-312 (1965). 34. Y.I. Neymark and R.G. Strongin, "Information Approach to Finding the Extremum of Functions", Eng. Cybernetics, 1966, no. 1. 35. A.F. Pensa, "Gradient Biased Random Search", Ph.D. Thesis, Pennsyl-vania State U n i v e r s i t y , 1969. 36. D.A. P i e r r e , Optimization Theory with A p p l i c a t i o n s , Wiley and Sons, 1969. 37. L.A. R a s t r i g i n , "The Convergence of the Random Search Method i n the Extremal Control of a Many Parameter System", Automation and Remote Control, v o l . 24, pp. 1337-1342 (1963). 38. M.A. Schumer, "Optimization by Adaptive Random Search", Ph.D. Thesis, U n i v e r s i t y of Princeton, 1968. 39. M.A. Schumer and K. S t e i g l i t z , "Adaptive Step Size Random Search", IEEE Trans. AC-13, pp. 270-276 (1968). 40. E'.M. Vaysbord, "Asymptotic Estimates of the Rate of Convergence of Random Search", Eng. Cybernetics, 1967, no. 4. 41. M.R. Weinberger, "Multistage Random Search and Automatic Network Synthesis", Ph.D. Thesis, MIT, 1966. 42. D.J. Wilde, Optimum Seeking Methods, Prentice H a l l , 1964.
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Comparison of three random search methods
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Comparison of three random search methods Borowski, Nick 1971
pdf
Notice for Google Chrome users:
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
Page Metadata
Item Metadata
Title | Comparison of three random search methods |
Creator |
Borowski, Nick |
Publisher | University of British Columbia |
Date Issued | 1971 |
Description | Three recent random search algorithms are compared on the basis of efficiency, and on the basis of sensitivity to noise, scaling and the number of variables. A general discussion of random search methods points out their advantages and disadvantages in relation to deterministic methods. A new random vector generator is described in the appendix. |
Subject |
Computer programming Algorithms |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2011-05-03 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0093288 |
URI | http://hdl.handle.net/2429/34218 |
Degree |
Master of Applied Science - MASc |
Program |
Electrical and Computer Engineering |
Affiliation |
Applied Science, Faculty of Electrical and Computer Engineering, Department of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-UBC_1971_A7 B67.pdf [ 2.66MB ]
- Metadata
- JSON: 831-1.0093288.json
- JSON-LD: 831-1.0093288-ld.json
- RDF/XML (Pretty): 831-1.0093288-rdf.xml
- RDF/JSON: 831-1.0093288-rdf.json
- Turtle: 831-1.0093288-turtle.txt
- N-Triples: 831-1.0093288-rdf-ntriples.txt
- Original Record: 831-1.0093288-source.json
- Full Text
- 831-1.0093288-fulltext.txt
- Citation
- 831-1.0093288.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
data-media="{[{embed.selectedMedia}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.831.1-0093288/manifest