UBC Theses and Dissertations

Optimized Relative Step Size Random Search. Choit, Mark David (1973)

Full Text

OPTIMIZED RELATIVE STEP SIZE RANDOM SEARCH

by

MARK DAVID CHOIT
B.Sc., University of British Columbia, 1970

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in the Department of Electrical Engineering

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
May 1973

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Electrical Engineering, The University of British Columbia, Vancouver 8, Canada

ABSTRACT

A theoretical technique for the minimization of a function by a random search is presented. The search is a modification of the Optimum Step Size Random Search of Schumer and Steiglitz to include reversals. A theory for updating the step size is presented, upon which an implementation of a search algorithm suitable for high-dimensional functions, with no requirement for derivative evaluations, is based.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF ILLUSTRATIONS
LIST OF TABLES
ACKNOWLEDGEMENTS
CHAPTER I  INTRODUCTION
  1.1 The Optimization Problem
  1.2 Scope and Aim of Thesis
CHAPTER II  UPDATING THE STEP SIZE
  2.1 Introduction
  2.2 Statement of the Problem
  2.3 Previous Work
  2.4 Updating the Step Size
CHAPTER III  INCREASING THE PROBABILITY OF SUCCESS BY USING REVERSALS
  3.1 Introduction
  3.2 The Effect of Using Reversals
  3.3 The Search Algorithm
  3.4 The Initial Estimation Phase
  3.5 The Search Phase
  3.6 Re-estimation of η
  3.7 Results of Experiments
CHAPTER IV  THE UNNORMALIZED RANDOM VECTOR GENERATOR
  4.1 Introduction
  4.2 Definition of s and η for the unnormalized vector generator
  4.3 Application of the ORSSRSWR to functions other than the hypersphere
  4.4 Comparison of the ORSSRSWR to other searches
CONCLUSIONS
  Suggestions for Future Research
APPENDIX A
APPENDIX B
REFERENCES

LIST OF ILLUSTRATIONS

Figure 2.1      A cross section of parameter space
Figure 2.2      A cross section of parameter space showing the range where ρ_{k+1} can fall
Figure 2.3 (a,b)  Average value of η after a number of moves
Figure 3.1 (a,b)  Probability of success without and with reversals
Figure 3.2 (a,b)  Expected improvement per function evaluation for the searches without and with reversals
Figure 3.3 (a,b)  Average value of η after a number of moves when reversals are used
Figure 3.4      Flow chart of the ORSSRSWR
Figure 3.5      Performance of the ORSSRSWR on hyperspheres of various dimensions
Figure 3.6      Comparison of the ORSSRSWR with the optimum rate of convergence for dimension 20
Figure 3.7      Number of function evaluations required for a reduction of f(X) by a factor of 10^10 as a function of dimension
Figure 3.8 (a,b)  Probability density function for η_{k+1} for given η_k, for dimensions 10 and 20
Figure 4.1      Performance of the ORSSRSWR with the unnormalized random vector generator
Figure 4.2      Performance of the ORSSRSWR for some functions of dimension 20
Figure 4.3      Comparison of the ORSSRSWR to the ASSRS

LIST OF TABLES

Table 2.1  Optimum η, the expected value of the next η, and the optimum updating parameter
Table 3.1  Comparison of the parameters for the searches, without and with reversals
Table 3.2  Table of values for optimum η*, the expected value of the next η, and the updating parameter a*, for the ORSSRSWR
Table 4.1  Comparison of the parameters for the searches using the unnormalized random vector generator, without and with reversals

ACKNOWLEDGEMENT

The author wishes to thank Dr. G.F. Schrack for his interest, encouragement and many valuable suggestions throughout the research and writing of this thesis. Thanks are also due to Miss Norma Duggan for typing the thesis. The financial support of the National Research Council of Canada in the form of a Research Assistantship is gratefully acknowledged.

CHAPTER I  INTRODUCTION

1.1 The optimization problem

The problem considered in this thesis is the optimization of a certain class of functions by a random search. The particular problem considered here is the minimization of a nonlinear function f(X) with respect to X, where X is an N-dimensional parameter vector. Such problems may be encountered in various areas of engineering. In control problems, for instance, f(X) may be a functional and X = (x_1, ..., x_N) the coefficients of the optimal feedback control function u(t) = x_1 y_1(t) + ... + x_N y_N(t), where Y = (y_1(t), ..., y_N(t)) is the state vector; or X may be the initial costate vector in optimal control. The problem considered throughout the thesis is unconstrained optimization; however, constraints can be handled with the use of a penalty function [10] or, in the case of equality constraints, by direct elimination of some variables.

The methods used to optimize a function may be deterministic or random. In deterministic searches the direction of the search is obtained from some predetermined formula. Most popular deterministic searches use derivatives to obtain the direction of search; these may require only the first derivatives, as in the cases of steepest descent [5, 15], conjugate gradient [4], or variable metric methods [3]. Some searches, such as Newton-Raphson [10], use the second derivatives as well as the first to determine the direction of search. If one counts the evaluation of derivatives as function evaluations, then searches that use first derivatives need N+1 function evaluations per move, and searches that use second derivatives need (N+1)(N+2)/2 function evaluations per move, taking the symmetry of the matrix of second derivatives into account. Thus for high dimensions many function evaluations are required before a move can be made.

In random searches the direction of search is not entirely predetermined but is chosen from some random distribution. There may be various degrees of randomness and determinism in the direction of search. The searches of Matyas [6], Wozny and Heydt [16], and Pensa and McMurtry [9] obtain the direction of the search from a combination of the previous direction plus a random direction.
The search of Schumer and Steiglitz [12] (ASSRS) obtains the direction of search entirely from a random vector but uses some degree of determinism in the magnitude of the random vector.

1.2 Scope and Aim of Thesis

The aim of the thesis is to improve and implement the theoretical search of Schumer and Steiglitz. An algorithm will be developed and its rate of convergence to the optimum discussed. A method of making an efficient use of the random vectors generated will be shown. Throughout the development of the algorithm two basic requirements had to be met: the search should not use any derivatives, and the search should not be difficult to use for high-dimensional problems. It will be shown in the thesis that the number of function evaluations required to move a certain distance towards the optimum increases linearly as a function of the dimension. For the purpose of obtaining a mathematical analysis, the function optimized is the hypersphere of dimension N. However, the algorithm developed is tested on other functions and a comparison of the performance is made.

CHAPTER II  UPDATING THE STEP SIZE

2.1 Introduction

In this chapter, the theory of Schumer and Steiglitz [12], which is based upon the work of Rastrigin [11] and of Mutseniyeks and Rastrigin [8], is presented. A modification is made to the definition of the expected improvement normalized by the present value of the function, to make it applicable to a more general class of functions. A theory for updating the step size is presented and the results of some simulation experiments are shown.

2.2 Statement of the Problem

The problem considered in this thesis is the unconstrained optimization of a unimodal function with N parameters. The algorithm and the analysis are based on a particular function, namely the hypersphere of dimension N, but it will be shown later that the algorithm can be extended to other functions. The algorithm for the minimization of

    f(X) = ρ² = Σ_{i=1}^{N} x_i²    (2.1)

can be summarized by the following steps:

i) Given a function f, with X an N-vector and a starting point X_0.
ii) Find the "optimum" step size s*.
iii) Choose a random vector ΔX, a unit vector uniformly distributed on the surface of a hypersphere of dimension N.
iv) Test the trial point

    X^t = X_0 + s* ΔX    (2.2)

If f(X^t) < f(X_0), then the trial is called a success, a move X_0 → X^t is made and step ii) is repeated; else step iii) is repeated.

The procedure is continued until some stopping condition is met. The preceding algorithm, named "Optimum Step Size Random Search" (OSSRS) by Schumer and Steiglitz, was analyzed by them as to the expected improvement per function evaluation and the probability of success per function evaluation. However, it was never implemented because it requires the knowledge of the value of the optimum step size, s*, at each stage. It is the purpose of this thesis to extend the analysis of the OSSRS and to give an implementation.

2.3 Previous Work

Consider Figure 2.1 (due to [11]). The probability of success is defined to be the probability that the vector ΔX lies in the range where f(X + sΔX) < f(X).

[Figure 2.1  A cross section of parameter space.]
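As a concrete illustration of steps i)-iv) of Section 2.2, the following is a minimal Python sketch (not code from the thesis; NumPy is assumed, the routine names are invented, and the optimum step size s* is simply supplied as a constant, which is exactly the knowledge the remainder of the thesis seeks to estimate).

```python
import numpy as np

def random_unit_vector(n, rng):
    """Unit vector uniformly distributed on the n-dimensional hypersphere
    (Muller's method [7]: normalize a standard Gaussian vector)."""
    v = rng.standard_normal(n)
    return v / np.linalg.norm(v)

def ossrs(f, x0, step, max_evals=2000, rng=None):
    """Basic Optimum Step Size Random Search, steps i)-iv) of Section 2.2.
    `step` plays the role of s* and is assumed known in this sketch."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x0, dtype=float)
    fx, evals = f(x), 1
    while evals < max_evals:
        dx = random_unit_vector(len(x), rng)   # step iii): random direction
        xt = x + step * dx                     # step iv): trial point (2.2)
        ft = f(xt)
        evals += 1
        if ft < fx:                            # success: accept the move
            x, fx = xt, ft
    return x, fx

# Example: hypersphere of dimension 10, started at distance sqrt(10) from the
# minimum with step size eta* * rho, eta* ~ 0.38 for N = 10.  With a fixed
# step the search stalls once rho drops below the step size, which is what
# the step-size updating of Section 2.4 addresses.
if __name__ == "__main__":
    f = lambda x: float(np.dot(x, x))
    x, fx = ossrs(f, np.ones(10), step=0.38118 * np.sqrt(10.0))
    print(fx)
```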
The probability density for φ, where φ ∈ [0, π], is given by [11] as

    p(φ) = sin^{N−2} φ / (2 a(N))    (2.3)

where

    a(N) = ∫_0^{π/2} sin^{N−2} φ dφ    (2.4)

and the relative step size η is defined by [12] as

    η = s / ρ    (2.5)

The probability of success will be denoted P(N,η) and is given by [12] as

    P(N,η) = (1 / (2 a(N))) ∫_0^{φ_0(η)} sin^{N−2} φ dφ    (2.6)

Schumer and Steiglitz also define I(N,η), the expected improvement per function evaluation normalized by the present value of the function, and E[Δρ²], the expectation of Δρ² over all successful steps. These are given by

    E[Δρ²] = (1 / P(N,η)) ∫_0^{φ_0(η)} Δρ²(φ) p(φ) dφ    (2.7)

where Δρ²(φ) is obtained from Figure 2.1:

    Δρ²(φ) = 2 s ρ cos φ − s²   if |φ| < φ_0
    Δρ²(φ) = 0                  if |φ| ≥ φ_0    (2.8)

Define now E_a[Δρ²] as the expectation of Δρ² over all steps; then the relationship between E_a[Δρ²] and E[Δρ²] becomes

    E_a[Δρ²] = P(N,η) E[Δρ²]    (2.9)

and I(N,η) is redefined as

    I(N,η) = E_a[Δρ²] / ρ²    (2.10)

rather than as defined by [12]. This definition is more general, as it applies to any function of f(ρ) and not just to the hypersphere with its minimum at X = 0. Another useful concept introduced by [12] is the "optimum step size" s*, which is that value of s* = η*ρ for which η* satisfies the equation

    I'(N,η*) = 0    (2.11)

The computations of P(N,η), I(N,η) and η* are given in Appendix A.

2.4 Updating the Step Size

Various methods have been proposed for updating the step size in random searches. In the random searches of Matyas [6], Schumer and Steiglitz's [12] "Adaptive Step Size Random Search", and Beltrami and Indusi [1], the step size is increased after a successful move. For the OSSRS, however, it is desirable that η be kept as close to η* as possible, since this gives the highest improvement per function evaluation. This suggests that as ρ decreases so must s, in order that η may remain close to η*. The objective then is to find an updating parameter a where

    s_{k+1} = a s_k    (2.12)

with k = 0, 1, ... the iteration counter, such that the expectation of η_{k+1} equals η_k.

The parameter a can be found in the following way. Consider Figure 2.2 and assume that a successful move occurred. Let P(ρ) be the probability that ρ_{k+1} ≤ ρ for ρ ≤ ρ_k. Then from Figure 2.2 and equation (2.3)

    P(ρ) = [ (1/(2a(N))) ∫_0^{φ'} sin^{N−2} φ dφ ] / [ (1/(2a(N))) ∫_0^{φ_0} sin^{N−2} φ dφ ]    (2.13)

[Figure 2.2  A cross section of parameter space showing the range where ρ_{k+1} can fall.]

Now from triangle ABC in Figure 2.2

    cos φ' = (s² + ρ_k² − ρ²) / (2 ρ_k s)    (2.14)

and defining β by

    β = (s² + ρ_k² − ρ²) / (ρ_k s)    (2.15)

the equation for φ' becomes

    φ' = cos^{−1}(β / 2)    (2.16)

whereupon (2.13) reduces to

    P(ρ) = (1 / (2 a(N) P(N,η_k))) ∫_0^{φ'} sin^{N−2} φ dφ    (2.17)

The probability density for ρ_{k+1} given ρ_k is found in the following way. Differentiating (2.17) with respect to ρ and evaluating it at ρ = ρ_{k+1} yields the probability density

    p(ρ_{k+1} | ρ_k) = ρ_{k+1} { 1 − (1/4) ((s_k² + ρ_k² − ρ_{k+1}²)/(ρ_k s_k))² }^{(N−3)/2} / (2 a(N) P(N,η_k) ρ_k s_k)    (2.18)

where ρ_{k+1} ∈ [ρ_k(1 − η_k), ρ_k] and η_k = s_k / ρ_k.
Now define g(η_{k+1} | η_k) as the probability density for η_{k+1} given η_k, where η_{k+1} = s_k / ρ_{k+1}. From (2.18) the expression for the probability density is then obtained as

    g(η_{k+1} | η_k) = η_k { 1 − (1/4) (η_k + 1/η_k − η_k/η_{k+1}²)² }^{(N−3)/2} / (2 a(N) P(N,η_k) η_{k+1}³)    (2.19)

where η_{k+1} ∈ [η_k, η_k/(1 − η_k)] if η_k < 2. The expectation of η_{k+1} is given by

    E[η_{k+1} | η_k] = ∫_{η_k}^{η_k/(1−η_k)} η g(η | η_k) dη    (2.20)

Requiring now that

    η_k = s_{k+1} / E[ρ_{k+1} | ρ_k]    (2.22)

the updating parameter a becomes

    a = η_k / E[η_{k+1} | η_k]    (2.23)

Assuming that η_k = η*, the optimum updating parameter a* = a(η*) is obtained. Table 2.1 shows the optimum values of η* and a* and the values of E[η_{k+1} | η*] for various dimensions. Note that after a successful move the step size should be decreased.

In order to test whether a* was a reasonable updating parameter, the following simulation experiments were performed:

1) Given an initial η, the step size s = ηρ was calculated.
2) Random vectors were generated until a successful move was made; then the new value of η (after updating the step size) was computed.
3) The procedure was continued until at some stage η became approximately equal to 1 or 20 successful moves had been made.

Figures 2.3 a, b show the average value of η vs. successful move number for dimensions 10 and 20 respectively. The important point to note about Figure 2.3 is that, on the average, η will not change if the search is started with the optimal η* and updated with a*. This was found to hold for a larger number of successful moves as the dimension increased. Rather than attempt to implement an algorithm at this point, a further improvement of the search will be proposed and analyzed in Chapter III. This will increase the expected improvement per function evaluation as well as reduce the number of random vectors which need to be generated.

    Dimension    η*        E(η_{k+1} | η*)    a*
    3            .66667    1.0                .66667
    4            .58687    .76569             .76646
    5            .52982    .64519             .82119
    10           .38118    .41510             .91827
    20           .27168    .28273             .96089
    50           .17260    .17527             .98474
    100          .12223    .12316             .99243

    Table 2.1  Optimum η*, the expected value of the next η, and the optimum updating parameter a*.

[Figures 2.3 (a,b)  Average value of η after a number of moves (0 to 20): (a) dimension 10, a = .9182754; (b) dimension 20, a = .9609014; each curve is the average of 50 runs.]

CHAPTER III  INCREASING THE PROBABILITY OF SUCCESS BY USING REVERSALS

3.1 Introduction

In this chapter an improvement will be proposed to the previously discussed search. The improvement consists in the reversal of the direction of search after an unsuccessful try. The method will be called Optimized Relative Step Size Random Search With Reversals (ORSSRSWR), and an analysis will be made of the probability of success and the expected improvement per function evaluation. An algorithm will be proposed and a method of implementing it will be discussed.

3.2 The Effect of Using Reversals

In the random search of Beltrami and Indusi [1] it was proposed that if the trial point X^t = X + sΔX fails to yield an improvement, then the trial point X^t = X − sΔX should be tried, rather than generating a new vector ΔX.
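A small Python sketch of one such move (the trial plus, on failure, its reversal) is given below; it is only an illustration (the helper name is invented, NumPy is assumed), and the contraction of the step size after a success follows equation (2.12) with a* taken, for example, from Table 2.1.

```python
import numpy as np

def move_with_reversal(f, x, fx, s, a_star, rng):
    """One move of a random search with reversals: try x + dx, and only if
    that fails try the reversal x - dx with the same random vector.  After a
    success the step size is contracted by the updating parameter a* of
    equation (2.12).  Returns (x, f(x), s, function evaluations used)."""
    dx = rng.standard_normal(len(x))
    dx *= s / np.linalg.norm(dx)          # random vector of length s
    evals = 0
    for sign in (+1.0, -1.0):             # -1.0 is the reversal
        xt = x + sign * dx
        ft = f(xt)
        evals += 1
        if ft < fx:                       # success: accept and shrink s
            return xt, ft, a_star * s, evals
    return x, fx, s, evals                # both directions failed

# Example usage on the 20-dimensional hypersphere, a* from Table 2.1.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f = lambda x: float(np.dot(x, x))
    x = np.ones(20)
    fx, s = f(x), 0.27168 * np.linalg.norm(x)
    for _ in range(200):
        x, fx, s, _ = move_with_reversal(f, x, fx, s, 0.96089, rng)
    print(fx)
```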
Such a scheme will be called a random search with reversals. The OSSRS, modified to include reversals as in [1], will now be analyzed again. For the search with reversals three corresponding parameters will be defined, P_r(N,η), I_r(N,η) and η_r*; these are completely analogous to P(N,η), I(N,η) and η* respectively in the search without reversals.

The probability of improvement P_r(N,η) will be derived first. Let n be the number of random vectors generated, and let s_1 and s_2 be the number of successes on the first and second (after a reversal) trial respectively. The number of reversals is given by n − s_1, so that the ratio of the total number of successes to function evaluations becomes

    (s_1 + s_2) / (n + (n − s_1))    (3.1)

The probability of success is given by the limit as n → ∞ of (3.1):

    P_r(N,η) = lim_{n→∞} (s_1 + s_2) / (2n − s_1)    (3.2)

Define the relative frequency of the i-th success by

    f_i = s_i / n,   i = 1, 2    (3.3)

Since a random vector is equally likely to fall on the left or on the right half of a hypersphere (where left and right are defined as the two semi-hyperspheres formed by an (N−1)-dimensional hyperplane containing the origin and normal at the origin to the gradient vector),

    lim_{n→∞} f_1 = lim_{n→∞} f_2    (3.4)

From Chapter II

    P(N,η) = lim_{n→∞} s_1/n = lim_{n→∞} f_1    (3.5)

Dividing each term in (3.2) by n and taking the limit as n → ∞, together with the substitutions (3.4) and (3.5), the expression for P_r(N,η) reduces to

    P_r(N,η) = 2 P(N,η) / (2 − P(N,η))    (3.6)

Hence the use of reversals increases the probability of success per function evaluation.

The expected improvement per function evaluation is found as follows. Substitution of (2.9) into (2.10) results in

    I(N,η) = P(N,η) E[Δρ²] / ρ²    (3.7)

Since E[Δρ²] is the expectation over the successful steps only and does not depend on whether reversals are used or not, the expected improvement per function evaluation for the search with reversals is given by

    I_r(N,η) = P_r(N,η) E[Δρ²] / ρ²    (3.8)

Substituting (2.9) and (3.6) into (3.8) gives

    I_r(N,η) = 2 I(N,η) / (2 − P(N,η))    (3.9)

Hence the use of reversals also increases the expected improvement per function evaluation. Differentiating equation (3.9) with respect to η and setting the result equal to zero yields the equation for the optimum relative step size with reversals, η_r*:

    I'(N,η_r*) (2 − P(N,η_r*)) + I(N,η_r*) P'(N,η_r*) = 0    (3.10)

from which values of η_r* for given N can be calculated numerically. Figures 3.1 a, b show the probability of success per function evaluation and Figures 3.2 a, b show the expected improvement per function evaluation for the searches without and with reversals respectively. Table 3.1 compares the probability of success and the expected improvement per function evaluation for the two searches. The probability density for η_{k+1}, g(η_{k+1} | η_k), is the same as before, since it is the density of η after a successful move and does not depend on whether reversals were used or not. The updating parameter a(η) will be calculated as before, but a_r* will now be defined as a(η_r*). Table 3.2 shows a_r*, η_r* and E[η_{k+1} | η_r*] for various dimensions.
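The quantities above are straightforward to evaluate numerically. The following sketch (not from the thesis; SciPy is assumed and the function names are invented) integrates (2.6) and the improvement integral directly, forms P_r and I_r from (3.6) and (3.9), and locates η* and η_r* by maximizing I and I_r; for dimensions 10 and 20 it should roughly reproduce the corresponding rows of Table 3.1 below.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

def a_const(N):
    """a(N) = integral of sin^(N-2)(phi) over [0, pi/2], equation (2.4)."""
    return quad(lambda p: np.sin(p) ** (N - 2), 0.0, np.pi / 2)[0]

def P(N, eta):
    """Probability of success per function evaluation, equation (2.6)."""
    phi0 = np.arccos(eta / 2.0)
    return quad(lambda p: np.sin(p) ** (N - 2), 0.0, phi0)[0] / (2.0 * a_const(N))

def I(N, eta):
    """Expected improvement per function evaluation, equations (2.7)-(2.10);
    the integrand 2*eta*cos(phi) - eta**2 is Delta(rho^2)/rho^2 from (2.8)."""
    phi0 = np.arccos(eta / 2.0)
    val = quad(lambda p: (2.0 * eta * np.cos(p) - eta ** 2) * np.sin(p) ** (N - 2),
               0.0, phi0)[0]
    return val / (2.0 * a_const(N))

def P_r(N, eta):
    """Probability of success with reversals, equation (3.6)."""
    p = P(N, eta)
    return 2.0 * p / (2.0 - p)

def I_r(N, eta):
    """Expected improvement with reversals, equation (3.9)."""
    return 2.0 * I(N, eta) / (2.0 - P(N, eta))

if __name__ == "__main__":
    for N in (10, 20):
        eta_opt = minimize_scalar(lambda e: -I(N, e),
                                  bounds=(1e-4, 1.99), method="bounded").x
        eta_opt_r = minimize_scalar(lambda e: -I_r(N, e),
                                    bounds=(1e-4, 1.99), method="bounded").x
        print(N, round(eta_opt, 5), round(I(N, eta_opt), 5),
              round(eta_opt_r, 5), round(I_r(N, eta_opt_r), 5))
```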
[Figures 3.1 (a,b)  Probability of success without and with reversals, for various dimensions.]

[Figures 3.2 (a,b)  Expected improvement per function evaluation for the searches without and with reversals.]

    Dimension    η*        P(N,η*)    I(N,η*)    η_r*      P_r(N,η_r*)    I_r(N,η_r*)
    2            .78347    .37101     .23065     .74895    .46582         .28378
    3            .66667    .33333     .14815     .62347    .41565         .17836
    4            .58687    .31591     .10880     .54477    .39343         .12971
    5            .52982    .30596     .08589     .48969    .38100         .10183
    10           .38118    .28723     .04174     .34938    .35810         .04898
    20           .27168    .27857     .02056     .24802    .34760         .02401
    50           .17260    .27354     .00815     .15720    .34159         .00949
    100          .12223    .27189     .00406     .11124    .33963         .00473

    Table 3.1  Comparison of the parameters for the searches, without and with reversals.

    Dimension    η_r*      E[η_{k+1} | η_r*]    a_r*
    3            .62347    .90587               .68826
    4            .54477    .69685               .78177
    5            .48969    .58788               .83299
    10           .34938    .37821               .92377
    20           .24802    .25740               .96356
    50           .15720    .15947               .98580
    100          .11124    .11203               .99293

    Table 3.2  Values of the optimum η_r*, the expected value of the next η, and the optimum updating parameter a_r*, for the ORSSRSWR.

3.3 The Search Algorithm

So far the search analyzed is only of theoretical value, since the search depends on knowing the relative step size η and consequently the distance to the optimum. In order that a realizable algorithm may be implemented, a way of estimating η must be found. This estimation must occur not only initially but throughout the duration of the search, while the step size s and the distance to the optimum ρ are continually changing. Therefore a distinction will be made between two phases in the search, an initial estimation phase and the search phase.

3.4 The Initial Estimation Phase

For the case where the function is a hypersphere, an initial estimate of η could be obtained by estimating the value of I_r(N,η) at the starting point and solving for η. However, this procedure is not applicable to other functions and another approach is required. A more general method of estimating η can be obtained from the probability of success: P_r(N,η) can be estimated from the relative frequency, i.e. the ratio of successes to the total number of function evaluations. An initial estimate of η can be obtained by first guessing the step size, then generating n random vectors and calculating the relative frequency. Once an initial estimate of η has been obtained, the step size is updated and s is replaced by the best estimate ŝ of s*. The best estimate of the step size can be obtained in the following way:

1) Solve equation (3.11) for η:

    P_r(N,η) = R    (3.11)

where R is the ratio of successes to the number of function evaluations.

2) Assume that the η obtained is a good estimate of the actual η; then

    s = η ρ    (3.12)

But an estimate of the optimum step size is required, i.e.

    ŝ = ρ η_r*    (3.13)

Substituting (3.12) into (3.13), the equation for the best estimate of the starting step size becomes

    ŝ = (η_r* / η) s    (3.14)

From Figure 3.1b it can be seen that equation (3.11) will have a solution only if 0 ≤ R ≤ 2/3. This may not always be so, therefore a modification has to be made. Let the ratio R' be the ratio of the total number of successes to the total number of vectors generated.
The probability of success per vector generated is 2P(N,η), or twice the probability of success per function evaluation without reversals, because for each vector generated both directions are tried; hence it is equivalent to generating random vectors on the surface of a semi-hypersphere only. The alternative estimation of η is then to solve for η the equation

    2 P(N,η) = R'    (3.15)

rather than (3.11). This equation will always have a solution since 0 ≤ R' ≤ 1. The above procedure for estimating η is suitable only for the initial estimate, since no progress can be made while it is being used.

3.5 The Search Phase

Once an estimate of the initial η is known, the step size will be reset such that the new η is as close to η_r* as possible. Subsequently, after each successful move the step size is updated by

    s_{k+1} = a_r* s_k    (3.16)

Figure 3.3 shows that if η_0 = η_r*, then on the average η_k will remain equal to η_r* for some number of successes k. However, for individual runs there may be deviations from this value; hence after some number of moves η_k must be re-estimated and the step size reset such that η_k ≈ η_r*. The preceding algorithm will be called "Optimized Relative Step Size Random Search with Reversals" (ORSSRSWR).

[Figures 3.3 (a,b)  Average value of η after a number of moves (0 to 20) when reversals are used, for dimensions 10 and 20.]

3.6 Re-estimation of η

From Figure 3.3 it can be seen that if η ≈ η_r*, then on the average η changes approximately linearly as a function of the number of successes. Let M denote the success number since the last estimation of η. Then η(M) can be approximated by

    η(M) = η_0 + γM    (3.17)

where η_0 and γ are constants. Furthermore, from Figure 3.1 it can be seen that for η ≈ η_r*, P(N,η) is also approximately linear; thus the probability of success P can be expressed as

    P(M) = P_0 − δ η(M)    (3.18)

where P_0 and δ are constants. Let M_t be the number of successes between re-estimations of η. After M_t successes have occurred, η is re-estimated as follows. Let η̂ be the new estimate obtained from (3.17):

    η̂ = η(M_t) = η_0 + γ M_t    (3.19)

Let r = R'/2, where R' is the previously defined ratio of successes to the number of vectors generated. Assume that r = P̄_t, where

    P̄_t = (1/M_t) Σ_{M=1}^{M_t} P(M)    (3.20)

is the average probability of success between two estimations, and where P(M) is obtained from substitution of (3.17) into (3.18). Then the following equation is obtained:

    r = P_0 − δ η_0 − δγ (M_t + 1)/2    (3.21)

For large M_t, the approximation M_t + 1 ≈ M_t can be made. Equation (3.21) substituted into (3.19) gives an estimate of η after M_t successful moves as

    η̂ = η_0 + γ M_t = (2/δ)[P_0 − r] − η_0    (3.22)

At this point, if η_0 were known, then η̂, the estimate of η after M_t successful moves, could be found. However, η_0 will normally not be known with sufficient accuracy to permit the type of estimation just discussed; therefore another method will be tried. The method of re-estimating η was derived both heuristically and experimentally.
What is needed is some function η̂ of r with the following properties:

1) when r = .5, then η_0 was approximately equal to zero, so that η̂ ≈ 0;
2) when r = P(N,η_r*), then η̂ = η_r*, because η_0 must have been close to η_r*;
3) when r = 0, then no successes have occurred because η_0 was ≥ 2; hence it will be assumed that η̂ = 2.

The function chosen which satisfies the previous three conditions is

    η̂(r) = η_r* (.5 − r) / (.5 − P(N,η_r*))    if r ≥ P(N,η_r*)
    η̂(r) = a_1 r² + a_2 r + 2                  if r < P(N,η_r*)    (3.23)

where

    a_1 = (η_r* − 2 + 4 P(N,η_r*)) / (2 P(N,η_r*)² (P(N,η_r*) − .5))

and

    a_2 = (η_r* − 2)/P(N,η_r*) − a_1 P(N,η_r*)

so that the two branches join with matching value and slope at r = P(N,η_r*). Formula (3.23) was found to give a reasonably good estimate of η. It was also found that the ORSSRSWR was not critically dependent on having a very close estimate of η; it would work with even poorer estimates, although with a slower rate of convergence.

3.7 Results of Experiments

Figure 3.4 shows the implemented version of the ORSSRSWR that was used for the experiments in this chapter. Throughout the experiments the parameters listed have the following values:

    STARTS (the number of initial successes before η could be estimated) = 20
    NMOVE (the number of moves made between re-estimations of η) = 20
    ε (smallest value of the step size s before stopping) = 10^…
    MAXNFE (maximum number of function evaluations before stopping) = 2000
    MAXRVG (maximum number of random vectors that may be generated unless a success is found) = 25

All experiments were performed 10 times and the results plotted are the averages of the 10 search runs. Figure 3.5 shows the values of log_10(av(f(X))) versus the number of function evaluations (NFE) for hyperspheres of dimensions 10, 12, 15, 17, 20, 22, and 25. The horizontal line at log_10(av(f(X))) = −4 indicates the number of function evaluations necessary for reducing f(X) by a factor of 10^10.

The rate of convergence of the ORSSRSWR will now be discussed. [12] gave the expected value of f(X_{i+1}) given f(X_i) as

    E[f(X_{i+1})] = f(X_i) [1 − I(N,η)]    (3.24)

Assuming that at each point η = η_r*, after M moves are made

    E[f(X_{i+M})] = f(X_i) [1 − I_r(N,η_r*)]^M    (3.25)

Defining the rate of convergence of the ORSSRSWR as Δ[log_10(f(X_i))]/ΔM, the equation for the optimum rate of convergence when reversals are used becomes

    Δ[log_10(f(X_i))] / ΔM = log_10(1 − I_r(N,η_r*))    (3.26)

[Figure 3.4  Flow chart of the ORSSRSWR: initialization (MOVES = STARTS, PHASE = .TRUE., NRVG = 0); generation of a random vector ΔX and test of the trial point X^t = X + sΔX and its reversal; a restart with a rescaled step size when the step size is found to be too large; exits when NFE = MAXNFE or when s < ε (step size too small); after MOVES successes, computation of r = MOVES/(2 NRVG), estimation of η̂ from the linear or quadratic branch of (3.23), and resetting of s; after the initial estimation, PHASE = .FALSE. and MOVES = NMOVE.]

[Figure 3.5  Performance of the ORSSRSWR on hyperspheres of various dimensions.]

All the results shown in Figure 3.5 were obtained by starting the optimization procedure with the optimum starting step size s*. In order to test how the ORSSRSWR behaves when started with a poor starting point, a run with s = s*·10^{−4} was tried. Figure 3.6 compares the runs with the optimum starting point and the poor starting point to the optimum rate of convergence of equation (3.26), for dimension equal to 20.
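For readers who prefer code to a flow chart, the following is a much-simplified Python sketch of the search phase of the ORSSRSWR. It is not the program implemented in the thesis: the names are invented, NumPy is assumed, the initial estimation phase, the MAXRVG cap and the restart branch of Figure 3.4 are omitted, and the constants for dimension 20 are taken from Tables 3.1 and 3.2, with P(N,η_r*) recovered from P_r(N,η_r*) via (3.6).

```python
import numpy as np

# Constants for dimension N = 20: eta_r* and a_r* from Table 3.2, and
# P(N, eta_r*) recovered from P_r(N, eta_r*) = .34760 of Table 3.1 via
# equation (3.6): P = 2*P_r / (2 + P_r).
N, ETA_R, A_R, P_ETA_R = 20, 0.24802, 0.96356, 0.29613

def eta_estimate(r):
    """Heuristic estimate (3.23) of the relative step size from the success
    ratio r = successes / (2 * vectors generated).  The quadratic branch is
    pinned to eta = 2 at r = 0 and joins the linear branch with matching
    value and slope at r = P(N, eta_r*)."""
    if r >= P_ETA_R:
        return ETA_R * (0.5 - r) / (0.5 - P_ETA_R)
    a1 = (ETA_R + 4.0 * P_ETA_R - 2.0) / (2.0 * P_ETA_R ** 2 * (P_ETA_R - 0.5))
    a2 = (ETA_R - 2.0) / P_ETA_R - a1 * P_ETA_R
    return a1 * r * r + a2 * r + 2.0

def orssrswr(f, x0, s0, nmove=20, max_evals=2000, rng=None):
    """Search phase of the ORSSRSWR (cf. Figure 3.4), much simplified:
    reversals, step contraction by a_r* after each success (3.16), and
    re-estimation of eta followed by a reset of s (3.14) every `nmove`
    successes.  The initial estimation phase is replaced by the given s0."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x0, dtype=float)
    fx, s, evals = f(x), float(s0), 1
    successes, vectors = 0, 0
    while evals < max_evals:
        dx = rng.standard_normal(N)
        dx *= s / np.linalg.norm(dx)
        vectors += 1
        for sign in (+1.0, -1.0):                 # reversal on failure
            xt = x + sign * dx
            ft = f(xt)
            evals += 1
            if ft < fx:
                x, fx, s = xt, ft, A_R * s        # accept move, shrink step
                successes += 1
                break
        if successes == nmove:                    # re-estimate eta, reset s
            r = successes / (2.0 * vectors)
            s *= ETA_R / max(eta_estimate(r), 1e-6)
            successes, vectors = 0, 0
    return x, fx

if __name__ == "__main__":
    sphere = lambda x: float(np.dot(x, x))
    x0 = np.full(N, 1.0)
    x, fx = orssrswr(sphere, x0, s0=ETA_R * np.linalg.norm(x0))
    print(fx)
```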
A very important feature of any search algorithm is the relationship between the number of function evaluations required to reach the optimum and the number of dimensions. For the ORSSRSWR it was found that this relationship is linear. The relationship between the number of function evaluations and the dimension, when the optimum rate of convergence is used, will now be derived. Picking Δlog_10(f(X)) as −10, i.e. a reduction of f(X) by a factor of 10^10, and solving (3.25) for M when Δlog_10(f(X)) = −10 gives

    M = −10 / log_10(1 − I_r(N,η_r*))    (3.27)

The values of NFE at the intersection of the dashed line in Figure 3.5, along with equation (3.27) with M = NFE, are plotted as a function of the dimension N in Figure 3.7.

[Figure 3.6  Comparison of the ORSSRSWR with the optimum rate of convergence for dimension 20 (log_10(av(f(X))) vs. NFE).]

[Figure 3.7  Number of function evaluations (400 to 2000) required for a reduction of f(X) by a factor of 10^10, as a function of dimension N (10 to 25), for the ORSSRSWR and for the optimum rate of convergence.]

The linear dependence of the number of function evaluations for a reduction of the function by a certain factor on the dimension of the parameter space, combined with no requirement for the evaluation of derivatives, makes the ORSSRSWR a particularly attractive algorithm for optimizing functions of high dimensions.

Another interesting feature of the ORSSRSWR is its increasing stability with increasing dimension. By stability is meant that η remains close to η_r* and does not wander towards η = 0 or η = 2, where progress is very slow. Looking at Figures 3.8 (a,b), which show the probability density function for η_{k+1} when η_k = η_r* for dimensions 10 and 20, it becomes apparent that as the dimension increases the deviation of η_{k+1} from η_r* decreases.

[Figures 3.8 (a,b)  Probability density function for η_{k+1} given η_k = η_r*, for dimensions 10 and 20.]

CHAPTER IV  THE UNNORMALIZED RANDOM VECTOR GENERATOR

4.1 Introduction

In this chapter an additional improvement to the ORSSRSWR will be proposed. This improvement consists of saving time in the generation of random vectors. The random vectors uniform on the surface of a hypersphere of radius s were generated using Muller's [7] method. In this method N random numbers from a Gaussian distribution with zero mean and unit standard deviation are chosen to form a vector. The vector is then normalized to form a vector of radius s on the surface of a hypersphere. In order to save time in the generation of random vectors, the normalization will be eliminated and the previous algorithm (ORSSRSWR) will be analyzed with this modified vector generator.

4.2 Definition of s and η for the unnormalized vector generator

Let

    r = (ξ_1, ..., ξ_N)    (4.1)

be an N-vector where the ξ_i are Gaussian random variables with zero mean and unit standard deviation. Then

    E[||r||²] = N    (4.2)

(see Appendix B). Now let

    ξ_i' = (s/√N) ξ_i,   i = 1, ..., N    (4.3)

and r' = (s/√N) r. Thus from (4.2)

    E[||r'||²] = s²    (4.4)

Therefore, rather than generating ξ and then normalizing the vector r so that ||r|| = s, we shall generate the vector r' and no normalization will be required. As can be shown (see Appendix B), for large dimensions
Using equations (4.4) we s h a l l define s as 1 s £ (Etllr 'H 2 ]} 2 (4.5) g and as before n = — but now however, s i s not the stepsize, the step P s i z e i s | | r ' | | = r g , and a success can occur f o r a l l values of s, therefore nefO, 0 0). The r a d i a l p r o b a b i l i t y density function i s given by Nr 2 P r ( rs> a ^ e" 1? <4'6> s and i s derived i n Appendix B. The angular p r o b a b i l i t y density function i s the same as before (equation (2.3)), however successful steps occur £ ° r * e[0, cos" 1 ( ^ ) ] (4.7) and r g e[0,2p]. ' 4 ' 7 ) Since <j> and r g are s t a t i s t i c a l l y independent, the j o i n t p r o b a b i l i t y den-s i t y function i s the product of (2.3) and (4.6) and i s given by Nr 2 g p ( r ,*) a / _ 1 e~ (7~2 ) s i / " 2 * . (4.8) o o Z S The p r o b a b i l i t y of success and the expected improvement per function evaluation f o r the normal d i s t r i b u t i o n w i l l be denoted by P n(N,n) and l n(N,n) r e s p e c t i v e l y . From (4.8) and the l i m i t s i n (4.7) we obtain M 2 s -1 r N-2 2p N-1 7C t cos ( — ) s i n <t>d<J>dr _ 2p s J o Jc r » N-1 ~ , f" . N-2 A J A / r s e 2s 2 d r s J Sin *d* p n(N,n) = ^ 2 ° (4.9) S , TT o o r g Upon making the s u b s t i t u t i o n x = — (4.10) s 33 and evaluating the denominator of (4.9) together with s u b s t i t u t i o n of equation (2.6), the expression reduces to P n(N, n) = — ~ ( y ) 2 J n x N 1 e 2 x P(N,xn) dX. (4.11) r(y) 1 ° Using equation (2.8) but with s replaced by r g and (4.8) repla c i n g p(<J>) i n equation (2.7) together with s u b s t i t u t i o n (4.10) we obtain the expected improvement per function evaluation l n(N,n) as N 2 2 * ± n M 7 t M x N 2 2 2 T N C M - 2 ,N.2 r n / n N — , x n N , I (N,n) - -IT" (j) { a ( N ) ( N _ 1 } J x e 2 (1 - — ) d, rc-j; o 2 r - N+l - f x 2 (4.12) - n / n x e 2 P(N,xn) dx} ''o Equations (4.11) and (4.12) were integrated numerically and the r e s u l t s were very s i m i l a r to those i n Figures 3.1(a,b) and 3.2.(a,b) . In s i m i l a r fashion as i n chapter II we can use reversals and define the p r o b a b i l i t y of success with reversals and the expected im-provement per function evaluation with reversals as p n ( N > n ) = 2 p n ( ^ ) ( 4 > 1 3 ) r 2-P n(N, n) and l > , n ) = (4.14) 2-P n(N,n) re s p e c t i v e l y . Table 4.1 shows some of the values of equation (4.11) to (4.14) f o r the previously defined optimum values of n. Comparing the values i n Table 4.1 to the values i n Table 3.1 i t becomes apparent that f or high dimen-sions i t does not matter which random vector generator i s used. Further proof of t h i s can be found i n Appendix B. Dimension * n P N(N, N*) i i nU„*) r r I > . 0 r r 2 .78847 ; .38321 ! i .18798 .74895 .48407 .23365 3 .66667 .34645 1 .13038 .62347 .43367 .15827 4 .58687 .32763 .09923 .54477 .40902 .11906 5 .52982 .31625 .07994 .48969 .39443 .09527 • 10 .38118 .29334 .04036 .34938 .36574 .04747 20 ..27168 .28182 .02023 .24802 .35123 .02181 Table 4.1 Comparison of the parameters for the searches using the unnormalized random vector generator, without and with reversals 4> 35 The previously implemented ORSSRSWR (shown i n Figure 3.4), but with the new random vector generator was t r i e d f o r dimension 15, 20, 25. Figure 4.1 shows the values of L0G^Q(av(f(X))) versus the number of function evaluations. Comparing Figure 3.5 i t can be seen that there i s l i t t l e d ifference between the performance of ORSSRSWR f o r the two random vector generators for high dimensions. 
The unnormalized vector generator does save some computing time. 4.3 A p p l i c a t i o n of the ORSSRSWR to functions other than the Hypersphere In the de r i v a t i o n of the ORSSRSWR no use was made of the func-t i o n value other than to determine i f the current function value i s smaller than the previous. Thus the step s i z e updating and the estimation of n are independent of the value of f ( X ) . Consider the function h ( f ( p ) ) where p i s defined by (2.1), then i f h i s a monotone:increasing function for p 5 0 optimization of f(p) since i f Thus the ORSSRSWR i s not only i n s e n s i t i v e to s c a l i n g , but i t i s also i n s e n s i t i v e to any transformation h ( f ( p ) ) where h i s a monotone in c r e a s i n g function. This i s so because the transformation does not change the shape of the contours of the function but only the values along them. The ORSSRSWR was also t r i e d on the following functions of dimension 20 with the unnormalized vector generator. 1) The h y p e r e l l i p s o i d of [2] f ( P l ) > f ( p 2 ) = > h ( f ( P l ) ) a ( f ( P l ) ) . (4.15) f(X) = .1 x^ + I i=2 x. l 2 (4.16) 2) and the q u a r t i c of [12] f(X) = 1 x* (4.17) i = l Figure 4.1 Performance of the ORSSRSWR with the unnormalized random vector generator 37 These r e s u l t s along with those f o r the hypersphere are shown i n Figure 4.2. 4.4 Comparison of the ORSSRSWR to other searches. White and Day [1 4 ] made an improvement on the ASSRS of Schumer and S t e i g l i t z , This improvement consists of f i n d i n g b e t t e r parameters f o r the ASSRS. As i n the Schumer and S t e i g l i t z search, the parameters are found experimentally, and no claim i s made that they are the optimum parameters. White and Day compared the ASSRS with t h e i r parameters to that of Schumer and S t e i g l i t z for the hypersphere of various dimensions. The experiment was performed under the following conditions: the elements of the s t a r t i n g vector X q are chosen such that ||x|| = 1, i . e . such that _g f(X) = 1, and the stopping condition was f(X) < 10 In the present study the experiment was done f or the ORSSRSWR using the normalized random vector generator. In figu r e 4.3 the number of function evaluations NFE required to meet the stopping conditions are pl o t t e d as a function of the dimension. The r e s u l t s given by White and Day are also p l o t t e d f o r comparison.- As can be seen from figure 4.3, the r e s u l t s obtained from ORSSRSWR are be t t e r than e i t h e r of the previous methods. CO CO Figure 4.2 Performance of the ORSSRSWR f o r some functions of dimension 20 39 Figure 4.3 Comparison of the ORSSRSWR to the ASSRS 40 CONCLUSIONS The algorithm developed i n t h i s thesis has been found to require N 2 fewer function evaluations to minimize the function f(X) = Y x. i = l of Schumer and S t e i g l i t z or the one with the White and Day modifications. The algorithm w i l l also work well f o r any function with s p h e r i c a l contours such as the Schumer and S t e i g l i t z q u a r t i c or any monotone increasing function of f ( p ) . For a function with narrow contours such as the h y p e r e l l i p s o i d of [ 1 ] , i n i t i a l l y the performance was comparable to that of the hypersphere but tapered o f f a f t e r a while. This happened because the step s i z e became smaller than the optimum step s i z e . 
In general such problems can be avoided by periodically restarting the search procedure from the initial estimation phase.

The use of reversals and the use of the unnormalized random vector generator have reduced both the number of function evaluations and the running time of the search. The number of function evaluations for a minimization of the value of the hypersphere by a certain factor was found to be a linear function of the dimension of the space. This suggests that the ORSSRSWR can be an efficient algorithm for optimizing functions of high dimensions. The requirement for no derivatives makes this search very easy to implement and to use.

Suggestions for future research

The ORSSRSWR could be improved by finding a better method of estimating the relative step size. But, as Figure 3.7 shows, for high dimensions this improvement would not be that significant, since the implemented ORSSRSWR's performance is close to that of the theoretical upper bound. In the search algorithm presented, information gathered from the successes and failures of the various directions of search contributed only to the estimation of one parameter, the step size. If that information could be used to partially guide the next direction of search, a more efficient search algorithm would result. Then, instead of the search direction being purely random, it could be biased by a scheme such as that of Matyas, or the search could be restricted to the surface of an N-dimensional hypercone as in the search of Wozny and Heydt [16]. The probability of improvement and the expected improvement per function evaluation could be found in a similar manner as in the ORSSRSWR, and the step size updating theory of the ORSSRSWR could be used for determining the magnitude of the random vector.

The optimum relative step size was found using the assumption that the function being optimized is a hypersphere. For other functions it is very probable that a different value of the relative step size is the optimum value. Thus during a search, different values of the relative step size with their corresponding values of the updating parameter could be tried. Then the value of the relative step size which produces the greatest improvement could be taken as the value of the optimum relative step size for the particular function. The search would then become a truly adaptive one.

APPENDIX A

A.1 Computation of P(N,η), I(N,η), and η*

Formulas for the computation of P(N,η), I(N,η), and η* will now be derived. Equation (2.6) is integrated by parts, giving

    P(N+2,η) = P(N,η) − η (1 − η²/4)^{(N−1)/2} / (4(N−1) a(N))    (A.1)

ΔP(N,η) is defined as

    ΔP(N,η) = P(N+2,η) − P(N,η)    (A.2)

Substituting (A.1) into (A.2) gives

    ΔP(N,η) = −η (1 − η²/4)^{(N−1)/2} / (4(N−1) a(N))    (A.3)

and for N+2

    ΔP(N+2,η) = −η (1 − η²/4)^{(N+1)/2} / (4(N+1) a(N+2))    (A.4)

Substitution of (A.3) into (A.4) yields another recurrence relation for ΔP(N,η):

    ΔP(N+2,η) = ΔP(N,η) (N/(N+1)) (1 − η²/4)    (A.5)

Together with the recurrence relations (A.1) and (A.5) and the following initial values, P(N,η) can be computed easily:

1) for N even

    P(2,η) = (1/π) cos^{−1}(η/2)    (A.6)

    ΔP(2,η) = −(η/(2π)) √(1 − η²/4)    (A.7)

2) for N odd

    P(3,η) = (1/2) [1 − η/2]    (A.8)

    ΔP(3,η) = −(η/8) [1 − η²/4]    (A.9)

Formulas for the expected improvement per function evaluation, I(N,η), will now be derived. I(N,η) was given by [12] as

    I(N,η) = (1/(2 a(N))) ∫_0^{φ_0(η)} (2η cos φ − η²) sin^{N−2} φ dφ    (A.10)

The first part of the integrand is integrated directly and the second part is just η² P(N,η), thus:

    I(N,η) = (η/((N−1) a(N))) (1 − η²/4)^{(N−1)/2} − η² P(N,η)    (A.11)

Now equation (2.6) is differentiated with respect to η, so that

    P'(N,η) = −(1/(4 a(N))) (1 − η²/4)^{(N−3)/2}    (A.12)

Substitution of (A.12) into (A.11) gives the equation for I(N,η) in terms of P(N,η) as

    I(N,η) = −(η(4 − η²)/(N−1)) P'(N,η) − η² P(N,η)    (A.13)

which can be computed by the recurrence relations (A.1), (A.5)-(A.9) and the evaluation of P'(N,η) from formula (A.12). Now, in order to find η*, the equation I'(N,η*) = 0 has to be solved for η*. Differentiation of (A.13) with respect to η, the use of (A.12) to express P''(N,η) in terms of P'(N,η), and setting I'(N,η*) = 0 gives the equation for η* as

    I(N,η*) = η* (1 − η*²/4)^{(N−1)/2} / (2(N−1) a(N))    (A.14)

Using equation (A.13) to eliminate I(N,η*) in (A.14) and simplifying gives

    P(N,η*) = (1 − η*²/4)^{(N−1)/2} / (2(N−1) a(N) η*)    (A.15)

This equation is easily solved using Newton's method.

APPENDIX B

B.1 Derivation of the probability density function, expectation and variance of a vector with N random Gaussian components

Let r be the vector defined in (4.1). By the definition of r, each component ξ_i of r has the probability density function

    p(ξ_i) = (1/√(2π)) e^{−ξ_i²/2}    (B.1)

Let the norm of r for dimension N be

    r_N ≡ ||r|| = { Σ_{i=1}^{N} ξ_i² }^{1/2}    (B.2)

Since the elements ξ_i are statistically independent, the joint probability density function becomes

    p(ξ_1, ..., ξ_N) = Π_{i=1}^{N} p(ξ_i) = (1/√(2π))^N e^{−r_N²/2}    (B.3)

It is required to find the probability that r_N lies between r_N and r_N + dr_N. From (B.3),

    p(r_N) dr_N = (1/√(2π))^N e^{−r_N²/2} dξ_1 ... dξ_N    (B.4)

where

    dξ_1 ... dξ_N    (B.5)

is a volume element of an N-sphere. The volume of an N-sphere is given by [13] as

    V = π^{N/2} r^N / Γ(N/2 + 1)    (B.6)

and the volume element is obtained from (B.6) by differentiation:

    dV = 2 π^{N/2} r^{N−1} dr / Γ(N/2)    (B.7)

Substituting (B.7) into (B.4) we obtain

    p(r_N) ∝ r_N^{N−1} e^{−r_N²/2}    (B.8)

The expectation of r_N is given by

    E[r_N] = ∫_0^{∞} r_N p(r_N) dr_N / ∫_0^{∞} p(r_N) dr_N    (B.9)

Integration of (B.9) by parts yields the following recurrence relation for E[r_N]:

    E[r_N] = (N−1) / E[r_{N−1}]    (B.10)

with E[r_2] = √(π/2). The expectation of r_N² is given by

    E[r_N²] = ∫_0^{∞} r_N^{N+1} e^{−r_N²/2} dr_N / ∫_0^{∞} r_N^{N−1} e^{−r_N²/2} dr_N    (B.11)

which reduces to

    E[r_N²] = N    (B.12)
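A quick Monte Carlo check of (B.12), and of the bound √(N−1) ≤ E[r_N] ≤ √N proved in Section B.2 below (a sketch only; NumPy is assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
for N in (3, 10, 20, 100):
    r = np.linalg.norm(rng.standard_normal((100_000, N)), axis=1)
    # (B.12): E[r_N^2] = N;  (B.13): sqrt(N-1) <= E[r_N] <= sqrt(N).
    print(N, round(float(np.mean(r ** 2)), 2),
          round(np.sqrt(N - 1), 3), round(float(np.mean(r)), 3),
          round(np.sqrt(N), 3))
```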
(A.2) Su b s t i t u t i n g (A.l) in t o (A.2) gives 2 N ~ 1 A P ( N , n ) = - 2 N + 1 ( ^ ) 2 n ( l - \ ) 2 (A.3) and f o r N+2 2 P ( N + 2 ' n ) " " N-fs'^NS 2 n ( 1 " 4 " > - ( A ' 4 ) 2 N + \ r ( ^ f ) ) A * S u b s t i t u t i o n of (A.3) i n t o (A.4) y i e l d s another recurrence r e l a t i o n f o r AP(N,ri) N 2 AP(N+2,n) = AP(N,n) ^ (1 - j~) (A.5) Together with recurrence r e l a t i o n s (A.l) and (A.5) and the following i n i t i a l values P(N,n) can be computed e a s i l y : 1) f or N even P(2,n) = £ c o s " 1 (h (A.6) 4P(2,tl) - nVl " 7- (A.7) 43 2) f o r N odd P(3,n) = | t l - f l (A-8) -, 2 AP(3,n) = ± [1 - f-] (A.9) Formulas f or the expected improvement per function evaluation, I(N,n), w i l l now be derived. I(N ,n) was given by [12] as [ *° fi A 2 • 2 , \ j i J (2ri cos? - n sxn <j>)d<}> i ( N , n ) - - 2 2T(N) : ( A ' 1 0 ) the f i r s t part of the numerator i s integrated and the second part i s 2 ju s t P(N , n ) r i , thus: 2 — K N , n ) = Ti_\1)nn ^ " \) 2 " ^2 p(N>^ <A-n> 2 « j [ r ( r i _ i ) ] / ( N _ 1 ) Now equation (2.6) i s d i f f e r e n t i a t e d with respect to n so that • 2 — P ' ( N ' n ) = ~ 4 a W ( 1 2 ' ( A ' 1 2 ) S u b s t i t u t i o n of (A.12) i n t o (A.11) gives the equation f o r I(N,n) i n terms of P(N,n) as 2 K N , n ) = - ( 4 _ ~ n } P'(N,n) - n 2 p(N,n) (A.13) which can be computed by the recurrence r e l a t i o n s ( A . l ) , (A.5)-(A.9) and the evaluation of P'(N,n) from formula (A.12). Now i n order to f i n d n the equation I (N,n ) = 0 has to be solved f o r n . D i f f e r e n t i a t i o n of (A.13) with respect to n, and the use of (A.12) to express P"(N , r i ) i n terms of P'(N , r i ) gives the equation f o r * n as N-1 * *. 2 I ( N > "*> = 2(N-l)a(N) a - 1 ^ ) 2 (A.14) 44 Using equation (A.13) to eliminate I(N,n ) i n (A.14) and s i m p l i f y i n g g i V 6 S * 0 N-1 P(N,n*) = jf (1 - V") 2 • (A'15> (N-l)2a(N)n This equation i s e a s i l y solved using Newton's method. 45 APPENDIX B B . l Derivation of the p r o b a b i l i t y density function, expectation and variance of a vector with N random Gaussian components. Let r be the vector defined i n (4.1). By the d e f i n i t i o n of r each component of r ., has the p r o b a b i l i t y density function , v P U . ) = — e 2 , (B.l) Let the norm of r for dimension N be r N ^ | | r | | ='{? K ± 2 } 2 (3.2) 1=1 . Since the elements ^ are s t a t i s t i c a l l y independent the j o i n t p r o b a b i l i t y density function becomes N P ( KV , 5 N ) = n p ( q ) (B.3) i = l 2 1 ,N - • ( — ) e 2 . I t i s required to f i n d the p r o b a b i l i t y that r„ l i e s between r„ and r„T+dr . N N N N From (B.3) 2 i N - i . p ( r N ) d r N = C-^) e 2 d . . . , d ^ (B.4) /2TT where d ? r ... , d f ^ (B.5) i s a volume element of an N-sphere. The volume of an N-sphere i s given by [13] as N 2 r * n 2 46 and the volume element i s obtained from (B.6) by d i f f e r e n t i a t i o n N 2r n dV = — ~ dr (B.7) r(|) S u b s t i t u t i n g (B.7) into (B.4) we obtain 2 N-l p ( r N ) a r N x e 2 . (B.8) The expectation of r ^ i s given by oo / ' rN p ( rN> d r N o  E [ r N ] = - ^ — (B.9) d r N Integration of (B.9) by parts y i e l d s the following recurrence r e l a t i o n f o r E [ r N ] E< rN ] = E l r T T ] ( B ' 1 0 ) N-l with E [ r 2 ] = f . 2 The expectation of r i s given by 2 N+l - rN . , - rN 6 T d r N - 3 ( B - n ) 6 - 2 ~ d r N which reduces to E [ r 2 ] = N. 
(B.12) ^o f N-l 4 7 B.2 Deviation of r„, from the expected value 1 2 2 It w i l l now be shown that f o r large N {E[r^]] * E [ r ^ ] , and consequently the norm of the vector r, (defined i n chapter IV) w i l l be close to s, the step s i z e . To show that f o r N £ 2 >¥ } E[r ] >, y ^ I : (B.13) For any random v a r i a b l e E [ ( r N - E [ r N ] ) 2 ] >. 0. (B.14) 2 Using equation (B.12) to eliminate E [r^] from i n e q u a l i t y (B.14) we obtain the two i n e q u a l i t i e s a n d . (B.15) v ^ T * E [ r N _ i ] ' D i v i d i n g both sides of equation (B.10) by we obtain Using the second part of i n e q u a l i t y (B.15) we obtain the i n e q u a l i t y N-l (B.16) E< rN-l ] * 1. (B.17) Su b s t i t u t i o n of (B.16) i n t o (B.17) gives E [ r N ] ^ x (B.18) or Et rN-' * Thus the second part of (B.18) combined with f i r s t part of (B.15) completes the proof i . e . & z E[r„] >, /N-l Q.E.D. 48 If we now use the vector r' defined i n chapter IV, we have E [ | | r ' | | 2 ] = s 2 and from (B.13) s % E[||r'||] * s . (B.19) Thus for high dimensions the step s i z e f o r the vector generated without the normalization, w i l l be close to the step s i z e of the normalized vec-t o r . 49 REFERENCES 1. E.J. Beltrami and J.P. Indusi, "An Adaptive Ranodm Search Algorithm f o r Constrained Minimization", I.E.E.E. Transactions on Computers, C-21, September 1972, 1004-1008. 2. N. Borowski, "A Comparison of Three Search Methods", M.A.Sc. Thesis, U.B.C, 1971. 3. W.C. Davidon, "Variable Metric Method f o r Minimization", A.E.C. Re-search and Development Rep. ANL-5990 (Rev.) 1959. 4. R. Fletcher and CM. Reeves, "Function Minimization by Conjugate Gradients", Computer Journal 7, A p r i l 1964, 149-154. 5. J . Kowalik and M.R. Osborne, Methods for Unconstrained Optimization Problems, E l s e v i e r Publishing Co. Inc., New York, 1968 , 34-37. 6. J . Matyas, "Random Optimization", Automation and Remote Control, 26, February 1965, 244-251. 7. M.E. Muller, "A Note on a Method for Generating Points Uniformly on N-Dimensional Spheres", Comm. ACM 2, A p r i l 1959, 19-20. 8. V.A. Mutseniyeks and L.A. R a s t r i g i n , "Extremal Control of Continuous "Multi-Parameter Systems by the Method of Random Search", Engrg. Cy-ber n e t i c s , 2, January-February 1964, 82-90. 9. A.F. Pensa and G.J. McMurtry, "Gradient Biased Random Search", Pro-ceedings Systems Science and Cybernetics Conference, Pittsburgh, October 14-16 1970, 171-173. 10. D.A. P i e r r e , Optimization Theory with Applications John Wiley and Sons Inc., 1969. 11. L.A. R a s t r i g i n , "The Convergence of the Random Search Method i n the Extremal Control of a Many Parameter System", Automation and Remote Control, 24, A p r i l 1964, 1337-1342. 12. M.A. Schumer and K. S t e i g l i t z , "Adaptive Step Size Random Search", I.E.E.E. Transactions on Automatic Control, AC-13, June 1968, 270-276. 13. D.M.Y. Sommerville, An Introduction to the Geometry of N Dimensions, Dover P u b l i c a t i o n s , Inc., New York, 1958, 135-137. 14. L.J. White and R.G. Day, "An Evaluation of Adaptive Step-Size Random Search", I.E.E.E. Transactions on Automatic Control AC-16, October 1971, 475-478. 15. D.J. Wilde and C S . Be i g h t l e r , Foundations of Optimization, P r e n t i c e -H a l l , Inc., Englewood C l i f f s , N.J., 1967, 288-290. 50 M.J. Wozny and G.T. Heydt, "Hyperconical Random Search", Journal Dynamic Systems, Measurement, and Control, 94G, March 1972, 71-78 
