S-R-T Division Algorithms As Dynamical Systems by Mark A . McCann B . S c , University of B r i t i s h Columbia, 1999 A THESIS S U B M I T T E D IN P A R T I A L F U L F I L L M E N T O F THE REQUIREMENTS FOR THE DEGREE OF Master of Science in T H E F A C U L T Y OF G R A D U A T E STUDIES (Department of Computer Science) We accept this thesis as conforming to the required standard The University of British Columbia A p r i l 2002 © M a r k A . M c C a n n , 2002 In presenting degree freely at this the thesis in partial fulfilment University of British Columbia, I agree that the available for copying of department publication this or of reference thesis by this his for and study. scholarly or thesis for her purposes of Compter Date DE-6 (2/88) It gain shall not Sciintt The University of British Columbia Vancouver, Canada may representatives. financial the requirements I further agree that permission. Department of be is Library permission granted by understood be for an advanced shall make for the that allowed without it extensive head of my copying or my written Abstract S - R - T division, as it was discovered in the late 1950s [4, 19, 23], represented an important improvement i n the speed of division algorithms for computers at the time. A variant of S - R - T division is still commonly implemented i n computers today. A l t h o u g h some bounds on the performance of the original S - R - T division method were obtained, a great many questions remained unanswered. In this thesis, S - R - T division is described as a d y n a m i c a l system. T h i s enables us to bring modern dynamical systems theory, a relatively new development i n mathematics, to bear on an older problem. In doing so, we are able to show that S - R - T division is ergodic, and is even Bernoulli, for a l l real divisors and dividends. ii Contents Abstract ii Contents iii List of Figures v Acknowledgements vi 1 Introduction and Background 1 1.1 Introduction to S - R - T Division 1 1.2 S - R - T D i v i s i o n as a D y n a m i c a l System 5 1.3 Shift Average for D € [|, 1) 12 2 Bernoulli Property 2.1 P r o o f of Bernoulliness 2.2 Entropy oi T 15 16 • D 3 Extensions to Multi-Divisor S-R-T Division 19 23 3.1 M u l t i - D i v i s o r S - R - T Division 23 3.2 P r o o f of Bernoulliness 27 3.3 Some Restrictions on a 32 3.4 Entropy of M u l t i - D i v i s o r S - R - T D i v i s i o n 37 iii 4 Future Work Bibliography L i s t of Figures 1.1 A pseudo state-machine for converting to binary 4 1.2 S - R - T division where p = 0 . 6 7 , and D = 0 . 7 5 5 1.3 Following partial remainder magnitudes graphically for D 0 0.75 = and p = 0 . 6 7 7 0 1.4 A p p l y i n g T / to x = j one hundred times 1.5 Applying T 1.6 A p p l y i n g P associated w i t h T / to f{x) = 1 six times 1.7 A p p l y i n g P associated w i t h T / to f(x) — ^~ f^ 3.1 S - R - T division where p = 0 . 6 7 , D = 0 . 7 5 , and a = ( 0 . 7 5 , 1 , 1 . 2 5 ) 2 7 3.2 C o m b i n e d plot of the regions where oti(e, D) < | and a (e, D) > 4 8 5 4 / 5 to x = f + 0 . 0 0 0 0 1 one hundred times 3 3 5 5 2 ' ^ 8 11 six times 12 0 2 1 for proof of Theorem 1 5 34 3.3 A n example of a non-ergodic system for T D 3.4 A n example of a non-ergodic system for To, 3.5 A n example of a non-ergodic system for T o ) Q a v ; a G 9Tt„>4 34 £ 9^3 35 G WI2 37 Acknowledgements I would like to thank my thesis supervisor, Nick Pippenger, for his patient guidance and for the inspiration for this thesis. I would like to thank Mark Greenstreet for the many excellent comments and suggestions that he provided as my second reader. I would like to thank my good friends Ellen Gethner, Wesley Wong, and Michael Forbes for many good conversations relating to my thesis, and for their numerous helpful suggestions. I would especially like to thank my wife Karen for being a constant support and encouragement to me throughout my Master's degree. Finally, I would like to thank my parents for all the support and encouragement they have given me through numerous years of schooling. M A R K The University April of British Columbia 2002 vi A . M C C A N N Chapter 1 Introduction and Background 1.1 Introduction to S-R-T Division " S - R - T division" roughly refers to a class of non-restoring, binary division algorithms that have been designed for floating-point computers [3, 5, 6, 7, 14, 22]. T h e term "non-restoring" refers to the fact that partial remainders are allowed to range freely through the interval (—1,1), rather than being restored to the positive realm before proceeding to the next step. T h i s feature reduces uses of the adder by about fifty percent. A n equally important feature of this algorithm is the " S - R - T " optimization from whence the algorithm gets its name. In the late 1950's, Sweeney [4], Robertson [19], and Tocher [23] independently made the observation that whenever a p a r t i a l remainder is i n the range (—|, | ) , there w i l l be one or more leading zeros that can be shifted through i n a very short amount of time (usually one cycle). T h e more leading zeros i n a given step, the more the algorithm can avoid costly uses of the adder. A further development of this original algorithm, which is still called S - R - T division, is the algorithm most often implemented in modern point units. floating- In modern S - R - T division, a fixed number of quotient digits are produced every cycle as opposed to a variable number [5, pp. 1 37-62]. A n example of modern S - R - T division i n use is Intel's first release of the P e n t i u m ™ C P U with its infamous "Pentium Bug," which was really just a small error in its S-R-T division implementation. This thesis will restrict its attention to the original version of S-R-T division. To present the simplest type of S-R-T division, we begin with a few definitions for an algorithm similar to that presented by Shively [22, pp. 3-4]: (a) n represents the number of iterations performed in the algorithm. (b) po is the dividend (or initial partial remainder) normalized so that po G [\, !)• (c) Pi G (—1,1), i G N, is the partial remainder after the ith step. (d) D is the divisor normalized to 1). (e) qi G {—1,0,1} (i G { 0 , . . . , n — 1}) is the quotient digit generated by the i t h step. 71-1 (f) Q = ^2 % is the "rounded off" quotient generated after n steps of the i=0 algorithm. n Given the above definitions, after n steps of the division algorithm, we would like it to be true that po = DQ n + s{n) where e(n) is a term that goes to zero as n goes to infinity. A recurrence relation for the S-R-T division algorithm can be stated as Pi+i = < 2pi \Pi\ < \ 2(pi - D) \pi\ > \ and pi > 0 2(pi + D) \pi\ > \ and pi < 0, 2 and : 0 Qi = < 1 P i : i \ < \Pi I : -1 \ > 5 a > 5 n d Pi > 0 and pi < 0 . B y observing that 2{ -{Q)D) \Pi\ < 5 \ 2 ( P i - (1)D) \Pi\ > \ ( Pi+i = Pi 2{ -{-l)D) and pi > 0 |pi| > 5 and pi < 0 , Pi we can rewrite the definition of pi+i as After n steps have been completed, we have = 2 p n n Po - 2qD n 0 - 2- D 2 q -lD.. n l l qi n and then after dividing by 2 and solving for Q we find that n P Pn , PO = 7T + qoD q\D 2° + +... + 71-1 Qn-lD Jn—1 Pn i=0 Now let e(n) = p /2 n n and let Q* = l i m ^ o o Q - Since | p | < 1, i n the limit as n n n goes to infinity Po = DQ*. The quotient bits being generated are not i n a standard binary representation, but it is a simple matter to convert the answer back to standard binary without using any expensive operations. Figure 1.1 shows a simple pseudo state-machine (really a push-down automaton) that converts positive floating-point numbers in the {—1,0,1} representation into binary. Figure 1.1: A pseudo state-machine for converting sequences of { — 1,0,1} into sequences of {0,1} (binary). We assume that the input sequence corresponds to a positive number. The letter ' Z ' is used to indicate that the end of the sequence has been reached, and the symbol e represents the null string. We represent a run of m zeros as 0 • • • 0 and a run of m ones as 1 • • • 1. Sequences of symbols should be read m from left to right. For example, the expression 1/10 ••• 0 means: if a 1 is encountered in the input sequence, write a 1 followed by m zeros. The above conversion automaton implies that conversion happens after the calculation is completed. In reality, the conversion from the generated quotient bits to standard binary is done in hardware on-the-fiy, using registers to convert runs of zeros into runs of zeros or ones in parallel, or by performing a single subtraction. Figure 1.2 shows an example of using the S-R-T division algorithm to divide 0.67 by 0.75. The steps that produce non-zero quotient bits have been shown. In this example, after six uses of the adder, the quotient (0.893) has been determined to four digits of precision. 4 0.67 Po = 0.67 -0.16 Pi = 2(0.67 - D) Pi = 2 ( 2 ( - 0 . 1 6 ) + £>) = 0.22 Pi = 2(2 (0.22) — D) = 0.26 P9 = 2(2 (0.26) - D) = -0.46 2 2 1 1 go = Qo = 1 Qz = - 1 qs = Qz = 0.875 1 Qe = 0.890625 1 Qs = 0.89453125 Pn = 2 ( 2 ( - 0 . 4 6 ) + D) = - 0 . 3 4 Qw = - 1 Qw = 0.8935546875 Pl3 = 2 ( 2 ( - 0 . 3 4 ) + D) = 912 -1 Ql2 = 0.8933105469 1 0.14 1 = Figure 1.2: A n example of S-R-T division when the dividend po = 0.67, and the divisor D = 0.75. The quotient Q* is 0.893. Now, with this simple system of division in hand, we might want to ask certain questions about its performance. For example, we could ask "How many bits of precision are generated per iteration of the algorithm on average?" To answer this question, we must look at the magnitude of \Q* — Q \ = \ /2 \. n n of bits of precision on the nth step is then n — log 2P n Pn The number . In the worst case, p is close n to 1, and therefore we get at least one bit of precision per iteration of the algorithm, regardless of the values of D or Q. Of course, a designer of actual floating-point P hardware probably wants to know the expected performance based on the expected values of p . To answer the many variants of this type of question, it is clear that we n must know something about the distribution of partial remainders over time. The remainder of this thesis is devoted to extending what is known about the answer to this type of question as it relates to S-R-T division and its variants. 1.2 S-R-T Division as a Dynamical System The example in figure 1.2 makes it clear that keeping track of the signs of successive partial remainders is irrelevant in determining how many times the adder will be 5 used for a p a r t i c u l a r calculation. F o r this reason, we only need to consider the magnitudes of successive p a r t i a l remainders. W e now give a reformulation of S - R - T d i v i s i o n that w i l l allow us to look at d i v i s i o n as a d y n a m i c a l system. Definition 1 (S-R-T Division Transformation). F o r D G [5,1), we define the function T D : [0,1) ->• [0,1) as TD(X) = { 2x 0 < x < \ 2(D - x) \ < x < D 2(x - D) D < x < 1 T h i s t r a n s f o r m a t i o n of the u n i t interval represents the successive p a r t i a l remainders that arise as S - R - T d i v i s i o n is carried out b y a divisor D o n a d i v i d e n d x. D is n o r m a l i z e d to 1). T h e d i v i d e n d x is normalized to the successive p a r t i a l remainders Tjj(x) 1) initially, while each of (n G N) subsequently ranges t h r o u g h [0,1). B y using the characteristic function for a set A defined as 1A(*) = we c a n rewrite T (x) D J 1 : i e A 0 : x g" A , as = 2x • l ^ i j ( 1 ) + 2(D - x) • l r i , 0 r ) + 2(x - D) • l o ) m ) (x) . (1.1) If we p l o t equation 1.1 o n t h e u n i t interval, we o b t a i n a very useful v i s u a l i z a t i o n of our transformation. F i g u r e 1.3 shows the plot of To.75(0;) c o m b i n e d w i t h a plot of the successive p a r t i a l remainders that arise while d i v i d i n g 0.67 b y 0.75. T h i s is the same system that was presented earlier i n figure 1.2. N o t i c e that a vertical line i n the interval [\,D) corresponds to a subsequent flip i n the sign of the next p a r t i a l remainder. 6 0 0.25 0.5 0.75 1 X F i g u r e 1.3: A n example of following p a r t i a l remainder magnitudes g r a p h i c a l l y for D — 0.75 a n d po = 0.67. T h e heavy solid lines represent the transf o r m a t i o n To.75, while the abscissa of the t h i n vertical lines represent successive p a r t i a l remainder magnitudes. F i g u r e 1.3 shows a n example of following the t r a j e c t o r y of a single p a r t i a l remainder for a p a r t i c u l a r divisor. A f t e r ten applications of the To.75, there is not any obvious regular pattern, although we expect to see one eventually since the quotient is r a t i o n a l i n this case. O f course, most numbers are not r a t i o n a l and we can deduce that for most numbers, the transformation w i l l never exhibit a repeating pattern. In figures 1.4 a n d 1.5, we see that a very s m a l l change i n the value of the i n i t i a l p a r t i a l remainder quickly produces large differences i n the observed behaviour of the subsequent p a r t i a l remainders. O u r system appears to be chaotic (it certainly has sensitive dependence o n i n i t i a l conditions a n d is topologically transitive), a n d , if this the case, we w i l l gain little understanding by s t u d y i n g the trajectories of 7 i n d i v i d u a l p a r t i a l remainders. T h e logical next step is to study the behaviour of d i s t r i b u t i o n s of points over the whole interval. l r 0.75 0.5 0.25 100 F i g u r e 1.4: T h e result of a p p l y i n g T / 4 5 to x = j one h u n d r e d times. 0.75 ST 0.5 0.25 0 J 20 40 F i g u r e 1.5: T h e result of a p p l y i n g T / 4 5 n 60 80 100 to x = j + 0.00001 one h u n d r e d times. T h e area of understanding the behaviour of ensembles of points under repeated transformation is the r e a l m of d y n a m i c a l systems theory. F o r the remainder of this thesis, we assume a certain amount of f a m i l i a r i t y w i t h the fundamentals of 8 d y n a m i c a l systems theory (or ergodic theory), w h i c h requires some basic unders t a n d i n g of measure theory. W e w i l l include a few helpful b a c k g r o u n d m a t e r i a l definitions as they are needed, b u t mostly we w i l l provide references. A very good i n t r o d u c t i o n to the study of chaotic systems is L a s o t a a n d M a c k e y ' s book Chaos, Fractals, and Noise [11]. For a more detailed i n t r o d u c t i o n to ergodic theory (along w i t h the necessary measure theory needed to understand this thesis), Peter W a l ters's book An Introduction to Ergodic Theory [24] a n d K a r l Petersen's book Ergodic Theory [18] are h i g h l y recommended. Definition 2 (Probability Space). If B is a cr-algebra on subsets of a set X a n d if m is a measure o n B where m(X) = 1, then the triple (X, B, m) is called a probability space. (See [24, p p . 3-9] a n d [11, pp. 19-31] for a good overview of basic measure theory a n d Lebesgue integration.) Definition 3 (Stationary Distribution). Let (X, B, m) be a p r o b a b i l i t y space, let P be the P e r r o n - F r o b e n i u s operator associated w i t h a non-singular t r a n s f o r m a t i o n T : X —>• X, a n d let L Pf = ft 1 denote the L space of (X,B,m)^. l If / € L 1 is such that then / is called a stationary distribution of T. Definition 4 (Perron-Frobenius operator). F o r a probability space the Perron-Frobenius (X,B,m), operator associated w i t h a non-singular t r a n s f o r m a t i o n T : X —)• X is defined b y f Pf(x) d mJT-^B) = [ f{x) d m , JB for B eB . For a piecewise C § transformation T w i t h n pieces, we c a n give a n explicit 2 f o r m u l a for the P e r r o n - F r o b e n i u s operator. Let A = {Ai,A ,... 2 ,A } n be the p a r t i - t i o n of X w h i c h separates T into n pieces. For i € { 1 , . . . , n } , let ti(x) represent the *For a probability space (X,B,m), the L space of (X,B,m) is the set of / : X -> E satisfying J \f(x) \ dm < co. *The o symbol will be used to indicate that a given relation holds except possibly on a set of measure zero. §C denotes the set of all functions with two continuous derivatives. 1 x 2 9 n a t u r a l extension of the ith. C 2 function T ( x ) | ^ . i T h e P e r r o n - P r o b e n i u s operator for T is t h e n tl {x) f(t-\x))-l {x) l dx u[Ai) I n p a r t i c u l a r , for T o (as i n equation 1.1), Pf{x) = \f{\x)-l {x) m + \f{D - \x) • l ,2D-i](aj) (0 + \f{D + \x) •l [ 0 l 2 -2D)(^ • (1-2) W i t h equation 1.2 we c a n show precisely what happens to a n i n i t i a l dist r i b u t i o n of points (described b y a n integrable function) after they are repeatedly t r a n s f o r m e d under Try. Figures 1.6 a n d 1.7 show w h a t happens to two different i n i t i a l d i s t r i b u t i o n of points after five applications of the P e r r o n - F r o b e n i u s operator associated w i t h T / ( z ) . B y the fifth a p p l i c a t i o n , the d i s t r i b u t i o n s look r e m a r k a b l y 3 5 similar. O n e might guess that they are b o t h approaching t h e same final d i s t r i b u t i o n . T h i s s i t u a t i o n is i n m a r k e d contrast to chaotic behaviour observed i n figures 1.4 a n d 1.5. 10 2 2 1.5 1.5 1 1 0.5 0. 5 0 0 0.25 0.5 0.75 0 1 0 0.25 2 2 1.5 1. 5 1 CO 1 0.75 1 0.75 1 1 0 0.25 0.5 0.75 0 1 0 0.25 2 2 1.5 1.5 1 lO 0. 0.5 0 0.25 0.5 a: 0.5 X X 0 0.75 0.5 0.5 0 0.5 X X 0.75 1 0.5 0 1 0 0.25 0.5 X u r e 1.6: T h e result of a p p l y i n g the P e r r o n - F r o b e n i u s operator P associated with T / 3 5 to f(x) = 1 six times. 11 2.5 2.5 2 ^ ^ 2 ^ 1.5 a, 1 i 0.5 0.5 0 0 0.25 0.5 X 0.75 0 1 0 0.25 0.5 X 0.75 1 0 0.25 0.5 0.75 1 0 0.25 0.5 0.75 1 2 2 ^ - 1. 5 1.5 1 CN 0.5 0.5 0 0 0.25 0.5 0.75 0 1 x 2 2 1.5 1.5 1 a, 0.5 0 1 lO 0.5 0 0.25 0.5 X 0.75 1 F i g u r e 1.7: T h e result of a p p l y i n g the P e r r o n - F r o b e n i u s operator P associated 1 f = -—/ !og2 J • • 1 w i t h T 3 / 5 to fix) 1 / 2 1.3 — six times. x Shift Average for D G [§, 1) A n exact equation for the stationary d i s t r i b u t i o n when D € [|, 1) was first given by F r e i m a n [6] a n d is restated by Shively [22] as 1 /(z) 1 = -Q\O,2D-\){?) + 2^1[2£>-i,i)(») • (1-3) T o verify that this is a stationary d i s t r i b u t i o n f u n c t i o n , we b e g i n by a p p l y i n g the P e r r o n - F r o b e n i u s operator as given i n equation 1.2 to equation 1.3 a n d verifying 12 that Pf(x) = f(x). So then, a p p l y i n g P to / we get Pf(x) = ^ ^ l [ o , 2 D - i ) ( H + 2^ [2i>-i,i)(5 )) [o,i)( ) 1 a; 1 + \ ^ l [ o , 2 i > - i ) ( - D - h ) + -fij [2D-\,\){D x L + \ ^l[0,2X?-l)(-D + \ ) x a; ~ k ) ) l(0,2£»-i](a;) + 2 ^ 1 [ 2 D - l , l ) P + \x)^j l[Q, -2D){x) 2 • A s s u m i n g that D € [5,1), a n d observing that a; € [0,1), Pf(x) = \ ^ l [ o , 4 D - 2 ) ( a j ) + ^ [ 4 D - 2 , i ) ( ) ) l[o,i)(a:) 1 a ; + £ ^ ( 2 - 2 D , l ) ( ) + ^ ( 0 , 2 - 2 £ > ] ( 2 ; ) ) l(0,2D-l](a;) 1 a ; 1 + £ ( ^ 1 [ 0 , 2 D - 1 ) ( ^ l[0,2-2O)(a;) • F i n a l l y , assuming that D € [|, 1), we have P f( ) x = 2^1[0,l)(a;) + 2 ^ ( 2 - 2 D , 2 i 3 - l ] ( ^ ) + ^ l ( o , 2 - 2 D ] ( a ; ) + 42jl[o,2-2D)(a:) = 3 1 ^ 1 [ 0 , 2 - 2 D ) ( ^ ) + ^ 1(0,2-20] (*) 1 + -^ [2-2D,2D-l){ ) l x + ^T5 (2-2D,2£)-l](a;) 1 + 2^1[22>-i,i)(a0 = ^l[o,2D-i)(a;) + 2^1[2D-i,i)(a:) = / ( » ) • O n e of the p r i m a r y uses of having a formula for t h e d i s t r i b u t i o n of p a r t i a l remainders is for calculating the shift average for a given divisor. T h e shift average is the average number uses of the shift register (single shift or m u l t i p l i c a t i o n by two) between uses of the adder. U n d e r the assumption that a register shift is a m u c h faster operation t h a n using the adder, the shift average gives a useful characterization of the expected performance of our a l g o r i t h m for a given divisor. W i t h equation 1.3, we know the fraction of bits that require the use of the adder. T o calculate the average number of zero bits generated between non-zero bits (bits requiring use of 13 the adder), we take the reciprocal of the fraction of bits that require the adder. W e calculate the shift average for a divisor D e [f, 1) to b e •<">-I4 * £ T = (L4) Since have not proven that the s t a t i o n a r y d i s t r i b u t i o n s f r o m S - R - T d i v i s i o n are unique, we have no way of k n o w i n g whether or not a shift average c a l c u l a t i o n i n equation 1.4 is correct. T o prove that a l l stationary d i s t r i b u t i o n s are unique, we need to show that TD is ergodic for a l l D G [^,1)- F r e i m a n [6] shows that T o is ergodic for r a t i o n a l D, b u t we extend this result for real D. I n the next section we show that a l l are B e r n o u l l i a n d it is k n o w n that h a v i n g the B e r n o u l l i property implies ergodicity. Before c o n c l u d i n g this chapter w i t h a definition for ergodicity, we w i l l briefly comment o n the derivation of stationary distributions for D G [|, f ) . F o r D G [|, | ) , the stationary d i s t r i b u t i o n functions have been derived, a n d their associated shift average functions have been shown to be constantly three [6, 22]. T h e layout of stationary d i s t r i b u t i o n functions i n the region D G [ j , § ) has several s u r p r i s i n g properties a n d is far f r o m being fully understood. W e discuss the c a l c u l a t i o n of shift averages as a n interesting area for future investigation i n C h a p t e r 4. Definition 5 (Ergodic [11]). L e t (X, B,m) be a p r o b a b i l i t y space a n d let a nonsingular t r a n s f o r m a t i o n T : X —> X b e given. T h e n T is ergodic i f for every set B G B such that T~ {B) L = B, either m(B) = 0 or m(X \B) 14 = 0. Chapter 2 Bernoulli Property In this chapter, we w i l l prove that the class of transformations of the interval that characterizes the S - R - T d i v i s i o n for a l l real divisors D has the property that each t r a n s f o r m a t i o n TD is B e r n o u l l i . A l t h o u g h the basic concept of a B e r n o u l l i shift (the things to w h i c h transformations having a B e r n o u l l i property are isomorphic to) is not difficult, a complete definition requires enough a u x i l i a r y concepts from measure theory (concepts not used anywhere else i n this thesis) that we chose to refer the interested reader to [17, 18, 21, 24] a n d other selections listed i n the B i b l i o g r a p h y . N e i t h e r a n understanding of B e r n o u l l i shifts, nor a f o r m a l definition of what it means to be B e r n o u l l i is required to follow the proofs i n this chapter. H a v i n g said this, we should mention informally the connection between B e r n o u l l i shifts and transformations h a v i n g the B e r n o u l l i property. T h e transformation TD is an non-invertible e n d o m o r p h i s m of the unit interval. T h i s means that f r o m a given p a r t i a l remainder we can predict a l l future p a r t i a l remainders, but we cannot uniquely predict past p a r t i a l remainders. T h e r e is a natu r a l way (called the n a t u r a l extension) to make our t r a n s f o r m a t i o n invertible (an a u t o m o r p h i s m ) on a larger space. Specifically, each non-invertible transformation TD h a v i n g the B e r n o u l l i property has a n extension to a n a u t o m o r p h i c transformat i o n , isomorphic to a two-sided B e r n o u l l i shift [18, p p . 13,276]. F r o m the way that 15 entropy for a transformation is denned, the entropy for a n a u t o m o r p h i c B e r n o u l l i t r a n s f o r m a t i o n associated w i t h a non-invertible B e r n o u l l i t r a n s f o r m a t i o n is the same as the entropy for the non-invertible B e r n o u l l i transformation. B y p r o v i n g that a l l transformations T p are B e r n o u l l i , a n d b y p r o v i n g that entropy of each To is the same, we w i l l be able to conclude that the n a t u r a l extensions of S - R - T d i v i s i o n algorithms are isomorphic to each other for a l l divisors. 2.1 Proof of Bernoulliness Definition 6 (of Bowen [1], Expanding). W e w i l l say that a t r a n s f o r m a t i o n T on a n interval is expanding i f it has the property that s u p u-(T U) = 1 for a l l open n n > 0 intervals U w i t h p{U) > 0, where / i is any n o r m a l i z e d measure that is absolutely continuous w i t h respect to Lebesgue measure. Definition 7 (Straddle). L e t U be a n interval of reals (either o p e n , closed, or half open) a n d let p G K . If p G U°J then we say that U straddles p. + Theorem 1. The S-R-T division transformation is expanding for all real divisors. Proof. L e t (X,B,m) be a p r o b a b i l i t y space where X = [0,1), B is the B o r e l o- algebra o n X a n d m is the Lebesgue measure o n H*. L e t T D : X ->• X be the S - R - T d i v i s i o n t r a n s f o r m a t i o n for a given normalized divisor D as defined i n equation 1.1. *The symbol o as the exponent of an interval denotes an open version of the interval. *For an interval [a, b], the Lebesgue measure is defined as m([a, b]) = b — a. 16 Let us define an infinite sequence of intervals U = {C/jjieN as and Ui = U C/°C[0,i) g [0, i ) and (7° g U° T (C7in[0,i)) D ort/?C[±,l) [i, 1) and m(C7i D [0, i ) ) > m ( < 7 [ i , l ) ) i n tf?g[0,±)andtf?g[±,l) n [ ± , i ) ) and n [ o , i ) ) Property 1. For all Proof. Ui such that \ U° and D g U°, m{U i) i+ TD i = 2m{Ui). [5,-D), or [D, 1), then we are in the first If a U° is a subset of either [0, case of the U definition and we apply <m((7 n[i,l)). directly. Since each of the three cases of the expand an interval by a factor of two, it is clear that m(T£>(Ui)) = m(Ui \) + = 2m(Ui). Property 2. Proof. For all Ui where D 0 Ui, m(Ui i) + > m(Ui). Assume that D $ Ui. If j £ Ui, then according to Property 1, U{+\ doubles. Otherwise, \ 6 U{ and therefore, to find E/i+i, we must consider the second and third cases of the U sequence. In the worst case, m(Ui D [0, ^)) = m(Ui n [^,D)), and regardless of which half we choose, m(Ui By fl [0, ^)) = m(Ui fl [5, D)) = ^m(Ui). applying Trj to this truncated interval, we double what we halved so that m(Ui) = m{U i). i+ By way of contradiction, let us assume that there exists a sequence of U that never expands to fill X. Such a sequence can never include the point D and the following Property will hold: Property 3. (a) There exists N such m ( [ / ; n [ 0 , ^)),m(Uif][^, straddle that for all i > N 1)) > 0 (in other \), and 17 words, all subsequent intervals must (b) m{Ui PI [0, \)) < m(Ui n that the right Proof of Property half 1)) (in other words, of Ui is not discarded all subsequent by the definition Ui must be such ofU). 3(a) Property 1 says that the only way not to double is to straddle ^. Therefore, at a minimum, it must be the case that ^ is eventually included every time or else the interval will double a sufficient number of times to include D which would be a contradiction. Proof e,\ of Property 3(b) If m{Ui n [0, | ) ) + e') where e > e'. Now U i+1 > m(Ui = T {Ui) n [£, 1)), = T (\ D then we have Ui = {\ - - e, \) = (1 - 2e, 1). B u t , D since D is not i n C/j+i, \ cannot be in t/i+i and Property 3(a) fails, resulting in a contradiction. B y Property 3, we will eventually be in a situation where U{ = ( \ — e', \ +e), e' < e, and Property 3 will hold for every subsequent interval. So then Ui i+ =T {\-e',\+e) =T [\,\ D + e) = (2Z> - 1 - 2e, 2 D - 1] D by Property 3(b). But again by Property 3, U l+2 = T {2D D - 1 - 2e,2D - 1] = T [\,2D - 1] = [2 - 2D,2D D - 1]. It is now clear that ^ is at the midpoint of Ui+2 and that we must now pick the left half of the interval which contradicts Property 3(b). Therefore, D will eventually be included in an interval and the sequence will expand to fill all of X. • We can now prove that the S-R-T division process is weak-mixing, and therefore Bernoulli, by two theorems of Bowen [1]. Theorem 2 (of Bowen [1]). T-invariant system probability Let T be a piece-wise measure, (T, /j,) is weak-mixing, then C 2 map of [0,1], [i be a and A = infn< <i |/'0c)| > 1x the natural 18 extension U of (T, p) is smooth dynamical Bernoulli. W e mention here that the natural of (T, n) is the associated a u - extensions t o m o r p h i c transformation that we alluded to at the b e g i n n i n g of this chapter. See Petersen [18, p. 13] for a n exact definition. Theorem 3 (of Bowen [1]). With T and mixing ifT is / i as in Theorem 2, C Perron-Frobenius with T, then for any f operator (n Sfc=o f* has the property -f /)n^=i under ? s associated convergent that Pf* 2 function be a probability and let T : X —>• X be a piecewise invariant will be weak- expanding. Theorem 4 (of Lasota and Yorke [10]). Let (X,B,m) fc (T,n) such that i n f \T'\ > 1. If P is the in norm to a function = f* space f* € L\. and consequently, G L, the sequence 1 The limit function the measure d/Lt* = / * d m is T. H a v i n g established that TD is e x p a n d i n g , we now use the above three theorems to prove the central result of this thesis. Theorem 5. TD is Bernoulli. Proof. F r o m the definition of T o , we see that TD is C a n d that i n f o < < i |TD'(:E)| = 2 x 2 > 1 since \TD'(X)\ i n f n < < i \TD'(X)\ x = 2 for a l l x for w h i c h the derivative is defined. Since > 1, by T h e o r e m 4 there exists at least one \x such that fi is a s m o o t h T o - i n v a r i a n t p r o b a b i l i t y measure. B y T h e o r e m 1, we see that T h e o r e m 3 holds. Hence, (TD,U-) 2.2 is w e a k - m i x i n g a n d , by T h e o r e m 2 ( T o , ^ ) is B e r n o u l l i . • Entropy of T D K n o w i n g that a l l T o are B e r n o u l l i is a very useful property because we c a n use entropy as a complete invariant to show i s o m o r p h i s m amongst the two-sided B e r n o u l l i shifts associated w i t h T o that have the same entropy. T h i s comes f r o m the contrib u t i o n of O r n s t e i n to the K o l m o g o r o v - O r n s t e i n T h e o r e m . 19 Theorem 6 (of Kolmogorov [8, 9] and Ornstein [16]). Two Bernoulli shifts are isomorphic if and only if they have the same entropy. T h e purpose of this section is t o calculate the entropy o f T p . W e begin w i t h a m u l t i - p a r t definition of entropy along w i t h some s u p p o r t i n g definitions that follow the development presented by Walters [24, pp. 75-87]. Definition 8 (Partition). A partition of (X,B,m) is a disjoint collection of ele- ments of B whose u n i o n is X. Definition 9 (Join). Let V and Q be finite partitions of (X, B, m). T h e n ? V Q = {PnQ : P G V, a n d Q 6 Q } is called the join of V a n d Q. N o t e that V V Q is also a finite p a r t i t i o n of (X, B, m). Definition 10 (Entropy of a partition). L e t (X,B,m) a n d let V = {Pi,...,Pk} be a finite p a r t i t i o n of (X,B,m). be a p r o b a b i l i t y space T h e entropy of the partition is defined as H(V) k = -^miPi) log m(Pi). Definition 11 (Entropy of a transformation with respect to a partition). Suppose T : X —> X is a measure-preserving transformation of the p r o b a b i l i t y space (X, B, m). If V is a finite p a r t i t i o n of (X, B, m), then is called the entropy ofT with respect to partition V. Definition 12 (Entropy of a transformation). L e t T : X —> X be a measurepreserving t r a n s f o r m a t i o n of the p r o b a b i l i t y space (X, B,m) a n d suppose h(T) = s u p / i ( T , V), where the s u p r e m u m is taken over a l l finite partitions V of T h e n h(T) is called the entropy ofT. 20 (X,B,m). T h e following definitions a n d theorems involving C - m a p s a n d P C - m a p s are taken f r o m a paper o f L e d r a p p i e r [12] a n d have been streamlined for o u r argument. Definition 13 (of Ledrappier [12], C-map). A real f u n c t i o n / defined o n a n interval [a, b] is said to be a C-map if / is continuously differentiable a n d its derivative / ' has the following properties: (a) / ' satisfies a H o l d e r condition^ of order e > 0. (b) T h e r e are only a finite number of points x G [a, 6] where f'(x) t h e m by a < a\ < a •.. < a 2 n = 0. W e denote < b w i t h / ' ( a ; ) = 0 for 0 < i < n. (c) T h e r e exist positive numbers k~ (kf) such that log ,k |x—a| 1 -(+) is b o u n d e d i n a left (right) neighborhood of a ; . Definition 14 (of Ledrappier [12], PC-map). A m a p / : [0,1) —> [0,1) is called a PC-map if there exists a finite p a r t i t i o n 0 < b\ < b .. • < b 2 C - m a p f r o m [bj,bj i] + into [0,1), for any j. Theorem 7 (of Ledrappier [12]). Let f be a PC-map. lutely continuous < 1 such that / is a m invariant measure), If then Rohlin's formula is an a.c.i.m. \i (abso- [20] is true: Hf) = j log|/'|d/i. Theorem 8. The entropy h{T ), D ofT D for D e [^,1) is equal to / l o g | T o ' | d/z = log 2. Proof. W e begin by showing that T o is a P C - m a p . B y the definition of a P C - m a p , T p is a P C - m a p if each of the three functions Tu\^ ij ,T!D|[1 y Q D a n d To\y ^ is a D C-map. T r i v i a l l y , each Try restricted to any of the three domains [0, | ) , [D,l) or satisfies a H o l d e r c o n d i t i o n of order e = 1 because each piece of To is just a §A function f(x) defined on an interval [a, b] satisfies a Holder condition of order e E K if there exists c 6 R such that for any two points pi,P2 G [a,b], \f{pi) — f(p2)\ < c\pi - P2\ • + + e 21 line of slope two. T h u s c o n d i t i o n (a) of D e f i n i t i o n 13 is satisfied. C o n d i t i o n (b) is satisfied because there are no points for w h i c h the derivative is equal to zero w i t h i n a given line segment. Therefore, condition (c) is t r i v i a l l y satisfied. T h u s each of the three segments of TD are C - m a p s a n d by D e f i n i t i o n 14, T o is a P C - m a p . N o w , since each TD is B e r n o u l l i , there exists a unique a.c.i.m., call it /z, for each To- B y T h e o r e m 7, we can use R o h l i n ' s formula to calculate the entropy: • W i t h the proof of T h e o r e m 8 we have established i s o m o r p h i s m amongst the a u t o m o r p h i c transformations (or n a t u r a l extension) associated w i t h simple S - R - T d i v i s i o n transformations by a n a p p l i c a t i o n of the K o l m o g o r o v - O r n s t e i n T h e o r e m . T h e key to o b t a i n i n g this result was being able to show that TD has Bowen's exp a n d i n g property. In C h a p t e r 3, we extend the results of this chapter to a more general type of S - R - T d i v i s i o n . 22 Chapter 3 Extensions t o M u l t i - D i v i s o r S-R-T Division 3.1 Multi-Divisor S-R-T Division A c o m m o n o p t i m i z a t i o n to the S - R - T d i v i s i o n a l g o r i t h m is the i n c l u s i o n of a d d i t i o n a l divisors to increase the shift average. I n this section, we prove that a l l such d i v i s i o n algorithms w i t h reasonable assumptions o n the separation of the divisor multiples have the e x p a n d i n g property. It w i l l be useful to define precisely a class of m u l t i divisor S - R - T d i v i s i o n transformations. Definition 15. L e t a G W be such that 1 (a) 0 < ai < a < • • • < a , and (b) F o r a l l x,D e [|, 1), there exists i G { 1 , . . . , n) such that \c<iD — 2 n x\<\. W e define 2 l „ to be the set of a l l a G R , satisfying conditions (a) a n d (b). A l s o , n Definition 16 (Peaks and Valleys). G i v e n a n a between two lines f(x) G 2 l > 2 , the point of intersection N = 2(x - ctiD) a n d g(x) = 2(a iD i+ 23 - x) w i l l be called a peak a n d is denoted by ipi = (^D(cti i + ati), D(a{ i to the abscissa as ipf = ^D(oti i + a j ) , a n d to the ordinate as ipf = D{a.i \ + + + — a;)). For convenience, we w i l l refer — aj). + T h e point of intersection of the two lines f(x) = 2{aiD — x) a n d g(x) = 2(x — otiD) is (cxiD,0) a n d w i l l be called a valley. Definition For a D e 17. a n d Q 6 21, define the t r a n s f o r m a t i o n Tr), (x) a : [0,1) —* [0,1). F o r a S 2 t i , we get the familiar transformation 2x TDA ) 0 < \2(D-x)\ For X < i X < = X < 1. a € 2l , 2 Q<x<\ 2x \2{ D-x)\ i < x < ipf \2{a D-x)\ \ < x a n d ipf < x < 1 . ai 2 For a € 2 l n > 3 , 2x 0 <x < \ \2( D-x)\ \ < x < ipf \2( D-x)\ \<x \2{a D-x)\ \ < x a n d ipn-i < x < l . ai ai n Definition 1 8 . Define Wl = {T n UneN^n- We c a n * n es e Di(X t °^ we c a l l iVd the set of multi-divisor a n a n d ipf < x < tpf +1 : D <E ( ± , 1 ] , a € 2 t } a n d define iW = n -divisor n S-R-T S-R-T division division transformations and transformations. C o n d i t i o n (b) i n D e f i n i t i o n 15 guarantees that the d i v i s i o n a l g o r i t h m generates a new quotient bit every step. A l t h o u g h the c o n d i t i o n makes intuitive sense, it is not i m m e d i a t e l y obvious i f a n a satisfies the c o n d i t i o n just by inspection. L e m m a 10 below provides a n easier way t o check. 24 Lemma 9. If a = ( a i ) , then condition (b) of Definition 15 is satisfied if and only if ai = 1. Proof. If a\ = 1, then maxjr> e[i/2,i) \ \D « i ^ 1 and e G M + — x\ < | . N o w consider the cases when a ]X . If ct\ = 1 + e, then when D = a n d x = \ , \OL\D — x\ = 1 — i = i ^ i . O n the other h a n d , if ct\ = 1 — e , then when D = \ a n d a; = 1 — | , |ai£> - x | = 1 - | - (1 - e ) ± = i ^ \ . Lemma 10. A n condition (i) G 2l a{ € ( 0 , 7 ? ] and (Sketch). if for some otj € a^c? satisfies n (b) if and only ( M J a» € [^,1] Proof a • [1,1 + a*], condition i,j (a) of Definition G { 1 , . . . ,n} (possibly 15 also i = j), satisfies either or € [l,3«i]. L e m m a 9 has shown that a single component a of a w i t h a = 1 is sufficient to ensure that the range of f(x) = 2 \aD — x\ is equal to [0,1) as x a n d D range over [5,1). It is easy, to see based on the proof of L e m m a 9 that if there does not exist i 6 { 1 , . . . ,n} such that ai = 1, then there must exist i,j 6 { 1 , . . . ,n} (i < j) where OJJ < 1 a n d a > 1. 3 Let us assume that i is the largest value where CKJ < 1, a n d let us assume that j is the smallest value where ctj > 1 (therefore j = i + 1). W e make this a s s u m p t i o n because no other scalars of D w i l l have an influence on whether or not c o n d i t i o n (b) is satisfied. C o n s i d e r the case where c*j G (0, ^]. In this case where ai 6 (0, | ] , when D is close enough to 1, some of the line f(x) = 2(x — aiD) appears i n the region (denoted R) where \ < x < 1, 0 < T (x) a < 1. W h e n a p o r t i o n of the line f(x) appears i n region R, we must put restrictions on aj i n terms of ai so that the peak ?/>! is always i n R. ipf is greatest when D = 1. W e find the m a x i m u m allowable value of aj by setting D = 1 a n d solving ipf = 1 for aj: ipf = I Therefore, if a ; G (0, => D(aj then - ai) = 1 G [1,1 + aj]. 25 aj = a ; + 1. In the case where a j € [^, 1], for large enough values of D, the line f(x) 2(x — Dai) = crosses the line x = 1 i n the range [0,1). Because of this, we must loosen the restriction that ctj G [1,1 + ctj]. It is straightforward to calculate that begins to cross the line x = 1 i n the range [0,1) w h e n D = f(x) W e c a n ensure that as D becomes smaller, the peak ipi w i l l always be i n region R by solving tpf = 1 for ctj when D = tp\ = 1 D(aj - on) = 1 => 7f—((Xj - = 1 on) =4> ctj = 3a . t Therefore, if ai € [^, 1], then ctj £ [1, 3aj]. • Definition 19 (Separation). F o r a € 2 l , we define the separation i n a as n separation(a) = max i€{l,...,n-l} 1 + 1 . ttj L i m i t i n g the separation is a convenient way to restrict the subset of 21 being considered. If separation(a) = r, we say that "the divisor multiples i n a are separated by at most a factor of r." F i g u r e 3.1 shows a n example of m u l t i - d i v i s o r S - R - T d i v i s i o n . T h i s example is performing the same calculation as i n figure 1.2, but it has c o m p u t e d the d i v i d e n d w i t h twice as m a n y digits of precision w i t h the same effective n u m b e r of uses of the adders. W e say "effective" because i n m u l t i - d i v i s o r S - R - T d i v i s i o n , there are several adders w o r k i n g i n parallel. In a real i m p l e m e n t a t i o n of m u l t i - d i v i s o r S - R - T d i v i s i o n , the values for a must be carefully chosen so that not too m u c h overhead is required to select a good p a r t i a l remainder. T h e r e is also a tradeoff between the amount of overhead i n choosing a good p a r t i a l remainder a n d the precision to w i t h w h i c h a g o o d p a r t i a l remainder is selected. 26 0.67 Po = 0.67 Pi = 2(0.67 - -0.16 a D) 2 P4 = 2(2 (-0.16) + anD) 2 P7 = -0.155 = 2(2 (-0.155) + a D) = -0.115 2 x Pll = 2(2 (-0.115) + a D) 3 3 = Qo = 1 Qo = 0.035 Q3 = -a Qz = 0.90625 <76 = Q = 0.89453125 9io = - « 3 Qw = 0.8933105469 x 6 = 2(2 (0.035) - a D) = -0.005 Qis = Ql5 = 0.8933334351 P24= 2(2 (0.005) + a D) = -0.155 923 = Q2S = 0.8933333456 Pl6 4 x 7 x Figure 3.1: An example of S-R-T division where three multiples of the divisor are used. In this example the dividend po — 0.67, and the divisor D = 0.75 with divisor multiples a = (0.75,1,1.25). The quotient Q* is 0.893. 3.2 Proof of Bernoulliness In this section, we will show that all multi-divisor S-R-T division transformations are Bernoulli, given a necessary restriction on the multiples of the divisor. As in the case for a single divisor, it will be useful to define a sequence of intervals that are subsets of the sequence of sets that would arise from repeatedly applying TD,Q. to an initial open interval. Unless otherwise noted, assume that the function m denotes the Lebesgue measure. Definition 20. Given an initial open interval U C [0,1) and To^a G 9DT, we define 27 the infinite sequence of intervals U = {Ui}^ as and Ui = U C/°C[0,i) T , (Ui) D a T , (Ui D a t / ° £ [0, orC7?C[i,l) n [0, i ) ) a n d t/? g [ i , 1) a n d m(f/<h[0,i)) T ^ a ^ i a n d C7? g [ ± , l ) U°%%\) n [ i , i ) ) >m{Uif\{\,\)) and m(f7 n[0,i)) i Definition 21 (Critical Points). For a given Tu, a C = { Ci n : i 6 {1, • • •, m } , a E B U {0, ±, 1}} n C2 < . . . < 1. C is called the set of critical Lemma 11 (Doubling). GivenTr),a points for T o £ m be the set of critical j G {1,... , r a - 1}, then m(Ui+i) Proof. Since Ui C [CJ,CJ \] + = ) a • • •, V>n-i}} a n d c i < . ^ ^ sequence of intervals U be defined e 20 and let Ui be some interval C = {ci,... ,c } ) i where a € 2 l , define the set where 5 = {& : i < & < 1 and b G { a i - D , • • •, a - D } U O f , as in Definition <m(f/ n[i l)). in the sequence. points for If Ui C T D , « . Furthermore, [ C J , C J + I ] for let some 2m(Ui). for some j £ { l , . . . , m - l } , because we are i n the first case of the definition of U, either U? C [0, \) or U° C [ i , 1). B y simple inspection of the i n d i v i d u a l cases that define Trj , i(X the points Cj and Cj+i, it is apparent that a l l of Ui, except possibly fall w i t h i n the same case of Tr), a Therefore, the resulting interval Ui+i w i l l be double the length of Ui. Definition 22 (Active Valleys). G i v e n T • Dt(X € 93T , define n V = {ctiD : i 6 { 1 , . . . , n) and \ < a D t V is called the set of active valleys for To, a 28 < 1} . Definition 23 (Active Peaks). Given Tn, G Tl , define a n P = {if>f : t e { l , . . . , n - l } andi<^f<l}. P is called the set of active peaks for To, a Lemma 12 (Non-shrinking). the sequence of intervals {Ui}i^ active valleys for T r j , . For any interval a m(U i) i+ > m(Ui) or m(U ) > i+2 Proof, separation(a) with separation(a) Given TD,OC G < | , let be defined as above and let V denote the set of Ui G U such that V f l Ui = 0, either m(Ui). < | implies that cti \ < + For a given separation, the value of ip\ is maximized when ipf = 1. This implies that aj = We calculate the value of ipf with the assumption that ipf = 1 to get a bound on ipf for D < 1: iPf < D(l ai - ai) = D(l ) ai = £>(§&) = 5 • Case 1: Consider when Ui C [0, \ ] . In this case, m(Ui \) — 2m(Ui). + Case 2: Consider when Ui C [^,1). The interval Ui can span at most one peak. Therefore, m(£/j+i) > m(Ui). m{U ) i+2 = A further observation is that since Ui+i C [0, 2m{Ui). Case 3: Consider when Ui £ [0, \] and Ui % 1). In this case, Ui straddles \ . From the definition of U, we see that in the worst case we might throw away up to half of Ui. Call the part not thrown away Ui and observe that m(Ui) > \m(Ui). Now, either Ui' C [0, \) or Ui' C [i, 1). If [// C [0, ±], then m(C/i+i) = 2m(<7;') > m{Ui). If t/i' C [ i , 1), then m ( C / i ) = 2m(£//) > m(c/;). Lemma 13. A multi-divisor when separation(a) • +2 S-R-T division transformation Tr>, G 9JI is expanding a < |. Proof. Let V be the set of active valleys (as defined in Definition 22) for a Trj,a- Let P be the set of active peaks (as defined in Definition 23) for a To,a- Let U = be the sequence of intervals associated with a To, and an initial interval U. a 29 {Ui\i^n B y way of c o n t r a d i c t i o n , assume that a T£> >a is not e x p a n d i n g . T h i s means that for some T r j , , there does not exist a n interval Ui where any of the points i n V a are contained i n Ui. T h i s is true because i f any of the valley points are i n Ui, then Ui i + = [0, e) or C / j + 1 = [0,e], a n d after a finite number of steps, Ui w i l l have grown to include a l l of [0,1). If there is a sequence U that avoids a l l points i n V, t h e n b y L e m m a 12 it must be true that the intervals i n the sequence can only double a finite number of times. L e t i e N b e the first index for w h i c h there is no j > i such that m(Uj) > 2m(Ui). It now follows that Ui straddles ^ . T h e proof for L e m m a 12 reveals that this is the only s i t u a t i o n where it is not necessarily the case that either m{Ui \) + or rn(Ui+2) = 2m(Ui). I n fact, Ui must straddle b o t h \ a n d m i n P . not s t r a d d l e d a n d m(Ui f l or m(Ui z) + m(Ui n [0, > 2m(Ui). [0, ^)) < m(Ui f l [|,1)), then either m(Ui+2) = 2m{Ui) If m i n P is > 2m([/j) I n the other possibility where m i n P is not straddled a n d ^)) > m{Ui n [1,1)), we find that m(U ) i+2 > 2m{Ui). A s s u m i n g that Ui straddles b o t h ^ a n d m i n P , we also observe that there can be no j > i such that m(Uj D [0, 5)) > m(Uj f l [5,1)) because this q u i c k l y leads to d o u b l i n g . I n other words, the right side must be larger t h a n the left side whenever we straddle | . Therefore, we must be i n the s i t u a t i o n where t/i = ( i - e ' , i + e), e'<£ Ui+i = (min{2(i - aiD),2(a D- U l + 2 = (2min{2(i u i+3 = (min{2(i - aiD),2{a Ui+4 = (2min{2(i => U i + 5 i+l - aiD),2(a D i+1 D- i+1 -aiD),2{a D i+1 = (min{2(i - aiD),2{a D- i+1 _(i+ ))}, 2^?) e 2vf)},^r) -2ltf)},2#) 2 ^ ) } , ^ ) = l7 3 i+ It is apparent that the interval represented by [/j+4 w i l l re-occur every other interval ad infinitum. W e now use this interval to show that i n fact such a sequence of n o n - e x p a n d i n g intervals is not possible. 30 Since straddles \ , we can compare the length of the left a n d right sides of J7j+4. L e t R = [^,2i/>f) denote the right side a n d let L = (4(^ — o^-D), \ ) a n d V = (4(cti+iD — 2ipy), ^) denote the two possibilities for the left side. T h e length of the right side is = 2Vf - m(R) i, while the length of the left side is the larger of two possible lengths m(L) = \ - 4(i - D) ai and m(L') = i-4(a i + i£>-2^) . W e then compare the differences between the right side a n d each of the two possible left sides. T h e first possibility is m(R) - m(L) = 2$ a D)) - I - (I - 4(i - = 2D(a - i+l = 2a D a i t ) - 1 + 2- 4a D { - GctiD + 1 , i+1 while the second possibility is m(R) - m(L') = 2^f - ± - (± - 4 ( a = 2D{a l+x = -2a D i+i l + 1 2$)) - - ai) - 1 + 4{a D l+1 + 6aiD - - 2D{a l+1 - a )) t 1. It is now clear that m(R) - m(L) = - (m{R) - m(L')) . B u t this means that the length of the left side is always greater t h a n or equal to the length of the right side, w h i c h contradicts our a s s u m p t i o n t h a t the right side must be bigger t h a n the left side whenever the interval straddles ^. • 31 Theorem 14. To,a G %Jl is Bernoulli Proo/. Let T = T o i a . when separation(cx) < |. F r o m the definition of T , we see that T o is C a n d that 2 > a i n f o < < i |T'(a;)| = 2 > 1 since | T ' ( x ) | = 2 for a l l x for w h i c h the derivative is defined. x Since i n f o < < i |T"(a;)| > 1, by T h e o r e m 4, there exists at least one /z such that zz is x a s m o o t h T - i n v a r i a n t p r o b a b i l i t y measure. B y L e m m a 13 we see that T h e o r e m 3 holds w h e n separation(a) < §. Hence, (T, /z) is w e a k - m i x i n g a n d b y T h e o r e m 2, (T,/z) is B e r n o u l l i w h e n separation(a) 3.3 < |. • Some Restrictions on ex. In section 3.2, we showed that i f a l l T D ) Q G 9Jt, i f separation(a) < §, t h e n TD >OC is B e r n o u l l i . I n this section, we construct examples of T G %Jl , for every n, that fail n to be B e r n o u l l i w h e n the restriction that separation(a) Theorem 15. For TD,O. G i/iere ea;zsr; uncountably if separation(a) ffin>4, many a. for which Tu < § is relaxed. [|,1), > §, then for each D G is not ergodic. t0l Proof. W e b e g i n this proof by considering T G 9 J l = 4 n A s s u m e that we relax the restrictions o n a b y e > 0. T h i s means that separation(a) < | + e a n d that no peak can be above the line f(x) = this r e l a x a t i o n , we can define a = (c*i, a ,0:3,0:4) 2 a subset of [0,1) is n o n - e x p a n d i n g . 40C+2 4 Pe 9 £ 1 a n ( ^ a 4 = 40P+24Pe • ^ o r o u W e let ai = r With w i t h respect t o a given D so that g+48D > 2 = a 8 0 £ 8 or>+48£>e' 0 : 3 = constructed a to be v a l i d , we must be careful that conditions (a) a n d (b) of D e f i n i t i o n 15 h o l d . C o n d i t i o n (a) requires that the components of a r e m a i n i n ascending order. T h i s is satisfied w h e n e G (0, ^ ] . ordering is m a i n t a i n e d , separation(a) Since < 3, a n d minrj [i/2,i),ee(o,2/i5] 4 = 1-2 > 1, a g to verify that c o n d i t i o n (b) of D e f i n i t i o n 15 holds, it is sufficient to show (by L e m m a 10) that for a l l values of D a n d e, either ot\, a , or a G [5,1]- B y m a x i m i z i n g a n d 2 3 m i n i m i z i n g over e a n d D, we find that a\ G [0.375,0.7] a n d a G [0.625,1.3]. F i g u r e 2 32 3.2 provides a v i s u a l proof that as e is varied over [0, ^ ] a n d D is varied over [5,1], it is never the case that b o t h a\ < \ a n d a 2 that either a\ or a 2 > 1. Therefore, it is always the case G [5, !]• H a v i n g verified that our defined a satisfies D e f i n i t i o n 15, we calculate that peak ^ = ( i ± i f , l{j±if§) a n d peak </> 3 while r e m a i n i n g above the line g(x) below f(x) = (f^, ftl||). — ^, a n d the point ip W i t h this definition for 1 w i l l always be slightly while r e m a i n i n g above the line g(x) = \ . A l l of the definitions have been chosen so that we are i n a s i t u a t i o n where 1 — ip% = ^ 3 ~~ \ — 2(V'i ~~ \) = 2{ip\ A n o t h e r i m p o r t a n t feature i n this construction is the interval between a D 2 Since ip 2 and |)- — a^D. is not used i n our construction, it is possible to insert a n a r b i t r a r y number of divisor multiples between a D 2 a n d a D . T h u s , the results i n this proof a p p l y 3 to T G 9Jt„ for a r b i t r a r i l y large n. F i g u r e 3.3 illustrates the type of t r a n s f o r m a t i o n that we have constructed. We are now i n a p o s i t i o n to show that there exists a set of points A w i t h 0 < m(A) < 1, for w h i c h TD,OL{A) = A. T h i s is the definition of a t r a n s f o r m a t i o n 5). i + w - A* = [h- m -M+m - h)i 4. = [i - 2(1 - 1]. being non-ergodic [11, p. 59]. Define A = A\ U A It can be shown by calculation that T o ( A i ) ] a TD,OL{M) = A. 2 Therefore, TD,CI(A) U A 2 = A, 2 3 where A\ and = [\ - (ipf - = A\ U A 3 , and T^A^) = A, a n d by definition, To, a is non-ergodic or • non-expanding. 33 Figure 3.2: Combined plot of the regions where a\(e,D) < \ and ct2{e,D) > 1. Over the domain e 6 [0, ^ ] and D € [\, 1], it is never true that both ct\ < \ and «2 > 1- X Figure 3.3: A n example of a non-ergodic system for Trj a ample n = 4, D = a = (||, 1, | | ) , | + j^-. The thick lines represent T r j . ]a € 9Jt„>4- In this exand separation(a) = The coarse dashed line represents the necessary separation restriction on a to guarantee that T D A = ) Q §] is ergodic. u tl> I] u In this case, partial remainders in the set ( i t . !) a r e mapped back to A by T , . This D a means that T D . is not ergodic, and therefore not Bernoulli. ) Q 34 Theorem 16. For To, G 9Jt a 3) if separation(a) > | , then for each D £ [5,1), there exists an a for which T o i a is not ergodic. Proof. T h e proof for this theorem comes as a special case from the proof for T h e orem 15. C o n s i d e r a separation{a) = (oti, 0:2, 0^3,0^4) as defined i n the proof for 15. = | = | + we are i n the special situation where a 2 When = 0:3. Since a l l of the results for the proof of T h e o r e m 15 s t i l l h o l d , we now have a n example transformation T w i t h only three unique multiples of D a n d this T has been proven to be non-ergodic. F i g u r e 3.4 gives a n example of a non-ergodic transformation for D = l. • 2 0.25 0 0.5 0.75 1 X F i g u r e 3.4: A n example of a non-ergodic system for T o , D = a n d « = ( § > f> i f ) - T n e t m c ^ u n e s a £ 9^3- In this example, r e P r e s e n t I'D,a- The coarse dashed line represents the necessary separation restriction o n a to guarantee that To,a is ergodic. In this case, p a r t i a l remainders i n the set A = [±, ^ ] U [ ^ , i § M y § , 1) are m a p p e d back to A by T h i s means that Trj, a T,. is not ergodic, a n d therefore not B e r n o u l l i . 35 D a Theorem 17. For Trj, a there exist uncountably € 9#2, if separation(a) many a for which TD is not ergodic. >(X Proof. A s s u m e that separation(a) ai = jjj > 3, then for some D G (^,1), < 3 + e a n d D G {\, ^ p ) . F i r s t , we choose so that a i D = | a n d a = 1 + OL\. O u r restriction o n D i n terms of e has 2 been chosen so that a /a\ 2 < 3 + e w h e n 0:2 = 1 + « i - Since 0 2 > (a) of D e f i n i t i o n 15 is satisfied. Since a\ G (\, 5), a n d a condition G (1,1 + a\], b y L e m m a 2 10, c o n d i t i o n (b) of D e f i n i t i o n 15 is satisfied. T h u s , o u r defined a is always v a l i d . Define A = [±, D]. W e now a p p l y T = T D>a to A: T [ i , D] = [ m i n { 2 ( i - D),2(a D ai - 2 £>)},^] = [ m i n { 2 ( ± - 4%), 2(D + \ - D)},D(a 2 =[min{I,i},D(l + 4 )] ai ^- ^)] 4 N o w , since \ < D < 1, 0 < m ( A ) < 1 a n d T A Dt(x ergodic. - = A, b y D e f i n i t i o n T D ] a is not • 36 X F i g u r e 3.5: A n example of a non-ergodic system for Trj, a D = | and S 93?2. In this example, a = ( ^ , ^ ) . T h e thick lines represent Tr>,a- T h e coarse dashed line represents the necessary separation restriction o n a guarantee that T a to is ergodic. In this case, p a r t i a l remainders w i t h i n the interval [|, |] m a p back to |] a n d w h i c h means T o , a is not ergodic, a n d therefore not B e r n o u l l i . 3.4 Entropy of Multi-Divisor S-R-T Division T h e c a l c u l a t i o n for entropy i n m u l t i - d i v i s o r S - R - T d i v i s i o n follows the same m e t h o d used for single divisor S - R - T d i v i s i o n . W e begin by showing that To,a is a P C - m a p . Lemma 18. Proof. Trj tOC eiVl is a PC-map B y i n s p e c t i o n , each T o , a (as defined in Definition 14)- is a finite collection of line segments each w i t h slope 2. E a c h of these line segments is a C - m a p by the same argument used i n the proof for T h e o r e m 8. Therefore, by definition, each To,a is a P C - m a p . Theorem 19. The entropy of any TD >(X S 9 ^ with separation(a) 37 • < | is log 2. Proof. B y L e m m a 18, a l l To, a when separation(ot) G 9Jt are P C - m a p s . B y T h e o r e m 14, TD,<X is B e r n o u l l i < | and hence there exists a unique a.c.i.m. /J. T h e o r e m 7 says that R o h l i n ' s f o r m u l a for the entropy is true and therefore: h{T , ) D a = J log | T b , ' | d i i = log 2 ^ dfi = log 2. a • 38 Chapter 4 Future Work T h e o r i g i n a l question that inspired this thesis was "Is simple S - R - T d i v i s i o n ergodic for a l l real d i v i s o r s ? " In p u r s u i n g the answer to this p r o b l e m , we discovered that not o n l y is simple S - R - T d i v i s i o n ergodic for a l l divisors, but it is also B e r n o u l l i . H a v i n g established a B e r n o u l l i property, a n d having calculated the entropy for our transformations, we were able to use the K o l m o g o r o v - O r n s t e i n theorem to conclude that our transformations are isomorphic to each other. In p r o v i n g these i m p o r t a n t properties for simple S - R - T d i v i s i o n , we made extensive use of more general results f r o m d y n a m i c a l systems theory. Consequently, our results were shown to be easily extensible to more general d i v i s i o n systems. In general, it is difficult to prove that a p a r t i c u l a r class of transformations are ergodic or B e r n o u l l i . O u r results have p r o v i d e d a n effective means of p r o v i n g b o t h of these properties for a large class of S - R - T - l i k e d i v i s i o n algorithms. P r o m the standpoint of understanding a n algorithm's expected performance, it is necessary to know that when a stationary d i s t r i b u t i o n is f o u n d , it is unique. H a v i n g established the uniqueness of stationary distributions, the next step is to find the a c t u a l stationary d i s t r i b u t i o n for as wide a class of transformations as possible. In section 1.3, we verified a k n o w n expression for the stationary d i s t r i b u t i o n f u n c t i o n for TD where D e [|, 1). In a d d i t i o n , m a n y of the stationary d i s t r i b u t i o n functions 39 have been classified by Shively and F r e i m a n for D G [§, §], a l t h o u g h the derivations are not nearly as simple as for D G [§, 1). It turns out that things become very c o m p l i c a t e d w h e n D G [^, §]. In his thesis [22], Shively shows m a n y interesting properties for the stationary d i s t r i b u t i o n functions i n this region. F o r example, he shows that there are many different intervals of D where there are a n infinite number of different stationary d i s t r i b u t i o n equations. A s such, the g r a p h of the shift average for D G [5, |] is far f r o m complete a n d appears to have a°complex p a t t e r n (from the few points that have been plotted i n this region). T h i s is s u r p r i s i n g considering the s i m p l i c i t y of the u n d e r l y i n g transformation. A better u n d e r s t a n d i n g of this final region of simple S - R - T d i v i s i o n would be an interesting goal to pursue. In the work of F r e i m a n [6], it was first shown that the shift average for D G [|, |] is constantly 3, w h i c h can be easily shown to be the m a x i m u m possible shift average. T h i s property was then used by M e t z e [15] to o b t a i n a version of S - R - T d i v i s i o n that has a n expected shift average of 3 for a l l divisors. A n o t h e r area to pursue w o u l d be to explore shift averages for m u l t i - d i v i s o r S - R - T d i v i s i o n a n d , if other plateaus are f o u n d , they could possibly be used to o b t a i n higher expected shift averages for a l l possible divisors. U n d o u b t e d l y , o b t a i n i n g a complete u n d e r s t a n d i n g of the stationary d i s t r i b u t i o n functions for m u l t i - d i v i s o r d i v i s i o n w o u l d be even more difficult t h a n it is for simple S - R - T d i v i s i o n . It is possible that such results i n this area could lead to improvements i n m o d e r n S - R - T d i v i s i o n . R e l a t e d to this, it w o u l d be interesting to attempt to extend the results of this thesis to m o d e r n S - R - T division. 40 Bibliography [1] R u f u s B o w e n . B e r n o u l l i maps of the interval. Israel J. Math., 28(1-2):161-168, 1977. [2] A b r a h a m B o y a r s k y and Pawel G o r a . Laws of chaos. B i r k h a u s e r B o s t o n Inc., B o s t o n , M A , 1997. Invariant measures a n d d y n a m i c a l systems i n one dimension. [3] Digital Joseph F. Cavanagh. tion. Computer Arithmetic: Design and Implementa- M c G r a w - H i l l , N e w Y o r k , N Y , 1984. [4] J . Cocke a n d D. W . Sweeney. H i g h speed arithmetic i n a parallel device. T e c h n i c a l report, I B M , F e b r u a r y 1957. [5] M i l o s D. Ercegovac a n d Tomas L a n g . Division algorithms and implementations. and square root: digit-recurrence Kluwer Academic Publishers Group, Norwell, M A , U S A , a n d Dordrecht, T h e Netherlands, 1994. [6] C . V . F r e i m a n . S t a t i s t i c a l analysis of certain b i n a r y d i v i s i o n algorithms. IRE, 49:91-103, 1961. [7] D. H a r r i s , S. O b e r m a n , and M . H o r o w i t z . S R T d i v i s i o n architectures i m p l e m e n t a t i o n s . In 13th IEEE Symposium on Computer ings, California, USA, v o l u m e 13 of Symposium July Computer 6-9, 1997, Asilomar, Arithmetic, [8] A . N . K o l m o g o r o v . Akad. Proc. Nauk SSSR, Arithmetic: and proceedon pages 18-25. I E E E C o m p u t e r Society Press, 1997. A new invariant for transitive d y n a m i c a l systems. 119:861-864, 1958. 41 Dokl. [9] A . N . K o l m o g o r o v . p h i s m s . Dokl. E n t r o p y per unit time as a metric invariant of automor- Akad. Nauk SSSR, 124:754-755, 1959. [10] A . L a s o t a a n d James A . Yorke. O n the existence of invariant measures for Trans. piecewise m o n o t o n i c transformations. Amer. Math. Soc, 186:481-488 (1974), 1973. [11] A n d r z e j L a s o t a a n d M i c h a e l C . Mackey. Chaos, fractals, and noise. Springer- V e r l a g , N e w Y o r k , second edition, 1994. Stochastic aspects of d y n a m i c s . [12] Frangois L e d r a p p i e r . Some properties of absolutely continuous invariant measures on a n i n t e r v a l . Ergodic Theory Dynamical Systems, l ( l ) : 7 7 - 9 3 , 1981. [13] T i e n Y i e n L i a n d James A . Yorke. E r g o d i c transformations f r o m a n interval into itself. Trans. Amer. Math. Soc., 235:183-192, 1978. [14] O . L. M a c S o r l e y . High-speed arithmetic i n b i n a r y computers. Proc. IRE, 4 9 : 6 7 91, 1961. [15] G e m o t M e t z e . A class of b i n a r y divisions y i e l d i n g m i n i m a l l y represented quotients. IRE Trans, on Electronic [16] D o n a l d S. O r n s t e i n . Advances [17] Computers, EC-ll:761-764, D e c . 1961. B e r n o u l l i shifts w i t h the same entropy are isomorphic. in Math.,.4:337-352, D o n a l d S. O r n s t e i n . Ergodic 1970. theory, randomness, and dynamical systems. Yale U n i v e r s i t y Press, N e w H a v e n , C o n n . , 1974. James K . W h i t t e m o r e Lectures i n M a t h e m a t i c s given at Y a l e University, Y a l e M a t h e m a t i c a l M o n o g r a p h s , N o . 5. [18] K a r l Petersen. Ergodic theory. C a m b r i d g e U n i v e r s i t y Press, C a m b r i d g e , 1983. [19] J . E . R o b e r t s o n . A new class of d i g i t a l d i v i s i o n methods. IRE Trans. Computers, E C - 7 : 2 1 8 - 2 2 2 , Sept. 1958. 42 Electronic [20] V . A . R o h l i n . E x a c t endomorphisms of a Lesbesgue space. Amer. Transl., Math. Soc. 39, 1964. [21] P a u l Shields. The theory of Bernoulli shifts. T h e U n i v e r s i t y of C h i c a g o Press, C h i c a g o , I l l . - L o n d o n , 1973. C h i c a g o Lectures i n M a t h e m a t i c s . [22] R o b e r t Shively. Stationary division. distribution of partial remainders in S-R- T digital P h D thesis, U n i v e r s i t y of Illinois, 1963. [23] K . D . Tocher. Techniques of m u l t i p l i c a t i o n a n d d i v i s i o n for a u t o m a t i c computers. Quart. J. Mech. [24] Peter Walters. An introduction Appl. Math., to ergodic 1982. 43 binary 11:364-384, J u l y - S e p t e m b e r 1958. theory. Springer-Verlag, N e w Y o r k ,
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- S-R-T division algorithms as dynamical systems
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
S-R-T division algorithms as dynamical systems McCann, Mark A. 2002
pdf
Page Metadata
Item Metadata
Title | S-R-T division algorithms as dynamical systems |
Creator |
McCann, Mark A. |
Date Issued | 2002 |
Description | S-R-T division, as it was discovered in the late 1950s [4, 19, 23], represented an important improvement in the speed of division algorithms for computers at the time. A variant of S-R-T division is still commonly implemented in computers today. Although some bounds on the performance of the original S-R-T division method were obtained, a great many questions remained unanswered. In this thesis, S-R-T division is described as a dynamical system. This enables us to bring modern dynamical systems theory, a relatively new development in mathematics, to bear on an older problem. In doing so, we are able to show that S-R-T division is ergodic, and is even Bernoulli, for all real divisors and dividends. |
Extent | 1827172 bytes |
Genre |
Thesis/Dissertation |
Type |
Text |
File Format | application/pdf |
Language | eng |
Date Available | 2009-08-13 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0051671 |
URI | http://hdl.handle.net/2429/12179 |
Degree |
Master of Science - MSc |
Program |
Computer Science |
Affiliation |
Science, Faculty of Computer Science, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2002-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- 831-ubc_2002-0175.pdf [ 1.74MB ]
- Metadata
- JSON: 831-1.0051671.json
- JSON-LD: 831-1.0051671-ld.json
- RDF/XML (Pretty): 831-1.0051671-rdf.xml
- RDF/JSON: 831-1.0051671-rdf.json
- Turtle: 831-1.0051671-turtle.txt
- N-Triples: 831-1.0051671-rdf-ntriples.txt
- Original Record: 831-1.0051671-source.json
- Full Text
- 831-1.0051671-fulltext.txt
- Citation
- 831-1.0051671.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0051671/manifest