TERMINATION AND STORAGE REQUIREMENTS IN A MODEL FOR PARALLEL COMPUTATIONS by IVAN 3CHNAPP Dipl.Ing., Czech Technical University, Prague, 1966 A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF . MASTER OF APPLIED SCIENCE in the Department of Electrical Engineering We accept this thesis as conforming to the required standard Research Supervisor Members of the Committee Acting Head of the Department Members of the Department of Electrical Engineering THE UNIVERSITY OF BRITISH COLUMBIA June, 1970 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department The University of British Columbia Vancouver 8, Canada Date ABSTRACT This paper deals with a grajjh-theoret i c model for parallel computations as formulated-by Karp and Miller. A necessary condition and a sufficient condition for self-termination of loops with unit loop gain are presented. For the special case that W=U the necessary and sufficient condition is derived. A direct procedure for testing termination properties of strongly connected graphs is presented. A method due to Reiter, for determining the rua ximura storage required for a computation graph, is extended to cover the general case. TABLE OF CONTENTS ABSTRACT i TABLE OP CONTENTS ii LIST OF ILLUSTRATIONS iiACKNOWLEDGMENT v 1. INTRODUCTION. 1 1.1 The Karp-Miller model 1 1.2 Research connected with parallel computation models 6 2. TERMINATION PROPERTIES OF COMPUTATION GRAPHS 9 2.1 The Karp-Miller algorithm 9 2.2 Some necessary and sufficient conditions for self-termination of loops 17 2.3 A direct method of testing termination properties of strong components 25 3 . STORAGE REQUIREMENTS 8 3.1 Maximum storage requirement - special case 28 3.2 Maximum storage requirement - general case 33 4. CONCLUSIONS .40 REFERENCES 42 ii LIST OP ILLUSTRATIONS Figure Page 1 Computation graph for Laguerre polynomials ...5 2 Dependence of the amount of data on the loop gain 18-19 3 Termination of loops 24 iii ACKNOWLEDGMENT I wish to express my gratitude to Dr. E. L. Sigurdson for his guidance and encouragement in carrying out this research, and to Dr. G. F. Schrack for reading the manuscript. I also wish to acknowledge the financial support of the National Research Council. iv CHAPTER 1 INTRODUCTION As signal propagation speeds represent a serious barrier to increasing the speed of strictly sequential computers more attention has been paid in recent years to'the use of the parallel ism intrinsic to most computational algorithms. A number of de signs have appeared which utilize a number of processors which may simultaneously execute several steps of the computation (/l/-/5/), rather than overlapping of subfunctions in sequential processing. In general, values to be used.in a computation step are the results of previous computation steps. This establishes certain precedence constraints upon the computation steps. A model of such a system, satisfying a particular class of precedence constraints, lias been formulated by Karp and Miller /6/. This thesis studies some problems arising in connection with this model, in particular termination properties (Chapter 2) and storage requirements (Chapter 3). In this chapter we present the model and the results of pre vious research, and compare it with other approaches to the parallel processing. 1.1 The Karp-Miller Model The model represents the sequencing of a parallel computation by a finite directed graph. Each node of the graph corresponds to an operation in the computation (or to a processor assigned to per form that operation). Each branch represents a first-in first-out 1 2 queue of data directed from one processor to another. To describe data transformation by processors, with each node is associated a single-valued function determining the dependence of outputs on inputs. The eligibility for initiation of an operation is determin ed by the lengths of the queues on branches directed into its associated node. Thus a computation is represented by a directed graph G called a computation graph which is given by: (i) a set of nodes n^n^, . . (ii) a set of branches d^,..d^, where any given branch d^ is directed from a specified node n^ to a specified node n • , (iii) four nonnegative integers A ,U ,¥ and T , where • ° ° P P P P T > ¥ , associated with each branch d . P P P Here, A^ gives the initial number of data words in the first-in first-out queue associated with d ; U gives the number of 1 P P words added to the queue whenever the operation Ch associated with n^ terminates; V<T gives the number of words remove;: from the queue whenever the operation 0- is initiated; and T is a threshold giv-. J P ing the minimum queue lengtli of d^ which permits the initiation of 0-. Upon initiation of 0. only the first W of the T operands for J J P P 1 Ch are removed from the queue. The operation 0. associated with a given node n. is eligible J J for initiation if and only if, for each branch d directed into n., P J the number of words in the queue associated with d is greater than or equal to T^. After Ch becomes eligible for initiation, ¥ words are removed from each branch d directed into n.. The operation 0. P J J 3 is then performed. Vhen Ch terminates, words are placed on each branch d directed out from n.. The times required to perform the steps mentioned above are left unspecified by the original model as presented in /6/. These constraints on initiation lead to the following defin itions of the possible sequences of initiations associated with a given computation graph G. Let E be a sequence of nonempty sets ,S,j > • . >S^r, . . , such that each set S^T is a subset of£l,2,.., whore X is the number of nodes in G. Let x(j,0) - 0, and, for N>0, let x(j,N) denote the number of sets S , l<m<N, of which i is an element, in' ' 0 The sequence E is an execution of G if and only if for all N, the following conditions hold: (i) if jC Sv,. and G has a branch d directed from n. to J N+l p l n., then A + U x(i,N) - W x(j,N)> T : (ii) if E is finite and of length R, then for each j there exists a node n. and a branch d directed from n. to l p l n. such that A + U x(i,R) - W x(j,R)<T . j p p v ' ' p x J ' ' p An execution E of G is called a proper execution if the follow ing implication holds: (iii) if, for all n. and for every branch d directed from l J p n. to n., A + U x(i,N) - W x(j,N)>T , then j€ S„ l j p p ' p J ' p' J li for some R > N. The sequence E may be interpreted as giving a possible temporal sequence of initiations of operations throughout the per formance of the parallel computation specified by G; the occurence 4 of S.. denotes the simultaneous initiation of 0. for all }€. S,-. Condition (i) states that in order for node n_. to initiate J for the x(j,N + l)-th time, the queue lengths on its input branches must be greater than, or equal, to the respective branch thresholds. Condition (ii) defines the circumstance under which an execution terminates, i.e. under which the computation defined by G halts. This computation terminates when every node of G is unable to initiate. Condition (iii) requires that a node, if able to initiate, actually will do so after some finite number of initiations of other nodes. The following example illustrates these ideas: Example 1.1 Consider the Laguerre polynomials defined by the recurrence relation L ^-.(x) = (2n + 1 - x)L (x) - n2L ,(x) n+l ' nv ' n-1v ' with initial conditions LQ(x) - 1 Lx(x) = 1 - x We want to compute the values of Ln(x) for n = 2,3,..,N and for a given x. A computation graph for this calculation is in Fig.l. For each branch the intermediate result is shown. Branch coefficients are assumed to be A=0, U=Vi:=T=l unless otherwise shown. Brandies (n^,n^)' (i.e. the branch directed from n^ to n^ ) and (n2>n9) serve as counters; the computation is terminated by the depletion of queues associated with these branches. Node n^ produces 2n and n", and places them on (n,,n~) and (n, ,n-j), respectively. Node n„ takes 5 A=N-1 U=0 (2n+l-x) 2n n n. A=N-U=0 Ln-l(x> Computation graph for Laguerre polynomials Fig. 1 6 (l-x) from the branch ^2^2) and adds it to the other input 2n; the result (2n+l-x) is placed on (i^jn^). Node n^ forms the product of (2n+l-x) and Ln(x) and places the result on (n^,n^). Node n^ multi-2 plies Ln_-^(x) by n and places the result on (n^,n_). Finally, n^ produces the desired polynomials L^x), L^(x),.., L^,(x) and places them on (n^,n^) and (n^,n^). The initial data queue on (n^,n^) is LQ(X)=1, L^(X)=1-X and on (n^,n4) it is L^(x)=l-x. 1.2 Research connected with parallel computation models Karp and Miller fbf show that for every proper execution the sequence of data words occuring on any branch of G is always the same thus ensuring the same computational result. This property is referred to as the determinaoy of a computation graph. Also, they give an algorithm to determine whether a computation terminates, and a procedure for finding the number of performances of each operation in G. Finally, they give necessary and sufficient con ditions for the lengths of data queues to remain bounded. Reiter in his Ph.D. thesis fjf addresses himself to the problems of storage, scheduling, and optimum assignment of operat ions to processing units. He gives an integer linear program for the determination of the maximum storage required by a computation graph G. He introduces a concept of an admissible schedule defining valid node initiation times and characterizes the class of all admissible schedules in the case W =T =1, U -0 or 1. He further shows that in P P P this case it is possible to find a periodic admissible schedule which achieves the maximum computation rate (also see /8/). He also defines the cost of an assignment of node functions to processors 7 and gives a method for determining feasible solutions when the maxi mum comjmtation rate has a lower bound. Finally, he extends the model to incorporate a restricted form of data dependency without losing its detcrminacy. A different approach to the graphical representation of a computation is taken by Martin /9/. He allows two types of node in put control. In the case of conjunctive input control a node can initiate only if each branch directed into the node contains at least one data word. In the case of disjunctive input control a node can initiate only if at least one branch directed into the node contains at least one data word. Similarly, conjunctive output control places one word on each branch directed out from the node and disjunctive output control places one word on a branch directed out from the node according to an a priori probability. The latter represents a conditional transfer which is deterministic every time it occurs but over many possible data sets may be modelled probabil istically. Note that the conjunctive input control and conjunctive out put control correspond to the T=vr=l case and U=l case in the Karp -Miller model, respectively. Also note, that because of conditional transfers, this model is not determinate. In his work Martin studies the assignment of node comput ations to processors and tries to minimize the average computation time. Further research of this model can be found in /lO/, where an approximative method for calculating the average computation time ±s given, and /ll/, where procedures.are Riven.to determine a lower and an upper bound on the number of processors required for maximum 8 parallelism. An application of results of these studies to assembly-line balancing problems is given in /12/. CHAPTER 2 TERMINATION PROPERTIES OF COMPUTATION GRAPHS Part 2.1 of this chapter is devoted to a presentation of the Karp-Miller algorithm. The algorithm is used to determine whether the computation specified by a given computation graph terminates, and to find the number of performances of each operation in case the computation terminates. Several theorems are given in parts 2.2 and 2.3 to improve the efficiency of the algorithm. 2.1 The Karp-Miller Algorithm Theorems and Lemmas of section 2.1 are proved in /6/. Since the number of performances of an operation 0 is independent of the execu tion considered only if this excution is proper, Karp and Miller restrict their attention to proper executions. A node n_. of a computation graph is said to terminate if and only if (in the following iff) j occurs in only a finite number of the sets of a proper execution of G. Naturally, this number is the same for all proper executions of G. To further study the termination of properties of computation graphs we need to introduce a few concepts from graph theory. A directed graph is called strongly connected iff given any pair of nodes n. and n. there exists a directed path from n. to n.. A special case of a strongly connected graph is the trivial graph which has only one node and no branches. For any directed graph there exists a unique partition of 9 10 its nodes into equivalence classes as follows: Two nodes n. and n. are in the same class if there exists a directed path from n^ to n^ , and a directed path from n^ to n^. A subgraph consisting of the nodes of an equivalence class and the branches of the original graph connecting these nodes is then a strongly connected graph. The subgraphs corresponding to the node equivalence classes are maximal strongly connected subgraphs of the original graph and are called the strong components of the graph. These strong components play a crucial role in the Karp-Miller algorithm. Lemma 2.I Let G' be a strongly connected subgraph of a computation graph G.'Then either every node of G' terminates or nonedoes. We say that 0' terminates if every node of G' terminates. Divide all the nodes of a computation graph G into two classes according to whether they terminate or not. Then by Lemma 2.1 the set S of terminating nodes is a union of sets of nodes of some strong components of G. The strongly connected subgraph G' which would terminate if it were isolated from the rest of G is called self-terminating. Let us examine when a strong component is self-terminating. Lemma 2.2 Let d^=(n^,iij) be a branch of a computation graph. Then n^ terminates if n. terminates. l 11 r s This loads to the following concepts: Let G and G' -be strong components of G. Then define Gr >_ GS if either Gr. = GS or there exist r s such nodes n-6 G and n.€ G that there is a directed path from n. 1 j 1 to n . in G. J Theorem 2.3 s A strong component G of a computation graph G terminates r r iff there exists G such that G is self-terminating and Gr> Gs. s r Thus G terminates iff it is self-terminating or some G is self-terminating and there is a directed path from some n.6 G to some n - £ Gb. Therefore to determine the set S we examine strong com-r s r ponents for self-termination in such order that if G >. G , then G is examined first. Now the problem is how to determine 'whether a strongly con nected subgraph G' is self-terminating. For this purpose Karp and Miller use properties of the 1OO|JS contained in G'. A computation graph L is called a loop if it consists of distinct nodes ,n0,.,"and branches d^,d2,..,d£ such that d^ is directed from to n^_+1 > k=l, 2, . . ,/,-l, and d^ is directed from n^ to n^ . Here we note that any strongly connected subgraph G' except the trivial graph contains at least one loop. Theorem 2.4 A strongly connected subgraph G1 is self-terminating iff G' 12 contains a self-terminating loop. Given this theorem the only problem is how to establish that a particular loop is self-terminating. Let L be a loop with branches d, ,d,,, . . ,d«. The product it U. g= TT rr^ is called the gain of the loop. There are 3 cases: g^l, i = l i g=l, and g >1. Loops with g < 1 Theorem 2.5 Any loop L for which g<l is self-terminating. Loops witli g=I Theorem 2.6. Let r~ P = 1 1 U'"l vi ul h Vl Ul h ui Vi h Wl h wi , Vl h h U2 ui Y'l ''2 i h-l h 1 ' T.I "1 W2 h j I V,'l V h-l h-l I ' 1 Vi i. I 13 *1 /33 ft Ak-i"Tk-i+1 where fo l{ = y— k=2 , 3 ,. . and k-1 /3i = Ai-Tx+1 Then the necessary condition for self-termination of a loop with g=l is P /3 < 0 Karp and Miller give a necessary and sufficient condition for self-termination in the following special case: Theorem 2.7 If, for l£k^X,, Vi'. =U, =1, then the loop L is self-terminating iff (i.e. Z A < Z (T,-l) ). k=l " k=l k k=l k In case that Theorems 2.6 -and 2.7 cannot be applied Karp and Miller derive an upper bound on the numbers of x^erformances of nodes of a self-terminating loop. Theorem 2.8 Let L be a self-terminating loop with g=l. Let X* be a positive integer solution of the system OC^ = a 14 U1 OC = -i a 2 M1 a Ul U2 u u ry, - _ —£ U a whore a is an arbitrary parameter. Let X = xn X, where X^ is the number of performances of node n^, k=l,2,.., Then at least one component of X is less than the correspond ing component of X*. Loops with g >1 Theorem 2.6 is valid also for this case. If L is self-terminating then we get the following upper bound: X < yi- P fi 1-g '-Having obtained an upper bound for X we can test self-termination of a loop by applying a procedure given in the following theorem. 15 Theorem 2.9 # For all nodes n.€S the following iteration scheme converges J in a finite number of steps to X; *(0)<;>=° r _ („) , Ik -T +1+U xKa,{iT x^n+1^(j) = max J x^(j), (i,p)6Z.j, i€S\ % Here^j is the set of ordered pairs (i,p) such that d^ is a brancli from node n. to node n.. The results given so far may be organized into an algorithm for determining which nodes of a computation graph G terminate and, for the terminating nodes n.. , computing the number of performances x(j). This algorithm may be outlined /6/ as follows: Step 1. From among the strong components of the computation graph being considered (initially this graph is G), select one which is not covered by any other subgraph. Call it G'. Step 2. By applying Steps 2A,...,2D given below, test whether G' is self-terminating and, when it is, determine x(j) for each n.£G'. Step 3. Form a new computation graph as follows: If G' is not self-terminating, remove G' and all branches incident with nodes of G'. If G' is self-terminating, replace each branch d from n^G' to n^G' by an "equivalent" branch d^, from n^ to n^ , having U^,=0, A ,=A +U x(i) , T ,=T , and W , =V,T . Then remove G' . P P p p p p p Step 4. If the new computation graph is nonempty, return to Step 1. Otherwise the analysis of termination is complete. 4 The symbol fx! denotes "least integer greater than or equal to x." 16 The details of Step 2 are now described. Step 2k. If G' contains a branch d^ with U =0, go to Step 2D. If not, determine whether G' contains a loop with g<l. This is equi valent to determining whether there is a loop L sucli that Z log (U /V )<0. d € L P P P This determination can be carried out by a shortest-route algorithm given in /13/. Enumeration of loops is not required in this pro cedure. If a loop with g<l exists, go to Step 2D; otherwise, go to Step 2E. Step 2B. Every loop of G' has .g > 1. Determine whether there is a loop not previously considered such that 0> Pyg. If no such loop exists, G' is not self-terminating; return to Step 3. If such a loop L is found, determine upper bounds on the quantities x^(k) by the methods given above. These bounds hold, of course, only if L is self-terminating. Step 2C. Continue applying the iteration scheme of Theorem 2.9, tak ing S to be set of nodes of G', until either (a) it terminates, establishing that G' is self-terminating, and giving x(j) for each n.£G', or J (b) for some n and some k, x^n^(k) exceeds the upper bound on x^(k), establishing that L is not self-terminating. Return to Step 2B. Step 2D. G' is self-terminating. Apply the iteration scheme of Theorem 2.9, taking S to be the set of nodes of G', to obtain x(j) for each n.£G'.. Return to Step 3. 17 2.2 Some necessary and sufficient conditions for .self-termination of 1o o ps As shown above, the Karp-Miller algorithm is.based on termin ation properties of loops. Consequently its efficiency depends main ly on the means available for testing self-termination of loops. In the following we shall derive some theorems testing these proper ties. Theorem 2.10 ' If, for l^k<L,£, p— = 1, then the loop L is self-terminating bk I Z 1X1 £ o k=l PROOF; By Theorem 6 of /6/ the necessary and sufficient condition for self-termination of a loop is the existence of a nonnegative integer solution of the following system of inequalities: , ,v A^-Tjt+l+lixU) x(l)> g A1r-T^+1+U,,x(k) x(k+l)>——\-. 'for k=l,2,..je-l k This system reduces to x(l)> /3£+x(i) x(k+l)>/3k+x(k) for k-1, 2,..,i-l Since all x's are integers, we have 18 x(k+l) > f/3j+x(k) for k=l,2f..,1-1 By summing left and right-hand sides of the above inequalities, respectively, we get the necessary condition To prove that it is also a sufficient condition we -ban.-, show that if it is satisfied the system x(l)=C x(k+l)=x(k)+ f/3k] k=l,2,..,1-1 where C is a sufficiently large integer is a nonnegative integer solution of the above inequalities. Consider the following examples Example 2.1 A-0 U=l \V=T=2 A=2 Here g=l/2. The data distribution after each node performes once is A'=0 U=l W=T=2 T=Vf=l U=l A'=l Fig. 2 19 Example 2.2 A=0 A=l Here g=2. The data distribution after each node performs once is A'=0 T=W=1 A'=2 Example 2.3 A=l. Here g-l. The data distribution after each node performs once is A'=0 T=Vf=l A'=l Dependence of tlie amount of data on the loop gain Fig.2 20 These examples indicate that (roughly speaking) for g>l the "amount of data" increases, for g<1 it decreases, and for g=l it remains constant. The following theorem gives this fact a precise form. Theorem 2.11 „ I u Let L be a loop with gain g - II ^— . Let A. and A^ be the i = l i initial and current number of words on the i-th branch, respectively, for i = l,2,..,j£. Then JL I if g<l Z ciAi< 21 c.A^ i=l i=l L 5L if g=l ' Z c.A. = Z c.A0 . , l i • , I l i=l i=l t L if g> 1 Z c. A. > Z. c.A FA . , I I . , I I 1=1 i=l where c^ - 1 and c. 4 TT Tf-*1— for i=2,3,..,i. PROOF; Suppose that W . words are taken from the j-th branch (j'^;C), J and LTj_^ words are placed on the (j+l)-th branch. Taking into account that c . , = c . d+i - j u.+1 the change in the sum F c.A. is ^ . , I l i = l 21 W . c. ,U. , - c.V. = c. rr-L- U. , - C.W. = 0 0+1 J+l J J J J+l J J Now suppose that W^ words are taken from the j2-th branch and U words are placed on the 1-st branch. We have Cl !i!2 Vi h. cz = u2 u3 uz = e w^ and the change in the sum U ciui - CA = ui - i*x W = V1 " i> is positive for g>l, negative for g < 1, and zero for g=l. As a corollary we get a necessary condition for self-term ination of a loop, which is essentially equivalent to Theorem 2.6. Corollary 2.12 A necessary condition for self-termination of a loop with g>l and initial number of words on the i-th branch A?, i=l,2,.., L is that T c.A°< J" c. (T.-l) 1=1 1=1 PROOF: If the loop is self-terminating, then after a finite number of performances we must have 22 A. < T.-l for i=l,2,..,1 i-i ' ' ' and T c.A.< > c.(T.-l) ii . , I X I i=l x x i=l By Theorem 2.11 I I I I c.A°< f c.A. < f c. (T.-l) . , I l — I l — I l ' i=l A A i=l x x i=l Now we shall derive a sufficient condition for loops with g=l Lemma 2.13 Let L be a loop with g=l, and W\=T^ for i=l,2,..,Let A? be the initial number of data words on the i-th branch for i=l,2,..,£. If Z c.A. < max c .V . i=l 1 1 I<j<l J J then the loop is self-terminating, PROOF: Let k be such number that Tli en JL max c -V . = c, Vf, c .*—, c, 1 i = l k t— n i ^ k Since 23 I c.A° = I c.A. where A. is the current number of words on the i-th branch we have l X* c X c A, < 5" — A. = 7 — A°<W, = T, k c, l c, l k k i=l k i=l k Thus the number of words on the k-th branch will never reach the threshold and the number of performances of node iij. is zero. By Lemma 2.2 every node of the loop terminates. Hence the theorem. Theorem 2.14 Let L be a loop with g=l. Let A? be the initial number of data words on the i-th branch, i = l,2,..,X. A sufficient condition for self-termination of L is that 0 1 y c.A < y c. (T.-W.) + max c.V . i=i 1 1 iii 1 1 1 i*gjt J J PROOF; Suppose that L does not terminate. Then after each node performed at least once, Ai^ W i,e- Ai = (VV + Ai where A[> 0 for i=l,2,.. If we replace A^ and T^ by A| and T^=Vr_^, respectively, the resulting loop L' will have the same termination properties as L, i.e. will not terminate. . Then by Lemma 2.13 24 2_ c'A! > max c'.W. i=l 1 1 l£j*Jt J J Since c!=c for i=1.2,*.,£ f c.A0 = f c.A. = ? c. (T.-YT. ) + [ c.A! = £ c. (T.-V. ) + J_ c!A!> iTi 11 iti 1 1 1 1 i=i 1 1 i=i 1 11 i=l 1 x~ T c(T.-W.) + max c'.W. = T c.(T.-W.) + max c .W. itl 1 1 1 l<j<Z J J i = l 1 1 1 l*j£ J J which is a contradiction. To illustrate how strong the conditions of Corollary 2.12 and Theorem 2.14 are consider the following two simple examples; Example 2.4 n4 W=T=2 * U=2 Example 2.5 W=T=1 U=l Here c^=c-j=l c2=c4=2 Ic.A0 = 2 =Ic.(T.-l) ii *~ lx l and the loop terminates A=0 Here c^=c2=l and max c.W.=1 l<j£2 J J 7c.A?=l= Zc•(T.-W.)+raax c.W. 11 1 1 1 l<g£2 J J and the loop does not terminate Termination of loops Fig.3 25 2•3 A direct method of testing termination properties of strong components• As noted in /6/, the Karp-Miller algorithm requires the inspec tion of each loop of a strong component G' when G' is not self-term inating. If G' contains many loops a more direct way is desirable. Consider the shortest-route algorithm given in /l3/. It is used in Step 2A of the Karp-Miller algorithm for testing termiation properties (see part 2.1). If we use multiplication and associate U A with branch d ,'rather than addition and log(U Ar ), then, in p/ p ° p' p' ' ' • the absence of loops with g<l, the algorithm results in assigning a rational number to each node of G'. On multiplying these numbers by the least common product of their denominators each node n. will have an integer OC . assigned. l ° l ° If d =(n.,n.) is a branch of a loop L with g=l, then 06. /oi. =\j A . pi'j 1 o> j'ip'p If L has g > 1, then for one and only one branch do we haveOc^/<X< <U A ; for other branches of L OC./CX.^U A • P P' J/ i P P Consider now an arbitrary loop L of G' with g=l. If L is self-terminating then by Theorem 2.8 the number of performances is for at least one node n.€ L less thanOC.. J J We shall show that the same is valid for loops with g>l. Theorem 2.15 If L is a self-terminating loop with g> 1, then at least one component of X is less than the corresponding component of X? here X = X, X, X, x*= al oc0 OC,, a a(U1A1) a(u1A1)(u2A2)-(fiV;i-i) 26 where a is an arbitrary parameter and X^ is the number of jierf ormances of node n^, i=l,2,..,X. PROOF; By Theorem 4 of /6/, X is the minimum nonnegative solution of (E-A)X>/3 , i.e. 0 1 0 0 u. 0 0 , . 0 0 0 • • • u Jt-1 Vl 0 xl V x2 ^2 • > • • • where /3-, = W1 and fc. Ai-1-Ti-1+1 Vl i i — 2 , 3 , . . ,/£ Then (E-A)X* = U. 0 0 0 0 0 0 0 0 u 'l-l l-l 0 a tfl W2 ^1^2 Vl Wl Wl 27 (E-A)X* = a Now consider the vector X-X*. (E-A)(X-X*) = (E-A)X - (E-A)X* > /3 If no component of X is less than the corresponding component of X*, then (X-X*) is a smaller nonnegative integer solution of (E-A)X>/| than X, which is a contradiction. Theorems 2.15, 2.8, and 2.4 serve as a basis for a procedure for testing termination properties of strong components, which may be outlined as follows: Step 1. Apply the shortest-route algorithm modified as shown above. If a loop with g<1 is found go to Step 2; otherwise continue until each node n. of the strong component is assigned a constant^. . l ° l Step 2. Apply the iteration scheme of Theorem 2.9 to the nodes of the strong component G1 until either a) the scheme terminates, establishing that G' is self-terminating and giving the number of performances x(j) for each n . € G 1 , o r 3 b) for some n and some k, x (k) exceeds the upper bound C*k on x(k), establishing that G' is not self-terminating. 1-g 0 0 0 ^ 0 -; CHAPTER 3 STORAGE REQUIREMENTS In the Karp-Miller model each branch d^=(n^,nj) represents a queue of data words which may be an output of operation 0^ assoc iated with node n^ and which may serve as an input for operation 0j associated with node n^. Each data word has to be stored in a memory location of a computer performing the computation, and the maximum number of memory locations required becomes of interest. Chapter 3 is devoted to this problem. 3•1 Maximum storage requirement - special case In this part we present the results of /if. Let us introduce a branch parameter tr > 0: If d =(n..n.),X r p p i J p is the fixed time required by node n^ to fetch its input data from storage, process these data, and place outputs into memory locations associated with the queue on branch d . Thus if n. initiates at time p l t, it places U data words upon branch d at time t+tT . P P P um Another parameter we introduce is T^=max^TJ^| where the maxim is taken over all branches d directed out from node n.. P i A schedule is a set 4-{^-^ ' * * ,(->t} wnere each is a funct ion 6. : {l,2,..,X.} -» R such that for l<k<r£X. l ^(kKd.U) Here R is the set of real numbers and X^ is the number of initiations 28 29 of node n^ for any proper execution of G. If X^=0, is undefined. With each we associate a function xi : R-{o,l,2,..,X.} x.(t)=0 iff either X.=0 or X.il and t<4.(l). 1x 1 l l For l<k<Xi, xt(t)=k iff 6 (k) ^ t< 6 (k+l) For X. >1, x.(t)=X. iff <S-(X.)<t. I ' ii ivi For every branch d =(n.,n.) define P i j bd(t)=A +U x.(t-T )-W (x.(t)-E.(t)) p P pi p p J J where £.(t)=l if there exists k, l^k^X. such that 6.(k)=t £..(t)=0 otherwise. A schedule is called an admissible schedule if, for j=l,2,..,X <S.(k)=t b6(t)> T j P P for all branches d into n.. and for all k, 1 £k£X.. P J J A schedule <o is sequential if for no nodes n., n., with 1 i' j n.. do we have i J 6.(k)< d.(r)< 6. (k)+T. I J I I ,forl£k£X., l£r<X.. i J These definitions are to be interpreted as follows: <^(k)=t means that node n^ begins its k-th initiation at time t under the schedule (J. x^(t) is the number of initiations of node IK , up to and including time t, under the schedule <i. b^(t) is the number of data words on branch d^ at time t. The quant ity £.(t) is introduced for the following reason: All data transmit-J ted to node n^ by node n^ via the branch (n^,n^) must first pass 30 through storage. Then if t is a time at which n. does not initiate, the storage requirement at time t is b6(t) = A +U x.(t-f )-W x.(t). P P P i • P P J If n^ initiates at time t, the number of data words in storage at time t is b6(t) = A +U x.(t-r )-V (x.(t)-l) P P P i P P j An admissible schedule specifies those node initiation times cor responding to a proper execution. Thus a node n^ initiates at time t (d>.(k)=t for some l<k<X.) only if each branch d directed into v lv I ' J p n. contains at least T data words at time t. (b^(t)>T ). Finally, l p ' p p a sequential schedule is a schedule under which no node initiates at the same time that some other node is executing. For any admissible scheduled, define A> = max T b6 (t) C6 t p P jkJ^ thus defines the maximum amount of storage required by the admissible schedule 6. Lemma 3.1 Let 6 be an admissible schedule for a computation graph G. Then there exists an admissible sequential schedule (a.s.s.) 6 such that ^6' * <^6 This Lemma shows that in general a parallel system is more econom ical than a sequential system in the sense that it needs fewer storage locations. 31 Let ^U/ = max 6 is &n admissible schedule^ . ^is the maximum number of memory locations that a computation graph G can require. Corollary 3.2 i_, £M = oax( ^/<? is an a.s.s.) For any admissible schedule define T<* = {t/t£ (6.(10,4. (kJ+'E) for 0<k£X., i = l,2,..,/}. are those times during which no node of G is executing under the schedule 6; however, nodes of G may just be initiating or terminat ing under 6 at some of the times t€ T^. We then have the following result: Corollary 3.3 (Ms- wax max {^("t)/CJ is an a.s.s. and t 6 T^} . 6 t Write b. f°r ^ne column vector with p-th component A , W for the column vector with p-th component W , T for the column vector with p-th component T , and, for any admissible schedule <$> define _b^(t) to be the column vector with p-th component b^(t), p=l,2,..,t. Define a tXj£, matrix A with elements a .=\V if branch d is directed into n. but not also out from n.. PJ P P J J a . =-U if brancli d is directed out from n. but not also into n.. PJ P P J a . =Vr -U if branch d is directed out from n. into n.. PJ P P P J J a .=0 otherwise. PJ 32 Finally, let X be a column vector with i-th component X^, i=l,2,.., Theorem 3.4 # Let G satisfy b.>T-W. Then is determined by the following integer linear program: |b| - minU^I subject to 0 <y_< X where each y^, i=l,2,..£, is an integer. Theorem 3.4 enables us to determine the maximum amount of storage required for a given computation graph, provided b>T_-W. What if this is not satisfied? We quote from /l/: "In those cases where this inequality is violated, the program /of Theorem 3.4/ is inapplicable. Under such contingency one pos sible course of action is to simulate all possible admissible schedules for G until a distribution of data is obtained which satisfies the above ineqality, and then apply the program /of Theorem 3.4/ to this data distribution. Then the maximum storage requirement is either that obtained through the simulation phase, or that obtained by the program, whichever is the greater. In general, however, such a scheme would be impractical due to the potentially large number of possible different simulations in volved. " n # If ii is a n-dimensional vector, then define |x[ = T, x. . ~~ i=l 1 33 3.2 Maximum storage requirement - general case A node n^ of a computation graph can initiate only if for every branch d^ directed into n^ the number of data words assoc iated with this branch is not less than the corresponding threshold T . Consequently, the number of data words on this branch after the initiation is not less than T -V . P P Consider a computation graph Cr which does not satisfy b> T-W, i.e. there exists at least one branch d^=(n^,n..), d^ £ G such that A < T -Vir . Let us assume that G contains only one node terminal to p p p such a branch. Later we shall show how to extend the results to the case of more nodes terminal to such branches. Clearly, the data distribution will satisfy the requirement b >T-W after the first initiation of n.. In the following we derive J a method for finding the maximum storage required prior to the first initiation of n.. Then we apply methods of part 3.1 to determine the maximum storage required after the first initiation of n.. J Given a computation graph G modify it as follows: For the node n^ jmt all V .=T -=0 all U • .=0 J s Note that the data distribution of the modified graph G' is not affected by initiations of n.. In the following parameters of G' will be primed. Lemma 3.5 Let <o = (6- (k. ), d>. (k. ), . . ,<J. (k. )} be a schedule, L xl xl r2 12 xm xm where ^i^, i2,.., im} - IC{l,2,..,j-1,j + 1,., l) and 34 xl xl xi x2 x2 x2 x2 x2 x3 x3 d i (k£ < <£. (k. ) . m-1 m-1 m-1 m ra Then 6 is admissible for G' iff it is admissible for G; moreover b*.(t)=b*(t) for 0<t£c?. (k. ). m m PROOF: By definition x.(t)=o iff t^6.(i) xi(t)=k iff t >6\(k) and t£<S\ (k+l) Therefore x.(t)=x!(t) for i€l and 0<t<d. (k. ) l lx ~- I l in m and x. (t)=x! (t)=0 for i/l and 0<t<6. (k. ). ii ii m m Also £i(t) = £j(t) for i€I and 0 < t < d>i (1^ ), m rn and £. (t) = 6! (t)=0 for i/l and 0<t<6- (k. ). ii / ii m m Let us examine bd(t)=A +U x (t-V )-V (x (t)-£ (t)) for d =(n p p p r p p s s p r There are four different cases depending on whether r and s are elements of I or not. (i) r € I, s € I Here U =U1 , W =V»M P P P P and b6(t)=A +U x (t-t )-\f (x (t)-£ (t))=lj6(t) pv p p rv p p sx s px (ii) rel, s^I Here Up=Up» xg(t)=x^(t)=0, £g(t)=£^(t)=0 and b*(t)=Ap+Uixr(t-rp)=b'J(t) (iii) r £ I, s 6 I Here Vp=Vp» xr(t)=x^.(t)=0 and bp (t )=Ap-Wp (xs (t)- £s (t) )=b'J( t) (iv) r£l, s£l Here xf (t )=x^.( t )=xs (t )=x^ (t ) = £g (t ) = £, (t )=0 and b6(t)=A =b6{t) P P P 35 We have thus established that b£(t) = b£,(t) for 0*t<6. (k. ). ni m Moreover, for all d =(n ,n ), s€ I 'we have T'=T and ' p v r' s ' p p b6(t)>T P P iff. b'<Ht)>T' for 0<t£6. '(k. ). p p 1 1 A r ram This proves that 6 is admissible for G iff it is admissible for G'. Lemma 3.6 The maximum storage S' required for the modified graph G' max is the same as the maximum storage S required for G prior max to the first initiation 6 • (1) of n.. J J PROOF: By Corollary 3.2 we need only consider sequential schedules. The proof will consists of 3 parts. (i) Every sequential s chedul e d> = i 6>. (k. ), <•> . (k. ),..,(i. (k. ) , . .> 11 2 2 mm can be divided into time intervals <o,6. (k )), <<3. (k ),6 (k )),..,<d (k ),L )),.. Xl Xl Xl 1l X2 12 1m 1m Sa+l ^m+l From the definition of b^(t) and £^(t) it follows that S(t)< S(0) if t<6- (k. ) 1 1 and S(t)<S(<S. (k. )) if (k. )<t<di (k. ) in ra m m m+1 m+1 where 3(t)=|b6(t)| Thus to get S it is sufficient to examine S(t) at t=0, <£. (k. ), max l ^ ^" 1' (>• (k. ), d>. (k. ),..; in other words it is sufficient to examine the 12 x2 x3 x3 integer sequence S( where S]ii=S(<S. (k. ). m m . (ii) Here we shall show that'S' is infinite iff S is infinite. v max max Let N be an arbitrary integer. If S' is infinite, then there exists 36 a schedule («> and integer M' such that SM • > N Take the initial part of d> up to d> . (k. ) and omit all initiations Hi' XM' of n.. Then by Lemma 3.5 we get an admissible schedule 6 for G which J requires the same storage as C>'. Thus for some M<M' we have SM=SM«> N" In the same fashion we can show that S' is infinite if S is max max infinite. (iii) Now suppose that S' is finite. Then sequences S' are bound-v 11 max x m ed by S1 and there exists such a schedule d' and integer M' that J max S^t=S' . Take the initial part of 6 up to 6. (k. ) and omit all M max l^p l^j, the initiations of n.. This by Lemma 3.5 will give us an a.s.s. for G with the same storage requirements as 6 . Thus S ' =SV'( i —S., £v£> max iM' M max On the other hand, since S is finite, there exists a schedule Q ' max ' > and integer N such that N max The initial part of this schedule up to O. (k. ) is an a.s.s. o' for XN • \\T G' with the same storage requirements as o* This gives s =s5=s'£^ S' max N N max From this and S' < S we obtain max max S =S1 max" max Corollary 3.7 Let all branches of G satisfy A >T -Yi with a possible ex-p- p p 1 ception of branch (n.,n.). Modify G as follows: Put all Vrj.=Tr;.=0, all Ujg=0 37 Then the maximum storage S ^ required for G prior to the first initiation of n. is determined by the following integer linear program: max 1 —' 1 d-' subject to o<yk<x^. k/j 0 < y • < oo • A'vib-T'+W' where A', T', W', X' are parameters of the modified graph G'. PROOF: Proof follows from Lemma 3.6 and Theorem 3.4. Lemma 3.8 Let G be a computation graph whose all branches satisfy A >T -Vi' with a possible exception of a branch (n.,n.,). p p p 1 i' j" Let X.> 0. Then £=b^(t), where _c is defined belov/, for some a.s.s. <S and some t € T^, t > <S. (1) iff there exist such integers y^ , i=l, 2 , . . ,J& which satisfy (i) °-yk-Xk k^ 1 < y • < X . J J (ii) £=P_-AX^1-1L PROOF: Analogous to that of Theorem 2.4 of /l/. Corollary 3.9 Let G be a computation graph whose all branches satisfy A >T -V with a possible exception of a branch (n.,n.). p p p 1 * i J 38 Let X.>0. Then the maximum storage S^ required after the j ° max 1 first initiation of n. is determined by the following integer J linear program: S"*" =1 b I -inin j Ay I max ' —1 J-1 subject to o<yk<\ k^j Ay_<b - T + ¥ where ea.ch yk is integer. Theorem 3.10 Let G be a computation graph whose all brandies satisfy A >T -YY with a possible exception of a branch (n.,n.). P P P 1 J Let X^ > 0. Then the maximum storage required is M/ = max(3 ,) < max' max PROOF: I. a) Suppose S ^S1 11 max max There exists an a.s.s. for the modified graph G' which requires Sm^x of storage. By Lemma 3.5 this schedule is admissible also for G and therefore is an initial part of some a.s.s. for G. Thus A/> S > 31 C max max and XC>max(S ,3"^ ) C — max' max b) Now suppose S >S 11 max— max By Corollary 3.9 S^ax= I b | -1 Ay_°l whor e | Ay_°| = m i n | Ay_ | subject to 0^yk<Xk k^j 39 Then by Lemma 3.8 S1 =|b6(t)|- for some a.s»s.» and niax /W/^S1 >s C max— max i.e. Xt/>raax(S ,3^" ) <• max' max II. Let (•> be an arbitrary schedule for G. Then |b6(t)|<S for t<d.(l) 1 — x " max j and I bd (t )| ^ 31 for t>6.(l) 1 — 1 max — j Thus ii/<iraax(S , ) Cr — max' max The results of I. and II. give M/= raax(S , S"^ ), q.e.d. b < max' max ' ^ Theorem 3.10 and Corollaries 3.7 and 3.9 thus provide means for finding the maximum storage for graphs where one and only one node is terminal to a branch which does not satisfv A > T -W . - p- p p In case there are' r such nodes, we take a subset of these nodes { n. ,n. ,..,n. , where 0 < s r. For all branches directed ^ Xl X2 s into the nodes of the subset we put Yir=T=0, and for all branches directed out from the nodes of the subset we put U=0. Then to this modified graph we apply the following integer linear program: S =jbl-min|Ay I max 1 —1 I J. i subject to 0<yk<Xk Mn ,n ,..,n. 12 s l£y• £ x. xi xl 1 < y • < X. 16 y • £ X. J i i s s r r Since there are 2 sucli subsets we have 2 linear programs. The maximum required storage is the maximum of the 2 partial maximum storage requirements. CHAPTER 4 CONCLUSIONS The Karp-Miller algorithm for testing termination properties of computation graphs is based on the termination properties of loops. It is, therefore, desirable to have means for testing loops. A quantity related to the number of data words in a loop is introduced, which decreases for g<l, increases for g>l, and remains constant for g=l, in the course of computation. This concept makes it possible to derive a simple sufficient condition (Theorem 2.14) for self-termination of loops with g=l, and to give a shorter and intuitively more satisfying proof (Corollary 2.12) of necessary condition of Theorem 2.6 due to Karp and Miller. In the special case that \V=U, the necessary and sufficient condition is given (Theorem 2.10). This condition has a simple form due to the fact that data propagation has local character in this case. Since this is invalid in the general case, one probably cannot hoj)e for a simple form of the general necessary and sufficient condition. The Karp-Miller algorithm is not well suited for computation graphs with many loops. Therefore, in part 2.3 of Chapter 2 a direct procedure for testing termination properties of strongly connected graphs is derived. The procedure does not require inspection of every loop as in the Karp-Miller algorithm. However, it also uses the iteration scheme of Theorem 2.9, which for large graphs may be too lengthy. Reiter in /l/ gives a linear integer program for determining the maximum amount of storage required in the special case that 40 41 b>T-W. Part 3.2 of chapter 3 extends his method to cover the general case. The number of linear programs required in our method increases as 21 where i is the number of nodes terminal to brandies which do not satisfy A^ > ^p""^*p» ou^ -"-^ appears to be more efficient than simulation of all possible schedules /7/, especially for highly parallel computations. REFERENCES flf Gregory J. and McReynolds R., "The SOLOMON computer", IEEE Trans, on Electronic Computers, vol. EC-12, pp.774-781, Dec. 1963. /2/ Schwartz J., "Large parallel computers", J.ACM, vol.13, No. 1, pp.25-32, Jan.1966. /3/ Slotnick D.L. et al.,"The ILLIAC IV computer", IEEE Trans, on Computers, vol. C-17, No.8, pp.746-757, Aug.1968. /A/ Thurber K.J. and Myrna J.W., "System design of a cellular APL computer", IEEE Trans, on Computers, vol. C-19, No.4, pp.291--303, April 1970. /5/ Koczela L.J. and Yvang G.Y., "The design of a highly parallel computer organisation", IEEE Trans, on Computers, vol.C-18, No.6, pp.520-529, June 1969. /6/ Karp R.M. and Miller R.E., "Properties of a model for parallel computations: deterrninacy, termination, queuing", SIAM J. Appl. Math., vol'.14, No.6, pp. 1390-1411, Nov.1966. flf Reiter R., "A study of a model for parallel computations", Techn. Report No. RADC-TR-6S-350, Rome Air Development Center, Nov.1968. /8/ Reiter R., "Scheduling parallel computations", J.ACM, vol.15, No.4, pp.590-599, Oct.1968. /9/ Mart in D.F., "The automatic assignment and sequencing of comput ations on parallel processor systems", -Ph.D. Thesis, Dept. of Eng., UCLA, Jan.1966. /lO/ Martin D.F. and Estrin G., "Path length on graph models of computations", IEEE Trans, on Computers, vol.C-18, No.6, pp.530-536, June 1969. /ll/ Baer J.L.E. and Estrin G., "Bounds for maximum parallelism in a bilogic graph model of computations", IEEE Trans, on Comp., vol.C-18, No.11,.pp.1012-1014, Nov.1969. /12/ Reiter R., "On assembly-line balancing problems", Operations Research, vol.17, No.4, pp.685-700, July-Aug.1969. /13/ Ford L.R. and Fulkerson D.P., Flows in networks, Princeton University Press, Princeton, 1962, pp.130-134. 42
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Termination and storage requirements in a model for...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Termination and storage requirements in a model for parallel computations Schnapp, Ivan 1970-05-17
pdf
Page Metadata
Item Metadata
Title | Termination and storage requirements in a model for parallel computations |
Creator |
Schnapp, Ivan |
Publisher | University of British Columbia |
Date Issued | 1970 |
Description | This paper deals with a graph-theoretic model for parallel computations as formulated by Karp and Miller. A necessary condition and a sufficient condition for self-termination of loops with unit loop gain are presented. For the special case that W=U the necessary and sufficient condition is derived. A direct procedure for testing termination properties of strongly connected graphs is presented. A method due to Reiter, for determining the maximum storage required for a computation graph, is extended to cover the general case. |
Subject |
Computers. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2011-05-17 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0102044 |
URI | http://hdl.handle.net/2429/34610 |
Degree |
Master of Applied Science - MASc |
Program |
Electrical and Computer Engineering |
Affiliation |
Applied Science, Faculty of Electrical and Computer Engineering, Department of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- 831-UBC_1970_A7 S39_4.pdf [ 2.5MB ]
- Metadata
- JSON: 831-1.0102044.json
- JSON-LD: 831-1.0102044-ld.json
- RDF/XML (Pretty): 831-1.0102044-rdf.xml
- RDF/JSON: 831-1.0102044-rdf.json
- Turtle: 831-1.0102044-turtle.txt
- N-Triples: 831-1.0102044-rdf-ntriples.txt
- Original Record: 831-1.0102044-source.json
- Full Text
- 831-1.0102044-fulltext.txt
- Citation
- 831-1.0102044.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0102044/manifest