INFORMATION THEORETIC MEASURE OF ALGORITHMIC COMPLEXITY by L o i s Wright Hon. B . S c , U n i v e r s i t y of Western O n t a r i o , 1972 A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE i n the Department of Computer Science We accept t h i s t h e s i s as conforming to the r e q u i r e d standard THE UNIVERSITY OF BRITISH COLUMBIA A p r i l 1974 In p r e s e n t i n g t h i s t h e s i s in p a r t i a l f u l f i l m e n t o f the r e q u i r e m e n t s f o r an advanced degree at the U n i v e r s i t y o f B r i t i s h C o l u m b i a , I a g r e e t h a t the L i b r a r y s h a l l make i t f r e e l y a v a i l a b l e f o r r e f e r e n c e and s t u d y . I f u r t h e r agree t h a t p e r m i s s i o n f o r e x t e n s i v e c o p y i n g o f t h i s t h e s i s f o r s c h o l a r l y purposes may be g r a n t e d by the Head o f my Depar tment or by h i s r e p r e s e n t a t i v e s . I t i s u n d e r s t o o d t h a t c o p y i n g o r p u b l i c a t i o n o f t h i s t h e s i s f o r f i n a n c i a l g a i n s h a l l not be a l l o w e d w i t h o u t my w r i t t e n p e r m i s s i o n . Depa rtment The U n i v e r s i t y o f B r i t i s h Columbia Vancouver 8, Canada i i ABSTRACT T h i s work i s a study of an i n f o r m a t i o n t h e o r e t i c model which i s used to develop a complexity measure of an a l g o r i t h m . The measure i s d e f i n e d to r e f l e c t the computational c o s t and s t r u c t u r e of the given a l g o r i t h m . In t h i s study computational c o s t s are expressed as the execution times of the a l g o r i t h m , where the algorithm i s coded as a program i n a machine independent language, and analysed i n terms of i t s r e p r e s e n t a t i o n as a pseudograph. I t i s shown t h a t t h i s measure aids i n d e c i d i n g which s e c t i o n s of the a l g o r i t h m should be optimized, segmented or expressed as subprograms. The model proposed i s designed to y i e l d a measure which r e f l e c t s both the program flow and computational c o s t . Such a measure allows an •optimal' a l g o r i t h m to be s e l e c t e d from a set of a l g o r i t h m s , a l l of which s o l v e the given problem. T h i s s e l e c t i o n i s made with a more meaningful c r i t e r i o n f o r d e c i s i o n than simply execution c o s t . The measure can a l s o be used to f u r t h e r analyse a given algorithm and p o i n t to where code o p t i m i z a t i o n techniques should be a p p l i e d . However i t does not y i e l d a method of generating e q u i v a l e n t a l g o r i t h m s . i i i TABLE OF CONTENTS Page CHAPTER I: RECENT STUDIES IN ALGORITHMIC COMPLEXITY 1 1.1 Introduction 1 1.2 Related Work 2 1.2.1 Studies Without the Use of Graphs 2 1.2.2 Graph Theoretic Approach 4 1.2.3 Information Theoretic Approach 7 CHAPTER I I : AN INFORMATION THEORETIC MODEL OF ALGORITHMS 10 2.1 Constituents of an Information Theoretic Model 10 2.2 Basic Terminology 12 2.2.1 Graph Theory De f i n i t i o n s 12 2.2.2 Model Definitions 13 2.3 D e f i n i t i o n of the Information Model 18 2.1 Definiton of P r o b a b i l i t i e s 18 2.5 Definiton of Information Measure 22 2.6 Uses of the Information Value 24 2.7 Information Value -A Complexity and E f f i c i e n c y Measure 26 CHAPTER I I I : AN INFORMATION THEORETIC ANALYSIS OF ALGORITHMIC COMPLEXITY 29 3.0 Introduction 29 3.1 Procedure for Flow-Sequence Construction 29 3.2 Flow-subsequences; Weight Assignment 30 i v 3 . 3 H i e r a r c h y o f I s o m o r p h i c F l o w - S e q u e n c e s 31 3 . 4 R e d u c t i o n a n d I s o m o r p h i s m o f F l o w - S e q u e n c e s 32 3 . 5 E n t r o p y , C o n i t i o n a l E n t r o p y , I n f o r m a t i o n V a l u e 33 3 . 6 F u r t h e r A n a l y s i s o f t h e I n f o r m a t i o n V a l u e 33 3 . 6 . 1 S e l e c t i n g a n A l g o r i t h m 33 3 . 6 . 2 S e l e c t i n g t h e A l g o r i t h m I m p l e m e n t a t i o n 3 5 3 . 7 E x a m p l e s 38 3 . 7 . 0 The F o r m a n d P u r p o s e o f t h e E x a m p l e s 38 3 . 7 . 1 C o m p a r i s o n o f Two A l g o r i t h m s f o r t h e Same T a s k 41 3 . 7 . 2 A n a l y s i s o f Heap A l g o r i t h m 4 3 3 . 7 . 3 Summary o f t h e E x a m p l e s 50 3 . 8 Summary 51 B I B L I O G R A P H Y 5 3 APPENDIX A 5 5 A P P E N D I X B 57 V LIST OF TABLES Table Page I Example 1 42 I I Examples 2 and 3 42 I I I Example 4 44 IV Example 5 46 V Example 6 47 VI Example 7 4 9 v i LIST OF FIGURES F i g u r e Page 1 Information Channel 10 c v i i RCKNORLEGEMENTS I wish to express my sincere appreciation to Dr. abbe Mowshcwitz for his continued guidance and invaluable advice. I would also l i k e to thank Dr. D. Seeley for his several suggestions. F i n a n c i a l assistance was received from both the National Research Council and the Department of Computer Science. 1 CHAPTER I: RECENT STUDIES IN ALGORITHMIC COMPLEXITY 1.1 Introduction The following i s a study of an information theoretic model which i s used to develop a complexity measure of an algorithm. The measure is defined to r e f l e c t the computational cost and structure of the given algorithm. In t h i s study computational costs are expressed as the execution times of the algorithm, where the algorithm i s coded as a program in a machine independent language. For purposes of analysis, the algorithm i s represented by a pseudograph. I t w i l l be shown that t h i s measure of complexity aids i n deciding which sections of the algorithm should be optimized, segmented or expressed as subprograms. The model proposed i s designed to y i e l d a measure which r e f l e c t s both the program flow and computational cost. Such a measure allows an •optimal 1 algorithm to be selected from a set of algorithms, a l l of which solve the given problem. This selection i s made with a more meaningful c r i t e r i o n for decision than simply execution cost. The measure can also be used to further analyse a given algorithm and point to where code optimization techniques should be applied. However i t does not y i e l d a method of generating equivalent algorithms. Before a d e f i n i t i o n of the model i s given, a brief summary w i l l be presented of the problem under study and of related research i n this area. The main objectives i n the study of algorithms have been to generate •equivalent* algorithms, to 2 optimize compiler code through an analysis of the algorithm and, through a measure of e f f i c i e n c y , to compare, rank and then s e l e c t , an algorithm which i s i n some sense optimal for a given task. Studies in the f i r s t area have often been quite theoret i c a l and have avoided the problem of obtaining a measure of optimality. The second area, being concerned with compilers, has tended tc concentrate on improvements in p a r t i c u l a r types of code rather than evaluations of the program as a whole. Investigations of the t h i r d type have often focused on only a single facet of complexity rather than the o v e r a l l complexity of the algorithm. The discussion of some of the relevant studies i s given i n three parts. F i r s t those which have not included graphs i n t h e i r analyses, then those that have used graphs, and f i n a l l y those that have defined an information measure on graphs and/or algorithms are presented. 1.2 Belated Work 1.2. 1 Studies Without The Use Of Graphs An early, quite theore t i c a l evaluation of algorithms i s given i n a study by Ianov [17,18]. The notion of 'program schemata* i s introduced to represent abstract algorithms or programs, where programs are represented in a li n e a r notation and also as matrices. Using program schemata, a formalism which allows f o r transforming these schemata into equivalent ones, a decision procedure for determining equivalence, and a method of generating a l l equivalent schemata are developed. Rutledge [28] si m p l i f i e s Ianov's [17,18] formalism by interpreting i t in such 3 a way that the equivalence question i s seen to be the equivalence problem of f i n i t e automata, for which a solution e x i s t s , even though i t i s rather impractical. Furthermore, a method for generating a l l program schemata equivalent to a given program schemata i s presented. These studies are more concerned with the formalisms introduced than with an evaluation of the algorithms. But since these procedures do y i e l d a l l eguivalent program schemata, an e f f i c i e n c y measure defined on them would allow for the optimum of a l l equivalent programs to he determined. However, such an abstract measure cannot reveal much about code implementation or program evaluation. Thus, such studies remain s t r i c t l y t h e o r e t i c a l and very l i t t l e p r a c t i c a l work has followed from them, one exception to t h i s i s Paterson's [25] work on program schemata. He discusses the p o s s i b i l i t y of applying complexity considerations i n program optimization but does not develop the necessary machinery. In a more r e s t r i c t e d study, Beus [9] evaluates sort algorithms by defining an e f f i c i e n c y measure on the number of comparisons made. However, t h i s measure lacks s u f f i c i e n t generality ty neglecting other cost inducing factors such as memory accesses, index c a l c u l a t i o n s etc. At a less t h e o r e t i c a l l e v e l , Hievergelt [24 ] considers the problem of optimizing a program. As motiviation for his study he discusses the application of optimizing i n areas where syntactic improvements would be useful, leaving semantic considerations to the programmer. His study gives several s p e c i f i c p r a c t i c a l transformations that can be applied to any program. These y i e l d d e f i n i t e improvements i n execution cost units. Yet the advantage gained i s questionable considering the expense involved i n applying such improvements to sections of the program that are seldom executed. An improvement on Nievergelt*s [24] analysis appears i n a much referenced paper by Allen [4], where topological properties of the program are used to obtain program modules. The most frequently executed of these are considered for optimization. Hopkins [16], using a similar approach, presents a s e r i e s of transformations for optimizing programs, and the o v e r a l l implementation of such techniques. Many other papers [7,15,22] written during the 1960*s and early 1970's study the problem of optimization; however, their approach i s even more sp e c i a l i z e d than those mentioned above. These studies give techniques f o r optimizing general code, but f a i l to consider program flow, algorithmic implementation etc. This approach does not attempt tc evaluate the program as a whole, but i s concerned with improving p a r t i c u l a r sections of code. In the following section, we consider papers which take a global view of programs and algorithms. 1.2.2 Graph Theoretic Approach One of the f i r s t graph theoretic approaches to the evaluation of algorithms i s the one developed by Karp [19] in 1960. Karp notes that previously the seemingly natural 5 r e l a t i o n s h i p between program flow and graphs had been ignored. In t h i s paper an algorithm i s expressed as a flow chart, an execution of the algorithm defines a path through the flow chart and t h i s path i s considered as a graph consisting of operational elements (seguences of instructions without a transfer) and decision elements. The graphic representation f a c i l i t a t e s the i d e n t i f i c a t i o n of some simple program improvements. The programs are analysed i n . terms of subprograms, and a Markov model i s introduced for investigating the frequency of execution of parts of the program. Schurmann [29] uses the graphic representation to study the problem of program segmentation. This i s also an optimization problem since in order to insure fast execution, a minimum amount cf paging i s reguired. The loops i n the graphic representation of the algorithm are analysed using the adjacency matrix of a p a r t i a l graph based on the cycles (loops of the algorithm). The cut having the minimum number of program loops spanning i t i s found and t h i s defines the optimal cut. Berztiss [8] improves on the method of construction of t h i s matrix. In a more recent paper, Aho and Oilman [ 1 ] introduce cost considerations into algorithmic optimzation. An algorithm i s represented as directed a c y c l i c graphs, and four transformations applicable to t h i s representation are defined. Properties of the programs equivalent under the transformations are discussed, as well as possible extensions of t h i s representation to further program optimizations. 6 Bachmann [6] recently has assessed the problem of program e f f i c i e n c y using an abstract calculus based on a directed graph representation of the program. Several transformational rules are given together with a measure of e f f i c i e n c y defined on the p r o b a b i l i t i e s and the cost of a l l operations executed on a given path. However th i s i s assumed to be a t h e o r e t i c a l c r i t e r i o n only, since determination cf p r o b a b i l i t i e s i s d i f f i c u l t and the number of d i s t i n c t paths often quite large. As well, the formalism needed to apply the transformations i s not mentioned, but would presumably be quite cumbersome. Allen [ 3 ] , using a graphic representation of an algorithm, proposes a useful method of analysing i t s control flow. In his paper, alien defines dominance relations among nodes and from these obtains i n t e r v a l s (subgraphs with single entry point h such that each closed path passes through h) which are used to p a r t i t i o n the graph and tc give a p a r t i a l ordering of the i n t e r v a l s . Mention i s made of how these constructs can be used in analyses involving searches for redundant instructions, variable d e f i n i t i o n s and use relationships etc. Applying the flow a n l y s i s methods given by Allen £3], Cocke [10] defines a set of Boolean variables on computations. This set yields a system of eguations, which, when solved, point to those expressions which should be eliminated. Also included i n t h i s paper i s the notion of node s p l i t t i n g when a cycle i s entered from more than one node, a procedure often very useful i n the optimization of cycles. 7 1.2.3 Information Theoretic Approach Information theory, although applied to diverse problems, has not been used widely i n the area of algorithmic complexity and optimization. In a 1973 paper. Green [13] defines an entropy function to measure the complexity of paths in rooted trees. Let T„ be a tree with n terminal nodes v^ , va , . . ., v n and m non-terminal nodes u t, u^,. .. , u m, and PL = P(v 0,v. t) = P f v ^ u ^ u ^ , . . . , ^ ^ . ) , a path of nodes from vo to v-. Let od (u) denote the outdegree of node u. Then the path entropy E i s given by K I (P.) = Slog od (u, ) + log od (v ) ; L j = 1 J and the entropy of the tree i s defined by n E(T n) = 2 E(P. ) . i=1 F i n a l l y , the normalized entropy of T n i s given by H(T n) = (1/n)E(T r t). A 1969 paper by Warshall [3 1] analyses the problem of algorithmic complexity using information theory, under the constraint that the algorithms studied are viewed as arrays of choice-making elements. Thus, a sequence of states or Boolean expressions ever states cannot be evaluated using t h i s model. The f i n i t e input set consists of the set of data inputs { I } , each of which determines a sequence of states from entry to ex i t . An algorithm A i s a function from the input values to the set of E-sequences ( f i n i t e sequence of at least three elements. 8 beginning and ending with E, the entry or exit s t a t e ) . The states of A consist of the union of the elements occurring i n the E-seguences. To each input 1^ there corresponds a probability P ( I ) . The cost associated with state s i s -p(s-»t) log p (s-»t) where p(s—>t) i s the probability that t i s the next state. The t o t a l in determinancy of algorithm A, i . e . the choice among paths c, i s given by H (c) = - S p (c) (log p (c)) , p (c) = 2 p ( j ) . Then l e t t i n g TT t(s) be the number of occurrences of states s i n c, the mean number of occurrences of s per path i s TT(s) = S p (c)TTc(s) and per execution cost of algorithm A i s C(A) = - S T T ( s ) £p(s-^t)log p(s-»t). Thus, H(A) i s the algorithm cost assuming free bookkeeping, C (A) the cost (in the information theoretic sense) without t h i s assumption. Eased on the above, Marshall evaluates those algorithms which consist only of decision elements and develops some properties of t h i s representation. Rather than present a general analysis of algorithms, t h i s paper selects a subset of algorithms and f i t s i t to an information theoretic model. To be of more general use, t h i s model w i l l have to be extended to include a wider class of algorithms, node execution costs, types of loops etc. Mowshowitz [23] defines the s t r u c t u r a l information content of a graph using the o r b i t s of i t s automorphism group. This measure i s s t r u c t u r a l in the sense that i t gives the information 9 content of a p a r t i c u l a r graph r e l a t i v e to a system of transformations f o r which i t i s invariant. Using f i n i t e undirected graphs and digraphs and a particular type of i n f i n i t e graph, properties of t h i s measure are investigated and the effect of several graph theoretic operations on the measure examined. It i s suggested that an information measure on the structure of the graph of an algorithm might be useful in characterizing the r e l a t i v e complexity of the algorithm. Consideration cf that proposal i s the basis of t h i s t h e s i s . In the following, an information value w i l l be defined on an algorithm, methods of obtaining t h i s value given and i t s usefulness as a measure of e f f i c i e n c y and complexity discussed. In Chapter II d e f i n i t i o n s of necessary terminology, the cost p r o b a b i l i t i e s and the information value w i l l be given. A general explanation of the concepts related to t h i s p a r t i c u l a r measure i s also included. In Chapter I I I , a procedure w i l l be presented for computing the non-isomorphic, reduced seguences of the algorithm being studied. In addition, the chapter elaborates on the usefulness of t h i s measure through extensive examples and possible extensions of the work. 10 CHAPTER I I : AN INFORMATION THEORETIC MODEL OF ALGORITHMS 2. 1 Constituents of an Information Theoretic Model In order to define an information theoretic model, one must i d e n t i f y a channel with inputs and outputs, and input and channel p r o b a b i l i t i e s (see, for example. Ash [ 5 ] ) . The basic channel model defines the input as members of some f i n i t e set {b^,b z,...,b J } , and the output as members of the same or a dif f e r e n t set, say {a , a ,.. . , a K ]. Each output a^ i s s t a t i s t i c a l l y dependent only on the corresponding input bj. Such dependence i s determined by a fixed conditional p r o b a b i l i t y assignment P (a K | b- ) , defined for each input b- and output a^. The set of these conditional p r o b a b i l i t i e s defines the channel. P r o b a b i l i t i e s are also given f o r each input b-. Such a model can be depicted by the flow diagram shown in Figure 1. Noise Source Encoder Channel -V Decoder > > / Figure 1: Information Channel This system describes the flow of information from a source to a destination i n p r o b a b i l i s t i c terms. The source message i s associated with some object which can be sent over the channel. This channel i s considered as the medium over which the coded 11 message i s transmitted. The decoder operates on the channel output in an attempt to obtain the o r i g i n a l message. In our model, the source message w i l l be the basic steps of the algorithm; and the received message, the c o l l e c t i o n of these steps r e s u l t i n g from the transmission of the source message over the channel (i.e. the r e s u l t of the algorithm being executed). Kncwing the output, we w i l l attempt to extract information concerning the source message which w i l l indicate how a more e f f i c i e n t algorithm can be obtained. The fundamental notion of information provided about the event x by the occurrence of the event y i s defined on the model by I(x;y) = log (P (x|y)/P (x)) where the base of the logarithm i s usually taken to be 2. The self-information of the event x i s defined by I (x) = -log P (x) and the conditional self-information of x given y by I (x|y) = -log P (x|y) . The average values of these are defined as the entropy of the ensemble X and conditional entropy of the ensemble X, given Y, respectively, and are given by H (X) = - S P (x) log P (x) and H(X|Y) •= - S p ( x , y ) l o g P(x|y). The average information between X and Y i s the entropy of X less the conditional entropy of X given Y, i . e . I(X;Y) = H(X) - H(X|Y). Such a model w i l l be used i n the following study. The 12 underlying p r o b a b i l i t i e s of the model derive from two d i f f e r e n t sources: data items (rela t i v e frequencies), and a s t r u c t u r a l decomposition. The l a t t e r arise i n the following way. Let n^ for 1 < i < k be non-negative integers associated with an algorithm, and l e t n = ni+ ... + n^. A probability scheme i s constructed by taking p^ = n^/n. The entropy H(P L, .».# P^ ) = -Epilog of the probability scheme then serves as a measure of the structure of the algorithm. The role of the r e l a t i v e freguencies of data items w i l l become clear i n the detailed development which follows. 2.2 Basic Terminology 2.2.1 Graph Theory Definitions Certain graph theoretic concepts are of use in defining this model. The following d e f i n i t i o n s are taken from Harary [11]. A 3£a£h G i s a f i n i t e non-empty set V of p points together with a set X of q unordered pairs of d i s t i n c t points of V. Each pair x = {u,v} of points i n X i s a l i n e of G. A graph G i s directed i f the set X i s ordered. The elements of X are directed l i n e s or arcs. A loop of a graph i s a l i n e which joins a point to i t s e l f ; i f more than one l i n e joins two points, such l i n e s are c a l l e d multiple l i n e s . A directed graph which allows both loops and multiple l i n e s i s c a l l e d a Pgeudograph. A subgraph of G is a graph having a l l i t s points and l i n e s in G. A °f a graph G i s an alternating sequence of points and l i n e s ,x^ , vL ,... ,vn j_ ,xri, v^ , beginning and ending with points, in which each l i n e i s incident with the two points immediately 13 preceding and following i t . Such a walk joins v o and v n and may be denoted by v f l,v ,...,v n. The walk i s closed i f v0=vrt. It i s a t r a i l i f a l l the lines are d i s t i n c t , and a JEath i f a l l the points and hence, a l l the l i n e s , are d i s t i n c t . If a walk i s closed and i t s n > 3 points d i s t i n c t , then i t i s a cycle. 2.2.2 Model Definitions Before defining the model precisely, we w i l l discuss the interpretation of the term algorithm, as used in t h i s study. An algorithm X can be represented, quite naturally, as a pseudograph G (X). Interpreted i n t h i s way, the basic elements or steps of the algorithm become the nodes, {v, ), of a graph. Each node of the graph i s c a l l e d a c a l c u l a t i o n node or a decision and c a l c u l a t i o n node, depending on the type of the eguivalent step in the algorithm. An arc joins two nodes i n the graph i f the corresponding steps are sequential i n the algorithm. Each execution of the algorithm then defines a walk in G (X) ; these walks are, i n f a c t , pseudographs. In any algorithm there are distinguished nodes, at which any necessary i n i t i a l i z a t i o n of values i s done. The next node, v ?, marks the beginning of the n o n - i n i t i a l i z a t i o n steps of the algorithm. We give s p e c i a l attention to certain walks of G(X). A walk in G(X) beginning with vg and up to but not including the next occurrence of v& i s c a l l e d a flow-seguence a of G(X). The set {a K } of flow-sequences i n G(X) w i l l be denoted by A(X), or simply A. I f v^ i s the f i r s t decision node of a flow-sequence a w = v e >\ ' • * * vu 'v<4 ' • • •» V i / t h e n a f low-subsegugnce b of a i s 14 defined as (1) any subwalk of a K , beginning with a decision node v', and up to, but not including, the next occurrence cf either v 1 or v, , or (2) the subwalk v , v ,.. . , vL/. The introduction of S S ^ flow-subseguences into the model i s a s t r u c t u r a l consideration. For a given node, each of the flow-subsequences containing i t provides a d i s t i n c t s t r u c t u r a l context i n which i t may be analysed. We denote the set of a l l flow-subsequences of flow-sequences in G (X) by B (X) or B. The set of flow-subsequences of a flow-sequence a i s denoted by B (a). In the following, we assume the t o t a l number of flow-sequences and flow-subsequences to be m and n, repectively. The notion of flow-subsequence i s somewhat si m i l a r to that of i n t e r v a l , given by Allen [4], Before defining reduction and isomorphism of flow-seguences, we introduce some further terminology. In order to provide a weight function for the model, a hypothetical assembly language i s defined. The language i s simple so as not to incorporate any instructions dependent on the configuration of some machine. The actual in s t r u c t i o n set i s given in Appendix A. Each node v of the graph i s assigned a weight, w(v), which i s the cost in execution units, ( i . e . machine c y c l e s ) , of the assembly language instructions necessary to compute t h i s step. The weight of a walk i s the sum of the costs of i t s constituent nodes. For this study, we equate an increase i n e f f i c i e n c y with a decrease i n these execution costs. Next we define the process of structuring to include 15 a l t e r a t i o n s to the program code which depend on flow relationships among certain nodes of the graph. A structuring i s considered b e n e f i c i a l i f the e f f i c i e n c y of the program i s increased. writing a section of the algorithm as a subroutine, or as a form that r e f l e c t s a fixed order among certain nodes are examples of structuring. Structural complexity i s i d e n t i f i e d with the flow of control of the algorithm. For example, we regard as simple, a structure with simple l i n e a r flow, and as complex, one with imbedded looping and branching. we now continue with the model d e f i n i t i o n s . An implementation M (X) of algorithm X, i s a coding of the algorithm as a program i n the assembly language. The standard implementation, M0(X) , i s the i n i t i a l coding of the algorithm. For i > 1, M-(X) w i l l denote an implementation which has been structured in an attempt to decrease o v e r a l l execution costs. We next make precise the terms reduction and isomorphism. A flcw-seguence may contain several copies of a given flow-subseguence, as i s the case, for example, when a loop i s repeated several times. Noting t h i s , we define a', the reduced flow-sequence of flow-sequence a, as the walk consisting of the d i s t i n c t flow-subsequences occuring in a. We define the weight of a flow-sequence as the weight of the corresponding reduced flow-sequence. Thus, w(a) i s to be interpreted throughout as w(a'). The set of a l l reduced flow-sequences w i l l be denoted by A* (X) or simply A'. A flow-sequence a r i s isomorphic to flow-sequence a s i f , a) B(ar') = B(a s'), i . e . the order of flow-subsequences within 16 flew-sequences i s not important, or b) Bfa,?) C B {a*), and there i s no a 1 such that B(ar') C B(a«) and «(ar') < w(at«) < w(a6») . That i s , the flow-subsequences of ar' are also i n the set of f lew-subsequences of a&', and the cost in execution units of a,? i s closer to the cost of a$' than to any other reduced flow-sequence. Reduction and isomorphism are present in the model i n order to include structure i n the measure of complexity. If the flow-sequences are such that flow-subsequences are not repeated, i . e . no reduction i s possible, then within these flow-sequences, structuring of the program so that some inner loop i s made very e f f i c i e n t i s unlikely to decrease the execution times s i g n i f i c a n t l y . This follows since no single part of the flow-sequence i s repeated often. However i f much reduction i s possible, i . e . inner loops are repeated many times, structuring the program to r e f l e c t t h i s , or coding these lccps more e f f i c i e n t l y , decreases the o v e r a l l execution time. S i m i l a r l y , when analysing isomorphism, i f the same basic structure i s repeated, i . e . one flew-sequence i s simply a subset of another, then the complexity of the t o t a l graph i s decreased, while the cost remains constant. Emphasis on such a flow-sequence, when attempting to optimize the code, i s l i k e l y to decrease o v e r a l l cost to a greater extent than a flow-seguence which has no corresponding isomorphic flow-sequences. also, i f 17 there are no isomorphic flow-sequences, then there i s no par t i c u l a r flow-sequence which should be studied for more e f f i c i e n t ceding, and the cost i s u n l i k e l y to be s i g n i f i c a n t l y reduced by considering the structure alone. That i s , assuming the unstructured program has been coded e f f i c i e n t l y , no structuring w i l l markedly reduce the o v e r a l l cost. This w i l l become more evident when the information value i s defined below. One can associate with an algorithm X the set of data D(X) on which the algorithm X operates. For example, i f S i s an algorithm to sort n numbers into order, then D(S) i s the set of a l l n f a c t o r i a l possible orderings of n numbers. If D(X) i s not f i n i t e , then for this analysis a f i n i t e subset D' (X) of D (X) i s selected which i s representative of the data on which X executes. By representative we mean that for each possible flow-sequence of G(X), there i s an element of D*(X) which y i e l d s that flow-sequence. Since a f i n i t e data set can be obtained for any algorithm, we w i l l assume that D(X) i s f i n i t e i n what follows. With each element d of D (X) , there i s an associated r e l a t i v e frequency, '8(d), which denotes how frequently the algorithm executes on t h i s element. For each d, l e t GJX) denote the pseudograph resulting from the execution of the algorithm on datum d, We then define to be the set cf flow-sequences of Gd(X) . A^ ' i s the subset of consisting of the reduced, non-isomorphic flow-sequences of A , . The union of a l l A* over D(X) i s the set A*, defined above. Bd i s the corresponding set for flow-subsequences. 18 2.3 Definition of the Information Hodel. The model used w i l l follow the description given i n section 2 . 1 , altered to include a probability scheme defined on the cost and s t r u c t u r a l properties of the algorithm. With the flow-subseguences as the source message and the resulting flow-sequences as the received message, t h i s model becomes meaningful. The execution of the algorithm on D(X) produces as output, flow-seguences; an analysis of these yi e l d s information about the input which can point to a more e f f i c i e n t implementation of the algorithm. Thus, by transmitting the input ever the channel, the algorithm can be characterized by the resulting flow-sequences, and an information value defined to r e f l e c t i t s cost and structure. Given an algorithm X, i t s graph G (X) , and i t s data set D (X), the model i s characterized b r i e f l y as follows. The channel i s defined by the conditional p r o b a b i l i t i e s induced by D(X). The ijvput, {b - }, i s the set B of flow-subseguences of G (X). The output i s qiven by A *= [a^* }, the set of reduced, non-isomorphic flow-sequences obtained by transmitting B over the channel. 2.4 Definition Of P r o b a b i l i t i e s The input, channel and output p r o b a b i l i t i e s w i l l now be defined, completing the formulation of the model. The probabilties are defined as a function of execution cost units 19 in order that the the model w i l l i n fact be a function of cost. Relative frequencies are assumed tc be defined on D(X) and hence are incorporated i n the d e f i n i t i o n of the information value. Given that the execution cost of flow-subsequence b" i s defined 0 by and the t o t a l cost of the n flow-subsequences by H = 2Mj , then the input p r o b a b i l i t i e s are defined by Pfbj) = B j / * . This gives the cost frequency of flow-subsequence b^ with respect to the t o t a l cost of the flow-subsequences. Defining aJ = { a | b- i s i n B (a) }, and the weight of a-* as w(aJ ) = 2w(a) , where the sum i s over those a in a^, then the channel p r o b a b i l i t i e s are [ w(a„)/w(a'i) i f b- i s i n B (a u) 0 otherwise i . e . the cost frequency of flow-sequence with respect to those flow-sequences containing flow-subsequence b:. We denote J the output p r o b a b i l i t i e s by n P{aK) = 2,P(a K|b-)P(bj) , i . e . the cost probability of flow-sequence a^ averaged cost wise over a l l flow-subsequences i n flow-sequence a K . For each element d of D (X), R(d) i s the r e l a t i v e frequency of datum d. 20 We now show t h a t these d e f i n i t i o n s do i n f a c t y i e l d p r o b a b i l i t i e s . Obviously i=1 * = 1 s i n c e , 2 P ( b ; ) = i=1 N N • + I r l = I ~ 1 N J 8 A l s o 2 P ( a J k=1 m n = S S P ( a l b , ) P ( b : ) k=1 j=1 m n = (1/N) 2 2 (w(a )/w(a J))N ; k=1 j=1 J [ ' w (a1-) v (a*) ' w(a") + *Al „ & * w (a"1 ) y v (a L ) w (a n) 21 + . . . + 8^ ( H t „ + . . . + w (a* ) = <1/N) ( H t + N x + . . . + N „ ) = 1 f w (a K) i f b- i s in B (a K) where w = j '•J 0 otherwise m and <1/w(aL)) = (w (aL)/w (a L)) = 1 • k=l *'L In defining the model p r o b a b i l i t i e s as cost measures, the concepts of cost and complexity are strengthened within the model. If a flow-sequence a^ has high p r o b a b i l i t y , then t h i s indicates one or more of the following. F i r s t , a K consists of seldom repeated flow-subsequences, i . e . the flow-subsequences of a^ do not appear i n many other flow-sequences, so structuring of the program around t h e i r occurrence in a^ w i l l not greatly impede the execution of the other flow-seguences, and should decrease the execution time of a k . Also, i f the flow-sufcseguences i n a^ are costly i n r e l a t i o n to those in other flow-sequences, high probability r e s u l t s . This i s consistent with the notion that a very costly flow-sequence should receive considerable analysis, since i t necesssarily .will be a large contributor tc overall cost (assuming uniform r e l a t i v e 22 frequencies). If the above occur, a higher cost p r o b a b i l i t y w i l l r e s u l t , and hence, as w i l l be shown, a larger information value. 2.5 Definition Of Information Measure Using the information theoretic r e l a t i o n I (X;Y) = H (¥) - H (YjX) , the information value of an algorithmic implementation w i l l be defined in terms of the entropy H(A(X)) of the ouput flow-sequences, and the conditional entropy H (A (X) | B (X)), of the flow-sequences given the input flow-subsequences. For each possible datum d i n D(X), with r e l a t i v e frequency B(d), H d(A(X)) and H d (A (X) | B (X)) are calculated as H d(A(X)) = -2 (w(a y)P(a K)log P(a K)) where the sum i s over the set of reduced flow-seguences A^, and H d(A (X) |B (X)) = -£(N-P(b- J P f a J b ^ l o g P ( a u | b j ) ) , the sum taken over the set of flow-sequences A', and flow-subsequences B^. The costs, and hence p r o b a b i l i t i e s , are those determined by the given implementation, M (X). Where no confusion w i l l a r i s e , we w i l l write A and B f o r A(X) and B(X), respectively. We interpret the entropy as a measure of the cost and s t r u c t u r a l complexity of the algorithm. A lower entropy occurs i f there are only a few flow-sequences in A* compared to the number possible, i . e . there were many isomorphic flow-sequences. Thus, proper structuring should decrease the o v e r a l l cost. 23 Also, i f the same flow-subsequence occurs i n many di f f e r e n t flow-sequences, the entropy i s lower, indicating that this flow-subsequence should be coded more e f f i c i e n t l y , with emphasis on i t s flow pattern. However, entropy i s higher when either the f low-subseguences within c e r t a i n flow-sequences are costly r e l a t i v e to other flow-subsequences, or flow-sequences are long and s t r u c t u r a l l y complex r e l a t i v e to other flow-seguences. Attention should be given to such flow-sequences when attempting to optimize the code of a given algorithm, since overa l l cost i s dominated by them. This measure then, points to flow-sequences or flow-subsequences which should be considered for possible optimization. The weighted conditional entropy, H^(A|B), i s a refinement of the entropy measure. It considers the amount of cost and st r u c t u r a l complexity preference that i s , i n a sense, duplicated. If the same flow-subsequence b appears i n many di f f e r e n t flow-sequences, then, although the code of b can be optimized, only one of the flow-sequences containing b can be structured so as to optimize on b's r e l a t i v e position among other flow-subseguences of the flow-sequence. This follows from the observation that i f more than one of these flow-sequences has the same general structure, then they are isomorphic. But i f they are isomorphic, then only one copy i s present. Necessarily then, the structures i n the non-isomorphic sequences are d i s t i n c t , verifying the above remark. Hence, by subtracting 24 the conditional from the unconditional entropy, a correction i s made for the amount of 'structural information' which i s repeated. Also of note i s the fact that i f the conditional entropy measure i s low, then the flow-subsequences are generally not repeated in many flow-seguences. This indicates a f a i r l y simple structure which yields a substantial cost decrease when i t i s properly structured and e f f i c i e n t l y coded. However, a high conditional entropy denotes a more complex structure, with the same flow-subsequences occurring i n many s t r u c t u r a l l y d i f f e r e n t flow-sequences. In a sense the conditional entropy accounts for the inters e c t i o n of flow-sequences, through common flow-subseguences. The above discussion i s concerned with only a single element d of D (X). To complete the model, the input must be transmitted through the channel, i . e . the algorithm must be executed on a l l data. Thus the entropy and conditional entropy are defined as H (A) = 2 8 (d) H^ (A) and H(A|B) = S B(d)H d(A|B), where the sum is over a l l d i n D (X). Then the information value of the implementation of algorithm X i s given by I (A; B) = H (A) - H (A | B) . 2.6 Uses of The Information Value The term cost comparable i s used here to describe those algorithms which, i n their standard implementations, have within 25 a few per cent the same execution cost per node, t o t a l number of nodes, and execution cost per datum. The maximum value of I(A;B) over a set {Z} of cost comparable algorithms which solve the same problem, i s obtained from the most e f f i c i e n t algorithm. A higher information value on t h i s set i s more 'informative 1, in that i t indicates any of the following. F i r s t , the program can be structured to take advantage of a dominant flow-seguence which i s s t r u c t u r a l l y simple. Secondly, certain flow-subsequences can be considered for optimization of cede since they are dominant throughout the program. And l a s t l y , c ertain flew-sequences, consisting of c o s t l y flow-subsequences, are making the program expensive to execute. In the set {Z}, an algorithm such that no structuring i s b e n e f i c i a l to i t , must consist of many d i s t i n c t flow-sequences, since there are no isomorphic flow-sequences. Furthermore, no reduction i s possible and some flow-subsequences appear i n many di f f e r e n t flow-sequences. Hence, both the conditional and flow-sequence probability w i l l be low, and thus, also the information value. But an algorithm where t h i s i s not the case, i . e . many dominant flews, w i l l have a higher information value. On the other hand, r e s t r i c t i n g our attention to a particular algorithm X of {Z}, the best implementation M (X) i s the one having the lowest information value. This follows by observing that the implementation to which consideration of structure has been most b e n e f i c i a l , w i l l have the lowest cost, and hence information value. I n t i a l l y the information value i s calculated r e l a t i v e to the standard algorithm implementation 26 M^X). Then, upon st r u c t u r i n g , i f the information value increases, such a structuring does not r e f l e c t the flow of the algorithm. But i f the information value decreases, the given implementation included structuring and optimizing which decrease the cost s i g n i f i c a n t l y . A further discussion and v e r i f i c a t i o n of t h i s fact i s given i n Chapter II I , where i t i s shown that the information value can be used to select that implementation which i s most appropriate in a given s i t u a t i o n . 2.7 Information Value - A Complexity and E f f i c i e n c y Measure In order to analyse an algorithm, a means of ranking i t r e l a t i v e to other algorithms for the same task must be available. This measure should include the cost of executing the algorithm, and r e f l e c t i t s flew pattern. Measures which are a function of a single cost contributor, although helpful when comparing algorithms r e l a t i v e to t h i s factor, are necessarily, as a general measure of cost or complexity, only p a r t i a l l y e f f e c t i v e . S i m i l a r l y , measures which consider flow only are most helpful i n analysing the problem of whether and how code can be optimized, but in such study, r e l a t i v e costs are often disregarded, again leaving the measure, in some sense, incomplete. A measure then, that i s useful i n a general sense should r e f l e c t both of these f a c t o r s . In the measure given above, an attempt has been made to incorporate cost and some notion of st r u c t u r a l complexity. An information theoretic measure seemed 27 to be a most natural means of combining the two e n t i t i e s , Through the d e f i n i t i o n of isomorphism and reduction, and of the p r o b a b i l i t i e s , t h i s model of the execution of an algorithm becomes a function of s t r u c t u r a l complexity and cost, respectively. As desired, an algorithm which i s more suitable for structuring w i l l have a higher information content than an algorithm with comparable cost, but for which i t i s not p r a c t i c a l tc attempt to e s t a b l i s h an e f f i c i e n t structure. I f algorithms are ' s t r u c t u r a l l y eguivalent* in the sense that, the flow pattern of both indicates that the same amount of structuring i s applicable, then the information value w i l l rank the algorithms according to cost, the more expensive one having a higher information value. Thus, when comparing d i f f e r e n t implementations of some algorithm X, that one with the smallest information value w i l l be the most e f f i c i e n t i n terms of execution costs. Structural equivalence i s indicated by the conditional entropy, which, when analysing a given algorithm, pa r t i t i o n s the data set into blocks, where each block responds in approximately the same manner to proper structuring. If the conditional entropies are equal for elements d1 and d2 of D(X), then their walks through the algorithm are the same. This concept i s useful when selecting the most costly of those sections of code that respond s i m i l a r l y to given code optimization. Thus, the measure achieves the goal of establishing a complexity measure which i s a function of both cost and structure. This measure, by analysing the output of flow-seguences, provides information about the constituent flow-28 subsequences; such information then points to ways in which the algorithm can be made more e f f i c i e n t . Further discussion of thi s observation follows i n Chapter I I I , where several examples of how t h i s measure has been applied are given. 29 CHAPTER I I I : AN INFORMATION THEORETIC ANALYSIS OF ALGORITHMIC COMPLEXITY 3.0 Introduction In t h i s Chapter, we present a more detailed study of the model, He describe the processes used to obtain i t s components, investigate c e r t a i n properties of the defined measure, and f i n a l l y , demonstrate, through examples, some of i t s applications. Let X be an a r b i t r a r y algorithm. Throughout the following discussion, the nodes v Q,v t,...,v r of G ( X ) w i l l be numbered to follow the directed flow, depth f i r s t . Each node i s either a decision and c a l c u l a t i o n node, or simply a ca l c u l a t i o n node. Using the procedures given below, the input, output, and input and channel p r o b a b i l i t i e s can be obtained, and hence, the model made operational. 3.1 Procedure for Flow-Sequence Construction In the statement of the procedure, j i s used as the index on the nodes of X. The f i r s t n o n - i n i t i a l i z a t i o n step of the algorithm i s v. , and the terminal node i s v . Here we assume that no decision node occurs i n the i n i t i a l i z a t i o n process. A stack i s used to store that portion of a flow sequence that has been constructed prior to the occurrence of the present decision node. Also stored on the stack i s the number of branches exiting from th i s decision node (and hence the number of flow-sequences that w i l l be created). 30 Step 1: I n i t i a l i z a t i o n step. I n i t i a l i z e k to 1, j to 0. Step 2: I n i t i a l i z a t i o n Nodes. I f j<s, increment j and return to step 2. I f j=s, set a^ to v, , and i f v_ i s a decision node, stack j and a,,; increment j , and go tc step 3. Step 3: Main Flow-Sequence Construction. If j>r, go to step 4. Adjoin V j to a K» If j ^ t , and v- i s a decision node, stack j , a R ; i f j=t, and the stack i s not empty, increment k, remove a K , j from the stack; increment j ; go to step 3. Step 4: Terminate. 3.2 Flow-Subsequences; Weight Assignment. The set B of flow-subsequences i s formed from A in the following manner, where i n i t i a l l y the set B i s empty. For each flow-sequence a, of A, add to B any of a's flow-subsequences not already in B. To allow the analysis programs to manipulate the flow-sutsequences more e a s i l y , we associate a set of numbers, P. , with each node v,- , one value for each of the f low-j J subsequences in which v. appears (see Appendix B and the following examples). To accomplish t h i s , i n i t i a l i z e i to 1 and repeat the following procedure for each node v. . For each flow-O subseguence b of B, i f v. i s a node of b, include i in P- and increment i . Using the results of this process, the flow-subseguences are expressed as sets of numbers. Next, the nodes are a l l o t t e d weights determined by the current implementation of 31 the algorithm. Each flow-subsequence and flow-sequence i s then assigned a weight which i s the sum of the weights of i t s constituent flow-subsequences. Once th i s assignment has been made, the cost p r o b a b i l i t i e s can be calculated. 3.3 Hierarchy Of Isomorphic Flow-Sequences The flow-sequences are now ordered as a hierarchy of families of flow-sequences, F^ , where b (F. ) i s the lowest, by weight, member of the family F-L , q the number of families constructed so f a r , and s the index of the family to which the current flow-sequence w i l l be adjoined. Step 0: I n i t i a l i z a t i o n Step Set q to 1, and F^ to the flow-sequence of highest weight, t i e s resolved a r b i t r a r i l y . Step 1: If any flow-sequences remain, set h to the one of highest weight, set i to 1 and go to step 2. Otherwise, go to step 5. Step 2: If B (h) i s a subset of B (b (FL )) , then set d i f f to w (b (F. ) )-w(h) , set s to i , increment i and go to step 3. If B (h) i s not a subset of B (b (F^)) , increment i ; i f i>q, set s to i and go to step 4; otherwise go to step 2. Step 3: I f i<q, i f B (h) i s a subset of B (b (F. ) ) , ana d i f f i s greater than w (b (F- ) ) - w (h) , then set d i f f to t h i s new difference, and set s to i ; i f i ^ g , increment i and go to step 3; otherwise, step 4. Step 4: Adjoin h to F„ , set q to i and return to step 1. 5 32 Step 5: Terminate. The above procedure establishes families of isomorphic flow-sequences, where such flow-sequences are reduced. In each family, the flow-sequences are ordered by weight, the most costly being assigned the highest order (see Appendix B) . The use of the hierarchy makes the removal of isomorphic copies of flow-seguences a simple process. 3.4 Reduction and Isomorphism of Flow-sequences For each d, the pseudograph G (X) i s produced, and AJ d computed. Then, the following processes are applied so that the flow-seguences can be reduced and isomorphic copies removed. F i r s t , A^ i s considered for reduction. Within each flow-sequence, i f any flow-subsequence appears more than once, only a single copy i s retained i n the walk; however, the t o t a l weight of the flow-sequence remains unchanged. Next, the reduced flow-seguences are checked for isomorphism. Each reduced flow-seguence, a', has an associated family, F*, in the hierarchy. If there i s a flow-sequence i n both F' and A^ which has a higher order than a' in F', or a* occurs more than once i n A d, remove a copy of a' from A^. Then, add wfa 1) to that flow-sequence in both A^ and F» which has the least order greater than or equal to the order of a'. This process yie l d s the subset A * of A.-consisting of only reduced, non-isomorphic flow-seguences (see Appendix B and the examples i n section 3 . 7 ) . 33 3 . 5 Entropy, Conditional Entropy, Information Value Using A', the entropy, conditional entropy and information d value are calculated for each fixed d i n D(X). Then, associating a r e l a t i v e frequency with each d, the entropy, conditional entropy and information value are obtained by averaging over D(X). Each d i s t i n c t implementation of the algorithm yi e l d s another set of costs for B and hence, defines a new input probability d i s t r i b u t i o n . The evaluation and analysis of the algorithm which i s now possible, i s discussed with examples, in the remaining sections. 3 . 6 Further Analysis of the Information Value In t h i s study, the information value has been used for two purposes. F i r s t , to s e l e c t from a set of algorithms, that one which i s , i n a cost sense, the most suitable for the given task, and secondly, to choose an implementation of that algorithm which i s most appropriate in such a s i t u a t i o n . In the following analysis, we assume that the flow-sequences have been reduced, and isomorphic copies removed. Also, we define an index set on the flow-subsequences bj of a flow-sequence a by I (a) = { j i b - i s in B (a) }. 3.6.1 Selecting An Algorithm F i r s t , we assume that the algorithms are cost comparable within three or four per cent. For such algorithms, since the costs are more or less the same, the flow structure i s the 34 dominant component of the information value; we obtain the largest information value from the s t r u c t u r a l l y optimum algorithm. S p e c i f i c a l l y , l e t X and Y be two cost comparable algorithms. Furthermore, suppose that algorithm X has a d i s t i n c t l y better flow structure than algorithm Y, i n the sense that each flow-sequence c of X consists of flow-subsequences not occuring i n other flow-seguences of X. Then the information value indicates that algorithm X has a flow structure which can be used to point to a more e f f i c i e n t coding of the algorithm. To see t h i s , we study each algorithm's information value. F i r s t analysing algorithm X, we note that in most cases the flow-subsequences bj of a flow-sequence cK do not appear i n many other flow-sequences, so the conditional p r o b a b i l i t y P ( c j b ^ ) = w(c K)/w(c J) , i s close to unity and, thus, over I (c^) , P(c K) i s approximately equal to S p ( b - ) ; t h i s shows that the walk p r o b a b i l i t i e s are f a i r l y uniformly distributed. However in the second algorithm this does not hold. Most flow-subsequences xt • of flow-sequence e^ of Y, occur i n other flow-sequences. As a r e s u l t , there are more flow-sequences for Y than for X. Thus, when calcula t i n g j P<ej |ut.) = w (e^)/w (e L) , i t s value i s considerably less than unity and hence the r e s u l t i n g flow-sequence p r o b a b i l i t i e s P(e ) = £p(e^|u L)P(u.) , as well as corresponding entropy, are less than those for algorithm X. Hence, since the flow-sequence weights are nearly 35 the same, the information value for algorithm x i s greater than that of algorithm Y. Thus, when comparing unstructured algorithms, the one with the higher information value i s selected, and an implementation of i t which r e f l e c t s the flow structure, coded. To summarize, when the costs for algorithms are comparable, the maximum information value indicates which algorithm i s optimal r e l a t i v e to t h i s factor. 3.6.2 Selecting the Algorithm Implementation Having chosen the algorithm most suitable for the given task, an analysis of i t , based on cost, i s made. In t h i s instance, the implementation with minimum cost i s desirable; accordingly the minimum of the information values points to such an implementation. To see t h i s , we consider the two ways in which the algorithm costs can be decreased. Case Jl: Some flow-seguence a* i s made more e f f i c i e n t by applying optimizing techniques to i t s flow-subsequences. For s i m p l i c i t y , we assume that a* consists of flow-subsequences b^ which do not appear i n any other flow-sequence of the algorithm, and that P(a*) < 0.5. Then, i f a* i s coded more e f f i c i e n t l y , i . e . w (a*) decreases, the information value also decreases. The proof follows. Under the assumptions given above, l e t Z j be the the decrease in N; , (z; i s zero i f b%- i s not in B (a*)) , z = S Z J , and 36 N*= 2N - , where the sums are over I (a*). Recall that N i s defined by 2w (bj), where the sum i s taken over B. Then the following equality holds f o r those j in I (a*). P ( a * | b , ) = « (a*)/w (a^ ) = w(a*)/w(a*) = 1 . Also, 2 P (b^) over I (a*) decreases, for i f SP(bj) were to increase, then 2 (N-j-2-)/(N-Z) > This implies (N*-z)/(N-z) > N*/N, and hence N < N *, a contradiction. So 2 P(b^) over I (a*) decreases, which implies P (a*) decreases, and hence, that -P(a*)log P (a*) decreases. Now w(a*) decreases by assumption, so w (a*) (-P (a*) log P (a*)) , the information value of a*, decreases. Qed. Now we examine the ef f e c t of decreasing the cost of some flow-subsequence b* which appears in many flow-sequences. This sit u a t i o n can be examined as two subcases. Case 2-J: F i r s t suppose that b* i s not in either B (a^) or B (aJ ), j i n I(a^), and l e t z be the decrease in w(b*). Now P ( b ) = N-/N increases for b-, i n B(a u) since N- i s fixed and N decreases. Also P(a^|b-j) = w(a^)/w(aJ) i s fixed, since both and w(aJ) are constant. Thus, n P (a ) = S P (a |b- ) P (b ;) j = 1 J J increases, and hence -P ( a k ) l o g P (a u) increases. This 37 implies -w(au) P(a^)log P (a u) increases, since P(a^) < 0.5 and w(aK) i s f i x e d . That i s , the information value increases. Now, assume that b* i s i n B ( a J ) , for seme j i n I(a^), but b* i s not i n B(a K). Thus w(aJ) decreases, Pta^Jbj) = wta^J/wta1*) increases, and, as above P (b. ) increases. Hence, P ( a K ) , and thus the information value of a^, increases. Thus, when b* i s not in B ( a K ) , the information value indicates that the change made was not b e n e f i c i a l to the flow-sequence a^. If this i s true for most flow-sequences, the o v e r a l l information value increases, indicating that such a change should not be made. Case 2-2: b* i s i n Bfa^). Again l e t z be the decrease in w(b*). l e t p' be the new P(a K) , p the old Pfa^), and w'(a^) = w(a K)~z. If p» < p, then -w«(a K) (p'log p») i s less than -w(a K) (p log p). Thus, the information value decreases. On the other hand, i f p' > p, the information value s t i l l decreases for most z. The difference, p*-p, must be small, since even large z implies only small increases in P(a w|bj) and small changes in P ( b ) , j i n I(a ). We observe that the information value decreases i f f w« (a K) (-p'log p') < w(a K)(-p log p) , i f f w(aK) [ (-p'log p»)-(-p log p) ] < z (p'log p') i f f w (a K) (log (p «P /p?) ) < z(p»log p'). And, since for most z, log^p'^/p^j i s approximately 0, t h i s ineguality i s s a t i s f i e d . That i s , the information value decreases. 38 The usefulness of the information measure f o r selecting the most appropriate implementation of an algorithm i s thus evident. When the implementation i s an improvement, the corresponding information value decreases, and when this i s not the case, the information value increases. Some of the applications of such a measure are now given. F i r s t , i f the probability of a flow-seguence i s r e l a t i v e l y large, but i t s weight i s comparable to other flow-seguences, then the algorithm can be coded to r e f l e c t t h i s flow, and hence to decrease execution costs. Also, when attempting to p a r t i t i o n a program into segments, a flow-sequence with high p r o b a b i l i t y can be informative. Such probability indicates that the flow-sequence has l i t t l e i n t e r a c t i o n with other parts of the program, a major factor i n a segmentation problem. Thirdly, i f the conditional p r o b a b i l i t i e s on the flow-sequences, r e l a t i v e to some flow-subsequence b, are consistently low, then such a flow-subsequence appears many times, and makinq i t more e f f i c i e n t i s ref l e c t e d throughout the program. To i l l u s t r a t e the two purposes the information measure i s used for i n this study, the following examples are included. 3.7 Examples 3.7.0 The Form and Purpose of the Examples In order to analyse the selected algorithms, certain programs are necessary to produce and evaluate the output and calculate the information value. The language i n which these 39 routines are written i s ALGOLW. However, the programs representing the algorithms are expressed in the hypothetical assembly language, described i n Appendix A. To apply the measure, a task and algorithms to compute the task, are selected. The sorting of n numbers (eguivalently the indices of n records)' into order i s chosen since several well documented algorithms for sorting exist, and the corresponding data sets are well defined ( i . e . a l l possible orderings of n numbers). The two sort algorithms, 'heap' and •merge* [21], are used as the possible candidates for the task of placing i n order f i v e numbers. In the following discussion, the f i v e f a c t o r i a l elements of the data set are a r b i t r a r i l y numbered from 1 to 120; we w i l l r e f e r to certain of these data in the examples given below. On t h i s data set, the merge algorithm i s not amenable to structuring since no reduction of flow-subsequences i s possible. Although t h i s l a t t e r point o r d i n a r i l y implies that code optimization of the flow-sequences would be b e n e f i c i a l , the s i m p l i c i t y of the calculations, and the lack of interaction among the nodes within a flow-subsequence allow for no noticahle improvements. As a contrast, the heap algorithm i s studied. In t h i s case, the •shortcomings* of the merge algorithm are absent, and hence, both structuring and code optimization can be applied. The task evaluated here i s obviously somewhat t r i v i a l , so the concept of structuring i s not as dominant as i t would be in a more complex s i t u a t i o n . For a task involving more 40 calculations and decisions, the use of subroutines as structures would be e f f e c t i v e . However, the s i m p l i c i t y of the examples s t i l l allows the usefulness of t h i s measure to be i l l u s t r a t e d , both as a cost and st r u c t u r a l complexity standard, and as an improvement on a measure based on execution costs alone. When analysing an algorithm, the standard implementation i s coded f i r s t . So at t h i s time, no attempt i s made to optimize any p a r t i c u l a r area of the program. Both the merge and heap algorithms are implemented i n this manner. In addition, based on the res u l t s of an analysis of the information values, another implementation of the heap algorithm i s given. In the examples, the execution costs associated with each implementation are included with the information value, so that a comparison can be made with the conventional measure. Table entries l i s t e d as costs, are i n execution units, while the information value and conditional entropy are un i t l e s s . The measure i s f i r s t applied to the problem of deciding which of two algorithms should be used for a given task. The information value points to the algorithm which i s more suitable for the si t u a t i o n . Moreover, once the algorithm has been selected, the measure further analyses i t by providing information cn i t s o v e r a l l cost and structure, which indicates where attention should be focused i n order to produce a more e f f i c i e n t implementation of the algorithm. 41 3.7.1 Comparison of Two algorithms for the Same Task In t h i s section the f i r s t a pplication of the measure i s discussed. The heap algorithm, (H-I), as mentioned above, i s amenable to structuring, while merge, (M), i s not. The standard implementations are compared and consequently B-II, an implementation r e s u l t i n g from a structuring which improves the e f f i c i e n c y of heap, i s included i n the study. In the f i r s t example, assuming the r e l a t i v e freguencies of the orderings are equal, the average costs of the algorithms are compared (see Table I ) . Merge i s seen to be cheaper; however, removing the extreme cases (i.e. those with very high or very low costs), the costs are r e l a t i v e l y comparable. Also, the number of nodes and cost per node are very close. Thus the information value can be used to evaluate the two algorithms, indicating that heap i s more informative, i . e . given proper structuring the overa l l cost of heap, for t h i s s i t u a t i o n , would be less c o s t l y . The implementation of a single s t r u c t u r a l improvement (optimization of a flow-subseguence), reduces the average cost of heap below that of merge. Using the measure of average cost, i t i s unlikely that the heap algorithm would have been given further consideration f o r t h i s p a r t i c u l a r application. In the next section a further analysis of heap i s given. 42 Ala m e r g e heap-I heap-II Av Cost 198.75 228.31 198. 18 info Val 35. 195 38.365 37.608 Table I: Example 1 The next two examples involve p a r t i c u l a r data and t h e i r r e s u l t i n g output from the two algorithms. Such analysis i s useful i f a p a r t i c u l a r datum or type of datum i s highly probable. In such a case, the algorithm that performs in the best way for the given datum i s the one selected. Datum IV H-I IV W M cost H-I cost H-II cost 45 31.495 34.940 200.1 206.5 202.7 69 33.480 33.328 202.9 205.9 201.9 44 40.420 40.075 199.5 200.6 195.7 Table I I : Examples 2 and 3 As shown in Table II in the entry for datum 45, the information value of merge i s higher, i n d i c a t i n g that even with proper structuring and optimizing i t i s unlikely that the heap algorithm w i l l perform better than merge. The costs of H-I and H-II confirm t h i s . 43 For each of data 69 and 44 the information values of the two algorithms are r e l a t i v e l y close, but heap i s s l i g h t l y higher. This indicates that optimization methods should reduce the cost of the heap algorithm. Again the costs associated with H-I and H-II support t h i s observation. 3.7.2 Analysis of Heap Algorithm In the remaining examples, the flow-subsequences are expressed as sequences of numbers, corresponding tc nodes, separated by dashes; parentheses indicate repeated flow-subseguences which are eliminated i n the reduction process. Execution over each datum d generates f i v e flow-sequences; those marked with an asterisk are isomorphic to others i n the l i s t . Only the set of non-isomorphic flow-sequences are considered in computing the information value. The l i s t s of flow-subsequences and flow-sequences are given i n Appendix B. The i n i t i a l i z a t i o n nodes are omitted, since no optimization techniques are applied to these nodes. The following example i l l u s t r a t e s the use of the conditional entropy as a means of partitioning the data set according tc structure. Within • p a r t i t i o n s ' , the improvements in cost r e s u l t i n g from a given change in the implementation of the algorithm are f a i r l y close. When the conditional entropies d i f f e r , t h i s indicates that the corresponding structures of the outputs do also. The smaller the conditional entropy, the more amenable to node or flow-subsequence improvements are the 44 sections of the algorithm associated with th i s datum. Consider the data pairs shown i n Table I I I . Datum Info Val 33 56.325 36 54.939 Cond Ent H-J. cost 1.374 229.1 1.731 219.0 H-II cost |I - I I | 223.1 6.0 214.2 4.8 19 42.802 2.095 218.4 213. 4 5.0 20 44.188 1.731 228.5 222.3 6.2 Table I I I : Example 4 The output flow-seguences for the above data are: (33) 1* 1-3 7- 11 -16-21 27-28 1 1-3 7-11 -16-21 (7-11-16-21) 27 2 2-4 -3 7- 11-16-21 9-15-23 27-28 2* 2-4 -3 7- 11-16-21 27-28 2* 2-4 -3 9- 15-23 27-28 (36) 1 1-3 7- 11 -16-21 27-28 2 1-3 7-11 -16-21 5-12-19-25 3 2-4-3 7- 11-16-21 9-15-23 3* 2-4-3 7- 11-16-21 27-28 3* 2-4-3 9- 15-23 27-28 45 (19) 1 1-3 7-11 -16-21 27-28 2 1-3 7- 1 1 -16-21 5-12-19-25 3 2-4-3 7- 11-16-21 9-15-23 27-28 4 2-4-3 8- 13-17-22 27-28 3* 2-4-3 9- 15-23 27-28 (20) 1* 1-3 7- 11 -16-21 27-28 1 1-3 7- 11 -16-21 (7-11-16-21) 27 2 2-4-3 7- 11-16-21 9-15-23 27-28 3 2-4-3 8- 13-17-22 27-28 2* 2-4-3 9- 15-23 27-28 The information value of (33) i s s l i g h t l y higher than that of (36) due to the simpler structure and more costly flow-seguences of (33). This same observation holds for the second pair of data, with (20) having the higher information value. In both pairs, the lower conditional entropy indicates which datum has the simpler structure in i t s output. An implementation which r e f l e c t s such a structure yields greater cost savings for the datum with the lower conditional entropy. Such an analysis i s useful i f i t i s known that a high portion of the data i s of the form of either (19) or (20) say, and i t i s desirable to know whether optimization methods should be applied to the path followed by (19) or by (20). This measure indicates that by concentrating on (20) more o v e r a l l savings can be obtained. 46 In the next example, the notion of p a r t i t i o n and the eff e c t of costs on the measure are discussed. The conditional entropy of both (3) and (5) are close (see Table IV), but the i r information values d i f f e r . From t h i s , i t i s known that their structures are s i m i l a r , but that (3) either has more costly flow-subsequences, or has flow-sequences which can be coded more e f f i c i e n t l y to r e f l e c t their unique flow structures. This example i l l u s t r a t e s that within •partitions' the information value ranks the data according to the corresponding costs i . e . the higher the cost, the higher the information value. Datum Info I s i 3 53. 334 5 35.287 Cond Ent H-I cost 1.610 228.5 1.611 204.7 Table IV: Example 5 The associated flow-sequences are: (3) 1 1-3 8-13-17-22 27-28 2 1-3 7-11-16-21 (7-11-16-21) 27-28 3 2-4-3 7-11-16-21 9-15-23 27-28 3* 2-4-3 7- 11-16-21 27-28 3* 2-4-3 9-15-23 27-28 47 (5) 1 1-3 8-13-17-22 27-28 2 1-3 7-11-16-21 (7-11-16-21) 27-28 3 2-4-3 7-11-16-21 10-18-26 4 2-4-3 7-11-16-21 27-28 3* 2-4-3 10-18-26 The th i r d example of t h i s section (see Table V) shows that i f a datum has a much higher information value than another, and i t s conditional entropy i s lower, then this datum, as well as consisting of more costly flow-subsequences, i s more amenable to node or flow-subsequence improvement. Thus, by selecting the more informative of equally probable data, and applying optimizing technigues to i t s flow path, greater cost reductions resu l t than i f such processes were applied to the other datum. Datum Info Val Cond Ent H-I cost H-II cost | I - I I | 1 1 54.291 1.887 216.3 21 1. 6 4.7 12 67.106 1.715 228.2 222.3 5.9 Table V: Example 6 48 The corresponding flow-seguences are l i s t e d below. (11) 1* 1-3 7-11-16-21 27-28 1 1-3 7-11-16-21 8-13-17-22 27-28 2 2-4-3 7-11-16-21 9-15-23 27-28 2* 2-4-3 7-11-16-21 27-28 3 2-4-3 10-18-26 (12) 1* 1-3 7-11-16-21 27-28 1 1-3 7-11-16-21 8-13-17-22 27-28 2 2-4-3 7-11-16-21 9-15-23 27-28 2* 2-4-3 7- 11- 16-21 27-28 2* 2-4-3 9-15-23 27-28 In t h i s instance (12) i s seen to have the simpler structure, lacking the extra flow-sequence 10-18-26 which appears in (11). Here flow-subsequence 27-28 was optimized. In the l a s t example a case where the conditional entropies are approximately the same i s considered. This implies that the outputs corresponding to the two data have nearly the same structure. Sample (49) has a somewhat c o s t l i e r output, a fa c t r e f l e c t e d i n the information value given i n Table VI. However, since the conditional entropies indicate similar structures, the marginal difference in costs w i l l not cause marked 49 d i s s i m i l a r i t i e s i n the cost savings. That i s , the flow-sequences of either datum can be structured to increase the e f f i c i e n c y , and the resultinq cost decreases for both cases w i l l be nearly the same. Due to the s i m p l i c i t y of the sample task, the flow-subsequences of (49) and (57) are f a i r l y s i m i l a r ; however, this i s not necessary i n order that the conditional entropies be the same. Datum I n f o Val Cond Ent H-I cost H-II cost |I-II I 49 36.686 1.820 198. 1 193. 1 5.0 57 35.450 1.828 195.1 189.7 5.4 Table VI: Example 7 The flow-sequences associated with the table entries are given below. (49) 1 1-3 7-11-16-21 27-28 2 1-3 5-12-19-25 3 2-4-3 7-11-16-21 9-15-23 27-28 4 2-4-3 8- 13-17-22 27-28 3* 2-4-3 9-15-23 27-28 50 (57) 1* 1-3 8-13-17-22 27-28 1 1-3 8-13-17-22 27-28 2 2-4-3 7- 1 1-16-21 10-18-26 3 2-4-3 8-13-17-22 27-28 4 2-4-3 9- 15-23 27-28 3.7.3 Summary Cf The Examples The above examples are intended to demonstarate how the information value, together with the conditional entropy, can be used to aid in the analysis of an algorithm. The measure can point to expensive data ( i . e . c o s t l y for algorithms to execute), but as well, can indicate which paths through the algorithm should be considered for code optimization in an attempt to obtain the maximum cost saving. In more complex examples, improvements on nodes or flow-sequences rather than just flow-subsequences would be in order. The advantages of using the information value as a complexity measure are evident from these examples. 51 3.8 Summary In t h i s thesis, an attempt was made to define a cost and s t r u c t u r a l complexity measure f o r an algorithm. To accomplish t h i s , we defined an information theoretic model of the execution of an algorithm, i n which the input i s a set of subwalks, and the output certain walks, of a graph theoretic representation of the algorithm. Cost i s included in the model through the d e f i n i t i o n of a cost p r o b a b i l i t y scheme, and structure through the concepts of reduction and isomorphism. An information value f o r each implementation of the algorithm i s calculated. I t i s shown that t h i s value provides a l l the information that the conventional measure of cost alone does. Moreover, i t presents s t r u c t u r a l information which indicates the amount of i n t e r a c t i o n between program sections, and points to dominant, repeated and independent flow patterns, and to s t r u c t u r a l s i m i l a r i t i e s . Under the assumption of comparable costs, the maximum information value points to a s t r u c t u r a l l y optimum algorithm; when the structure i s fixed, i . e . analysing a given algorithm, the minimum cost implementation has the smallest information value. The appropriateness of these considerations i n the analysis of an algorithm has been demonstrated i n Chapter I I I . More generally, t h i s study has demonstrated the f e a s i b i l i t y of using information theory to measure the complexity of an algorithm. We conclude with some suggestions f o r improving the model treated here. F i r s t , the introduction of more s t r u c t u r a l 52 parameters may improve the model. Presently, reduction and isomorphism have proved b e n e f i c i a l i n evaluating an algorithm; however, a r e d e f i n i t i o n or expansion of these may y i e l d a more informative measure. The model obtains the information value from the weighted average of the entropy less the conditional entropy. Without the i n c l u s i o n of these weights, the measure becomes much more responsive to s t r u c t u r a l information, and l e s s sensitive to costs. In certain instances t h i s may y i e l d a more valuable measure than the one defined i n t h i s study. Of note, when using the 'unweighted' measure, i s the fact that the most appropriate implementation has the largest information value. The i n c l u s i o n of other parameters in the model might also prove useful, but t h i s would increase the cost of applying the measure when this cost may already seem pro h i b i t i v e . However, as with most studies concerned with the complexity of algorithms, t h i s analysis i s based on the following assumption. The task which the algorithm or algorithms under consideration w i l l be computing i s c e n t r a l to some process that i s to be repeated many times (for instance i n a business application, some procedure which must be calculated d a i l y ) . Thus, the cost i n performing a complexity analysis for such a task may well be neglible r e l a t i v e to the o v e r a l l savings incurred through the use of the appropriate algorithm and i t s optimal implementation. 53 BIBLIOGRAPHY 1. AHO, A, and ULLMAN, J . , Optimization of Straight Line Programs, SIAM Journal Comju y, (1972) pp. 1-19 2. ALEKSEEV, V.E. , Sorting Algorithms with Minimum Memory, Cybernetics^ 5, (1969), pp. 642-648. 3. ALLEN, F.E., Control Flow Analysis, Proc. of a Symposium on Compiler Optimization, SIGPLAN Notices, ACM, New York, July "1970, pp. 1-19. 4. ALLEN, F. E. , Program Optimization, i n : Annual Review in Automatic Programming VoJU 5, Pergamon Press, New York, 7 9 6 9 . 5. ASH, R. Information Theory, Wiley Interscience, New York, 1965. 6. BACHMAN, P., A Contribution to the Problems of the Optimization of Programs, Inform. Processing 2.11 North-Holland, Amsterdam, 1972, pp7~397-40l7 7. BAKU, S. On Reduction of Program Schemes, SIAM J . Comp.. 16, (1968) pp. 328-339. 8. BERZTISS, A., A Note on Segmentation of Computer Programs, Inform.. Control 12, (1968) pp. 21-22. 9. BEUS, H., The Use of Information i n Sorting JAJUC.M.. 17, (1970), pp. 482-495. 10. COCKE, J., Global Common Subexpression Elimination, 2£oceedings of a Symp^ on Compiler Optimization^ SIGPLAN Notices,"ACM, New~York, July"19707~pp. 20-24*7 11. FSAZER, W.E., Analysis of Combinatory Algorithms - A Sample of Current Methodology, AFIPS Coirf. Proceedings, Spring Joint Computer Conference AFIPS Press, Montvale, N.J., 19727 pp. 4 83-491 12. GALLAGER, R., Information Theory and Reliable Communication, Wiley, New York, 1968. 13. GREEN, C , A Path Entropy Function For Rooted Trees, Jj,A^C^M^. 20, (1973), pp. 378-384. 14. HARARY, F., Grap_h Theory, Addison-Wesley, Reading Mass., 1969. 15. HARTMANIS, J . , Computational Complexity of Random Access Stored Program Machines, Math_. Systems Theory 5, (1971) pp. 232-245. 16. HOPKINS, M., An Optimizing Compiler Design, Inform. 54 Processing *7J, North-Holland, Amsterdam, 1972, pp. 391-396. 17. IANOV, I., On the Equivalence and Transformation of Program Schemes, Comm. ,&.C.g. J[, (1958), pp. 8-12. 18. IANOV, I., On Matrix Program Schemes, C O M . 1 ^ , ^ . 1, (1958), pp. 3-6. 19. KARP, R., A Note on the Application of Graph Theory to D i g i t a l Programming, Inform. Control 3, (1960), pp. 179-190. 20. KARREMAN, G., Topological Information Content and Chemical Reactions, B u l l . Math.. Bio^hjs^ 17 (1955), pp. 279-285. 21. KNUTH, D., Sorting and Searching!, The Art of Computer Programmingj. Vol. 3, Addison-Wesley, Reading, Mass., 1973. 22. KRIDER, L., A Flow Analysis Algorithm, J.A.C.E1. J1, (1964), pp. 429-436. 23. MOWSHOWITZ, A., Entropy and the Complexity of Graphs, Doctoral Dissertation, University of Michigan, 1967. 24. NIEVERGELT, J . , On the Automatic S i m p l i f i c a t i o n of Computer Programs, Comju jUC. EU 8, (1965), pp. 366-370. 25. PATERSON, M., Program Schemata i n : Machine Intelligence Vol. 3, (D. Michie, e d i t o r ) , American Else v i e r , New York, 1968." 26. PICARD, C., Theorie des Questionnaires, G a u t h i e r - V i l l a r s , Paris, 1965. 27. RASBEVSKY, N., L i f e , Information Theory, and Topology, B u l l i Hath. Biopjrys^ 17 (1955), pp. 229-235. 28. RUTLEDGE, J. , On Ianov's Program Schemata, J.A.CAM. IJ, (1964) , pp. 1-9. 29. SCHORMANN, A., The Application of Graphs to the Analysis of Dist r i b u t i o n of Loops in a Program, Inform. Control 7, pp. 275-282. 30. TRUCCO, E., A Note on the Information Content of Graphs, B u l l i Math. Bio^hys^ 1.8 (1956), pp. 129-135. 31. MARSHALL, S., On Computational Cost, i n : Annual Reyiuew in Automatic Programming Vol. 5, Pergamon Press, New York, 19697" 32. WOODGER, M., On Semantic Levels i n Programming, i n : I n f o i processing Ul, North-Holland, Amsterdam, 1972, pp."402-407. 55 appendix A Here we give the in s t r u c t i o n set of the hypothetical assembly language in which the algorithms are •written 1. These inst r u c t i o n s are very basic to insure they remain machine independent. The execution costs are based on the MIX [ 2 1 ] , PDP-10, and CDC assembler language timings. The following assumptions about this language are made. (1) there are 8 r e g i s t e r s , which are i n fast memory and can be used f o r indexing. (2) there i s one accumulator. (3) input and output are ignored. The costs l i s t e d are i n execution units; costs i n parentheses are i n e f f e c t i f indexing i s used. In the statement of the ins t r u c t i o n , c a p i t a l l e t t e r s refer to memory locations, small l e t t e r s to r e g i s t e r s . C(x) i s the contents of location or register x; Acc re f e r s to the accumulator. INSTRUCTION COST JUHP- unconditional jump 1 .2 (1 .3 ) JUMPE jump i f Acc = 0 JUMPLE jump i f Acc < 0 JUMPL jump i f Acc < 0 JUMPG jump i f Acc> 0 JUMPGE jump i f Acc > 0 JOMPNE jump i f Acc = 0 SETZM A set C(A) to 0 SETZR » i set C(i) to 0 SETR i set C(j) to C(i) SE TI R i# n set C{i) to n SETMR A, i set C (i) to C(A) SETRM A, i set C(A) to C(i) SETNM A set C (A) to -C (A) SETUR i i set C (i) to -C(i) ADD' R i# j add C (i) to C(j) SUB I A, n add n to C(i) J HR A, i add C(A) to C(i) 1 . 7 , ( 1 . 8 ) 1.0(1. 1) 1.4 1.0 (1. 1) 1.6(1.8) 1 .8 (1.9) 1.7 (1.9) 1.5(1.7) 1.6(1.8) 1.2 (1.3) 2.2 (2.3) DIvT ,n divide Acc by n 11.3 CMPZM A CHPZR CMPR CM PI CMPH #1 i» j A C(A) - 0; set f l a g C (i) - 0; set f l a g C(i) - C(j) ; set f l a g C(Acc) - n; set f l a g C(Acc) - C (A); set fl a g 1.7 (1.9) 1.5(1.7) 1.6 (1.8) 1.2 (1.3) 1.9(2.0) BLT n move n contiguous words of memory 0.8+(2.1) 57 Appendix B In order to i l l u s t r a t e some of the concepts in t r o d u c e d i n t h i s t h e s i s , we w i l l present a p a r t i a l a n a l y s i s of H, the heap a l g o r i t h m [21]. We f i r s t g i v e the al g o r i t h m i n i t s t e x t u a l form. Based on t h i s , we produce G (H), the graph of H. Then, so th a t the a n a l y s i s programs have use of a numeric r e p r e s e n t a t i o n , the se t s are e s t a b l i s h e d . F i n a l l y we o b t a i n the numeric r e p r e s e n t a t i o n of the flow-subsequences and flow-sequences, where the l a t t e r i s l i s t e d i n f a m i l y n o t a t i o n (see procedure i n s e c t i o n 3.3). fleapsort: A f i l e of numbers R^,Ra,... ,RN i s a 'heap* i f R~ M > R^ f o r 1 < [ j / 2 ] < j <N, where [ x ] i s the g r e a t e s t i n t e g e r i n x. Thus, R^ >BX , R^>R3 , fi4- R4-' e t c » » a n d t h i s i m p l i e s t h a t the l a r g e s t number appears •on top of the heap 1, R L = max(R l #B i,... , R J . I f an a r b i t r a r y input f i l e i s transformed i n t o a heap, a 'top-down* s e l e c t i o n procedure can be used to s o r t . Algorithm H. Numbers R^,...,RN are rearranged so th a t a f t e r s o r t i n g i s complete, they w i l l be i n or d e r . F i r s t the f i l e i s rearranged so that i t forms a heap, then the top of the heap i s re p e a t e d l y removed and t r a n s f e r r e d to i t s proper f i n a l p o s i t i o n . Assume that N > 2. 58 B1 . [ ' I n i t i a l i z e ] S e t d t o [ H/2 ] + 1 , r t o i . . H 2 . [ D e c r e a s e d o r r . ] I f d > 1 , s e t d t o d - 1 , a n d B t o B d . ( I f d > 1 , t h e f i l e i s b e i n g made i n t o a h e a p ; i f d=1 t h e n t h e f i l e i s a l r e a d y a h e a p . ) H 2 b . O t h e r w i s e s e t R t o R r , R r t o R^, a n d r t o r - 1 ; i f t h i s m a k e s r = 1 , s e t R^ t o R a n d t e r m i n a t e . H 3 . [ P r e p a r e f o r ' s i f t - u p ' ] S e t j t o d . (A t t h i s p o i n t we h a v e R r . ^ > B- f o r d < [ j / 2 ] < j < r ; a n d R K i s i n i t s f i n a l p o s i t i o n f o r r < k < N. H 4 . [ A d v a n c e d o w n w a r d . ] S e t i t o j a n d j t o 2 j . ( I n t h e f o l l o w i n g s t e p s we h a v e i = [ j / 2 ] . ) I f j < r , go t o s t e p H 5 ; i f j = r , go t o s t e p H 6 ; a n d i f j > r , go t o H8 B 5 . [ F i n d ' l a r g e r 1 s o n . ] I f R^< R^ + L , t h e n H 5 b : s e t j t o j p l u s 1 . H 6 . [ L a r g e r t h a n R ] I f R>R- , t h e n go t c s t e p H 8 . H 7 . [ M o v e i t u p . ] S e t R^ t o R j , a n d go b a c k t o s t e p H*». B 8 . [ S t o r e R. ] S e t R^ t o R . ( T h i s t e r m i n a t e s t h e * s i f t - u p « a l g o r i t h m i n i t i a t e d i n s t e p H 3 . ) R e t u r n t o s t e p H 2 . 59 60 2 2.1 IMS X- P • 1 1,2 [ t=10] 2 3 3 5 , 6 , 7 , 8 , 9 , 1 0 , 2 7 4 11,12 5 1 5 , 1 6 , 1 7 , 1 8 , 1 9 , 2 0 6 2 1 , 2 2 , 2 3 7 2 4 , 2 5 , 2 6 , 2 8 8 13,14 9 4 Flow-Subseguences 1- 3 8 - 13 -17-22 27-28 7 - 1 1 - 1 6 - 2 1 10-18-26 2 - 4 -3 9 - 15-23 6 - 1 4 - 2 0 - 2 4 5 - 1 2 - 1 9 - 2 5 61 Families of Flow-Seguences Costs F^: 2-4-3 7-11-16-21 10-18-26 44.7 2-4-3 10- 18-26 24.4 i F A : 1-3 7-11-16-21 8-13-17-22 27-28 56.6 1-3 7-11-16-21 27-28 37.2 1-3 8-13-17-22 27-28 36.6 F 3: 1-3 7-11-16-21 6-14-20-24 46.8 1- 3 6-14-20-24 26.5 F. : 2-4-3 7-11-16-21 9-15-23 27-28 56.6 2- 4-3 7-11-16-21 27-28 41.5 2-4-3 9-15-23 27-28 36.3 F s: 1-3 7-11-16-21 5-12-19-25 47.4 1-3 5-12-19-25 27.1 F : 2-4-3 8-13-17-22 27-28 40.9
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- An information theoretic measure of algorithmic complexity
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
An information theoretic measure of algorithmic complexity Wright, Lois E. 1974
pdf
Notice for Google Chrome users:
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
Page Metadata
Item Metadata
Title | An information theoretic measure of algorithmic complexity |
Creator |
Wright, Lois E. |
Publisher | University of British Columbia |
Date Issued | 1974 |
Description | This work is a study of an information theoretic model which is used to develop a complexity measure of an algorithm. The measure is defined to reflect the computational cost and structure of the given algorithm. In this study computational costs are expressed as the execution times of the algorithm, where the algorithm is coded as a program in a machine independent language, and analysed in terms of its representation as a pseudograph. It is shown that this measure aids in deciding which sections of the algorithm should be optimized, segmented or expressed as subprograms. The model proposed is designed to yield a measure which reflects both the program flow and computational cost. Such a measure allows an 'optimal' algorithm to be selected from a set of algorithms, all of which solve the given problem. This selection is made with a more meaningful criterion for decision than simply execution cost. The measure can also be used to further analyse a given algorithm and point to where code optimization techniques should be applied. However it does not yield a method of generating equivalent algorithms. |
Subject |
Algorithms Numerical calculations |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-01-22 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0051761 |
URI | http://hdl.handle.net/2429/19034 |
Degree |
Master of Science - MSc |
Program |
Computer Science |
Affiliation |
Science, Faculty of Computer Science, Department of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-UBC_1974_A6_7 W75.pdf [ 3.31MB ]
- Metadata
- JSON: 831-1.0051761.json
- JSON-LD: 831-1.0051761-ld.json
- RDF/XML (Pretty): 831-1.0051761-rdf.xml
- RDF/JSON: 831-1.0051761-rdf.json
- Turtle: 831-1.0051761-turtle.txt
- N-Triples: 831-1.0051761-rdf-ntriples.txt
- Original Record: 831-1.0051761-source.json
- Full Text
- 831-1.0051761-fulltext.txt
- Citation
- 831-1.0051761.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
data-media="{[{embed.selectedMedia}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.831.1-0051761/manifest