UBC Theses and Dissertations
An information theoretic measure of algorithmic complexity. Wright, Lois E., 1974.


AN INFORMATION THEORETIC MEASURE OF ALGORITHMIC COMPLEXITY

by

Lois E. Wright

B.Sc., University of Western Ontario, 1972

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in the Department of Computer Science

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA

April 1974

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Computer Science
The University of British Columbia, Vancouver 8, Canada

ABSTRACT

This work is a study of an information theoretic model which is used to develop a complexity measure of an algorithm. The measure is defined to reflect the computational cost and structure of the given algorithm. In this study computational costs are expressed as the execution times of the algorithm, where the algorithm is analysed as coded in a machine independent language and expressed as a program segmented into sections or subprograms. The algorithm is represented as a pseudograph. The model proposed is designed to yield a measure which reflects both program flow and computational cost. Such a measure allows an 'optimal' algorithm to be selected from a set of equivalent algorithms, all of which solve the given problem. This selection is made with a more meaningful criterion for decision than simply execution cost. It is shown that this measure can be used to further analyse a given algorithm, as it aids in deciding which sections should be optimized and points to where code optimization techniques should be applied. However it does not yield a method of generating equivalent algorithms.

TABLE OF CONTENTS

CHAPTER I: RECENT STUDIES IN ALGORITHMIC COMPLEXITY
1.1 Introduction
1.2 Related Work
1.2.1 Studies Without the Use of Graphs
1.2.2 Graph Theoretic Approach
1.2.3 Information Theoretic Approach

CHAPTER II: AN INFORMATION THEORETIC MODEL OF ALGORITHMS
2.1 Constituents of an Information Theoretic Model
2.2 Basic Terminology
2.2.1 Graph Theory Definitions
2.2.2 Model Definitions
2.3 Definition of the Information Model
2.4 Definition of Probabilities
2.5 Definition of Information Measure
2.6 Uses of the Information Value
2.7 Information Value: A Complexity and Efficiency Measure

CHAPTER III: AN INFORMATION THEORETIC ANALYSIS OF ALGORITHMIC COMPLEXITY
3.0 Introduction
3.1 Procedure for Flow-Sequence Construction
3.2 Flow-subsequences; Weight Assignment
3.3 Hierarchy of Isomorphic Flow-Sequences
3.4 Reduction and Isomorphism of Flow-Sequences
3.5 Entropy, Conditional Entropy, and the Information Value
3.6 Further Analysis of the Information Value
3.6.1 Selecting an Algorithm
3.6.2 Selecting the Form of the Algorithm
3.7 Implementation Examples
3.7.0 The Purpose of the Examples
3.7.1 Comparison of Two Algorithms for the Same Task
3.7.2 Analysis of the Heap Algorithm
3.7.3 Summary of the Examples
3.8 Summary

BIBLIOGRAPHY
APPENDIX A
APPENDIX B

LIST OF TABLES

Table I: Example 1
Table II: Examples 2 and 3
Table III: Example 4
Table IV: Example 5
Table V: Example 6
Table VI: Example 7

LIST OF FIGURES

Figure 1: Information Channel

ACKNOWLEDGEMENTS

I wish to express my sincere appreciation to Dr. Abbe Mowshowitz for his continued guidance and invaluable advice. I would also like to thank Dr. D. Seeley for his several suggestions.

Financial assistance was received from both the National Research Council and the Department of Computer Science.

CHAPTER I: RECENT STUDIES IN ALGORITHMIC COMPLEXITY

1.1 Introduction

The following is a study of an information theoretic model which is used to develop a complexity measure of an algorithm. The measure is defined to reflect the computational cost and structure of the given algorithm. In this study computational costs are expressed as the execution times of the algorithm, where the algorithm is coded in a machine independent language. For purposes of analysis, the algorithm is represented by a pseudograph. The model proposed is designed to yield a measure of complexity which reflects both program flow and computational cost. Such a measure allows an 'optimal' algorithm to be selected from a set of algorithms, all of which solve the given problem. This selection is made with a more meaningful criterion for decision than simply execution cost. It will be shown that this measure can also be used to further analyse a given algorithm, as it aids in deciding which sections should be optimized and points to where code optimization techniques should be applied. However it does not yield a method of generating equivalent algorithms.

Before a definition of the model is given, a brief summary will be presented of the problem under study and related research in this area. The main objectives in the study of algorithms have been to generate 'equivalent' algorithms, to optimize compiler code through an analysis of the algorithm and, through a measure of efficiency, to select, compare and rank algorithms, and thus obtain an algorithm which is in some sense optimal for a given task. Studies in the first area have often been quite theoretical, and have avoided the problem of obtaining a measure of optimality. The second area, being concerned with compilers, has tended to concentrate on improvements in particular types of code rather than evaluations of the program as a whole. Investigations of the third type have often focused on only a single facet of complexity rather than the overall complexity of the algorithm. The discussion of some of the relevant studies is given in three parts.
First, those studies which have not included graphs in their analyses are presented, then those that have used graphs, and finally those that have defined an information measure on graphs and/or algorithms.

1.2 Related Work

1.2.1 Studies Without the Use of Graphs

An early, quite theoretical evaluation of algorithms is given in a study by Ianov [17,18]. The notion of 'program schemata' is introduced to represent abstract algorithms or programs, where programs are represented in a linear notation and also as matrices. Using program schemata, a formalism which allows for transforming these schemata into equivalent ones, a decision procedure for determining equivalence, and a method of generating all equivalent schemata are developed. Rutledge [28] simplifies Ianov's [17,18] formalism by interpreting it in such a way that the equivalence question is seen to be the equivalence problem of finite automata, for which a solution exists, even though it is rather impractical. Furthermore, a method for generating all program schemata equivalent to a given program schemata is presented.

These studies are more concerned with the formalisms of program schemata than with an evaluation of the algorithms. But since these procedures do yield all programs equivalent to a given program, such formalisms allow the possibility of applying an efficiency measure defined on them to determine an optimum program. However, the studies remain strictly theoretical; such an abstract measure cannot reveal much about code implementation, and little practical work has followed from them. The one exception to this is Paterson's [25] work on program schemata. He discusses the complexity considerations in program optimization but does not develop the necessary machinery.

In a more restricted study, Beus [9] evaluates sort algorithms by defining an efficiency measure on the number of comparisons. However, this measure lacks sufficient generality by neglecting other cost inducing factors such as memory accesses, index calculations etc.

At a less theoretical level, Nievergelt [24] considers the problem of optimizing a program. As motivation for his study he discusses the application of optimizing considerations in areas where semantic improvements would be useful, leaving syntactic considerations to the programmer. His study gives several specific practical transformations that can be applied to any program. These yield definite improvements in execution cost when applied to frequently executed program units. Yet the advantage gained is questionable, considering the expense involved, when such improvements are applied to sections of the program that are seldom executed.

An improvement on Nievergelt's [24] analysis appears in a much referenced paper by Allen [4], where topological properties of the program are used to obtain program modules. The most frequently executed of these are considered for optimization. Hopkins [16], using a similar approach, presents a series of transformations for optimizing programs, and the overall implementation of such techniques.
Many other papers [7,15,22] written during the late 1960's and early 1970's study the problem of optimization; however, their approach is even more specialized than those mentioned above. These studies give techniques for optimizing particular sections of code, but fail to consider program flow, algorithmic implementation etc. This approach does not attempt to evaluate the program as a whole, but is concerned with improving code. In the following section, we consider papers which take a global view of programs and algorithms.

1.2.2 Graph Theoretic Approach

One of the first graph theoretic approaches to the evaluation of algorithms is the one developed by Karp [19] in 1960. Karp notes that the seemingly natural relationship between program flow and graphs had previously been ignored. In this paper an algorithm is expressed as a flow chart, and a Markov model is introduced for investigating the frequency of execution of parts of the program. An execution of the algorithm defines a path through the flow chart, and this path is considered as a graph consisting of operational elements (sequences of transfer instructions) and decision elements. The graphic representation facilitates the identification of simple program loops, and programs are analysed without a representation of subprograms.

Schurmann [29] uses the graphic representation of an algorithm to introduce cost considerations in terms of the cycles (loops of the algorithm). This is also an optimization problem since, in order to insure fast execution, the minimum amount of paging is required. The problem of program segmentation is analysed using the adjacency matrix of the graphic representation. The cycles of the algorithm are found from a partial graph based on the adjacency matrix, and the optimal cut is defined as the cut having the minimum number of program loops spanning it. Berztiss [8] improves on the method of construction of this matrix.

In a more recent paper, Aho and Ullman [1] introduce transformations applicable to algorithmic optimization. Programs are represented as directed acyclic graphs, and four transformations applicable to this representation are defined. Properties of the programs equivalent under the transformations are discussed, as well as possible extensions of this representation to further program optimizations.

Bachmann [6] recently has assessed the problem of program efficiency using an abstract calculus based on a directed graph representation of the program. Several transformational rules are given, together with a measure of efficiency defined on the probabilities and the cost of all operations executed on a given path. However this is assumed to be a theoretical criterion only, since determination of the probabilities is difficult and the number of distinct paths often quite large. As well, the formalism needed to apply the transformations is not mentioned, but would presumably be quite cumbersome.

Allen [3], using a graphic representation of an algorithm, proposes a useful method of analysing its control flow. In his paper, Allen defines dominance relations among nodes and obtains intervals (subgraphs with a single entry point h) which partition the graph and are such that each closed path passes through h. Mention is made of how these definitions and constructs can be used to give analyses involving variable ordering, use relationships etc. Applying the flow analysis methods given by Allen [3], Cocke [10] defines a system of Boolean equations which, when solved, yields a partial ordering of expressions and searches for redundant variables and computations, which point to those that should be eliminated. Also included in this paper is the notion of node splitting, used when a cycle is entered from more than one node, a procedure often very useful in the optimization of cycles.

1.2.3 Information Theoretic Approach

Information theory, although applied to diverse problems, has not been used widely in the area of algorithmic complexity and optimization. In a 1973 paper, Green [13] defines a path entropy function to measure the complexity of paths in rooted trees. Let T_n be a tree with n terminal nodes v_1, v_2, ..., v_n and m non-terminal nodes u_1, u_2, ..., u_m, and let

    P_i = P(v_0, v_i) = P(v_0, u_{i,1}, u_{i,2}, ..., u_{i,k_i}, v_i)

be a path of nodes from the root v_0 to v_i. Let od(u) denote the outdegree of node u. Then the path entropy E is given by

    E(P_i) = Σ_{j=1}^{k_i} log od(u_{i,j}) + log od(v_0);

the entropy of the tree is defined by

    E(T_n) = Σ_{i=1}^{n} E(P_i);

and, finally, the normalized entropy of T_n is given by

    H(T_n) = (1/n) E(T_n).

A 1969 paper by Warshall [31] analyses the problem of algorithmic complexity using information theory, under the constraint that the algorithms studied are viewed as choice-making elements.
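Green's tree entropy is straightforward to compute once outdegrees are known. The following is a hedged sketch, not code from Green's paper or this thesis: the small tree, its node labels, and the use of base-2 logarithms are all assumed for the example.

```python
import math

# A rooted tree as adjacency lists; node 0 is the root v0.
# The tree itself is an assumed toy example.
tree = {0: [1, 2], 1: [3, 4], 2: [], 3: [], 4: []}
parent = {c: p for p, children in tree.items() for c in children}

def od(v):
    """Outdegree of node v."""
    return len(tree[v])

def path_entropy(v):
    """E(P_i): summed log2 outdegrees of the non-terminal nodes from v0 to v."""
    total = 0.0
    while True:
        if od(v) > 0:                 # terminal nodes contribute nothing
            total += math.log2(od(v))
        if v == 0:
            return total
        v = parent[v]

terminals = [v for v in tree if od(v) == 0]
E_T = sum(path_entropy(t) for t in terminals)  # tree entropy E(T_n)
H_T = E_T / len(terminals)                     # normalized entropy H(T_n)
print(E_T, H_T)
```

Each path entropy counts one bit of choice per binary branching node passed, so the normalized entropy is the mean number of bits of choice needed to reach a terminal node.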
Thus, a sequence of states, or arrays of Boolean expressions over states, cannot be evaluated using this model. The finite input set consists of the set of data inputs {I_j}, each of which determines a sequence of states from entry to exit. An algorithm A is a function from the input values to the set of E-sequences (finite sequences of at least three elements, beginning and ending with E, the entry or exit state). The states of A consist of the union of the elements occurring in the E-sequences. To each input I_j there corresponds a probability P(I_j). The cost associated with a state s is

    -Σ_t p(s→t) log p(s→t)

where p(s→t) is the probability that t is the next state. The total indeterminancy of algorithm A, i.e. the choice among paths c, is given by

    H(A) = -Σ_c p(c) log p(c), where p(c) = Σ_j P(I_j)

is summed over those inputs I_j which yield path c. Then, letting π_c(s) be the number of occurrences of state s in c, the mean number of occurrences of s per execution is

    π(s) = Σ_c p(c) π_c(s)

and the cost of algorithm A is

    C(A) = -Σ_s π(s) Σ_t p(s→t) log p(s→t).

Thus, H(A) is the algorithm cost assuming free bookkeeping, and C(A) the cost (in the information theoretic sense) without this assumption. Based on the above, Warshall evaluates algorithms which consist only of decision elements and develops some properties of this representation. Rather than present a general analysis of algorithms, this paper selects a subset of algorithms and fits it to an information theoretic model.
To be of more general use, this model would have to be extended to include a wider class of algorithms, node execution costs, types of loops etc.

Mowshowitz [23] defines the structural information content of a graph using the orbits of its automorphism group. This measure is structural in the sense that it gives the information content of a particular graph relative to a system of transformations for which the measure is invariant. Using it, the relative complexity of undirected graphs, digraphs and a particular type of infinite graph is investigated, and the effect of several graph operations on the measure examined. It is suggested that an information measure defined on the graph related to an algorithm might be useful in characterizing the relative complexity of the algorithm. Consideration of that proposal is the basis of this thesis.

In the following, an information value will be defined on an algorithm, methods of obtaining the necessary cost probabilities discussed, and the usefulness of this value as a measure of efficiency and complexity given.

In Chapter II definitions of terminology, the model and the information value will be given. A general explanation of the concepts related to this particular measure is also included.

In Chapter III, a procedure will be presented for computing the non-isomorphic, reduced sequences of the algorithm being studied. In addition, the chapter elaborates on the usefulness of this measure through extensive examples and possible extensions of the work.

CHAPTER II: AN INFORMATION THEORETIC MODEL OF ALGORITHMS

2.1 Constituents of an Information Theoretic Model

In order to define an information theoretic model, one must identify a channel with inputs and outputs, and channel probabilities (see, for example, Ash [5]). The basic channel model defines the input as members of some finite set {b_1, b_2, ..., b_n}, and the output as members of the same or a different set, say {a_1, a_2, ..., a_m}. Each output a_k is statistically dependent only on the corresponding input b_j. Such dependence is determined by a fixed conditional probability assignment P(a_k | b_j), defined for each input b_j and output a_k. The set of these conditional probabilities defines the channel. Probabilities are also given for each input b_j. Such a model can be depicted by the flow diagram shown in Figure 1.

                                Noise
                                  |
                                  v
    Source --> Encoder -----> Channel -----> Decoder

    Figure 1: Information Channel

This system describes the flow of information from a source to a destination in probabilistic terms. The source message is associated with some object which can be sent over the channel. This channel is considered as the medium over which the coded
notion  the occurrence  our  to e x t r a c t i n f o r m a t i o n  the source message which w i l l a l g o r i t h m can be  The event  output,  of  message, the c o l l e c t i o n of  ( i . e . the r e s u l t of the a l g o r i t h m  the  In  steps  steps r e s u l t i n g from the t r a n s m i s s i o n of the source the channel  channel  of i n f o r m a t i o n provided about  of the event  the  y i s d e f i n e d on the model  by I(x;y) = l o g (P (x|y)/P (x)) where the base of the l o g a r i t h m i s u s u a l l y taken t o be s e l f - i n f o r m a t i o n of the event  x i s defined  2.  The  by  I (x) = - l o g P (x) and the c o n d i t i o n a l s e l f - i n f o r m a t i o n of x given y by I (x|y) = - l o g P (x|y) . The  average  ensemble X and  values  of these are d e f i n e d as the entropy  c o n d i t i o n a l entropy  r e s p e c t i v e l y , and  are given  The  average i n f o r m a t i o n between X and  a  model  will  be  Y,  P(x|y).  Y i s the entropy  of X l e s s  of X given Y, i . e .  I(X;Y) = H(X) Such  given  (x) l o g P (x)  H(X|Y) •= - S p ( x , y ) l o g  the c o n d i t i o n a l entropy  the  by  H (X) = - S P and  of the ensemble X,  of  used  - H(X|Y). in  the  following  study.  The  12  u n d e r l y i n g p r o b a b i l i t i e s of sources:  data  items  the model d e r i v e from two  (relative  f r e q u e n c i e s ) , and a  decomposition.  The l a t t e r a r i s e i n the f o l l o w i n g  for  be  1 < i < k  a l g o r i t h m , and  .».#  L  by  taking  -Epilog  P^) =  integers  ... + n^.  i  constructed H(P ,  non-negative  let n = n +  A  the  of  the  Basic  2.2.1  associated  with  an  scheme  is  probability The  entropy  scheme  the a l g o r i t h m .  C e r t a i n graph model.  The  which  follows.  
t h e o r e t i c concepts are following  of  definitions  A 3£a£h G i s a f i n i t e  use  in  are  defining  taken  non-empty s e t V of p  together with a s e t X of q unordered p a i r s of d i s t i n c t Each p a i r x = {u,v}  d i r e c t e d l i n e s or a r c s .  The elements  A loop of a graph i s a l i n e  p o i n t to i t s e l f ; i f more than one l i n e  l i n e s are c a l l e d both  loops  and  multiple l i n e s . multiple  lines  called  °f  in  graph G i s an a l t e r n a t i n g sequence  which  joins  ri  each l i n e i s i n c i d e n t with the two  allows  in  A G.  of p o i n t s and  ,x^ , v ,... ,v j_ ,x , v^ , beginning and ending with n  are  a Pgeudograph.  A  L  X  j o i n s two p o i n t s , such  of G i s a graph having a l l i t s p o i n t s and l i n e s  lines  A graph  which  subgraph  a  points  of  A d i r e c t e d graph which is  from  p o i n t s of  of p o i n t s i n X i s a l i n e of G.  G i s d i r e c t e d i f the s e t X i s o r d e r e d .  a  role  Definitions  The  Harary [ 1 1 ] .  V.  then  Terminology  Graph Theory  this  n^  r e l a t i v e f r e g u e n c i e s of data items w i l l become c l e a r i n  the d e t a i l e d development  2.2  Let  probability  serves as a measure of the s t r u c t u r e of  structural  way.  p^ = n^/n. of  different  points  points,  immediately  13  preceding and f o l l o w i n g i t .  Such a walk j o i n s v  be denoted by v ,v , . . . , v .  The walk i s c l o s e d i f v =v .  fl  n  0  a trail  i f a l l the l i n e s are d i s t i n c t , and a  points  and  hence,  and v  o  JEath  may  It i s  rt  if  a l l the l i n e s , are d i s t i n c t .  and  n  a l l the  I f a walk i s  c l o s e d and i t s n > 3 p o i n t s d i s t i n c t , then i t i s a c y c l e .  2.2.2  Model  Definitions  Before d e f i n i n g the model p r e c i s e l y ,  we  will  discuss  the  i n t e r p r e t a t i o n o f the term a l g o r i t h m , as used i n t h i s study. 
algorithm  X  can  pseudograph G (X). or  steps  of  represented,  I n t e r p r e t e d i n t h i s way,  and  graph  is  called  calculation  node,  e g u i v a l e n t s t e p i n the a l g o r i t h m . graph  if  algorithm. i n G (X) ;  the  corresponding  the  a  calculation  depending  occurrence  a  elements  node  or  a  on the type of the  An a r c j o i n s two nodes i n the  steps  are  sequential  walks  are,  in  fact,  pseudographs.  of values i s done.  in  In  The next node, v , marks ?  the  any  the  of the n o n - i n i t i a l i z a t i o n steps of the a l g o r i t h m .  give s p e c i a l a t t e n t i o n t o c e r t a i n walks of G(X). beginning  basic  as  t h e r e are d i s t i n g u i s h e d nodes, at which any necessary  initialization beginning  naturally,  Each execution of the a l g o r i t h m then d e f i n e s a walk these  algorithm  quite  the a l g o r i t h m become the nodes, {v, ), of a graph.  Each node of the decision  be  An  with  v  of v  &  g  and  up  to  but  not  a of G(X).  The s e t  simply  d e c i s i o n node of a flow-sequence  a  w  =  v  I f v^ i s the f i r s t  e >\ ' • * *u ' <4 ' • • • » i / v  v  V  t  h  e  n  a f low-subsegugnce  by  next  will  A.  denoted  the  {a } of flow-sequences i n G(X) K  be  A walk i n G(X)  including  i s c a l l e d a flow-seguence  We  b of  A(X),  a  or  is  14  d e f i n e d as (1) any subwalk of a , beginning  with a d e c i s i o n node  K  v', v  1  and up t o , but not i n c l u d i n g , the next occurrence or v, , or  (2) the subwalk  v , v ,.. . , v . S  flow-subseguences  a  analysed.  distinct  flow-subsequences  structural  context  in  flow-sequence  be  which  i t may  a  i s denoted by B (a).  m and n, r e p e c t i v e l y .  
of  it be  flow-  The s e t of flow-subsequences of In the f o l l o w i n g , we  assume the t o t a l number of flow-sequences and to  containing  We denote the s e t of a l l flow-subsequences  sequences i n G (X) by B (X) or B. a  of  ^  i n t o the model i s a s t r u c t u r a l c o n s i d e r a t i o n .  For a given node, each of the provides  The i n t r o d u c t i o n  L/  S  cf either  flow-subsequences  The n o t i o n of flow-subsequence i s  somewhat s i m i l a r t o that of i n t e r v a l , given by A l l e n [ 4 ] , Before  defining  reduction  and  isomorphism  of  flow-  seguences, we i n t r o d u c e some f u r t h e r terminology. In  order  to  provide  h y p o t h e t i c a l assembly simple  so  as  a  language  weight f u n c t i o n f o r the model, a is  defined.  given i n Appendix A.  weight,  w(v),  ( i . e . machine necessary  which  is  cycles),  the  of  cost  the  to compute t h i s step.  an  increase  in  The a c t u a l  instruction  is on set  Each node v of the graph i s assigned a in  execution  assembly language  units,  instructions  The weight of a walk i s the sum  of t h e c o s t s of i t s c o n s t i t u e n t equate  language  not to i n c o r p o r a t e any i n s t r u c t i o n s dependent  the c o n f i g u r a t i o n of some machine. is  The  nodes.  efficiency  For  with  a  this  study,  we  decrease i n these  execution c o s t s . Next we  define  the  process  of  structuring  to  include  15  alterations  to  the  program  code  which  depend  r e l a t i o n s h i p s among c e r t a i n nodes of the graph. is  considered  increased.  i f t h e e f f i c i e n c y of the program i s  of  as  subroutine,  t h a t r e f l e c t s a f i x e d order among c e r t a i n nodes are structuring.  
S t r u c t u r a l complexity  with t h e flow of c o n t r o l o f regard  flow  structuring  w r i t i n g a s e c t i o n of t h e a l g o r i t h m as a  or as a form examples  beneficial  A  on  simple,  the  algorithm.  i s identified  For  example,  we  a s t r u c t u r e with simple l i n e a r flow, and as  complex, one with imbedded l o o p i n g and branching. we  now  continue  implementation as  a  program  i > 1,  structured We  next  in  model  the  assembly  language.  will  make  precise  subseguence,  as  t o decrease  the  repeated s e v e r a l times.  overall  several case,  copies  flow-sequence  which has been  of  a  given  when  costs. A  flow-  a loop i s  Noting t h i s , we d e f i n e a', the  Thus, w(a) i s to be  simply A'.  We d e f i n e the  as the weight o f the corresponding  The s e t of a l l reduced or  algorithm.  reduced  o f flow-sequence a, as the walk c o n s i s t i n g of the  flow-sequence.  A* (X)  the  standard  execution  f o r example,  d i s t i n c t flow-subsequences o c c u r i n g i n a.  w(a').  An  the terms r e d u c t i o n and isomorphism.  contain is  The  denote an implementation  i n an attempt  may  a  of  0  flcw-seguence  of  definitions.  M (X) , i s the i n i t i a l coding  M-(X)  flow-sequence  the  M (X) of a l g o r i t h m X, i s a coding of the a l g o r i t h m  implementation, For  with  interpreted  flow-sequences  A flow-sequence a  r  weight reduced  throughout  as  w i l l be denoted by  i s isomorphic to flow-  sequence a i f , s  a) B(a ') = B(a '), i . e . the order of flow-subsequences w i t h i n r  s  16  flew-sequences  i s not important, or  b) Bfa,?) C B {a*), and there i s no a  1  such that  B(a ') C B(a«) and r  «(a ') < w(a «) < w(a ») . r  t  6  That i s , the flow-subsequences of a ' are r  f lew-subsequences  of  also  in  the  s e t of  a ', and the c o s t i n e x e c u t i o n u n i t s of a,? 
is closer to the cost of a_s' than to any other reduced flow-sequence.

Reduction and isomorphism are present in the model in order to include structure in the measure of complexity. If the flow-sequences are such that flow-subsequences are not repeated, i.e. no reduction is possible, then within these flow-sequences, structuring of the program so that some inner loop is made very efficient is unlikely to decrease the execution times significantly. This follows since no single part of the flow-sequence is repeated often. However, if much reduction is possible, i.e. inner loops are repeated many times, structuring the program to reflect this, or coding these loops more efficiently, decreases the overall execution time.

Similarly, when analysing isomorphism, if the same basic structure is repeated, i.e. one flow-sequence is simply a subset of another, then the complexity of the total graph is decreased, while the cost remains constant. Emphasis on such a flow-sequence, when attempting to optimize the code, is likely to decrease overall cost to a greater extent than a flow-sequence which has no corresponding isomorphic flow-sequences. Also, if there are no isomorphic flow-sequences, then no particular flow-sequence should be singled out for more efficient coding, and the cost is unlikely to be reduced significantly by considering the structure alone. That is, assuming the unstructured program has been coded efficiently, no structuring will markedly reduce the overall cost. This will become more evident when the information value is defined below.
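The reduction and isomorphism rules above can be sketched in a few lines. The sketch below is an illustration, not part of the thesis: flow-sequences are represented as lists of flow-subsequence labels, and the weight table is a hypothetical stand-in for the assembly-language costs.

```python
# Sketch of reduction and isomorphism as defined above. A flow-sequence
# is a list of flow-subsequence ids; each id has a fixed weight in
# execution units. Data structures are hypothetical illustrations.

def reduce_seq(seq):
    """Keep one copy of each distinct flow-subsequence; order preserved."""
    seen, reduced = set(), []
    for b in seq:
        if b not in seen:
            seen.add(b)
            reduced.append(b)
    return reduced

def weight(seq, w):
    """w(a) is interpreted as the weight of the unreduced walk."""
    return sum(w[b] for b in seq)

def isomorphic(a_r, a_s, all_reduced, w):
    """a_r isomorphic to a_s, per rules (a) and (b) above."""
    B_r, B_s = set(reduce_seq(a_r)), set(reduce_seq(a_s))
    if B_r == B_s:                       # rule (a): same subsequence sets
        return True
    if B_r < B_s:                        # rule (b): no a_t strictly between
        w_r, w_s = weight(a_r, w), weight(a_s, w)
        for a_t in all_reduced:
            if B_r < set(a_t) and w_r < weight(a_t, w) < w_s:
                return False
        return True
    return False
```

Note that the reduced walk keeps the original weight, matching the convention that w(a) is read as w(a').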
One can associate with an algorithm X the set of data D(X) on which the algorithm X operates. For example, if S is an algorithm to sort n numbers into order, then D(S) is the set of all n factorial possible orderings of n numbers. If D(X) is not finite, then for this analysis a finite subset D'(X) of D(X) is selected which is representative of the possible data on which X executes. By representative we mean that for each flow-sequence of G(X), there is an element of D'(X) which yields that flow-sequence. Since a finite data set can be obtained for any algorithm, we assume that D(X) is finite in what follows. With each element d of D(X), there is an associated relative frequency, R(d), which denotes how frequently the algorithm executes on this element. For each d, let G_d(X) denote the pseudograph resulting from the execution of the algorithm on datum d. We then define A_d' to be the subset of flow-sequences of A_d consisting of the reduced, non-isomorphic flow-sequences, defined above. The union of all A_d' over D(X) is the set A*, and B_d is the corresponding set of flow-subsequences.

2.3 Definition of the Information Model.
The model used will follow the description given in section 2.1, altered to include a probability scheme defined on D(X) and structural properties of the algorithm; the model can thus point to which flow-subsequences and flow-sequences are meaningful. The channel is induced by D(X).
The source transmits the set B of flow-subsequences of G(X) as the message. With the message, the input, transmitted over the channel, the algorithm produces the resulting flow-sequences as output; an analysis of these yields information about the execution of the algorithm, its cost and structure, and can point to a more efficient implementation. Given an algorithm X, characterized by its graph G(X), its implementation, and its data set D(X), the model is characterized briefly as follows. The input is the set B of flow-subsequences of G(X). The output is given by A* = {a_k*}, the set of reduced, non-isomorphic flow-sequences obtained by transmitting B over the channel. The channel is defined by the conditional probabilities P(a_k|b_j) induced by D(X).

2.4 Definition Of Probabilities
The input, channel and output probabilities will now be defined, completing the formulation of the model. The probabilities are defined as a function of execution cost units, in order that the model will in fact be a function of cost. Relative frequencies are assumed to be defined on D(X) and hence are incorporated in the definition of the information value. Given that the execution cost of flow-subsequence b_j is N_j, and the total cost of the n flow-subsequences is N = Sum N_j, then the input probabilities are defined by

    P(b_j) = N_j / N.

This gives the cost frequency of flow-subsequence b_j with respect to the total cost of the flow-subsequences. Defining

    a^j = { a | b_j is in B(a) },

and the weight of a^j as

    w(a^j) = Sum w(a),

where the sum
is over those a in a^j, then the channel probabilities are

    P(a_k|b_j) = w(a_k)/w(a^j)   if b_j is in B(a_k)
               = 0               otherwise,

i.e. the cost frequency of flow-sequence a_k with respect to those flow-sequences containing flow-subsequence b_j. We denote the output probabilities by

    P(a_k) = Sum over j of P(a_k|b_j) P(b_j),

i.e. the cost probability of flow-sequence a_k averaged cost-wise over all flow-subsequences in flow-sequence a_k. For each element d of D(X), R(d) is the relative frequency of datum d.

We now show that these definitions do in fact yield probabilities. Obviously

    Sum over j of P(b_j) = (N_1 + ... + N_n)/N = 1.

Also, defining

    w_kj = w(a_k)   if b_j is in B(a_k)
         = 0        otherwise,

so that the sum of w_kj over k equals w(a^j), we have

    Sum over k of P(a_k) = Sum over k, j of P(a_k|b_j) P(b_j)
                         = (1/N) Sum over j of (N_j / w(a^j)) Sum over k of w_kj
                         = (1/N) Sum over j of N_j (w(a^j)/w(a^j))
                         = (N_1 + ... + N_n)/N = 1.

In defining the model probabilities as cost measures, the concepts of cost and complexity are strengthened within the model. If a flow-sequence a_k has high probability, then this indicates one or more of the following. First, a_k consists of seldom repeated flow-subsequences, so structuring of the program around their occurrence in a_k will not greatly impede the execution of the other flow-sequences. Also, if the flow-subsequences in a_k are costly in relation to those in other flow-sequences, high probability results. This is consistent with the notion that a very costly flow-sequence should receive considerable attention in an analysis, since it is necessarily a large contributor to overall cost. Lastly, if
the flow-subsequences of a_k do not appear in many other flow-sequences, then P(a_k) necessarily will be large (assuming uniform relative frequencies). If the above occur, a higher cost will result, and hence, as will be shown, a larger information value.

2.5 Definition Of Information Measure
Using the information theoretic relation

    I(X;Y) = H(Y) - H(Y|X),

the information value of an algorithmic implementation will be defined in terms of the entropy H(A(X)) of the output flow-sequences, and the conditional entropy H(A(X)|B(X)) of the flow-sequences given the input flow-subsequences. For each possible datum d in D(X), with relative frequency R(d), H_d(A(X)) and H_d(A(X)|B(X)) are calculated as

    H_d(A(X)) = - Sum over k of (w(a_k) P(a_k) log P(a_k)),

where the sum is over the set of reduced flow-sequences A_d', and

    H_d(A(X)|B(X)) = - Sum over k, j of (N_j P(b_j) P(a_k|b_j) log P(a_k|b_j)),

the sum taken over the set of flow-sequences A_d' and flow-subsequences B_d. The costs, and hence probabilities, are those determined by the given implementation, M(X). Where no confusion will arise, we will write A and B for A(X) and B(X), respectively.

We interpret the entropy as a measure of the cost and structural complexity of the algorithm. A lower entropy occurs if there are only a few flow-sequences in A* compared to the number possible, i.e. there were many isomorphic flow-sequences. Thus, proper structuring should decrease the overall cost.
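The definitions of sections 2.4 and 2.5 can be collected into one small computation for a single datum d. The sketch below is an illustration only: the subsequence costs N_j and the two flow-sequences are invented, and natural logarithms are used since the thesis does not fix the base of the logarithm.

```python
import math

# Sketch of P(b_j), P(a_k|b_j), P(a_k), H_d(A), and H_d(A|B) for one
# datum d, using hypothetical weights and flow-sequences.

N = {'b1': 4, 'b2': 6, 'b3': 10}                 # N_j: cost of flow-subsequence b_j
A = {'a1': {'b1', 'b2'}, 'a2': {'b2', 'b3'}}     # reduced flow-sequences as subsequence sets

N_total = sum(N.values())
P_b = {b: N[b] / N_total for b in N}             # input probabilities P(b_j) = N_j/N

def w(a):                                        # w(a): cost of flow-sequence a
    return sum(N[b] for b in A[a])

def w_sup(b):                                    # w(a^j): cost of sequences containing b_j
    return sum(w(a) for a in A if b in A[a])

def P_cond(a, b):                                # channel probability P(a_k|b_j)
    return w(a) / w_sup(b) if b in A[a] else 0.0

def P_out(a):                                    # output probability P(a_k)
    return sum(P_cond(a, b) * P_b[b] for b in N)

def H_A():                                       # H_d(A) = -sum w(a_k) P(a_k) log P(a_k)
    return -sum(w(a) * P_out(a) * math.log(P_out(a)) for a in A)

def H_A_given_B():                               # H_d(A|B), zero terms skipped
    return -sum(N[b] * P_b[b] * P_cond(a, b) * math.log(P_cond(a, b))
                for a in A for b in N if P_cond(a, b) > 0)

info_value = H_A() - H_A_given_B()               # contribution of this datum to I(A;B)
```

Both the input and output distributions sum to one, as shown in section 2.4; averaging these quantities over D(X) with the relative frequencies R(d) gives H(A), H(A|B), and I(A;B).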
Also, if the same flow-subsequence occurs in many different flow-sequences, the entropy is lower, indicating that this flow-subsequence should be coded more efficiently, with emphasis on its flow pattern. However, entropy is higher when either the flow-subsequences within certain flow-sequences are costly relative to other flow-subsequences, or flow-sequences are long and structurally complex relative to other flow-sequences. Attention should be given to such flow-sequences when attempting to optimize the code of a given algorithm, since overall cost is dominated by them. This measure, then, points to flow-sequences or flow-subsequences which should be considered for possible optimization.

The weighted conditional entropy, H_d(A|B), is a refinement of the entropy measure. It considers the structural complexity and cost, with preference to structures that are duplicated. If the same flow-subsequence b appears in many different flow-sequences, then, although the code of b can be optimized, only one of the flow-sequences containing b can be structured so as to optimize b's relative position among the other flow-subsequences of the flow-sequence. This follows from the observation that if more than one of these flow-sequences has the same general structure, then they are isomorphic. But if they are isomorphic, only one copy is present. Necessarily then, the structures in the non-isomorphic sequences are distinct, verifying the above remark. Hence, by subtracting the conditional from the unconditional entropy, a correction is made for the
amount of 'structural information' which is repeated. Also of note is the fact that if the conditional entropy measure is low, then the flow-subsequences are generally not repeated in many flow-sequences. This indicates a fairly simple structure which yields a substantial cost decrease when it is properly structured and efficiently coded. However, a high conditional entropy denotes a more complex structure, with the same flow-subsequences occurring in many structurally different flow-sequences. In a sense the conditional entropy accounts for the intersection of flow-sequences through common flow-subsequences.

The above discussion is concerned with only a single element d of D(X). To complete the model, the algorithm must be executed on all data, i.e. the input must be transmitted through the channel. Thus the entropy and conditional entropy are defined as

    H(A) = Sum R(d) H_d(A)    and    H(A|B) = Sum R(d) H_d(A|B),

where the sum is over all d in D(X). Then the information value of the implementation of algorithm X is given by

    I(A;B) = H(A) - H(A|B).

2.6 Uses of The Information Value
The term cost comparable is used here to describe those algorithms which, in their standard implementations, have within a few per cent the same execution cost per node, total number of nodes, and execution cost per datum. The maximum information value I(A;B) over a set {Z} of cost comparable algorithms which solve the same problem is obtained from the most efficient structured algorithm. A higher information value on this set is more 'informative', in that it indicates any of the following. First, the program can take advantage of a dominant flow-sequence consisting of costly flow-subsequences, i.e. flow-subsequences which are dominant throughout the program, making the program expensive to execute. Secondly, an algorithm such that no reduction is possible, and structuring is beneficial to it, must consist of many distinct flow-sequences. Where this is not the case, the algorithm can be considered structurally simple. Hence, an algorithm for which there are no isomorphic flow-sequences, in which no flow-subsequences are costly, and in which certain flow-subsequences appear in many different flow-sequences, will have low probabilities, and thus both the conditional entropy and the information value will be low. On the other hand, an algorithm having
many dominant flows will have a higher information value. Furthermore, restricting our attention to a particular algorithm X of {Z}, the best implementation M(X) is the one having the lowest information value. This follows by observing that the implementation in which consideration of structure has been most beneficial will have the lowest cost, and thus, also the lowest information value. Initially the information value is calculated relative to the standard algorithm implementation
and A  value of the  the given  optimizing  further  which  d i s c u s s i o n and  v e r i f i c a t i o n o f t h i s f a c t i s given i n Chapter I I I , where shown  that  the  implementation  information  Value  - A Complexity  In order to analyse to  available.  other  algorithms  means  f o r the  i t s flew p a t t e r n .  a  most  of  measure  of  cost  helpful  in  can be o p t i m i z e d , disregarded,  be  executing  h e l p f u l when  are n e c e s s a r i l y ,  or complexity,  S i m i l a r l y , measures which  must  it  Measures which are  o f a s i n g l e c o s t c o n t r i b u t o r , although  general  effective.  ranking  task  cost  comparing a l g o r i t h m s r e l a t i v e to t h i s f a c t o r , as  of  same  T h i s measure should i n c l u d e the  function  situation.  and E f f i c i e n c y Measure  an a l g o r i t h m , a  the a l g o r i t h m , and r e f l e c t a  can be used to s e l e c t t h a t  which i s most a p p r o p r i a t e i n a given  2.7 Information  relative  value  i t is  consider  only flow  partially only  are  a n a l y s i n g the problem of whether and how code but i n such study, r e l a t i v e  again  leaving  the  measure,  costs in  are  often  some  sense,  sense  should  incomplete. A measure then, t h a t i s u s e f u l i n a reflect  both  of these f a c t o r s .  In the measure given above, an  attempt has been made t o i n c o r p o r a t e c o s t structural  complexity.  general  and  some  notion  of  An i n f o r m a t i o n t h e o r e t i c measure seemed  27  to be a most Through  natural  means  combining  the d e f i n i t i o n of isomorphism and  p r o b a b i l i t i e s , t h i s model becomes  a  function  respectively. 
As desired, an algorithm with practical flow structure will have a higher information content than an algorithm which is cost comparable but for which the structure is not suitable. Thus, when comparing two algorithms which are 'structurally equivalent' with respect to cost, the more expensive one will have a higher information value, and implementations of the same algorithm will rank according to their execution costs, with the most efficient having the smallest information value. If, in an attempt to select the most efficient sections of code for optimization, structuring is applicable, then the information value indicates the sections of code, and the amount of structuring, that will be useful. Structural equivalence is indicated by the conditional entropy: if, for elements d1 and d2 of D(X) of some algorithm X, the conditional entropies are equal, then the algorithm responds in approximately the same manner to proper structuring; that is, when analysing
cost and structure, their walks through the algorithm are approximately the same. This concept partitions the data set into blocks, where each block responds in the same manner to proper structuring, and the most costly of the flow-sequences is indicated. Thus the measure achieves the goal of establishing a complexity and efficiency measure which is a function of both cost and structure. This measure, by analysing the output flow-sequences, provides a given amount of information about the constituent flow-subsequences; such information then points to ways in which the algorithm can be made more efficient. Further discussion of this observation follows in Chapter III, where several examples of how this measure has been applied are given.

CHAPTER III: AN INFORMATION THEORETIC ANALYSIS OF ALGORITHMIC COMPLEXITY

3.0 Introduction
In this Chapter, we present a more detailed study of the defined model. We describe the processes used to obtain its components, investigate certain properties of the measure, and finally, demonstrate, through examples, some of its applications.

Let X be an arbitrary algorithm. Throughout the following discussion, the nodes v_0, v_1, ..., v_r of G(X) will be numbered to follow the directed flow, depth first. Each node is either a decision and calculation node, or simply a calculation node. Using the procedures given below, the input, output,
and channel probabilities can be obtained, and hence, the input model made operational.

3.1 Procedure for Flow-Sequence Construction
In the statement of the procedure, j is used as the index on the nodes of X. The first non-initialization step of the algorithm is v_s, and the terminal node is v_t. Here we assume that no decision node occurs in the initialization process. A stack is used to store that portion of a flow-sequence that has been constructed prior to the occurrence of the present decision node. Also stored on the stack is the number of branches exiting from this decision node (and hence the number of flow-sequences that will be created).

Step 1: Initialization step. Initialize k to 1, j to 0.
Step 2: Initialization Nodes. If j < s, increment j and return to step 2. If j = s, set a_k to v_j, and if v_j is a decision node, stack j and a_k; increment j, and go to step 3.
Step 3: Main Flow-Sequence Construction. If j > r, go to step 4. Adjoin v_j to a_k; if v_j is a decision node, stack j and a_k; if j = t, and the stack is not empty, increment k and remove a_k, j from the stack; increment j; go to step 3.
Step 4: Terminate.

3.2 Flow-Subsequences; Weight Assignment.
The set B of flow-subsequences is formed from A in the following manner, where initially the set B is empty. For each flow-sequence a of A, add to B any of a's flow-subsequences not already in B. To allow the analysis programs to manipulate flow-subsequences more easily, we associate a set of numbers, P_j, with each node v_j, one value for each of the flow-subsequences in which v_j
appears (see Appendix B and the following examples). To accomplish this, initialize i to 1 and repeat the following procedure for each node v_j: for each flow-subsequence b of B, if v_j is a node of b, include i in P_j and increment i. Using the results of this process, the flow-subsequences are expressed as sets of numbers. Next, the nodes are allotted weights determined by the current implementation of the algorithm. Each flow-subsequence and flow-sequence is then assigned a weight which is the sum of the weights of its constituent nodes and flow-subsequences, respectively. Once this assignment has been made, the cost probabilities can be calculated.

3.3 Hierarchy Of Isomorphic Flow-Sequences
The flow-sequences are now ordered as a hierarchy of families of flow-sequences, F_i, where b(F_i) is the lowest weight member of the family F_i, q the number of families constructed so far, and s the index of the family to which the current flow-sequence will be adjoined.

Step 0: Initialization Step. Set q to 1, and F_1 to the flow-sequence of highest weight, ties resolved arbitrarily.
Step 1: If any flow-sequences remain, set h to the one of highest weight, set i to 1 and go to step 2. Otherwise, go to step 5.
Step 2: If B(h) is a subset of B(b(F_i)), then set diff to w(b(F_i)) - w(h), set s to i, increment i and go to step 3. If B(h) is not a subset of B(b(F_i)), increment i; if i > q, set s to i and go to step 4; otherwise go to step 2.
Step 3: If i <= q, and if B(h) is a subset of B(b(F_i
)) and diff is greater than w(b(F_i)) - w(h), then set diff to this new difference, and set s to i; if i <= q, increment i and go to step 3; otherwise, go to step 4.
Step 4: Adjoin h to F_s; if this creates a new family, set q to s. Return to step 1.
Step 5: Terminate.

The above procedure establishes families of isomorphic flow-sequences, where such flow-sequences are reduced. In each family, the flow-sequences are ordered by weight, the most costly being assigned the highest order (see Appendix B). The use of the hierarchy makes the removal of isomorphic copies of flow-sequences a simple process.

3.4 Reduction and Isomorphism of Flow-Sequences
For each d, the pseudograph G_d(X) is produced, and A_d computed. Then, the following processes are applied so that the flow-sequences can be reduced and isomorphic copies removed. First, A_d is considered for reduction. Within each flow-sequence, if any flow-subsequence appears more than once, only a single copy is retained in the walk; however, the total weight of the flow-sequence remains unchanged. Next, the reduced flow-sequences are checked for isomorphism. Each reduced flow-sequence, a', has an associated family, F', in the hierarchy. If there is a flow-sequence in both F' and A_d which has a higher order than a' in F', or a' occurs more than once in A_d, remove a copy of a' from A_d. Then, add w(a') to that flow-sequence in both A_d and F' which has the least order greater than or equal to the order of a'. This process yields the subset A_d' of A_d consisting of only reduced, non-isomorphic flow-sequences (see Appendix B and the examples in section 3.7).
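The construction procedure of section 3.1 amounts to enumerating, depth first, every walk from the first node to the terminal node, with one flow-sequence produced per branch taken at each decision node. A minimal sketch under stated assumptions follows; the successor map and node numbering are a hypothetical acyclic example (loops are assumed already collapsed into their flow-subsequences), not the thesis's assembly-level representation.

```python
# Sketch of flow-sequence construction: one walk per combination of
# branches taken at decision nodes. 'succ' maps each node to its
# successors; a node with two or more successors is a decision node.
# Hypothetical graph: node 0 is the first node, node 4 the terminal.

succ = {0: [1], 1: [2, 3], 2: [4], 3: [4], 4: []}

def flow_sequences(start, terminal):
    """Return all walks from start to terminal, depth first."""
    walks, stack = [], [[start]]        # the stack holds partial walks,
    while stack:                        # as in the procedure of 3.1
        walk = stack.pop()
        node = walk[-1]
        if node == terminal:
            walks.append(walk)
            continue
        for nxt in succ[node]:          # a decision node contributes one
            stack.append(walk + [nxt])  # partial walk per exit branch
    return walks
```

For the graph above, the single decision node (node 1) yields two flow-sequences, [0, 1, 2, 4] and [0, 1, 3, 4]; each would then be reduced and filed into the hierarchy of section 3.3.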
3.5 Entropy, Conditional Entropy, Information Value
Using A_d', the entropy, conditional entropy and information value are calculated for each d in D(X), associating a relative frequency with each d, and averaging the entropy, conditional entropy and information value over D(X). Each set of costs for B and A, of which more than one is possible for a given algorithm, yields another distinct input probability distribution and, hence, defines a new model. The evaluation and analysis of the examples obtained by these processes is discussed in the remaining sections.

3.6 Further Analysis of the Information Value
In this study, the information value has been used for two purposes: first, to select from a set of algorithms the one which is, in a cost sense, the most suitable for the given task, and secondly, to choose an implementation of a fixed algorithm. In the following analysis, we assume that the flow-sequences have been reduced, and isomorphic copies removed. Also, we define an index set on the flow-subsequences b_j of a flow-sequence a by

    I(a) = { j | b_j is in B(a) }.

3.6.1 Selecting An Algorithm
First, we assume that the algorithms are cost comparable, with costs within three or four per cent. For such algorithms, since the costs are more or less the same, the flow structure is the dominant component of
the information value. Specifically, let X and Y be two cost comparable algorithms. Furthermore, suppose that each flow-sequence c of X consists of flow-subsequences not occurring in other flow-sequences of X. Then the information value indicates that algorithm X has a flow structure distinctly better than that of Y, in the sense that in most cases a structurally optimum, more efficient coding can be obtained from it. To see this, we study each algorithm's information value. First, analysing algorithm X, the flow-subsequences b_j of each flow-sequence c_k do not appear in many other flow-sequences, so the conditional probability P(c_k|b_j) = w(c_k)/w(c^j) is close to unity, and thus, over I(c_k), P(c_k) is approximately equal to Sum P(b_j); this shows that the walk probabilities are fairly uniformly distributed. However, in the second algorithm Y this does not hold. Most flow-subsequences u_j of Y occur in other flow-sequences, and as a result, there are more flow-sequences for algorithm Y than for X. Thus, when calculating

    P(e_l|u_j) = w(e_l)/w(u^j),

its value is considerably less than unity, and hence the resulting flow-sequence probabilities

    P(e_l) = Sum over j of P(e_l|u_j) P(u_j),

as well as the corresponding entropy, are less than those for algorithm X. Hence, since the flow-sequence weights are nearly the same, the information value for algorithm X is greater than that of algorithm Y. Thus, when comparing the two algorithms, the one with the higher information value indicates which flow structure can best be optimally coded. To summarize, when the costs for algorithms are comparable, the maximum information value indicates that algorithm which is optimal relative to this factor.
3.6.2 Selecting the Algorithm Implementation
Having chosen the algorithm most suitable for the given task, an analysis of it, based on cost, is made. In this instance, the implementation with minimum cost is desirable; accordingly, the minimum of the information values points to such an implementation. To see this, we consider the two ways in which the algorithm costs can be decreased.

Case 1: Some flow-sequence a* is made more efficient by applying optimizing techniques to its flow-subsequences. For simplicity, we assume that a* consists of flow-subsequences b_j which do not appear in any other flow-sequence of the algorithm, and that P(a*) < 0.5. Then, if a* is coded more efficiently, i.e. w(a*) decreases, the information value also decreases. The proof follows.

Under the assumptions given above, let z_j be the decrease in N_j (z_j is zero if b_j is not in B(a*)), z = Sum z_j, and N* = Sum N_j, where the sums are over I(a*). Recall that N is defined by Sum w(b_j), where the sum is taken over B. Then the following equality holds for those j in I(a*):

    P(a*|b_j) = w(a*)/w(a^j) = w(a*)/w(a*) = 1.

Also,
Sum P(b_j) over I(a*) decreases, for if Sum P(b_j) were to increase, then

    (N* - z)/(N - z) > N*/N,

which implies N < N*, a contradiction. This decrease in Sum P(b_j) over I(a*) implies that P(a*) decreases, and hence, that -P(a*) log P(a*) decreases. Now w(a*) decreases by assumption, so w(a*)(-P(a*) log P(a*)), the information value of a*, decreases. Qed.

Now we examine the effect of decreasing the cost of some flow-subsequence b* which appears in many flow-sequences. This situation can be examined as two subcases.

Case 2-1: First suppose that b* is not in either B(a_k) or B(a^j), j in I(a_k), and let z be the decrease in w(b*). Now P(b_j) = N_j/N increases for b_j in B(a_k), since N_j is fixed and N decreases. Also P(a_k|b_j) = w(a_k)/w(a^j) is fixed, since both w(a_k) and w(a^j) are constant. Thus,

    P(a_k) = Sum over j of P(a_k|b_j) P(b_j)

increases, and hence -P(a_k) log P(a_k) increases. This implies -w(a_k) P(a_k) log P(a_k) increases, since w(a_k) is fixed. That is, the information value increases.

Now, assume that b* is in B(a^j), for some j in I(a_k), but b* is not in B(a_k). Thus w(a^j) decreases, so P(a_k|b_j) = w(a_k)/w(a^j) increases, and, as above, P(b_j
) i n c r e a s e s .  thus the i n f o r m a t i o n value of a^,  If  s i n c e P(a^)  The  information  d i f f e r e n c e , p*-p,  value  still  must be  small,  s i n c e even l a r g e z i m p l i e s only small i n c r e a s e s i n P ( a | b j )  and  small  the  w  changes  information  And,  in  value  P(b),  j  in  I(a ).  w« (a ) (-p'log p')  iff  w(a ) [ (-p'log p»)-(-p l o g p) ] <  iff  w (a ) (log (p « /p?) )  K  < w ( a ) ( - p log p) , K  K  P  K  s i n c e f o r most z,  decreases.  is  observe t h a t  decreases  iff  ineguality  We  <  p')  z(p»log p ' ) .  log^p'^/p^j  satisfied.  z (p'log  That  is is,  approximately the  0,  information  this value  38  The u s e f u l n e s s of the i n f o r m a t i o n measure f o r s e l e c t i n g the most a p p r o p r i a t e implementation  o f an a l g o r i t h m i s thus e v i d e n t .  When the implementation  improvement,  information  i s an  the  corresponding  value decreases, and when t h i s i s not the case, the  i n f o r m a t i o n value i n c r e a s e s . Some o f the a p p l i c a t i o n s of such a measure are First,  i f the  probability  of  a  to  other  flow-seguences,  the a l g o r i t h m can be coded t o r e f l e c t t h i s flow, and hence  to decrease  execution c o s t s .  A l s o , when attempting  a program i n t o segments, a flow-sequence with can  given.  flow-seguence i s r e l a t i v e l y  l a r g e , but i t s weight i s comparable then  now  be  informative.  sequence has l i t t l e a major f a c t o r i n conditional  i n t e r a c t i o n with other parts of the program, a  segmentation  probabilities  subsequence appears r e f l e c t e d throughout illustrate  probability  Such p r o b a b i l i t y i n d i c a t e s that the flow-  on  problem.  
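The behaviour argued in Case 1 can be spot-checked numerically. The sketch below assumes only the form of the information value used above, w(a) * (-P(a) log P(a)) with P(a) < 0.5; the particular weight and probability figures are illustrative and are not taken from the thesis's tables.

```python
import math

def info_value(w, p):
    # Information value of a flow-sequence a: w(a) * (-P(a) * log2 P(a)).
    return w * (-p * math.log2(p))

# Case 1: optimizing a flow-sequence lowers both its weight w(a*) and,
# as proved above, its probability P(a*); with P(a*) < 0.5 (indeed, with
# both probabilities below 1/e) the information value drops.
before = info_value(44.7, 0.30)   # illustrative figures
after = info_value(40.0, 0.27)
print(before > after)  # True
```

The comparison holds because -p log p is increasing for p below 1/e, so a decrease in either factor decreases the product.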
To illustrate the two purposes the information measure is used for in this study, the following examples are included.

3.7 Examples

3.7.0 The Form and Purpose of the Examples

In order to analyse the selected algorithms, certain programs are necessary to produce and evaluate the output and calculate the information value. The language in which these routines are written is ALGOLW. The algorithms themselves, however, are expressed in a hypothetical assembly language, described in Appendix A.

To apply the measure, a task for which several well documented algorithms exist is chosen: the sorting of n numbers (equivalently, records) into order. The two sort algorithms, 'heap' and 'merge' [21], are selected, and the corresponding data sets are the n factorial orderings of the indices of n numbers (i.e. all possible orderings of n numbers). In the following discussion, the task is the sorting of five numbers. The elements of the data set are arbitrarily numbered from 1 to 120; we will refer to certain of these data in the examples below.

On this data set, the merge algorithm is not amenable to structuring, since no reduction of flow-subsequences is possible. Moreover, the lack of interaction among the nodes within a flow-subsequence allows for no noticeable code optimization improvements; for merge, both structuring and optimization are therefore absent. Although this latter point would ordinarily imply that the merge algorithm not be studied, it is included here, both for the simplicity of its calculations and as a contrast to the heap algorithm, which is amenable to such improvements.

The task evaluated here is obviously somewhat trivial, so the concept of structuring is not as dominant as it would be in a more complex situation. For a task involving more calculations and decisions, the use of subroutines as structures would be effective. However, the simplicity of the examples still allows the usefulness of this measure to be illustrated, both as a cost and structural complexity standard, and as an improvement on a measure based on execution costs alone.

When analysing an algorithm, the standard implementation is coded first; at this stage, no attempt is made to optimize any particular area of the program. Both the merge and heap algorithms are implemented in this manner. In addition, based on the results of an analysis of the information values, another implementation of the heap algorithm is given. In the examples, the execution costs associated with each implementation are included with the information value, so that a comparison can be made with the conventional measure. The costs are listed as table entries in execution units, while the information value and conditional entropy are unitless.

The measure is first applied to the problem of deciding which of the two algorithms should be used for a given task.
the average c o s t s of the algorithms  number of nodes and value  (M),  from a s t r u c t u r i n g which improves  removing the extreme cases low  measure  example, assuming the r e l a t i v e f r e g u e n c i e s  are equal,  (see  the  (H-I), as mentioned above, i s  e f f i c i e n c y of heap, i s i n c l u d e d i n the In the f i r s t  of  consideration  for  this  of  would have particular  In the next s e c t i o n a f u r t h e r a n a l y s i s of heap i s  42  Av  Ala  info Val  Cost  merge  198.75  35. 195  heap-I  228.31  38.365  heap-II  198. 18  37.608  Table I : Example 1  The  next  resulting  two examples i n v o l v e  output  useful i f a probable.  from  the  particular In  such  two  datum  particular  data  algorithms. or  type  of  and  their  Such a n a l y s i s i s datum  is  highly  a case, the a l g o r i t h m that performs i n the  best way f o r the given datum i s the one s e l e c t e d .  Datum  IV H-I  IV W  M cost  H-I c o s t  H-II c o s t  45  31.495  34.940  200.1  206.5  202.7  69  33.480  33.328  202.9  205.9  201.9  44  40.420  40.075  199.5  200.6  195.7  Table I I : Examples 2 and 3  As shown i n Table I I i n the e n t r y f o r datum 45, the value  of  merge  structuring algorithm H-II  and  is  higher,  optimizing  w i l l perform  confirm  this.  indicating i t is  t h a t even with  unlikely  b e t t e r than merge.  information  that  the  proper heap  The c o s t s of H-I and  43  For each of data 69 and 44 the i n f o r m a t i o n two  algorithms  higher.  are  relatively  close,  and  heap  of  H-II  3.7.2  support  this  the  is slightly  T h i s i n d i c a t e s t h a t o p t i m i z a t i o n methods should  the c o s t of the heap a l g o r i t h m . H-I  but  values  reduce  Again the c o s t s a s s o c i a t e d with  observation.  
A n a l y s i s of Heap Algorithm In  the  expressed  as  separated  remaining sequences  by  marked  of  dashes;  subseguences which are Execution  examples,  the  numbers,  parentheses eliminated  an  corresponding indicate  in  over each datum d generates  with  flow-subsequences  the  tc  nodes,  repeated  reduction  flow-  process.  f i v e flow-sequences;  a s t e r i s k are isomorphic  are  to o t h e r s i n the  those list.  Only the s e t of non-isomorphic flow-sequences are considered computing the i n f o r m a t i o n value. and  The  l i s t s of flow-subsequences  flow-sequences are given i n Appendix B.  nodes are omitted, to these The  following  tc  example as a means  structure.  a l g o r i t h m are f a i r l y  illustrates of  Within  close.  amenable  do a l s o . to  node  The or  the  partitioning  are  applied  use  of  the  data  set  the  • p a r t i t i o n s ' , the  improvements  implementation  of  When the c o n d i t i o n a l e n t r o p i e s  d i f f e r , t h i s i n d i c a t e s t h a t the corresponding outputs  initialization  s i n c e no o p t i m i z a t i o n techniques  i n c o s t r e s u l t i n g from a given change i n the the  The  nodes.  c o n d i t i o n a l entropy according  in  s t r u c t u r e s of  s m a l l e r the c o n d i t i o n a l entropy, flow-subsequence  improvements  the  the more are  the  44  sections  o f the a l g o r i t h m a s s o c i a t e d with t h i s datum.  Consider  the data p a i r s shown i n Table I I I .  Datum  Info V a l  Cond Ent  H-J. c o s t  H-II c o s t  |I-II|  33  56.325  1.374  229.1  223.1  6.0  36  54.939  1.731  219.0  214.2  4.8  19  42.802  2.095  218.4  213. 
4  5.0  20  44.188  1.731  228.5  222.3  6.2  Table I I I : Example 4  The output flow-seguences f o r the above data a r e : (33) 1*  1-3  7- 11-16-21  27-28  1  1-3  7- 11 -16-21  (7-11-16-21)  2  2-4 -3  7- 11-16-21  9-15-23  2*  2-4 -3  7- 11-16-21  27-28  2*  2-4 -3  9- 15-23  27-28  (36) 1  1-3  7- 11 -16-21  27-28  2  1-3  7- 11 -16-21  5-12-19-25  3  2-4- 3  7- 11-16-21  9-15-23  3*  2-4- 3  7- 11-16-21  27-28  3*  2-4- 3  9- 15-23  27-28  27  27- 28  45  (19) 1  1-3  7- 11 -16-21  27-28  2  1-3  7- 1 1-16-21  5-12-19-25  3  2-4-3  7- 11-16-21  9-15-23  4  2-4-3  8- 13-17-22  27-28  3*  2-4-3  9- 15-23  27- 28  27-28  (20) 1*  1-3  7- 11-16-21  27-28  1  1-3  7- 11 -16-21  (7-11-16-21)  2  2-4-3  7- 11-16-21  9-15-23  3  2-4-3  8- 13-17-22  27-28  2*  2-4-3  9- 15-23  27  27- 28  27-28  The i n f o r m a t i o n value of (33) i s s l i g h t l y higher than of  (36)  due  to  the  simpler  s t r u c t u r e and more c o s t l y  seguences of (33). T h i s same o b s e r v a t i o n holds f o r p a i r of d a t a , with both  has t h e s i m p l e r s t r u c t u r e  second  in  In  i n d i c a t e s which datum  i t s output.  An  implementation  r e f l e c t s such a s t r u c t u r e y i e l d s g r e a t e r c o s t s a v i n g s f o r  the datum with the lower c o n d i t i o n a l entropy. is  the  flow-  (20) having the higher i n f o r m a t i o n value.  p a i r s , the lower c o n d i t i o n a l entropy  which  that  useful  the form  analysis  i f i t i s known that a high p o r t i o n of the data i s of  of e i t h e r  (19) or (20) say, and i t i s d e s i r a b l e t o know  whether o p t i m i z a t i o n methods followed  Such an  by  (19)  or  should  be  applied  to  the  path  by (20). T h i s measure i n d i c a t e s that by  c o n c e n t r a t i n g on (20) more o v e r a l l savings can be obtained.  
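The role the conditional entropy plays in Example 4 can be illustrated with a small sketch. This is not the thesis's cost-weighted definition (which uses the cost probability scheme); it is an unweighted simplification, assuming only a distribution over a datum's flow-sequences, to show why isomorphic repeats make a structure 'simpler'.

```python
import math

def entropy(counts):
    # Entropy (bits) of the distribution induced by occurrence counts of a
    # datum's flow-sequences; isomorphic repeats fall into one class.
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts)

# Datum A: five flow-sequences in two isomorphism classes (3 + 2).
# Datum B: five flow-sequences in three classes (2 + 2 + 1).
print(entropy([3, 2]) < entropy([2, 2, 1]))  # True: A has the simpler structure
```

Fewer, more heavily repeated classes give a lower entropy, which is the sense in which a low conditional entropy marks a datum whose output structure is more regular.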
In the next example, the notion of partition and the effect of costs on the measure are discussed. The conditional entropies of both (3) and (5) are close (see Table IV), but their information values differ. From this it is known that their structures are similar, but that (3) either has more costly flow-subsequences, or has flow-sequences which can be coded more efficiently to reflect their structures. This example illustrates that the information value ranks the data within 'partitions' according to the corresponding flow costs, i.e. the higher the cost, the higher the information value.

Datum    Info Val    Cond Ent    H-I cost
3        53.334      1.610       228.5
5        35.287      1.611       204.7

Table IV: Example 5

The associated flow-sequences are:

(3)
1    1-3     8-13-17-22    27-28
2    1-3     7-11-16-21    (7-11-16-21)    27-28
3    2-4-3   7-11-16-21    9-15-23
3*   2-4-3   7-11-16-21    27-28
3*   2-4-3   9-15-23       27-28

(5)
1    1-3     8-13-17-22    27-28
2    1-3     7-11-16-21    (7-11-16-21)
3    2-4-3   7-11-16-21    10-18-26
4    2-4-3   7-11-16-21    27-28
3*   2-4-3   10-18-26      27-28

The third example of this section (see Table V) shows that if a datum has a much higher information value than another, and its conditional entropy is lower, then this datum, consisting of more costly flow-subsequences, is more amenable to node or flow-subsequence improvement. Thus, by selecting the more informative of equally probable data, and applying optimizing techniques to its flow path, greater cost reductions result than if such processes were applied to the other datum.

Datum    Info Val    Cond Ent    H-I cost    H-II cost    |I-II|
11       54.291      1.887       216.3       211.6        4.7
12       67.106      1.715       228.2       222.3        5.9

Table V: Example 6

The corresponding flow-sequences are listed below.

(11)
1*   1-3     7-11-16-21    27-28
1    1-3     7-11-16-21    8-13-17-22    27-28
2    2-4-3   7-11-16-21    9-15-23
2*   2-4-3   7-11-16-21    27-28
3    2-4-3   10-18-26      27-28

(12)
1*   1-3     7-11-16-21    27-28
1    1-3     7-11-16-21    8-13-17-22    27-28
2    2-4-3   7-11-16-21    9-15-23
2*   2-4-3   7-11-16-21    27-28
2*   2-4-3   9-15-23       27-28

In this instance (12) is seen to have the simpler structure, lacking the extra flow-sequence containing 10-18-26 which appears in (11). Here flow-subsequence 27-28 was optimized.

In the last example a case where the conditional entropies are approximately the same is considered. This implies that the outputs corresponding to the two data have nearly the same structure. Sample (49) has a somewhat costlier output, a fact reflected in the information value given in Table VI. However, since the conditional entropies indicate similar structures, the marginal difference in costs will not cause marked dissimilarities in the cost savings. The flow-sequences of either datum can be structured to the same degree of efficiency, and the resulting cost decreases for both cases will be nearly the same. Due to the simplicity of the sample task, the flow-subsequences of (49) and (57) are fairly similar; this, however, is not necessary in order that the conditional entropies be the same.

Datum    Info Val    Cond Ent    H-I cost    H-II cost    |I-II|
49       36.686      1.820       198.1       193.1        5.0
57       35.450      1.828       195.1       189.7        5.4

Table VI: Example 7

The flow-sequences associated with the table entries are given below.

(49)
1    1-3     7-11-16-21    27-28
2    1-3     5-12-19-25    27-28
3    2-4-3   7-11-16-21    9-15-23
4    2-4-3   8-13-17-22    27-28
3*   2-4-3   9-15-23       27-28

(57)
1*   1-3     8-13-17-22    27-28
1    1-3     8-13-17-22    27-28
2    2-4-3   7-11-16-21    10-18-26
3    2-4-3   8-13-17-22    27-28
4    2-4-3   9-15-23       27-28

3.7.3 Summary of the Examples

The above examples are intended to demonstrate how the information value, together with the conditional entropy, can be used to aid in the analysis of an algorithm. The measure can point to expensive data (i.e. data costly for algorithms to execute), but can also indicate which paths through the algorithm should be considered for code optimization in an attempt to obtain the maximum cost saving. In more complex examples, improvements on nodes or flow-sequences rather than just flow-subsequences would be in order. The advantages of using the information value as a complexity measure are evident from these examples.

3.8 Summary

In this thesis, an attempt was made to define a cost and structural complexity measure for an algorithm. To accomplish this, we defined an information theoretic model of the execution of an algorithm, in which the input is a set of subwalks, and the output certain walks, of a graph theoretic representation of the algorithm. Cost is included in the model through the definition of a cost probability scheme, and structure through the concepts of reduction and isomorphism. An information value is calculated for each implementation of the algorithm. It is shown that this value provides all the information that the conventional measure of cost alone does. Moreover, it presents structural information which indicates the amount of interaction between program sections, and points to dominant, repeated and independent flow patterns, and to structural similarities.

Under the assumption of comparable costs, the maximum information value points to a structurally optimum algorithm; when the structure is fixed, the minimum value points to a minimum cost implementation. The appropriateness of these considerations in the analysis of a given algorithm has been demonstrated in Chapter III. More generally, this study has demonstrated the feasibility of using information theory in a structural complexity measure of an algorithm.

We conclude with some suggestions for improving the model treated here. First, the introduction of more structural parameters may improve the model. Presently, reduction and isomorphism have proved beneficial in evaluating an algorithm; however, a redefinition or expansion of these may yield a more informative measure. The model obtains the information value from the weighted average of the entropy and the conditional entropy. Without the inclusion of these weights, the measure becomes much more responsive to structural information, and less sensitive to costs. In certain instances this may yield a more valuable measure than the one defined in this study. Of note, when using this 'unweighted' measure, is the fact that the most structured implementation has the largest information value.

The inclusion of other parameters in the model might also prove useful, but this would increase the cost of applying the measure, when that cost may already seem prohibitive. However, as in most studies concerned with the complexity of algorithms, the analysis is based on the following assumption: the algorithms under consideration compute a task which is central to some process that is to be repeated many times (for instance, in a business application, some procedure which must be calculated daily). Thus, the cost of performing a complexity analysis for such a task may well be negligible relative to the overall savings incurred through the use of the appropriate algorithm and its optimal implementation.

BIBLIOGRAPHY

1. AHO, A. and ULLMAN, J., Optimization of Straight Line Programs, SIAM J. Comp. 1, (1972), pp. 1-19.
2. ALEKSEEV, V.E., Sorting Algorithms with Minimum Memory, Cybernetics 5, (1969), pp. 642-648.
3. ALLEN, F.E., Control Flow Analysis, Proc. of a Symposium on Compiler Optimization, SIGPLAN Notices, ACM, New York, July 1970, pp. 1-19.
4. ALLEN, F.E., Program Optimization, in: Annual Review in Automatic Programming Vol. 5, Pergamon Press, New York, 1969.
5. ASH, R., Information Theory, Wiley Interscience, New York, 1965.
6. BACHMAN, P., A Contribution to the Problems of the Optimization of Programs, Information Processing 71, North-Holland, Amsterdam, 1972, pp. 397-401.
7. BAKU, S., On Reduction of Program Schemes, SIAM J. Comp. 16, (1968), pp. 328-339.
8. BERZTISS, A., A Note on Segmentation of Computer Programs, Inform. Control 12, (1968), pp. 21-22.
9. BEUS, H., The Use of Information in Sorting, J.A.C.M. 17, (1970), pp. 482-495.
10. COCKE, J., Global Common Subexpression Elimination, Proceedings of a Symposium on Compiler Optimization, SIGPLAN Notices, ACM, New York, July 1970, pp. 20-24.
11. FRAZER, W.E., Analysis of Combinatory Algorithms - A Sample of Current Methodology, AFIPS Conf. Proceedings, Spring Joint Computer Conference, AFIPS Press, Montvale, N.J., 1972, pp. 483-491.
12. GALLAGER, R., Information Theory and Reliable Communication, Wiley, New York, 1968.
13. GREEN, C., A Path Entropy Function for Rooted Trees, J.A.C.M. 20, (1973), pp. 378-384.
14. HARARY, F., Graph Theory, Addison-Wesley, Reading, Mass., 1969.
15. HARTMANIS, J., Computational Complexity of Random Access Stored Program Machines, Math. Systems Theory 5, (1971), pp. 232-245.
16. HOPKINS, M., An Optimizing Compiler Design, Information Processing 71, North-Holland, Amsterdam, 1972, pp. 391-396.
17. IANOV, I., On the Equivalence and Transformation of Program Schemes, Comm. A.C.M. 1, (1958), pp. 8-12.
18. IANOV, I., On Matrix Program Schemes, Comm. A.C.M. 1, (1958), pp. 3-6.
19. KARP, R., A Note on the Application of Graph Theory to Digital Programming, Inform. Control 3, (1960), pp. 179-190.
20. KARREMAN, G., Topological Information Content and Chemical Reactions, Bull. Math. Biophys. 17, (1955), pp. 279-285.
21. KNUTH, D., Sorting and Searching, The Art of Computer Programming Vol. 3, Addison-Wesley, Reading, Mass., 1973.
22. KRIDER, L., A Flow Analysis Algorithm, J.A.C.M. 11, (1964), pp. 429-436.
23. MOWSHOWITZ, A., Entropy and the Complexity of Graphs, Doctoral Dissertation, University of Michigan, 1967.
24. NIEVERGELT, J., On the Automatic Simplification of Computer Programs, Comm. A.C.M. 8, (1965), pp. 366-370.
25. PATERSON, M., Program Schemata, in: Machine Intelligence Vol. 3, (D. Michie, editor), American Elsevier, New York, 1968.
26. PICARD, C., Théorie des Questionnaires, Gauthier-Villars, Paris, 1965.
27. RASHEVSKY, N., Life, Information Theory, and Topology, Bull. Math. Biophys. 17, (1955), pp. 229-235.
28. RUTLEDGE, J., On Ianov's Program Schemata, J.A.C.M. 11, (1964), pp. 1-9.
29. SCHURMANN, A., The Application of Graphs to the Analysis of Distribution of Loops in a Program, Inform. Control 7, (1964), pp. 275-282.
30. TRUCCO, E., A Note on the Information Content of Graphs, Bull. Math. Biophys. 18, (1956), pp. 129-135.
31. WARSHALL, S., On Computational Cost, in: Annual Review in Automatic Programming Vol. 5, Pergamon Press, New York, 1969.
32. WOODGER, M., On Semantic Levels in Programming, Information Processing 71, North-Holland, Amsterdam, 1972, pp. 402-407.

Appendix A

Here we give the instruction set of the very basic hypothetical assembly language in which the algorithms are written. These instructions are kept basic to insure they remain machine independent. The execution costs are based on the MIX [21], CDC, and PDP-10 assembler language timings. The following assumptions about this language are made:

(1) there are 8 registers, which are in fast memory and can be
used for indexing.
(2) there is one accumulator.
(3) input and output are ignored.

The costs listed are in execution units; costs in parentheses are in effect if indexing is used. In the statement of the instruction, capital letters refer to memory locations, small letters to registers. C(x) is the contents of location or register x; Acc refers to the accumulator.

INSTRUCTION                                             COST

JUMP          unconditional jump                        1.2 (1.3)
JUMPE         jump if Acc = 0                           1.2 (1.3)
JUMPLE        jump if Acc <= 0                          1.2 (1.3)
JUMPL         jump if Acc < 0                           1.2 (1.3)
JUMPG         jump if Acc > 0                           1.2 (1.3)
JUMPGE        jump if Acc >= 0                          1.2 (1.3)
JUMPNE        jump if Acc != 0                          1.2 (1.3)
SETZM  A      set C(A) to 0                             1.7 (1.8)
SETZR  i      set C(i) to 0                             1.0 (1.1)
SETR   i,j    set C(j) to C(i)                          1.4
SETIR  i,n    set C(i) to n                             1.0 (1.1)
SETMR  A,i    set C(i) to C(A)                          1.6 (1.8)
SETRM  A,i    set C(A) to C(i)                          1.8 (1.9)
SETNM  A      set C(A) to -C(A)                         1.7 (1.9)
SETNR  i      set C(i) to -C(i)                         1.5 (1.7)
ADDR   i,j    add C(i) to C(j)                          1.6 (1.8)
ADDIR  i,n    add n to C(i)                             1.2 (1.3)
ADDM   A,i    add C(A) to C(i)                          2.2 (2.3)
DIVI   n      divide Acc by n                           11.3
CMPZM  A      C(A) - 0; set flag                        1.7 (1.9)
CMPZR  i      C(i) - 0; set flag                        1.5 (1.7)
CMPR   i,j    C(i) - C(j); set flag                     1.6 (1.8)
CMPI   n      C(Acc) - n; set flag                      1.2 (1.3)
CMPM   A      C(Acc) - C(A); set flag                   1.9 (2.0)
BLT    A,n    move n contiguous words of memory         0.8+ (2.1)

Appendix B

In order to illustrate some of the concepts introduced in this thesis, we present an analysis of H, the heap sort algorithm [21]. We first give the algorithm in its textual form. Based on this, we produce G(H), the graph of H, and its numeric representation, where the latter is listed in family notation (see section 3.3). Finally the flow-subsequences and flow-sequences are established. Since the analysis programs make use of the numeric representation of the algorithm, that representation is also given.

Heapsort: a file of numbers R_1, R_2, ..., R_N is a 'heap' if

    R_[j/2] >= R_j    for 1 <= [j/2] < j <= N,

where [x] is the greatest integer in x. Thus R_1 >= R_2, R_1 >= R_3, etc.; this implies that the largest number appears on top of the heap,

    R_1 = max(R_1, R_2, ..., R_N).

If an arbitrary input file is transformed into a heap, a 'top-down' selection procedure can be used to sort.

Algorithm H. Numbers R_1, ..., R_N are rearranged so that after sorting is complete, they will be in order. First the file is rearranged so that it forms a heap, then the top of the heap is repeatedly removed and transferred to its proper final position. Assume that N >= 2.
59  60  2 2.1 I M S 1  P•  X-  2  1,2 [ t = 1 0 ] 3  3  5,6,7,8,9,10,27  4  11,12  5  15,16,17,18,19,20  6  21,22,23  7  24,25,26,28  8  13,14  9  4  Flow-Subseguences  1- 3 8- 13-17-22 27-28 7-11-16-21 10-18-26 2- 4-3 9 - 15-23 6-14-20-24 5-12-19-25  61  F a m i l i e s o f Flow-Seguences  F^:  2-4-3  7-11-16-21  2-4-3  10- 18-26  Costs  10-18-26  44.7 24.4  i  F : A  F : 3  F. :  F : s  F :  1-3  7-11-16-21  8-13-17-22  1-3  7-11-16-21  27-28  37.2  1-3  8-13-17-22  27-28  36.6  1-3  7-11-16-21  6-14-20-24  46.8  1- 3  6-14-20-24  7-11-16-21  9-15-23  2- 4-3  7-11-16-21  27-28  2-4-3  9-15-23  7-11-16-21  1-3  5-12-19-25  2-4-3  56.6  26.5  2-4-3  1-3  27-28  27-28  8-13-17-22  5-12-19-25  27-28  56.6 41.5 36.3  47.4 27.1  27-28  40.9  
