UBC Theses and Dissertations


The translation of programming languages through the use of a graph transformation language Van den Bosch, Peter Nico 1981


Full Text

THE TRANSLATION OF PROGRAMMING LANGUAGES THROUGH THE USE OF A GRAPH TRANSFORMATION LANGUAGE

by

NICO VAN DEN BOSCH

B.Sc., The University of British Columbia, 1972
M.Sc., The University of British Columbia, 1974

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES
(Department of Computer Science)

We accept this thesis as conforming to the required standard.

THE UNIVERSITY OF BRITISH COLUMBIA
February, 1981

(c) Peter Nico van den Bosch, 1981

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Computer Science
The University of British Columbia
2075 Wesbrook Place
Vancouver, Canada V6T 1W5

Date: [illegible]

Abstract

It is shown that the automated translation of programming languages suffers from its traditional domination by context free parsing techniques, specifically in failing to deal uniformly with such translation-related concerns as language extension, optimization, error handling and reporting, and multi-stage translation, as well as in generally ad hoc treatment of the context sensitive aspects of translation, particularly those concerned with the identification of symbols.

A descriptive technique for discussing translation that takes into account not only the immense and continued success of the syntactic basis for programming language translation but also the need to deal uniformly with the above concerns is presented, and demonstrated to be implementable as a programming tool. This demonstration is effected both by means of a detailed discussion of the technique's application to the expression of translation algorithms, and by a consideration of the practical aspects of its implementation as a programming language.

The technique involves a representation of the complete syntactic structure of programs as a directed graph, and the expression of translations as local transformations of the graph representation. A wide range of translation concerns is discussed with reference to graph transformation.
Practical experience with an experimental version of a graph transformation language is presented, and used as the basis for a further development in the design. An evaluation of the completed research, and an assessment of its position within concurrent developments in the discipline of programming language translation, conclude this dissertation.

Table of Contents

Abstract
Acknowledgements
1 Introduction
2 Translation and graph transformation
  2.1 Syntactic structure and translation
    2.1.1 The impact of BNF
    2.1.2 The problems with context free syntax
  2.2 Representing syntactic structure as a graph
    2.2.1 Representing context free syntax
    2.2.2 Abstract syntax and abstract syntax trees
    2.2.3 Representing programming languages as graphs
  2.3 Graph transformations
3 The expression of translation algorithms
  3.1 Translation
    3.1.1 Language extension
    3.1.2 Language-to-language translation
    3.1.3 Translating in phases
  3.2 Optimization
    3.2.1 Source code optimizations
    3.2.2 Flowgraphs
    3.2.3 Object code optimizations
  3.3 Error handling
  3.4 Automatic program development and improvement
4 Toward a translator writing system
  4.1 Overall structure of the system
  4.2 Toward a graph transducer language
    4.2.1 Describing graphs in programs
    4.2.2 Graph pattern matching
    4.2.3 Graph transformation
  4.3 Using graph transducers in translations
  4.4 Experience with a graph transducer language
    4.4.1 A Pascal-S compiler
    4.4.2 A SASL compiler
5 Evaluation and conclusions
Appendix: A SASL compiler
  A: A SASL subset
  B: Old formulation
  C: New formulation
References

Acknowledgements

This page, the last written (and the only one not supervised by anyone but the author), is the most pleasurable to write. Only here, at last, is it possible to give a measure of public thanks to those who have made the production of this thesis possible and frequently enjoyable.

Harvey Abramson, my research supervisor, who has watched over this thesis taking shape from its earliest moments; who not only understood what I was up to from the beginning, but was frequently a step ahead of me in seeing what could be done with it.

David Kirkpatrick, Alan Mackworth, Bary Pollack, Gunther Schrack, and Mabo Ito, members at various times of my guidance committee, all of whom somehow found time to read some or all of three drafts of a thesis proposal and three drafts of this thesis (each longer than its predecessor), and to bring such interest and enthusiasm to the task as to have contributed substantially to its form and content.

Peter Lawrence and John Peck, the university examiners, and A.V. Aho, the external examiner, for the time and interest they have been willing to put into their tasks. It was in John Peck's survey course on programming languages and two subsequent summers of work on his Algol 68 compiler that I first acquired my taste for programming languages.

John Baker, who supervised my master's thesis and thereby taught me how to write theses (and how to play Go).

The people of Compyromatic Systems, especially Al Fowler and Fred Wong, for a stimulating -- frequently cliffhanging -- two years, during which the basic ideas presented in this thesis first emerged.

The faculty, staff, and graduate students of the Computer Science department, especially Michael Gorlick, Vince Manis, Rob Cameron, Mark Scott Johnson, Brian Jones, Ross Frazer, Graeme Hirst, Tom Rushworth, and Heather Johnson. There are too many I have had to leave out: these are the ones who most readily come to mind when I think back on these four years, the ones with whom now uncountable hours have been spent discussing everything from the nature of compiler writing to the merits of the arts in the twentieth century.

The Department of Computer Science, the National Science and Engineering Research Council of Canada, and the H.R. MacMillan Family, for financial support.

Finally, but first always in my heart, my parents. We have come through much together, and have grown to be like friends.

"Nature is delighted with transmutations"
Isaac Newton, Optics

Chapter 1: Introduction

The popular conception of a computer programmer is someone who toggles mysterious switches on a great control panel, sets marvellous dials, reads, with the unerring eye of a barnstorming pilot, a multitude of flashing lights and shifting meters, and finally receives a single oracular answer from the great machine that he, and he alone, has mastered. This picture portrays a stereotype; one that continues to flourish in such sources of contemporary mythology as the magazine gag cartoon and newspaper comic strip.
Perhaps such a picture is appropriate to myths where, modern anthropologists from James Frazer on have told us, the dull truths of primitive existence are normally elevated into a heroic structure. The dull truth of computer programming is that the programmer, isolated from the actual machine (and its increasingly uninteresting control panel), writes his instructions in a more or less abstract formulation (itself mysterious to the uninitiated, but no more so than the special notations of mathematics, music, or cultural anthropology) and submits these to the computer simultaneously with the submissions of other such programs by other such programmers, receiving his answer, very occasionally as a single number, but much more often as a long, more or less useful sequence of complaints from the machine about its inability to perform, or even understand, his intentions.

The myth nevertheless contains, as myths will, a hard kernel of truth: with all their immense growth in sophistication from electro-mechanical devices to super-fast electronic miracles in one generation, computers remain a complex network of "switches", and the only language they genuinely understand is that of two-valued states and of instructions which cause well defined alterations in these states.
If the machine is to "understand" the programmer's symbolic program, the program must first be translated into the machine's own language; and the only reason the programmer can afford to write his programs symbolically instead of, pace the cartoonist, laboriously toggling them on a Wurlitzer-like control panel, is that the translation process can be performed, quickly and accurately, by the machine itself, operating under instruction from another program.

Programming languages, then, are formal symbol systems for the description of computations. Such languages may range from symbolic assembly languages, which have a close, obvious correspondence to machine languages, and can be programmed only with a thorough understanding of the machine's operation, all the way to database manipulation languages, which describe complex relations between data, but do not explicitly describe the actions to be performed in storing and retrieving them, nor their internal representation. The clear trend, in the last thirty years or so, has been increasingly toward abstraction in languages; that is, increasingly the computer programmer uses a language that is closer to the way technically or mathematically trained people would describe the processes to each other or to themselves than to the way machines represent these processes internally.

The advantage in this approach to programming is clear: the more nearly the description matches the programmer's own perception of the process, the less chance there is of his description being wrong in some subtle, machine related way; furthermore, the more nearly the description matches other technically or mathematically trained people's perception of the process, the more likely they are to read his description and consequently to be able to suggest improvements or extensions, and, should the program need to be revised by another programmer, the less trouble the other will have in understanding how it operates and in what fashion it can be changed.

The result of this approach to programming is also clear: in almost all cases a written program must be translated from its abstract representation into an equivalent representation in the machine's own terms before the machine can execute the program's instructions. [footnote 1] Such translations are at the heart of all programming efforts and, therefore, the problem of building suitable (efficient, helpful, accurate) translators is of central concern to the design of programming languages.

[Footnote 1: An exception to this rule is the situation in which the program is interpreted by another program being executed by the computer. Even here, we generally find it more convenient, although not strictly necessary, to design the interpretive program in such a way that it operates, not on the interpreted program's original representation, but on a more convenient "internal" form of the program: the internal form is, of course, derived from the original representation by a process of translation.]

Consider, by way of illustration, this account of the early history of programming languages:

"The very first attempt to devise an algorithmic language [the Plankalkul] was undertaken in 1948 by K. Zuse... His notation was quite general, but the proposal never attained the consideration it deserved... In 1951 [Rutishauser] tried to show that in principle a general purpose computer could translate an algorithmic language into machine code... However, the algorithmic language proposed in this paper was quite restricted; it allowed only evaluation of simple formulae and automatic loop control... In 1954 Corrado Boehm published a method to translate algebraic formulae into computer notation... Laning and Zierler presented their algorithmic language -- the first one ever actually used -- and shortly thereafter the IBM Fortran system was announced... A working group called the GAMM subcommittee for Programming Languages was set up after the [1955] Darmstadt meeting in order to design...a universal algorithmic language... This subcommittee had nearly completed its detailed work in the autumn of 1957, when its members, aware of the many algorithmic languages already in existence, concluded that, rather than present another such language, they should make an effort towards worldwide unification [the ultimate result of which was Algol 60]." (H. Rutishauser, quoted by Naur [92], pp 16-17.)

The first programming language, then, dates back to 1948, [footnote 2] and the first compiler to 1954. By 1958 there were already so many efforts at producing translators for languages that the first proposal for simplifying the process resulted in the never realized universal intermediate language Uncol. [footnote 3] We may date from 1948, therefore, the desire to simplify by abstraction the process of programming a computer, and from 1958 the desire to simplify the implementation of translators for these abstractions.

[Footnote 2: The first in modern times; Ada Augusta, Countess of Lovelace, is credited with the first abstract notation for programming Babbage's unrealized analytical engine.]

By the late 1960's there had already been so many efforts at simplifying the process of writing translators, that Feldman and Gries were able to publish a substantial and quite necessary survey of such efforts, and to give them the umbrella title of translator writing systems [39].
This survey, which ranges roughly from Uncol to the most current techniques of 1967, was more recently supplemented by a history of early trends and historical contributions to compiling, published by Bauer [11].

[Footnote 3: Uncol was intended to be implemented on all computers -- not an entirely unrealistic goal -- and to be a suitably abstract target for translations of all possible programming languages; the intent was to thus simplify the translation by bringing the target closer to the source. The failure of Uncol lay in its designers' innocent assumption that they knew roughly what they meant by "all possible programming languages."]

The most striking fact apparent from these surveys is that the early efforts were almost entirely concerned with the efficient recognition of the component parts of the program -- the process called parsing, by analogy with the model of natural language construction and recognition commonly taught in schools. The translation -- usually, by stretching the analogy between programming and spoken language, called the semantics -- of these "high level" languages to machine level languages remained, largely, a matter of attaching programs to the syntactic specification.
This approach to the specification of translators, called syntax directed translation, has proved to be so successful that, in the years following the Feldman and Gries survey, it has been refined and extended, rediscovered and re-expressed, but never abandoned as a basis for translator programming. Translator writing systems continue to appear with regularity [57,65] whose only abstraction of the process of writing translators is to provide new variations on the automatic parser generation techniques that date back to the early 1960s, and to attach newer programming languages to the resulting syntactic recognizers for the purposes of translation. General textbooks on the subject of translation [53,3] continue to devote about half their space to presenting general, theoretically founded methods for automatically turning syntactic specifications into fast parsers, and the remainder presenting a collection of ad hoc techniques for translation.

Chapter 2 of this dissertation will present an in-depth analysis of the apparent reasons that translator abstraction has not qualitatively progressed beyond this position, attained rather early in its history, and will outline a descriptive basis for discussing translation that takes into account not only the immense and continued success of the syntactic basis for the translation of programming languages, but also the need to describe aspects of translation that do not fit cleanly into the existing paradigm of syntax directed translation. The most important of these aspects are:

(1) The "context sensitive" nature of programming language syntax, which is not easily dealt with in the "context free" descriptions processed by automatic parsing techniques;

(2) The structuring of translators so that functionally independent processes are describable by independent procedures (the so-called "structured programming" discipline) rather than, as is almost inevitable in syntax directed translation, by a single integrated process spread out over a description of the syntax that is frequently distorted by the demands of automatic parser generators;

(3) The expression of optimization -- that is, the improvement of automatically generated machine language translations so that they approximate the quality of a hand-written machine level program -- in a clean, uniform manner;

(4) The handling and reporting of errors; and

(5) The increased automation of activities now performed more or less adequately by programmers, namely, making abstractions of programs explicit and synthesizing programs from their abstract specifications, improving highly abstract programs, editing the syntactically structured text of programs, proving the correctness of programs.

These topics will be treated both in terms of the proposed approach to the description of translations, and with reference to their classical treatment by compiler writers and programmers, in a close inspection of the aspects of translation presented in chapter 3. Finally chapter 4 concludes the main thrust of this dissertation by presenting a proposal for the construction of a translator writing system, and actual experience with the use of an experimental system based on the considerations of chapters 2 and 3. An evaluation of the proposal, especially in the light of its further development and the effect it is intended to have on the design and implementation of programming language translators, concludes the dissertation in chapter 5.

The reader will pass through several stages in what we might describe as the evolution of an idea.
From the primordial notion that we wish to treat the translation of programs from one language to another (what we will call, for the sake of brevity, the translation of programming languages, and often simply translation) as uniformly and abstractly as possible by thinking of it as a textual transformation; through the realization that text -- a word, incidentally, which derives from the Latin textus meaning "web" -- as a purely linear sequence of symbols, is inadequate for the representation of the syntactic relationships between the symbols to give any but the most primitive control over their correct transformation; through the presentation of translation as a transformation of syntactic structures which have lost much of their relationship to the original text; finally to an abstraction of the syntactic structures which will return us to the original -- but now considerably enhanced -- concept of textual transformation. It will be a long journey and, like most journeys, it will be circular: we will travel a long way from Ithaca only to arrive back in Ithaca. However, we will find ourselves -- and therefore Ithaca -- much changed for the journey.

Chapter 2: The relationship between translation and graph transformation

The primary intent of this dissertation is to provide a uniform basis upon which to discuss and, ultimately, implement translations. It is the function of this chapter to show that such a basis may be found by treating a translation as a transformation on a program with reference to a broadened model of the syntactic structures involved. These include the syntactic structures of the language in which the program is expressed, of the language into which it is to be translated, and of any intermediate forms through which this translation may proceed, including, in the case of program optimization, the flow structure.

That syntactic structure is vital to the correct translation of programs is established, in section 2.1, by an examination of the influence of syntactic definition on all aspects of programming languages. A graph representation for the syntax of programming languages is developed in section 2.2, using as its basis the long established relationship between context free syntax and a tree representation of programs. Finally, a basis for discussing translation in terms of transformations on these syntactic structures is developed in section 2.3.
It is left to the next chapter to demonstrate how this model may be applied to a large range of considerations, both traditional and modern, in programming language translation and program development.

2.1 The relationship of syntactic structure to translation

Chomsky introduced the notion of "phrase structure" grammar into linguistics; the name is appropriate to the aspect of grammars we are examining here because, while Chomsky was fully aware of the computational properties of his generative grammars, and their relationship to earlier computational models like Markov systems and Turing machines, a considerable part of his intent in investigating this specific formulation of grammars was that it should result in a precise specification of the relationship between elements of a phrase -- hence the term "phrase structure grammar" [24]. In fact, the cited paper gives a method for constructing a labelled parse tree of a phrase from its derivation in a "type 1" or context sensitive grammar (restricted to non-erasing rewriting rules). The international committee at work on the specification of Algol 60 developed a form of syntactic specification based on Chomsky's work in order to give an unambiguous specification of the form of Algol 60 in their published report.
This was a perfect case of two previously unrelated ideas coming together -- a phenomenon not uncommon in the history of science: Chomsky's recursive definition of syntax and the nested block structure of Algol 60 (already present in the Algol 58 version, but not previously rationalized) were made for each other. To this day, BNF [footnote 1] is used to describe the context-free syntax of programming languages, and the free form nested structure of Algol 60 has remained the paradigm for the form of programming languages however little their syntactic entities now resemble those of Algol 60.

The convenient description of programming language syntax by BNF appears to have had the effect also of paving the way for the first translator writing systems. Rosen [106] credits Irons with the invention of the syntax directed compiler, but it is a concept that follows so naturally from BNF notation that it cannot be traced with certainty to any author but appeared in several places at once; nevertheless, Irons published the first paper using the term [63]. The earliest successful efforts in translator writing systems, META [113], COGENT [101], etc. (see Feldman and Gries [39], pp 92-99), were based on syntax directed translation schemes.
¹ Originally, BNF stood for Backus Normal Form, but since Knuth's alternative suggestion [70] that it be called Backus Naur Form for historical reasons (see Naur [92], pp 20-21, 36; note dissent by Bauer, p 41), the world has been divided into two camps. It is probably just as well to stick to the initials "BNF", since, according to a private communication from Ingerman reported by Rosen, "the earliest known use of [a technique analogous to BNF] was by Panini... Between 400 B.C. and 200 B.C. he (or his school) used a notation which includes all the metalinguistic apparatus reinvented by Backus (excepting only recursive rules, for which no need was evinced) to describe the peculiar grammatical constructions known as samdhi which occur in Sanskrit" [107]. The designation "BNF" seems preferable to "Panini Chomsky Backus Naur Form," and is by now well-established.

In its simplest form, syntax directed translation involves writing a BNF specification of the language and augmenting these rules with "actions" written in a more or less adequate programming language; these actions are then performed whenever the rule is used successfully in a parse of the program being translated.
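The idea of rules augmented with actions can be illustrated by a minimal sketch. The toy grammar, the function name `translate`, and the choice of postfix output are all assumptions made for illustration; none of them is taken from the systems cited above. Each parsing function stands for one rule, and its action runs only once the rule has been used successfully.

```python
# Hypothetical toy grammar (an illustration, not from any cited system):
#   E ::= T "+" E | T
#   T ::= digit
# Each rule carries an "action" that runs when the rule has matched.

def translate(src):
    """Translate a sum of digits from infix to postfix notation."""
    tokens = [c for c in src if not c.isspace()]
    out = []   # output produced by the actions
    pos = 0    # index of the next unread token

    def parse_T():
        nonlocal pos
        if pos < len(tokens) and tokens[pos].isdigit():
            out.append(tokens[pos])      # action for T ::= digit
            pos += 1
        else:
            raise SyntaxError(f"digit expected at position {pos}")

    def parse_E():
        nonlocal pos
        parse_T()
        if pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            parse_E()
            out.append("+")              # action for E ::= T "+" E
        # the alternative E ::= T has an empty action

    parse_E()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected input at position {pos}")
    return " ".join(out)
```

Because the grammar is right-recursive, `translate("1+2+3")` emits the operands first and the two operators last; the translation falls out of the order in which the rules complete, which is the essence of the syntax directed scheme.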
The BNF specification is usually restricted by the pragmatics of automatic parser generating techniques: for instance, a recursive descent parser (the most popular form in the early days) must exclude left-recursive rules; the most popular techniques today, the deterministic LL and LR forms, are known to be less inclusive in their descriptive power than general context free forms, but they are also far more efficient bases for recognizers.² Translators continue to be primarily syntax directed in their structure (the syntactic structure of Pascal was designed with LR(k) parsing in mind); and so do translator writing systems (a recent example is YACC [65], which uses an LALR(1) parser generator coupled with actions programmed in C).

² See the chart in Aho and Ullman [2], p 449, which partially orders known parsing techniques by the sets of languages they recognize.

The profound effect of BNF on various aspects of programming language research, including, but not limited to, translation, is reviewed in the next subsection (2.1.1). Subsequently (2.1.2), it will be argued that the success of BNF in these applications has led to several problems directly traceable to the representation, whether explicitly or implicitly, of programs under translation as a necessarily context free parse tree.
Various programming and definitional techniques that have already been devised to deal specifically with these problems will be surveyed, and their strengths and weaknesses in dealing with the translation problem will be discussed.

2.1.1 The impact of BNF

The central argument of this dissertation is based on the overwhelming importance of syntactic structure to the correct translation of programming languages. There will be little argument today over this importance, and we have already seen indications that much of the published material on programming language translation takes it for granted that translation is intimately joined to syntactic recognition. This section, primarily a survey, deals with the central role played by syntactic structure in each of three areas of programming language research: language design, semantic specification, and compilation techniques. Its intent, in conjunction with the subsequent section, is not to heap unnecessary detail on established fact in order to argue the importance of syntactic structure, but rather to show the degree to which context free syntactic recognition -- at present the only viable and acceptably general form of automatic recognition -- has restricted the expression of programming language translations.

(a) Language design

The first use of BNF in the systematic definition of a programming language was by Naur, in a draft report he presented to the ACM-GAMM committee developing Algol 60 [92].
It is not clear from the historical accounts how much influence the adoption of a recursive definitional technique had on the design of Algol 60. Certainly IAL, the prototype now known as Algol 58, was strongly influenced by Fortran (which, if it had not been an IBM property, would probably have been adopted by the ACM as an American standard [97]), and, although the notion of recursive definition is present in the document defining Algol 58 [96], it is not used to great effect either in definition or design. For instance, the if statement has a Fortran flavor:

    if (a>0); c:=a+b

Similarly, the notion of a local variable is missing (a variable may be declared anywhere in a procedure; its range is then the entire procedure), and the begin-end markers are purely a bracketing notation.

It is clear, however, that the form and presentation of Algol 60 had an immediate and lasting effect on subsequent language design. The definitional technique we call BNF is, because of its recursive nature, ideally suited to the definition of a language like Algol 60, whose syntactic structure is, by and large, recursive. Virtually every language design developed since Algol 60 has, at one time or another, been communicated in BNF, and, to quote Perlis, "where important new linguistic inventions have occurred, they have been embedded usually within an Algol framework...
Algol has become so ingrained in our thinking about programming that we almost automatically base investigations in abstract programming on an Algol representation" ([97], p 13). At the same time, languages such as Fortran, which was designed without benefit of BNF, Cobol³ and especially Basic appear to us primitive and without elegance when compared to Algol 60.

³ The design of Cobol took place simultaneously with the revision of IAL that introduced BNF (and the name Algol). Sammet, in a historical summary of the language's development, discusses briefly the relationship of Cobol's still widely used "metalanguage" and BNF; she concludes that, though it can be done, BNF is not well suited to Cobol's style -- which suggests that the definition mechanism used has a strong influence on the form of the language ([109], p 141).

Two final examples of the influence of syntactic definition on language design deserve to be mentioned. The first is PL/I, a language clearly based on Algol 60 but equally clearly designed without recourse to BNF or some other definition of syntactic structure. PL/I displays an odd combination of Algol 60's free format design and Fortran's orientation toward card images, often to the programmer's confusion. By contrast, Algol 68, the second example, is at the other end (some would say extreme) of the design spectrum: the concept of "orthogonality," a kind of Occam's Razor of programming language design, led the designers to attempt to reduce redundancy of function to a minimum.
Thus, the assignment symbol, ":=", is treated like an operator, and "statements" virtually disappear, since such traditional syntactic structures as if-then-else and begin-end are value-returning expressions, while the semicolon, the Algol 68 statement separator, acts like an operator controlling the evaluation of the two expressions it separates. The recursive nature of BNF would appear to induce orthogonality: having "named" a concept by defining its syntactic structure in a production rule, one can then insert that concept in any other syntactic construct simply by using its "name" -- the symbol used on the left hand side of the defining rule. Algol 68 was specified, it is true, using a more powerful definitional technique than BNF, but in the original Report [131] the use of this technique, and the way it relates to the syntactic structure of the language, is not substantially different from the use of BNF in defining Algol 60.⁴

⁴ This statement is no longer true of the Revised Report, where the definitional technique is put to considerably more ambitious use. This aspect of Algol 68 will be discussed in greater detail in section 2.1.2.

(b) The specification of semantics

As early as the definition of Algol 60, language designers sensed the need for a more precise definitional calculus than English sentences [92]; but, except for some formal systems like lambda calculus, Post systems, Markov systems, and Turing machines, there was nothing immediately available to them to exploit for this purpose. Turing's and Church's work, at least, must have been known to members of the Algol 60 committee, but it was several years before lambda calculus was, with mixed success, applied to the definition of programming language semantics (see Feldman and Gries [39], pp 104-106). If any of these techniques were considered, nothing came of it in time for the publication deadline of the Algol 60 report; in retrospect, it is reasonable to assume that Algol 60 would have suffered from so rigorous a definition, especially because none of these formal computational models relates well to a language like Algol 60.

In any case, these techniques are no longer serious candidates for a definitional calculus of programming languages, and in presenting some of the methods that are seriously pursued at present, I shall be arguing that at least part of the reason for their poor applicability to programming language semantics is that the earlier methods, having been devised without a good paradigm of a programming language, and especially without a good paradigm of syntactic structure such as is provided by BNF, could not be expected to apply elegantly to the problem of semantic specification.

A good tutorial on some definitional techniques and their application to programming language definition is provided in a relatively recent paper by Marcotty et al.
[87]. This survey, although limited in scope primarily to "operational" (as distinct from "denotational") semantics,⁵ reviews several major approaches to semantics, namely, the Vienna method, production systems, W-grammars, attribute grammars (and the functionally equivalent affix grammars), and axiomatic semantics. (The latter is an exception to the rule: it is not an operational semantics, in that it defines no underlying computational model.)

The most thorough attempt at creating a complete operational model of the semantics of programming languages is the effort of the IBM Vienna Laboratories, known as Vienna Definition Language (VDL). Besides the description in Marcotty et al., the form and application of VDL was described in detail by Wegner [139]. VDL definitions consist of transformation rules over a set of trees representing the abstract syntax of a programming language. For any particular program, then, it is possible to determine the computation it describes by tracing the effect of the transformations defined on it.
As a definitional method, VDL is analogous to BNF: for any string it is possible to determine whether it is a syntactically correct entity in a language (excluding contextual requirements) by determining whether it can be generated by the BNF definition of the language. We will return to VDL in the next section. It is sufficient for our present discussion to note its use of syntactic structure in specifying semantics.

⁵ An operational definition of a language provides an abstract model for computations expressed in the language and specifies a translation of programs into that model, whereas a denotational definition specifies the mathematical functions computed within the language.

"Abstract syntax" is a term used in analogy to the "concrete syntax" of, say, BNF. To prevent ambiguity, the concrete syntax of, for instance, expressions, must be presented, in a context free generation language like BNF, in some form similar to:

    E ::= T | T "+" E
    T ::= P | ...          [G1]
    P ::= "1" | ...

This grammar yields one, and only one, derivation for the string "1 + 1 x 1", namely, the one represented by the parse tree:

    [Tree T1: parse tree of "1 + 1 x 1" under grammar G1, rooted at E]

However, the syntactic structure of the string is revealed just as well by a tree of the form:

    [Tree T2: reduced tree for "1 + 1 x 1"]

Tree T2 is derived from tree T1 by eliminating all nodes with only one branch. These, in turn, represent one-on-one productions in the grammar, like "T::=P", which exist solely to disambiguate the derivation of strings with respect to the grammar.
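The step from the concrete parse tree to the reduced tree can be sketched as follows. The sketch assumes that the productions of G1 are E ::= T | T "+" E, T ::= P | P "x" T and P ::= "1"; the second alternative for T is an assumption, chosen only so that the string "1 + 1 x 1" can be derived. The tuple representation of trees and the names `parse` and `collapse` are likewise hypothetical.

```python
def parse(src):
    """Build a concrete parse tree as nested (label, children) tuples.

    Assumed grammar: E ::= T | T "+" E ; T ::= P | P "x" T ; P ::= "1".
    """
    tokens = src.split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_P():
        nonlocal pos
        if peek() != "1":
            raise SyntaxError(f'"1" expected at token {pos}')
        pos += 1
        return ("P", ["1"])

    def parse_T():
        nonlocal pos
        p = parse_P()
        if peek() == "x":
            pos += 1
            return ("T", [p, "x", parse_T()])
        return ("T", [p])            # one-on-one production T ::= P

    def parse_E():
        nonlocal pos
        t = parse_T()
        if peek() == "+":
            pos += 1
            return ("E", [t, "+", parse_E()])
        return ("E", [t])            # one-on-one production E ::= T

    tree = parse_E()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at {pos}")
    return tree


def collapse(node):
    """Eliminate every node that has exactly one branch."""
    if isinstance(node, str):        # terminal symbol
        return node
    label, children = node
    children = [collapse(child) for child in children]
    if len(children) == 1:
        return children[0]
    return (label, children)
```

Under these assumptions, `collapse(parse("1 + 1 x 1"))` leaves only two interior nodes, one for the sum and one for the product, so a semantic specification over the reduced tree has fewer node types to consider.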
Having reduced the number of node types that will occur in any parse tree, we will also have reduced the size of a semantic specification. Abstract syntax will appear again in section 2.2.2, where it is considered as a step toward the derivation of a syntactic representation for translation.

The other major approach to semantics is the denotational. Most of the primary sources for denotational semantics are either obscure or, in the case of several recent books, formidable; but for an easily accessible tutorial and bibliography, see Tennent [124]. We will not concern ourselves here with the application of this approach to the definition of semantics, any more than we have concerned ourselves with the details of VDL's operation. It is sufficient for our discussion of the implicit importance of syntactic structure to any consideration of programming languages to note that any denotational definition of a language will be based on a definition of the language's abstract syntax.⁶ As in Hoare's semantics, definitions use only syntactically well-formed entities in the language, and will often depend on a recursive elaboration of those entities, based on their syntactic structure.

⁶ Compare Tennent's examples: [124] p 438 (LOOP), pp 441, 447 (AEXP), and p 449 (GEDANKEN).
For example, consider the definition of sequencing:

    C[S1;S2] = C[S2] ∘ C[S1]

Here, the function C (which gives an interpretation of the syntactic entities we shall call "commands", including loops, assignment, etc.) is defined recursively: the value of two commands separated by a semicolon is the value of the second command composed with the value of the first. C applied to a command delivers a function mapping environments to environments; hence, the composition of two such functions will yield another such function.

A similar definitional mechanism is at work in axiomatic specifications, where the above construct would be defined as:

    {p} S1 {q} and {q} S2 {r}
    -------------------------
          {p} S1;S2 {r}

(If S1 transforms predicate p into predicate q and S2 transforms q into r, then the construct S1;S2 transforms p into r: the predicates function as assertions defining as much of the environment as necessary for the purposes of the semantics.)

We have seen that every major approach to semantic definition devised in the last twenty years has used some notion of syntactic structure. In models like VDL, this structure is explicitly manipulated, and the value of the program is derived from the results of the manipulation.
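The sequencing rule can be made concrete with a small sketch in which the meaning of a command is a function from environments to environments. The representation of environments as dictionaries, and the helper names `assign` and `seq`, are assumptions introduced for illustration only; they are not part of any of the cited definitions.

```python
def assign(name, expr):
    """Meaning of 'name := expr': a function from environments to environments.

    expr is itself a function from an environment to a value.
    """
    def command(env):
        new_env = dict(env)          # leave the argument environment untouched
        new_env[name] = expr(env)
        return new_env
    return command


def seq(c1, c2):
    """C[S1;S2] = C[S2] o C[S1]: apply the meaning of S1, then that of S2."""
    return lambda env: c2(c1(env))


# The program "a := 2 ; c := a + b", built from the meanings of its parts.
program = seq(
    assign("a", lambda env: 2),
    assign("c", lambda env: env["a"] + env["b"]),
)
```

Applied to an environment in which b is 3, `program` yields an environment in which a is 2 and c is 5. Read axiomatically, with p the assertion "b = 3", q the assertion "a = 2 and b = 3", and r the assertion "c = 5", the same program is an instance of the composition rule above.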
In denotational semantics, the structure is used implicitly in the form the statement of semantics takes: the recursive definition of syntax induces an analogous recursive structure on the definition of semantics, as in the above example, where the syntactic rule

    SSeq ::= S ";" SSeq

is echoed in the definitional rule

    C[S1;S2] = C[S2] ∘ C[S1]

which defines the evaluation of "SSeq" by defining the evaluation of its component parts, and does so in terms of a function of its syntactic subforms.

Of the major models of computation devised before syntactic definition in terms of BNF was available, Post's, Turing's, and Markov's are all abstractions of mechanical processes and are defined in terms of operations on a string of symbols, resembling a computer's memory; only lambda calculus resembles a programming language. Those of the early models that have been used as a basis for programming language definition in the last twenty years, most notably production systems, have had to be modified to include (and, in fact, be structured by) a syntactic definition, or else have been used as abstract machine models onto which may be specified a syntactically structured mapping from the language to be defined [144,77]. That this is so is indicative once again of the impact of Algol 60 and, indirectly, of the use of BNF in its definition, on all aspects of programming languages.
We cannot now seriously entertain a definition of a programming language without reference to its syntactic structure any more than an astronomer can seriously entertain the notion of a geocentric universe.

(c) Compilation techniques

We have already seen, in the introduction to section 2.1, that compilation techniques were immediately affected by the introduction of BNF. An early Algol 60 compiler [63] was structured as a recursive descent parser, as were most of the early translator writing systems (see Feldman and Gries [39], pp 92-103). Recursive descent parsing restricts grammars only in that rules must not be left-recursive; it tends to be a wasteful method, in that it develops many unsuccessful partial parses in finding a correct parse. More efficient parsing techniques (simple and operator precedence, LL and LR) place greater restrictions on the grammars they can handle and therefore on the set of languages,⁷ but at the same time allow automatic determination of whether the grammar is ambiguous -- an important asset in syntax directed translation.

⁷ However, all the aspects of existing programming languages that can be described by a general context free grammar can also be described by an LR(k) grammar; the limitations are, therefore, only of theoretical interest.

It is an unfortunate fact that parsing is a much better understood and, consequently, much more thoroughly documented aspect of translation than any other. This situation is clear not only in the 1968 Feldman and Gries survey, but also in a 1967 survey by Bauer [11], as well as in such general texts on compiler writing as Gries [53], and Aho and Ullman [3]. Too often, therefore, "translator writing systems" turn out to consist of a parser generator and some more or less useful facility for writing "semantic" routines -- most usually, and often most usefully, a general purpose programming language, as in the cases of XPL [84], and YACC [65].

In spite of this one-sided development, the concept of syntax directed translation has been highly successful in the sense that it is the most commonly used paradigm of translation. Not only are such "classical" syntax directed compiler writers as YACC still being invented, but more sophisticated techniques such as multiple passes over the parse tree (Abramson et al. [1]), and augmented grammars (which may be, under certain conditions, implemented as multiple passes over a parse tree: see Bochmann [15]) continue to use, as their basis, the paradigm of syntax directed translation.

2.1.2 The problems with context free syntax

We have seen, in the preceding discussion, that some concept of syntactic structure is at the core of every aspect of programming languages: design, definition, and implementation.
BNF, which was developed for the definition of Algol 60, the first genuinely "modern" programming language, is highly successful as a model for the syntactic structure of programming languages and, suitably restricted, yields to well-understood and quite powerful techniques for automatically generating acceptors (parsers). Given a recursive definition of programming language syntax, it was a natural though surprisingly rapid development that syntax directed translators and translator writing systems emerged; and the success of the syntax directed approach to translation, at least in terms of the number of systems based on it, lends it great credence as a model for translation.

Nevertheless, almost from the very beginning it has been recognized that BNF is an inadequate representation for the complete syntax of programming languages such as Algol 60. Floyd demonstrated, as early as 1962, that the Algol 60 requirement that all variables used in a program be declared (a requirement followed by almost all subsequent programming languages)⁸ makes it impossible to define the language completely by means of a "simple phrase structure" (context free) grammar [41].

⁸ A notable exception is PL/I which, as we have already remarked in the discussion on language design in section 2.1.1(a), did not entirely absorb the important innovations of Algol 60. Transforming this simple requirement into a complex set of default rules appears to have been a mistake which has brought many a PL/I programmer to grief.

This limitation on context free grammars, and hence on BNF, is also a limitation on syntax directed translators. Indeed, it is in dealing with identifiers of all kinds that syntax directed translators display their greatest weakness, either in ad hoc treatment, or even in a complete breakdown of the model in those places where identifiers occur.

Context sensitive restrictions on programming languages are generally dealt with under the heading of "semantics". This merely adds further fuzziness to an already overworked term, however: as Ledgard says, in a paper describing certain extensions of the context free (BNF) model, "Relegating the context sensitive requirements to the definition of semantics obscures the issues properly considered as semantics" [76] (italics mine; see also McKeeman [85], p 14).
Because data description and the use of names have been steadily increasing in sophistication and complexity in recent years,⁹ context free syntax has become increasingly inadequate as a description of programming languages because it effectively ignores one of their most important aspects, namely the relationship between a name and the nature of the data it refers to (its declaration); it cannot be long, therefore, before translation techniques directed by context free grammars are seen as hopelessly inadequate -- not because syntactic structure will cease to be an effective basis for dealing with programming languages, but, on the contrary, because context free syntax alone will no longer describe enough of the structure of the language.¹⁰

⁹ See sections 3.1.1 and 3.1.2, especially as they relate to the concept of abstraction.

Nevertheless, not only is the syntax directed translation paradigm still pursued as seriously as ever, but syntactic descriptions even more limited in scope than context free continue to be put forward as bases for general translator writing systems. In a 1967 paper describing the ML/I macro processor,¹¹ Brown admits the limitation of a language design using a regular expression recognizer to certain kinds of language extension [18].
But in a 1976 paper -- which is chosen here not to be singled out, but to illustrate a type -- Tanenbaum proposes the use of a macro processor, in this case ML/I, as a translator writing tool [122]; however, the language chosen to illustrate the use of ML/I has well balanced control structures, like IF-THEN-ELSE-FI and WHILE-DO-OD, which ML/I is able to parse: the proposed method would not do so well with Algol 60 or Pascal. Other general-purpose macro processors like TRAC, GPM, and Leavenworth's syntax macros, would not succeed even as well as ML/I, simply because ML/I has (and Tanenbaum makes use of) a symbol-table facility to deal with context sensitive matters.

Even nonsyntactic approaches, like the use of Markov systems as a basis for translation, continue to be suggested with little or no attention given in these dissertations to genuinely complex translation problems.¹² That such systems succeed at any translations is remarkable; I can only recommend, from personal experience, writing a Ratfor preprocessor in Snobol as a sure way to be convinced that, while it can be done, it is not the way to do it, especially not for general translator writing.

¹⁰ This argument is essentially similar to one put forward by Schwanke [114].

¹¹ ML/I will be more fully treated in the section on language extension (3.1.1).

Given that context free grammars are inadequate for the description of programming languages, the question naturally arises how the syntax directed approach has nevertheless managed to become so thoroughly the basis for translators.
The answer lies, in almost every case, in the use of a symbol table, and a minor restriction on programming languages that declarations of names must textually precede their use. Consider the simple language L:

    (1)  L     ::= BLOCK
    (2)  BLOCK ::= begin DCLS ; STMS end
    (3a) DCLS  ::= TYPE VAR
    (3b)         | TYPE VAR ; DCLS
    (4)  TYPE  ::= real | int | ...
    (5)  VAR   ::= a | b | ... | z
    (6a) STMS  ::= STM
    (6b)         | STM ; STMS
    (7a) STM   ::= VAR
    (7b)         | BLOCK

This context free grammar generates such programs as

    begin int a ; a end      [P1]
    begin real b ; c end     [P2]

Our intuition about the relationship between declarations and uses of names tells us that P1 is a legal, and P2 an illegal program; but the context free grammar in no way distinguishes between the two, nor, as Floyd has shown, can it do so uniformly for all strings.

¹² A recent (1979) example using Markov systems is due to Turchin [125].

A parser with symbol table operates as follows for these programs: the sequence of rules in a leftmost derivation of P1 is:

    L => BLOCK                         (1)
      => begin DCLS ; STMS end         (2)
      => begin TYPE VAR ; STMS end     (3a)
      => begin int VAR ; STMS end      (4b)
      => begin int a ; STMS end        (5a)
      => begin int a ; STM end         (6a)
      => begin int a ; VAR end         (7a)
      => begin int a ; a end           (5a)

After the first application of rule 5a, the symbol table contains the pair <a,int>, indicating that variable a has been declared as type integer. At the second application of rule 5a, the symbol table is examined for a pair <a,T>, where T may be any derivation of TYPE; since <a,int> is present, rule 5a succeeds and so does the derivation.
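The behaviour of a parser with and without a symbol table can be sketched for the language L. This is a minimal illustration under stated assumptions: the function name `accepts`, its flag `use_symbol_table`, the whitespace-separated token format, and the restriction of TYPE to `real` and `int` are all hypothetical choices, and the table is deliberately flat, ignoring block structure.

```python
TYPES = {"real", "int"}          # rule 4, cut down to the two types shown

P1 = "begin int a ; a end"
P2 = "begin real b ; c end"


def is_var(tok):
    """Rule 5: a variable is a single lower-case letter."""
    return tok is not None and len(tok) == 1 and "a" <= tok <= "z"


def accepts(src, use_symbol_table):
    """Recognize a program of L, optionally checking uses against declarations."""
    tokens = src.split()
    pos = 0
    table = {}                   # flat table of pairs <name, type>

    def peek(offset=0):
        i = pos + offset
        return tokens[i] if i < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"{tok!r} expected at token {pos}")
        pos += 1

    def block():                 # rule 2
        expect("begin")
        dcls()
        expect(";")
        stms()
        expect("end")

    def dcls():                  # rules 3a and 3b
        nonlocal pos
        typ, var = peek(), peek(1)
        if typ not in TYPES or not is_var(var):
            raise SyntaxError(f"declaration expected at token {pos}")
        pos += 2
        table[var] = typ         # enter the pair <var, typ>
        if peek() == ";" and peek(1) in TYPES:
            pos += 1
            dcls()

    def stms():                  # rules 6a and 6b
        nonlocal pos
        stm()
        if peek() == ";":
            pos += 1
            stms()

    def stm():                   # rules 7a and 7b
        nonlocal pos
        if peek() == "begin":
            block()
        elif is_var(peek()):
            if use_symbol_table and peek() not in table:
                raise LookupError(f"{peek()} used but not declared")
            pos += 1
        else:
            raise SyntaxError(f"statement expected at token {pos}")

    try:
        block()                  # rule 1
        return pos == len(tokens)
    except (SyntaxError, LookupError):
        return False
```

With `use_symbol_table` set to `False` the recognizer is purely context free and accepts both P1 and P2; with the table lookup enabled it accepts P1 and rejects P2, because no pair for c has been entered by the time c is used.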
Notice that this sketch not only shows how the symbol table is used to restrict the context free grammar, but that it implicitly requires declarations to precede uses, since otherwise the pair <a,T> would not be present at the right moment.

The derivation of program P2 is similar, except that the two applications of rule 5 generate the symbols b and c, respectively. Note what happens in the symbol table, however: the first application of rule 5 adds the pair <b,real> to the symbol table; but the second application fails to find a pair <c,T> in the symbol table, and the derivation fails.

This description of the use of symbol tables has glossed over several important details. It could afford to do so because the example was a simple one, but in general, since L is a block structured language, the symbol table would have to associate a stack of declarations with each symbol; a declaration of that symbol in a new block would push the type onto the stack, and at the end of each block, those symbols that were declared in that block would have their stacks popped to the previous declaration. In order to do this, it is necessary to put into the symbol table not only the type (which we have only used so far as a token to put on the stack) but also the block depth and, for the sake of the code generator, a count of the number of symbols already declared in the block.
For example, the program

    begin int x ; begin real x ; x end ; x end     [P3]

is accepted as follows: at the first occurrence of x, the symbol table becomes:

    x : <int, 1, 1>

After the second occurrence of x the symbol table changes to reflect the new declaration:

    x : <real, 2, 1>
        <int, 1, 1>

The first use of x is therefore identified as <real,2,1> (the top element on the stack); after the first end, the stack is popped to return to its earlier state, and the second use of x is identified as <int,1,1>; after the second end, the stack is popped again and x ceases to be a declared symbol.

This discussion, which should be familiar in some form to anyone used to dealing with modern programming languages, is presented primarily as an introduction to more formal methods of saying the same thing. That it fails to define precisely what happens for all cases, but merely illustrates what happens in two and depends on the reader's powers of induction to generalize, illustrates the inadequacy of such informal methods, especially when one attempts to give a precise definition of the relationship between declaration and use of names in languages more realistic than L. That more precise methods were needed was understood, as we have seen, at the time of Algol 60's definition; but such methods were not immediately forthcoming.
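The stack discipline just described for block structured symbol tables can be made concrete with a small Python sketch. The class name `SymbolTable` and its method names are hypothetical, and the third component of each entry is taken here to be the ordinal position of the declaration within its block, which is one reading of the count mentioned in the text.

```python
class SymbolTable:
    """A stack of <type, block depth, position in block> entries per symbol."""

    def __init__(self):
        self.depth = 0
        self.stacks = {}      # name -> list of (type, depth, position) entries
        self.declared = []    # one list of declared names per open block

    def enter_block(self):
        self.depth += 1
        self.declared.append([])

    def declare(self, name, type_name):
        names = self.declared[-1]
        names.append(name)
        entry = (type_name, self.depth, len(names))
        self.stacks.setdefault(name, []).append(entry)   # push the new declaration

    def lookup(self, name):
        stack = self.stacks.get(name)
        return stack[-1] if stack else None              # top of stack, or undeclared

    def exit_block(self):
        for name in self.declared.pop():
            self.stacks[name].pop()                      # return to the previous declaration
        self.depth -= 1


# Program P3: begin int x ; begin real x ; x end ; x end
table = SymbolTable()
table.enter_block()
table.declare("x", "int")
table.enter_block()
table.declare("x", "real")
inner_use = table.lookup("x")     # the use inside the inner block
table.exit_block()
outer_use = table.lookup("x")     # the use after the first end
table.exit_block()
after_program = table.lookup("x")
```

Tracing P3 through this sketch, the inner use resolves to the real declaration at depth 2 and the outer use to the int declaration at depth 1; after the second end, x is no longer declared.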
The definition of Algol 68, however, brought matters to a head, resulting in both van Wijngaarden's two level grammars, also known as W-grammars, [footnote 13] and Koster's affix grammars [73] (equivalently, and contemporaneously, Knuth's attribute grammars [71, 15], which have become the more generally known form). The original Algol 68 report [131] did not completely specify the language's context sensitive syntax in W-grammar, but depended heavily on a legalese English that has been much criticized as giving the language a reputation for obscurity; the revised report [132], issued in 1976, succeeded in describing all of the language in W-grammar form, at the cost of losing much of the relationship between the grammar and the syntactic structure of the program it describes.

Footnote 13. Both these names are dodges around the unpronounceability of "van Wijngaarden grammars" in any language but Dutch.

A complete W-grammar description of L, along with a reasonable description of how it operates (including a description of the W-grammar mechanism), is too long for the purposes of this section, especially because such a discussion is already available elsewhere (Marcotty et al. [87], pp 199-216), with reference to Asple, a small Algol-family language. That discussion uses the metanotion TABLE to perform the function of a symbol table. The declaration "int x" in program P3 would, for example, be represented as a string of the form

    loc letter x has int ... end

which is a legal derivation from TABLE (portions of Asple definitions not useful for a definition of L are represented by ellipses). This is the equivalent of adding <x,int> to the symbol table. The rule that a symbol must not be declared more than once is embodied in the hyperrule

    LOCSETY loc TAG has MODE ... end restrictions :
        where TAG is not in LOCSETY,
        LOCSETY restrictions ;
        where LOCSETY is EMPTY .

This checks that the last symbol (TAG) in the portion of the table to be checked is not in the rest of the table, by use of another hyperrule which eventually produces "where TAG is not in LOCSETY" as "EMPTY" if this is so. It also checks that the same restriction holds for the rest of the table. It succeeds if the table is empty, which indicates that it has been completely checked.

The basic rule for programs is

    program :
        begin,                     (0)
        dcl train of TABLE,        (1)
        TABLE restrictions,        (2)
        TABLE STMTS stm train,     (3)
        end, ...                   (4)

This creates TABLE in line (1), checks declaration restrictions through the above hyperrule in line (2), and then passes the symbol table on to the statement portion, to be used in context checks, in line (3). The rule

    TABLE MODE TAG identifier :
        TAG identifier,
        where TABLE contains loc TAG has MODE,

(a result of certain derivations of the "TABLE STMTS stm train" portion of the preceding rule) is used to check that any symbol in the program occurs in the table and, incidentally, to establish its type, for other possible restrictions on the use of symbols.
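The effect of the two table checks above (no symbol declared twice, and every used symbol present in the table) can be imitated by ordinary recursion. The Python sketch below is an analogy only, not a W-grammar: it assumes, hypothetically, that TABLE is held as a list of (tag, mode) pairs in declaration order, and the function names are invented for the illustration.

```python
def restrictions(table):
    """Mirror of the 'restrictions' check: the last tag must not occur in the
    rest of the table, and the rest of the table must satisfy the same check."""
    if not table:                         # the table is empty: completely checked
        return True
    *rest, (tag, _mode) = table
    return all(other != tag for other, _ in rest) and restrictions(rest)


def mode_of(table, tag):
    """Mirror of 'where TABLE contains loc TAG has MODE': return the mode, or None."""
    for other, mode in table:
        if other == tag:
            return mode
    return None


good_table = [("x", "int"), ("y", "real")]
bad_table = [("x", "int"), ("y", "real"), ("x", "real")]
```

Here `restrictions(good_table)` holds, `restrictions(bad_table)` does not because x is declared twice, and `mode_of(good_table, "y")` yields the mode under which y was declared.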
Asple is not block structured, so the definition of TABLE is not sufficiently complex to model the stack introduced in the earlier discussion of L, and thus deal with the locality of variable declarations. However, it is easy to see that the same principles can be applied.

W-grammars of this sort do not yield to automatic generation of parsers and, as such, W-grammars are purely a definitional device. Augmented grammars on the other hand can, under automatically enforceable restrictions, be used to specify parsers [29,15]. Again, we shall depend on the discussion of attribute grammars in Marcotty et al. (pp 250-265) to shorten our discussion here to the necessary minimum.

The primary concept which permits these grammars to deal with arbitrary contextual requirements, and which gives attribute grammars their name, is that of inherited and synthesised attributes. These are effectively parameters (restricted to, respectively, input and output) to a recognition procedure in a top down parser where each nonterminal symbol represents such a procedure. Thus the context free rule

    program ::= begin dcls ; stms end

is equivalent to the recursive program definition

    define program () as
        "begin",
        dcls (),
        ";",
        stms (),
        "end".

In a proper context sensitive parse, however, it is necessary to check that all variables are declared, and this is done by passing parameters to, and receiving values back from, "procedures". For example,

    define program () as
        "begin",
        dcls ({}, env),
        ";",
        stms (env),
        "end".
    define dcls (value env1; result env2) as
        dcl (env1, env2)
      or
        dcl (env1, env3),
        ";",
        dcls (env3, env2).

    etc.

The program rule passes an empty set to the dcls rule, and expects to receive back a set (env) representing the symbols declared there, which it passes on to the stms rule. The dcls rule receives a set of declared symbols, passes this on to the dcl rule and expects to get back a new set, consisting of the old set augmented with further declarations; this it either returns or, in a recursive call, augments further before returning. The sets named "env" above are used in a manner highly analogous to the use of TABLE in the W-grammar formulation. (See Marcotty et al., pp 250-265, for more standard notation; compare also the use of augmented grammars in section 4.1, below.)

The purpose of going through these two examples of formal definition was to show that the problem of dealing with context sensitive aspects of programming languages has been approached in a variety of ways.
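The parameter passing described above translates almost directly into a recursive descent recognizer. The following Python sketch is offered as an illustration only: the token-list representation, the helper `expect`, and the use of a frozenset for the environment are assumptions of the sketch, and statements are restricted to plain variables.

```python
TYPES = {"int", "real"}
KEYWORDS = TYPES | {"begin", "end"}


def expect(tokens, pos, word):
    """Recognize one terminal symbol and return the next position."""
    if pos < len(tokens) and tokens[pos] == word:
        return pos + 1
    raise SyntaxError(f"expected {word!r} at token {pos}")


def dcl(tokens, pos, env1):
    """dcl (value env1; result env2): recognize 'TYPE VAR'."""
    if (pos + 1 < len(tokens) and tokens[pos] in TYPES
            and tokens[pos + 1].isalpha() and tokens[pos + 1] not in KEYWORDS):
        return pos + 2, env1 | {tokens[pos + 1]}    # env2 = env1 plus the new name
    raise SyntaxError(f"declaration expected at token {pos}")


def dcls(tokens, pos, env1):
    """dcls (value env1; result env2): dcl, or dcl ';' dcls."""
    pos, env3 = dcl(tokens, pos, env1)
    if pos + 1 < len(tokens) and tokens[pos] == ";" and tokens[pos + 1] in TYPES:
        return dcls(tokens, pos + 1, env3)          # augment further before returning
    return pos, env3


def stms(tokens, pos, env):
    """stms (value env): every statement must be a declared variable."""
    if pos >= len(tokens) or tokens[pos] not in env:
        raise SyntaxError(f"undeclared or missing variable at token {pos}")
    pos += 1
    if pos < len(tokens) and tokens[pos] == ";":
        return stms(tokens, pos + 1, env)
    return pos


def program(tokens):
    pos = expect(tokens, 0, "begin")
    pos, env = dcls(tokens, pos, frozenset())       # pass the empty set, receive env
    pos = expect(tokens, pos, ";")
    pos = stms(tokens, pos, env)                    # pass env on to the statements
    pos = expect(tokens, pos, "end")
    if pos != len(tokens):
        raise SyntaxError("text after final end")
    return env
```

Calling `program("begin int a ; a end".split())` returns the environment containing a, whereas the tokens of "begin real b ; c end" raise SyntaxError when c is looked up in an environment that holds only b.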
Essentially equivalent formulations [1,29,77,99] need not be discussed further here, and it should be obvious from the above discussion that even two such dissimilar approaches as W-grammars and attribute grammars are applied to specific programming language definitions in essentially the same way: the context free form of the program is further restricted by comparing its use of symbols against its declarations of them in what amounts to a symbol table; they are, in the final analysis, all formal restatements of our earlier discussion on the use of such tables in compilers.

In the next section, we will develop a representation for the syntactic structure of programs that will deal with context sensitive restrictions without resorting to symbol tables. [footnote 14] This will be done so that, subsequently, a view of translation can be developed that depends on a dynamic syntax of programs (rather than the static view of syntax directed translation), and so that a uniform treatment of translation can be considered in chapter 3.

Footnote 14. This is not to say that symbol tables are thereby discarded. An implementation of a particular compiler will almost certainly make use of some form of symbol table, unless there is a major departure from our current view of syntax and its role in programming language definition. The symbol table will merely, in this formulation, be pushed into the background as a primitive concept upon which more lucid conceptions of syntactic structure are based, much as the goto has been pushed into the background of modern language design issues even though it continues to be the basic control flow primitive for all computers.

2.2 Representing syntactic structure as a graph

We have seen, in section 2.1, that syntactic structure is a pervasive concept, important in all aspects of programming languages. We have seen also that context free syntax is not sufficiently expressive of the structure of programs to be able, by itself, to direct translation, and we have examined the prevalent method for overcoming this inadequacy by using some form of symbol table. In this section we will be developing the notion of syntactic structure and its representation along traditional lines, but to a new conclusion, and we will see that this eliminates -- at an important point in the translation process -- the need for such classical supports as symbol tables, and that it opens up new possibilities in the expression of translation algorithms.
This development will take place in three stages: the first (section 2.2.1) reviews the commonly accepted relationship between a tree representation of a program and its context free syntax; the next (section 2.2.2) reviews the disambiguating effect of representing syntactic structure by means of a tree, and the concept of abstract syntax; finally, section 2.2.3 develops a directed graph representation of context sensitive syntax and places this in its historical perspective.

2.2.1 Representing context free syntactic structure as a tree

The representation of syntactic structure as a tree goes back at least as far as Chomsky's work on phrase structure grammars. In a paper on formal properties [24] he describes the relationship between a derivation of a string with respect to a context sensitive grammar, and a hierarchical representation of its phrase structure (a tree). We will concern ourselves here with context free grammars, because it is these that lend themselves to automatic parsing techniques. The following discussion is based in part on Aho and Ullman [3], pp 129-133.

Consider a grammar for arithmetic expressions:

    E ::= E+E | ExE | (E) | -E | id        [G2]
          (1)   (2)   (3)   (4)  (5)

A leftmost derivation of the string "-(id+id)" is:

    E  =>  -E  =>  -(E)  =>  -(E+E)  =>  -(id+E)  =>  -(id+id)
       (4)     (3)       (1)         (5)          (5)

(This derivation is leftmost because at each stage the leftmost nonterminal is replaced.)
Each rule is marked with a parenthesized number (e.g., E ::= (E) is rule 3) and each step in the derivation is marked with the number of the rule that was applied.

Every rule in a grammar may be represented by a tree showing the hierarchical relationship between the left and right hand sides. For example,

            E
         ___|___
        |   |   |
        E   x   E

is a representation of the rule "E ::= E x E". Reading the leaves of this tree from left to right will reconstruct the right hand side of the rule; the string so obtained is called a sentential form. Both of the nonterminals E that are leaves of this tree may be roots of trees themselves. In general, in a derivation

    E => A_1 => ... => A_n

where each A_i is a sentential form, and A_n is the terminal string, each step in the derivation

    A_i => A_i+1

represents a rewriting of a string of (terminal and nonterminal) symbols

    l_1 l_2 ... l_u E r_1 r_2 ... r_w  =>  l_1 l_2 ... l_u m_1 m_2 ... m_v r_1 r_2 ... r_w

by application of a rule

    E ::= m_1 m_2 ... m_v

If the sentential form A_i is represented by the leaves of a partial parse tree T_i, then we can create T_i+1, representing sentential form A_i+1, by replacing the leaf representing the symbol E by the tree representing the rule. This process begins with a single root/leaf node E; so in the above example, after application of rule 4 we have the tree

            E
          __|__
         |     |
         -     E

and so on, until we have the tree

            E                        [T3]
          __|__
         |     |
         -     E
            ___|___
           |   |   |
           (   E   )
            ___|___
           |   |   |
           E   +   E
           |       |
           id      id

(In a bottom-up parse the sentential form is gradually reduced to a single symbol.
Obviously it is possible to form the parse tree from such a parse as well: here, instead of a single incomplete tree representing a sentential form of the string at some stage by its leaves, there is a set of trees -- usually on a stack -- with the final string at their leaves, and the sentential form reconstructable from their root nodes.)

The above example displays no ambiguity; nevertheless, the grammar is ambiguous and, for example, the string "id+idxid" has two leftmost derivations, imposing different interpretations on the order of evaluation:

    (a)                              [T4]
            E
         ___|___
        |   |   |
        E   +   E
        |    ___|___
        id  |   |   |
            E   x   E
            |       |
            id      id

    (b)
                E
             ___|___
            |   |   |
            E   x   E
         ___|___    |
        |   |   |   id
        E   +   E
        |       |
        id      id

It is in examples of this sort that we can see most clearly the importance of a sense of syntactic structure. The string "id+idxid" does not in itself indicate the order of evaluation: we must impose a rule like "multiply before adding" (interpretation a), or "evaluate from left to right" (b). The tree representations, on the other hand, are unambiguous. In the next section, we will pursue this disambiguating effect of tree structure further.

2.2.2 Abstract syntax and abstract syntax trees

We have seen how a tree may be used to represent the syntactic structure of a string, and how a tree representation may be constructed directly from the string's derivation.
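The construction of a parse tree from a leftmost derivation, as reviewed above for grammar G2, can be sketched as follows. The class `Node` and the functions `build`, `leaves`, `sentence` and `size` are hypothetical names introduced for this illustration; the rule numbers are those of G2.

```python
# Grammar G2: rule number -> right hand side (the left hand side is always E)
RULES = {
    1: ["E", "+", "E"],
    2: ["E", "x", "E"],
    3: ["(", "E", ")"],
    4: ["-", "E"],
    5: ["id"],
}


class Node:
    def __init__(self, symbol):
        self.symbol = symbol
        self.children = []        # stays empty for a leaf


def leaves(node):
    """The leaves of the (partial) parse tree, from left to right."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def build(rule_numbers):
    """Start from a single root/leaf E and, for each rule of a leftmost
    derivation, replace the leftmost leaf E by the tree for that rule."""
    root = Node("E")
    for number in rule_numbers:
        target = next(leaf for leaf in leaves(root) if leaf.symbol == "E")
        target.children = [Node(symbol) for symbol in RULES[number]]
    return root


def sentence(root):
    """Read the sentential form off the leaves."""
    return "".join(leaf.symbol for leaf in leaves(root))


def size(node):
    return 1 + sum(size(child) for child in node.children)


t3 = build([4, 3, 1, 5, 5])       # the derivation of -(id+id)
t4a = build([1, 5, 2, 5, 5])      # id+idxid, multiplication grouped first
t4b = build([2, 1, 5, 5, 5])      # id+idxid, addition grouped first
```

In this sketch `sentence(t3)` gives back "-(id+id)", while `t4a` and `t4b` are two structurally different ten-node trees whose leaves both spell "id+idxid", which is the ambiguity of G2 discussed above.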
For the trees in the preceding section, the converse is obviously also true: the original string may be read off from the leaves of the syntax tree, and the derivation may be reconstructed from the structure of the tree.

We have also noted that a tree represents the syntactic structure of a string unambiguously, even though the underlying grammar may be ambiguous. This is because a tree always represents one or another of a number of possible derivations. This fact has been used to advantage in syntax directed translations to reduce the size of the semantic portion of the specification. For example, if the above grammar G2 were to be rewritten as an unambiguous grammar, it might take the form

    E ::= T | T + E
    T ::= P | P x T
    P ::= id | -P | (E)

In this grammar, there is only one derivation for the string "id+idxid"; on the other hand, its parse tree now looks like

                E
             ___|___
            |   |   |
            T   +   E
            |       |
            P       T
            |    ___|___
            id  |   |   |
                P   x   T
                |       |
                id      P
                        |
                        id

The size of the grammar has gone from five rules to seven, and the parse tree from ten nodes to thirteen. In fact, the expression could be represented by the tree

          +                          [T5]
        __|__
       |     |
       id    x
           __|__
          |     |
          id    id

This tree contains only five symbols -- exactly as many as in the original string -- but gives unambiguous form to the string. The expression "-(id+id)" can be represented as

         neg                         [T6]
          |
          +
        __|__
       |     |
       id    id

Here, only the four important symbols are represented: the parentheses, which serve only to disambiguate the string and have no value in the tree representation, are removed altogether.
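Abstract syntax trees such as T5 and T6 are easily written down as nested tuples, and a string can be recovered from them by a procedure that knows the grammar. The Python sketch below assumes the precedences and the right-recursive shape of the unambiguous grammar given above; the tuple encoding and the names `unparse` and `symbols` are inventions of the sketch.

```python
T5 = ("+", "id", ("x", "id", "id"))
T6 = ("neg", ("+", "id", "id"))

PRECEDENCE = {"+": 1, "x": 2}     # E level and T level of the unambiguous grammar
OPERAND = 3                       # P level: id, -P or a parenthesized E


def unparse(tree, minimum=1):
    """Rebuild a string, reinserting parentheses only where the grammar needs them."""
    if isinstance(tree, str):
        return tree
    if tree[0] == "neg":
        return "-" + unparse(tree[1], OPERAND)           # P ::= -P
    operator, left, right = tree
    level = PRECEDENCE[operator]
    # E ::= T + E and T ::= P x T: the left operand sits one level higher
    text = unparse(left, level + 1) + operator + unparse(right, level)
    return "(" + text + ")" if level < minimum else text


def symbols(tree):
    """Number of symbols in the abstract syntax tree."""
    if isinstance(tree, str):
        return 1
    return 1 + sum(symbols(sub) for sub in tree[1:])
```

Under these assumptions `unparse(T5)` yields "id+idxid" from five symbols and `unparse(T6)` yields "-(id+id)" from four; the parentheses are supplied by the procedure rather than stored in the tree.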
Trees T5 and T6 are said to represent the abstract syntax of the strings "id+idxid" and "-(id+id)": while it is possible to give a procedure for the reconstruction of the string and its derivation from the tree, it is no longer a simple universal statement like "read the leaves from left to right," but must now be given anew for every grammar.

2.2.3 Representing the full context sensitive aspects of programming languages as graphs

So far in section 2.2 we have seen that context free syntactic structure may be represented unambiguously and, with the introduction of the idea of abstract syntax, parsimoniously in the form of trees. In section 2.1, however, we cited material which establishes that context free syntax cannot entirely describe programming languages, and therefore we would like to find a representation of the complete syntax of programming languages with the same qualities of being both parsimonious and unambiguous.

Just as grammars capable of generating programming languages exactly need be only slight extensions of context free grammars, [footnote 15] so it is possible to extend tree structure slightly in order to represent the context sensitive syntactic structure of programs.
Consider the program illustrating block structure in section 2.1.2:

    begin int x(1) ; begin real x(2) ; x(3) end ; x(4) end     [P3]

An abstract context free tree representation might be:

                      block                          [T7]
                 _______|_______
                |               |
               var             stms
             ___|___         ___|___
            |       |       |       |
           int     x(1)   block    x(4)
                         ___|___
                        |       |
                       var     x(3)
                     ___|___
                    |       |
                  real     x(2)

(The actual variables in this program are all named "x"; the superscripts, written here as parenthesized numbers, are purely a notational device to distinguish the four occurrences.) A symbol table, such as the one illustrated in section 2.1.2, would make an identification of symbol x(3) with x(2) and of x(4) with x(1).

Footnote 15. See section 2.1.2, where it was argued that all the common methods for dealing with the context sensitive restrictions on programming languages essentially introduce a symbol table into the parse or derivation.

Such an identification can be made, entirely within the graphical representation of the program, and without resorting to a symbol table, by treating it as a more general data structure than a tree. For example, the following data structure, given as a list of the arcs leaving each node, is an abstract representation of this contextual relationship:

    [T8]
    block (outer) :  arc 1 -> var (int x),    arc 2 -> stms
    stms          :  arc 1 -> block (inner),  arc 2 -> var (int x)
    block (inner) :  arc 1 -> var (real x),   arc 2 -> var (real x)
    var (int x)   :  arc 1 -> int,            arc 2 -> x
    var (real x)  :  arc 1 -> real,           arc 2 -> x

The representation of program P3 by graph T8, above, has this advantage over representation T7, that it leaves no doubt about the types of the two symbols x(3), x(4) appearing in statements.
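To make the difference between T7 and T8 concrete, the graph for P3 can be built from nodes with ordered out arcs, where each use of x is an arc to the node of its declaration. This Python sketch is illustrative only: the class `Node`, the helper `type_of`, and the particular `tour` function (a depth first, left to right walk that follows every arc) are assumptions of the sketch rather than definitions taken from the text.

```python
class Node:
    def __init__(self, label, *arcs):
        self.label = label
        self.arcs = list(arcs)            # ordered out arcs


# Declarations of P3
outer_var = Node("var", Node("int"), Node("x"))      # int x(1)
inner_var = Node("var", Node("real"), Node("x"))     # real x(2)

# Uses of x are arcs to the declaring var node, not separate leaves
inner_block = Node("block", inner_var, inner_var)    # second arc is the use x(3)
stms = Node("stms", inner_block, outer_var)          # second arc is the use x(4)
outer_block = Node("block", outer_var, stms)


def type_of(var_node):
    """The type is read directly from the declaration the arc leads to."""
    return var_node.arcs[0].label


def tour(node):
    """Visit a node, then tour each out arc in order (the graph is acyclic)."""
    yield node
    for target in node.arcs:
        yield from tour(target)


encounters = [node for node in tour(outer_block) if node.label == "var"]
```

In this structure the use inside the inner block leads to a declaration of type real and the use in the outer statement list to one of type int, with no table lookup; the tour meets each of the two var nodes twice, four encounters in all.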
An interpreter, making a left to right tour [footnote 16] of T8, would encounter each definition of x twice -- for a total of four encounters: exactly as many as in a tour of T7, because the hierarchical structure of the program representation remains the same.

Footnote 16. Definitions of tree, graph, and left to right tour follow in Chapter 3.

The idea of representing context sensitive syntax structure as a directed graph is not new: I have already mentioned Chomsky's representation for derivations with respect to context sensitive grammars; both Woods [143] and Baker [8] have used similar representations for derivations with respect to type 0 grammars. In all these cases, the structure captured is that of the parse, however, not of the string being parsed. Context sensitive grammars can be used to restrict the relationships between definitions and actual uses of symbols to just those commonly accepted in programming languages; but inevitably they do so by a great deal of string manipulation in which nonterminal symbols are shuffled back and forth. In the directed graph representation of such derivations, the structure of the underlying string is lost. Compare, for example, the W-grammar derivation of an Asple program in Marcotty, Ledgard and Bochmann's survey ([87], pp 199-215) with its context free derivation (p 195).
The sample program should be small; say,

    begin int x; x:=1 end

Draw a Chomsky-style derivation graph of the former, and a simple parse tree of the latter, if necessary; the difference should become apparent before too long -- as should the reason this exercise is left to the reader and not performed here.

It is one of the virtues of context free derivations that the grammar imposes a hierarchical structure on the language. This hierarchical structure is directly derived from the grammar, and is exploitable by syntax directed translation schemes in that it permits the recursive definition of the translation to be directly associated with the inherently recursive definition of the grammar. The graph representations defined above combine this virtue, and the economical definition of translation it makes possible, with the virtue of representing unambiguously the relationship between an actual occurrence of a symbol and its definition: a relationship that is not defined in pure context free representations except by recourse to a symbol table.

The view of translation presented in the next section, in which translation of programs is treated as a transformation of syntactic structures, gains in clarity by being defined over syntax graphs, rather than syntax trees.
In particular, although none of the ability of syntax trees to represent hierarchical structure is lost, the effect of merging the formerly separate notion of a symbol table with the syntactic representation is to simplify the definition of translation: instead of two concepts, there is now only one to deal with.

2.3 Graph transformations

Thus far in this chapter we have found that the traditional treatment of programming language syntax as context free led to a highly successful model for translation which has in recent years reached the limits of its competence. In particular, we have seen that, while the model gives a clear expression of, and workable basis for, the recursive translation of programming language control constructs, such as the begin-end block structure and the if-then-else statement, it fails to account for, and give adequate expression of, the context sensitive nature of the use of symbols in programming languages; and that, in the light of current trends in language design, such inadequacies must sooner or later result in a critical re-examination of the context free model. It is arguable that this process is already taking place, where the currently most successful extension of the syntax directed translation model is embodied in various kinds of augmented grammars (see Koster [74], and Bochmann and Ward [16], for descriptions of translator writing systems based on such grammars).
To recapitulate briefly, a syntax directed translation scheme imposes a static structure on the program under translation. This structure is related to the context free grammar defining the language whether directly, by attaching specific semantic actions to each production rule, or indirectly, by structuring the translator according to an abstract syntax derived from the grammar. The translation process may be thought of as a tree tour of this structure, whether it is implemented as such (as in Abramson et al. [1] or Crowe [29]) or, as is more common, it is merely a conceptual aid to the translator writer. During this tour, code may be emitted which represents the translation of the program.

Nevertheless, it is not uncommon for abstract discussions of translation to take a more dynamic view of the program. McKeeman ("The translation problem is...to take a given source text and produce an equivalent target text") defines a set of eight "feedback free" transformations which make up the translation process for the average programming language ([85], pp 15-19). Rosen has developed a theory of tree transformation systems analogous to the string transformation systems represented by Chomsky-style grammars [104]; this theory was applied to translation by DeRemer [33]. Rosen himself has extended his theory to graph transformation systems, and has suggested applications in proofs of optimizer correctness and in the definition of interpreters [105].
Discussions of optimization algorithms, too, are often couched in terms of improvement transformations on graphs (Aho and Ullman [3], pp 418-438; Gries [53], pp 396-406).

It is the intent of this dissertation to take such a dynamic, transformational view of translation in all its aspects. In the next chapter it will be demonstrated that it is both possible and profitable to treat programs as syntactically structured sequences of symbols in a uniform manner, and to express various aspects of translation as transformations of this syntactic structure. It will be seen that a wide range of translation related objectives, such as language extension, abstraction, optimization, and error handling and recovery, may be treated in a clear, expressive manner when presented in this form, whether in an abstract discussion or in the actual implementation of a translator.

In doing so we will be treating the syntactic structure of programs, in the fashion of section 2.2.3, as generally connected list structures here called graphs.
We will see that such a treatment -- which represents those aspects of programming language syntax normally represented, in conventional translation techniques, by a symbol table, as an integral part of the syntactic structure -- is necessary, once the static conception of syntactic structure is abandoned, because the initial relationship between the table and the internal representation of the program cannot be guaranteed to continue unchanged throughout the translation. This reduction of two concepts into one will also be seen to have the usual effect of simplifying the expression of algorithms.

Chapter 3: The expression of translation algorithms as graph transformations

This chapter is intended to examine in some detail the ways in which translation, in all its aspects, may be expressed as transformations on a graph representation of program structure. In order to do this, we shall need a clear conception of what a graph representation of a program is and precisely what we are about when we talk of transforming it. The term graph and the concept of graph transformation will be defined here; we will examine graph pattern matching in the chapter on implementation which follows.

Let N be the set of counting numbers, N = {1, 2, 3, ...}.

Definition (state)

A state is a pair (A,R) where
(1) A is a set of nodes, A is a finite subset of N, and
(2) R is a function, R: A × N -> A ∪ {0}, as follows: ∀k∈A ∃n_k∈N such that ∀n'∈N,
      R(k,n') ∈ A   if n' < n_k
      R(k,n') = 0   if n' ≥ n_k

Remarks

For x,y ∈ A, and n ∈ N, R(x,n) = y means that the n'th arc out of node x is directed at node y, while R(x,n) = 0 means there is no n'th arc out of x.

A state, it should be easy to see, is very close to what a graph theorist would call a graph (compare Aho and Ullman [2], p 41), except that the arcs leaving a node (the out arcs) are ordered by the function R.

The definition of state and the following definitions which depend on it are intended not only to give an unambiguous definition of graph transformation, but also to model reasonably realistically an implementation of a graph transformation language. It is convenient to treat the state as a model for some portion of computer memory set aside to contain syntax graphs, to treat syntax graphs as connected subportions of this state, and to treat graph transformations as operations on the state rather than on individual component graphs.

Example (state)

The picture

[Diagram: the nodes 1 to 6 drawn with an arrow for each arc listed under R below]

represents a state defined by the pair (A,R) where

A = {1,2,3,4,5,6}
R: (1,1) -> 2   (1,2) -> 3
   (2,1) -> 4   (2,2) -> 6   (2,3) -> 3
   (4,1) -> 2
   (5,1) -> 4   (5,2) -> 1
   all others -> 0

Remarks

It is often convenient to label the nodes of a state with symbols that indicate their "meaning". In such cases we may think of a state as a 4-tuple (A,R,S,L) where S is a set of symbols and L is a function L: A -> S.
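The definition of a state can be made concrete with a small sketch. The representation below is an assumption made for illustration only: the function R is stored as a dictionary from each node to the ordered list of nodes its arcs are directed at, and the names STATE, R and is_state are hypothetical. The data is the six-node example state given above.

```python
# Assumption: R is stored as a dict from node to the ordered list of arc targets.
STATE = {
    1: [2, 3],
    2: [4, 6, 3],
    3: [],
    4: [2],
    5: [4, 1],
    6: [],
}

def R(state, x, n):
    """Return the node the n'th arc out of x is directed at, or 0 if there is none."""
    arcs = state[x]
    return arcs[n - 1] if 1 <= n <= len(arcs) else 0

def is_state(state):
    """Check that nodes are counting numbers and every arc is directed at a node."""
    nodes_ok = all(isinstance(k, int) and k >= 1 for k in state)
    arcs_ok = all(y in state for arcs in state.values() for y in arcs)
    return nodes_ok and arcs_ok
```

Storing the arcs of a node as a list makes the ordering of out arcs explicit, and the value 0 for a missing arc falls out of the bounds check rather than being stored.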
It is entirely possible that, for q,r ∈ A, q≠r and L(q)=L(r). This will frequently be the case in states representing syntactic objects, where nodes will represent instances of syntactic classes, and there will be many such instances in a program.

The presence or absence of S,L in the state is not material to the present discussion and will in fact tend to clutter up the definitions and examples. The correspondence should be obvious and, where unspecified, we can assume that S=A and ∀a∈A, L(a)=a.

Definition (child, parent)

If S=(A,R) is a state, x,y∈A, then y is a child of x (with respect to S) if and only if ∃n∈N such that R(x,n) = y. Reciprocally, x is a parent of y (with respect to S) if and only if y is a child of x (with respect to S).

Remark

It is possible that an arc is formed from a node to itself, in which case the node will be its own child/parent. Similarly, if a pair of nodes each have an arc directed at the other, each is child/parent to the other.

Definition (descendants, ancestors)

If S=(A,R) is a state, x,y∈A, then y is a descendant of x (with respect to S) if and only if
(a) x = y, or
(b) ∃z∈A such that z is a child of x and y is a descendant of z.
Reciprocally, x is an ancestor of y (with respect to S) if and only if y is a descendant of x (with respect to S).

Remark

In general, there will be only one state in force at any time, and so it is obvious which state we mean when we say "y is a child of x with respect to S", etc. In such cases, it will be sufficient to say "y is a child of x", etc.
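The child, parent and descendant relations just defined can be sketched as follows. This is only an illustration: it assumes a state is held as a dictionary from each node to the ordered list of its arc targets, and the function names are hypothetical. The descendant set is computed by an iterative search, so it terminates even when the state contains cycles.

```python
# Assumption: a state is a dict from node to the ordered list of its arc targets.
STATE = {1: [2, 3], 2: [4, 6, 3], 3: [], 4: [2], 5: [4, 1], 6: []}

def children(state, x):
    """Nodes y such that some arc out of x is directed at y."""
    return set(state[x])

def parents(state, y):
    """Nodes x such that y is a child of x."""
    return {x for x, arcs in state.items() if y in arcs}

def descendants(state, x):
    """x itself (clause a) plus every descendant of a child of x (clause b)."""
    seen, stack = set(), [x]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(state[node])
    return seen

def ancestors(state, y):
    """Nodes of which y is a descendant."""
    return {x for x in state if y in descendants(state, x)}
```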
Example

In the preceding example, node 4 is a child of node 5; node 3 is a child of both node 1 and node 2. Notice that nodes 2 and 4 are each a "child" of the other.

Definition (the graph defined by a node)

If (A,R) is a state, r∈A, then (A',R',r) is the graph defined by r if and only if
(1) ∀x∈A, x is in A' if and only if x is a descendant of r
(2) ∀x∈A, and ∀n∈N, R'(x,n) = R(x,n) if and only if x∈A'

Remarks

This says that any node from a state may be used to define a graph, that the graph will contain all the nodes reachable from that root node, and that it will contain all the arcs connecting nodes in the graph. In more standard terminology, the graph defined by r is the subgraph induced on the set of descendants of r.

It is entirely possible that a graph will have several roots: that is, that there exist G'=(A',R',r') where r≠r'. In such cases, the graph necessarily contains a cycle, of which r and r' are part, since by definition r is a descendant of r', r' is a descendant of r, and so r is a descendant of itself.

A DAG (directed acyclic graph), therefore, is a graph in which no node is its own descendant (equivalently, ancestor). A tree is a DAG in which no node has more than one parent (to be a graph in this model, all but the root node must have one parent).

Example

G as in the previous example, G' the graph defined by node 2:

[Diagram: G', the nodes 2, 3, 4 and 6 drawn with an arrow for each arc listed under R' below]

A' = {2,3,4,6}
R': (2,1) -> 4   (2,2) -> 6   (2,3) -> 3
    (4,1) -> 2

In state G of the above example, both nodes 2 and 4 define the graph G'; the node 3 defines a trivial graph consisting of itself.
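As a rough sketch of the graph defined by a node, the following collects the nodes reachable from r together with their arcs. The dictionary representation and the names graph_defined_by and is_dag are assumptions for illustration. The check is_dag reads "no node is its own descendant" as "no node can be reached again by following one or more arcs", which is an interpretation rather than something the definitions state outright.

```python
# Assumption: a state is a dict from node to the ordered list of its arc targets.
STATE = {1: [2, 3], 2: [4, 6, 3], 3: [], 4: [2], 5: [4, 1], 6: []}

def graph_defined_by(state, r):
    """Return (R', r): the arcs of every node reachable from r, and the root r."""
    seen, stack = set(), [r]
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(state[x])
    return {x: list(state[x]) for x in seen}, r

def is_dag(state, r):
    """True if no node of the graph defined by r is reachable from its own child."""
    graph, _ = graph_defined_by(state, r)
    return all(x not in graph_defined_by(graph, c)[0]
               for x in graph for c in graph[x])
```

Applied to the example state, nodes 2 and 4 yield the same node set, which illustrates the remark that a graph with several roots must contain a cycle.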
We can now define left to right tours of graphs as follows.

Definition (left to right tour)

If G = (A,R,r) is a graph, f is a function taking arguments from A, then, given the following algorithm,

    left-to-right(x,i):
        {if i=1 then f(x);}
        if R(x,i) ≠ 0 then
            left-to-right(R(x,i), 1);
            left-to-right(x, i+1)
        {else f(x)}

evaluation of left-to-right(r,1) will result in a left to right tour of G. The two optional calls of f(x), only one of which should be present, will result in a preorder and a postorder tour, respectively, of the nodes of G.

Remarks

Where r defines a tree, left-to-right(r,1) is guaranteed to visit every node in the tree exactly once. Where r defines a DAG, every arc is traversed exactly once. In the example T8 on page 46, a node labelled x is visited once for every occurrence of identifier x in the program represented by the DAG, although each such node will be visited several times: once for its use in the definition of a distinct variable x, and once for each use referring to that definition.

If r defines a graph with cycles, the algorithm will not terminate: since graph pattern matching involves graph traversal, the question of termination in pattern matching over graphs must be addressed. This will be done in section 4.2.2, where general graph tours, for the purposes of syntactic pattern matching, are shown to be no more complex than DAG tours.

We are now ready to consider graph transformations.
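The tour algorithm above transcribes almost directly into a runnable sketch. The following assumes a state stored as a dictionary from node to ordered arc targets; the parameter order, which selects between the two optional calls of f, is a hypothetical addition. Like the original algorithm, it does not terminate on a graph with cycles.

```python
# Assumption: a state is a dict from node to the ordered list of its arc targets.
def left_to_right(state, x, f, order="pre", i=1):
    """Tour the graph defined by x; order is "pre" or "post"."""
    if order == "pre" and i == 1:
        f(x)                                   # first optional call: preorder
    arcs = state[x]
    if i <= len(arcs):                         # R(x,i) is not 0
        left_to_right(state, arcs[i - 1], f, order, 1)
        left_to_right(state, x, f, order, i + 1)
    elif order == "post":
        f(x)                                   # second optional call: postorder

TREE = {1: [2, 3], 2: [4, 5], 3: [], 4: [], 5: []}
SHARED = {1: [2, 2], 2: []}                    # a DAG: node 2 has two arcs into it

pre, post, shared = [], [], []
left_to_right(TREE, 1, pre.append, "pre")
left_to_right(TREE, 1, post.append, "post")
left_to_right(SHARED, 1, shared.append, "pre")
```

In the small DAG, the shared node is visited once for each arc directed at it, which is the behaviour the remarks describe for identifier nodes.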
They are called graph transformations, because the way we will be using them in the remainder of this chapter will involve the alteration of individual graphs, where a graph is, as defined above, some node and the collection of arcs and nodes "dangling" from it. However, it will be convenient to define a graph transformation as a state transformation. (This is analogous to the state transforming properties of ordinary assignment in programming languages: compare Floyd [42] and Hoare [59].) Individual graphs, defined by nodes in the state, will have been changed, as a result of a transformation, into differently "shaped" graphs in the resulting state.

Definition (graph transformation)

Let S = (A,R) be a state, r,r'∈A; then the operation

    r => r'

has the effect of transforming S into a new state S' = (A,R') where ∀x∈A, and ∀n∈N,

    R'(x,n) = R(x,n)   if R(x,n) ≠ r
    R'(x,n) = r'       if R(x,n) = r

Remarks

The operation of graph transformation is defined here in terms of nodes r,r' in state S. However, since any node in a state is the root of some graph, we are really discussing the transformation of all graphs containing as subgraph the graph defined by r into new graphs wherein the graph defined by r' now occupies the (syntactic) position previously occupied by the graph defined by r.
Example

S = (A,R)

[Diagram: the state S, the nodes 1 to 9 drawn with an arrow for each arc listed under R below]

A = {1,2,3,4,5,6,7,8,9}
R: (1,1) -> 2   (1,2) -> 3
   (2,1) -> 4   (2,2) -> 6   (2,3) -> 3
   (4,1) -> 2
   (5,1) -> 4   (5,2) -> 1
   (7,1) -> 8   (7,2) -> 9
   (9,1) -> 8
   all others -> 0

The transformation 2 => 7 has the effect of transforming S into S' = (A,R') such that

R': (1,1) -> 7
    (4,1) -> 7
    all others as for R

[Diagram: the state S' = (A,R'), in which the arcs out of nodes 1 and 4 that were directed at node 2 are now directed at node 7]

Discussion

Since the ensuing material will treat graphs that will be only slight variations of trees, and in particular graphs that will be rooted at a specific node, a transformation like "2 => 7" will be conveniently represented by the actual graphs being transformed. For example,

[Diagram: 2 => 7, with nodes 8 and 9 placed below 7 and an arrow from 9 to 8]

Since a transformation will often involve a pattern match, it is more often meaningful to represent as much of the graph being transformed -- possibly the whole subgraph defined by the node being transformed -- as is necessary to specify the pattern match that has taken place. For example,

[Diagram: node 2 drawn with part of the subgraph it defines => node 7 drawn with its children 8 and 9]

In all representations of transformations it will be clear which are the root nodes and what the hierarchical relationship between the nodes is, either by placing children directly below parents (as 8 and 9 are placed below 7), or by using arrows to indicate the relationship parent-->child (as 9-->8, and 4-->2).
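The operation r => r' amounts to redirecting every arc aimed at r so that it is aimed at r' instead. A minimal sketch, assuming a state held as a dictionary from node to ordered arc targets (the name transform is hypothetical), applied to the nine-node example state S:

```python
# Assumption: a state is a dict from node to the ordered list of its arc targets.
S = {
    1: [2, 3],
    2: [4, 6, 3],
    3: [],
    4: [2],
    5: [4, 1],
    6: [],
    7: [8, 9],
    8: [],
    9: [8],
}

def transform(state, r, r_new):
    """Return S' for the operation r => r_new; the original state is not modified."""
    return {x: [r_new if y == r else y for y in arcs]
            for x, arcs in state.items()}

S_PRIME = transform(S, 2, 7)
```

Note that node 2 and its own out arcs remain in the state; only the arcs directed at it change. Returning a new dictionary rather than updating in place is a choice made for this sketch, not something the definition requires.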
In much of the material on graph transformation, we will want to put a graph pattern on the left of the transforming arrow, with variables at some of its leaf nodes to represent entire, but unspecified subgraphs. These variables may then reappear on the right, and it is to be understood that they represent the same subgraph on the right as on the left, much as variables represent the same value on either side of an equation. Graph patterns will allow us to designate entire classes of transformations -- instead of merely individual transformations such as in the above example; they will eventually form a part of a graph transformation programming language, in a manner similar to the use of variables in conventional programming to designate entire classes of values and manipulations of these classes, instead of merely individual values (constants). Variables in patterns will be represented as italic (underlined) symbols, as follows:

Definition (graph patterns)

Let P = (A,R,S,L) be a state, S = V ∪ C where V is a set of variable (italic) symbols, and C a set of constant (roman) symbols. If L: A -> S is a function which labels the nodes of A in such a way that ∀a∈A, L(a)∈V only if a has no children, then a graph pattern is any graph G=(A',R',r) such that r∈A and ∃a∈A' such that L(a)∈V.
Example

[Diagram: a node labelled while with two children, labelled expression and statement]

G = (A,R,1)
A = {1,2,3}
R: (1,1) -> 2   (1,2) -> 3
L: 1 -> while   2 -> expression   3 -> statement
V = {expression, statement}
C = {while}

Notice that neither node 2 nor node 3 has children in P.

Discussion

In a programming language setting, the above example represents the abstract syntax of the class of "while" statements. In a pattern match (see section 4.2.2) G may be matched successfully against any graph G' whose root node is labelled "while" and has exactly two children. After such a match, the subgraph of G' defined by the left child of its root node will be associated with (assigned to) the variable expression, while the subgraph defined by the right child will be associated with the variable statement.

The discussion of graph transformation presented in this chapter will often describe transformations in this form:

[Diagram: a node ** with children v and 2 => a node * with children v and v]

(a diagram intended to represent the transformation of any expression "v**2" into "v*v", an example of operator strength reduction). It is intended to represent a class of transformations such that, if a particular subgraph is substituted for every occurrence of a particular variable symbol (in this case, v) throughout the diagram, a particular instance of the transformation will have been generated. For example,

[Diagram: ** with children var and 2 => * with children var and var]

but also

[Diagram: ** with children + and 2, where + has children var(a) and var(b) => * with two children +, each having children var(a) and var(b)]

( x**2 => x*x ; (a+b)**2 => (a+b)*(a+b) ).
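A pattern match followed by a substitution, as in the strength reduction example above, might be sketched like this. Everything about the representation is an assumption made for illustration: a labelled state maps each node to a (label, children) pair, a pattern is a nested tuple whose first element is a constant label, and a bare string stands for a variable. The print node in the sample data is a hypothetical context, included only to show an arc being redirected.

```python
# Assumption: a labelled state maps node id -> (label, ordered list of child ids).
# Patterns are nested tuples (label, child, ...); a plain string is a variable.

def match(nodes, pattern, x, env):
    """Match pattern against the graph defined by node x, binding variables in env."""
    if isinstance(pattern, str):               # variable leaf: bind the whole subgraph
        return env.setdefault(pattern, x) == x
    label, kids = nodes[x]
    if label != pattern[0] or len(kids) != len(pattern) - 1:
        return False
    return all(match(nodes, p, k, env) for p, k in zip(pattern[1:], kids))

def build(nodes, template, env):
    """Create the right-hand side; a variable reuses its bound node, nothing is copied."""
    if isinstance(template, str):
        return env[template]
    kids = [build(nodes, t, env) for t in template[1:]]
    new_id = max(nodes) + 1
    nodes[new_id] = (template[0], kids)
    return new_id

def rewrite_at(nodes, x, lhs, rhs):
    """Apply lhs => rhs at node x; return the replacing node, or None if no match."""
    env = {}
    if not match(nodes, lhs, x, env):
        return None
    new = build(nodes, rhs, env)
    for k, (label, kids) in list(nodes.items()):   # the operation x => new
        nodes[k] = (label, [new if y == x else y for y in kids])
    return new

LHS = ("**", "v", ("2",))
RHS = ("*", "v", "v")

nodes = {
    1: ("print", [2]),      # hypothetical context node
    2: ("**", [3, 4]),
    3: ("var", [5]),
    4: ("2", []),
    5: ("x", []),
}
new_root = rewrite_at(nodes, 2, LHS, RHS)
```

Because the variable v is bound to a node rather than to a copy of a subgraph, both arcs of the new * node are directed at the same node.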
Notice that, since no copies are made according to the definition, only substitutions, the transformation pattern

[Diagram: ** with children v and 2 => * with both of its arcs directed at a single node v]

is entirely equivalent, and perhaps gives a clearer picture of the resulting structure of the graph representation of the program fragment.

This chapter is divided into four main topics: section 3.1 treats plain translation, that is, the re-expression of a program written in one language as an equivalent program in another language; section 3.2 treats optimization in a variety of forms; section 3.3 treats error handling; and section 3.4 shows that a graph transformation treatment is not out of keeping with recent developments in programming language research, so that it is not only a uniform treatment for traditional translation concerns, but promises to continue to be applicable at least into the foreseeable future.

3.1 Translation

In this section, the discussion is intended to encompass those aspects of translation that may be thought of as the re-expression of a program written in one language as an equivalent program in another.
Traditionally, compilation is the most common form of translation, primarily because it is the most useful: in almost all cases, the ultimate destination of a program is to be run on a machine, and this is possible only if the program is compiled -- that is, if the program is translated into machine code, specifically in the form of a relocatable object module. But we will also consider, as special cases of this process, language extension, where a programming language includes some facility for expressing sophisticated constructs in terms of base concepts, and other translation techniques where the object "language" is not machine code but some other programming language, or an intermediate form between programming language and machine code (an abstract language).

This section is divided, primarily for convenience, into three parts which may be thought of as treating distinct aspects of translation, but in fact will display so many cross-currents that they form a largely unified whole. This lack of clear separation between these aspects is, indeed, a function of the treatment given them here in terms of syntactic transformations, and is one more argument in favor of a uniform treatment such as this; for where apparently distinct concepts can be shown to be related in a specific way, a simplification has occurred which allows us to save time and may, in the long run, provide a basis for other, possibly more profound insights.
In fact, there will be cross-currents with other sections as well; particularly with section 3.2, where many program improvements will be seen in the same light that illuminates translations here.

3.1.1 Language extension

Simpler than compilation and related translations, and therefore an easier introduction to the topic, are those aspects of programming languages that give the programmer the ability to amplify the expressive power of the language.

The notion of a macro instruction -- a programmer defined instruction expressed in terms of a combination of more primitive instructions -- must have appeared early and quite anonymously in several places at once. It is a simple labor saving device which permits the programmer to write a commonly used sequence of instructions once and subsequently to invoke this sequence by name. The concept was first discussed within a consistent, general framework by McIlroy in 1960, and it was specifically recommended by him as a useful concept in high-level programming languages [82]. In fact, Algol 58 (or IAL) had an inchoate macro facility in the do statement, which was defined as a textual substitution of the statement body wherever its name appears in a do ([96], p 16).

Early general purpose macro processors, standing on their own without a specific host language, were defined by Strachey [120] and Mooers [89].
These were intended as a program preprocessor (Mooers), and as an implementation language for an intermediate level abstract language (Strachey: we will have cause to examine this idea more closely in section 3.1.3), but have been used for everything from a semantics implementation language [1] to a text processing language in a word processing system [128].

Macro processors like Mooers' TRAC are perhaps too primitive to add any genuine convenience to programming, especially because they are usable only as preprocessors to a compiler. In both TRAC and Strachey's GPM, the macro is invoked as a bracketed sequence of strings; for example,

    #(AD, #(N), 2)

(an example from TRAC, adding 2 to the quantity previously defined as N and delivering the result as its value). More sophisticated macro processing languages, which attempt to emulate the Algol family model of statement form, were defined by Leavenworth [75] and Brown [18]. These fall short of a context-free syntactic form, but will deal with recursive structures of the form

    WHILE a DO b OD
    IF a THEN b ELSE c FI

(note closing "parentheses" OD and FI). Attempts to use languages of this form as a basis for defining compilers continue to be made (recent examples are suggestions by Tanenbaum [122] and Sassa [111]), but restrictions on the syntax of macro templates, such as the requirement that statements be closed with parenthetic markers, place additional restraints on the use of these processors, on top of the usual limitations of context free syntax in dealing with identifiers.
A recent use of a one-language oriented "macro" processor is Kernighan's Algol 60-like extension of Fortran syntax through the Ratfor preprocessor [68]. (See also Hanson [56], for an evaluation of the adaptability of this concept to a rationalized SNOBOL preprocessor.)

Macro processors have been used successfully in porting software, and have been applied systematically in this context by Waite et al. [134,135]. The success of macro processors in this kind of software tool application stems in part from its being aimed at a small group of sophisticated users, primarily in one-shot programming efforts such as language ports, and in part from restrictions on the range of the processor's application, primarily to a specific language. Waite's group furthermore imposed a line-oriented form on the macro statements, which saves their processor the effort of a complete context free parse.

Languages specifically designed to be extensible bear a significant resemblance to macro processors. Algol 68 permits the programmer to define new operators, but not new statement types. Garwick's GPL takes the Algol 68 concept a step further to include ML/I style "parenthesized" statement forms [46]. Fraley's "unlanguages" [44], Basili's "families" of languages [9], and the "layers" of language of Geiselbrechter et al. [47], are also formulations of the concept of extensibility.
The idea of abstraction as developed by Wulf and others, primarily as a basis for stratifying proofs of program correctness, incorporates many of these ideas in a uniform manner and will be discussed, in a more appropriate context, in section 3.4.

To place macro processing and related concepts in the context of translation through syntactic transformation, let us consider a syntactic macro facility for programming languages along the lines of Leavenworth's proposal. Such a macro facility, embedded in a simple programming language, would allow us to write programs of the form:

    begin
        macro while a do b od is
            L: if a then b; goto L fi
        mend;
        while x>y do x := x-n od
    end                                                    [P4]

We expect this program to have the same effect as the program

    begin
        L: if x>y then x := x-n; goto L fi
    end

In effect, the macro definition has added to the translator a transformation rule of the form

[Diagram T9: a node while with children c and s => a node if-then with children c and seq, where seq has children s and goto, and the arc out of goto is directed back at the if-then node]

We can think of all varieties of macro substitution in this way.

To take a concrete example, Pascal's type definition facility (a useful concept first introduced systematically in Algol 68, and promised longevity by, for instance, its recent inclusion in the design of Ada) may be seen in the same light.
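The while macro of program P4, read as the transformation rule T9, can be sketched as a graph rewrite. The sketch below rests on several assumptions: a labelled state maps each node to a (label, children) pair, the statement labels are plain strings, the block node is a hypothetical stand-in for the begin ... end of P4, and the shape of the right-hand side follows one reading of the T9 diagram, in which the goto node's arc is directed back at the if-then node.

```python
# Assumption: a labelled state maps node id -> (label, ordered list of child ids).

def expand_while(nodes, x):
    """Rewrite while(c, s) at node x into if-then(c, seq(s, goto -> if-then))."""
    label, kids = nodes[x]
    if label != "while" or len(kids) != 2:
        return None                            # the pattern does not match at x
    cond, body = kids
    base = max(nodes)
    if_id, seq_id, goto_id = base + 1, base + 2, base + 3
    nodes[if_id] = ("if-then", [cond, seq_id])
    nodes[seq_id] = ("seq", [body, goto_id])
    nodes[goto_id] = ("goto", [if_id])         # arc back to the label L: a cycle
    for k, (lab, arcs) in list(nodes.items()):  # the operation x => if_id
        nodes[k] = (lab, [if_id if y == x else y for y in arcs])
    return if_id

program = {
    1: ("block", [2]),          # hypothetical node for begin ... end
    2: ("while", [3, 4]),
    3: ("x>y", []),
    4: ("x := x-n", []),
}
loop = expand_while(program, 2)
```

The resulting graph contains a cycle, so a plain left to right tour over it would not terminate; the bound subgraphs for the condition and the body are shared with the original while node rather than copied.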
A definition of the form

    type complex = record r,i : real end

is tantamount to declaring a transformation rule like

[Diagram: a node id with child complex => a node record with child field, where field has children real and idlist]

This use of "simplifying" transformations to change a high level program into an equivalent program containing only primitive or low level concepts will be seen again in section 3.1.3, where it is shown to lie at the back of many schemes for implementing translators, and in section 3.4, where it is used to implement the concept of abstraction.

The use of macro processors and their place in programming languages is described by Cole [25]; and a bibliography so structured as to emphasize the relationship between macro processing and language extension was recently published by Metzner [88].

3.1.2 Language-to-language translation

In a sense the title of this section is too inclusive: every translation is from one language to another, perhaps abstract, perhaps restricted form of the same language. In another sense it is too exclusive: if we take "language" to denote an actual programming language, then not only are abstract intermediate forms, which will be deferred until the next section, excluded, but so are the relocatable object modules generated by compilers, which are to be considered as one major aspect of compiling, in this section.
However, as was already indicated, the dividing line between these topics is extremely flexible: it is difficult to decide whether Ratfor, for example, represents a language extension mechanism, a new language which is defined in terms of, and translated into, Fortran, or a basis for highly portable programs; in practice, it is a little of all three, and ought to be mentioned in all three sections.

In this section, we will consider primarily compilation. As a simplifying assumption, we will make no distinction between translations which result in a relocatable object module (true compilation) and those which produce assembly language statements or even high level language output -- as frequently happens in the case of experimental implementations to try out the still flexible design of a new language. Just as this chapter does not concern itself with the exact method by which a string of text is turned into a graph representation of the program's syntactic structure, so it does not concern itself with the exact method whereby a translation is made to conform to the characteristics of any particular machine or operating system.

An increasingly important form of language-to-language translation is program porting.
A program written in one language is to be implemented on a machine supporting only another language: where the program is a compiler, written in the language it implements, the process of porting is known as bootstrapping. Since so many languages may be described as "Algol 60-like" and indeed differ from each other only in small but syntactically subtle aspects, such a translation is essentially a straightforward one, but it is likely to be frustrated by the line, and unstructured string, orientation of most text editors and text processing languages. Even minor differences in language can make this a major task without highly sophisticated tools: Sabin notes his experience that porting Fortran programs between different machines is made nontrivial by widespread deviation from the standard, and frequent gaps in the standard; the best tools available to him were poorly suited to this task [108]. I can add my own experience to this; and indeed, it was a consideration of the unsuitability of available tools -- on the one hand of file editors and on the other of translator writing systems -- to this straightforward, and extremely common language processing task that triggered the research being reported here.
We will consider three specific problems in translation that are usually given considerable attention in textbooks: one is expression translation without optimization, one of the places where syntax directed translation is at its most applicable; another is variable translation; and the third is control flow expression translation.

(a) Simple expression translation

Consider an expression of the form

    a × (b + c) ÷ d

which we will represent by a graph of the form

                 ÷
            _____|_____
           |           |
           ×          var            [T10]
        ___|___        |
       |       |       d
      var      +
       |     __|__
       a    |     |
           var   var
            |     |
            b     c

If we were translating to a stack machine, the object code, in symbolic form, would look something like this:

    load a
    load b
    load c
    add
    multiply
    load d
    divide

Such a translation could be accomplished easily enough by a set of transformation rules of the form

      +                   seq
     / \        =>      /  |  \
    X   Y              X   Y   add

      ×                   seq
     / \        =>      /  |  \
    X   Y              X   Y   multiply

      ÷                   seq
     / \        =>      /  |  \
    X   Y              X   Y   divide

     var
      |         =>     load(x)
      x

Applying these rules, in any order, to T10 results in a graph of the form

                      seq
             __________|__________
            |          |          |
           seq      load(d)     divide
      ______|___________
     |        |         |
  load(a)    seq     multiply
         _____|_____
        |     |     |
    load(b) load(c) add

A simple left to right tour over this graph, printing out the label of each leaf node, suffices to produce the assembly language form of the program.
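The rules and the leaf tour can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the thesis: trees are modelled as nested tuples, the names `OPCODES`, `transform` and `leaves` are hypothetical, and the rules are applied in one fixed recursive order even though the text allows any order.

```python
# Trees are nested tuples: ("var", name) or (operator, left, right).
OPCODES = {"+": "add", "*": "multiply", "/": "divide"}

def transform(node):
    """Rewrite an expression tree into seq nodes whose leaves are instructions."""
    if node[0] == "var":                 # rule: var(x) => load(x)
        return f"load {node[1]}"
    op, left, right = node               # rule: op(X, Y) => seq(X, Y, opcode)
    return ("seq", transform(left), transform(right), OPCODES[op])

def leaves(node):
    """Left to right tour that yields the label of each leaf node."""
    if isinstance(node, str):
        yield node
    else:
        for child in node[1:]:           # skip the "seq" label itself
            yield from leaves(child)

# a * (b + c) / d, the expression behind graph T10
T10 = ("/", ("*", ("var", "a"), ("+", ("var", "b"), ("var", "c"))), ("var", "d"))
code = list(leaves(transform(T10)))
print("\n".join(code))
```

The printed listing is the stack machine sequence given in the text, from `load a` through `divide`.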
This much is pure syntax directed translation and, if the target machine does not operate on a stack model (few machines do), these instructions can be treated as macros and further amplified by the transformation mechanism demonstrated in section 3.1.1; they will not, in general, be very efficient in that form, though few compilers bother to do better, but we will leave for section 3.2.3 a discussion of how such translations can be improved.

(b) Variable translation

In the above example, we have left the variables a, b, c in their symbolic form. If the translation is to proceed to machine code form, symbols will have to become actual memory locations, or algorithms for accessing the stack. Consider the program example used to demonstrate nested definitions on page 31:

    begin int x ; begin real x ; x end ; x end        [P3]

This is represented as:

    [Graph T11: a block node with a stms subtree, whose variable references point by arrows at two var nodes, (int x) and (real x).]

Without the complications created by procedure calls, we can choose simply to give each of these variables a unique memory location, and not bother with implementing a stack.
In a tour of this structure, every time a block node is encountered, first the declarations and then the statements are processed; a global variable loc keeps track of the number of variables (memory locations) declared, and encountering a variable declaration triggers the transformation:

       var                  var
      /   \       =>      /  |  \
   type    v           type  v   loc

along with an updating of the value of the variable counter ("loc := loc + 1"). Notice that, in this respect, translation remains primarily syntax directed: it is convenient to translate the program in a left to right order, though not strictly necessary; in many cases, a strict ordering of some form, though not necessarily a left to right ordering, is crucial to a correct translation. Since the block node's only function is to separate declarations from statements, we can also include the transformation

      block
      /   \       =>     s
   dcls    s

The result of applying these transformations is the program

            stms
           /    \
        var      var
       / | \    / | \
    real x  2  int x  1

This corresponds exactly to the semantics of the original program: first evaluate the real variable (which was the second variable declared); then evaluate the int variable (which was declared first).
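The tour just described can be sketched as follows. The representation is an assumption made for illustration only: each var node is a Python dict shared by reference between its declaration and its uses (standing in for the arrows of the graph), a block is a tuple `("block", decls, stms)`, and the names `translate`, `int_x`, `real_x`, `inner` and `outer` are hypothetical.

```python
from itertools import count

def translate(node, loc):
    """Flatten a block into its statements, assigning a location to each declaration."""
    if isinstance(node, dict):           # a use of a variable: a reference to its var node
        return [node]
    _, decls, stms = node
    for var in decls:                    # rule: var(type, v) => var(type, v, loc)
        var["loc"] = next(loc)           # also performs "loc := loc + 1"
    flat = []
    for stm in stms:                     # rule: block(dcls, s) => s
        flat.extend(translate(stm, loc))
    return flat

# begin int x ; begin real x ; x end ; x end   (program P3)
int_x = {"type": "int", "name": "x"}
real_x = {"type": "real", "name": "x"}
inner = ("block", [real_x], [real_x])
outer = ("block", [int_x], [inner, int_x])

stms = translate(outer, count(1))
for var in stms:
    print(var["type"], var["name"], var["loc"])
```

Because each use is the same object as its declaration, no symbol table lookup is needed: the location assigned while processing the declarations is already visible at every use.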
Notice the power of a graph representation of the program: processing the declarations, which consists in assigning each declared variable a location, had the global effect of allowing us to process the statements in the full knowledge that locations had been assigned to every variable appearing in them. This is normally accomplished by symbol table access, but here we have been able to discuss the process of variable translation without reference to any concept like symbol table, but purely in terms of graph transformations.

Consider now the translation of an assignment statement:

    x := x + 1

which we will represent as a graph

           :=
         /    \
      var <---- +
     / | \       \
  int  x  27     const
                /  |  \
              int  1   96

Here, the declarations have already been processed, and the variable named x has been assigned location 27, while the constant with value 1 has been assigned location 96.⁴ It is in the nature of the assignment operator not to treat its left and right hand sides alike: the variable on the left represents a location, whereas a variable on the right (even, as in this case, the same variable that occurred on the left) represents a value.⁵ Therefore, in a tour of this graph, encountering the assignment operator will cause different rules to be applied to the left and right subtrees.
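The different treatment of the two sides of an assignment can be sketched in Python. The tuple encoding and the name `emit` are assumptions made for this illustration; the mnemonics `lda` (load address), `lod` (load value) and `sto` (store) are the ones the text uses in its linear listing of this example.

```python
def emit(node, as_address=False):
    """Translate an assignment graph into a linear instruction list."""
    kind = node[0]
    if kind == ":=":
        _, target, expr = node
        # the left side is translated as a location, the right side as a value
        return emit(target, as_address=True) + emit(expr) + ["sto"]
    if kind == "+":
        _, left, right = node
        return emit(left) + emit(right) + ["add"]
    # var(type, id, loc) and const(type, val, loc) both reduce to their location
    loc = node[3]
    return [f"lda {loc}" if as_address else f"lod {loc}"]

x = ("var", "int", "x", 27)
one = ("const", "int", 1, 96)
program = emit((":=", x, ("+", x, one)))   # x := x + 1
print("\n".join(program))
```

The same var node appears on both sides, yet it is emitted once as an address and once as a value, which is exactly the asymmetry the assignment operator imposes.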
We can express this as the set of rules:

      :=                   seq
     /  \        =>      /  |  \
    v    e              @   e   sto
                        |
                        v

      +                    seq
     / \         =>      /  |  \
    X   Y               X   Y   add

        var
      /  |  \    =>     loc
   type  id  loc

       const
      /  |  \    =>     loc
   type val  loc

Applying these rules -- again, in any order -- transforms the original program graph into the graph

             seq
         _____|_______
        |      |      |
        @     seq    sto
        |    __|___
        |   |   |  |
       27<--+  96 add

When we come to print this out in a sequential form, it is a simple matter to print an "@" node over loc as "lda loc", and a plain "loc" as "lod loc", which represent, respectively, loading the address loc and loading the value at address loc. A linear version of the program would then be:

    lda 27      load location of x
    lod 27      load value of x
    lod 96      load constant 1
    add         add x and 1
    sto         store sum in location of x

⁴ Constants are not normally declared explicitly: because they are constants, their occurrence in the program is enough to declare implicitly their type and value. It is general practice, however, to treat all such implicit declarations like explicit declarations -- usually by recording them in a constant table analogous to a symbol table: this saves space for frequently occurring constants.

⁵ Once again, it was Algol 68 which first made this distinction clear by treating assignment like an operator, and by treating operators as procedures. In the case of the integer assignment operator, the left operand is of type ref int (integer variable) whereas the right operand is of type int (integer value).

(c) Statement translation

We have already seen the reduction of a statement to simpler forms in the section on language extension.
There, the while-do statement was re-expressed as an if-then-else and goto combination. This is the usual approach to translating from a high level language with structured statements whose control flow is implicit, to a low level language whose control flow is explicit in its jump commands. It is noteworthy that, for example in the transformation on page 70, the use of a graph representation eliminates the need for an actual jump label. The advantage of this form of representation is quite obvious. Consider, for example, the expression

    if a then b else c fi        [P5]

On any standard machine architecture, this must be reduced to something of the form

        a
        jumpfalse L1             [P6]
        b
        jump L2
    L1: c
    L2: noop

The sticky part of the above as a definition of a translation is the sudden appearance of the labels L1 and L2. We must answer the questions: Where do they come from? and, How can we be sure they do not occur elsewhere in the program? We would have to provide (and define) a mechanism for generating new labels that, like the use of symbol tables, is external to the translation process; and, if the translation is aimed at producing a relocatable object deck, these considerations become quite extraneous. However, consider the transformation:

    if-then-else             seq
     /    |    \      =>      |-- a
    a     b     c             |-- jumpfalse ----+
                              |-- b             |
                              |-- jump ------+  |
                              |-- c <--------|--+
                              |-- noop <-----+

The object graph looks strikingly like a flowchart.
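A minimal Python sketch of this idea follows. It assumes a representation that is not in the thesis: each instruction is a `Node` object, and a jump holds a direct reference to its target node instead of a label. The names `Node`, `if_then_else` and `linearize` are hypothetical. Labels are invented only at the moment the graph is printed in linear form.

```python
from itertools import count

class Node:
    def __init__(self, op, target=None):
        self.op = op                 # instruction or operand text
        self.target = target         # for jumps: the Node jumped to, else None

def if_then_else(a, b, c):
    """Rewrite if-then-else(a, b, c) into a seq whose jumps are graph edges."""
    else_part, end = Node(c), Node("noop")
    return [Node(a), Node("jumpfalse", else_part), Node(b),
            Node("jump", end), else_part, end]

def linearize(seq):
    """Print a seq, creating one fresh label per node that is a jump target."""
    fresh, labels = count(1), {}
    for node in seq:
        if node.target is not None and node.target not in labels:
            labels[node.target] = f"L{next(fresh)}"
    lines = []
    for node in seq:
        prefix = f"{labels[node]}: " if node in labels else ""
        operand = f" {labels[node.target]}" if node.target is not None else ""
        lines.append(f"{prefix}{node.op}{operand}")
    return lines

listing = linearize(if_then_else("a", "b", "c"))
print("\n".join(listing))
```

Since the labels exist only inside `linearize`, the transformation itself never has to ask where they come from or whether they clash with labels elsewhere in the program.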
We have answered the questions about labels L1 and L2 by removing them, and demonstrating graphically that the jumps are local and are not affected by jumps elsewhere in the program (this is a result of the definition of graph transformation presented at the beginning of this chapter). In a way, this elimination of labels is not surprising: formal treatments of program evaluation -- for example, Turing machines -- often treat programs like directed graphs whose nodes represent commands, and whose edges represent flow of control [8]; flow charts, a once popular form of program design methodology, were generally reduced to linear forms (Fortran programs, Cobol programs, etc.) by a liberal use of jump commands [28];⁶ some formalizations of program control flow have depended heavily on directed graph representations to show the underlying "machine" structure.⁷

However, we have introduced a notation that is out of fashion in programming circles, and it needs an apology. For the duration of this chapter, we will have to put up with occasional "spaghetti" of this sort: in the next chapter we will return to the linear program form from which it derives, and will do so within a discipline that leaves well-defined the relationships between such program fragments and the larger program that contains them, and also defines the underlying structure that is being manipulated by a transformation such as that from P5 to P6. The "spaghetti" will be minimal, however, because we will be dealing with small program fragments in which jumps are localized, not, as is the case with flowcharts used in misguided program design disciplines, a complex network of jumps representing an entire program.

⁶ An interesting variant of this is the reduction of a structured Ratfor program to standard Fortran by the disciplined introduction of jumps [68].

⁷ Greibach, for example, treats equivalence of program schemes (flowcharts) in terms of legal transformations turning one kind of scheme into another [52].

Let us now look at the common syntax directed translation technique used to translate to quadruples (a kind of abstract machine code used as an intermediate form for many syntax directed schemes). In such a scheme, semantic actions (programs) are attached to syntactic productions. For example, the rule defining the if-statement might look like this:

    S ::= if B then S(1) else S(2)
          { gen( B );
            j1 := code( jumpfalse, nil );
            gen( S(1) );
            j2 := code( jump, nil );
            l1 := gen( S(2) );
            l2 := code( noop );
            patch( j1, l1 );
            patch( j2, l2 ) }

Here, the procedure call "gen(X)" causes code to be generated for the syntactic unit X, and the procedure call "code(...)" generates an actual object code statement; both these procedures return an address for the code generated.
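The semantic action above can be mimicked in Python to see the backpatching at work. This is a sketch under stated assumptions, not the scheme of any particular compiler: addresses are indices into a list, conditions and simple statements are plain strings emitted as single instructions, and the class name `Emitter` and the tuple form `("if", B, S1, S2)` are hypothetical.

```python
class Emitter:
    def __init__(self):
        self.buffer = []                       # entries are [opcode, target or None]

    def code(self, op, target=None):
        """Emit one instruction and return its address."""
        self.buffer.append([op, target])
        return len(self.buffer) - 1

    def patch(self, jump_addr, label_addr):
        """Fill in the target of a jump that was emitted with no target."""
        self.buffer[jump_addr][1] = label_addr

    def gen(self, unit):
        """Emit code for a unit and return the address of its first instruction."""
        if isinstance(unit, str):
            return self.code(unit)
        _, b, s1, s2 = unit                    # ("if", B, S1, S2)
        start = self.gen(b)
        j1 = self.code("jumpfalse")            # target not yet known
        self.gen(s1)
        j2 = self.code("jump")                 # target not yet known
        l1 = self.gen(s2)
        l2 = self.code("noop")
        self.patch(j1, l1)
        self.patch(j2, l2)
        return start

emitter = Emitter()
emitter.gen(("if", "a", "b", "c"))
for addr, (op, target) in enumerate(emitter.buffer):
    print(addr, op, "" if target is None else target)
```

Both jumps sit in the buffer with an empty target until the end of `gen`, which is the late patch that the label-free graph form makes unnecessary.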
The procedure call "patch(S,L)" backpatches statement S, a jump statement, by filling it out with label L; statement S had been generated earlier with only a "nil" in place of its target label. (Compare this with Aho and Ullman [3], pp 273-280; and with Gries [53], pp 292-293.) This process, too, creates a directed graph like the one generated by transformation T9 (p 70), but it does so clumsily, by slipping in a patch at the end. For this translate-as-you-parse technique to work at all, the grammar must often be cluttered up with extra productions, or pulled out of shape (compare Aho and Ullman [3], p 277).

3.1.3 Translating in phases

Although the syntax directed translation model is primarily aimed at structuring one-pass translators, the concept of translating a programming language in several stages has a considerable history also, and is not entirely incompatible with the syntax directed approach. The idea of an intermediate language, well suited to being at once the target language for high level language translations and a universal language for programming computers, goes back to at least 1958, when initial efforts were made toward the definition of such a language, tentatively named Uncol (Universal compiler oriented language, or, seen from the other end, Universal computer oriented language). Several years later the effort was floundering in the face of immense obstacles presented by the diversity in both computers and programming languages.
Bagley, writing in 1962, suggests reasons why such a definition might be expected to fail to come into existence [7]. His analysis, seen after the fact, is accurate: Uncol, though often mentioned, and frequently "rediscovered" under some new name or formulation, has not and, it takes no prophet to see now, cannot exist in its original form.

However, the failure of Uncol is due only to its intended universality. The concept of an abstract intermediate language as a bridge support in a multi-phase translation can be found in so many and diverse translation efforts that it must be considered a successful model. We have already mentioned that a sequence of translations between such "languages" can be used as the basis for a discussion of the semi-independent portions of a translation [85]. Sklansky et al. present a formal view of program translation based on presenting the relationships of programming languages as a network, and treating translations as transductions from one node in the network to another [119]. Basili discusses a practical technique with much the same basis in presenting a family of programming languages which may be implemented by means of a bootstrap technique using source code modification [9].

Similarly, uses for intermediate languages in specific translation efforts range from Strachey's CPL effort [120] through the Mobile Programming System of Waite et al. [135], and Richards' BCPL compiler [102,103], to the Pascal "P" compiler [94].
Most of these languages are highly machine-like, and it is interesting to note that many of them use macro processing techniques to implement the intermediate language on a target machine. A review of experiences with intermediate languages modelling abstract machines, and the effectiveness of macro processors in transforming these into actual implementations, was presented by Newey et al. [93]; a later review by Elsworth looks at some Algol 68, Pascal, and BCPL intermediate languages [37].

Higher level intermediate languages have been a more recent development, including the contribution of JANUS from the Colorado group [26];⁸ TCOL from Carnegie-Mellon's PQCC group [78,112];⁹ and from Appelbe, who describes a three-stage translation through a high level intermediate language, GRAIL, and a low level intermediate language, STACODE [5].

⁸ See also a suggestion by Poole that program development/translation systems might use a mixture of levels or "hybrids" to greater effect than simplistic phased translations [100].

⁹ See also a survey by Cattell, critically examining selected multi-pass translation efforts [21]; TCOL is not so much a single "language" as a flexible intermediate representation.
Where low level intermediate languages are aimed at a specific group of machines, usually by being themselves very low level machine models that ignore the more exotic (if useful) portions of most machines' instruction repertoire, high level intermediate languages are aimed at a specific group of languages by being an abstract (syntaxless) collection of concepts common to these languages. In both cases, attempts to be more inclusive will tend in the direction of Lisp-style "systems" by including more and more special purpose concepts that partly overlap one another, and generate unwieldy user manuals.

It is easy enough to see how staged translations may be treated as a sequence of transformations. For example, consider McKeeman's translation "steps" ([85], section 5):

    source text
    => parse tree
    => abstract syntax tree                  (1)
    => standard abstract tree                (2)
         (reduction of language extensions: cf. section 3.1.1)
    => attribute collected tree
         (builds symbol table; declarations pruned from tree)
    => attribute distributed tree            (3)
         (symbol table distributed in tree)
    => sequential expression tree
         (expressions turned into code)
    => sequential control tree               (4)
         (control constructs turned into code)
    (=> target text)

This is a very fine distribution of function (as McKeeman admits) for convenience in an abstract discussion.
Undoubtedly a staged translator would implement this in four stages (marked above by bracketed numbers):

    source text
    => abstract syntax tree                  (1)
    => standard abstract tree                (2)
    => attribute distributed tree            (3)
    => sequential control tree               (4)
    (=> target text)

which we may think of as roughly equivalent to (1) context free parse, (2) language extension reduction, (3) context sensitive pass, and (4) semantic pass. The important point here is that a transformation is taking place at each stage, and it is notable that it is expressed in terms of a tree structure representing the program.

To a great extent this insight is in keeping with developments in programming methodology in the last decade: multiple pass translations are more amenable to "structured" programming techniques than single pass efforts where all these functions (and McKeeman's analysis of the separability of these functions is an accurate one) must be performed at once. We will look more closely at the amenability of a transformational view of translation to "structured" programming methodology in a concrete discussion about TWS design in section 4.2.4. It is sufficient to note here that the above transformations represent a macroscopic version of the individual construct transformations of the immediately preceding sections. Instead of transforming while into if/goto, we are here transforming language A into language B -- yet the notion of transformation applies equally well at both levels.
We shall call these higher level transformations transductions.

3.2 Optimization

This section, like section 3.1, is divided into three parts. To some extent, this division is arbitrary: section 3.2.1 deals with improvements made at the source language level; section 3.2.2 presents the traditional representation of programs for purposes of optimization as a model consistent with the view that all translation depends on syntactic structuring; and section 3.2.3 deals with code improvements made during translation and after a target code has been produced. We can expect a certain amount of overlap between these sections, because improvement at one level inevitably will have something in common with improvement at another. At the same time, these three realms of discussion are traditionally distinct, and each has its special technical lore. That we will see a similarity between them is in no small measure due to the transformational point of view we will choose to take, and is analogous to the similarity we came to see, in section 3.1, between traditionally separate forms of translation techniques. In fact, these two sections will not have done all they were intended to do if they fail to leave the impression that there is not so great a difference after all between translation and optimization.
Optimization has never been a particularly glamorous part of the translation process, and there are several clear reasons for this:

(1) Simple optimizations seldom improve a program's running time significantly, while optimizations that do have a noticeable effect on running time often cost too much to apply to the majority of programs, and also cost more to implement than most translator development projects can afford to expend on them.

(2) Programming languages were often designed, in the past especially, without a clear notion of what makes "good" programming, and this often results in programs that actually degrade or become incorrect under what had appeared to be safe optimizing transformations.

(3) Machine design has always been such that considerable intuition is needed to translate high level programming concepts -- like array indexing, expression evaluation, procedure argument passing, etc. -- into efficient machine level code; and intuition is one thing when applied by an experienced system programmer working in machine code assembler, and quite another when it must be reduced to general principles by a compiler writer trying to incorporate code improvements into his program.

It is the primary concern of the PQCC project [78], for example, to automate much of the optimization of languages in a general fashion, so that optimization will be as accessible in a translator writing tool as syntactic analysis is at present.
Many of these objections and their ramifications are disappearing, however, simply because of the continuing trend in programming languages of permitting increasingly abstract descriptions of processes and the data they operate on. Recent language developments like Alphard and CLU permit the programmer himself to define an abstract domain within which program behavior may be expressed in a way that is at once clear to a human audience for purposes of communication, verification, or some combination of these, and localizes all questions of implementation in such a way that they need have no bearing on -- and therefore will not interfere with -- discussions about the program.

3.2.1 Source code optimizations

As programming languages become increasingly abstract -- that is, as they remove more and more the expression of algorithms from their execution -- automatic optimization at all levels becomes not merely a method for improving the implementation of algorithms, but crucial in making them work at all with reasonable speed. Even relatively low level languages like Fortran have received considerable attention in the area of optimization, because the straightforward translation of array references in loops, for example, can increase the cost of a critical loop to the point where the whole computation becomes infeasible [81].
Such early examples of languages with well developed abstraction mechanisms as Algol 68, Simula, Planner, and Smalltalk have been perceived too often as pipe-dreams primarily because their implementations were not always realistic in terms of execution speed.

Knuth ([72], pp 280-291) discusses the improvement of programs based on abstract formulations of the algorithm by transforming high level constructs expressing recursion into lower level constructs expressing iteration. His intent is to show that, as a primitive concept, the goto lies at the back of even the most abstract control constructs, and that it can be considered, in this light, in a highly controlled fashion that takes away much of its sting. A similar formulation was under contemporaneous development by Burstall and Darlington [30,19].

More recently, these ideas have been taken up by others, concerned both with simplifying the process of proving programs correct -- either by automatic means or, more realistically, by means of the social processes that the science of algorithms inherits from mathematics -- and of writing programs at a very high level of abstraction. Wang [137], Wegbreit [138], Arsac [6], and Leverett et al. [78], are all concerned with improving the performance of abstract programs by giving well-understood transformation rules whereby they may be reduced to equivalent programs using more mechanistic or "concrete" control constructs.
In keeping with the method established in the preceding section, we will examine a simple example and its treatment in terms of syntactic transformation. We have, in fact, already seen one traditional optimization expressed as a graph transformation: operator strength reduction (p 64).

Another improvement technique removes redundant computations. This generally requires an analysis of the flow structure of the program, as do other, more esoteric techniques. However, even on the statement level there are redundancies introduced for the sake of clarity or because of the nature of the algorithm. For instance, a computation involving arrays often produces a redundant computation:

    A( i, j ) := A( i, j ) + C

translates, at a more primitive level, into ("@A" is an expression for "the address of A", while "!t" is an expression for "the location addressed by t"; "v(t)" is "the value addressed by t"):

    @A + i*d1 + j + 1 := v( @A + i*d1 + j + 1 ) + C

which could be reduced to

    t := @A + i*d1 + j + 1
    !t := v( t ) + C

Naturally, the more dimensions in the array, the more complex the redundant expression becomes. Detection of redundant expressions is simplified by the use of a DAG construction (Aho and Ullman [3], pp 420-421, 425-426) which will be demonstrated in a detailed discussion later in this section.
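How a DAG exposes the repeated address computation can be sketched with a small hash-based construction. This is an illustration in the spirit of the textbook technique, not the algorithm as Aho and Ullman present it: expressions are nested tuples, a node is identified by its operator together with the identities of its operand nodes, and the names `dag_node`, `tree_size`, `addr` and `stmt` are hypothetical.

```python
def dag_node(expr, table):
    """Return the node number for expr, reusing a node built earlier for the
    same operator applied to the same operand nodes."""
    if isinstance(expr, tuple):
        op, *operands = expr
        key = (op, *(dag_node(e, table) for e in operands))
    else:
        key = expr                           # a leaf: a name or a constant
    return table.setdefault(key, len(table))

def tree_size(expr):
    """Number of nodes when every occurrence is kept separate (a parse tree)."""
    if isinstance(expr, tuple):
        return 1 + sum(tree_size(e) for e in expr[1:])
    return 1

# @A + i*d1 + j + 1, as written in the text, grouped from the left
addr = ("+", ("+", ("+", "@A", ("*", "i", "d1")), "j"), 1)
stmt = (":=", addr, ("+", ("v", addr), "C"))

table = {}
root = dag_node(stmt, table)
print(tree_size(stmt), "tree nodes,", len(table), "DAG nodes")
```

The second occurrence of the address expression maps onto the nodes already built for the first, so the shared subgraph plays the role of the temporary t in the reduced form.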
Another valuable improvement, known as code motion, removes costly computations from the inner body of a loop, since loops, containing a small proportion of the code, generally account for the greatest proportion of the running time. Again, this kind of optimization is traditionally performed with support from a flow graph restructuring of the program syntax, such as will be discussed in the next section. However, it is considerably simplified if we assume a relatively abstract language without gotos, but with only "structured" looping mechanisms -- in which case the flow graph and the parse graph are essentially equivalent. An example of a loop optimization is the following (Waite [136], p 595):

    for i := m1 until m2 by m3 do
        ... ( k1*i + k2 ) ...
    end

The variable i (the induction variable) is used in a computation involving constants k1 and k2. This computation may be reduced to a smaller one: [fn 10]

    i1 := k1*m1 + k2;
    i2 := k1*m3;
    for i := m1 until m2 by m3 do
        ... ( i1 ) ...
        i1 := i1 + i2
    end

Besides requiring only two multiplications for an iteration of any length, and only one extra addition, on many machines i1 and i2 can be implemented in registers: the statement "i1 := i1 + i2" thus reduces to a single fast register to register instruction.

Footnote 10: The usual caveat for optimization applies in this example: if the loop is not executed at all, this version is actually more costly -- though only minimally so -- than the "unoptimized" version. Of course, in general, we can expect any loop to be executed multiple times, since that is its function.

In general, however, expression optimization depends on a graph representation of the code (within a one-entry one-exit sequence of code called a basic block, for which, see the next section), commonly called a DAG. For example, the program sequence (Aho and Ullman, p 428):

    A := B + C;
    E := C + D + B

may be represented by the parse tree:

    [Figure T12: parse tree of the sequence. A is assigned the sum of B and C; E is assigned the sum of (C + D) and B; no nodes are shared.]

However, if we make use of a simple DAG construction algorithm (Aho and Ullman, pp 420-423) and take advantage of the laws of associativity and commutativity that apply to addition, [fn 11] we can construct a representation for an identical sequence in which common subexpressions have been reduced to a single subgraph.

    [Figure: DAG of the same sequence, in which the node for B + C is shared by both assignments and D is added to it to form the value of E.]

Footnote 11: They do not apply all the time on computers, where numbers are generally limited in their precision, and the order of evaluation can, and in many instances does, affect the result. Here is yet another caveat therefore: the "optimization" we are about to examine may, in fact, pervert the programmer's intent.

This representation could be directly translated by a tour of the DAG which, at the same time, transformed expressions into temporary locations once they had been encoded. Thus, after generating the (IBM 360/370 style) code:

    load r1,B
    add  r1,C

the DAG looks like

    [Figure: the same DAG, with the shared node now labelled <r1,+>.]

(Note that both register r1 and expression "B+C" are remembered.
This is because, for complex expressions, it is possible to run out of registers; but then the subexpression can still be retrieved.) Then, the assignment operator and semicolon (sequence operator) are reduced, generating the code:

    store r1,A

and resulting in the DAG:

    [Figure: the DAG after this step, with the shared node labelled <r1,A,+> above its operands B and C.]

(Here, r1, A, and "B+C" are all remembered as, at present, equivalent, and in order of preference for code generation.) The next code generated is for the one remaining addition:

    add r1,D

with the resulting DAG:

    [Figure: a node labelled <r1,+> whose operands are D and the node labelled <A,+>, which in turn has operands B and C.]

(The extra information is no longer interesting, but might have been useful in a larger program segment.) The final code generated is for the assignment:

    store r1,E

This process has produced the code marked (a), below:

    (a)               (b)
    load  r1,B        load  r1,B
    add   r1,C        add   r1,C
    store r1,A        store r1,A
    add   r1,D        load  r1,C
    store r1,E        add   r1,D
                      add   r1,B
                      store r1,E

Compare this with the code generated directly from the parse tree T12, marked (b), above.

More complex forms of program improvement are also essentially syntactic transformations, but can be discussed only in terms of fairly complex pattern matching. One simple example of this kind is recursion elimination, where a (not unusual) recursive procedure of the form

    f(n) = if n<1 then 1 else n * f(n-1) fi

is turned into an iterative one:

    f(n) = begin
               t := 1;
            L: if n>1 then t := t * n; n := n - 1; goto L fi;
               t
           end

This kind of transformation is primarily aimed at eliminating the sadly high cost of procedure calling on most existing machine architectures, and it generally falls into the class of semi-automatic program improvement. It is intended to be a means whereby a programmer can establish the correctness of his algorithm at a high level of abstraction, because establishing correctness will tend to be a smaller task at an abstract level. Once correctness has been established the program can then be transformed, by disciplined means involving well-understood transformations, into a more efficient one. The programmer will preferably be aided in this somewhat cumbersome process by a book-keeping computer.

Semi-automatic program improvement shades over into semi-automatic program synthesis, where, beginning from a high level (abstract) description, the programmer develops the details of his program to any level of detail necessary. Improvements can be applied at any stage of the process of making the abstraction concrete, wherever this process has produced a less than satisfactorily efficient description. There will be a discussion of closely related material in section 3.4.

3.2.2 Flowgraphs and intermediate level optimization

Section 3.1 (translation) based its arguments on a close correspondence between translation and syntactic structure.
Even in section 3.2.1 (source code optimization) the argument was based directly on program examples and closely related, though abstract, graph representations. As we pursue optimization, however, we must sooner or later come up against serious limitations in such a correspondence. For example, if we represent the program (from Aho and Ullman [3], p 465):

    (1) i := 1;
    (2) if x<b goto (4);
    (3) goto (6);
    (4) i := 2;
    (5) x := x + 1;
    (6) y := y - 1;
    (7) if y<20 goto (9);
    (8) goto (2);
    (9) j := 1

as a graph representing the syntax discovered by a conventional parser using a BNF description, it might look like (representing statements by their line numbers):

    [Figure: a single sequence node whose children are the statements (1), (2), ..., (8), (9), in order.]

This representation gives us no idea that the program contains a loop, let alone what variable represents the loop's induction variable, so that we cannot begin to apply any optimization strategy. On the other hand, a graph representation of the control flow:

    [Figure: flow graph over the nodes (1), (2;3), (4;5), (6;7;8), and (9); the nodes (2;3), (4;5), and (6;7;8) lie on a cycle.]

immediately indicates: that there is a loop (a cycle in the graph); which nodes participate in the loop (those that form a strongly connected set); and where to look for an induction variable (among the statements represented by the loop nodes). The nodes also represent basic blocks (blocks of code with exactly one entry and one exit point), and it is within these that we may safely apply expression optimization. Another use for this representation is in more global value-retention information (in registers, etc.), and in deciding which computations can be safely hoisted outside the loop to reduce the computations done inside.

Such a representation is in no sense out of keeping with the general philosophy adopted in this chapter (and, ultimately, in this dissertation) that programs are usefully represented by syntactic structures for purposes of translation. Consider a more abstract formulation of the program:

    i := 1;
    repeat
        if x<b then i := 2; x := x + 1 fi;
        y := y - 1
    until y<20;
    j := 1

which contains no gotos and therefore reveals more immediately its basic block structure than the linear version. The difference between these two programs lies in the transformations

    [Figure: two graph transformation rules, rewriting the repeat-until node (over stms and test) and the if-then node (over test and stms) as seq nodes built from if-goto and goto nodes.]

The transformations simply reduce a program in an Algol 60-like language to an equivalent program in a Fortran-like language (compare Kernighan [68], and Hanson [56]); that is, they re-express the abstract control flow structure as more primitive, more machine-like control structures, and in this they are performing a kind of translation.

Flowgraphs are therefore a conception of a program's syntactic structure as well. In fact, they structure basic blocks, which in turn give a sequential structure to statements and a DAG structure to expressions. [fn 12] We can now apply the more exotic forms of optimization, which depend on the knowledge about flow of control embodied in flow graphs.
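The step from the linear program to its flow graph can be made concrete with a short sketch. It is offered only as an illustration under assumptions of our own: the tuple encoding of statements and the function names are hypothetical, Python is used purely for illustration, and a conditional goto immediately followed by an unconditional goto is treated as a single two-way branch so that the blocks come out as (2;3) and (6;7;8), as in the flow graph above.

```python
# Each statement is (label, kind, target); kind is "plain", "if" (a conditional
# goto) or "goto". This encoding is an assumption made for the sketch.
PROGRAM = [
    (1, "plain", None),  # i := 1
    (2, "if", 4),        # if x<b goto (4)
    (3, "goto", 6),      # goto (6)
    (4, "plain", None),  # i := 2
    (5, "plain", None),  # x := x + 1
    (6, "plain", None),  # y := y - 1
    (7, "if", 9),        # if y<20 goto (9)
    (8, "goto", 2),      # goto (2)
    (9, "plain", None),  # j := 1
]


def basic_blocks(program):
    """Split the statement list into blocks at the block leaders."""
    leaders = {program[0][0]}
    for index, (_, kind, target) in enumerate(program):
        if kind != "plain":
            leaders.add(target)              # a jump target starts a block
        if index + 1 < len(program):
            next_label, next_kind, _ = program[index + 1]
            ends_branch = kind == "goto" or (kind == "if" and next_kind != "goto")
            if ends_branch:
                leaders.add(next_label)      # so does the statement after a branch
    blocks = []
    for statement in program:
        if statement[0] in leaders:
            blocks.append([])
        blocks[-1].append(statement)
    return blocks


def flow_edges(blocks):
    """Map each block index to the set of its successor block indices."""
    block_of = {block[0][0]: n for n, block in enumerate(blocks)}
    edges = {n: set() for n in range(len(blocks))}
    for n, block in enumerate(blocks):
        for _, kind, target in block:
            if kind != "plain":
                edges[n].add(block_of[target])
        if block[-1][1] != "goto" and n + 1 < len(blocks):
            edges[n].add(n + 1)              # control falls into the next block
    return edges


def loop_nodes(edges):
    """Blocks lying on some cycle, that is, blocks that can reach themselves."""
    def reachable(start):
        seen, stack = set(), list(edges[start])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(edges[node])
        return seen
    return {n for n in edges if n in reachable(n)}


blocks = basic_blocks(PROGRAM)
edges = flow_edges(blocks)
labels = [[statement[0] for statement in block] for block in blocks]
loop = [labels[n] for n in sorted(loop_nodes(edges))]
```

Here labels lists the five blocks by statement number and loop picks out the three that participate in the cycle. The reachability test only answers "is this block on a cycle"; a program with several loops would need a proper partition into strongly connected sets.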
The next section considers several of these optimization techniques.

Footnote 12: This approach, of combining these usually distinct conceptions of program structure into one for purposes of optimizing code generation, was recently applied also in the design of PQCC [22].

3.2.3 Object code optimizations

In section 3.2.1, we saw an example of object code generation requiring only one register for optimal evaluation of the sequence

    A := B + C;          [P7]
    E := C + D + B;

In general, however, the use of multiple registers can produce better code than if the translation is restricted to one register. Consider the expression

    A*B + C*D            [P8]

No amount of manipulation will reduce this (in a one register translation) to anything less than

    load  r,A
    mult  r,B
    store r,TEMP
    load  r,C
    mult  r,D
    add   r,TEMP

With two registers, the save, the use of a temporary storage location and the final memory to register add can all be reduced to a single register to register add:

    load r1,A
    mult r1,B
    load r2,C
    mult r2,D
    add  r1,r2

With an infinite number of registers (or a stack machine) optimal register use could be guaranteed for any expression, no matter how complex. However, registers are generally limited to a small number (eight on the PDP-11 and, for purposes of multiplication and division, eight also on the IBM 360/370), and although few programs use large expressions, it is precisely those few that need good code generation techniques to render their translation competitive with hand written assembly code.
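The difference between the one register and two register translations of P8 can be reproduced by a small generator. The sketch below is an illustration under assumptions of our own, not a method taken from the sources cited: expressions are nested tuples (operator, left, right) with variable names as strings, every operator is assumed commutative (as add and mult are), temporaries are numbered TEMP1, TEMP2, and so on, and the function name generate is hypothetical. Python is used purely for illustration.

```python
from itertools import count


def generate(expr, regs, temps=None):
    """Return instructions that leave the value of expr in regs[0].

    Assumes every operator is commutative, so operands may be evaluated
    in whichever order is convenient.
    """
    if temps is None:
        temps = count(1)                      # source of fresh temporary names
    if isinstance(expr, str):
        return [f"load {regs[0]},{expr}"]
    op, left, right = expr
    if isinstance(right, str):
        # Right operand is in memory: evaluate the left, then operate from memory.
        return generate(left, regs, temps) + [f"{op} {regs[0]},{right}"]
    if isinstance(left, str):
        # Only the right operand is compound: evaluate it first (commutativity).
        return generate(right, regs, temps) + [f"{op} {regs[0]},{left}"]
    code = generate(left, regs, temps)
    if len(regs) >= 2:
        # A second register is free: keep the left value where it is.
        code += generate(right, regs[1:], temps)
        code.append(f"{op} {regs[0]},{regs[1]}")
    else:
        # Out of registers: save the left value in a temporary location.
        temp = f"TEMP{next(temps)}"
        code.append(f"store {regs[0]},{temp}")
        code += generate(right, regs, temps)
        code.append(f"{op} {regs[0]},{temp}")
    return code


P8 = ("add", ("mult", "A", "B"), ("mult", "C", "D"))
one_register = generate(P8, ["r"])
two_registers = generate(P8, ["r1", "r2"])
```

With the single register the generator produces the six instruction sequence shown above (with TEMP1 in place of TEMP); with two registers it produces the five instruction sequence. It makes no attempt to choose the cheaper operand order, which is what a proper register allocation scheme would add.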
The same kind of analysis that generated good code for program segment P7 can also determine good register use for segment P8, when combined with basic block analysis. Code generation will not be so straightforwardly one-pass now, however, as in the case of P7. Aho and Ullman ([3], pp 526-527) describe a process for finding out which intermediate and final values computed in one basic block remain useful for its successors -- and should, therefore, where possible, be retained in registers. This process requires two passes over the code: one forward, and one backward, scanning each successor basic block for use of variables and intermediate values. DAG representations are also used in Aho and Ullman's loop optimizations (pp 454-499), heuristic reordering of evaluation sequence (pp 538-540), and global register allocation (pp 533-537).

Consider again the example on page 93:

    for i := m1 until m2 by m3 do
        ... ( k1*i + k2 ) ...
    end

This is represented by a flow graph of the form

    H: [ i := m1; M2 := m2; M3 := m3 ]
    B: [ ... ( E ) ... ]
    T: [ i := i + M3; if i<M2 goto B ]

The loop is represented by three blocks, H, B, and T (for convenience, there are three blocks, although B and T together form a basic block). The expression E is a graph:

    [Figure: expression graph for E = k1*i + k2, a + node whose operands are k2 and a * node over k1 and i.]

The induction variable i is easily extracted from the original program in this case, since it is in a for loop. In general, induction variables may be discovered by analysis of the data flow (Aho and Ullman [3], pp 466-471).
The use of the induction variable in expression E results in a highly regular sequence of values:

    i = m1             k1*m1 + k2
    i = m1 + m3        k1*m1 + k2 + k1*m3
    i = m1 + 2(m3)     k1*m1 + k2 + 2(k1*m3)
    ...
    i = m1 + n(m3)     k1*m1 + k2 + n(k1*m3)

Just as we make the for loop more primitive by moving the computation of m2 and m3 to the header block, and re-expressing the control flow as an if/goto combination, so we make expression E more primitive by re-expressing the general equation (E = k1*i + k2) as an iterative series derived from the above analysis:

    E[0] = k1*m1 + k2
    E[n] = E[n-1] + k1*m3

or, in terms of programming constructs,

    i1 := k1*m1 + k2;
    do i1 := i1 + k1*m3 od

Since "k1*m3" is a constant value, it can be moved outside the loop as well:

    i1 := k1*m1 + k2;
    i2 := k1*m3;
    do i1 := i1 + i2 od

We can now merge this (infinite) loop with the controlling loop:

    H: i := m1; M2 := m2; M3 := m3;
       do
    B:     ... ( E ) ...;
    T:     i := i + M3
       while i < M2 od

We do this by replacing "(E)" by "(i1)" and by placing the loop-invariant computation in the header block H, and the iteration in the computation-independent tail block T (to ensure that updating i1 will not affect any computation performed in the body B). The result is:

    H: i := m1; M2 := m2; M3 := m3;
       i1 := k1*m1 + k2;
       i2 := k1*m3;
       do
    B:     ... ( i1 ) ...;
    T:     i := i + M3;
           i1 := i1 + i2
       while i < M2 od

A similar form of analysis, merging independent portions of a loop, was used by Dijkstra in a derivation of GCD ([34], section ...).

The term peephole optimization was first applied by McKeeman [83] to describe last minute code improvements, especially for code generated by syntax directed translations, where the context was severely limited by the grammar. Many of these are subsumed by other methods of optimization that are more generally applicable to fully integrated translation efforts. McKeeman's specific examples of redundant load elimination from sequences like the following:

    { X := C;        { load  r1,C        { load  r1,C
      Y := X }   =>    store r1,X    =>    store r1,X
                       load  r1,X          store r1,Y }
                       store r1,Y }

and pre-evaluation of constant expressions, are both better performed at higher levels, where more general methods will almost certainly produce better code. However, when techniques like multi-stage translation are used, especially in software porting strategies, such local pattern matching and improving transformations are extremely valuable, and relatively simple to implement, considering that each stage of the translation is certain to introduce inefficiencies that are best removed before they multiply in the next stage.
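How little machinery such a local transformation needs can be seen in the following sketch of redundant load elimination. It is only an illustration under assumptions of our own: instructions are encoded as (operation, register, operand) triples, the code is taken to be straight-line (no label or jump target falls between the store and the load), and the function name is hypothetical. Python is used purely for illustration.

```python
def remove_redundant_loads(code):
    """Drop a load that refetches what the preceding store has just saved.

    Assumes straight-line code: nothing may jump in between the store
    and the load that follows it.
    """
    improved = []
    for instruction in code:
        operation, register, operand = instruction
        just_stored = bool(improved) and improved[-1] == ("store", register, operand)
        if operation == "load" and just_stored:
            continue          # the register already holds this value
        improved.append(instruction)
    return improved


# The naive translation of { X := C; Y := X }.
naive = [
    ("load", "r1", "C"),
    ("store", "r1", "X"),
    ("load", "r1", "X"),
    ("store", "r1", "Y"),
]
improved = remove_redundant_loads(naive)
```

The result keeps the first load and both stores, which is the right hand sequence of the example above. Because the window is a single preceding instruction, the pass can be run after each translation stage at negligible cost.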
Richards [102] comments on the importance of such techniques with specific reference to the portable BCPL compiler; see also Cattell [21] for a critique of multi-stage translators, particularly because of their tendency to lose valuable optimizing information (paired with a critique of one-stage translators for being unable to cope well with contextual information).

3.3 Error handling in a syntactically structured environment

"In an ideal world [says Horning], where neither humans nor machines ever make mistakes, the compiler-writer could limit his attention to correctly translating correct programs (and some naive compiler writers do so). In reality, however, most of the programs processed by any compiler will be to some extent incorrect" ([61], p 532).

There is a considerable literature on context free syntactic error detection, recovery, and, to a lesser extent, correction. Modern parsing techniques (LL and LR especially) detect syntactic errors at the earliest possible moment, and relatively good recovery techniques exist so that the parser can continue, after an error, and detect further errors as well (Aho and Ullman [3], pp 397-402). However, many of these techniques interfere with the semantic portion of syntax directed compilers (Gries [53], p 321), and this can be cited as yet another argument for separating the parse from the translation. Gries also notes: "The nice part about top-down error recovery is that the partially constructed tree conveys much usable information...not as readily available in the bottom-up method" (p 325); the hidden point here is, of course, that a syntactic structure gives much more support for complex decisions involving programs (and the lack of depth in most discussions on error recovery indicates that it is nothing if not complex) than does an unstructured string. Elsewhere, Gries notes that perhaps the usual approach to error correction, which involves finding the minimum sequence of string transformations that will make a "correct" program out of an incorrect one, would be better defined in terms of transformations on a tree ([54], p 629).

A major annoyance in developing a program using a one pass compiler is the redundancy of error messages. Consider the program ([53], p 319):

    begin                                 (1)
        real x; ...                       (2)
        begin                             (3)
            boolean x; ...                (4)
            begim ... end;                (5)
            x := x and y; z := x or y     (6)
        end
    end                                   (7)

It contains one error, namely the misspelled begin of line 5. However, this error causes a mismatching of begin-end pairs, so that the statements on line 6 will be parsed in an environment where x is declared to be real, not boolean. We get, in the average compiler, three messages telling us that x is used incorrectly in boolean expressions on line 6; if it were not declared on line 2, we would get another three messages telling us that as well. Some redundancy is unavoidable in these situations.
In the absence of spelling correction (which is rarely implemented, and frequently so unreliable that users turn it off in annoyance after its first major blunder) we can expect a note about the odd symbol "begim", a note about the redundant second end on line 7, and one note about the misuse of x on line 6; if x were "misused" on other lines, we would still like just one note, giving us a list of offending lines. There are, to be sure, known techniques for overcoming this deficiency, generally involving the use of symbol tables, since the difficult cases for context free parsers are always the context sensitive ones, and in programming languages those are always the cases involving the use of symbols (Gries [53], pp 318-320).

Let us consider the classical techniques in terms of a graph transformation approach, however. The above program would be eventually parsed as

    [Figure: parse graph of the program: a block node over its declarations and a seq of lines; the occurrences of x on line 6, marked V(1), V(2), and so on, all refer to a single shared subgraph V = var(real, x).]

(Note line number information retained by parse.) In a type checking pass, the second occurrence of x, marked "V(2)", is encountered in a boolean operation ("x and y"); this use is incorrect, and is marked as such by altering the "symbol table entry" represented by subgraph V:

    [Figure: transformation V => var(real, x, error), where the added error node records "type boolean {6}".]

(x is involved in a type error at line(s) {6}, where it is used in a context requiring a boolean.) The next problem is with the assignment of the result of a boolean expression to a real variable represented by "V(1)".
Subgraph V is already marked with a type error involving incorrect use of x in a boolean context on line 6, so nothing is done. Similarly, the use of x represented by "V(3)" adds no new error information. However, if this statement were on line 7, the line number information would be augmented:

    [Figure: transformation V => var(real, x, error), where the error node now records "type boolean {6,7}".]

The above effectively re-expresses the use of symbol tables in error reporting in a graph transformation form. Errors should also be recorded in a list -- which may be sorted, at the end of the compilation, into some canonical order based on error types, line number of first occurrence, or some other reporting criterion. If the information recorded is the "error" subgraph shown above, recording any further errors about this symbol will automatically cause them to be recorded in the error list as well.

Consider also the program

    begin
        array [1:7, 2:5, -1:1] of integer colour;
        ... color[2, 3, 0] ...
    end

This involves a spelling error, [fn 13] but will normally generate two error messages: one to complain about "color" being undeclared; the other to complain that the dimensions of the subscript do not match those "declared" for the variable, or possibly that there is a type incompatibility, or even (if square brackets are not available, and parentheses are used instead) that the procedure call is incorrect (Aho and Ullman [3], p 403; Gries [53], p 318). The second message is extraneous, and may confuse the programmer. [fn 14]

Footnote 13: Possibly introduced by having the program pass through the hands of two programmers of different nationality.

Footnote 14: Experienced programmers begin to "see through" redundancies of this sort after a while, simply because it is such a common experience. In my experience teaching introductory programming courses, however, redundant messages are especially confusing to beginning programmers, who often cannot see that the second message is a function of the first, and yet are most in need of help from the compiler. Minimizing redundant and extraneous messages is, therefore, of even greater importance in student language processors than in processors aimed at professionals.

It should be suppressed by the same method used earlier: the program graph generated by the parser will be of the form

    [Figure: parse graph in which the index node refers to a var node for "color" (with its subscripts); the var node carries "notype" and an error node recording "undcl {...}".]

When the subscript error is discovered, the fact that "color" has already been marked as an undeclared variable is enough to suppress the error message about type incompatibility. Naturally, this will be the wrong action in the rare situation where the program had two errors in it:

    begin
        array [1:7, 2:5] of integer colour;
        ... color[2, 3, 0] ...
    end

Fixing up the name only results in a new message involving mismatched actual and declared subscript lists. However, properly speaking the second error does not yet exist in this program, since "color" may have been intended to be a separate structured variable having three dimensions.
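The bookkeeping described in this section can be sketched as follows. This is an illustration under assumptions of our own rather than the graph transformation itself: the error information that the graphs attach to the shared subgraph of a symbol is kept here in a dictionary, and the class ErrorLog, its methods, and the error kinds "type", "undeclared", and "subscripts" are hypothetical names. Python is used purely for illustration.

```python
class ErrorLog:
    """Hypothetical per-symbol error record: one note per symbol and kind."""

    def __init__(self):
        self.entries = {}        # (symbol, kind) -> set of offending lines
        self.undeclared = set()  # symbols already reported as undeclared

    def report(self, symbol, kind, line):
        if kind == "undeclared":
            self.undeclared.add(symbol)
        elif symbol in self.undeclared:
            # A follow-on complaint about a name we know nothing about:
            # suppress it, as the discussion above recommends.
            return
        # Repeated reports only extend the line set of the existing note.
        self.entries.setdefault((symbol, kind), set()).add(line)

    def messages(self):
        return [
            f"{symbol}: {kind} error at line(s) {sorted(lines)}"
            for (symbol, kind), lines in sorted(self.entries.items())
        ]


# The three misuses of x on line 6, plus one more on line 7.
first = ErrorLog()
for line in (6, 6, 6, 7):
    first.report("x", "type", line)

# The misspelled array name: the subscript complaint is suppressed.
second = ErrorLog()
second.report("color", "undeclared", 3)
second.report("color", "subscripts", 3)
```

The first log yields a single note for x listing lines 6 and 7, and the second a single note for "color". The sketch suppresses the subscript complaint in the two-error program as well.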
The sensible rule is that it is better to be annoying occasionally than to be annoying continually.

3.4 Automatic program development and improvement

The earlier portions of this chapter were concerned primarily with a presentation of traditional translation concerns in terms of a uniform model. That the concept of syntactic transformation can be successfully applied to these concerns with a great deal of elegance argues in its favor as a model for translation. However, there are several new concerns in programming language studies and closely related areas that are beginning to take well-defined form. These are not yet included in texts on the subject, because they have yet to emerge into a uniform relationship to the traditional subjects normally covered in such texts. In this section we will review what appear to be the major points in these new concerns: it should become reasonably clear from this review that they, too, can be considered in terms of syntactic transformations.

Programming disciplines and program correctness concerns have gone hand in hand since their emergence from the first challenges to traditional methods of program development that spawned them more than a decade ago. At the same time, the idea of automatic synthesis of a running program from specifications of its abstract characteristics has developed into a closely related area.
Program development by stepwise refinement [fn 15] is a discipline for creating programs out of abstract specifications by making the specification increasingly concrete. Wegbreit [138] lists transformation rules whereby abstract specifications may be reduced in stages to machine instructions: the process may be partially automated by making available a system which automatically does the book-keeping chores of applying such transformations under direction of a human agent (the programmer). Similar systems of transformation rules, and their application in a semi-automatic manner, are described by Burstall and Darlington [19], Bauer et al. [12], and Arsac [6].

Footnote 15: A phrase coined by Wirth [141].

These systems are, to some extent, a development of graphics editors and "top-down" programming tools whose transformation rules are simply the recursive syntax definition rules of BNF. For instance, Hansen [55] describes a system (Emily) for graphic editing of hierarchic text (not necessarily program text, although that was its first application) using as basis a generative grammar.
In developing such hierarchic text, one begins with a single symbol, displayed on the screen (the distinguished symbol of the generative grammar):

    <statement>

Using a menu, the programmer transforms this into one of its possible productions, which is displayed appropriately indented:

    for <var> := <expr> until <expr> do
        <statement sequence>
    end

At each step, the programmer may expand any nonterminal symbol (indicated here in the classical BNF style by "<...>"). The final result, one consisting purely of terminal symbols, is guaranteed to be a program correct in its context free syntactic structure.

In program development systems, the programmer begins with an abstract expression of, say, the eight queens problem: [fn 16]

    initialize;
    repeat
        try current step;
        if successful then advance else regress
    until underflow or overflow

Footnote 16: Place eight chess queens on a board so that none of them is in a position to capture any of the others. Recall that a queen can move any number of squares in a row, column, or diagonal.

This is a general statement of a solution method by backtracking; in the eight queens problem, we mean by this: at any point j, place the j-th queen on the board ("try current step"), test the partially complete board for the mutual nonaggression condition ("successful"), and then either advance to queen j+1 or regress to queen j-1.

"We mean by this" is a statement that, as far as the programmer is concerned, indicates that he can replace the appropriate abstraction by the more concrete instruction, so that he ends up with the program:

    variable j;
    initialize;
    repeat
        place the j-th queen on the board;
        if present placement meets nonaggression condition
            then proceed to queen j+1
            else reconsider queen j-1
    until underflow or overflow

It is possible to start off ("initialize") with the eight queens lined up at the top of the board, one to a column, and to "place the j-th queen" by advancing her one row. "Underflow", corresponding to a regression past the first column, means that we have a problem with no solution; while "overflow", corresponding to an advancement past the last column, means that we have achieved a solution. So we now have a still more concrete program:

    variable j;
    j := 1;
    line up the queens one to a column, on row 0;
    repeat
        advance the j-th queen one row;
        if present placement meets nonaggression condition
            then proceed to queen j+1
            else reconsider queen j-1
    until j < 1 or j > maxcolumn

This is still a very abstract expression of the algorithm: no data structures, except the variable j, have been declared; no specific meaning has been attached to the real workhorse, the predicate "present placement meets nonaggression condition".
In fact, we might be tempted to expand "proceed to queen j+1" as simply "j := j+1", and "reconsider queen j-1" as "j := j-1", but while the expansion for "proceed" is essentially correct, the expansion for "reconsider" must take into account that we do not wish to return to a previous condition, nor do we wish to ignore any intermediate conditions: this algorithm works only if it works its way monotonically through all reasonable permutations. So "reconsider queen j-1" must be expanded as:

    procedure reconsider queen i is
        return queen i+1 to the top of her column;
        j := i
    end

This procedure may be expanded inline by replacing its "call" by its body, and its "parameter" i by j-1, optimizing the resulting expression "j-1+1" as "j": note that one has to be careful not to introduce the Algol 60 call by name problem [90].

Before we go much farther, it will become necessary to invent a representation (a data structure) for the board, since further refinements will certainly have to make explicit reference to this data structure; but we have gone far enough for the flavor of this method to become clear.^17 The semi-automatic program development systems mentioned above generally proceed along these lines, by making the process easier, the same way Emily, and similar systems, make syntactic expansion into programs easier. In fact, an Emily-like system would be a good start for a program development system.
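The refinement above stops short of choosing a board representation. As an illustration only, the following Python sketch carries it one step further under assumptions that are not part of the original exposition: the board is a list `row` giving each queen's row within her own column (0 meaning "top of her column", off the board), and a placement that fails the nonaggression test while still on the board simply lets the same queen advance again, so that regression happens only when she runs off the bottom. The names `eight_queens` and `safe` are hypothetical.

```python
def safe(row, j):
    """Nonaggression test: queen j attacks none of queens 1..j-1."""
    for i in range(1, j):
        same_row = row[i] == row[j]
        same_diagonal = abs(row[i] - row[j]) == j - i
        if same_row or same_diagonal:
            return False
    return True


def eight_queens(maxcolumn=8):
    """Return the rows (1-based) of one solution, column by column, or None."""
    # Line up the queens one to a column, on row 0 (index 0 is unused).
    row = [0] * (maxcolumn + 2)
    j = 1
    while True:
        row[j] += 1                          # advance the j-th queen one row
        if row[j] > maxcolumn:
            # Assumption: "reconsider queen j-1" is taken only when queen j
            # has run off the board; she returns to the top of her column.
            row[j] = 0
            j -= 1
        elif safe(row, j):
            j += 1                           # proceed to queen j+1
        # Otherwise the same queen advances again on the next iteration.
        if j < 1 or j > maxcolumn:           # underflow or overflow
            break
    return row[1:maxcolumn + 1] if j > maxcolumn else None


if __name__ == "__main__":
    print(eight_queens())
```

The loop keeps the shape of the abstract program (advance, test, proceed or reconsider, until underflow or overflow); only the three-way branch is a choice made here to keep the search exhaustive.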
Emily-like systems are especially appropriate in teaching environments, where the problem with confusing and somewhat redundant syntactic error messages, described in section 3.3, can be overcome entirely by making available a program editor that will produce only syntactically correct programs. Cornell, where a syntax-correcting compiler for a PL/I subset was developed earlier, has been experimenting with a program editor of this kind [123]. Syntax graphs can be used effectively in such a setting because they can represent the context sensitive syntax of the program and can thus assure the correct declaration of names, because they represent the hierarchical structure and can be displayed at any level of detail, and because the graph manipulation techniques outlined here are appropriate to structure editing.

^17 Wirth [141] gives a slightly different, but essentially similar exposition of this process, from which the above is derived.

Proofs of program correctness are simplified by beginning from an abstract version because, for instance, "f(S1,S2,...,Sn)" is a refinement of "P" if and only if (using Hoare's notation for predicate transformers):

    {p}f(S1,S2,...,Sn){q} implies {p}P{q} for all predicates p,q

But the proof of "f(S1,S2,...,Sn)" satisfying the requirements that its execution uniformly transforms p into q will be much more detailed than that for "P". So if we prove "P", and then derive the expression "f(...)" by applying well-understood, even proven, equivalence transformations, then a simple proof suffices, and the correctness of the refinement is guaranteed. (Incidentally, Dijkstra [34,35]^18 proceeds in the opposite direction, deriving a program from its predicate transforming properties: the resulting program will necessarily be correct -- with respect to the predicate transforming properties ascribed to it.)

Another semi-automatic basis for creating correct programs was given a formal treatment by Manna and Waldinger: "In order to construct a program satisfying certain specifications, a theorem induced by these specifications is proved, and the desired program is extracted from the proof" ([86], from the abstract). It was further developed, along somewhat different lines, by Floyd [43], by Darlington and Burstall [30], and by Wang [137]. The concept of abstraction forms a basis for several programming languages designed for verification, like Mesa [48], CLU [80] and Alphard [145]; recently, Morris described an implementation strategy for abstractions using replacement by open procedures (a form of macro expansion, and a method already mentioned in relation to abstraction in section 3.1.1): however, this leads, as Morris points out, to the Algol 60 problem of implementing parameters through a call-by-name mechanism [90].

^18 See also Knuth [72], pp 287-289.
A denotational semantics may also be used as a basis for proving correctness and this, too, leads to the possibility of automatic program development. Mosses [91] describes a translator writing system which synthesizes translations from a denotational semantics model of the language. Huet and Lang [62] give a methodology for applying Burstall and Darlington style program transformations and a method for proving their correctness with respect to their intended effect using a denotational semantics. Thus the concern for correctness in programs is transferred to a concern for correctness in translators: considering the immense importance of correctness to the users of a translator, this is no small matter, and any work aimed at making the translation process more regular, and thus more amenable to a demonstration of correctness, will be of value; we will return to this question in section 4.2.4.

In all these instances, the central principle upon which they operate is some notion of transformation. In the next chapter, a design for a translator writing system is outlined based entirely on the idea of representing programs as directed graphs and defining various forms of program manipulation as transformations on these graphs.
Whether this design will carry over successfully into applications of the kind described in this section is not presently obvious: program development systems like that of Darlington and Burstall tend to be interactive, and dependent on a visual display, while compilers tend to be batch oriented "black box" processors. However, compilers once tended to be card oriented with ad hoc parsers: it may be that some combination of the concepts of abstraction and of program synthesis will have the same ultimate effect on the nature of translators as the concept of recursive syntactic definition once had; namely, to alter that nature permanently. If so, we can look forward to interactive program development tools based not only on a rigorous notion of syntax, but also on a rigorous notion of translation, and that notion will, if the preceding overview has looked in the right places, almost certainly be based on some concept of syntactic transformation.

Chapter 4: Toward a translator writing system

The general progress of this dissertation has been from the abstract to the concrete. Chapter 2 presented a general consideration of translation while establishing the importance of syntactic structure and the pervasiveness of the concept of transformation in any discussion of the subject. Chapter 3 presented a series of specific problems in translation within the general framework of syntactic graph transformation, but left many details of implementation in the air.
This chapter is intended to make concrete many of the issues that were deliberately left abstract in the preceding chapters: this does not mean that it will take the form of a user manual for an actual programming language, but it does mean that it will be made clear how such a language might be designed. It will concern itself especially with the considerations that would necessarily be brought to such a design, and with the conclusions those considerations would probably lead to. Where appropriate, algorithms guaranteeing its efficient implementation will be presented.

It is both too early and too distracting to present a programming language or translator writing system here. This dissertation is intended primarily to define and bring into its traditional context a technique for describing translations that has been, as shown above, inherent in many earlier discussions, but was never given a comprehensive treatment that would demonstrate its considerable descriptive power. That such a translator writing system is possible, and that it would live up to the claims made for it here, is supported to some extent by empirical evidence which will be discussed in section 4.4; but the main support for this technique lies in its ability, as demonstrated in chapter 3, to give uniform expression to a wide range of translation-related topics.
Additional empirical evidence, presumably in the form of a more firmly defined translation language and a larger, more "realistic" range of actual applications, would not necessarily have lent extra support to the primary thesis. On the contrary, such evidence would have required a more detailed presentation and, hence, the definition of an actual programming language which might well have distracted attention from the core ideas by introducing questions of style and even taste in language. To avoid this excess of detail, therefore, and to keep the discussion to a necessary minimum for the purpose of convincing the reader that the ideas presented in chapter 3 are feasible, the subsequent presentation is limited to giving concrete form only to such aspects of translation as might be given form in a translator writing language. The discussion will focus on the advantages these forms give, and the ways in which they might be implemented in a practical manner.
The remainder of this chapter is divided into four parts: section 4.1 outlines the form of a translator written in terms of graph transformations; section 4.2 treats specific questions of representation (of graphs and graph patterns), of implementation (of pattern matching and transformation), and of program structuring in a graph transducer language; section 4.3 presents an overall strategy for structuring translators, both for purposes of simplifying translations and, it follows, for directing and considering the implementation of a translator; and section 4.4 presents actual experience with a primitive implementation of a graph transforming translator, and the effect this had on the presented design.

4.1 Overall structure of the system

One tends, in reading computer program documentation, to ignore the word "system" as a useful bit of noise referring, in a vague and general way, to the program being described. Its use in the term "translator writing system" is, however, generally correct, since many TWSs are not single programs, but systems of separate programs to perform the separate functions of translation.
Feldman and Gries make this quite clear: "Since compiler writing is a large programming task with many aspects, it is not surprising that many different techniques have been proposed as aids to compiler writers. In a very real sense, any system feature (e.g. trace, edit) which helps one produce large programs is a compiler-writing tool" ([39], p 78). It is therefore difficult to define precisely where the TWS ends and the operating system begins; it may be part of a larger program development system that is integrated but is not itself a complete operating system.^1

Part of the reason translator writing tools tend to separate into individual programs lies in the differences in the languages traditionally used to express their different functions: syntax is most often, and to great profit, described in a variant of BNF; semantics (everything else) is most often described in a general purpose programming language (for example, XPL [84] and TRUST [133]), and where the two are combined, as in many syntax directed translation techniques, the usual programming problems involving an integration of logically separate functions will tend to crop up.
Sandewall separates the functions of parsing, compiling or interpreting, and program printing, especially because, in a program development system, these may need to be applied in some order other than the traditional sequence in a compiler ([110], pp 37-38); McKeeman's discussion of "vertical fragmentation" in compiler construction exaggerates, but also thereby emphasizes the separability of compiler functions ([85], pp 20-27).

^1 Sandewall: "Programming system, is used [here] to mean an integrated piece of software which is used to support program development, including but not restricted to a compiler" ([110], p 35).

Simply, a translator consists of the following sequence of programs:

    parser
    transducer (one or more in sequence)
    flattener

A parser is any program which takes a textual representation and, if it is correctly formed, turns it into an internal representation; transducers change one internal representation into another, and represent the actual process of translation; finally, a flattener is any program that takes a standard internal representation, usually a non-linear structure (hence the name "flattener"), and produces some form of interpretive code, machine code, or even external representation.
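The three-part structure just described can be pictured with a deliberately tiny sketch. Everything in it is an assumption made for illustration: the source language (sums of integers and names), the internal representation (nested tuples), the single transducer (constant folding), and the stack-code output are hypothetical stand-ins, not the representation or operators developed in this dissertation.

```python
def parse(text):
    """Parser: text -> internal representation, or SyntaxError if ill formed."""
    tree = None
    for term in text.split("+"):
        term = term.strip()
        if term.isdecimal():
            leaf = ("num", int(term))
        elif term.isidentifier():
            leaf = ("var", term)
        else:
            raise SyntaxError(f"bad operand: {term!r}")
        tree = leaf if tree is None else ("add", tree, leaf)
    return tree


def fold_constants(tree):
    """Transducer: one internal representation -> another (constant folding)."""
    if tree[0] != "add":
        return tree
    left, right = fold_constants(tree[1]), fold_constants(tree[2])
    if left[0] == "num" and right[0] == "num":
        return ("num", left[1] + right[1])
    return ("add", left, right)


def flatten(tree):
    """Flattener: non-linear structure -> linear code for a stack machine."""
    if tree[0] == "add":
        return flatten(tree[1]) + flatten(tree[2]) + ["ADD"]
    return [f"PUSH {tree[1]}"]


def translate(text, transducers):
    """Parser, then zero or more transducers in sequence, then flattener."""
    tree = parse(text)
    for transducer in transducers:
        tree = transducer(tree)
    return flatten(tree)


if __name__ == "__main__":
    print(translate("1 + 2 + x", [fold_constants]))
```

Because each stage only consumes and produces the internal representation, transducers can be added, removed, or reordered without touching the parser or the flattener, which is the separability the text emphasizes.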
Of these, only the transducers are of direct interest to this dissertation: by and large, the emphasis in the past, in any discussion of translation, has been heavily on the side of parsing techniques, and it is out of place here, in what is intended to be a discussion of a general purpose technique for translation, to spend more than the necessary minimum time on parsing.

However, in spite of the fact that what is being developed in this dissertation is not a syntax directed technique for the expression of translation, it is nevertheless a technique highly dependent on a syntactic analysis of the program under translation. Before we can develop a language for the expression of graph transformation, therefore, and apply this to the translation of programming languages, we must first consider syntactic recognition and acceptance to a certain level of detail. In particular, the argument presented in chapter 3 depends entirely on the assumption that the program under translation has received the benefit of a complete parse, in which all the context sensitive aspects of program acceptance have been dealt with as well as the context free structure, and that both these syntactic domains are completely and adequately representable in a graph structure, without recourse to external data structures like symbol tables.
It has been remarked before in this dissertation that the context sensitive aspects of programming languages are traditionally dealt with in the form of a symbol table. Furthermore, in the absence of a sufficiently expressive language for syntactic definition, the symbol table will be the only recourse, and it must be stressed that any implementation of the ideas developed in this chapter will probably depend on some form of symbol table to implement context sensitive recognition. However, in the design of a programming language -- in the case of the present discussion, a language for the programming of translators -- it is particularly important that its component concepts be raised to their highest level of abstraction in order to reduce the intellectual effort required both to program in it and to read the resulting programs as algorithms. For example, most programming languages since Algol 60 have used the concept of a stack, a concept that requires considerable mathematical sophistication to express rigorously [59, 79], in order to implement the concept of name scoping; nevertheless, the concept of name scoping, or locality of variables, has been incorporated in these programming languages in a way that does not require the introduction first of the concept of a stack.
Similarly, just as BNF is a long-proven abstraction for syntactic definition that is conceptually cleaner than the same definition presented in terms of a parser written in a general purpose programming language, so an extension of BNF is used here to define the relationship between context sensitive syntax and syntactic graph representations in a manner that is cleaner than writing a tree touring algorithm to perform the same function.^2

The basis for our discussion will be augmented grammars, whose rules are of the form ([87], pp 250-265):

    <dcl train>↓symbol-table1 ↑symbol-table2 ::=
          <declaration>↓symbol-table1 ↑symbol-table2
        | <declaration>↓symbol-table1 ↑symbol-table3 ";"
          <dcl train>↓symbol-table3 ↑symbol-table2 .

This is a pure augmented rule which, as indicated on page 35, may be thought of as a recursive recognition procedure with parameters symbol-table1 and symbol-table2, and local variable symbol-table3. Symbols following a downward arrow, "↓", represent inherited attributes; symbols following an upward arrow, "↑", represent synthesized attributes.
In terms of a recursive recognition procedure, the left hand side's attributes are like formal parameters where, in the language of Algol W, the inherited attributes are value parameters and the synthesized attributes are result parameters; on the right hand side these attributes are like variables whose values are, if inherited, passed to, or, if synthesized, returned by the expansion (recursive call) of the nonterminal symbol to which they are attached.

^2 This insight was derived from experience with the translator discussed in section 4.4.1, which was implemented without benefit of such a parser and had, therefore, to perform the translation of context free tree into context sensitive graph itself: this transformation required an entire pass, which was as large as the translation pass that followed.

In practice, these rules are further augmented with predicates, which must evaluate to true for the rule to succeed, and with procedure calls which perform "semantic" actions. Bochmann has shown that the number of passes necessary to evaluate an attribute grammar can be derived from the grammar (it may be unbounded), and that regular context free parsing techniques are applicable to such grammars [15]. Obviously, if semantic actions can be written in a general purpose language, the result is a syntax directed translator.
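The reading of an augmented rule as a recursive recognition procedure can be sketched for the `<dcl train>` rule given earlier. The sketch below is illustrative only and rests on several assumptions: the input is a pre-split token list, a declaration is a type word (`real` or `int`) followed by a name, and a symbol table is a Python dictionary. Inherited attributes appear as arguments and synthesized attributes as returned values; the function names are hypothetical.

```python
def declaration(tokens, pos, table_in):
    """<declaration> with inherited table_in; returns (new position, table_out)."""
    if pos + 1 >= len(tokens) or tokens[pos] not in ("real", "int"):
        raise SyntaxError(f"declaration expected at token {pos}")
    type_name, name = tokens[pos], tokens[pos + 1]
    if not name.isidentifier():
        raise SyntaxError(f"name expected at token {pos + 1}")
    table_out = dict(table_in)       # the inherited value is never modified
    table_out[name] = type_name      # the synthesized value carries the addition
    return pos + 2, table_out


def dcl_train(tokens, pos, symbol_table1):
    """<dcl train> with inherited symbol_table1; the returned table plays the
    role of the synthesized symbol-table2."""
    # Both alternatives begin with <declaration>; its result is symbol-table3
    # in the second alternative and symbol-table2 in the first.
    pos, symbol_table3 = declaration(tokens, pos, symbol_table1)
    if pos < len(tokens) and tokens[pos] == ";":
        # Second alternative: pass symbol-table3 down to the recursive call.
        return dcl_train(tokens, pos + 1, symbol_table3)
    return pos, symbol_table3


if __name__ == "__main__":
    print(dcl_train("int a ; real b".split(), 0, {}))
```

Copying the table on each declaration is a choice made here to keep the value-parameter and result-parameter roles visibly separate; it is not meant as an efficient implementation.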
However, that is overstepping the bounds of the augmented grammar technique's usefulness in its present context (and, if my hypothesis is correct, in any context), and I shall propose here a restricted set of operations and predicates that will make it possible to define, in a relatively succinct fashion, the relationship of the grammar of a language to its abstract representation as a syntax graph. Consider again the language L, first presented on page 29:

    (1)  L     ::= BLOCK
    (2)  BLOCK ::= begin DCLS ; STMS end
    (3a) DCLS  ::= TYPE VAR
    (3b)         | TYPE VAR ; DCLS
    (4)  TYPE  ::= real | int | ...
    (5)  VAR   ::= a | b | ... | z
    (6a) STMS  ::= STM
    (6b)         | STM ; STMS
    (7a) STM   ::= VAR
    (7b)         | BLOCK

A simple augmentation of this grammar is capable of defining both the form of the abstract syntax tree that represents it and the relationships between the definitions and uses of symbols:

    <L>↑tree ::= <Block>↓{} ↑tree .

    <Block>↓symbols ↑tree ::=
        "begin" [locsymbols := copy(symbols)]
        <Dcls>↓locsymbols ↑dcls ";"
        <Stms>↓locsymbols ↑stms "end"
        [tree := "block"<dcls,stms>] .

    <Dcls>↓symbols ↑tree ::=
          <Dcl>↓symbols ↑tree
        | <Dcl>↓symbols ↑dcl ";" <Dcls>↓symbols ↑dcls
          [tree := "dcls"<dcl,dcls>] .

    <Dcl>↓symbols ↑tree ::=
        <Type>↑type <Var>↑var
        [tree := "var"<type,var> ; symbols@var := tree] .

    <Type>↑type ::=
          "real" [type := "real"]
        | "int"  [type := "int"] .

    <Var>↑var ::=
          "a" [var := "a"]
        | "b" [var := "b"]
          ...
        | "z" [var := "z"] .

    <Stms>↓symbols ↑tree ::=
          <Stm>↓symbols ↑tree
        | <Stm>↓symbols ↑stm ";" <Stms>↓symbols ↑stms
          [tree := "stms"<stm,stms>] .

    <Stm>↓symbols ↑tree ::=
          <Var>↑var [tree := symbols@var]
        | <Block>↓symbols ↑tree .

The primary rules are augmented with the inherited attribute "symbols", which represents a symbol table, and the synthesised attribute "tree", which represents the abstract parse tree; it will, in fact, be a DAG. The symbol table is accessed by the subscripting notation

    symboltable@symbol

which refers to the current definition of the "symbol" in the table. The only other notation is that of assignment, and a simple list notation in which a tree of the form

        block
       /     \
    dcls     stms

is represented as

    "block"<dcls,stms>

The function copy creates a copy of the structure represented by its argument. It is used here to ensure that blocks inherit all variables declared global to the block, but use only a local copy of the symbol table to declare local variables, so that these do not return to the higher level blocks. A stacking mechanism would be a more efficient implementation of this (see page 31), but would have increased the size of this definition slightly; for purposes of a definition, the two are equivalent.

4.2 Toward a graph transducer language

In chapter 3 we examined many different cases of translation and translation related concerns which would be expressed in terms of graph transformations.
We saw that even such t r a n s l a t i o n methodologies as d i v i d i n g the t r a n s l a t o r up i n t o a sequence of independent stages c o u l d be d i s c u s s e d in terms of t r a n s f o r m a t i o n , where each stage i s implemented as a graph trans d u c e r . Many d e t a i l s of the a c t u a l e x p r e s s i o n of t r a n s l a t i o n algorithms were l e f t to the reader's imagination as being i n c i d e n t a l to an a b s t r a c t d i s c u s s i o n of t r a n s l a t i o n such as presented there. However, i t i s now ? time to c o n s i d e r at l e a s t some of these d e t a i l s , both as to the ex p r e s s i o n of graph t r a n s f o r m a t i o n s (and s y n t a c t i c t r a n s d u c t i o n s in general) and the reasonable implementation of these concepts, so that there can be no q u e s t i o n that the t r a n s f o r m a t i o n a l approach to t r a n s l a t i o n i s p r a c t i c a l as w e l l as e l e g a n t . The d e t a i l s are addressed i n three stages, each b u i l d i n g on. the preceding one: s e c t i o n 4.2.1 c o n s i d e r s the d e f i n i t i o n of graphs and graph p a t t e r n s i n a program; s e c t i o n 4.2.2 d e f i n e s a graph p a t t e r n matching operator and give s an a l g o r i t h m f o r i t s e f f i c i e n t implementation; and s e c t i o n 4.2.3 d e f i n e s a graph t r a n s f o r m a t i o n o p e r a t o r . 3 S e c t i o n 4.3 c o n s i d e r s a means of 3 In chapter 3, the two f u n c t i o n s were combined i n the s i n g l e operator "=>"; here they w i l l be f i r s t separated, f o r purposes of d i s c u s s i o n , and then recombined by means of a u n i f y i n g c o n s t r u c t . 
combining individual transformations into a graph transducer by means of what practical experience suggests to be a natural, and current program correctness concerns suggest to be a well formed, programming construct. Where appropriate, implementation algorithms are described, and historical background is surveyed.

4.2.1 Describing graphs in programs

Throughout chapter 3, we represented graphs pictorially. In one sense, this is the most natural representation for a graph (a picture, as the anonymous wise man said, is worth a thousand words) and it served us well enough in communicating the basic concepts of graph transformation. There is a severe problem with this representation, however, if the representation is to be embedded in a programming language: programs are, with some exceptions, strings of text. Some early programming languages were line oriented, and there are occasional suggestions for a return to this form, especially in giving two dimensional representations to certain constructs, like sub- and superscripts:

             2    2
    A    := x  + y
     i,j     i    j

(which, admittedly, is more like natural mathematical notation than "A(i,j) := x(i)**2 + y(j)**2"). However, most of our compiler technology, especially the immense body of lexical scanner and parser technology built up in the wake of Algol 60, depends on a stream orientation for languages, and we depart from this orientation only at the risk of losing that technology.
An interesting example of this need for "linearization" may be found in parser description techniques. Although, as we determined in section 2.1.1, BNF is a simple and powerful language for the expression of syntax, it has certain limitations when it comes to producing efficient parsers for arbitrary BNF specifications, limitations that have proved very difficult to characterize except through an algorithm for generating parsers from BNF descriptions (footnote 4). However, recent years have seen the comeback of transition diagram representations of grammars, a method that dates back to the early sixties [27]. A grammar of the form

    A : := P A' ::= RA PA' OA

may be expressed diagrammatically as

    [transition diagram for A: an arc through P, with return arcs through R and Q]

This notation was put to use systematically in the definition of Pascal [64], and several "languages" have been invented since to express these diagrams in linear form for parser generators, all resembling the relationship between flowcharts and (goto replete) programming languages like Fortran. A recent (1979) use was in extending such diagrams to "semantic" processing of programming languages [28], an extension to syntax diagrams of the syntax directed translation technique.

Footnote 4: Compare Aho and Ullman [3], chapter 6. This problem is analogous to the problem of giving precise descriptions of programming language semantics without resorting to specifying a compiler.
Since programming language syntax is highly hierarchical (a result of its being almost context free), a primarily list structured representation, with a few labels and "gotos", would suffice to represent syntax graphs. Such a representation, which was introduced as a tree description language in the augmented grammar of section 4.1, was used as the basis for a preliminary graph representation language. This proved to be relatively pleasant to work with, primarily because the number of labels and "gotos" tended to be small: for example, the parse graphs of L programs were described in section 4.1 without once resorting to a label. A representation of the if-then pattern on page 70 would look like this, for instance:

    L: "if-then" < c, "seq" < s, "goto" < !L > > >

(Here the "L:" construct labels a node, and the "!L" construct represents an arc to the node so labelled.)

The other major approach to linearizing complex patterns is to build them up out of simpler subparts, in a manner analogous to the way in which BNF specifications define an entire language by recursively defining its basic parts. Pfaltz and Rosenfeld [98] present a grammar of picture components as a basis for formally describing the manipulation of complexly related objects (games, surfaces, biological systems). Shaw's Picture Description Language [116] is a calculus for forming pictures out of simple basic components like oriented lines, arcs, etc.
Feder [38] uses a similar description language for pictures, based on grammars, and investigates its formal properties in relation to string grammars. Sugito et al. [121] present a graph manipulation language based on Pfaltz and Rosenfeld's web grammars, while Shapiro and Baron [115] do the same in a calculus resembling Shaw's. Knowledge representation languages depending on complex relationships between components are often linearized in this fashion [14], and have in turn influenced the form of TCOL's linear representation language for programming language constructs [78].

Although these representations are generally adequate and many can be shown to be complete in the sense that they are able to represent any programming language construct, or indeed any construct generable by a context sensitive grammar, they are not as natural to a discussion of programs as the linear text of the program they are ultimately intended to represent. For example, the expression

    L: if c then s; goto L fi

is more natural than the graph pattern above, or even the graph that pattern linearizes. At the same time, while the program is natural, when it is used in a transformation, as in its original setting on page 70,

    while c do s od => L: if c then s; goto L fi

it introduces the questions: What is the structural relationship of this fragment to any program it is a part of, before and after the transformation? What relationship, if any, does the label L have to other labels in the rest of the program?
and What syntactic entities do the variables c and s represent? (The last question relates to the syntactic macro problem discussed in section 3.1.1.)

The answers to these questions lie in the complete syntactic definition of the language. Just as the human programmer "knows" (sometimes incorrectly) the syntactic structure of such a program fragment when he is writing it down or reading it, whereas the compiler must painstakingly eke out this structure (always, we expect, correctly), so too the human author of a translator knows what he means by such a transformation, whereas the translator writing system must have the wherewithal to acquire this knowledge from a "declaration" of the syntax involved. We would like a mechanism for this declaration, therefore: we would like to be able to say that "while c do s od" is a program fragment representing a syntactic entity; to declare the precise structure of that entity; to declare the same information about the construct "L: if c then s; goto L fi"; and to present these declarations in a form that makes it possible to extract the answers to our questions about the translation defined above.

To be more specific, let us consider the following program as a specification of the above translation:

    begin
    language Struct is
      <stm>↑tree ::= "while" <exp>↑exp "do" <stm>↑stm "od"  [tree := "while"<exp,stm>]
                   | "stm"  [tree := "stm"] .
      <exp>↑tree ::= "exp"  [tree := "exp"]
    end;
    language Unstruct is
      <stm>↓ids ↑tree ::= <id>↑id ":" <stm>↓ids ↑tree  [ids@id := tree]
                        | "if" <exp>↑exp "then" <stms>↓ids ↑stms "fi"  [tree := "if"<exp,stms>]
                        | "goto" <id>↑id  [tree := "goto"<ids@id>]
                        | "stm"  [tree := "stm"] .
      <stms>↓ids ↑tree ::= <stm>↓ids ↑tree
                         | <stm>↓ids ↑stm ";" <stms>↓ids ↑stms  [tree := "stms"<stm,stms>] .
      <exp>↑tree ::= "exp"  [tree := "exp"] .
      <id>↑id ::= [defined lexically]
    end;
    Struct: { while exp do stm od } => Unstruct: { L: if exp then stm; goto L fi }
    end

Two of the questions asked in connection with the example on page 135 are answered here: the structural relationship of the two fragments with any containing structure is defined by the syntax of the two languages in conjunction with the rules for graph replacement defined in chapter 3; and the relationship of the label "L" in the target to any other labels "L" in the containing program is made explicit by the complete syntactic definition of Unstruct, also in conjunction with the rules for graph replacement (there is no such relationship).

What has occurred here is a process of abstraction: the programmer defines once for all what he means when he says a piece of text (enclosed in braces) is a program in language Struct, say; he can then use such pieces of text as abstract entities, knowing fully that, underneath, the compiler has translated his text into a graph representation, and that this is what he is really manipulating.
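To make concrete what kind of graph the two brace-enclosed fragments stand for, the following Python sketch builds both by hand. It is an illustration only, not the thesis's implementation: the `Node` class and the two `build_*` functions are hypothetical names introduced here, and the parsing step is skipped entirely. The point it shows is that the label L leaves no node of its own behind: the `ids@id` lookup turns `goto L` into an arc back to the labelled `if` node, so the Unstruct fragment is a cyclic graph rather than a tree.

```python
class Node:
    """A graph node: a label plus an ordered list of child nodes (hypothetical helper)."""

    def __init__(self, label, children=None):
        self.label = label
        self.children = list(children or [])


def build_struct_while():
    # Struct: { while exp do stm od }  gives  "while"<exp, stm>
    return Node("while", [Node("exp"), Node("stm")])


def build_unstruct_loop():
    # Unstruct: { L: if exp then stm; goto L fi }
    goto = Node("goto")
    stms = Node("stms", [Node("stm"), goto])   # [tree := "stms"<stm,stms>]
    if_node = Node("if", [Node("exp"), stms])  # [tree := "if"<exp,stms>]
    ids = {"L": if_node}                       # [ids@id := tree]
    goto.children.append(ids["L"])             # [tree := "goto"<ids@id>]
    return if_node


loop = build_unstruct_loop()
# The goto's only child is the labelled node itself, closing a cycle.
print(loop.children[1].children[1].children[0] is loop)  # True
```

Because `ids` is local to one call of `build_unstruct_loop`, a label L elsewhere in a containing program would live in a different table, which is one way of reading the remark that there is no relationship between the two.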
This notation for program fragments will be extended, in the next section, to deal with graph patterns, by turning it into a kind of typed lambda notation, and thus answer the third of our questions about the syntactic transformation example, namely, What do "c" and "s" represent?

At this point we have come through the full circle described in the introduction. From an analysis of earlier translation efforts we concluded, in chapter 2, that there was something innately attractive about the view of translation that treats it as a textual transformation, but that this attractive vantage is often obscured behind the very real limitations of string, and even context free (tree), patterns in dealing with the full range of translation concerns. In chapter 3 we examined a graph transformation formalism which, it was demonstrated by examples, promised to give just such unambiguous expression to this full range of translation concerns. However, the graphs, it was readily perceived, presented many of the same problems in representing translation as do trees in representing program structure, or flowcharts in representing control flow.
Only now, by the promised process of abstracting the underlying graph representation in such a way that the program fragments can once again be represented as strings of text, have we been able to have it both ways: on the one hand we have a rigorously defined program structure, and on the other we have a translation superficially defined in terms of a textual transformation.

The only question remaining is a practical one: We know that there are parsing techniques for complete programs in a language, but how can we be sure that program fragments are generally parsable? The answer is twofold, once bottom up and once top down: first, assuming the fragment reduces to a single nonterminal (a highly reasonable assumption for translation, where the syntax directed translation model has thrived under exactly this assumption), any bottom up method will produce such a nonterminal for such a fragment, provided it is capable of parsing the language, whether or not it is embedded in a larger program (i.e., whether the nonterminal is part of a larger reduction, or an end product); second, a general top down parsing strategy first defined by Earley (footnote 5) will generalize to take any number of nonterminals, up to the entire nonterminal vocabulary, as its starting point, and is therefore guaranteed to find a parse for even a program fragment.

Footnote 5: Earley's paper [36] is a bit vague; the best description to my knowledge is by Graham and Harrison [49].
4.2.2 Graph pattern matching

The preceding section was concerned with presenting an abstraction of the concept of syntactic structure. This abstraction was such that the relationship of the structure to the written form of a language could be defined once, in all its finicky detail, and subsequently constructs in the language could be written down in a natural "high level" formulation without losing the precision inherent in the abstracted structure. We examined one example of a syntactic transformation amounting to a translation, in the expression

Footnote 5, continued: Earley's algorithm is notorious for being costly: in the worst case, that of an ambiguous grammar, it operates in time proportional to the cube of the length of the input, O(n^3); only for deterministic grammars, which are not always the most convenient description of the language, does it operate in time proportional to the length of the input, O(n); and for most grammars, the unambiguous but nondeterministic grammars in which most programming languages are defined, its complexity is O(n^2). Furthermore, none of these measures takes into account the enormous overhead involved in performing the computation described by this algorithm. However, we are not talking here about input strings of upward of a thousand symbols, such as normal programs, but of tiny program fragments of from ten to fifty symbols, and for these Earley's algorithm is quite acceptable. (An extremely recent contribution to general context free parsing by Graham et al.
[50] generalizes the work of Earley and others, and promises lower overhead; the worst-case complexity remains the same, however.)

    Struct: { while exp do stm od } => Unstruct: { L: if exp then stm; goto L fi }

Here, the material in the braces represents two program fragments in a not very useful pair of programming languages. In general, however, we will want to express not merely program fragments, but patterns in the language, so that translations of the (possibly infinite) set of all programs in the language can be expressed in a finite (indeed small) set of transformations. Referring again to the above example, we want to translate not only the program fragment "while exp do stm od" in this manner, but all program fragments of its syntactic form, where "exp" and "stm" stand for arbitrarily complex sub-fragments of the syntactic forms <exp> and <stm>, respectively; i.e., "exp" and "stm" are to become variables in a transformation pattern.

The use of variables in this context is analogous to the use of variables in procedure definition. The definition

    f = lambda (x) (x+x)

defines a kind of pattern for any program in which f occurs, and defines as well what effect f has on its environment, by using the variable x, representing f's immediate successor, or argument, in its definition.
For example, the expression

    a + f b - c

is, by the above definition, equivalent to the expression

    a + (b + b) - c

Similarly, the expression

    Struct (e, s) : { while e do s od }

represents a syntactic pattern in the language Struct, where "e" and "s" act as variables, such that the pattern matches any of the following expressions:

    while exp do stm od
    while exp do while exp do stm od od
    while exp do while exp do while exp do stm od od od
    etc.

However, we are up against the serious question: How do e and s relate to the syntax of the language? Specifically, how do we restrict the pattern to matching

    while exp do while exp do stm od od
          |_|    |_________________|
           e              s

and not

    while exp do while exp do stm od od
          |______________|    |____|
                 e              s

(the syntactic macro problem discussed in section 3.1.1)? This question has important ramifications if the abstraction is to work correctly, since it depends on there being known parsing techniques that are able to handle its details. It is resolved by resorting once again to the analogy with procedure definition: if we have to define the procedure f so that it will operate not only on integer and real values, but on Boolean values as well, and if we assume that the operator "+" is not defined for Boolean arguments, we must define the procedure with respect to the type of its argument (footnote 6):

Footnote 6: Algol 68 and Smalltalk are examples of languages providing this kind of abstraction facility. The language Ada calls this type of definition overloading, and provides for it in its definition.
    f = lambda (int x) (x + x)
    f = lambda (Bool x) (x or x)

In syntactic pattern matching, the syntactic definition provides not only an abstraction for the underlying syntactic structure of a program segment, but also a set of syntactic types defined by the nonterminal symbols in the defining grammar. Our pattern now becomes

    Struct (<exp> e, <stm> s) : { while e do s od }

which is unambiguous. In effect, the abstraction mechanism is asked to discover the underlying syntactic structure of this program fragment by applying a parser for language Struct to the sentential form

    while <exp> do <stm> od

which, it can determine with no difficulty, is a production of the symbol <stm>.

Having defined a pattern description terminology, we can now turn our attention to pattern matching. Let us define a pattern matching operator to be a Boolean operator such that

    G :: P

is true if graph G matches pattern P; that is to say, if there is a subgraph of G, rooted at the same node as G, which has the same "shape" as P and, insofar as the nodes of P are labelled, has the same labels on corresponding nodes. Thus,

    Struct: { while exp do stm od } :: Struct (<exp> e) : { while e do stm od }

is true, while

    Struct: { while exp do while exp do stm od od } :: Struct (<exp> e) : { while e do stm od }

is false. On the other hand,

    Struct: { while exp do while exp do stm od od } :: Struct (<exp> e, <stm> stm) : { while e do stm od }

is true.
The only practical consideration remaining in this discussion of graph pattern matching is the efficient implementation of a pattern matching algorithm. In general, the problem of pattern matching over the set of directed graphs is NP-complete (footnote 7), but the following algorithm will demonstrate that, given the syntax graphs used here, and reasonable restrictions on the match, rooting it at the defining nodes of the subject and pattern graphs, the complexity of the problem is proportional to the number of nodes in the pattern graph.

Footnote 7: Garey and Johnson [45], p 202: Not only is determining whether a graph G contains a subgraph isomorphic to another graph P NP-complete, but with the restriction on G that it be a directed acyclic graph, and on P that it be a tree, the problem is NP-complete as well. These restrictions are far more stringent than those we will be willing to impose on syntax graphs and their pattern matches.

Preliminaries

The following is an algorithm for the evaluation of "S::P", where S is a node defining a graph (K,G,S) and P is a node defining a graph (L,H,P), as defined in chapter 3. P, the pattern, may define a pattern graph (one containing terminal nodes with special symbols); S, the subject, never defines a pattern graph. The algorithm is recursive, and is performed in the following environment:

V is a set of variable symbols used, if at all, in labelling nodes of P only.

A is an initially empty set which will accumulate variable assignments during the pattern match.
Thus, where the pattern contains a node labelled by a variable symbol v ∈ V, and this node is matched against a node in the subject, the graph defined by the subject node s is "assigned" to v by being associated with it, as (v,s) ∈ A, in step 3 of the algorithm.

T is an initially empty set which prevents cycles in the match by recording, in step 6, that a node p of the pattern graph has been matched with a node s of the subject graph, as (p,s) ∈ T; steps 1 and 2 of the algorithm prevent any attempt either to "rematch" p with another node s' ≠ s once it has been matched with s, or to go through the recursive match on p's children a second time.

The algorithm consists of seven steps at any one of which the match may succeed or fail, and the algorithm thus terminate, depending on a condition tested at that step. Thus, at step n, we can always assume that the conditions determining steps 1 through n-1 have tested false.

Algorithm (S :: P)

(1) If (P,S) ∈ T, the match succeeds. (Recall that T is initially empty. In this case, the pattern node P has been visited before (see step 6) and was then also matched against the subject node S. We can assume, therefore, that a recursive match of the children of S and P will tell us nothing new, and would, in fact, cause the match S::P to recur infinitely.)

(2) If ∃ n ≠ S such that (P,n) ∈ T, or ∃ m ≠ P such that (m,S) ∈ T, the match fails. (In this case, the pattern node P has been visited before, but was matched against some other node in the subject graph, or the subject node S has been visited before, but was matched against some other node in the pattern graph.
Since there cannot be two such matches, this one must fail.)

(3) If L(P) = v ∈ V, then A ← A ∪ { (v,S) }, and the match succeeds. (In this case, the pattern graph was a variable, represented by v. The set A is updated to indicate that v has been assigned the graph defined by S, pending successful completion of the entire pattern match.)

(4) If L(P) ≠ L(S), the match fails. (In this case, P and S have different labels; that is, the graphs defined by nodes P and S have different roots, and therefore do not match.)

(5) If ∃ n,m ∈ N such that G(S,n) ≠ 0 and G(S,n+1) = 0 and H(P,m) ≠ 0 and H(P,m+1) = 0 and n ≠ m, then the match fails. (In this case, P and S have different numbers of children.)

(6) (At this point the nodes P and S have been found to be the same: P is not labelled with a variable, and has not been visited before; P and S have the same labels and the same number of children. It remains only to compare their children, in order of age.)
    T ← T ∪ { (P,S) }, and i ← 1, and
    while G(S,i) ≠ 0 do:
        if G(S,i) :: H(P,i) fails then S::P fails,
        else i ← i + 1.
(Set T records that P has been visited and was matched with node S: this prevents an advance beyond step 2 in future visits to P, and thus prevents infinite loops. The match then proceeds recursively, matching children of S and P, in order, stopping as soon as two children fail to match.)

(7) The match succeeds.

Cleaning up

Once the match is over, the set A contains a pairing of assignments of nodes to variables. If the match was successful, these assignments need to be actually performed, and sets A and T can then be made empty sets again for the next pattern match.
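The seven steps can be followed in the Python sketch below. It is a reading of the algorithm under stated assumptions, not the thesis's implementation: `Node`, `matches`, and the two dictionaries standing in for the set T are names introduced here; graphs are taken to be already parsed, so the syntactic types of the variables (<exp>, <stm>) are not checked; and A is kept as a dictionary, so a variable labelling two pattern nodes silently keeps only its last assignment.

```python
class Node:
    """A graph node: a label plus an ordered list of children (hypothetical helper)."""

    def __init__(self, label, children=None):
        self.label = label
        self.children = list(children or [])


def matches(subject, pattern, variables):
    """Evaluate subject :: pattern; return the assignments A, or None on failure."""
    A = {}               # variable label -> subject node
    t_by_pattern = {}    # the set T, indexed by pattern node identity
    t_by_subject = {}    # the set T, indexed by subject node identity

    def go(S, P):
        if t_by_pattern.get(id(P)) is S:                      # step 1
            return True
        if id(P) in t_by_pattern or id(S) in t_by_subject:    # step 2
            return False
        if P.label in variables:                              # step 3
            A[P.label] = S
            return True
        if P.label != S.label:                                # step 4
            return False
        if len(P.children) != len(S.children):                # step 5
            return False
        t_by_pattern[id(P)] = S                               # step 6
        t_by_subject[id(S)] = P
        # all() stops at the first pair of children that fails to match;
        # reaching the end of the children is step 7.
        return all(go(s, p) for s, p in zip(S.children, P.children))

    return A if go(subject, pattern) else None


def while_(e, s):
    return Node("while", [e, s])


subject1 = while_(Node("exp"), Node("stm"))
subject2 = while_(Node("exp"), while_(Node("exp"), Node("stm")))
pattern = while_(Node("e"), Node("stm"))   # { while e do stm od }

print(matches(subject1, pattern, {"e"}) is not None)         # True
print(matches(subject2, pattern, {"e"}) is not None)         # False
print(matches(subject2, pattern, {"e", "stm"}) is not None)  # True
```

The three calls mirror the three examples given for the "::" operator: the same pattern text fails on the nested loop while "stm" is a literal and succeeds once "stm" is declared a variable. Since an empty dictionary is a legitimate successful result, callers should test the return value against `None` rather than for truthiness.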
Notice that no attempt is made to prevent a variable v from being assigned two values in A. This will happen only if v occurs as a label on two distinct nodes of the pattern graph: such a pattern would be ill-formed, and in an actual implementation it would have to be prevented.

The primary reason for presenting this algorithm is to demonstrate that it is possible to restrict graph pattern matching in such a way that the complexity of the matching process is proportional to the size of the pattern graph. The algorithm permits us to make the following assertion:

    In the evaluation of the pattern matching algorithm, no arc of the pattern graph is traversed more than once.

Proof: If the root nodes fail to match, of course, no arcs are traversed, but let us suppose the root nodes match, and a recursive match of the children takes place. In that case (see step 6 of the algorithm) the set T is updated to indicate that the pattern node has been visited (and matched against the subject node), and on any subsequent visit to the pattern node, the algorithm cannot get past step 2, and thus cannot traverse the arcs leading out of the pattern root node a second time. By induction on the number of nodes in a pattern, it follows that no arc can be traversed more than once in a match.
From this it follows (1) that the complexity of a pattern match is proportional to the number of arcs in the pattern graph (because they are traversed at most once), and (2) that all pattern matches terminate (since pattern graphs are finite, and no arc can be traversed more than once).

4.2.3 Graph transformation

We have, so far in section 4.2, progressed to the definition of a syntactic pattern matching facility based on an abstraction of a graph representation of the language's complete syntactic structure. The next stage is a definition of graph transformation and, as in the definition of graph patterns, we will do well to look for existing models on which to base such a definition.

Among the earliest transformational systems are "production" systems and Markov systems, which precede and are closely resembled by both BNF and the Snobol string transformation statement form:

    <subject> <pattern> = <new string>

(the part of the <subject> string matched by the <pattern>, if any, is replaced by the <new string>), and numerous other string and symbol manipulation systems. Floyd defined a symbol manipulation language based on production systems [40], later modified by Evans, which made it possible to program a parser with semantic actions that transform the parse stack. Ledgard [76,77] has extended production systems to a more syntactically structured model, by extending it from string manipulation to tree manipulation, also for purposes of syntax directed translation.
Another major use of a transformational model is in the Vienna Definition Language, where the abstract syntax of a program is represented as a tree with labelled nodes and labelled branches. The operational semantics of a language are defined in terms of interpretive transformations over the abstract syntax of the language. These transformational manipulations are performed by the mu operator, which can (a) add branches to a node, (b) delete branches, and (c) change the subtree connected to a branch. The label on a branch acts as a selector for the subtree at the other end. VDL is a very primitive structure manipulating language modelled on Lisp; for the uses it has been put to, there was, according to Wegner, no need for either more complex syntactic structuring or a more abstract transformational operator ([139], p 12). The reason for this is clear: VDL was intended, not as a programming language, but as a definitional model; as such it had to be small enough to be easily defined in itself, and the least amount of abstraction would have increased the size of such a definition; for the same reasons, a context free model for syntax was the preferable one, because of its simplicity in definition, even though a symbol table technique had then to be implemented in the language in order to deal with the context sensitive aspects of programming languages.
Much of the value of VDL and similar operational models (for instance, Semanol [4]) has been diminished by the subsequent appearance of the denotational model which, less dependent on an underlying computational model, is also less cumbersome.

We have been using, up to now, the operator "=>" to represent transformation. In chapter 3, where this operator and its effect on graphs were defined formally, it combined, for simplicity of notation, the functions of pattern matching and transformation in expressions like

    Pascal: {complex} => Pascal: { record r, i : real end }

(page 70). Here, the pattern consisting of the identifier "complex", matched against a nebulous notion of the subject program, caused a transformation, in the subject program, of the subgraph representing the matched identifier into a subgraph representing the record declaration (an example of the implementation of language extensions). We now have the pattern match operator and its extensions defined below to make specific reference to the subject graph being matched, and can use the operator "=>" purely as a transformation operator.
Thus,

    if program :: Pascal: {complex}
    then program => Pascal: { record r, i : real end }
    fi

This transformation will happen often enough that a convenient shorthand notation may be used, similar to that of Snobol:

    program matching Pascal: {complex}
    => Pascal: { record r, i : real end }

4.3 Using graph transducers in translations

We now have at hand the basic operations of graph pattern matching and graph transformation, and a means of combining them into a single operation, analogous to the Snobol string replacement operation. These two operators, along with the representation of syntactic graph patterns and the syntactic abstraction mechanism defined earlier, present us with enough manipulative power that, if they were appropriately embedded in a standard programming language, it would be possible to write translators, in that language, based on the syntactic transformation model. However, the concept of a "standard" programming language varies rather wildly depending on one's perception of programming; and, since there have been several developments in programming methodology in the last decade that, I believe, bear directly on the transformational view we have been taking of translation, it is worth bringing up these topics here, so that they may shed some light on any considerations we may have for a transformational translator writing system.
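As a rough illustration of the combined match-and-transform operation, the following Python sketch matches a pattern against a subject and replaces every matched subgraph. It is not the implementation described in this thesis: it works on trees rather than general graphs, it ignores syntactic typing of variables, and the names Node, Var, match, build and rewrite, as well as the node labels, are assumptions introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    label: str
    kids: tuple = ()


@dataclass(frozen=True)
class Var:
    name: str  # a pattern variable; binds to whatever subtree it meets


def match(pattern, subject, env=None):
    """Return a dict of variable bindings if pattern matches subject, else None."""
    env = dict(env or {})
    if isinstance(pattern, Var):
        if pattern.name in env and env[pattern.name] != subject:
            return None  # one variable must bind equal subtrees everywhere
        env[pattern.name] = subject
        return env
    if pattern.label != subject.label or len(pattern.kids) != len(subject.kids):
        return None
    for p, s in zip(pattern.kids, subject.kids):
        env = match(p, s, env)
        if env is None:
            return None
    return env


def build(template, env):
    """Instantiate the right hand side with the bindings from the match."""
    if isinstance(template, Var):
        return env[template.name]
    return Node(template.label, tuple(build(k, env) for k in template.kids))


def rewrite(subject, pattern, template):
    """Apply 'subject matching pattern => template' bottom-up over the tree."""
    subject = Node(subject.label,
                   tuple(rewrite(k, pattern, template) for k in subject.kids))
    env = match(pattern, subject)
    return subject if env is None else build(template, env)


# Hypothetical encoding of the "complex" example from the text.
complex_id = Node("id:complex")
record = Node("record", (Node("field:r"), Node("field:i"), Node("type:real")))
program = Node("var", (Node("id:z"), complex_id))
result = rewrite(program, complex_id, record)
```

Here result is the declaration of z with the identifier replaced by the record subtree. The replacement is not itself rescanned, so a rule whose right hand side contains its own pattern cannot loop in this sketch.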
We have already remarked, in section 3.1.3, and will do so again in section 4.4, that transformations often can be structured as a sequence of subtranslations. Particularly, the transformational view of translation, in which one construct is transformed into another construct, can be extended to a higher level view in which a program in one language is transformed into an equivalent program in another language. Experience with a transformational TWS has shown not only that this view is transportable to the design of a translator (the Pascal S translator reported in section 4.4.1 contains two major procedures representing, respectively, the Pascal S to Pcode, and the Pcode to Assembler translations), but that it imposes a highly regular structure on the implementation of each translator step (which we have chosen to call a transducer), such that it is primarily a set of rules for transforming individual syntactic constructs in the subject program. For example (from Kernighan [68]):

    program matching
    Ratfor(<expr> e, <stms> s1, s2):
       { if (e) { s1 } else { s2 } }
    => Fortran(<expr> e, <stms> s1, s2):
       { if (e) goto 10 ; s2 ; goto 20 ; 10 s1 ; 20 continue ; }

    program matching
    Ratfor(<stm> s, <stms> ss):
       { s ; ss }
    => Fortran(<stm> s, <stms> ss):
       { s ; ss ; }

    program matching
    Ratfor(<expr> e, <stms> s):
       { while (e) { s } }
    => Fortran(<expr> e, <stms> s):
       { 10 if (e) goto 20 ; goto 30 ; 20 s ; goto 10 ; 30 continue ; }

    etc.
Fortran expresses itself poorly (note the use of semicolons as newline markers), and there is some question whether anything but an ad hoc parser will handle even this much of the language, but the point is clear: the Ratfor to Fortran transducer consists of a set of independent transformations, and the only thing not specified above, other than the actual grammars for Ratfor and Fortran, is how the transducer is to tour the Ratfor graph to turn it into a Fortran graph.

Fortran statement numbers act like labels in this context. The above translation does not imply that a Ratfor program containing, say, two occurrences of an if statement will be translated into a Fortran program incorrectly containing two occurrences of the statement number 10. The statement numbers merely express a syntactic relationship between the statements represented in these fragments, a relationship that is purely local to the statement fragments. As with labels in earlier examples, they will disappear in the process of turning these fragments into pattern graphs, the translation will transform similarly structured graphs in the same manner, and it will then be the function of the Fortran "flattener", which performs the inverse function of a parser, to assign unique statement numbers to the statements referenced by goto statements.

This transducer-like structure of translators is not surprising: it is a direct result of the structure of the problem, which is in turn determined by the structure of programming languages.
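The role of the flattener in keeping statement numbers local to each fragment can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the flattener of this thesis: a fragment is assumed to be a flat list of (label, text, goto target) triples, and make_flattener, instantiate and WHILE_FRAGMENT are hypothetical names.

```python
import itertools


def make_flattener(start=10, step=10):
    """Return a function that emits a fragment with globally unique numbers."""
    counter = itertools.count(start, step)

    def instantiate(fragment):
        # Statement numbers in a fragment are local: each use of the
        # fragment gets its own mapping from local to fresh numbers.
        mapping = {}

        def fresh(local):
            if local not in mapping:
                mapping[local] = next(counter)
            return mapping[local]

        out = []
        for label, text, target in fragment:
            out.append((fresh(label) if label is not None else None,
                        text,
                        fresh(target) if target is not None else None))
        return out

    return instantiate


# The Fortran side of the while rule: (label, statement text, goto target).
WHILE_FRAGMENT = [
    (10, "if (e) goto", 20),
    (None, "goto", 30),
    (20, "s", None),
    (None, "goto", 10),
    (30, "continue", None),
]

flatten = make_flattener()
first = flatten(WHILE_FRAGMENT)
second = flatten(WHILE_FRAGMENT)
```

Two uses of the same fragment therefore never share a statement number, while the jumps inside each copy still refer to the right statements.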
The syntax directed translation model was not so successful for so long without good reason. Indeed, Rosen has studied tree and DAG transformation systems analogous to string transformation systems like grammars [104,105], and has come up with conditions guaranteeing that a set of transformations will have a unique result, independent of the order in which they are applied, a result analogous to ambiguity results for grammars. Rosen suggests applications in a limited area of translation (proofs of optimization correctness), but the basic principle clearly has potential for wider application, so that various subtasks of translation can be expressed and proved correct in this fashion. Dijkstra has used a similar approach to the expression of algorithms in such a way that they can be easily expanded into remarkably elegant, and provably correct, programs, by use of the guarded command construct [34]. A closely related, and nearly contemporary, paper by Chirica and Martin [23] uses Floyd-Hoare inductive assertions to prove the correctness of translations performed by compilers, using as its basis the notion that a compiler transforms a syntactically complete program fragment in one language into an equivalent program fragment in another language, where two programs may be defined to be equivalent if they perform the same predicate transforming function.
For example, using relationships derived by Floyd, Hoare [42,59] and others, it is possible to show that

    {p} do S until C {q}

holds exactly when

    {p} L: S; if ~C goto L {q}

holds, for all predicates p and q; this implies that the two program fragments are equivalent and that each therefore represents a correct translation, into another idiom, of the other (footnote 8).

Footnote 8: I am using here my own formulation [129], developed independently of, but somewhat later than, Chirica and Martin's, as a proposal (unimplemented) for a translation synthesizer along the lines of Arsac and others [6].

There are also indications, both theoretical [119,104] and practical [17,21], that translators can best be structured as a collection of independent "passes" or transformations, most usually a sequence but, in Sklansky et al. [119], for instance, a network of transducers. DeRemer and Kron [32] have argued that constructing a system differs from the construction of a local process not only in magnitude, but also in the kind of considerations required of the programmer: primarily in dealing with the local processes as well defined "black box" functions.
In connection with this, we may choose to think of programming language transducers as black boxes that can be described by the well known "T diagram" which is one of the legacies of the Uncol project and is useful for describing a bootstrap: a translator from language A to language B, written in language L, can be thought of as a function T(A,B,L), and two such functions can be combined in a prescribed manner to deliver a third such function; the process of building up functions toward a desired result is generally useful in developing a bootstrap sequence. For instance, if we have a translator from language A to language B written in (high level) language L, and we want to implement it on machine M, which has no compiler from L to M, we use Uncol U in a three stage bootstrap as follows:

(1) We define the only meaningful combination of two translators to be

    T(K,L,M)°T(I,J,K) producing T(I,J,L)

(i.e., a K-to-L translator applied to an I-to-J translator written in K produces an I-to-J translator in L).
(2) We write two additional translators, both of which should be, if U is a proper Uncol, much easier to write than the T(A,B,M) translator we desire:

    T(U,M,M) and T(L,U,U)

(3) We can now complete the process by three translator combinations:

    T(U,M,M)°T(L,U,U) producing T(L,U,M)
    T(L,U,M)°T(A,B,L) producing T(A,B,U)
    T(U,M,M)°T(A,B,U) producing T(A,B,M)

Variations of this process have been at the back of much of the use of intermediate languages in portability experiments [93], and in designing complex translators as a sequence of steps [17], or even as a network of translations [9,44] resembling Sklansky et al.'s network of transductions [119]. The importance of this kind of program construction consists in its ability to keep the complexity of programs down: an important asset in the construction of a translator. Thus, once a transducer is written, and we are assured of its correctness by whatever means we choose to depend on (footnote 9), we may safely use it as a "black box" through which a program in language A may pass and emerge transformed as an equivalent program in language B.

Footnote 9: I use the term "assured" advisedly. I am well aware of the immense and thankless task involved in formally proving the correctness of even a small program using existing proof methodologies (compare, in this context, the arguments of DeMillo et al. [31]). However, most conscientious programmers go through some semi-formal process whereby they assure themselves of the correctness of their programs rather than submitting them blindly to the machine to be run; the more a language is designed to simplify this process, therefore, the more effective that language will be as a programming tool.

For the above reasons, it is worth briefly considering the construction and combination of transducers. We introduce, for purposes of discussion, a notation resembling Dijkstra's guarded command notation:

    transform Program matching
       pattern1 => program1
    [] pattern2 => program2
       ...
    [] patternn => programn
    end

The portions indicated by "Program", "pattern" and "program" are expressions evaluating to, respectively, a graph, a graph pattern, and a graph. Free variables introduced on the left hand side of a transformation operator carry on to the right hand side, so that we can now write the Ratfor to Fortran transducer as follows:

    RF := transform program matching
       Ratfor(<expr> e, <stms> s1, s2): { if (e) { s1 } else { s2 } }
          => RF(e); RF(s1); RF(s2); Fortran(...): {...}
    [] Ratfor(<stm> s, <stms> ss): { s ; ss }
          => RF(s); RF(ss); Fortran(...): {...}
    [] Ratfor(<expr> e, <stms> s): { while (e) { s } }
          => RF(e); RF(s); Fortran(...): {...}
    [] etc.
    end

Notice that the transducer is now given a name by being assigned to a variable, and is defined recursively to transform the subparts of the statements. All of this is, of course, shorthand for:

    RF := procedure (program);
       if program :: Ratfor(...): {...}
       then RF(e); ...; RF(s2);
            program => Fortran(...): {...}
       fi;
       if program :: ... etc.
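The shape of such a transducer, a list of guarded rules applied recursively to the subparts of each construct, can be sketched in Python. The sketch rests on several assumptions that are not part of the notation above: programs are encoded as nested tuples with a leading tag, the first matching rule is taken (the text does not fix an order), the output uses a made-up "ifnot_goto" form with fragment-local labels "a" and "b", and the names transducer, tagged and RF are chosen only for this example.

```python
def transducer(rules):
    """Build a procedure that applies the first rule whose guard matches."""
    def run(node):
        for guard, action in rules:
            bindings = guard(node)
            if bindings is not None:
                # Bound variables carry over to the right hand side.
                return action(run, *bindings)
        return node  # no rule applies: leave the subgraph unchanged
    return run


def tagged(tag):
    """Guard matching a tuple node with the given tag; binds its fields."""
    return lambda node: (node[1:]
                         if isinstance(node, tuple) and node and node[0] == tag
                         else None)


RF = transducer([
    # if (e) { s1 } else { s2 }
    (tagged("if"), lambda rf, e, s1, s2: (
        "seq", ("ifnot_goto", rf(e), "a"), rf(s1), ("goto", "b"),
        ("label", "a", rf(s2)), ("label", "b", ("continue",)))),
    # while (e) { s }
    (tagged("while"), lambda rf, e, s: (
        "seq", ("label", "a", ("ifnot_goto", rf(e), "b")), rf(s),
        ("goto", "a"), ("label", "b", ("continue",)))),
    # s ; ss
    (tagged("seq"), lambda rf, *ss: ("seq",) + tuple(rf(s) for s in ss)),
])

translated = RF(("seq", "s0", ("while", "e", ("if", "c", "t", "f"))))
```

Each rule names only its own construct and calls the transducer on the subparts, so the tour of the subject is implied by the recursion and is not written out anywhere.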
The reasons for introducing the new notation are that:

(1) It is cleaner looking, especially because the subject graph appears only once in the transducer, instead of two times the number of transformation rules, and because the patterns now appear in a prominent position at the beginning of each rule so that transformations are once again, as in chapter 3, identified by their pattern, and the subject has become a well defined but, for the purposes of discussion of individual transformations, abstract entity; and

(2) Assertions about a set of transformations collected into a transducer, whether stated formally or informally, are now more readily formulated and proved [34].

What has been presented here is enough of a graph transformation language to indicate the form a translation might take when expressed in such a language. Such a translation would take full advantage of the concept of syntactic transformation, but would not lose any of the considerable advantages provided by syntax directed translation. A transducer such as the one above represents in essence a one pass syntax directed translation; we have even given it the production system style of construction found in, for instance, BNF specifications. The only difference is that the transformations are not tied strictly to any particular parsing technique, whether top down, bottom up, left or right reducing, operator precedence, k-symbol lookahead, or even ad hoc.
The translation is, in fact, as independent of the concrete syntax of the language as it is necessary to be.

The next section will report on two trial implementations of translators, using this approach. One of these was substantial, and relatively realistic, and, although based on a more primitive set of constructs than those defined above, the results showed rather conclusively that a very large part of the process of translation is expressed succinctly by these or similar constructs. It was this experience, in fact, which led to a redesign resulting in the above considerations.

4.4 Experience with a graph transducer language

In the preceding sections, we examined some of the considerations that would lead to the design for a translator writing system based on the concept of translation as syntactic transformation. The partial design presented there represents a second version of such a language, based on practical experience with a first attempt. This section is intended to present that experience, and to indicate what effect it had on the design.

A graph transducer "language" was implemented as a set of Lisp function definitions. These were directly related to a minimal design for an Algol 60-style language; the language itself was deliberately restricted to the following set of constructs:

    subject :: pattern
    subject => object
    ordinary arithmetic and Boolean operators
    variable := expression
    procedure (A1,...,An) E1;...;En end
    procname (E1,...,En)
    if E0 then E1 elsif E2 then ... else En fi

Graph patterns were constructed by

    label: node <pattern1, ..., patternn>
    !label
    ? or ?variable

Declarations could occur anywhere in the program, their range being limited to the "nest" (footnote 10) in which they were declared, and were of the forms

    var x
    var x := memory

The second form defines a symbol table, which is accessed by

    x [ expression sequence ]

Footnote 10: A term taken from Algol 68 [131].

Symbol tables were used only in programming the context sensitive (first) pass over the parse tree, and were the first to go in the revised design, above, which relegates all syntactic matters to the parse. In this more primitive version, the input was a context free parse tree, and the contextual information had to be gathered in a pass over the tree: here, symbol tables were an absolute necessity, but we have seen (section 4.1) how symbol tables may be abstracted by a more sophisticated syntax definition (which may, in fact, be implemented as a tour of a context free tree [15]). As a result, the definition of the translation is completely freed from such implementation dependent considerations.

Graph patterns were "evaluated" by being turned into actual graphs. These primitive forms were, as I said in section 4.2.1, not unpleasant to work with, but they tended to become extremely cumbersome. Compare, for instance, the two equivalent expressions:

    program matching "if"<?var expr, ?var true, ?var false>
    => "stms"< expr, "stms"< "FJP"<!f>, "stms"< true, "stms"< "UJP"<!e>,
       f: "stms"< false, e: "NUL" >>>>>

and

    program matching
    PascalS(<expr> expr, <stms> true, false):
       { if expr then true else false }
    => Pcode(<expr> expr, <stms> true, false):
       { expr ; FJP f ; true ; UJP e ; f: false ; e: NUL }.

The original design included a simple looping construct and value returning general exit:

    label: repeat
       ...
       exit label with expression end;
    end

However, this was never actually used. I included a looping construct because my programming style is such that it uses iteration about as often as recursion; it is not my programming style, therefore, that dictated the absence of iteration in my translations. Instead, it stands to reason, the absence of iteration, and overwhelming dependence instead on recursion, is an artifact of the structure of the problem which, as we have remarked before, is in turn an artifact of the recursive definition of programming language syntax. It was by considering this result, which at first appeared surprising, that I came to consider the use of the simple "guarded command" style construct of section 4.3.

A note about efficiency of implementation is appropriate here.
The procedure calling mechanism imposed by the architecture of "modern" computers on modern programming languages has, with the notable exception of a series of Burroughs machines designed to support a dialect of Algol 60, been forced to be singularly inefficient, with the result that the term "recursive" has more often than not been taken as synonymous with "accursed". Recall, however, that the recursion here is a definitional abstraction of a tree touring algorithm, which implies the need for a stack, but none of the complicated register saving/restoring and display maintaining code involved in normal procedure calling (footnote 11). A considerable portion of the work on high level optimization has been concerned with just such recursion elimination for the sake of backward machine design, and would be particularly applicable here (footnote 12). The fact that this kind of recursion will be common in a translation language dependent on syntactic structure, and easily isolated, means that it will be readily optimized.

Footnote 11: Aho and Ullman [3], pp 356-364; Gries [53], pp 193-203.

Footnote 12: Knuth [72], pp 280-282; Darlington and Burstall [30].

This was a typeless language, and was so defined because (1) it is easy enough to implement a typeless language in Lisp, and (2) there were no clear reasons for insisting on type declarations.
However, the considerations of section 4.2.2, on graph pattern matching, now indicate that typing, at least of pattern matching variables, is useful in creating a more expressive language for patterns. Practical considerations, such as the fact that typed languages provide better error detection, and are more efficiently implementable than typeless languages, are another reason for including typing in the design of the second version. Note, however, that the language has a flexible typing facility, like the most modern languages, and that this was a direct result of the syntactic abstraction introduced in section 4.2.2.

Marking the graph nodes as to their syntactic types may have a further benefit: other structure manipulating languages, like VDL and the PQCC, tag the arcs out of a node, so that individual arcs can be addressed, much as individual fields in a data structure are normally given names so that they can be addressed symbolically. In the original version of the graph transformation language, this capacity was not present, and the only pass in which it was badly missed was the context dependent information gathering pass. It may be, therefore, that it will not now be necessary; but this conclusion is too much affected by other changes to the language (especially by the removal of context sensitive processing to the parser) to be reachable without further experience (footnote 13).

Footnote 13: Recall that, as in the above examples, individual fields are selected in a pattern match by individual identifiers, and may then be manipulated, as in the instruction

    RF(s)

However, if a total transformation is not necessary, but only a modification (as often happens in context sensitive passes, where a single subgraph may be replaced, added, or deleted), it is useful to be able to address the fields individually, as in

    statement@<expr> := newvalue

In the next two subsections we will examine the two translations that have been expressed in the experimental graph transformation language. These sections will concentrate especially on the contribution made to these translations by the graph transformational view.

4.4.1 A Pascal-S compiler

The larger of the two efforts at using the experimental graph transformation language in expressing a translator was a Pascal S compiler. I felt that a reasonably complete example of a programming language translation was necessary (1) to provide an empirical proof that this approach to translation is at least reasonable, and (2) to try out one approach to the design of such a translation language on which to base possible, and indeed almost certain, revisions and extensions of the base concept. I also saw this as an opportunity to try out the effectiveness of the "graph transducer" basis for building up complex translators out of relatively manageable subtranslators.
To keep this effort simple enough to be feasible within the time allotted, it had to be restricted to an interesting but small language. I also felt it had to be an existing language that was relatively well known so that this report could be reasonably certain of communicating its primary information without having to include an entire description of the language, but especially so that there could be no opportunity of biases (actual or imagined) entering into the choice of common language constructs to include for translation, particularly biases toward reducing the work by simplifying the problem. The language finally settled on was Pascal S [142] because (1) it is a proper subset of a well known language based on modern considerations of programming language design [64], (2) it is large enough to be considered a "teaching subset", and is therefore not a trivial or "toy" language, and (3) Wirth has published a program (in Pascal) which implements an interpreter for the language and I hoped to compare the two versions for size.

To demonstrate some of the translation scheme's flexibility, I also decided: (1) to do the translation in two phases, from Pascal S to Pcode [94,13], and from Pcode to Assembler; and (2) to implement a language extension mechanism in the form of inline procedures.
As a result of this decision, the size comparison proved to be out of the question, because the two implementations were too dissimilar: Wirth's version translates to Pcode and interprets this, mine continues to Assembler; Wirth's is a syntax directed translator, which makes it difficult to separate out even the Pascal S to Pcode translation for partial comparison to the corresponding pass in the transformational translator.

Aside from the two translation passes there was also a pass to collect contextual information (identifier types, identifier ranges, expression types). This pass was approximately as large as the Pascal S to Pcode translation pass, and it was this unacceptable size that resulted in the reconsideration of syntactic recognition reported in section 4.1.

The ability to tour the syntactic structure at will proved particularly valuable in translating the Pascal case statement. Case statements usually create problems for one pass translators, although well known techniques exist for code generation in one pass which make the case statement no worse than any other statement requiring backward jumps, except quantitatively [140]. The resulting translation is usually from

    case selector of
       labels1: statement1;
       ...
       labelsn: statementn;
    end

to

        selector
        jump L1
    S1: statement1
        jump L2
        ...
    Sn: statementn
        jump L2
    L1: test selector
        if out of bounds jump L2
        jump L3+selector
    L3: jump S?
        ...
        jump S?
    L2: NUL

(compare the Zuerich interpreter [94]).
The above involves, besides being as well-structured as a plate of spaghetti, one more jump per evaluation than is absolutely necessary. A translation with fewer jumps, however, is possible only if code need not be emitted the moment a construct is reduced to its syntactic class, as happens in all syntax directed translations. A transformational translation would transform the subgraphs representing the statements into equivalent Pcode subgraphs, then generate the selector evaluation, jump table (containing pointers to the statements), and finally the statements.

In BCPL, which has a simpler case construct, it is not generally feasible to generate a jump table, although for a certain range and spread of selector values this is certainly preferable to the Lisp style conditional involving iterative testing and jumping over alternatives or a table of value/address pairs. BCPL compilers implementing an intelligent code generation algorithm which decides between these two options inevitably must make a complete pass over the code of the case statement in order to decide which is preferable, followed by another pass to generate the code. Here especially, a graph representation of the program is useful.

Pascal has something of a language extension facility, as mentioned on page 70, in its type definition statement.
A definition of the form

    type complex = record r, i : real end

is most easily implemented by replacing every occurrence of the identifier "complex" (within the range of the definition) by the subgraph representing the record definition. The Pascal S implementation included this simplification, which took place during the contextual information gathering pass. (A bit of a cheat: as discussed on page 86, language extension should be a separate pass for proper functional separation.)

It was decided also to include a parameterized language extension facility in the form of inline procedures. A definition of the form

    macro M (A_1,...,A_n) S_1;...;S_m end

is implemented by replacing every occurrence of

    M (P_1,...,P_n)

by (a subgraph representing):

    S'_1;...;S'_m

where S'_i is a copy of S_i in which every occurrence of argument A_j has been replaced by P_j (1 <= i <= m, and 1 <= j <= n). This involved creating a copy of the subgraph representing

    S_1;...;S_m

and substituting, for any occurrence of a symbol A_j, the subgraph representing P_j. No copy of P_j needs to be made; it is, of course, more efficient in space and time to substitute a reference to the same graph, especially if it represents an expression which might, subsequently, be subject to common subexpression optimization techniques. However, it is necessary to make a copy of "S_1;...;S_m" for every macro expansion.
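The copy-and-substitute step might look roughly as follows in Python. This is a minimal sketch under stated assumptions, not the thesis's procedure: the `Node` class, the function names, and the convention that a formal parameter appears as a childless node whose `op` is the parameter name are all hypothetical, and the macro body is assumed to be a tree.

```python
# Hypothetical sketch: Node, copy_with_substitution and expand_macro are
# illustrative names, not taken from the thesis.

class Node:
    def __init__(self, op, *children):
        self.op = op
        self.children = list(children)


def copy_with_substitution(node, bindings):
    """Copy a macro body, replacing formal parameters by actual arguments.

    The body is copied afresh for each expansion; each actual argument is
    shared by reference, never copied.
    """
    if not node.children and node.op in bindings:
        return bindings[node.op]  # share the argument subgraph
    copies = [copy_with_substitution(child, bindings) for child in node.children]
    return Node(node.op, *copies)


def expand_macro(formals, body, actuals):
    """Expand one call M(P_1,...,P_n) of a macro with the given body."""
    if len(formals) != len(actuals):
        raise ValueError("wrong number of macro arguments")
    return copy_with_substitution(body, dict(zip(formals, actuals)))


# Body "x := A + A" expanded with the actual argument "y * 2".
body = Node("assign", Node("x"), Node("+", Node("A"), Node("A")))
argument = Node("*", Node("y"), Node("2"))
expansion = expand_macro(["A"], body, [argument])
```

In the expansion both operands of "+" are the same `argument` object, so the result is a DAG rather than a tree, which is the sharing the text recommends.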
It was tempting to define a special language "feature" which performed this function, including the substitution of parameters; Lisp systems often have such a function for the parameterized copying of list structures, but it felt like a cheat, here, where its only obvious use was in macro replacement. In any case, the copy and substitution procedure proved to be reasonably small, not only because it did not have much to do, but also because it only needed to consider that portion of the Pascal S syntax (approximately half) concerned with computation, and could ignore the portion concerned with declaration.

4.4.2 A SASL compiler

The other application of graph transformation to a translation problem was in a "compiler" for an applicative language SASL. Turner [126,127] describes a method for translating expressions in an applicative language (lambda calculus, pure Lisp, SASL, etc.) into a form in which there are no bound variables, and consisting only of applications of monadic functions.
Only a finite number of new symbols are introduced in Turner's method, so that, for instance, the expression

    lambda (x) (x+1)

is translated to

    S ( S ( K plus ) ( K 1 ) ) I

(dyadic operators like "+" and "*" are turned into monadic functions "plus" and "times" returning as their value a monadic function -- a process known as Currying), where the new symbols S, K, and I are defined as

    S f g x = f x ( g x )
    K x y = x
    I x = x

The evaluation of the successor function above, applied to a value, 7, say:

    lambda (x) (x+1) 7  =>  7+1  =>  8

now becomes

    S ( S ( K plus ) ( K 1 ) ) I 7
      =>   S ( K plus ) ( K 1 ) 7 ( I 7 )
      =>2  ( K plus ) 7 ( K 1 7 ) 7
      =>2  plus 1 7
      =>   8

(six steps instead of two; "=>2" marks two reduction steps). However, Turner introduces optimizing transformations:

    (1) SASL(<comb> e1,e2): { S ( K e1 ) ( K e2 ) }
          => SASL(<comb> e1,e2): { K ( e1 e2 ) }
    (2) SASL(<comb> e1): { S ( K e1 ) I }
          => SASL(<comb> e1): { e1 }
    (3) if (1) and (2) do not apply then
        SASL(<comb> e1,e2): { S ( K e1 ) e2 }
          => SASL(<comb> e1,e2): { B e1 e2 }
    (4) if (1), (2) and (3) do not apply then
        SASL(<comb> e1,e2): { S e1 ( K e2 ) }
          => SASL(<comb> e1,e2): { C e1 e2 }

and new combinator symbols

    B f g x = f (g x)
    C f g x = f x g

For the successor function example, only rules (1) and (2) are necessary for the transformation:

    S(S(K plus)(K 1))I  =>  S(K(plus 1))I   [rule (1)]
                        =>  plus 1          [rule (2)]

The factorial function uses all four transformations, and thus also requires the introduction of the symbols B and C. The optimization reduces its applicative form from 29 function and constant symbols to 15.
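The variable removal and the four optimizing rules can be tried out with a short Python sketch. It is neither Turner's implementation nor the graph transformation program given in the appendix: the representation (atoms as strings or numbers, an application as a pair `(function, argument)`) and the names `app`, `optimize`, `abstract` and `evaluate` are assumptions made for the illustration.

```python
# Hypothetical sketch: representation and function names are illustrative.

def app(*terms):
    """Left-associated application: app(f, a, b) is ((f a) b)."""
    result = terms[0]
    for term in terms[1:]:
        result = (result, term)
    return result


def is_k(term):
    """True if term has the form K e."""
    return isinstance(term, tuple) and term[0] == "K"


def optimize(term):
    """Apply rules (1) to (4) at the root of a term of the form S f g."""
    head = term[0] if isinstance(term, tuple) else None
    if isinstance(head, tuple) and head[0] == "S":
        f, g = head[1], term[1]
        if is_k(f) and is_k(g):
            return ("K", (f[1], g[1]))   # rule (1)
        if is_k(f) and g == "I":
            return f[1]                  # rule (2)
        if is_k(f):
            return app("B", f[1], g)     # rule (3)
        if is_k(g):
            return app("C", f, g[1])     # rule (4)
    return term


def abstract(var, term, optimized=True):
    """Remove the variable var from term, introducing S, K and I."""
    if term == var:
        return "I"
    if isinstance(term, tuple):
        result = app("S", abstract(var, term[0], optimized),
                     abstract(var, term[1], optimized))
        return optimize(result) if optimized else result
    return ("K", term)


def evaluate(term):
    """Normal order reduction, unwinding the spine onto a stack."""
    args = []
    while True:
        while isinstance(term, tuple):
            args.append(term[1])
            term = term[0]
        if term == "I" and len(args) >= 1:
            term = args.pop()
        elif term == "K" and len(args) >= 2:
            term = args.pop()
            args.pop()
        elif term in ("S", "B", "C") and len(args) >= 3:
            f, g, x = args.pop(), args.pop(), args.pop()
            if term == "S":
                term = ((f, x), (g, x))
            elif term == "B":
                term = (f, (g, x))
            else:
                term = ((f, x), g)
        elif term == "plus" and len(args) >= 2:
            term = evaluate(args.pop()) + evaluate(args.pop())
        else:
            break
    while args:                          # rebuild a partial application
        term = (term, args.pop())
    return term


successor = app("plus", 1, "x")          # x+1 written as "plus 1 x"
plain = abstract("x", successor, optimized=False)
improved = abstract("x", successor)
```

Here `plain` is the pair structure for S (S (K plus) (K 1)) I and `improved` is the structure for plus 1; applying either to 7 with `evaluate` gives the same value.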
These results are interesting in that they demonstrate an optimization by transformation in a computationally "clean" calculus, with the result that the optimization is, unlike ad hoc optimizations in conventional programming languages, provably correct.

A pure combinator language has the additional attraction for a consideration of optimization that the expression optimizations discussed in section 3.2.3 can be applied as well: that is, common subexpressions can be merged in a DAG representation of the program and need be evaluated only once. Turner also reports that the combinatory code has "some remarkable self optimizing properties including that constant calculations are automatically moved outside of loops..." ([127], p 32). This result alone should be enough to make those who struggle with optimizing compilers for standard programming languages sit up and take notice: it may be a chance phenomenon, but it may also be the beginning of a more regular, theoretically based approach to optimization.

Since the optimizations and the description of a compiler and combinator evaluator were all expressed by Turner as transformations on syntactically well formed applicative expressions, we decided it was worthwhile to implement at least some of these in the graph transformation language.
The result was an extremely clean implementation of part of Turner's technique^14 expressed in terms very close to those used in the paper and, consequently, very short and readable. For a comparison of the two styles of translation expression, old and new, see the appendix, where the SASL optimizer is given in both styles. Compare these, especially the second, with Turner's papers, to get a feel for the near "naturalness" of the expression style. The text of the old style Pascal S translator is available for examination as well, but is too large to be included here for the minimal interest it is likely to receive; it will, however, be furnished on request.

^14 Only the optimizing and variable removing transformations were implemented; a complete implementation beckoned most invitingly but was abandoned in favor of more pressing matters.

Chapter 5: Evaluation and conclusions

This dissertation has presented arguments for the extension of programming language translation technology from a purely syntax directed model to a syntax transforming model. The result of such an extension, it was argued, would be to bring within reach of a single, all-embracing technique, the various translation related programming activities that do not coincide so neatly with a purely syntax directed view but must, if implemented at all for translators produced by automatic means, be added on by ad hoc techniques.
A recapitulation is in order here of the specific aspects of translation that have been brought into a uniform relationship during the presentation of this thesis.

(1) Programming languages exhibit both context free and context sensitive relationships between their constituent parts that together form the complete syntactic (meaning-free) description of programs within the language. Traditional syntax- or, rather, parser-directed translation techniques have been limited, by theoretical limitations on automatic parsing techniques, to context free descriptions augmented by a symbol table or, where multiple passes were possible, to context free descriptions augmented by transformations of a symbol table. In a graph transformation model of translation, however, both context sensitive and context free descriptions are possible simultaneously, in the same formalism, and without recourse to external concepts like symbol tables.

(2) Programs are not only described syntactically, but also with reference to the notion of control flow. Any sort of program improvement effort will be based on a description of this control flow, which can be represented and manipulated by the same mechanisms that make possible the representation and manipulation of context sensitive syntactic aspects.
In examples presented above, we saw that the treatment of labels in such cases is essentially the same as that of variables -- except that the variables do not introduce cycles in the graph structure, while labels generally do. As a matter of fact, the traditional "tuple" representation of programs in discussions of translation and especially optimization can be seen to be essentially a directed graph representation, from which all syntactic structure has been purged, leaving only the control flow structure. Given this unification, we can see that such traditionally distinct areas as code optimization, program development by stepwise refinement, translation of "abstractions" such as macros, inline procedures, type declarations, and more sophisticated contemporary conceptions of abstraction, all represent variations on a base concept of program manipulation by syntactically structured substitution, and are in essence no different from straight production-level to machine-level language translation.

(3) Like many other descriptions of complex processes, descriptions of programming language translation suffer from any discipline which forces them to fit a monolithic, all-at-once model rather than allowing them to be described incrementally in terms of a number of stages or "passes."
Even where the traditional syntax directed view of translation has permitted multiple passes, the syntactic structure of the program under translation remained static; and yet, efforts in compiler writing and program portability have increasingly turned to a multi-stage model in which descriptions of the translations involved are often based on an abstract intermediate language with a syntactic structure, albeit primitive, uniquely its own and distinct from either the source or target languages of the overall translation. The syntactic transformation model permits a treatment of the stages in such a translation in a manner uniform with the treatment of one-stage translations, and an external representation of these stages that permits the intermediate languages to be displayed in much the same way as they are displayed in papers on the subject of multistage translation techniques.

Most of these arguments were presented in chapters 1 and 2. In chapter 2 also, the groundwork was laid for an approach to translation intended to be successful in dealing uniformly with the traditional translation concerns while at the same time retaining as much as possible the highly natural use of syntactic structure in translation. The uses of syntactic structure were outlined and a representation using directed graphs instead of trees was developed.
The idea of graph transformation was presented there as an alternative to tree touring combined with code emission (the equivalent of syntax directed translation).

Chapter 3 was primarily concerned with examining the traditional concerns of translation and program manipulation as aspects of syntactic transformation. The concepts of graph and graph transformation were formally specified and the model they defined was used in the subsequent discussion. Language extension, porting, editing, code generation, optimization, and multi-pass translation were all examined in the light of this model; more recent directions in programming language manipulation were also shown to incorporate aspects of syntactic transformation.

Chapter 4 completes the survey of traditional concerns in terms of syntactic transformation by presenting the essential aspects of an essentially traditional translator writing system. Although this TWS design is somewhat unconventional in its emphasis on translation rather than syntactic acceptance, it is nevertheless traditional in its "batch" oriented design, performing the three steps of accepting, transducing, and code generating in a straightforward sequence. (Later on in this chapter, recent trends in translator writing will be examined: these differ significantly in their approach to the expression of translation from the syntax directed model.)
Much of this dissertation, then, is concerned with a synthesis of traditional translation by means of a descriptive technique which differs from the syntax directed technique in that, where the latter depends entirely on a static syntactic representation of a program, and is driven by the actions induced on a parser by a grammar, the former dynamically alters the syntactic representation to suit the various subtechniques and to move the program by stages from the domain of one static syntactic description (that of the source language) to the domain of another (the target language), passing through as many intermediate stages, concrete or abstract, as is either necessary or convenient. Instead of a process which tours the syntactic structure determined by the syntax of the language under translation, and emits the translation in a left to right discipline, the proposal was to transform the syntactic structure on a local, incremental basis, until it had become a syntactically structured representation of the translated program. Chapter 4 also developed a language for the definition of transducers in which the programmer may treat the transduction as one, not on graphs, but on textual program fragments in their natural representation.
This presentation of traditional techniques alone, if it was at all successful in establishing that a uniform restructuring of translation techniques was possible, forms a strong argument for the merits of syntactic transformation in translation. However, the argument is not complete, and that of chapter 3, because of the breadth of the subject, was no more than cursory in its examination of this traditional material. Except for a short example in the appendix, and the somewhat more elaborate project briefly described in section 4.4.1, the effectiveness of the model is thus far weak in empirical evidence. Such evidence can only accumulate from acceptance of this technique, or some variation on this technique involving a similar approach to translation, by a significant portion of the community concerned with programming language translation, much like the widespread acceptance of the syntax directed model.

One possible route to this acceptance would be the implementation and dissemination of a translator writing system based on the design in chapter 4. This represents a large, but feasible effort, and it is one of my long-term goals to produce such a system. Since it is a "system" as presented in chapter 4, it can be constructed in parts, some of which may prove to be interesting computational tools in their own right. Let us briefly examine the primary parts making up such a system, and a strategy for their implementation, here.
Graph support tools

Under this heading falls the core of software tools necessary to the other parts. These include the software to allocate space for, and construct, graphs, and to perform pattern matching and graph transformation -- essentially, to implement the material described formally (and with an eye to implementation) at the beginning of chapter 3 and in section 4.2.2. Such tools would be useful not only in a translator writing system based on syntactic (graph) transformation, but in many applications whose underlying model is a directed graph which it may be necessary to manipulate: knowledge representation networks, picture description networks (see section 4.2.1), and other complex hierarchies, such as databases. Indeed, the recursive structure of a relational database closely resembles the recursive structural relationships imposed on a program by the grammar of the language it is expressed in; moreover, the conversion of relational databases to meet changing demands on their ability to store and retrieve information is a continuing problem that requires at least as much complex (and therefore careful) consideration as the translation of a programming language [118].
Syntactic acceptors

By separating syntactic recognition and translation, we not only free the translator from the limitations imposed by context free (and even more restricted) syntactic recognition, we also free the parse, if such is being used for syntactic acceptance, from the demands placed on it by the translation. Indeed, we free altogether the programs that perform the task of creating syntax graphs from the need to be parsers: there is no reason why translation cannot be one of the functions performed by a program editor, just as text formatting is frequently one of the tasks performed, albeit not always very successfully, by text editors; a program editor like Emily (see section 3.4) bypasses many of the functions of a parser -- although it is syntax directed -- because it forces the creation of a syntactically correct program, whereas a parser exists in part to ensure the syntactic correctness of input by issuing error messages and even by attempting automatic correction of errors. Incremental compilers and program development systems are increasingly important aspects of translation, and programming environments encouraging the use of these will benefit from translator development tools which encourage the separation of input and translation phases -- so that one can be completely changed without affecting the other.

Parsers of all kinds have a wide range of uses as standalone software tools, as is demonstrated by the many, often surprising applications the parser generator YACC has been put to [66].
YACC is the product of a design philosophy that directs systems of software to be built out of manageably sized "tools."

"Code" generators (graph flatteners)

In section 4.1 the last phase of a traditional translation was described as an inverse parser. As with parsers, it is convenient to be able to replace one kind, an object module generator, for instance, with another which "pretty prints" the target program, or one which stores an intermediate representation of the translated program's graph structure in a file using a linear graph notation. Recall that, in the discussion on optimization in this model, optimization is not considered one of the functions of a graph flattener (hence the avoidance of the term "code generator"): optimization is one of the aspects of graph transformation. Again, as with syntactic acceptors, separating the function of graph flattening from that of translation (graph transformation) has its benefits: just as syntactic recognition may not be necessary in a translation (the program may have been generated by use of a syntactic editor; the previously parsed program may have been stored in a linear graph notation and need not be parsed again), so a code generation may not be necessary either. In particular, the syntactic graph need not be translated at all, but may be interpreted instead.
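The interchangeability of flatteners can be illustrated with two small Python functions applied to the same graph. This is a sketch only: the `Node` class, the function names, and the numbered linear notation are invented for the example and are not the notation of the thesis or of any existing system; the graph is assumed to be acyclic with binary operators.

```python
# Hypothetical sketch: Node, flatten_pretty and flatten_linear are
# illustrative names, and the linear notation is invented for the example.

class Node:
    def __init__(self, op, *children):
        self.op = op
        self.children = children


def flatten_pretty(node):
    """Flatten an expression graph to infix text ("pretty printing")."""
    if not node.children:
        return node.op
    left, right = node.children
    return f"({flatten_pretty(left)} {node.op} {flatten_pretty(right)})"


def flatten_linear(root):
    """Flatten to one numbered line per distinct node.

    A shared subgraph is written once and referred to by number, so the
    DAG structure survives the trip through a file.
    """
    numbers, lines = {}, []

    def visit(node):
        if id(node) not in numbers:
            refs = [visit(child) for child in node.children]
            numbers[id(node)] = len(lines)
            fields = [str(len(lines)), node.op] + [f"#{ref}" for ref in refs]
            lines.append(" ".join(fields))
        return numbers[id(node)]

    visit(root)
    return lines


# (a * b) + (a * b), with the product represented by a single shared node.
product = Node("*", Node("a"), Node("b"))
graph = Node("+", product, product)
text = flatten_pretty(graph)
stored = flatten_linear(graph)
```

Either function can be dropped without touching the other or the transformations that produced `graph`, which is the separation argued for above.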
These three subsystems are each programming tools in their own right: the graph manipulation package has applications, as indicated, far beyond its use as a programming tool; parser generators like YACC are essentially a form of input processor with a wide range of uses outside of language processing; a syntactic editor (an alternative to parsing) would be applicable also to document editing [117] and database manipulation [118]; a syntax graph flattener could serve at least as a program pretty printer, when the input graph is not transformed. They can exist, therefore, and be useful even without their being part of a TWS, and they can be constructed, one by one, each being immediately available long before a full TWS has come into existence. Furthermore, once they exist, they can themselves be used in the construction of a TWS.

A translator writing system is itself a translator, and can be implemented as such. Usually, in fact, it is at least three translators in the sense that each has its own description language: one to generate symbol scanners from lexical descriptions expressed in a language specific to scanner generators; one to generate parsers from syntactic descriptions expressed in a language specific to parser generators; and one to generate translators from semantic descriptions. However, the TWS outlined in chapter 4 combined these into a more uniform description.
Once the basic translator writing tools exist, these can be used to construct a translator for programs written in the language outlined in chapter 4; the first program written in this language and compiled with this translator should be a proper description of the meta-translator itself.

Even before the TWS exists -- even when only its associated library of tools exists -- a number of applications become possible, and can be built up out of the tools, using some other programming language, most likely a general purpose algorithmic one, to control the use of the tools. This approach of building systems from tools is deliberately modelled on the "software tools" approach to system construction developed mostly at Bell Labs and advocated by Kernighan and Plauger [69]. This approach depends on getting the basic tools out into the world and refining them iteratively as a wide base of experience is acquired in their use; it appears to be a successful method which has been applied specifically to language development tools in several different settings [1,67].

The primary division of the translation tools into three subparts reflects the traditional view of translation as a single continuous process with several distinct phases (which may be performed either strictly sequentially or by cooperating sequential processes).
However, once these three portions are seen as collections of closely related tools, it becomes just as natural to restructure the translation process. It was already suggested in this chapter that the function of syntactic acceptor normally performed by a parser could be performed by an interactive program editor instead. Such an editor would, if it acted like Emily or the Cornell Synthesizer [123], be used to build syntactically correct program graphs which could then be passed through a graph transformer. More than this, however, once the program graph exists, it can be edited, whenever semantic changes are necessary, by a process of graph transformation. This process would produce new graphs which can either be printed by a pretty printing graph flattener, or stored, or translated. The program can be stored as a graph, using a linear graph notation as in the PQCC, or as a kind of object module with relocatable links to be resolved, next time the program is edited, by a kind of loader, or, again, as a pretty printed program.

A structure editor would perform many of the functions of a program development system such as discussed in section 3.4. A library of syntactic transforms for systematic program development [6,30] could be maintained: such a library could be developed in the syntactic transformation language and compiled analogously to the way subroutine libraries are maintained for conventional general purpose programming systems.
At the other end, the transformed graphs representing translated or partially translated programs need not be "flattened" at all, either by a prettyprinter or into object modules. One aspect of programming language "translation" not dealt with in chapter 3, but certainly traditional in the sense that it has been known and applied as a technique for as long as compilers have, is program interpretation. Furthermore, the idea of basing an interpreter on a syntactic representation of the program already has a considerable history behind it. The Vienna Definition Language (VDL), an interpretive automaton, uses an abstract syntactic representation of the program it is interpreting -- in essence a semantically augmented tree with both branches and nodes labelled (branches are not implicitly ordered with respect to a node) -- and performs its interpretation by successively transforming the portions of the tree used to represent the data and the program until some final state results -- in the simplest case a single node representing the value computed by the program [139]. VDL is not a language for the definition of usable interpreters; that is not its function, which is a purely definitional one.
However, Turner [127], whose work on partial compilation of expressions in an applicative language was discussed in chapter 4, has based the design of an interpreter for the resulting variable-free expressions on the use of possibly cyclic list structures to represent the expressions, and a transformational interpreter to evaluate them. One of the benefits of this approach is that it permits a relatively efficient form of normal order reduction, which is used because it can guarantee termination provided the evaluation can terminate. The substitution of unevaluated arguments for function parameters in evaluating a function call, on which normal order reduction depends and which normally causes partially evaluated programs to grow out of proportion with their original size, is no more costly in this model than the more usual applicative order regime, by virtue of the DAG-forming subgraph replacement scheme it uses to implement normal order evaluation (analogous to the discussion of common subexpression optimization in section 3.2.1). Several relatively recent proposals have advocated a similar "lazy evaluation" approach, particularly for performing computations that are essentially non-terminating, but achieve interesting partial results along the way (the sieve of Eratosthenes, which generates all primes, is an ancient example of such an algorithm): see Turner's paper for a discussion and further references.
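The flavour of such a transformational interpreter can be suggested by a minimal sketch, written in Python and not in the notation of this dissertation. It assumes the standard reduction rules for the combinators S, K, I, B and C (taken here to agree with the definitions of section 4.2.2), represents an application as a pair, and always rewrites the leftmost outermost redex first, which is normal order. The names `app`, `spine`, `RULES` and `normalize` are illustrative only. The sketch copies subterms where Turner's scheme shares them, so it shows the termination behaviour of normal order reduction but not the efficiency gained from DAG-forming replacement.

```python
# Terms: a symbol is a str; an application (f a) is the pair (f, a).

def app(*terms):
    """Build a left-associated application: app(f, a, b) is ((f a) b)."""
    result = terms[0]
    for t in terms[1:]:
        result = (result, t)
    return result

def spine(term):
    """Split a term into its head symbol and the list of its arguments."""
    args = []
    while isinstance(term, tuple):
        term, arg = term
        args.append(arg)
    return term, args[::-1]

# Arity and right-hand side of each combinator (standard definitions,
# assumed here, not copied from section 4.2.2).
RULES = {
    "I": (1, lambda x: x),
    "K": (2, lambda x, y: x),
    "S": (3, lambda f, g, x: ((f, x), (g, x))),
    "B": (3, lambda f, g, x: (f, (g, x))),
    "C": (3, lambda f, g, x: ((f, x), g)),
}

def normalize(term):
    """Normal order reduction: rewrite the head redex first, arguments last."""
    while True:
        head, args = spine(term)
        rule = RULES.get(head)
        if rule is None or len(args) < rule[0]:
            # No head redex is left; only now are the arguments reduced.
            return app(head, *[normalize(a) for a in args])
        arity, rhs = rule
        term = app(rhs(*args[:arity]), *args[arity:])

# omega = S I I (S I I) has no normal form, yet K x omega still reduces,
# because the unevaluated argument is discarded before it is ever touched.
omega = app("S", "I", "I", app("S", "I", "I"))
result = normalize(app("K", "x", omega))
```

An applicative order evaluator would try to reduce `omega` before applying K and would never return; here the argument is dropped unevaluated.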
Rosen [105] has also suggested graph transformation as a basis for the formal definition of interpreters.

The preceding design was presented in part as an indication of the directions of research deriving directly from the material developed in this dissertation. It is also presented as a workable outline for others wishing to pursue these ideas: in his summary on the practical value of software tools in the Unix setting, Johnson [67] remarks that few successful tools were "designed" in the traditional sense, by being turned out as finished products by a skilled team of toolwrights; the most common process was instead one of iterative refinement of the primitive tools by a community of skilled tool users. If the concept of syntactic transformation has the kind of merit claimed for it here as a basis for translator definition, it will come to be adapted to many different forms of translation, under many different circumstances, and necessarily by many different people, often quite independently of each other. The result will be a similar iterative refinement of the basic concepts to suit more nearly the direct concerns of translator writers. The formalized definitions at the beginning of chapter 3 and in section 4.2.2 are so constructed that they can be used directly in deriving a practical implementation, as a further incentive to the use of these tools.
Fortunately, this process already appears to be taking place. Very recently there have been reports on several parallel efforts in the area of code generation that have depended on abstract syntax graphs (primarily trees) to represent the programs, and on pattern matching and graph transformations as the means to effect optimizations on the program [20,51,78]. These also have, in common with the material presented here, the assumption that code generation is to be considered separately from input acceptance. The three research efforts cited here run the gamut from being strongly deterministic [20] to highly heuristic [78] in their approach to selecting appropriate optimizations. The translator writing system presented in chapter 4 compares with these on the deterministic end of the spectrum, in that the programmer, not a set of heuristics, determines the transformations that will take place; the overall philosophy of translation presented in this dissertation is highly compatible, however, with the approach taken at the other end of the spectrum (the PQCC project), in its use of successive transductions of a graph representation to construct a translator.
In fact, the primary aim of this dissertation was to establish the highly uniform application of this (syntactic) transformation technique to all aspects of translation. In that sense, it represents the design of a toolkit for all such transformational systems, now and in the foreseeable future.

Appendix: A SASL compiler

In section 4.4.2, we discussed a transformational "compiler" for the applicative language SASL, based on a recent algorithm for bracket abstraction developed by Turner in which the number of new symbols introduced is small [126,127]. This appendix presents an implementation of Turner's algorithm in terms of syntactic transformations: the language of Turner's papers in fact implies that syntactic transformation would be a natural way of thinking about the algorithm. This appendix proceeds by first defining a subset of SASL over which translations will be defined, and then giving two versions of the bracket abstraction algorithm due to Turner:

(1) The algorithm in the old formulation of graph transformation, the one for which an interpreter was actually implemented, and thus the only one of the two formulations which has been empirically verified; and

(2) The algorithm in the new formulation of graph transformation developed in chapter 4, which is unimplemented and therefore unverified.
In both cases the language of the formulation is defined informally before the algorithm is presented. The reader will probably find the second, more abstract formulation of the algorithm easier to read, and indeed the first, more primitive version is here only for completeness, in case there is something subtly wrong with the second, unverified version.

A: A SASL subset

SASL is a "syntactically sugared" lambda calculus, or applicative language. Where in lambda calculus one would write

    let suc = lambda (x) (plus x 1)

in SASL the same expression is written

    def suc x = plus x 1

For the sake of syntactic simplicity, we will use no infix notation, which is permitted in SASL, but will represent all operators, like "plus" in the above examples, in prefix form. Moreover, all functions will be expected to take only a single argument: any multi-argument function will be considered to have been reduced, by a process known as Currying, to a single-argument function. Both restrictions, as is commonly known, can be imposed without loss of generality. A BNF definition of this basic subset is then as follows:

    expression ::= definition | factor
    definition ::= "def" symbol "=" factor
                 | "def" symbol symbol "=" factor
    factor     ::= primary | factor primary
    primary    ::= symbol | "(" factor ")"

A program consists of a sequence of expressions; symbols are either names, representing functions or variables, or constants. SASL programs would normally be evaluated in an incremental, interpretive manner.
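To make the grammar concrete, the following is a small recursive descent recognizer for the subset, sketched in Python. It is not part of the dissertation's tooling: the function names, the whitespace-based tokenizer and the tuple representation of the result (a pair for each application, a 4-tuple for a definition) are all assumptions made for illustration. The left-recursive rule for "factor" is handled by a loop, which yields the left-associated applications that Currying calls for.

```python
import re

def tokenize(text):
    # Tokens are "(", ")", "=", or a run of other non-blank characters
    # (names, constants and the keyword "def").
    return re.findall(r"[()=]|[^\s()=]+", text)

def parse_primary(tokens, pos):
    # primary ::= symbol | "(" factor ")"
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] == "(":
        tree, pos = parse_factor(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return tree, pos + 1
    if tokens[pos] in (")", "=", "def"):
        raise SyntaxError(f"unexpected {tokens[pos]!r}")
    return tokens[pos], pos + 1

def parse_factor(tokens, pos):
    # factor ::= primary | factor primary
    # The left recursion becomes a loop building left-nested pairs.
    tree, pos = parse_primary(tokens, pos)
    while pos < len(tokens) and tokens[pos] not in (")", "="):
        arg, pos = parse_primary(tokens, pos)
        tree = (tree, arg)
    return tree, pos

def parse_expression(text):
    # expression ::= definition | factor
    tokens = tokenize(text)
    if tokens and tokens[0] == "def":
        if "=" not in tokens:
            raise SyntaxError("expected '=' in definition")
        eq = tokens.index("=")
        names = tokens[1:eq]
        if len(names) not in (1, 2):
            raise SyntaxError("expected a name and at most one parameter")
        body, pos = parse_factor(tokens, eq + 1)
        param = names[1] if len(names) == 2 else None
        result = ("def", names[0], param, body)
    else:
        result, pos = parse_factor(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return result

tree = parse_expression("def suc x = plus x 1")
```

Under these assumptions `plus x 1` is read as `(plus x) 1`, that is, the function obtained by applying "plus" to x, applied in turn to 1.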
The sublanguage generated by the symbol "factor" is a pure applicative subset. Turner's algorithm, like other bracket abstraction algorithms, is aimed at reducing all expressions to this form, and eliminating in the process all variables: in the successor function "suc", for example, "x" is a variable. This is accomplished in two stages. First, functions with a bound variable, those generated by application of the BNF rule

    definition ::= "def" symbol symbol "=" factor

are reduced to functions without a bound variable by a process Turner calls abstraction,¹ which is defined by the transformations

    def f x = E    =>  def f = [x] E
    [x] (E1 E2)    =>  S ([x] E1) ([x] E2)
    [x] x          =>  I
    [x] y          =>  K y

where S, K and I were defined in section 4.2.2. This process is represented in the two implementations by the procedure "Abstract".

The second step is an optimizing one, in which excessively complex applications of S, K and I are reduced, in most cases, to simpler ones. The optimizing transformations are

    S (K E1) (K E2)  =>  K (E1 E2)
    S (K E1) I       =>  E1
    S (K E1) E2      =>  B E1 E2    { if no earlier rule applies }
    S E1 (K E2)      =>  C E1 E2    { if no earlier rule applies }

where B and C were defined in section 4.2.2. This process is implemented below by the procedure "Optimize".

¹ Not to be confused with the use of this term in the work of Wulf and others, which is its use in the main body of this dissertation.
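The two stages can be sketched directly from the rules above. The following Python fragment is an illustration only, not either of the implementations given in this appendix: terms are strings (symbols) or pairs (applications), and the names `app`, `abstract`, `optimize` and `show` are hypothetical. The optimizer works bottom-up, trying the four rules in the order listed, so that the "if no earlier rule applies" conditions hold by construction. It assumes, as is true of the output of `abstract`, that K is only ever applied to terms that are already optimized.

```python
def app(*terms):
    """Build a left-associated application: app(f, a, b) is ((f a) b)."""
    result = terms[0]
    for t in terms[1:]:
        result = (result, t)
    return result

def abstract(x, e):
    """Compute [x] e by the three abstraction rules."""
    if isinstance(e, tuple):                  # [x] (E1 E2) => S ([x] E1) ([x] E2)
        left, right = e
        return app("S", abstract(x, left), abstract(x, right))
    if e == x:                                # [x] x => I
        return "I"
    return app("K", e)                        # [x] y => K y

def is_k(e):
    """True when e has the form (K E)."""
    return isinstance(e, tuple) and e[0] == "K"

def optimize(e):
    """Apply the four optimizing rules bottom-up, in the order given."""
    if not isinstance(e, tuple):
        return e
    f, a = optimize(e[0]), optimize(e[1])
    if isinstance(f, tuple) and f[0] == "S":  # e has the form S p q
        p, q = f[1], a
        if is_k(p) and is_k(q):               # S (K E1) (K E2) => K (E1 E2)
            return app("K", (p[1], q[1]))
        if is_k(p) and q == "I":              # S (K E1) I => E1
            return p[1]
        if is_k(p):                           # S (K E1) E2 => B E1 E2
            return app("B", p[1], q)
        if is_k(q):                           # S E1 (K E2) => C E1 E2
            return app("C", p, q[1])
    return (f, a)

def show(term):
    """Print a term with the minimum of parentheses."""
    if isinstance(term, str):
        return term
    f, a = term
    right = show(a)
    return show(f) + " " + (f"({right})" if isinstance(a, tuple) else right)

# def suc x = plus x 1, abstracted and then optimized.
abstracted = abstract("x", app("plus", "x", "1"))
compiled = optimize(abstracted)
```

Printing `show(abstracted)` and `show(compiled)` displays the body of the successor function before and after optimization.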
B: Old formulation

The graph transformation language used in the experimental implementation was discussed, but not defined, in section 4.2. The definition which follows is somewhat informal, and only defines enough of the language to make reading of the algorithm possible. This informal description should be less cumbersome than a formal one and, for the purposes of this appendix, sufficiently precise. The following are the legal language forms.

var x
    The variable x is declared. The variable has as its scope the smallest nest in which it is contained; where language forms define a nest, this will be noted in their definition, below. The value of this expression is the name of x; thus "var x" can be used wherever "x" can be used, and will have the same value.

x := E
    The expression E is evaluated and its value assigned to the variable x. The value of this expression is the value of E.

procedure (A1,...,An) E end
    The value of this expression is a procedure with n formal parameters, A1 through An, and body E. A procedure is thus a manipulable entity, like any constant, and may be assigned to a variable, which then becomes its "name". A procedure defines a nest, in which the formal parameters are automatically declared.

F(A1,...,An)
    The procedure assigned to the variable F is evaluated with actual parameters A1 through An.
    The value of this expression is the value resulting from the evaluation of the procedure with these actual parameters.

if E1 then E2 [elsif E]* [else E] fi
    Conditional expression in the (unrevised) Algol 68 style. A conditional expression defines a nest.

P1 :: P2
    Boolean expression performing the pattern match described in section 4.2.2.

? or ? x
    In a graph pattern match only, the symbol ? matches any subgraph. If it is attached to a variable, as in "? x", a successful pattern match will cause an assignment of the subgraph matched by "?" to the variable.

"string"<P1,...,Pn>
    The value of this expression is a pattern whose root node consists of the value "string" and whose children are formed by P1 through Pn, which may be patterns, simple (non-pattern) values, or variables.

E1 => E2
    The graph resulting from evaluating expression E1 is transformed, by the method defined in chapter 3, into the graph resulting from evaluating expression E2.

For this version of the graph translation language, we must also supply a parser/tree builder and a "flattener", which constructs a readable output from the transformed graph. Since the input and output of the SASL "compiler" are graphs in the same language, the flattener performs the inverse function of the parser/tree builder. We need therefore only define the relationship between the grammar and the resulting graph.
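The pattern forms above can be pictured with a small Python sketch. It is an assumption-laden illustration and not the experimental implementation: `Node`, `Any`, `match` and `sym` are hypothetical names, a graph is restricted to a tree of labelled nodes, and the bindings made by "? x" are returned in a dictionary instead of being assigned to declared variables. The precise matching rule is the one of section 4.2.2, which is not reproduced here; the sketch simply requires labels and numbers of children to agree.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Node:
    """A labelled node: the counterpart of "string"<P1,...,Pn>."""
    label: str
    children: tuple = ()

@dataclass(frozen=True)
class Any:
    """The wildcard "?"; with a name it plays the role of "? x"."""
    name: Optional[str] = None

def match(pattern, graph, env=None):
    """Return a dict of bindings if pattern matches graph, otherwise None."""
    env = {} if env is None else env
    if isinstance(pattern, Any):
        if pattern.name is not None:
            env[pattern.name] = graph      # "? x" binds the matched subgraph
        return env
    if pattern.label != graph.label or len(pattern.children) != len(graph.children):
        return None
    for p, g in zip(pattern.children, graph.children):
        if match(p, g, env) is None:
            return None
    return env

def sym(name):
    """The subtree "sym"<name> built for a symbol."""
    return Node("sym", (Node(name),))

# A tree for a one-parameter definition whose body is (plus 1) x.
tree = Node("def", (
    sym("suc"),
    sym("x"),
    Node("application", (
        Node("application", (sym("plus"), sym("1"))),
        sym("x"),
    )),
))

# The counterpart of: E :: "def"<? f, "sym"<? v>, ? body>
pattern = Node("def", (Any("f"), Node("sym", (Any("v"),)), Any("body")))
bindings = match(pattern, tree)
```

A pattern such as `Node("def", (Any("f"), Node("void"), Any("body")))` fails against the same tree, which is how a conditional with several "elsif" branches selects among the forms of a definition.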
In the following, each grammatical form in the language is shown with its resulting graph as "form => graph". The graphs are given in the above-defined graph expression sublanguage.

    expression ::= definition                          => definition
    expression ::= factor                              => factor
    definition ::= "def" symbol "=" factor             => "def"<symbol,"void",factor>
    definition ::= "def" symbol1 symbol2 "=" factor    => "def"<symbol1,symbol2,factor>
    factor     ::= primary                             => primary
    factor     ::= factor primary                      => "application"<factor,primary>
    primary    ::= symbol                              => "sym"<symbol>
    primary    ::= "(" factor ")"                      => factor

The result is a parse tree for the "abstract" syntax, leaving out many of the disambiguating but otherwise redundant symbols and one-on-one productions. For example, the expression

    def suc x = plus 1 x

results in the parse tree

                     def
          +-----------+-------------+
         sym         sym       application
          |           |       +-----+-----+
         suc          x  application     sym
                          +---+---+       |
                         sym     sym      x
                          |       |
                        plus      1

(Notice that "plus" is here Curried. What is applied to x is a function resulting from the application of "plus" to 1.)

We are now ready to examine the program. The top procedure merely orders the two stages:

    var Improve := procedure (E)
        Abstract(E);
        Optimize(E)
    end ¢ Improve ¢

    var Abstract := procedure (E)
        ¢ { def f v = body } => { def f = [v] body } ¢
        if E :: "def"<?var f, "sym"<?var v>, ?var body>
        then Abstract(body);
             E => "def"<f,"void","abstraction"<v,body>>
        ¢ { def f = body } ¢
        elsif E :: "def"<?var f, "void", ?var body>
        then Abstract(body)
        ¢ { [v] (L R) } => { S ([v] L) ([v] R) } ¢
        elsif E :: "abstraction"<?var v, "application"<?var L,?var R>>
        then Abstract(L); Abstract(R);
             E => "application"<"application"<"sym"<"S">, "abstraction"<v,L>>,
                                "abstraction"<v,R>>
        ¢ { [v] v } => { I } ¢
        elsif E :: "abstraction"<?var v,?var v1> and v :: v1
        then E => "sym"<"I">
        ¢ { [v] i } => { K i } ¢
        elsif E :: "abstraction"<?var v, ?var i>
        then E => "application"<"sym"<"K">, i>
        ¢ { L R } ¢
        elsif E :: "application"<?var L, ?var R>
        then Abstract(L); Abstract(R)
        fi
    end ¢ Abstract ¢

    var Optimize := procedure (E)
        ¢ { S (K E1) (K E2) } => { K (E1 E2) } ¢
        if E :: "application"<
                    "application"<"sym"<"S">,
                                  "application"<"sym"<"K">, ?var E1>>,
                    "application"<"sym"<"K">, ?var E2>>
        then Optimize(E1); Optimize(E2);
             E => "application"<"sym"<"K">, "application"<E1,E2>>
        ¢ { S (K E1) I } => { E1 } ¢
        elsif E :: "application"<
                    "application"<"sym"<"S">,
                                  "application"<"sym"<"K">, ?var E1>>,
                    "sym"<"I">>
        then Optimize(E1);
             E => E1
        ¢ { S (K E1) E2 } => { B E1 E2 } ¢
        elsif E :: "application"<
                    "application"<"sym"<"S">,
                                  "application"<"sym"<"K">, ?var E1>>,
                    ?var E2>
        then Optimize(E1); Optimize(E2);
             E => "application"<"application"<"sym"<"B">,E1>, E2>
        ¢ { S E1 (K E2) } => { C E1 E2 } ¢
        elsif E :: "application"<
                    "application"<"sym"<"S">,?var E1>,
                    "application"<"sym"<"K">,?var E2>>
        then Optimize(E1); Optimize(E2);
             E => "application"<"application"<"sym"<"C">,E1>, E2>
        elsif E :: "application"<?var E1,?var E2>
        then Optimize(E1); Optimize(E2); Optimize(E)
        fi
    end ¢ Optimize ¢

The reader may wish to be assured that this is correct by tracing the short sequence:

    E   = { def suc x = plus 1 x }
    E'  = Abstract(E) = { def suc = S (S (K plus) (K 1)) I }
    E'' = Optimize(E') = { def suc = plus 1 }

and, if he has the courage, the much longer sequence:

    E   = { def fac n = cond (eq n 0) 1 (times n (fac (minus n 1))) }
    E'  = Abstract(E) =
          { def fac = S (S (S (K cond) (S (S (K eq) I) (K 0))) (K 1))
                        (S (S (K times) I) (S (K fac) (S (S (K minus) I) (K 1)))) } ²
    E'' = Optimize(E') =
          { def fac = S (C (B cond (C eq 0)) 1) (S times (B fac (C minus 1))) } ³

² This is slightly different from Turner's version ([127], p 34), in which S (S (K eq) I) (K 0) is replaced by the equivalent S (S (K eq) (K 0)) I, a result that cannot be obtained by strict application of Turner's rules. The equivalence of these two expressions, which test for equality to zero, can be verified by applying them both to a variable, say x: the first reduces to "eq x 0" and the second to "eq 0 x", which agree because equality is symmetric.

³ There is a typographical error in Software -- Practice and Experience ([127], p 35), which leaves out the second "C".

C: New formulation

The "new" formulation of syntactic transformation was already defined informally in some detail during the discussions of chapter 4.
However, for the sake of uniformity with the preceding section on the old formulation, and especially for the reader's convenience, the following is a short synopsis of enough of the language to make reading the SASL "compiler" possible. Once more it is necessary to imagine that a parser/tree builder and a "flattener" exist, or can be constructed within a translator writing system based on syntactic transformation. However, this time each translation program includes a syntactic definition of the source and target languages which defines the relationship between the source and target languages and the syntactic graphs being manipulated.⁴ This definition in turn makes it much easier to read and comprehend the translation program -- as a comparison of the program in this section with that in section B should demonstrate. We begin by defining the language forms which differ from the old formulation:

⁴ In the case of our SASL example, the source and target languages are both SASL, so only one syntactic definition is needed.
language L is [syntactic definition] end
    The syntactic definition consists of an augmented grammar in which nonterminal symbols are delimited by angle brackets and terminal symbols by quotes; attributes may be attached to any nonterminal symbol with the operators "↑" (synthesized attribute) and "↓" (inherited attribute);⁵ syntactic rules are separated by periods; left and right hand sides of rules are separated by the BNF production symbol "::="; alternative right hand sides of rules are separated by the BNF alternative separator "|"; and "semantic actions", consisting of assignments of subtree expressions (as used in section B) to attribute symbols, are enclosed in square brackets "[" and "]". The rule

        <bin>↑b ::= <bin>↑b1 <bin>↑b2 [b := "b"<b1,b2>]
                  | "leaf" [b := "leaf"]

    for example, ambiguously defines the set of all binary trees with leaves labelled "leaf" and nodes labelled "b", such that the expression {leaf leaf} corresponds to the graph (a binary tree)

              b
           +--+--+
          leaf  leaf

    and the expression {leaf leaf leaf} may correspond to either of the graphs (binary trees)⁶

                b                  b
             +--+--+            +--+--+
             b    leaf        leaf    b
          +--+--+                  +--+--+
         leaf  leaf               leaf  leaf

transform variable matching
    pattern_1 E_11; ...; => E_1i; ...; E_1n
    □ ...
    □ pattern_k E_k1; ...; => E_kj; ...; E_km
end
    This defines a transducer, a procedure which may be applied in parallel to any number of syntactic graph expressions, as in Abstract(E1, E2) in the following program: "Abstract" is a variable possessing a transducer, and is applied to E1 and E2 (possibly in parallel).

⁵ The definition of SASL is context free and requires no inherited attributes.
    Any graph "variable", the formal argument of the transducer, is matched against the k graph patterns one at a time. If any of them succeeds, the subsequent expressions are evaluated. Any expression consisting of a transformation symbol followed by an expression yielding a graph causes a transformation in the pattern graph.

⁶ Ambiguity is to be frowned on in syntactic definitions. In the preceding, neither the reader of the phrase {leaf leaf leaf} nor the compiler can be entirely sure of the underlying structure. If the compiler determines on one in favor of the other, any pattern match in which this tree is used as a pattern is liable to fail. The syntax for SASL, given below, is not ambiguous.

Now we are ready to look at the SASL compiler in the revised form:

    var SASL_Compiler := procedure (Parsed_SASL_program)

        language SASL is
            <expression>↑tree ::= <definition>↑tree | <factor>↑tree .
            <definition>↑tree ::= "def" <symbol>↑sym "=" <factor>↑fact
                                      [tree := "def"<sym,"void",fact>]
                                | "def" <symbol>↑sym1 <symbol>↑sym2 "=" <factor>↑fact
                                      [tree := "def"<sym1,sym2,fact>] .
            <factor>↑tree ::= <primary>↑tree
                            | <factor>↑fact <primary>↑prim
                                  [tree := "application"<fact,prim>]
                            | <abstraction>↑tree .
            <primary>↑tree ::= <symbol>↑tree | "(" <factor>↑tree ")" .
            <abstraction>↑tree ::= "[" <symbol>↑sym "]" <factor>↑fact
                                  [tree := "abstraction"<sym,fact>] .
            <symbol>↑tree ::= [defined lexically]
        end;

        var Improve := procedure (E)
            Abstract(E);
            Optimize(E)
        end ¢ Improve ¢;

        var Abstract := transform E matching
            SASL(<symbol> f,v; <factor> body): { def f v = body }
                Abstract(body);
                => SASL(<symbol> f,v; <factor> body): { def f = [v] body }
          □ SASL(<symbol> f; <factor> body): { def f = body }
                Abstract(body)
          □ SASL(<symbol> v; <factor> L; <primary> R): { [v] (L R) }
                Abstract(L, R);
                => SASL(<symbol> v; <factor> L; <primary> R): { S ([v] L) ([v] R) }
          □ SASL(<symbol> v): { [v] v }
                => SASL: { I }
          □ SASL(<symbol> v,i): { [v] i }
                => SASL(<symbol> i): { K i }
          □ SASL(<factor> L; <primary> R): { L R }
                Abstract(L, R)
        end ¢ Abstract ¢;

        var Optimize := transform E matching
            SASL(<primary> E1,E2): { S (K E1) (K E2) }
                Optimize(E1, E2);
                => SASL(<primary> E1,E2): { K (E1 E2) }
          □ SASL(<primary> E1,E2): { S (K E1) E2 }
                Optimize(E1, E2);
                => SASL(<primary> E1,E2): { B E1 E2 }
          □ SASL(<primary> E1,E2): { S E1 (K E2) }
                Optimize(E1, E2);
                => SASL(<primary> E1,E2): { C E1 E2 }
          □ SASL(<factor> E1; <primary> E2): { E1 E2 }
                => Optimize(E1, E2)
        end ¢ Optimize ¢;

        Improve(Parsed_SASL_program)
    end ¢ SASL_Compiler ¢

The compiler is called with a parsed SASL program or expression, which it transforms into a "compiled" or variable-free SASL expression. For example,

    var E := Parser();
    SASL_Compiler(E);
    Flattener(E)

[1] H. Abramson, T. Rushworth, and T. Venema. TOSI: A tree oriented string interpreter for the design and implementation of semantics. Software -- Practice and Experience 7 (1977), pp 663-670.
[2] A.V. Aho and J.D. Ullman. The theory of parsing, translation, and compiling; Volume I: Parsing. Prentice-Hall, 1972.
[3] A.V. Aho and J.D.
Ullman. Principles of compiler design. Addison-Wesley, 1977.
[4] E.R. Anderson, F.C. Belz, and E.K. Blum. SEMANOL(73): A metalanguage for programming the semantics of programming languages. Acta Informatica 6 (1976), pp 109-131.
[5] W.F. Appelbe. A semantic representation for translation of high-level algorithmic languages. PhD Thesis, Department of Computer Science, The University of British Columbia, 1978.
[6] J.J. Arsac. Syntactic source to source transforms and program manipulation. Communications of the ACM 22 (1979), pp 43-54.
[7] P.R. Bagley. Principles and problems of a universal computer-oriented language. Computer Journal 4 (1962), pp 305-312.
[8] J.L. Baker. Notes for a course in automata theory. Department of Computer Science, The University of British Columbia, 1976.
[9] V.R. Basili. The SIMPL family of programming languages and compilers. Technical report TR-305, University of Maryland Comp. Sci. Center, 1974.
[10] F.L. Bauer and J. Eickel (eds.). Compiler construction: An advanced course. Springer-Verlag, 1976.
[11] F.L. Bauer. Historical remarks on compiler construction. In [10], pp 603-621.
[12] F.L. Bauer, M. Broy, R. Gnatz, W. Hesse, B. Krieg-Brueckner, H. Partsch, P. Pepper, and H. Woessner. Towards a wide spectrum language to support program specification and program development. SIGPLAN Notices 13 12 (December 1978), pp 15-24.
[13] R.E. Berry. Experience with the PASCAL P-compiler. Software -- Practice and Experience 8 (1978), pp 617-627.
[14] D.G. Bobrow and T. Winograd. An overview of KRL, a knowledge representation language. Cognitive Science 1 (1977), pp 3-46.
[15] G.V. Bochmann. Semantic evaluation from left to right. Communications of the ACM 19 (1976), pp 55-62.
[16] G.V.
Bochmann. Compiler writing system for attribute grammars. The Computer Journal 21 (1978), pp 144-148.
[17] H. Boom. The organization of the object code generator in Algol 68 H. Technical report IW 33/75, Stichting Mathematisch Centrum, Amsterdam, 1975.
[18] P.J. Brown. The ML/I macro processor. Communications of the ACM 10 (1967), pp 618-623.
[19] R.M. Burstall and J. Darlington. A transformation system for developing recursive programs. Journal of the ACM 24 (1977), pp 44-67.
[20] J.L. Carter. A case study of a new code generation technique for compilers. Communications of the ACM 20 (1977), pp 914-920.
[21] R.G. Cattell. A survey and critique of some models of code generation. Technical report, Carnegie-Mellon University, 1977.
[22] R.G. Cattell, J.M. Newcomer, and B.W. Leverett. Code generation in a machine-independent compiler. SIGPLAN Notices 14 8 (August 1979), pp 65-75.
[23] L.M. Chirica and D.F. Martin. An approach to compiler correctness. SIGPLAN Notices 10 6 (June 1975), pp 96-103.
[24] N. Chomsky. On certain formal properties of grammars. Information and Control 2 (1959), pp 137-167.
[25] A.J. Cole. Macro processors. Cambridge University Press, 1976.
[26] S.S. Coleman, P.C. Poole, and W.M. Waite. The Mobile Programming System, JANUS. Software -- Practice and Experience 4 (1974), pp 5-23.
[27] M.E. Conway. Design of a separable transition diagram compiler. Communications of the ACM 6 (1963), pp 396-408.
[28] J.R. Cordy, R.C. Holt, and D.B. Wortman. Semantic charts: A diagrammatic approach to semantic processing. SIGPLAN Notices 14 8 (August 1979), pp 39-49.
[29] D. Crowe. Generating parsers for affix grammars. Communications of the ACM 15 (1972), pp 728-732.
[30] J. Darlington and R.M.
B u r s t a l l A system which a u t o m a t i c a l l y improves programs Acta I n f o r m a t i c a 6 (1976), pp 41-60 [31] R.A. D e M i l l o , R.J. L i p t o n , and A.J. P e r l i s S o c i a l processes and proof s of theorems and programs Communications of the ACM 22 (1979), pp 271-280,614 [32] F.L. DeRemer and H. Kron Programming-in-the-large versus programming-in-the-small SIGPLAN Not i c e s 10 6 (June 1975), pp 114-121 [33] F.L. DeRemer T r a n s f o r m a t i o n a l grammars in [10], pp 121-145 [34] E.W. D i j k s t r a Guarded commands, nondeterminacy and formal d e r i v a t i o n s of programs Communications of the ACM 18 (1975), pp 453-457 [35] E.W. D i j k s t r a A d i sc i p l i n e of programming P r e n t i c e - H a l l , 1976 209 [36] J . E a r l e y An e f f i c i e n t c o n t e x t - f r e e p a r s i n g a l g o r i t h m Communications of the ACM 13 (1970), pp 94-102 [37] E.F. Elsworth Compilation v i a an intermediate language The Computer J o u r n a l 22 (1978), pp 226-233 [38] J . Feder Plex languages Information Sciences 3 (1971), pp 225-241 [39] J . Feldman and D. G r i e s T r a n s l a t o r w r i t i n g systems: An e x p l o r a t i o n of concepts and p r i n c i p l e s Communications of the ACM 11 (1968), pp 77-113 [40] R.W. F l o y d A d e s c r i p t i v e language for symbol manipulation J o u r n a l of the ACM 8 (1961), pp 579-584 [41] R.W. F l o y d On the nonexistence of a phrase s t r u c t u r e grammar f o r A l g o l 60 Communications of the ACM 5 (1962), pp 483-484 [42] R.W. F l o y d A s s i g n i n g meaning to programs Proceedings of the Amer. Math. S o c , Symposia in A p p l i e d Math., 19 (1967), pp 19-31 [43] R.W. F l o y d Toward i n t e r a c t i v e design of c o r r e c t programs T e c h n i c a l report no. CS-235, or AI Memo AIM-150, Computer Science Department, S t a n f o r d U n i v e r s i t y , 1971 [44] R.A. 
F r a l e y Unlanguage grammars and t h e i r uses T e c h n i c a l report 77-6, Department of Computer Science, The U n i v e r s i t y of B r i t i s h Columbia, 1977 [45] M.R. Garey and D.S. Johnson Computers and i n t r a c t i b i l i t y : A guide to the theory of NP-completeness W.H. Freeman and Company, 1979 [46] J.V. Garwick GPL: A t r u l y general purpose language Communications of the ACM 11 (1968), pp 634-638 [47] F. G e i s e l b r e c h t i n g e r , W. Hesse, B. R r i e g , and H. S c h e i d i g Language l a y e r s , p o r t a b i l i t y and program s t r u c t u r e in [130], pp 79-99 210 [48] CM. Geschke, J.H. M o r r i s J r , and E.H. S a t t e r t h w a i t e E a r l y experiences with Mesa Communications of the ACM 20 (1977), pp 540-553 [49] S.L. Graham and M.A. H a r r i s o n P a r s i n g of general c o n t e x t - f r e e languages Advances in Computers 14 (1976), pp 77-185; E a r l e y ' s a l g o r i t h m pp 122-139 [50] S.L. Graham, M.A. H a r r i s o n , and W.L. Ruzzo An improved c o n t e x t - f r e e r e c o g n i z e r ACM Transact ions on Programming Languages and Systems 2 (1980), pp 415-462 [51] S.L. Graham T a b l e - d r i v e n code generation IEEE Computer, 13 8 (August 1980), pp 25-34 [52] S.A. Greibach Theory of program s t r u c t u r e s : Schemes, semanties, ver i f i c a t ion (Lecture notes in computer s c i e n c e , 36) S p r i n g e r - V e r l a g , 1975 [53] D. G r i e s Compiler c o n s t r u c t ion f o r d i g i t a l computers John Wiley & Sons, 1971 [54] D. G r i e s E r r o r recovery and c o r r e c t i o n : An i n t r o d u c t i o n to the l i t e r a t u r e i n [10], pp 627-638 [55] W.J. Hansen Creat ion of h i e r a r c h i c t e x t with a computer d i splay PhD T h e s i s , S t a n f o r d U n i v e r s i t y , 1971 [56] D.R. Hanson RATSNO -- An experiment in software a d a p t a b i l i t y Software — P r a c t i c e and Experience 7 (1977), pp 625-630 [57] L.E. Heindel and J.T. 
Roberto LANG-PAR -- An i n t e r a c t i v e language design system American E l s e v i e r , 1975 [58] P.G. Hibbard and S.A. Schuman (eds.) C o n s t r u c t i n g q u a l i t y software (Proceedings of the IFIP Working Conference on C o n s t r u c t i n g Q u a l i t y Software) North-Holland P u b l i s h i n g Co., 1978 [59] C.A.R. Hoare An axiomatic b a s i s f o r computer programming Communications of the ACM 1 2 ( 1 9 6 9 ) , pp 576-580,583 211 [60] C.A.R. Hoare and N. Wirth An axiomatic d e f i n i t i o n of the programming language Pascal Acta I n f o r m a t i c a 2 ( 1 9 7 3 ) , pp 335-355 [61] J . J . Horning What the compiler should t e l l the user in [10 ] , pp 526-548 [62] G. Huet and B. Lang Proving and a p p l y i n g program t r a n s f o r m a t i o n s expressed with second-order p a t t e r n s Acta I n f o r m a t i c a 11 (1978), pp 31-55 [63] E.T. Irons A syntax d i r e c t e d compiler f o r A l g o l 60 Communications of the ACM 4 (1961), pp 51-55 [64] K. Jensen and N. Wirth PASCAL: User manual and re p o r t S p r i n g e r - V e r l a g , 1974 [65] S.C. Johnson YACC: Yet another compiler-compiler Computing Science T e c h n i c a l Report #32, B e l l L a b o r a t o r i e s , Murray H i l l , NJ, 1978 [66] S.C. Johnson and M.E. Lesk Unix time sharing system: Language development t o o l s B e l l System T e c h n i c a l J o u r n a l 57 6 (July-August 1978), pp 2155-2175 [67] S.C. Johnson Language development t o o l s on the Unix system IEEE Computer, 13 8 (August 1980), pp 16-20 [68] B.W. Kernighan RATFOR -- A preprocessor f o r a r a t i o n a l FORTRAN Software -- P r a c t i c e and Experience 5 (1975), pp 395-406 [69] B.W. Kernighan and P.J. Plauger . Software t o o l s Addison-Wesley, 1976 [70] D.E. Knuth Backus Normal Form vs. Backus Naur Form Communications of the ACM 7 (1964), pp 735-736 [71] D.E. Knuth Semantics of c o n t e x t - f r e e languages Mathematical Systems Theory 2 (1968), pp 127-145; Math. Sys. Th. 
5 (1971), p 95 212 [72] D.E. Knuth S t r u c t u r e d programming with goto statements Computing Surveys 6 (1974), pp 261-301 [73] C.H.A. Koster A f f i x grammars in [95], pp 95-106 [74] C.H.A. Koster Using the CDL compiler-compiler in [10], pp 366-426 [75] B.M. Leavenworth Syntax macros and extended t r a n s l a t i o n Communications of the ACM 9 (1966), pp 790-793 [76] H.F. Ledgard Production systems: or, Can we do b e t t e r than BNF? Communications of the ACM 17 (1974), pp 94-102 [77] H.F. Ledgard Production systems: A n o t a t i o n f o r d e f i n i n g syntax and semant i e s IEEE Transact ions on Software E n g i n e e r i n g 3 (1977), pp 105-124 [78] B.W. L e v e r e t t , R.G.G. C a t t e l l , S.O. Hobbs, J.M. Newcomer, A.H. Reiner, B.R. Schatz, and W.A. Wulf An overview of the Production Q u a l i t y Compiler-compiler p r o j e c t T e c h n i c a l report CMU-CS-79-105, Department of Computer Science, Carnegie-Mellon U n i v e r s i t y , 1979 [79] B. L i s k o v and S. Z i l l e s S p e c i f i c a t i o n techniques f o r data a b s t r a c t i o n s SIGPLAN Notices 10 6 (June 1975), pp 72-87 [80] B. L i s k o v , A. Snyder, R. Atk i n s o n , and C. S c h a f f e r t A b s t r a c t i o n mechanisms in CLU Communications of the ACM 20 (1977), pp 564-576 [81] E.S. Lowry and C.W. Medlock Object code o p t i m i z a t i o n Communications of the ACM 12 (1969), pp 13-22 [82] M.D. M c l l r o y Macro i n s t r u c t i o n e x tensions of compiler languages Communications of the ACM 3 (1960), pp 214-220 [83] W.M. McKeeman Peephole o p t i m i z a t i o n Communications of the ACM 8 (1965), pp 443-445 213 [84] W.M. McKeeman, J . J . Horning, and D.B. Wortman A compiler generator P r e n t i c e - H a l l , 1970 [85] W.M. McKeeman Compiler c o n s t r u c t i o n in [10], pp 1-36 [86] Z. Manna and R.J. Waldinger Toward automatic program s y n t h e s i s Communications of the ACM 14 (1971), pp 151-165 [87] M. Marcotty, H.F. Ledgard, and G.V. 
Bochman A sampler of formal d e f i n i t i o n s Computing Surveys 8 (1976), pp 191-276 [88] J.R. Metzner A graded b i b l i o g r a p h y on macro systems and e x t e n s i b l e languages SIGPLAN N o t i c e s 14 1 (January 1979), pp 57-68 [89] C.N. Mooers TRAC, A p r o c e d u r e - d e s c r i b i n g language f o r the r e a c t i v e t y p e w r i t e r Communications of the ACM 9 (1966), pp 215-219 [90] J.B. M o r r i s Data a b s t r a c t i o n : A s t a t i c implementation s t r a t e g y SIGPLAN N o t i c e s 14 8 (August 1979), pp' 1-7 [91] P.D. Mosses Mathemat i c a l semant i c s and compiler generat ion PhD T h e s i s , The U n i v e r s i t y of Oxford, 1975 [92] P. Naur The European side of the l a s t phase of the development of A l g o l 60 SIGPLAN N o t i c e s 13 8 (August 1978), pp 15-44 [93] M.C. Newey, P.C. Poole, and W.M. Waite A b s t r a c t machine mod e l l i n g to produce p o r t a b l e software --A review and e v a l u a t i o n Software — P r a c t i c e and Experience 2 (1972), pp 107-136 [94] K.V. N o r i , U. Amann, K. Jensen, and H.H. Naegeli The PASCAL "P" compiler: Inplementation notes T e c h n i c a l r e p o r t , ETH, Z u e r i c h (undated) [95] J.E.L. Peck (ed.) A l g o l 68 implementation North H o l l a n d P u b l i s h i n g Company, 1971 214 [96] A.J. P e r l i s and K. Samelson P r e l i m i n a r y report -- I n t e r n a t i o n a l A l g o r i t h m i c Language Communications of the ACM 1 12 (December 1958), pp 8-22 [97] A.J. P e r l i s The American si d e of the development of A l g o l SIGPLAN N o t i c e s 13 8 (August 1978), pp 3-14 [98] J.L. P f a l t z and A. Rosenfeld Web grammars Proc. I n t . J o i n t Conf. on A r t i f i c i a l I n t e l l i g e n c e , Bedford, MA, 1969, pp 609-619 [99] R.H. P i e r c e and J . Rowell A t r a n s f o r m a t i o n - d i r e c t e d c o m p i l i n g system The Computer J o u r n a l , May 1977, pp 109-115 [100] P.C. 
Poole Towards improved r e l i a b i l i t y and e f f i c i e n c y through hybr i d s in [58], pp 63-73 [101] J.C. Reynolds COGENT programming manual Research and development report ANL-7022, Argonne N a t i o n a l Laboratory, 1965 [102] M. Richards The p o r t a b i l i t y of the BCPL compiler Software -- Pract i c e and Exper ience 1 (1971), pp 135-146 [103] M. Richards B o o t s t r a p p i n g the BCPL compiler using INTCODE in [130], pp 265-270 [104] B.R. Rosen Tree-manipulating systems and Church-Rosser theorems J o u r n a l of. the ACM 20 (1973), pp 160-187 [105] B.K. Rosen D e r i v i n g graphs from graphs by a p p l y i n g a p r o d u c t i o n Acta I n f o r m a t i c a 4 (1975), pp 337-357 [106] S. Rosen (ed.) Programming systems and languages McGraw-Hill, 1967 [107] S. Rosen The A l g o l programming language in [106], pp 48-78 [108] M.A. Sabin P o r t a b i l i t y -- Some experiences with FORTRAN 215 Software — P r a c t i c e and Exper ience 6 (1976), pp 393-396 [109] J.E. Sammet The e a r l y h i s t o r y of COBOL SIGPLAN N o t i c e s 13 8 (August 1978), pp 121-161 [110] E. Sandewall Programming in an i n t e r a c t i v e environment: The LISP experience Computing Surveys 10 (1978), pp 35-71 [111] M. Sassa A p a t t e r n matching macro pr o c e s s o r Software -- P r a c t i c e and Experience 9 (1979), pp 439-456 [112] B.R. Schatz, B.W. L e v e r e t t , J.M. Newcomer, A.H. Reiner, and W.A. Wulf TCOL(ADA): An intermediate r e p r e s e n t a t i o n f o r the DOD Standard Programming Language T e c h n i c a l report CMU-CS-79-112, Department of Computer Science, Carnegie-Mellon U n i v e r s i t y , 1979 [113] D.V. Schorre META-II: A s y n t a x - o r i e n t e d compiler w r i t i n g language Proceedings of the ACM 19th N a t i o n a l Conference, 1964, s e c t i o n DI.3 [114] B. 
Schwanke Survey of scope is s u e s in programming languages T e c h n i c a l r e p o r t CMU-CS-78-131, Department of Computer Scien c e , Carnegie-Mellon U n i v e r s i t y , 1978 [115] L.G. Shapiro and R.J. Baron ESP 3: A language for p a t t e r n d e s c r i p t i o n and a system f o r p a t t e r n r e c o g n i t i o n IEEE T r a n s a c t i o n s on Software E n g i n e e r i n g 3 (1977), pp 169-183 [116] A.C. Shaw Pa r s i n g of graph r e p r e s e n t a b l e p i c t u r e s J o u r n a l of the ACM 17 (1970), pp 453-481 [117] A.C. Shaw A model for document p r e p a r a t i o n systems ( d r a f t ) Department of Computer Scie n c e , U n i v e r s i t y of Washington, 1978 [118] N.C. Shu, B.C. Housel, and V.Y. Lum CONVERT: A high l e v e l t r a n s l a t i o n d e f i n i t i o n language f o r data c o n v e r s i o n Communications of the ACM 18 (1975), pp 557-567 0 216 [119] J . Sklansky, M. F i n k e l s t e i n , and E.C. R u s s e l l A formalism for program t r a n s l a t i o n J o u r n a l of the ACM 15 (1968), pp 165-175 [120] C. Strachey A g e n e r a l purpose macrogenerator The Computer J o u r n a l 8 (1965), pp 225-241 [121] Y. S u g i t o , Y. Mano, and K. T o r i i On a two-dimensional graph manipulation language GML Systems«Computers«Controls 7 (1976), pp 1-9 [122] A.S. Tanenbaum A general-purpose macro processor as a poor man's compiler-compiler IEEE T r a n s a c t i o n s on Software E n g i n e e r i n g 2 (1976), pp 121-125 [123] T. Teitelbaum and T. Reps The C o r n e l l program s y n t h e s i z e r : A syntax d i r e c t e d programming environment T e c h n i c a l r e p o r t , C o r n e l l U n i v e r s i t y , May 1980 [124] R.D. Tennent The d e n o t a t i o n a l semantics of programming languages Communications of the ACM 19 (1976), pp 437-453 [125] V.F. T u r c h i n A supercompiler system based on the language REFAL SIGPLAN N o t i c e s 14 2 (February 1979), pp 46-54 [126] D.A. 
Turner Another a l g o r i t h m f o r bracket a b s t r a c t i o n The J o u r n a l of Symbolic Logic 44 (1978),pp 67-70 [127] D.A. Turner A new implementation technique f o r a p p l i c a t i v e languages Software — P r a c t i c e and Experience 9 (1979), pp 31-49 [128] P. van den Bosch The design and implementation of a document processor M.Sc T h e s i s , Department of Computer Science, The U n i v e r s i t y of B r i t i s h Columbia, 1974 [129] P. van den Bosch On the joys of axiomatic semantics Term paper f o r a course on t o p i c s in programming languages, Department of Computer Science, The U n i v e r s i t y of B r i t i s h Columbia, 1977 [130] W.L. van der Poel and L.A. Maarssen' (eds.) Machine o r i e n t e d higher l e v e l languages North-Holland, 1974 217 [131] A. van Wijngaarden, B.J. M a i l l o u x , J.E.L. Peck, and C.H.A. Roster Report on the a l g o r i t h m i c language A l g o l 68 Numerische Mathematik 14 (1969), pp 79-218; W-grammars d e f i n e d in s e c t i o n 1.1, pp 88-93 [132] A. van Wijngaarden, B.J. M a i l l o u x , J.E.L. Peck, C.H.A. Roster, M. S i n t z o f f , -CH. Lindsey, L.G.L.T. Meertens, and R.G. F i s k e r Revised r e p o r t on the a l g o r i t h m i c language A l g o l 68 S p r i n g e r - V e r l a g , 1976 [133] T. Venema TRUST user's guide T e c h n i c a l manual, Department of Computer Science, The U n i v e r s i t y of B r i t i s h Columbia, 1976 [134] W.M. Waite A language-independent macro processor Communications of the ACM 10 (1976), pp 433-440 [135] W.M. Waite The Mobile Programming System: STAGE2 Communications of the ACM 13 (1970), pp 415-421 [136] W.M. Waite Opt im i z a t ion in [10], pp 549-602 [137] A. Wang A case study in program t r a n s f o r m a t i o n BIT 16 (1976), pp 322-331 [138] B. Wegbreit G o a l - d i r e c t e d program t r a n s f o r m a t i o n IEEE T r a n s a c t i o n s on Software E n g i n e e r i n g 2 (1976), pp 69-80 [139] P. 
Wegner The Vienna D e f i n i t i o n Language Computing Surveys 4 (1972), pp 5-63 [140] G. Winiger A note on one-pass CASE statement c o m p i l a t i o n SIGPLAN N o t i c e s 11 1 (January 1976), pp 32-36 [141] N. Wirth Program development by stepwise refinement Communications of the ACM 14 (1971), pp 221-227 [142] N. Wirth PASCAL-S: A subset and i t s implementation T e c h n i c a l report #12, ETH Zu e r i c h , 1975 218 [143] W.A. Woods C o n t e x t - s e n s i t i v e p a r s i n g Communications of the ACM 13 (1970), pp 437-445 [144] J.M. Wozencraft and A. Evans, J r . Notes on programming l i n g u i s t i c s Department of E l e c t r i c a l E n g i n e e r i n g , Massachussetts I n s t i t u t e of Technology, 1971 [145] W.A. Wulf, R.L. London, and M. Shaw An i n t r o d u c t i o n to the c o n s t r u c t i o n and v e r i f i c a t i o n of ALPHARD programs IEEE Transact ions on Software E n g i n e e r i n g 2 (1976), pp 253-265 
