UBC Theses and Dissertations


The design and implementation of a document processor Van den Bosch, Peter Nico 1974


Full Text

The Design and Implementation of a Document Processor

by Peter N. van den Bosch
B.Sc., University of British Columbia, 1972

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE
in the Department of Computer Science

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
October, 1974

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Computer Science
The University of British Columbia
Vancouver 8, Canada
Date: [illegible]

Abstract

With the growing use of computers as tools for the automation of clerical tasks, there has come not only a proliferation of documentation, but the realization that computers could be employed in automating certain aspects of the production of documents: not only such documents as describe computer developments, but also papers, briefs, letters, etc. The runoff program, usually an adjunct to a text editing facility, has been in existence for a long time, but its use has been limited to computing installations and those directly involved with computing.
The reason for this is two-fold: public unawareness and the ad hoc nature of runoff program design have both prevented wider use. This thesis is an attempt to present a reasoned design of a program which acts enough like a rather intelligent typewriter to be usable by members of the public, but gives the user with a greater computing background enough power of expression, in terms of programming language and layout design, to overcome some of the limitations of earlier runoff programs.

Previous work in the area of text processing which relates to document processing is examined in some detail. The underlying ideas common to existing document-processing facilities are brought forth, and examined in the light of what a user might reasonably expect of such a facility. The resulting design for a document processor is presented in an orderly fashion, outlining the reasons for design decisions and backing away respectfully from designs which are unfeasible for economic implementation. An entire chapter is devoted to a description of the resulting document processor, in the form of a somewhat rarified user's manual. Suggestions for and details of an implementation are given, based on the author's own experiences with implementation. A bibliography and a short glossary of the most important terms and those most likely to confuse are appended to the thesis.

Table of Contents

Introduction
Chapter 1: A Review of Relevant Literature
    A: A General Survey
    B: Manuals and Descriptive Papers
        1. A survey of existing systems
        2. Design Considerations
            2.1 Command Language
            2.2 Layout Control
            2.3 Justification
Chapter 2: The Design of a Document Processor
    1. Criteria for Design
    2. Input Text and Command Language
    3. Page Segmentation
    4. Non-linear Text
    5. Justification
    6. User-interaction with the Formatting Process
    7. Defaulting Gracefully
    8. Source Listings
Chapter 3: Texture: A User's Manual
    1. Using Texture immediately
    2. Some more Texture
    3. Eureka!
    4. The rest of Texture
Chapter 4: Implementing Texture
    1. Choice of Programming Language
    2. Eureka
    3. Texture and Eureka
    4. Texture and Text
    5. Keeps, Figures, Footnotes and As-is Text
    6. Blocks and Layouts
    7. Environments
    8. A Note on Primitives
Chapter 5: Conclusions
Bibliography
Glossary
Credits

Acknowledgements

Many people have shown interest in the work described in this thesis, and it is impossible to name all those who have contributed by asking pointed questions and making helpful suggestions; however, there are some who must be acknowledged individually. In the first place, my supervisors: John L. Baker, who has put a great deal of time and effort into the improvement of this thesis, and whose high standards have caused me to chew up the ends of several perfectly good pencils; Alan Ballard, whose considered opinions on many subjects, including document processing, have livened many a coffee break; and J.E.L. Peck, who supervised my first year as a graduate student, and whose enthusiasm for TRAC finally got through to me. I would also like to acknowledge the contributions of Michael Gorlick, who provided the germ for many of the ideas in Texture and, incidentally, its name; and of Vince Manis, who said "Eureka!" at precisely the right moment. Finally, I would like to thank the National Research Council of Canada for financial support, and my parents for moral support in more ways than can be recounted here.
"You shall see them on a beautiful quarto page, where a neat rivulet of text shall meander through a meadow of margin,"

Introduction

With the growth in computing has come a parallel growth in documentation. Traditionally such documentation is typed, and if it is to be widely disseminated it is reproduced by mimeograph or photocopy. This is a perfectly reasonable procedure for relatively stable documents, but it is in the nature of computing to be subject to rapid change and regeneration, with the result that related documentation must be regenerated equally rapidly. This means that a document must be largely retyped for each generation, since the insertion or deletion of even a moderate amount of text tends to alter the document throughout. That is a lot of work. The reason it is done at all is that it is necessary for documentation to be kept up to date and to be disseminated. Frequently there is no method, other than typing, available or convenient.

Because retyping is largely a repetition of labours and therefore a waste of effort, and because the work involved in typing a document in the first place is comparable to that of entering its text in a machine-readable form (keypunching or, ever more commonly, on-line entry), it is reasonable that almost every computing installation should have what is known as a 'runoff' program. A runoff program essentially acts like a competent, fast but extremely simple-minded typist. The text of the original document is entered and stored in some convenient form. The runoff program uses the text to produce a typed document on a device with a reasonable character set, justified or unjustified.
Unlike a secretary-typist, the program cannot determine when a paragraph should be started, or a new page, without the presence, throughout the text, of what can best be described as proof-reader's marks: simple commands which affect the format of the final document. When the document changes, the stored text is edited by whatever means are locally available, and run through the runoff program again. A competent secretary who can cope with the stylistic idiosyncrasies in documents submitted for typing by several people can theoretically cope with the local edit and runoff programs and eventually learn to produce revised documents much more quickly (in absolute time, assuming reasonable turnaround, and certainly in man-hours) by this method than by typing.

Such a program has much wider applicability than this, of course. Any reasonably large document which is likely to go through several drafts while large portions of it remain unchanged could profitably be produced by runoff program: papers, theses, novels, briefs, proposals, etc., would all normally go through several cycles of revision and retyping. If producing a new clean copy of a document becomes as simple as issuing the proper sequence of commands to a computer (and computer systems are becoming less formidable in this respect, also) then the writer will probably benefit from being free to call for another clean copy after only a few alterations. There even appears to be a growing use of computer-typed, "personalized" form letters, a somewhat less desirable use. In fact, it is no great stretch of the imagination to see the resemblance of the work done by a runoff program to some of the functions of a typesetter (justification in particular).
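The behaviour just described (stored text interspersed with simple commands, filled into lines and optionally justified) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any runoff program discussed in this thesis: the command names `.pp` (start a paragraph) and `.pa` (start a page), the function names, and the default width are all hypothetical.

```python
def fill(words, width):
    """Greedily pack words into lines no wider than `width` columns."""
    lines, current, length = [], [], 0
    for word in words:
        needed = len(word) if not current else length + 1 + len(word)
        if current and needed > width:
            lines.append(current)
            current, length = [word], len(word)
        else:
            current.append(word)
            length = needed
    if current:
        lines.append(current)
    return lines


def justify(words, width):
    """Pad the gaps between words so the line is exactly `width` columns."""
    if len(words) == 1:
        return words[0]
    gaps = len(words) - 1
    spaces = width - sum(len(word) for word in words)
    base, extra = divmod(spaces, gaps)
    # The leftmost `extra` gaps each receive one additional space.
    padded = [word + " " * (base + (1 if i < extra else 0))
              for i, word in enumerate(words[:-1])]
    return "".join(padded) + words[-1]


def runoff(source, width=40, justified=True):
    """Format `source`; lines starting with a (hypothetical) command are obeyed."""
    out, words = [], []

    def flush():
        lines = fill(words, width)
        for i, line in enumerate(lines):
            is_last = i == len(lines) - 1
            # The last line of a paragraph is conventionally left ragged.
            out.append(" ".join(line) if is_last or not justified
                       else justify(line, width))
        words.clear()

    for raw in source.splitlines():
        if raw.startswith(".pp"):      # hypothetical: begin a new paragraph
            flush()
            if out:
                out.append("")
        elif raw.startswith(".pa"):    # hypothetical: begin a new page
            flush()
            out.append("\f")
        else:
            words.extend(raw.split())
    flush()
    return out


if __name__ == "__main__":
    sample = ".pp\nThe runoff program acts like a fast typist.\n.pp\nIt obeys simple commands."
    print("\n".join(runoff(sample, width=20)))
```

Because the commands live in the stored text, editing the text and running it through `runoff` again reproduces the whole layout; nothing formatted by hand is lost between drafts.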
In short, then, the following advantages may be claimed for the processing of documents by computer:

1. Typing speed increases because there is no need to worry about running off the edge of a page, no need to roll out sheets of paper when they are full and roll in new sheets, with carbons, etc., and less concern about spoiling the document with errors.

2. The resulting document is more uniform in appearance than a typed document not only because of possible text justification, but because human fatigue is eliminated after the initial entry stage.

3. Any segment of text needs to be proofread once only (not every time the document is retyped), because once errors are eliminated from a piece of text, they remain eliminated, nor can new errors be introduced except by editing.

4. Editing (addition, deletion or rearrangement of text) does not force a retyping of the whole document; when the new draft is needed, the program will produce one in mere seconds of computing time.

5. Documents come more and more to be available in computer-readable form, and are therefore more easily disseminated than paper documents, and more easily reproduced than microfilm or microfiche.

6. Since the basic hardware, at least for a document runoff facility, is becoming more and more widely available, and relatively cheap to use, the cost can be quite low.

Demand for such a facility is rising, and will probably continue to rise in the future, as it becomes more widely available. Then has the runoff program lightened the typing burden of the secretary? Apparently not, and the reason is immediate: runoff programs are almost invariably 'in-house' products, intended only for use at a certain installation; their documentation is scanty because the initiated need none, and whatever there is, is obscure and depends on jargon.
People not concerned with computing have never heard of them (nor do they have access, even if they have heard of them). Commercially available runoff programs are little better: IBM's TEXT360 is intended for programmers, and the documentation reflects this; the same company's Administrative Terminal System has a manual intended for secretaries (perhaps the only computer manual to use the pronoun 'she') but is so simple-minded a program that it would frustrate anyone trying to produce even a moderately sophisticated document.

It is my contention that the runoff program is now, or is likely to become, at least as important a part of any computer installation's resources as the various programming language processors. It is one of the aspects of computing that is of immediate value to the community at large, assuming that the runoff programs and the documentation which describes them become more accessible to a non-programming but intelligent public. Even so, one is aware of some problems, hinted at above, which perhaps are preventing runoff programs from coming into their own:

1. While the facilities for text formatting provided in the average runoff program are adequate for a large proportion of documents, everyone inevitably wants, at some point, to do something slightly more sophisticated than the designer of the runoff program had in mind. Since most runoff programs will reproduce text 'as-entered,' this end can always be attained by crude means, but as a method it is no better than typing and probably a good deal worse, with the result that the impatient user (and we are all impatient with computers) eventually gives up and goes back to traditional methods.

2. The design of the average runoff program reveals little in the way of unifying structure.
This is usually because the original design has been modified and enhanced so often that, with consideration of 'upward compatibility' getting in the way of design, all existing unity has long since been lost. The result is that the average user is confused and frustrated by arbitrariness and inconsistency, discovers that the amount of work he has to go through to get his document to look 'right' reduces the efficiency of using a runoff program, and eventually gives up trying.

In many ways, the above statements apply to early programming languages as well. The runoff program in some sense resembles an assembler for a peculiar sort of machine: a typesetting machine. If the digital computer is a mechanization of a clerk with a pencil and a very large scratch pad, then the typesetting machine is a mechanization of a typesetter. Of course this typesetting machine is, in the case of runoff programs, no more than a conceit. We simulate it with a digital computer, and the boundaries between runoff program and simulated typesetter (or typewriter) are seldom kept clear by the programmer. But there are typesetting machines, and they do have assembly languages which are not markedly different from runoff languages. It is therefore useful to think of a runoff program as running on a typesetting machine of limited capabilities in the way of output. Just as early assemblers were often ad hoc programs intended to smooth the process of coding, and little more, so early runoff programs have tended to be ad hoc programs intended to smooth the process of directing text through one of these typesetting machines.
The functions we expect an assembler to perform have become relatively standard, however, while the functions we expect of a runoff program have not. This thesis is an attempt to initiate such a process of standardization by doing the following three things:

1. To examine existing runoff and typesetting programs in an attempt to recognize the basic functions involved in processing document text.

2. To design an 'assembly language' for document processing (a runoff program) which combines these functions into a unified structure, indicating why the design took the form it did for the benefit of future designers.

3. To indicate how implementation of these ideas might proceed, and where such implementation might prove hazardous, guiding possible implementors through these dire straits.

The resulting program is called Texture (for purely aesthetic reasons: all things must have names). It is a processor for document text intended to produce output on a typewriter terminal, a line-printer, or ideally perhaps, one of the rapid graphics devices, like a high-resolution dot-printer, which are becoming more and more feasible. The concepts are not radically different from those involved in designing automatic typesetters, and these will certainly be examined.
Nor is it inconceivable that Texture could be used to control an automatic typesetting device. But primarily, Texture is intended as a runoff program, an aid to the writing, proofing and editing of documents: there is too much in high quality typesetting that requires the intervention of human aesthetics for Texture ever to be entirely satisfactory in such an application.

(At this point, I feel the need for an aside. Thus far, in this introduction, I have been using the terms 'runoff program' and 'automatic typesetting program' to describe two types of program with essentially the same function. The runoff program produces typewritten or typewriter-like documents, while the automatic typesetting program controls, or produces output for the control of, some form of typesetting equipment. The concepts involved are not radically different, even if the processes are. Since Texture is such a processor, a reference term like 'runoff program' will be used rather often throughout this thesis, and I feel the need to establish one term for the concept of automatically processing document text, independent of output device, which can be used consistently throughout the thesis. Runoff programs are also called 'output processors,' especially when they have some relationship to a text editing program, and 'text processors.'
I feel that 'text processor' is too vague (so is an editor; so is a SNOBOL program), and that 'output processor' is too suggestive of output interface routines, to be properly descriptive. Typesetting programs are also called 'composition programs': a good term, but one which has too much the aura of printing and typesetting about it. I personally prefer the term 'document processor,' which conveys the idea that the thing named is a processor of documents: a concept both general enough to include runoff programs and typesetting programs, and specific enough to exclude other forms of text processing, while allowing that document editing (a form of document processing) is a closely related field.)

The thesis is organized along these lines: In Chapter 1, relevant and readily available literature on document processing is surveyed, and at the same time, the literature is examined for recurring themes (concepts which are common to many such processors), and these concepts are placed in the perspective of what has been, and what might be, done with them. In Chapter 2, a design for a document processor is outlined; important aspects of the design are examined minutely, and reasons are given for choosing one particular path above another, for including one concept and excluding another. Chapter 3 is a user's manual for the resulting processor; at the same time this chapter specifies the design with an eye to implementation. Chapter 4 discusses an actual implementation. The discussion avoids excessive, machine-dependent detail, realizing that situations differ from machine to machine and operating system to operating system, but with an eye to easing the implementor's burden by saving him much experimentation with unsuitable data structures.
Chapter 5 looks back at what has been achieved, and forward to what might be achieved, given recent and projected advances in hardware; it expresses enthusiasm for possible software developments, but injects a note of caution into these flights of fantasy. A bibliography and a glossary of the terminology used in this thesis are appended.

Chapter 1
A Review of Relevant Literature

"We ask advice but seek corroboration."

Most of the published literature in the field of document processing is concerned with computer typesetting, and so deals with specific problems of typesetting rather than general notions of design. The literature to be considered falls into four categories:

1. Surveys.
2. Papers describing document processors. Most of these are typesetting programs (or, as they are usually known, systems).
3. Manuals describing document processors.
4. Papers about specific aspects of document processing.

The organization of this chapter is as follows: In part A, papers other than manuals are briefly reviewed, and references to the bibliography are given, so that interested readers may find more detail in the original sources. In part B, manuals and descriptive papers are considered for two purposes: (1) to extract general notions about what constitutes a document processor, and incidentally to come up with a consistent terminology for these notions; and (2) to study what specific solutions (or lack of solution) to the problems of describing documents are to be found in the various document processors described.

Part A: A General Survey

To begin this survey, it might be best to look first at any existing surveys on the same general subject. A pair of survey papers appeared in the 1966 edition of Advances in Computers. Wayne A.
Danielson [1] surveyed the 1966 state of the art of computer assisted copy-editing, and looked at the existing solutions to problems like justification and hyphenation. It is interesting to see that the solutions to these problems have not improved markedly in the eight years which have passed since then. He looks at some experimental copy-editing systems and an experimental edition planner, all rather primitive. He makes the statement that "...editors probably will appreciate any improvements in production which takes place ... They will, however, resist any computer applications which go beyond production and begin to enter into the judgmental processes involved in editing." And well they should.

William R. Bozman, in a survey of computer-aided typesetting, spends much of his paper enumerating the virtues of the available typesetting hardware [2]. There is little discussion of software, and in fact, the tone and content of these two papers convince one that there is little to be learned from going back any further than 1965. Since the hardware Bozman mentions has now gone the way of all hardware, the survey is remarkable for only one point: it is the only paper, other than Kunkel's own, which makes mention of Kunkel and Markum's hyphenless justification, an innovation really made possible for the first time by computers, but which has yet to be utilized (more on this in Part B).

A much more recent survey of interactive text editors by van Dam and Rice [3] is of greater interest, because many text editors contain some sort of formatting, or even 'runoff', facility. The survey gives a good overview of very recent work in the field of text editing, and acts as a very good annotated bibliography of the literature.
Since the borderline between text editing and document processing is very hazy, many of the documents they reference will be discussed later in this survey.

The most recent (1973) survey of relevance is a tutorial on on-line text processing by Richard C. Roistacher. The term 'text processing' here is intended to embrace text editing and (document) output processing. The survey is incomplete, but there is a lively discussion on what makes a good editor or output processor, and since there is an appendix containing examples of runs, the reader could, if he had a masochistic streak, obtain a good idea of what it is like to use one of these processors; masochistic, because wading through someone else's conversation with a computer is always a painful and ill-motivated task. A propos of document processing, Roistacher makes the ominous statement, "...do NOT attempt to write your own output processor. You have neither the time nor the money to do the job decently."

Besides surveys, there have been very few relevant papers published in the literature that do not describe yet another program to do document processing at one level of sophistication or another. This is, of course, endemic to computing in general.

"Hyphenless Justification" by Kunkel and Markum [30] is a short paper explaining a remarkably simple, remarkably effective innovation in typesetting, which essentially removes the need for hyphenation (this, as we shall see, is an important point in document processing) without sacrificing the visual quality of printed text by introducing excessive spacing or squeezing. The solution lies in tampering with a 'motherhood' principle of typesetting, that the set size (the width, in points) of the text must remain constant throughout, with the obvious exception of titles and special text.
Instead, the authors suggest varying the set size by minute amounts, from line to line, as the spacing of the line demands. A more detailed explanation is given in the section on justification, later in this chapter. Since this technique involves an increased amount of calculation, it is made possible only by the introduction of computers into document processing. This appears to be the only real innovation which computers have introduced into typesetting thus far.

A much more recent article entitled "Computer Photocomposition of Technical Text" [31] recognizes, albeit between the lines, the inadequacies of most document processors (in particular automatic typesetters) when they have to be used to deal with unusual situations. The authors describe a series of modifications to the Page-1 Composition Language [19] which made it possible to do automatic typesetting of physics journals, and the various obstacles which had to be overcome. That there were obstacles at all indicates that Page-1 was not intended by its designers to be so abused. The paper is, in this negative sense, instructive; and it must be of some marginal interest to anyone with a similar problem: it says, in effect, that it is better to modify than to write your own, in spite of the difficulties.

Part B: Manuals and Descriptive Papers

This section is organized in two parts. The first part is a brief discussion of each system or program, with references to the bibliography. The second part is an attempt to outline the concepts in document processing which should be explored in greater depth, and a look at each of these concepts, as they are represented in the existing systems, and as it seems to me they should be represented.
At this point, this chapter moves from a survey into design considerations, and this prepares the ground for Chapter 2, which gives an outline of the considerations, and the results, in terms of design, of this process.

1. A survey of existing systems

Frequently, a very primitive kind of document format facility is included in a text editor. WYLBUR [20], an editor and remote job entry system, obeys commands to alter a range of lines so that the text within them is a) no more than a given width, b) justified, or c) centred. However, WYLBUR does not divide the text into pages and so has none of the facilities that one soon comes to expect of document processors: titles, page numbering, footnotes, etc. It may be argued that all of this can be accomplished by editing the document (using WYLBUR), but any sizeable change in the document will upset all the work done on the previous generation. Except for short documents like letters, writeups or memoranda, it simply is not adequate.

IBM's Magnetic Tape Selectric Typewriter (MTST) [5] and Information Control Systems's Astrotype [6] are extremely rudimentary edit and runoff systems for form letters and short documents. Both are intended for office use: for typing out letters, and for easing the process of correcting such letters. They have some usefulness in producing form letters, since they can be made to stop at certain points, wait for manual typing (name, address, etc.) and then accept a signal to resume automatic typing. The editing capabilities are rudimentary, and the formatting facility extends to typing the text, ragged right or as-given, within user-determined margins.
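The stop-and-resume behaviour attributed to these form-letter systems can be pictured with a short Python sketch. It is only an illustration of the idea and assumes nothing about how MTST or Astrotype actually encode their stop points: the `<STOP>` marker and the function name are hypothetical.

```python
STOP = "<STOP>"  # hypothetical marker standing in for a stop code in the stored text


def type_form_letter(stored_text, operator_entries):
    """Play back stored text, pausing at each stop code for manual typing."""
    segments = stored_text.split(STOP)
    entries = list(operator_entries)
    if len(entries) != len(segments) - 1:
        raise ValueError("one manual entry is needed per stop code")
    typed = [segments[0]]
    for entry, segment in zip(entries, segments[1:]):
        typed.append(entry)    # the machine halts; the operator types by hand
        typed.append(segment)  # resume signal: automatic typing continues
    return "".join(typed)


letter = "Dear <STOP>,\nYour order will be sent to <STOP>."
print(type_form_letter(letter, ["Ms. Lee", "Vancouver"]))
```

The stored text is typed unchanged between stop points, which is why such a scheme suits form letters but offers nothing toward page layout.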
IBM's ATS [13] is also an edit-runoff combination, but much more versatile in either of its functions than the two previously mentioned systems, and much more suited to document processing than WYLBUR. Finally QED [21;22], which is part of Com-Share, is an edit-runoff system, not markedly better than ATS or WYLBUR.

Runoff facilities also form part of more sophisticated text processing systems. HES, the Hypertext Editing System [7], is a highly interactive editor and text retrieval system capable of producing formatted output on a line printer by producing TEXT360 source. Actually the authors are more concerned with displaying text on CRTs than with printing it: the whole idea of hypertext editing is to do away with printed documents and replace them with on-line terminals able to display documents in a non-linear fashion to anyone who needs the information contained in them. A hypertext is a segment of document which may reference other hypertext. The directed graph thus created may be edited locally, may be made to reference even more hypertexts, and may be viewed by displaying the nodes, a screenful at a time, and by permitting the user to leap from node to node, and finally retrace his path to return to any point of departure. The authors hope that, given such a facility, one would no longer need printed documents. HES was extended to FRESS (File Retrieval and Editing System), which is commercially available for an IBM 360/67 (a virtual memory machine). Its output facilities also include paper tape for photocomposition equipment [8].

An even more ambitious project to replace pen and paper is Stanford Research Institute's NLS, which is intended to augment human thinking and effectively turn the computer into a giant scratchpad.
Their output processor [27] has a poor command language, but then it is presumably not intended for direct human manipulation: the source to this processor would, one assumes, be produced by NLS, from the text files which, like those in FRESS, may be edited on-line and are able to contain font-change information and formatting codes.

Needless to say, NLS and FRESS require more equipment and resources than the average computing installation has at its disposal. NLS has been developed for the ARPA network so that unless one is connected to ARPA (and few are) it is inaccessible. As for developing one like it: Roistacher's statement about time and money applies tenfold.

There are many commercial typesetting systems and I cannot pretend to be aware of them all, or to have included more than a representative sample in this survey. Most of them belong to small companies which fold without having produced widely disseminated propaganda, let alone any decent documentation. This is no heavy concern, however, since a glance at the more widely known systems assures one that there is little difference between any two of them.

In the same way that it became apparent, early in the development of digital computers, that it was simply too burdensome to write programs in binary, or worse, by wiring computers, so it is with typesetting computers. A 1966 paper presented at the Fall Joint Computer Conference proudly describes a typesetting machine, the Mergenthaler Linotron [15], which is to be programmed by hand-wired plug-boards. The author is quite confident that this will be no burden, since plug-boards are easily interchanged.

At the same conference, another paper takes quite the opposite tack. Instead of hardware, Kunkel [28] designs software to assist in page composition.
The software can be replaced by other software, which handles the same input, but produces output (paper or magnetic tape) intended for an entirely different phototypesetting apparatus. Moreover, the various settings, such as page depth or column width, can be adjusted in the middle of a document: there is no need to take out the current plug-board and put in another. There is very little extraordinary in the way of design in Kunkel's system, and some pretty horrible notions about command language design. It was for this system that Kunkel and Markum developed their "hyphenless justification", and there is a good discussion here, as well as in the Datamation paper [30], about what it involves.

At the same conference there was a paper entitled "Computer Typesetting of complex scientific material" [29] which describes a "macro coding system for inputting text, mathematical expressions, and tables, employed in a computer-based typesetting system which performs routine typesetting calculations and implements the variety of decisions made by an [human] input operator in the setting of scientific material." In other words, the authors have recognized that typesetting equipment is too primitive to handle their complex problems conveniently, and they have designed a slightly (but only slightly) higher level command language to ease the process of getting their text typeset. They describe some techniques they have developed for semi-automatically typesetting mathematical expressions and chemical structures. Like Kunkel, they have realized that it is necessary to be general: "Our philosophy," they say, "is to work with available equipment, but to design systems capable of utilizing new equipment as it becomes available."

IBM's own entry in the automatic composer field is a compact system based on their Selectric typewriter, and its multiple, easily exchangeable type-fonts. In two enormous papers in the IBM Journal of R&D [17;18] the principal workers on the project describe, in detail, the mechanics of the development. It was necessary to modify the Selectric typewriter so that it could type faster (14.1 characters per second was the final result) and could perform proportional spacing. It was also necessary to come up with a new carbon ribbon and a new ribbon feed mechanism. They designed a stored-program mini-computer which drives the whole apparatus and could, presumably, be used to program any application in the area of text processing which the user has in mind. For the technically minded, these papers must be a joy, but it is apparent that nearly all the work went into hardware, and very little into software. The appendix to the second paper actually describes the operation of the composer (called MT/SC), and it is not a very sophisticated document processor.

CypherText [16] is perhaps the most versatile and unfettered document processor included in this survey. It is essentially a preprocessor with no fixed regard for the ultimate output medium: be it photocomposition, "hot lead" or typewriter. The command language is quite good and the idea of making it "extensible" is a very fine one, even if the implementors do not entirely achieve it to satisfaction, and do not appear to know very much about language design. I will go into my few quibbles with CypherText later in this chapter.

The Harris Composition System, or HCS [23], is at about the same level of sophistication as CypherText. The CypherText paper actually references the HCS Language Manual, and there are some very clear influences of HCS on CypherText.
HCS contains some very good ideas about page layout, about command languages and about the need for a macro language in something so low level. These will be discussed later, at the appropriate point.

The "runoff" programs, which aim at line printers or typewriters for their output medium, and are generally used to produce writeups or papers, usually come in families of similar programs, because they are relatively easily modified, and therefore different at every installation. Again there is a problem with completeness in surveying these, because there are many small, unpretentious local programs, which never see the light of day outside the installation at which they were programmed.

IBM FORMAT [10;11] is an IBM product, as are TEXT360 [12] and the Administrative Terminal System, or ATS [13]. FORMAT [9] is also a subsystem of the Michigan Terminal System, has been vigorously modified, and now only distantly resembles Gerald Berns's original description. Runoff has many versions, all more or less similar: some members of the family are Multics Runoff [24], Tenex Runoff [25] and PDP-11 Runoff [26].

2. Design considerations as reflected by existing systems

2.1 Command Language

It seems quite natural that the command language of an automatic typesetter should resemble the command language of a human typesetter (that is, the language through which the editor or author of a document communicates with the typesetter), which is to say, the insertion, in the text, of marks which indicate specific actions. This is indeed the usual approach taken by the designers of document processors, and the form of the command language differs only slightly from processor to processor.

The most primitive approach is the escape character followed by one or more single-character commands.
IBM FORMAT, TEXT360, ATS, Runoff, Kunkel's composer, LINCO III, and the Selectric Composer all take this approach, although with variations. Runoff, of which several slightly different versions exist, has its commands always start on a new line, prefixed with a full stop (.) in column 1. FORMAT commands are interspersed in the text, more or less in free form, with some annoying exceptions. The escape character is a right parenthesis, but only if it is preceded by a blank; in most cases, this causes no problems with text being mistaken for commands. There is also a 'command mode' in which commands are given one to a line. In MTS FORMAT this mode may be entered as often as necessary, while in IBM FORMAT it is entered only once, at the beginning of the job. The TEXT360 commands are bracketed by minus or plus signs depending on their nature (which must be extremely annoying to anyone trying to use some mathematical expressions), often have more than one character per command and even have some rudimentary parameters. In ATS the escape character is the ATTN key on an IBM 2741 terminal (for which it was designed); commands may be up to a line long and are terminated by carriage return; as in TEXT360, commands may have parameters. The LINCO III typesetting system has its own keyboard, with special command buttons; other commands are prefixed by a special escape character. The Selectric Composer also has its own keyboard: the escape character is a 'prefix key'; the commands are single characters followed by numerical modifiers, and are typed in different colour ribbon on the input copy.

The need for a more sophisticated command language becomes apparent to anyone who has ever tried to use a document processor resembling the above.
Fairly recently (about 1970, it appears from publication dates) more interesting and more powerful command languages have emerged.

CypherText, a typesetting language which is independent of output device, brackets command sequences in slashes (or some other, user-defined, symbol). Commands are separated by semicolons and commands may have parameters, which are separated by commas. There is, moreover, a facility for defining strings, which can in turn be referred to by name, and for defining parameterized macros. The authors of CypherText claim that "One prerequisite of an extensible typesetting language is an unambiguous syntax. Every effort has been made to keep the CypherText syntax simple and consistent." But it does not seem to have occurred to them that CypherText commands should have the same parameter structure as user-defined macros, thus allowing nested calls, especially of conditional execution primitives.

Harris-Intertype's Harris Composition System [23] has somewhat the same idea as CypherText, and even recognizes the need for separate left and right delimiters for commands (in this case [ and ]), but they do even less with the concept than CypherText. Once again, commands are separated by semicolons, and parameters by commas. HCS allows for the definition of parameterless macros and of counters, but that is all the user gets in the way of computational power. It is possible, with judicious use of the 'define string' and 'execute string' instructions, to get the effect of parameters, but it remains inconvenient.

MTS FORMAT, which is an augmented version of IBM's FORMAT, includes a parameterized macro facility. The escape character is different from the command escape character (it is a bar, |, as opposed to a right parenthesis), but the same escape character is used for another facility in FORMAT.
The human mind is remarkably adjustable, however, and there is little confusion. There is no real ability to manipulate parameters within macros, nor is there conditional macro expansion. This is done through another facility, which enables the user to write his own library of functions in some compilable (and compatible) language. Within FORMAT input text, function and macro calls look alike, and their parameters may in turn consist of function and macro calls (because left and right string delimiters are the same, if a parameter is a piece of text containing a macro call whose parameters are also complex, it becomes necessary to double the string delimiters, until the whole thing becomes rather confusing; however, this is almost never necessary). Recursion is of course impossible, since there is no conditional macro expansion. To use all of MTS FORMAT, the user must learn many 'languages' and conventions: the control card conventions, the command conventions, the macro 'language' and a programming language, probably FORTRAN.

2.2 Layout Control

(a) General Remarks

Since the document processor can best be described as an extremely simple-minded typist (or typesetter), it is necessary to use the command language to indicate how the text is to be laid out on the page. One must be fairly explicit: it is not enough to assume that the program knows what a typist knows, or that it can figure out for itself that you want this paragraph indented and that set of figures in tables, neatly aligned. This is all layout control, and document processors vary widely in the amount of control they will give the user. What do they have in common?
All of them will do what a typewriter does: tab, space, carriage return with single or double spacing, shift from upper to lower case and the reverse (in the case of runoff programs this implies a typewriter or upper-and-lower-case print chain as output medium). Most of them will do what a typist does: put titles and page numbers on each page, space and indent new paragraphs, go to a new page when the current page is full, and do consistent indenting.

(b) Segmentation of the Page

Most of the designers of early document processors have assumed that the output would consist of a single column of n lines. The lines are usually adjustable in width, and normally the location of the starting line (with respect to the top of the page), and the value of n, are also adjustable. This is adequate for the majority of documents, and the argument has always been that if you were producing anything fancier for phototypesetting, a simple job of cut-and-paste could produce the desired results. In fact, this is true, but it has long been recognized that "...some of the tasks which the Editor now performs with pencil, scissors and paste pot on hard copy could also be performed electronically by an appropriately instructed computer." [1] In the time it takes to lay out a document by hand, a computer could have produced the same document with better alignment on the page (and with easier reproducibility, of course). Naturally this assumes that we have instructed the computer correctly the first time, and this is seldom true. However, once the 'bugs' are out of the automated layout, any changes in the document can be made with a reasonable certainty that the layout will remain constant. It is for this reason that there have been some attempts to expand the layout facilities provided by document processors.

TEXT360 has a single column and a double column mode.
The two columns are of the same size, and rather limited in application, but the two modes may be mixed on a page. The widths of the columns are given at the time the mode is entered (the defaults fit nicely into a standard 8 1/2 x 11 page).

IBM FORMAT permits up to 8 columns per page, all the same width (this could, theoretically, be adjusted with indentation, but wide blank spaces would have to be left between the columns. Indentation is not feasible anyway, because there is no way of detecting that a new column is being started, so that the width cannot be adjusted at the correct time). Modes cannot be mixed on a page as in TEXT360: that is, it is not possible to have three columns in the top half of the page and two in the bottom half.

Similarly Kunkel's Page Composing System allows up to three columns per page of equal width, with equal inter-column spacing.

Some very powerful page layout control is made available in Harris-Intertype's HCS. There are twenty 'blocks', into which the program may be instructed to put typeset text. There is a 'current' or active block, whose top left corner is adjusted with respect to the top left corner of the physical page. Within the block, text is limited to top, bottom, left and right margins, which are adjusted with respect to the top left corner of the block. Coupled with lots of useful feedback from the formatting program (which will be discussed below), this facility makes a wide range of control available to the user: the three-columns-at-the-top and two-at-the-bottom problem mentioned above is now trivial; it is possible to plan layouts which include blank space for pictures and diagrams; above all, it leaves to the user a number of decisions which are rightfully the user's.
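The block scheme just described amounts to two nested offsets: block corner relative to page corner, and margins relative to block corner. The following Python sketch illustrates only that arithmetic. The names Block, text_area and column_blocks, the choice of character columns and lines as units, and the gutter width are all assumptions made for illustration; none of them is taken from the HCS manual.

```python
from dataclasses import dataclass


@dataclass
class Block:
    # Offset of the block's top left corner from the page's top left corner.
    x: int
    y: int
    # Margins, each measured from the block's own top left corner.
    left: int
    top: int
    right: int
    bottom: int


def text_area(block):
    """Return (x0, y0, x1, y1): the text rectangle in page coordinates."""
    return (block.x + block.left, block.y + block.top,
            block.x + block.right, block.y + block.bottom)


def column_blocks(n, page_width, top, bottom, gutter=2):
    """Lay out n equal-width blocks side by side between two page lines."""
    width = (page_width - gutter * (n - 1)) // n
    return [Block(x=i * (width + gutter), y=top,
                  left=0, top=0, right=width, bottom=bottom - top)
            for i in range(n)]


# Three columns in the top half of a 60 x 60 page, two in the bottom half.
layout = column_blocks(3, 60, 0, 30) + column_blocks(2, 60, 30, 60)
```

With blocks of this kind, the mixed three-over-two layout is just a list of five rectangles, which is why the text calls the problem trivial once blocks are available.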
(c) Non-linear Text

The bulk of most documents consists of 'linear text'; that is, text which, like this paragraph, is put into the next available space on the page, justified and all. There is a large amount of 'non-linear' text however; which is to say, text which goes elsewhere on a page, or must be treated specially. This includes footnotes, which must go at the bottom of a page or column; 'keeps', pieces of text which must be kept together for aesthetic reasons, instead of being broken across boundaries in the text; 'figures', which are also text that must be kept together, but may 'float' through the linear text until a place can be found where they can stay together; tabular material, which is always to be treated specially; and finally 'semi-repeated' text like headers, footers and page numbers, which recur (though often with changes) on every page.

If footnotes are treated at all (and they are not in the IBM Selectric Composer, CypherText, HCS or Berns's FORMAT) it is usually as a kind of mandatory keep of n lines which is released when there are exactly n lines remaining on the page. The result is that either the entire footnote appears on the page on which it is referenced, or it appears on the next page. This is understandable, but not standard typesetting practice, and rather ugly when, as in MTS FORMAT, the skip to the next page is performed if a footnote will not fit on the current page. In TEXT360 one is limited to accumulating 10 lines of footnote per column and the (cumulative) footnote is printed at the end of the column. MTS FORMAT limits the footnote to 20 lines.

Keeps occur in both FORMATs, TEXT360 and ATS and come in two flavours: the "I want to keep the next n lines on the same page" and the "I want to keep the following text (up to a certain end mark) in the same page" flavours.
The first is easy to program (the user can do it himself if there is a way of asking how many lines remain, which can be done in, say, CypherText and HCS, but not in any of the above) and wholly inadequate. IBM FORMAT has both flavours, TEXT360 and ATS have only the latter and MTS FORMAT only the former. There are restrictions in all cases about what commands cannot appear in a keep.

Figures appear in MTS FORMAT and TEXT360 (where they are known as 'floating keeps'). In FORMAT, but not in TEXT360, the user has control over the number of lines of 'linear text' which must appear above and below the figure when it is finally released.

Tables, it is usually assumed, are to be handled with tabs. This is fine for tables with little information in them, but if any segment of the table runs more lines than the others, tabs become merely a small convenience in waging an otherwise difficult battle with the document processor. Both MTS FORMAT and TEXT360 have facilities for automatically outlining a table horizontally and vertically. Little is said about tables elsewhere that is informative.

Some areas of the page layout are usually reserved for headers, footers, and page numbers. Automatic page numbering is pretty well universal in typewriter oriented document processors. ATS has a 'heading' and 'footing' mode, which are essentially the same thing at different ends of the page. The text is reproduced at the top or bottom of each subsequent page in 'as entered' format. In TEXT360, the title, subtitle and footer all have their fixed place in the layout, and for this reason they must not exceed one line; they are reproduced 'as entered'; footers may be made to alternate from left to right, along with the page number, on even and odd pages, thus giving the effect of two footers on a two-page spread.
MTS FORMAT has similar facilities (including left and right titles, but not left and right subtitles) with more reasonable restrictions on the number of lines each may occupy.

The typesetting systems, especially those, like HCS and CypherText, where the user has considerable control over and feedback from the formatting process, take the attitude that the user will probably want to handle non-linear text himself. Unfortunately, the programming facilities in these systems are not adequate for the user to program a specific method once and then reuse that method in all his subsequent documents. The result is that he probably ends up doing the same thing over and over again, much the same way as a programmer, forced to use an assembler without good macro facilities, must write certain constructs again and again.

(d) Interaction with the formatting process

Suppose one had a rather long aside in his document which, if there were still twenty lines remaining on the page on which it was referenced, would be put out as a footnote, but if there were fewer, would be put out at the end of the chapter. What is required is the ability to ask the document processor how many lines remain on the page (or in the current column). Other reasonable questions are: how much space remains on the current line; what is the page number of this page (for forward references); what are the settings of various system variables (useful inside macros).

What sort of questions will document processors answer? Most of the typewriter oriented 'runoff' programs will not divulge much more than the date. This occasionally gives the user a sense of helplessness when he wants to do something which the designers of the processors had not considered, but which could be done, given the proper information about the condition of the process.
Naturally an ability to ask about the condition of the formatting process implies an ability to affect the formatting process dynamically. Where the latter is not available, the former is of almost no use. It is no surprise then, that CypherText and HCS have a large number of 'handles' on the process, and FORMAT and TEXT360 have none.

Simply being able to ask questions at specific places within the text is often not enough feedback, however. There comes a time when you want the processor to tug at your sleeve and say "Hey, a specific event has occurred; do you have anything you'd like to do before the process continues?" This is very much like the concept of interrupt in most computers, and the concept of 'on-conditions' which occur in PL/I, and in similar forms in a few other programming languages. In fact, it is a way of simulating the ideal situation, which is that document processor and user program operate in parallel to format the text.

Only in HCS has this concept been applied. It is possible to define strings of commands which are executed whenever a right margin is encountered, or the last line of a block has been set. This command string could then, for instance, include a re-definition of the end-of-line (end-of-column) condition, for next time.

2.3 Justification

Almost all document processors do automatic justification, and those which do not can hardly be considered document processors at all. The methods of achieving flush right margins vary, but they all come down to inserting spaces. With typewriter (or printer) output, this spacing factor is fixed, and rather large (the Selectric Composer excepted). One is therefore forced to space only the words: the algorithm is quite simple, and is given in Chapter 2.
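To make the idea of spacing only the words concrete, here is a generic Python sketch of fixed-pitch justification. It is not the algorithm given in Chapter 2, which is not reproduced in this section; the function name justify and the choice to give surplus spaces to the leftmost gaps are assumptions made purely for illustration.

```python
def justify(words, width):
    """Pad the gaps between words so the line is exactly `width` columns."""
    if len(words) < 2:
        # A single word has no gaps to widen; return it unpadded.
        return " ".join(words)
    gaps = len(words) - 1
    slack = width - sum(len(w) for w in words)
    if slack < gaps:
        raise ValueError("words do not fit on a line of this width")
    # Every gap gets `base` spaces; the leftmost `extra` gaps get one more.
    base, extra = divmod(slack, gaps)
    parts = []
    for i, word in enumerate(words[:-1]):
        parts.append(word + " " * (base + (1 if i < extra else 0)))
    parts.append(words[-1])
    return "".join(parts)


line = justify(["the", "quick", "brown", "fox"], 24)
```

Real runoff programs often alternate the side from which surplus spaces are distributed on successive lines, so that the extra white space does not pile up along one margin; that refinement is omitted here.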
A further refinement, possible only in typesetting, is to space out the letters within words slightly; the IBM Selectric Composer, which uses as input/output medium a souped-up version of the IBM Selectric Typewriter, is also capable of reasonable refinement in spacing (down to a third the actual width of a comma). Even so, the results are often difficult to read, because inter-word and inter-letter spacing becomes excessive.

The classic solution to this problem in commercial typesetting is to hyphenate, but almost everyone realizes that automatic hyphenation is a nasty business. The English language in particular does not conform to a reasonable set of syllabification rules: as a result, automatic hyphenation is either slow and expensive, or more prone to error than a human typesetter [1]. Still, the developers of document processors have realized the need for better justification, and several methods of hyphenation have been developed. The most obvious is to use a dictionary of legal hyphenations which, needless to say, is cumbersome and therefore slow, since the storage medium is usually a magnetic tape. It is much more economical to use an algorithm which finds the 'most likely' position for a hyphen; but this approach leads to more error than is tolerable, even in low-quality jobs like newspapers (although Mergenthaler's Linotron [15] is claimed to be 97% accurate using only an algorithm). The compromise is to use an algorithm coupled with a (short) dictionary of intractable exceptions: successive refinement of both the algorithm and the dictionary can lead to remarkable accuracy.

One can also resort to the ancient expedient of consulting an oracle: in this case, a human operator.
If hyphenation is indicated (perhaps because the word is too long to fit on the line; is at least, say, six characters long, and at least three of those characters could be left on the current line), the program types out the word, giving maximum length (e.g., RESUSCIT_ATE), and the human operator then responds by typing it up to the hyphenation point (e.g., RESUSCI). This would appear to be very wasteful both of human and machine resources, but it is exactly what the IBM Selectric Composer does, and it was not an original idea even there: a special purpose computer, the LINASEC, used a crude version of the same idea.

One other method of justification, which does not rely on hyphenation, should be mentioned. It goes against all typesetting convention, but for low and medium quality jobs it appears to work. The method, which could only be practised on fairly flexible typesetting equipment, was developed by Kunkel and Markum [30;28], and consists of varying the set size of a line: if a line is too long, the program reduces the set size by 1/2 point, and attempts to justify. If it is still too long, the program moves the last word of the line to the beginning of the next line, raises the set size of the line by 1/2 point, and justifies the line. The results are remarkably readable: the eye does not discover the maximum difference of a point between successive lines, and excessive spacing is almost eliminated (and so, incidentally, are 'ladders', the occurrence of two or more hyphens at the end of successive lines). As Kunkel says, "Set size variation is NOT letterspacing. It produces words whose horizontal letter proportions are not changed, as they are in letter spacing.
This is of benefit to the reader because it is visually smoother and because end-of-line hyphens do not interrupt his reading." This statement seems very true when one considers words like "mon-sieur", broken over two lines (or, closer to home, "coin-cidence"). Situations like this often necessitate a pause, and re-reading.

Why was it not discovered before? I suspect that it is because this method is more amenable to the high processing speeds of computers than to human typesetters. Moreover, it is really only possible on photo-typesetting equipment, where changes in set size are easier than on traditional typesetting equipment, and photo-typesetting is a relatively recent development. It may also be because printers are likely to be conservative about such changes, which do, it must be admitted, lower the quality of the printing, even if it is invisible to all but the closest scrutiny.

This appears to be a far more reasonable method of automatic justification than hyphens, and yet, in spite of publication in 1965, it does not appear to have caught on: automatic typesetters still use hyphenation as their principal justification method.

There is more to adjusting text on a page than justification, however, and four other kinds of adjustment appear fairly consistently in document processors: flush left (which is the ordinary letter or manuscript type, unjustified), also known as 'ragged right' or 'quad left'; flush right, the opposite of flush left, and of very little value except in tables and peculiar layouts; centred; and 'split', where half the line is set flush left and the rest flush right, useful in titles and tables of contents.
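The four adjustment modes listed above can be illustrated for fixed-pitch output with a short Python sketch. The function name adjust, the mode names and the right_text parameter are hypothetical; no surveyed processor is claimed to expose an interface of this shape.

```python
def adjust(text, width, mode, right_text=""):
    """Place `text` on a fixed-pitch line of `width` columns."""
    if mode == "flush_left":        # also called ragged right or quad left
        return text.ljust(width)
    if mode == "flush_right":
        return text.rjust(width)
    if mode == "centred":
        return text.center(width)
    if mode == "split":
        # `text` goes flush left and `right_text` flush right on one line.
        gap = width - len(text) - len(right_text)
        if gap < 1:
            raise ValueError("the two halves do not fit on the line")
        return text + " " * gap + right_text
    raise ValueError("unknown adjustment mode: " + mode)


contents_entry = adjust("Justification", 30, "split", "32")
```

The split mode is the one a table of contents needs: the entry is set against the left margin and the page number against the right.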
In most typewriter-oriented document processors it is also possible to have text come out 'as entered', an expedient useful for bypassing the annoying habit of document processors not to do what you want, especially in diagrams and tables.

It is possible to see trends in the development of document processors, however slight. There is a tendency for certain concepts (like macro languages, widened user control and increasingly sophisticated page layouts) to emerge, and it is possible that further refinement of these concepts will lead to better document processors, in the same way that refinement of programming language concepts like procedure, iteration and data structure has led to better programming languages. This does not imply agreement on the 'perfect' document processor (in fact, if we carry on the analogy with programming languages, it implies almost chaotic diversity of opinion), but it does imply an improvement in an increasingly important type of software.

Chapter 2

The Design of a Document Processor

"We held these truths to be self-evident..."

1. Criteria for Design

What follows are the basic criteria which have governed the design of Texture, the document processor described in chapter 3 of this thesis.
The primary concern has been to produce a design (and ultimately a program) which is understandable, not only to computer programmers (who, after years of practice, have learned to adapt to any designer's idiosyncrasies), but to anyone with a reasonable understanding of the problems that might be involved in producing finished documents and of the information needed in this process. These criteria could profitably be used to govern the design of any 'user-oriented' program, and a good number of 'programmer-oriented' ones as well.

1. The product should be usable, after a minimum of instruction, by someone innocent of computing.
2. It should give the advanced user enough power to accomplish sophisticated ends with a minimum of effort.
3. Decisions which are rightfully the user's should be left to the user.
4. (Bauer-Samelson Principle) Nobody should pay for features he does not use.
5. (Orthogonality) Concepts, once introduced, should apply throughout the design, not merely at certain points.
6. (Ockham's Razor) It is vain to multiply entities beyond need.

2. Input Text and Command Language

(a) Format of Input Text

To make the use of a document processor as natural as possible (and considering its wide range of usefulness, this means that people who know nothing about computing should have little difficulty in using it correctly, almost from the first) the entry of text to be processed should resemble, as much as it can, the process of typing the same text on an ordinary typewriter.
This means that, while there will inevitably have to be 'meaningful' symbols to instruct the computer in its task, such symbols should be as few as possible, in order that the user not be over-burdened with them. Thus the format of the input text should be completely free (as it is with most document processors). Commands to the processor are to be interspersed with ordinary text, in any way that makes the user most comfortable; it is as though one were typing remarks to a human typesetter. Commands should be clearly distinguished from text, not only for the computer's sake, but also, and primarily, for the user's, so that the source listing of his document will be clear and easy to edit.

(b) Command Language

The command language should have a simple, powerful syntax, so that the casual user will have no difficulty learning or remembering the structure of commands, and so that the programmer who wishes to express complex actions will have no difficulty in doing so. A simple syntax can be implemented neatly, and therefore economically, thus saving processing costs, at least at this stage. It has the added benefit that those who must do extensive programming to achieve their ends are protected from large expense.

Traditionally, command languages are of the canonical form:

Command-Identifier Parameter-1 ... Parameter-n

e.g. PERAMBULATE P T W

The separators between these parts are almost invariably commas or blanks.
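As a small illustration of this canonical form, the following Python sketch splits a command into its identifier and parameters, accepting either commas or blanks as separators. The function name `parse_command` is hypothetical; it is not part of Texture or Eureka, whose syntax is discussed later in this section.

```python
import re


def parse_command(source):
    """Split 'IDENT P1 ... Pn' into (identifier, [parameters]).

    Commas and blanks both count as separators, as in the canonical
    form described above. Hypothetical helper, for illustration only.
    """
    parts = [p for p in re.split(r"[,\s]+", source.strip()) if p]
    if not parts:
        raise ValueError("empty command")
    return parts[0], parts[1:]


if __name__ == "__main__":
    print(parse_command("PERAMBULATE P T W"))
    print(parse_command("PERAMBULATE P, T, W"))
```

Both calls yield the same identifier and parameter list, which is the sense in which the choice of separator is immaterial to the form.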
The reason for this form may be either that it resembles programming language function notation:

PERAMBULATE(P,T,W)

which in turn resembles mathematical notation; or that it resembles (as does mathematical notation), in an abstract way, the imperative sentence structure of those European languages which have contributed most to modern mathematics and to computing: "Perambulate Paul from the Table to the Wall" (especially if we practise the admirable habit of our ancestors, and of modern German, of capitalizing the nouns, thus raising them up out of the surrounding noise words like 'from', 'the' and 'to'). This sort of syntax is simple and, of course, applicable to any computing situation, as demonstrated by the relationship of the lambda calculus to programming languages; in particular, Lisp.

The command language should be 'extensible', which is to say, it should be possible to define new commands which look just like primitive commands, and to define these commands in terms of existing commands (primitive or user-defined).

The command language should, it stands to reason, be naturally adept at handling text, since text is the document processor's major order of business. It should also be a universal programming scheme (within the soft interpretation usually given this phrase when applied to programming languages) so that there is no unreasonable restraint on the user's power of computation. This last point may seem rather moot, but there is not, in fact, a document processor described in the literature discussed in chapter 1 whose command language is computationally complete.

The first candidate for a command language was one with the syntax and semantics of Lisp.
The syntax is certainly simple, and the brackets (in some other form, since parentheses are far too common in documents, as witness the present document) act quite naturally as escapes, both to the user's eye and to the processor's scanner, from text to command mode and back again, besides permitting syntactic nesting of commands to any level. But Lisp semantics are overly complex: the language was intended for list processing, and while users of a document processor may have use for list processing (for example, in building indexes), they should not be burdened with an entirely new concept to master. Lisp itself has never had decent string-handling facilities, and in order to introduce these, the language would probably have to be altered somewhat.

What is really needed is a good general-purpose macro processor, with a set of document processor primitives. This will give the user full computational power; it will provide what I feel to be a very natural facility for the processing of text; and it will serve, in an orthogonal fashion, as command language. For this purpose, there are apparently three exemplary languages at present: UMIST (the MTS version of TRAC), GPM and M6 [33; 32; 34; 35]. They are quite similar in syntax and superficial semantics. As currently designed, Texture uses a local version of UMIST called Eureka (which is more correctly spelled 'Eureka!', but I find the exclamation mark distracting; the name means nothing, although it may be said to be derived from UMIST by an obscure process of mental association on the part of its author, and to have been reinforced by the possibility of a questionable pun). GPM, however, has a better storage discipline (a stack) which leads to a more efficient implementation because it reduces copying.
Moreover, it is even simpler than UMIST in that it does not have the awkward distinction between active and neutral evaluation that UMIST has. A future modification in the design of Texture, in which Eureka is replaced by some version of GPM, would be minor, because the change would have almost no effect on the look of a Texture document source.

3. Page Segmentation

Most documents have a simple layout: a single block of text which, when surrounded by pleasing margins, fills an 8 1/2 x 11" page. It is nonetheless useful on occasion to be able to segment a page into subsections. In a letter, for example, there are one or two address blocks, a body, and some closing text. These form separate entities: it is much more awkward to combine them into one than it is to separate them.

As a first approximation, then, we segment a layout into any number of blocks of text. Each block is defined as being so many columns by so many rows, and being in a specific location relative to the top-left corner of the page; thus a block is effectively a template within which text may be formatted by the processor. A layout specifies an ordering of the templates. Thus blocks are filled with text, one by one, in the order in which they appear in the layout.

How much power does this give us in describing the layout of a document? Besides letters, as mentioned above, the method may be applied to multiple column layouts, defining blank space for diagrams, headlines and pictures, and fancy layouts such as found in magazines. The two-columns-in-the-top-half and three-in-the-bottom problem mentioned in chapter 1 is very simple and obvious to define. In short, one can leave the computer to do the bulk of the usual cut-and-paste work.

Is it adequate? Immediately one can think of some cases which cannot be handled.
Suppose, for example, there were a layout incorporating a photograph:

[Diagram: page layout of numbered text blocks, including a space for a photo with a CAPTION block beneath it]

Block 2 should extend as far down as the bottom line of the caption, while block 3 should begin directly below the caption. But it cannot be known accurately how many lines the caption will extend without running the entire document through the processor, and if the caption is altered, its length may change, and with it, the format of the page. It is reasonable to want to do the layout right exactly once, and thereafter not wish to be concerned with the effect which changes in the text will have on the layout.

Another example: in a two-column layout, one may want footnotes to appear in a single column across the bottom of the page:

[Diagram: two-column layout with a FOOTNOTES block across the bottom of the page]

If there are no footnotes, naturally blocks 1 and 2 should extend to the bottom of the page, whereas if there are footnotes, the footnotes block will have to be large enough to exactly accommodate all the footnotes issued on this page (and any spillage from the previous page).

What we have here, and in the previous example, is not fixed-size blocks, but blocks with flexible boundaries; and not merely flexible blocks (like the footnote block), but blocks (like 1 and 2, in the footnote example) whose flexible boundary depends on the flexible boundary of another block. Again, the user should not be expected to work this out himself, when there is a perfectly capable computer available to do it for him.

There are two approaches which come to mind:

(a) Insist that the bounds of a block must be computable at the moment when that block first becomes active (that is, when all preceding blocks are full and this one becomes the template into which subsequent text is to be set).
Thus, given sufficient power to query the condition of the layout, the user can give an arbitrarily complex expression for the boundaries of any block, and so delay decisions about the size of a block until the moment it is entered. This will certainly take care of the example with the photo caption, but it is not adequate for the footnote example. Why? Notice that the bottom boundaries of blocks 1 and 2 are not fixed until the entire page is full, whereas the scheme described here requires that they be computed as soon as the page (i.e., block 1) is begun. To overcome this limitation, one needs scheme (b):

(b) Let bounds of blocks be expressions involving the bounds of other blocks. The dependencies of blocks on other blocks are certainly fixed throughout the process (i.e., they form a tree, in the sense of graph theory; note that they must not form a general graph, because cycles, with A depending on B, which depends, through some chain, on A, are dangerous!), but the boundaries of dependent blocks remain flexible until the page is full or until all blocks on which they depend have been fixed.

This will certainly handle the footnotes example and it is more general than scheme (a), since scheme (a) is essentially scheme (b) with the restriction that dependent blocks must be preceded in the layout by the blocks they depend on, so that it will also handle the photograph caption example. However, scheme (b) involves more computation by far than scheme (a). Consider the footnotes example. Here is a layout in which the bottom lines of blocks 1 and 2 are dependent on the top line of the footnotes block.
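A minimal Python sketch of how such dependencies might be recorded and checked is given here. It is only an illustration under stated assumptions: the block names, the 60-line page depth, and the functions `has_cycle` and `column_bottom` are all hypothetical, and none of this is Texture's own notation for layouts.

```python
PAGE_ROWS = 60  # assumed page depth in lines (not a figure from the text)

# Each block lists the blocks whose bounds its own bounds depend on.
depends_on = {
    "block1": ["footnotes"],
    "block2": ["footnotes"],
    "footnotes": [],
}


def has_cycle(deps):
    """Return True if some block depends, through a chain, on itself."""
    state = {}  # block name -> "visiting" or "done"

    def visit(name):
        if state.get(name) == "done":
            return False
        if state.get(name) == "visiting":
            return True  # reached a block that is still being explored
        state[name] = "visiting"
        for other in deps.get(name, []):
            if visit(other):
                return True
        state[name] = "done"
        return False

    return any(visit(name) for name in deps)


def column_bottom(footnote_lines, page_rows=PAGE_ROWS):
    """Last row usable by blocks 1 and 2 when the footnotes block
    occupies the bottom `footnote_lines` rows of the page."""
    footnotes_top = page_rows - footnote_lines + 1
    return footnotes_top - 1


if __name__ == "__main__":
    print(has_cycle(depends_on))
    print(column_bottom(0), column_bottom(5))
```

The point of the sketch is that the bottom of each column is an expression in the footnotes block's top line, so every line added to the footnotes moves the bottoms of blocks 1 and 2.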
While linear text is going into block 1, footnotes may be issued and will go into the footnotes block without any bother. Eventually block 1 fills up, and linear text starts to go into block 2; now a footnote is issued. It must go into the footnotes block so long as the page is not full, but if another line goes into the footnotes block, block 1, which is dependent on the footnotes block, will shrink by one line. This requires that all text that has been issued since that line must be withdrawn from the page makeup and reprocessed. Why so drastic a solution? Because the text must be rearranged, and the values of any environment enquiries made by the user (in creating an index, for example) must be reprocessed correctly. Even this may not be correct, because (although unlikely) a 'local' footnote may have been put at the bottom of block 1, and it is from this footnote that we are peeling a line. Thus all text up to the point where the footnote was issued must be reprocessed. However, the footnote may have been issued on the condition that there were at least six lines remaining in the block, and this may no longer be true. To be entirely safe, therefore, the text of the whole page will have to be reprocessed (and the conditions, both of the Eureka string-space and global settings which control the layout, must be reset to their values at the top of the page).

This is intolerable. If a five-line footnote is issued in block 2, then most of the page must be reprocessed five times. And yet short-cuts will only lead to paradox: having given the user full computing power and arbitrary inter-dependence of blocks within layouts, there is safety only in the complete reprocessing of the text involved.

Is scheme (b) adequate?
Consider the following case: one has a document (a users' manual, for instance) in which descriptive text is to be set in double columns, and illustrative examples are to be set in single columns the full width of the page. What is wanted, essentially, is the power to switch arbitrarily from double to single column mode; or, in terms of the document processor being described here, to make up the layout as one goes along. It is not known, at the start of the page, how many blocks are going to be in that page when it is done. This is something which TEXT360 is able to handle, although the photo caption example is beyond TEXT360's scope. All three examples can be handled with HCS or CypherText, but only after what I suspect are herculean efforts on the part of the user.

We could propose scheme (c):

(c) Text is justified to a given column width, and saved in effectively infinite blocks, which the user may tie off at any time. When the page is complete, the user's program instructs the document processor where these blocks, or contiguous subsections of these blocks, are to be placed on the page.

This, it can be shown without too much effort on the part of either the reader or the author, is adequate for any layout processing one can conceivably want to do. But then so is a rudimentary runoff program, combined with scissors and a pot of good paste.

Thus, scheme (c) is powerful but too primitive and scheme (b) is costly and not quite powerful enough. There is the added danger, with scheme (b), that the processor somehow runs away from the user; that it is no longer comprehensible just what will happen for a given layout definition in a given situation.

Could schemes (a) and (c) be combined, perhaps?
One can imagine, for instance, that along with regular blocks, which form the (possibly empty) layout at the beginning of the page, there might also be 'incidental' blocks, which may be started at any time, halted at any time, and directed to their proper places on the page at any time. This is the electronic equivalent of scissors-and-paste, and one can only hope that the average user need never concern himself with it.

The danger, however, is not that the user will have too little power of expression (the user will always have too little power of expression; what the user really wants is a secretary who can read both his writing and his mind) but that the design will be overloaded with such 'features' as this. Having taken the step in design of deciding to give users a system which, as much as possible, removes the need for excessive effort in producing a finished document, and so takes some of the drudgery out of the process, it seems contradictory to introduce a concept which has exactly the opposite effect. Very little is gained by introducing scheme (c) and much, in the way of elegance and simplicity, is lost. Thus, admitting to a possible limitation on the document processor's power, I choose to leave the design of layouts at scheme (a).

There is one tempting possibility (and here I come scrabbling back from my position of righteous purity), which is perfectly feasible, will help to accommodate the examples which caused me to postulate schemes (b) and (c), and will not burden the user with difficult concepts. That possibility is to adapt the TEXT360 notion of multiple columns to the Texture block structure, and to do this in as general a fashion as possible. It can be made possible to divide a block vertically into n columns of equal width and spacing, at any point in the text.
If there are k lines of maximum width m available in the block, there are now kn lines of width (m - (n-1)s) ÷ n, where s is the space between columns. This 'columnizing' affects only the text between one such multiple-columns command and another, and goes from block to block until it is terminated. The text can still be processed on the fly, thus saving the expense of backup and reprocessing, and the layout examples which precipitated schemes (b) and (c) can be handled quite adequately.

4. Non-linear Text

How do footnotes, keeps, figures and recurring text (such as page numbers, titles and footers) fit into the layout definition scheme described above? And what is the processor (or the user) concerned with in each of them?

(a) Footnotes

If layouts consisted of a single column it would be reasonable to say that footnotes are put at the bottom of the column, and continue to be put there until the page is full. The remainder of the footnote, the part which could not be set in the current column, is put at the bottoms of subsequent pages. This follows typesetting conventions quite closely.

With multiple columns one could adopt one of three strategies: the footnote is put at the bottom of the column in which it is issued and overflows onto subsequent columns (which may be on subsequent pages); the footnote is put at the bottom of the rightmost column, moves up until it hits the top of the column, or the text, and then overflows onto the next page; or the footnote is put at the bottom of the rightmost column, moves up through preceding columns until it hits the text and then overflows onto the next page. Of these, the first is the most reasonable in common usage. It is also the most common approach of typesetters when dealing with multiple column material. The second may still be reasonable, but is likely to lead to confusion on the part of a reader.
The third is almost bound to be confusing, especially in a layout of more than two columns.

What about the document processor? Do we make an arbitrary choice, and apologize to any user who is inconvenienced, because he wanted another method? One could implement all three and leave the choice entirely free, but this almost certainly violates Ockham's Razor. At the same time, it must be recognized that the program is incapable of making the correct choice for all situations, especially because we are not merely dealing with multiple columns, but with a completely general layout consisting of arbitrarily situated and related text blocks.

The solution is a compromise: recognizing that the third choice is practically useless, we combine the first and second into a single facility which defaults, in the absence of other instruction, to the first. For each footnote, the user is able to specify the block it is to go into. The footnote is put at the bottom of that block, and continues upward until it hits the text or the limits of the block. After that, it overflows into subsequent blocks. In the absence of a specification, the current block is assumed.

(b) Keeps

Keeps, it should be recognized, are a form of linear text, because they preserve the order of the text as it appears in the document source. They are discussed here because they embody some of the same concepts as other material in this section.

It seems reasonable that keeps should come in three varieties: line-keeps, block-keeps, and page-keeps. In an X-keep, the processor attempts to keep the text enclosed by the keep within the object X (line, block or page) that is currently being processed. If this is not possible, the processor closes off the current X, and releases the keep-text as ordinary text, to be put into the next X.
This obviously has the unpleasant effect (never to be tolerated in high-quality typesetting) of leaving a gap behind whenever the text does not fit; at least, in the case of a fixed-boundary block.

(c) Figures

Frequently there is text which need not appear in-line with the main text of a document, but must be kept together as a unit. It is especially useful to be able to prevent diagrams from being split across block boundaries, and normally it is unimportant that such a figure appear immediately in-line with the text that references it.

For this reason, the document processor defines figures. Again, there are three varieties, for the sake of orthogonality, although the most useful appears to be the block-figure. If the text for a block-figure (page-figure, line-figure) does not fit into the current block (page, line), then it is held back, and for each subsequent block (page, line) an attempt is made to fit the figure. Normally, an attempt is made as soon as the block (page, line) is entered, but it is conceivable that the user will want to specify how many lines (blocks, columns) of linear text must be set before the attempt is made. While the figure remains unset, other text is set in-line. This removes the problem of the gap left by keeps.

(d) Recurring Text

As outlined in chapter 1, the usual approach to page numbers and titles of various sorts is to treat them each as a special problem, in its own niche on the page. However, as the title of this section suggests, they have an aspect in common which makes them amenable to being handled as essentially one problem: they recur, from page to page, in much the same form. Naturally, page numbers change with each page, but their position in the layout does not change.
Titles may remain constant for many pages, and even when they are changed, their position remains the same.

With the exception of typesetting systems like HCS and CypherText, where the user must determine, and program, many things for himself, document processors generally reserve space on the page for some title material and page numbers. If the designer of your local document processor has seen fit to include sub-titling facilities, then you can use them whenever you need them, but if he has not, you must do without.

Naturally it is unfair to subject users to the whims of a designer, or to one set of conventions: one user may want several levels of titles, subtitles and footers (though heaven knows why), while another user may be satisfied with little in the way of titles, but absolutely must satisfy his sense of symmetry by having page-numbers come out centred on the bottom line of the layout. The problem is made even more acute, in the document processor being designed here, by the complex possibilities inherent in layouts. Users may well want two page-numbers to appear in one layout, if the material is to be produced two-up, and later to be reduced photographically. Again, we cannot automatically decide what the user wants.

The solution? Each block has associated with it a string of text, which is released into the stream each time the block is entered. The text, of course, may embody any amount of Eureka program, and so this facility may be used for any purpose which requires text to be repeated. Its most immediate use, naturally, is in the area of titles and page numbers.
Several problems are solved at once, and neatly: positioning the title or page number becomes exactly like positioning other text, since it is done by means of blocks; titling and numbering blocks is no different from titling and numbering pages; with little effort, the user may design as complex a titling and subtitling structure as he needs.

(e) Environments

What happens if there are indents in effect, and a footnote is issued? The footnote will probably appear at the bottom of the block in which it was issued: should the footnote have the same indents as the text which references it? The answer is no, not in general, but if the footnote is justified and set in the same environment as the text in which it was issued, it will have the same indents.

Another problem: in a certain document, there is a two-line title which appears at the top of every page. Normally, the text of the document is double-spaced, but at the bottom of page 14, there is a quotation from Marshall McLuhan, and to emphasise the density of his prose, the author of the document decides to space the quote singly. Naturally, the quotation runs until well past the middle of page 15. Should the title at the top of page 15 be single spaced? Again, our reaction is that the title has nothing to do with the text appearing below it, but should look the same on page 15 as it does on pages 14 and 16.

Other, equally common examples could be cited, but the conclusion is already obvious: non-linear text exists in its own environment. It cannot be expected to be affected by the text which surrounds it.
This leads to an obvious design decision: each piece of non-linear text (footnotes, figures and recurring text; once more it must be emphasised that keeps are linear text with a slight peculiarity) exists in its own environment; decisions made outside this environment should not affect the text inside it, and no decision made inside the environment should carry outside.

Environments are not entirely closed rooms. Macros are not a part of them, and so carry across the boundaries. Indents, tab-stops, line spacing, justification modes, etc., are part of the environment. Environments are not nested, and must not be confused with the programming language concept of blocks: they are activated and suspended in a quite sequential manner, at the occurrence of certain events.

By the same token, it seems reasonable that there should be facilities for the user to save, activate and restore environments so that, for instance, a single environment can be established for footnotes at the beginning of the document, and never be worried about again, or that specific environments can be associated with specific blocks.

5. Justification

As mentioned in chapter 1, there appear to be five 'modes' of adjustment which can be applied to a line: ragged right margins, ragged left margins, both margins flush, centred (both margins ragged), and split. Of these, the fifth is an anomaly: while the other four may continue from line to line until turned off, the 'split' adjustment (most often used in tables of contents, titles and numbered formulae) is built into the text; that is, the author indicates where a split is to occur, whenever there is a split.
The text from the split to the end of the line is set with a flush right margin; the text from the beginning of the line to the split is set with a flush left margin.

Tabbing upsets the justification process, in that it is no longer possible to justify the whole line when a tab has occurred in it, without upsetting the effect of the tab. In order to combine the two concepts in as general a fashion as possible, we introduce the concept of a segment. A segment contains the text from the beginning of the line or from a tab point to the end of the line or to the next tab point. Within a segment any one of the four regular justification modes may apply.

If a segment is terminated by a tab, it seems reasonable to adjust it ragged right (this is what is normally done in other document processors); the segment terminated by end of line is, of course, adjusted in the prevailing justification mode. This is what is done in the absence of other instructions, but it should be possible for the user to insist on a given justification mode being used in any segment. Normally, the user will not have to concern himself with, or even be aware of the existence of segments, but he may find them convenient at some point, when his documentation becomes sufficiently complex.

It can now be seen that 'split' adjustment is not a primitive. The line is simply broken up into two segments: the part before the split, and the rest; the first segment is set ragged right, the second is set ragged left.

It is fairly obvious how lines are adjusted ragged left or right, or centred. The words are spaced by the minimum word spacing (normally, for typewriter oriented output, a space; but the user, naturally, has the ability to adjust this value).
The remaining spaces are then put after the last word (for ragged right margins), before the first word (ragged left), or divided evenly between the front and back (for centred).

Flush left and right margins naturally present a greater problem. The basic algorithm (to which, much like the recipe for a muffin, enhancements may be added to taste) is quite simple:

- if there are n words in the segment, there are n-1 spaces; suppose there are k characters in total among the words in the segment, and the segment is to be set in a space s characters wide
- this means that between any two words we can put (s-k) ÷ (n-1) spaces (where ÷ is integer division)
- that leaves (s-k) mod (n-1) spaces (we shall call this value r) which are apportioned among the existing inter-word spaces (to prevent 'rivers' of blank space, they are normally put into the leftmost r inter-word spaces on one line, and into the rightmost r inter-word spaces on the next)

As for enhancement: it is possible, if the value (s-k) ÷ (n-1) exceeds some maximum word spacing value, to attempt hyphenation. If hyphenation fails, rather than putting an unsightly spacing on the page, one can simply decide to set the line ragged right. For a similar algorithm concerned with the finer spacing possibilities of phototypesetting equipment, see Kunkel and Marcum [28].

6. User-interaction with the Formatting Process

(a) Querying the Process

If the user is to be kept on an equal level with the document processor, and not merely be its keeper, feeding it with the appropriate tidbits to get back the required stunts, there must be a relationship of mutual trust. As it stands, in most instances, the user entrusts his text to the document processor, and is told nothing about the process until his document, in one form or another, comes out the other end.
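The basic flush-margin algorithm of section 5 (Justification) can be sketched in a few lines of Python. This is a minimal illustration for typewriter-style output, not Texture's implementation: the function name `justify` and its parameters `extra_on_left` and `max_gap` are assumptions introduced here, and hyphenation is not attempted, so an over-wide spacing falls straight through to ragged right.

```python
def justify(words, width, extra_on_left=True, max_gap=None):
    """Set `words` flush left and right in `width` character positions.

    n words leave n-1 gaps; with k characters of text, each gap gets
    (width - k) // (n - 1) spaces and the remaining r spaces go to the
    leftmost or the rightmost r gaps. Hypothetical sketch only.
    """
    n = len(words)
    k = sum(len(w) for w in words)
    if n < 2 or k + (n - 1) > width:
        # Nothing to distribute, or the words do not fit even with
        # single spaces: return the line ragged right.
        return " ".join(words)
    base, r = divmod(width - k, n - 1)
    if max_gap is not None and base > max_gap:
        # Spacing would be unsightly; set the line ragged right.
        return " ".join(words)
    gaps = [base] * (n - 1)
    targets = range(r) if extra_on_left else range(n - 1 - r, n - 1)
    for i in targets:
        gaps[i] += 1
    line = words[0]
    for gap, word in zip(gaps, words[1:]):
        line += " " * gap + word
    return line


if __name__ == "__main__":
    lines = [["the", "quick", "brown", "fox"], ["jumps", "over", "a", "dog"]]
    left = True
    for words in lines:
        print("|" + justify(words, 24, extra_on_left=left) + "|")
        left = not left  # alternate sides to avoid 'rivers'
```

Alternating `extra_on_left` from one line to the next corresponds to placing the r leftover spaces in the leftmost gaps on one line and the rightmost gaps on the next.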
This is no way to run a document processor. Whatever information the processor has available to it should be available to the user, in a convenient form, at any stage in the process. By the same token the user should, within reason, be able to forbid or command any aspect of the process. Information is useless without control; control is meaningless without correct information.

Besides being a reasonable design decision, this attitude relieves the implementor of having to add primitives which could as conveniently, and often more conveniently, be programmed in the command language (perhaps as a library of useful functions). It also relieves the designer of having to cover every case with primitives; he need only design primitives which can be used to build more complex functions.

A good, simple example of this might be the other kind of keep mentioned in Chapter 1, which jumps to the next unit (block or page) when there are fewer than n lines in the current unit. Given the ability to ask how many lines remain in the current unit, and the power to demand a skip to the next unit, combined with some form of conditional macro evaluation (implicit in Eureka), the user can program this kind of keep for himself, should he want it. If he does not, its definition, as primitive or otherwise, will at least not be cluttering up the work space.

(b) Events

In events we see an old computing concept painted in bright new colours. When asynchronous processes are controlled by a computer, it is usually by means of interrupts. When one enables an interrupt, one is generally interested in knowing that a given event occurred, at the moment it occurs. In processing a document there are also events which, depending on various complex factors, occur at odd places in the text.
Events such as the completion of a line, of a block or of a page are of importance to the user, but he cannot decide when they will occur without doing the bulk of the document processing by hand, a tedious procedure. Thus the document processor should define a number of these events, and allow the user to define a string of text, for each such event, which is to be inserted in the stream whenever the event occurs.

Similarly, the user might be given facilities for defining his own events. He may wish to know whenever a full stop (.) appears in the text, followed by a blank. When this happens, he wants to issue an extra blank (sentences are normally spaced wider than words) and capitalize the next letter (that is, the first letter of the first word in a new sentence). It may be argued that this can be done with any reasonable file editor, but the text in question may have been generated by a macro and therefore could not have been detected by a file editor; in any case, it makes the source text much more readable if such clutter is left out. Note that such events are not macros; the period and blank are not replaced by the event text, the event text is inserted immediately behind them.

2. Defaulting Gracefully

In consideration of design criterion 1, that the document processor should be almost immediately usable by someone who has had little or no contact with computing, it seems worthwhile to consider default conditions. In particular, it is important that, without any action on the user's part, the document processor acts in a completely expected and unsurprising manner. Should the user wish to override the defaults, he will have every opportunity to do so, but it is not necessary for him to know any more than how to run the document processor, and what a document processor does.
Thus, for example, the initial layout should be one which will produce a normal-looking typed page, the way a typist would, without being instructed. Preferably the pages should be numbered at the top right corner as well. With these conditions, a large majority of all documents could be produced with no more knowledge than how to start a new paragraph and how to cause underlining. And as for paragraph indentation, this should correspond to the standard indentation for such documents: 5 columns.

Defaults, on all aspects of the document processor, should be such that the user can ignore much of the user manual, until his need for more sophisticated document-manipulation techniques becomes great enough to send him back to the manual (or, as is often the case, the local 'expert').

3. Source Listings

Too little attention is paid, in most cases, to the importance of a useful source listing. Very few documents have ever been run through a document processor successfully the first time. One of the problems has always been that it is difficult to correct a formatted document, given that the corrections must be applied to an unformatted source. TEXT360 and MTS FORMAT are perhaps the most helpful processors in this respect: they put references to source file line-numbers in the margins of the document, and page numbers, referring to the document, in the margins of the source listing.

Errors, which should be few, especially if the processor has a 'natural' feel to it and defaults reasonably, none the less occur. These should stand out in the source listing like coal in a snow-drift for easy identification, and should clearly indicate the location of the error in the source.
With a command language like Eureka, the source listing also becomes the source listing for a programming language, and this adds further responsibilities for easing the user's job of correcting. Fortunately, Eureka has a simple syntax: it seems sufficient to take the example of Lisp-listing programs, and provide a nesting count. Thus, if the nesting is incorrect, it will be apparent from the listing.

The notion of 'events', which has been introduced into this document processor, adds the possibility of text being invisibly inserted into the stream and then causing unexpected, and in fact incorrect, actions. The 'invisibility' of events must be removed, and this can be done by having all events clearly indicated in the source listing, along with the effect of the event (that is, the text which was inserted). As it is with errors, this sort of information should be separated from the actual source text in as visible a manner as possible: it is the source text which must be edited, and it must be easy for the user to see the difference between that and any helpful information the processor might choose to add.

Finally, while errors and events might be distinguished from the source listing in a similar manner (boxed, for instance, or in a different font or ribbon colour, facilities permitting), they should also be distinguished from one another as much as possible.
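As a closing illustration for this chapter, the basic flush-margin algorithm given earlier (n words, k characters, a field s characters wide) can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the function name `justify` and the flag `extra_on_left` are hypothetical, the minimum word spacing is taken to be one blank, and the hyphenation and ragged-right fallbacks are left out.

```python
def justify(words, width, extra_on_left=True):
    """Set `words` flush left and right in a field `width` characters wide.

    Hypothetical helper: every gap gets (s-k) div (n-1) blanks and the
    r = (s-k) mod (n-1) leftover blanks go to the leftmost or rightmost gaps.
    """
    n = len(words)
    if n < 2:
        return "".join(words)            # a single word has no gaps to stretch
    k = sum(len(w) for w in words)       # characters in the words themselves
    if width - k < n - 1:
        raise ValueError("segment too narrow for these words")
    base, r = divmod(width - k, n - 1)   # blanks per gap, and leftover blanks
    gaps = [base] * (n - 1)
    # The r leftover blanks go into the leftmost or the rightmost r gaps.
    chosen = range(r) if extra_on_left else range(n - 1 - r, n - 1)
    for i in chosen:
        gaps[i] += 1
    pieces = [words[0]]
    for gap, word in zip(gaps, words[1:]):
        pieces.append(" " * gap + word)
    return "".join(pieces)


if __name__ == "__main__":
    line = "the quick brown fox".split()
    print(repr(justify(line, 24, extra_on_left=True)))
    print(repr(justify(line, 24, extra_on_left=False)))
```

Alternating `extra_on_left` from one line to the next corresponds to the device described above for avoiding 'rivers' of blank space.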
Chapter 3

Texture: A User's Manual

"It is shaped, sir, like itself, and it is as broad as it hath breadth; it is just so high as it is, and moves with its own organs; it lives by that which nourisheth it, and the elements once out of it, it transmigrates."

This chapter is intended to describe Texture thoroughly, both to the prospective user, and to the reader of this thesis. It will, therefore, form the basis for a user's manual (when combined with a description of how to use Texture on any given operating system), and it will, at the same time, give the specifications for an implementation.

The input to Texture is a stream of text. This text may be broken across the 'boundaries' usually imposed on such a stream by the harsh practicalities of the real world: such boundaries as the end of a card, a record in a line-file or on a tape. To the document processor, however, all this looks like one contiguous stream of text.

The text will probably include the text of the document which Texture eventually outputs; but mixed with this prose there will in general be remarks to the processor in the form of Eureka calls. Eureka, mentioned in Chapter 2, is a language for the definition and expansion of macros. It defines primitive functions for these purposes, along with some conditional evaluation, input/output and string manipulation primitives. A Eureka program is a Eureka call, which is generally enclosed in the delimiters < and >.

The input to the document processor is therefore first examined by Eureka, which either ignores the text (if it is not inside the Eureka call delimiters) or evaluates the text if it is a call. The value of a call is either scanned anew or, if it was a neutral call (one preceded by the symbol ':'), it is ignored.
When Eureka "ignores" text, it passes the text along to Texture proper (the document processor), which outputs the text in a finished form. The form of the output text is controlled by the calling of primitives (described in the user's manual, below) which affect the processing of subsequent text as soon as they have been evaluated.

To the user, then, it appears as though Texture is actually a set of document processing primitives defined in Eureka, and that the Eureka output routine is one which formats the processed text. This chapter will describe the primitives (those which are proper to Eureka and those which define Texture) as though this were, in fact, the case; that this impression is somewhat simplified will become apparent when the complexities of implementation are discussed in Chapter 4.

The chapter is organized into four parts. The first part, entitled "Using Texture immediately," is concerned with the basics, including a few useful commands. The second part, "Some more Texture," gives the remaining commands likely to be needed by the average user. The third part is "Eureka!" which introduces the user to writing programs in Eureka. The fourth part is entitled "The rest of Texture," and gives all the other commands and advanced concepts which remain.

1. Using Texture immediately

If one knew nothing else about Texture than how to run it, it would be possible to feed some lines of raw text to Texture, and to get the text printed out again, neatly formatted for a typewriter-size page. This is seldom adequate, however. In any document, it is necessary to be able to produce paragraphs, underline words and tab to desired columns. These are accomplished by means of "commands" to the processor, and it is perhaps most convenient to think of them as "asides" or "proofreader's marks."
Like proper asides, they are enclosed in parentheses, in this case angle brackets (< and >). This naturally precludes the use of angle brackets for any other purposes, such as mathematical notation, since the processor always thinks of them as command delimiters, and the user should be aware of this. In the next section it will be explained how angle brackets can be produced as ordinary text.

Commands may be inserted anywhere in the text: whatever their effect is, it will be felt at the point at which they occur. Whenever possible, Texture acts the way a typewriter would, and commands are treated as though they were special buttons on a typewriter. The most important commands, and their effects on the text, are the following:

<P>    New paragraph

On an electric typewriter, this would be the same as hitting the carriage return button, and then spacing over five places. All the paragraphs in this document were started with a <P> before the first word in the paragraph.

<L>    New line

On an electric typewriter, this would be the same as hitting the carriage return button. Instead of <L><L><L>, to give the effect of hitting 'carriage return' three times, it is possible to say <L,3>.

<U> and <NOTU>    Underlining

Every word between a <U> and the next <NOTU> will be underlined. The blanks separating these words are not underlined. Letters, digits and hyphens within words are underlined, but punctuation is not. In section 4 the description of the pair of functions UND and NOTUND will explain how the set of characters which is underlined may be changed.

2. Some more Texture

Tabs

As anyone who has ever used a typewriter will know, it is convenient to be able to jump, at the touch of a single button, to a predetermined column. Almost all typewriters have a facility for setting 'stops' at such columns, and so has Texture.
Following the usual terminology, this facility is called Tabbing.

<TABSET,n>    Setting tab-stops

The value n is a number corresponding to a column. A tab-stop is set at that column. Thus, <TABSET,26> will set a tab-stop at column 26. These locations are always relative to the left-hand printing edge.

<TABCLEAR,n>    Clearing tab-stops

This has the opposite effect of TABSET. The tab-stop at column n is removed. <TABCLEAR>, without a value n, causes all tab-stops to be removed.

<T>    Tabbing

This will cause a tab to the next tab-stop (set by TABSET). If no tabs are set, or one wishes to tab to a specific column, it is possible to say <T,n>, which will cause a tab to column n.

<LINESPACING,n>    Setting the line spacing

On typewriters there is usually a switch which controls what is known as 'single', 'double' or 'triple' spacing. This means that a carriage return will result in the paper being advanced one, two or three lines, respectively. In Texture, this command controls the number of blank lines between any two text lines. Thus single spacing is set by <LINESPACING,0>, double spacing by <LINESPACING,1> and so on, for any positive value.

<LI,n> and <RI,n>    Indentation

On typewriters there are usually a pair of 'margin' buttons which set the left and right margins. In Texture, it is these commands which act as those buttons. <LI,n> resets the left margin to column n, and <RI,n> resets the right margin n columns to the left of the right printing edge.

It is worth knowing that they act much the same way as the equivalent buttons on a typewriter. If you have already typed a line, and then reset the left margin, the effect of your reset will not be seen until the next line. Similarly if you are past column n, and reset the right margin to column n, the effect will not be seen until the next line.
Indentation of the left margin has no effect on tab columns. <T,n> will have the same effect after a <LI,m> as before (for n>m). The expected occurs for a <P> command, if we think of paragraphing as a new-line followed by spacing over some number of columns; that is, the paragraph does not begin at a given column, it is over some number of columns from the left margin.

Adjusting text within a line

Texture normally adjusts all text so that both margins are flush (like this paragraph). If a line is suddenly terminated, by the equivalent of a carriage return (<L> for example, or <P>), the line will only have the left-hand margin flush. Sometimes, however, it is desirable to centre some text within a line (as is, for instance, the title of this chapter), or to have several lines coming out with a ragged right margin, as though they were typed on an ordinary typewriter. For this, there are other modes of adjustment.

<JUSTIFIED>

This is the normal mode of adjustment. It causes both margins to be flush.

<RAGRIGHT>

This is the sort of adjustment you get when you type something. Only the left-hand margin is set flush; the right-hand margin is set exactly where it was when the line filled up, and the processor had to go to the next line. This paragraph is an example of ragged right text.

<RAGLEFT>

This is the opposite of RAGRIGHT, and causes text to be set with a ragged left, and a flush right margin. There is little use for such an adjustment mode, except in paragraphs like this, and in setting concrete poetry.

<CENTRED>

This mode causes each line to be centred as much as possible. An almost equal number of blanks is put on each side of the line to centre it. A good use of this mode is in titles, and the title of this chapter is, indeed, centred.
A poor use is in normal documents, unless one is demonstrating centred text, as in this paragraph.

<SPLIT>

This is not so much an adjustment mode as a command which causes a line to be split into two parts, one set flush against the left-hand margin, the other flush against the right. It is useful in tables of contents, and for other similar effects. For example, the line "<L> Chapter 3 ..... <SPLIT> 42 <L>" would come out

Chapter 3 .....                                                  42

<CHAR,x>    Special characters

Some line printer print-chains have a number of special characters on them which are very useful for producing handsome documents. These include braces ({ }), square brackets ([ ]), superscript numbers (¹²³⁴⁵⁶⁷⁸⁹⁰) and other graphic characters. Since these characters do not have an equivalent on most data entry equipment (terminals and, heaven forbid, keypunches), it is difficult to get at them without a facility like CHAR. The value x is the decimal equivalent of the bit-sequence of the particular character. Thus, |, which is not included on some teletypes, has the bit-representation 01001111 in EBCDIC, and is therefore <CHAR,79>.

Having to remember numbers is almost as annoying as not having the characters available in the first place, and it will be explained in the next section how one can define mnemonics for one's favorite special characters. This makes it possible to keep this set of definitions in a file, or, on primitive operating systems, as a deck of cards; this file can then be appended to the front of the document source and the characters need never be worried about again.
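The arithmetic behind the CHAR value is just a base-2 to base-10 conversion, which the following Python sketch makes explicit. The helper names `char_code`, `char_call` and `ebcdic_glyph` are hypothetical, and the use of Python's `cp037` codec as a stand-in for "EBCDIC" is an assumption: the text does not say which EBCDIC code page the installation used.

```python
def char_code(bits):
    """Return the decimal value x for <CHAR,x>, given a bit string such as '01001111'."""
    return int(bits, 2)


def char_call(bits):
    """Build the text of the corresponding CHAR command (hypothetical helper)."""
    return f"<CHAR,{char_code(bits)}>"


def ebcdic_glyph(code):
    """Look the code up in Python's cp037 table (assumed code page, for cross-checking only)."""
    return bytes([code]).decode("cp037")


if __name__ == "__main__":
    bits = "01001111"
    print(char_code(bits))                 # the decimal equivalent of the bit-sequence
    print(char_call(bits))                 # the command one would type
    print(ebcdic_glyph(char_code(bits)))   # the glyph under the assumed code page
```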
When this chapter is converted into a user's manual, it should be accompanied by a list of such characters, and the decimal equivalents in the particular operating system.

Case shifts

Just as not every input device has a full character set, so not every input device has lower case characters. It would be very tiresome if we had to do all our lower-case letters with CHAR, and in recognition of this, Texture provides several source-editing commands.

<DOWN> and <NOTDOWN>

After a DOWN, every text character from "A" through "Z" is automatically turned into its lower-case equivalent, until a NOTDOWN is encountered. Eureka program text is not affected by this translation.

<UP> and <NOTUP>

After an UP, every text character from "a" through "z" is automatically turned into its upper-case equivalent, until a NOTUP is encountered. Eureka program text is not affected by this translation.

Some special operators

For convenience, Texture defines some single-character operators which are not treated as text, but have an effect on the next character which follows them.

1. "_" places an underscore under the next character. Thus "ba_th" becomes "bath" on output, with the "t" underlined.

2. "@" causes the next character to be shifted up. Thus, "@mc@phee" becomes "McPhee". This can be very useful in DOWN mode.

3. "¢" causes the next character to be shifted down. Thus, "¢E.¢E.¢Cummings" becomes "e.e.cummings". This is very useful in UP mode, to avoid having to shift down for a single character.

4. "*" causes the next character to be treated literally as text. Thus, "15*¢EACH" becomes "15¢EACH", not "15eACH" as it would have been without the star.
Similarly, "*<P>" does not cause a new paragraph, but simply causes the text "<P>" to come out (which is the only way this chapter could have been written).

5. "/" causes the next character to be overprinted on the previous. Hence, "a/"" prints a double quote over the "a", and ":/<CHAR,191>" prints special character 191 over the colon.

6. "¬" is replaced by a blank on output. The difference between "a¬b" and "a b" is that "a b" is two words, and may be broken across lines or separated by extra spaces after justification, while "a¬b" is a single word which happens to have a blank in it. The character which replaces the ¬ is or is not underlined in underlining mode, depending on whether the character would normally be underlined. Thus a blank, which is the default character, would not be underlined.

These operators always act upon the next text character. Thus two or more operators might appear in a row and so affect the same text character (e.g., "_@a" results in an underlined "A").

<SET,c,op>    Setting special characters

It is not necessary to consider oneself stuck with the six special characters given above. Sometimes these particular characters are inconvenient, and we'd prefer another set. In the command, "c" becomes the special character denoted by "op", where "op" means the following:

<SET,c,TEXT>    c is henceforth treated as ordinary text. This can be convenient if one is not using one of the special characters, and wants to be free from the bother of having to put a star in front of any use of c in text.

<SET,c,BREAK>    c is henceforth treated, the way a blank is now, as a break between words. Hereafter, "acb" is treated as the two words "a" and "b".

<SET,c,NTB>    c is henceforth treated as a non-trivial blank (the way ¬ is by default). That is, from now on, c is replaced by a blank on output (described above).
<SET,c,OPC>    c is henceforth treated as an overprint operator, the way / is by default.

<SET,c,LIT>    c is henceforth treated as a literal-next operator, the way * is by default.

<SET,c,DOWN>    c is henceforth treated as a down-shift operator, the way ¢ is by default.

<SET,c,UP>    c is henceforth treated as an up-shift operator, the way @ is by default.

<SET,c,UNDER>    c is henceforth treated as an underscore operator, the way _ is by default.

<LIST> and <NOTLIST>    Turning the source listing on and off

The action of these commands is obvious: LIST causes the source to be listed (as it is by default). NOTLIST will turn off the listing, until the next LIST command. This listing is normally put out at the end of the document, but it may be directed to any file or device accessible for printing by means of the command <LIST,name>, where 'name' is the name of the file or device. All source listing after such a directive will go to the given location.

The exact details of a source listing will vary from installation to installation: they always do. But certain general features will be relatively constant. Each line of source text is echoed to the source-listing file as soon as it is read in, prefaced by the following information: line-number (for editing purposes); Eureka nesting at the time the line was read in (normally zero; useful for those who write quite complex Eureka programs); and page number at the time the line was read in (for cross-referencing between source and document).

When an error occurs, the line, up to the point of error, is echoed again and the error message is printed below it. This will occur inside some sort of distinctive box or, if facilities permit, in a different font or colour from the source listing.
When an 'event' (see section 4) occurs, and causes some text to be inserted, a similar procedure takes place: the text up to the point of the event is echoed and the event text is printed below it. Again the event should be clearly distinguished from the source listing, and in fact, where possible, be clearly distinguished from errors.

<PN,n>    Setting the page number

Texture will automatically number pages sequentially, starting with page 1. Sometimes, when a document is being assembled chapter by chapter, it is convenient to start off with a number other than 1, and for this reason, <PN,n> sets the page number to n. If one wants to know the current page number for some reason (and there are good reasons, such as setting up a backward reference), the command <PN> is always replaced by the value of the current page number. Thus the text "This is page <PN>" becomes "This is page 78" when it is printed.

Headers and footers

It is convenient to have some facility for putting headers at the tops of pages and footers at the bottoms, so that a person flipping through a document can find his place quickly, by looking for the appropriate identifying text. For the user's convenience, Texture defines, by default, a header at the top of each page, and a footer at the bottom. The header is divided into left and right parts, each of which may be set individually. The right header is, by default, the page number, but it may of course be reset. This should be sufficient for most uses, but if it is not, section 4 will explain how to define one's own headers.

<LTITLE,text>    Setting the left header

The left header is set to 'text'. On each subsequent page, the text 'text' is put at the left top of the page.
If 'text' is absent, as in <LTITLE>, the text that would appear at the top of the current page also appears in the stream at this point (that is, the value of <LTITLE> is the left header).

<RTITLE,text>    Setting the right header

The right header is set to 'text'. On each subsequent page, the text 'text' is put at the right top of the page. If 'text' is absent, as in <RTITLE>, the text that would appear at the top of the current page also appears in the stream at this point (that is, the value of <RTITLE> is the right header). The value of <RTITLE> defaults to "<PN>", with the result that the page number comes out at the top right of each page.

<TITLE,text>    Setting the entire header

The header is set to 'text'. On each subsequent page, the text 'text' is put at the top of the page. If 'text' is absent, as in <TITLE>, the text that would appear at the top of the current page also appears in the stream at this point (that is, the value of <TITLE> is the page header). Initially, the page header is given the value "<LTITLE><SPLIT><RTITLE>", which is what causes the left header and right header to appear at opposite ends of the same line.

<FOOTER,text>    Setting the footer

The footer is set to 'text'. On each subsequent page, the text 'text' is put at the bottom of the page. If 'text' is absent, as in <FOOTER>, the text that would appear at the bottom of the current page also appears in the stream at this point (that is, the value of <FOOTER> is the current footer).

<PAGE>    Skipping to the next page

On a typewriter, this is the equivalent of rolling the current page out, rolling the next page in, typing the headers and page number and then moving the carriage to the first column of the first line.

3. Eureka!

Eureka is a complete programming language, in which Texture is embedded as a set of primitives and through which full computing power is available to the user throughout the processing of a document. The average user will not need to know very much about Eureka, because he will only be giving simple commands to Texture. But if it becomes necessary to do some complex calculations and to have different commands performed depending on some condition, it is necessary to know how Eureka can be used as a programming language.

Fortunately, this is very simple. What follows, then, is a Child's Garden of Eureka.

Every Eureka program looks something like this:

<a,b,c,...,z>

Each of the a, b, c etc. may be an arbitrarily long string of characters. Eureka evaluates the prototype program above by scanning each of the a, b, c etc. for more Eureka programs. These it evaluates first, and rescans the values they return for more Eureka programs. When it is finished scanning, it treats the value "a" as the name of a function, and 'calls' that function. The function may use the values of b through z for its own dark purposes, and finally it returns a string of characters as its value. This value replaces the program, and is immediately rescanned by Eureka. For example:

<EQ,5,<ADD,2,3>,YES,NO>

Eureka first evaluates all programs inside the main program, and replaces them by their value. There is exactly one program inside: the <ADD,2,3>, which, needless to say, adds its two parameters, and returns the result. Thus, when finished scanning, Eureka faces the program

<EQ,5,5,YES,NO>

Now Eureka evaluates the main program. EQ is a function which compares its first and second arguments, and if these are equal returns the third, otherwise it returns the fourth argument.
Hence the result is

    YES

Of course, it isn't quite that simple. These are the precise semantics:

1. There are three strings and a stack whose elements are strings: a neutral, an active and a scanning string, and an evaluation stack. A Eureka program to be evaluated is on the active string, and at the beginning of evaluation, the neutral and scanning strings and the stack are empty.

2. Text is taken from the front of the active string and put on the end of the neutral string, character by character, until a closing bracket (>) is encountered.

3. Text is taken off the end of the neutral string, character by character, and put on the end of the scanning string until an argument separator (,) is encountered. When an argument separator is encountered, the text on the scanning string is put on top of the stack. Step 3 is repeated until an opening bracket (<) is encountered, at which point the text on the scanning string is put on top of the stack as a single stack element. Scanning resumes with step 4.

4. The stack represents the name and arguments of a function call. The top element on the stack is the name of the function to be called; the remainder of the stack, in order, gives the arguments. The last character on the neutral string is examined. If this character is the neutral evaluation character (:), then this character is removed from the neutral string, and the result of the function call (a string) is put at the end of the neutral string. If this character is any character but the neutral evaluation character, the result of the function call is put at the front of the active string. Evaluation resumes at step 2.

That is all, except that the algorithm is enhanced as follows, to allow for quoting of text (which inhibits evaluation):

a. In step 2, whenever an opening quote (") is encountered, the scanning level is incremented by one; whenever a closing quote (') is encountered, the scanning level is decremented by one. Only at a scanning level of 0 (the initial level) does a closing bracket (>) terminate step 2.

b. In step 3, whenever a closing quote (') is encountered, the scanning level is raised by one; whenever an opening quote (") is encountered, the scanning level is lowered by one. Only when the scanning level is zero does an argument separator (,) cause text to be put on the stack, and then only after being stripped of any leading " and trailing '.

A program example

The user may wish eventually to reproduce the output of Texture in a book format, by photographic means. If he does, he will probably want the page numbering to be placed differently for odd-numbered (right-hand) pages than for even-numbered (left-hand) pages. In most books, the page number on even-numbered pages is on the left, and on odd-numbered pages on the right. This might be done by the following commands:

    <LTITLE,"<EQ,<MOD,<PN>,2>,0,"<PN>',"<MYTITLE>'>'>
    <RTITLE,"<EQ,<MOD,<PN>,2>,0,"<MYTITLE>',"<PN>'>'>
    <DS,MYTITLE,text>

Here, MYTITLE is a string which expands as the desired header text. It is on the right top (and the page number on the left top) whenever the page number is even (that is, when the page number modulo 2 is zero), and vice versa when it is odd.

To clarify the Eureka scanning algorithm, let us trace through the events at the top of a page. At the top of a page, Eureka is looking at the following situation:

    ||document text ...

The bars represent the divisions of the strings. To the left of the first bar is the neutral string. Between the bars is the scanning string, and to the right of the second bar is the active string.
First, Texture causes the text of the title to be inserted at the front of the active string:

             ||<TITLE>document text ...

Eureka scans until it encounters the closing bracket (>):

    (step 2) <TITLE||>document text ...

The arguments of the call (there is only one) are scanned until an opening bracket (<) is encountered:

    (step 3) <|TITLE|document text ...

The function TITLE is called with no arguments, and the result is put on the active string (this was an active call):

    (step 4) ||<LTITLE><SPLIT><RTITLE>document text ...

The process repeats for LTITLE:

    (steps 2-4) ||<EQ,<MOD,<PN>,2>,0,"<PN>',"<MYTITLE>'> ...

Here, the ellipsis (...) represents the string "<SPLIT><RTITLE>document text ...". Again, step 2 causes the scan to proceed to the first closing bracket:

    (step 2) <EQ,<MOD,<PN||>,2>,0,...

The function PN is evaluated:

    (steps 3-4) <EQ,<MOD,||18,2>,0,...

Again, the end of the first complete call is found:

    (step 2) <EQ,<MOD,18,2||>,0,...

This time, each argument of the call is stacked:

    (step 3) <EQ,<MOD,18,|2|,0,"<PN>',"<MYTITLE>'>...
    (step 3) <EQ,<MOD,|18|,0,"<PN>',"<MYTITLE>'>...
    (step 3) <EQ,<|MOD|,0,"<PN>',"<MYTITLE>'>...

This results in an evaluation stack which looks like:

    MOD
    18
    2

and the string:

             <EQ,||,0,"<PN>',"<MYTITLE>'>...

The evaluation of the stack produces 0 (18 modulo 2), which is put on the active string:

    (step 4) <EQ,||0,0,"<PN>',"<MYTITLE>'>...

The scan continues. Note that modification (a) to the scanning algorithm causes the scan to pass over the closing brackets enclosed in quotes:

    (step 2) <EQ,0,0,"<PN>',"<MYTITLE>'||>...

Again, the arguments are scanned and stacked; modification (b) to the scanning algorithm causes the scan to pass over the opening brackets enclosed in quotes.

    (step 3) <EQ,0,0,"<PN>',|"<MYTITLE>'|...
    (step 3) <EQ,0,0,|"<PN>'|...
    (step 3) <EQ,0,|0|...
    (step 3) <EQ,|0|...
    (step 3) <|EQ|...

Finally, the stack looks like:

    EQ
    0
    0
    <PN>
    <MYTITLE>

This is evaluated: it is true that 0=0, so the third argument "<PN>" is the value, which is placed on the active string:

    (*) (step 4) ||<PN><SPLIT><RTITLE>document text ...

And so evaluation continues: PN will evaluate to the page number, SPLIT will cause an action to take place in Texture and will evaluate to the empty string, and RTITLE will evaluate to the text of MYTITLE. Finally, "document text ..." can be scanned and processed.

Neutral evaluation

Suppose the definition of LTITLE had been as follows:

    <LTITLE,":<EQ,<MOD,<PN>,2>,0,"<PN>',"<MYTITLE>'>'>

That is, the function EQ has a neutral evaluation symbol (:) in front of it. Then everything would have been rather the same, up to the point marked "(*)", above. Just before this point, the Eureka scanning area looks like:

             :||<SPLIT><RTITLE>document text ...

(that is, the neutral evaluation symbol preceded the call of EQ, which has just been stacked and evaluated). Then, when the value of the EQ is returned (<PN>), this value is put on the neutral rather than the active string, and the result is:

    (step 4) <PN>||<SPLIT><RTITLE>document text ...

A careful scrutiny of the scanning algorithm will convince the reader that this "<PN>" will never be evaluated, but rather that it will be passed on to the document processor. In that case, the text at the top of the page will be the string "<PN>", and not the value of <PN>, which is a number (in this example, 18). This is the difference between neutral (:<fn>) and active (<fn>) evaluation: the result of a neutral evaluation is not rescanned, and therefore, if it contains any Eureka programs, these are not evaluated any further; the result of an active evaluation is rescanned.
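
The scanning algorithm, with its quoting modifications and the active/neutral distinction, can be sketched in Python. This is only a minimal sketch under stated assumptions, not the Eureka implementation: the names evaluate, strip_quotes and FUNCTIONS are invented for illustration, the quote characters are assumed to be " and ' as in the examples above, PN is a stub that always returns 18, and the scanning string is built by prepending so that each argument reads left to right.

```python
OPEN_QUOTE, CLOSE_QUOTE = '"', "'"   # assumed quote characters
NEUTRAL_MARK = ":"                   # neutral evaluation character


def strip_quotes(text):
    """Remove one leading opening quote and one trailing closing quote."""
    if text.startswith(OPEN_QUOTE):
        text = text[1:]
    if text.endswith(CLOSE_QUOTE):
        text = text[:-1]
    return text


def evaluate(program, functions):
    """Run steps 1-4 until the active string is empty; return the neutral string."""
    neutral, active = "", program            # step 1
    level = 0                                # quote level used in step 2
    while active:
        # Step 2: move one character from the front of active to neutral.
        ch, active = active[0], active[1:]
        if ch == OPEN_QUOTE:
            level += 1
        elif ch == CLOSE_QUOTE:
            level -= 1
        if ch != ">" or level != 0:
            neutral += ch
            continue
        # Step 3: walk backwards over neutral, stacking arguments up to '<'.
        stack, scanning, depth = [], "", 0
        while True:
            if not neutral:
                raise ValueError("closing bracket without opening bracket")
            c, neutral = neutral[-1], neutral[:-1]
            if c == CLOSE_QUOTE:
                depth += 1
            elif c == OPEN_QUOTE:
                depth -= 1
            if depth == 0 and c in ",<":
                stack.append(strip_quotes(scanning))
                scanning = ""
                if c == "<":
                    break
            else:
                scanning = c + scanning      # prepend: keeps left-to-right order
        # Step 4: the top of the stack is the function name, the rest its arguments.
        name = stack.pop()
        result = functions[name](*reversed(stack))
        if neutral.endswith(NEUTRAL_MARK):
            neutral = neutral[:-1] + result  # neutral call: result is not rescanned
        else:
            active = result + active         # active call: result is rescanned
    return neutral


# A few primitives, enough for the examples in this section (PN is a stub).
FUNCTIONS = {
    "ADD": lambda a, b: str(int(a) + int(b)),
    "MOD": lambda a, b: str(int(a) % int(b)),
    "EQ": lambda a, b, true="", false="": true if a == b else false,
    "PN": lambda: "18",
}

TITLE = """<EQ,<MOD,<PN>,2>,0,"<PN>',"<MYTITLE>'>"""

print(evaluate("<EQ,5,<ADD,2,3>,YES,NO>", FUNCTIONS))  # YES
print(evaluate(TITLE, FUNCTIONS))                      # 18
print(evaluate(":" + TITLE, FUNCTIONS))                # <PN>
```

The last two calls reproduce the trace above: the active call rescans the returned "<PN>" and yields the page number, while the neutral call leaves "<PN>" on the neutral string untouched.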
The reader will soon realize that neutral evaluation is one way of getting function calls to pass through Eureka (and so to the document processor) without being evaluated. Beyond this, the document-processor user has little use for neutral evaluation; however, the Eureka programmer may find neutral evaluation valuable to prevent the values of calls from being themselves evaluated.

Some Eureka functions

The following three functions depend on lexical ordering. It is assumed here that the alphabet over which strings may be formed is ordered in some fashion. This ordering will differ by location (EBCDIC ordering is slightly different from ASCII ordering), but it is usually fairly logical. Usually, the blank is lowest, and the letters and digits are ordered a<b<...<z<A<B<...<Z<0<1<...<9. Two strings aA and bB (where a, b are single characters and A, B are strings) are lexically related as follows:

    if a < b then aA < bB
    if a > b then aA > bB
    if a = b then aA r bB if and only if A r B, where r is one of <, >, or =

The empty string is lexically less than any non-empty string, and equal to itself.

    <EQ,a,b,true,false>

If a is lexically equivalent to b, then the value is the value of 'true'; otherwise it is the value of 'false'.

    <LT,a,b,true,false>

If a is lexically less than b, then the value is the value of 'true'; otherwise it is the value of 'false'.

    <LE,a,b,true,false>

If a is lexically less than b, or equivalent to b, then the value is the value of 'true'; otherwise it is the value of 'false'.

    <ADD,a,b>

The value is the sum of a and b (assuming that a and b are integer numbers).

    <SUB,a,b>

The value is the difference of a and b (assuming that a and b are integer numbers).
    <MULT,a,b>

The value is the product of a and b (assuming that a and b are integer numbers).

    <DIV,a,b>

The value is the quotient of a and b (assuming that a and b are integer numbers).

    <MOD,a,b>

The value is the remainder after integer division of a by b (assuming that a and b are integer numbers); that is:

    <SUB,a,<MULT,<DIV,a,b>,b>>

    <TM>

The value is the time of day in a form determined by the local operating system.

    <DT>

The value is the date in a form determined by the local operating system.

    <DS,name,text>

This function defines 'text' to be a string with the name 'name'. Henceforth, whenever "<name>" appears on the evaluation string (see the precise semantics given earlier), evaluation will replace it with "text".

    <SS,name,gap1,gap2,...,gapn>

This function 'segments' the string with the name 'name', on the various 'gap's, which are strings. Segmentation is done as follows: for each 'gap', every segment of 'name's string is checked for an occurrence of 'gap' in that segment; for each such occurrence, the segment is broken into two new segments, the part before the occurrence and the part after; between these two segments, there is created a numbered segment gap (if it was created by matching the k'th argument, then it is the segment gap numbered k). For example:

    <DS,string,most people like cheese>

creates a string (named 'string') consisting of a single segment, which we will represent as

    (most people like cheese)

If this is followed by the command

    <SS,string,e,o>

the result will be the following string:

    (m)2(st p)1 2(pl)1( lik)1( ch)1 1(s)1

Evaluation of the text "<string>" will now result in the text

    mst ppl lik chs

However, simple evaluation of the text "<string,$!>" will result in the text

    mst p$!pl$! lik$! ch$!$!s$!
and "<string,a,b>" results in

    mbst pabpla lika chaasa

That is, the first argument replaces the gaps numbered 1 and the second argument replaces the gaps numbered 2. This can be carried on for as many gaps as there are, of course. Missing arguments are considered as an empty string, and excess arguments (those for which there is no gap) are ignored.

It is worth noting that, since the segmentation strings, called 'gap' above, are taken in left-to-right order, the pair of commands

    <DS,foo,abcabcabcabc>
    <SS,foo,ab,bc>

will result in 'foo' taking the internal form

    1(c)1(c)1(c)1(c)

after 'ab' has segmented it, and that 'bc' will thereafter fail to segment 'foo' any further. If the arguments had been reversed, as in

    <SS,foo,bc,ab>

the result would have been instead:

    (a)2(a)2(a)2(a)2

The functions DS and SS are, of course, how the Eureka programmer defines macros with parameters. He could define a simple definition function as follows:

    <DS,define,"<DS,|name|,|text|>
    <SS,|name|,|parameters|>'>
    <SS,define,|name|,|parameters|,|text|>

which is called as follows:

    <define,do,"what,howoften',
    "<LT,0,howoften,"what<do,what,<SUB,howoften,1>>'>'>

The above call of 'define', incidentally, defines a useful function called 'do', which will concatenate 'howoften' (an integer number) evaluations of a given string 'what' (which may in turn be a Eureka program). For example, the call

    <do,+,5>

expands as follows (these 'snapshots' of the Eureka scanning area give the situation at the start of each new application of step 2 of the algorithm):

    <do,+,5>
    <LT,0,5,"+<do,+,<SUB,5,1>>'>
    +<do,+,<SUB,5,1>>
    +<do,+,4>
    +<LT,0,4,"+<do,+,<SUB,4,1>>'>
    ...
    +++++<do,+,0>
    +++++<LT,0,0,"+<do,+,<SUB,0,1>>'>
    +++++

    <DD,name>

Deletes the definition of the string called 'name'.
This will remove the string from the Eureka string space, and thus save a certain amount of space.

    <BYE>

Execution of Eureka (and therefore Texture) terminates immediately. Since this could result in part of the current page being lost, it is a good idea to prefix the call to BYE with a call to PAGE (see section 4).

4. The rest of Texture

One of the most important aspects of Texture is that the user can control the layout of his text. This is done by defining a layout as follows:

    <LAYOUT,name,part1,part2,...,partn>

This defines a layout named 'name', consisting of n 'part's. Each 'part' is the name, either of another layout, or of a block. The layout is defined as consisting of all the blocks and layouts whose names are given, in the order in which they are given. For each layout that is part of a layout definition, the blocks of which the sub-layout consists replace the sub-layout. Thus if layout A consists of (X,B,Y,Z) and layout B consists of (U,V,W), then layout A consists of (X,U,V,W,Y,Z).

    <BLOCK,name,left,right,top,bottom,text>

This defines a block. A block is a segment of the printed page, going from column 'left' to column 'right', and from line 'top' to line 'bottom'. Inside the space defined by the block, there is room to put text. The first text that goes into any block is the sixth parameter, 'text' (which may be an empty string); after that, text from the input stream is put into the block.

To put this into perspective, Texture defines a standard layout as follows:

    <BLOCK,standard-header,5,68,1,1,"<TITLE><NEXT>'>
    <BLOCK,standard-text,5,68,5,58>
    <BLOCK,standard-footer,5,68,60,60,"<FOOTER><NEXT>'>
    <LAYOUT,standard-layout,standard-header,standard-text,standard-footer>

This results in the page you are now studying, and is probably adequate for most documents.
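
The rule that sub-layouts are replaced by the blocks they consist of amounts to a recursive flattening. A minimal sketch in Python follows; the function name flatten and the dictionary representation of layouts are assumptions made for illustration, and a layout that (directly or indirectly) contains itself is not handled.

```python
def flatten(name, layouts):
    """Expand layout `name` into the ordered list of block names it consists of.

    `layouts` maps each layout name to its list of parts; any part that is
    not itself a key of `layouts` is taken to be the name of a block.
    """
    blocks = []
    for part in layouts[name]:
        if part in layouts:
            blocks.extend(flatten(part, layouts))  # a sub-layout: splice in its blocks
        else:
            blocks.append(part)                    # a plain block
    return blocks


layouts = {"A": ["X", "B", "Y", "Z"], "B": ["U", "V", "W"]}
print(flatten("A", layouts))  # ['X', 'U', 'V', 'W', 'Y', 'Z']
```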
Notice that all the blocks of the standard layout go from columns 5 to 68, and that the header and footer blocks are only one line deep: this means that only one line of header and one line of footer are possible. Notice also that the standard-layout could define its three constituent blocks to be in any order, and it would still produce the same layout (but not the same results: if the footer is before the text, then any change in the footer is not felt until the next page, but if it is after the text, any changes are felt on the same page).

To invoke a layout (that is, to make a given layout the active layout), use the following command:

    <INVOKE,layout>

where 'layout' is the name of a layout. This means that 'layout' will become the active layout for the next output page, and any subsequent pages, until another call of INVOKE. Thus, if one wanted a layout to become immediately active, it would be necessary to command

    <INVOKE,layout><PAGE>

In the definition of a block, any of the arguments beyond 'name' may be left empty. They default as follows:

    'left'    (left margin)    5
    'right'   (right margin)   68
    'top'     (top line)       5
    'bottom'  (bottom line)    60
    'text'                     empty string

It is also possible to redefine each of the boundaries or the 'text' at any time, and the redefinitions will be valid on the next page of output to be processed. To redefine the values, the following commands may be used.

    <LEFT,name,x>

The block named 'name' has its left column at 'x', where 'x' is a number.

    <RIGHT,name,x>

The block named 'name' has its right column at 'x', where 'x' is a number.

    <TOP,name,x>

The block named 'name' has its top line at 'x', where 'x' is a number.

    <BOTTOM,name,x>

The block named 'name' has its bottom line at 'x', where 'x' is a number.
In the four functions given above, if 'x' is absent, the corresponding current value becomes the value of the call. Thus,

    <RIGHT,standard-text>

yields the value 68, which is the rightmost column of the block named 'standard-text'. By current value is meant the value on the current page. For example, suppose the right margin of block 'A' is column 68, and the following appears in the input stream:

    "I am now setting A's right column to <RIGHT,A,40><RIGHT,A>"

The resulting text will be

    "I am now setting A's right column to 68"

because, until the next page, the current value of <RIGHT,A> will be 68. If, in any of the above calls, the argument 'name' is also missing, the value for the current block will be returned.

Outlines

Blocks are normally invisible templates for the document text. However, it is sometimes convenient, if some section of text needs to be emphasized, to have an outline, or box, drawn around it. This method of "making a block visible" is accomplished as follows:

    <VISIBLE> and <NOTVISIBLE>

The next line of text after a call to VISIBLE, and all subsequent lines until a call to NOTVISIBLE, will be inside a box looking something like:

    [Figure: a rectangular outline drawn around the enclosed text]

After a call to VISIBLE, immediately after the current line is finished, the top of the box is generated, and the left and right margins of the text are indented by 1 column (unless they are already indented; if this is not adequate, the user can always issue his own indents, of course). If the bottom of the current block is reached before NOTVISIBLE is called, a box bottom will be generated, and at the top of the next block, a new box will be started after any mandatory text (text associated with the new block) has been processed.
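
How an outline surrounds the lines of a block can be sketched as follows. This is an illustrative sketch only, not Texture's own routine: the function name box and its parameter names are invented here, and the "+" corner defaults are an assumption made to keep the example in plain ASCII.

```python
def box(lines, width, top="-", bottom="-", left="|", right="|",
        lt="+", rt="+", lb="+", rb="+"):
    """Return `lines` enclosed in an outline that is `width` columns wide.

    The side characters take up one column each, so the text is confined
    to the inner width - 2 columns (padded or cut to fit).
    """
    inner = width - 2
    out = [lt + top * inner + rt]                              # box top
    for line in lines:
        out.append(left + line.ljust(inner)[:inner] + right)   # one boxed text line
    out.append(lb + bottom * inner + rb)                       # box bottom
    return out


for row in box(["emphasized", "text"], 14):
    print(row)
```

Each keyword parameter corresponds to one of the outline characters that make up the box.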
The characters which go to making up the outline of the visible block may be altered:

    <SET,c,BOXTOP>

The box top character (by default, -) is replaced by c.

    <SET,c,BOXBOTTOM>

The box bottom character (by default, -) is replaced by c.

    <SET,c,BOXLEFT>

The left side character (by default, |) is replaced by c.

    <SET,c,BOXRIGHT>

The right side character (by default, |) is replaced by c.

    <SET,c,LTCORNER>

The left top corner character (by default, ┌) is replaced by c.

    <SET,c,RTCORNER>

The right top corner character (by default, ┐) is replaced by c.

    <SET,c,LBCORNER>

The left bottom corner character (by default, └) is replaced by c.

    <SET,c,RBCORNER>

The right bottom corner character (by default, ┘) is replaced by c.

There remain some other, ill-classified functions, which will be listed here:

    <NEXT>

The current block is terminated immediately.

    <LINES-LEFT>

The value of this function is the number of lines remaining in the current block.

    <COLS-LEFT>

The value of this function is the number of columns remaining in the current line.

    <UND,chars> and <NOTUND,chars>

The argument 'chars' is a string of characters. After a call to UND, those characters in 'chars' will be underlined whenever underlining mode is active (i.e., between a <U> and <NOTU>). After a call to NOTUND, the characters in 'chars' will not be underlined whenever underlining mode is active.

    <WIDOW,n>

This function is called WIDOW, although it actually helps prevent "widows", a typesetting term which means that some small amount of text is awkwardly left on one page when it belongs with a body of text on the next or previous page. In effect, this function will cause a jump to the next block if there are at the moment of the call no more than n lines remaining in the current block.
Notice that <WIDOW,n> is equivalent to <LT,<LINES-LEFT>,n,"<NEXT>'>.

    <SET,c,FILLER>

The 'filler' character (by default, a blank) is set to c. The filler character is put between words, between segments, between the left margin and the first word, and between the last word and the right margin. Thus changing the filler character (e.g., to a ".") before a tab will have the effect of creating a tab-drop character. Naturally the filler should be set back to a blank as soon as the tab is completed, or it will be inserted everywhere.

    <SET,c,UNDERSCORE>

The underscore character (by default, _) is set to c. Whenever a character is to be 'underscored' (i.e., after the occurrence of the underscore operator, or after a <U>) it will now be overprinted with a "c".

    <SET,c,NTC>

The 'non-trivial' character (by default, a blank) is set to c. The non-trivial character operator (by default, ¬) is henceforth replaced by a "c".

    <MIN-WS,n>

The 'minimum word spacing' (the least number of characters that are to separate words) is set to n. By default, this value is 1, which is why most words in this document are separated by one blank. If n is absent, the current minimum word spacing value is returned.

    <MAX-WS,n>

The 'maximum word spacing' (the greatest number of characters that are to separate words, after justification) is set to n. By default, this value is 5. If it is not possible to justify a given line with at most n blanks between words, the justification routine gives up, issues a message to that effect, and sets the line ragged right. This is no solution to the problem of excessive spacing, of course, but it is often better than having a line come out unreadable because of unreasonable spaces between words. If the user cares more about flush right margins than about spacing, he need only set n to some enormous value.
If n is absent, the current maximum word spacing value is returned.

    <MIN-SS,n>

The minimum number of spaces between sentences is set to n (this value is the standard 2 spaces, by default). A sentence is defined as ending in a full stop and a word-terminator (a full stop and a blank, usually), where a full stop is one of . ! or ?. If n is absent, the current value is returned.

    <PAGE-DEPTH,n>

The 'page depth' is set to n. For a normal line printer, in most installations, the default value will be about 60. If this is not so, it can be reset to the correct value by using this function. Where possible, Texture will attempt to print all n lines of the page contiguously; this means that if a given installation's printer skips to a new page after 60 lines, but permits this skip to be overridden by carriage control, Texture will override the skip. This can be useful for printing, say, 100-line pages of two columns, and photo-reducing these for conference proceedings, etc.

    <PARAGRAPH-INDENT,n>

The paragraph indentation value is set to n. By default, this value is the usual, secretarial 5 columns. If n is absent, the function returns the value of the current setting.

    <KEEP,$>

The text following this call, up to the next occurrence of '$' (which may be any string of characters) forms a block-keep. If the text does not fit into the current block, the current block is terminated, and the text is put at the top of the next block. The string '$' disappears entirely; it is merely an end-marker. If '$' is absent, it is assumed to be the string consisting of a single blank.

    <LKEEP,$>

The text following this call, up to the next occurrence of '$' (which may be any string of characters) forms a line-keep.
If the text does not fit into the current line, the current line is terminated, and the text is put at the front of the next line. The string '$' disappears entirely; it is merely an end-marker. If '$' is absent, it is assumed to be the string consisting of a single blank.

    <PKEEP,$>

The text following this call, up to the next occurrence of '$' (which may be any string of characters) forms a page-keep. If the text does not fit into the current page, the current page is terminated, and the text is put at the top of the next page. The string '$' disappears entirely; it is merely an end-marker. If '$' is absent, it is assumed to be the string consisting of a single blank.

    <FOOT,$,name>

The text following this call, up to the next occurrence of '$' (which may be any string of characters) forms a footnote. The text is put at the bottom of the block 'name', if possible; if 'name' is absent, it is put at the bottom of the current block. When the block is full, the remaining text is put at the bottom of subsequent blocks, until used up.

    <FIG,$>

The text following this call, up to the next occurrence of '$' (which may be any string of characters) forms a block-figure. If the text will not fit into the current block, it is held until the current block is terminated, and then attempts are made for each subsequent block, until it can be fitted.

    <LFIG,$>

The text following this call, up to the next occurrence of '$' (which may be any string of characters) forms a line-figure. If the text will not fit into the current line, it is held until the current line is terminated, and then attempts are made for each subsequent line, until it can be fitted.
    <PFIG,$>

The text following this call, up to the next occurrence of '$' (which may be any string of characters) forms a page-figure. If the text will not fit into the current page, it is held until the current page is terminated, and then attempts are made for each subsequent page, until it can be fitted.

    <AS-IS,$>

The text following this call, up to the next occurrence of '$' (which may be any string of characters) is treated "as given". Each line is output exactly as it appears in the source stream. No Eureka programs or special operators are evaluated within as-is text.

The following five functions are actually Eureka primitives, rather than Texture primitives, but are given here because they are very closely related to document processing.

    <RS,eof>

The value of this function is the next string of text, from the current input medium, up to but not including an end-of-string marker. By default, the input medium is the document source stream; the end-of-string marker is a full stop (.). If end of file is encountered before the end-of-string marker, the value of the call is the argument 'eof', which is actively evaluated.

    <PS,string>

The argument 'string' is output to the current output medium. The value is the null string. By default, the output medium is the document processor, so that "<PS,string>" is equivalent to "string", while "<PS,abc<PS,def>hjk>" is equivalent to "defabchjk" (because the argument is evaluated first). The output medium may, however, be set to another file or device.

    <RC,eof>

The value of this function is the next character from the current input medium. The argument 'eof' has the same meaning as for RS.

    <INPUT,name>

After this call, RS and RC will read from the file or device named by 'name'.
If the argument 'name' is absent, the document source file again becomes the input medium. The usefulness of this function may vary from operating system to operating system. It may be used to read from a file attached to one of the available input units, however, and so merge input from two separate sources together in the stream. It should be remembered that RS peels off the end-of-string marker; naturally the end-of-string marker can be set to any convenient character.

    <OUTPUT,name>

After this call, PS will print into the file or device 'name'. If 'name' is absent, the document processor becomes the current output medium.

    <SET,c,EOS>

The end-of-string character, for RS, is set to c.

For example, the user may be producing form letters. The names and addresses of the people to whom these form letters are to be sent are in a file called 'VICTIMS'. Thus the complete Texture input might look like:

    <define,LETTER,"|name|,|address|',
    "<MYADDRESS>
    <L>
    |name|
    <L>
    |address|
    Dear |name|;
    <L>
    <TEXT OF LETTER>'>
    <define,MYADDRESS,etc...>
    <define,TEXT OF LETTER,etc...>
    <INPUT,VICTIMS>
    <do,10000,"<LETTER,<RS>,<RS,"<PAGE><BYE>'>><PAGE>'>

Flexible Blocks

In the definition of a block,

    <BLOCK,name,left,right,top,bottom,text>

it is possible to replace either 'top' or 'bottom' with the string "*". This "star" indicates that the user is not entirely certain when the block will be full, and so wishes one end of the block to be flexible. The block is terminated by (a) a call to NEXT, or (b) the flexible end "bumping into" the limits of the physical page. Obviously, by itself this concept is not particularly useful.
Since the 'block' is an invisible entity anyway, used only to instruct the processor about the limits of the text, one can just as easily replace 'top' with "1" as with "*", and replace 'bottom' with "<PAGE-DEPTH>" as with "*". The concept becomes useful only when combined with the concept of "dependent blocks".

Dependent Blocks

As shown in the section on blocks, it is possible to redefine the bottom (or top) of a block by

    <BOTTOM,name,n>

It is sometimes useful to have the bottom (or top) of a block depend on the bottom (or top) of another, flexible block. This can be done by a call like

    <TOP,name1,n,name2,BOTTOM>

This says that the top line of block 'name1' will be n lines below the bottom line of block 'name2'. Naturally, n may be a negative number, in which case the top line of 'name1' will be |n| lines above the bottom line of 'name2'. Henceforth, a layout which contains the block 'name1' should also contain the block 'name2', and 'name2' must precede 'name1' in the layout.

    [Figure: an example page layout containing a caption block, block 2 and block 3]

Here the caption is flexible (depending on the amount of text in the caption) and both the bottom of block 2 and the top of block 3 depend on where the bottom line of the caption ends up. One could, therefore, after giving other definitions, specify the following:

    <BOTTOM,caption,*>
    <BOTTOM,block2,0,caption,BOTTOM>
    <TOP,block3,2,caption,BOTTOM>

I appreciate the fact that this is a rather primitive method of defining dependencies, and it is indeed one of the few places where the Eureka syntax fails to handle a situation conveniently. The main problem appears to be the fact that a very simple syntax (almost a lack of syntax) is being called on to describe a rather complex situation.
It would have been more pleasing i f the d e f i n i t i o n of dependencies could have been included with the o r i g i n a l d e f i n i t i o n of the blocks, but I f e e l that t h i s over-burdens the c a l l of the BLOCK primitive. A User's Manual Realizing that whatever approach was taken would appear awkward to some group of users, I decided to create primitives only, and leave convenient d e f i n i t i o n (by means of Eureka) to the user. Events There are a number of occurrences which i t would be very useful to be warned about by the processor. There are times, f o r instance, when one would l i k e to 'know' that the processor has just f i n i s h e d a l i n e , and i s about to s t a r t on the next one; at t h i s point, one would l i k e to in s e r t some text. For t h i s reason. Texture defines events. <HANG,event,text> The argument 'text' i s associated with the event named by the argument 'event'. Whenever that event occurs, the text i s inserted into the stream. The 'event' may be one of the following: LINE new l i n e BLOCK new block PAGE new page EOL end of source l i n e EOF end of source f i l e User-defined operators The l i t e r a l - n e x t , underscore, overprint etc., operators are by no means the only kind of operator available. The user may A User's Manual 111 wish to define his own operators, since these are a convenient shorthand for frequently-performed operations. The d e f i n i t i o n of operators i s done in terms of Eureka. The operator i s e f f e c t i v e l y replaced by the Eureka program in terms of which i t i s defined. <OP,c,text> The character c i s a sp e c i a l operator, henceforth. Whenever i t appears i n the text, i t i s replaced by the s t r i n g given by 'text'. Such an operator could form the basis for a user-defined event f a c i l i t y . Thus, i f the user wants tc take special actions on the occurrence of a f u l l stop (, ? or !) 
, such as capitalizing the first letter of the next word, then the three full-stop characters could be defined as an end-of-sentence operator. The RC function might be used, in such an operator definition, to read ahead in the input stream and check if the required conditions prevail (the full-stop should be followed by a blank, a quote and a blank, or a right parenthesis and a blank); a slight revision of the text which is encountered in this fashion can then be returned as the value (e.g., if the full-stop operator "!" encountered a blank and the letter "q", it would return the string "! aq"). As an example, here is such a program:

<define,TERMINATOR OP,|terminator|,
"<DS,$char$,<NEXTC>>
|terminator|<EQ,<$char$>," ',
"<DS,$char$,<$char$><NEXTC>>
<EQ,<$char$>,») ',")
"<EQ,<$char$>,"*" ',"*» ',
"<$char$>'>'>'>'>
<DS,NEXTC,"<RC,"<PAGE><BYE>'>'>
<OP,.,"<TERMINATOR OP,.>'>
<OP,!,"<TERMINATOR OP,!>'>
<OP,?,"<TERMINATOR OP,?>'>

The user-defined operator facility could also be used to define, in a manner not unlike that of the program given above, an "escape" character for an alternate command mode. Thus "#" could be made an operator which reads in text up to the next blank, and turns this text into Texture primitives. In this fashion, "#L" could stand for "<L>" and "#T4" for "<T,4>", while "#LT4" could be "<L><T,4>".

Multiple Columns

Within a block, it is possible to set text in multiple columns, side by side. In such a scheme, it is possible to jump from single to double-column mode at will, and back again.

<COL,n,space>

The current block will be divided vertically into n columns of equal width, which are filled one by one, from left to right. There are 'space' spaces between any two columns. The argument 'space' may be left out, in which case it defaults to the current spacing of multiple columns.
Footnotes continue to appear at the bottom of the block, and will be in single-column mode unless otherwise specified.

<COL-SPACE,n>

This sets the space between multiple columns to n. The default value is 3 spaces, which is quite adequate, unless the columns are strikingly wide and are to be reduced photographically. If the argument 'n' is left off, the value of this function is the current spacing.

Environments

Footnotes, figures and mandatory text (the text associated with a block) are in their own environments. This means that the text and commands of one of these are processed with certain global switches set independently of the text surrounding them. As soon as, say, a footnote is entered, the old values of the global switches involved are saved on a stack of environments, and a brand-new environment, with default settings, comes into force. Included in an environment are the following:

left and right margins
left and right indents
underlining switch
case-shift switches
paragraph indents
word, sentence and line spacing
tab stops
block outline visibility
justification method

<ENVIRONMENT,name>

The current environment is saved under the name 'name'.

<ACTIVATE,name>

The current environment is replaced by the environment named 'name'.

These two functions can be useful in setting up an environment for footnotes (for example), and recalling this environment whenever a footnote is entered. To accomplish this, one might, for example, give the sequence of instructions:

<ENVIRONMENT,save>
some instructions to set up the footnote environment
<ENVIRONMENT,foot-environment>
<ACTIVATE,save>
<define,footnote,|end|,"<FOOT,|end|> <ACTIVATE,foot-environment>'>

Henceforth, where 'FOOT' would have been called, 'footnote' is called instead, in exactly the same way.
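The interplay between the stack of environments and the named environments saved by ENVIRONMENT and recalled by ACTIVATE can be illustrated with a short sketch. It is written in Python purely for illustration and is not the thesis implementation: the class and method names, the handful of fields shown, and the default values (such as a right margin of 72) are all assumptions.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class Environment:
    # Only a few of the switches listed above; defaults are assumed values.
    left_margin: int = 0
    right_margin: int = 72
    underline: bool = False
    tab_stops: list = field(default_factory=list)


class EnvironmentManager:
    """Hypothetical model of the environment stack and named environments."""

    def __init__(self):
        self.current = Environment()
        self.stack = []   # environments suspended by footnotes, figures, ...
        self.named = {}   # environments saved with <ENVIRONMENT,name>

    def enter(self):
        # Entering a footnote or figure: save the old settings and start
        # a brand-new environment with default settings.
        self.stack.append(self.current)
        self.current = Environment()

    def leave(self):
        # Leaving the footnote or figure: restore the suspended settings.
        self.current = self.stack.pop()

    def save(self, name):
        # <ENVIRONMENT,name>: keep a copy of the current settings.
        self.named[name] = deepcopy(self.current)

    def activate(self, name):
        # <ACTIVATE,name>: replace the current settings by the saved copy.
        self.current = deepcopy(self.named[name])


def footnote_demo():
    mgr = EnvironmentManager()
    mgr.current.left_margin = 5
    mgr.save("save")                      # <ENVIRONMENT,save>
    mgr.current.left_margin = 10          # set up the footnote environment
    mgr.current.underline = True
    mgr.save("foot-environment")          # <ENVIRONMENT,foot-environment>
    mgr.activate("save")                  # <ACTIVATE,save>
    mgr.enter()                           # a footnote is entered
    mgr.activate("foot-environment")      # <ACTIVATE,foot-environment>
    inside = (mgr.current.left_margin, mgr.current.underline)
    mgr.leave()                           # the footnote ends
    return inside, mgr.current.left_margin
```

In this sketch, `footnote_demo()` walks through the same sequence as the instructions above: the footnote is set with the saved footnote settings, and the surrounding text resumes with its own margin once the footnote ends.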
Chapter 4

Implementing Texture

"The unbreakable fetters which bound down the great wolf Fenrir had been cunningly forged by Loki from these: the footfall of a cat, the roots of a rock, the beard of a woman, the breath of a fish, the spittle of a bird."

Although at first glance it seems a relatively straightforward task to implement Texture, as described in chapters 2 and 3, or to implement any other document processor, there are a number of possible pitfalls for the unwary. To help implementors in avoiding such traps, and to give an outline of my own approach to implementation, I propose, in this chapter, to cover the vital points in a Texture implementation.

1. Choice of Programming Language

I have implemented a subset of Texture on two separate occasions: once in Lisp and once in PL/I. A note on choice of language seems appropriate.

For an initial assault on a problem, Lisp is highly satisfactory. If one is computing in an environment where interactive programming is made possible and convenient (and I was), the use of a language processor which is intended for interaction eases the process of understanding the problem one is trying to solve. Lisp, which locally has enjoyed good support, and boasts an interactive editor and debugging package, is such a processor.

An implementation in Lisp, even if major parts of it could have been compiled, can hardly be considered a "production" program. The only concern in this implementation was to iron out the rough spots in as convenient a manner as possible.

The temptation is always to go to the opposite extreme for the actual "production" implementation, and choose an assembler or extremely low-level language, "for the sake of efficiency." And yet, this has been shown time and again (although the controversy continues to rage) to be no more efficient than to choose a high-level language which conveniently describes the problem, and thereby saves the programmer endless hours of headache.

PL/I is not such a language. Besides the usual objections (to its being a language designed by dwarves and wrought by trolls, with the inevitable result that it appears grandiose and in fact somewhat grotesque while underneath there beats the heart of a cowering weakling) one can complain of inefficiency and inconsistency. The processor produces such stupendously bad code for innocent and reasonable control structures like blocks and procedures, that a reasoned, enlightened program is turned into a nightmare of diabolical complexity by translation into PL/I, and the programmer is soon coerced into a monolithic programming style by a growing awareness of the inefficiencies he will introduce if he acts otherwise.

Then why choose PL/I? There were essentially two reasons for this, and they remain valid, even if the choice they precipitated was not. The first was transportability; the second was the obvious need for a language in which flexible data structures and string manipulation could be conveniently expressed. The most reasonable choices open to me locally were PL/I, BCPL and Sue.

PL/I programs are at least transportable to others of IBM's most popular line of computers, although they are not at all easily transported to most other computers. BCPL, on the other hand, a language growing in popularity, and with good reason, is available on several computers, and the careful programmer can make it eminently simple to transport his program to another machine.
Sue is implemented under OS, and was converted to MTS locally; thus a Sue program is approximately as transportable as a PL/I program.

PL/I does have reasonably sophisticated data structures, including a certain amount of flexibility in their size. It also has fairly good string manipulation, although it is a surprise to discover how little string manipulation is actually done in Texture. BCPL is perhaps too low-level for convenient string and data structure manipulation, although Sue can, with some effort, be made to serve in such a capacity.

As a programming language, however (that is, a language which simplifies the act of programming), PL/I is not very successful, and this far outweighs any sophistication in data structure it may offer. Besides being an incoherent language, its compiler on the 360 produces inexcusably bad code, even after currently available optimization. This programmer, in any case, will not lightly be tempted to program in PL/I again.

2. Eureka

In order to avoid some of the inherent inefficiency of copying text back and forth across several strings, the existing implementation of Eureka adheres to the spirit rather than the letter of the scanning algorithm given in Chapter 3. The algorithm as actually implemented may be broken down into four top-level functions:

- getting characters off the active string;
- putting characters (the strings resulting from active calls) on the front of the active string;
- putting strings (neutral text and the results of neutral calls) on the neutral string;
- expanding calls.

As explained in section 3, below, the first three of these are assumed to be accomplished by procedures which are passed to Eureka. The fourth, the actual expansion of calls, is briefly explained here.
There are two scan modes: Eureka starts off in one mode (where the nesting depth is zero), in which characters are taken from the active string and sent to the neutral string (in this case, the document processor outputs them in some form); as soon as an opening bracket (<) is encountered, Eureka enters another scanning mode, in which the nesting depth is non-zero, by raising the nesting depth by one.

At a non-zero nesting depth, characters are taken off the active string and added to a data structure conveniently named 'string' (hereafter known as the scanning string), which contains the following information:

- length of string
- type of string (argument, active call or neutral call)
- maximum length of string
- text of string

Whenever a neutral evaluation symbol (:) is encountered, the next active character is examined. If it is an opening bracket (<), the call mode (initially 'active') is set to 'neutral'; otherwise, the neutral evaluation symbol is treated as text.

Whenever a left quote (") is encountered, a counter called the copy-count (initially zero) is raised by one. If the copy-count was already greater than zero before being raised, the left quote is copied to the current scanning string. Henceforth, all text is copied blindly to the current scanning string, with one exception: whenever a right quote (') is encountered, the copy-count is lowered by one; if the copy-count is still greater than zero, the right quote is copied to the current scanning string; otherwise, subsequent text is no longer copied blindly.

Whenever an argument delimiter (,), opening bracket (<) or closing bracket (>) is encountered, and the copy-count is zero, the current scanning string is placed on top of a stack called the scanning stack, and a new 'string' structure becomes the current scanning string.
The following actions also take place:

Argument delimiter (,): The 'type' of the current scanning string is set to 'argument'.

Opening bracket (<): This represents a new nesting level: the nesting level is raised by one. If the current call mode is 'neutral', the 'type' of the current scanning string is set to 'neutral', and the call mode becomes 'active' again. Otherwise the current scanning string is given the type 'active'.

Closing bracket (>): This means a complete call has been stacked. All strings whose type is 'argument' are popped off the top of the scanning stack and saved, in reverse order, on an argument list. One more string, which represents the name of the string or function to be expanded, is also popped. The appropriate function is called with the arguments, if it is a function; otherwise the given macro string is expanded by placing the arguments in the corresponding segment gaps (see the discussion on this, below). If the call was active, the resulting string is pushed back onto the front of the active string; otherwise, the resulting string is fed to the document processor. The nesting level is lowered by one. If the scanning stack is not empty (i.e., the nesting level exceeds zero), its top element is popped off, and becomes the current scanning string.

A call may name either a (Eureka or Texture) primitive or a macro. A primitive call has the effect of some procedure being called, with the arguments of the call. The call returns a string (usually the empty string, in the case of Texture primitives) which is the value of the call. To explain the expansion of a macro, in the existing implementation, it is necessary to look at the implementation of the DS (define string) and SS (segment string) primitives.
The call <DS,name,text> causes the creation of a 'form'. A form is a data structure consisting of the information:

- length of name
- name
- pointer to linked list of text segments

The string 'name' is made the name of the form, and the form is hashed into a table of forms. The pointer is made to point to a single 'segment', which is a data structure consisting of:

- segment number
- text length
- text
- pointer to next segment

The text of this segment is the string 'text'; the segment number is zero; the pointer is null.

The call <SS,name,g1,g2,...,gn> is processed as follows. For each argument gi (which is a string), the following is performed: for each segment in the linked list of segments associated with the string 'name', whose segment number is zero, an attempt is made to match gi to a substring of the segment text; when a match is made, the segment is broken up into three segments, the first segment consisting of the text up to the match (with segment number equal to zero), the second segment consisting of the matched text (with segment number equal to i), the third segment consisting of the text after the match (with segment number equal to zero).

To expand a string in a call, an empty 'string' structure, which we shall call the expansion string, is created. Each segment in the linked list of segments associated with the string named by the call is examined in turn (i.e., the linked list is toured in order): if the segment number is zero, the text of the segment is added to the expansion string; if the segment number is i>0, the text of the ith argument of the call is added to the expansion string. The expansion string, after expansion is completed, is the value of the call.
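The form and segment structures just described can be sketched as follows. This is a minimal illustration in Python, not the thesis code: the function names are hypothetical, a Python list and dictionary stand in for the linked list and hash table, every occurrence of a gap string is matched, empty literal pieces are dropped, and a missing argument expands to nothing. The last three points are assumptions, since the text does not settle them.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    number: int   # 0 = literal text; i > 0 = gap filled by the i-th argument
    text: str


forms = {}  # table of forms: name -> ordered list of segments


def define_string(name, text):
    """<DS,name,text>: create a form holding one segment numbered zero."""
    forms[name] = [Segment(0, text)]


def segment_string(name, *gaps):
    """<SS,name,g1,...,gn>: split zero-numbered segments around each gi."""
    for i, gap in enumerate(gaps, start=1):
        if not gap:
            continue  # an empty gap string can match nothing useful
        result = []
        for seg in forms[name]:
            if seg.number != 0:
                result.append(seg)       # existing gaps are left alone
                continue
            pieces = seg.text.split(gap)
            for k, piece in enumerate(pieces):
                if k > 0:
                    result.append(Segment(i, gap))   # the matched text
                if piece:
                    result.append(Segment(0, piece))  # text around the match
        forms[name] = result


def expand(name, *args):
    """Tour the segments in order, substituting arguments into the gaps."""
    expansion = []
    for seg in forms[name]:
        if seg.number == 0:
            expansion.append(seg.text)
        elif seg.number <= len(args):
            expansion.append(args[seg.number - 1])
    return "".join(expansion)
```

For example, after `define_string("greet", "Hello, NAME!")` and `segment_string("greet", "NAME")`, the call `expand("greet", "Ann")` yields `"Hello, Ann!"`.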
3. Texture and Eureka

The details of an implementation of Eureka having been covered, there remains the question of how Eureka is to communicate with the text processor. The problem is not unlike that of communication between the scanner/parser and the semantics pass in a programming language processor. A body of text must be gone over in one phase of the processor, and handed on, in a pre-digested form, to the other phase.

There are several approaches to this problem, and choice between them is often largely a matter of circumstance and taste: in some cases the parser calls on the semantics at appropriate points in the parse (the semantics either produce a "parse tree" as a result, or may produce code directly); in other cases, usually in interpreters, the semantics will call on the parser to return the next unit of text, in an internal form, which can be interpreted; in yet other cases the two are sequential passes, the first making available to the second an internal form of the whole input text. A similar diversity of structures exists between parsers and scanners: there is no definite hierarchy, but in some situations one structure is more convenient than another.

In Texture, the situation is the following: the neutral text (the text which, in the scanning algorithm, is left on the neutral string) is the text of the document. It is this text, then, which must get to the document processor, to be formatted. Further communication is provided through those Eureka primitives which are concerned with document processing, those primitives which, for the user, define Texture.

It is important, however, that the communication between Texture and Eureka remain properly synchronized. It is not sufficient to let text gather on the neutral string and occasionally send it down to the document processor.
Consider, for example, the following:

some text <L> more text

If this appeared on the active string, and Eureka waited until it was all off the active string before sending the neutral string down to the document processor, the result would be incorrect. The line skip would be evaluated by Eureka long before Texture ever got to see the two words in "some text"; this means that the effect which the user presumably intended, by placing the <L> where it is, has failed to be accomplished.

Ideally, the two would be coroutines, Texture asking Eureka to resume whenever it needed more text, and Eureka asking Texture to resume whenever it had another 'neutral' character (a character, at the lowest nesting level, which it is about to put on the neutral string). This process can be simulated quite reasonably with a more conventional procedure structure than coroutines, however.

In the existing implementation, a top-level supervisor calls Eureka with the name of a 'producer' which will give Eureka a character out of the input stream whenever it is called; and with the name of a 'consumer' which will relieve Eureka of a neutral character whenever it has one. The producer is, of course, an input routine, and the consumer is the document processor. Eureka is further modified so that it recognizes Texture primitives along with its own standard primitives.

A non-obvious implementation problem in the interaction of Texture and Eureka arises from the fact that both need to push text back into the input stream, to be reprocessed. Eureka needs to rescan the text resulting from an active call, while Texture passes on to Eureka the text of, say, figures which need to be reprocessed, or the text of events, by pushing this text back into the input stream.
An easy trap to fall into is to assume that Eureka can maintain its own active string, independent of the input stream; when scanning, Eureka would then first exhaust the active string before turning to the 'producer' for more input text. However, if text on the active string causes, say, the triggering of an event, this approach will result in an incorrect order of evaluation, because Eureka will first finish scanning the active string, before getting the event text which Texture pushed into the stream. The simple solution is to have both Texture and Eureka push text onto the same backup stack (in the existing implementation, linking it onto the front of a linked list of such texts).

4. Texture and Text

What follows here is an outline of the way Texture, in the existing implementation, manages the text which Eureka passes down to it, character by character.

When Eureka calls Texture with a new character, Texture is in the process of building a word; out of these words it constructs segments; out of segments, lines; and out of lines, a virtual page, ready for output. This process is described below.

(a) Words

An incoming character is examined. If it is a word-break (such as a blank, or some character defined by <SET,c,BREAK>), the word is completed, and added to the current segment. If it is a special operator character, a flag is set in expectation of the next character. If it is an ordinary character, it is added to the end of the current word, and acted upon by any operators that may have preceded it (for example, if an overprint operator preceded it, the character is made an overprint of the last character in the current word).
When a word is finished, it is a data structure consisting of:

- word length
- text of word
- whether to be underlined
- number of overprints
- an array of structures consisting of:
  - overprint location (in word)
  - overprint character (at that location)
- pointer to next word

(b) Segments

A segment is the smallest unit of text that can be adjusted. A segment is terminated by a 'split', a 'tab' or the end of a line. The text within a segment is adjusted according to the instruction of the call which terminated the segment (a normal end of line would cause it to use the prevailing justification mode, a 'split' would cause a ragged left or ragged right setting, depending on which side of the split the segment was on). The result is a data structure consisting of:

- number of words
- pointer to linked list of words
- pointer to next segment

(justification data:)

- left margin
- number of filler characters between left margin and first word
- number of filler characters between words in first part of segment
- number of words in first part of segment
- number of filler characters between words in second part of segment
- number of filler characters after last word

(c) Lines

A line is completed whenever the latest word runs over the right margin of the current block. Whatever remains of the latest word after hyphenation is pushed back into the stream (in case end-of-line events cause any text to be inserted), and the words are gathered up into a segment, which is combined with other segments into a line.
A line is a data structure consisting of:

- left margin
- right margin
- pointer to linked list of segments
- pointer to next line

Lines are kept in a "virtual page" (readers with virtual memory computers should try to avoid confusion) which is an internal representation of the printed page. The virtual page is a vector of pointers (as many pointers as there are lines on the printed page; this is adjusted by the PAGE-DEPTH primitive, but will usually be 60). Each pointer points to a linked list of linked lists of lines. Each such list of lines contains lines which are to be printed on the same physical line, but will not overlap if put in the same buffer, thus saving on the total number of output operations.

Since a line conveys no information about adjustment, and no more information about location than a segment, it would be equally sensible to do away with lines and add segments to the virtual page. This would eliminate a data structure from the implementation, always a desirable thing when it can be done without compromising the clarity of the program. In my implementation, lines were intended for easy removal of keeps and figures from the virtual page, but if the "auxiliary virtual page" scheme suggested below is used there is no further need for lines.

(d) Blocks

Each time a line is added to the virtual page, the number of lines remaining in the current block is decremented by the number of spaces between lines plus one. When all lines allotted to a block have been consumed, the block is done, and the next block in the layout becomes active. When a block becomes active, new margins are set and any mandatory text (text associated with that block) is pushed into the input stream.
If the text was to be outlined with a "visible block", the top of the box is added to the layout as well.

(e) Pages

When the last block in a layout is done, the page is done. The page is output in the least number of operations possible (many installations have printers which recognize carriage control, such as line skips and double line skips; many installations also charge for lines printed as well as pages printed, and this can be a substantial saving). After the old page has been printed out, a new page is created, and the pending layout (which may be the same as the old layout) becomes the new active layout. The process starts over again, with the first word in the first line.

5. Keeps, Figures, Footnotes and As-is Text

(a) Nesting vs. End-markers

The reader will have discovered, in the previous chapter, that the four text-modes, keep, figure, footnote and as-is text, all have a similar command structure:

<command,$>

The '$' serves as a 'terminator' string, or end-marker. For example, a footnote might look something like this:

...text <FOOT,_|_> footnote text _|_ more text...

Why like that? Why not like this:

...text <FOOT,"footnote text'> more text...

which is, after all, more consistent with the structure of Texture. The reason is three-fold: economy, goof-proofing and readability. Economy, because the Eureka scanning algorithm must pass over the footnote text twice (once to read it in, and once to evaluate the arguments) before calling the Texture primitive FOOT; since the text must not be evaluated anyway, Texture might as well do the reading, and eliminate the second scan.
Goof-proofing, because the hasty or naive user is likely to forget to nest his text in quotes (" and ', by default), thus causing any Eureka calls to be evaluated prematurely -- and this can be disastrous if they are Texture primitives; there is, moreover, the possibility that his text contains the closing quote (') by way of an apostrophe, again with unpleasant results. Readability, because the text may run several lines, and the careful document writer will choose an easily distinguished end-marker, such as the "goal-post" in the example given above. When too deeply nested, moreover, Eureka programs become very difficult to read.

(b) Keeps and Figures

The problem with keeps and figures is that text, once processed, may not fit into the current block. If this happens, the results of processing must be cleaned out of the virtual page, and at some point later the keep or figure must be processed again. This implies the ability, of the processor, to "back up" and restore the state of the process to its condition before the keep or figure was entered. With most document processors, this is not a terribly complicated task, but in Texture, where an entire Eureka program may appear between two characters of text, the problem is somewhat magnified.

It will be necessary to save the "state" of the process at the point of invocation of the keep or figure. This state consists of a number of global settings and switches; it also consists of the internal representation of the output page -- the virtual page -- and of Eureka's string space. It is rather expensive to save these last two items for each keep, especially since, I would estimate, a solid three-fourths to four-fifths of all keeps fit on the first try.
As for Eureka's string space, it is probably more sensible to maintain an "undo list" which is updated each time Eureka's string space is altered inside the keep. This is very reasonable, because it is unlikely that the string space will be altered at all inside a keep.

Rather than saving the state of the internal layout, it is perhaps more reasonable to "set" the keep text in an auxiliary virtual page and, when the keep has been processed successfully, to merge this auxiliary virtual page with the virtual page intended for output. Should the keep fail to fit into the auxiliary virtual page, then the auxiliary virtual page is the only thing that needs to be released, and this is likely to be a simpler and less costly process than that of creating and freeing an entire copy of the virtual page.

The existing implementation proceeds as follows:

After a call for a keep or figure, the text up to the end-marker is read in, bypassing Eureka, and saved. The current environment is saved, and an "undo list" for changes to the Eureka string space is begun. Finally, an auxiliary virtual page is set up to receive the lines, and, for figures only, a new environment is started. The text is now pushed back into the input stream, terminated by an end-of-keep indicator or an end-of-figure indicator, and the appropriate text mode is activated by means of a 'flag'. Text is treated in the usual way, except that lines are going into an auxiliary virtual page.
For line-keeps (or line-figures): if the end-indicator is encountered before end-of-line, the keep-text is merged with any text already in the line; if end-of-line occurs before the end-marker has been encountered, text up to the end-marker is flushed from the input stream, processed text is removed (by removing the auxiliary virtual page), and the old environment is restored. If it was a keep, a line-skip is performed, and the text is again inserted into the stream, this time without an end-indicator. If it was a figure, it is saved in a queue of line-figures, and an indicator flag is set; on each subsequent line, the figure text is again released.

For block-keeps (or block-figures): if the end-indicator is encountered before end-of-block, the keep-text is merged with any text already in the layout; if end-of-block occurs before the end-marker has been encountered, text up to the end-marker is flushed from the input stream, processed text is removed (by removing the auxiliary virtual page), and the old environment is restored. If it was a keep, a block-skip is performed, and the text is again inserted into the stream, this time without an end-indicator. If it was a figure, it is saved in a queue of block-figures, and an indicator flag is set; on each subsequent block, the figure text is again released.

For page-keeps (or page-figures): if the end-indicator is encountered before end-of-page, the keep-text is merged with any text already in the layout; if end-of-page occurs before the end-marker has been encountered, text up to the end-marker is flushed from the input stream, processed text is removed (by removing the auxiliary virtual page), and the old environment is restored.
If it was a keep, a page-skip is performed, and the text is again inserted into the stream, this time without an end-indicator. If it was a figure, it is saved in a queue of page-figures, and an indicator flag is set; at each subsequent page, the figure text is again released.

(c) Footnotes

After a call, the text up to the end-marker is read in, bypassing Eureka. The current environment is saved and a new environment is started; the footnote text is pushed back into the input stream, and an auxiliary virtual page is set up. Text is processed as usual, except that lines go into the auxiliary virtual page. If the current block fills up, the next block is activated, and the process continues until the last block on the page is full. This means that the footnote is too large for the entire page, and part of it must go onto the next page. The current page is completed in the normal manner, and on the next page the remainder of the footnote (or as much as fits) is set.

If the convention is adopted that the user can specify how much text can go into each block before a footnote is released to it, it will become necessary to save the environment of the footnote after each block, and hold any remaining text, until a given number of lines of linear text have gone into the block. In this fashion, linear text and footnote do not get in each other's way.

(d) As-is Text

After a call, the text up to the end-marker is read in, bypassing Eureka. The text is modified to the extent of (1) placing a 'literal-next' operator before all 'special' characters, such as the word-break, the overprint operator and the Eureka-call opening bracket, and (2) putting a 'new-line' command (<L>) after every source line. This modified text is then released by being pushed back into the input stream.
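
The two-step rewriting of as-is text can be illustrated with a short Python sketch. The particular characters are assumptions made for the example: this section names the operators but not their symbols, so `LITERAL_NEXT`, the contents of `SPECIAL`, and the function name `as_is` are hypothetical. Only the `<L>` command and the `<` opening bracket are taken from the text.

```python
# Assumed symbols, for illustration only.
LITERAL_NEXT = "@"                        # hypothetical 'literal-next' operator
SPECIAL = {"_", "\\", "<", LITERAL_NEXT}  # assumed: word-break, overprint, call-open bracket
NEW_LINE = "<L>"                          # the new-line command named in the text


def as_is(source: str) -> str:
    """Return source text rewritten so the processor reproduces it line for line."""
    out = []
    for line in source.splitlines():
        # (1) put the literal-next operator before every special character
        escaped = "".join(LITERAL_NEXT + c if c in SPECIAL else c for c in line)
        # (2) end every source line with an explicit new-line command
        out.append(escaped + NEW_LINE)
    return "".join(out)
```

The result would then be pushed back into the input stream, so that block and page overflow are handled by the ordinary text path rather than by the primitive itself.
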

While this method may seem a little inefficient (why use the processor -- in particular, Eureka -- to accomplish something one could do by simply adding the text directly to the virtual page?), it is really quite reasonable. Eureka would have to look at the text anyway, and the only primitives it now has to call are the <L>s at the end of each input line. The cost of this call is minimal. The real advantage is in keeping the 'AS-IS' primitive so simple: if it were done any other way, the primitive would have to be concerned with blocks becoming full, pages becoming full, mandatory text having to be inserted (and the need for starting and stacking environments, which this implies); in short, all the things which the document processor will already do quite handily if Eureka passes down all the characters one by one.

Blocks and Layouts

Blocks and layouts are kept accessible by name, so that, when they are invoked, they can be easily found. Blocks are data structures consisting of the following information:

- name
- left column
- right column
- top line (or flexible)
- pointer to top-line dependency (if any)
- bottom line (or flexible)
- pointer to bottom-line dependency (if any)
- mandatory text (if any)

A layout is a data structure consisting of

- name
- pointer to linked list of blocks and layouts

At the start of each page, the 'pending' layout becomes the 'active' layout, which is a linked list of specs.
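
The block and layout records listed above might be declared along these lines in Python. This is a hedged sketch rather than the thesis's data layout: the class and field names, the use of `None` for a flexible boundary, the use of block names in place of pointers, and the `registry`, `define` and `flatten` helpers are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Block:
    name: str
    left_column: int
    right_column: int
    top_line: Optional[int] = None           # None stands for a flexible top line
    top_dependency: Optional[str] = None     # name of the block the top line depends on
    bottom_line: Optional[int] = None        # None stands for a flexible bottom line
    bottom_dependency: Optional[str] = None  # name of the block the bottom line depends on
    mandatory_text: Optional[str] = None


@dataclass
class Layout:
    name: str
    members: list = field(default_factory=list)  # Blocks and nested Layouts, in order


registry = {}  # blocks and layouts are stored by name so they can be found when invoked


def define(item):
    registry[item.name] = item
    return item


def flatten(layout):
    """Yield the blocks of a layout in order, expanding nested layouts."""
    for member in layout.members:
        if isinstance(member, Layout):
            yield from flatten(member)
        else:
            yield member
```

Flattening a layout in this way gives the ordered sequence of blocks from which the per-page list of specs could be built.
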
The specs contain, in order, the information important to each block in the active layout, and also:

- (1) minimum number of lines remaining in the block
- (2) current top line
- (3) current bottom line
- (4) whether any text has been put into the block yet

In blocks of fixed boundaries, only the first and fourth of these items are updated, but in blocks of dependent or flexible boundaries, either the second or third item will also be updated with each line. When the first item reaches a quantity so small that the combined line-spacing and a new line would not fit into the block, the block is automatically terminated.

Whenever a new block is entered, it is first checked for any dependent boundaries. If there are any, the dependency is worked out, and since the dependency must have preceded the block in the layout, the dependent boundary can now be fixed. The mandatory text is checked next, and if there is any associated with this block, it is released into the stream, terminated by an end-of-mandatory-text marker. A new environment is begun, and the old environment saved. If the end-of-mandatory-text marker is encountered before the block is full, the old environment is restored. If the block is full before the end-of-mandatory-text marker is encountered, the remaining mandatory text is flushed out of the stream, and the old environment is restored.

7. Environments

The exact contents of an environment have already been given in Chapter 3. Environments are data structures, which are stacked by being linked together (they will seldom be more than one deep, however). Whenever a footnote, figure or mandatory text is entered, a new environment is set up, and the old one stacked. The ACTIVATE command causes a copy of the named environment to replace the current environment.
The ENVIRONMENT command attaches a copy of the current environment to the given name; for this reason, environments, like layouts and blocks, are stored by name.

8. A Note on Primitives

It is worth noting, in a chapter on implementation, that the implementation of the document processing primitives described in Chapter 3 is, in a few cases, theoretically unnecessary. Aside from a number of primitives, like FIG and KEEP, which form an integral part of the design of Texture, there are a few primitives which exist largely for the convenience of the user, but are not really 'primitive' in the sense that they cannot be expressed in terms of other primitives. A simple example of this is the new-paragraph primitive, which is equivalent to

    <L><T,<ADD,<LI>,<PARAGRAPH-INDENT>>>

When it came to deciding on the primitives to be implemented, I chose as much as possible to implement only the lowest-level primitives, and leave out any which could be stated in terms of other primitives. This decision was not strictly adhered to when it came to functions like P, FOOTER, TITLE, etc., because it was important that a novice user should be able to produce documents without any deep understanding of Eureka, and how to program in it.

Another example is AS-IS, which can be written as a fairly complex Eureka program. This function was programmed as a primitive largely because of this complexity, and the inevitably higher cost of a Eureka version than a compiled version. Again, it was included for user-convenience: it is my belief that the use of 'AS-IS' is a last resort against the stupidity of a computer program such as this, and for this reason the AS-IS primitive is very simple-minded.
Users who desire more complexity (FORMAT, where some macros are and some are not expanded, and some commands and operators are and some are not obeyed, furnishes an example of such unorthogonal complexity which the demands of usage always seem to impose on us) are likely to be capable of writing the appropriate Eureka program.

One further program example might be the page-number function PN, which might be implemented as follows:

    <DS,$PN$,1>
    <define,PN,|n|,“<EQ,|n|,,“<$PN$>”,“<DS,$PN$,|n|>”>”>
    <HANG,PAGE,“<DS,$PN$,<ADD,<$PN$>,1>>”>

Notice that the string named '$PN$' is maintained to keep track of the page number, while PN is a function which either sets or returns the page number. At the top of each page, the value of $PN$ is raised by one.

The remaining functions which the user could implement himself are:

    P
    LTITLE
    RTITLE
    TITLE
    FOOTER
    WIDOW
    PARAGRAPH-INDENT (assuming P is implemented as given above)
    AS-IS

Chapter 5

Conclusions

"Let's talk of graves, of worms, and epitaphs."

This thesis has attempted to present the design and implementation of a document processor in a reasoned, well-structured fashion. From an enquiry into what has been said and done about the subject, it proceeded into a discussion of what might be expected of a document processor. The conclusions drawn from this discussion led naturally into abstract design, and abstract design into specifications for commands and their semantics. Finally, suggestions were given for a possible implementation, noting perilous places along the route, drawn from experience with a partial implementation.

Naturally it is impossible to describe the entire process of design, especially because such things, like most creative thought, occur in an irrational order.
It can only be hoped that the four preceding chapters reveal the underlying structure of this process, along with explaining satisfactorily the resulting design.

In the Introduction, I stated three main objectives to be accomplished in this thesis. It might be worth looking back, at this point, and trying to establish (from a necessarily somewhat subjective position) how close the results came to accomplishing the objectives.

The first objective was to examine existing runoff and typesetting systems in an attempt to recognize the basic functions involved in processing document text. Chapter 1 contains such an examination, doubling as a survey. I chose to examine in detail the approaches taken in the literature to the three areas of command language, layout control and justification, because it appeared that these were the vital areas in document processing.

It became apparent that command language was an area in which little had been done previously that was of any value. Designers of document processors have taken the view that any set of symbols which appears to express the functions performed by the document processor in some fashion will do the job. Some more recent systems, in particular HCS and CypherText, show that their authors understand the need for greater power of expression. From looking at the command languages of other document processors, it became apparent that it is necessary to accept the lesson from programming language design which teaches that simplicity and consistency are major factors in the human engineering of computer languages.
The section on justification, on the other hand, discloses that much thought has gone into this area (perhaps because it is relatively simple, but I think also because it is the first problem which document processors were intended to solve, and it is still the major one). There was therefore little need to attempt to improve on existing understanding, except to fit the pertinent concepts into the resulting document processor design.

The area of layout control, a very broad topic which inevitably covers everything else of importance in document processing, includes the areas of page segmentation, non-linear text (my umbrella term for text which is treated out of order -- or in some other order than that in which it appears in the source listing), and the communication between user and processor beyond the issuing of simple commands. Here, as was the case with command language, the later systems have shown more insight than the earlier ones. Very useful ideas came from HCS, in the area of user-processor communication (through the availability of query primitives and through a notion similar to that of 'events' in Texture) and page segmentation (a concept very similar to Texture's blocks). In the area of non-linear text, much was to be learned from TEXT360 and MTS FORMAT. The typesetter-oriented HCS and CypherText were of very little help, because few of the non-linear text functions (footnotes, figures etc.) were included in their design. Because Texture was intended to bridge the gap between what amounts to a relatively sophisticated use of computers (i.e., document processing) and a user who has not necessarily used computers in any other way before, it was important that concepts such as these -- intended to improve the user's power and ease of expression -- should be made a part of the design.

The second objective was to design a document processor which unified the concepts learned from this examination of earlier work into a system which was to be both powerful in the ability it afforded users to express themselves, and yet gentle on the novice who has only modest requirements.

Texture is a beginning to the process of unifying the various aspects of computer-assisted document editing. It combines, under one umbrella, a number of solutions to problems like page layout and page identification (headers, page numbers, etc.) and presents a unified approach to text adjustment. It proposes a few methods, such as the very natural extension of the command language into a text-oriented programming language, and the concept of programmable 'events,' which are intended to give the user fuller control over the processing of his document.

But it is only a beginning, because these concepts are not unified well enough. The idea of 'layout' as defined in Texture does not cover the full range of meaning attached to the word by an editor or typographer; nor do the various concepts such as 'figure', 'keep', 'footnote' and 'multiple column text' fit together into a unified process the way concepts like 'iteration', 'sequencing' and 'call' fit together in programming languages -- they continue to have the flavour of 'features,' necessary for the processor to be of value, but never satisfactorily a part of an overall design.

Can there be such a unified whole? If there can, I do not see it from here, and perhaps this is because it is too early in the investigation of the problems in document processing for such unity to become apparent. (It should also be remembered that printing itself has long been known as "the black art," and this does not refer to the ink alone.
Once again the analogy of document processor design to programming language design comes to mind: holy quests for the 'perfect' programming language, in which every programmer will produce elegant and easily understood programs, have thus far failed to produce anything other than yet more programming languages in which disciplined programmers produce disciplined programs and others do not. This is not to say that the quest should not continue: it may succeed.)

The third objective was to indicate how implementation of the design has proceeded and might better proceed in the future. I felt that this was important, because one of the big problems in software development is the time spent in implementation. Any time which can be saved by steering the implementor away from incorrect or potentially incorrect implementation decisions is time which can be spent on improving the implementation. My own experience has been that a very basic document processor can be implemented in a month or so, but that a complete Texture would require a good deal more time (several months, certainly; a fortunate thing is that the approach of implementing various aspects of the design as primitives in a consistent command language allows a full system to be built up in stages).

Too little attention is paid, in the literature, to the problem of giving thorough, but not overly detailed, accounts of implementation: no-one more than the author of a design can understand the inherent problems in implementing his design, and much grief could be saved if this understanding could be shared more openly.
Naturally it is to be understood that space limitations are far more stringent in journal or conference papers than in a thesis, and that corporate documentation is necessarily vague about implementation because its object is to sell a product, not to distribute designs.

In a sense, I hope that Texture and its cousins also form an end. Texture is a member of a class of programs that can best be described as 'batch-oriented'. That is, you feed something in at one end of the black box, and out the other end comes the result. While your morsels are inside the black box, you have no control over the operation -- you can only examine the results, modify the food and feed again. It is this kind of program which has led to the immortal phrase 'garbage in -- garbage out.'

Well, it is frustrating. And yet, indications from all over the place give us hope that it need not continue to be so. We are all aware that in recent times the rate of development of new hardware has far outstripped our ability to utilize it to its full potential. We have the imagination, certainly: witness only the Utopian visions of IBM television commercials, or endless new projects to use computers in daring, even revolutionary, ways, presented at every conference. What we lack is the skill or patience to instruct those great, stupid machines -- those electronic grandsons of eighteenth-century automata, able to do one special task so very, very well -- in the infinitesimal intricacies of action and decision which the human mind accomplishes without, so to speak, batting an eyelash.

Nonetheless, we are catching up to the hardware because, although machines are getting faster, smaller, more convenient to program, with bigger memories and smaller relative costs, they are not radically changing in concept; nor are the various peripheral devices. So what could we be doing with the present technology, or its even more delightful descendants, a few years on, to make document editing and processing even less of a chore?

Ideally, the author of a document should be able to sit down at a graphics terminal capable of displaying a comfortably sized part of his document, and designed to permit him to edit both the text of the document and its appearance. This implies that the document processor itself can be relatively simple -- no more sophisticated than small, unpretentious runoff programs are now -- because the author, not the processor, is making the formatting decisions as the document proceeds. If the author does not like the placing of a footnote, he 'pushes' it around with graphics aids like a 'mouse'. (How delightful it would be to draw a polygon around a piece of text, and move the polygon around, reshaping the text inside to one's heart's content -- now there's scissors-and-paste for the electronic age!) All this computing power should, of course, come from a hard-working little mini-computer dedicated to the user's needs and no-one else's; when you start asking for computing power like this, the effect of time-sharing is destroyed by an inevitably degraded response. The output media could vary widely: from tape (hopefully magnetic) for an easily accessible typesetting machine, to hard-copy of a slightly less 'finished' form, from a high-resolution dot printer to the good old typewriter for stodgy publishers who will accept nothing else.

"Yes, yes," one can hear the reader say, "and this also; and that as well!" However, we are going to have to wait a bit.
It is coming, certainly: projects such as FRESS [7;8] and NLS [27] are trying to achieve exactly such a facility, and more besides. But there are very few institutions with the money for such extravagant visions (and anything less would only be a disappointment, and therefore an annoyance), either in terms of hardware cost or software development time -- and the latter is rising steeply, even though the former is gently declining.

So we must do, for now, with Texture, its cousins and their possible descendants. They will serve quite well for now, and better as they are improved and the problems become more clearly understood. They have the advantage over the more Utopian systems of being easy to implement (a few man-months as opposed to a man-year for programming languages, and who knows how long for NLS-like systems), easy to fit into the operation of a small computing installation (all that is needed is a computer and input and output devices -- though preferably with a reasonable character set), and inexpensive to operate.

Bibliography

Surveys

[1] The Man-Machine Combination for Computer-Assisted Copy Editing
    Wayne A. Danielson
    Advances in Computers, 7 (1966), Franz L. Alt & Morris Rubinoff (eds.)
    Academic Press, New York, 1966; pp 181-193

[2] Computer-Aided Typesetting
    William R. Bozman
    Advances in Computers, 7 (1966), Franz L. Alt & Morris Rubinoff (eds.)
    Academic Press, New York, 1966; pp 195-207

[3] On-line Text Editing: A Survey
    Andries van Dam and David E. Rice
    Computing Surveys, vol 3, no 3 (September 1971); pp 93-114

[4] On-line Computer Text Processing: A Tutorial
    Richard C. Roistacher
    Centre for Advanced Computation, University of Illinois at Urbana-Champaign; CAC Document no. 82, August 15, 1973

Manuals and Descriptions

[5] Magnetic tape Selectric typewriter (MTST)
    IBM forms 543-0510-1, 543-0515, 549-0204 and 549-0700

[6] Astrotype
    Form #30
    Automatic Office Division, Information Control Systems, Inc.

[7] A Hypertext editing system for the /360 (HES)
    Steven Carmody, Walter Gross, Theodor H. Nelson, David Rice and Andries van Dam
    Pertinent concepts in computer graphics, M. Faiman & J. Nievergelt (Eds.)
    University of Illinois at Urbana, March 1969; pp 291-330

[8] FRESS (File Retrieval and Editing System) user's guide
    Text Systems Inc., July 1971

[9] Format - A Documentation Program
    Bill Webb
    University of B.C. Computing Centre, August 1973 (frequently revised)

[10] The FORMAT Program
    Gerald M. Berns
    IEEE Transactions EWS-11 (August 1968); pp 85-91

[11] Description of FORMAT, a Text-Processing Program
    Gerald M. Berns
    Communications of the ACM, vol 12, no 3 (March 1969); pp 141-146

[12] TEXT360 - Introduction and Reference Manual
    IBM Form C35-0002-0
    March 1969

[13] System/360 Administrative Terminal System (ATS) Terminal Operations Manual
    IBM Form GM20-0589-2
    April 1970

[14] LINCO III Automatic Typesetting System
    form UP-7683
    SPERRY RAND - UNIVAC

[15] A Special Purpose Computer for High-speed Page Composition
    Constantine J. Makris
    Proceedings Fall Joint Computer Conference vol 29 (1966); pp 137-148

[16] CypherText: An extensible composing and typesetting language
    C.G. Moore and R.P. Mann
    Proceedings Fall Joint Computer Conference vol 37 (1970); pp 555-561

[17] The IBM Selectric Composer
    G. A. Holt, A. Frutiger, B.T. Crtcher, D.E. Sederholm, H. Pijlman, G.E. Siemer, G.T. Slaughter, M. Prewarski, B.W. Miles, C.C. Wilson, C.N. Van Avery, N. Cail, J.S. Morgan, J.R. Norwood, R.D. Mathews, J.W. Spears, J.C. Rogers
    IBM Journal of R & D vol 12, no 1 (January 1968); pp 3-91

[18] Development of the IBM Magnetic Tape Selectric Composer
    D.A. Bishop, R.S. Heard, R.E. Hunt, J.E. Jones, R.A. Rathenkamp
    IBM Journal of R & D vol 12, no 5 (September 1968); pp 380-398

[19] PAGE-1 Composition Language - Reference Manual
    Form Rep. 73-06-003P
    Radio Corporation of America (RCA), January 1971
    (RCA has stopped marketing this product, and no longer supplies the documentation, but see [31])

[20] WYLBUR: An Interactive Text Editing and Remote Job Entry System
    Roger Fajman and John Borgelt
    Communications of the ACM vol 16, no 5 (May 1973); pp 314-322

[21] The Compatible Time Sharing System: A Programmer's Guide
    Section AH.9.01
    P. A. Crisman (editor)
    2nd Ed. MIT Press 1965

[22] An Online Editor (QED)
    L. Peter Deutsch and Butler W. Lampson
    Communications of the ACM vol 10, no 12 (December 1967); pp 793-799 and 803

[23] Harris Composition System - Language Manual
    Intertype (Harris Intertype Corporation)
    March 1970

[24] Multics Programmers' Manual
    Massachusetts Institute of Technology 1972

[25] Tenex User's Guide
    Bolt, Beranek, and Newman Inc. 1973

[26] PDP-11 Runoff
    L. Hade
    Digital Equipment Corporation
    3 October 1971

[27] NLS Output Processor's Guide
    Augmentation Research Center
    Stanford Research Institute

[28] A Computer-Assisted Page Composing System
    George Z. Kunkel
    Proceedings Fall Joint Computer Conference vol 29 (1966); pp 157-167

[29] Computerized Typesetting of Complex Scientific Text
    J.H. Kuney, B.G. Lazorchak, S.W. Walcavich
    Proceedings Fall Joint Computer Conference vol 29 (1966); pp 149-156

Related Articles

[30] Hyphenless Justification
    George E. Kunkel and Tilmon H. Marcum
    Datamation vol 11, no 4 (April 1965); pp 42-44

[31] Computer Photocomposition of Technical Text
    Franz L. Alt and Judith Yuni Kirk
    Communications of the ACM vol 16, no 6 (June 1973); pp 386-391

Macro Processors

[32] TRAC, A Text Handling Language
    C.N. Mooers and L.P. Deutsch
    Proceedings of the ACM National Conference (1965); pp 229-246

[33] Umist Description, Appendix B of MTS manual
    MTS-570-0 12-1-67
    pp 745-758

[34] A General Purpose Macrogenerator
    C. Strachey
    Computer Journal vol 8, no 3 (October 1965); pp 225-241

[35] The M6 Macro Processor
    Andrew D. Hall
    Bell Laboratories; Computer Science Technical Report #2
    12 April 1972

A Glossary of Terms

"Botticelli isn't a wine you juggins; Botticelli's a cheese!"

For the benefit of those readers who find some of the terms used in this thesis slightly obscure, a brief glossary is included here. Where there was a choice of several terms, my choice is explained by a brief note.

adjustment
    when referring to a text segment: the act of positioning the text in a segment, be it flush left, flush right, centred or justified.

argument
    additional or qualifying information in a Eureka function call (often known as a 'parameter'; the term 'argument' is most often used with reference to programming languages, while 'parameter' is used rather more loosely in referring to any qualifying information).

block (or text block)
    the rectangular region of physical page into which text is composed.

boundary
    with reference to text blocks: any one of the four sides of a block, but most often, as in 'flexible boundary', 'dependent boundary' or 'fixed boundary', with respect to the top or bottom line.

delimiter
    a programming language term referring to textual markers which separate the parts of a program. In Eureka, the opening and closing brackets (<, :< and >), the separators (,) and the quotation marks (“ and ”).

figure
    text which must be kept together, but may occur at some later point.
    That is, text which need not occur in-line with other text because it forms a self-contained unit. (These are also called 'floating keeps,' but I preferred 'figure' because it is shorter and more descriptive of the text's probable nature. I also felt it would be easier for the reader to keep clear the difference between figures and keeps than it would be for keeps and floating keeps.)

flag
    a programming term. A flag is a variable, usually Boolean, which indicates a condition, or mode (q.v.).

hyphenation
    a process of deciding whether and where to split a word across two lines when it is the last word in a line, and will not entirely fit on that line.

justification
    the procedure by which words and characters are spaced so that they fit the specified line width exactly, resulting in flush left and right margins.

keep
    text which must be kept together and, if it cannot be, causes a skip to the next text-containing unit (line, block, page).

layout
    a collection of blocks forming one page. Also, the format of a page as determined by the author or editor of the document.

letterspacing
    a typesetting term. Inserting thin spaces between the letters of a word so that it will take up more space in the line.

macro
    a programming language term. A macro is a string of text which may be referred to by name. Whenever it is referred to, this reference is replaced by the string. Parameterized macros are those in which certain parts of the string are replaced by the arguments (or parameters) of the reference.

mode
    state or condition of the process. The process may be in several modes at once, such as being in footnote mode (processing a footnote) and underline mode (underlining all words).

primitives
    the set of basic or fundamental operations in Eureka; those functions not defined by the user.

runoff program
    a document processor normally limited to producing typewriter-quality output for quick reading and editing.

segment
    in Texture: the smallest adjustable collection of text. A Texture segment is terminated by end of line, a tab or a 'split'; a new segment is begun whenever an old segment has been terminated.
    in Eureka: text between any two gaps.

string
    a sequence of characters.

typesetting program
    a document processor oriented toward commercial typesetting equipment, as a means of automating parts of the typesetter's job.

virtual page
    the internal representation of the page as it is to be printed when the page is done.

Credits

The quotations appearing throughout this thesis have the following sources:

After the title page: Richard Brinsley Sheridan (1751-1816); The School for Scandal I, i
Chapter 1: Louis Levinson
Chapter 2: Thomas Jefferson; Declaration of Independence of the United States of America (1776)
Chapter 3: William Shakespeare (1564-1616); Antony and Cleopatra II, vii
Chapter 4: Snorri Sturluson (1179-1241); The Prose Edda
Chapter 5: William Shakespeare (1564-1616); King Richard II III, ii
Glossary: Punch, or The London Charivari (1895)

"Even a fool, when he holdeth his peace, is counted wise: and he that shutteth his lips is esteemed a man of understanding."
- Proverbs 17:28
