UBC Theses and Dissertations

The role of exception mechanisms in software systems design Atkins, Margaret Stella 1985


THE ROLE OF EXCEPTION MECHANISMS IN SOFTWARE SYSTEMS DESIGN

by

M. STELLA ATKINS

B.Sc., Nottingham University (England), 1966
M.Phil., Warwick University (England), 1976

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES (Department of Computer Science)

We accept this thesis as conforming to the required standard.

THE UNIVERSITY OF BRITISH COLUMBIA
October 1985
© M. Stella Atkins, 1985

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department
The University of British Columbia
1956 Main Mall
Vancouver, Canada
V6T 1Y3

Abstract

Exception handling is a crucial aspect of practical programming, particularly in systems allowing logical concurrency such as multi-process distributed systems. First, a survey of existing exception handling mechanisms in operating systems is performed, which shows a diversity of implementations, depending on the process model and the method of inter-process communication. The thesis then develops a model for designing software which exploits the different mechanisms for handling normal and exceptional events. The model is applicable in many multi-process programming environments, and not only preserves modularity, but also enhances efficiency and reliability, while often increasing concurrency.
To derive such a model, exceptions in multi-process software are classified primarily according to the program level at which they are detected and handled. Server-to-client exceptions are of particular interest because of their ubiquity; these are exceptions detected by a server and handled by a client. The model treats systems programs as event driven, and proposes dividing the events into normal or exceptional, according to the cost and mechanisms for handling them. Techniques are described for designing software according to three criteria: minimising the average run-time, minimising the exception processing time, and incrementally increasing the program's functionality. Many examples are given which illustrate the use of the general model. Program paradigms in several languages and in several systems are introduced to model features which are system dependent, through illustrative examples for asynchronous i/o multiplexing, and for exception notification from a server to its client or clients. Finally, some programs which have been implemented according to the rules of the model are described and compared with their more conventional counterparts. These programs illustrate the practicality and usefulness of the model for diverse systems and concurrent environments.

Table of Contents

Abstract
List of Tables
List of Figures
Acknowledgement

1 Introduction
1.1 Introduction
1.2 What is an Exception?
1.3 Goals and Scope of the Thesis
1.4 Motivation
1.5 Synopsis of the Thesis

2 Exceptions in Multi-process Operating Systems
2.1 Introduction
2.2 General Exception Mechanisms in Operating Systems
2.2.1 UNIX
2.2.1.1 UNIX Processes
2.2.1.2 UNIX Signals as Exception Mechanisms
2.2.1.3 Intra-process Exception Mechanisms
2.2.2 Thoth
2.2.2.1 Thoth Processes
2.2.2.2 Process Destruction as an Exception Mechanism
2.2.2.3 Intra-process Exceptions
2.2.3 PILOT
2.2.3.1 Processes and Monitors in PILOT
2.2.3.2 Intra-process Exceptions in PILOT
2.2.3.3 Inter-process Exceptions in PILOT
2.2.3.4 Exceptions within Exception Handlers
2.2.4 The Cambridge CAP System
2.2.4.1 CAP Processes and Protected Procedures
2.2.4.2 Intra-process Exceptions in CAP
2.2.5 RIG
2.2.5.1 Exceptions in RIG
2.2.6 Medusa
2.2.6.1 Processes in Medusa
2.2.6.2 Exceptions in Medusa
2.2.6.3 Internal Exceptions in Medusa
2.2.7 Remote Procedure Call
2.3 Mechanisms for Interactive Debugging
2.4 Exception Handling in Action
2.4.1 Inter-process Cooperation
2.4.2 Asynchronous Events -- Terminal Interrupt
2.5 Summary

3 Principles for Improving Programs through Exception Mechanisms
3.1 Classification of Exceptions
3.1.1 Inter-process Exceptions
3.2 Model for Systems Design
3.2.1 Objective A: Minimise the Average Run-time
3.2.2 Objective B: Minimise the Exception Processing Time
3.2.3 Objective C: Increase the Functionality of the System
3.3 Examples Illustrating Minimising the Average Run-time
3.3.1 Partition the Event Set into 2 Groups to Reduce Detection Costs
3.3.1.1 Spell Example
3.3.1.2 Putbyte Example
3.3.2 Reduce Handling Costs by Restructuring Programs
3.3.2.1 Os Example
3.3.2.2 Verex Name-server Example
3.3.2.3 Newcastle Connection Example
3.3.2.4 The TROFF/TBL/EQN Example
3.3.2.5 Point-to-point Network Protocol Example
3.3.3 Reduce Context-saving Costs in Normal Events
3.3.3.1 Read-back Example
3.3.3.2 LNTP Example
3.4 Examples Illustrating Increasing the Functionality of the System
3.4.1 Using Inter-process Exceptions from Server-client
3.4.1.1 ITI <BRK> Example
3.4.2 Ignorable Exceptions
3.4.2.1 ITI Reconnection Example
3.4.2.2 Window-size-change Example
3.4.2.3 Hash Table Example
3.4.2.4 Slow Graph Plotter Example
3.4.2.5 Mesa File-server Example

4 Program Paradigms for Implementing System Dependent Features
4.1 Program Paradigms for Inter-process Exception Notification
4.1.1 The KEYBOARD/MOUSE Example
4.1.2 Exception Notification in a Synchronous Message-passing System
4.1.3 Exception Notification in Procedure-oriented Concurrent Languages
4.1.4 Exception Notification in Message-oriented Concurrent Languages
4.1.5 Exception Notification in Operation-oriented Concurrent Languages
4.1.6 Exception Notification in Distributed Systems
4.2 Program Paradigms for Ignorable 1:many Exceptions
4.2.1 The Storage Allocator Example
4.2.2 Storage Allocator Using Unreliable Broadcast Notification
4.2.3 Storage Allocator in Mesa
4.2.4 Storage Allocator in Medusa
4.2.5 Storage Allocator in V-kernel
4.3 Exception Handlers as Processes: Mechanisms for Mutual Exclusion

5 Programs Illustrating the Model
5.1 Exceptions and Protocol Implementations -- a New Approach
5.1.1 The ITI Protocol
5.1.2 The Interrupt Protocol
5.1.3 The ITI Implementation in a Thoth-like Operating System
5.1.3.1 Notification of Inter-process Exception Using Process Destruction
5.1.3.2 Notification of Inter-process Exception Using a Toggle Switch
5.1.3.3 ITI as a Filter and an Exception Process
5.1.4 Problems with Synchronisation
5.1.5 Recovery after network failure
5.1.6 Summary
5.2 A Nested Transaction Mechanism for Atomic Data Updates
5.2.1 Introduction
5.2.2 Overview of the ARGUS Model
5.2.2.1 Atomic Objects
5.2.2.2 Nested Actions
5.2.2.3 Guardians
5.2.3 Implementation Considerations
5.2.3.1 Introduction
5.2.3.2 Requirements for correct implementation
5.2.4 Features of the Implementation
5.2.4.1 Transaction Invocation
5.2.4.2 Atomic Data Access
5.2.4.3 Subaction Commit
5.2.4.4 Top-level Commit
5.2.4.5 Transaction Aborts
5.2.4.6 Transaction Read/Write Conflicts
5.2.4.7 The Group Leader's Role
5.2.4.8 Group Leader's TOP-COMMIT Algorithms
5.2.4.9 Group Leader's ABORT Algorithms
5.2.5 Efficiency of Mechanisms

6 Conclusions
6.1 Summary of results
6.1.1 Characterize the Nature of Exceptions
6.1.2 Model for Systems Design
6.1.3 Program Paradigms for Systems Design
6.1.4 Program Examples
6.2 Further Work

References

Appendix A. The putbyte program
Appendix B. The os program
Appendix C. The TROFF/TBL/EQN system

List of Tables

3.1 Execution Costs for a General Event Manager
5.1 Number of SendReceiveReply Messages on <BRK>
5.2 Comparison of Locus and V-kernel Nested Atomic Actions
A.1 Execution Costs for the Putbyte Routine
B.1 Execution Costs for the Os Utility

List of Figures

2.1 Thoth Server-client Notification using a Victim Process
3.1 Level 1 Classification of Exceptions
3.2 A Classification of Inter-process Exceptions
3.3 Level 1 Model for Systems Design
3.4 Level 2 and 3 Model to Minimise the Average Run-time
3.5 Model to Minimise the Exception Processing Time
3.6 Model to Increase the Program's Functionality
3.7 Expected Execution Costs for Different Values of np and n
3.8 Restructuring the Verex Name-server to Reduce Ipc in the Normal Case
3.9 Restructuring Newcastle Connection to Reduce Ipc in the Normal Case
4.1 Thoth Server Notification Using Two Worker Processes
4.2 Thoth Switch Notification
4.3 Levin's Storage Allocator
4.4 A User of Levin's Storage Allocator
4.5 Storage Allocator Using a Monitor and Broadcast Exceptions
4.6 Pool-server in V-kernel
4.7 A User and Exception Handler of the V-kernel Storage Allocator
5.1 ITI Implementation as a Server
5.2 Network Packets to be Exchanged on an Interrupt
5.3 Process Switches to Handle an Interrupt Using Process Destruction
5.4 ITI as an I/O Filter
5.5 Process Switches on <BRK> with a Separate Exception Process
5.6 Formation of a Version Stack with Nested Transactions
5.7 Group Leader Communication Structure
B.1 The Initial Version of Os
C.1 Pipe Worker Processes
C.2 Message Traffic for Handling an Equation

Acknowledgements

Thanks to Son Vuong, my supervisor, for his support during the last few years, and to my thesis committee of Sam Chanson, Doug Dyment, Paul Gilmore and Mabo Ito, for their active encouragement and advice on the presentation of this thesis, and to Roy Levin for his constructive comments and criticisms.
Thanks to David Cheriton, who started my investigation into exceptions, and who gave continued inspiration and advice in the first few years while he was at the University of British Columbia. Thanks to the many graduate students and staff in the department who have helped me through discussions and criticisms, especially to Ravi Ravindran, Steve Deering and Vincent Manis, and to my colleagues Rachel Gelbart and Lisa Higham for their continued support. Many thanks to Roger Needham and the staff and students at the Cambridge Computer Laboratory for their friendly support and use of their facilities, and thanks also to the staff at Simon Fraser University for letting me finish. Finally, thanks to Derek, my husband, and to my children, for their patience.

CHAPTER 1
Introduction

1.1. Introduction

A crucial aspect of practical programming, particularly in systems allowing logical concurrency such as multi-process distributed systems, is exception handling. Over the last few years some operating system and programming language structures have been developed to aid in the handling of exceptional events. The developments are interrelated because of the recursive nature of computer systems: the operating system must be written in a programming language, and the program must be run by an operating system.

The various exception handling mechanisms that have been proposed for programming languages all aim to separate the common from the uncommon case, and thus to help programmers separate their concerns by preserving modularity. In general this is achieved by handling an exception at a higher level of abstraction from where it is detected. This is reflected in most embedded programming language mechanisms, where handlers for exceptions are searched for along a path directed up the so-called procedure-call hierarchy. Of course, the burden of having to write the code for the uncommon case cannot be eliminated.
The claimed methodological advantage of embedded exception handling mechanisms is the separation of such code from the main-line program, both in the time of its writing and in its location in the program text. Similarly, in operating systems software, the facilities for handling exceptional events have been developed for modularity in programming. Many modern operating systems are structured as a hierarchy of processes, where it is possible for a process to control subordinate processes while itself being subject to the control of another. For modularity it is desirable to provide the same exception handling facilities at every level in the hierarchy.

However, in operating systems, performance and reliability are additionally crucial to success, and the exception mechanisms must also be designed to enhance the overall system performance and reliability. For example, both the user and the operating system require mechanisms for handling errors — otherwise an erroneous program which never halted could only be stopped by switching off the power supply! In modern concurrent operating systems, where a program may consist of several processes which may be running on separate processors, the need for such mechanisms is even more acute. As Nelson remarks in his thesis on Remote Procedure Calls [Nelson 81]:

"Machinery for aborting processes and forcing asynchronous exceptions may seem somewhat violent to the gentle programmer of sequential abstractions. But realistic remote procedure environments and, indeed, realistic applications will in fact require this machinery to provide robust services and exploit the full performance of distributed systems."

The user requires a mechanism to allow him to terminate program execution arbitrarily, and such a mechanism is typically invoked by hitting the terminal <BRK> key.
The operating system must respond to such an event by terminating the offending program, reclaiming the resources that were allocated to it, and generally attempting to return to the state which existed before the program was run. Such activity may be considered as an example of exception handling in operating systems.

This thesis studies not only the problems which designers of concurrent operating systems have to overcome in handling exceptional events such as the <BRK> described above, but also the problems of systems design exploiting exception mechanisms in general, in a multi-process environment. The high level language exception mechanisms are usually engineered so that their use is economic only when they are rarely exercised; the very circumstance in which an error of logic is likely to remain undetected [Black 82]. Furthermore, none of the high level language mechanisms extend to multiple processes (except for an exception ABORT); they are concerned with exceptions detected and handled within a process. The structures which are available for handling exceptional events in a multiple-process environment are specific to the particular environment; most are not portable to other systems, and they do not pretend to be of general use.

This research extends the development of linguistic exception mechanisms which deal with intra-process exceptions to mechanisms in a multi-process environment; in particular to event driven programs used in operating systems. An event driven program is one which responds to events which may be generated externally to the computer system (such as by the user hitting the terminal <BRK> key), or by other processes within the computer system (such as an i/o completion event or a timer firing).
Peterson and Silberschatz state that operating systems are event driven programs [Peterson 83], in that if there are no jobs to execute, no i/o devices to service, and no users to respond to, an operating system will sit quietly, waiting for something to happen. We use the term event driven more widely, to include system servers (or monitors) which are structured hierarchically, with a client, or clients, making requests of a server below, which in turn may use the resources of another server (or may make a nested monitor call). We also view some data-driven programs as event driven programs -- in these cases, the events correspond to the value of the data received.

The program units which respond to these events are called event managers. In some cases, the events occur randomly, with unknown probabilities; in others, the probability of an event is deterministic. In event driven programs where the relative probabilities of the events are known, we can say that the event managers execute some small percentage of the code to handle the normal or most probable events, and the rest of the code is executed to handle the unusual events. Now it is computer folklore that 90% of most program code is executed 10% of the time. It is therefore important to structure such event managers so that the normal case is clearly differentiated from the unusual cases, so the programmer can separate his concerns, writing efficient code for the normal cases which are executed most of the time.[1]

This thesis develops principles for writing efficient event managers; principles which exploit different mechanisms for handling normal and exceptional events. The principles are used to derive a model for designing software, applicable in multi-process programming environments, which not only preserves modularity, but also enhances efficiency and reliability, and often increases concurrency.

1.2. What is an Exception?
[1] But note that in systems where the probability of an event is not known a priori, such structuring may not be possible.

Exceptions have often been considered as errors, although most designers of exception handling mechanisms claim that their mechanisms, while suitable for dealing with errors, are also valuable for other events. Unfortunately this guideline is of little use, because the term error has as many interpretations as exception.

Black devotes a chapter of his thesis Exception Handling — the case against [Black 82] to the topic What are Exceptions? He tries out phrases like an exception occurs when a routine cannot complete the task it is designed to perform (from the CLU reference manual
" exception - a person, thing, or case to which the general rule is not applicable" -- and this is the spirit in which we use it, strenuously avoiding the connotations it has as a synonym for error, observing that errors are a strict subset of  exceptions.  (At least, we hope that errors are unusual eventsl). We therefore define an exception as a rare event which is treated differently to a normal event. This thesis claims that if an event can be defined as being unusual in most contexts, then it can be considered an exceptional event. This definition agrees closely with Levin's for the problems presented in this thesis. However, it is accepted that in some problems where the context changes, the exception events could become the norm. For example, a system resource-manager may receive one request for almost all its resources, and for the next week it may be running in the exceptional  mode of being short of resources. If this situation  is judged  6  to be exceptional, then it can still be claimed that there are enough resources for most requests in the normal mode of operation for that resource-manager. If however, the probabilities of the events are unknown or very variable, then the best solution may be to treat all events in the same way, and many of the design proposals in this thesis are not applicable in such situations. It transpires that there are several cases encountered in computing practices, particularly in operating systems, which are not generally considered as exceptions, but which also fall into this category under the dictionary definition. For example, program initialisation code which is executed only once and hence usually does not follow the general rule. Similarly, opening a file before it can be read or written is an essential part of the program execution, but the execution of this event occurs with a low frequency compared with execution of other events on the file, such as reading and writing to the opened file. 
An exception mechanism is composed of three parts: detection (which may be by hardware or software), notification, and handling. Notification is the act of signalling the occurrence of an exception to interested parties; in a multi-process system one or more processes may be involved. Together, the detection and notification may be considered as drawing attention to the exception event. Exception handling is defined as the execution of some actions in response to the exception notification. All these components of exception mechanisms are discussed in detail in the thesis.

When, then, should we use exceptions? For example, when the End-of-File has been read, is it treated as an exception or as a normal, but relatively infrequent, event? There is no clear demarcation between these viewpoints, but they may require different programming styles, and may be implemented by different mechanisms which may incur different overheads at compile-time and run-time. Our research addresses these issues in detail, and shows that generally an exception-handling mechanism should be used when a programmer wishes to improve clarity of the normal-case computation, provided it does not incur unreasonable costs or impair reliability. By using an exception mechanism we attempt to reduce cost by making the normal case simpler and faster.[2]

1.3. Goals and Scope of the Thesis

The thesis of this dissertation is that designing systems software to exploit the use of exception mechanisms leads to efficient modular programs which are also easy to write and to understand. Our goals are to show how to design efficient and modular systems programs exploiting exception mechanisms. This thesis does not include discussion of hardware reliability; this topic has been adequately reviewed in another survey paper [Randell 78].
We do not discuss in detail the exception mechanisms embedded in high level languages, as Black has covered this in his thesis, although these mechanisms are referred to throughout the text. Issues of proof of program correctness are also considered beyond the scope of this thesis, although correctness issues are referenced whenever applicable.

To achieve the goals, some general principles for exception mechanisms are needed — principles which have universal application across languages and systems. The systems we consider are event driven, where the relative probabilities of the events are known a priori. In such systems it is possible to define what is an exceptional event. But context is not the only factor in designating what is an exceptional event -- the language/system in which computations are expressed also influences what is an exception. Therefore the languages/systems in which exceptional events occur must also be considered. Only then can generalities be made about what constitutes an exceptional event, and how best to handle it.

[2] Again with due regard to the fact that should the operating context change, the previous exception events may become normal case with corresponding changes (sometimes a dramatic increase) in run-time.

Thus the first goal is to characterize the nature of exceptions. We do this by considering a wide range of systems problems from different areas of systems design; in particular, event managers such as system servers, communication protocols and data-driven system programs.[3] A general classification of exceptions is then derived.

The next goal is to provide a model for designing efficient and modular software in event driven programs in a multi-process environment, by exploiting exception mechanisms.
This goal is achieved by considering three different criteria for program design: minimising the average run-time, minimising the exception-handling time, or increasing the program's functionality in a modular way. A model is derived which provides system design guidelines for these three criteria.

Because of the diversity of languages/systems, the final goal is to provide program paradigms which employ the design principles propounded in the general model, in particular for the system dependent features of the model. The program paradigms may then be used as models for implementation in different environments. To fulfill this goal, several programs are described in detail, and two of the design models which have been implemented, together with their exception-handling mechanisms, are described to show their applicability.

The operating system Verex [Lockhart 79], a derivative of Thoth [Cheriton 79a], was used to perform some of the research, and also TRIPOS [Richards 79b], implemented on the Cambridge Ring local area network [Wilkes 80]. Later, the UNIX 4.2BSD operating system [Joy 83] was used, implemented on VAX computers and SUN workstations connected over an Ethernet local area network [Metcalfe 76]. Thus the designs were tested on different real systems, to show how requirements of adequacy and practicality were met.

[3] Note that we do not discuss in detail the inner kernel features which are driven by hardware interrupts; our research is concerned at a higher level of abstraction than that provided by a raw kernel.

1.4. Motivation

The major motivation is that problems are encountered in managing the numbers and types of special cases and unusual circumstances when writing software for concurrent systems, and there are few, if any, answers to these problems. As programs grow in size and complexity, special cases and unusual circumstances are compounded. Controlling and checking for exceptions takes more and more effort.
With the advent of parallel programming, distributed systems and computer networks, not only the number of exceptions increases, but also the type of possible exceptions. Exceptions can be within sequentially executed programs, between tasks sharing memory but operating in parallel, and between tasks executing in parallel which do not share any memory.

Research on this subject is needed because there are very few, if any, exception-handling techniques implemented for multi-process programs that preserve structural clarity, and that have simply defined semantics so that the techniques are easy to use and understand. In support of this claim, the authors of one multi-process system, PILOT [Lampson 80], written in the high level language Mesa [Mitchell 79], comment that:

"Aside from the general problems of exception-handling in a concurrent environment, we have experienced some difficulties due to the specific interactions of Mesa signals with processes and monitors. In particular, the reasonable and consistent handling of signals (including UNWINDs) in entry procedures represents a considerable increase in the mental overhead involved in designing a new monitor or understanding an existing one."

Motivation from my own experiences came when writing software for Verex, a multiple-process operating system. It was desirable to trap errors in programs so that they could then be conveniently debugged instead of immediately destroyed. An error-handling mechanism which optionally diverted program termination by handing control to the interactive debugger was implemented by the author. It transpired that the mechanism could be used to handle more than erroneous situations. In considering just how and where the mechanism would be used, many problems of exception-handling had to be addressed.
For example, should the program unit raising the exception be allowed to resume from the place where the exception was raised, or should the exception handler terminate execution of the current program unit? Is it desirable, or even feasible, to use the same mechanism to handle exceptions in sequential code, exceptions between parallel tasks sharing memory, and between parallel tasks in a distributed system which have no shared memory? How does interactive debugging fit into the scheme? It is the aim of this thesis to answer some of these important issues.

The main contribution of my thesis is in the characterisation of exceptions in the context of multi-process systems software, and in the development of a model and program paradigms which improve system programs by exploitation of exception-handling mechanisms -- either by making the programs more efficient at run-time, and/or by making the program code more modular and reliable. My work also considers the use of exception mechanisms in high level languages in developing systems software and application programs, and attempts to unify the different approaches taken by message-based and procedure-based systems.

1.5. Synopsis of the Thesis

In Chapter 2 exception handling mechanisms in several recent multi-process operating systems are presented. Most of these operating systems use different constructs for exceptional events within a process and between processes. These in turn depend on the character of the process which the operating system supports and the inter-process communication mechanism. Thus the discussion of exception mechanisms in operating systems includes comparisons of various process and inter-process communication concepts, and also includes a discussion of the various mechanisms available for interactive debugging, as these are often claimed to be useful for exception handling.
Two typical problems in operating systems design are then presented, and current solutions given for which exception mechanisms have been used. In Chapter 3 a taxonomy of exceptions is derived from these examples, including a new class of ignorable exceptions. A model for systems design is then propounded, based on the classification. Many examples are given to illustrate the model. In Chapter 4 program paradigms in different languages and systems are described for example problems. These paradigms are based on the general model, but they employ system dependent features such as the server-client inter-process exception notification mechanism. Chapter 5 describes implementations of two systems programs which were designed according to the model, showing the suitability and practicality of the approach. Chapter 6 concludes with a summary of the thesis, and considers further areas for systems design methodology which use exception mechanisms.

CHAPTER 2
Exceptions in Multi-process Operating Systems

2.1. Introduction

The history of programmed exception-handling has been reviewed by Goodenough [Goodenough 75] and Levin [Levin 77], both of whom made new proposals for exception-handling in high level language design. Since 1977, there has been more emphasis on formal exception-handling in both high level language and multi-process operating system design. It is worthwhile to make a review of multi-process operating systems, because the nature of their exception mechanisms depends on the process model and on the inter-process communication mechanism supported by the operating system.

A multi-process operating system is defined as a system which provides facilities for multiple concurrent user processes which behave as the active entities of the system. It is possible for one process to create another process dynamically. A process may acquire, release and share resources, and may interact, cooperate or conflict with other processes.
In multi-process operating systems the designers have to deal with the problems of synchronisation and deadlock. Synchronisation is needed, both for mutual exclusion (because of sharing) and for sequencing (to impose event-ordering for cooperation among processes). The survey article Concepts and Notations for Concurrent Programming by Andrews and Schneider [Andrews 83] is taken as a guide to the review. However, Andrews and Schneider confine their discussion to concurrent high level languages whereas we are also concerned with concurrent operating systems. They divide the synchronization primitives into two groups: those which rely on shared memory, and those based on message passing. The operating systems discussed fall into several different classes, including systems from both the above groups.

(1) The UNIX time-sharing operating system [Ritchie 74] was developed at Bell Telephone Laboratories. It is a multi-user system which allows each user to create multiple processes to run in parallel, and contains a number of novel features such as pipes which have made it very popular. It is written in the high level language C [Ritchie 78]. UNIX-style operating systems have a large kernel supporting user programs through use of system calls. A null system call typically takes 0.6 millisecs on a VAX 11/750; a read or write system call takes at least 2.5 msecs. Processes are large and there is a high overhead on process creation. We also consider distributed versions of UNIX such as LOCUS [Popek 81] and Newcastle Connection [Brownbridge 82], where several UNIX hosts are connected over a local area network.

(2) Thoth was designed as a portable multi-process real-time operating system at the University of Waterloo [Cheriton 79a]. The aims of Thoth [Cheriton 79b] were to achieve structured programming through use of many processes, both for concurrent and sequential tasks.
Various facilities are provided to make this structuring attractive: small, inexpensive processes, efficient inter-process communication by messages or shared memory, and dynamic process creation and destruction. Thus, the Thoth style is to split certain sequential programs into many cooperating processes. Process switching typically takes 0.5 msec on a SUN workstation. Derivatives of Thoth are Verex [Lockhart 79], [Cheriton 81], and PORT [Malcolm 83]. There are distributed versions: Vsystem [Cheriton 83a, 83b], where an identical kernel resides on hosts connected over a local area network, and Harmony [Gentleman 83], where multiple 68000 microprocessors are connected by a Multibus.

(3) Object-oriented operating systems such as PILOT, designed at Xerox PARC for the personal computer environment [Redell 80] and written in the language Mesa, rely on shared memory, and use monitors for synchronisation of processes. The overheads on monitor calls are only a little more than for procedure calls, but there is no direct mechanism for general inter-process communication between arbitrary processes. PILOT is a single-language, single-user system, with only limited features for protection and resource allocation. PILOT and Mesa are mutually dependent; PILOT is written in Mesa and Mesa depends on PILOT for much of its run-time support. Thus Mesa monitors were chosen as the basic inter-process communication paradigm for the processes of PILOT, as described by Lampson and Redell in [Lampson 80]. For distributed object-oriented operating systems, a remote procedure call mechanism is appropriate [Nelson 81], [Spector 83].

(4) CAP [Herbert 78] is a multi-user single-processor machine built at the University of Cambridge for research into capability-based memory protection [Wilkes 79]. The machine is controlled by a microprogram which provides the user with a fairly conventional instruction set, together with the support of a capability-based addressing and protection scheme.
In this protection scheme, the fundamental notions are capability and object. A capability identifies (i.e. names) some object, and possession of a capability allows some access to the object it names. If the object contains executable code it is called a protection domain. The programs of CAP are protected from each other; they are mutually suspicious subsystems (in contrast to those of PILOT). Most of the CAP operating system is written in ALGOL68C [Bourne 75]. Users may not perform much multi-tasking in the CAP system, but these protected systems present peculiar problems with respect to process creation and destruction, which warrants their inclusion in this discussion.

(5) RIG [Lantz 80] is a distributed message-based multi-process operating system developed at Rochester University. It differs from Thoth-like systems in that it has no shared memory between processes. RIG uses special Emergency messages for exception handling.

(6) Medusa [Ousterhout 80], [Sindhu 84] is a unique operating system designed for a particular multiprocessor hardware configuration, Cm*, at Carnegie-Mellon University. Like RIG, it introduces several novel approaches to exception management and is thus included here.

(7) Remote Procedure Call (RPC) is also discussed briefly, as Nelson in his thesis Remote Procedure Call claims it can be used as a programming language primitive for constructing distributed systems, although it has no special exception mechanisms.

General exception mechanisms are discussed for all these types of multi-process operating systems. Interactive debugging is often cited as a use for exception mechanisms, so the various mechanisms available for achieving interactive debugging are described next. Exceptions occurring in two typical example problem situations in operating systems are then described. In many operating systems supporting multiple processes, interaction between processes often takes the form of a server-client relationship.
The server process typically manages a shared resource, which is requested by client processes, either locally or remotely. To the client, the request for service appears like a synchronous procedure call; the call returns when the request is satisfied, and the results may be passed as parameters. For example, in most message-based operating systems, the host's files are managed by a file-server process. If the client processes are viewed as cooperating together, as is the case for a single-user workstation environment, special communication may be required to enhance the performance. The first example discusses the management of files by cooperating processes. When the server provides access to input-output devices (i/o devices), the desired communication may be more complex than that provided by a synchronous procedure call. For example, the terminal server may wish to communicate to the client that the user has pressed the interrupt key (<BRK>). But the client may be outputting to the terminal at that moment, and may not be reading from the device. Such asynchronous events do not fit into the procedure call paradigm, and alternative solutions to handle such events must be found. The second example describes how different operating systems allow the user to handle such asynchronous events.

2.2. General Exception Mechanisms in Operating Systems

2.2.1. UNIX

2.2.1.1. UNIX Processes

UNIX has a structure similar to the classical 2-level operating systems where the kernel contains nearly all of the operating system code. A user process executes either in kernel mode or in user mode. A UNIX user may have many processes which do not share data with one another. In kernel mode, a process uses the kernel stack and data space, which is common to all processes operating in kernel mode.
Thus for execution of i/o, a user process will switch modes via a system call from user to kernel, allowing access to shared system data. Synchronisation is required between these so-called kernel processes to avoid conflict. This is achieved by various mechanisms such as raising the priority of a kernel process executing critical code to avoid interrupts until it has tidied up, or by setting explicit locks on shared resources. This simple way of achieving mutual exclusion using a non-preemptive scheduler works for a single processor system but would be inappropriate in a multi-processor system. Kernel processes communicate through the shared data of the kernel. In UNIX Version 7 user processes may communicate by creating a pipe, which provides a buffered stream of characters. If one process attempts to read from an empty pipe it is suspended. Thus pipes are not suitable for reporting asynchronous exceptions such as user interrupts, because of their blocking characteristics. In UNIX 4.2BSD [Joy 83], there are facilities for inter-process communication through use of sockets, although there is a high overhead on this communication. For example, a Send-Receive-Reply from one process to another on the same machine through a socket takes 10 msecs on a VAX 11/750.

2.2.1.2. UNIX Signals as Exception Mechanisms

A user process may receive notification of an exceptional event by a signal. A signal is a binary flag, which works like a software interrupt. A process that has been sent a signal will start to execute code corresponding to the signal when it is next activated by the operating system. The default action of all signals is to stop the receiving process.
(In UNIX Version 7 sockets are not available, though pipes are available in both UNIX Version 7 and UNIX 4.2BSD; signals are the only direct inter-process communication mechanism in Version 7.)

The signal mechanism is very simple: each process has a fixed number (16) of ports at which a signal may be received. The user can specify different treatment of a signal by associating a port with a procedure (i.e. a signal handler). A signal is usually generated by some abnormal event, initiated either by a user at a terminal, by a program error, or from another process. One signal, SIGINT, is usually sent to the process running a command when the user presses <BRK> on the terminal. The user can choose to ignore the signal, or execute some clear-up code in the signal handler before terminating. One signal, SIGKILL, cannot be ignored or handled. Thus a process receiving SIGKILL is forced to terminate immediately, with no chance to tidy up at all. To prevent a user from arbitrarily killing processes belonging to a different user, the user-id of the process issuing the SIGKILL signal must be the same as that of the target process. Although the signal mechanism is simple to implement and is very powerful in its generality, it has the disadvantage that, like a hardware interrupt, a signal can occur whenever a process is active, leading to non-deterministic execution. The common way of dealing with this asynchronous event is for the signal handler to set a global flag and then to resume execution at the point of interruption. The process inspects the flag when convenient. This involves two stages in the signal handling, which can lead to errors while handling multiple occurrences of the signal. Furthermore, all signals have the same priority, so multiple signals are processed non-deterministically. Finally, a process receiving a signal cannot determine which process sent it, so their use is necessarily restricted to well-defined cases.

2.2.1.3.
Intra-process Exception Mechanisms

When a process has received a signal, the signal handler may wish to restore the process to some previously known state. A UNIX process may use the C language library functions setjmp and longjmp, which provide a very basic method for passing over dynamic program levels, equivalent to a non-local GOTO. setjmp(env) saves its stack environment in env for later use by longjmp. longjmp(env,val) restores the environment saved by the last call of setjmp. It then returns in such a way that execution continues as if the call of setjmp had just returned the value val to the function that invoked setjmp.

The feature is really only intended as an escape from a low-level error or interrupt. This mechanism provides no opportunity for procedures which were active but have now been terminated (passed-over procedures) to clear up any resources they may have claimed. However, it is cheap and easy to implement. The mechanism has been enhanced by Lee [Lee 83], who defines macros so that the C language appears to be extended with exception-handling operations, similar to those incorporated into the ADA language [US 80].

2.2.2. Thoth

2.2.2.1. Thoth Processes

Each process belongs to a team which is a set of processes sharing a common address space and a common free list of memory resources. Each process on a team also has its own stack and code segments, like coroutines. Processes which share no address space are said to be on different teams. Individual processes on a team are globally addressable, and may communicate via messages. Inter-process communication is achieved through fully-synchronized message passing; concurrency is achieved with multiple processes, one for each event, rather than with a non-blocking Send, which would use buffered messages.

2.2.2.2.
Process Destruction as an Exception Mechanism

In Thoth, the idea is to have a separate process for each possible asynchronous event such as an i/o interrupt, and to ensure that it executes fast enough to be ready for subsequent interrupts. Each event is synchronous with respect to the process that responds to it. Global asynchrony of the total program is achieved by execution of its multi-process structure. The claim is that the synchrony of each individual process makes the program easier to understand. All asynchronous communication is handled by process destruction, as follows. First, each process that encounters a fault or exception, or that is the subject of an attention, is destroyed. Second, if the program is to continue running after one of these conditions, the part to remain is designated as a separate process from the process to be destroyed. Processes which are blocked awaiting messages from, or sending messages to, a process which has been destroyed are awakened immediately; the death of a process is detected synchronously when another process attempts to communicate with it.

2.2.2.3. Intra-process Exceptions

No special mechanism exists for handling within-process exceptions in Thoth, and the Zed language in which it was written [Cheriton 79c] provides no special features for exception-handling. Processes incurring exceptions are destroyed so their state does not need to be remembered.

2.2.3. PILOT

2.2.3.1. Processes and Monitors in PILOT

PILOT supports two types of processes -- tightly-coupled processes which interact through the shared memory of a monitor (these are similar to the processes on a team in Thoth), and loosely-coupled processes which may reside on different machines, communicating via messages passed over a network. New processes are created by a special procedure activation which executes concurrently with its caller.
Such a new process has its own local data, but may also share data with the parent process. Synchronisation and inter-process communication of these processes are achieved through monitors with condition variables [Hoare 74]. A monitor acts as a basic mechanism for providing mutual exclusion, as only one process may be executing code in a monitor entry procedure at a time. Thus access to a shared resource is usually managed through the entry procedures of a monitor. So a monitor is like a server process in a message-passing operating system. Synchronisation and multiplexing of client processes accessing a shared resource are provided by explicit use of condition variables rather than by queues of messages.

2.2.3.2. Intra-process Exceptions in PILOT

The Mesa language provides extensive exception-handling facilities in sequential code. Exceptions are propagated through the calls hierarchy until a handler is found. The root procedure of a Mesa process has no caller; it must be prepared to handle any exceptions which can be generated in the process. Uncaught exceptions cause control to be sent to the debugger; if the programmer is present he or she can determine the identity of the errant procedure. Unfortunately this takes considerable time and effort. The interaction between the Mesa exception-handling and the PILOT implementation of Mesa processes has been found to be irksome [Lampson 80]. A notable limitation of the signalling mechanism is that an exception signal cannot be propagated from one process to the process which created it.

2.2.3.3. Inter-process Exceptions in PILOT

No special form of inter-process exception is provided in Mesa. It is difficult to communicate exceptional events using monitors. For example, if a pair of processes are communicating through a monitor and one dies, there is no means for the remaining process to be notified.
Instead, a timeout interval may be associated with each condition variable, and a process which has been waiting for that duration will resume regardless of whether that condition has been notified. Interestingly, PILOT was originally designed to raise an exception if a timeout occurred. It was changed because programmers preferred the less complicated special timeout mechanism for simple retry situations, rather than employing the intra-process Mesa exception mechanism to handle such retries.

2.2.3.4. Exceptions within Exception Handlers

A further complication in the interaction between Mesa monitors and the exception handling mechanism arises when an exception is generated by an entry procedure of a monitor. There are two ways of dealing with this.

(1) A SIGNAL statement will call the appropriate exception handler from within the monitor, without releasing the monitor lock (as the monitor invariant might not be satisfied at the time of the SIGNAL statement). This means that the exception handler must avoid invoking that same monitor again, else deadlock will result. Also if the handler does not return control to the monitor, the monitor entry procedure must provide an UNWIND handler to restore the monitor invariant. Mesa automatically supplies the code to release the monitor lock if the UNWIND handler is present, but if the entry procedure does not provide such a handler, the lock is not automatically released, leading to the potential for further deadlocks.

(2) Alternatively, the entry procedure can restore the invariant and then execute a RETURN WITH ERROR statement which returns from the entry procedure thus releasing the monitor lock, and then generates the exception.

However, neither mechanism is checked at compile time so their misuse is possible.

2.2.4. The Cambridge CAP System

2.2.4.1.
CAP Processes and Protected Procedures

CAP supports a hierarchy of processes based on the master coordinator, which is a single protection domain roughly equivalent to the UNIX kernel. Any process can dynamically set up a sub-process by executing the enter-subprocess instruction. A process may call a procedure in another protection domain during execution, if it has a special ENTER capability for it. These procedures, accessible behind ENTER capabilities, are called protected procedures (PPs) and provide both protection and modularity. Each protection domain has its own stack and virtual address space. Changing protection domains is roughly equivalent to a Supervisor call in UNIX, or making an entry to a monitor in PILOT. Protected objects, such as a file directory, are managed by PPs, which may be entered by any process possessing an appropriate capability. Although it is always entered at the same place, a PP usually examines its private data structures left by the previous call, and makes further decisions on the basis of what it finds. In the CAP operating system, each PP is written as a complete ALGOL68C program, not as a procedure. A special run-time library has been written to allow PPs to have a relationship very similar to a coroutine structure. The domains are fully protected from one another except for capabilities which are explicitly passed as arguments.

A CAP process is composed of the various PPs independently compiled, which can then be bound together to form a system image. A certain amount of dynamic linking of new PPs into a process is provided, but creation and deletion of new processes is a non-trivial task (cf. Thoth and PILOT). During execution, the CAP process crosses protection boundaries, behaving somewhat like a UNIX process making calls to privileged system routines and changing its mode from user to kernel.
However, the UNIX system is only 2-level, whereas there can be many levels of PP calls in CAP. The CAP operating system PPs are required to provide several kinds of services, the majority being concerned with gatekeeping [Needham 77] (i.e. control of access to a service or resource). One example is in calls to the master coordinator. In the CAP operating system, critical sections which may not be interrupted are executed only in the master coordinator process. Access to the ENTER-COORDINATOR instruction is pre-checked by a gatekeeper PP called ECPROC, running within the caller's process. Alternatively, some services are provided in dedicated systems processes, rather than as PPs in the user's process -- such processes correspond roughly to the classic server process model. Access to the service is controlled by a PP in the user's process. Thirdly, some system-wide services such as message buffers are managed by a PP which has exclusive capabilities for the data structures. Processes wishing to access buffers must request the PP to do so on their behalf.

Inter-process communication is through messages held in buffers which are dynamically allocated from a fixed pool. They are accessed through objects called message channels. A PP exists for setting up message channels; it checks software capabilities and establishes a communication path between processes. A successful call to this procedure results in one or more capabilities for more primitive message passing procedures such as SENDMESSAGE, which is one of the entries to ECPROC, with arguments specifying the (software) capabilities for the message channels. ECPROC performs any transfer of data or capabilities which may be required, and then makes a call to the master coordinator if any scheduling action is required.

2.2.4.2.
Intra-process Exceptions in CAP

CAP has a primitive fault-handling mechanism which causes a simulated ENTRY to a particular protected procedure called FAULTPROC in the user's process. FAULTPROC examines the state of the process and takes special action on virtual memory faults and linkage faults, eventually returning to retry the failing instruction. For other faults, it alters the stacked program counter of the faulting PP to a fixed value, and returns to it, having stored useful information about the fault in a fixed place. It also sets a flag to cause a further fault when the original PP itself does a RETURN, similar to the mechanism that is used to propagate asynchronous attentions. The code at the fixed address can examine its environment, including the information stored away by FAULTPROC, and decide what to do. Faults which have not been dealt with by a PP are propagated back to the calling domain, with an appropriate change in semantics on crossing the protection boundary. For example, the fault limit violation incurred by a procedure is distinct from called domain suffered a limit violation and took no corrective action.

2.2.5. RIG

2.2.5.1. Exceptions in RIG

In RIG, internal exceptions (i.e. within-process exceptions) are handled by a procedure call oriented mechanism, based on two library procedures in the implementation language BCPL [Richards 79a]. Errorset and Error allow an error notification to propagate back up the calls hierarchy to a designated point. The program state at the point where the exception is raised is lost. Thus RIG does not provide for program resumption. Errorset is a function that accepts a severity level and a procedure as arguments. Error accepts a severity and error code as arguments. If Error is called, the call stack is unwound to the point of the most recent Errorset with a severity level equal to or greater than the severity of the error.
The error code is then returned to the caller of Errorset, which can then attempt to recover from the error. Calls to Errorset may be nested. This mechanism is quite powerful, although, as Lantz himself remarks, provisions should be made for associating handlers with particular exceptions, similar to Levin's or those of CLU. The RIG inter-process exception mechanism is more unusual, as it employs a new message type -- an emergency message. An emergency message is delivered with highest priority, ahead of any other messages queued for the receiving process, and will cause a blocked process to be awakened. When an emergency message is received, the emergency handler associated with the process is invoked. RIG adopts a server-type process structure for its operating system utilities, in contrast to Medusa which uses a shared object concept. In RIG, a process has to explicitly register interest in another process before it will receive emergency messages from it, whereas in Medusa, users of shared objects are automatically notified of exceptions through the backpointers stored with the shared objects.

Emergency messages are typically generated by event-handlers, although they may be generated by any process. An event handler is a process that is capable of detecting, or will always be informed about, the occurrence of a particular kind of event. In general, a process, PA, must register with an appropriate event handler that it wishes to be notified when a particular event, EPB, occurs in process PB. When EPB occurs, the event handler notifies PA via an emergency message. One particular event with which all processes are concerned is the death of other processes and machines with which they are communicating. The RIG Process Manager acts as an event handler for suicide, crash or suspension. The Process Manager relies on the kernel to notify it whenever a process dies.
The Process Manager then sends the appropriate emergency message to all interested parties. The relatively simple approach of RIG to emergency handling has several advantages: it is cheap to implement; there is only one handler per process, so the code is easier to read; and there is less non-deterministic program execution than in UNIX. The disadvantages of this approach are that there is only one handler for all emergencies, and the internal exception handling mechanisms provide no means for implementing exception handling in a high level language such as ADA [US 80], so a separate mechanism must be used by each language compiler. Further, there is still some non-deterministic execution on receipt of an emergency message, as the process is awakened from either Send or Receive states.

2.2.6. Medusa

2.2.6.1. Processes in Medusa

In Medusa, cooperating processes, called activities, are grouped into task forces which form the fundamental unit of control. All the operating system utilities are task forces. Processes in both Medusa and RIG communicate by passing messages; in Medusa, processes on the same task force may also use shared memory. A task force is thus similar to a Thoth team; the difference is that Medusa activities are scheduled to run in parallel, whereas in Thoth, the principle that only one process on a team executes at a time is crucial in the design philosophy.

2.2.6.2. Exceptions in Medusa

In Medusa, whenever an exception is detected, it is sent to a single exception-reporter utility. This central utility acts as a clearing-house for the reporting of exceptions to handlers. Thus although the detection of exceptions can occur at any level, the reporting is encapsulated in a single utility. This provides a uniform reporting mechanism for all exceptions, both inter-process exceptions and within-process exceptions, regardless of their origin.
The predefined internal exceptions are divided into about a dozen reporting classes. One class is floating point overflow; another is execution of unimplemented instruction. There are eight other classes for external (inter-process) exceptions. Each activity may nominate a different handler to deal with each of these predefined reporting classes. The exception reporter utility also provides functions for adding user-defined exceptions to this list. We include a discussion of Medusa's internal exceptions here; external exceptions are described later, in Section 4.2.3.

2.2.6.3. Internal Exceptions in Medusa

Four different types of handlers may be used for internal exceptions.

(1) By default, an internal exception is handled by the parent of the activity's task force -- the parent has limited access to the state of the activity and can resume it if desired. This method is commonly used as Medusa's recovery mechanism. For example, many small programs will be invoked from the user's command interpreter and will not deal with exceptions at all; the command interpreter will kill the exception-generating task force and will output a message on the user's terminal.

(2) The handler may be specified as an out-of-line handler, which is equivalent to the address of an interrupt routine. This type of handler is expected to be used as an entry into the reporting mechanisms of high-level languages, which may then propagate exceptions through abstractions in programs.

(3) An in-line handler is invoked only when the activity checks explicitly for an exception occurrence -- no special report of the exception is made.

(4) Medusa also provides the ability for an activity to name a buddy activity in the same task force to handle any specified exception class. When the exception occurs, the buddy receives access to all the private information of the exception-generating activity, which is suspended.
Thus the buddy can be used to help in interactive debugging, and in many other situations where remote handling is essential for recovery. One handler may also activate another handler if it is unable to deal with an exception (e.g. an out-of-line handler may activate the task force's parent, by defining extra exceptions to the exception reporter utility).

The notion of a buddy activity to handle exceptions is a departure from the procedure-oriented mechanisms previously discussed. Its chief advantage is that the handler can reside on a separate processor, thus protecting the handler code from the errant processor. There is also a saving of space occupied by the handler on the errant processor, and it is useful for recovery operations after certain computer limitation exceptions such as stack full.

2.2.7. Remote Procedure Call

RPC has been proposed as a high level language mechanism for the synchronous transfer of control between programs in disjoint spaces whose primary communication medium is a narrow channel. Nelson details two designs for RPC in his thesis [Nelson 81]. The first assumes full programming language support and involves changes to the language's compiler and binder, while the second involves no language changes, but uses a separate translator -- a source-to-source RPC compiler -- to implement the same functionality. The RPC paradigm is suitable for tasks with master/slave relationships. However, RPC offers only that subset of general inter-process communication provided by message-based operating systems involving hierarchic control structures; the maintenance of a dialogue between peers having symmetrical control relationships is not easy. This is because in an RPC environment, control is passed from the requesting process when the server process is called, and is returned when the server has completed processing the request. The concept of asynchronously interrupting an executing server process is counter to the RPC paradigm.
RPC alone is not sufficient for convenient programming in a distributed computing environment, because the server cannot make out-of-order replies to clients. (By out-of-order we mean that at any instant a server may have accepted more than one client request, and the server does not necessarily reply first to the least recent client request.) RPC is the only inter-process communication mechanism in the concurrent high-level language ADA [US 80] (to which references are made throughout this thesis).

2.3. Mechanisms for Interactive Debugging

Interactive debugging operations such as step-by-step traces are often cited as a use for inter-process exception mechanisms. However, programmed exception handling cannot usually be used for interactive debugging, because access to the local variables of a routine by another part of a program or by another process is usually prohibited by scope rules. Thus special mechanisms must be employed for interactive debugging. Several of these mechanisms are now described.

Many commercial time-sharing systems provide a debugger, such as DEC's VAX-11 DEBUG utility [Beander 83], which uses special mechanisms. VAX DEBUG is strictly an object program debugger which has access to the symbol table of the user's compilation units. DEBUG uses VAX/VMS exception handling mechanisms to provide the needed control over user programs, such as stopping execution at breakpoints. The main feature of this mechanism (described fully by Beander) is that breakpoints are converted to intra-process exceptions which are caught and handled by DEBUG instead of being propagated up the calls hierarchy.

UNIX's software interrupt mechanism has been extended to provide a powerful but inefficient mechanism whereby a parent process can monitor the progress of one or more child processes.
These tracing facilities can be used for interactive debugging and include the ability for the parent to inspect and even modify values within the data area of the child process. The child is traceable only if it gives its permission by explicitly executing a system call. Then, every time the child process encounters a software interrupt, the parent process is given the opportunity to intervene. The child may be blocked indefinitely if the parent ignores it. A better scheme would involve a more efficient transfer of control between parent and child, and would operate only if both parent and child mutually agreed that tracing should commence.

In Verex we have developed a unique process-oriented exception mechanism, which relies on the existence of an exception-server process. Whenever an active process incurs an execution error (i.e. an error detected by the hardware, such as illegal instruction or illegal address), it is blocked and a kernel routine is invoked which forwards the message buffer to the exception-server as if it came directly from the offending process. The exception-server can inspect the status of processes that send messages to it, and can detect those that have incurred errors (the operating system sets a flag). The exception-server has the power to control the offending process — either by destroying it, or by forwarding it to any other interested parties, such as the debugger. The first handler for all errors is the exception-server; to avoid bottlenecks, the exception-server creates a separate team to handle each exception request. At present, the team created has full power to pry into user and system data areas. The advantage of this approach in a distributed system is that it is often easier to handle an exception condition at a point that is logically distinct from the site where the exception was raised.
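The Verex arrangement — errors forwarded as messages to an exception-server, which dispatches a separate handler per request — can be sketched with threads and a queue standing in for the kernel's message forwarding. All names here are hypothetical illustrations; Verex itself is a message-passing system, not a thread-based one.

```python
import queue
import threading

# Illustrative analogue of the Verex exception-server pattern described
# above: a stand-in for the kernel forwards an error message to the
# exception-server as if it came from the offending process, and the
# server dispatches a separate handler for each request to avoid
# bottlenecks. (Names are hypothetical, not Verex code.)

exception_server_inbox = queue.Ueue() if False else queue.Queue()

def kernel_forward(offender_id, error):
    """Stand-in for the kernel routine that forwards the error message."""
    exception_server_inbox.put({"from": offender_id, "error": error})

def handle_error(msg, handled):
    # A real system might destroy the offender or hand it to a debugger.
    handled.append((msg["from"], msg["error"]))

def exception_server(handled, n_requests):
    """First handler for all errors; spawns a separate handler per request."""
    workers = []
    for _ in range(n_requests):
        msg = exception_server_inbox.get()
        t = threading.Thread(target=handle_error, args=(msg, handled))
        t.start()
        workers.append(t)
    for t in workers:
        t.join()

def run_demo():
    handled = []
    kernel_forward("proc-7", "illegal instruction")
    kernel_forward("proc-9", "illegal address")
    exception_server(handled, n_requests=2)
    return sorted(handled)
```

The key structural point is that the offending "process" never calls its handler directly: the error travels as an ordinary message to a logically distinct handling site.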
The Verex error handling mechanism could obviously be extended to handle exceptions as well as errors, by programming a RAISE statement to send an appropriate message to the exception-server. At present we use this mechanism only to terminate or debug the erroneous process.

In Medusa, the mechanism for debugging and tracing, called MACE, is separate from the exception reporting mechanisms already described in Section 2.2.6. MACE executes as a single task force with just one activity, and it is located on a dedicated LSI-11 processor. MACE has its own simple terminal i/o and exception handling facilities, so it need not rely on any of the other utilities; it is therefore almost crash-proof. The other utilities notify MACE of breakpoints using a pipe process; MACE can restart such broken activities when the operator desires.

The author implemented an interactive remote debugger [Atkins 83a] for programs written in BCPL running on machines connected over a Cambridge Ring [Wilkes 80]. Most of these machines have the Ring as their only peripheral, and use it to communicate with terminals and discs. Hence a resident debugger is of limited use, as many machine crashes will cause contact with the terminal to be lost, rendering the debugger inaccessible just when it is most needed. However, as a computer's Ring interface allows a remote machine to read and write words of its memory, and to halt, start and reset it, it is possible to move the debug program to another machine. The debugger can inspect and modify data structures directly; simple communication can be achieved by polling fixed memory locations in the target machine. This relies on very little software working in the target machine — just enough to signal that an abort has happened and issue a message. The remote debugger acts interactively to read and update memory in the remote machine, which may be unaware that it is being examined. It also handles traps and aborts in the remote machine.
It is thus a multi-event program, awaiting either a character from the user at the keyboard, or a signal from the remote machine. These events are asynchronous. This debugger is very successful, allowing new machines to be installed on the Ring in only a few days. Standard coroutines are used to implement the debugger because of their low set-up cost and low run-time overhead on coroutine switching. However, special mechanisms are used for signalling traps and breakpoints.

In Mesa, any exceptions which have been propagated up the calls hierarchy to the root procedure are then passed to the debugger, as described in Section 2.2.3.2. Thus the only operating system which uses the same mechanism for debugging and exceptions is Mesa; and it has been observed that in Mesa, the major reason for raising an exception is to invoke the debugger, so the exception handling facilities embedded in the language are secondary to the use of exception detection.

Thus in general, the mechanisms for debugging are not suitable for programmed exception handling, and so the thesis is concerned mainly with the general exception mechanisms described in the previous sub-sections of Section 2.2.

2.4. Exception Handling in Action

2.4.1. Inter-process Cooperation

An interesting approach to how servers may notify clients of exceptional events is described by Reid et al. [Reid 83] for the Mesa file system. In the Mesa file system, client processes are viewed as cooperative, and to achieve this cooperation they support file-sharing. If one process wishes to use a file in a way that conflicts with the way that a second process is using the file, the process that is using the file may be asked to relinquish it. For example, if a process wants to write a file being read by another process, the process that is reading the file is asked to stop. Also, a process may ask to be notified when a file becomes available for a particular use.
However, the processes that share files need neither communicate explicitly, nor know one another's identities. We assume that such notification does not happen frequently, and hence can be regarded as an exceptional event. Cooperating processes are used in the design of the Xerox Development Environment, an integrated multi-window programming environment, to support sophisticated tools such as windows that load themselves with the new version of an error log each time it changes.

To achieve the notification, clients provide call-back procedures, by passing them as procedure parameters to the file-system monitor (which acts like a server). For example, a client can ask the file system monitor to notify it whenever a file becomes available for some particular access, as the client might want to be awakened when the file is available so it can try again. The procedure AddNotifyProc is called to register such a request with the file system, and the procedure RemoveNotifyProc is called to remove it. When the file system determines that the conditions have been satisfied, it calls the NotifyProc passed in as the parameter. The system has to guarantee that when a client is notified, it can indeed acquire the file for its desired access.

There is nothing new in using procedure parameters to provide call-back facilities from subprogram to caller [Black 82]. However, the novel way this has been used in providing inter-process communication for a monitor-based operating system is worthy of further examination. The main use of this mechanism is to allow cooperating clients to execute a PleaseReleaseProc whenever the file-server with which they have registered a NotifyProc needs to use the same file for another process.

However, the authors state that there are difficulties with this mechanism. First, the client must be prepared to have its call-back procedures invoked at any time.
This may cause subtle synchronisation issues in the inter-process communication between the client, the file system, and (indirectly) other clients. Next, there are difficulties inherent in writing multi-process programs; as a means of communication, the call-back procedures expose these difficulties. Note that clients need not master the subtleties of call-back procedures to use the file system. They can choose instead not to cooperate in their use of files, using a system-provided PleaseReleaseProc that always returns no. Often, tools are first written with little or no cooperation and they gradually evolve to allow more cooperation. Third, since many clients may be calling it simultaneously, the file system must lock some of its internal data structures while it calls the client-provided PleaseReleaseProc or NotifyProc. Although essential to preserving the consistency of data structures and to providing some guarantees on its behaviour, this means that there are file system operations that cannot be invoked from a PleaseReleaseProc or NotifyProc without causing deadlock; these include Release. If the PleaseReleaseProc calls one of these procedures, the process will deadlock on the file system's monitor for that file. If it must call one of these procedures, it must fork another process to perform the call and must not wait for that process to complete, since the process will not complete until the PleaseReleaseProc returns. The return value later from a PleaseReleaseProc may indicate that a process has been forked that will release the file.

We consider how to implement such a set of cooperating processes in the Thoth environment.
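Before turning to Thoth, the call-back registration interface described above for the Mesa file system can be sketched as follows. This is a single-process analogue: function values stand in for Mesa procedure parameters, the names mirror those in the text (AddNotifyProc, RemoveNotifyProc), and the internal structure is an assumption for illustration, not Mesa code.

```python
# Sketch of call-back registration in the style of the Mesa file system
# described above. Function values stand in for procedure parameters;
# the internal structure is hypothetical.

class FileSystemMonitor:
    def __init__(self):
        self._notify_procs = {}   # filename -> list of call-back procedures

    def add_notify_proc(self, filename, notify_proc):
        """Register a call-back to run when `filename` becomes available."""
        self._notify_procs.setdefault(filename, []).append(notify_proc)

    def remove_notify_proc(self, filename, notify_proc):
        self._notify_procs.get(filename, []).remove(notify_proc)

    def file_released(self, filename):
        """Invoke registered call-backs when a file becomes available.
        A real file system must guarantee that the notified client can
        actually acquire the file, and must beware of deadlock if it is
        still holding internal locks when the call-back runs."""
        for proc in self._notify_procs.pop(filename, []):
            proc(filename)

def demo():
    fs = FileSystemMonitor()
    events = []
    fs.add_notify_proc("error.log", lambda name: events.append(("retry", name)))
    fs.file_released("error.log")
    return events
```

Even this toy version shows the structural hazard the authors describe: `file_released` calls client code while inside the file system, so any re-entrant call from the call-back back into the file system risks deadlock in the monitor-based original.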
As servers cannot Send to clients which have made Send requests to them (by convention, for security and deadlock-avoidance [Gentleman 81]), some other means has to be provided for notifying clients of the exceptional event PleaseRelease.

The drastic approach would be for each client to nominate another process, the victim, which the server kills when it wishes to notify a PleaseRelease request. A fourth process, the client's vulture, awaits the death of the victim (usually by executing a Receive Specific from it), and is notified by the operating system when this occurs. The vulture can then notify the client by a Send request. This arrangement is shown pictorially in Figure 2.1.

The disadvantage of this mechanism is that the vulture and victim processes may be required in certain applications merely to provide exception notification to the client. An advantage of this mechanism is that the server does not block when it kills the victim, so no deadlocks occur during the message exchange. Another advantage is that the client receives the notification (via the vulture) when the client executes a Receive — thus the notification is synchronous with respect to the client's execution. Furthermore, it always works, and its use is now well-understood in writing systems code.

Figure 2.1. Thoth server-client notification using a victim process. (1. Server destroys victim; 2. operating system awakens vulture; 3. vulture informs client; 4. client may request more information; 5. client reinstates new victim; 6. client informs server about new victim. Arrows denote Sends and operating-system actions.)

If the victim process can serve another role, such as a timer or worker whose state is not needed if the exception occurs, the overheads are not too great.
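The notification sequence of Figure 2.1 can be imitated with threads: an event stands in for process destruction, and the vulture's blocking Receive Specific on the victim becomes a join. This is only an analogue of the mechanism, not Thoth code.

```python
import queue
import threading

# Thread-based analogue of the vulture/victim notification of Figure 2.1.
# "Destroying" the victim is modelled by setting an event; the vulture's
# blocking Receive Specific on the victim is modelled by join().

def run_vulture_victim_demo():
    victim_killed = threading.Event()
    client_inbox = queue.Queue()

    def victim():
        victim_killed.wait()              # exists only to be destroyed

    def vulture(victim_thread):
        victim_thread.join()              # awaits the death of the victim
        client_inbox.put("PleaseRelease") # 3. vulture informs the client

    v = threading.Thread(target=victim)
    v.start()
    w = threading.Thread(target=vulture, args=(v,))
    w.start()

    victim_killed.set()                   # 1. server "destroys" the victim
    notification = client_inbox.get()     # client receives it on its Receive
    w.join()
    return notification
```

Note that the victim's death carries no data — just as in the text, it is only a binary signal, and the client must go back to the server for the details.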
When the client receives a message from the vulture that the victim has died, the client must recreate the victim (for future notifications), Send to the server to establish the new victim, and Send to the server for more details (as the victim's death serves only as a binary signal of the event's occurrence) before finally executing the PleaseReleaseProc code.

The Thoth solution described above requires the following process switches.

(1) For the server to destroy the victim, the equivalent of 2 process switches (e.g. in the Verex implementation, the server does a Send to the Team-invoker, who takes appropriate action before making a Reply to the server; in the PORT operating system this is achieved by a kernel call taking the equivalent of <1 process switch).

(2) For the vulture to notify the client, 2 more process switches.

(3) For the client to recreate the victim (if necessary), the equivalent of 4 process switches (e.g. in the Verex implementation, the client does a Send to the Team-invoker, who makes a request of the Team-creator before replying).

(4) Then the client has to notify the server of the new victim, and, after receiving the reply, has to request more information from the server about the nature of the exception. This takes 4 more process switches.

Therefore this mechanism requires the equivalent of 12 process switches (including Destroy and Create), which could take 6 msecs, assuming the Send-Receive-Reply sequence takes 1 msec.

In the UNIX environment, the client could be notified asynchronously that an exception had occurred, by a UNIX signal from the server. However, there are only 16 different signals, and their use is by convention pre-set to certain exceptions (as explained previously), so their use in this situation would not be encouraged. An alternative in UNIX 4.2BSD would be to use an asynchronous Send from the server to the client via a datagram socket.
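Such a datagram notification can be sketched with a UNIX-domain datagram socket pair standing in for the server-to-client socket. This is illustrative only: a real 4.2BSD server and client would live in separate processes and use bound socket addresses, and the socket pair here requires a Unix platform.

```python
import socket

# Sketch of server-to-client notification over a datagram socket, in the
# spirit of the UNIX 4.2BSD alternative described above. A socketpair is
# used so the example is self-contained (Unix only); real code would use
# separate processes and named socket addresses.

def notify_via_datagram():
    server_end, client_end = socket.socketpair(socket.AF_UNIX,
                                               socket.SOCK_DGRAM)
    try:
        # Server side: an asynchronous Send of the exception notification.
        server_end.send(b"PleaseRelease")
        # Client side: the notification is received synchronously, when
        # the client chooses to read from the socket.
        return client_end.recv(64)
    finally:
        server_end.close()
        client_end.close()
```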
In theory such a datagram Send does not block the server, so deadlock should not occur. As in the previous example, the client process would receive the notification synchronously when it executed a read from that socket. The client process could then execute the PleaseReleaseProc code.

At first sight the UNIX implementation seems simplest, but there are major drawbacks. Firstly, the asynchronous Send does block when the i/o buffers are full, which can occur when communication is over a slow network, leading to potential deadlock. Secondly, because of the nature of UNIX processes, and the asynchronous nature of the inter-process communication, the time for a Send-Receive-Reply is slow (10 msecs on a VAX 11/750), and the system call to a socket is slow (2.5 msecs). Thus the UNIX 4.2BSD notification (using a socket) takes 12.5 msecs (measured as the time for a datagram transfer of 32 bytes from server to client, when the client is ready to receive).

An alternative in Thoth is to keep the client process always listening to the server. The server then replies directly to the client. If the client has to be ready to act as a server itself to higher-level clients (often the case in a hierarchy of servers) and accept requests, the client must spawn a worker on the same team to await notification messages from the server. This may mean that the server makes an out-of-order Reply to the worker. This technique is commonly used and reduces the number of process switches compared with the vulture-victim mechanism. The process switching is now simply from the server to the worker, and from the worker to the client. This takes only 1.5 msecs on a SUN workstation, much faster than the UNIX approach. If we provide the worker with the code of PleaseReleaseProc, we can call the worker an exception process and possibly even eliminate the need for the client to be told
In this case, the number of process switches would be still further reduced to two, whereby the server switches to the exception process and back again. This is the same as the UNIX situation using an asynchronous  Send  to a datagram socket. The separate  exception handler process so described has the advantage that the exception handler code is separate from the normal-case code in the client process. Furthermore, the context of the client code does not have to be saved to execute the exception code, as the client process keeps its separate existence during the exception process's execution. Another apparent advantage is that increased concurrency is possible. This solution is very attractive, but it is fraught with problems in real applications, particularly in the synchronisation of exception process and the client. This is discussed fully in Section 4.3 on Exception  h a n d l e r s as processes.  In RIG, an emergency message from server to client can provide the notification like a socket in UNIX4.2BSD. However, execution of the client is non-deterministic as receipt of the emergency message causes a change of flow of control if the client is either sending or receiving. If the client is executing a Receive at the time, the emergency message arrives synchronously. However, if the client is executing a Send at the time, the emergency message arrives asynchronously. Thus the degree of non-determinism falls between that of the Thoth and UNIX4.2BSD model, and the Mesa implementations.  The same difficulties with potential  deadlocks and difficulty of synchronisation will occur as with the Mesa and Thoth approaches.  2.4.2. A s y n c h r o n o u s E v e n t s ~ T e r m i n a l I n t e r r u p t ;  One common problem is in defining what should happen to an executing program when an asynchronous event occurs such as an attention or an i/o interrupt. 
When a terminal user presses <BRK>, what happens depends on what the program specifies should happen, and on what mechanisms are available. For example, an editor might specify that typing <BRK> will cause an exit to the program which called it only if all the user's files have been saved; otherwise it asks if that is really what is meant.

In UNIX, one of the software signals (SIGINT) is used to interrupt all the client processes which are using the terminal when the user hits <BRK>. Many client programs do not catch interrupts, and are killed. However, a process can arrange to catch interrupts and take appropriate tidy-up actions before either continuing or not.

In the Thoth environment, process destruction is used to handle <BRK> by specifying each team to have an associated terminal server. Each such server provides a facility whereby a client process may nominate a victim process which is to be destroyed on input of <BRK>. The victim is destroyed immediately. Alternatively, the client can specify no victims, can make itself unbreakable, and can field the <BRK> as it wishes. Generally all processes on a team are breakable, so an interrupt on the team's terminal destroys the entire team. The chief advantage of this mechanism is that it is simpler than UNIX-like signals, as the user doesn't have to worry about a process's internal state after the event — it doesn't exist! Furthermore, a list of processes blocked on others has to be maintained by the operating system for implementation of the message-passing primitives, so little extra effort is required to implement this destruction mechanism. However, it is necessary to maintain the integrity of resources which are owned by processes which are destroyed, either by a list in kernel data, or by garbage collection, or by timeouts. The Thoth solution is to provide garbage collection of orphaned resources, or to make some processes (such as servers) unbreakable.
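The UNIX-style catch-and-tidy-up arrangement described earlier in this subsection can be sketched as follows; the handler name and the tidy-up action are illustrative, and the user's <BRK> is simulated by raising SIGINT from within the process (requires Python 3.8+ for `signal.raise_signal`).

```python
import signal

# Sketch of a UNIX process catching SIGINT (the signal delivered on <BRK>)
# so it can take tidy-up actions instead of being killed. The tidy-up
# action here is purely illustrative.

tidied_up = []

def on_interrupt(signum, frame):
    # Take tidy-up actions; the program may then continue or exit.
    tidied_up.append("files saved")

def demo_interrupt():
    previous = signal.signal(signal.SIGINT, on_interrupt)
    try:
        signal.raise_signal(signal.SIGINT)      # simulate the user's <BRK>
    finally:
        signal.signal(signal.SIGINT, previous)  # restore prior behaviour
    return tidied_up
```

A program that did not install the handler would simply be killed by the signal — the default behaviour the text describes for most client programs.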
Runaway unbreakable processes can always be destroyed by an appropriately authorised Destroy command, like the UNIX KILL command. Unlike UNIX, any other processes blocked on a destroyed process are awakened immediately. Thus Thoth process destruction provides a general mechanism suitable for communication of asynchronous exceptional events.

This mechanism has been used in the Thoth text editor to provide a very simple means for handling both <BRK> and other exceptions [Cheriton 79b]. In this scheme, the editor is split into two processes on the same team. The process Main holds all the data structures, such as the text buffer, which must survive a <BRK>. The other Editor process implements the functionality of the program, viz. reading and performing user commands. The Main process is unbreakable while the Editor process is breakable, so only the Editor is destroyed on a <BRK>. Because the Editor process executes the commands, <BRK> causes the execution of the current command line to stop. The text buffer remains consistent because it is maintained by the Main process. The Main process detects when the Editor process has been destroyed and creates another Editor process, which proceeds to read a command line from the input. Thus, the handling of <BRK> appears to the user as the editor immediately returning to command level.

This structure is also exploited in handling exceptions. On detecting an exception, the Editor process destroys itself after leaving a pointer to an error message in a global variable indicating the exception. The user is notified of the exception by the next incarnation of the Editor process, which prints the error message. Full details are given in [Cheriton 79b].

In PILOT, communication with peripheral devices is handled by monitors and condition variables, much like communication between processes. Each device has an interrupt handler process which is awakened by a signal on a condition variable.
The target (user) process receives notification of the event from the interrupt handler via another condition variable. User interrupts from a terminal are treated as exceptions (usually the ABORT exception), which the user can choose to handle or ignore in the usual way. Thus there is no way to stop a runaway process except to reboot the system — in a single-user environment this is an acceptable approach. A similar attitude is taken by the designers of TRIPOS, a single-user operating system designed at Cambridge University [Richards 79b].

In CAP, an asynchronous event such as an attention may be signalled via the master coordinator. The target process is notified of the event by suffering an enforced jump. There is just one attention handler active at any time in a CAP process, which is entered immediately an attention occurs. The operating system allows the user to trap all attentions by specifying an attention handler, which may either perform tidy-up actions on the current domain and then execute a standard RETURN to the calling domain, or may attempt to execute a CLEAR on the attention. The operating system sets a RETURN-TRACE flag so that whenever the currently executing domain executes a standard RETURN, the calling domain is also notified of the attention in exactly the same way. During the execution of the tidy-up operations, other protected procedures can be entered; the CAP operating system notes that these protected procedures are in a special state and disallows recursive calls to the attention routines through multiple attentions. Attentions are of different degrees of severity, and different capabilities are required to clear them. An attention is propagated to the calling protection domain if the attention routine in the current protection domain cannot clear it. The calling protection domain then has a chance to handle the attention.
The protection domain STARTOP, the initial protection domain in every user process, has a capability which allows it to clear every attention. By default, an attention will terminate a user's job and return to the initial state. The advantages of this mechanism are that it allows the user to trap all faults and that it copes with multiple attentions. The disadvantages are that it allows non-deterministic execution, and that it cannot be used to stop runaway processes which loop during execution of attention handlers. And as the attentions are arranged in a simple hierarchy of levels, the mechanism has proven to be somewhat inflexible for handling the different types of asynchronous events encountered in computer communications and in distributed environments.

2.5. Summary

The survey of the various operating systems shows the great diversity of exception handling mechanisms. Purely procedure-based mechanisms, used extensively before about 1977, have disadvantages when applied to distributed systems because of the assumptions of shared memory. For handling exceptions in distributed operating systems, a mixture of procedure-based and message-based techniques is used in RIG; RPC uses a procedure-based technique, and Medusa uses a process-based technique.

The examples show that the mechanism for notification of exceptional events (such as the asynchronous event of <BRK>) is system dependent, and varies according to the process model and the inter-process communication facilities provided by the operating system. For notification of <BRK>, UNIX uses its inter-process exception mechanism to provide a software signal which forces a control flow change in the target process. Thus the <BRK>
exception is asynchronous in this environment (where a synchronous exception is defined as one where the process is in a state ready to receive it). For notification of <BRK>, Thoth uses immediate destruction of the process awaiting that event, and other processes receive notification when they attempt to communicate with the destroyed process. This is therefore treated as a synchronous exception. The PILOT handler for each device uses monitors with condition variables to notify user processes of device interrupts; again this is a synchronous exception. In CAP, i/o device interrupts force an asynchronous transfer of control to an attention routine in the currently executing process, similar to a hardware interrupt. Of the mechanisms aforementioned, only Thoth's can be conveniently extended to a distributed environment, as the others rely on shared memory for their execution. In RIG, an emergency message is used to notify the target user process; this message unblocks the target either synchronously or asynchronously. In all cases, the default action of the operating system is to destroy the target user process (or processes) to which the terminal was connected.

Thus it cannot be stated in isolation that X is a good technique for exception handling, or that Y is a good technique for exception notification; exception mechanisms must be considered in the context of the process model and the inter-process communication facilities available.

CHAPTER 3
Principles for Improving Programs through Exception Mechanisms

3.1.
Classification of Exceptions

The discussions and examples in the previous chapter illustrate several features of exceptions encountered by event managers, and these features are used to derive an informal classification of exceptions, as shown in Figure 3.1 below. Exceptions have been divided into three main groups, according to where the exception is handled.

Figure 3.1. Level 1 classification of exceptions (server-client exceptions, peer exceptions, and request exceptions).

(1) Server-client exceptions. These need to be communicated from the server to the client (i.e. contrary to the usual inter-process communication from client to server), in order to be handled by the client. The exception must be communicated to the client at the layer above, possibly after the server has handled it and transformed it. When the client and server form a uni-process system, such exception notification is called call-back notification. When the client exists as a separate process, we call server-client exceptions inter-process exceptions.

(2) Peer exceptions. These exceptions are detected and handled at the same level, and are typically represented by the less frequent arm of an if-then-else branch. So a peer exception detected by a server during processing of a normal request event is handled entirely at the server's level and is therefore transparent to the client at the layer above. If the server consists of a single process, peer exceptions may be conveniently caught and handled using existing mechanisms for intra-process exceptions. In a multi-process system where the server may use related worker processes at the same level to handle specific events, the server may use a cooperating exception process to handle the exception. An example is described in Section 5.1.5, where the ITI protocol's exception process handles reconnection after a network failure, transparently to its client process at the level above.
Still further distribution of peer exception handling can be achieved, by arranging that a peer exception is handled by a process on a different host implementing the same level of abstraction as the server. For example, in a single level of a communications protocol, a checksum error on data received from host B may be detected by the link level protocol on host A. The link level protocol on host A handles this exception by requesting a retransmission from the host B link level protocol; host A's client at the network level above is unaware of this retransmission.

Therefore there are two different types of exceptions: those which occur at a level below the code which is to handle them (i.e. server-client, or inter-process exceptions), and those which occur at the same level as the handler code (i.e. between cooperating peers). An analogy with the structured layered approach to communications software, as displayed by the ISO Reference Model of Open Systems Interconnection (OSI) [Zimmerman 80], shows that server-client exceptions are part of the layer interface and that peer exceptions are part of the peer protocol. The communication peer protocols are tightly specified, whereas the interfaces are not, as they are system-dependent.

Thus a starting point for the classification of exceptions has been taken along the dimension of where they are handled — whether an exception is an inter-process exception from server to client, where the exception is specified as part of the interface (the exact nature of which is system dependent), or whether it is from peer to peer, in which case the exception and its handling can be specified as part of the protocol.

(3) Request exceptions. For completeness one more type of exception is distinguished: that of an unusual request from a client to a server. This is a request event which occurs
This is a request event which occurs with a low frequency, say <10% on average; such events represent an infrequently used operation in an interface. For example, the operating system i/o servers (or monitors) are driven by events such as requests for i/o from the clients, and by interrupts or messages from the i/o devices signalling i/o completion (a call-back notification). Some of these events will occur much more frequently than others -- e.g. the user request to open a file for i/o will occur only once, whereas subsequent read/write requests to the same i/o server will usually occur many times. It is useful to consider the user's OPEN-FILE request to be an exceptional event from the server's point of view, so that the server can use appropriate code to detect it. Such a rare request is a request exception, and the server's reply is communicated synchronously to the client, which may or may not perform further exception handling.

3.1.1. Inter-process Exceptions

This thesis concentrates on inter-process exceptions, as the linguistic and operating system mechanisms for managing inter-process exceptions are very diverse (as described in Chapter 2). We now show that inter-process exceptions can be further classified along three dimensions, illustrated in Figure 3.2.

Server-client communication is in direct analogy with the intra-process exception mechanisms embedded in high level languages, where a handler may be attached to a procedure call at a higher level in the program unit. An exception occurring in a procedure is usually propagated up the procedure calls hierarchy until a handler is found for that exception. Similarly, an inter-process exception is detected by the server and transferred up the process call hierarchy (i.e.
from a server to its client, or from a remotely called procedure to its caller, or from an inner nested monitor to its enclosing monitor). For example, a client may request to open a file of unknown name. The file-server or monitor detects the FILE-NOT-FOUND exception while trying to open the file -- this is then communicated to the client, which handles it by taking appropriate action such as prompting its user for another filename.

[Figure 3.2: Level 2 Classification of Inter-process Exceptions -- including the asynchronous/synchronous notification dimension]

Exploiting the analogy with high level languages further, we assume that an inter-process exception mechanism would allow exceptions to be propagated back up the abstraction hierarchy just as the linguistic exception mechanisms allow exceptions to be propagated up the procedure calls hierarchy. Such inter-process call-back mechanisms are available in most operating systems, as described previously in Chapter 2. Now in [Liskov 79] the view is held that uncaught exceptions within a process should cause the signaller to terminate. This approach also coincides with that of Bron [Bron 76] and the ADA designers [US 80], who state that the program unit raising an exception signal must be terminated. Therefore we would expect that a structured inter-process exception mechanism would terminate the process which detected and raised the exception. But the analogy with the intra-process linguistic exception mechanisms breaks down here, because one of the purposes of a server or monitor is to handle multiple concurrent requests, so the server cannot be arbitrarily terminated.¹ Instead, the server process detecting the exception should return to its usual blocked state awaiting another event.
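This behaviour -- the server communicating an exception such as FILE-NOT-FOUND to its client and then resuming, rather than terminating -- can be sketched as follows. The request format, queue-based communication, and file store are hypothetical stand-ins for a real system's inter-process mechanism.

```python
# Sketch of a server that reports an exception to its client in a reply and
# then returns to its blocked state awaiting the next event, rather than
# terminating.  All names and the queue-based ipc are hypothetical.

import queue

FILES = {"notes.txt": "contents"}           # an assumed file store

def file_server(requests, replies):
    """Serve OPEN requests until a stop event; never terminates on exceptions."""
    while True:
        client, name = requests.get()       # blocked state, awaiting an event
        if client is None:
            return                          # explicit stop event
        if name in FILES:
            replies.put((client, "OK", FILES[name]))
        else:                               # exception detected while opening
            replies.put((client, "FILE-NOT-FOUND", name))
            # the server resumes; the client decides how to handle it

requests, replies = queue.Queue(), queue.Queue()
requests.put(("c1", "missing.txt"))
requests.put(("c1", "notes.txt"))           # the server survived the exception
requests.put((None, None))
file_server(requests, replies)
print(replies.get(), replies.get())
```

Note that the second request is served normally after the exception: the exception reply is just another message, and the server's loop structure is untouched by it.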
Similarly, if a monitor detects an exception, the monitor should ensure its locks are released and the monitor return executed, before the exception is raised at the higher level. Thus it is noted that the signaller of an uncaught inter-process exception needs to be resumed, not terminated, after raising the signal. These points of view can be merged by stating that, for single process systems, the only handlers which resume operations at the point where the exception was detected at the time it was raised (excluding flag-setting handlers) should be invoked by exception signals which would evaporate if the exception signal was not caught. In other words, allowing resumption after an exception signal is using it like an ignorable information signal. In contrast, for multi-process systems where the server continues after signalling an inter-process exception, the exception can be either ignored or not. This leads to the definition of three major dimensions on which these inter-process exceptions can be placed, already illustrated in Figure 3.2.

(1) One dimension is the degree of necessity of handling; whether the exception is ignorable (such as the PleaseRelease exception), or whether the exception must be handled for correct operation of the system (such as the <BRK> exception). An ignorable exception² is defined in this thesis as one which, if uncaught, would just evaporate (i.e. the signaller would resume without delay). Now of the high level language exception mechanisms embedded in the languages ADA, CLU and Mesa, only Mesa allows resumption of the signaller, yet in Mesa, uncaught exceptions are transformed into an invocation of the remote debugger. Thus in uni-process systems, ignorable exceptions are not conveniently handled by any existing high level language mechanisms.

¹ Unless a separate server instance is used to service each request.
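The ignorable-exception semantic just defined -- the signaller resumes without delay whether or not any handler exists, so an uncaught signal simply evaporates -- can be sketched as follows. The handler registry and signal function are hypothetical, standing in for a system's inter-process notification mechanism.

```python
# Illustrative sketch of ignorable-exception semantics: the signaller resumes
# whether or not a handler exists, so an uncaught signal simply evaporates.
# The registry and exception names are hypothetical.

handlers = {}    # exception name -> handler registered by a client

def signal(name, detail):
    """Raise an inter-process exception without ever blocking the signaller."""
    handler = handlers.get(name)
    return handler(detail) if handler else None   # uncaught: evaporates

results = []
results.append(signal("PleaseRelease", "page 7"))   # uncaught, safely ignored
handlers["PleaseRelease"] = lambda d: "releasing " + d
results.append(signal("PleaseRelease", "page 7"))   # now caught and handled
print(results)
```

The signaller's code path is identical in both calls; only the client's side changes when it chooses to provide a handler.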
However, in a multi-process system, as noted previously, the signaller needs to be resumed after communicating an inter-process exception. Thus ignorable exceptions take on a new significance in a multi-process environment, because the desired semantic of the signaller being allowed to resume after signalling an inter-process exception is already supported by systems servers. Furthermore, uncaught inter-process exceptions are already ignored in many systems. For example, in UNIX, a SIGNAL to a non-existent process has no effect on the signaller, and has no effect on the system -- it is safely ignored.

² Note that this definition means that errors cannot be classed as ignorable exceptions, unless, during program development, we temporarily wish to ignore uncaught errors.

(2) Another dimension is the number of interested parties to be notified; an exception may need to be communicated by a server to several clients (such as PleaseRelease), or to one client (such as <BRK>). For convenience, we distinguish just two cases: inter-process exceptions which may be broadcast to potentially many clients, or sent to one specific client. In UNIX, a SIGINT signal can be sent to ALL the processes in a group, or to one specific process. The processes wishing to receive a signal join a group in order to receive a broadcast inter-process exception; these processes are cooperating together to perform some task. Thus exceptions which are broadcast by a server to several clients are intended to develop further cooperation between the clients; as such, the broadcast exceptions may be handled differently from 1:1 inter-process exceptions, adding another dimension to the classification of exceptions.

(3) Finally we consider the notification of an inter-process exception to a client -- if the notification is asynchronous (i.e.
if the client is not in a state to receive the exception by having an outstanding request to the server), then it is handled differently from a synchronous inter-process exception (in direct response to a request). This feature is part of the server/client interface and is system-dependent, as asynchronous communication usually requires special communication mechanisms. Thus another dimension of inter-process exceptions is the method of notification.

3.2. Model for Systems Design

With the above classification of exceptions in mind, we are now ready to develop a model for systems design which exploits the dichotomy between normal and exceptional events. The model is derived from the author's experiences in designing and implementing communications software and software utilities. The first observation made by the author is that many systems and communications programs which retain little state between events are suitable for an event-driven approach, where the programs take the form of an outer loop with an event wait at the top followed by a case analysis. Advantages of this approach for systems software design over the alternative mechanism of encoding the state in the program counter are given in [Macintosh 84]. The Macintosh approach to designing applications software is to avoid the use of modes³ which are often confusing to the users. The programming technique proposed to achieve this is an event-driven approach, where the heart of every application program is its main event loop, which repeatedly calls the Macintosh Toolbox Event Manager to get events (such as a press of the mouse button) and then responds to them as appropriate.
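The main-event-loop style described above can be sketched in a few lines; the event names and handler table are hypothetical stand-ins for the Toolbox Event Manager's actual events.

```python
# Sketch of the main-event-loop style: repeatedly get an event and branch on
# it to an appropriate handler.  Event names and handlers are hypothetical.

def run_event_loop(get_event, handlers):
    """Repeatedly get an event and dispatch to the matching handler."""
    log = []
    for event in iter(get_event, "QUIT"):          # the perpetual outer loop
        handler = handlers.get(event, lambda: "ignored")
        log.append((event, handler()))             # case analysis on the event
    return log

events = iter(["MOUSE-DOWN", "KEY", "QUIT"])
handlers = {"MOUSE-DOWN": lambda: "clicked", "KEY": lambda: "typed"}
print(run_event_loop(lambda: next(events), handlers))
```

The loop itself carries no mode: all state needed to interpret an event is either in the event or in the handler, which is the property the event-driven approach relies on.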
The event-driven approach may be conveniently achieved by arranging that the client maintains as much state as possible, so that only the minimum state is maintained by the event manager between events. The state information is then encoded in the request event to the server. This approach is used in the Apollo DOMAIN system [Leach 83]. For example, a client request to read page N of a file is made idempotent by specifying the page number, N, in the request event to the file-server. Such a file-server need not then maintain state between events, and can be designed as an event-driven server. The client must maintain the state information of the current page number.

³ A mode is part of an application that the user has to formally enter and leave, and that restricts the operations that can be performed while it's in effect.

Such multi-process event-driven systems are conveniently modelled by datagram-based message systems such as Thoth, RIG and UNIX 4.2BSD. However, they are not so conveniently modelled in the monitor-based systems such as PILOT, where the inter-process communication is through a monitor and not direct. The dual nature of message-based and
Given a system which is tractable to the event-oriented approach, the author considered what top-down design principles are to be employed. Clearly efficiency is an important criterion, as inefficient programs are very susceptible to subsequent kludges to improve performance. The author was also concerned with mechanisms to achieve modular, incremental changes to programs, based on experiences with dynamic program specifications to add bellsand-whistles features — difficult to achieve without adding uneccessary complexity to the system. From a bottom-up approach, there are many possible configurations of processes and inter-process communication paths in multi-process systems. Different choices may be made according to the major objectives of the system. Merging the top-down and bottom-up design ideas result in the model outlined below in Figure 3.3. The model treats program systems as event driven, and proposes the isolation of the events of the system which may be handled with equal cost. These equal cost events are then identified as normal or exceptional, according to their probability of occurrence and the nature of the event. The program system is then designed according to an appropriate objective, chosen from three common cases -- minimising the average run-time, minimising the  55  exception processing time (useful for real-time processing), or increasing program functionality. Other objectives such as maximising throughput or maximising resource utilisation may be modelled similarly; for conciseness we limit our discussion to these three objectives. In the rest of this Chapter, the features of the model are elaborated in detail, with many examples to illustrate their use. (1) Treat the program system as event driven. Most systems programs respond to events; even some data-driven utility programs may be considered as event driven, where the value of the data represents the event to which the program responds. 
An example of such a program is a spell utility program which compares words in a source text file with those in a standard dictionary. Such a program reads characters one by one from the input file till it finds a word delimiter. A word delimiter may be any special symbol such as , ( ; ! . : etc. The completed word is then compared with the dictionary. An analysis of the spell utility shows that it may be considered as event-driven, where the value of the next character from the input file corresponds to a particular event. Thus for 8-bit character input, there are 2^8 = 256 events. The program has just one state, simply RUNNING. Like all event-driven programs, spell executes a perpetual loop, getting the next event (a character from the text file) and branching on that state-event pair to an appropriate event handler.

Model for Systems Design
(1) Treat the program system as event driven.
(2) Isolate the events of the system which may be handled with equal cost, and group together any such events.
(3) Identify events as Normal or Exceptional (based on their probability of occurrence and the nature of the event).
(4) Design the program according to the objective: minimise average run-time; minimise exception processing time; increase the program's functionality.

Figure 3.3 Level 1 Model for Systems Design

(2) If two or more events are detected and handled in the same way, they can be grouped together to form a constant-cost state-event group. The probability of this new composite event occurring is the sum of the probabilities of the individual events occurring. For example, in spell, each alphabetic character can be treated in the same way; therefore events corresponding to alphabetic characters can be grouped together
and considered normal. The probability of a normal event occurring is the sum of the probabilities of the individual alphabetic characters occurring.

(3) In general, the division of events into Normal or Exceptional should be made such that all normal events occur with a probability greater than all exceptional events. However, in some systems it may not be easy or obvious how to separate events in this way. A guideline for whether to describe an event as normal or exceptional is to decide whether an exception-handling mechanism, requiring a different programming style and incurring different overheads (usually costly) at compile and at run-time, is appropriate for the event. Recall that an exception mechanism should be used when a programmer wishes to improve clarity of the normal-case computation, provided it does not incur unreasonable costs or impair reliability. In particular, errors may always be treated as exception events, even if an error event occurs with a probability greater than some other so-called normal event. However, for some systems, the probability of an event varies widely from time to time, and the occurrence of events may be highly correlated. For example, in a database transaction system, some queries may be rare on average but when one such query is made, other related rare queries may occur soon afterwards. In such situations, the best rule is to take the approach that all events should be treated in the same way. (An example of this is the X.25 protocol implementation described in Chapter 5.) For yet other systems, the probabilities of the events are not known a priori.
Again, it is best to treat all such events in the same way, and as stated in the introduction, the model cannot be used in such situations. Thus we restrict our discussion to event driven systems where the probabilities of the events are known a priori.

We have chosen to model systems meeting one of three common objectives.

A. Minimise the average run-time. A model for this important objective is described in the Section following, and many examples are described in detail in Section 3.3.

B. This objective is considered because many real-time process control systems require prompt, urgent action in response to an exception event such as the temperature exceeding a permitted threshold. Speedy exception processing can usually only be achieved by adding the overhead of data-collection to the normal events, so this objective conflicts with objective A above.

C. A new approach to program design is to achieve incremental increase of the functionality of a server/client system by first designing a minimal system, and then providing for incremental addition of new features. The model for this objective, described in Section 3.2.3 below, proposes exploiting inter-process exceptions from a server to its clients to achieve the incremental functionality.

Another objective, that of ensuring that the exception handling code is correct, could be considered separately. This objective may be appropriate in systems where the efficiency of the mechanisms is not important, compared with the necessity of correct exception handling, and where implementation of correct code may be to the detriment of performance. This situation may occur in a real-time process control system where failures are infrequent and may never be fully testable.
Our model does not address this important issue in detail, because proofs of program correctness are outside the scope of this thesis. The model is now extended to cover these different objectives.

3.2.1. Objective A: Minimise the Average Run-time

The model is shown in Figure 3.4.

(1) At the first level, the major components of executing the normal events are determined, so the program design can be concentrated upon reducing the greatest cost component⁴ from the following:
a) cost of detecting that the event has occurred.
b) approximate cost of handling the normal event.
c) cost of maintaining the minimum information needed just for correct handling of all exception events -- called the context-saved cost.

(2) To structure the program, the most significant cost component is considered from the above costs.
a) If there are many events, and/or the handling costs are small, the event detection costs may be significant. The model proposes mapping one exception onto another to

⁴ The case where the exception handling time has a significant impact on the average run-time, such as when a very expensive inter-process exception notification mechanism is used, is considered under the next objective: that of reducing the exception-handling time.

Objective A: Minimise the Average Run-time (Level 2)
a. detection costs significant: map one exception onto another so NORMAL cases can be detected in just 1 test, e.g. spell, putbyte.
b. normal-case handling costs significant: structure to reflect logical event flow, NOT the logical structure, e.g. os utility.
c. context-saving costs significant: push more code into exception handling instead of in normal event handling, e.g. readback, nested atomic actions.

Level 3 (for multi-process systems):
1. Add request exceptions to a server, and ensure all events are seen first by the server, e.g. Verex name-server, Newcastle Connection, TBL/TROFF/EQN.
2. Try alternative process configurations to reduce ipc and context-switching for handling normal events, e.g. restructure ITI as a reader and writer filter + exception process.
3. Use broadcast exceptions to reduce network traffic over a LAN, e.g. V-kernel atomic actions.
4. Design problem-oriented protocols even at the expense of generality, e.g. LNTP protocol.
5. Add separate exception processes on separate cpu's for concurrency, e.g. point-to-point protocol.

Figure 3.4 Level 2 and 3 Model to Minimise the Average Run-time

enable normal events to be detected in just one test. For example, the spell and putbyte utilities described in Sections 3.3.1.1 and 3.3.1.2 respectively.

b) If handling costs are significant, the program system should be structured according to logical event flow, not the logical structure, e.g. the uni-process os utility described in
for local area networks, described i n  T h e author's i m p l e m e n t a t i o n of nested atomic transactions, described in  C h a p t e r 5 also reflects this design principle of m i n i m i s i n g context-saving code.  (3)  In this level, various techniques are proposed for s t r u c t u r i n g multi-process programs  to  minimise the average r u n - t i m e . 1. A d d request exceptions to servers, a n d allow all events to be seen first by the server w h i c h handles n o r m a l events. 3.3.2.2,  and  the  Newcastle  TBL/TROFF/EQN  system  F o r example, the V e r e x name-server described in Section C o n n e c t i o n system,  also can  discussed  be redesigned to  in  Section  3.3.2.3.  The  exploit this technique, a n d an  extended example to achieve this is described in Section 3.3.2.4. 2.  Try  alternative  process  configurations to  reduce  the  inter-process  (and therefore context switches) in h a n d l i n g n o r m a l events. this  design  C h a p t e r 5.  principle is the  author's  communication  A n example w h i c h follows  i m p l e m e n t a t i o n of the  ITI  server  described in  It was redesigned f r o m a server process w i t h a reader a n d a writer worker,  to a system consisting of a reader filter, a writer filter, a n d an exception handler process.  s  bearing in mind that it might be worthwhile to increase the expected use of the exception handling code to ensure it works correctly by mapping one exception onto another.  61  3. Use broadcast exception messages to reduce message traffic over a local area network (compared w i t h m a n y 1:1 messages), e.g. the author's i m p l e m e n t a t i o n of atomic  actions  in the V - k e r n e l , described i n C h a p t e r 5.  4. D e s i g n p r o b l e m oriented protocols to handle n o r m a l events most efficiently, at the possible cost of reducing the protocol's generality a n d f u n c t i o n a l i t y . 
This approach has been successfully used in the Apollo DOMAIN system [Leach 83]. Another example is the LNTP protocol, described in Section 3.3.3.2, in which a tradeoff has been made between functionality and performance, in that the protocol's flow control has been tuned at the transport layer to the specific environment of a Local Area Network.

5. Distribute exception handling code so it is situated on physically distinct processors, thus increasing the concurrency so normal case processing can continue without interruption. An example of this is a point-to-point network protocol described in Section 3.3.2.5.

3.2.2. Objective B: Minimise the Exception Processing Time

The model is shown in Figure 3.5. As for Objective A, the major components of responding to an exception are determined, so the program design can be concentrated upon reducing the greatest cost component from the following:
a) cost of detecting that the event has occurred.
b) cost of the notification of the exception to interested parties.
c) approximate cost of handling the exception.

Objective B: Minimise the Exception Processing Time (Level 2)
a. detection costs significant: map all other cases onto one so EXCEPTION cases can be detected in just one test.
b. exception notification costs significant: these costs may be significant in multi-process systems. This system dependent feature is modelled by the KEYBOARD-MOUSE example.
c. exception handling costs significant: push more code into normal-case handling to reduce exception handling.
Figure 3.5 Model to Minimise the Exception Processing Time

To structure the program, the most significant cost component is considered from the above costs.

a) Techniques for reducing the detection costs for exception events are similar to those for reducing the detection costs for normal events, considered in Section 3.2.1.

b) In some situations, the inter-process exception notification costs are significant. This is part of the system dependent interface layer between client and server. Therefore, instead of a general model for minimising this cost, program paradigms for different systems and languages are given. An example problem, that of multiplexing input from a keyboard and a mouse, is described in Section 4.1, which illustrates various methodologies for reducing the 1:1 inter-process exception costs, and an example for using 1:many inter-process exceptions is the storage allocator described in Section 4.2. Further, if even the small extra cost of process switching is intolerable, the time for server-client exception notification may be minimised by using a uni-process client/server system with asynchronous exception notification from the server to an in-line procedure in the client via hardware or software interrupt signals.

c) For many systems, the exception handling time can be reduced only by increasing the contextual information saved while processing normal events. Techniques for reducing the exception handling time are the same as those used for reducing the normal case handling costs considered in Section 3.2.1.

3.2.3. Objective C: Increase the Functionality of the System

The model is shown in Figure 3.6 below.
The minimal system for handling normal events is first implemented, bearing in mind that extra features, which may be termed exceptional, may be added later. The idea is to allow modular increase to the program's (or system's) functionality by treating new features as inter-process exceptions from a server to its client or clients. These extra features must be largely independent of the rest of the system.

One convenient way to achieve this incremental increase of functionality is by adding a separate exception process at the client level to handle each new feature provided by the server. This can be achieved if there is only weak cohesion between the processes, and has the desirable effect of increasing the concurrency of the system. Examples are the optional reconnection of a broken network connection in the author's ITI protocol implementation, described in Section 3.4.2.1, the ITI <BRK> exception handler, and the storage allocator in Mesa, described in Section 4.2.3.

By using ignorable inter-process exceptions, extra features provided by the server can be ignored by a client until a handler is provided. The feature is implemented when the client provides a handler for the inter-process exception. Ideally, the handler should be almost independent of the client code, so that the client and handler are only loosely-cohesive modules; this is good software engineering practice and it facilitates proofs of program

Objective C: Achieve Incremental Increase of the Functionality of the System (Level 2)

Implement a basic system with minimal functionality and treat new features as inter-process exceptions detected by the server for its client (or clients) to handle.
Exploit low cohesion between client and handler by making the exception handler a separate process, e.g. ITI reconnection after network failure, ITI < B R K > , storage allocator in Mesa and in V-kernel.  ignorable interprocess  Add a handler to the client whenever convenient e.g. window-size-change, hash table, slow graph plotter  mHst-be-handled inter-process exceptions  Implement the feature by adding a handler to the client e.g. ITI < B R K >  Exploit cheap unreliable message delivery in a distributed system (such as datagrams) e.g. V-kernel atomic actions implementation  Exploit broadcast inter-process exceptions to increase the cooperation between clients of a server e.g. the Mesa file server, the storage allocator. Figure 3.8 Model to Increase the Program's Functionality  correctness. For example, the author implemented and tested the ITI protocol without a handler for the < B R K > exception -- this was successfully added later, as described in Sec-  65  tion 3.4.1.1 below.  O t h e r examples of the use of ignorable inter-process exceptions are given  in Section 3.4.2.2-Section 3.4.2.4. In d i s t r i b u t e d systems  where the cost of reliable message  delivery is high, it c a n be  advantageous to use a cheaper, b u t unreliable, message delivery system.  A n example is the  author's i m p l e m e n t a t i o n of atomic actions, described i n Section 5.2. O n e aim m a y be t o provide increased cooperation between the clients of a server, for which broadcast inter-process exceptions are particularly useful. this objective t h r o u g h the techniques described is the M e s a 3.4.2.5.  A n example w h i c h achieves  file-server,  described in Section  A s the inter-process exception mechanism is system dependent, program paradigms  are given i n Section 4.2 for a storage allocator p r o b l e m w h i c h uses broadcast  inter-process  exceptions to increase cooperation between clients, in different languages a n d operating systems.  3.3. 
Examples Illustrating Minimising the Average Run-time

First the costs of the program events must be analysed to determine which component costs are the most significant, so the appropriate branch of the model can be followed. A simple technique for uni-process programs has been developed by the author, and a typical analysis of a general program is detailed. This technique has proved very helpful in analysing uni-process programs and systems, and extensions of the techniques to multi-process systems have also been made by the author.

In a general event-oriented program, there is a set U of possible events that can occur during program execution, and a probability function p such that p(u) indicates the probability of event u, where u ∈ U. Efficient performance is important in software programs, and so a cost function, C, is also defined, such that the cost C(u) is the cost of executing the program on receipt of event u. Then the expected cost of executing the program, C_T, is given by:

    C_T = Σ_{u ∈ U} C(u) * p(u)

It is assumed that the total cost of executing an event, C(u), is composed of several parts, including the cost of detecting that the event has occurred. By restructuring the program to reduce a component of the execution cost, such as the detection cost, the expected run-time may be reduced.

3.3.1. Partition the Event Set into 2 Groups to Reduce Detection Costs

3.3.1.1. Spell Example

Suppose there is an event-oriented program with events u_1, u_2, ..., u_(n-1), u_n. There is a simple test to detect the occurrence of each of the first (n-1) events, and the n-th event is assumed to be none of the others.
Further, the tests for events 1 through (n-1) must be made, in turn, before the n-th event can be detected, and the n-th event is the most probable. An obvious algorithm to execute this is given by:

    loop
        getevent(event);
        if event = u_1 then handle u_1;
        else if event = u_2 then handle u_2;
        ...
        else if event = u_(n-1) then handle u_(n-1);
        else handle u_n;
    endloop;

We assume that the cost of executing an event is independent of program state. If this is not the case, then the event set may be increased to include compositions of program state with event, so that the constant-cost assumption holds, as shown in the putbyte example in the next sub-Section.

An example of such a program is the spell utility program mentioned previously in Section 3.2, where each event is the next character from a text file, and all alphabetic characters are considered to be normal events.[7] Now if there are (n-1) word delimiters, events u_1, u_2, ..., u_(n-1) correspond to the character event being one of the word delimiters, and event u_n corresponds to the character being an ordinary alphabetic.[8] Note that the word delimiters are peer exceptions in that they are transparent to the caller of spell, and they are handled within the utility program.

Assume the probabilities of the exceptional events are all equal to p, as shown in Table 3.1,[9] and the handling cost H is the same for each event. The cost of detection of the events is given in the second column of Table 3.1, assuming that each test takes 1 instruction.
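The detection costs just described can be checked mechanically. The short C sketch below (the event encoding and the helper name are illustrative, not from the thesis) counts the equality tests the if/else chain performs before dispatching each event: event u_i is recognised after i tests, and the normal event u_n only after all n-1 tests have failed.

```c
#include <assert.h>

/* Count the equality tests the if/else chain performs before it can
   dispatch event u_i (events are numbered 1..n here for illustration). */
int naive_detection_cost(int event, int n) {
    int tests = 0;
    for (int candidate = 1; candidate <= n - 1; candidate++) {
        tests++;                  /* one "else if (event = u_i)" test */
        if (event == candidate)
            return tests;         /* u_i recognised after i tests     */
    }
    return tests;                 /* u_n: all n-1 tests have failed   */
}
```

Event u_i thus costs i tests, while the most probable event u_n costs n-1 tests, which is what the detection-cost column of Table 3.1 records.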
The total execution cost (excluding the constant extra cost of handling each event) is shown as column 3 of the Table.

Table 3.1 Execution Costs for a General Event Manager

    event (probability)   | detection | expected          | new detection | new expected
                          | cost      | execution cost    | cost          | execution cost
    u_1 (p)               | 1         | 1p                | 2             | 2p
    u_2 (p)               | 2         | 2p                | 3             | 3p
    ...                   | ...       | ...               | ...           | ...
    u_(n-2) (p)           | n-2       | (n-2)p            | n-1           | (n-1)p
    u_(n-1) (p)           | n-1       | (n-1)p            | n-1           | (n-1)p
    u_n (1-(n-1)p)        | n-1       | (n-1)(1-(n-1)p)   | 1             | 1-(n-1)p
    Total expected costs/event        | (n-1)-p(n-1)(n-2)/2               | 1-p+np(n-1)/2
    (excluding handling costs)

[7] We assume that a simple test to see if a character is alphabetic cannot be made.
[8] In our analyses we assume events occur independently of each other, so there is a constant probability p(u) of an event occurring. Violation of this assumption merely reduces the strength of the performance gains to be made from following our model, and does not affect the nature of the gain.
[9] In the spell utility, the character BLANK will appear as a word delimiter much more frequently than other special characters. For simplification in the general example, we assume that all exceptional events are equally probable.

To improve this program, the events are divided into 2 sets -- normal {u_n} and exceptional {u_1, u_2, ..., u_(n-1)}. Then a dummy variable is introduced so that all the exceptional events can be mapped onto the one exception, to reduce the detection costs of the normal event.[10] The cost of detection of the events is now given as column 4 of Table 3.1. If we assume, as before, that the handling cost is H for each event,[11] and the probabilities of the exceptional events are all equal to p, then the new costs are shown in the last column of the Table. The program could then be written as follows:
    loop
        getevent(event);
        if f(event) then HandleExceptions(event);
        else handle u_n;
    endloop;

    HandleExceptions(event)
    begin
        if event = u_1 then handle u_1;
        else if event = u_2 then handle u_2;
        ...
        else if event = u_(n-2) then handle u_(n-2);
        else handle u_(n-1);
    end;

[10] The choice of a dummy variable depends on the application. Suppose that all exceptional events can be mapped to return a value TRUE from some function f(event), and all normal events to return FALSE. In particular, for the spell program, such a function could be a simple look-up table, ODDCHAR, with 256 entries, one for each possible 8-bit character, indexed by the character's bit-pattern. The entry in ODDCHAR for each delimiter is the value TRUE, and for each alphabetic character, the value is FALSE. Thus ODDCHAR[event] corresponds to f(event).
[11] Without making further aggregations of constant-cost events.

The last row of the table shows C_1 = total expected cost per event in the initial program and C_2 = total expected cost per event in the new program. Then, from columns 3 and 5 of the Table,

    C_1 = H + (n-1) - p(n-1)(n-2)/2    and    C_2 = H + (1-p) + np(n-1)/2

The expected change in run-time, C_1/C_2, can be calculated for different values of the combined probability, np, of the exception events. Let C_1/C_2 = a, the change in run-time. Then there is a reduction in run-time when a > 1. Now C_1/C_2 > a when

    H + (n-1) - p(n-1)(n-2)/2 > a(H + (1-p) + np(n-1)/2)

Let np = y. Then C_1/C_2 > a when

    H + (n-1) - y(n-1)(n-2)/2n > a(H + (1 - y/n) + y(n-1)/2)

i.e. when

    y < 2n(n-1-a+H-Ha) / ((n-1)(na+n-2) - 2a)

For the best possible speedup, when H = 0,

    y < 2n(n-1-a) / ((n-1)(na+n-2) - 2a)

As n tends to infinity, np tends to 2/(a+1). Values of y = np against n are plotted in Figure 3.7 for a = 1.25, 1.5 and 2. From Figure 3.7, it is seen that there is a reduction in run time for all cases where n > 2.
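The effect of the partition can be made concrete. The C sketch below follows footnote 10's ODDCHAR look-up-table idea (the helper names and delimiter set are otherwise illustrative), and also sums the per-event detection costs of Table 3.1 both ways, confirming the closed forms (n-1) - p(n-1)(n-2)/2 and 1 - p + np(n-1)/2 term by term.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* ODDCHAR-style table (footnote 10): TRUE for delimiters, FALSE for
   ordinary characters, so f(event) costs one table lookup. */
static bool ODDCHAR[256];

void init_oddchar(const char *delims) {
    for (int i = 0; i < 256; i++) ODDCHAR[i] = false;
    for (; *delims; delims++) ODDCHAR[(unsigned char)*delims] = true;
}

int demo_oddchar(void) {
    init_oddchar(" .,;:");
    if (!ODDCHAR[' '] || ODDCHAR['a']) return -1;  /* one-test partition */
    return 0;
}

/* Sum column 3 of Table 3.1: event u_i costs i tests, u_n costs n-1. */
double summed_cost_naive(int n, double p) {
    double total = 0.0;
    for (int i = 1; i <= n - 1; i++) total += i * p;
    total += (n - 1) * (1.0 - (n - 1) * p);
    return total;
}

/* Sum the last column: one lookup for u_n, lookup + chain otherwise. */
double summed_cost_partitioned(int n, double p) {
    double total = 0.0;
    for (int i = 1; i <= n - 2; i++) total += (i + 1) * p; /* u_1..u_(n-2) */
    total += (n - 1) * p;                                  /* u_(n-1)      */
    total += 1.0 * (1.0 - (n - 1) * p);                    /* normal u_n   */
    return total;
}

/* Closed forms quoted in the text. */
double closed_naive(int n, double p)       { return (n - 1) - p * (n - 1) * (n - 2) / 2.0; }
double closed_partitioned(int n, double p) { return 1.0 - p + n * p * (n - 1) / 2.0; }
```

For example, with n = 26 events and p = 0.002, the naive structure costs about 24.4 detection instructions per event against roughly 1.65 for the partitioned structure.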
Significant reductions of more than half, when a = 2 (and H = 0), occur when the combined probability np of the exception events is less than 0.5. To achieve a speedup of two times when H > 0, the numerator in the above expression must be positive, i.e. n-3-H > 0, so H < n-3. Thus the partition of the event-set U into two disjoint sets N and E, and the mapping of several exception events onto one to reduce detection costs in N, roughly halves the run time when the handling cost for each event is small relative to the number of events.[12]

[Figure 3.7: Expected Execution Costs for Different Values of np and n, plotted for a = 1.25, 1.5 and 2.]

[12] Performance gains in testing for the common cases first have long been recognised, as reflected in the FREQUENCY statement in FORTRAN I and II, which enabled the programmer to inform the compiler of the probability of various cases by facilitating flow analysis. However it was dropped from subsequent versions of FORTRAN as the statement required a tremendous amount of analysis at compile time and did not yield enough benefits at run-time. But as Sammet remarks in [Sammet 69], the fact that the statement became unimportant is not a reflection on the value of the initial technological contribution -- that of giving considerable thought to the possibilities of compiler optimisation of object code.

3.3.1.2. Putbyte Example

The author has analysed a more complex example illustrating how exception detection costs have been reduced in a real program, putbyte. In the Verex operating system [Cheriton 80] the routine putbyte outputs a character to a selected output device. Cheriton reduced the average run-time of this routine by mapping several exceptions onto one using a fix-up function, and hence reducing the detection costs. The author analysed the programs in detail; they are described in Appendix A.
The program was redesigned by dividing the events (each byte to be output) into 2 sets -- normal { putbyte called to a valid output device with the in-core buffer not full } and exceptional { putbyte called to an invalid output device, putbyte called to a valid output device with the in-core buffer full, and putbyte called with the END-OF-FILE character }. In the modified program the normal event is detected in just one test, instead of in two tests as in the original version. Because the handling costs are very small, a reduction in expected run-time from 5.75 to 4.75 instructions/event is achieved, which represents a significant saving for such a commonly used utility.

The putbyte function illustrates the principle of exploiting an existing exception-detection mechanism which catches peer exceptions -- namely the check made within putbyte for more space in the output buffer -- to catch a server-client exception occurring at a higher level, namely, there being no selected output. This saves run-time overhead on exception detection in the normal case -- alternative implementations incur the overhead of checking for selected output on every call of putbyte. Furthermore, it saves code.

The author has observed that this practice of exception detection at low levels is used extensively in high-level languages. For example, consider the DIVIDE operator. In most implementations the user does not bother checking if the divisor is zero before trying to divide -- but the underlying hardware does trap the error, and, ideally, some mechanism is provided in the high-level language for the user to take suitable recovery action.
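The single-test structure used by putbyte can be sketched in C as follows. This is a reconstruction under stated assumptions -- the field names, buffer size, and the pinned-limit trick are illustrative, and the actual Verex code is given in Appendix A. While no output device is selected the buffer limit stays at zero, so the one ordinary buffer-full test also catches the no-selected-output exception, and a slow-path routine disambiguates (the END-OF-FILE case, funneled through the same test in the real program, is omitted here).

```c
#include <assert.h>
#include <string.h>

#define BUFSIZE 8   /* illustrative in-core buffer size */

struct out {
    char buf[BUFSIZE];
    int  count;      /* bytes currently buffered               */
    int  limit;      /* capacity, pinned to 0 while unselected */
    int  selected;   /* is an output device selected?          */
    int  flushes;    /* times the slow path emptied the buffer */
};

/* Slow path: sort out WHICH exception made the fast-path test fail. */
static void exception_case(struct out *o, char c) {
    if (!o->selected)
        return;                  /* no selected output: discard the byte */
    o->flushes++;                /* buffer full: flush, then store       */
    o->count = 0;
    o->buf[o->count++] = c;
}

/* Fast path: exactly one test in the normal case. */
void putbyte(struct out *o, char c) {
    if (o->count < o->limit)     /* the single normal-case test */
        o->buf[o->count++] = c;
    else
        exception_case(o, c);
}

void select_output(struct out *o) {
    o->selected = 1;
    o->count = 0;
    o->limit = BUFSIZE;          /* un-pin the limit */
}

int demo_putbyte(void) {
    struct out o;
    memset(&o, 0, sizeof o);     /* unselected: limit stays 0          */
    putbyte(&o, 'x');            /* caught by the same test, discarded */
    if (o.count != 0) return -1;
    select_output(&o);
    for (int i = 0; i < BUFSIZE; i++) putbyte(&o, 'a');
    if (o.flushes != 0 || o.count != BUFSIZE) return -2;
    putbyte(&o, 'b');            /* buffer full: slow path flushes it  */
    if (o.flushes != 1 || o.count != 1) return -3;
    return 0;
}
```

The design choice is exactly the one the text describes: the exceptional states are mapped onto a condition the normal path must test anyway, so detection costs nothing extra in the common case.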
This raises the issue of what level is most appropriate to catch and handle an exception; this example shows that it is sometimes practical to catch and handle it at the same level, whereas in later examples we show it is sometimes appropriate to catch it at a lower level and handle it at the level above (as in the Newcastle Connection example of Section 3.3.2.3).

3.3.2. Reduce Handling Costs by Restructuring Programs

The division of the events into two groups for increase in efficiency of detection can be exploited further to make programs more efficient. Suppose the total cost of running a program is given by:

    C_T = Σ_{e ∈ E} C(e) * P(e) + Σ_{n ∈ N} C(n) * P(n)

where C(e) = the total cost of executing an exception event, and C(n) = the cost of executing a normal event. This must be minimised to reduce the average run-time cost. Now as many event-driven programs are driven by a small number of events, we assume that p(n) >> p(e). Thus reducing the cost of handling a normal event, C(n), should reduce the total cost, even if the cost of handling another, exceptional, event is correspondingly increased.

3.3.2.1. Os Example

An example of restructuring for the statistically dominant case to reduce average run-time (due to Cheriton) is a utility program os (for over-strike). Os converts a formatted text file containing backspace characters for underlining and boldfacing to a file achieving the same printed effect using overprinting of whole lines, and containing no backspaces. Like the spell program described earlier, this data-driven utility can be treated as an event-driven program, where the input value of each character read represents an event.
In its original structure, the program used multiple line buffers, one for each level of overprinting on the line. Each character except backspace and line terminators was placed in the highest-level line buffer that was currently unoccupied in the current character position. Thus, its structure reflected the logical operation it was performing, translating a character stream into overprinting line buffers. However, its structure also meant that the character it processed with the least overhead was the backspace character. Given that backspaces constitute about 0.5% of the characters in most text files, this program was structured inefficiently. Details of the program, and the author's analysis, are given in Appendix B.

From the analysis of the event costs, the expected cost of execution of the program is R + 1.25W + 13.71 instructions/event, where R = average number of instructions needed to read a byte and W = average number of instructions to write a byte.

This program does NOT reflect any structuring for the statistically dominant case -- indeed it processes one of the least likely bytes (backspace) most efficiently. Cheriton undertook to rewrite the program. First, statistically speaking, the program is simply a null filter, copying its input to its output unchanged. Starting with this view, the exceptions to recognize are END-OF-FILE, a server-client exception, and a BACKSPACE, a peer exception transparent to the user. This leads to a different program structure, also detailed in Appendix B, with an expected cost of execution of (R + 1.24W + 3.8) instructions/event. If W = R = 4,
a reasonable assumption for UNIX, this new program shows an analytical speed-up of 22.7/12.8, comparable to the speed-up of two times observed.

In the analysis, we find that the processing of normal-case characters changes from 9.85 to W + 3 instructions/event in the new program, and as it is assumed that W = 4, this provides an improvement. Therefore, some of the speed-up is achieved through the more efficient processing of normal characters. The rest is achieved by the much-reduced cost of processing the NEWLINE character -- from (125W + 382) to (W + 3) instructions/event. Here analysis shows that not only is the normal case processing in the restructured program more efficient, but also one of the so-called exception cases, NEWLINE, has been made much more efficient.[13]

This example shows that by restructuring the program to reflect the statistically dominant case rather than the logical flow, it is possible to make major savings in run-time costs.[14] This particular example corresponds to one of Bentley's rules in his book Writing Efficient Code [Bentley 82], called Lazy Evaluation: the strategy of never evaluating an item until it is needed, to avoid evaluation of unnecessary items. In the new version of the os utility, the multiple line buffers for overprinting are not established until they are needed.

A similar technique could be used for a high-level language parser. In most high-level languages, the most likely statement after an assignment statement is another assignment statement. Therefore the parser should hand the code to the assignment-statement handler first once an assignment statement has been found; if the parse then fails, the program should be able to back up and recover.
[13] But at the increased expense of handling the BACKSPACE exception.
[14] But note that if W is greater than 6, it would actually take longer to process normal-case characters. Therefore the cost of the system-provided i/o instructions must always be carefully checked to see whether structural changes would in fact be beneficial.

3.3.2.2. Verex Name-server Example

To improve run-time efficiency of event-driven systems involving multiple processes, the system should be structured to minimise the inter-process communication or message-flow for the statistically dominant case, not necessarily for the logical or functional flow. By doing so, the handling cost of the normal events is reduced.

As an example, consider a system design for a name-server. Any system resource accessed by name is checked by the name-server and passed to the controlling process. Ideally, the name-server would handle access requests to disc files by handing them to the file-server, input-output requests to the appropriate i/o device handler, and so on. However, this is too inefficient for file access (which is the most common resource request by name), so many systems have adopted the solution of two types of naming.

An alternative approach, implemented (by Demco) in the Verex operating system, is to treat named requests for files as normal events and named requests for other devices as exceptional events. It is illustrated in Figure 3.8 below.

[Figure 3.8: Restructuring the Verex Name-server to Reduce Ipc in the Normal Case (original version and new version; key: Send, Forward, Reply arrows).]

All requests for named services made by a
detects exceptions NOT-FILENAME  The  file-server  a n d handles t h e m b y executing a kernel request F o r -  w a r d , w h i c h forwards the message buffer to the name-server as if it came directly f r o m the client.  T h e name-server treats the request as before.  needed to access each file, i n v o l v i n g 2 process switches.  In this way only 2 ipc messages  are  If 90% of n a m e d requests are for files,  for every 10 requests there are 22 process switches — 9 requests are handled by the file server in 2 process switches each, a n d one request is forwarded from the file server to the name server a n d t h e n to the required i / o server, w h i c h makes the R e p l y to the client.  W i t h the  original i m p l e m e n t a t i o n 30 process switches w o u l d be used -- each request passes f r o m the name server to the required i / o server w h i c h makes a R e p l y to the client.  T h e client is unaware of this i m p l e m e n t a t i o n ; hence client requests for n a m e d resources other t h a n file names are treated as request  exceptions  only received file names f r o m the name server). a d d i n g request  exceptions  b y the file server (which previously  T h u s restructuring has been achieved by  to the file server a n d r e m o v i n g the c o m m o n  file-name  requests  from  the name server.  3.3.2.3. N e w c a s t l e C o n n e c t i o n E x a m p l e The another  Newcastle C o n n e c t i o n D i s t r i b u t e d U N I X s i t u a t i o n where  the  system  could  Operating  be restructured  System to  reflecting the statistically d o m i n a n t case rather t h a n the logical  [Brownbridge 82]  i m p r o v e its flow.  is  efficiency b y  In this s y s t e m , before  each system call t h a t uses a file is performed, a check is made, to see whether the file is local or remote.  T h i s check consists of a system call, s t a t , w h i c h returns l o c a l or r e m o t e .  
Local calls are placed unaltered to the underlying kernel for service; remote calls are packaged with some extra information, such as the current user-identifier, and passed to a remote machine for service. Assuming at least, say, 90% of requests are for local files, this implementation uses the server-client exception REMOTE-FILE, detected by the stat server (in UNIX the system calls stat and open act like servers), to control the client's subsequent actions. It introduces the overhead of a stat call whenever a file is accessed by name; this system call takes 1.6 msecs on a VAX 11/750.

An alternative approach, designed by the author, is illustrated in Figure 3.9 below.

[Figure 3.9: Restructuring Newcastle Connection to Reduce Ipc in the Normal Case (original version and new version; key: Send, Reply arrows).]

It eliminates the overhead in the normal case where local file access is required, by assuming all file accesses are local, and by making the client try a local file access first. In this case, the exception event REMOTE-FILE is represented by an additional request exception to the file-access server open. The file-access server detects when the request is of the exceptional form NOT-LOCAL (i.e. possibly remote, or erroneous) and handles it by making a
a d d i n g a request  exception,  REMOTE-FILE  T h i s example also illustrates r e s t r u c t u r i n g by  to a server o p e n w h i c h already has to check for  the exception of the file not being f o u n d locally, a n d b y r e m o v i n g all the c o m m o n requests for local file detection  f r o m the s t a t server.  B o t h these examples illustrate how to reduce the context s w i t c h i n g for the n o r m a l case by a d d i n g a request  normal  exception  for exceptional cases to the server which is used for h a n d l i n g  cases, a n d t h e n d i r e c t i n g all requests first to t h a t server.  technique in these examples  because the normal-case server  It is a particularly good  already h a d to check for the  exceptional cases.  3.3.2.4. T h e T R O F F / T B L / E Q N E x a m p l e T h i s technique can also be used where a p r o g r a m suite consists of a system of m u l t i p l e processes, where o u t p u t f r o m one process is used as i n p u t to another v i a a pipe, such as the U N I X word-processing system w h i c h consists of a suite of programs, T R O F F , T B L a n d E Q N , for various specialised tasks. TROFF typesetter.  is a p h o t o t y p e s e t t i n g language w h i c h formats  text documents for a photo-  It accepts lines of text interspersed w i t h lines of format control i n f o r m a t i o n , a n d  formats the text into a printable, paginated d o c u m e n t h a v i n g a user-designed style. T B L is a d o c u m e n t f o r m a t t i n g preprocessor for T R O F F . T B L turns a description of a table into a T R O F F p r o g r a m t h a t prints the table.  It isolates a p o r t i o n of a j o b that it can  handle (viz. a table) a n d leaves the remainder for other programs, b y isolating a table w i t h delimiter pairs such as T S and T E .  
EQN is a system for typesetting mathematics, and it interfaces directly with TROFF, so mathematical equations can be embedded in the text of a manuscript. This is done by surrounding an equation with delimiters such as EQ and EN.

If a text document contains tables and equations, some of which are embedded in the tables, it is processed in UNIX thus:

    TBL <document> | EQN | TROFF

where | is the symbol for a UNIX pipe, meaning that the output from the program on the left of the pipe is sent as input to the program on the right of the pipe. A pipe is implemented as a large buffer in kernel memory, which saves the overhead of creating a temporary file for the program's output. This provides a structured, modular system which reflects its logical action; the process TBL is a module that passes the next input character to the process EQN, which in turn passes the next input character to the process TROFF.

This way of structuring modules has been found to be successful in many applications. For example, a compiler may produce unoptimised code quickly for test purposes, or the output from the compiler may be passed as input to an expensive optimiser before final code generation for production programs. That is, the test compiler output may be passed as input to a code generator (unoptimised), or the test compiler output may be passed as input to a final code generator (optimised).

However, in many documents most characters require no processing by TBL or EQN, so there is a performance flaw with this design. We now consider how such systems can be made efficient as well as nicely-structured.
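The cost of the pipe structure can be made concrete with a toy accounting, sketched below in C. The per-hop costs are assumptions made purely for illustration (the thesis's own accounting appears in Appendix C): every Send or Reply between two processes is charged one context switch, every line crosses all three processes in the pipe structure, and in a structure where all lines go to TROFF first, only exceptional lines are forwarded onward.

```c
#include <assert.h>

/* Pipe structure: every line crosses client->TBL->EQN->TROFF and the
   replies unwind back, whatever the line contains. */
int pipe_switches(int normal, int eqn, int tbl) {
    int per_line = 6;                  /* 3 Sends + 3 Replies (assumed) */
    return (normal + eqn + tbl) * per_line;
}

/* Forward-based structure: all lines go to TROFF first; only
   exceptional lines are Forwarded to EQN or TBL and fed back. */
int forward_switches(int normal, int eqn, int tbl) {
    int normal_cost = 2;               /* Send + Reply                  */
    int exc_cost = 5;                  /* Send, Forward, feed-back Send,
                                          and the two unblocking Replies */
    return normal * normal_cost + (eqn + tbl) * exc_cost;
}
```

Under these assumed per-hop costs, a 1000-line document that is 90% plain text, 5% equations and 5% tables needs 6000 switches piped but only 2300 with forwarding; the fuller Appendix C model, with its own per-hop costs, yields the 12000 versus 4300 figures quoted later in this subsection.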
The model proposes that the system can be restructured so the inter-process communication is minimised for the statistically dominant case. This can be achieved by treating lines with no equations or tables as normal lines, and lines with tables or equations as exceptional. All lines are sent to TROFF first. TROFF detects when the line is of the exceptional form EQN or TBL and handles it by forwarding the line to the appropriate server, where it is treated as before -- which means that after processing by, say, EQN, it is passed back to TROFF for further processing. Thus the EQN-TROFF relationship is complex: TROFF effectively calls EQN, which in turn calls TROFF again to process its output. To solve this problem of mutual recursion in a feed-back loop, the UNIX designers invented pipes and filters in the early 1970's.[15]

The thesis model proposes that TROFF, EQN and TBL are each structured as simple unbuffered server processes, executing the following loop:

    loop
        Receive(msg, id);            ** from the level above or from EQN or TBL
        process-as-necessary;
        Send(msg, to-server-below);
        Reply(msg, id);              ** to the sender
    endloop;

Normal text is processed only by TROFF. Suppose TROFF encounters an equation. Then TROFF passes it to EQN (in Thoth-like operating systems this may be done by executing a kernel routine Forward, which forwards the message buffer to EQN as if it came directly from the client), and TROFF completes its loop immediately instead of making a Reply to the client.

[15] Note that in the os utility the problem was solved procedurally by constructing line-buffers to hold the intermediate output processed in HandleException. The data was processed by writing in-line code, instead of making a call-back procedure.
The problem with the text-formatting example is that the size of the intermediate buffer for EQN and TBL is not known, and the code of TROFF is too large to write again in-line.

After EQN has processed the data it executes a Send to TROFF for the data to be processed further. TROFF recognizes that this data must not be forwarded; instead it processes it as usual, finally unblocking EQN by making a Reply to it, so EQN can then unblock the client process.

The author's design and analysis of both the pipe system and the restructured design, for a Thoth-like operating system, are given in Appendix C. Analysis shows that for each 1000-line document there are 12000 context switches in the initial design. In the new structure, assuming 90% of the text is normal, 5% is equations and 5% is tables, there are 4300 context switches, a reduction in process switches of nearly 3:1.

If a context switch is equivalent to n instructions, 7700n instructions are saved on processing a 1000-line document, a significant amount in most systems, where a context switch may take half a millisecond.

One could argue that the TBL | EQN | TROFF system could be improved by combining all the programs into one module, as the TEX word processor does [TEX 84]. But multi-process structuring is good design; our approach is to develop techniques to optimise it.

3.3.2.5. Point-to-point Network Protocol Example

In multiple-processor systems, further improvements can be made in run-time by distributing the exception-handling code so it is situated on physically distinct processors. An appropriate technique is to implement each exception handler as a separate process.
An This  allows the n o r m a l case code t o continue w i t h o u t i n t e r r u p t i o n , w i t h a m a x i m u m s a v i n g in r u n - t i m e equal to the cost of the exception h a n d l i n g . T h i s technique can be used to a d v a n tage w i t h handlers for peer exceptions a n d for server-client  exceptions.  82  A n application of this principle is f o u n d in the design of a point to point network protocol.  In some protocol implementations, if the network becomes busy, a busy receiver s i m p l y  discards i n c o m i n g d a t a packets, instead of t r y i n g to handle t h e m b y sending a message such as H O L D , w h i c h w o u l d take up more processor a n d network time.  A s the sender already has  exception code to handle lost packets, the sender will use this mechanism to retransmit some  later  time.  This  may  help the receiver  to reduce the congestion; further,  d y n a m i c r o u t i n g algorithms, this i n d u c e d delay on the channel to the congested  at  in some node will  reduce its preference as a route, further reducing the congestion.  F r o m the view of the m o d e l , this example illustrates the principle of m i n i m i s i n g the cost of h a n d l i n g a critical real-time exception b y m a p p i n g one exception ( R E C E I V E R - B U S Y ) the receiver onto another ( L O S T - P A C K E T ) ,  h a n d l e d at the sender.  at  T h i s shows how peer  protocols can m a p peer exceptions between peer processes. T h i s example also illustrates how an i m p l e m e n t o r c a n choose the event (in this case, the exceptional RECEIVER-BUSY sible.  It  also leads to  event), to be h a n d l e d most efficiently, so that recovery is posa general principle of how exceptions  m a y be h a n d l e d s i m p l y ,  translating t h e m into exceptions that are already h a n d l e d by another m e c h a n i s m .  
by  T h e proto-  col runs o n t w o distinct processes, a n d w h e n one, the receiver, becomes busy, the exception h a n d l i n g is effectively forced onto the other v i z . the sender, thus saving r u n - t i m e at the busy node.  However, most  programs  and  algorithms  are  not  structured  to  exploit such con-  currency, a n d the problems of c o o r d i n a t i o n of exception h a n d l i n g a n d n o r m a l case processing are not yet well u n d e r s t o o d .  O n a single-processor system there are obviously no gains in r u n - t i m e to be made by exec u t i n g the exception handler as a separate process, but there are clear logical advantages, viz:  83.  (1)  good separation of n o r m a l case code f r o m exception code  (2)  context saving of the n o r m a l case is locked in the well-defined state of a process  (3)  experience gained in s t r u c t u r i n g programs this w a y will be useful in designing d i s t r i b u t e d systems w h i c h exploit true concurrency. In the operating system M e d u s a , the exception h a n d l i n g code can be specified as another  process; the b u d d y of the v i c t i m w h i c h incurs the exception.  T h i s feature was described in  Section 2.2.6.3, a n d it was shown to be useful in certain situations, such as in remote debugging-  3.3.3. R e d u c e C o n t e x t - s a v i n g C o s t s i n N o r m a l E v e n t s T h e t h i r d a p p r o a c h to i m p r o v i n g efficiency t h r o u g h d i v i d i n g  event-oriented  programs  into n o r m a l a n d exception events, is to reduce the context saved in the n o r m a l case appreciating that in some critical real-time programs, r a p i d exception h a n d l i n g m a y be needed w h i c h m a y conflict w i t h this objective.  3.3.3.1. R e a d - b a c k E x a m p l e F o r a simple a p p l i c a t i o n of the principle of r e d u c i n g context-saving in the n o r m a l case, consider the os p r o g r a m of Subsection 3.3.2.1.  
In the revised program, the column count on the current line is maintained with every byte read, in order to handle the backspace exception, thus adding the overhead of context saving to the normal-case processing. This amounts to 1 instruction per byte, which equals 6% of the total handling cost of 13 instructions/event. By maintaining this context, the exception handler is quite straightforward to write.

As an alternative, the normal-case context saving could be eliminated by pushing more work onto the HandleException routine, for example, making it read backwards to find the last occurrence of the newline character. This illustrates a tradeoff between maintaining the context up front so that exception handling is relatively simple, or maintaining little or no context and making the exception handling more complex. The context could be automatically and efficiently maintained by the system, if there were a way of specifying it in a program. Such a facility could be used as illustrated in the program fragment given at the end of Appendix B.

3.3.3.2. LNTP Example

A more complex example is now described, showing how improved efficiency may be obtained through reduction of context-saving for exception handling in communication protocols. Communication protocols can be treated as event-oriented programs, each layer of which is typically implemented as a finite state machine making state transitions in response to the messages received from the layers above and below. In communication protocols, much of the code deals with error detection and recovery.
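The finite-state-machine view of a protocol layer can be sketched as follows; the states, events and transitions are invented for illustration and do not come from any particular protocol in this thesis:

```c
#include <assert.h>

/* Toy illustration of a protocol layer as a finite state machine:
 * transitions are driven by events (messages) from the layers above
 * and below.  Note how much of the transition table is devoted to
 * error detection and recovery rather than the normal data path. */
enum state { CLOSED, CONNECTING, OPEN };
enum event { EV_CONNECT, EV_ACK, EV_DATA, EV_ERROR };

static enum state transition(enum state s, enum event e)
{
    switch (s) {
    case CLOSED:
        return (e == EV_CONNECT) ? CONNECTING : CLOSED;
    case CONNECTING:
        if (e == EV_ACK)   return OPEN;
        if (e == EV_ERROR) return CLOSED;   /* error recovery path */
        return CONNECTING;
    case OPEN:
        if (e == EV_ERROR) return CLOSED;   /* error recovery path */
        return OPEN;                        /* EV_DATA: normal case */
    }
    return CLOSED;
}
```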
In long haul networks (LHNs), characterised by a long network delay (low bandwidth) and a high error rate, there is a high overhead on normal case processing (we assume the normal case is when there is no lost or erroneous data) to maintain the context of virtual circuit point-to-point connections, so that errors can be handled without a breakdown of the virtual circuit. For example, the ARPANET protocol TCP/IP [DARPA 81] was designed to provide internet packet transport over error-prone subnets. The services provided include internet address handling, routing, internet congestion control and error reporting, multiple checksums (one at the TCP level and the other at the IP level), segment fragmentation and reassembly, datagram self-destruction mechanisms and service level options to clients. The large, byte-level sequence numbers used in TCP (32 bits) incur considerable processing overhead [Chanson 85a], but are justified in the context of an LHN on the grounds that a large recycling interval allows easy identification of out-of-order and duplicate segments, and that byte-level sequence numbers facilitate the fragmentation of segments at the local IP layer and the intervening gateways, and the reassembly at the remote TCP entity.

In contrast, a single local area network (LAN) is characterised by a low transmission error rate, by single-hop routing and by high channel speed such as 10 Mb/sec, compared with a limiting speed of about 9.6 Kb/sec for a long haul network. In a local area network, the major portion of the packet delay time (the time to process and successfully deliver a packet to its destination) shifts from the transmission delay time of long haul networks to the protocol processing time. Thus the efficiency of the protocols is an important design issue for local area networks. The overheads in maintaining context for correct error handling of a connection-oriented protocol such as TCP/IP are unjustifiably high for a local area network, as the error rates for local area networks are so small (less than 10^-9). The overhead of layered protocol implementation has been described in [Bunch 80]: adding protocol layers adds overhead.

One solution, adopted in [Chanson 85b], in which a virtual circuit is implemented by a very simple protocol for point-to-point data transfer, applies this principle to reduce normal case run-time. This protocol, called LNTP (Local Network Transport Protocol), takes into consideration the characteristics of LANs. LNTP runs under 4.2 BSD UNIX, replacing TCP/IP for local communication (when packets are destined for other networks supporting TCP or some other protocol, the protocol can be implemented at the gateway). Since the majority of the packets in a LAN are for local consumption, this scheme greatly improves the network throughput rate as well as the mean packet delay time.

The fundamental philosophy in the design of LNTP is simplicity. The objective is to reduce the protocol processing time, in addition to improving understandability and ease of maintenance. Thus we set out to design a new protocol that includes only the features strictly required in a single LAN operating environment. Any functions that are needed only on rare occasions, particularly if they can be easily achieved by the application programs, are not included. Therefore, internet congestion and error control, routing, service options provided to high level protocols and certain functions that handle damaged packets found in a typical LHN protocol are not included, as they are irrelevant in a LAN environment. Even checksumming is specified as an option, since the error rates of the communication medium are negligible. The only mandatory error control feature of LNTP is the selective retransmission scheme to handle packet loss due to buffer overflows. Consistent with the characteristics of a single LAN environment, the peer address (16-bit address space) does not include an internet component. Furthermore, since the probability of out-of-order delivery and duplicates is negligible, the sequence number space is made small (4 bits); thus detection of lost-packet exceptions is achieved using very simple sequence numbers.

To reduce the state information required to maintain a connection (which simplifies the control structure, leading to reduced processing overhead), the sender and receiver are logically separated and the send and receive channels are completely decoupled. A consequence of this decision is that a receiver is unable to piggyback control information on reverse data packets to the sender, a feature commonly supported by LHN protocols to conserve communication bandwidth. However, network bandwidth is not a scarce resource in a LAN. Moreover, our measurement results [Chanson 85a] show piggybacking is rare due to the unavailability of a reverse data packet at the right time. Thus, in view of the simplicity and reduced processing overhead, LNTP is asymmetric in send and receive.
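Why a 4-bit sequence space suffices can be sketched as follows; the variable names and receiver interface are hypothetical, not taken from the LNTP implementation. With in-order, single-hop LAN delivery, any arriving packet whose sequence number is not the expected successor modulo 16 signals a lost packet, to be recovered by selective retransmission:

```c
#include <assert.h>

#define SEQ_MOD 16   /* LNTP's 4-bit sequence number space */

/* Hypothetical receiver-side check: since out-of-order delivery and
 * duplicates are negligible on a single LAN, a gap in the modulo-16
 * sequence indicates buffer-overflow packet loss. */
static int expected = 0;

/* Returns 1 if a LOST-PACKET exception should be raised. */
static int check_seq(int arriving)
{
    if (arriving == expected) {
        expected = (expected + 1) % SEQ_MOD;
        return 0;                /* normal case */
    }
    return 1;                    /* gap: request selective retransmission */
}
```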
The flow control mechanism in LNTP maximises the parallel operations of the sender and receiver, and minimises the number of control packets that have to be exchanged. The concept of a threshold in the window space reduces the control traffic (no control before the threshold is exceeded). Because of the deterministic nature of packet delays, a mathematical model can be formulated for a proper threshold to be set as a function of the system parameters to maximise the network throughput rate.

LNTP implements a single logical timer; in contrast, almost all other protocols specify a separate timer for each outstanding packet (though some implementations cut corners in this).

A preliminary implementation of LNTP in the 4.2 BSD UNIX kernel running on a SUN workstation has been made. The protocol was tested in a software loopback mode, and its performance compared to TCP/IP. In UNIX 4.2 BSD, the file transfer rate is increased to 450 kbits/sec compared with 360 kbits/sec for TCP/IP running under identical conditions. This improvement is basically due to the simplicity of the control structure, resulting in lower protocol overhead for LNTP. This design illustrates the reduction of average run-time by reduction of context maintenance specifically for error-handling, and by tuning a protocol's flow control to be most efficient for normal error-free data transfer.

3.4. Examples Illustrating Increasing the Functionality of the System

3.4.1. Using Inter-process Exceptions from Server-client

3.4.1.1. ITI <BRK> Example

An example of the technique for providing incremental increase to a system's functionality was made by the author in implementing the transport layer protocol X.29 [CCG 78], here referred to as Datapac's Interactive Terminal Interface (ITI) protocol. ITI uses the services of the network layer protocol X.25 [CCITT 76] in the Verex operating system. The author designed the ITI protocol as an input/output filter consisting of a reader process and a writer process [Atkins 83b]. This minimal configuration supported normal read/write requests, but did not support the exception event of a <BRK>. The author was able to implement and test this basic system without any exception handling facilities. An exception process for handling <BRK> was then added to the ITI layer and the system retested with the <BRK> exception. This example illustrates implementing a new feature by making the server export an inter-process exception to a loosely-cohesive handler at the client layer. It is described fully in Chapter 5.1. It also shows how concurrency is increased through multi-process structuring.

3.4.2. Ignorable Exceptions

So far, only those exceptions which must be handled for correct operation of the system have been considered. However, certain exceptions, which we call ignorable exceptions, may be used for notification only; if there is no handler to receive the notification, the system still runs correctly, though perhaps with reduced functionality or less efficiently.

3.4.2.1. ITI Reconnection Example

As already mentioned in the previous Section, the author implemented a transport layer protocol ITI which uses the services of the network layer X.25 protocol. It was observed that on several occasions a remote user would lose his connection, and would immediately log on again, only to find that the program he was working on had been destroyed because of the broken virtual circuit. It therefore seemed desirable to provide an ITI implementation and session-layer service which would hide a failure in the underlying network from the application layer until either a timeout fired, or the same user logged in again. In the latter case, the application program will be available to the user just as if no disconnection had occurred (except for a brief RECONNECTED message). The author decided to exploit an ignorable inter-process exception from the X.25 server to the client ITI to provide this incremental increase to the system's functionality.

As already discussed, the ITI layer already had a separate exception process to handle the <BRK> exception. The author decided to use the same exception process to handle the new feature which would achieve reconnection as shown above, by exploiting an ignorable inter-process exception from the X.25 layer below. The ignorable inter-process exception was implemented as a non-blocking Reply from the X.25 server to the ITI exception process whenever the network connection was lost. The exception process was modified to handle the reconnection. The technique used is detailed in Section 5.1.5.
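The shape of such an ignorable exception can be sketched as follows. This is a toy single-process model with hypothetical names; the real mechanism is a non-blocking Reply from the X.25 server to the ITI exception process (Section 5.1.5). The notification is delivered if a handler exists and silently dropped otherwise:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of an ignorable 1:1 inter-process exception:
 * if no handler is registered, the notification is simply discarded
 * and the system continues to run correctly. */
static void (*reconnect_handler)(void) = NULL;
static int reconnections;

static void on_reconnect(void) { reconnections++; }

/* Called by the X.25 layer when the virtual circuit breaks. */
static void notify_connection_lost(void)
{
    if (reconnect_handler != NULL)
        reconnect_handler();  /* handled: hide the failure, reconnect */
    /* else: ignorable -- no handler, notification dropped */
}
```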
This is a powerful example of increasing the functionality of a client/server system incrementally by using ignorable 1:1 inter-process exceptions.

3.4.2.2. Window-size-change Example

Consider how a window management program could use an exception mechanism to handle a WINDOW-SIZE-CHANGE notification. If no handler exists for this exception, the window manager takes no action for line size truncation or expansion. If an exception handler exists to handle this notification, a more intelligent action such as line wrap-around may be taken. This illustrates increasing a program's functionality through the use of 1:1 ignorable inter-process exceptions.

3.4.2.3. Hash Table Example

In many software systems, a program may be made more efficient by being able to invoke an exception mechanism which runs an alternative algorithm under certain exceptional situations. If no exception mechanism is present, the program still runs correctly, but more slowly. As an example, consider a program which implements a hash table. If the hash table entries get too long, say greater than 100 entries, an information signal is sent to the caller of the routine. If the caller wishes to handle such a signal, it can call an alternative algorithm (i.e. an exception handler) for hashing the table entries, which will reduce the table entries to a reasonable number, allowing subsequent calls to proceed more efficiently. If no alternative algorithm is available, the notification signal is simply ignored by the caller. This approach could be used in a compiler implementation.

3.4.2.4. Slow Graph Plotter Example

Another example where an alternative algorithm could be useful would be for a slow graph plotter. If a large plot were requested, and there was already a long backlog of work, an exception notification could be made to the caller.
The caller could then invoke a time-consuming algorithm to optimise the plotter steps for that job, thus reducing the plotting time estimate. If the caller provided no alternative algorithm, the plot would still be correct, but would take a long time.

This approach could be extended in a distributed system so that alternative algorithms for calculating the plotter steps would all execute concurrently. When the plotter became available to plot this job, the algorithm which had completed with the fewest steps would be chosen, and the other algorithms would be aborted.

3.4.2.5. Mesa File-server Example

The Mesa file server described previously in Chapter 2 illustrates how a program that provides certain basic default facilities may be subsequently enhanced by using notification to an exception handler. The program runs to a minimal specification without the exception mechanism, and may provide further functionality by using the exception mechanism. If the client does not provide a handler for the PleaseRelease notification, the notification is simply ignored. Otherwise, the client provides increased functionality by attempting to release its files. This is an example of increasing the cooperation between clients by using ignorable broadcast inter-process exceptions.

CHAPTER 4
Program Paradigms for Implementing System Dependent Features

This Chapter presents program paradigms employing the design principles described in the general model for two example problems. The program models are given for different operating systems, and in different high level languages. High level languages are discussed because concurrent programming is the basis of both applications and systems, and one wishes to see a uniformity through all levels of the system. Many designers favour a language-based approach to distributed operating systems, with the aim of providing uniform access to user and kernel supplied resources, both locally and remotely. For example, programmers at the University of York in England have decided to use ADA [USA 80] to implement a distributed UNIX, and they will only allow users ADA-like tasks, not UNIX-like processes.

The model, shown in Figure 3.3, proposes designs for achieving 3 objectives: viz. A, for reducing the average run-time costs; B, for reducing exception handling costs; and C, for increasing program functionality. For all these objectives, a general-purpose and efficient inter-process exception mechanism is necessary. For objective A, in a multi-process environment, it is important to ensure that the inter-process exception notification costs do not overwhelm the normal case computation costs. Minimising the inter-process exception notification costs is a declared part of objective B (see Figure 3.5). And for achieving incremental increase of a program's functionality, a general purpose inter-process exception mechanism is essential (see Figure 3.6). Mechanisms to achieve this are system dependent, so a general model cannot easily be described. Instead, a set of program paradigms is presented for achieving inter-process exception notification.

The 1:1 inter-process exception notification is illustrated by a problem which often arises in concurrent programming involving exception notification of asynchronous events -- that of a window manager process which multiplexes i/o from 2 or more devices simultaneously -- the so-called KEYBOARD-MOUSE problem. The model also proposes the use of ignorable exceptions for increasing a system's functionality; again, the implementation of ignorable 1:many inter-process exception notifications is system dependent, so a set of program models for exploiting such exceptions is described. The problem chosen is that of a storage allocator client/server system, where the cooperation between the clients is enhanced by the server's ignorable, broadcast inter-process exceptions.
Finally, this Chapter discusses program models for achieving mutual exclusion between processes, as it is often logically advantageous to use a separate exception handler process to receive ignorable signals synchronously, rather than use a procedure in the client process, to increase concurrency and to emphasise the independence of the exception handling code from the normal case code. The implementation of mutual exclusion is system dependent, and several program paradigms are given for implementing the mutual exclusion necessary when exception handlers are implemented as separate processes. We show that atomic transactions are helpful for implementing exception handlers as separate processes.

4.1. Program Paradigms for Inter-process Exception Notification

4.1.1. The KEYBOARD/MOUSE Example

Efficient and general inter-process exception notification mechanisms are needed to reduce exception handling costs in multi-process systems, and also to distribute exception handling by using multiple processes, where the invoked process (the exception handler) is not necessarily related to its invoker. The example problem is concerned with synchronising simultaneous input from two or more input devices, such as a window manager process which multiplexes input/output by reading input from a keyboard and a mouse simultaneously. The problem arises in ensuring that an exception such as <BRK> is processed within a critical real time, whilst other i/o events are processed as expediently and fairly as possible. Furthermore, it should be possible for the user to control the arrival of input from the various i/o devices.
For example, when no mouse input is required or expected, the window manager does not wish to receive input from the mouse whenever it is accidentally moved, as processing such messages could slow down the system considerably. Thus the keyboard input may be considered as normal-case, and mouse input as exceptional.

A new solution for this problem, derived from the model, is described for Thoth-like operating systems. We then discuss how modern concurrent programming languages allow one to model these designs, within the framework of the three types of concurrent programming languages described by Andrews and Schneider in their survey Concepts and Notations for Concurrent Programming [Andrews 83]; viz. monitor-based languages, message-oriented languages and operation-oriented languages.

4.1.2. Exception Notification in a Synchronous Message-passing System

In the synchronous message-passing environment of Thoth-like operating systems, asynchronous input-output can be readily modeled by a window manager with 2 workers, KEYBOARD (KB) and MOUSE (i.e. one worker for each kind of asynchronous event). This is shown in Figure 4.1 below.

Figure 4.1. Thoth Server Notification Using Two Worker Processes

The device server X at the level below demultiplexes the input and makes a Reply to the appropriate worker. The window manager receives notification fairly when input is ready on each device, provided the operating system is fair in its scheduling. Thus the problem of scheduling has been pushed down to the level below the server. For a message-based system, it is not easy to provide an efficient way to control the possible input devices selected.
For example, if the mouse reader reported each move to the window manager, including accidental moves from the user, there would be 2 process switches for each move, which would slow the system down considerably.

We propose a solution to this problem, in which the window manager provides a switch for its clients to toggle on or off via a simple request. The window manager transmits this request to the level below, say (server) process X. This is illustrated in Figure 4.2.

Figure 4.2. Thoth Switch Notification

Let us call the client requests to toggle the switch PleaseNotifyMouse and NoNotifyMouse. The server X must maintain a toggle for each input device. Whenever X has input, X checks its toggle for that input device (initially all are on). If the toggle is on, X demultiplexes the input and replies or queues it for the appropriate reader as before. After the client makes a NoNotifyMouse request to the window manager, the toggle for mouse input in X is changed to off. When the toggle is off, X simply discards its input data from that device. Only when the client makes a PleaseNotifyMouse request to the window manager will the mouse data be sent to the level above. The masking of the mouse data can be pushed as low down as required, with appropriate additions for the toggle at the client-server interface at each level.

This solution has been implemented by the author in another context, that of reading <BRK> interrupts from a remote user over the X.25 network. In that context, the toggle had to be reset after every data input, causing an overhead of 2 extra process switches on every normal case input.
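Server X's toggle logic can be sketched as follows; the device numbering, counters and function names are hypothetical:

```c
#include <assert.h>

/* Hypothetical sketch of server X's per-device toggle:
 * PleaseNotifyMouse / NoNotifyMouse requests from the client flip
 * the toggle; while it is off, X discards mouse input instead of
 * waking the window manager, saving two process switches per
 * accidental move. */
enum { KEYBOARD = 0, MOUSE = 1, NDEVICES = 2 };

static int toggle[NDEVICES] = { 1, 1 };   /* initially all on */
static int delivered[NDEVICES];           /* items passed upward */

static void set_notify(int dev, int on) { toggle[dev] = on; }

static void input_arrives(int dev)
{
    if (toggle[dev])
        delivered[dev]++;   /* reply/queue for the appropriate reader */
    /* else: toggle off -- discard the data from that device */
}
```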
The author deemed it too inefficient a solution, and designed another solution for this problem employing an inter-process exception from server to client, which is described fully in Section 5.1. Thus the toggle switch tool provides an efficient and convenient mechanism for masking asynchronous events, provided that the switching is performed only rarely with respect to the receipt of normal input items.

4.1.3. Exception Notification in Procedure-oriented Concurrent Languages

In procedure-oriented (or, equivalently, monitor-based) languages such as Mesa [Mitchell 79], used to implement the single-user operating system PILOT [Lampson 80], and Modula-2 [Wirth 82], used for developing a real-time operating system called MEDOS on the Lilith computer, monitors were designed to provide a very cheap method of inter-process communication [Hoare 74]. In the Modula and Mesa languages the monitor approach often necessitates that a process finishing with a resource must explicitly notify (or signal) other processes waiting for it, so they will be awakened. These other processes are said to be waiting on a condition variable. Such use of a condition variable, or global flag, is unstructured and difficult to program correctly. The notification is easily forgotten, as the notify is not an essential part of the releasing process's action; one might even criticise this as being a violation of the abstraction of the process -- such implementation details should not be the concern of the programmer at that level, but should be hidden by being implemented at the level below. To overcome these difficulties, the Mesa implementors placed a timeout on every condition variable so that missing notify signals would not block processes for ever. An experiment which increased this system-set timeout from 200 msecs to an effectively infinite value revealed that many such notifications were missing [Levin 82]. Further difficulties with the interaction of monitors, processes and exception signals have been described by Lampson in [Lampson 80].

Now a notify signal in a monitor will only wake up processes which are already waiting on that condition variable; it cannot be used for general-purpose inter-process communication. In Mesa, for inter-process exceptions the only signal is ABORT. Thus there are problems in using monitor-based languages for distributed systems, where exception communication is required in general between unrelated processes. However, a program model to achieve this in Mesa is described in Section 4.2.2.

Wirth discusses the problem of multiplexing asynchronous input from a keyboard and a mouse in his book on Modula-2 [Wirth 82]. He suggests that the user execute a BusyRead which, in contrast to the conventional Read, does not wait for the next keystroke on the keyboard, but instead immediately returns a special value if no character is found. The code for detecting whether input has occurred consists of a loop within which is a test to detect whether a character has just been read and the mouse has moved, followed by a call of BusyRead which returns either a character from the keyboard or the special value. Such a Busy-Wait loop is clearly only practical on a dedicated single-user processor, as cpu cycles are unavailable for other work. Of course, one could then resume a background process after these tests. A better approach is to have a multiplexor which polls for the various events, run once every clock tick. The Macintosh uses this technique of one consolidated event queue, managed by the operating system. The user executes get-next-event and pauses there until an i/o event occurs [Macintosh 84]. Thus we observe that in procedure-oriented concurrent languages, one software tool useful for inter-process exceptions is an event queue managed by the operating system.

4.1.4. Exception Notification in Message-oriented Concurrent Languages

Andrews and Schneider define message-oriented high level languages as those which provide send and receive as the primary means for process interaction, and they assume no shared memory between processes. Message-oriented languages such as CSP [Hoare 78] and PLITS [Feldman 79] are intended for developing multi-user operating systems. The author observed that the synchronous Send-Receive-Reply inter-process communication primitives of Thoth were not mentioned in the survey's message-passing synchronization primitives; that omission was remedied in a subsequent letter to Surveyor's Forum [Atkins 83c].

It has already been shown how the synchronous Thoth-like environment may be used to model the problem of multiplexing asynchronous input from two or more devices; however, Thoth processes on the same team may share memory. We must therefore ensure that the problem can be modelled without the use of shared memory, if this program design is to be applicable to a high level language such as PLITS or CSP.
T h e m a i n use of shared m e m o r y is to save c o p y i n g d a t a , but d a t a can obviously be transferred by messages if necessary. synchronization between processes  T h e second use of shared m e m o r y is in i m p l e m e n t i n g  on a t e a m ; this can be achieved b y a semaphore.  Hence  the solution o u t l i n e d above for T h o t h will work directly in message-oriented languages where no shared m e m o r y is available. Nelson remarks in his thesis o n Remote  Procedure  Calls  [Nelson 81] t h a t the need for  asynchronous exceptions between processes has long been recognised a n d i m p l e m e n t e d in the message-passing  world: P L I T S ,  asynchronous exception.  M e d u s a a n d R I G all provide some m e t h o d of  inter-process  H e continues, by n o t i n g that the R I G implementors, in particular,  found this c a p a b i l i t y essential for designing reliable systems.  However, it is shown here that  100  the  toggle  switch  mechanism  provides  notification tool for message-based  a  suitable  synchronous  systems; thus asynchronous  inter-process  exception  notification is not absolutely  essential for good performance.  4.1.5. Exception Notification in Operation-oriented Concurrent Languages O p e r a t i o n - o r i e n t e d languages provide remote procedure call as their p r i m a r y means for process  interaction.  processor  systems,  A D A [ U S A 80] are  examples  and SR  of s u c h  [Andrews 82],  languages.  A s in procedure-oriented  operations are performed on an object b y calling a procedure. of an operation synchronizes w i t h the so-called caretaker t i o n is executed.  In A D A this s y n c h r o n i z a t i o n is called a  A rendezvous presents  intended for designing m u l t i languages,  T h e difference is t h a t the caller  t h a t implements it while the opera-  rendezvous.  
problems for general inter-process communication as the server-task can only be communicating with one active process at a time, allowing for no asynchronous inter-process notification. For inter-process exceptions in ADA only the FAILURE signal can be given, so this mechanism cannot be used for general notification. A server process in ADA can multiplex calls by nesting accept statements that service the calls. This technique is not yet widely used, and as Andrews and Schneider point out, implementation of some of the concurrent programming aspects of ADA is likely to be hard.

SR (Synchronizing Resources) [Andrews 81,82] also uses the rendezvous form of the remote procedure call. However, both asynchronous and synchronous message passing are supported, and Andrews has achieved successful implementations of servers in SR using these operations. Only time will tell if the lack of specific software tools in ADA for inter-process exceptions will hinder its growth as a software language.

4.1.6. Exception Notification in Distributed Systems

In distributed operating systems where the user process may be located on a different host to the one the terminal is connected to (e.g. during a remote login session), a mechanism must be available to divert the asynchronous interrupt to the remote host, as one of the special difficulties with asynchronous interrupt inputs in a distributed system is the lack of shared memory. This means that a hardware interrupt on one processor cannot force a transfer of control to a process executing concurrently on another processor -- some kind of software interrupt facility is needed.

In communication protocols such as X.
25 [CCITT 76], a special out-of-band message (an interrupt packet) is used to transmit such priority information to the remote host, and this must be received as soon as possible by the target process. An interactive terminal interface protocol is used on the remote host to simulate the terminal; however this may run at a much higher level than the local terminal server. Special facilities may therefore be required to achieve the desired effects on <BRK>. The author designed a mechanism, discussed in the ITI network protocol example in Chapter 5, to solve this problem in both Verex and UNIX.

4.2. Program Paradigms for Ignorable 1:many Exceptions

4.2.1. The Storage Allocator Example

In his thesis Levin proposes an important, interesting mechanism for inter-process exceptions, by designing into a high level language a means for the owner of a shared abstraction (typically a server) to notify ALL (or some number) of its clients that an exception has occurred. He calls such exceptions on a shared object structure class exceptions. All the processes that have used, and are still using, the service may receive exception signals, as specified in the server's functional description. Thus a module exports an exception to its users. This feature was not implemented by Levin, but we show here that it can be implemented, and that it could be extremely useful, particularly in distributed operating systems.

We develop several program paradigms based on the example problem of a storage allocator which may detect that its resources, while adequate to meet the current request, are running low. Such a POOL-SCARCE exception condition could be usefully propagated to all contexts where resources might conceivably be made available to the allocator. Yet there is no reason to suspend the current allocation request while the condition is being handled; the two actions are logically independent and should proceed (conceptually at least) in parallel. Hence the exception POOL-SCARCE is described as an ignorable asynchronous broadcast exception in our classification.

The allocator may also find it impossible to satisfy a resource request, and may raise a POOL-LOW exception with its clients, expressing its urgent need for storage. Naturally, the request will remain pending while this exception is being processed. POOL-LOW is also classified as an ignorable asynchronous broadcast exception. The allocator, even after raising POOL-LOW as above, may not possess adequate resources to satisfy the request. In this case, it must raise a POOL-EMPTY exception, signifying its inability to meet the requestor's demands. POOL-EMPTY cannot be ignored by the client making the request; thus POOL-EMPTY is classified as a 1:1 synchronous exception which must-be-handled. Levin's proposed allocator is shown in Figure 4.3 below.(1)

(1) We have altered the notation from Alphard [Wulf 76] to a Mesa-like notation in which Levin's condition variables are replaced by exception variables, to avoid confusion with monitor condition variables.

StorageAllocator: MODULE =
begin
  p : pool;
  free : integer;
  exception : POOL-LOW policy sequential-conditional;
  exception : POOL-EMPTY policy broadcast;
  raises POOL-LOW on pool, POOL-EMPTY on Allocate;

  Allocate: ENTRY PROCEDURE (p: pool, size: integer) returns (d: descriptor) =
  begin
    P(outer); P(inner);
    if p.free < size then
    begin
      V(inner);
      raise POOL-LOW until p.free >= size;
      P(inner);
      if p.free < size then
      begin
        V(inner); V(outer);
        raise POOL-EMPTY;
      end;
    end;
    p.free := p.free - size;
    < create descriptor d for segment of amount size >
    V(inner); V(outer);
  end;

  Release: ENTRY PROCEDURE (p: pool, d: descriptor) =
  begin
    P(inner);
    p.free := p.free + < size of segment referenced by d >
    < release space associated with d >
    V(inner);
  end;
end;

Figure 4.3 Levin's Storage Allocator

Because Levin's so-called sequential-conditional selection policy (described in detail later) could cause deadlock if the pool module is implemented as a monitor, Levin achieves mutual exclusion and condition synchronization using the two semaphores, inner and outer. The function of these semaphores is explained in detail by Levin, and also
1  Because  Levin's  so-called  sequential-conditional  selection  policy  (described in detail later) could cause deadlock if the pool module is i m p l e m e n t e d as a m o n i -  1  We  have altered the notation from Alphard [Wulf 76] to a Mesa-like notation in which Levin's  condition variables are replaced by exception variables to avoid confusion with monitor condition variables  103  StorageAllocator: M O D U L E = begin pool; P integer; free POOL-LOW policy sequential-conditional; exception exception POOL-EMPTY policy broadcast; raises POOL-LOW on pool, P O O L - E M P T Y on Allocate;  Allocate: E N T R Y P R O C E D U R E (p:pool, size: integer) returns (ddescriptor) = begin P(outer); P(inner); if p.free < size then begin V(inner); raise POOL-LOW until p.free > = size; P(inner); if p.free<size then begin V(inner); V(outer); raise POOL-EMPTY; end; end; p.free := p.free - size; < create descriptor d for segment of amount size > V(inner); V(outer); end; Release: E N T R Y P R O C E D U R E (p:pool, d: descriptor) begin P(inner); p.free := p.free + < size of segment referenced by d > < release space associated with d > V(inner); end;  end; Figure 4.3 Levin's Storage Allocator  tor, L e v i n achieves m u t u a l exclusion a n d c o n d i t i o n synchronization using the two semaphores,  inner  a n d outer. T h e f u n c t i o n o f these semaphores is explained i n detail b y L e v i n , a n d also  104  by Black in [Black 82]. The author proposes a new approach which is implementable in existing systems, both monitor and message-based, by following the design techniques specified in the model. Program paradigms are presented for these implementations in Mesa, a language supporting monitors, and in Verex, a message-based operating system. The storage allocator is very simple. Allocate accepts a pool and size as parameters, obtains sufficient storage from the pool and returns a reference to it. 
If adequate storage is not available to do this, the POOL-LOW exception is generated. It is raised in turn with the users of the pool until either adequate resources become available, or there are no more handlers. If the handlers do not succeed in releasing enough store the POOL-EMPTY exception is generated. The Release operation is straightforward.

A pool user has the form described in Figure 4.4 below. The pool user is prepared to handle POOL-LOW at any instant. Except during the time when segments are being added to m, POOL-LOW is handled by the squeeze procedure. But during critical data-structure updates to m, invocation of squeeze is inhibited by a dummy (masking) handler for POOL-LOW that does nothing; in this situation, the POOL-LOW exception is lost.

Levin notes the differences in synchronization and communication requirements among the structure class exceptions POOL-SCARCE, POOL-LOW and POOL-EMPTY and their handlers, and proposes a new mechanism to solve the allocator problem. He defines a selection policy for each exception, and describes three such policies: sequential-conditional, broadcast-and-wait, and broadcast.

Under the broadcast-and-wait policy, all eligible handlers are invoked in parallel. When all handlers thus initiated have completed, execution resumes immediately following the raise statement (like Mesa's SIGNAL).
Thus, while all eligible handlers execute in parallel (concep-  105  Userpool : M O D U L E = begin p : pool; m : hairy-list-structure; exception : NOSOAP policy broadcast; raises N O S O A P on somefunc; somefunc: P R O C E D U R E (<info>) begin d l , d2 : descriptor; d l := Allocate (p,amtl) [POOL-EMPTY: raise NOSOAP; return]; d2 := Allocate (p,amt2) [ POOL-EMPTY: Release (dl); raise NOSOAP; return]; < fill in d l and d2 > begin add-to(m,dl); add-to(m,d2); end [POOL-LOW: ] end; squeeze: P R O C E D U R E () begin < perform data-dependent compaction of m using Release(p,d) to release any elements d removed by compacting m > end; end [POOL-LOW: squeeze() ] Figure 4.4 A User of Levin's Storage Allocator  tually at least), t h e signaller is suspended until they have all completed.  Reliable Broadcast Send.  T h e server,  W e will call this a  or the operating system s u p p o r t i n g such  servers,  needs a list of all its clients as it must delay until all the handlers are done. In his example, L e v i n does n o t use this policy f o r raising the P O O L - L O W exception - instead he uses the sequential-conditional the  raise  statement.  policy.  T h i s policy has a predicate associated w i t h  T h e predicate is evaluated, a n d , if T R U E , execution continues following  106  the  raise  statement.  If the predicate is F A L S E , an eligible handler is selected a n d i n i t i a t e d .  W h e n it completes, the predicate is again e v a l u a t e d .  2  The  raise  statement terminates when  either the predicate becomes T R U E , or the set of eligible handlers is exhausted.  T h u s in his  example, the exception is raised only as m a n y times as needed to satisfy the current request - it is therefore more efficient for any single request.  However, in m a n y situations, if the pool is  low for one request, there is a high p r o b a b i l i t y that the next request will also find the pool low. 
It would seem more advantageous to use the broadcast-and-wait policy, so that all the clients will squeeze their data. This may save subsequent shortage, in the same way that page-look-ahead attempts to reduce page faults by anticipating subsequent requests. But the server then has to wait for all the handlers to complete before it can continue; this can delay the current request unnecessarily if, say, the first handler to complete released sufficient resources for satisfying the current request.

Levin also proposes a variation of broadcast-and-wait, simply called broadcast, which relaxes the completion requirement. All eligible handlers are initiated in parallel but the signaller does not wait for any of them to complete. It merely continues execution following the raise statement. Thus all handlers and the signaller execute in parallel. Levin's formal proof rule for this policy requires that the signaller and handlers act nearly independently. These are exactly the semantics he intended -- the broadcast policy should normally be used to report to users of an abstraction a purely informational condition. (He notes that the proof rules can be relaxed somewhat, but then we risk serious interactions with parallelism and synchronization mechanisms.) This selection policy could be used when the storage falls below a certain threshold; the server sends a POOL-SCARCE exception to a broadcast communication channel to which all clients subscribe. On receipt of the notification, each client takes appropriate action to release excess storage.

(2) making sure each time a different handler is selected

This selection policy, unlike the others, does not require that the server keep a list of client handlers, as the signaller acts independently of the handlers. If the signaller broadcasts to all subscribers, and some do not receive the message, the signaller is not directly affected. Thus we can use a so-called Unreliable Broadcast to implement Levin's broadcast policy, and the interesting feature is that it can be implemented without a centralised list of subscribers at the signaller. Instead, in a distributed system, all the system needs is to guarantee to try to deliver the message to all hosts in the system. Hence the server does not have the burden of being responsible for maintaining a list of clients' handlers, nor is it concerned with garbage collection of clients which are no longer interested in the exception.

As several local area networks provide an efficient broadcast feature, we would like to exploit it for such broadcast exceptions. An unreliable broadcast has been implemented by Cheriton for the V-kernel [Cheriton 84], and a multicast feature has recently been implemented in UNIX 4.2BSD [Ahamad 85]. But to use such an unreliable broadcast in this problem, we first need to show that we can redesign Levin's storage allocator to exploit a broadcast signal, instead of the broadcast-and-wait signal which he used. We then discuss how we could use the Medusa, Mesa and V-kernel operating systems as virtual machines to provide tools to support this high level language construct.

4.2.2. Storage Allocator Using Unreliable Broadcast Notification

Black discusses various alternatives to Levin's storage allocator in his thesis (pp 120-129), and proposes a solution in CSP [Hoare 78] which uses a reliable broadcast from the server to a list of its clients. His solution persists, in that if an Allocate request cannot be
His solution persists,  in his thesis (pp  w h i c h uses a reliable broadcast in t h a t if a n  Allocate request  120-  f r o m the cannot  be  108  satisfied immediately it continues to signal the P O O L - L O W exception to all the user processes until enough store is available. but  Release  timeout  D u r i n g this time no further  requests will be handled correctly.  is necessary so that the  Allocate  requests are  accepted,  W e consider this an undesirable solution; a  client will not be suspended for arbitrarily long periods  should no client be ready to execute  Release.  W e propose using monitors, c o n d i t i o n variables, a n d their associated  WAIT  and SIG-  N A L operators w h i c h were designed [Hoare 74] for h a n d l i n g synchronization a n d d a t a access in such situations to i m p l e m e n t the allocator.  N o w L e v i n a n d B l a c k dismiss the notion of  using a monitor for achieving the necessary features because of deadlock.  cate  If a call of  generates a P O O L - L O W exception a n d a client handler attempts to call  handler will be suspended awaiting completion of the  Allocate  cannot complete u n t i l at least one of the handlers completes. because of the semantics of L e v i n ' s sequential-conditional continue until an eligible handler has completed.  request.  Allo-  Release,  However,  the  Allocate  T h i s p r o b l e m arises specifically  policy, whereby a signaller can not  T h i s problem of m a n a g i n g resources in m o n -  itors was already recognized, a n d has been described b y L a m p s o n a n d Redell in their implem e n t a t i o n of P I L O T [ L a m p s o n 80] in M e s a .  T h e y resolved it b y use of a W A I T on a c o n d i -  t i o n variable w h i c h releases the m o n i t o r lock. Our icy,  solution requires that the P O O L - L O W exception obeys the broadcast  because then a m o n i t o r can be used for m a n a g i n g the storage p o o l .  
t r a y e d in F i g u r e 4.5, where a Mesa-like c o n d i t i o n variable  selection pol-  T h i s solution is por-  MoreAvailable  with an  associated  timeout is u s e d . 3  3  Note that the semaphore inner is not needed here, as the monitor protects the critical data from concurrent access.  109  StorageAllocator: M O N I T O R = begin p : pool; free : integer; exception : POOL-LOW policy broadcast; exception : POOL-EMPTY policy broadcast; condition variable : MoreAvailable, Squeezedone; Boolean : REQUEST-PENDING = FALSE; Allocate: E N T R Y P R O C E D U R E (p:pool, size: integer) returns (ddescriptor) = begin if REQUEST-PENDING then WAIT Squeezedone; | P(outer) if p.free < size then begin REQUEST-PENDING = TRUE; raise POOL-LOW; while p.free<size do begin WAIT MoreAvailable; if <timeout on MoreAvailable fired> then begin raise POOL-EMPTY; REQUEST-PENDING = FALSE; N O T I F Y Squeezedone; | V(outer) return; end; end; end; p.free := p.free - size; < create descriptor d for segment of amount size > REQUEST-PENDING = FALSE; N O T I F Y Squeezedone; | V(outer) end; Release: E N T R Y P R O C E D U R E (p:pool, d: descriptor) begin p.free := p.free + < size of segment referenced by d > < release space associated with d > N O T I F Y MoreAvailable; end; end; Figure 4.5 Storage Allocator Using a Monitor and Broadcast Exceptions  110  In our p r o g r a m p a r a d i g m , the nal  to  all its  clients  MoreAvailable  when  Allocate  storage  is  procedure raises the P O O L - L O W exception sig-  inadequate,  and  then  c o n d i t i o n variable, w i t h an appropriate timeout.  m o n i t o r lock, thus enabling handlers to execute the pool, a N O T I F Y signal on the  Release.  MoreAvailable  cedure can then t r y again to satisfy the request.  squeezing are delayed on the c o n d i t i o n variable  a  WAIT  The W A I T  on  the  releases  the  W h e n e v e r some store is released to  c o n d i t i o n is executed.  
If the timeout fires,(4) the Allocate procedure can raise the POOL-EMPTY exception and return as before. Allocate requests made during squeezing are delayed on the condition variable Squeezedone, which acts like the semaphore outer in Levin's model.

(4) remember we do not have a list of clients, so we cannot tell when ALL the handlers have completed

Now that we have designed a storage allocator monitor which uses a notification broadcast to all its clients, we develop program models for its implementation in various languages and systems.

4.2.3. Storage Allocator in Mesa

The allocator example as it stands is almost exactly implementable in Mesa, except for the problem of raising the broadcast exception POOL-LOW. We have already seen one solution in the Mesa cooperating files example, where each client provides a procedure parameter to the server. The server invokes each client's procedure in turn until enough storage has been released. Unfortunately this requires the server to maintain a list of clients and their call-back procedures.

The alternative in our program paradigm is for each client to WAIT on a new condition variable PoolLow which is signalled by the server, when the exception is detected, with a
The  i m p l e m e n t a t i o n of c o n d i t i o n variables  in M e s a still involves a queue of w a i t i n g  processes to be held in the m o n i t o r , but this list is m a i n t a i n e d by the operating system rather t h a n by the monitor's  programmer.  Software tools for efficient i m p l e m e n t a t i o n of this system are required for the cooperating processes, so that when a client process dies, its peer exception handler dies also, a n d is (eventually) made  removed from the monitor's queue.  only when (or  receive a r b i t r a r y died.  T h i s process death notification s h o u l d be  if) the c o n d i t i o n variable was  notified, otherwise  the  monitor w o u l d  u n w a n t e d interrupts f r o m the operating system whenever a client process  Furthermore,  c o n d i t i o n variables as specified b y Hoare w o u l d not be adequate, as the  M e s a c o n d i t i o n variables are i m p l e m e n t e d w i t h timeouts a n d the signal o n a c o n d i t i o n v a r i able signifies a notification hint ~ operation.  this i m p l e m e n t a t i o n of monitors is required for correct  It is not clear how efficient such a mechanism could be in M e s a , p a r t i c u l a r l y when  some clients are executing on remote machines. the implementors of the  PleaseRelease  P r e s u m a b l y the overheads are quite high, as  cooperating files d i d not use this a p p r o a c h .  However,  the m a c h i n e r y is present a n d c o u l d be used.  T h e r e is just one p r o b l e m w i t h this scheme for synchronous notification, a n d that is in ensuring the m u t u a l exclusion of access to shared d a t a of the client a n d the exception process while it is squeezing the d a t a .  
A t o m i c actions, or shared m e m o r y a n d guaranteed progress of  112  a process until it voluntarily relinquishes the cpu (such as for Thoth team-members) would be sufficient to provide mutual exclusion; various designs are discussed later in Section 4.3.  4.2.4. S t o r a g e A l l o c a t o r in M e d u s a  Medusa defines  external  condition  reports  which are based on Levin's proposed structure  class exceptions, which can be raised on shared objects. An external report is raised by any activity with access to a shared object which incurs an  object  exception.  For each shared  object in the system there are 8 classes of object exception, some of which are signalled by the utilities (e.g. parity failure may occur on memory pages), and others may be defined and signalled by user activities. When an object exception is signalled, all other activities with access to the object receive an external report of the exception. These external reports are made using  flagbox  objects which are managed by Medusa's exception reporter utility. Each  activity can read the flags synchronously in the flagbox when it wishes, or it can request to be interrupted when a particular flagbox becomes non-empty. In this latter mode, an external report is similar to a UNIX  software  signal  interrupt.  The major disadvantage of the Medusa external mechanism is that backpointers are needed from every shared object to its users. As we have already shown, such tight coupling is neither necessary nor desirable; in many applications an unreliable broadcast notification from the server to its (unknown) clients is sufficient.  4.2.5. S t o r a g e A l l o c a t o r in V - k e r n e l  In Thoth-like message-passing environments, an extension of the server-client notification schemes already proposed in Section 2.3.1 requires each user registering an exception-process to be notified. 
The server is responsible for maintaining such a list, and for garbage-collection  113  of clients w h o are no longer interested i n the exception.  T h i s solution suffers f r o m the same  disadvantage as Mesa's a n d M e d u s a ' s , that back-pointers are needed from t h e server to its users. call a  O u r new solution uses a multicast inter-process  Broadcast  (or M u l t i c a s t )  c o m m u n i c a t i o n p r i m i t i v e , w h i c h we  Send (BSend) [Cheriton  84] T h i s has been i m p l e m e n t e d i n  the V - k e r n e l [Cheriton 83c] a n d i n U N I X 4 . 2 B S D [ A h a m a d 85]. A server c a n i n f o r m clients of a group-id  on w h i c h exceptions can be sent. T h e server uses the n o n - b l o c k i n g  BSend  to the  g r o u p - i d to notify the exception, a n d clients c a n chose to receive f r o m that g r o u p - i d either w i t h a separate exception-process, or b y their m a i n process (if the client is itself a server wait-  ing on a Receive ). A program model for a storage allocator server i n the V - k e r n e l is shown in F i g u r e 4.6.  
pool-server()
begin
  integer id, group-id, CLIENT-ID, AMT, reply, THRESHOLD = 100;
  Boolean REQUEST-PENDING = FALSE;
  message msg[MSG-SIZE];
  pool p;

  repeat
  begin
    id := Receive(msg);
    select (msg.REQUEST-CODE)
    begin
      case Allocate:
      begin
        if REQUEST-PENDING then
        begin
          queue-request(id, msg);       | P(outer)
          reply := NO-REPLY;
          continue;
        end;
Alloc:  if p.free < msg.SIZE then
        begin
          REQUEST-PENDING := TRUE;
          BSend(<message POOL-LOW>, group-id);
          CLIENT-ID := id;
          AMT := msg.SIZE;              | Implements WAIT
Thus  the queue of waiting Allocate requests is explicitly maintained by the server, instead of by the monitor's condition variables. One advantage of this approach is that the server can make arbitrary scheduling decisions easily, rather than be constrained by the mechanisms provided by the monitor implementation.  Another advantage is that the implementation of  WAIT and BROADCAST NOTIFY are explicit here; there are no hidden overheads in operating system implementation of the condition synchronisation primitives. However, this eventdriven server is longer than the monitor version in Figure 4.4 as it requires extra variables (REQUEST-PENDING and the queue-variables) to encode its state and manage its flow control. But this example provides a useful program paradigm for message-based systems which do not provide a monitor construct or separate ports for different types of server requests. A corresponding pool user is shown in Figure 4.7. An alternative approach is to assume that a new Allocate request occurs only infrequently when the server is in the state exceptional  BUSY  REQUEST-PENDING.  The server can make an  reply to the client instead of queueing the request explicitly, and the client  must take appropriate action such as delaying before retrying the request. This server does not implement the same scheduling semantics, but by designing the client to maintain the extra state information  RETRYING,  the server code is simplified. 
This is the approach advo-  116  userpoolQ begin extrn extrn integer  hairy-list-structure m; semaphore M-UPDATING; id, pool-id, reply;  repeat begin id := Receive(msg); select (msg.REQUEST-CODE) begin case somefunc: begin descriptor dl,d2; Send(<AIIocate-msg>, pool-id); if msg.REPLY = POOL-EMPTY then begin reply := NOSOAP; | to caller continue; end; d l := msg.PTR; Send(<Allocate-msg>, pool-id); if msg .REPLY = P O O L - E M P T Y then begin Send(<Release-msg d l > , pool-id); reply := NOSOAP; continue; end; d2 := msg.PTR; < fill in d l and d2 from pool-id's message > P(M-UPDATING); add-to(m,dl); add_to(m,dl); V(M-UPDATING); reply := OK; end; if reply = NO-REPLY continue; msg .REPLY-CODE := reply; Reply (msg, id); end; end; exception-handlerQ begin  | to caller  117  hairy-list-structure m; semaphore M-UPD ATING;  extrn extrn repeat begin  id := Receive(msg); select (msg.REQUEST-CODE) begin case POOL-LOW: case POOL-SCARCE: begin P(M-UPDATING); <squeeze by performing data-dependent compaction of m using Send(<Release d message>, pool-id) to release any elements d removed by compacting m> V(M-UPDATING); end; end; end; end; Figure 4.7 A User and Exception Handler of the V-kernel Storage Allocator  cated i n the general model for designing event-driven servers which m a i n t a i n m i n i m u m state, and for the message-based systems like T h o t h this a p p r o a c h considerably simplifies the server code.  4.3. Exception Handlers as Processes: Mechanisms for Mutual Exclusion If systems are designed so that exceptions are h a n d l e d b y a separate process, the problem  is encountered  involved.  in s y n c h r o n i z a t i o n of the exception  process  a n d the other  T h i s is p a r t i c u l a r l y acute if the exception process executes non-atomic  process is defined t o execute atomically  processes  ally, where a  if it completes i n its entirety or not at a l l , w i t h no  visible intermediate states. 
For an example, consider the V-kernel storage allocator described previously. Mutual exclusion is necessary on the data structure m; while USER is updating it, the exception handler must not be squeezing it. There are several alternatives for inhibiting squeeze while m is being updated:

(1) reduce concurrency by allowing only the one process, USER, to update m

(2) use a simple P-V semaphore

(3) hardware test-and-set a shared variable

(4) temporarily remove squeeze from the server's list of handlers while USER is updating m

(5) temporarily mask the action of squeeze in EP while USER is updating m

(6) rely on non-preemptive process execution -- while the exception process executes, USER cannot obtain read/write access to m

(7) define m to be atomic data which can only be updated or accessed atomically, by so-called atomic transactions.

These solutions are considered in turn below.

(1) First, mutual exclusion could be ensured by making the USER process receive the exception notification from the server, and execute squeeze only when it is free to do so. The server would BSend the POOL-LOW exception, which USER would Receive and act on synchronously. This could be an effective solution when the USER acts as a server itself (and thus is usually ready and waiting on a Receive), and when there is no need to gain concurrency by having a separate exception process. To receive the broadcast exception message, the client USER would subscribe to the server's group-id. This would complete a SendReceiveReply loop between USER and the server. Although this is considered undesirable for regular Sends, a broadcast Send in the V-kernel, where the server does not expect any Reply, has been implemented so that it is safe to complete such a loop. Reasoning about deadlock in such systems is thus the same as if the broadcast
Reply  Sends,  a broadcast  broadcast This would  A l t h o u g h this is  Send in the V - k e r n e l where  the  has been i m p l e m e n t e d so it is safe to complete such a  R e a s o n i n g about deadlock in s u c h systems is thus the same as if the  broadcast  119  Send were like a w i t h a regular The  Reply;  Receive  the difference is that the client can pick up this k i n d of  Reply  statement.  r e m a i n i n g solutions all provide a greater degree of parallelism, whereby the pro-  cess executing  squeeze ( E P )  In  the P - V semaphore  F i g u r e 4.7  is distinct f r o m the process u p d a t i n g M-UPDATING  m  (USER).  was used to force m u t u a l exclusion  between U S E R a n d E P .  T h i s technique has the overhead of two extra process  switches  for each P - V operation.  Ideally a solution is required w h i c h involves no extra process  switches for a normal-case d a t a update (when no other process is accessing the critical data).  Use  a shared variable, M - U P D A T I N G ,  atomic  action  such  as  p r o v i d e d by  initialised to a  hardware  FALSE,  w h i c h m a y be used in an  test-and-set.  The  USER  and  EP  processes test this before accessing the critical d a t a thus: while M - U P D A T I N G do d e l a y ( l ) ; M - U P D A T I N G :=  TRUE;  critical code M-UPDATING :=  FALSE;  T h i s has little overhead in the normal-case — just two statements i n U S E R a n d E P  to  test a n d reset the boolean flag. In our solution, if an access conflict is f o u n d , the second process loops in a  delay  statement until the resource is freed b y the other process.  solution has the obvious d r a w b a c k t h a t it is not crash-resilient, nor is it fair. exceptional case of a d a t a access conflict between U S E R  This In the  a n d E P , the simple solution  imposes some extra delay on U S E R a n d E P , a l t h o u g h this m a y well be balanced by the lack of process switching.  
This solution therefore follows the principle of minimising the normal-case overhead (by having no context switches), at the expense of some overhead in extra delay and cpu cycles on handling the exceptional case of data-access conflict.  However, hardware support is required for the indivisible test-and-set operation.

(4)  Temporarily removing squeeze from the server's list of handlers can only be used if a list of clients is held at the server, and a Reliable Broadcast is used to notify the clients.  This solution imposes an overhead of 4 process switches on each normal-case data update -- two to notify the server of the change in the list of eligible exception handlers, and two more to reset the list -- plus the maintenance of the list.

(5)  Masking the effect of squeeze in EP when USER is updating the data structures again requires 4 process switches.  First USER executes a Send to EP to set the mask, and then after the critical update, the USER has to Send again to EP to reset the mask.  This solution has the advantage over the previous method that the server is independent of its users, so it can be used with the unreliable broadcast.

(6)  Exploiting non-preemption between processes on a team to achieve mutual exclusion is available in Thoth-like operating systems, and has been used successfully for many applications.  The idea is that a process cannot be interrupted by any other process on the same team; thus if a page fault occurs, only a process on another team can execute, not another process on the same team.  This contrasts with the Medusa activities in a task force, which are designed explicitly to execute in parallel.
However, when the actions necessary for data updates are complex, involving procedure calls, there are no means of knowing, or specifying, which process actions are atomic (i.e. do not yield the processor by making system calls) and which ones are non-atomic (i.e. do yield the processor, leaving an intermediate state of computation visible to another process).  Hence a procedure which happens to yield the processor must not be called.  In large systems where modularity is essential, it is not known which procedures yield the processor; thus such restrictions are intolerable and unmaintainable.

(7)  Much research has been done on atomic data updates -- all-or-nothing computations [Lomet 77], [Moss 82].  Liskov and Scheifler described a high level language approach in their paper Guardians and Actions: Linguistic Support for Robust, Distributed Programs [Liskov 83].  They promise some high level language tools for specifying atomic actions in their integrated programming language and system called ARGUS.  Their approach has centered on a class of applications concerned with the manipulation and preservation of long-lived, on-line data such as airline reservations.  In ARGUS, the user can define atomic data which is accessed by processes called actions (or, equivalently, transactions).  An action may complete either by committing or aborting.  When an action aborts, the effect is as if the action had never begun: all modified atomic objects are restored to their previous state.  When an action commits, all modified objects take on their new states.

Let us see how such ideas could be used in the storage allocator.  USER and EP share access to atomic data m.  If EP obtains a write permission on the data, and then blocks (maybe for some debugging i/o) and then USER tries to execute, USER will be prevented from accessing the atomic data, as EP has a write lock to it.
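The commit/abort semantics just described can be sketched briefly.  This is an illustrative sketch only, not the ARGUS language or the author's implementation; the class and method names are invented, and real atomic actions would also provide lock queueing, nesting and crash recovery.

```python
import copy

class AtomicObject:
    """Toy atomic data: one writer at a time; abort restores the old state."""
    def __init__(self, value):
        self.value = value
        self._owner = None    # action currently holding the write lock
        self._backup = None   # saved state, for rollback on abort

    def write_lock(self, action):
        if self._owner not in (None, action):
            # A real system would queue the caller here, as on a semaphore.
            raise RuntimeError("another action holds the write lock")
        if self._owner is None:
            self._owner = action
            self._backup = copy.deepcopy(self.value)

    def commit(self, action):
        self._owner = self._backup = None   # new state becomes permanent

    def abort(self, action):
        self.value = self._backup           # as if the action had never begun
        self._owner = self._backup = None

m = AtomicObject({"free_list": [64, 128]})  # the shared structure m
m.write_lock("EP")                          # EP starts an update
m.value["free_list"].append(256)
m.abort("EP")                               # EP aborts: the update is undone
print(m.value["free_list"])                 # [64, 128]
```

While EP holds the write lock, a write_lock("USER") call fails instead of blocking; queueing the caller until the lock is freed would give the semaphore-like delay the text goes on to describe.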
Ideally, USER will be delayed until EP has finished -- just like waiting on a P-V semaphore.  But in this case, the programmer merely has to specify that the data m is atomic; the implementing system then takes care of the queue on the semaphore.  This linguistic tool presents a real advance in concurrent high level language design, and should certainly be incorporated into new languages.  Its implementation in other operating systems has not been studied, as the ARGUS project is an integrated system.

The author has designed an implementation of ARGUS's transactions under the UNIX operating system, exploiting the principles of structuring programs for normal-case efficiency, which is described in full in the next Chapter.  This provides a tool for implementing exception processes, with due regard for process synchronization and serializability.

To conclude, for efficient implementation of exception handlers as processes, efficient tools for mutual exclusion are needed, such as those provided by hardware test-and-set, or by software atomic actions.

CHAPTER 5

Programs Illustrating the Model

This chapter describes two programs which illustrate the use of the model.

First, two structures used for exception handling in multi-process implementations of the X.25 and X.29 protocols in Verex [Lockhart 79] are described.  Both these protocols are hierarchic and layered.  The X.25 protocol [CCITT 76] is a multi-layer protocol for interfacing computer equipment to a packet-switched network.  Each X.25 protocol layer -- the physical, link level or packet level -- can be described as a finite state machine (FSM) event manager making appropriate state transitions on receipt of packets or timeouts [Bochmann 78].  The CCITT Virtual Terminal Protocol X.29 [CCG 78], here referred to as Datapac's Interactive Terminal Interface (ITI) protocol, establishes a data structure that is viewed as representing the terminal.  A remote terminal user may therefore interact with a host-side application program over the Datapac network via the host-side X.25 and ITI protocols.  These protocols were studied because although each protocol layer is implemented as a collection of cooperating processes, the protocols are structured to reflect differences in the type of exceptional situations that may occur.  The author implemented the ITI protocol on the Verex operating system [Atkins 83b], by following the design principles established in the model.

Deering's implementation of the X.25 protocol [Deering 82] is structured to reflect the logical flow of events.  Within each of the lower protocol layers of X.25, one process takes the form of a Finite State Machine server.  This process provides all the necessary synchronisation of transmitting and receiving, leading to efficient handling of all transitions.  Thus in these lower level protocols, where it is difficult to identify the normal transitions because the probabilities of the events are not known, there are no special exception cases; all events are treated in a similar way.  The resulting software is produced quickly and models the protocol descriptions closely, contributing to its understandability and maintainability.

For higher level protocols such as the X.29 protocol, exception events can be identified which occur with low probability.  The author has structured the ITI protocol so that transmitting and receiving are performed by two independent filter processes, and exception handling is performed by an auxiliary exception process which is invoked as needed [Atkins 83b].  This implementation of ITI illustrates designing a program to achieve all three objectives of the model (see Figure 3.3).

(1)  Objective A, decreasing the average run-time, is achieved by reducing the normal-case handling time (cost b of Level 2 of Figure 3.3).  An alternate process configuration is used to reduce inter-process communication for the normal events (read/write requests).  This is shown as feature 2 of Level 3 of Figure 3.4.

(2)  Objective B, decreasing the exception processing time, is achieved for the <BRK> exception event, by reducing the inter-process exception notification time.  This is shown as cost b of Figure 3.5.

(3)  Objective C, increasing the program's functionality in a modular way, is achieved by structuring the program to use an ignorable exception notification from the server (X.25) to the client (ITI), for network reconnection.  This design method is shown in Figure 3.6.

This implementation of ITI as a filter is compared with the multi-process server approach used for X.25, and it is shown that the filter structure matches the functionality of the ITI protocol, and results in efficient software.

In Section 5.2, a mechanism for implementing nested atomic transactions to perform atomic data base updates is described.  The need for such a mechanism has already been discussed in the previous Chapter in Section 4.3 concerning mutual exclusion mechanisms.  The author's implementation illustrates objective A of the model, by reducing the average run-time, shown in Figure 3.4.  This is achieved by reducing both cost b -- the normal-case handling cost -- and cost c -- the context-saving cost -- of level 2 of the model.  To reduce the latter context-saving cost incurred while handling normal events, the approach is to assume that transactions COMMIT by default, and therefore to assume that an ABORT is an exceptional event.  Context-saving is thus kept to a minimum while processing normal events, but at the added expense of exception handling when an exception occurs.

Another technique to reduce the normal-case handling costs is to use broadcast exceptions to reduce the network traffic over a Local Area Network (feature 3 of Level 3 of the model in Figure 3.4).  A hybrid scheme of reliable-unreliable broadcast inter-process communication is used to achieve this.  This design gives efficient performance and is easy to program.

5.1. Exceptions and Protocol Implementations -- a New Approach

5.1.1. The ITI Protocol

ITI acts mainly as an i/o filter, transmitting read/write requests from its client (called the application, APPL) through to X.25.  Now if it is structured as a finite state machine server like the X.25 server, then the ITIMain process requires two worker processes, a reader (ITIReader) and a writer (ITIWriter), to handle delays in reading from and writing to X25Main, as shown in Figure 5.1.

Figure 5.1 ITI Implementation as a Server

The author has analysed such an implementation, and finds that each read and write request from the client involves 6 more process switches at the ITI layer -- ITIMain to ITIReader to X25Main and back again.  Thus for a normal-case read from the network, there is a total of 7 Send-Receive-Reply messages from client to X.25 (including the processing within X.25), or 14 context switches.  On the TI990 computer, each Send-Receive-Reply sequence takes 3 msecs (though on faster machines with hardware support for context switching it takes about 1 msec).  Therefore it takes at least 21 msecs to process the context switches for each read request.  At full speed on a 9600-baud Datapac link, 1200 bytes/sec are received, in data packets of 128 or 256 bytes.  Thus up to 10 packets must be processed per second; the context switches for this take 210 msecs.  If the machine is dedicated to the task, it can easily keep up with the data arriving.  Then the argument for readability and understandability of multiprocess structuring compensates for the context switching overhead.  However with three X.25 virtual circuits going at full speed, or with one other user on the system, the computer could hardly keep up.

Following the model, it was noted that a large number of context switches were being made for normal-case processing, so the author decided to restructure the ITI protocol to reflect its major function -- that of an i/o filter (rather than a general-purpose server).  By using a different process configuration, normal-case read/write requests can be handled more efficiently.
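The normal-case budget above can be checked with a short calculation.  The figures (3 msecs per Send-Receive-Reply on the TI990, 128-byte packets) are taken from the text; the variable names are just for the sketch.

```python
# Normal-case read cost for ITI structured as a server (figures from the text).
srr_per_read = 7                       # Send-Receive-Reply exchanges per read
context_switches_per_read = srr_per_read * 2   # each exchange = 2 switches
ms_per_srr = 3                         # measured on the TI990
ms_per_read = srr_per_read * ms_per_srr

line_bytes_per_sec = 9600 // 8         # 9600 baud -> 1200 bytes/sec
packet_size = 128                      # smallest packet size -> most packets
packets_per_sec = -(-line_bytes_per_sec // packet_size)   # ceiling division

print(context_switches_per_read,       # 14
      ms_per_read,                     # 21 msecs per read
      packets_per_sec,                 # up to 10 packets/sec
      packets_per_sec * ms_per_read)   # 210 msecs of switching per second
```

So roughly a fifth of every second goes to context switching alone, which is consistent with the observation that a dedicated machine keeps up while a shared one can not.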
But some consideration must also be given to handling the exceptional situations which might occur, such as the network failing or a user hitting <BRK>.  The X.25-ITI interface must be designed so that inter-process exceptions can be communicated efficiently and conveniently.  First, an analysis of the behaviour of the protocol on the <BRK> exception was made.

5.1.2. The Interrupt Protocol

Since their inception, communication protocols have employed asynchronous interrupt signals, but typically require that interrupts be synchronised with the parallel data channel.  The X.25 protocol specifies a complex sequence of data transfers over the network to handle the interrupt exception signal <BRK>, which occurs when a remote user presses the interrupt key on his terminal.  A total of 5 packets must be exchanged across the network through the link layer of X.25 on a <BRK> signal.  These are shown below in Figure 5.2.

To transmit an interrupt caused by a remote user pressing <BRK>, an interrupt expedited data (INT) packet is sent by the remote X.25's client, CLREMOTE, which must eventually be acknowledged by an interrupt confirm (INT-CONF) packet from the local X.25's client, CLOCAL.  Subsequent interrupts must be refused by CLREMOTE until the acknowledgement arrives.  The INT packet may overtake ordinary data packets in transit.

CLREMOTE and Remote X.25                     Local X.25 and CLOCAL

1. INT                        =======>       CLOCAL notifies its client
   (<BRK> from remote user)                  of the interrupt
2. Q-DATA pkt (IND-OF-BRK)    =======>       CLOCAL can now confirm,
                                             after restoring output
3.                            <=======       Q-DATA pkt (reset param)
   CLREMOTE can now restore output
4.                            <=======       INT-CONF
   CLREMOTE can now accept more interrupts
5. Q-DATA pkt (param reset done) =======>    End of interrupt handling

Figure 5.2 Network Packets to be Exchanged on an Interrupt

Thus CLREMOTE also inserts a Q-DATA packet with the code INDICATION-OF-BREAK (IND-OF-BRK) into the data stream at the point of the <BRK>.  CLREMOTE also has an option to discard all subsequent output, which it may exert after a <BRK>.

When CLOCAL receives the INT packet from X25Main, it takes appropriate action to notify its client that an interrupt has occurred (possibly by destroying the client).  CLOCAL must then read data packets until the IND-OF-BRK packet arrives.  Commonly, packets passed over in flight are discarded at the receiver.  Of course, if the remote host was outputting, or was quiescent at the time of the <BRK>, there will be only the IND-OF-BRK packet to be read.

When CLOCAL receives the IND-OF-BRK, it must send a Q-DATA packet back with a parameter set to a code telling CLREMOTE to stop discarding output (if it was), and then the INT-CONF packet can be sent.  As the last stage in handling an interrupt, CLREMOTE sends the Q-DATA packet back to CLOCAL, with the parameters reset to indicate that output has been resumed.

5.1.3. The ITI Implementation in a Thoth-like Operating System

We observe that although the network can fail at any time, it is not necessary for the higher levels to be informed of this event immediately (i.e. asynchronously).  Instead, the application may continue to execute normally, unaware of any disruption to the i/o service until a read or write operation to/from the network is requested.
When this occurs, the ITI reader process may receive, synchronously, the exceptional information concerning the failure of the underlying network layer.

This synchronous scheme cannot however be used to communicate a <BRK> interrupt from the remote user, as notification cannot always wait until the user requests an i/o operation.  For example, if a user initiates a long compilation, then notices an error and presses <BRK>, a response is required within a short finite period, say half a second.  In such a situation, the client is deaf to the server; thus another solution must be found.

We have already discussed tools for such inter-process exception notification, such as process destruction (Section 2.3.2.3) and the toggle switch (Section 4.1.1).  In order to choose between these alternatives, an analysis of their behaviour on the <BRK> exception was made.

5.1.3.1. Notification of Inter-process Exception Using Process Destruction

The process switches involved in processing an interrupt using process destruction are shown below in Figure 5.3 (assuming that the ITI protocol is structured as a server with two workers to handle delays in communication with X.25).  Each inter-process SendReceiveReply message exchange causes 2 process switches.

Network                      Local X25 and ITI

1. INT          ======>  ... X25Main -> destroys ITI's victim;
                             ITI's vulture => ITIMain -> recreates Timer;
                             X25Main <= ITIMain re-registers its timer victim
2. IND-OF-BRK   ======>  ... X25Main => ITIReader => ITIMain
3. Q-DATA       <======  ... X25Main <= ITIWriter <= ITIMain
4. INT-CONF     <======  ... X25Main <= ITIWriter <= ITIMain
   Subsequent interrupts can now be accepted by CLREMOTE
5. Q-DATA       ======>  ... X25Main => ITIReader => ITIMain

where ... = additional process switches within X.25 to handle each packet

Figure 5.3 Process Switches to Handle an Interrupt Using Process Destruction

The number of SendReceiveReply messages needed to handle each packet is given in column 1 of Table 5.1 below.  Thus a total of 20 + nX.25p = 33 SendReceiveReply message exchanges is needed to handle the <BRK> exception, which takes at least 99 msecs.  (nX.25p is the number of additional SendReceiveReply exchanges within X.25 necessary to handle every packet: 3 for inbound packets and 2 for outbound packets.)  It takes even longer for ITI's client to make its response (such as a message <IO-BRK>).  When this system was implemented, timesharing with another user, we observed response times of between 1-4 seconds for the <BRK> to be handled.  This poor performance arose because the other user could obtain a full cpu quantum between each context switch.

Table 5.1 Number of SendReceiveReply Messages on <BRK>

packet no.                    ITI as a server:       ITI as an i/o filter:
                              context switches       context switches
1                             9*2                    2*2
2                             3*2                    3*2
3                             2*2                    3*2
4                             3*2                    2*2
5                             3*2                    3*2
Total context switches        20*2                   13*2
Total SendReceiveReply
exchanges                     20                     13

5.1.3.2. Notification of Inter-process Exception Using a Toggle Switch

Here, ITI puts itself in the position of being always ready to receive a message from the lower level, regardless of whether or not the application has a read request outstanding.  Thus the client is never deaf to the server below.  The author has implemented this by adding another state variable to both the ITI and X.25 servers to act as a toggle switch, exerting control on the movement of data from the lower level X.25 to the higher level ITI, as described in Section 4.1.1.  When the switch is on, X.25 returns both data and interrupts; when off, X.25 returns <BRK> signals only, and X.25 queues normal data as before until the client above makes another read request.  Unfortunately, this scheme incurs the overhead of 4 extra process switches on each read request -- 2 to set the switch on before reading data and 2 to set it off afterwards.

This overhead can be reduced to 2 extra process switches for every normal-case read by setting a default for X.25 to set the switch off automatically after satisfying every read request.  Then the ITIMain process controls the switch setting so that when there is an application-level read request, it requests data from X.25 by setting the switch on.

This default however is not the correct one for application programs such as file readers, which do always have an outstanding read to the level below, and which therefore are not deaf to the server.  Such programs do not need the toggle switch, which should always be on.  The default should enable such programs to run with no normal-case overhead.  These two aims are incompatible.  Thus, if the ITI overhead is reduced to 2 process switches by using a default of off after every read has been satisfied, then every other client of X.25 must be aware of this default, and must set the switch on after every read request.  This was considered to be an unacceptable solution, so the author implemented the former design, which increases the context switches on a normal-case read from 14 to 18, a significant amount.  At full speed
A t full speed  (9600 baud) over the D a t a p a c network, the c o m p u t e r c o u l d no longer keep u p w i t h one other user using c p u cycles.  133 5.1.3.3. I T I as a F i l t e r a n d a n E x c e p t i o n P r o c e s s F o l l o w i n g the m o d e l , the author decided to use an inter-process exception notification from X.25 to a separate exception handler process at the ITI layer above for n o t i f y i n g exceptional asynchronous events encountered b y ITI f r o m the remote client. W h e n the server detects an exception, it checks to see if an exception E H , exists, a n d is ready.  If so, the server makes a  a n d continues processing exactly as before.  Reply  handler  to the E H , renders the E H  process, unready  T h u s f r o m the server's point of view, it exports  an exception merely b y m a k i n g a n o n - b l o c k i n g Reply to a registered, ready E H , a n d then it  unreadies  the E H . If there is a registered E H w h i c h is unready,  message until the E H makes a  Send  the server queues the  to indicate it is ready again.  T h i s approach is taken so  that if there is no exception process ready, the server only has to queue up a with minimal information. tion replies.  Reply  Reply  message  If the queue becomes full, the server c a n d r o p subsequent excep-  T h e server therefore needs to make no assumptions about the E H at the higher  level, thus preserving m o d u l a r i t y requirements. T h e author observed that the exception process c o u l d also aid in s t r u c t u r i n g ITI as a filter, b y allowing the exception process to take control as necessary on a < B R K > , a n d initiate further read requests (for the I N D - O F - B R K packet) a n d write requests (for the Q - D A T A and I N T - C O N F X 2 5 M a i n with a  packets)  Send  until the exceptional situation has been dealt w i t h .  w h e n it is ready again to  events o c c u r r i n g across the X . 2 5 - I T I  handle further exceptions.  
E H notifies Exceptional  interface d u r i n g this processing are queued up in X 2 5  until there is a ready registered exception handler process ready to deal w i t h t h e m .  ITI can  now be s t r u c t u r e d as a reader filter ( R F I L ) , a writer filter ( W F I L ) , a timer a n d an exception process E H , as shown below in F i g u r e 5.4, a n d the author successfully i m p l e m e n t e d ITI in this  134  way. This structure reflects the major functionality of ITI, that of an i/ofilter,better than the server model, and it is structured more efficiently for the normal case (no interrupts or network failures) by  reducing  the number of process switches by 2 on each read/write request,  from 14 to 12. Here, the run-time overhead for normal cases has been reduced by a significant margin through restructuring the conventional i/o server by two i/ofiltersand an exception process. In this implemetation, on <BRK>, the context switches are as shown in Figure 5.5. Recall that each inter-process SendReceiveReply message exchange causes 2 process switches. The number of SendReceiveReply messages to handle each network packet are now given in column 2 of Table 5.1.  " Process  = d i r e c t i o n o f SEND Message r e q u e s t s  Figure 5.4 ITI as an I / O Filter  135  Network  Local X25 and ITI  1. I N T = = = = = = >  ...  X25Main = > E H  ...  X25Main = > RFIL = > E H  ...  X25Main < = WFIL < = E H  ...  X25Main < = WFIL < = E H  2. I N D - O F - B R K = = = = = = >  3. Q - D A T A < = = = = = = 4. I N T - C O N F < = = = = = =  |  Subsequent interrupts c a n now be accepted b y remote client 5. Q - D A T A =======> where ... =  ...  X25Main = >  RFIL  a d d i t i o n a l process switches w i t h i n X . 2 5 to handle a packet  Figure 5.5 Process Switches on < B R K > with a Separate Exception Process T h u s a total of 13 + n X . 2 5 p handle the < B R K > server.  
3  =  26  SendReceiveReply  message exchanges are used to  exception, c o m p a r e d w i t h 33 using process destruction w i t h a I T I as a  W h e n the system was t i m e s h a r i n g w i t h another user we observed response times of  a r o u n d 1 second for the < B R K >  t o be h a n d l e d .  T h u s the m a j o r effect of reducing the con-  text switches is not only i n the actual context switch time (a difference of j u s t 7x3 = 21 msecs), b u t i n the r e d u c t i o n of c p u time slices made available t o other users d u r i n g  <BRK>  processing.  3  remember nX.25p is the number of additional process switches within X.25 necessary to handle every packet, = 3 for inbound packets and 2 for outbound packets.  136  5.1.4. Problems with Synchronisation T h e exception h a n d l i n g on < B R K >  is non-atomic  in t h a t intermediate states d u r i n g  exception h a n d l i n g are visible; the exception h a n d l i n g involves several inter-process  message  exchanges w h i c h take a finite time, d u r i n g which other processes m a y observe and intervene with the only p a r t i a l l y c o m p l e t e d h a n d l i n g . There <BRK>  are  at  least  2  alternatives  previously m e n t i o n e d .  T h i s causes problems in synchronisation.  for E H to  coordinate  the  W e could either allow the E H to  5 packet  Send  exchanges  on  directly to X 2 5  read a n d write the packets, or we could m a k e E H use the services of R F I L a n d W F I L .  to  The  author took the latter a p p r o a c h , as either R F I L or W F I L w o u l d p r o b a b l y be interacting w i t h X 2 5 M a i n at the time of the exception, a n d E H w o u l d need to coordinate their actions a n y w a y until the exception h a n d l i n g was completed.  If E H was allowed to  Send  to X 2 5 M a i n directly  for the packet exchange, R F I L a n d W F I L could service concurrently (and incorrectly) other requests from their clients. 
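The exception-export mechanism of Section 5.1.3.3 -- a non-blocking Reply to a ready EH, with notifications queued while EH is busy and dropped when the queue is full -- can be sketched as follows.  This is an illustrative sketch, not the Verex code; the class and method names are invented, and the real mechanism runs across separate processes rather than within one.

```python
class ExceptionExporter:
    """Toy model of the server side: export exceptions without blocking."""
    def __init__(self, depth):
        self.eh_ready = False   # is a registered EH waiting for a Reply?
        self.depth = depth      # maximum queued exception notifications
        self.queue = []
        self.delivered = []     # stands in for Replies received by EH

    def raise_exception(self, info):
        if self.eh_ready:
            self.delivered.append(info)   # non-blocking Reply to the EH
            self.eh_ready = False         # the server unreadies the EH
        elif len(self.queue) < self.depth:
            self.queue.append(info)       # EH busy: queue minimal information
        # else: queue full -- subsequent exception notifications are dropped

    def eh_send(self):
        # EH's Send: "I am ready again"; deliver a queued exception if any.
        if self.queue:
            self.delivered.append(self.queue.pop(0))
        else:
            self.eh_ready = True

srv = ExceptionExporter(depth=1)
srv.eh_send()                  # EH registers and is ready
srv.raise_exception("BRK")     # delivered at once via Reply; EH now busy
srv.raise_exception("EOF")     # EH still busy: queued
srv.raise_exception("BRK-2")   # queue full: dropped
srv.eh_send()                  # EH ready again: picks up the queued EOF
print(srv.delivered)           # ['BRK', 'EOF']
```

The server never blocks on the handler, so it preserves the modularity property described in the text: it makes no assumptions about the EH above it.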
The concurrency during exception handling causes complications.  If the <BRK> occurs during a quiescent period (while X25Main had a deaf client) then the synchronisation appears to be straightforward, as RFIL and WFIL are waiting on a Receive for i/o instructions.  The following stages should occur during exception handling:

(1)  EH is readied by X25Main executing a Reply to EH.

(2)  EH makes a Send to RFIL for the INT packet.

(3)  RFIL makes a Send to X25Main .... and eventually a Reply to EH.

(4)  EH makes a Send to RFIL for the IND-OF-BRK packet.

(5)  RFIL makes a Send to X25Main .... and eventually a Reply to EH.

(6)  EH makes a Send to WFIL to dispatch the Q-DATA packet.

(7)  WFIL makes a Send to X25Main .... and eventually a Reply to EH.

(8)  EH makes a Send to WFIL to dispatch the INT-CONF packet.

(9)  WFIL makes a Send to X25Main .... and eventually a Reply to EH.

(10)  EH considers the job done, and makes a Send to X25Main, indicating it is ready to handle another exception.

However, even in this simple scheme complications arise if ITI's application program (APPL) intervenes with a R/W request during these non-atomic message exchanges.  For example, if APPL executes a read request to RFIL during stage 3 above, then after RFIL has made its Reply with the INT packet to EH, RFIL executes Receive again, without stopping, and at this time only the message from APPL is in RFIL's message queue, as EH has not yet had a chance to execute since being unblocked by RFIL at the end of stage 3.  So RFIL will accept APPL's read request and will (eventually) return the IND-OF-BRK packet to its bewildered client APPL instead of to EH.

Thus after a <BRK> exception we need to be able to delay any further interaction of APPL with the filters until the exception handling is finished.  This can be achieved in several ways; the easiest is by setting global flags between EH and the filters, as they are all on the same team and share memory.  But even with global flags, there are situations which are not easily resolvable without the ability to give EH the highest priority on the team.

This can arise, for example, when there is already an outstanding read request from APPL when the exception occurs.  The server, X25Main, unblocks EH with a Reply, and also, without stopping, unblocks RFIL with a Reply with its data -- the INT packet.  Both EH and RFIL are ready to execute, and if RFIL executes first, it can pass the INT packet to APPL as data; APPL can swallow it, and make another read request of RFIL.  RFIL can execute next and Send again to X25Main, which makes a Reply with the IND-OF-BRK packet.  RFIL could then return the IND-OF-BRK packet to APPL, before EH even begins to execute.

If we cannot control process priorities, then we must make RFIL aware that an INT packet is special, and that subsequent data packets for exception processing must be handled by EH.  Such an approach violates modularity, and is very difficult to program correctly.  Whereas if it is possible to specify that EH executes first when more than one process on its team is ready, then the problem can be ameliorated.  EH executes first, and makes a read request to RFIL to pick up the IND-OF-BRK packet.  Meanwhile, RFIL is unblocked with the INT data packet, which it gives to APPL as before.  Now RFIL can accept EH's read request for data, and pass the IND-OF-BRK packet correctly to EH.

But another complication arises if both EH and APPL make read requests to RFIL while it is busy fetching data.
Unless RFIL accepts any messages from EH in priority, then again RFIL could send the IND-OF-BRK packet to APPL.

Recall from Section 4.3 that in Verex a process on a team executes atomically with respect to other processes on the team, till it voluntarily relinquishes the processor, and that a higher priority process always executes before a lower priority one on the same team, even through involuntary time-slice and page-fault deschedules. Both process priority and message priority are available in Verex, so the author was able to achieve this complex synchronisation without altering RFIL.

A major problem in writing an exception handler as a separate concurrent process arises in synchronisation. Although the technique allows clean separation of normal-case code, and is conceptually modular and well-structured, if non-atomic action is needed during exception handling, special features are needed to write clear, correct programs. Three such features are global (shared) memory, process priority, and message priority for handling the exception process's messages. This latter was also recognised by the implementors of RIG; emergency messages run at a higher priority than regular messages.

In general, we need to be able to specify a partial ordering on events, so that the client is not serviced during the exception processing. Various mechanisms for synchronization in concurrent programming have been succinctly described in [Andrews 83]. The article states that ensuring the invisibility of inconsistent states during an atomic action is an active research area.
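The message-priority remedy described above can be sketched as follows. The integer priorities and the queue model are invented for illustration, standing in for Verex's message priority; they are not the thesis's mechanism.

```python
# RFIL drains its message queue highest-priority first, so EH's read
# request is honoured before APPL's even if APPL's arrived earlier.
PRIORITY = {"EH": 1, "APPL": 0}

def next_request(queue):
    queue.sort(key=lambda client: -PRIORITY[client])  # stable: FIFO within a level
    return queue.pop(0)

queue = ["APPL", "EH"]               # arrival order favours APPL
assert next_request(queue) == "EH"   # but EH is served first
assert next_request(queue) == "APPL"
```

Because Python's sort is stable, requests of equal priority are still served in arrival order, which matches the partial ordering the text asks for: only the exception process jumps the queue.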
Some of the most promising results for our problem have been obtained by Andrews in his language, SR (for Synchronising Resources) [Andrews 82], and by Liskov's atomic actions, described in the next Section 5.2. The exception handling could be readily specified in SR by using global Boolean flags such as INTERRUPT-PENDING and READ-REQUEST-PENDING, plus scheduling and synchronizing constraints. Use of global flags, although unstructured, does lead to some easier proofs of program correctness than with message-passing alone, as Schlichting and Schneider have shown [Schlichting 82].

Thus we have shown that the use of a separate exception process to handle asynchronous exceptions in the Thoth message-passing environment is both practical and applicable. To show that there are situations exploiting a separate exception process which are relatively easy to program, we describe in the next Section how an exception process is used to handle another exception: the end-of-file exception which occurs when the virtual circuit is arbitrarily and abruptly terminated.

5.1.5. Recovery after network failure

As described in Section 3.4.2.1., we wish to provide an ITI implementation and session-layer service which would hide a failure in the underlying network from the application layer until either a timeout fired, or the same user logged in again. In the latter case, the application program will be available to the user just as if no disconnection had occurred (except for a brief RECONNECTED message).
The major problem with providing reconnection is that the state of the suspended protocol must be saved; yet in order to allow the remote user to log in again for reconnection, a working version of the protocol must be available. The old and new virtual circuits must then be merged after a successful login.

The author observed that the exceptional event of failure of the underlying network could also be efficiently handled using an exception handler process. As explained in Section 5.1.2, this end-of-file (EOF) failure is communicated synchronously to ITI on its next read or write request to X.25. The Reply contains the EOF information.

The exception process is used as follows. There is one exception handler process, EH, for each X.25 virtual circuit. If a network failure on a virtual circuit is recorded, the EH associated with the failed virtual circuit is notified immediately by a Reply from the server below. EH tears down the virtual circuit and suspends itself by executing a Send request to the process which would be invoked to handle a reconnection. If a timeout occurs before the remote user has reconnected, the timer destroys the suspended session, causing the user to be logged out.

If the user attempts to reconnect via a new X.25 virtual circuit within the timeout period, another EH process is created for this new circuit. The reconnection sequence involves a check of the remote user's status, and if it is suspended, the old EH associated with the torn-down virtual circuit is unblocked with a Reply. The old EH links its session to the new X.25 virtual circuit, and wakes up the reader and the writer, completing reconnection before destroying the new EH.
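The failure, timeout and reconnect transitions just described can be sketched as a tiny state table. The state names and functions are invented for illustration; the real mechanism suspends and unblocks EH processes with Sends and Replies rather than updating a dictionary.

```python
# Per-virtual-circuit session state (assumed representation).
sessions = {}

def network_failure(vc):
    sessions[vc] = "SUSPENDED"       # EH tears down the circuit and suspends

def timeout(vc):
    if sessions.get(vc) == "SUSPENDED":
        del sessions[vc]             # timer destroys the suspended session

def reconnect(old_vc):
    if sessions.get(old_vc) == "SUSPENDED":
        sessions[old_vc] = "ACTIVE"  # old EH links the session to the new circuit
        return True                  # the new EH is then destroyed
    return False

network_failure("vc1")
assert reconnect("vc1") is True      # within the timeout: session resumes
network_failure("vc2")
timeout("vc2")
assert reconnect("vc2") is False     # too late: the user was logged out
```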
The reader and writer then continue exactly as if the underlying network had not failed, replying with new data to the suspended application.

This application was very straightforward to implement and had no synchronisation problems. The author made one minor change to RFIL so that on receiving an EOF Reply from the server below, RFIL checked if an exception handler process was ready to handle the EOF. If so, RFIL voluntarily blocked, instead of immediately terminating the virtual circuit by committing suicide. This reconnection feature provides a very useful addition to the remote login service, and has been measured as being invoked by 5% of remote login sessions.

5.1.6. Summary

Two mechanisms for handling exceptions in multi-process protocol implementations have been described. The first mechanism combines the role of a server with a finite state machine. This structure is particularly efficient for the lower level protocols such as X.25, where it is difficult or impossible to define a normal flow of control.

For higher level protocols where exceptional situations occur less frequently, it is more efficient to structure according to the normal flow of control. The server model is not so efficient in these situations -- instead, the protocol is structured as an i/o filter with a special exception handler process. This structure has the following advantages over the server structure:

(1) it reduces overhead in the normal case by reducing the number of process switches on each read/write request.
(2) the normal-case code is separated from the exception-handling code, leading to good modular programming: for example, the client and server can be debugged without any exception handling.

(3) it provides a clean mechanism for asynchronous server-client notification.

(4) it provides a clean queueing mechanism for the exceptions; they are not lost.

(5) the increased concurrency allows exception handling to be done in parallel if that is possible (though not in this example).

(6) it can be generalised to other operating systems.

The disadvantage is that the synchronisation of the exception process with other processes may be difficult -- the increased concurrency may be hard to handle.

The author implemented two program paradigms for non-standard exception notification from server to client for synchronous Thoth-like operating systems:

(1) using a toggle switch for rendering deaf clients able to hear.

(2) using an exception process which registers with the server below.

The advantage of the latter approach in a hierarchy of servers is that the client (itself a server) can be restructured as an i/o filter, with a corresponding increase in performance for normal-case event processing, whereas the former toggle-switch solution actually increases the run-time for normal-case handling in this example, although it can be used to advantage when the client actually wants to be deaf to a device for an extended period, as described in Section 4.1.1.

5.2. A Nested Transaction Mechanism for Atomic Data Updates

5.2.1. Introduction

Mechanisms for achieving mutual exclusion have already been discussed in Section 4.3; one approach is to declare atomic data which is operated on by atomic transactions. An atomic transaction,
as defined by Mueller [Mueller 83], is a computation consisting of a collection of operations which take place indivisibly in the presence of both failures and concurrent computations. Atomic transactions differ from non-atomic transactions in that they are apparently indivisible -- no intermediate states are observable by another transaction. They are also recoverable in that the effect is all-or-nothing -- either all the operations prevail (the atomic action commits), or none of them do (the atomic transaction aborts), so that all the modified objects are restored to their initial state.

There is much current literature on atomic transactions [Liskov 83], [Mueller 83], and, more recently, on nested atomic transactions [Moss 81]. Nested transactions provide a hierarchy of transactions, useful for composing activities in a modular fashion. A transaction which is invoked from within a transaction, called a subaction, appears atomic to its caller. Subactions can commit and abort independently, and a subaction can abort without forcing its parent to abort. However, the commit of a subaction is conditional: even if a subaction commits, its actions may be subsequently aborted by its parent or another ancestor. Further, objects modified by subactions are written to stable storage only when the top-level actions commit. Thus a nested transaction mechanism must provide proper synchronisation and recovery for subactions.

The implementation of such a mechanism is discussed here, because the implementation is based on the general model for designing software. Liskov's ARGUS project [Liskov 83] has been taken as the specification for nested transactions. In ARGUS, Guardians and Actions are provided as linguistic support for robust distributed programs.
The ARGUS project was conceived as an integrated programming language and operating system, but we show that the language can be conveniently and efficiently supported by other distributed operating systems which provide certain features. We describe a possible implementation of a nested transaction mechanism, structured to exploit exceptions, for distributed operating system kernels such as the V-system [Cheriton 83]. We compare this with the conventional approach to implementing nested transactions as exemplified in the LOCUS Distributed UNIX project, and show how structuring the program (in this case, the nested transaction mechanism) to exploit the exceptions improves the efficiency.

5.2.2. Overview of the ARGUS Model

5.2.2.1. Atomic Objects

ARGUS provides for atomic abstract data types, linguistically declared as stable objects. Only activities which have read or write access to such atomic objects are treated as atomic activities. The implementation of atomic objects is based on simple read and write locks with the usual constraints on concurrent access:

(1) multiple readers allowed, but readers exclude writers.

(2) a writer excludes readers and other writers.

When a write lock is obtained, a version of the object is made and the action operates on this version. If the action commits, the version is retained, otherwise it is lost. All locks are held till completion of the top-level action, and the commit of a top-level action is irrevocable.

5.2.2.2. Nested Actions

To keep the locking rules simple in nested actions, a parent cannot run concurrently with its children. The rules for read and write locks are extended so that

(1) an action can obtain a read lock if every action holding a write lock is an ancestor.

(2) an action may obtain a write lock provided every action holding a read or write lock on that object is an ancestor.

When a subaction aborts, its locks are discarded.
Liskov also states that when a subaction commits, its locks are inherited by its parent. We will show that this is not necessary for correct operation of the mechanism, and in this point our model differs from Liskov's.

5.2.2.3. Guardians

In ARGUS, a distributed program is composed of one or more guardians. A guardian provides access to atomic objects through a set of operations, called handlers, much like a file server provides access to files. We use the word guardian alone, to refer to the location where the atomic object is kept -- equivalent to the Transaction Scheduling Site (TSS) of LOCUS [Mueller 83]. In contrast, guardian handlers refer to subactions, as described below.

Liskov states that when a handler is invoked, the following steps occur:

(1) A new subaction, SP1, of the calling action is created.

(2) A message containing the arguments is constructed.

(3) The system suspends the calling process and sends a message to the target guardian. If that guardian no longer exists, subaction SP1 aborts and the call terminates with a failure exception.

(4) The system makes a reasonable attempt to deliver the message, but the call may fail.

(5) The system creates a process and a subaction SP2 (of subaction SP1) at the receiving guardian to execute the handler. Note that multiple instances of the same handler may execute simultaneously. The system takes care of locks and versions of atomic objects used by the handler in the proper manner, according to whether the handler commits or aborts.

(6) When the handler terminates, a message containing the results is constructed, the handler action terminates, the handler process SP2 is destroyed, and the message is sent to the caller SP1. If the message cannot be sent the subaction SP1 aborts, and the call terminates with a failure exception.

(7) The calling process continues execution.
A guardian makes these resources available to its users by providing a set of operations, called handlers, which can be called by other guardians to make use of these resources. Given a variable x naming a guardian object, a handler h of the guardian may be referenced as x.h. A handler can terminate in either a normal or an exceptional condition. A handler executes as a subaction, so it must either commit or abort -- either normal exit or exceptional exit may be committed or aborted. If the guardian crashes, all the active handler calls are aborted and its atomic data must be retrievable from stable storage.

Liskov's nested transactions are similar to the nested transaction model of LOCUS; however, in LOCUS, the only atomic data supported is of type file. To aid in comparison of our approach with that of Liskov and LOCUS, we use atomic data only of type file in our examples.

5.2.3. Implementation Considerations

5.2.3.1. Introduction

In the ARGUS model of nested transactions, and in the LOCUS implementation, the relationship between client and guardian is not just that of a one-off request for data -- instead, close links are retained so that the system can notify guardians immediately a lock-holder commits or aborts. Parents inherit their children's read/write locks on termination -- the list of such locks is called the participant file list. Their implementations are structured so that if a subaction is successful it must explicitly execute a commit at termination. This causes COMMIT messages to be sent to every participant. Thus each action must maintain a list of participants. The list of participants locked by the committing child is then transferred to its parent. The parent transaction thus becomes aware of its children's deeds,
which violates modularity concepts, both in the language ARGUS, and in the supporting operating system. We claim to preserve these boundaries in our implementation.

Further, their implementations are structured for functionality rather than efficiency, as there is a high overhead in the case of no aborts and no R/W conflicts. This is counter to the principles for exception handling, and our implementation, which follows the model, is considerably more efficient.

It is assumed that

(1) subactions commit more frequently than they abort.

(2) all atomic data is frequently available for R/W access; conflicting R/W requests are rare.

The principle followed in our implementation is to structure the system to minimise the run-time by reducing the inter-process communication in the statistically dominant case, and also to reduce the context saved in the normal case, by maintaining the minimum context necessary to achieve correctness. Thus when exceptions do occur, the exception handling is costly. However, by making the normal case more efficient we reduce the chance of conflicting data accesses, and this in turn may further reduce the number of aborts.

We now describe the minimum context necessary to achieve correctness, before presenting an overview of the features of our implementation. We then make a comparison of the efficiency of our implementation with others.

5.2.3.2. Requirements for correct implementation

To handle aborts, each guardian needs the ability to detect whether any of its atomic data has been updated by the aborted process, and if so, to restore a valid version of the atomic data, prior to that changed by the aborted process.
To handle R/W conflicts, each guardian needs the ability to detect whether its atomic data are free, and if not, whether any write-lock owners are not ancestors of the requesting action.

To handle aborts correctly, each guardian of atomic data needs to keep a stack of versions and a list of actions holding each lock. For conflict resolution, each guardian needs to keep a current version and a list of actions and their ancestors holding each R/W lock.

We assume that each transaction is uniquely identified in the network by its unique transaction identifier (Tid). We assume also that it is possible to determine from a transaction's Tid both the home site of the transaction (the site where the transaction begins executing) and the Tid's of all the transaction's superiors. We thus encode the transaction's ancestors in its identifier, and can merge the version list with the ancestor list, so that each guardian needs just one volatile data structure, called a t-lock (after LOCUS), to hold the locking and recovery information for each atomic object. The t-lock structure is as follows:

Tlock         = Struct [ReadRetainers, Top]
ReadRetainers = List [Tid]
Top           = Struct [Vstack]
Vstack        = Stack [version, Tid]

Another requirement for correct operation is the existence of a communication facility which can be used by the guardians to receive ABORT and TOP-COMMIT messages reliably. We discuss mechanisms for achieving this in Section 5.2.4.

However, we show in our implementation that guardians need not be immediately informed of sub-action commits for correct operation, provided the guardian can determine whether a transaction holding a R/W lock is still alive or not.
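The t-lock structure and the ancestor-encoding Tid can be sketched as follows. This is a hedged illustration, not the thesis's code: the dot-separated Tid format and the field names are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

def ancestors(tid: str):
    """A Tid such as 'T0.T1.T2' encodes all its superiors (assumed format)."""
    parts = tid.split(".")
    return {".".join(parts[:i]) for i in range(1, len(parts))}

@dataclass
class Tlock:
    """Volatile per-object locking and recovery information (after LOCUS)."""
    read_retainers: List[str] = field(default_factory=list)      # ReadRetainers
    vstack: List[Tuple[str, str]] = field(default_factory=list)  # (version, owner Tid), bottom first

    def current_version(self):
        return self.vstack[-1]   # top of the stack is the current version

t = Tlock()
t.vstack += [("V0", "stable"), ("V1", "T0"), ("V2", "T0.T1")]
assert t.current_version() == ("V2", "T0.T1")
assert ancestors("T0.T1.T2") == {"T0", "T0.T1"}
```

Encoding the ancestor chain in the identifier is what lets a guardian test ancestry locally, without contacting the transaction's home site.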
So although the guardian's version-owner may be out-of-date (this can occur when the committed subtransaction T dies), the guardian can still operate correctly, in that it does not use out-of-date versions. This is achieved because the guardian keeps the current version on the top of the version stack, together with what it thinks is the current owner. The version stack is updated whenever there is need to resolve a potential R/W conflict, with the youngest living ancestor of T replacing T in any owner's list. Alternatively, the guardian can use a background process to update its t-lock periodically.

5.2.4. Features of the Implementation

A transaction group (GpId) and a transaction group leader are used to implement nested transactions correctly and efficiently. Each new top-level transaction creates a new process, called the group leader, and all its subtransactions use the same GpId. Any guardian of atomic data accessed by any transaction notifies its group leader, so as to receive ABORT and TOP-COMMIT messages. Such messages are sent by transactions to the group leader, who in turn reliably notifies all the guardians in that group. The group leader may use a simple 1:1 scheme for notification, or alternatively may use a multicast or broadcast feature.

We digress briefly to describe such a mechanism which can be used advantageously. An unreliable multicast feature is available in the distributed V-kernel [Cheriton 84]. The multicast function is achieved through a Group-id. A process which subscribes to the Group-id will receive messages sent to it. No group membership list is maintained, and so members of a group are only loosely coupled, and messages are not reliably delivered.
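The Group-id mechanism can be sketched as follows. The dictionary model and the names are invented for illustration; note that this toy delivers every message, whereas the real V-kernel multicast is best-effort, and senders hold no membership list of their own.

```python
# Loosely coupled multicast: processes subscribe to a Group-id and then
# receive messages sent to that group; the sender names only the group.
subscribers = {}

def subscribe(group, pid):
    subscribers.setdefault(group, set()).add(pid)

def multicast(group, msg, inboxes):
    for pid in subscribers.get(group, ()):   # fire and forget
        inboxes.setdefault(pid, []).append(msg)

inboxes = {}
subscribe("G1", "guardian1")
subscribe("G1", "guardian2")
multicast("G1", "TOP-COMMIT", inboxes)
assert inboxes == {"guardian1": ["TOP-COMMIT"], "guardian2": ["TOP-COMMIT"]}
```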
We show that this may be extended simply to a reliable multicast which fulfills the requirements. The algorithms used by the group leader, and their performance, are discussed fully in Sections 5.2.4.7 - 5.2.4.9.

The implementation is structured so that parents do not inherit their committed children's locks automatically. Indeed, when a child commits, which it can do only on its death, the waiting parent is merely informed of that fact, and the participating guardians are not informed of the child's death. However, if a subsequent attempt is made to access such a guardian's atomic data, the guardian must then be able to detect the status of its lock-holders, and update its t-lock data accordingly.

To handle the commit of a top-level transaction, the action must send a TOP-COMMIT message to the group leader. All guardians to which the transaction group has, at any time, held a lock, then receive the TOP-COMMIT message. Each guardian must then commit to stable storage any current versions held by that transaction group.

In the event of network partitioning, the problem is the same as that discussed in the LOCUS paper, and no top-level transaction can commit. We provide a user override for these cases, as any failed component can prevent commitment. Orphan transactions will not commit; they should eventually detect the decease of their top-level ancestor and should duly abort.

Many message-based operating systems provide the software tools necessary for efficient and correct implementation; namely that transactions may be implemented as processes, and that parent transactions may await replies from their children subactions.
Processes should be cheap to create and destroy dynamically, so that each invocation of a guardian handler can be efficiently implemented as a new process, as specified in ARGUS. Further, context switching between processes should not be too expensive. These criteria are all fulfilled by Thoth-like operating systems, and even UNIX 4.2, with processes communicating over datagram sockets, can be used for functional correctness, if not efficiency. A synchronous Send-Receive-Reply inter-process communication facility is needed -- again, this is available in the Thoth derivatives, and can be simulated with UNIX datagrams, as we show in our examples, details of which are now given.

5.2.4.1. Transaction Invocation

The calling process P may invoke another transaction T either as a nested top-level action (only if P is also top-level), or as a subaction, usually by making a handler call to a guardian, viz. g.h. The following algorithm is employed:

(1) At the site of P, a new process, pid P1, is created for the new transaction T. (If P1 is to be run at a remote site, an appropriate remote pid is generated). If the invoked transaction is a top-level transaction, it creates a new group leader, whose unique process-identifier is GpId.

(2) P1 is added to the list of P's children.

(3) The complete ancestor list and the GpId are passed to P1, and P1 is readied.

(4) The parent P awaits completion of P1 by executing a Receive(specific) from P1, or if several children are initiated concurrently, the parent must arrange to be notified of each one's death and its status (COMMITTED or ABORTED).
5.2.4.2. Atomic Data Access

Any transaction T wishing to obtain R/W access to atomic data must perform the following algorithm. Note that this algorithm will usually be executed by the atomic data's guardian upon receipt of an OPEN request.

(1) If a t-lock for the data does not exist, one is created, and a top version stack entry is made from the state maintained in non-volatile storage.

(2) If the request is for Read and Write access, the request is denied if any other living transaction holding a lock is not an ancestor. If the request is granted, a new version stack entry is made for T containing a copy of the top of the version stack.

(3) Otherwise, if the request is for Read access, the request is denied if any other transaction holding a write lock is not an ancestor of T. If the request is granted, an entry for T is added to ReadRetainers.

(4) For successful requests, the guardian checks to see if this transaction has a GpId different from the GpId of any other reader or writer. Note that all writers must have the same group leader, by the locking rules, but readers may be in different groups. If this is a new GpId, the guardian notifies the group leader so it can receive ABORT and TOP-COMMIT messages.

5.2.4.3. Subaction Commit

No special action is taken. The parent receives information that the child died, committed, and the parent continues normal execution.

5.2.4.4. Top-level Commit

The committing top-level action, T, must send a TOP-COMMIT message to the group leader. The group leader executes an algorithm, described below in Section 5.2.4.8, to send COMMIT messages to each guardian which has joined the group.
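Returning to the admission rules in steps 2 and 3 of Section 5.2.4.2 above, they can be sketched as follows. This is a hedged illustration, not the thesis's code: the liveness check on lock holders is omitted, ancestry is tested with an assumed dot-separated Tid encoding, and the helper names are invented.

```python
# Step 2: grant Read+Write only if every lock holder is an ancestor.
# Step 3: grant Read only if every *write* lock holder is an ancestor.
# (The check that holders are still living is omitted in this sketch.)
def is_ancestor(a, t):
    return a == t or t.startswith(a + ".")   # assumed Tid encoding

def may_write(tid, readers, writers):
    return all(is_ancestor(h, tid) for h in readers + writers)

def may_read(tid, writers):
    return all(is_ancestor(h, tid) for h in writers)

assert may_write("T0.T1.T2", readers=["T0.T1"], writers=["T0"])
assert not may_write("T0.T5", readers=[], writers=["T0.T1"])   # sibling branch holds a write lock
assert not may_read("T0.T2", writers=["T0.T1"])
```

These are exactly the extended ARGUS locking rules of Section 5.2.2.2, restated over the encoded Tids.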
All guardians to which any transaction of the group has, at any time, held a lock, receive the COMMIT message (eventually). In the case of network partitioning, not all guardians may immediately receive the COMMIT message. This problem may be solved as in the LOCUS description, and will not be considered further here. As parents cannot run concurrently with children, all children must by now be dead (or unable to receive the message). Each guardian must then commit any current versions held by that group. The group leader uses a standard 2-phase commit protocol such as that described by Moss [Moss 81]. After writing data to stable storage and committing, the guardian removes the t-lock structure together with all the version lists.

5.2.4.5. Transaction Aborts

To handle a transaction ABORT, all the guardians in the aborting transaction's group must receive the ABORT message and the transaction-id P of the aborting process. The mechanism for the receipt of ABORT is via the group leader, as for TOP-COMMIT. The guardian checks whether P is a ReadRetainer of a version, then performs the following algorithm for each atomic data item.

(1) If P is a ReadRetainer of a version, or in the ancestor list of a ReadRetainer, P is removed from the list; go to step 4.

(2) Otherwise, the guardian checks, starting at the bottom of its version stack, whether P is a WriteRetainer, or in the ancestor list of a WriteRetainer of a version.

(3) If a match is found with a version, say v_i, the oldest version prior to v_i is restored, and the owner is set equal to P's ancestor.

(4) If there are no remaining R/W retainers, the t-lock is removed.

To clarify this, an example of the formation of a version stack with nested transactions before and after a transaction aborts is shown in Figure 5.6 below.
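Steps 2 and 3, the write-retainer case, can be sketched as follows. This is a hedged illustration: owner-ancestor lists are modelled as strings like "T2-T1-T0" to match the figure's notation, and ReadRetainer handling and the liveness check are omitted.

```python
# Scan the version stack from the bottom; on finding the aborter's first
# version, restore the oldest version prior to it and hand ownership to
# the aborter's parent (step 3 of the abort algorithm).
def abort(vstack, aborter, parent):
    for i, (version, owner) in enumerate(vstack):
        if aborter in owner.split("-"):   # owner list, e.g. "T2-T1-T0"
            restored = vstack[:i]         # discard v_i and everything above it
            v, _ = restored[-1]
            restored[-1] = (v, parent)    # owner set equal to P's ancestor
            return restored
    return vstack                         # aborter held no write lock here

stack = [("V0", "stable"), ("V1", "T0"),
         ("V2", "T2-T1-T0"), ("V3", "T3-T2-T1-T0"), ("V4", "T2-T1-T0")]
assert abort(stack, "T2", "T1") == [("V0", "stable"), ("V1", "T1")]
```

Scanning from the bottom matters because the aborter may own several versions; the oldest one prior to its first write is the one to restore.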
Thus, ABORT causes the version stacks and the ReadRetainers and WriteRetainers to be updated. (Note that we must check to see if the owner held several versions -- if so, the oldest must be restored. Therefore the version stack must be checked from the bottom up for completeness, even though the most likely ABORT is the current version holder.)

    Version Stack before T2 ABORTs         Version Stack after T2 ABORTs

    version   owner-ancestors              version   owner-ancestors
    V6        T2-T1-T0        (top)
    V5        T4-T2-T1-T0
    V4        T2-T1-T0
    V3        T3-T2-T1-T0
    V2        T2-T1-T0
    V1        T0                           V1        T1            (top)
    V0        stable store    (bottom)     V0        stable store  (bottom)

    Figure 5.6  Formation of a Version Stack with Nested Transactions

5.2.4.6. Transaction Read/Write Conflicts

To handle conflicts correctly, the guardian must check, for each atomic data access attempt, if there is a R/W lock on the data.

(1) If no R/W locks are held, access is permitted and a normal data access is performed as described above.

(2) If a R/W lock is held on the data, we search to see if the current version owner is an ancestor of P, by checking whether the owner is still alive. In doing so, the top of the version stack is updated. If there is a conflict, access is denied.

5.2.4.7. The Group Leader's Role

Each group leader maintains a list of participants by accepting information from a guardian whenever a transaction in a new group receives access to some of the guardian's atomic data (i.e. by gaining a R/W lock). The group leader also accepts TOP-COMMIT and ABORT messages from transactions in the group. The structure of this communication is shown in Figure 5.7 below.

On receipt of an ABORT or TOP-COMMIT
message from a transaction, the group leader has to notify the participants in its list. (Note that ABORT messages must always take priority over TOP-COMMIT.) For this purpose, the GL uses a worker process on the same team, whose role is to provide two-way communication as described previously in Section 2.3.2. The worker has access to the participant list maintained by the GL, and exploits this fact to handle the TOP-COMMIT protocol, unaided by the GL, thus avoiding extra context switches and ipc between the worker and the GL. Only when the TOP-COMMIT is resolved does the worker re-Send to its GL. The worker thus acts like an exception process described earlier.

Figure 5.7 Group Leader Communication Structure

There are no synchronisation problems between the worker and its GL because the TOP-COMMIT message should be the last from the group. Similarly, during the processing of ABORT the worker only re-Sends to its GL when the ABORT is completed. If more ABORTs for the group are received by the GL while its worker is processing an earlier one, the GL must save them up.

To avoid multiplexing and possible delays, we specify one GL per transaction tree group. Usually it is created on the same host as the top-level transaction, although for efficiency in execution of the 2-phase COMMIT protocol the GL should be on the host which has most participants. Thus all group leaders execute the same code, first creating their worker to manage the 2-way communication between participants and the GL.

5.2.4.8. Group Leader's TOP-COMMIT Algorithms

The GL executes the following algorithm on receipt of a TOP-COMMIT message from the group's top-level transaction T.

(1) Reply TOP-COMMIT(T) to worker.

(2) Wait to receive next message.

The worker executes the following algorithm when there is no multicast or broadcast ipc mechanism.
(1) Send to GL for instructions.

(2) On receiving the Reply TOP-COMMIT(T), obtain the list of participants.

(3) For each participant, the worker must Send a PREPARE(T) message.

(4) If the participant replies ABORT(T), then the whole commit must fail, and all the transactions must be aborted. Go to step 8. If the participant does not reply within a timeout period, the worker can either ABORT (go to step 8), or retransmit the PREPARE(T) message. If the participant responds PREPARED(T) then go to step 3 unless each participant has been notified.

(5) All the participants have responded PREPARED(T) at this point. Record in permanent memory a list of the participants along with a notation that transaction T is now COMPLETING.

(6) For each participant, Send a COMMIT(T) message. If there is a participant which has not responded after a timeout period, the COMMIT(T) message is retransmitted.

(7) When all participants have responded COMMITTED(T), erase the list of participants and associated memory from permanent memory. Reply COMMITTED(T) to the GL. Done.

(8) Send an ABORT(T) message to each participant. When all participants have responded, reply ABORTED(T) to the GL. Done.

The participants act as follows:

(1) If a PREPARE(T) message is received, and T is unknown at the participant (it never ran there, was locally aborted, or was wiped out by a crash), then respond ABORT(T). Otherwise, prepare the transaction locally as follows. Write the identity and the new state of any objects write-locked by T to the permanent memory.
However, the old state of the objects is not overwritten; the point is to have both the old and the new state recorded in permanent memory, so that the transaction can be locally committed (by installing the new states) or aborted (by forgetting the new states) on demand, regardless of crashes. This write to permanent memory must be performed as a single atomic update. When a transaction is prepared, its write locks may be freed immediately, but read locks must be held until the transaction is resolved. The participant must then reply PREPARED(T) to the GL.

(2) If an ABORT(T) message is received, then locally abort T. If T is PREPARED, erase from permanent memory the potential new states of objects modified by it (using a single atomic write), and then perform the usual version restoration associated with transaction abort.

(3) If a COMMIT(T) message is received, then T must have been locally prepared, and is either still prepared, or already committed. If it is prepared, install the tentative new states of objects modified by T in both permanent and volatile memory, discarding the old states of these objects. (Again, ensure that the update is achieved atomically with a single write.) Finish committing the transaction by releasing all its locks. If T is no longer locally prepared, then it has already been committed, and no special action is required. In either case (T is locally prepared or not), respond COMMITTED(T) to the GL when done.

(4) If a transaction has been prepared for a long time and nothing has been heard from the GL, then ask it for the state of the transaction.
The GL replies PREPARE(T) if it is still preparing, COMMIT(T) if it is committing. If the GL is not running, or there is no record of the transaction, then the node where the GL ran should respond ABORT(T).

Moss argues [Moss 82] that this algorithm is correct and will always work eventually. If there is a crash at a participant after the first phase, the permanent memory contains the relevant information for recovery. If a transaction was not yet completed, then the crash will wipe it out, and the transaction will be ABORTED. Further scenarios are explained in Moss, but only the case of the coordinator failing may cause some participants to lock data forever. A manual override must be used in such cases - the data is, however, correctly preserved on permanent storage.

Alternative algorithms may be used by the worker if a multicast or broadcast ipc mechanism is available, which reduces the ipc and message traffic still further.

(1) The GL broadcasts or multicasts (Bsends) a PREPARE(T) message, and waits till all the replies have been picked up.

(2) If the GL receives an ABORT reply, then go to step 7.

(3) If the GL receives all PREPARED(T) replies within the timeout period, it records in permanent memory a list of participants along with a notation that transaction T is now COMPLETING. Otherwise it can retransmit PREPARE(T) messages on a 1:1 basis as in step 4 of the previous algorithm, and if the timeout still expires, the transaction must be aborted (go to step 7).

(4) All the participants have responded PREPARED(T) at this point. Now Bsend a COMMIT message.

(5) As step 6 before.

(6) As step 7 before.

(7) BSend an ABORT(T) message. When all participants have responded, reply ABORTED(T) to GL. Done.
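The worker's 1:1 two-phase commit round (steps 3 to 8 of the non-broadcast algorithm) can be sketched as follows. Here send() is a stand-in for the kernel Send primitive, the in-memory dictionary standing in for permanent memory is an assumption of this sketch, and the retransmission limit of three is arbitrary.

```python
def two_phase_commit(t, participants, send, permanent_memory):
    """Run the worker's PREPARE/COMMIT rounds for transaction t.
    Returns True on commit, False on abort."""
    # Phase 1 (steps 3-4): PREPARE each participant; an ABORT reply or
    # an exhausted retransmission budget aborts the whole commit.
    for p in participants:
        reply = None
        for _ in range(3):                    # retransmit on timeout
            reply = send(p, ("PREPARE", t))
            if reply is not None:
                break
        if reply != ("PREPARED", t):
            for q in participants:            # step 8
                send(q, ("ABORT", t))
            return False
    # Step 5: all PREPARED; note COMPLETING in permanent memory.
    permanent_memory[t] = ("COMPLETING", list(participants))
    # Phase 2 (steps 6-7): COMMIT, retransmitting until acknowledged,
    # then erase the recorded participant list.
    for p in participants:
        while send(p, ("COMMIT", t)) != ("COMMITTED", t):
            pass
    del permanent_memory[t]
    return True                               # worker re-Sends to its GL
```

A real worker would of course sit behind the GL and drive actual guardian processes; the point of the sketch is only the shape of the two rounds and the COMPLETING record that survives a crash between them.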
5.2.4.9. Group Leader's ABORT Algorithms

The GL executes the following algorithm on receipt of an ABORT message from any transaction T of the group. It is similar to the COMMIT algorithm described above.

(1) Reply ABORT(T) to worker.

(2) Wait to receive next message.

The worker executes the following algorithm when there is no multicast or broadcast ipc mechanism.

(1) Send to GL for instructions.

(2) On receiving the Reply ABORT(T), obtain the list of participants.

(3) For each participant, Send an ABORT(T) message.

(4) If the participant does not reply within a timeout period, the message is retransmitted. If the participant responds ABORTED(T, NONE) then that participant has indicated that there are no more transactions in that group holding R/W locks, so the worker can remove that participant from its list. If the participant responds ABORTED(T, MORE) then that participant has indicated there are more transactions in that group holding R/W locks, and the worker cannot remove that participant from its list. Go to step 3 until each participant has been notified.

(5) All the participants have responded ABORTED(T) at this point. Reply ABORTED(T) to GL. Done.

Extension of the algorithm may be made if a broadcast or multicast feature is available, as for the 2-phase COMMIT described above.

5.2.5. Efficiency of Mechanisms

A comparison of the LOCUS and V-kernel implementations is shown in Table 5.2. In LOCUS, for each subaction opening n atomic files and later committing, there are 4+2n messages to parent and TSS. For top-level commit a 2-phase protocol involving 4n messages (to TSS) is executed.
Thus if T -> T1 -> T2 -> T3 -> n atomic files, there are 3*(4+2n) messages between T, T1, T2, T3 and the TSS, plus 4n messages for 2-phase top-level commit. This is shown in column 1 of Table 5.2. Thus, contrary to the claim in [Mueller 83], the 2-phase commit does not dominate at 3 or more levels of nesting of transactions.

The number of message events, where a message event is described as a message being sent or received, is equal to 2 * (no. of messages) in this case, as each message is 1:1. As the process context switching is often a dominant time-cost, we use this measure, which is equal to the message events. So in the above example, in the case of no aborts or data conflicts, the LOCUS system requires 24 + 20n context switches.

    Table 5.2 A Comparison of LOCUS and V-kernel Nested Atomic Actions

    no. of ipc messages                 LOCUS      V-kernel        V-kernel
                                                   (no broadcast)  (broadcast)
    NO ABORTS
    Level m subtrans. opening n files   m*(4+2n)   4+2n            4+2n
    Level 3 subtrans.                   3*(4+2n)   4+2n            4+2n
    TOP COMMIT                          4n         2+4n            2+2n
    TOTAL messages                      12+10n     6+6n            6+4n
    ABORTS
    Level m subtrans. aborts            2n         6+4n            7+3n
    Level 1 trans. aborts               12+8n      6+4n            7+3n

For our scheme, for each subaction commit there are no context switches. Therefore the only messages for the above model are to create the GL, to establish the participant list, and to execute the 2-phase commit protocol. The cost of establishing the GL is the overhead of 2 process creations (for GL and its worker) = 4 ipc messages in V-kernel. Each participant in the group, P, does a Send to the GL. Therefore there are 2n messages for n participants, shown in column 2 of the Table.
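The totals in Table 5.2 can be checked mechanically. The small computation below simply re-derives the table's message counts; the function names are ours, not the thesis'.

```python
def locus_messages(n, levels=3):
    """LOCUS: 4+2n messages per committing subaction at each of
    `levels` nesting levels, plus 4n for the 2-phase top commit."""
    return levels * (4 + 2 * n) + 4 * n      # 12 + 10n for 3 levels

def vkernel_messages(n, multicast=False):
    """V-kernel scheme: 4 messages to create the GL and its worker,
    2n participant Sends, then the top-level 2-phase commit."""
    setup = 4 + 2 * n
    top_commit = (2 + 2 * n) if multicast else (2 + 4 * n)
    return setup + top_commit                # 6 + 4n or 6 + 6n

# With n = 5 files and 10 msec per context switch (two switches per
# 1:1 message), the normal-case commit times compare as in the text:
print(2 * locus_messages(5) * 10)            # 1240 msecs for LOCUS
print(2 * vkernel_messages(5) * 10)          # 720 msecs, no multicast
```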
For top-level commit (no broadcast) the 2-phase protocol involving 2 messages from GL to its worker, and 4n messages between the worker and the participants, is executed. Thus, after the 2-phase top-level commit for successful update of n files (without broadcast), (4+2n) + (2+4n) = 6+6n ipc messages, requiring 12+12n context switches, have been executed, as shown in the Table. This compares favourably with LOCUS' 24+20n context switches.

If a multicast algorithm is used to broadcast the TOP-COMMIT message in the V-kernel, only 2+2n messages instead of 2+4n are required, as shown in column 3 of Table 5.2.

Let us consider what these figures mean in real time. If n = 5 and the context switching time is 10 msecs (as it is for a Send-Receive-Reply between two UNIX 4.2BSD processes on a VAX 11/750), LOCUS takes 124*10 msecs = 1240 msecs to commit (ignoring message transit time), whereas our system takes 10*(12+12*5) = 720 msecs (no multicast), and with multicast takes 10*(6+20) = 260 msecs. Thus our algorithm is considerably faster in the normal case of no aborts or conflicts.

We now consider the exceptional case of a transaction aborting. In the LOCUS system, if a subtransaction ABORTs, a message is sent to each of the aborting subtransaction's participants (including those inherited from its committed children). If the transaction T has n
T h u s in our  system, if the lowest transaction A B O R T s we have the following: On A B O R T ,  2 + either 2n (no multicast) or 1+n (if multicast) messages  So the T O T A L messages (1 A B O R T , no timeouts) -I- 3n, as s h o w n in the T a b l e .  =  6 -I- 4n (no multicast) or 7  T h i s is the same for any level of transaction a b o r t i n g .  T h u s our system is faster t h a n L O C U S to handle an A B O R T if the aborting  is more  than  2 levels  higher  than  the file user.  transaction  T h i s s i t u a t i o n w o u l d c o m m o n l y occur in prac-  tice: the file-opener w o u l d be the lowest-level transaction, a n d it might well be aborted b y a grandparent.  T h u s even the exception-handling c a n be faster in our system, a n d the n o r m a l -  case always is faster.  CHAPTER 6 Conclusions  T h i s thesis is another step towards the time when concurrent d i s t r i b u t e d p r o g r a m m i n g is c o m m o n p l a c e .  Its c o n t r i b u t i o n is in i d e n t i f y i n g exceptional events in the context of systems  software, a n d in the development of a design model illustrated by m a n y examples a n d program paradigms w h i c h show how systems programs m a y be i m p r o v e d by exploiting exception mechanisms ~ either by m a k i n g the programs more efficient at r u n - t i m e (according to certain criteria), a n d / o r b y m a k i n g the program code easier to write t h r o u g h e m p l o y i n g incremental program design. T h e s e software engineering techniques are applicable to all event d r i v e n systems where the relative p r o b a b i l i t y of the events is k n o w n a priori;  s u c h systems include  hierarchic systems servers, c o m m u n i c a t i o n s protocols a n d utility programs.  T h u s the model is  an extremely useful tool for solving a wide range of systems software design problems.  6.1. 
Summary of results

The thesis establishes a set of general principles for exploiting exceptions in systems programs by proposing a general model to achieve improved performance and functionality according to desired objectives. The thesis also describes program models to carry out these objectives, and gives two extended examples which show the practicality of the design methodology.

It is appropriate here to summarize these results, in the light of achieving the goals of the thesis, which are to show how to design efficient and modular systems programs exploiting exception mechanisms. The goals are elaborated as follows: to characterize the nature of
exceptions in operating systems, to develop a model for designing efficient and modular software in a multi-process environment, and to provide program paradigms which employ the design principles by exploiting exception mechanisms. The goals are examined separately in the following subsections.

6.1.1. Characterise the Nature of Exceptions

An event-oriented view of systems has been taken, where an event manager responds to events, which may be externally or internally generated. This thesis shows that there are many systems programs which may be treated as event driven. Systems servers (or monitors), i/o library routines, and, less obviously, some data-driven utility programs come under this description.

The exceptions encountered by an event manager may be handled either by the manager itself -- called peer exceptions -- or by the client at the level above -- called server-client exceptions. In multi-process systems, where a client is a process separate from the resource which it is using, exceptions from the event manager (called a server) to the client are called inter-process exceptions. Such exceptions, where the handler is at a different level to the detector, have been classified along three dimensions: viz. ignorable v. exceptions which demand attention, broadcast v. 1:1 exceptions, and asynchronous v. synchronous exceptions.

These distinctions occur because the exception mechanisms may be different for the various classes of inter-process exceptions. For example, notification of ignorable exceptions may be different to that for exceptions which must be handled, in a distributed system where a cheap unreliable message delivery may be exploited for ignorable exceptions. Similarly, broadcast exceptions from a server to several clients may be used to develop further cooperation between the client processes, and may therefore be handled differently to the exceptions which are 1:1 from server to client.

This classification of exceptions therefore provides a very useful basis for research into exception mechanisms, and for multi-process systems design exploiting exception mechanisms.

6.1.2. Model for Systems Design

For event-oriented systems in which the relative probabilities of the events are known a priori, an appropriate design objective is chosen. The thesis model considers three objectives, namely: minimising the average run-time, minimising the exception handling time, and (less obviously) increasing the program's functionality. This latter objective was chosen as a new approach to systems design, which is particularly useful and appropriate in a multi-process environment.

Programs which are designed to minimise the average run-time should use the following design principles:

(1) The events are divided into 2 groups: the so-called normal events, and the other, exceptional events.

(2) Cost-benefits may be made in the detection, handling, or context-saving of the normal-case events depending on the number and probabilities of the events.
(3) The program should be designed to minimise the run-time by either reducing the detection cost of normal events, reducing the handling cost of normal events, or reducing the context saved while processing normal events. This design may mean that the program is structured according to functionality rather than its logical purpose. In other words, it may pay to cut the program into vertical slices of function, rather than horizontal slices of modularity. One approach is to structure the inter-process communication to minimise the message-flow in the statistically dominant cases, by recognizing the most common events in the system, and ensuring that they are processed with the minimum number of context switches. This can be achieved by choosing the event manager process which eventually handles the most common events to handle all the events first, by adding request exceptions. This event manager processes the common requests quickly without any extra context switches, and it forwards the exception requests to other processes for handling. Another approach is to execute exception-handling code in parallel where possible by distributing the exception-handling over multiple processes.

A general principle which leads to better design in a multi-process or distributed system is to consider exception handlers as separate processes. The notion of ignorable exceptions, in the sense that if they are not handled they have no effect on the correct operation of the program, may also be used to improve program functionality.
These two tools used together enable a system to be developed without an exception handler, and then its functionality can be increased by the addition of a handler. Further, broadcast exceptions can be used to enhance cooperation between the clients of a server, and cheap unreliable message delivery can be safely used for ignorable exceptions.

The many examples illustrate that applying the model enables multi-process programs to be structured simply and in a modular way. Thus the model provides a very powerful software engineering tool for systems design methodology.

6.1.3. Program Paradigms for Systems Design

The design principles have been extended into usage models when the design alludes to a system-dependent feature, such as in the inter-process exception notification scheme.

One example is given in several operating systems and high level languages, for multiplexing of asynchronous i/o events, where a server needs to notify a client of asynchronous input from more than one input device. Another example shows how cooperation can be achieved between the clients of a storage allocator server, using ignorable, broadcast messages, which may be unreliably delivered for the most efficient communication between processes. Finally, system-dependent mechanisms are given for handling the mutual exclusion problems which may arise if the exception handler is implemented as a separate process, including the use of language-level atomic actions.

In this way, the properties are isolated which high-level languages should have for implementing the design ideas of the model. Hence the systems and language approaches to systems design have been unified.

6.1.4.
Program Examples

Two programs are described which were designed by the author to test the applicability and effectiveness of the model. The programs have improved structure and function by exploiting exceptions. Both programs run more efficiently than their counterparts which do not exploit exceptions. These examples show that the principles and tools are both practical and effective.

6.2. Further Work

The trend towards distributed systems continues -- to loosely-coupled systems without shared memory, tightly-coupled multiprocessor systems with shared memory, and towards hybrid systems, reflecting the move towards computer systems as concurrent cooperating modules executing an algorithm. New techniques for managing these distributed systems are required. The principles enunciated in this thesis for multi-process systems extend naturally into the domain of distributed systems.

One open question concerns the techniques for exploiting unreliable messages in networks of computers. The use of broadcast (or multicast) messages is very convenient in local area networks where all the nodes receive the same message, thus reducing network traffic over many 1:1 messages. However, in long haul networks, characterised by multiple hops between source and destination machine, the network traffic is usually increased for broadcast messages, because of the routing algorithms. Yet it is in precisely these networks that reliable message delivery is most costly to provide (in the sense of elapsed time for acknowledgements and in protocol processing overhead). The tradeoff between broadcasting unreliable messages and providing reliable 1:1 message delivery should be examined in these contexts.
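The cheap-unreliable-delivery idea for ignorable exceptions can be made concrete with a single unacknowledged datagram. In the sketch below (the port number and event format are illustrative assumptions, not from the thesis), no replies are collected, so a lost message merely means some clients miss a hint they could live without:

```python
import socket

def notify_ignorable(event, addr=("<broadcast>", 9999)):
    """Fire-and-forget notification of an ignorable exception: one
    broadcast datagram, no Receive, no acknowledgement, no timeout."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    s.sendto(event.encode(), addr)
    s.close()
```

An exception that demands attention would instead use a reliable 1:1 Send-Receive-Reply exchange, paying for the acknowledgement and the retransmission machinery.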
The use of broadcast/multicast inter-process communication is not yet well developed; there are open questions on the best semantics of a broadcast Send, e.g. should the sender be able to specify whether to wait for one, many or all replies. The use of broadcast Send for exception mechanisms impacts these semantics, as the idea behind exception mechanisms is to take the load off the normal-case processing, which can then be made as efficient as possible. Hence an unreliable broadcast Send in which the sender cannot specify, or need to know, the number of replies is considerably cheaper than a broadcast Send where a list of all recipients is needed for checking that all replies have been received. Performance estimates of these alternatives would be valuable in deciding the protocols and semantics of broadcast communication.

Another open question concerns the implementation of transaction scheduling sites (TSS) for atomic actions. The idea behind a TSS is to protect atomic data (which is shared by several processes), and is crucial for cooperation of access to shared data between exception handlers and other processes. Shared data between the processes should be specifiable as atomic for simple synchronisation, and such atomic data should be efficiently managed by a TSS. Further work on this could include enhancements to high level languages such as Modula-2, in which cooperating systems are written, to provide easy methods for specifying exceptions and exception handling in this way.

Efficient algorithms for expedient and fair selection of events should be developed for event managers; the task of deciding which ready event to handle next cannot be left to the operating system in a distributed environment.
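The "specifiable as atomic" idea can be approximated in a shared-memory setting by a wrapper that serialises every access to a shared datum between an exception-handler thread and the normal-case code. This is a minimal sketch: the AtomicCell name is illustrative, and a real TSS would manage distributed, versioned data rather than a single lock.

```python
import threading

class AtomicCell:
    """A datum declared 'atomic': all reads and updates are serialised,
    so a separate exception-handler thread never sees partial states."""
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def update(self, fn):
        # Apply fn to the value as a single atomic action.
        with self._lock:
            self._value = fn(self._value)
            return self._value

    def read(self):
        with self._lock:
            return self._value
```

The language-level analogue would let the programmer write the declaration once and have the compiler, or a TSS, insert the synchronisation.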
New mechanisms are needed for distributed, highly parallel computations, to reflect needs in both systems software and applications software written in high level languages. A higher-level programming problem which could employ our principles for good design, by the use of exception handling facilities in multiprocess and/or distributed operating systems, is the SMALLTALK system [Goldberg 84]. SMALLTALK is an important step towards message-based concurrent programming in AI applications. If it could be distributed over a local area network by enhancing the language with unreliable exception notification, its programming power might be considerably enhanced.

References

1. Ahamad, M. and Bernstein, M.J., "Multicast Communication in UNIX," Proceedings 5th International Conference on Distributed Computing Systems, pp. 80-87, May 1985.

2. Andrews, G.R., "Synchronizing resources," ACM Transactions on Programming Languages and Systems, vol. 3(4), pp. 405-430, 1981.

3. Andrews, G.R., "The Distributed Programming Language SR -- Mechanisms, design and implementation," Software P&E, vol. 12(8), pp. 719-754, August 1982.

4. Andrews, G.R. and Schneider, F.B., "Concepts and Notations for Concurrent Programming," ACM Computing Surveys, vol. 15(1), March 1983.

5. Atkins, M.S. and Knight, B.J., "Experiences with Coroutines in BCPL," Software P&E, vol. 13(8), Aug. 1983a.

6. Atkins, M.S., "Exception Handling in Communication Protocols," Proceedings 8th Data Communications Conference, Oct. 1983b.

7. Atkins, M.S., "Notations for Concurrent Programming," Surveyors' Forum, ACM Computing Surveys, vol. 15(4), Dec. 1983c.

8. Beander, B., "VAX DEBUG: An Interactive, Symbolic, Multilingual Debugger," Proceedings ACM SIGSOFT/SIGPLAN Software Engineering Symposium on High-Level Debugging, Aug. 1983.

9.
Bentley, J.L., "Writing Efficient Programs," Prentice-Hall Software Series, New Jersey, 1982.

10. Black, A., "Exception Handling - the case against," University of Washington Tech Report, 1982.

11. Bochmann, G.V., "Finite state description of communication protocols," Computer Networks, vol. 2(4/5), pp. 361-372, Oct. 1978.

12. Bourne, S.R., Birrell, A.D., and Walker, I., "ALGOL68C Reference Manual," University of Cambridge Computer Laboratory, 1975.

13. Bron, C., Fokkinga, M.M., and De Haas, A.C.M., "A Proposal for dealing with abnormal termination of programs," Twente University of Technology, The Netherlands, Mem. Nr. 150, Nov. 1976.

14. Brownbridge, D.R., Marshall, L.F., and Randell, B., "The Newcastle Connection or UNIXes of the World Unite!," Software P&E, vol. 12(7), pp. 1147-1162, 1982.

15. Bunch, S.R. and Day, J.D., "Control Structure Overhead in TCP," Proceedings IEEE Trends and Applications: Computer Network Protocols, Gaithersburg, Maryland, pp. 121-127, May 1980.

16. CCG, "Datapac Interactive Terminal Interface (ITI) Specification," Trans Canada Telephone System, Ottawa, Ontario, Oct. 1978.

17. CCITT, Interface between DTE and DCE for terminals operating in the packet mode on public data networks, March 1976.

18. Chanson, S.T., Ravindran, K., and Atkins, M.S., "Performance Evaluation of the Transmission Control Protocol in a Local Area Network Environment," Canadian INFOR -- Special Issue on Performance Evaluation, vol. 23(3), pp. 294-329, Aug. 1985a.

19. Chanson, S.T., Ravindran, K., and Atkins, M.S., "LNTP - An Efficient Transport Protocol for Local Area Networks," UBC Computer Science Technical Report 85-4, University of British Columbia, Feb. 1985b.

20. Cheriton, D.R., Malcolm, M.A., Melen, L.S., and Sager, G.R., "Thoth, a portable real-time operating system," CACM, vol. 22(2), pp. 105-115, Feb. 1979a.

21.
Cheriton, D.R., "Multi-process Structuring and the Thoth Operating System," UBC Computer Science Technical Report, University of British Columbia, March 1979b.

22. Cheriton, D.R. and Steeves, P., "Zed Language Reference Manual," UBC Computer Science Technical Report 79-2, University of British Columbia, Sept. 1979c.

23. Cheriton, D.R., "Distributed I/O using an object-based protocol," UBC Computer Science Technical Report 81-1, University of British Columbia, Jan. 1981.

24. Cheriton, D.R., "The Thoth System: Multi-Process Structuring and Portability," Elsevier North-Holland, New York, 1982.

25. Cheriton, D.R., "Local Networking and Internetworking in the V-System," Proceedings 8th Data Communications Conference, Oct. 1983a.

26. Cheriton, D.R. and Zwaenepoel, W., "The Distributed V-kernel and its Performance for Diskless Workstations," Proceedings of the 9th Symposium on Operating Systems Principles, Oct. 1983b.

27. Cheriton, D.R., "One-to-Many Interprocess Communication in the V-System," Proceedings Fourth International Conference on Distributed Systems, May 1984.

28. Clark, D., Proceedings SIGCOMM '83, 1983.

29. DARPA, "Internet program protocol specifications -- Internet Protocol, Transmission Control Protocol," Information Sciences Institute, USC, CA, vol. RFC 791, 793, Sept. 1981a.

30. DEC, "Digital Equipment Corporation - BLISS-11 Programmer's Manual," Maynard, Mass., 1974.

31. Deering, S.E., "Multi-process structuring of X.25 software," UBC Computer Science Technical Report 82-11, University of British Columbia, Oct. 1982.

32. Dijkstra, E.W., "Guarded Commands, nondeterminacy, and formal derivation of programs," CACM, vol. 18(8), August 1975.

33. Feldman, J.A., "High-level Programming for Distributed Computing," CACM, vol. 22(6), pp. 363-368, June 1979.

34. Gentleman, W.M., "Message passing between sequential processes: The Reply primitive and the administrator concept," Software P&E, vol. 11(5), May 1981.

35.
Gentleman, W.M., "Using the Harmony Operating System," National Research Council Canada, Division of Electrical Engineering, Ottawa, Ont., NRCC/ERB-966, Dec. 1983.

36. Geschke, C.M. and Satterthwaite, E.H., "Exception Handling in Mesa," XEROX PARC report, Palo Alto, 1977.

37. Goldberg, A., "SMALLTALK-80: The Interactive Programming Environment," Addison-Wesley, 1984.

38. Goodenough, J.B., "Exception-handling: Issues and a Proposed Notation," CACM, vol. 18(12), pp. 683-696, Dec. 1975.

39. Herbert, A.J., CAP Operating System Manual, University of Cambridge Computer Laboratory, 1978.

40. Hoare, C.A.R., "Monitors: An operating system structuring concept," CACM, vol. 17(10), pp. 549-557, Oct. 1974.

41. Hoare, C.A.R., "Communicating Sequential Processes," CACM, vol. 21(8), pp. 666-677, Aug. 1978.

42. IBM, "PL/I(F) Language Reference Manual," Form GC28-8201, IBM Corporation, 1970.

43. Joy, W., UNIX 4.2BSD Operating System, 1983.

44. Lampson, B.W., Mitchell, J.G., and Satterthwaite, E.H., "On the Transfer of Control Between Contexts," Lecture Notes in Computer Science, vol. 19, B. Robinet (ed.), Springer-Verlag, N.Y., pp. 181-203, 1974.

45. Lampson, B.W. and Redell, D.D., "Experience with Processes and Monitors in Mesa," CACM, vol. 23(2), pp. 105-117, Feb. 1980.

46. Lantz, K.A., "Uniform Interfaces for Distributed Systems," PhD Thesis, University of Rochester, 1980.

47. Lauer, H.C. and Needham, R.M., "On the Duality of Operating System Structures," Operating Systems Review, vol. 13(2), pp. 3-19, April 1979.

48. Leach, P.A. and Levine, P.H., "The Architecture of an Integrated Local Network," IEEE Journal on Selected Areas in Communications, vol. SAC-1(5), pp. 842-856, Nov. 1983.

49. Lee, P.A., "Exception Handling in C Programs," Software Practice & Experience, vol. 13(5), pp. 389-405, 1983.

50.
Levin, R., "Program Structures for Exceptional Condition Handling," PhD Thesis, Carnegie-Mellon University, 1977.

51. Levin, R., Personal Communication, May 1982.

52. Liskov, B., "CLU Reference Manual," Computation Structures Group Memo 161, MIT Laboratory for Computer Science, July 1978.

53. Liskov, B. and Snyder, A., "Exception Handling in CLU," IEEE Transactions on Software Engineering, vol. SE-5(6), pp. 546-558, Nov. 1979.

54. Liskov, B. and Scheifler, R., "Guardians and Actions: Linguistic Support for Robust, Distributed Programs," ACM Transactions on Programming Languages and Systems, vol. 5(3), July 1983.

55. Lockhart, T.W., "The Design of a Verifiable Operating System Kernel," UBC Computer Science Technical Report 79-15, University of British Columbia, Nov. 1979.

56. Lomet, D.B., "Process structuring, synchronization and recovery using atomic transactions," Proc. ACM Conf. Language Design for Reliable Software, SIGPLAN Notices, vol. 12(3), pp. 128-137, March 1977.

57. Luckham, D.C. and Polak, W., "ADA Exception Handling: An Axiomatic Approach," ACM Transactions on Programming Languages and Systems, vol. 2(2), pp. 225-233, April 1980.

58. Macintosh, "Inside Macintosh," Apple Computers, 1984.

59. Malcolm, M., Bonkowski, B., Stafford, G., and Didur, P., "The Waterloo Port Programming System," Technical Report, Department of Computer Science, University of Waterloo, Waterloo, Ontario, Canada, January 1983.

60. Metcalfe, R.M. and Boggs, D.R., "Ethernet: distributed packet switching for local computer networks," Communications of the ACM, vol. 19(7), pp. 395-404, July 1976.

61. Mitchell, J.G., Maybury, W., and Sweet, R., "Mesa Language Manual, version 5.0," Rep. CSL-79-3, Xerox PARC, April 1979.

62. Moss, J.E.B., "Nested Transactions: An Approach to Reliable Distributed Computing," PhD Thesis, Massachusetts Institute of Technology, MIT/LCS/TR-260, April 1981.

63.
Mueller, E.T., Moore, J.D., and Popek, G.J., "A Nested Transaction Mechanism for LOCUS," Proceedings of the 9th Symposium on Operating Systems Principles, Oct. 1983.

64. Nelson, B.J., "Remote Procedure Call," PhD Thesis, Rep. CMU-CS-81-119, Department of Computer Science, CMU, May 1981.

65. Ousterhout, J.K., Scelza, D.A., and Sindhu, P.S., "Medusa: an experiment in distributed operating system structure," CACM, vol. 23(2), pp. 92-105, Feb. 1980.

66. Parnas, D.L., "On a Buzzword: Hierarchical Structure," Proc. IFIP Congress, North-Holland Publ. Co., 1974.

67. Peterson, J.L. and Silberschatz, A., "Operating Systems Concepts," Prentice Hall, 1983.

68. Popek, G., Walker, B., Chow, J., Edwards, D., Kline, C., Rudisin, G., and Thiel, G., "LOCUS: A Network Transparent, High Reliability Distributed System," Proceedings of the 8th Symposium on Operating Systems Principles, ACM, pp. 169-177, Dec. 1981.

69. Randell, B., "System structure for Fault Tolerance," Proc. Int'l Conf. on Reliable Software, Los Angeles, 1975.

70. Randell, B., Lee, P.A., and Treleaven, P.C., "Reliability Issues in Computing Systems Design," Computing Surveys, vol. 10(2), pp. 123-165, June 1978.

71. Redell, D.D., Dalal, Y.K., Horsley, T.R., Lauer, H.C., Lynch, W.C., McJones, P.R., Murray, H.G., and Purcell, S.C., "PILOT: an operating system for a personal computer," Communications of the ACM, vol. 23(2), pp. 81-92, Feb. 1980.

72. Reed, D.P., "Implementing Atomic Actions on Decentralised Data," ACM Trans. Computer Systems, vol. 1(1), pp. 3-23, Feb. 1983.

73. Reid, L.G. and Karlton, P.L., "A File System Supporting Cooperation between Programs," Proceedings of the 9th Symposium on Operating Systems Principles, Oct. 1983.

74. Richards, M. and Whitby-Strevens, C., "BCPL — The Language and its Compiler," Cambridge University Press, 1979.

75. Richards, M., Aylward, A.R., Bond, P., Evans, R.D., and Knight, B.J., "TRIPOS - a portable operating system for mini-computers," Software P&E, vol.
9, pp. 513-526, 1979.

76. Ritchie, D.M. and Thompson, K., "The UNIX Time-Sharing System," CACM, vol. 17(7), pp. 365-375, July 1974.

77. Ritchie, D.M., Johnson, S.C., Lesk, M.E., and Kernighan, B.W., "The C Programming Language," The Bell System Technical Journal, vol. 57(6), pp. 1991-2021, July-Aug. 1978.

78. Sammet, J.E., "Programming Languages: History and Fundamentals," Prentice-Hall, Inc., N.J., 1969.

79. Scotton, G.R., "An experiment in High level protocol design," MSc Thesis, Department of Computer Science, University of British Columbia, Dec. 1981.

80. Schlichting, R.D. and Schneider, F.B., "Using Message Passing for Distributed Programming: Proof Rules and Disciplines," Carnegie-Mellon Technical Report 82-491, May 1982.

81. Sindhu, P.S., "Distribution and Reliability in a Multiprocessor Operating System," CMU Technical Report CMU-CS-84-125, 1984.

82. Spector, A., "Performing Remote Operations Efficiently on a Local Computer Network," Communications of the ACM, vol. 25(4), April 1982.

83. US Department of Defense, ADA Programming Language Military Standard, MIL-STD-1815, Washington, D.C., Dec. 1980.

84. van Wijngaarden, A., "Revised Report on the Algorithmic Language ALGOL 68," Acta Informatica, vol. 5, Fasc. 1-3, 1975.

85. Wilkes, M.V. and Needham, R.M., "The Cambridge CAP Computer and its Operating System," Elsevier North Holland, 1979.

86. Wilkes, M.V. and Needham, R.M., "The Cambridge Model Distributed System," Operating Systems Review, vol. 14, pp. 21-29, 1980.

87. Wirth, N., "Programming in Modula-2," Springer-Verlag, New York, 1982.

88. Wulf, W.A., London, R.L., and Shaw, M., "An Introduction to the Construction and Verification of Alphard Programs," IEEE Transactions on Software Engineering, vol. SE-2(4), pp. 253-265, Dec. 1976.

89. Zimmerman, H., "OSI Reference Model — The ISO Model of Architecture for Open Systems Interconnection," IEEE Trans. Commun., vol. COM-28, pp. 425-432, April 1980.
Appendix A

The putbyte program.

This appendix describes an example of minimising the average run-time by reducing the normal case detection costs in the putbyte utility program in the Verex i/o system [Cheriton 81]. This system provides, at the application level, a similar model of selecting input and output as is used in several implementations of BCPL [Richards 79a]. The Verex Z-language i/o library [Cheriton 79c] provides the routines selectoutput and putbyte to output a character.

In Z, as in many C-like languages, the i/o library routines share global data. putbyte and selectoutput share the global variable selected-output, from which the variables selected-output.ptr and selected-output.bufcnt are accessed. Putbyte may conveniently be considered to have states UNINITIALISED and RUNNING. The events putbyte has to deal with are the request for initialisation, initialise, and the request to put a byte to the selected output, putcharacter. Initialisation must occur before any other calls to putbyte are accepted. The user must call selectoutput to cause initialisation of the global variable selected-output, which influences the state of putbyte.¹ If the programmer tries to output a character when there is no selected output device, an exception occurs. This exception is a server-client type, which is detected by the server routine (in this case, putbyte), which may perform some handling before passing the exception on for handling by the client, putbyte's caller.
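The server-client propagation just described can be sketched directly with a language exception mechanism. The following Python sketch is illustrative only (the names, and the use of Python rather than Z, are assumptions): the server routine putbyte detects the exception, performs its share of the handling (a diagnostic message), and then passes the exception on for the client to complete.

```python
class NoSelectedOutput(Exception):
    """Raised when putbyte is called with no selected output device."""

selected_output = None          # shared library state, set by selectoutput()

def selectoutput(buf):
    global selected_output
    selected_output = buf       # moves putbyte from UNINITIALISED to RUNNING

def putbyte(ch):
    # Server side: detect the exception, handle what the server can
    # (a diagnostic), then pass it on to the caller.
    if selected_output is None:
        print("putbyte: no selected output")
        raise NoSelectedOutput
    selected_output.append(ch)

# Client side: the caller decides how the exception is finally handled.
try:
    putbyte("x")
except NoSelectedOutput:
    selectoutput([])            # e.g. recover by selecting a default device
    putbyte("x")
```

Here the client recovers by initialising a default device and retrying, one of the handling choices deliberately left open to putbyte's caller.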
¹ In more modular languages, the global data and the routines manipulating it would be encapsulated in a module, and invocations of selectoutput and putbyte would be treated as request events upon which appropriate action must be taken: some events would cause state transitions and others would not. In the unstructured C-like languages, relevant global variables may similarly be treated as state variables, and calls to routines which manipulate these, such as selectoutput and putbyte, as events which cause state transitions.

Now the cost of handling a byte when the local buffer is full is much more than when it is not full. Thus the execution cost of the routine depends on the buffer state as well as the event, violating the assumption of constant-cost events. The event set is increased to reflect constant cost by dividing putbyte's state of RUNNING into two new states, BUFFER-FULL and BUFFER-NOT-FULL, so there is a constant cost of handling each event-state pair. Thus there are 3 states, UNINITIALISED, BUFFER-FULL, and BUFFER-NOT-FULL, and two external events, initialise and putcharacter. By combining these into constant-cost state-event handling, the events s1-s4 defined in Table A.1 below are obtained.²

² For simplicity, the rare combinations of initialise with states BUFFER-FULL and BUFFER-NOT-FULL are ignored.

Table A.1  Execution costs for the putbyte routine

  event                              probability  cost of   cost of    expected   new        new expected
                                                  handling  detection  execution  detection  execution
                                                                       cost       cost       cost
  s1 = initialisation                0.00025      3000      0          0.75       0          0.75
  s2 = putbyte-uninitialised         10^-6        1000      1          0.001      2          0.001
  s3 = putbyte-with-buffer-full      0.001        1000      2          1.002      2          1.002
  s4 = putbyte-with-buffer-not-full  0.999        2         2          3.996      1          2.997
  Total expected costs/event                                           5.75                  4.75

A regular implementation of putbyte might be of the following form. The routine error handles the server-client exception NO-SELECTED-OUTPUT by printing a message and making a synchronous error return to the client.
The routine emptybuf handles the peer exception PUTBYTE-BUFFER-FULL, which is transparent to the user (except that the call to putbyte may take longer).

putbyte(ch)
{
    if selected-output = NULL then
        error(NO-SELECTED-OUTPUT);
    selected-output.ptr++ = ch;
    if selected-output.bufcnt++ = 1000 then
        emptybuf();
}

emptybuf()
{
    write( selected-output.buf, selected-output.bufcnt );
    selected-output.bufcnt = 0;
}

This program is now analysed to see where its efficiency can be improved. To obtain the probabilities of each event, assume initialisation (i.e. a call of selectoutput) is performed just once for each set of data, and its cost is independent of the previous state of putbyte. If the average length of the data to be written is 4000 characters, then p(s1) = 0.00025. Assuming that a request to putbyte with no selected output occurs very rarely, say 1 in 10^6 bytes written, and the buffer is 1000 characters, then p(s3) = 0.001, so the remaining event, putbyte-with-buffer-not-full, occurs with probability 0.999. Note that no kind of division is made here between the events.

Assuming that the routine initialise, which handles the request exception call of selectoutput, takes approximately 3000 instructions, the routine emptybuf takes 1000 instructions, the routine error also takes 1000 instructions, and just 2 instructions are needed to insert the byte into the buffer and update the counter for event s4, then the handling costs for each event are as described in Table A.1. Now the cost of detecting event s1 in putbyte is zero, because initialisation is handled in a separate routine.
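The figures in Table A.1 can be cross-checked mechanically: the expected cost per event is the sum over events of probability times (handling cost + detection cost). A short Python sketch (numbers copied from the table; illustrative only, not part of the Verex code) recovers the 5.75 and 4.75 instructions/event totals quoted in the table:

```python
# (probability, handling cost, detection cost before, detection cost after)
# for events s1..s4 of Table A.1
events = [
    (0.00025, 3000, 0, 0),   # s1 initialisation
    (1e-6,    1000, 1, 2),   # s2 putbyte-uninitialised
    (0.001,   1000, 2, 2),   # s3 putbyte-with-buffer-full
    (0.999,   2,    2, 1),   # s4 putbyte-with-buffer-not-full
]

# expected instructions/event, original and revised detection schemes
old_cost = sum(p * (h + d) for p, h, d, _ in events)   # about 5.75
new_cost = sum(p * (h + d) for p, h, _, d in events)   # about 4.75
```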
Assuming each test costs one instruction, the detection costs are as shown in Table A.1. Hence the expected average run-time cost = 5.75 instructions/event.

To minimise the run-time cost, the detection cost of the commonest event, s4, should be reduced to just one test. Unfortunately this event cannot be tested explicitly; it is assumed to be none of the others. The approach is to follow the general principle of mapping several exception events onto one to reduce detection costs. Here, performance can be improved by mapping s2 and s3 together (as the initialisation exception s1 is already separated out through a call to a separate routine).

The detection of the server-client exception PUTBYTE-UNINITIALISED can be mapped onto the detection of the peer exception BUFFER-FULL by arranging that the uninitialised global variable selected-output points to a dummy file which has a full buffer. The two exceptional events s2 and s3 are then detected in just one test. Individual exceptions s2 and s3 are separated in the handler emptybuf. An appropriately modified program reads as follows:

putbyte(ch)
{
    if selected-output.bufcnt++ < 1000 then
        selected-output.ptr++ = ch;
    else
        emptybuf(ch);
}

emptybuf(ch)
{
    if selected-output = dummy-file then
        error(NO-SELECTED-OUTPUT);
    selected-output.ptr = ch;
    write( selected-output.buf, selected-output.bufcnt );
    selected-output.bufcnt = 0;
    selected-output.ptr = selected-output.buf;
}

The new program's detection and execution costs are given in the last two columns of Table A.1. The average run-time cost is reduced by 1/5.75 = 17.4%, achieved by dividing the events into two sets, {s4} corresponding to a normal event and {s2, s3} corresponding to exceptional events, and by modifying the program so that the normal event is detected in just one test.

Appendix B

The os program.
T h i s appendix describes an example of m i n i m i s i n g the average r u n t i m e b y restructuring a utility p r o g r a m , os, to reflect the statistically d o m i n a n t case rather t h a n the logical  flow.  Os converts a f o r m a t t e d text file c o n t a i n i n g backspace characters for u n d e r l i n i n g a n d boldfacing, to a file achieving the same p r i n t e d effect using o v e r p r i n t i n g of whole lines, a n d c o n t a i n ing no backspaces.  T h i s d a t a - d r i v e n u t i l i t y c a n be treated as a n e v e n t - d r i v e n p r o g r a m , where  the i n p u t value of each character read represents an event. In its original  structure,  o v e r p r i n t i n g o n the line.  the p r o g r a m uses multiple line buffers, one for each level of  E a c h character except backspace a n d line terminators are placed in  the highest level line buffer that is currently u n o c c u p i e d i n the current character position. Thus,  its structure  reflects  the logical operation  stream into o v e r p r i n t i n g line buffers.  it is p e r f o r m i n g , t r a n s l a t i n g  a character  However, its structure also means that the character it  processed w i t h the least overhead is the backspace character.  G i v e n t h a t backspaces  consti-  tute about 0.5% of the characters i n most text files, this p r o g r a m is s t r u c t u r e d inefficiently. A n initial version of the p r o g r a m is given i n F i g u r e B . l . U s i n g the same a p p r o a c h as i n the state-event pairs, each w i t h constant event  is reading of the next  putbyte  example, the p r o g r a m is d i v i d e d into several  cost over execution of the event.  character f r o m the text  file.  With  the p r o g r a m  described above, the cost of the event depends o n w h a t the character is —  SPACE,  END-OF-FILE,  a n d all the others.  
T h e m a i n external structure  NEWLINE,  If the p r o g r a m has one state,  RUNNING,  constant cost event-state combinations for analysis are as shown i n T a b l e B . l below.  BACKthe  182  while ( byte 7^ END-OF-FILE ) begin read( byte ); 3 if byte = N E W L I N E then writeline( ) ; else if byte == B A C K S P A C E then cc-; else begin for (i=0; i< M A X O V E R P R I N T S ; ++i ) begin line = Lineptrp]; if line[cc] = B L A N K then begin line[cc] = byte; CC+  + ',  Lastxfi] = max{Lastx[i],cc}; break; end; end; end; end; writeline( ) begin for (i=0; i < MAXOVERPRINTS; ++i ) begin last = Lastx[i]; if last = 0 then break; line = Lineptr[i]; for (j=0; j<last; ++j ) begin put ( linejj] ); linep] = BLANK; end; Lastx[i] = 0; end; put (NEWLINE); cc = 0; end; Figure B.l The Initial Version of Os.  'Note that writeline() would be expanded as an in-line macro  183  Table B . l Execution costs for the os utility event  BACKSPACE NEWLINE END-OF-FELE REMAINING CHARS  cost of probab- cost of handling detection ility 1 0.005 0.015 125W+3S2 100 7 0.985  Total expected costs/event  Assuming BACKSPACE  3 2 1 3  new new expected new expected detection execution handling execution cost cost cost cost .25W+0.8 2 0.02 50W+160 W+l 2 0.01W+0.03 1.25W4+3.84 10"* 0 1 lO" .9S5W+2.955 2 9.85 W+l 1.24W+3.785  1.25W+13.71  occurs say 1 in 200 characters, p(INITIALISE)  larly, if the average length of the line is 100 characters, then p(NEWLINE) the average textfileis 10000 characters, then p (END-OF-FILE).—  = .005; simi-  = 0.01. Assuming 5  10" .  Furthermore, it is assumed the system function read(byte) takes R instructions," which are required to handle every event. Now the cost of handling NEWLINE  depends on how  many backspace characters there are in that line, so either more events should be generated to represent this, or the average handling costs over all lines should be taken. 
The latter approach is used here, noting that there is approximately one backspace character on every other line. The average handling cost per newline character is the average of the cost for a line without a backspace and for a line with a backspace character half way along. This costs approximately ((100(W+3)+5) + (150(W+3)+10))/2 = 125W + 382, where the system function put(byte) takes W instructions, and the cost of executing a for-loop is 2 instructions per iteration.⁴

⁴ Note that more characters are written than read, because of the extra blank characters in the overstrike buffers.

From Table B.1 the expected cost of execution of the initial version = (1.25W + R + 13.71) instructions/event.† This
T h e final p r o g r a m is structured as above w i t h this a d d i t i o n a l context-  saving code, w h i c h takes the f o r m of the insertion of the following lines in the  else  clause  above:  if byte = N E W L I N E then col = 0; else colH—h; T h i s p r o g r a m was measured as twice the speed of the original on most text files, a n d , as C h e r i t o n p o i n t e d out, this is contrary to the expectation that the gain w o u l d be small because the fixed file access overhead w o u l d d o m i n a t e .  'for simplicity the constant cost R for each event is excluded from the Table  185  An  analysis of this program reveals t h a t the new execution cost is composed of three  parts: the h a n d l i n g cost, the detection costs a n d the context-saving cost ( =  1 instruction).  T h e s e are s h o w n i n T a b l e B . l , where the context-saving cost is merged w i t h the h a n d l i n g cost.  In this version the cost of h a n d l i n g the  BACKSPACE  exception i n  HandleException  includes the cost of r e a d i n g a n d storing characters till the next newline — w h i c h means t h a t the cost of h a n d l i n g a n o r d i n a r y byte depends o n whether a backspace has already occurred on a line, or n o t . T o preserve c o m p a r i s o n w i t h the first  os p r o g r a m , all  the  extra  costs i n  processing the o r d i n a r y bytes o n a line w i t h a backspace, are ascribed to the B A C K S P A C E character.  T h e cost of h a n d l i n g every o r d i n a r y character is t h e n W + 1, a n d the cost of h a n -  d l i n g B A C K S P A C E is the extra cost i n h a n d l i n g the line buffers. A s s u m i n g as before, t h a t a BACKSPACE  tion +10.  
appears half w a y along a line, the expected cost of executing  HandleExcep-  is the cost of reading a n d w r i t i n g one line buffer of half the average length = Then  the total  expected  cost  of this  version  of the p r o g r a m  is  50(W+3)  approximately  (R+1.24W+3.8) instructions/event. T h i s final p r o g r a m fragment illustrates the use of a readback facility, as described i n section 3.3.3.1, w h i c h w o u l d reduce context-saving costs i n the processing of n o r m a l events, b y enabling the last  NEWLINE  character to be located i n the i n p u t d o c u m e n t , thus o b v i a t i n g  the need to preserve a c o l u m n count.  while (byte != END-OF-FILE) begin read(byte); if byte = B A C K S P A C E then HandleException(); else put(byte); end; HandleException( byte ) begin col = 0;  186  oldbyte = NULL; while oldbyte 7^ N E W L I N E do begin readback( oldbyte ); C0I++;  end; newread(byte); < p ro cess- chars- as- b efo re >; end;  187  Appendix C. The T R O F F / T B L / E Q N system. In a Thoth-like operating system, a UNIX-like pipe system could be implemented using multiple processes, by using a pipe worker process to connect two systems processes, as shown below in Figure C.l. The TBL, EQN and TROFF processes act like servers, waiting on a Receive statement for either data from the client, or for the WORKER-READY message from their pipe-worker. After processing a client request (usually a text line or less), if the worker is free, the server  Figure C.l Pipe Worker Processes  188  makes a  Reply  to the w o r k e r before m a k i n g a 6  awaiting another request.  Reply  to its client a n d c o m p l e t i n g its loop b y  If a client request cannot be serviced i m m e d i a t e l y because the  worker is b u s y , the server q\ieues it (for later processing b y the worker), a n d makes a to its client, as before.  T h e capacity of the queue c a n be set to the capacity of a U N I X p i p e .  
If the queue becomes f u l l , the server witholds the ties the queue.  Reply  Reply  to the client until the worker e m p -  T h e worker processes the request by executing a  level below. W h e n the worker becomes free it executes a  Send  Send  to the server at the  to its master for more d a t a .  T h u s i n the above system s t r u c t u r e d as a hierarchy of clients a n d servers, there are 6 x 2 =  12 process switches for each d a t a item i n the source file — f r o m C L t o T B L to P W 1 to  E Q N to P W 2 to T R O F F to W R I T E R - - a n d back again. Suppose an average d o c u m e n t has 5% of its text as tables to be processed b y T B L , 5% as equations to be processed b y E Q N , a n d 9 0 % as n o r m a l text to be f o r m a t t e d only by TROFF.  T h e n this system structure does not reflect the statistically d o m i n a n t case.  T h e system c a n be restructured so the inter-process c o m m u n i c a t i o n is minimised for the statistically d o m i n a n t case b y treating lines w i t h equations or tables as exceptional. are sent to T R O F F a  Send  first.  Reply  T h e user's d o c u m e n t is read b y a client process, C L , w h i c h executes  to T R O F F , for each unit of d a t a .  text, then executes a  A l l lines  Send  T R O F F executes a  Receive  a n d processes n o r m a l  to the level below (say to process W R I T E R ) before executing a  to C L , w h i c h t h e n proceeds s y n c h r o n o u s l y w i t h the next item of d a t a f r o m the user's  document.  T h u s for each item of n o r m a l text there are 4 process switches — f r o m C L to  T R O F F t o W R I T E R to T R O F F t o C L . T h i s is illustrated below i n F i g u r e C . 2 .  e  Care must be taken to ensure that the data is copied into a safe place before the server makes a  Reply to its client.  
189  Figure C.2 Message Traffic for Handling an Equation TROFF  detects when a line is of the exceptional f o r m E Q N or T B L a n d handles it by  f o r w a r d i n g to the appropriate server, T R O F F for further processing.  where it is processed as before, a n d t h e n re-sent  to  T h u s 7 process switches are incurred for equation text ~ f r o m  C L to T R O F F to E Q N to T R O F F to W R I T E R to T R O F F t o E Q N to C L . T R O F F forwards subsequent i n p u t to E Q N , until E N occurs.  A similar scheme can be used for T B L text a n d  for mixed E Q N a n d T B L text. If an average d o c u m e n t has 5% of its text as tables to be processed b y T B L , 5% as equations to be processed b y E Q N , a n d 9 0 % as n o r m a l text to be f o r m a t t e d only b y T R O F F , in a 1000-line d o c u m e n t there are 4*900 -I- 7*50 + 7*50 = w i t h 12*1000 =  12000 for the pipe system.  4300 context switches, c o m p a r e d  
