The Design of a Distributed Kernel for a Multiprocessor System

by

PATRICK DAVID BOYLE

B.Sc., The University of British Columbia, 1975

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES, Department of Computer Science

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA

June 1982

(c) Patrick David Boyle, 1982

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Computer Science
The University of British Columbia
1956 Main Mall
Vancouver, Canada
V6T 1Y3

Abstract

The possibilities of increased responsiveness, throughput, availability, reliability and cost-effectiveness invite investigation of the hardware and software design of multiprocessor computing systems. This thesis describes an experiment in the design of a multiprocessor operating system based on the distribution of kernel functionality among the processors. One of the design objectives was to build a system capable of supporting real-time applications and a general-purpose, multi-user environment concurrently. The hardware base is a simple, closely-coupled, star network of autonomous computers constructed from "off-the-shelf" boards. The operating system developed, named Distributed Verex, is an extension of Verex, which is a descendant of Thoth. The Verex kernel provides an environment of processes and inter-process communication via message-passing on a uniprocessor computer. Distributed Verex provides the same environment uniformly and transparently throughout the multiprocessor system. Distributed Verex has been implemented and is undergoing continuing development. Initial performance measurements are given.

Table of Contents

Abstract
List of Figures
Acknowledgements
1. Introduction
   1.1 Motivation
   1.2 Definition of Multiprocessor
   1.3 Classification of Multiprocessors
2. Verex
   2.1 Overview of Verex
   2.2 The Verex Kernel
       2.2.1 Process Management
       2.2.2 Interprocess Communication
       2.2.3 Device Management
       2.2.4 Process Scheduling
       2.2.5 Kernel Details
   2.3 The Verex Team Server
   2.4 Suitability of Verex
3. System Architecture
   3.1 Machine Model
   3.2 Hardware Architecture
   3.3 Suitability of the Hardware
4. Distributed Verex Kernel
   4.1 Kernel Design Criteria
   4.2 Kernel Design Overview
   4.3 Distributed Verex Kernel Design
       4.3.1 Kernel Data Structures
       4.3.2 Inter-Kernel Requests
           4.3.2.1 Inter-Kernel Communication Mechanism
           4.3.2.2 Remote Request Issuing
           4.3.2.3 Remote Request Handling
           4.3.2.4 Message-passing, an Example
5. Evaluation
   5.1 Discussion
   5.2 Conclusions
   5.3 Future Research
References
List of Figures

Figure 1  Common-bus Interconnection, SL10
Figure 2  Crossbar Interconnection, C.mmp
Figure 3  Process State Transitions
Figure 4  Star Network of Processing Modules
Figure 5  Hardware Configuration
Figure 6  Remote Interprocess Communication Architecture
Figure 7  Inter-kernel Request Buffer Format
Figure 8  Remote Request Inter-kernel Interactions
Figure 9  Kernel Operation Execution Times

Acknowledgements

I would like to thank Dr. David R. Cheriton for supervising my research and for his comments on this document. His continued assistance after he left U.B.C. was greatly appreciated. I also wish to thank David for exposing me to the wonderful world of processes and message-passing. I have benefited greatly from participating in the development of Verex. I would also like to thank Steve Deering, John Demco, Elizabeth and Dad for editing my verbose prose (callously omitting needless paragraphs), and for their guidance. Thanks also to Dr. S. T. Chanson for reading and commenting on the final draft. A Graduate Fellowship was gratefully received from U.B.C.

Chapter 1
Introduction

This thesis describes the design and implementation of a distributed kernel for a multiprocessor computing system. The kernel provides an execution environment of processes and interprocess messages. The hardware is organized as a star network of closely-coupled processors. The central processor of the star is a minicomputer called the master processor; the others are called slave processors. Each slave consists of an autonomous, single-board, microprocessor-based computer. The network resides within a single computer chassis (and possibly in associated expansion chassis). Information exchange between the master and a slave is accomplished through the sharing of memory local to the slave and is synchronized through the use of inter-processor interrupts. The communication medium is the chassis backplane, the system bus, which provides high data transfer rates with very low error probability.

The master's operating system is an enhanced version of an existing uniprocessor operating system, Verex [9]. Small kernels in the network slave computers, in conjunction with the master processor's kernel, uniformly present the Verex kernel model for processes executing on all network processors. Distributed Verex provides multiple, separate, protected execution environments which support the Verex kernel model. This model includes synchronous message-passing and bulk data transfer facilities for communication among processes executing on the multiprocessor computer.

The implementation described can provide a modest general-purpose computing facility (about 10 users), can handle real-time applications such as computer communications networking, and can support both time-sharing and real-time applications concurrently.

This thesis demonstrates that the uniprocessor Verex kernel model of processes and message-passing, and the team model of groups of processes (sharing code and data, and executing in their own separate address space), can be easily and effectively extended to a distributed, multiprocessor architecture. This claim is substantiated by a description of an implementation of such a system and by measurements of its performance.
1.1 Motivation

The current trend in computer systems is toward further distribution. This trend is motivated by the modular design constraints of hardware and software and by economics.

Early computer hardware designs specified a single central processing unit which both performed all the computation and controlled the operation of peripheral devices. This architecture has been superseded by designs incorporating distributed control and increased modularity. The introduction of peripheral controllers, direct memory access, numeric processors, and multiple "central" processors reflects this evolution.

Computer software design has undergone similar changes. Operating systems once maintained their state and provided inter-function communication through the manipulation of global data structures. Code size was reduced by using multiple entry points to routines. This monolithic design is being replaced by newer ones emphasizing functional independence of modules, module protection, and well-defined inter-module interfaces. The effects of these design strategies may be observed in the current trend toward the distribution of operating system functions to system servers.

The economics of large-scale integrated circuits have made multiprocessor systems based on microprocessors a viable alternative to systems with a single, more powerful processor. The costs of processors and memory are decreasing while peripheral devices remain expensive. The possibilities of increased throughput, flexibility, availability, reliability and cost-effectiveness make multiprocessor architectures attractive. This thesis describes this project's success in realizing some of the potential of multiprocessor systems.

The objectives of this research were threefold:

— to experiment with the design of a multiprocessor operating system based on the distribution of kernel functionality among the processors. This included the provision of location-independent interprocess communication implemented upon a lower-level inter-kernel communication mechanism.

— to investigate the suitability of a simple, closely-coupled, star network of autonomous computers as a basis for a multiprocessor machine controlled by a distributed operating system.

— to build a machine capable of supporting real-time applications and a general-purpose, multi-user environment concurrently. This objective arose from interest in providing program development and text processing facilities, while supporting high-speed local-area and long-haul packet-switched communications network connections.

Designing an effective multiprocessor system is a complex task. The operating system must provide a reasonable abstraction of the hardware to user programs, and the hardware architecture must provide sufficient functionality to the operating system. The designer faces hardware architecture design decisions involving inter-processor communication and synchronization mechanisms, memory and device sharing and protection, and processor hierarchies. He also faces software design decisions involving communication mechanisms between program units, distribution of control and knowledge, failure recovery, and the granularity of parallelism to be supported. Some of these design issues will be discussed later in this thesis.
1.2 Definition of Multiprocessor

In order to establish a context for the discussion, this section presents several definitions of "multiprocessor system", followed by a brief description of some multiprocessor systems.

Systems of interconnected processors may be divided into three groups: computer networks and multi-computer systems, multiple arithmetic unit processors, and multiprocessor systems. As will be discussed below, it is the closeness of processor coupling which determines into which group a particular system of processors falls.

A computer network typically consists of several conventional computer systems interconnected by a communications link. The individual computers may be geographically separated, such as those interconnected by the ARPA network [16], or close together, such as those connected by an Ethernet local-area network [18].

Multiple arithmetic unit processors, also known as SIMD (single instruction stream, multiple data stream) computers, consist of a single instruction decoder and multiple arithmetic and logical units (ALUs). The ALUs operate in parallel, in lock-step, on multiple streams of data under the direction of the instruction decoder. An example of a multiple-ALU machine is the ILLIAC IV [2], which is an array processor. Multiprocessor systems are much more closely linked and are described below.

Enslow [12] claims that the characteristics of a "true" multiprocessor system are:

— Multiprocessors consist of two or more processors of comparable capabilities.

— All processors share access to common memory.

— All processors share access to input/output channels, controllers and devices.

— The entire system is controlled by one operating system providing interaction between processors and their programs at the job, task, step, data set and data element levels.

Enslow suggests that systems consisting of more than one processor, but which do not satisfy the above criteria, be called multi-computer systems.

Satyanarayanan's [19] criteria are less restrictive:

— The system consists of two or more non-specialized processors (i.e., excluding I/O channels and other special-purpose processors). The processors need not be homogeneous.

— All processors must be capable of performing significant computation individually.

— All processors share access to common global memory.

The American National Standard Vocabulary for Information Processing [21] defines a multiprocessor as "a computer employing two or more processing units under integrated control".

Satyanarayanan's definition describes a subset of the machines included by the loose American National Standard definition; Enslow's describes a subset of Satyanarayanan's. These definitions illustrate the lack of consensus on the distinguishing characteristics of a multiprocessor system. Nevertheless, one can conclude that the closeness of coupling of the connected processors determines whether the machine is classified as a multiprocessor. In this thesis, a multiprocessor machine is considered to be:

— an MIMD (multiple instruction stream, multiple data stream) computer.

— a system consisting of two or more non-specialized, possibly non-homogeneous, processors. Each processor must be capable of significant computation individually.

— a system in which the processors are tightly-coupled. Processor synchronization mechanisms must be required for the correct operation of the machine (e.g., to control access to global memory shared among all of the processors, or to memory shared by subgroups of the processors).
1.3 Classification of Multiprocessors

Multiprocessor architectures can be classified by the types of connections between the processors and memory. Secondary distinguishing characteristics include their inter-processor communication mechanisms and symmetry. In a symmetric system, memory, processor and device addressability are uniform among the processors. In the following paragraphs, several systems are discussed and classified according to these characteristics.

One of the simplest interconnection schemes utilizes a common communication path, the common bus, connecting all of the functional units. This is simple since the communication medium can be a passive multiconductor cable or a computer bus, and bus protocols handling arbitration and communication mechanisms are well understood. This type of interconnection has been used for years in uniprocessor systems which have multiple bus masters (e.g. the processor and direct memory access controllers). The simplicity of the common-bus connection scheme brings with it the disadvantages of introducing a critical component to the system and of contention for the limited communication bandwidth. Examples of common-bus computers include the SL10 [3], discussed below, and the Pluribus [14].

The SL10 consists of a network of modules (see Figure 1). The modules can be grossly divided into two types: processor modules and common memory modules. Each processor module consists of a 16-bit processor, local memory and possibly local peripheral interfaces. Processors may not reference memory or devices local to another processor module. A common memory module consists of memory, possibly peripheral controllers, and hardware support (the list controller) for synchronization and management of queues stored in the memory. The memory and peripheral controllers on the common memory module's local bus are shared by the processors connected to the common bus. One common memory module and several processor modules may be connected to the common bus. Expansion of the capacity of the system is accomplished simply by connecting more processor modules to the interconnection bus; this succeeds until the bandwidth of the interconnection bus becomes the limiting factor. Bus arbitration is provided by the bus switches which connect the local buses to the interconnection bus. The inter-processor communication mechanism is provided by the list controller and common memory in the common memory module.

[Figure 1. Common-bus Interconnection, SL10: processor modules (each with a processor, local memory and local interfaces) and a common memory module (shared memory, list controller and shared interfaces) attached to an interconnection bus.]

The list controller allows messages in common memory to be queued (dequeued) with a single write (read) cycle from any processor. The SL10 is symmetric in the sense that the processor modules have uniform addressability to both local devices and memory, and common devices and memory. The processor hierarchy is single-level.

Another connection scheme utilizes a crossbar switch. Each processor unit and memory unit is separately connected to a row or column of a matrix of paths.
The paths are interconnected at cross-points by switches. Having separate paths to each memory unit eliminates the bus-bottleneck problems of common-bus computers, resulting in a much higher total transfer rate. However, the cross-point switches are complicated, since they must be capable of switching parallel transmissions and resolving multiple requests for access to the same memory module occurring during a single memory cycle. Expansion of the system is accomplished either by increasing the capacity of the switches, if they are the bottleneck, or by adding more processor or memory units and their associated switches. A system which uses crossbar interconnection is the C.mmp [20] (Figure 2).

[Figure 2. Crossbar Interconnection, C.mmp: processor units (each with a processor, local memory and local I/O) connected through address translators and a crossbar switch to the memory units.]

Any processor unit can be connected to any memory unit through the crossbar switch. Each processor unit consists of a 16-bit processor, local memory (used for interrupt and trap vectors only) and local I/O devices. The address translators are required since the total system memory size exceeds the addressability of the processor. The inter-processor communication mechanism is provided by shared memory and inter-processor interrupts. C.mmp is symmetric since processor modules have uniform addressability to both local devices and memory, and common memory. The processor hierarchy is single-level.

Another connection architecture uses multiport memory. Each memory unit has a separate connection to each processor attached to it. Simultaneous reference arbitration is performed by the memory units; thus the memory units are complicated but the interconnections are simple. Because of the separate memory-to-processor connections, the potential total transfer rate is high. The dual-processor IBM 360/65 [1] is an example of a symmetric system which uses multiport memory.

The connection architectures discussed above are those which occur most frequently. An interesting example of a system which does not fit into the above classifications is a hybrid of the star and common-bus architectures. The Cm* [13] [15] consists of five star sub-networks linked by two buses. Each star sub-network or "cluster" consists of a Kmap, the central device of the star, connected to ten computer modules, each consisting of a microprocessor, local memory and local I/O devices. The Kmap is a complicated, fast, microprogrammed device (4K of 80-bit words, 150 nanosecond cycle time). The five Kmaps are linked together by two common buses. When the computer module hardware detects a memory reference made by its processor to a location not in its local memory, the reference is passed to the Kmap, which handles intra-cluster and inter-cluster memory references. Processor memory reference time depends on whether the location referenced is local to the processor, in the processor's cluster, or in another cluster. Thus, the performance of the machine depends on the ratio of local to non-local memory reference frequencies.

The systems mentioned above are a sample of the multiprocessor architectures which have been implemented. They have been included to establish a background for later discussion of the design described in this thesis.
Chapter 2
Verex

The operating system on which this work is based is Verex [4], a descendant of Thoth [10]. A brief description of the structure of the Verex operating system and its program environment is given below. This is followed by a description of the Verex kernel and a discussion of the suitability of Verex as a basis for this research.

2.1 Overview of Verex

Verex is a uniprocessor, interactive, multi-process, message-based operating system. It may be configured either as a general-purpose time-sharing system or as a dedicated real-time system. The operating system is a two-layer structure: a small kernel, and processes which perform operating system functions. The kernel provides processes and process management functions, interprocess communication, and low-level device support. The processes provide operating system services and may use the facilities provided by other system processes and the kernel.

The kernel is the lowest level of operating system software. It is designed to be self-sufficient; it must not depend on services provided by, or correct execution of, software other than its own. This design constraint prevents the kernel from being an element in cyclic dependencies and requires that kernel code and data be protected. These constraints are motivated by considerations of verifiability [17].

In Verex terminology, a program consists of a team or several cooperating teams. Conceptually, each team executes on its own virtual processor, in its own separate address space. A team consists of one or more processes, which are separate executing entities. Processes on a team share the team's virtual processor and address space and thus share code and may share data. Each process executes with its own stack. A process can communicate with other processes, even those not on its team, using a fully synchronized message-passing mechanism. A Verex message is fixed-length and small (8 words).

Verex system services are provided by the kernel and teams of processes called servers. Current Verex servers include the name server, which provides logical name-to-identifier mapping; the storage server, which provides file system services; terminal servers; the time server; the X.25 server, which provides a network connection; and the team server, which provides for the creation and management of teams (see section 2.3).

Run-time support functions called environment functions provide the interface between programs and the operating system. These functions provide standard access to the services provided by the kernel and system servers and execute entirely in the invoker's address space. The environment functions request services provided by system server processes using message-passing; requests to the kernel are made through the execution of a software processor trap instruction. The environment function operating system interface hides the implementation of operating system requests. This permits modification of the Verex model implementation while maintaining an unchanging program environment, and modification of the program environment without requiring alteration of the Verex implementation.

2.2 The Verex Kernel

The Verex kernel is a simple multi-tasking monitor. The kernel model provides processes, process management, interprocess communication, and low-level device management. The kernel also handles micro-level processor scheduling (process-switching).
Processes are separate executing entities, each with its own state, which includes a message queue and a small statically allocated message buffer used for interprocess communication. Inside the kernel, the description of a process, including its state, is stored in a process descriptor. Outside the kernel, a process is identified solely by its global process identifier. The kernel per-process data structures are small, process creation and destruction are "cheap", and process-switching is efficient, permitting hundreds of processes to exist concurrently.

Processes are created in a tree of processes, the root of the tree being the first process created by the kernel upon kernel initialization. The tree is structured by creation; all processes created by a given process are the immediate descendants of that process.

Verex kernel operations are invoked by the execution of a software processor trap instruction. Up to three arguments are passed through the invocation mechanism, specifying the kernel operation to be executed and two kernel operation parameters. Verex kernel operations behave as procedure calls and thus execute indivisibly with respect to the invoking process. The execution of a kernel operation may result in the indefinite blocking of the executing process. Upon completion of the execution of a kernel operation, a single value may be returned to the invoking process. The kernel operations execute with interrupts disabled and thus are indivisible. One of the design criteria of the Verex kernel was that it must be capable of providing predictable real-time response to external events (signalled by interrupts). Thus, kernel operations are constrained to execute in bounded time.

Verex kernel operations may be classified by function as process management, interprocess communication, or device management operations. The major environment function calls which utilize the kernel operations are described below.

2.2.1 Process Management

The kernel process management functions include operations to create or destroy a process, read and write the volatile environment or registers of a process (Read_regs, Write_regs), query the validity of a process identifier (Valid_id), and query whether one process is awaiting reply from another (Awaiting_reply).

id = Create_process( priority ) creates a new process with the specified priority (relative to the other processes on its team) and returns a unique global process identifier. The process is left awaiting a reply from the creating process.

success = Destroy_process( id ) halts and removes the process associated with id, rendering this identifier invalid, provided id is a valid identifier, the process specified by id has no descendants, and the requestor has the required authority. The kernel unblocks processes that are blocked on nonexistent processes (indicated by invalidity of process identifiers) with a time-out mechanism.
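To make the calling convention concrete, the sketch below shows, in C, how these process management environment functions might be wrapped around the trap-based kernel interface just described: an operation code plus up to two parameters, with a single returned value. The names Kernel_trap, OP_CREATE_PROCESS and OP_DESTROY_PROCESS are assumptions made for illustration, not the actual Verex identifiers.

    /* Hedged sketch of trap-based kernel invocation; operation codes and
     * the Kernel_trap name are assumed, not taken from Verex itself. */

    typedef int verex_pid;              /* full-word global process identifier */

    #define OP_CREATE_PROCESS  1        /* assumed operation codes             */
    #define OP_DESTROY_PROCESS 2

    /* Kernel_trap stands for the software processor trap: it passes an
     * operation code and two parameters to the kernel and returns the single
     * result value.  Declared here only so the sketch is self-contained. */
    extern int Kernel_trap(int op, int arg1, int arg2);

    verex_pid Create_process(int priority)
    {
        /* The new process is left awaiting a reply from its creator. */
        return (verex_pid) Kernel_trap(OP_CREATE_PROCESS, priority, 0);
    }

    int Destroy_process(verex_pid id)
    {
        /* Fails unless id is valid, the process has no descendants, and the
         * requestor has the required authority. */
        return Kernel_trap(OP_DESTROY_PROCESS, (int) id, 0);
    }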
2.2.2 Interprocess Communication

Verex processes communicate using small, fixed-length, fully synchronized messages. There is also a block data transfer facility used to transfer data between cooperating processes when the amount of data to be transferred exceeds the capacity of a single (8-word) message. The interprocess communication kernel operations allow a process to read and write the contents of its kernel message buffer (Read_msg, Write_msg), and to exchange its buffer with other processes. These exchange operations allow a process to send a message to another process (Send_msg), receive a message (Receive_msg), forward a message (Forward_msg), and reply to a send (Reply_msg). The message-passing environment functions described below use combinations of these kernel operations to provide their facilities. The process state transition diagram in Figure 3 illustrates the blocking nature of the following operations. In the following, a buffer is local to a process if it is in the address space of the process's team.

success = Send( id, message ) sends the contents of the local buffer message to the destination process, the receiver, identified by id. This operation succeeds if id is a valid process identifier and the receiving process replies to the sender (i.e. the receiving process is not destroyed before it replies). The sending process is blocked awaiting reply until the receiving process replies to it. If the receiving process is already blocked awaiting a message from the sender, it is unblocked and the message is delivered. The Send environment function uses the Write_msg kernel operation to copy the process's local buffer into its kernel message buffer, Send_msg to deliver the message to the receiving process's kernel message buffer, and Read_msg to copy the receiver's reply back into the sender's local buffer.

id = Receive( message ) copies the oldest message in the receiver's message queue into the local buffer message, removes the message from the queue, and returns the process id of the message's sender. If no messages are queued, the requesting process is receive blocked until a message is sent. This operation is called a receive general, as the received message may come from any sender.

success = Receive( message, id ) succeeds if the process specified by id sends to the receiving process. It fails if id is an invalid process identifier or if the specified process is destroyed before it sends to the receiving process. If the sending process has a message queued when the receiver executes Receive, the queued message is copied into the receiver's local buffer message and is removed from the message queue. The sender remains blocked awaiting reply. If there is no message queued from the sender, the requester is blocked until the sender sends or is destroyed. This operation is called a receive specific, as the received message must come from a specific sender.

success = Reply( message, id ) succeeds if the sending process specified by id exists and is awaiting a reply from the replying process. Reply results in the transfer of the contents of the local buffer message to the buffer local to the sender specified in its Send request. The sender is then unblocked.

Forward( message, from_id, to_id ) forwards the message to the process identified by to_id and makes it appear as if the message were originally sent by the process identified by from_id. If to_id is not a valid process identifier, or if the process is not receive-specific blocked from from_id or receive-general blocked, the process identified by from_id is unblocked and returned a failure code. Otherwise, the message is transferred to the receiver's local buffer and the receiver is unblocked.
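The C sketch below illustrates how the Send environment function can be layered on the kernel operations named above, and how a client and server typically pair Send with Receive and Reply. It is a minimal illustration only: the 8-word message is modelled as an array, and the kernel operations are merely declared here (on the real system they are invoked through the processor trap mechanism of section 2.2), so the exact signatures are assumptions.

    /* Hedged sketch of the message-passing environment functions. */

    typedef short word;
    typedef word  msg_t[8];             /* a Verex message: eight words        */
    typedef int   verex_pid;

    /* Kernel operations, declared only so the sketch is self-contained.       */
    extern int Write_msg(const msg_t msg);  /* local buffer -> kernel buffer   */
    extern int Send_msg(verex_pid dest);    /* deliver buffer, block for reply */
    extern int Read_msg(msg_t msg);         /* kernel buffer -> local buffer   */

    /* Send: copy the message into the kernel buffer, deliver it and block
     * awaiting reply, then copy the reply back over the original message.
     * This is the composition of kernel operations described in the text. */
    int Send(verex_pid dest, msg_t message)
    {
        Write_msg(message);
        if (Send_msg(dest) == 0)            /* destination invalid or destroyed */
            return 0;
        Read_msg(message);                  /* the reply overwrites the request */
        return 1;
    }

    /* Typical client/server usage (Receive and Reply as described above):
     *
     *   client:                         server:
     *     msg[0] = REQUEST_CODE;          id = Receive(msg);    blocks
     *     Send(server_id, msg);           ...service the request...
     *     msg now holds the reply         Reply(msg, id);       unblocks client
     */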
2.2.3 Device Management

The kernel low-level device operation provides process synchronization with the occurrence of interrupts. The environment function Await_interrupt( device_code ) invokes a kernel operation which blocks the requesting process awaiting an interrupt on the specified device. When an interrupt occurs on the device, the interrupt handler in the kernel may unblock the blocked process. Use of this operation is permitted only to device server processes.

Many operating systems manage devices using kernel-resident device drivers written in assembly language. Await_interrupt permits Verex to perform device handling outside the kernel in the more pleasant environment of processes, written in a high-level language, which may use kernel services.

A more complete description of the Verex program environment may be found in [9]. A more complete description of the Verex kernel operations may be found in [6].
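As an illustration of this style, a device server could be structured roughly as in the sketch below: an ordinary process that alternates between waiting for client requests and waiting for interrupts. This is schematic only; the request code, the device_code value, the device-access routines and the exact signatures are assumptions made for the example.

    /* Hedged sketch of a device server built on Receive, Reply and
     * Await_interrupt, as permitted by the kernel model described above. */

    typedef short word;
    typedef word  msg_t[8];
    typedef int   verex_pid;

    extern verex_pid Receive(msg_t msg);          /* receive general            */
    extern int       Reply(msg_t msg, verex_pid id);
    extern int       Await_interrupt(int device_code);

    /* Hypothetical device-access routines and codes for this example only. */
    extern void start_device_transfer(void);
    extern word device_result(void);
    #define DEV_CODE 3                            /* assumed logical device no. */
    #define REQ_READ 1                            /* assumed request code       */

    void device_server(void)
    {
        msg_t     request;
        verex_pid client;

        for (;;) {
            client = Receive(request);            /* block for a client request */
            if (request[0] == REQ_READ) {
                start_device_transfer();          /* touch the device           */
                Await_interrupt(DEV_CODE);        /* block until it interrupts  */
                request[0] = device_result();
            }
            Reply(request, client);               /* unblock the client         */
        }
    }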
[Figure 3. Process State Transitions]

2.2.4 Process Scheduling

When a process is created, its priority relative to other processes on its team is specified. The kernel's processor scheduling algorithm guarantees that upon a process-switch the processor is allocated to the highest priority ready process. Once a process has been allocated the processor, it executes until either it blocks or it is pre-empted by the readying of a higher priority process. Process readying results from an interrupt handler unblocking a process, the destruction of the process on which a process is blocked, or a reply given to a process which was blocked awaiting reply.

2.2.5 Kernel Details

This subsection briefly details the Verex kernel data structures and how they are manipulated through the execution of kernel operations.

Outside the kernel, a process is identified solely by its process identifier (PID). When a process is created, the new process's PID is returned to its creator. Throughout the life of the process, that PID uniquely identifies that process. When a process is destroyed, its PID is invalidated. Some time later, the PID may be reused to identify a different process, since there is a finite number of possible PIDs (a PID is stored in a full word). This reuse may result in conflicts if some process uses the PID to identify a process which was previously allocated the PID. In order to minimize the probability of such conflicts, the recycle time of the PIDs is maximized.

Associated with each process is a process descriptor (PD) in which per-process information is maintained. The PD includes the following: PID, process state, process priority, message buffer pointer, message queue links, process tree links, ready-process queue links, and, when the process is not executing, the saved processor state. Storage for a fixed number (about 100) of PDs and small message buffers (one buffer per PD) is statically allocated during system generation.

The PDs of all valid processes are linked into a tree structured by creation. All processes created by a given process are the immediate descendants of that process. Links to a process's brother, most recently created son, and creator (father) are maintained in the PD. Unallocated PDs are maintained in the free PD list.

During kernel initialization, a vector of pointers to PDs is established, with one entry per PD. When a process is created, the resulting full-word PID consists of a generation number (several bits) followed by the index into the vector of pointers which corresponds to the PD allocated to that process.

Within the kernel, the mapping from a PID to the appropriate PD, and the validation of a given PID, must be fast, as these functions are performed at least once during the execution of most kernel operations. This requirement is satisfied using the following scheme. The PD corresponding to a particular PID can be found by masking off the PID generation bits and using the resulting word as an index into the vector of PD pointers. The validity of the PID can then be determined by comparing the PID stored in the PD with the PID to be tested. A detailed description of the Verex scheme can be found in [5].

Since the Verex kernel is a monitor, and since Verex kernel operations execute with interrupts disabled, only a single kernel stack is required. Invoking a kernel operation results in the kernel executing on this stack. A Verex kernel operation is invoked by a process executing a processor trap instruction. The execution of a kernel operation may not be suspended and later resumed; it executes indivisibly to completion. Completion of a kernel operation leaves the invoking process either blocked or ready. When the invoking process is next allocated the processor, its execution will resume at the instruction which succeeds the processor trap instruction. If the invoking process is left ready upon the completion of the kernel operation, and that operation has not resulted in the readying of a higher priority process, the invoking process will resume execution immediately.

The process allocated the CPU is called the active process; there is only one active process at a time. If a kernel operation is executed, it must have been invoked by the active process. Interrupts interrupt the execution of the active process. The kernel maintains a pointer to the PD of the active process to facilitate its access during the execution of kernel operations. Performing a process switch consists of saving the state of the active process in its PD, changing the pointer to the active process's PD, restoring the state of the next process to execute, and initiating execution of the new active process.

As illustrated in Figure 3, a process must be either ready to execute or blocked in one of the following states: awaiting reply, receive blocked or awaiting interrupt. The PDs of processes which are ready to execute are linked together in a priority-ordered queue called the ready-process queue. Processes which are awaiting reply or receive blocked on another process are linked together in a message queue list. Pointers to the PDs of processes which are blocked awaiting interrupt are saved in a vector called the interrupt table, which is indexed by logical device number.

A blocked process may be readied as a result of three types of events. First, a process which is awaiting a reply from (receive-specific blocked on) a process is readied when it is replied to (sent to) by that process. A process which is receive-general blocked is readied when any process sends to it. Second, processes which are blocked on a process which has been destroyed are eventually readied by a time-out mechanism implemented by the clock interrupt handler. Third, a process which is blocked awaiting an interrupt from a device may be readied by the device interrupt handler.
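The PD table and PID validation scheme just described map naturally onto a small amount of C, sketched below. The field names, the number of generation bits and the table size are assumptions chosen for illustration, not the actual Verex declarations.

    /* Hedged sketch of the PD vector and PID validation scheme of section
     * 2.2.5.  A PID is a full word: a generation number in the high bits
     * and a PD-vector index in the low bits (the bit split is assumed). */

    #define NPDS       100        /* about 100 PDs, statically allocated       */
    #define INDEX_BITS 8          /* assumed split of the full-word PID        */
    #define INDEX_MASK ((1 << INDEX_BITS) - 1)

    struct pd {
        int        pid;           /* PID of the process occupying this PD      */
        int        state;         /* ready, awaiting reply, receive blocked... */
        int        priority;
        short     *msg_buf;                 /* small kernel message buffer     */
        struct pd *msg_queue;               /* message queue links             */
        struct pd *father, *son, *brother;  /* process tree links              */
        struct pd *ready_link;              /* ready-process queue link        */
        /* saved processor state omitted */
    };

    static struct pd  pd_pool[NPDS];        /* allocated at system generation  */
    static struct pd *pd_vector[NPDS];      /* one pointer per PD, set up at
                                               kernel initialization           */

    /* Map a PID to its PD: mask off the generation bits to obtain the index,
     * then check that the PID recorded in that PD matches.  A mismatch means
     * the PID is invalid or belongs to a destroyed process. */
    struct pd *pid_to_pd(int pid)
    {
        int index = pid & INDEX_MASK;

        if (index >= NPDS || pd_vector[index] == 0)
            return 0;
        return (pd_vector[index]->pid == pid) ? pd_vector[index] : 0;
    }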
2.3 The Verex Team Server

Verex operating system services are provided by the kernel and teams of processes called servers. The services offered by Verex servers are provided by the kernels of most other operating systems. The Verex team server is a team of processes which provides teams, manages memory allocation, and manages processor allocation at the team level. The team server accepts requests to invoke a team specified by a file system pathname, destroy a team of processes identified by the team's root-process identifier, and extend the memory allocated to the data segment of a team. Also, the team server periodically reorders the relative scheduling priorities of memory-resident teams to force sharing of the processor among teams. This occurs at a higher level, and considerably less frequently, than the kernel-level process-switching. As part of the reordering of relative team priorities, the team server provides program swapping: exchanging memory-resident programs with those swapped out to disk. The team server takes advantage of the pleasant environment of multiple processes, interprocess communication and other facilities provided by the Verex kernel, as well as the services of other system servers, to perform its functions.

2.4 Suitability of Verex

The Verex operating system provides a suitable basis for this research project for several reasons. The Verex kernel is small and simple, and the kernel interface is well defined. The kernel provides only processes, process management, interprocess communication and low-level device handling. The rest of the operating system functionality is provided by system server processes, not by the kernel. The interaction between servers and user processes is provided by kernel interprocess communication operations. Thus, in order to provide the Verex operating system environment on a multiprocessor system, it is only necessary to provide the Verex kernel model uniformly over all of the processors. The kernels (one per processor) cooperatively need only provide processes with globally unique identifiers, process location independence (other processes do not know on which processor a process executes), and interprocess communication.

Chapter 3
System Architecture

In the previous chapters, several multiprocessor architectures have been discussed and the operating system on which this work is based has been outlined. This chapter describes the ideal machine model for the implementation of Distributed Verex and how this model was realized.

3.1 Machine Model

This is a project in operating system design, not hardware design, so the simplicity and availability of the hardware required to build a multiprocessor system were important factors in the choice of the hardware architecture. These factors, and the centralized nature of the operating system design, led to the choice of a star network architecture.

The ideal machine model for this research project logically consists of a star network of processing modules, with the network master processor acting as a message switch and central controller of the slave processors attached to it. Each of the processing modules consists of a processor, local memory, and possibly local devices. Each module's local memory must be inaccessible to other network processors. The module's memory associated with its kernel code and data must be protected when its processor is not executing kernel code. The network master processor, the center of the star, may communicate directly with any of the other processors, the slave processors.
Slave processors can communicate directly with the master, but inter-slave communication must go through the master processor. The communication mechanism must be reliable, provide high data rates, and have low latency. The master processor must have the ability to reset slave processors.

[Figure 4. Star Network of Processing Modules]

This model was chosen for several reasons. Firstly, the star network configuration offers the advantages of centralized control, simple information sharing and uncomplicated inter-processor communication. Centralized control eases the problems associated with management of processor activity. The processor management functions include processor allocation and deallocation, program swapping, and recovery from faults resulting from errant programs. Communication among devices connected in a star network is much simpler than in other organizations. There are no routing or connectivity problems, and there is only one intermediate device, the central node, which acts as an intermediary in communications. Information which is common to all or several network nodes may be stored in the central node to minimize information duplication and problems of simultaneous update. Also, the central node may perform operations on centrally-stored common data at the request of other nodes, thereby reducing the complexity of node programs.

Secondly, the processors are autonomous; each processor has its own local memory and devices. Inter-processor communication is required only when one processor requires the cooperation of another processor to perform some function or to transfer data. Thus, in addition to providing processing power, the model encourages the use of slave processors as intelligent device handlers.

Thirdly, a wide variety of hardware configurations, ranging from multiprocessor systems to multi-computer systems utilizing high-speed inter-computer interconnections, satisfy the general requirements of this model. For example, several autonomous processors interconnected by a high-speed local area network, which provides reliable datagram transport between the slave processors and the master processor, would provide a suitable hardware base.

Fourthly, the characteristics of the inter-processor communication mechanism permit the use of light-weight protocols and dictate that the response time for inter-processor messages be limited by the speed of the destination processor and not the communications channel.

Finally, the effect of hardware failure is localized by preventing a processor's access to other processors' local memory and devices.

There are few disadvantages to this general model. Firstly, as in any star network, failure of the central node isolates every other node. If the failed central node normally provides services required by other nodes, then the whole system may fail.

Secondly, it is possible for the central processor to become a limiting factor in system performance. The model specifies that this processor mediates in all inter-processor communication.

Finally, the memory protection requirements eliminate many candidate system architectures.
3.2 Hardware Architecture

The hardware architecture chosen for the initial implementation of Distributed Verex consisted of a Texas Instruments minicomputer for the master processor and several Texas Instruments autonomous microcomputer modules for slave processors.

The original uniprocessor hardware configuration of the master processor consisted of a Texas Instruments 990/10 processor, 288 kilobytes of memory, and the usual peripheral devices, including two 50-Megabyte disk drives, eight terminals and a printer. The 990/10 is a byte-addressed, 16-bit minicomputer with memory mapping hardware. Communication between the 990/10, memory and peripheral devices is accomplished through the use of the system backplane, which provides two logical data and control paths: the communications register unit (CRU) and the TILINE. The CRU may be visualized as 4096 consecutive bits manipulated by special CRU input/output instructions. Some low-speed devices, such as the terminal multiplexors, use the CRU for communication with the CPU. The TILINE is a high-speed, multi-master, asynchronous, parallel bus. It provides a 16-bit-wide data path and 20 address lines to define an address space of 1024 kilowords. The lower 512 words of the top 1024 words of the address space constitute the TILINE Peripheral Control Space. Some peripheral device controllers, such as the disk controllers, use this memory-mapped control space for communication with the CPU. The 990/10 is usually the TILINE bus master; memory devices are always bus slaves. Direct memory access devices, such as disk controllers, can be either bus masters or slaves.

The desired multiprocessor architecture has been realized by augmenting the uniprocessor configuration with Texas Instruments 990/5 processors. The 990/5 is a single-board, autonomous, 16-bit, microprocessor-based computer designed to plug directly into the backplane of the computer chassis. It supports up to 62K bytes of local RAM and 2K bytes of local EPROM and provides several local (on-board) devices. These devices are manipulated through the local CRU and include two asynchronous serial interfaces, a synchronous serial interface, and three timers. The 990/5 executes the same instruction set as the 990/10 except that there is no privileged mode, there are no illegal instruction interrupts, and there are no instructions related to memory mapping. The 990/5 may be configured to operate either as a uniprocessor computer or as a slave processor in a multiprocessor configuration. When configured as a slave processor, it is a bus slave device only. It may use only the global (system-wide) CRU bits assigned to its board and its locally defined CRU bits, and it may not access non-local memory. In a multiprocessor configuration, bus master devices, including the master processor, and the local processor have read/write access to the 990/5's local memory. The only hardware modification required to the off-the-shelf 990/5 was a simple change which inhibited the 990/5's access to the TILINE peripheral control space.

[Figure 5. Hardware Configuration: each slave reserves a master communication buffer and a slave communication buffer in its local memory; master-to-slave and slave-to-master data pass through these buffers, with inter-processor interrupts providing synchronization between the slave processors and the master processor.]

Synchronization of communication between the master and a slave processor is provided by CRU-driven maskable interrupts. No slave-to-slave interrupts are provided. The master processor may interrupt or be interrupted by slave processors. A master (slave) processor may acknowledge an interrupt from a slave (master) processor. Interrupt acknowledgements interrupt the processor to which the acknowledgement is sent. Inter-processor interrupts may be selectively disabled by the processors.

Data transfer between the master and a slave is provided through shared memory local to the slave. Full-duplex communication between the master and a slave is accomplished using the following scheme. On each slave, two communication areas are reserved in its memory: the master and the slave communication areas. When the master processor wishes to transfer information to the slave, it fills the master communication area and interrupts the slave processor. The slave then processes the information and enters its response in the master communication area. The slave then signals its completion of processing of the information by acknowledging the master processor's interrupt. When the slave processor wishes to transfer information to the master, it fills the slave communication area and interrupts the master processor. The master signals its completion of the handling of the information by acknowledging the slave processor's interrupt (see Figure 5). Master-slave inter-processor communication for up to 16 slaves (limited by the hardware addressing scheme) can be accommodated in this fashion.
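The master-to-slave half of this handshake can be sketched in C as follows. The structure layout, the routine names and the area size are assumptions for illustration; the actual communication-area format used by Distributed Verex is not specified here, and on the real hardware the interrupt and its acknowledgement are raised through CRU operations rather than function calls.

    /* Hedged sketch of one master-to-slave exchange: the master fills the
     * master communication area in the slave's memory and interrupts the
     * slave; the slave processes the request, writes its response into the
     * same area, and acknowledges the interrupt. */

    #define COMM_WORDS 16                        /* assumed area size           */

    struct comm_area {
        volatile short data[COMM_WORDS];         /* request, then response      */
    };

    /* Assumed lower-level CRU/interrupt support and address mapping. */
    extern void interrupt_slave(int slave);
    extern void await_slave_acknowledge(int slave);
    extern struct comm_area *master_area_of(int slave);  /* reached via TILINE  */

    /* Master side: one request/response exchange with a given slave. */
    void master_transfer(int slave, const short *request, short *response, int n)
    {
        struct comm_area *a = master_area_of(slave);
        int i;

        for (i = 0; i < n && i < COMM_WORDS; i++)
            a->data[i] = request[i];             /* memory copy over the bus    */
        interrupt_slave(slave);                  /* tell the slave to look      */
        await_slave_acknowledge(slave);          /* ack means response is ready */
        for (i = 0; i < n && i < COMM_WORDS; i++)
            response[i] = a->data[i];
    }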
3.3 Suitability of the Hardware

This hardware organization is appropriate for several reasons. Firstly, it is simple and relatively inexpensive to assemble. The 990/5 processors are commonly used for small computer applications and are thus readily available. Volume production results in a reasonably low price. The hardware is available off-the-shelf and requires only slight modifications to provide a reasonable subset of the ideal model's properties. Also, all of the network processors execute the same instruction set, which simplifies software development and permits experimentation with program swapping. In addition, even though the 990/5 utilizes only a microprocessor, its average instruction execution rate is slightly higher than that of the 990/10 minicomputer (the 990/5 does not suffer from memory mapping and bus access overhead). Since the processing speeds are similar, but the 990/5 is much cheaper than the 990/10, this combination has potential for realizing those elusive multiprocessor performance/cost gains.

Secondly, the system architecture is easily realizable on most bus-structured computers. A slave processor acts as just another bus device which has the capability to generate interrupts and has a memory-mapped communication area. Progress realized as a result of the initial implementation could be carried to implementations on other machines.

Thirdly, the configuration is extensible to the degree allowed by the system bus, the power supply and the inter-processor communication mechanism.

Fourthly, inter-processor data transfer is provided by memory copying over the system bus by the master processor. Data transfer occurs at memory speed with low error probability; thus the communications protocol can be very simple. The inter-processor synchronization is provided through the standard interrupt mechanism.
Finally, the configuration permits experimentation with the use of slave processors as intelligent device handlers. An example of such an application is a network communications handler, the Verex X.25 server [11]. The slave processor boards are equipped with an HDLC-compatible synchronous interface. The software in the slave handles the packet, link and physical layers of the communications protocol. Access to the system bus is required only during the communication of high-level requests and data between the X.25 server and client processes running on other processors.

There are disadvantages to the hardware configuration. Firstly, the slaves have no dynamic address translation or memory protection. Lack of memory mapping results in the undesirable sharing of memory between the slave's kernel and the application. Consequently there is reduced memory for the application and no protection for the kernel code or data. An additional problem results from the restriction that the first 64 words of slave memory must be dedicated to interrupt and trap vectors, and the last 4 to reset vectors. The protection problems could be solved by remapping part of the ROM to low memory and implementing a memory protect fence in upper memory. The interrupt and trap vectors would be in ROM, and the kernel code and data could be put in high memory beyond the fence. Access to the memory above the fence would be controlled through the use of a CRU bit which could be set upon interrupt or trap. This solution to the memory protection problem would require significant hardware modification.

Secondly, there is no mechanism available to the master processor to reset the slave processors in the event of a slave failure. This mechanism could be provided through some hardware modifications to the slave processors and the introduction of a parallel interface to the system bus. The master processor could then reset a slave through manipulation of a line on the parallel interface.

Thirdly, since the 990/5 is a single-board computer, no devices other than those provided on the board may be controlled by the slave processors.

Finally, since the 990/5 processors are designed to function either as a uniprocessor or as a slave processor in a multiprocessor configuration, considerable logic is incorporated which would not be required if they acted solely as slave processors. This unnecessary (for this application) functionality costs in chassis space, in power supply requirements and in expense.

Chapter 4
Distributed Verex Kernel

The previous two chapters have presented the hardware architecture and software base on which Distributed Verex was built. This chapter describes the design objectives of the multiprocessor kernel and how these objectives were met.

4.1 Kernel Design Criteria

The uniprocessor Verex kernel provides an execution environment of processes and interprocess messages. Distributed Verex provides the same environment uniformly throughout a multiprocessor system. This is achieved through cooperation among the processors' kernels. The design of the distributed kernel recognizes several properties of the machine, the operations required, and the Verex philosophy, resulting in the following design criteria:

— Globally unique process identifiers and process location transparency must be provided.

— Semantically local operations must be handled locally.
— The size of the slave kernel software must be minimized, increasing the size of the master processor's software if necessary.

— The correct operation of the kernel in the master processor must be independent of the correct operation of the slaves.

— The Distributed Verex kernel operations provided by each kernel must be those of the Verex kernel.

The Verex kernel model must be presented uniformly throughout the multiprocessor system. The Verex kernel performs operations on processes; a process is identified by its process identifier. Since the contents of a Verex interprocess message are untyped, process identifiers may be exchanged between processes on different processors without the knowledge of either processor's kernel. These exchanges are fundamental to the operation of a Verex system and cannot be restricted to occur only within the bounds of a single processor in the configuration. Therefore, process identifiers must be globally unique: a process identifier must uniquely identify a single process regardless of the processor on which the identifier is referenced. Furthermore, a process must not need to know the processor configuration or the location of a specific process in order to identify that process; it must only need to know its process identifier. This requires that process-location transparency be provided.

Complicated Verex real-time application programs which must respond to asynchronous external events (e.g., the X.25 server [11]) are typically designed using multiple processes. These programs rely on high-speed process switching and require rapid response to device interrupts. In order to provide the performance required for such applications, kernel operations which can be handled by a processor's local kernel must be handled locally. Additional motivation for this criterion is that the loads on the system bus and the master processor are minimized, since unnecessary master/slave kernel interaction is disallowed.

The size of the slave kernel software is kept to a minimum because the slave processors have no memory mapping; space used by slave kernel software detracts from the total space available for application programs on the slave. The space is more affordable on the master processor because its memory mapping allows dedicating a separate address space to its kernel, comparable in size to the total space available on a slave processor.

Verex was designed with verifiability in mind. One of its original design constraints was that the correct execution of the Verex kernel could not depend on the correct execution of any software other than its own. With the extension of Verex to a multiprocessor system, this constraint has also been extended: the Distributed Verex master kernel must not allow the incorrect execution of any software, including the slave kernel software, to affect the correctness of its execution. This criterion has significance beyond that of software verifiability. Since the slave kernels are not protected by memory mapping hardware, slave application software could corrupt even a verified slave kernel. This, coupled with the possibility of slave hardware failure, makes this criterion a practical necessity.

The execution of a Distributed Verex process should be independent of the processor on which it is executing unless it requires the use of a device offered by a particular slave processor.
This ensures that macro-level processor allocation (both for initial program loading and for program swapping) is straightforward and that maintenance of the system software is manageable. Compatibility of kernel operations with those of Verex is required so that the large number of existing Verex programs may be run on a Distributed Verex system without modification.

4.2 Kernel Design Overview

The kernel level of Distributed Verex is composed of a master kernel on the master processor, a slave kernel on each slave processor, and an underlying inter-kernel message communication mechanism.

The master kernel of the Distributed Verex system is an extended version of the Verex kernel described in Chapter 2. Basically, the Verex kernel provides process management, interprocess communication and device control in a uniprocessor environment. In addition to the facilities provided by the Verex kernel, the master kernel of a Distributed Verex system provides the following services to the slave kernels: management of process identifiers, communication between processes executing on different processors, some infrequently executed and complex kernel operations, and services which query or update information stored centrally by the master processor. When a slave kernel requests service from the master kernel, it is said to have invoked a remote kernel operation.

Process identifiers are globally unique and their allocation and invalidation are handled by the master processor in response to requests from the slave kernels and processes resident on the master processor.

Communication between processes executing on the same processor is handled by that processor's kernel. Communication between processes residing on different processors is mediated by the master kernel; it is the only kernel which maintains a record of the location of each process.

Some infrequently-used and complex kernel operations are off-loaded to the master kernel rather than provided in slave kernels. This reduces the size and complexity of the slave kernel but requires the cooperation of the master kernel to execute the operation. The decision of whether to distribute the code for a kernel operation, or to put the code only in the master kernel, requires consideration of the associated time-space tradeoffs. If the code is included in each slave kernel, the total memory in the multiprocessor system occupied by the code for that operation is the product of the number of processors and the code size. Furthermore, the inclusion of this code in the slave kernel detracts from the memory available for the slave's application program. Including the code only in the master kernel reduces the total memory occupied but results in slower execution since a remote kernel operation is required to execute the function. These tradeoffs have influenced the design of the interprocess communication remote operations (discussed at length with an example in a later section) as well as the handling of process creation and process destruction. The process creation operation involves the allocation of a globally unique PID, and establishment of PDs in both the slave kernel and the master kernel. The process destruction operation involves freeing the PID and the PDs in the slave kernel and the master kernel, and unblocking of processes blocked on the process being destroyed. These two operations are much more complex than the other kernel operations.
Handling these centrally reduces the inter-kernel interaction and avoids the critical section problems which would otherwise occur.

The second component of the Distributed Verex kernel level consists of the slave kernels. Each slave kernel must provide the complete Verex kernel model to its processes by supporting all of the Verex kernel operations. Some of the kernel operations, the local kernel operations, are handled entirely by the slave kernel while the remote kernel operations are performed with the cooperation of the master kernel and possibly other slave kernels. The kernels handle the following functions locally: local process switching, message passing among processes executing on that processor, and control of devices local to that processor.

The final component of the Distributed Verex kernel level is the underlying inter-kernel communication mechanism. This mechanism accommodates full-duplex communication between the master kernel and any or all of the slave kernels concurrently. The inter-kernel communication mechanism is built upon a lower level inter-processor communication facility provided by the hardware. The hardware supplies maskable inter-processor interrupts and gives the master processor access to the memory local to each slave processor.

Communication between two processes executing on different processors is supported by their respective kernels. If the processes execute on two different slave processors, the communication is mediated by the master kernel. Inter-kernel communication is provided by lower-level facilities. This structure resembles the layered protocol organization of modern computer communication architectures.

    process       < - - - - - - - - - - - - - - - >       process
       |                                                      |
    slave kernel    <--->    master kernel    <--->    slave kernel
       |                          |                           |
    inter-kernel    <--->    inter-kernel     <--->    inter-kernel
    comm'n                   comm'n                    comm'n
       |                          |                           |
    ========================= system bus ==========================

    Figure 6. Remote Interprocess Communication Architecture

4.3 Distributed Verex Kernel Design

Distributed Verex provides the standard Verex execution environment uniformly for all processes executing in the distributed system. Providing this environment requires maintenance of distributed kernel data structures and support of inter-kernel interactions based on an underlying inter-kernel communication mechanism. The following sections describe the kernel data structures, the inter-kernel communication protocol, and the details of providing remote kernel operations.

4.3.1 Kernel Data Structures

Process management involves the maintenance of per-process state information stored in PDs, and management of PIDs. Processes may be categorized as either master or slave processes. A master process executes on the master processor under control of the master kernel; a slave process executes on one of the slave processors under control of that processor's slave kernel. Each kernel maintains the Verex kernel data structures for its local processes. In addition, the master kernel maintains a PD for all slave processes in the distributed system, and a descriptor for each slave processor. Slave kernels maintain no information on processes other than their own. The details of the kernel data structures are presented below.

A process is identified by a globally unique PID.
Process location transparency must be provided, thus the PID neither depends on, nor directly identifies, the processor on which the process executes. The centralized nature of this system's design dictates that the master kernel be responsible for the allocation, destruction and validation of all PIDs. Thus, if a slave kernel is requested to create a new local process, it must interact with the master kernel to acquire the PID to be used. Similarly, the master kernel must take part in the destruction of slave processes.

The design of Distributed Verex states that kernel functions which are semantically local to a single processor should be handled by its kernel. These functions are a subset of the Verex kernel functions and include local process switching and interprocess communication. Thus each slave kernel must maintain the Verex kernel data structures, including PDs, for its local processes. The slave kernel must also be able to identify its local processes in order to determine whether a kernel operation can be performed locally. Since processes are identified solely by their PIDs, when a PID is specified as a kernel operation parameter to a slave kernel, that kernel must determine whether the PID identifies one of its local processes. If it does not, the slave kernel must interact with the master kernel to have the operation performed. Since a slave kernel only maintains PDs for its local processes, the standard Verex PID validity test succeeds in differentiating between valid local PIDs and others.

The master kernel manages the master processes completely by itself and therefore must maintain PDs and the other Verex kernel data structures for its own processes. The master kernel also mediates inter-kernel requests and must know the disposition and the location of all slave processes as well. Thus the master processor maintains a PD for every process in the distributed system. Since the slave kernels perform most of the kernel functions locally, it is not necessary that the master kernel maintain any of the Verex kernel data structures for slave processes, except for PDs and message queues. The master kernel requires detailed slave process state information only while the process is involved in a remote kernel operation. At other times the master kernel marks the process (in its master kernel PD) as under the care of its slave kernel (by setting its process state to SLAVE_READY). The message queue associated with each slave process is used only during remote interprocess communication operations. Slave process state changes resulting from local slave kernel operations need not be reflected in the master kernel data structures. Thus no master-slave kernel interaction is required during local operations.

4.3.2 Inter-Kernel Requests

A slave kernel issues a slave-to-master inter-kernel request when a slave process executes a kernel operation which cannot be handled locally. To the requesting process, the execution of a remote kernel operation is indistinguishable from the execution of a local operation. The semantics of all Verex kernel operations have been retained with the exception of the "Receive-general" remote kernel operation, which has changed slightly as a result of distribution of the kernel.
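The decision of whether an operation can be handled locally thus reduces to the PID locality test described in the preceding section. A minimal C sketch of such a check follows; the declarations and names (struct pd, pd_table, MAX_LOCAL_PROCESSES, local_pd) are assumptions made for illustration and are not the actual Verex code, which uses the standard Verex PID validity test against its own process descriptors.

    /* Illustrative sketch only: how a slave kernel might decide whether a
     * kernel operation naming `pid` can be handled locally.              */

    #define MAX_LOCAL_PROCESSES 32      /* assumed per-slave limit */

    struct pd {                         /* process descriptor (PD) */
        unsigned pid;                   /* globally unique PID, 0 if slot free */
        int      state;                 /* READY, RECEIVE_BLOCKED, ...         */
        /* ... message queue, saved registers, etc. ... */
    };

    static struct pd pd_table[MAX_LOCAL_PROCESSES];

    /* Return the local PD for `pid`, or 0 if the PID is not local.  An
     * operation whose target is not local is turned into a slave-to-master
     * inter-kernel request (section 4.3.2.2). */
    struct pd *local_pd(unsigned pid)
    {
        int i;

        for (i = 0; i < MAX_LOCAL_PROCESSES; i++)
            if (pd_table[i].pid == pid)
                return &pd_table[i];
        return 0;
    }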
Master-to-slave inter-kernel requests result from either a master process requesting an operation involving a slave process, or a slave process requesting an operation involving a slave process which executes on a different slave processor. These two types of inter-kernel requests are discussed below following a description of the inter-kernel communication mechanism.

4.3.2.1 Inter-Kernel Communication Mechanism

Inter-kernel requests are exchanged between kernels using a lower level inter-kernel communication facility which in turn utilizes the underlying hardware inter-processor communication mechanism (see Figure 6). All Verex kernel operations behave like procedure calls. Remote kernel operations behave like remote procedure calls: execution of the process which invokes a remote kernel operation does not resume until the request has been processed to completion.

The lower level inter-kernel communication facility (IKCF) consists of two modules in each kernel: a remote request issuer and a remote request handler. The remote request issuer on one processor sends inter-kernel requests to the remote request handler on another processor. The remote request issuer and handler use a simple send-and-wait protocol. The requesting kernel issues a request to the other kernel and does not issue another request until its pending request has been processed and responded to by the other kernel. The IKCF provides full-duplex communication between each slave kernel and the master kernel. No direct slave-to-slave communication is possible.

The IKCF uses the inter-processor communication mechanism provided by the hardware to issue the request, await a response from the other kernel, and to retrieve that response. The hardware provides shared memory local to each slave processor, and slave-to-master and master-to-slave maskable inter-processor interrupts. It also provides a maskable acknowledge interrupt which indicates that a processor's pending inter-processor interrupt has been acknowledged by the other processor. Other than during program loading, the only part of the shared memory on each slave accessed by the master processor consists of two small dedicated buffer areas, the master buffer and the slave buffer, used to contain inter-kernel request parameters (Figure 7). The master (slave) buffer is used during master-to-slave (slave-to-master) inter-kernel exchanges.

    | remote    | two       | requestor's | 8-word  | response |
    | operation | operation | PID         | message | code     |
    | number    | arguments |             | buffer  |          |

    Figure 7. Inter-kernel Request Buffer Format

A slave-to-master inter-kernel exchange proceeds as follows. A slave's IKCF request issuer is invoked by its slave kernel as the result of a slave process's attempt to execute a remote kernel operation. The slave process is blocked and its state is set to "remote blocked". The request issuer fills the slave buffer with the necessary request parameters, interrupts the master processor, and enables the acknowledge interrupt from the master processor. When the master kernel fields the interrupt from the slave processor, it invokes its IKCF remote request handler.
The master's request handler then examines the remote operation parameters stored in the slave processor's slave buffer and performs the required action, interacting with other slave kernels (using master-to-slave exchanges) if required. The master's IKCF then enters a response code and possibly other data in the slave's slave buffer and acknowledges the interrupt from the slave processor. This acknowledgement interrupts the slave processor and the slave kernel invokes its IKCF request issuer's completion routine to handle the response. Depending on the type and success of the remote kernel operation, one of several actions is taken by the request issuer. Further details of request handling are given in the next section.

In the above scenario, the slave process which executes the remote kernel operation blocks until the operation completes. What happens on the slave processor while the operation is in progress? If there are other processes ready to execute on the slave processor then there is potential for overlapping slave process execution with the performance of the remote kernel operation. The slave kernel design could specify any one of three courses of action. First, the slave kernel does not execute processes during the performance of remote kernel operations; it only handles interrupts. Second, the slave kernel executes processes during the performance of a remote kernel operation but does not allow more than one outstanding remote kernel operation at a time. If a process attempts to execute a remote operation while another is in progress, the process will be blocked and the processor will idle, only handling interrupts. Third, the slave kernel executes processes during the performance of a remote kernel operation and allows multiple concurrent outstanding remote kernel operations.

A remote kernel operation may take from a few to many milliseconds to complete, depending on the operation and the system load. The first choice offers no overlapping of slave process and remote kernel operation execution, even though significant computation could be performed during that time. The third choice is quite complicated, requiring the maintenance of variable length queues by the kernel, adding costs in space and time. The second choice is a compromise between the other two.

In order to determine which choice should be taken, the slave kernel's behaviour was predicted. Measurements of a normal workload on a standard Verex system indicate that about 80 percent of kernel operations are related to message passing. Device management kernel operations account for another 10 percent and the remaining 10 percent query data stored in the kernel. Measurements also indicate that about 60 percent of message passing occurs among processes on the same team, so that about .80*.60 + .10 = 58 percent of all kernel operations involve only a single processor.

With Distributed Verex, a slave kernel handles message passing among its local processes and provides local device management. Thus about 60 percent of slave kernel operations will be handled locally under a general workload. The Verex time-critical applications are complicated, multiprocess programs such as handlers for computer communication network connections. These programs exchange about 80 percent of their interprocess messages among processes on their teams. Under Distributed Verex, about 75 to 85 percent of all kernel operations will be handled locally by the slave kernels for these programs.
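These estimates follow directly from the measured proportions. For a general workload the locally handled fraction is the same-team message passing plus local device management:

    .80 * .60 + .10 = .58, or roughly 60 percent.

For the time-critical programs, which keep about 80 percent of their message passing within the team, the corresponding figure is

    .80 * .80 + .10 = .74,

and if some or all of the remaining query operations can also be answered by the slave kernel the fraction rises toward .84, which is the source of the quoted range of 75 to 85 percent; the exact split of the query operations between local and remote handling is assumed here.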
For our particular hardware configuration, with remote operation queuing (choice three above), it was calculated that a slave kernel could handle a maximum of approximately 700 kernel operations per second, given the above statistics and the kernel operation execution times and mix. The X.25 server, one of the target programs for the slave processors, handling three concurrent file transfers on three logical channels over a 9600 baud link, executes approximately 250 kernel operations per second. So in practice the kernel operation load is expected to be reasonably high; therefore the first choice, which permits no overlap at all, was deemed unacceptable. Assuming that remote kernel operations are distributed somewhat randomly leads to the prediction that even with a load of 250 kernel operations per second, the probability of requiring the queuing of more than one remote kernel operation will be small. (Measurements to verify this argument are incomplete at the time of writing.) Thus the second scheme was chosen for the initial implementation within the slave kernel.

A master-to-slave inter-kernel exchange proceeds as follows. The master's IKCF request issuer is invoked by the master kernel either as a result of a master process's attempt to execute a remote kernel operation, or as part of the master kernel's handling of a slave's remote kernel operation. The requesting process's state is set to "remote blocked". If there is a remote request to the destination processor in progress, the new request is queued and handled when the current request completes. Otherwise, the master's request issuer fills the destination slave's master buffer with the necessary request parameters, interrupts the slave processor, and enables the acknowledge interrupt from the slave processor. When the slave processor fields the interrupt from the master, it invokes its IKCF remote request handler which examines the parameters in its master buffer and performs the required action. The handler then enters a response code and possibly other data in its master buffer and acknowledges the interrupt from the master processor. This acknowledgement interrupts the master processor, resulting in the invocation of the master's IKCF request issuer's completion routine which handles the response as follows. If this request was originally the result of a slave-to-master request, a response is sent to the requesting slave kernel as described above. Otherwise, the request resulted from a master process executing a remote kernel operation and one of several actions described in section 4.3.2.2 is taken.

Unlike the slave-to-master case, there is only one acceptable scheme for handling outstanding master-to-slave requests: they must be queued (choice three above). One of the design criteria of the distributed kernel was that the correct execution of the master kernel must not depend on the correct execution of the slave kernels. Normal master kernel operation must continue even when faced with handling several master-to-slave kernel requests to the same slave kernel while that slave kernel is busy (or not responding).
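The slave side of this protocol can be pictured with a short C sketch: a single statically allocated request buffer laid out as in Figure 7, and a flag enforcing the choice of at most one outstanding slave-to-master request. The field names, the REMOTE_BLOCKED constant and the hooks interrupt_master and block_current are assumptions made for illustration rather than the actual Verex declarations, and the real code must also manage the acknowledge-interrupt mask.

    #define MSG_WORDS       8
    #define REMOTE_BLOCKED  3            /* illustrative process-state code */

    struct ik_request {                  /* slave buffer, after Figure 7    */
        unsigned op;                     /* remote operation number         */
        unsigned arg1, arg2;             /* two operation arguments         */
        unsigned requestor_pid;          /* requestor's PID                 */
        unsigned msg[MSG_WORDS];         /* 8-word message buffer           */
        unsigned response;               /* response-action code            */
    };

    static struct ik_request slave_buffer;   /* read and written by the master */
    static int request_outstanding;

    /* Stubs standing in for hardware and scheduler hooks. */
    static void interrupt_master(void)                 {}
    static void block_current(unsigned pid, int state) { (void)pid; (void)state; }

    /* Issue a slave-to-master request; a second requestor is simply blocked
     * until the pending request completes (the "choice two" discipline).   */
    void issue_request(unsigned op, unsigned a1, unsigned a2,
                       unsigned pid, const unsigned *msg)
    {
        int i;

        block_current(pid, REMOTE_BLOCKED);  /* the requestor always blocks */
        if (request_outstanding)
            return;                          /* issued when the buffer frees */

        request_outstanding = 1;
        slave_buffer.op = op;
        slave_buffer.arg1 = a1;
        slave_buffer.arg2 = a2;
        slave_buffer.requestor_pid = pid;
        for (i = 0; i < MSG_WORDS; i++)
            slave_buffer.msg[i] = msg ? msg[i] : 0;
        interrupt_master();                  /* master's handler reads the buffer */
    }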
4.3.2.2 Remote Request Issuing

The master and slave kernels interface with their IKCF request issuer through the function call

    Request_remote( requestor_id, op, arg1, arg2, buffer )

This function's arguments specify the kernel operation to be performed remotely, the two kernel operation arguments, and the message buffer to be used, if required. Request_remote is invoked when a kernel determines that a kernel operation cannot be handled locally. The requesting process's state is set to REMOTE_BLOCKED. After the remote operation has been processed, the issuing kernel receives a response which is handled by the kernel's IKCF request issuer's completion routine

    Complete_request( requestor_id, op, arg1, arg2, buffer, response-action )

The first four arguments to this function have the same values as the corresponding Request_remote function call. The contents of the buffer may have been modified during the processing of the operation, and the response-action field is a response code supplied by the remote kernel. The response-action indicates one of UNBLOCK, RETURN_VALUE_UNBLOCK, and SET_BLOCK_STATE. An additional code returned only to a slave kernel from the master kernel is CREATE_PROCESS.

A response-action code of UNBLOCK specifies that the process specified by requestor_id should be unblocked (readied). This response results from remote kernel operations such as a Forward which do not return a value. RETURN_VALUE_UNBLOCK indicates that the requesting process should be unblocked and returned the specified value. This response results from the execution of remote operations which query information stored centrally by the master kernel, and operations which fail and return a failure code. SET_BLOCK_STATE indicates that the process state field in the requesting process's PD be set to the specified state. A Send to an existing remote process would be responded to with a SET_BLOCK_STATE code indicating that the sending process's state should be set to awaiting reply.

When a slave process executes the kernel operation which creates a process, its slave kernel issues a remote request to the master kernel. The master kernel allocates a PID and sets up its PD for the process and returns the PID to the slave kernel with a CREATE_PROCESS response-action code. The slave kernel then completes the operation by initializing the new process's PD and returning the new PID to the requesting process.

The processing of the remote operations is straightforward with the exception of the remote message-passing operations. Subsection 4.3.2.4 gives a detailed example of the implementation.

4.3.2.3 Remote Request Handling

The master and all the slave kernels contain a remote request handler which is invoked in response to a remote request inter-processor interrupt. To a slave (the master) kernel, this interrupt signals an incoming remote request from the master (a slave) kernel.

Within a slave kernel, remote request handling is limited in order to minimize kernel size and complexity. Slave kernels handle Send, Reply, Forward, Destroy_process and a few query remote kernel operations. In addition they handle two control requests for resetting the slave processor and initiating execution of the root process of the team loaded into the slave processor.
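A slave kernel's remote request handler is therefore little more than a dispatch on the operation code found in its master buffer, calling the corresponding standard kernel routine and acknowledging the master with a response code. The following C sketch illustrates this shape; the operation codes, helper routines and stubs are assumptions for illustration, not the Verex implementation.

    /* Illustrative operation codes for incoming master-to-slave requests. */
    enum ik_op { IK_SEND, IK_REPLY, IK_FORWARD, IK_DESTROY_PROCESS,
                 IK_QUERY, IK_RESET_SLAVE, IK_START_ROOT };

    /* Stubs standing in for the standard Verex kernel routines and for the
     * hardware acknowledge path; all of these names are assumed.          */
    static unsigned k_send(unsigned pid, unsigned *msg)   { (void)pid; (void)msg; return 0; }
    static unsigned k_reply(unsigned pid, unsigned *msg)  { (void)pid; (void)msg; return 0; }
    static unsigned k_forward(unsigned from, unsigned to) { (void)from; (void)to; return 0; }
    static unsigned k_destroy_process(unsigned pid)       { (void)pid; return 0; }
    static unsigned k_query(unsigned what)                { (void)what; return 0; }
    static void     reset_slave(void)                     {}
    static void     start_root_process(void)              {}
    static void     acknowledge_master(unsigned response) { (void)response; }

    /* Invoked from the inter-processor interrupt handler; the requesting
     * process has already been established as the active process (a PD is
     * fabricated for it, as described later in this section).             */
    void handle_master_request(enum ik_op op, unsigned arg1, unsigned arg2,
                               unsigned *msg)
    {
        unsigned response = 0;

        switch (op) {
        case IK_SEND:            response = k_send(arg1, msg);        break;
        case IK_REPLY:           response = k_reply(arg1, msg);       break;
        case IK_FORWARD:         response = k_forward(arg1, arg2);    break;
        case IK_DESTROY_PROCESS: response = k_destroy_process(arg1);  break;
        case IK_QUERY:           response = k_query(arg1);            break;
        case IK_RESET_SLAVE:     reset_slave();                       break;
        case IK_START_ROOT:      start_root_process();                break;
        }
        acknowledge_master(response);   /* response code placed in the master buffer */
    }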
A slave process may execute any Verex kernel operation. Any of these operations, except local device handling operations, may result in a remote kernel operation which the master kernel must be prepared to process. The master kernel handles two types of remote operations: those it processes completely itself, and those which require interaction with another slave kernel. The latter category involves Send, Reply and Forward interprocess communication operations between processes on different slave processors. The former includes all other kernel operations: those operations which query the master's centrally stored data, the interprocess communication operations between slave and master processes, the process creation and destruction operations, the block data transfer primitives, and the non-local Receive operation.

When the master kernel encounters a remote operation which it cannot handle locally, it issues a master-to-slave inter-kernel request to the appropriate slave kernel. When the second slave kernel has processed this master-to-slave request, it returns the results to the master kernel which in turn returns the results to the requesting slave kernel. The requesting slave kernel receives no indication of which kernel eventually processed the request.

The performance of a remote operation by the destination kernel is similar to its handling of local kernel operations. The differences are in the mechanisms of invoking the operations and returning the results. The IKCF remote request handler is invoked to handle an incoming remote request by an interrupt handler. The request handler establishes the requesting process as the active process and then calls the standard Verex kernel procedure to perform the requested operation. The master kernel has a PD for each process in the distributed system. To establish the requestor as the active process, the master simply modifies the pointer to the active process's PD to point to the requestor's PD. Slave kernels only maintain PDs for local processes, so the slave kernel IKCF fabricates a PD for the requesting process, and then modifies the active process pointer to point to this PD. Upon completion of the kernel operation, the IKCF returns the results to the requesting kernel and resumes execution of a local process.

4.3.2.4 Message-passing, an Example

To illustrate the operation of the distributed kernel, communication between two processes on different slaves will be described. Interprocess communication consists of one process Sending a message to another process which eventually Receives the message and later Replies to the sender. There are two operation orderings to consider: first, the receiver issues a Receive and is RECEIVE_BLOCKED before the Send is issued; second, the sender Sends and is SEND_BLOCKED before the receiver Receives.

In the first case, the receiver first executes either a Receive-specific on the sending process, or a Receive-general. If it is a Receive-general operation and there are no local processes SEND_BLOCKED on the receiver, then the slave kernel blocks the receiver and issues a Receive-general remote receive request to the master kernel specifying the receiver's PID. If it is a Receive-specific operation, and the specified PID is unknown to the receiver's kernel, then the kernel blocks the receiver and issues a Receive-specific remote request specifying the sender's PID, and the receiver's PID.
The slave kernel then schedules its highest priority ready process. The master kernel's IKCF request handler is invoked to handle the Receive remote request interrupt. It marks the receiver's master PD as RECEIVE_BLOCKED, queues the receiver on the sender's PD, and acknowledges the remote request with a response-action code of SET_BLOCK_STATE specifying the state RECEIVE_BLOCKED. The response interrupts the receiver's processor and the slave's IKCF issuer completion routine then sets the receiver's state to RECEIVE_BLOCKED and resumes execution of the interrupted process.

Eventually the sender slave process executes the Send kernel operation specifying the remote receiver's PID and a message buffer. The sender's kernel detects that the PID specified is not a local PID, blocks the sender, issues a Send remote request to the master kernel, and schedules its highest priority ready process for execution. The master kernel's IKCF request handler is invoked by the resulting interrupt. It detects that the remote receiver is already RECEIVE_BLOCKED and replies to the sender's kernel with a response-action code of SET_BLOCK_STATE specifying the state AWAITING_REPLY. The sender's slave kernel's IKCF issuer completion routine is invoked to handle the acknowledge interrupt. It sets the sender's state to AWAITING_REPLY and resumes execution of its interrupted process. Meanwhile, the master kernel issues a Send remote request to the receiver's kernel. The receiver's IKCF request handler, invoked by the interrupt handler, passes the sender's message buffer contents and PID to the receiver, and acknowledges the master's interrupt with a response-action code of SET_BLOCK_STATE indicating that the receiver's state in its master PD should be set to REMOTE_READY. The receiver is then readied for execution.

Eventually, the receiver executes a Reply operation specifying the sender's PID. The receiver's kernel blocks the receiver, issues a Reply remote request to the master kernel specifying the sender's PID and a message buffer, and schedules its highest priority ready process. The master kernel's IKCF request handler is invoked to process the request interrupt, and responds to the receiver's kernel with a RETURN_VALUE_UNBLOCK response-action code with a return value indicating success. The slave kernel's IKCF completion routine then readies the receiver. Meanwhile, the master kernel issues a Reply remote request to the sender's kernel, handing on the receiver's message buffer contents. The sender's IKCF handler returns a response-action code of SET_BLOCK_STATE indicating that the master kernel should set the sender's state to REMOTE_READY. It then readies the sender, providing a copy of the receiver's message buffer.

In the second operation ordering, the sender invokes a Send remote operation which is fielded by the master kernel, which marks the sender as SEND_BLOCKED and queues the process on the receiver's PD in the master kernel. Eventually the receiver invokes a Receive remote request, either a Receive-specific indicating the sender's PID or a Receive-general. The master kernel responds to the Receive with a copy of the sender's message buffer and a response-action code of RETURN_VALUE_UNBLOCK, indicating success.
Eventually the receiver executes a Reply remote request specifying the sender's PID and a message buffer. The master kernel responds to the receiver's kernel indicating success, and issues a Reply remote request to the sender's kernel. The sender's kernel schedules the sender and provides a copy of the receiver's message buffer. The sender's kernel responds to the master kernel with a SET_BLOCK_STATE code indicating that the state of the sender in the master kernel's PD should be set to REMOTE_READY.

Figure 8 summarizes the number of inter-kernel interactions required to perform each type of remote kernel operation.

                          slave-to-slave   master-to-slave     slave-to-master
    Send-Receive-Reply          4          3 (master sender)   2 (slave sender)
    Receive-Send-Reply          5          2 (master sender)   2 (slave sender)
    Forward                     2                 1                   1
    Create Process              X                 X                   1
    Destroy Process             X                 1                   1
    Query                       2                 1                   1

    Figure 8. Remote Request Inter-kernel Interactions

The master kernel and the slave kernels locally handle Receive-specific operations which specify a local process. A Receive-specific on a remote process is handled similarly to the other remote operations. As mentioned above, the semantics of the Receive-general operation, when executed by a slave process, have been slightly changed as a result of the distribution of the kernel. In Verex, messages are received in the order in which they are sent. In Distributed Verex, when a slave process executes a Receive-general, the slave kernel checks whether there are any local processes SEND_BLOCKED on the receiver. If there are any, then the Receive is satisfied locally; if there are none, then the remote operation is issued.

The change in the behaviour of Receive-general resulted from a design decision motivated by an attempt to minimize slave kernel complexity. The change could be eliminated by providing queuing of remote Sends by the destination slave kernel instead of by the master kernel, resulting in an increase in the size and complexity of the slave kernels. In the worst case, with the altered semantics, a receiving process may take arbitrarily long to receive its outstanding remote messages. Some applications include processes that only receive messages from other processes on their team. When executed under a slave kernel, these processes would be more efficiently supported by a version of Receive that only accepted local messages and did not issue Receive remote requests.

Chapter 5
Evaluation

5.1 Discussion

The Distributed Verex kernel has been implemented and is undergoing testing and further development. The current processor configuration consists of a Texas Instruments 990/10 plus a single 990/5 auxiliary processor; hardware problems are preventing experimentation with more 990/5 slave processors. The services offered by the team server have been extended: arbitrary Verex programs may be invoked on a slave processor and executed as if they were running under Verex. The swapping of slave teams to disk has not yet been implemented but does not present significant difficulties.

Initial performance measurements are encouraging. Figure 9 summarizes the number of inter-kernel interactions and the execution time (in milliseconds) to perform typical kernel operations. The first two rows represent the sequence of kernel operations which provide interprocess communication. The third row is for simple query kernel operations. The column labelled "local" contains times for the operations handled locally by either the master or a slave processor - their instruction execution rates are similar. The other two columns contain measurements for operations handled remotely. Reprogramming of the inter-kernel request mechanism is expected to reduce the overhead by about 0.5 to 1 millisecond per exchange.

                          local        master-to-slave    slave-to-master
                                       (master sender)    (slave sender)
    Send-Receive-Reply    0 / 2.1 ms     3 / 7.9 ms          2 / 5.8 ms
    Receive-Send-Reply    0 / 2.1 ms     2 / 5.8 ms          2 / 5.8 ms
    Query                 0 / 0.4 ms     1 / 2.1 ms          1 / 2.1 ms

    Figure 9. Kernel Operation Execution Times
    (each entry gives the number of inter-kernel interactions and the execution time)

These measurements indicate that each inter-kernel exchange imposes an overhead of 1.5 to 2 milliseconds. Since most of the Verex kernel operations are short and simple, any overhead is noticeable. Measurements to quantify the performance degradation resulting from remote kernel operations are incomplete at the time of writing. It is anticipated that the performance of the applications targeted for slave processors will be affected little by the overhead. The behaviour of one such application, the X.25 server, has been predicted. Of the 250 kernel operations it executes per second at maximum load, 86 percent will be handled by the local kernel. The maximum estimated overhead due to inter-kernel communication is about 7 percent ( 2 msec/remote-op * [.14 * 250] remote-ops/sec = 70 msec per second ), most of which will be overlapped with master and slave process execution.

Concerns of inefficient resource utilization arise for several reasons. First, Distributed Verex does not support process migration: when a processor is not loaded with a program to execute, processes from an executing team may not be moved to the free processor for execution. Second, since there is no shared memory between slave processors, there can be no fine-grained processor sharing: when there are no ready processes on one processor, the processor idles even if there are ready but not executing processes on other processors. Thirdly, with the particular hardware we are using, a complete address space of memory is dedicated to each slave team and its kernel even if only a fraction of the space is occupied.

The potential low resource utilization, as usual, must be traded off against the expected performance increase and improvement in real-time response offered by this design. Preferential allocation of slave processors to large programs, computationally intensive programs, or programs requiring real-time response to external events will increase the gains.

A slave processor is allocated to a team of processes; splitting a team over several processors does not fit the Verex team model. So at best, the speed of execution of a single team program will be its speed of execution on a single dedicated processor. If a speed-up is required, concurrency can be achieved by partitioning the program into multiple teams of processes, with each team allocated a separate processor.

Concerns of reliability and robustness also arise because of the star network configuration, the tight coupling, and the hierarchical processor structure.
Failure of the system power supply results in complete system failure. Failure of the master processor or its kernel results in isolation of the slave kernels. Failure in a slave results in the loss of an executing team and access to the slave's local peripheral devices. This project does not address the issues of the design of fault-tolerant hardware or software, but some effort has been invested in the software to reduce the effects of failures. The Distributed Verex kernels are equally suspicious of remote kernels and local processes, protecting themselves against corruption by either. Timeouts have been implemented at the team server level to detect faulty slave processors or slave kernels during team loading. Timeouts have not been implemented at the inter-kernel communication level, but should be.

5.2 Conclusions

Multiprocessor Distributed Verex has been realized by extending the uniprocessor Verex kernel. The design and implementation effort was considerably less than that expended on the original implementation of Verex. The simple Verex model of processes and message-based interprocess communication, implemented by a simple kernel, has proved to be an excellent foundation for this experiment.

This multiprocessor system was implemented on off-the-shelf hardware, and did not require the sophisticated hardware base demanded by other multiprocessor systems. The processor configuration is logically a star network consisting of a central master processor tightly coupled to peripheral slave processors. The centralized master kernel facilitates the provision of a global process name space and location-independent interprocess communication. The slave kernels provide efficient kernel support for logically local kernel operations, interacting with other kernels only when necessary. Following the example of modern computer communication protocol architectures, the interprocess communication has been built upon a lower level inter-kernel communication facility, which in turn utilizes an inter-processor communication mechanism.

Distributed Verex supports real-time and general-purpose computing applications concurrently. Processors can be dedicated to demanding applications; others may be timeshared among different programs as is the CPU of a uniprocessor multi-processing system.

5.3 Future Research

It is becoming increasingly difficult to wring more speed from uniprocessor machines, but the demand for computing resources grows steadily. The decreasing cost, and increasing density and performance, of VLSI components are making the construction of cheap and powerful multiprocessor hardware possible - but hardware alone is not enough; a complete hardware/software system must be created. The initial implementation of Distributed Verex provides such a system but leaves many issues for further investigation.

— The performance of the multiprocessor Distributed Verex system under a general multi-user workload should be measured and compared to that of the uniprocessor Verex system, using the same hardware. Similar comparisons should be made of the response time and throughput for real-time applications.

— The loads on the common bus and the central processor require examination to determine their performance limitations. Measurement of the delay introduced by contention for these resources and their potential for introducing bottlenecks should be included in this study.
— The provision of time-outs at the inter-kernel communication level, and mechanisms for detection of slave processor failure, should be investigated.

— The system architecture requires further study and comparison with other designs in an attempt to answer the following questions. Is the processor hierarchy required? Is a single common inter-processor communication path sufficient or should slave-to-slave paths be provided? Is a communication mechanism based on shared memory and inter-processor interrupts the best design? What changes in the system software would be required to accommodate modification of the processor hierarchy and inter-processor communication mechanism?

The successful implementation of Distributed Verex has encouraged other closely related research in distributed computing. Two projects with slightly divergent directions show immediate promise. First, extension of the Verex model to multiprocessor configurations with looser processor coupling, such as workstations connected by a local area network, is already in progress. Second, redesign of Distributed Verex and the hardware architecture with the goal of providing a high-performance, tightly-coupled multiprocessor computer capable of handling real-time applications, such as a network gateway, is under investigation.

References

1. Amdahl, G.M., Blaauw, G.A., and Brooks, F.P., "Architecture of the IBM System/360", IBM Journal of Research and Development 8, 2 (April 1964), 87-101

2. Barnes, G., Brown, R., Kato, M., Kuck, D., Slotnick, D., and Stokes, R., "The ILLIAC IV computer", IEEE Trans. on Computers C-17 (August 1968), 746-757

3. Cashin, P.M. (Ed.), "SL10 Notes", Bell-Northern Research (June 1980), Ottawa, Canada

4. Cheriton, D.R., "Designing an Operating System to be Verifiable", UBC Computer Science Technical Report 79-9, University of British Columbia (October 1979), Vancouver, Canada

5. Cheriton, D.R., "Process Identification in Thoth", UBC Computer Science Technical Report 79-10, University of British Columbia (October 1979), Vancouver, Canada

6. Cheriton, D.R., "The Verex Kernel", UBC Computer Science Technical Report, University of British Columbia (September 1980), Vancouver, Canada

7. Cheriton, D.R., "The Design of a Distributed Kernel", Proceedings of the ACM National Conference (November 1981)

8. Cheriton, D.R., "The Thoth System: Multi-Process Structuring and Portability", Elsevier North-Holland (1982), New York

9. Cheriton, D.R. and Murphy, W., "Verex System Programmer's Manual", UBC Computer Science Technical Report 79-1, University of British Columbia (September 1979), Vancouver, Canada

10. Cheriton, D.R., Malcolm, M.A., Melen, L.S., and Sager, G.R., "Thoth, a portable real-time operating system", Comm. of the A.C.M. 22, 2 (February 1979), 105-115

11. Deering, S.E., "Multi-process Structuring of X.25 Software", UBC Computer Science Technical Report, University of British Columbia (1982), Vancouver, Canada

12. Enslow, P.H., "Multiprocessor Organization - A Survey", Computing Surveys 9, 1 (March 1977), 103-109

13. Fuller, S.H., Ousterhout, J.K., Raskin, L., Rubinfeld, P.I., Pradeep, J.S., and Swan, R.J., "Multi-Microprocessors: An Overview and Working Example", Proc. of the IEEE 66, 2 (February 1978), 216-228

14. Heart, F.E., Ornstein, S.M., Crowther, W.R., and Baker, W.B., "A new minicomputer/multiprocessor for the ARPA network", Proc. AFIPS 1973 National Computer Conf., Vol. 42 (1973), 529-537

15. Jones, A.K., Gehringer, E.F. (Eds.), "The Cm* Multiprocessor Project: A Research Review", Technical Report CMU-CS-80-131, Carnegie-Mellon University (July 1980), Pittsburgh, U.S.A.

16. Kahn, R.E., "Resource-sharing computer communication networks", Proc. of the IEEE 60, 1 (November 1972), 1397-1407

17. Lockhart, T.W., "The Design of a Verifiable Operating System Kernel", UBC Computer Science Technical Report 79-15, University of British Columbia (November 1979), Vancouver, Canada

18. Metcalfe, R. and Boggs, D., "Ethernet: Distributed Packet Switching for Local Communication Networks", Communications of the ACM 19, 7 (July 1979), 395-404

19. Satyanarayanan, M., "Commercial Multiprocessing Systems", IEEE Computer 13, 5 (May 1980), 75-96

20. Wulf, W.A. and Bell, G.C., "C.mmp - A multi-mini-processor", Proc. AFIPS 1972 Fall Jt. Computer Conf., Vol. 41 (1972), 765-777

21. "Vocabulary for Information Processing", American National Standard X3.12 (1970)
