A REAL-TIME SYSTEM FOR MULTI-TRANSPUTER SYSTEMS

By Sanjay Chadha
B. Tech., Indian Institute of Technology, New Delhi

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE STUDIES (ELECTRICAL ENGINEERING)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
April 1990
(c) Sanjay Chadha, 1990

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Electrical Engineering
The University of British Columbia
Vancouver, Canada
Date: May 11, 1990

Abstract

Two important problems, namely a versatile, efficient communication system and the allocation of processors to processes, are analysed. An efficient communication system has been developed in which a central controller, the bus master, dynamically configures the point-to-point network formed by the links of the transputers. An identical kernel resides on each of the nodes. This kernel is responsible for all communications on behalf of the user processes. It makes ConnectLink and ReleaseLink requests to the central controller, and when the connections are made it sends the messages through the connected link to the destination node. If a direct connection to the destination node cannot be made, the message is sent to an intermediate node; the message hops through intermediate nodes until it reaches the destination node. The communication system developed provides a low-latency communication facility, and the system can easily be expanded to include a large number of transputers without increasing the interprocess communication overhead to a great extent.

Another problem, the Module Assignment Problem (MAP), is an important issue at the time of development of distributed systems. MAPs are computationally intractable, i.e. the computational requirement grows with a power of the number of tasks to be assigned. The load of a distributed system depends on both module execution times and intermodule communication (IMC) cost. If assignment is not done with due consideration, a module assignment can cause computer saturation. Therefore a good assignment should balance the processing load among the processors and generate minimum interprocessor communication (IPC), that is, communication between modules not residing on the same processor. Since meeting the deadline constraint is the most important performance measure for an RTDPS, meeting the response time is the most important criterion for module assignment. Understanding this, we have devised a scheme which assigns processes to processors such that both response time constraints and periodicity constraints are met. If such an assignment is not possible, assignment fails and an error is generated. Our assignment algorithm does not take into consideration factors such as load balancing.
We believe that the most important factor for an RTDPS is meeting the deadline constraints, and that is what our algorithm accomplishes.

Acknowledgement

I would like to thank my supervisor Dr. Ito and my co-supervisor Dr. Lawrence for their pertinent suggestions and continuous encouragement throughout the course of my thesis. Also I would like to thank Real Frenette for his important discussions and for developing the necessary hardware. I would also like to express my thanks to my son Sumit (for not disturbing) and my mother Urmila for encouraging me. I would like to thank Dr. Ito (Acct # 589054) again for the financial support he gave me throughout the course of my thesis.

Table of Contents

Abstract
Acknowledgement
1 Introduction
  1.1 Distributed Processing Systems
    1.1.1 Loosely-Coupled vs. Tightly-Coupled Systems
  1.2 Real-Time Systems
  1.3 Real-Time Distributed processing systems
  1.4 The challenge of real-time (distributed/centralized) systems
    1.4.1 Task Partitioning
    1.4.2 Specification and Verification
    1.4.3 Real-time scheduling theory
    1.4.4 Real-time operating systems
    1.4.5 Static or Dynamic resource allocation scheme
    1.4.6 Real-time programming languages
    1.4.7 Real-time communications
    1.4.8 Real-time monitor
  1.5 Focus of this Research
  1.6 Thesis Overview
2 Efficient Communications Facility For Multi-Transputer Systems
  2.1 Introduction
  2.2 Related Work
  2.3 The Architecture
  2.4 System Design
    2.4.1 VMTM Node
    2.4.2 BBK Node (Bus Master)
    2.4.3 Host Machine (Sun)
  2.5 Buffers and Deadlocks
  2.6 Broadcast Messages
  2.7 Monitoring
  2.8 Communication Primitives
  2.9 Performance Results and Conclusions
3 Performance Analysis For the Communication System
  3.1 Objective
  3.2 The Tests
  3.3 Results
  3.4 Discussion of the Results
  3.5 Conclusions
4 Module Assignment Problem for Real-Time Systems
  4.1 Introduction
  4.2 Related Work
  4.3 Assumptions and Problem Formulation
  4.4 Incorporating Real-time constraints
  4.5 The Delay Model
  4.6 The Algorithm
  4.7 THE ALGORITHM
    4.7.1 Preprocess CFGs
    4.7.2 Another important heuristic
  4.8 Study of the Assignment algorithm
  4.9 Discussion
  4.10 Conclusions
  4.11 Summary
5 Conclusions and Future Research Work
  5.1 Conclusions
    5.1.1 Real-time communication
    5.1.2 Module Assignment
  5.2 Future Research Work
    5.2.1 Real-time communication system
  5.3 Assignment Problem
Appendices
  A The Test Program
References

Chapter 1
Introduction

1.1 Distributed Processing Systems

To overcome the limitations of conventional von Neumann architectures, distributed architectures have been developed. The spectrum of distributed processing systems ranges from multiprocessor systems, in which processors share common memory, to multicomputer systems, in which all processors have private memory. In multiprocessor systems processes communicate via shared memory, whereas in multicomputer systems processes communicate via message passing.

1.1.1 Loosely-Coupled vs. Tightly-Coupled Systems

Distributed systems can be classified according to their degree of coupling. The terms loosely-coupled and tightly-coupled have been used to distinguish between systems based on characteristics [2] such as the interprocessor communication bandwidth.
A tightly-coupled system is one in which atomic operations involving more than one processor are efficient, whereas a loosely-coupled system is one in which such atomic operations are less efficient. These systems can also be distinguished on the basis of partial failures. In loosely-coupled systems there is a high possibility of a partial failure of communicating processes, whereas the possibility of a partial failure is very small in a tightly-coupled system.

A system consisting of workstations connected together in a LAN or a WAN is an example of a loosely-coupled system. A system which contains processors residing on boards that are connected through a system bus falls into the category of tightly-coupled systems.

1.2 Real-Time Systems

Real-time systems are systems in which the correctness of the system depends not only on the logical result of the computation but also on the time at which the results are produced. Real-time computing systems play a vital role in our society and cover a wide spectrum of applications. Examples of current real-time computing systems include the control of automobile engines, nuclear power plants, flight control systems, the space shuttle, aircraft control, and robotics.

Present real-time systems are expensive to build and their timing constraints are verified with ad-hoc techniques. Minor changes in the requirements of the system lead to expensive and time-consuming changes to the system. Millions (even billions) of dollars are being spent by industry to build today's real-time systems. Current brute-force techniques will not scale to meet the requirements of guaranteeing the real-time constraints of next-generation systems [3].

1.3 Real-Time Distributed processing systems

Real-time distributed processing systems (RTDPS) are systems solely dedicated to a given application task. An RTDPS consists of a set of processors connected together by an interconnection network, which may be a fully connected network, a store-and-forward network, or a local area network. Communications among processors are provided in the form of messages. Each processor in an RTDPS is autonomous but works in close coordination with the other processors in the system. On each processor resides an identical kernel to manage some important tasks. The main purpose of a real-time operating system is (1) to provide efficient and predictable communication among processes residing on different processors, and (2) to schedule processes such that their deadline constraints are met. Other non-mandatory functions which a real-time distributed operating system may perform are (1) load balancing among all the processors present in the system and (2) cooperating with other processors in the scheduling of dynamically arriving jobs.

In an RTDPS, the application task is partitioned into a set of software modules, or modules. (In this dissertation, software modules, modules and processes are used interchangeably to mean a sequential piece of code; task and application task are used interchangeably to mean a set of tasks related by a Control Flow Graph (CFG), see Chapter 4.) The application task is repeatedly invoked by external signals. Task response time, or port-to-port (PTP) time, is defined from the time the task is invoked to the completion of the task execution. During the task's execution, modules need to communicate with other modules via message exchanges. These messages are called intermodule communication (IMC). For a specific application task, the volume of IMC messages among modules is determined by task partitioning. The overhead for message exchanges on a local computer is usually small and can be assumed to be negligible.
However, if IMC is between modules residing on different processors, that is, the message traverses the interconnection network, the message becomes interprocessor communication (IPC). The IPC load thus depends on the assignment of modules to the processors and on task partitioning. IPC has a significant impact on the system performance and the response time.

1.4 The challenge of real-time (distributed/centralized) systems

An important approach used in managing large-scale systems is to hierarchically decompose the system into smaller, simpler modules that are realizations of the abstract data type model. Although this method allows one to reason about the correctness of computation at each level of abstraction, it has no provisions to support reasoning about time and reliability abstractions, two important aspects of real-time systems. Building a science of large-scale real-time systems requires new research in many distinct but related areas. The following subsections identify the main areas which need research.

1.4.1 Task Partitioning

The main purpose of distributed processing is to obtain more executional power by distributing the program modules on different processors such that the response time of the application can be improved. Thus an important step in the design of an RTDPS is to partition the application task into a set of smaller tasks, i.e. software modules. These software modules are then assigned to different processors with the help of a module assignment algorithm. There are a few guidelines which are important at the time of task partitioning. They are:

1. Partition the task such that code which can be executed in parallel is placed in different modules.

2. Execution times of modules should not differ too much from each other, because similar module sizes often facilitate load balancing. Also, the presence of large modules may delay the response times of other modules if a non-pre-emptive scheduling scheme is used.

3. Partitioning should be done such that total IMC is minimum. This is done to reduce IPC, which is expensive and time consuming.

1.4.2 Specification and Verification

The fundamental challenge in the specification and verification of real-time systems is how to incorporate the timing constraints. Methods for specification and verification should include methods to represent the timing constraints of the applications and methods to verify that the constraints are met.

1.4.3 Real-time scheduling theory

Scheduling theory addresses the problem of meeting the specified timing constraints. Satisfying the timing requirements of real-time systems demands the scheduling of the system's resources according to some well known and understood algorithms so that the timing behavior of the system is understandable, predictable, and maintainable [1, 7, 8, 9, 11, 13]. Scheduling theory is not restricted to the study of real-time systems or even general computer systems. It also arises in the study of manufacturing systems, transportation systems, and process control systems.
However, real-time system scheduling problems are different from the scheduling problems found in the areas of operations research. The schedulers for real-time systems have different objectives. Real-time computing systems lack an incentive for minimizing the response time other than for meeting the deadlines, and real-time systems are highly dynamic, requiring on-line, adaptive scheduling algorithms. Since the scheduling algorithms are NP-complete [13], heuristic techniques are required.

1.4.4 Real-time operating systems

One major focal point for developing real-time systems is the real-time operating system. The operating system must have support for guaranteeing real-time constraints, for fault tolerance, for integrating time constraints with resource allocations, and for scheduling across a spectrum of resource types including sensor processing, communications, CPU, memory and other forms of I/O. If the system is distributed, another complicated problem is end-to-end timing analysis, i.e. timing constraints are applied to a collection of cooperating tasks rather than an individual task.

1.4.5 Static or Dynamic resource allocation scheme

A highly integrated resource allocation scheme is necessary to adequately address timing constraints, predictability, adaptability and fault tolerance. A resource allocation scheme may be static, in which case the program modules are known well in advance, or dynamic, in which case program modules may arrive dynamically and have to be allocated to a processor in real time. While making this allocation, the timing constraints of the program module have to be taken into account [6, 3, 12].

1.4.6 Real-time programming languages

As the complexity of real-time systems increases, high demand will be placed on the programming abstractions provided by languages [4]. The more complex real-time systems will require a programming methodology that gives due consideration to the needs of meeting real-time requirements. The important research issues include: support for constructs which express timing constraints; a programming environment that provides the programmer with primitives to control and to keep track of the resource utilization of software modules; and support for distributed programs and fault tolerance.

1.4.7 Real-time communications

Communication systems are the backbone of real-time distributed systems. A real-time communication system should be versatile, efficient and predictable. Predictability is the single most important issue for a real-time communication system. To be successful, the real-time communication system must be able to predictably satisfy individual message-level timing requirements. The timing requirements are driven not only by the applications' interprocess communication, but also by time-constrained operating system functions invoked on behalf of application processes. In a non-real-time communication system, verification can be done by verifying the logical correctness of the system; however, in a real-time system it is also necessary to verify the timing correctness.

1.4.8 Real-time monitor

Monitoring and debugging real-time systems is a complicated problem, due to the lack of advanced tools. Conventional debuggers verify the logical correctness of the system or program. However, it is very difficult to detect or fix a timing bug in real-time programs.
Tools such as a schedulability analyzer and a real-time monitor/debugger are required to reduce the complexity of real-time software [14].

1.5 Focus of this Research

The main focus of this research lies in the following areas: (1) development of a communication system for real-time systems; the communication system is developed for a Multi-Transputer [21] system using a dynamically reconfigurable network; (2) a static resource allocation scheme, in which a new analytic method to estimate task response time for an RTDPS is introduced; (3) using this model, an algorithm for module assignment is developed.

1.6 Thesis Overview

In Chapter 2 the communication system developed is described, Chapter 3 gives the performance results of our communication system, Chapter 4 discusses the module assignment problem and our algorithm, and finally Chapter 5 presents conclusions and directions for future work.

Chapter 2
Efficient Communications Facility For Multi-Transputer Systems

2.1 Introduction

Multicomputer systems present many advantages, but to actually use these advantages a system designer faces many difficult problems. In this chapter we discuss how we solve one such problem, namely an efficient and versatile communication system for a Multi-Transputer system. The major objectives we had while designing our system were:

1. The system should be versatile. By versatility we mean that the communication system developed can be used for various requirements like image and signal processing, robotics, matrix manipulation, etc. Our communication system should be such that it can be used by a large variety of users with different communication requirements.

2. The system should be efficient. Since the communication overhead is often greater than the communication cost itself, the overhead of the interprocess communication system can take away many of the advantages of parallelism. Hence due consideration should be given to an efficient communication system.

3. The system should be scalable. Requirements of users may increase with time, or, for a real-time system, a decrease in the deadline time of the task may lead to a requirement for additional processor(s). The communication system should be such that the mean communication cost increases slowly with an increase in the number of processors.

4. The system should be reliable. Since most parallel systems are used for important and sometimes life-critical applications, reliability is an important characteristic of a communication system.

While developing our system we have given due consideration to the above characteristics. Although we have not done anything for reliability, we have made efforts to see that the communication system can be modified into a reliable communication system.

An efficient communication system has been developed, in which a central controller, the bus master, dynamically configures the point-to-point network formed by the links of the transputers. (The transputer [21] is a microprocessor with four serial full-duplex links; these links can be used to form a point-to-point network. Node, processor and transputer are used interchangeably throughout this chapter.) An identical kernel resides on each of the nodes. This kernel is responsible for all communications on behalf of the user processes. It makes ConnectLink and ReleaseLink requests to the central controller, and when the connections are made it sends the messages through the connected link to the destination node.
If a direct connection to the destination node cannot be made, the message is sent to an intermediate node; the message hops through intermediate nodes until it reaches the destination node. The communication system developed provides a low-latency communication facility, and the system can easily be expanded to include a large number of transputers without increasing the interprocess communication overhead to a great extent.

2.2 Related Work

With the rapid growth of microprocessor technology, distributed computer systems have become attractive for today's ever increasing need for high computational power. Multicomputer systems have many advantages which make them an active area of research. Multicomputer systems today are used in very strategic places like high-speed aircraft control, robotics control, and image and signal processing. The most important advantages are cost-effectiveness (MIPS/dollar), the possibility of expanding the system as computational demand increases, high availability, and graceful degradation in case of partial failures.

Though distributed systems present many advantages, efficiently using a distributed system is a difficult problem, and is an area of active research. The problems which a systems engineer may face include the design of a general low-latency interconnection network, the design of a parallel algorithm which maps onto the network efficiently, and the allocation of program modules to processing nodes. Since interprocess communication between processes on different nodes is a costly process, communication overhead in a poorly designed communication system may outweigh the speedup due to parallelism. Thus, realizing the importance of a low-cost general communication system, many researchers have designed efficient communication systems [17, 18, 24].

Our scheme differs from [17] and [18] because in our scheme the control of a dynamic point-to-point network is managed by a central controller, whereas in [17] each individual iWarp cell makes and breaks its own connections, and in [18] the CAB makes and breaks connections on behalf of the sender process residing either on the CAB or on the host machine. In our scheme the central controller makes a direct connection between the source and destination nodes wherever possible, thus reducing the number of hops. Only in the much fewer cases in which a direct connection is not possible is the message sent through an intermediate node.

In this chapter we discuss a communication system designed for the Multi-Transputer system at UBC. The communication system is part of a real-time operating system being developed for robotics control. Since each transputer has four links, each transputer can be connected to at most four transputers at a time. Since in a general system a process on a node may send/receive messages to/from processes on a large number of nodes, this upper bound of four on the number of links prohibits the formation of a general communication system.
A number of solutions have been proposed to alleviate the restriction imposed by the transputer, as pointed out by Jones and Martin [26], such as message passing through routing, channel multiplexing, and dynamic reconfiguration. In message passing systems, messages hop from the source node to the destination node. To make message passing efficient, new techniques such as wormhole routing [19] and virtual cut-through [20] have been developed. Virtual cut-through avoids unnecessary buffering: when the header of a message packet arrives at an intermediate node whose selected output channel is free, the packet is directed to the next node immediately (hence the name cut-through). Therefore a message packet is buffered only when its selected output channel is busy. In wormhole routing, packets stay inside the network while waiting for the selected output channel to become free. In our technique, in most cases messages are transported directly from the source to the destination node, and in a few cases the messages hop through very few intermediate nodes. In channel multiplexing, one or more virtual channels are multiplexed onto one hardware link. In a dynamic reconfiguration scheme, links are dynamically switched to allow any arbitrary connection.

We have chosen dynamic reconfiguration as the method to alleviate the restriction of the upper bound on the number of links for communication, for the following reasons.

• In message passing through routing, the messages may pass through a number of intermediate nodes (this number of intermediate nodes would increase rapidly with the size of the multicomputer system), thus wasting computation cycles for message buffering and routing on the intermediate nodes.

• Message passing is subject to deadlocks [25], the avoidance or breaking of which might be expensive.

• Messages transported through a message-passing system are subject to a nondeterministic delay as a result of factors such as queuing delay, retransmission delay, buffering delay, etc. This delay depends on many factors like the load on the intermediate node, message traffic, etc. Since our communication system is designed for real-time use, we can only allow deterministic delays.

• Automatic channel multiplexing/demultiplexing becomes intractable for even the simplest protocols. Since we are developing an operating system, such a scheme is obviously not useful to us; also, this scheme is practical only in a system with a small number of processors.

• Dynamically reconfigurable systems perform significantly better than static architectures, as experimentally shown by Das and Fay [27].

Our scheme is in some ways similar to the one developed at the University of Manchester [26]. The following are the basic differences:

• In their system, synchronization between a send and a receive is maintained, whereas in our scheme, after sending the message, a sender is free to perform other computations. The message is buffered by the kernel and sent to the destination node.

• In their scheme a request for a link, and the release of a link, is made by the process which wants to make the communication, whereas in our scheme the kernel makes connect and release link requests, thus preventing any process from monopolizing a link.

Here is a brief outline of our scheme.
There are two kinds of nodes in our system: the ordinary (slave) nodes, on which all the user processes reside, and the master node, which controls the dynamic configuration of the point-to-point network formed by the links of the transputers. Whenever a process on a slave (VMTM) node has to send or receive a message, it makes the request to the kernel residing on that node. The kernel is aware of the connections of its links, i.e. the kernel knows where its links are connected. It checks if the message is for a process residing on one of the connected nodes; if it is, it sends the message through the link, else it sends a request to the Master node to connect one of the free links to the destination node. On receiving the response from the Master, the kernel can assume that the connection is made and sends the message. If a message has to be transferred from one node to another node which cannot be directly connected, then the message is sent to an intermediate node. The message hops until it reaches the destination node. The number of hops is small in comparison to a system in which links cannot be dynamically configured, i.e. a system with a statically connected interconnection network. The IO processes running on the host machine are modified versions of CIO supplied by Logical Systems Inc.

[Figure 2.1: The Architecture. Eight VMTM boards (Vmtm 0 to Vmtm 7) and the BBK 0 (Master) board connected to the host machine; the legend identifies the T800 transputers, adaptors, ports and crossbars.]

2.3 The Architecture:

The architecture used to realize the communication system resides in two VMEbus card cages linked together with a VMEbus repeater. The first card cage consists of eight VMTM boards (Parsytec), two BBK boards (Parsytec) and an Interrupt board. The processors on the VMTM boards communicate with each other via a point-to-point network formed by the links of the processing elements. Interconnection of this network is controlled by the processor on the BBK board. The architecture is shown in Figure 2.1.

Transputer: The Transputer is a chip manufactured by INMOS [21]. Transputers have 4 full-duplex hardware serial links, which are used to make connections between transputers. Each hard link has two half-duplex links.

VMTM Board: Each VMTM board [22] consists of four INMOS T800 transputers, each with 1 MByte of local memory. The transputers can be connected to the VMEbus through adapters. Each VMTM board has nine external ports. These ports can be connected together by cables, which enables communication among transputers on different boards. Also on each board is a 32x32 crossbar switch which is used to connect any link of any transputer on the board to any other link, any adapter or any port on the board. The VMTM node (a transputer on a VMTM board) is a slave node, i.e. it cannot initiate data transfer on the VMEbus. The link adapter, an IMS C012 chip, transforms data received serially from a transputer link into an 8-bit-wide parallel data word that can be read from the VMEbus. This adapter, in our architecture, performs two important functions:

1. Input/Output (IO). IO is done via the adapters. To enable IO for processes residing on the transputer there are IO server processes on the host machine. A protocol has been established between the IO routines on the transputer and the IO server process on the host machine which enables the IO.
2. To send requests to the Master. The adapter can be in two modes. One is the interrupt generation mode; in this mode, outputting a byte to the adapter creates an interrupt. The other mode is the ordinary mode, in which no interrupt is generated. Ordinarily an adapter is in the interrupt generation mode, but when IO is performed the adapter should be in the ordinary mode.

BBK Board: The transputer on the BBK board is the bus master, with 2 MBytes of dual-ported memory which can be asynchronously accessed by the transputer and the VMEbus. This bus master reads the requests pending through the adapters. Communication between a slave processor and the bus master can be initiated either by polling or by interrupt handling. With polling, the communication delay would quickly reach an unacceptable level as the number of nodes in the network increases, hence an interrupt facility has been developed.

Interrupt Board: An interrupt board has been designed and built at UBC. Whenever a VMTM node outputs a byte onto the adapter, it generates an interrupt on the interrupt board. Whenever an interrupt is generated on the interrupt board, the interrupt board generates an event on the event channel of the BBK node. This generation of the interrupt at the interrupt board depends upon whether the adapter is in the interrupt generation mode or not. Whenever a processor wants to connect one of its links to a link of some other transputer, it puts the request on the adapter, which creates an interrupt on the interrupt board, and consequently an event is generated on the event channel. The master then reads the interrupt register on the interrupt board and finds out which node created the interrupt. It then fetches the request and performs the necessary steps to satisfy the request. If the request is satisfied, the response is sent back, else the request is queued. When the VMTM node receives the response it can assume that its link is now connected to the requested link and can send messages to the destination node. Note that the link which is connected to the adapter has two functions to perform: one is to send requests to the bus master and the other is to perform I/O with the IO process residing on the host machine. Care has to be taken that only one function is performed at a time, hence multiplexing on the adapter is required.

2.4 System Design

A user process residing on a slave node can make four kinds of request, namely: Send, Receive, ConnectHost and ReleaseHost.

A Send is a request for sending a block of message to a destination user process. The request is made to the CMOUT (Communication Manager Out) process, which buffers the request in one of the Writer queues. The Writer process makes a request, via the BusOut process, to the Master node for the connection of its link to the destination node. In the Master node the request is read by a BusInterface process and is sent to the Allocate Resources process for the making of a connection. When the connection is made, the response is sent to the Writer process via the BusIn process. When the Writer receives the response, it knows that the connection has been made and sends the message. The Reader process on the destination node receives the message and buffers it on the destination process's queue. A sketch of this send path is given below.
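The following is a minimal C sketch of how this send path could be organized on a slave node. It is an illustration only, not the kernel's actual code: the names (msg_t, link_dest, request_connect, and so on) and the simplified queue handling are assumptions made for the example, and the blocking request_connect call stands in for the BusOut/BusIn exchange with the Master, which is detailed in Section 2.4.1.

    /* Illustrative sketch only: the structure and names are assumptions,
     * not the thesis kernel's actual code.                               */
    #include <stddef.h>

    #define NLINKS 3                   /* links used for communication     */

    typedef struct msg {
        int   dest_node;               /* node where the receiver resides  */
        int   dest_user;               /* receiving user process           */
        char *data;                    /* address of the message body      */
        int   length;
        struct msg *next;
    } msg_t;

    extern int    link_dest[NLINKS];   /* node connected on each link, -1 if free */
    extern msg_t *link_queue[NLINKS];  /* per-link Writer queues           */
    extern msg_t *pool_queue;          /* common pool for unconnected destinations */

    extern void enqueue(msg_t **q, msg_t *m);
    extern void request_connect(int link, int dest_node); /* blocks until the Master replies */
    extern void link_send(int link, msg_t *m);
    extern void request_release(int link);

    /* CMOUT: place a Send on the queue of a link already connected to the
     * destination node, or on the common pool otherwise.                  */
    void cmout_send(msg_t *m)
    {
        for (int l = 0; l < NLINKS; l++) {
            if (link_dest[l] == m->dest_node) {
                enqueue(&link_queue[l], m);
                return;
            }
        }
        enqueue(&pool_queue, m);
    }

    /* Writer for one link: ask the Master for a connection when needed,
     * drain the queue, then release the link when nothing is left.        */
    void writer(int l)
    {
        while (link_queue[l] != NULL) {
            msg_t *m = link_queue[l];
            if (link_dest[l] != m->dest_node) {
                request_connect(l, m->dest_node);
                link_dest[l] = m->dest_node;
            }
            link_send(l, m);
            link_queue[l] = m->next;
        }
        request_release(l);
        link_dest[l] = -1;
    }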
The Receive request is made by the user process to CMOUT. CMOUT, on receiving the request, checks if there is a message for the user process. If there is a message it sends the message, else it sends a Null pointer.

A ConnectHost request is made to the HC (Host Connector) process, which sends the request to the Master node via the BusOut process. When the Master node's BusInterface reads the request it sends the request to the CioOut process. The CioOut process sends the request to the MasterCioCtl, which unblocks a blocked CIO process.

A ReleaseHost request travels from the CIO process to CioIn on the Master via the SlaveCioCtl and MasterCioCtl processes on the host machine.

The process structure on the VMTM node is shown in Figure 2.2. To describe the functioning of the whole system, each part is described individually as an independent unit.

2.4.1 VMTM Node:

Each VMTM node has the following server processes. All server processes are infinite processes, i.e. they run forever, and are shown in Figure 2.2.

• Communication Manager Out (CMOUT).
• Reader (READER).
• Writer (WRITER).
• Communication Manager In (CMIN).
• Host Connector (HOSTC).
• BusInterface Out (BUSOUT).
• BusInterface In (BUSIN).

[Figure 2.2: VMTM node (slave) server processes]

It might seem that there are a lot of processes on each node, and that these processes would starve the user processes. In actuality these processes do minimal jobs, and take very little processor time. To give an idea of the kernel size we give some figures here. The kernel occupies 20.16 Kbytes, the Reader process occupies 1.82 Kbytes, the Writer process occupies 4.24 Kbytes, the CMOUT, CMIN, and HOSTC together occupy 6.75 Kbytes of code, and the BUSOUT and BUSIN together occupy 6.6 Kbytes of code. The stack space allocated to a Writer process is 1 Kbyte, to a Reader process and to HOSTC 0.5 Kbytes each, and to BUSIN, BUSOUT, CMOUT and CMIN 1 Kbyte of stack space each.

Communication Manager Out (CMOUT)

The function of CMOUT is to receive messages from the user process, buffer the message and then queue the buffer onto one of the four queues. There are four queues for messages going out, three of which are associated with the three links used for communication, and a fourth for the pool of messages, where messages which cannot be sent immediately are queued. When the Communication Manager receives a message it puts the message in a buffer, and checks if any of the three links is connected to the destination node. If one of the links is connected to the destination node then the message is placed onto the queue associated with that link. If the message is for a node which is not currently connected to any link then it buffers the message in the common pool of buffers. Note that "Communication Manager receives a message" actually means the Communication Manager receives the address and the length of the message rather than the whole message; this saves execution time and buffer space.

Writer

There are three Writer server processes. Three processes are required to effectively use the DMA capability of message transfer on a transputer. While a message is being transferred on a link, another process can execute on the processor.
Thus there might be a situation where all three Writers are writing onto the links, and some other process is executing on the processor.

Each Writer process checks if its buffer queue is empty; if it is, it schedules itself off. If the queue is not empty it checks the destination node of the messages and makes a ConnectRequest to BusInterface Out, which in turn sends the request to the bus Master. When the request is satisfied, the Writer sends all the messages in the buffer to the destination node. After sending all the messages in its buffer, it checks the common pool of buffers; if there are messages to be sent, then the Writer makes another ConnectRequest. If the common pool of buffers is empty, i.e. there are no more messages to send, it sends a ReleaseRequest to disconnect the link and schedules itself off.

Reader

Like the Writer, there are three Reader processes, one associated with each link. The reason is the same as explained earlier. Each Reader process waits on the link associated with it. Whenever it receives a message it puts the message in a buffer, and queues the buffer onto the queue associated with the user for whom the message is destined.

Communication Manager In (CMIN)

This process handles Receive requests. There can be two types of Receive requests, Synchronous and Asynchronous. In a Synchronous receive the receiver process is blocked until it receives the message. In an Asynchronous receive, even if the receiver process has no message in the buffer, it is not blocked.

When CMIN receives an asynchronous receive request it checks in the buffer associated with the user process; if a message is present, it returns the address and the length of the message. If there is no message it returns a NULL pointer.

When CMIN receives a synchronous receive request it checks in the buffer queue associated with the user; if the message is present it returns the address and the length, else it saves the request. After every n units of time it checks in the buffer whether the request is met; if it is, it sends the user process the message and its length. A sketch of this polling loop is given below.

Note: If the value of n is too small then CMIN would check very frequently and hence waste processor time; if n is too large then a message might arrive at the destination node well before it is received by the receiver. Hence there is a trade-off in the value of n.
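To make the synchronous case concrete, here is a minimal sketch of how CMIN could poll for a saved request every n units of time. The names (pending_t, buffer_lookup, delay_ticks, N_TICKS) are illustrative assumptions, not the kernel's actual identifiers.

    /* Illustrative sketch of CMIN serving a saved synchronous receive. */
    #include <stddef.h>

    typedef struct {
        int user;      /* user process that issued the synchronous receive */
        int from;      /* sender it is waiting for (-1 means any sender)   */
    } pending_t;

    extern char *buffer_lookup(int user, int from, int *length); /* NULL if no message */
    extern void  reply_to_user(int user, char *msg, int length);
    extern void  delay_ticks(int n);              /* wait for n time units */

    #define N_TICKS 16   /* the polling period n: too small wastes CPU,
                            too large delays delivery (see the note above) */

    void cmin_serve_pending(pending_t req)
    {
        int   length;
        char *msg;

        while ((msg = buffer_lookup(req.user, req.from, &length)) == NULL)
            delay_ticks(N_TICKS);      /* re-check every n units of time */

        reply_to_user(req.user, msg, length);
    }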
HostConnector (H.C.)

This process serializes all requests for connection to the host (Sun) for IO. When HostConnector receives a request, it sends the request to BusInterface Out, which in turn sends a request to the bus master.

BusInterface Out (BUSOUT)

Since, at any instant, not more than one process can write onto a link, BusInterface Out funnels writes onto the adapter and ensures that only one request is sent at a time. BUSOUT funnels requests from:

• Communication Manager Out: The communication manager may request to send a broadcast message to the Master. Broadcasting is explained in Section 2.6.

• HostConnector: The Host Connector may send a HostConnect request. When the HostConnect request reaches the host machine via the bus master, it unblocks a blocked IO server process, thus enabling the IO. Details of IO are discussed in a later section.

• Writer: Each Writer may send a ConnectRequest and a ReleaseRequest for the link associated with it.

On receiving any of these messages, BUSOUT sends the message to the Master node. When BUSOUT receives a message from HostConnector, it sends the message to the Master and blocks itself and BUSIN. Note that this is necessary since, while a process is doing IO, there should be no other process reading or writing on the adapter, either from within the transputer or from the VMEbus.

BusInterface In (BUSIN)

BusInterface In directs messages from the bus master to the appropriate server. It may receive the following messages from the Master:

• Response for an earlier ConnectRequest: This response is sent to the appropriate Writer. No responses are expected for ReleaseRequests.

• Broadcast Message: Messages are broadcast via the Master; the message thus received is buffered onto the appropriate user queue.

• Broadcast Acknowledge: A response to a previously sent broadcast; the acknowledgment is sent to CMOUT.

• StartMonitoring: Used for starting the Monitoring process (explained later).

• StopMonitoring: Used for stopping the Monitoring process and sending back the results accumulated during the monitoring period.

A sketch of the kind of request and response records exchanged over the adapter by BUSOUT and BUSIN is given below.
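The following sketch shows one way the slave/master requests and responses listed above could be represented, together with a BUSIN-style dispatch. The enum values, field names and dispatch skeleton are assumptions made for illustration; the thesis does not specify the actual wire format.

    /* Illustrative only: these names and fields are assumptions, not the
     * actual request layout used between the slave kernels and the Master. */

    enum request_kind {
        REQ_CONNECT_LINK,   /* connect a given link to any link of dest_node */
        REQ_RELEASE_LINK,   /* release the resources held by a link          */
        REQ_BROADCAST,      /* broadcast a message via the Master            */
        REQ_CONNECT_HOST,   /* unblock a CIO server process for IO           */
        REQ_ERROR,          /* e.g. message sent to a nonexistent process    */
        REQ_MONITOR         /* monitoring information for the Master         */
    };

    struct slave_request {
        enum request_kind kind;
        int source_node;    /* node that raised the interrupt                */
        int link;           /* link to connect or release                    */
        int dest_node;      /* destination node for REQ_CONNECT_LINK         */
    };

    enum response_kind {
        RSP_CONNECTED,      /* reply to an earlier ConnectRequest            */
        RSP_BROADCAST,      /* broadcast message forwarded by the Master     */
        RSP_BROADCAST_ACK,  /* acknowledgement of a broadcast we sent        */
        RSP_START_MONITORING,
        RSP_STOP_MONITORING
    };

    /* BUSIN-style dispatch of a response read from the adapter. */
    void busin_dispatch(enum response_kind kind)
    {
        switch (kind) {
        case RSP_CONNECTED:        /* hand over to the appropriate Writer    */ break;
        case RSP_BROADCAST:        /* buffer onto the destination user queue */ break;
        case RSP_BROADCAST_ACK:    /* forward the acknowledgement to CMOUT   */ break;
        case RSP_START_MONITORING: /* start the monitoring process           */ break;
        case RSP_STOP_MONITORING:  /* stop monitoring, return the results    */ break;
        }
    }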
2.4.2 BBK Node (Bus Master)

This node is called the Master for two reasons: first, this node can write onto the VMEbus, and second, this node controls the circuit switching of the whole network. Since the circuit switching of the whole network is managed by this node, the performance of the system depends on the performance of the Master node.

The following are the processes on the Master node, which are shown in Figure 2.3.

• Interrupt Handler
• BusInterface
• ReleaseResources
• AllocateResources
• Send Response
• CioOut
• CioIn

[Figure 2.3: Master and IO processes. The Interrupt Handler and BusInterface processes run on the BBK (Master) node; the CIO server processes run on the host machine (Sun).]

Interrupt Handler

This process waits on the event channel of the transputer. When it receives an interrupt on the event channel it fetches the interrupt register on the Interrupt board and checks which transputers have pending requests. It sends a signal to the appropriate BusInterface process to fetch the request.

BusInterface

This process receives requests from the node which created the interrupt on the Interrupt Board. There is one BusInterface process associated with each transputer. This is done to enable transputers to send their requests in parallel. The following are the kinds of request which a BusInterface process can receive.

1. Connect Link: This is a request to connect a particular link of a node to any link on the destination processor. First a check is made whether the source link is connected to any other link. If it is, then the resources (link, port) that it is holding are first released by sending the request to the ReleaseResources process, and then a request is sent for the allocation of resources to the AllocateResources process.

2. Release Link: The resources occupied are released by sending the release request to the ReleaseResources process.

3. Broadcast Message: A broadcast message consists of the process id of the source process, the process ids of the destination processes, the message, and the length of the message. Since only one process is allowed to write onto an adapter, the request is sent to the SendResponse process, whereby it is sent to the nodes where the destination user processes reside.

4. Connect Host: Sends the connect request to CioOut to be sent to the host machine.

5. Error: To indicate any error which might have taken place on the VMTM node, such as a user process sending a message to a nonexistent user process.

6. Monitor Message: The monitor information received from each node is stored in a file. Presently we monitor the number of messages sent and received by each user process, and the length of the messages. We use this information for the allocation of processors to processes. This information gives an accurate measure of the communication cost.

Release Resources and Allocate Resources

Release Resources releases the resources held by a link, whereas Allocate Resources allocates the resources needed to make a connection between the source and destination node. The detailed working of these processes is explained in the pseudo code in Figures 2.4 and 2.5 respectively. After allocating resources, Allocate Resources sends the response to the SendResponse process to be sent to the node which made the request. If the resources (e.g. a link on the destination node, or a port on the board) are not available then the request is queued onto the queue associated with the resource not presently available. When a resource is released the resource queue is dequeued, and if any request is pending due to that resource then the request is satisfied. The Release Resources process is a high-priority process. This is done so that resources are released as soon as release requests arrive, since there may be requests pending on the resources used by these allocations.

Send Response

This process has the responsibility of sending the response to a request back to the node which made the request, of sending broadcast messages, and of sending StartMonitoring and StopMonitoring signals.

CioOut

This process receives from a BusInterface process an adapter number on which a CIO server process has to be started. It sends the adapter number to the MasterCioCtl residing on the host machine via the dual-ported memory registers, as shown in Figure 2.3. It first writes the adapter number on the Output data register, and then makes the Output status register valid. The MasterCioCtl reads the adapter number, starts the corresponding CIO server process by sending a signal to the SlaveCioCtl, and changes the Output status register to invalid. Also, CioOut disables the interrupt generation mode of the adapter. A sketch of this register handshake is given below.
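As an illustration of the register handshake just described, the following sketch shows a plausible shape for the exchange over the dual-ported memory. The register layout, the VALID/INVALID encoding and the function names are assumptions for the example, not the actual addresses or protocol used on the BBK board.

    /* Illustrative sketch of the CioOut / MasterCioCtl handshake. */
    #define STATUS_INVALID 0
    #define STATUS_VALID   1

    struct dual_port_regs {
        volatile int out_status;  /* set VALID by CioOut, cleared by MasterCioCtl */
        volatile int out_data;    /* adapter number to start a CIO process on     */
        volatile int in_status;   /* set VALID by MasterCioCtl, cleared by CioIn  */
        volatile int in_data;     /* adapter number whose IO has completed        */
    };

    /* Transputer side (CioOut): post an adapter number for the host. */
    void cioout_post(struct dual_port_regs *r, int adapter)
    {
        while (r->out_status == STATUS_VALID)
            ;                             /* previous request not yet taken */
        r->out_data   = adapter;          /* write the data first ...       */
        r->out_status = STATUS_VALID;     /* ... then mark it valid         */
    }

    /* Host side (MasterCioCtl): poll for a request, start the CIO process. */
    extern void start_cio_process(int adapter);   /* assumed helper */

    void mastercioctl_poll(struct dual_port_regs *r)
    {
        if (r->out_status == STATUS_VALID) {
            start_cio_process(r->out_data);
            r->out_status = STATUS_INVALID;
        }
    }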
    loop {
        wait for allocation request (sourcenode, link, desnode);
        sourceboard = board on which source node resides;
        desboard = board on which destination node resides;
        if (requested connection already made) {
            Send to SendResponse(sourcenode, link);
        } else {
            if (sourcenode == desnode) {
                /* Just one resource, namely a link on desnode, is required */
                if (a link is free on desnode) {
                    make the connection;
                    Send to SendResponse(sourcenode, link);
                } else
                    queue the request on desnode;
            } else {
                /* Two resources, namely a link on desnode and a port */
                /* connecting desboard and sourceboard, are required  */
                if (sourceboard and desboard are not directly connected) {
                    choose an intermediate board which is on the shortest path
                        between the two boards;
                    choose a node on the above chosen intermediate board;
                    desnode = above chosen node;
                    desboard = board on which the newly chosen node resides;
                }
                if (a port connecting sourceboard and desboard is available) {
                    reserve the port;
                    if (a link is free on desnode) {
                        make the connections;
                        Send to SendResponse(sourcenode, link);
                    } else
                        queue the request on desnode;
                } else
                    queue the request on the port;
            }
        }
    } forever;

Figure 2.4: Allocate Resources

    loop {
        wait for a release request (sourcenode, link, desnode);
        if (the connection request was made from both sides) {
            /* No resources are released because the connection was requested */
            /* from both sides; the resources are released only when both     */
            /* sides make a Release Request                                    */
            continue;
        }
        desboard = board on which destination node resides;
        sourceboard = board on which sourcenode resides;
        if (desboard == sourceboard) {
            /* Only one resource has to be released, namely the link on desnode */
            disconnect the nodes;
            if (the queue associated with the destination node has a request) {
                try to satisfy the request;
                if (request satisfied)
                    Send to SendResponse(dequeuednode, dequeuedlink);
            }
        } else {
            /* Two resources have to be released, namely the link on desnode */
            /* and the port used                                             */
            disconnect the sourcenode and the port on the sourceboard;
            disconnect the desnode and the port on the desboard;
            if (the queue associated with the destination node has a request) {
                try to satisfy the request;
                if (request satisfied)
                    Send to SendResponse(dequeuednode, dequeuedlink);
            }
            if (the queue associated with the port used for the connection has a request) {
                try to satisfy the request;
                if (request satisfied)
                    Send to SendResponse(dequeuednode, dequeuedlink);
            }
        }
    } forever;

Figure 2.5: Release Resources

CioIn

This process receives messages from the MasterCioCtl on the host machine indicating that the IO on a particular adapter has completed. When CioIn receives this message it enables the interrupt capability of the adapter.

2.4.3 Host Machine (Sun)

Generally IO is done by an IO process on the Sun and IO routines on the transputer. A protocol has been developed (by Logical Systems Inc.) between the two, which enables the IO. The IO process on the host machine polls the adapter and, if there is a request, performs the function and sends back the response. This scheme is not sufficient and hence has been modified by us, for two reasons.

• Not all the requests which come through the adapter are for IO. Recall that BusInterface Out uses the adapter to send Connect, Release and other requests. Hence we cannot allow the IO processes to poll the adapter continuously.
• Since there are a lot of transputers in the system, and there is one IO process per transputer node, there would be a lot of IO processes polling the adapters over the VMEbus very frequently, thus making the VMEbus busy and hence decreasing the performance of the system.

We allow an IO process to poll only when one of the processes on the transputer makes a ConnectHost request. To solve the above problems, we designed the IO capability described below. The following are the processes used in the scheme; they are shown in Figure 2.3.

1. Master Cio Controller (MasterCioCtl).
2. Slave Cio Controller (SlaveCioCtl).
3. Cio process (CIO).

Master Cio Controller

This process checks a specified address on the Master node to see if there is a request to start a CIO process. When it receives a request it passes the request to the SlaveCioCtl. It also checks if a SlaveCioCtl has a CIO Completed signal from one of the IO processes; if it has, it informs the CioIn process on the Master node through a specified address on the Master node.

Slave Cio Controller

The Slave Cio Controller passes Start CIO requests from the Master to the CIO process and passes the CIO Completed signal from the CIO process to the Master Cio Controller.

Cio Process

This process is a modification of the CIO supplied by Logical Systems Inc. Each CIO process is blocked on a pipe until it receives a Start CIO request. On receiving the request it starts polling the adapter like an ordinary IO process. When it receives a CIO Completed signal from the process on the VMTM node it sends the signal to the Slave Cio Controller and blocks itself until it receives another Start CIO request. A sketch of this loop is given below.
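The following sketch shows the shape of such a modified CIO server loop on the host: block on a pipe until a Start CIO request arrives, poll the adapter while IO is in progress, then report completion and block again. The message values and the serve_one_adapter_request helper are assumptions for the example, not the actual Logical Systems CIO code.

    /* Illustrative sketch of the modified CIO server process on the host. */
    #include <unistd.h>

    #define START_CIO      1
    #define CIO_COMPLETED  2

    extern int serve_one_adapter_request(int adapter); /* returns CIO_COMPLETED when done */

    void cio_server(int from_slavecioctl, int to_slavecioctl, int adapter)
    {
        char cmd;

        for (;;) {
            /* Block until SlaveCioCtl forwards a Start CIO request. */
            if (read(from_slavecioctl, &cmd, 1) != 1 || cmd != START_CIO)
                continue;

            /* Poll the adapter like an ordinary IO server until the process
             * on the VMTM node signals that its IO is complete.            */
            while (serve_one_adapter_request(adapter) != CIO_COMPLETED)
                ;

            /* Tell SlaveCioCtl (and, through it, MasterCioCtl) that IO is done. */
            cmd = CIO_COMPLETED;
            write(to_slavecioctl, &cmd, 1);
        }
    }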
2.5 Buffers and Deadlocks

In message passing systems, deadlocks [25] like the one shown in Figure 2.6 can occur. Such deadlocks cannot occur in our system because messages go directly to the destination node and hence would ultimately be consumed by a Receive. For message passing between nodes which are on boards not directly connected, a message is first sent to a node on an intermediate board; the message is passed from one board to another until it reaches the destination board. Deadlocks for such cases are solved by a technique similar to the one given in [25].

[Figure 2.6: Deadlock Situation. A cycle of nodes (Node 1 through Node 4) whose message buffers are all full, each waiting on the next.]

There may be potential deadlock situations which are pertinent to schemes where the Writers and Readers are independent processes, i.e. the Reader cannot send back the status of the buffers to the Writer, and hence the state of the buffers on the destination node cannot be passed from the Reader on the destination node to the Writer on the source node. There are two situations in which blocking can occur, but these two situations are avoided by the kernel by sending an error signal to the sender.

1. A process is blocked on a Synchronous Receive (RecFromSy) from a particular user, the input buffer queue is full with messages, and none of the messages are from the user which the receiver process has requested. This certainly is a potential deadlock situation, because the receiver would never receive the message: the buffers are full and will always remain full because the process is blocked. Such situations can be avoided by avoiding RecFromSy and using ReceiveSy. RecFromSy is a synchronous receive from the user specified in the parameters (Section 2.8), whereas ReceiveSy is a synchronous receive of a message from any user; the user id is passed back as a parameter (Section 2.8). We see the occurrence of the above situation as an error on the designer's part and break the deadlock by returning an error signal. The program in such a situation can either use MsgForMe or MsgInfo (Section 2.8) and then wait for the message it was initially waiting for, or use a ReceiveSy rather than a RecFromSy. If ReceiveSy is used instead of RecFromSy then the potential blocking situation cannot occur, because the blocking arises only because the message for the user specified in the parameter cannot enter as all the buffers are full, whereas with ReceiveSy the first message in the buffer would be received, making space for one message. MsgForMe collects all the messages for the user and puts them in the buffer. MsgInfo gives information about the messages in the buffers; the information contains the length of each message and the sender process id.

2. The other potential blocking situation is similar to the previous one. Such a situation can occur between two or more processes, and is depicted in Figure 2.7. Both A and B are sending messages to each other, their In buffers are full, their Out buffers are full, and both keep sending messages to each other. We see this situation as an error on the designer's part, since a process whose input buffers are full should Receive messages rather than Send messages. Such deadlocks would occur in any kind of system, i.e. all processes are sending messages but no one is receiving messages. In such a situation we return an error signal.

[Figure 2.7: A deadlock situation in our scheme. Two processes whose In and Out buffers are all FULL, each still sending to the other.]

To avoid one process monopolising the buffers, either for sending or for receiving messages, we restrict any process from using more than a maximum number of buffers at a time. This maximum has been heuristically calculated as

    MaxBuffer = TotalMaxBuffer/N + TotalMaxBuffer/(N*N) + TotalMaxBuffer/(N*N*N)

where TotalMaxBuffer is the maximum number of buffers allowed, and N is the number of user processes on one node. This prohibits any process from monopolising the buffers, and also avoids wastage of buffers in case one or more processes do not Send or Receive messages. A small sketch of how this quota could be enforced is given below.

Note that the blocking problems occurring in our scheme have a different nature than the deadlocks which may occur in a static store-and-forward network. Detecting deadlocks in such (static store-and-forward) systems is a difficult problem. Although deadlock-avoiding schemes can be used, they do not allow usage of the maximum bandwidth of the system.
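The per-process cap above could be enforced with a simple counter per user process, as in the following sketch. The constants and the counting scheme are assumptions for the example; only the MaxBuffer formula comes from the text.

    /* Illustrative sketch of enforcing the per-process buffer quota. */
    #define TOTAL_MAX_BUFFER 64     /* assumed total number of buffers   */
    #define NUSERS            4     /* N: user processes on one node     */

    /* MaxBuffer = TotalMaxBuffer/N + TotalMaxBuffer/(N*N) + TotalMaxBuffer/(N*N*N) */
    static const int max_buffer =
        TOTAL_MAX_BUFFER / NUSERS +
        TOTAL_MAX_BUFFER / (NUSERS * NUSERS) +
        TOTAL_MAX_BUFFER / (NUSERS * NUSERS * NUSERS);

    static int buffers_in_use[NUSERS];  /* buffers currently held per user process */

    /* Called before a buffer is given to a user process, for an outgoing
     * Send or an incoming message; returns 0 when the quota is exceeded,
     * in which case the kernel reports an error to the sender.           */
    int try_claim_buffer(int user)
    {
        if (buffers_in_use[user] >= max_buffer)
            return 0;
        buffers_in_use[user]++;
        return 1;
    }

    void release_buffer(int user)
    {
        if (buffers_in_use[user] > 0)
            buffers_in_use[user]--;
    }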
2.6 Broadcast Messages

We have two options for broadcasting messages. One is to connect the links of the transputer on which the sender resides to all the nodes on which the destination processes reside and then send the message; the other is to send the message to the bus master, which in turn sends the broadcast message to the appropriate nodes. We chose the second scheme for the following reasons. One is that in the first scheme the time between the message being received at the first node and the message being received at the last node is nondeterministic and would depend on a lot of factors, whereas in the latter scheme all the messages would reach their destination nodes in close proximity. Another reason is that the latter scheme is much more efficient than the former.

2.7 Monitoring

The communication system is developed as part of a real-time distributed operating system being developed here at UBC. Since allocation of processors to processes is an important problem of distributed systems, we have designed a monitoring scheme which monitors the number of messages sent and received by processes, the average message length and the execution time of each process. To enable this monitoring we have a Monitoring process on the bus master which, every n units of time, sends a message to SendResponse to send StartMonitoring to all the nodes. After another unit of time a StopMonitoring message is sent to all the nodes. On receiving the StopMonitoring message each node sends back the monitored information. The bus master collects and stores this information. The monitoring process repeats this procedure a number of times, takes the average of the values and stores it in a file. This information can be used to estimate the communication cost of the system and the interprocess communication cost.

2.8 Communication Primitives

The following are the communication primitives provided by the communication system.

Send Primitives:

int Send(int touser, char *message, int length, char tag);

Receive Primitives:

char *Receive(int *fruser, int *length, char *tag);
char *RecFrom(int fruser, int *length, char *tag);
char *ReceiveSy(int *fruser, int *length, char *tag);
char *RecFromSy(int fruser, int *length, char *tag);
char ReceiveAll(MSGS *msgs);

Miscellaneous:

char MsgForMe(MSGINFO *msginfo);
char MsgFrom(int fruser, MSGINFO *msginfo);

2.9 Performance Results and Conclusions

Performance results and conclusions are presented in the next chapter.

Chapter 3

Performance Analysis For the Communication System

3.1 Objective

Performance analysis is an important issue in the design of a communication system. Our objective while analyzing the performance of our communication system is to test the claims we made in the previous chapter, namely versatility, scalability and efficiency. To test these claims we measured four important parameters for different test environments; comparing these parameters for the different environments gives us some measure of the truthfulness of our claims. The tests were made for an increasing number of transputers and for different lengths of messages. The test program is given in Appendix A. The test program reads from a file the length of message it has to send and how many messages it has to send to each user. After reading the information it sends messages to the user processes and then receives the messages back from the user processes. It finds the mean message traversal time in the test case by dividing the total elapsed time by the number of messages sent, and also divides the result by two to get the message traversal time in one direction (since the measured traversal time consists of the time for a message to go to the destination node and back).

We made the following tests:

1. Measured the average length of the queue at the master controller. There is a queue at the master controller where the requests from the slave nodes are queued before they are processed.
The mean length of this queue for a different number of  38  Chapter 3. Performance Analysis For the Communication System  39  transputers in the system, gives the measure of ability of the communication system to allow increasing number of processors, and gives a measure of the switching overhead. 2. Mean number of requests which are blocked because a resource is not available. An example of a blocked request is one which is made to connect a hnk to any hnk of another processor, but all the links of the other processor are connected to some other processors. In such a case the request is blocked until a hnk becomes free. Such a situation cannot arise in a system with a fixed interconnection network. Hence measuring this quantity gives us some measure of the disadvantage of a dynamically configured interconnection network over a fixed interconnection network. Mean number of blocked requests is estimated by adding the number of blocked requests after every unit of time and then to take the mean we divide it my the number of samples taken. 3. Mean response time of the requests for connection. Mean response time gives the measure of efficiency of the master controller and comparison of mean response time for system with different number of transputers gives the measure of scalability of our communication system. 4. Measure the Mean Message traversal time between the user processes. The message traversal time is the most importantfigureand gives the measure of the efficiency of the system. Comparing the mean message traversal time for system with different number of transputers will give us some measure of scalability of our communication system.  Chapter 3. Performance Analysis For the Communication System  3.2  40  The Tests  The following tests were made: 1. Set 1:  • We measured the above given quantities for different length of messages. The message length varied from 8 Bytes to 2 KBytes. • The above test were repeated for 2, 4, 8, 12, and 24 transputers. Tests for transputers 8 and 12 were made twice each with different pattern of communication. With the first test in each case the communication was restricted to be local (communication on the same board) and in the second case 50 % of the communication done was local and 50 % non-local. In all the tests the made the connections between the boards were made such that messages donot have to hop through the intermediate node. 2. Set 2: To compare the performance of our system with that of a system with a fixed interconnection network we measured the message traversal time for different length of messages and for different number of hops. Note that since we did not develop a complete message passing system for fixed interconnection network our test results do not show the overhead of the kernel which resides on each of the transputer nodes. To measure the average traversal time of our system we used four transputer nodes. The results of the test varies with the load of the VME bus. Since the reconfiguration messages have to the master through the VME bus, the load on the VME bus effects the performance of our system. And since the VME bus is used by a lot of users in our system, load on the VME bus is one single factor which effects the results of our system.  Chapter 3. Performance Analysis For the Communication System  3.3  41  Results  Before we present the results we would like to point out that for the first sets of results the absolute values are not important. 
The reason for this is that while testing, all processes on all the processors Send and Receive messages at their maximum capacity, thus producing an extra load on the master controller. By maximum capacity we mean that the user processes just Send and Receive messages and do not do any other computation. In all practical situations processes Send and Receive messages much more infrequently. We made the tests to see how the parameter values change with different numbers of transputers in the system and for different lengths of messages. Thus the relative performance of the system for a varied number of processors gives us a good measure of the scalability of the system.

The values of the tested parameters are shown in Figures 3.1 to 3.6. Note the bump in Figure 3.1. We believe that the bump is because of an increase in the load on the VME bus. Peaks in Figures 3.3, 3.4 and 3.5 are basically characteristics of a parallel system. Since independent processes interact with each other independently and in a random fashion we get varied results; we are interested in their average values.

Figure 3.1: Mean Traversal time
Figure 3.2: Total Number of Requests
Figure 3.3: Mean Response time
Figure 3.4: Average Queue Length
Figure 3.5: Average Number of Blocked Requests

An important result is shown in Table 3.1 and Figure 3.6. The values shown are the actual values because these tests were made with Send and Receive done at a slower rate. The figure also shows the message traversal time for messages which hop through different numbers of nodes; the number of hops ranges from 1 to 5. The values do not show the overhead due to the kernel which would reside on each node. The actual overhead will include overhead due to allocating memory space, queuing the messages, delay at the queues, delay due to the deadlock avoidance scheme and delay due to retransmission; the figures only show the overhead due to retransmission. Even without all the delays which would be incurred in any practical system, our system will perform better than a system with an average number of hops of 4 and greater and an average message length of 512 bytes and above. In actuality our system will perform better than systems which have a mean number of hops somewhere between 2 and 3. Note that the curves in Figure 3.6 would be shifted upwards if the message delays due to queuing, buffering and retransmission were included.

To give an idea of the number of hops required for various configurations, we give the average number of hops of a few configurations. In a three-dimensional hypercube the average number of hops is 1.714. In a four-dimensional hypercube (16 nodes) the average number of hops is 2.133. In a two-dimensional mesh with 25 nodes the average number of hops is 2.5. Note that with transputers requiring one link for IO, the four-dimensional hypercube and the two-dimensional mesh cannot be formed (both need four links per node; transputers do have four links, but one link is used for IO). In such statically connected networks the average number of hops increases rapidly with an increase in the number of processors.
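The hypercube figures quoted above can be checked with a standard counting argument (this check is added here for illustration and is not a derivation from the thesis): in an n-dimensional hypercube the mean number of hops between distinct nodes is

\bar{h}(n) = \frac{n \, 2^{\,n-1}}{2^{\,n}-1}, \qquad \bar{h}(3) = \frac{12}{7} \approx 1.714, \qquad \bar{h}(4) = \frac{32}{15} \approx 2.133,

which reproduces the two hypercube values given above. (The mesh value depends on the boundary assumptions and is not reproduced here.)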
                                  Message Length (in bytes)
  Number of hops      8      16      32      64     128     256     512    1024    2048
  1                2.16    2.69    4.16    4.84    6.41    5.70    8.21   16.01   33.54
  2                2.04    2.29    2.84    3.73    5.66    9.49   17.18   32.60   63.40
  3                3.05    3.41    4.13    5.58    8.45   14.20   25.70   48.66   94.77
  4                4.14    4.65    5.69    7.80   11.93   20.24   36.85   70.17  136.77
  5                5.17    5.82    7.10    9.65   14.84   25.06   45.62   86.64  176.80
  dynamic         71.00   70.00   73.00   75.00   75.00   76.00   83.00   82.00   84.00

Table 3.1: Message Traversal Time (fixed interconnection network vs. dynamically reconfigured network)

Figure 3.6: Message Traversal Time (fixed interconnection network vs. dynamically reconfigured network)

  Number of Processors    Mean Message Traversal Time for
                          all lengths of messages (msec)
   2                       2.14
   4                       5.04
   8                       7.01
  12                      12.20
  24                      22.50

Table 3.2: Increase in Mean Message Traversal Time with increase in the number of processors

3.4 Discussion of the Results

Though the results tend to vary quite a bit, they show a particular trend. The average message traversal time has been seen to increase slowly with an increase in the number of processors. Except for the bump (due to the increase in the load of the VME bus) the increase in average message traversal time is consistent. From Figure 3.1 (the graph for Mean Traversal Time) we can see that the mean traversal time increases linearly with the increase in the number of transputers, as shown in Table 3.2. To remove the effect of the VME load we removed the bump by flattening the curve for 24 transputers. We can estimate from the above results that the average traversal time will become unacceptable (above 100 msec) when the number of processors reaches into the hundreds. These results are affected by a single external factor, and that is the VME bus load. Our system would perform much better, and would give more consistent results, if the VME bus were isolated from the rest of the users. The peaks in the graphs in Figures 3.3, 3.4 and 3.5 indicate the random nature of the parallel processes. The tests, if repeated in a similar environment (a shared bus), would probably give quite different results. The variation will depend on the speed and load of the bus used, but the trend of increase will remain the same.

3.5 Conclusions

From the first set of tests it can be seen that the mean traversal time increases slowly with an increase in the number of transputers, as shown in Figure 3.1. From Figure 3.3 we can see that the average response time of requests increases slowly with an increase in the number of processors. From Figure 3.4 it is clear that the mean number of requests at the master controller increases slowly with an increase in the number of transputers. From Figure 3.5 we see that very few requests are blocked because of unavailability of resources; this shows that messages would not be blocked for a long time because of the unavailability of resources. Although the performance of our system is good with a small number of processors, the performance would become unacceptable as the number of processors reaches into the hundreds. In such a situation the master will become a bottleneck, and cooperating masters (see Chapter 5) would be required to improve the performance of the system. In a system with a fixed interconnection network the average message traversal time would become unacceptable much earlier.
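Two points read directly from Table 3.1 make the crossover concrete (this is simply arithmetic on the table above and, like the table, it ignores the kernel overhead a practical store-and-forward system would add):

\frac{176.80}{84.00} \approx 2.1 \quad \text{(5 hops, 2048-byte messages: the dynamic scheme is roughly twice as fast)}, \qquad \frac{71.00}{2.16} \approx 33 \quad \text{(1 hop, 8-byte messages: the fixed network is far faster)}.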
From the second set of results Figure 3.6 and Table 3.1 we see that our system will perform better than the system with a fixed interconnection network if the average number of hops increases above a certain threshold. This threshold depends on a lot of factors which include: switching speed of the cross-bars, speed and load on the bus, average length of messages, message distribution etc.  Chapter 4 Module Assignment Problem for Real-Time Systems  4.1 Introduction Although distributed systems provide several advantages such as response time improvement, higher availability, graceful degradation in case of partial failures, a designer of distributed systems faces a lot of difficult problems, which requires careful consideration. We studied one of these problems in Chapter 2. Another problem namely Module Assignment Problem (MAP) is an important issue at the time of development of real-time and non real-time distributed systems. M A P s are computationally intractable, i.e. the computational requirement grows with power of the number of tasks  1  to be assigned.  The load of a distributed system depends on both module execution times, and intermodule communication cost (IMC). If assignment is not done with due consideration, a module assignment can cause computer saturation. Therefore a good assignment should balance the processing load among the processors and generate minimum inter-processor communication (IPC) ( communication between modules not residing on the same processor). Balancing the load requires the modules to be distributed evenly among the processors whereas reducing communication cost requires modules to be assigned to the same processor. Since the two aims are contradictory, a balance between the two has to be achieved to get an optimal or near optimal module assignment. In this Chapter module, process are used interchangeably meaning a sequential piece of code, and task and application task refers to a group of modules represented by its C F G . J  48  Chapter 4. Module Assignment Problem for Real-Time Systems  49  MAP for Real-Time Distributed Processing Systems (RTDPS) faces additional constraints, which makes MAP for RTDPS computationally more intensive than MAP problems for non real-time systems. Since meeting response time constraint is the most important performance measure for RTDPS, meeting the response time constraint is the most important criteria for module assignment. Understanding this we have devised a scheme which assigns processes to processors such that both response time constraints and periodicity constraints are met. If such an assignment is not possible, assignment would fail and error would be generated. Our assignment algorithm does not take into consideration factors such as load balancing. We believe that the most important factor for RTDPS is meeting the deadline constraints and that's what our algorithm accomplishes. 4.2  Related Work  A number of module assignment algorithms have been proposed. These algorithms can be broadly classified into three categories: graph-theoretic, integer programming and heuristic approaches [32, 33, 34, 35, 37, 38, 40, 41, 42, 43, 44, 45]. Graph-theoretic methods become intractable for systems with more than two processors. The graph-theoretic method has several other drawbacks as well, as pointed out by [30]: it does not have a mechanism for representing restrictions on resources, and realtime timing constraints. 
Furthermore, it cannot take into account precedence relations among tasks or take advantage of the fact that all tasks need not be allocated at the same time.

The integer programming formulation of MAP has been considered by Ma et al. [40] and Ma [41]. The objective of their model is to minimize a cost function, which is the cumulative sum of execution times and IPC times. An integer programming problem for MAP is an NP-hard problem [30]. Hence heuristic methods are sought which decrease the computational requirement. One common method of solving integer programming problems is to use a branch-and-bound search technique on a tree whose vertices represent possible search states. The time complexity of this technique in the worst case is O(n^m) if a global optimum is required, where n is the number of processors and m is the number of tasks to be assigned. However, with judicious checking of constraints and judicious choice of the next node to branch to, a tighter bound can be put on the complexity of the technique, but then the technique no longer guarantees an optimal solution. In most practical situations a good local optimum would suffice. Also, optimal solutions are desirable only in situations where design constraints can be measured with a good degree of accuracy; distributed systems to date lack methods to accurately measure IMC [31].

A work very closely related to ours is due to K. K. Leung [28, 29]. He developed a delay model using queuing theory; the delay model helped in estimating the response time of the application tasks, and using the model he developed a scheme to allocate the modules. He made the following assumptions:

1. Tasks are aperiodic.

2. Module invocation arrivals are independent.

3. Module invocation inter-arrival times are exponentially distributed.

Using these assumptions and a well-known queuing model he estimated the delays on each computer. Adding up the communication time, the execution time and the queuing delay he estimated the task response time, and using this he developed a heuristic algorithm to allocate the processors to modules.

We believe that the assumptions he made are not true in most practical situations:

1. Most real-time tasks are periodic.

2. Module invocation arrivals are not independent, since the invocations depend upon the structure of the CFG.

3. Module invocation inter-arrival times are not exponentially distributed, since the module invocations are periodic.

MAP for real-time systems may have one of two different objectives: either to decrease the response time of the application tasks (soft real-time systems), or to make the allocation such that all real-time constraints are met. This is where the difference lies. Leung's objective was to reduce the response time of the application tasks, not to guarantee that the tasks meet their deadlines. Our objective, on the other hand, is that all response time requirements have to be met; allocations are not made if the real-time constraints cannot be met.

4.3 Assumptions and Problem Formulation

We make the following assumptions while discussing our algorithm:

1. All processors are identical. Thus the execution time for each module is the same on all the processors. Our scheme is in no way restricted by this assumption, but it makes the scheme much easier to comprehend.
2. The scheduling discipline is deadline scheduling, which does not allow preemption.

3. The processes are generally periodic, but aperiodic processes can be modeled by using their average invocation period.

The following are the system parameters used in our scheme:

n = total number of processors in the system,
CFG = Control Flow Graph (discussed later),
k = total number of CFGs in the system,
m_k = total number of modules in the kth CFG,
E_ki = execution time of the ith module of the kth CFG,
R_k = response time constraint of the kth CFG,
C_ij = IMC communication cost between modules i and j,
CC_ij = IPC communication cost between modules i and j,
D_lm = distance between processors l and m,
D_k1 = delay due to the presence of other modules of the same CFG,
D_k2 = delay due to the presence of a common module (a common module is a module which is invoked by two or more CFGs),
D_k3 = delay due to the presence of modules of other CFGs,
λ_k = time-period of invocation of the kth CFG,
E_cm = execution time of a common module,
CET = Cumulative Execution Time, the sum of the execution times and the IMC communication costs along any path of a CFG.

Subscripts i and j are used to represent modules, subscript k is used to represent CFGs, and subscripts l and m are used to represent processors. Subscript k will not be used when we are dealing with a case with only one CFG. CC_ij and C_ij are related by CC_ij = C_ij * D_lm, where D_lm is the distance (in units of time) between the processors l and m on which the two modules i and j reside respectively.

Figure 4.1: An example of a Control Flow Graph (CFG 1).

4.4 Incorporating Real-time constraints

An application task in an RTDPS is partitioned into a set of modules. The precedence relationships among the modules can be represented by a task control flow graph (CFG) [28], as shown in Figure 4.1. The task is invoked by an external signal. This invocation can be periodic (in which case the CFG executes periodically) or non-periodic (in which case the CFG executes aperiodically). Periodic invocations are characterized by their time-period, whereas aperiodic invocations are characterized by their mean time-period of invocation. After a module completes its execution, it sends messages to invoke (enable) the succeeding module(s) as indicated by the CFG.

A CFG may be made up of sub-control flow graphs. There are four types of sub-control flow graphs: sequential thread, and-fork to and-join, or-fork to or-join, and loop. The subgraphs are shown in Figure 4.2. A CFG can be any complex structure made out of these subgraphs; in fact these subgraphs can also be comprised of other subgraphs. All arcs starting from an And-fork or an Or-fork need not join together to form an And-join or Or-join (respectively); such an arc should have its own response time constraint. Each CFG should have at least one exit node (port). Each port should have a response time constraint associated with it, called the port time. The port time is the time within which an invoked task should finish its execution, else it is considered a failure. An application may consist of more than one CFG.
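As an illustration only (the thesis does not prescribe a data structure, and every name below is hypothetical), a CFG of the kind just described could be held in memory roughly as follows:

/*
 * Illustrative sketch only: one possible in-memory representation of a
 * CFG module, following the description above.  All names are hypothetical.
 */
#define MAX_CHILDREN 8

typedef enum { SEQ_NODE, AND_NODE, OR_NODE, EXIT_NODE } NodeType;

typedef struct Module {
    int            id;                   /* module number within the CFG          */
    NodeType       type;                 /* sequential, And-fork, Or-fork, exit   */
    double         exec_time;            /* E_ki, execution time                  */
    double         rt;                   /* RT filled in during preprocessing     */
    int            nchildren;            /* number of outgoing arcs               */
    struct Module *child[MAX_CHILDREN];  /* successor modules                     */
    double         imc[MAX_CHILDREN];    /* C_ij for each outgoing arc            */
    double         prob[MAX_CHILDREN];   /* invocation probability (Or-fork arcs) */
} Module;

typedef struct {
    int     nmodules;
    Module *entry;                       /* module invoked by the external signal  */
    double  period;                      /* lambda_k, invocation time-period       */
    double  port_time[MAX_CHILDREN];     /* response time constraint per exit port */
} CFG;

A preprocessing pass (Section 4.7.1) would then fill in rt for each module by walking such a structure backwards from the exit ports.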
Now we discuss how real-time constraints can be formulated as an integer programming problem. Our scheme of expressing constraints is similar to, and derived from, a scheme by B. Dasarathy and M. Feridun [30]. Our expressions contain timing delays due to the presence of modules of the same CFG and also of other CFGs. We will use these constraints in our assignment algorithm. Note that a task refers to a set of modules represented by its CFG.

Figure 4.2: Examples of Subgraphs

Each control flow graph has two types of constraints [30].

1. Deadline or Latency constraint: Each task, after being invoked by an external signal, should respond (by executing all its modules) before or by its response time (also called the port-to-port time). This constraint of completing the task on or before a deadline is called a deadline constraint.

2. Periodicity or bandwidth constraint: A task may be invoked repeatedly; the modules comprising the CFG should be assigned such that none of these invocations is lost or queued for a long time. If we do not enforce this constraint, invocation requests would keep accumulating to the extent that they could no longer be stored by the system.

Now we express how the constraints can be specified. We will start with a simple case and then develop the most general case.

Independent CFGs

An independent CFG is one in which all modules are invoked from within the CFG. The assignment of modules can be done in two ways.

CASE I: All modules are assigned to processors not having any other module assigned to them, i.e. when a module is invoked it is ready to execute. The latency constraint for such a case is given by [30]:

\sum_{i} E_i + \sum_{i,j} CC_{ij} \le R    (4.1)

where

CC_{ij} = C_{ij} D_{lm}    (4.2)

and module i resides on processor l and module j resides on processor m. The periodicity constraint is given by [30]:

\max_i E_i \le \lambda    (4.3)

CASE II: More than one module can be assigned to a single processor. N.B.: in an independent CFG we are implicitly assuming that all modules assigned to a single processor belong to one CFG. The latency constraint for this case is:

\sum_{i} E_i + \sum_{i,j} CC_{ij} + D_1 \le R    (4.4)

where D_1 is the delay due to the presence of more than one module on a processor. This delay arises because the arrival of an invocation does not guarantee immediate execution of the invoked module: another module may be executing at the time of arrival of the invocation request, and in addition there may be modules in the ready list, i.e. modules which were invoked before the last module and have not yet completed their execution. How such a delay is estimated is the topic of the next section. The periodicity constraint for this case is:

\max_l \Big( \sum_{i \in l} E_i + D_1 \Big) \le \lambda    (4.5)

where the inner sum is over all modules i on processor l, i.e. the sum of the execution times of all modules of a CFG which are assigned to any one processor, plus the queuing delay on that processor, should be less than the time-period of invocation.
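To make the Case II constraints concrete, consider a purely illustrative example (the numbers are hypothetical and not taken from the thesis): a three-module CFG with E_1 = 20, E_2 = 10, E_3 = 30, R = 100 and λ = 60 (all in the same time units), with modules 1 and 2 placed on processor 1 and module 3 on processor 2, so that only the arc (2,3) crosses processors with CC_23 = 10, and the shared processor contributes a queuing delay D_1 = 15. Then

(E_1 + E_2 + E_3) + CC_{23} + D_1 = 60 + 10 + 15 = 85 \le R = 100, \qquad \max_l \Big( \sum_{i \in l} E_i + D_1 \Big) = (20 + 10) + 15 = 45 \le \lambda = 60,

so both (4.4) and (4.5) are satisfied and such an assignment would be accepted.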
Dependent CFGs

Dependent CFGs are CFGs which have at least one module in common, i.e. a module which can be invoked either when it receives an invocation from one of its parent CFGs (an Or-join) or when it receives invocations from all parent CFGs (an And-join). There can be many ways in which modules can be assigned to the processors. We will discuss two of them.

CASE I: In this case all processors contain modules from either of the CFGs, but not from both. This implies that the module common to both CFGs should reside alone on a processor (as it is part of both CFGs). The latency constraints for such a case are:

\sum_{i} E_{ki} + \sum_{i,j} CC_{kij} + D_{k1} + D_{k2} \le R_k    (4.6)

D_{k2} denotes the delay incurred by CFG k because of the common module; D_{k1} is the delay incurred because of the presence of more than one module on a processor. The method to estimate these delays will be shown in a later section. The periodicity constraints for such a case are:

\sum_{i} E_{ki} + D_{k1} + D_{k2} \le \lambda_k    (4.7)

and

E_{cm} \le \lambda_k    (4.8)

CASE II: This is the most general case, in which there is no restriction on module assignment. The latency constraint for this case is similar to Eq. (4.6) except that an additional term D_{k3} is added to the left-hand side of the inequality. The additional delay D_{k3} is incurred when a processor has modules from more than one CFG; it is the sum of such delays over all processors on which modules of CFG k reside, and it is zero on processors which have modules belonging to CFG k only. The latency constraints for such a case are:

\sum_{i} E_{ki} + \sum_{i,j} CC_{kij} + D_{k1} + D_{k2} + D_{k3} \le R_k    (4.9)

The periodicity constraints for this case are the same as those of the previous case except for the addition of D_{k3} on the left-hand side of the inequalities:

\sum_{i} E_{ki} + D_{k1} + D_{k2} + D_{k3} \le \lambda_k    (4.10)

and

E_{cm} + D_{k3} \le \lambda_k    (4.11)

We have implicitly made one assumption: that the periodicity constraints are not affected by the communication cost. This is not exactly true, but for all practical purposes it can be assumed to be true.

4.5 The Delay Model

If there is a single module present on a processor, execution of the module immediately follows an invocation request. But if more than one module is present, then immediate execution of the module is not guaranteed; in fact most of the time the start of the execution will be delayed. We call this delay the queuing delay. Invocation requests (customers) wait to be executed on the processor (to be served by the server). The queuing model is as shown in Figure 4.3: there are n different kinds of customers (the different software modules on a processor), each with arrival rate R_i (the rate of invocation of the module), and each kind of customer has a fixed service time S_i (the execution time of the module).

Figure 4.3: The Queuing Model

No known queuing model closely resembles this model, hence it would not be very appropriate to use any of the known queuing models. We have therefore developed a model to estimate the delay of a module; the model is based on simple heuristics and is discussed below.

Before we discuss the model we would like to introduce some terms. Each module has a Response Time (RT) associated with it. This RT indicates the time within which the module has to complete its execution. If it does not, then there is no way the application task of which the module is a part can meet its latency constraint. Hence meeting the RT of a module is a necessary but not sufficient condition for the application task to meet its latency constraint. In other words, the RT of a module is the time within which the module, counted from the invocation of the whole application task, has to complete its execution.
How the RT for each module is estimated will be discussed in the next section. Since the delay of a module depends on the presence of other modules, we will discuss in detail the various possibilities in which a module is assigned to a processor. The following is the complete list of possibilities:

CASE I: A module is the only module on the processor. The queuing delay in this case is obviously zero. There may be other delays, such as context switching delay or delay due to communication; henceforth we will be discussing only the queuing delay.

CASE II: A module is present with other modules, but all the other modules belong to the same CFG. To estimate the delay in such a situation we have to examine the timing model we use. There is a fixed timing relation (approximately) between different modules, i.e. the time at which a module starts its execution can be fairly well estimated. Since the execution time is known, we can estimate the time duration during which the module executes. A delay would be incurred by a module if its execution duration overlaps with that of some other module, and the delay at worst would be equal to the overlap. Consider the execution of a module A as shown in Figure 4.4. The module starts its execution at time t. Let an invocation request for another module B arrive before A completes its execution. B will have to wait until A completes its execution; the delay is clearly equal to the overlap period. Hence the delay is estimated as the overlap of the execution period with other modules of the same CFG.

Figure 4.4: Delay is introduced when execution durations of two modules overlap.

CASE III: A module is present with other modules, at least one of which is from a different CFG. We will work backwards to estimate the delay for such a situation. We start by assuming that our allocation is such that the latency and periodicity constraints are met. If the latency constraint is met for the application task then it follows that the RT requirement of each module is met too. Since the periodicity constraints are met (this we make sure of by not allowing the total execution cost of the modules of the same CFG on a single processor to exceed the TP of invocation of the application task), each process should complete its execution before another invocation for the same module arrives. Hence the queuing delay of a module is given by

QDelay_i \le \lambda - E_i    (4.12)

and the maximum delay is given by

QDelay_{i,max} = \lambda - E_i    (4.13)

While estimating the delay of a module we assume that the module is subjected to the maximum delay; this is true in the case where a processor is fully utilized, i.e. the total utilization of all the modules assigned to the processor is equal to the maximum utilization allowed. For the RT requirement of the module to be met, another obvious condition should hold:

RT - CET \ge TP    (4.14)

which implies that the RT requirement of the module can only be met if the time remaining after the module is invoked, RT − CET, is at least the time period of the task. CET (Cumulative Execution Time) for a module denotes the time at which the module is invoked: it is the time taken by the execution of all the previous modules plus the time taken by communication between all previous module pairs.
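As a purely illustrative check of (4.12)–(4.14) — the numbers are hypothetical, not taken from the thesis — suppose a module with execution time E_i = 30 μs shares a processor in a task whose invocation period is λ = TP = 100 μs, and that preprocessing has given the module RT = 250 μs with CET = 120 μs. Then

QDelay_{i,max} = \lambda - E_i = 100 - 30 = 70 \ \mu s, \qquad RT - CET = 250 - 120 = 130 \ \mu s \ \ge \ TP = 100 \ \mu s,

so 70 μs of worst-case queuing delay has to be budgeted for this module, and condition (4.14) still allows its RT requirement to be met.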
CASE IV: A module is a Common Module (CM). Recall that a CM is a module which is invoked by modules of more than one CFG. A CM and all descendants of a CM are taken to be part of all CFGs of which the module is a CM. A CM can be of two types: OR or AND.

An OR CM: A CM is of type OR if it is enabled for execution by an invocation from any of its parent modules. The maximum delay such a module can be subjected to is given by:

QDelay \le (No.\ of\ Parents - 1) * ExecutionTime    (4.15)

An invocation request may arrive at a time when the CM is already executing and there are other invocation requests for the CM queued waiting to be satisfied. The maximum number of invocation requests that can be present is one less than the total number of parents; hence the maximum delay is the number of invocation requests present multiplied by the execution time of the CM.

An AND CM: A CM is of type AND if it is enabled only when it receives invocation requests from all its parents. The additional delay in such a case is given by:

QDelay \le RT - ExecutionTime    (4.16)

The argument for this delay is similar to that for modules from different CFGs: the delay cannot be more than RT − ExecutionTime, because if it were, no possible assignment would be able to meet the latency constraint.

4.6 The Algorithm

The most important contribution of our algorithm is that it calculates the number of processors required to make the assignment. The number of processors depends not only on the execution cost of the modules but also on the latency and the periodicity constraints; processors are added to make sure that both periodicity and latency constraints are met. We believe that for a fairly large system estimating the number of processors required is not an easy task, and making such an estimate for a real-time system is even more difficult. Also, the throughput of a system need not always increase with the number of processors; in many cases the overall throughput may even decrease. Therefore a good estimate of the processors required is necessary.

Our scheme makes the assignment depending on the communication cost, the structure of the CFG, the maximum processor utilization, and the periodicity and latency constraints. Assignment of a module is made only if all the requirements are met. There is another important difference: almost all algorithms have an objective cost function which they try to minimize; our scheme has no such objective function. We have just one objective, the assignment of modules such that both the periodicity and the latency constraints are met, possibly on a minimum number of processors. Minimizing the number of processors is not an objective function. In schemes in which there is an objective function, a number of heuristic or random solutions are generated and the solution with the best value of the objective is chosen. In our scheme, processes are allocated only if the allocation ensures that the deadlines of the CFGs will be met.

4.7 THE ALGORITHM

1. Read the CFGs, IMCs and interprocessor distances.

2. Check the latency and periodicity constraints for each CFG read. If either constraint cannot be met (Eq. (4.1) and Eq. (4.3)), notify and exit.

3. Preprocess the CFGs (detailed in the next section).

4. For i = 1 to Number of graphs do

5. Allocate the Current module to the Current Processor (CurrP).

6.
Check if allocating of this module, produces a delay on the already allocated modules such that the Response Time (RT) constraint is not met any more. If there is no such module then the assignment is made, go to Step 11. Else go to Step 7. 7. For each processor pePU ( P U is the set of processors on which modules have already been assigned) find the utilization factor left and increase in delay of other modules already present if this module is assigned to the processor.  Select the processor  which has maximum utilization factor left and the total increase in delay of already present modules is minimum. 8. Assign the module to the processor chosen in Step 7 and check if allocation of the process to the processor produces a delay in already present module, beyond their RT. If there is no such module, the module is assigned to the processor and go to Step 11. If there is at least one module whose delay is increased beyond its RT then the assignment is not made. 9. A new processor is required ( no processor already present can accommodate this module). Add a new processor. Assign the module to the new processor. 10. Check the latency constraint of the newly assigned module. If the constraint is met go to Step 11, else notify that the allocation cannot be made and exit.  Chapter 4. Module Assignment Problem for Real-Time Systems  66  11. For j = 1 to Number of Children of the just assigned module: Allocate all children by invoking Step 5. 12. Check if the module is the last module of the graph to be allocated. If it is, check the latency constraint, if met go to next Step, else notify the failure and exit. 13. Check if all graphs are allocated. If allocated exit else go to Step 4. Note that the process of assignment is a recursive process. 4.7.1  Preprocess CFGs  CFGs are processed before they are used for assignment. Preprocessing has four important functions to perform, which influences the assignment. They are: 1. Set the response time requirement of each module: This is done by processing each CFG in a backward direction starting from the exit modules. The RT of the exit module is the RT of the apphcation task, exiting at that module ( an application task can have more than one exit modules and hence more than one RTs). CFG is then traversed in the upward direction and RT of each module is set. To estimate the RT of a module, we use the RT of the application task and subtract from it the execution cost of the modules below it (the module's descendants) and an estimate of communication time. 2. Adjust the execution cost of the modules: Descendants of an OR module, are not invoked each time their parent are invoked i.e. only one of the arc is invoked each time the parent completes its execution. Recall there is a probability of invocation associated with each arc. To accommodate this mutual exclusive behavior of the sibhng modules, we adjust the execution time of each descendant  Chapter 4. Module Assignment Problem for Real-Time Systems  67  of an OR module. The adjustment is made by decreasing the average execution cost by multiplying it with the probability of its invocation. 3.  A r r a n g e the C h i l d r e n :  An OR and an AND module has more than one child.  Which child arc should be allocated first, influences the assignment. We assign the arc in the decreasing order of available time T . T av  av  is the measure of difficulty  for modules of the arc to meet its latency constraint. 
T  av  of meeting the latency  constraint depends on the amount of communication between the two modules and the RT of the module. We estimate the available time (T ) by, av  T aRT hiid av  4.  c  — ExecutionTime hnd + CommunicationCost  A r r a n g e the C F G s :  c  parent,chiid  (4-17)  If the application has more than one CFG, then order in  which the graphs are assigned influences the assignment. Hence we assign CFGs in decreasing order of difficulty with which the latency constraint are met. The difficulty is estimated by subtracting CET of the longest path from the RT of the application task. Surprisingly, this heuristic does not influence the assignment to a great deal except that the modules are assigned to different processors. We believe that there should be CFGs for which this heuristic will make remarkable difference. 4.7.2  A n o t h e r i m p o r t a n t heuristic  As already pointed out that all arcs going out of an OR module are invoked one at a time i.e. they are mutually exclusive. Using this, we try to assign modules which are OR-related to the same processor. This improves the assignment as a whole by using less processors, without much effecting the application task response time.  Chapter 4. Module Assignment Problem for Real-Time Systems  Module No. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15  68  Execution Time (psec) 25.0 22.0 12.0 30.0 112.0 12.0 82.0 82.0 64.0 10.0 18.0 22.0 20.0 8.0 10.0  Table 4.1: Execution Time of modules of C F G l  Two modules are OR-related if they have some common OR-parent.  4.8  S t u d y of t h e A s s i g n m e n t  algorithm  To study the performance of our algorithm we will consider a few applications. The assignment is studied for different response times and invocation times. To estimate the performance of the algorithm, our algorithm also computes the minimum number of processors required. This number is based on the execution cost and does not consider the latency constraint, and the communication cost. To test the performance we also assign the tasks with a very large response time, in order to remove the latency constraint and then compare the number of processors required. Four different applications are considered. The first two applications have one C F G each and are shown in Figure 4.5 and Figure 4.6. The execution times and the IMC are  Figure 4.5: C F G l for Test 1  Chapter 4. Module Assignment Problem for Real-Time Systems  Module From 1 2 2 2 3 5 5 8 9 11 12 13 14  Module To 2 3 4 5 6 8 9 12 13 15 14 14 15  IMC Communication Cost 12.0 12.0 12.0 12.0 10.0 8.0 8.0 8.0 8.0 8.0 8.0 8.0 2.0  Table 4.2: Communication Time of modules of C F G l  Exp. No.  1 2 3 4 5 6 7  Response Time (^.sec) Required Exit 1 Exit 2 82.0 200.0 82.0 200.0 82.0 215.0 82.0 230.0 ' 80.0 218.0 80.0 320.0 600.0 1023.0  Invocation Period (^.sec)  200.0 100.0 100.0 100.0 75.0 75.0 100.0  Table 4.3: Response and Invocation Times for C F G l , Test 1  70  Chapter 4. Module Assignment Problem for Real-Time Systems  Exp. No. 1  CPUl 1 2 5 6 12 14 15  2 3  could 1 3 8  -  4  5  1 2 3 6 9 10 1 2 -  -  6  7  1 2 3 4 6 10 1 2 3 6 9 10  CPU2 3  4  6 7 9 10 11 13 not 5 12 14 15 -  4 5 11 15  5  8 11 12 15  -  Assignment CPU3 CPU4 -  be 3 4 6 9 10 11 13 7 8 12 14 10  8 12 14  -  5 14 -  -  _  _  5 7  4 8 12 13 14 15  -  -  Estimated Response  71  CPUS  Time (LISCC)  -  196.4  Min. 
Proc Required 2  213.4  3  207.4  3  216.4  4  216.4  4  248.4  3  -  allocated  -  -  -  -  -  -  -  -  -  -  3 4 6 10 11 13 15 7 9 13  7 9 13  7  -  -  -  -  -  _  Table 4.4: Assignment Results for CFGl, Test 1  Chapter 4. Module Assignment Problem for Real-Time Systems  72  shown in Table 4.1 Table 4.2 and Table 4.5, Table 4.6 respectively. All times are in //sec.  Chapter 4. Module Assignment Problem for Real-Time Systems  Module No. 1 2 3 4 5 6 7 8 9 10  73  Execution Time (/zsec) 16.0 14.0 10.0 10.0 6.0 4.0 4.0 6.0 10.0 2.0  Table 4.5: Execution Time of modules of CFG2  For the study of the communication system we assume that communication cost between any pair of processor is the same (1 unit delay). The algorithm developed can actually take care of situations where communication cost between processors vary. The assignment is done for different response times and invocation periods and the result are shown in Table 4.4 and Table 4.8 for response time and invocation periods given in Table 4.3 and Table 4.7 respectively. The figures under the columns labeled C P U denotes the module number. If the application tasks consists of more than one CFGs then we use an ordered pair (x, y) where x denotes the module number of the C F G y. The third application consists of two independent application tasks. The CFGs of Figure 4.5 and Figure 4.6 are used. The assignment is done for various combinations of response times and the result of assignment is shown in Table 4.10 for response times and invocation periods as shown in Table 4.9. The fourth application consists of two application tasks with a common module. The application task and the common module is shown in Figure 4.7. Execution costs and I M C communication costs are same as before. The assignment is done for various combination of response times and the result of assignment is shown in Table 4.12 for response times and invocation periods given in  Chapter 4. Module Assignment Problem for Real-Time Systems  Figure 4.6: CFG2 for Test 2  74  Chapter 4. Module Assignment Problem for Real-Time Systems  Module From 1 2 2 3 3 4 5 6 7 8 9  Module To 2 3 4 5 4 10 7 8 9 9 10  IMC Communication Cost 2.0 2.0 2.0 2.0 0.0 8.0 8.0 8.0 8.0 8.0 2.0  Table 4.6: Communication Time of modules of CFG2 Table 4.11.  75  Chapter 4. Module Assignment Problem for Real-Time Systems  Exp. No.  1 2 3 4 5 5  Response Time Required (/xsec) Exit 1 60.0 60.0 60.0 120.0 120.0 20.0  Invocation Period (^jsec)  20.0 100.0 25.0 100.0 20.0 50.0  Table 4.7: Response and Invocation Times for CFG2, Test 2  76  Chapter 4. Module Assignment Problem for Real-Time Systems  Exp. No. 1  CPU1 1 6 -  2  3  4  5  6  1 2 3 4 8 9 10 1 6 9 10 1 2 3 4 5 8 9 10 1 6  Assignment CPU2 CPU3 2 4 3 8 9 10 6 7  -  -  -  -  -  -  -  -  -  -  -  -  2 3 5 7 6 7  8  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  5 7 8 9 10 not  4  -  -  -  -  57.0  1  58.0  3  57.0  1  57.0  1  -  -  -  Min. Proc, Required 3  -  -  2 3  Estimated Response Time (//sec) Exit 2 56.0  -  -  -  could  CPU4 5 7  -  be  allocated  Table 4.8: Assignment Results for CFG2, Test 2  Chapter 4. Module Assignment Problem for Real-Time Systems  Exp. No.  1 2 3 4 5 6  Response Time Required (/i sec) CFG 1 CFG 2 Exit 1 Exit 2 Exit 1 80.0 320.0 60.0 80.0 270.0 60.0 80.0 310.0 60.0 80.0 310.0 60.0 70.0 380.0 70.0 70.0 270.0 70.0  78  Invocation Period (//sec)  75.0 100.0 75.0 75.0 120.0 69.0  30.0 30.0 30.0 20.0 80.0 20.0  Table 4.9: Response and Invocation Times for CFG2, Test 3  Chapter 4. 
Module Assignment Problem for Real-Time Systems  Exp. No. CPUl 1  2  3  4  5  6  (1.1) (2,1) (3,1) (4,1) (6,1)  Assignment CPU4 CPU5  CPU2  CPTJ3  (8,1)  (5,1) (14,1)  (7,1) (9,1) (13,1)  (11.1) (12,1) (16.1)  CPU6  (1,2) (4,2) (6,2) (7,2) (10,2) (2,2) (3,2) (5,2) (7,2) (9,2)  (2,2) (3,2) (5,2) (7,2) (9,2)  (1,1) (2,1) (3,1) (6,1) (9,1) (10,1) (1,1) (2,1) (3,1) (4,1) (6,1) (10,1) (1,1) (2,1) (3,1) (4,1) (6,1) (10,1)  (4,1) (5,1) (11,1) (15,1)  (7,1) (8,1) (12,1) (13,1) (14,1)  (1.2) (4,2) (6,2) (8,2) (10,2)  (8,1) (11,1) (14,1) (15,1)  (5,1) (14,1)  (7,1) (9,1) (13,1)  (1,2) (4,2) (6,2) (8,2) (10,2)  (2,2) (3,2) (5,2) (7,2) (9,2)  (8,1) (11,1) (14,1) (15,1)  (5,1) (14,1)  (7,1) (9,1) (13,1)  (1,2) (5,2)  (2,2) (3,2)  (1.1) (2,1) (3,1) (4,1) (6,1) (8,1) (10,1)  (5,1) (11.1) (12,1) (14,1) (15,1)  (7,1)  (4,2)  (13,1)  (1,2) (2,2) (3,2) (5,2) (6,2) (7,2) (8,2) (9,2) (10,2)  (8,1)  (5,1)  (9,1) (12,1) (13,1) (14,1)  (7,1)  (1,1) (2,1) (3,1) (4,1) (6,1) (10,1)  (11.1) (15,1)  (9.1)  (1,2) (5,2)  CPU7  (6,2) (7,2) (8,2) (9,2) (10,2)  (2,2) (3,2)  CPU8  (4,2)  (6,2) (7,2) (8,2) (9,2) (10,2)  79  Estimated Response Time (/xsec) CFGl CFG2 59.8 54.0 216.4  Min. Proc. Required 6  52.8  207.4  54.0  5  59.8  216.4  54.0  6  59.8  216.4  57.0  7  59.0  191.0  52.0  4  59.8  224.4  57.0  7  Table 4.10: Assignment Results for Test 3  Chapter 4. Module Assignment Problem for Real-Time Systems  Exp. No.  1 2 3 4  Response Time Required (/zsec) CFG 2 CFG 1 CG 70.0 320.0 40.0 70.0 320.0 50.0 70.0 320.0 50.0 100.0 270.0 150.0  80  Invocation Period psec 75.0 70.0 100.0 75.0  20.0 17.0 100.0 40.0  Table 4.11: Response and Invocation Times for C F G l , CFG2 and C G , Test 4  4.9  Discussion  The results obtained shows that the allocation of processes varies a great deal with response time and invocation period. With large invocation periods the actual number of processors used is less than the numer of processors used for smaller invocation periods. This is as expected, and the reason is the tighter periodicity constraints in case of smaller invocation periods. With an increase in the response time constraint, the tasks are allocated such that the estimated task response time is more. This is because relaxing the response time constraint (by giving it a larger value) does not force the algorithm to produce allocation with smaller response time, in such a case it produces a more balanced allocation. Also a point worth noting is that in applications with more than one C F G , it is very rarely seen that the modules of two different CFGs would reside on the same processor. The reason for this may be that CFGs used in the tests were such that they utilize most of the processor's time on the processors they reside on or that allocating the two modules of two different CFGs on the same processor incurs a large delay discouraging them to be allocated together. We also noted that when both response time and periodicity constraints are decreased then allocations could not be made. The reason is because of tighter periodicity constraint  Chapter 4. Module Assignment Problem for Real-Time Systems  c  ^  EXIT 1  ^  R T (Response Time 2)  EXIT 1  Figure 4.7: CFG3 made out of CFGl, CFG2 and CG (Test 4)  81  Chapter 4. 
Module Assignment Problem for Real-Time Systems  Exp No  Assignment CPU4 | CPU5  CPUl  CPU2  CPU3  1  (1,1) (2,1) (1.3) (2,3) (3,3)  (5,1)  (8,1) (12,1) (14,1)  (4,1) (9,1) (13,1) (15,1)  (7,1) (2,2)  (3,2) (5,2) (6,2) (7,2) (9,2) (10,2)  2  (1,1) (2,1) (1,3) (2,3) (3,3) ( ,1)  (5,1)  (8,1) (12,1) (14,1)  (4,1) (9,1) (13,1) (15,1)  (1,2)  (2,2) (10,2)  3  (1,1) (2,1) (1,3) (2,3) (3,3) (8,1)  (5,1) (12,1) (14,1) (15,1)  (4,1) (6,1) (9,1) (13,1)  (1,2) (2,2)  (3,2) (5,2) (6,2) (7,2) (9,2) (10,2)  4  (1,1) (2,1)  (5,1)  (8,1) (12,1) (14,1)  (4,1) (11,1) (12,1) (15,1) (1,3) (2,3) (3,3)  (7,1) (2,2)  CPU6  CPU7  (3,2) (5,2) (7,2) (9,2)  CPU8  (6,2) (8,2)  82  Estimated Response Time (fisec) CFGl CFG2 CG 272.0 63.0 48.8  Min. Proc. Required 7  213.4  58.0  48.8  7  272.0  63.0  48.8  4  272.0  63.0  43.0  7  7  (1,2) (8,2)  (3,2) (5,2) (6,2) (7,2) (9,2) (10,2)  Table 4.12: Assignment Results for C F G l CF2 and C M , Test 4  Chapter 4. Module Assignment Problem for Real-Time Systems  83  the algorithm tries to allocate the modules to different processors thus increasing the response time (allocating the modules to different processors increases IPC) beyond the response time constraint. 4.10  Conclusions  The results obtained show that the assignment algorithm allocates the processes in minimum number of processors or one or two processors more than the minimum number of processors. The number of processors used depends on the tightness of the deadhne. The algorithm in 52.4% of the cases allocated the processes to minimum number of processors, in 42.4% of the cases allocated using one processor more than the minimum number of processors and in 4.2% of the cases used greater than one processor more than the minimum number of processors. Note that the number of processors actually used for assignment depends upon the deadhne constraint, execution times of the processes and the period of invocation of the process. 4.11  Summary  The design objective for RTDPS is to assign modules to processors such that all periodicity constraints are met. A new scheme to estimate the response time of an application task has been introduced. Based on this scheme a heuristic technique has been described which assigns modules to processors, making sure that all periodicity and latency constraints are met. The technique uses simple heuristics to choose an assignment such that all constraints are met. The technique is then validated using a number of assignment problems.  Chapter 5  Conclusions and Future Research Work  5.1  Conclusions  5.1.1  Real-time communication  A new technique has been devised and developed to allow efficient and predictable communication among processes residing on different Transputers. The technique is novel in the sense that the point-to-point network formed by the links of the transputers is dynamically reconfigured. The interconnection network is dynamically reconfigured to enable messages to go directly to the destination node. In a fixed (static) interconnection network, a message may have to hop through the intermediate nodes. We believe that our scheme is better than a store and forward network because of the following reasons: • The messages reach the destination node directly, retransmission from intermediate nodes is avoided thus improving efficiency. • The message in the static network has to pass through intermediate nodes, hence there is an extra overhead involved in buffering the messages at the intermediate nodes. 
• In the static network the number of average hops required by a message increases rapidly with the increase in the number of processors. Thus such a system cannot be expanded very efficiently, whereas in our scheme the average delay in the messages increases much slowly. Hence our system can be expanded to a system with much  84  Chapter 5. Conclusions and Future Research Work  85  larger processors. Another master can be added to the system to improve speed and to allow more processors to be efficiently added to the system. • Store and forward networks are subjected to deadlocks whereas our scheme avoids deadlocks by sending the message directly to the destination node. We believe that detecting deadlocks in a distributed system is a difficult problem. In real-time systems since the messages have to reach the destination in predictable time, deadlocks cannot be allowed in such systems (deadlocks lead to unpredictable delays). There are deadlock avoidance schemes but these do not allow the maximum bandwidth of the system to be used. • Since our communication system is to be used for a real-time system the communication system should have predictable delay for message transfers. This is true for our system. In a store and forward network the delay is unpredictable due to the following reasons: - Load at the intermediate nodes affects the delay of the messages, - The path which the messages takes to reach the destination node affects the time of message travel, - A message delay depends to a large extent on other messages present in the system. Analysis show that our system performs better than the system which with a static interconnection network, and this has been confirmed by experimental results. 5.1.2  Module Assignment  Response time is the most important criterion for real-time systems.  Response time  is greatly effected by the way modules are assigned to the processors hence module  Chapter 5. Conclusions and Future Research Work  86  assignment is an important part of a real-time system. We have developed an analytic technique to measure the delay in a module. The delay in modules is due to: 1.  IPC,  2. presence of other modules of the same CFG (since they contend for the same resources such as processor, I/O etc.), 3. presence of other modules of the some other CFG (since they contend for the same resources such as processor, I/O etc.), 4. presence of common modules ( module common to more than one CFG) The delay in a module is then used to estimate the response time of the application tasks. An heuristic algorithm is developed which assigns modules to processors taking into account the structure of the CFG, the communication cost between processors and the volume of communication between modules. The novel characteristic of our algorithm is that it does not require the number of processors as an input and estimates the minimum number of processors on which the modules can be assigned. We believe that for a complex system, estimating the number of processors required is an equally difficult problem as the assignment problem. Hence our scheme can conveniently be used in such situations in which number of processors required is not known. Our algorithm assigns only if allocation is possible, that is response time of all application tasks are met. Our scheme also takes care of situations in which a module or a CFG is part of more than one CFG. The module assignment algorithm should serve as a valuable tool for designing distributed systems.  Chapter 5. 
5.2  Future Research Work

5.2.1  Real-time communication system

A number of research areas need further investigation. They include the following:

1. A fault-tolerant system: In our system, if the master fails the whole system fails. Given the importance of reliability in real-time systems, the system can be made more reliable by adding a backup (slave) processor which remains dormant in normal operation, periodically collects the status of the network configuration and monitors the health of the master controller. When the backup process detects the failure of the master, it can take over the responsibilities of the master.

2. A faster, more versatile system: If our system is expanded to a very large system, chances are that the master controller would become the bottleneck of the system. To avoid such a situation, more than one cooperating master can be used.

3. A hardware master controller: A chip performing the function of the master controller can be designed. Such a chip would reduce the switching overhead dramatically.

5.3  Assignment Problem

A number of research areas need further investigation. They include the following:

1. Development of a real-time scheduler which would work in close association with the assignment algorithm.

2. Development of a monitoring tool which periodically checks whether the response times of the tasks are met and dynamically reassigns some processes so that the response times are met.

3. Improvement of the assignment algorithm.

Appendix A

The Test Program

/**************************************************************
 *  user.c : user program used for the performance testing
 **************************************************************/

#include <conc.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define USRDIR "/home/fs0/robotics/sanjayc/communication/user/ma"

#include "lib.h"

extern int _process_id;
int MsgsTo[32];
int len;

/*
 * Read the information about the length, and the number of
 * messages to be sent to each user.
 */
int ReadMsgInfo(int IamUser)
{
    FILE *fp;
    int i, j, k, lineno = 0, from = -1, to, NoofMsgs;
    char line[80], directive[10];
    char Started = 0;

    for (i = 0; i < 32; i++)
        MsgsTo[i] = 0;
    if ((fp = fopen(USRDIR, "r")) == NULL) {
        printf("Could not open file\n");
        exit(1);
    }
    fgets(line, 80, fp);                 /* first line gives the message length */
    sscanf(line, "%d", &len);
    while (fgets(line, 80, fp) != NULL) {
        if (line[0] == '#')              /* skip comment lines                  */
            continue;
        if (Started && IamUser != from)  /* finished reading our own entries    */
            return (0);
        sscanf(line, "%s", directive);
        if (!strcmp(directive, "From:")) {
            sscanf(line, "%*s %d", &from);
        } else {
            if (IamUser == from) {
                Started = 1;
                if (sscanf(line, "%d %d", &to, &NoofMsgs) < 2)
                    printf("Error on line %d", lineno);
                MsgsTo[to] = NoofMsgs;   /* messages to send to user 'to' */
            }
        }
        lineno++;
    }
    return (0);
}

main()
{
    int i, j, k;
    int length;
    char *Msg;
    int fromuser;
    unsigned long time1, time2, totaltime;
    char Tag, Message[4096];
    int userno;

    userno = _process_id;
    ConnectHost(userno);
    ReadMsgInfo(_process_id);
    ReleaseHost(userno);
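The file named by USRDIR describes, for each sending user, how many messages are to be sent to every other user. As an illustration only (the actual file used in the experiments is not reproduced in this excerpt, and the '#' comment convention and the numbers below are assumptions), ReadMsgInfo above would accept a description such as:

    64
    # messages originating from user 1
    From: 1
    2 100
    5 50
    # messages originating from user 2
    From: 2
    1 100

The first line gives the message length used for the test; each "From:" directive selects a sending user, and every following pair names a destination user and the number of messages to be sent to it.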
