UBC Theses and Dissertations


A real-time system for multi-transputer systems. Chadha, Sanjay. 1990.


A REAL-TIME SYSTEM FOR MULTI-TRANSPUTER SYSTEMS

By Sanjay Chadha
B.Tech., Indian Institute of Technology, New Delhi

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE STUDIES, ELECTRICAL ENGINEERING

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
April 1990
© Sanjay Chadha, 1990

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Electrical Engineering
The University of British Columbia
Vancouver, Canada
Date: May 11, 1990

Abstract

Two important problems are analysed: a versatile, efficient communication system, and the allocation of processors to processes. An efficient communication system has been developed, in which a central controller, the bus master, dynamically configures the point-to-point network formed by the links of the transputers. An identical kernel resides on each of the nodes. This kernel is responsible for all communications on behalf of the user processes. It makes ConnectLink and ReleaseLink requests to the central controller and, when the connections are made, it sends the messages through the connected link to the destination node.
If a direct connection to the destination node cannot be made, the message is sent to an intermediate node; the message hops through intermediate nodes until it reaches the destination node. The communication system developed provides a low-latency communication facility, and the system can easily be expanded to include a large number of transputers without greatly increasing interprocess communication overhead.

Another problem, the Module Assignment Problem (MAP), is an important issue during the development of distributed systems. MAPs are computationally intractable, i.e. the computational requirement grows with a power of the number of tasks to be assigned. The load of a distributed system depends on both module execution times and inter-module communication (IMC) cost. If assignment is not done with due consideration, a module assignment can cause computer saturation. Therefore a good assignment should balance the processing load among the processors and generate minimum inter-processor communication (IPC), that is, communication between modules not residing on the same processor.

Since meeting the deadline constraint is the most important performance measure for an RTDPS, meeting the response time is the most important criterion for module assignment. With this understanding, we have devised a scheme which assigns processes to processors such that both response time constraints and periodicity constraints are met. If such an assignment is not possible, the assignment fails and an error is generated. Our assignment algorithm does not take into consideration factors such as load balancing. We believe that the most important factor for an RTDPS is meeting the deadline constraints, and that is what our algorithm accomplishes.

Acknowledgement

I would like to thank my supervisor Dr. Ito and my co-supervisor Dr. Lawrence for their pertinent suggestions and continuous encouragement throughout the course of my thesis.
I would also like to thank Real Frenette for his important discussions and for developing the necessary hardware. I would also like to express my thanks to my son Sumit (for not disturbing) and my mother Urmila for encouraging me. I would like to thank Dr. Ito (Acct # 589054) again for the financial support he gave me throughout the course of my thesis.

Table of Contents

Abstract
Acknowledgement
1 Introduction
1.1 Distributed Processing Systems
1.1.1 Loosely-Coupled vs. Tightly-Coupled Systems
1.2 Real-Time Systems
1.3 Real-Time Distributed Processing Systems
1.4 The Challenge of Real-Time (Distributed/Centralized) Systems
1.4.1 Task Partitioning
1.4.2 Specification and Verification
1.4.3 Real-Time Scheduling Theory
1.4.4 Real-Time Operating Systems
1.4.5 Static or Dynamic Resource Allocation Scheme
1.4.6 Real-Time Programming Languages
1.4.7 Real-Time Communications
1.4.8 Real-Time Monitor
1.5 Focus of this Research
1.6 Thesis Overview
2 Efficient Communications Facility For Multi-Transputer Systems
2.1 Introduction
2.2 Related Work
2.3 The Architecture
2.4 System Design
2.4.1 VMTM Node
2.4.2 BBK Node (Bus Master)
2.4.3 Host Machine (Sun)
2.5 Buffers and Deadlocks
2.6 Broadcast Messages
2.7 Monitoring
2.8 Communication Primitives
2.9 Performance Results and Conclusions
3 Performance Analysis For the Communication System
3.1 Objective
3.2 The Tests
3.3 Results
3.4 Discussion of the Results
3.5 Conclusions
4 Module Assignment Problem for Real-Time Systems
4.1 Introduction
4.2 Related Work
4.3 Assumptions and Problem Formulation
4.4 Incorporating Real-Time Constraints
4.5 The Delay Model
4.6 The Algorithm
4.7 The Algorithm
4.7.1 Preprocess CFGs
4.7.2 Another Important Heuristic
4.8 Study of the Assignment Algorithm
4.9 Discussion
4.10 Conclusions
4.11 Summary
5 Conclusions and Future Research Work
5.1
Conclusions
5.1.1 Real-Time Communication
5.1.2 Module Assignment
5.2 Future Research Work
5.2.1 Real-Time Communication System
5.3 Assignment Problem
Appendices
A The Test Program
References

Chapter 1

Introduction

1.1 Distributed Processing Systems

To overcome the limitations of conventional von Neumann architectures, distributed architectures have been developed. The spectrum of distributed processing systems ranges from multiprocessor systems, in which processors share common memory, to multicomputer systems, in which all processors have private memory. In multiprocessor systems processes communicate via shared memory, whereas in multicomputer systems processes communicate via message passing.

1.1.1 Loosely-Coupled vs. Tightly-Coupled Systems

Distributed systems can be classified according to their degree of coupling. The terms loosely-coupled and tightly-coupled have been used to distinguish between systems based on characteristics [2] such as the interprocessor communication bandwidth. A tightly-coupled system is one in which atomic operations spanning more than one processor are efficient, whereas a loosely-coupled system is one in which such atomic operations are less efficient. These systems can also be distinguished on the basis of partial failures. In loosely-coupled systems there is a high possibility of a partial failure of communicating processes, whereas the possibility of a partial failure is very small in a tightly-coupled system. A system consisting of workstations connected together in a LAN or a WAN is an example of a loosely-coupled system. A system which contains processors residing on boards that are connected through a system bus falls into the category of tightly-coupled systems.

1.2 Real-Time Systems

Real-time systems are systems in which the correctness of the system depends not only on the logical result of the computation but also on the time at which the results are produced.
Real-time computing systems play a vital role in our society and cover a wide spectrum of applications. Examples of current real-time computing systems include the control of automobile engines, nuclear power plants, flight control systems, the space shuttle, aircraft control, and robotics. Present real-time systems are expensive to build and their timing constraints are verified with ad-hoc techniques. A minor change in the requirements of a system leads to expensive and time-consuming changes to the system. Millions (even billions) of dollars are being spent by industry to build today's real-time systems. Current brute-force techniques will not scale to meet the requirements of guaranteeing the real-time constraints of next-generation systems [3].

1.3 Real-Time Distributed Processing Systems

Real-time distributed processing systems (RTDPS) are systems solely dedicated to a given application task. An RTDPS consists of a set of processors connected together by an interconnection network, which may be a fully connected network, a store-and-forward network, or a local area network. Communications among processors are provided in the form of messages. Each processor in an RTDPS is autonomous but works in close coordination with the other processors in the system. On each processor resides an identical kernel to manage some important tasks. The main purposes of a real-time operating system are (1) to provide efficient and predictable communication among processes residing on different processors, and (2) to schedule processes such that their deadline constraints are met. Other non-mandatory functions which a real-time distributed operating system may perform are (1) load balancing among all the processors present in the system, and (2) cooperating with other processors in the scheduling of dynamically arriving jobs. In an RTDPS, the application task is partitioned into a set of software modules, or modules.[1]
The application task is repeatedly invoked by external signals. Task response time, or port-to-port (PTP) time, is defined from the time the task is invoked to the completion of the task's execution. During the task's execution,[2] modules need to communicate with other modules via message exchanges. These messages are called intermodule communication (IMC). For a specific application task, the volume of IMC messages among modules is determined by the task partitioning. The overhead for message exchanges on a local computer is usually small and can be assumed to be negligible. However, if IMC is between modules residing on different processors, that is, the message traverses the interconnection network, the message becomes interprocessor communication (IPC). The IPC load thus depends on the assignment of modules to the processors and on the task partitioning. IPC has a significant impact on the system performance and the response time.

1.4 The Challenge of Real-Time (Distributed/Centralized) Systems

An important approach used in managing large-scale systems is to hierarchically decompose the system into smaller, simpler modules that are realizations of the abstract data type model. Although this method allows one to reason about the correctness of computation at each level of abstraction, it has no provisions to support reasoning about time and reliability abstractions, two important aspects of real-time systems. Building a science of large-scale real-time systems requires new research in many distinct but related areas. The following subsections identify the main areas which need research.

[1] In this dissertation, software modules, modules, and processes are used interchangeably to mean a sequential piece of code.
[2] Throughout this dissertation we use task and application task interchangeably to mean a set of tasks related by a Control Flow Graph (CFG) (see Chapter 4).
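The IMC/IPC distinction above can be made concrete with a small sketch: IPC is exactly the IMC between module pairs that an assignment places on different processors. This is an illustration only, not code from the thesis; the module names and message volumes are invented.

```python
# Illustrative sketch (not thesis code): the IMC that crosses the
# interconnection network under a given module-to-processor
# assignment is the IPC load.

def ipc_load(assignment, imc):
    """assignment: {module: processor}; imc: {(m1, m2): message volume}."""
    return sum(vol for (m1, m2), vol in imc.items()
               if assignment[m1] != assignment[m2])

imc = {("A", "B"): 10, ("B", "C"): 4, ("A", "C"): 2}
# Co-locating A and B keeps the heaviest IMC edge off the network.
print(ipc_load({"A": 0, "B": 0, "C": 1}, imc))  # 6
print(ipc_load({"A": 0, "B": 1, "C": 1}, imc))  # 12
```

The same IMC table yields very different IPC loads under different assignments, which is why assignment matters for response time.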
1.4.1 Task Partitioning

The main purpose of distributed processing is to obtain more execution power by distributing the program modules over different processors such that the response time of the application can be improved. Thus an important step in the design of an RTDPS is to partition the application task into a set of smaller tasks, i.e. software modules. These software modules are then assigned to different processors with the help of a module assignment algorithm. There are a few guidelines which are important at the time of task partitioning:

1. Partition the task such that pieces of code which can be executed in parallel are in different modules.

2. Execution times of modules should not differ too much from each other, because similar module sizes often facilitate load balancing. Also, the presence of large modules may delay the response times of other modules if a non-preemptive scheduling scheme is used.

3. Partitioning should be done such that total IMC is minimized. This is done to reduce IPC, which is expensive and time consuming.

1.4.2 Specification and Verification

The fundamental challenge in the specification and verification of real-time systems is how to incorporate the timing constraints. Methods for specification and verification should include ways to represent the timing constraints of the applications and to verify that the constraints are met.

1.4.3 Real-Time Scheduling Theory

Scheduling theory addresses the problem of meeting the specified timing constraints. Satisfying the timing requirements of real-time systems demands scheduling the system's resources according to well-known and well-understood algorithms, so that the timing behavior of the system is understandable, predictable, and maintainable [1, 7, 8, 9, 11, 13]. Scheduling theory is not restricted to the study of real-time systems or even general computer systems.
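As a concrete example of the kind of schedulability test such theory provides, the classic rate-monotonic utilization bound of Liu and Layland can be sketched in a few lines. This is a standard textbook result, not a test used in the thesis; it is included only to illustrate what a fixed-priority schedulability check looks like.

```python
# Illustrative sketch: the rate-monotonic utilization bound.
# n periodic tasks with execution time C_i and period T_i are
# schedulable under fixed-priority rate-monotonic scheduling if
# total utilization <= n * (2**(1/n) - 1).  This is a sufficient
# (not necessary) condition.

def rm_schedulable(tasks):
    """tasks: list of (C, T) pairs; True if the RM bound holds."""
    n = len(tasks)
    u = sum(c / t for c, t in tasks)
    return u <= n * (2 ** (1 / n) - 1)

print(rm_schedulable([(1, 4), (1, 5), (2, 20)]))  # utilization 0.55 -> True
```

Tests like this are cheap to evaluate, which is one reason utilization-based heuristics are attractive when the exact scheduling problem is NP-complete.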
It also arises in the study of manufacturing systems, transportation systems, and process control systems. However, real-time system scheduling problems are different from the scheduling problems found in operations research: the schedulers for real-time systems have different objectives. Real-time computing systems lack an incentive for minimizing the response time beyond meeting the deadlines, and real-time systems are highly dynamic, requiring on-line, adaptive scheduling algorithms. Since the scheduling problems are NP-complete [13], heuristic techniques are required.

1.4.4 Real-Time Operating Systems

One major focal point for developing real-time systems is the real-time operating system. The operating system must provide support for guaranteeing real-time constraints, for fault tolerance, for integrating time constraints with resource allocation, and for scheduling across a spectrum of resource types including sensor processing, communications, CPU, memory, and other forms of I/O. If the system is distributed, another complicated problem is end-to-end timing analysis, i.e. timing constraints apply to a collection of cooperating tasks rather than to an individual task.

1.4.5 Static or Dynamic Resource Allocation Scheme

A highly integrated resource allocation scheme is necessary to adequately address timing constraints, predictability, adaptability, and fault tolerance. A resource allocation scheme may be static, in which case the program modules are known well in advance, or dynamic, in which case program modules may arrive dynamically and have to be allocated to a processor in real time. While making this allocation, the timing constraints of the program modules have to be taken into account [6, 3, 12].

1.4.6 Real-Time Programming Languages

As the complexity of real-time systems increases, high demands will be placed on the programming abstractions provided by languages [4].
The more complex real-time systems will require a programming methodology that gives due consideration to the needs of meeting real-time requirements. The important research issues include: support for constructs which express timing constraints; a programming environment that provides the programmer with primitives to control and keep track of the resource utilization of software modules; and support for distributed programs and fault tolerance.

1.4.7 Real-Time Communications

Communication systems are the backbone of real-time distributed systems. A real-time communication system should be versatile, efficient, and predictable. Predictability is the single most important issue for a real-time communication system. To be successful, the real-time communication system must be able to predictably satisfy individual message-level timing requirements. The timing requirements are driven not only by applications' interprocess communication, but also by time-constrained operating system functions invoked on behalf of application processes. In a non-real-time communication system, verification can be done by verifying the logical correctness of the system; in a real-time system it is also necessary to verify the timing correctness.

1.4.8 Real-Time Monitor

Monitoring and debugging real-time systems is a complicated problem, due to the lack of advanced tools. Conventional debuggers verify the logical correctness of the system or program. However, it is very difficult to detect or fix a timing bug in real-time programs. Tools such as a schedulability analyzer and a real-time monitor/debugger are required to reduce the complexity of real-time software [14].

1.5 Focus of this Research

The main focus of this research lies in the following areas: (1) development of a communication system for real-time systems.
The communication system is developed for a multi-transputer [21] system using a dynamically reconfigurable network; (2) a static resource allocation scheme: a new analytic method to estimate task response time for an RTDPS is introduced; (3) using the model, an algorithm for module assignment is developed.

1.6 Thesis Overview

In Chapter 2 the communication system developed is described, Chapter 3 gives the performance results of our communication system, Chapter 4 discusses the module assignment problem and our algorithm, and finally Chapter 5 presents conclusions and directions for future work.

Chapter 2

Efficient Communications Facility For Multi-Transputer Systems

2.1 Introduction

Multicomputer systems offer many advantages, but to actually exploit them a system designer faces many difficult problems. In this chapter we discuss how we solve one such problem, namely an efficient and versatile communication system for a multi-transputer system. The major objectives we had while designing our system were:

1. The system should be versatile. By versatility we mean that the communication system can be used for various requirements such as image and signal processing, robotics, and matrix manipulation. Our communication system should be such that it can be used by a large variety of users with different communication requirements.

2. The system should be efficient. Since communication overhead is more than just the transmission cost, the overhead of the interprocess communication system can take away many of the advantages of parallelism. Hence due consideration should be given to an efficient communication system.

3. The system should be scalable. Requirements of users may increase with time, or, for a real-time system, a decrease in the deadline of the task may require additional processor(s). The communication system should be such
that the mean communication cost increases slowly with the number of processors.

4. The system should be reliable. Since most parallel systems are used for important and sometimes life-critical applications, reliability is an important characteristic of a communication system.

While developing our system we have given due consideration to the above characteristics. Although we have not implemented reliability mechanisms, we have made efforts to ensure that the communication system can be modified into a reliable communication system. An efficient communication system has been developed, in which a central controller, the bus master, dynamically configures the point-to-point network formed by the links of the transputers.[1] An identical kernel resides on each of the nodes.[2] This kernel is responsible for all communications on behalf of the user processes. It makes ConnectLink and ReleaseLink requests to the central controller and, when the connections are made, sends the messages through the connected link to the destination node. If a direct connection to the destination node cannot be made, the message is sent to an intermediate node; the message hops through intermediate nodes until it reaches the destination node. The communication system developed provides a low-latency communication facility, and the system can easily be expanded to include a large number of transputers without greatly increasing interprocess communication overhead.

[1] The transputer [21] is a microprocessor with four serial full-duplex links. These links can be used to form a point-to-point network.
[2] Node, processor, and transputer are used interchangeably throughout this chapter.
2.2 Related Work

With the rapid growth of microprocessor technology, distributed computer systems have become attractive for today's ever-increasing need for high computational power. Multicomputer systems have many advantages which make them an active area of research. Multicomputer systems today are used in very strategic places such as high-speed aircraft control, robotics control, and image and signal processing. The most important advantages are cost-effectiveness (MIPS/dollar), the possibility of expanding the system as computational demand increases, high availability, and graceful degradation in case of partial failures. Though distributed systems present many advantages, efficiently using a distributed system is a difficult problem and an area of active research. The problems which a systems engineer may face include the design of a general low-latency interconnection network, the design of a parallel algorithm which maps onto the network efficiently, and the allocation of program modules to processing nodes. Since interprocess communication between processes on different nodes is costly, communication overhead in a poorly designed communication system may outweigh the speedup due to parallelism. Realizing the importance of a low-cost general communication system, many researchers have designed efficient communication systems [17, 18, 24]. Our scheme differs from [17] and [18] because in our scheme the control of a dynamic point-to-point network is managed by a central controller, whereas in [17] each individual iWarp cell makes and breaks its own connections, and in [18] the CAB makes and breaks connections on behalf of the sender process residing either on the CAB or on the host machine. In our scheme the central controller makes a direct connection between the source and destination nodes wherever possible, thus reducing the number of hops. Only in much
fewer cases, in which a direct connection is not possible, is the message sent through an intermediate node. In this chapter we discuss a communication system designed for the multi-transputer system at UBC. The communication system is part of a real-time operating system being developed for robotics control. Since each transputer has four links, each transputer can be connected to at most four transputers at a time. Since in a general system a process on a node may send/receive messages to/from processes on a large number of nodes, the upper bound of four on the number of links prohibits the formation of a general communication system. A number of solutions have been proposed to alleviate the restriction imposed by the transputer, as pointed out by Jones and Martin [26], such as message passing through routing, channel multiplexing, and dynamic reconfiguration. In message passing systems, messages hop from the source node to the destination node. To make message passing efficient, new techniques such as wormhole routing [19] and virtual cut-through [20] have been developed. Virtual cut-through avoids unnecessary buffering: when the header of a message packet arrives at an intermediate node whose selected output channel is free, the packet is directed to the next node immediately (hence the name cut-through). Therefore a message packet is buffered only when its selected output channel is busy. In wormhole routing, packets stay inside the network while waiting for the selected output channel to become free. In our technique, in most cases messages are transported directly from the source to the destination node, and in a few cases the messages hop through very few intermediate nodes. In channel multiplexing, one or more virtual channels are multiplexed onto one hardware link. In a dynamic reconfiguration scheme, links are dynamically switched to allow any arbitrary connection.
We have chosen dynamic reconfiguration as the method to alleviate the restriction of the upper bound on the number of links, for the following reasons:

• In message passing through routing, messages may pass through a number of intermediate nodes (a number which increases rapidly with the size of the multicomputer system), thus wasting computation cycles for message buffering and routing on the intermediate nodes.

• Message passing is subject to deadlocks [25], whose avoidance or breaking might be expensive.

• Messages transported through a message-passing system are subject to a non-deterministic delay as a result of factors such as queuing delay, retransmission delay, and buffering delay. This delay depends on many factors, such as the load on the intermediate nodes and the message traffic. Since our communication system is designed for real-time use, we can only allow deterministic delays.

• Automatic channel multiplexing/demultiplexing becomes intractable for even the simplest protocols. Since we are developing an operating system, such a scheme is obviously not useful to us; also, this scheme is practical only in a system with a small number of processors.

• Dynamically reconfigurable systems perform significantly better than static architectures, as experimentally shown by Das and Fay [27].

Our scheme is in some ways similar to the one developed at the University of Manchester [26]. The following are the basic differences:

• In their system, synchronization between a send and a receive is maintained, whereas in our scheme, after sending the message, a sender is free to perform other computations. The message is buffered by the kernel and sent to the destination node.
• In their scheme, a request for a link and a release of a link are made by the process which wants to communicate, whereas in our scheme the kernel makes the connect and release link requests, thus preventing any process from monopolizing a link.

Here is a brief outline of our scheme. There are two kinds of nodes in our system: the ordinary (slave) nodes, on which all the user processes reside, and the master node, which controls the dynamic configuration of the point-to-point network formed by the links of the transputers. Whenever a process on a slave (VMTM) node has to send or receive a message, it makes the request to the kernel residing on that node. The kernel is aware of the connections of its links, i.e. the kernel knows where its links are connected. It checks whether the message is for a process residing on one of the connected nodes; if it is, it sends the message through the link. Otherwise it sends a request to the master node to connect one of the free links to the destination node. On receiving the response from the master, the kernel can assume that the connection is made and sends the message. If a message has to be transferred to a node which cannot be directly connected, the message is sent to an intermediate node. The message hops until it reaches the destination node. The number of hops is small in comparison to a system in which links cannot be dynamically configured, i.e. a system with a statically connected interconnection network. The IO processes running on the host machine are modified versions of CIO supplied by Logical Systems Inc.

Figure 2.1: The Architecture (eight VMTM transputer nodes, a BBK master node, and the host machine, connected through crossbars and link adapters).
2.3 The Architecture

The architecture used to realize the communication system resides in two VMEbus card cages linked together with a VMEbus repeater. The first card cage consists of eight VMTM boards (Parsytec), two BBK boards (Parsytec), and an interrupt board. The processors on the VMTM boards communicate with each other via a point-to-point network formed by the links of the processing elements. Interconnection of this network is controlled by the processor on the BBK board. The architecture is as shown in Figure 2.1.

Transputer: The transputer is a chip manufactured by INMOS [21]. Transputers have four full-duplex hardware serial links, which are used to make connections between transputers. Each hard link has two half-duplex links.

VMTM Board: Each VMTM board [22] consists of four INMOS T800 transputers, each with 1 MByte of local memory. The transputers can be connected to the VMEbus through adapters. Each VMTM board has nine external ports. These ports can be connected together by cables, which enables communication among transputers on different boards. Also on each board is a 32x32 crossbar switch which is used to connect any link of any transputer on the board to any other link, any adapter, or any port on the board. The VMTM node (a transputer on a VMTM board) is a slave node, i.e. it cannot initiate data transfer on the VMEbus. The link adapter, an IMS C012 chip, transforms data received serially from a transputer link into an 8-bit-wide parallel data word that can be read from the VMEbus. This adapter, in our architecture, performs two important functions:

1. Input/Output (IO). IO is done via the adapters. To enable IO for processes residing on the transputers, there are IO server processes on the host machine. A protocol has been established between the IO on the transputer and the IO server process on the host machine which enables the IO.
2. To send requests to the master.

The adapter can be in two modes. One is the interrupt generation mode; in this mode, outputting a byte to the adapter creates an interrupt. The other is the ordinary mode, in which no interrupt is generated. Ordinarily an adapter is in the interrupt generation mode, but when IO is performed the adapter should be in the ordinary mode.

BBK Board: The transputer on the BBK board is the bus master, with 2 MBytes of dual-ported memory which can be accessed asynchronously by the transputer and the VMEbus. This bus master reads the requests pending through the adapters. Communication between a slave processor and the bus master can be initiated either by polling or by interrupt handling. With polling, the communication delay would quickly reach an unacceptable level as the number of nodes in the network increases, hence an interrupt facility has been developed.

Interrupt Board: An interrupt board has been designed and built at UBC. Whenever a VMTM node outputs a byte onto its adapter, it generates an interrupt on the interrupt board. Whenever an interrupt is generated on the interrupt board, the interrupt board generates an event on the event channel of the BBK node. This generation of the interrupt at the interrupt board depends upon whether the adapter is in the interrupt generation mode or not. Whenever a processor wants to connect one of its links to a link of some other transputer, it puts the request on the adapter, which creates an interrupt on the interrupt board, and consequently an event is generated on the event channel. The master then reads the interrupt register on the interrupt board and finds out which node created the interrupt. It then fetches the request and performs the necessary steps to satisfy it. If the request is satisfied, the response is sent back; otherwise the request is queued.
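The grant-or-queue behaviour of the bus master just described can be sketched as a small model. This is a deliberate simplification, not the thesis implementation: the class and field names are invented, a whole node (rather than an individual link or crossbar path) is treated as the contended resource, and the interrupt mechanics are omitted.

```python
# Simplified model (invented names): ConnectLink requests for a busy
# destination are queued and granted later, when a ReleaseLink frees it.
from collections import deque

class BusMaster:
    def __init__(self, nodes):
        self.busy = {n: False for n in nodes}   # destination currently connected?
        self.pending = deque()                  # queued ConnectLink requests
        self.granted = []                       # responses sent back to slaves

    def connect_link(self, src, dst):
        if not self.busy[dst]:
            self.busy[dst] = True
            self.granted.append((src, dst))     # response: connection made
        else:
            self.pending.append((src, dst))     # queue until dst is released

    def release_link(self, src, dst):
        self.busy[dst] = False
        # retry the queued requests now that a resource is free
        for _ in range(len(self.pending)):
            s, d = self.pending.popleft()
            self.connect_link(s, d)

m = BusMaster([0, 1, 2])
m.connect_link(0, 2)   # granted immediately
m.connect_link(1, 2)   # node 2 busy: queued
m.release_link(0, 2)   # frees node 2; the queued request is then granted
print(m.granted)       # [(0, 2), (1, 2)]
```

The key property the sketch captures is that a slave blocks only on the master's response, never on the state of other slaves.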
When the VMTM node receives the response it can assume that its link is now connected to the requested link, and it can send messages to the destination node. Note that the link which is connected to the adapter has two functions to perform: one is to send requests to the bus master, and the other is to perform IO with the IO process residing on the host machine. Care has to be taken that only one function is performed at a time; hence multiplexing on the adapter is required.

2.4 System Design

A user process residing on a slave node can make four kinds of request, namely: Send, Receive, ConnectHost and ReleaseHost.

Send is a request for sending a block of message to a destination user process. The request is made to the CMOUT (Communication Manager Out) process, which buffers the request in one of the Writer queues. The Writer process makes a request, via the BusOut process, to the Master node for connection of its link to the destination node. In the Master node the request is read by the BusInterface process and sent to the AllocateResources process for the making of a connection. When the connection is made, the response is sent to the Writer process via the BusIn process. When the Writer receives the response, it knows that the connection has been made and sends the message. The Reader process on the destination node receives the message and buffers it on the destination process queue.

The Receive request is made by the user process to CMOUT. CMOUT, on receiving the request, checks if there is a message for the user process. If there is a message it sends the message, else it sends a Null pointer.

A ConnectHost request is made to the HC (Host Connector) process, which sends the request to the Master node via the BusOut process. When the Master node's BusInterface reads the request it sends it to the CioOut process. The CioOut process sends the
request to the MasterCioCtl, which unblocks a blocked CIO process. A ReleaseHost request travels from the CIO process to CioIn on the Master via the SlaveCioCtl and MasterCioCtl processes on the host machine.

The process structure on the VMTM node is shown in Figure 2.2. To describe the functioning of the whole system, each part is described individually as an independent unit.

2.4.1 VMTM Node:

Each VMTM node has the following server processes. All server processes are infinite processes, i.e. they run forever, and are shown in Figure 2.2.

• Communication Manager Out (CMOUT).

• Reader (READER).

• Writer (WRITER).

• Communication Manager In (CMIN).

• Host Connector (HOSTC).

• BusInterface Out (BUSOUT).

• BusInterface In (BUSIN).

It might seem that there are a lot of processes on each node, and that these processes would starve the user processes. In actuality these processes do minimal jobs and take very little processor time. To give an idea of the kernel size we give some figures here. The kernel occupies 20.16 Kbytes; the Reader process occupies 1.82 Kbytes; the Writer process occupies 4.24 Kbytes; CMOUT, CMIN and HOSTC together occupy 6.75 Kbytes of code; BUSOUT and BUSIN together occupy 6.6 Kbytes of code. The stack space allocated to a Writer process is 1 Kbyte; a Reader process and HOSTC are allocated 0.5 Kbytes each; and BUSIN, BUSOUT, CMOUT and CMIN are allocated 1 Kbyte of stack space each.

Figure 2.2: VMTM node (slave) server processes

Communication Manager Out (CMOUT)

The function of CMOUT is to receive messages from the user process, buffer the message and then queue the buffer onto one of the four queues.
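CMOUT's queue selection (one queue per communication link, plus a common pool for messages whose destination is not currently connected) can be sketched as follows. The function and array names are our own illustration, not taken from the kernel.

```c
#define NLINKS 3                    /* three links are used for communication */
#define POOL_QUEUE NLINKS           /* index of the common pool of buffers */
#define NOT_CONNECTED (-1)

/* Sketch of CMOUT's rule for choosing an outgoing queue.  link_dest[i]
   holds the node currently connected to link i, or NOT_CONNECTED.  If a
   link is already connected to the message's destination node, the
   buffer goes on that link's queue; otherwise it goes to the pool. */
static int pick_queue(const int link_dest[NLINKS], int dest_node)
{
    for (int i = 0; i < NLINKS; i++)
        if (link_dest[i] == dest_node)
            return i;
    return POOL_QUEUE;
}
```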
There are four queues for outgoing messages: three are associated with the three links used for communication, and the fourth is the pool of messages, where messages which cannot be sent immediately are queued. When the Communication Manager receives a message it puts the message in a buffer and checks if any of the three links is connected to the destination node. If one of the links is connected to the destination node then the message is placed onto the queue associated with that link. If the message is for a node which is not currently connected to any link then it buffers the message in the common pool of buffers.

Note that "Communication Manager receives a message" actually means that the Communication Manager receives the address and the length of the message rather than the whole message; this saves execution time and buffer space.

Writer

There are three Writer server processes. Three processes are required to effectively use the DMA capability of message transfer on a transputer. While a message is being transferred on a link, another process can execute on the processor. Thus there might be a situation where all three Writers are writing onto the links while some other process executes on the processor.

Each Writer process checks if its buffer queue is empty; if it is, it schedules itself off. If the queue is not empty it checks the destination node of the messages and makes a ConnectRequest to BusInterface Out, which in turn sends the request to the bus Master. When the request is satisfied, the Writer sends all the messages in the buffer to the destination node. After sending all the messages in its buffer, it checks the common pool of buffers; if there are messages to be sent, the Writer makes another ConnectRequest. If the common pool of buffers is empty, i.e.
there are no more messages to send, it sends a ReleaseRequest to disconnect the link and schedules itself off.

Reader

Like the Writer, there are three Reader processes, one associated with each link. The reason is the same as explained earlier. Each Reader process waits on the link associated with it. Whenever it receives a message it puts the message in a buffer and queues the buffer onto the queue associated with the user for whom the message is destined.

Communication Manager In (CMIN)

This process handles Receive requests. There can be two types of Receive request, synchronous and asynchronous. In a synchronous receive the receiver process is blocked until it receives the message. In an asynchronous receive, even if the receiver process has no message in the buffer, it is not blocked. When CMIN receives an asynchronous receive request it checks in the buffer associated with the user process; if a message is present, it returns the address and the length of the message. If there is no message it returns a NULL pointer.

When CMIN receives a synchronous receive request it checks in the buffer queue associated with the user; if the message is present it returns the address and the length, else it saves the request. After every n units of time it checks in the buffer whether the request is met; if it is, it sends the user process the message and its length.

Note: If the value of n is too small then CMIN checks very frequently and hence wastes processor time; if n is too large then a message might arrive at the destination node long before it is received by the receiver. Hence there is a trade-off in the value of n.

Host Connector (HOSTC)

This process serializes all requests for connection to the host (Sun) for IO. When the Host Connector receives a request, it sends the request to BusInterface Out, which in turn sends a request to the bus master.
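CMIN's buffer check can be modelled as below. This is a simplified sketch with invented names: integer indices stand in for the real buffer pointers, with -1 playing the role of the NULL pointer that the real CMIN returns when no message is present.

```c
/* Model of CMIN's lookup in a user process's input buffer queue.
   senders[] lists the sender id of each buffered message, in arrival
   order; n is the number of buffered messages. */

/* A receive from any user takes the first buffered message, if any. */
static int cmin_receive_any(int n)
{
    return n > 0 ? 0 : -1;
}

/* A receive-from request looks only for a message from one sender.
   For an asynchronous request, -1 maps to returning NULL; for a
   synchronous one, the request is saved and re-checked every n ticks. */
static int cmin_receive_from(const int senders[], int n, int fruser)
{
    for (int i = 0; i < n; i++)
        if (senders[i] == fruser)
            return i;
    return -1;
}
```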
BusInterface Out (BUSOUT)

Since, at any instant, not more than one process can write onto a link, BusInterface Out funnels writes onto the adapter and ensures that only one request is sent at a time. BUSOUT funnels requests from:

• Communication Manager Out: The communication manager may request to send a broadcast message to the Master. Broadcasting is explained in Section 2.5.

• Host Connector: The Host Connector may send a HostConnect request. When the HostConnect request reaches the host machine via the bus master, it unblocks a blocked IO server process, thus enabling the IO. Details of IO are discussed in a later section.

• Writer: Each Writer may send a ConnectRequest and a ReleaseRequest for the link associated with it.

On receiving any of these messages BUSOUT sends the message to the Master node. When BUSOUT receives a message from the Host Connector, it sends the message to the Master and blocks itself and BUSIN. Note that this is necessary since, while a process is doing IO, no other process should be reading or writing on the adapter, either from within the transputer or from the VMEbus.

BusInterface In (BUSIN)

BusInterface In directs messages from the bus master to the appropriate server. It may receive the following messages from the Master:

• Response to an earlier ConnectRequest: This response is sent to the appropriate Writer. No responses are expected for a ReleaseRequest.

• Broadcast Message: Messages are broadcast via the Master; a message thus received is buffered onto the appropriate user queue.

• Broadcast Acknowledge: A response to a previously sent broadcast; the acknowledgment is sent to CMOUT.

• StartMonitoring: Used for starting the monitoring process (explained later).

• StopMonitoring: Used for stopping the monitoring process and sending back the results accumulated during monitoring.
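BUSIN's dispatching can be pictured as a routing table over the message types listed above. The enum and function names are our own illustration of the described behaviour, not the kernel's identifiers.

```c
/* Illustrative routing table for BUSIN: each message type arriving
   from the Master is forwarded to one destination on the slave node. */
enum msg_type  { CONNECT_RESPONSE, BROADCAST_MSG, BROADCAST_ACK,
                 START_MONITORING, STOP_MONITORING };
enum dest_id   { TO_WRITER, TO_USER_QUEUE, TO_CMOUT, TO_MONITOR };

static enum dest_id busin_route(enum msg_type t)
{
    switch (t) {
    case CONNECT_RESPONSE: return TO_WRITER;      /* wake the waiting Writer */
    case BROADCAST_MSG:    return TO_USER_QUEUE;  /* buffer on the user queue */
    case BROADCAST_ACK:    return TO_CMOUT;       /* acknowledgment to CMOUT */
    default:               return TO_MONITOR;     /* start/stop monitoring */
    }
}
```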
2.4.2 BBK Node (Bus Master)

This node is called the Master for two reasons: first, this node can write onto the VMEbus, and second, this node controls the circuit switching of the whole network. Since circuit switching of the whole network is managed by this node, the performance of the system depends on the performance of the Master node.

The following are the processes on the Master node, which are shown in Figure 2.3.

• Interrupt Handler

• BusInterface

• ReleaseResources

• AllocateResources

• SendResponse

• CioOut

• CioIn

Figure 2.3: Master and IO processes

Interrupt Handler

This process waits on the event channel of the transputer. When it receives the interrupt on the event channel it fetches the interrupt register on the Interrupt board and checks which transputers have pending requests. It sends a signal to the appropriate BusInterface process to fetch the request.

BusInterface

This process receives requests from the node which created the interrupt on the Interrupt Board. There is one BusInterface process associated with each transputer. This is done to enable transputers to send their requests in parallel. The following are the kinds of request which a BusInterface process can receive.

1. Connect Link: This is a request to connect a particular link of a node to any link on the destination processor. First a check is made whether the source link is connected to any other link. If it is, then the resources (link, port) that it is holding are first released by sending the request to the ReleaseResources process, and then a request is sent for allocation of resources to the AllocateResources process.

2.
Release Link: The resources occupied are released by sending the release request to the ReleaseResources process.

3. Broadcast Message: A broadcast message consists of the process id of the source process, the process ids of the destination processes, the message, and the length of the message. Since only one process is allowed to write onto an adapter, the request is sent to the SendResponse process, whereby it is sent to the nodes where the destination user processes reside.

4. Connect Host: Sends the connect request to CioOut to be sent to the host machine.

5. Error: To indicate any error which might have taken place on the VMTM node; for example, a user process may send a message to a nonexistent user process.

6. Monitor Message: The monitor information received from each node is stored in a file. Presently we monitor the number of messages sent and received by each user process, and the length of messages. We use this information for allocation of processors to processes. This information gives an accurate measure of the communication cost.

ReleaseResources and AllocateResources

ReleaseResources releases the resources held by a link, whereas AllocateResources allocates the resources needed to make a connection between the source and destination node. The detailed working of the processes is explained in the pseudocode in Figures 2.4 and 2.5 respectively. After allocating resources, AllocateResources sends the response to the SendResponse process to be sent to the node which made the request. If the resources (e.g. a link on the destination node, or a port on the board) are not available then the request is queued onto the queue associated with the resource not presently available. When a resource is released, the resource queue is dequeued, and if any request is pending on that resource then the request is satisfied. The ReleaseResources process is a high priority process.
This is done so that resources are released as soon as possible, since requests may be pending on the resources held by these allocations.

SendResponse

This process has the responsibility of sending the response to a request back to the node which made the request, of sending broadcast messages, and of sending StartMonitoring and StopMonitoring signals.

CioOut

This process receives from the BusInterface process an adapter number on which a CIO server process has to be started. It sends the adapter number to the MasterCioCtl residing on the host machine via the dual-ported memory registers, as shown in Figure 2.3. It first writes the adapter number on the Output data register, and then makes the Output status register valid. The MasterCioCtl reads the adapter number, starts the corresponding CIO server process by sending a signal to the slave, and changes the Output status register to invalid. Also, CioOut disables the interrupt generation mode of the adapter.

    loop {
        wait for allocation request (sourcenode, link, desnode);
        sourceboard = board on which sourcenode resides;
        desboard = board on which desnode resides;
        if (requested connection already made) {
            Send to SendResponse(sourcenode, link);
        } else if (sourceboard == desboard) {
            /* Just one resource, namely a link on desnode, is required */
            if (a link is free on desnode) {
                make the connection;
                Send to SendResponse(sourcenode, link);
            } else
                queue the request on desnode;
        } else {
            /* Two resources are required, namely a link on desnode and
               a port connecting desboard and sourceboard */
            if (sourceboard and desboard are not directly connected) {
                choose an intermediate board on the shortest path
                    between the two boards;
                choose a node on the chosen intermediate board;
                desnode = the chosen node;
                desboard = board on which the chosen node resides;
            }
            if (a port connecting sourceboard and desboard is available) {
                reserve the port;
                if (a link is free on desnode) {
                    make the connections;
                    Send to SendResponse(sourcenode, link);
                } else
                    queue the request on desnode;
            } else
                queue the request on the port;
        }
    } forever;

Figure 2.4: Allocate Resources

    loop {
        wait for a release request (sourcenode, link, desnode);
        if (the connection request was made from both sides) {
            /* No resources are released: because the connection was
               requested from both sides, the resources are released
               only when both sides make a ReleaseRequest */
            continue;
        }
        desboard = board on which desnode resides;
        sourceboard = board on which sourcenode resides;
        if (desboard == sourceboard) {
            /* Only one resource has to be released, namely the link
               on desnode */
            disconnect the nodes;
            if (the queue associated with the destination node has a request) {
                try to satisfy the request;
                if (request satisfied)
                    Send to SendResponse(dequeuednode, dequeuedlink);
            }
        } else {
            /* Two resources have to be released, namely the link on
               desnode and the port used */
            disconnect the sourcenode and the port on the sourceboard;
            disconnect the desnode and the port on the desboard;
            if (the queue associated with the destination node has a request) {
                try to satisfy the request;
                if (request satisfied)
                    Send to SendResponse(dequeuednode, dequeuedlink);
            }
            if (the queue associated with the port used has a request) {
                try to satisfy the request;
                if (request satisfied)
                    Send to SendResponse(dequeuednode, dequeuedlink);
            }
        }
    } forever;

Figure 2.5: Release Resources

CioIn

This process receives messages from the MasterCioCtl on the host machine indicating that IO on a particular adapter has completed. When CioIn receives this message it enables the interrupt capability of the adapter.

2.4.3 Host Machine (Sun)

Generally IO is done by an IO process on the Sun and IO routines on the transputer. A protocol has been developed (by Logical Systems Inc.)
between the two, which enables the IO. The IO process on the host machine polls the adapter and, if there is a request, performs the function and sends back the response. This scheme is not sufficient and hence has been modified by us, for two reasons.

• Not all the requests which come through the adapter are for IO. Recall that BusInterface Out uses the adapter to send Connect, Release and other requests. Hence we cannot allow the IO processes to poll the adapter continuously.

• Since there are a lot of transputers in the system, and there is one IO process per transputer node, there would be a lot of IO processes polling the adapters and using the VMEbus very frequently, thus making the VMEbus busy and hence decreasing the performance of the system. We allow an IO process to poll only when one of the processes on the transputer makes a ConnectHost request.

To solve the above problems, we designed the IO capability described below. The following are the processes used in the scheme; they are shown in Figure 2.3.

1. Master Cio Controller (MasterCioCtl).

2. Slave Cio Controller (SlaveCioCtl).

3. Cio process (CIO).

Master Cio Controller

This process checks a specified address on the Master node to see if there is a request to start a CIO process. When it receives a request it passes the request to the SlaveCioCtl. Also it checks if a SlaveCioCtl has a CIO Completed signal from one of the IO processes; if it has, it informs the CioIn process on the Master node through a specified address on the Master node.

Slave Cio Controller

The Slave Cio Controller passes the Start CIO request from the Master to the CIO process and passes the CIO Completed signal from the CIO process to the Master Cio Controller.

Cio Process

This process is a modification of the CIO supplied by Logical Systems Inc. Each CIO process is blocked on a pipe until it receives a Start CIO request.
On receiving the request it starts polling the adapter like an ordinary IO process. When it receives a CIO Completed signal from the process on the VMTM node it sends the signal to the Slave Cio Controller and blocks itself until it receives another Start CIO request.

2.5 Buffers and Deadlocks

In message passing systems, deadlocks [25] like the one shown in Figure 2.6 can occur. Such deadlocks cannot occur in our system because messages go directly to the destination node and hence will ultimately be consumed by a Receive. For message passing between nodes which are on boards not directly connected, a message is first sent to a node on an intermediate board; the message is passed from one board to another until it reaches the destination board. Deadlocks for such cases are solved by a technique similar to the one given in [25].

Figure 2.6: Deadlock situation

There may be potential deadlock situations which are pertinent to schemes where the Writers and Readers are independent processes, i.e. the Reader cannot send back the status of the buffers to the Writer. Hence the state of the buffers in the destination node cannot be passed from the Reader on the destination node to the Writer on the source node. There are two situations in which blocking can occur, but these two situations are avoided by the kernel by sending an error signal to the sender.

1. A process is blocked on a synchronous receive (RecFromSy) from a particular user, the input buffer queue is full with messages, and none of the messages is from the user which the receiver process has requested. This certainly is a potential deadlock situation because the receiver would never receive the message: the buffers are full, and the buffers will always remain full because the process is blocked. Such situations can be avoided by avoiding RecFromSy and using ReceiveSy.
RecFromSy is a synchronous receive request from the user specified in the parameters (Section 3.8), whereas ReceiveSy is a synchronous receive request from any user; the user id is passed back as a parameter (Section 3.8). We see the occurrence of the above situation as an error on the designer's part and break the deadlock by returning an error signal. The program in such a situation can either use MsgForMe or MsgInfo (Section 2.7) and then wait for the message it was initially waiting for, or use a ReceiveSy rather than a RecFromSy. If ReceiveSy is used instead of RecFromSy then the potential blocking situation cannot occur: the blocking arises only because the message for the user specified in the parameter cannot enter while all the buffers are full, but with ReceiveSy the first message in the buffer is received, making space for one message. MsgForMe collects all the messages for the user and puts them in the buffer. MsgInfo gives information about the messages in the buffers; the information contains the length of each message and the sender process id.

2. The other potential blocking situation is similar to the previous one. Such a situation can occur between two or more processes, and is depicted in Figure 2.7. Both A and B are sending messages to each other, their input buffers are full and their output buffers are full. We see this situation as an error on the designer's part, since a process whose input buffers are full should Receive messages rather than Send messages. Such deadlocks would occur in any kind of system in which all processes are sending messages but no one is receiving messages. In such a situation we return an error signal.
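The kernel's reaction to situation 2 can be sketched as a simple guard; the name and signature are ours, purely to illustrate the rule that a Send is refused with an error, rather than blocked, when both buffer sets are exhausted.

```c
#define ERR_POTENTIAL_DEADLOCK (-1)

/* Illustrative guard for blocking situation 2: when a process whose
   input buffers are full also fills its output buffers with a Send,
   the kernel returns an error signal instead of blocking, since the
   process should be Receiving, not Sending. */
static int check_send(int in_buffers_full, int out_buffers_full)
{
    if (in_buffers_full && out_buffers_full)
        return ERR_POTENTIAL_DEADLOCK;
    return 0;   /* the Send may proceed */
}
```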
To prevent one process from monopolising the buffers, either for sending or for receiving messages, we restrict any process from using more than a maximum number of buffers at a time. This maximum has been heuristically calculated as

    MaxBuffer = TotalMaxBuffer/N + TotalMaxBuffer/(N*N) + TotalMaxBuffer/(N*N*N)

where TotalMaxBuffer is the maximum number of buffers allowed, and N is the number of user processes on one node.

Figure 2.7: A deadlock situation in our scheme

This prohibits any process from monopolising the buffers, and also avoids wastage of buffers in case one or more processes do not Send or Receive messages.

Note that the blocking problems occurring in our scheme are different in nature from the deadlocks which may occur in a static store-and-forward network. Detecting deadlocks in such (static store-and-forward) systems is a difficult problem. Deadlock-avoidance schemes can be used, but they do not allow usage of the maximum bandwidth of the system.

2.6 Broadcast Messages

We have two options for broadcasting messages. One is to connect the links of the transputer on which the sender resides to all the nodes on which the destination processes reside and then send the message; the other is to send the message to the bus master, which in turn sends the broadcast message to the appropriate nodes. We chose the second scheme for the following reasons. One is that in the first scheme the time between the message being received at the first node and at the last node is nondeterministic and depends on a lot of factors, whereas in the latter scheme all the messages reach their destination nodes in close proximity. Another reason is that the latter scheme is much more efficient than the former.
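The per-process buffer quota of Section 2.5 can be checked numerically. We read the thesis formula, with integer arithmetic, as MaxBuffer = T/N + T/(N*N) + T/(N*N*N), where T is TotalMaxBuffer and N the number of user processes on the node; the function name is ours.

```c
/* One reading of the heuristic quota: each process gets roughly a
   geometric share of the total buffer count, so that even with every
   process at its quota some buffers remain free for others. */
static int max_buffers(int total, int n)
{
    return total / n + total / (n * n) + total / (n * n * n);
}
```

For example, with 64 buffers and 4 user processes per node this yields 16 + 4 + 1 = 21 buffers per process, well below the 32 a naive total/2 split would allow.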
2.7 Monitoring

The communication system is developed as part of a real-time distributed operating system being developed here at UBC. Since allocation of processors to processes is an important problem in distributed systems, we have designed a monitoring scheme which monitors the number of messages sent and received by processes, the average message length, and the execution time of each process. To enable this monitoring we have a Monitoring process on the bus master which, every n units of time, sends a message to SendResponse to send StartMonitoring to all the nodes. After another unit of time a StopMonitoring message is sent to all the nodes. On receiving the StopMonitoring message each node sends back the monitored information. The bus master collects and stores this information. The Monitoring process repeats this a number of times, takes the average of the values, and stores it in a file. This information can be used to estimate the communication cost of the system and the interprocess communication cost.

2.8 Communication Primitives

The following are the communication primitives provided by the communication system.

Send primitives:

int Send(int touser, char *message, int length, char tag);

Receive primitives:

char *Receive(int *fruser, int *length, char *tag);
char *RecFrom(int fruser, int *length, char *tag);
char *ReceiveSy(int *fruser, int *length, char *tag);
char *RecFromSy(int fruser, int *length, char *tag);
char ReceiveAll(MSGS *msgs);

Miscellaneous:

char MsgForMe(MSGINFO *msginfo);
char MsgFrom(int fruser, MSGINFO *msginfo);

2.9 Performance Results and Conclusions

Performance results and conclusions are presented in the next chapter.

Chapter 3

Performance Analysis For the Communication System

3.1 Objective

Performance analysis is an important issue in the design of a communication system.
Our objective in analyzing the performance of our communication system is to test the claims we made in the previous chapter, namely versatility, scalability, and efficiency. To test these claims we measured four important parameters in different test environments; comparing the parameters across environments gives us some measure of the truthfulness of our claims. The tests were made for an increasing number of transputers and for different lengths of messages. The test program is given in Appendix A. The test program reads from a file the length of message it has to send and how many messages it has to send to each user. After reading the information it sends messages to the user processes and then receives the messages back from them. It finds the mean message traversal time by dividing the total time by the number of messages sent, and also divides the result by two to get the message traversal time in one direction (since the measured time consists of the time for a message to go to the destination node and back).

We made the following tests:

1. Measured the average length of the queue at the master controller. There is a queue at the master controller where the requests from the slave nodes are queued before they are processed. The mean length of this queue, for different numbers of transputers in the system, gives a measure of the ability of the communication system to accommodate an increasing number of processors, and a measure of the switching overhead.

2. Mean number of requests which are blocked because a resource is not available. An example of a blocked request is one made to connect a link to any link of another processor when all the links of the other processor are already connected to other processors. In such a case the request is blocked until a link becomes free. Such a situation cannot arise in a system with a fixed interconnection network.
Hence measuring this quantity gives us some measure of the disadvantage of a dynamically configured interconnection network relative to a fixed interconnection network. The mean number of blocked requests is estimated by adding the number of blocked requests after every unit of time and dividing by the number of samples taken.

3. Mean response time of the requests for connection. The mean response time gives a measure of the efficiency of the master controller, and comparison of the mean response time for systems with different numbers of transputers gives a measure of the scalability of our communication system.

4. Mean message traversal time between the user processes. The message traversal time is the most important figure and gives a measure of the efficiency of the system. Comparing the mean message traversal time for systems with different numbers of transputers gives us some measure of the scalability of our communication system.

3.2 The Tests

The following tests were made:

1. Set 1:

• We measured the above quantities for different lengths of messages. The message length varied from 8 bytes to 2 KBytes.

• The above tests were repeated for 2, 4, 8, 12, and 24 transputers. The tests for 8 and 12 transputers were made twice each, with different patterns of communication. In the first test in each case the communication was restricted to be local (communication on the same board); in the second case 50% of the communication was local and 50% non-local. In all the tests the connections between the boards were made such that messages do not have to hop through an intermediate node.

2. Set 2: To compare the performance of our system with that of a system with a fixed interconnection network, we measured the message traversal time for different lengths of messages and for different numbers of hops.
Note that since we did not develop a complete message passing system for the fixed interconnection network, those test results do not show the overhead of the kernel which resides on each of the transputer nodes. To measure the average traversal time of our system we used four transputer nodes. The results of the tests vary with the load on the VME bus: since the reconfiguration messages have to reach the master through the VME bus, the load on the VME bus affects the performance of our system. And since the VME bus is used by a lot of users, the load on the VME bus is the single factor which most affects our results.

3.3 Results

Before we present the results we would like to point out that for the first set of results the absolute values are not important. The reason for this is that during testing, all processes on all the processors Send and Receive messages at their maximum capacity, thus producing an extra load on the master controller. By maximum capacity we mean that the user processes just Send and Receive messages and do not do any other computation. In all practical situations processes Send and Receive messages much more infrequently. We made the tests to see how the parameter values change with different numbers of transputers in the system and with different lengths of messages. Thus the relative performance of the system for a varied number of processors gives us a good measure of the scalability of the system.

The values of the tested parameters are shown in Figures 3.1 to 3.6. Note the bump in Figure 3.1; we believe that the bump is due to an increase in the load on the VME bus. The peaks in Figures 3.3, 3.4 and 3.5 are basically characteristic of a parallel system: since independent processes interact with each other independently and in a random fashion, we get varied results. We are interested in their average values.
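For concreteness, the mean-traversal computation performed by the test program (Section 3.1) reduces to the following; the function name is ours.

```c
/* The test program's computation as described: the total elapsed time
   for a batch of round trips, divided by the number of messages and
   again by two, since each measured trip covers both directions. */
static double mean_one_way(double total_time, int n_messages)
{
    return total_time / n_messages / 2.0;
}
```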
An important result is shown in Table 3.1 and Figure 3.6. The values shown are actual values, because these tests were made with Send and Receive done at a slower rate. The figure also shows the message traversal time for messages which hop through different numbers of nodes, from 1 to 5 hops. The values do not show the overhead due to the kernel which would reside on each node. The actual overhead would include the overhead of allocating memory space, queuing the messages, delay at the queues, delay due to the deadlock avoidance scheme, and delay due to retransmission; the figures show only the overhead due to retransmission. Even without all the delays which would be incurred in any practical system, our system performs better than a fixed network with an average number of hops of 4 and greater and an average message length of 512 bytes and above. Once those delays are included, our system will in fact perform better than systems whose mean number of hops is somewhere between 2 and 3. Note that the curves in Figure 3.6 would be shifted upwards if the message delays due to queuing, buffering and retransmission were included. To give an idea of the number of hops required for various configurations, we give the average number of hops for a few of them. In a three-dimensional hypercube the average number of hops is 1.714.

[Figure 3.1: Mean Traversal Time]
[Figure 3.2: Total Number of Requests]
[Figure 3.3: Mean Response Time]
[Figure 3.4: Average Queue Length]
[Figure 3.5: Average Number of Blocked Requests]
In a four-dimensional hypercube (16 nodes) the average number of hops is 2.133. In a two-dimensional mesh with 25 nodes the average number of hops is 2.5. Note that with transputers requiring IO, a four-dimensional hypercube and a two-dimensional mesh cannot be formed (both need four links per node; transputers do have four links, but one link is used for IO). In such statically connected networks the average number of hops increases rapidly with an increase in the number of processors.

Table 3.1: Message Traversal Time (fixed interconnection network vs. dynamically reconfigured network)

Number     Message Length (in bytes)
of hops       8     16     32     64    128    256    512   1024   2048
1          2.16   2.69   4.16   4.84   6.41   5.70   8.21  16.01  33.54
2          2.04   2.29   2.84   3.73   5.66   9.49  17.18  32.60  63.40
3          3.05   3.41   4.13   5.58   8.45  14.20  25.70  48.66  94.77
4          4.14   4.65   5.69   7.80  11.93  20.24  36.85  70.17 136.77
5          5.17   5.82   7.10   9.65  14.84  25.06  45.62  86.64 176.80
dynamic   71.00  70.00  73.00  75.00  75.00  76.00  83.00  82.00  84.00

[Figure 3.6: Message Traversal Time (fixed interconnection network vs. dynamically reconfigured network)]

Table 3.2: Increase in Mean Message Traversal Time with the number of processors

Number of Processors   Mean Message Traversal Time over all message lengths (msec)
2                      2.14
4                      5.04
8                      7.01
12                     12.20
24                     22.50

3.4 Discussion of the Results

Though the results vary quite a bit, they show a clear trend: the average message traversal time is seen to increase slowly with an increase in the number of processors. Except for the bump (due to the increase in VME bus load) the increase in average message traversal time is consistent. From Figure 3.1 (Mean Traversal Time) we can see that the mean traversal time increases roughly linearly with the number of transputers, as shown in Table 3.2.
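The hop counts quoted above can be reproduced by brute force over all node pairs. The sketch below (ours, not the thesis's) does so; note that the 2.5 figure for a 25-node mesh matches a mesh with wraparound (torus) links, which is our assumption here:

```python
from itertools import product

def hypercube_mean_hops(dim):
    # Mean Hamming distance between distinct nodes of a dim-cube.
    nodes = list(product([0, 1], repeat=dim))
    dists = [sum(a != b for a, b in zip(u, v))
             for u in nodes for v in nodes if u != v]
    return sum(dists) / len(dists)

def torus_mean_hops(m):
    # Mean hop count between distinct nodes of an m x m mesh with
    # wraparound links (per-axis distance is min(d, m - d)).
    def axis(a, b):
        d = abs(a - b)
        return min(d, m - d)
    nodes = list(product(range(m), repeat=2))
    dists = [axis(u[0], v[0]) + axis(u[1], v[1])
             for u in nodes for v in nodes if u != v]
    return sum(dists) / len(dists)

print(round(hypercube_mean_hops(3), 3))  # 1.714
print(round(hypercube_mean_hops(4), 3))  # 2.133
print(round(torus_mean_hops(5), 3))      # 2.5
```

The rapid growth of these averages with network size is what eventually favors the dynamically reconfigured network, where a direct connection is made on demand.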
To remove the effect of the VME bus load we flattened the bump in the curve for 24 transputers. We can estimate from the above results that the average traversal time will become unacceptable (above 100 msec) when the number of processors reaches into the hundreds. These results are affected by a single external factor: the VME bus load. Our system would perform much better, and would give more consistent results, if the VME bus were isolated from the rest of the users. The peaks in the graphs in Figures 3.3, 3.4 and 3.5 indicate the random nature of the parallel processes. The tests, if repeated in a similar environment (a shared bus), would probably give quite different results. The variation will depend on the speed and load of the bus used, but the trend of increase will remain the same.

3.5 Conclusions

From the first set of tests it can be seen that the mean traversal time increases slowly with an increase in the number of transputers, as shown in Figure 3.1. From Figure 3.3 we can see that the average response time of requests also increases slowly with the number of processors. From Figure 3.4 it is clear that the mean number of requests at the master controller increases slowly with the number of transputers. From Figure 3.5 we see that very few requests are blocked because of unavailability of resources; this shows that messages would not be blocked for a long time for that reason. Although the performance of our system is good with a small number of processors, it would become unacceptable as the number of processors reaches into the hundreds. In such a situation the master will become a bottleneck, and cooperating masters (see Chapter 5) would be required to improve the performance of the system.
In a system with a fixed interconnection network the average message traversal time would become unacceptable much earlier. From the second set of results, Figure 3.6 and Table 3.1, we see that our system will perform better than a system with a fixed interconnection network if the average number of hops rises above a certain threshold. This threshold depends on many factors, including the switching speed of the cross-bars, the speed and load of the bus, the average message length, and the message distribution.

Chapter 4

Module Assignment Problem for Real-Time Systems

4.1 Introduction

Although distributed systems provide several advantages, such as response time improvement, higher availability, and graceful degradation in case of partial failures, a designer of distributed systems faces many difficult problems which require careful consideration. We studied one of these problems in Chapter 2. Another problem, the Module Assignment Problem (MAP), is an important issue in the development of real-time and non-real-time distributed systems. MAPs are computationally intractable, i.e. the computational requirement grows with a power of the number of tasks¹ to be assigned. The load of a distributed system depends on both module execution times and inter-module communication (IMC) cost. If assignment is not done with due consideration, a module assignment can cause computer saturation. Therefore a good assignment should balance the processing load among the processors and generate minimum inter-processor communication (IPC) (communication between modules not residing on the same processor). Balancing the load requires the modules to be distributed evenly among the processors, whereas reducing communication cost requires modules to be assigned to the same processor. Since the two aims are contradictory, a balance between the two has to be achieved to get an optimal or near-optimal module assignment.
¹In this chapter, module and process are used interchangeably to mean a sequential piece of code; task and application task refer to a group of modules represented by its CFG.

MAP for Real-Time Distributed Processing Systems (RTDPS) faces additional constraints, which make it computationally more demanding than MAP for non-real-time systems. Since meeting the response time constraint is the most important performance measure for an RTDPS, it is also the most important criterion for module assignment. With this understanding we have devised a scheme which assigns processes to processors such that both response time constraints and periodicity constraints are met. If such an assignment is not possible, the assignment fails and an error is generated. Our assignment algorithm does not take into consideration factors such as load balancing; we believe that the most important requirement for an RTDPS is meeting the deadline constraints, and that is what our algorithm accomplishes.

4.2 Related Work

A number of module assignment algorithms have been proposed. These algorithms can be broadly classified into three categories: graph-theoretic, integer programming and heuristic approaches [32, 33, 34, 35, 37, 38, 40, 41, 42, 43, 44, 45]. Graph-theoretic methods become intractable for systems with more than two processors. The graph-theoretic method has several other drawbacks as well, as pointed out by [30]: it has no mechanism for representing restrictions on resources or real-time timing constraints. Furthermore, it cannot take into account precedence relations among tasks, nor take advantage of the fact that all tasks need not be allocated at the same time. The integer programming formulation of MAP has been considered by Ma et al. [40] and Ma [41]. The objective of their model is to minimize the cost function.
The cost function is the cumulative sum of execution times and IPC times. An integer programming formulation of MAP is an NP-hard problem [30]; hence heuristic methods are sought which decrease the computational requirement. One common method of solving integer programming problems is to use a branch-and-bound search on a tree whose vertices represent possible search states. The time complexity of this technique in the worst case is O(n^m) if a global optimum is required, where n is the number of processors and m is the number of tasks to be assigned. With judicious checking of constraints and judicious branching to the next node a tighter bound can be put on the complexity, but then the technique no longer guarantees an optimal solution. In most practical situations a good local optimum suffices. Moreover, optimal solutions are desirable only when design constraints can be measured with a good degree of accuracy, and distributed systems to date lack methods to accurately measure IMC [31]. A work very closely related to ours is due to K. K. Leung [28, 29]. He developed a delay model using queuing theory, which helped in estimating the response time of the application tasks. Using the model he developed a scheme to allocate the modules. He made the following assumptions: 1. Tasks are aperiodic. 2. Module invocation arrivals are independent. 3. Module invocation inter-arrival times are exponentially distributed. Using these assumptions and a well-known queuing model he estimated the delays on each computer. Adding up the communication time, execution time and the queuing delay, he estimated the task response time. Using this he developed a heuristic algorithm to allocate the processors to modules.
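The integer-programming objective just described (minimize the cumulative sum of execution and IPC costs) can be made concrete with a toy exhaustive search over all nᵐ assignments; the module names, costs and per-processor capacity below are invented for illustration and are not from any cited work:

```python
from itertools import product

# Hypothetical instance: 4 modules, 2 identical processors,
# a per-processor execution capacity, unit inter-processor distance.
exec_cost = {"a": 3.0, "b": 2.0, "c": 4.0, "d": 1.0}
comm_cost = {("a", "b"): 5.0, ("b", "c"): 1.0, ("c", "d"): 2.0}
modules = sorted(exec_cost)
n_procs, capacity = 2, 6.0

def feasible(assign):
    # The total execution cost placed on each processor must fit.
    return all(sum(exec_cost[m] for m in modules if assign[m] == p)
               <= capacity for p in range(n_procs))

def total_cost(assign):
    # Objective: cumulative execution time plus IPC, where IPC is
    # paid only by communicating pairs split across processors.
    ipc = sum(c for (i, j), c in comm_cost.items() if assign[i] != assign[j])
    return sum(exec_cost.values()) + ipc

candidates = (dict(zip(modules, alloc))
              for alloc in product(range(n_procs), repeat=len(modules)))
best = min(filter(feasible, candidates), key=total_cost)
print(total_cost(best))  # 11.0
```

Even this tiny instance enumerates 2⁴ assignments; the same loop over, say, 10 processors and 20 modules would visit 10²⁰ states, which is why branch-and-bound pruning or heuristics are used instead.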
We believe that the assumptions he made are not true in most practical situations: 1. Most real-time tasks are periodic. 2. Module invocation arrivals are not independent, since the invocations depend upon the structure of the CFG. 3. Module invocation inter-arrival times are not exponentially distributed, since the module invocations are periodic. MAP for real-time systems may have one of two different objectives: either to decrease the response time of the application task (soft real-time systems), or to make the allocation such that all real-time constraints are met. This is where the difference lies. Leung's objective was to reduce the response time of the application tasks, not to guarantee that the tasks meet their deadlines. Our objective, on the other hand, is that all response time requirements must be met; allocations are not made if the real-time constraints cannot be met.

4.3 Assumptions and Problem Formulation

We make the following assumptions while discussing our algorithm:

1. All processors are identical. Thus the execution time for each module is the same on all processors. Our scheme is in no way restricted by this assumption, but it makes the scheme much easier to comprehend.

2. The scheduling discipline is deadline scheduling without preemption.

3. The processes are generally periodic, but aperiodic processes can be modeled by using their average invocation period.

The following system parameters are used in our scheme:

n = total number of processors in the system,
CFG = Control Flow Graph (discussed later),
k = total number of CFGs in the system,
m_k = total number of modules in the kth CFG,
E_ki = execution time of the ith module of the kth CFG,
R_k = response time constraint of the kth CFG,
C_ij = IMC communication cost between modules i and j,
CC_ij = IPC communication cost between modules i and j,
D_lm = distance between processors l and m,
D_k1 = delay due to the presence of other modules of the same CFG,
D_k2 = delay due to the presence of a common module (a common module is a module which is invoked by two or more CFGs),
D_k3 = delay due to the presence of modules of other CFGs,
λ_k = time-period of invocation of the kth CFG,
E_cm = execution time of a common module,
CET = Cumulative Execution Time, the sum of the execution times and the IMC communication costs along a path of a CFG.

Subscripts i and j are used to represent modules, subscript k to represent CFGs, and subscripts l and m to represent processors. Subscript k is dropped when we are dealing with a case with only one CFG. CC_ij and C_ij are related by CC_ij = C_ij * D_lm, where D_lm is the distance (in units of time) between the processors l and m on which the two modules i and j reside respectively.

[Figure 4.1: An example of a Control Flow Graph (CFG 1), with exit ports EXIT 1 and EXIT 2 and their response time constraints RT (Response Time 1) and RT (Response Time 2).]

4.4 Incorporating Real-Time Constraints

An application task in an RTDPS is partitioned into a set of modules. The precedence relationships among the modules can be represented by a task control flow graph (CFG) [28], as shown in Figure 4.1. The task is invoked by an external signal. This invocation can be periodic (in which case the CFG executes periodically) or non-periodic (in which case the CFG executes aperiodically). Periodic invocations are characterized by their time-period, whereas aperiodic invocations are characterized by their mean time-period of invocation. After a module completes its execution, it sends messages to invoke (enable) a succeeding module(s) as indicated by the CFG.
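One way to hold such a task graph in memory is a module table plus arcs carrying the IMC cost, with the invocation period and per-port response times attached to the graph. The sketch below is ours (names and structure assumed), not the thesis's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Module:
    name: str
    exec_time: float                                  # E_ki
    children: list = field(default_factory=list)      # (child name, IMC cost)

@dataclass
class CFG:
    period: float        # invocation time-period (lambda_k)
    port_times: dict     # exit module -> response time constraint (port time)
    modules: dict = field(default_factory=dict)

    def add(self, m):
        self.modules[m.name] = m

    def link(self, parent, child, imc):
        # A completed module sends a message along this arc to enable
        # its successor; the arc carries the IMC communication cost.
        self.modules[parent].children.append((child, imc))

g = CFG(period=200.0, port_times={"m3": 82.0})        # hypothetical values
for name, e in [("m1", 25.0), ("m2", 22.0), ("m3", 12.0)]:
    g.add(Module(name, e))
g.link("m1", "m2", 12.0)
g.link("m2", "m3", 12.0)
print(len(g.modules))  # 3
```

An and-fork or or-fork is simply a module with more than one entry in `children`; the invocation probability of each or-arc could be stored alongside the IMC cost in the same tuple.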
A CFG may be made up of sub-control flow graphs. There are four types of sub-control flow graphs: sequential thread, and-fork to and-join, or-fork to or-join, and loop. The sub-control flow graphs are shown in Figure 4.2. A CFG can be any complex structure made out of these subgraphs; in fact these subgraphs can themselves be composed of other subgraphs. All arcs starting from an and-fork or an or-fork need not join together to form an and-join or or-join (respectively); such an arc should have its own response time constraint. Each CFG should have at least one exit node (port). Each port should have a response time constraint associated with it, called the port time: the time within which an invoked task should finish its execution, else it is considered a failure. An application may consist of more than one CFG. Now we discuss how real-time constraints can be formulated as an integer programming problem. Our scheme of expressing constraints is similar to and derived from a scheme by B. Dasarathy and M. Feridun [30]. Our expressions contain timing delays due to the presence of modules of the same CFG and also of other CFGs. We will use these constraints in our assignment algorithm. Note that a task refers to a set of modules represented by its CFG.

[Figure 4.2: Examples of Subgraphs]

Each control flow graph has two types of constraints [30]:

1. Deadline or latency constraint: Each task, after being invoked by an external signal, should respond (by executing all its modules) on or before its response time (also called port-to-port time). This constraint of completing the task on or before a deadline is called a deadline constraint.

2. Periodicity or bandwidth constraint: A task may be invoked repeatedly; the modules comprising the CFG should be assigned such that none of these invocations are lost or queued for a long time.
If we don't enforce this constraint, invocation requests keep accumulating until they can no longer be stored by the system. Now we express how the constraints can be specified. We start with a simple case and then develop the most general case.

Independent CFGs

An independent CFG is one in which all modules are invoked from within the CFG. The assignment of modules can be done in two ways.

CASE I: Each module is assigned to a processor that has no other module assigned to it, i.e. when a module is invoked it is ready to execute. The latency constraint for such a case is given by [30]:

Σ_i E_i + Σ_{i,j} CC_ij ≤ R    (4.1)

where

CC_ij = C_ij * D_lm    (4.2)

module i residing on processor l and module j on processor m. The periodicity constraint is given by [30]:

max_i E_i < λ    (4.3)

CASE II: More than one module can be assigned to a single processor. (N.B.: for an independent CFG we are implicitly assuming that all modules assigned to a single processor belong to one CFG.) The latency constraint for this case is:

Σ_i E_i + Σ_{i,j} CC_ij + D_1 ≤ R    (4.4)

where D_1 is the delay due to the presence of more than one module on a processor. This delay arises because the arrival of an invocation does not guarantee immediate execution of the invoked module: another module may be executing at the time the invocation request arrives, and in addition there may be modules in the ready list which were invoked earlier and have not yet completed their execution. How this delay is estimated is the topic of the next section. The periodicity constraint for this case is:

max_l { Σ_{i on l} E_i + D_1 } < λ    (4.5)

i.e. the sum of the execution times of all modules of a CFG assigned to any one processor, plus the queuing delay on that processor, should be less than the time-period of invocation.
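Given a candidate assignment, the Case II checks of Eqs. (4.4) and (4.5) reduce to a few sums. A minimal sketch under assumed names (the numeric values are hypothetical):

```python
def latency_ok(exec_times, ipc_costs, d1, response_time):
    # Eq. (4.4): the sum of execution times plus IPC costs plus the
    # queuing delay D_1 must not exceed the response time R.
    return sum(exec_times) + sum(ipc_costs) + d1 <= response_time

def periodicity_ok(per_processor_exec, d1, period):
    # Eq. (4.5): on every processor, the execution time of the CFG's
    # modules placed there plus D_1 must stay below the period.
    return max(sum(e) + d1 for e in per_processor_exec) < period

exec_times = [25.0, 22.0, 12.0]   # hypothetical module execution times
ipc_costs = [12.0, 12.0]          # arcs whose endpoints are split apart
print(latency_ok(exec_times, ipc_costs, d1=10.0, response_time=100.0))  # True
print(periodicity_ok([[25.0, 12.0], [22.0]], d1=10.0, period=50.0))     # True
```

The assignment algorithm of Section 4.7 makes exactly this style of check every time it tentatively places a module, with D_1 taken from the delay model of the next section.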
Dependent CFGs

Dependent CFGs are CFGs which have at least one module in common, i.e. a module which can be invoked either when it receives an invocation from one of the parent CFGs (an or-join) or when it receives invocations from all parent CFGs (an and-join). There are many ways in which modules can be assigned to the processors. We will discuss two of them:

CASE I: All processors contain modules from only one of the CFGs. This implies that the module common to both CFGs must reside alone on a processor (as it is part of both CFGs). The latency constraints for such a case are:

Σ_i E_ki + Σ_{i,j} CC_kij + D_k1 + D_k2 ≤ R_k    (4.6)

D_k2 denotes the delay incurred by CFG k because of the common module, and D_k1 the delay incurred because of the presence of more than one module on a processor. The method to estimate these delays is shown in a later section. The periodicity constraints for such a case are:

Σ_i E_ki + D_k1 + D_k2 < λ_k    (4.7)

and

E_cm < λ_k    (4.8)

CASE II: This is the most general case, in which there is no restriction on module assignment. The latency constraint is similar to Eq. (4.6) except that an additional term D_k3 is added to the left-hand side of the inequality. The additional delay D_k3 is incurred when a processor has modules from more than one CFG; it is the sum of the delays on all processors on which modules of CFG k reside, and is zero on processors which have modules belonging to CFG k only. The latency constraints for such a case are:

Σ_i E_ki + Σ_{i,j} CC_kij + D_k1 + D_k2 + D_k3 ≤ R_k    (4.9)

The periodicity constraints are the same as in the previous case with D_k3 added to the left-hand side:

Σ_i E_ki + D_k1 + D_k2 + D_k3 < λ_k    (4.10)

and

E_cm + D_k3 < λ_k    (4.11)

We have implicitly made some assumptions.
In particular, we assume that the periodicity constraints are not affected by the communication cost. This is not exactly true, but for all practical purposes it can be taken to be true.

4.5 The Delay Model

If a single module is present on a processor, execution immediately follows an invocation request. But if more than one module is present, immediate execution of the module is not guaranteed; in fact most of the time the start of execution will be delayed. We call this delay the queuing delay. Invocation requests (customers) wait to be executed on the processor (served by the server). The queuing model is as shown in Figure 4.3: there are n different kinds of customers (the different software modules on a processor), each arriving at rate R_i (the rate of invocation of the module), and each kind of customer has a fixed service time S_i (the execution time of the module).

[Figure 4.3: The Queuing Model]

No known queuing model closely resembles this model, hence it would not be appropriate to use any of the known queuing models. We have instead developed a model, based on simple heuristics, to estimate the delay of a module. Before we discuss the model we introduce some terms. Each module has a Response Time (RT) associated with it. The RT indicates the time by which the module must complete its execution; if it does not, there is no way the application task of which the module is a part can meet its latency constraint. Hence meeting the RT of a module is a necessary but not sufficient condition for the application task to meet its latency constraint. In other words, the RT of a module is the time, measured from the invocation of the whole application task, within which the module must complete its execution. How the RT for each module is estimated will be discussed in the next section.
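Since no standard queuing formula fits this arrival/service pattern, the heuristic bounds that follow can be cross-checked by simulating the model directly: periodic arrivals per module, fixed service times, and one non-preemptive FIFO server (the processor). A sketch under those assumptions, not part of the thesis:

```python
def max_queuing_delay(mods, horizon):
    """mods: list of (period, service_time) pairs, one per module.
    Returns the worst wait seen by any invocation when all invocations
    share one FIFO, non-preemptive server, simulated up to `horizon`."""
    arrivals = []   # (arrival_time, service_time)
    for period, service in mods:
        t = 0.0
        while t < horizon:
            arrivals.append((t, service))
            t += period
    arrivals.sort()
    server_free, worst = 0.0, 0.0
    for t, service in arrivals:
        start = max(t, server_free)      # wait only if the server is busy
        worst = max(worst, start - t)
        server_free = start + service
    return worst

# A module alone on a processor never queues (Case I below).
print(max_queuing_delay([(10.0, 4.0)], 100.0))                     # 0.0
# Two co-resident modules invoked together do queue (Case II below).
print(max_queuing_delay([(10.0, 4.0), (10.0, 3.0)], 100.0) > 0.0)  # True
```

Such a simulation gives observed delays for one concrete phasing of the invocations; the heuristics that follow instead bound the delay over all phasings, which is what a feasibility check needs.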
Since the delay of a module depends on the presence of other modules, we discuss in detail the various ways in which a module can be assigned to a processor. The following is the complete list of possibilities:

CASE I: The module is the only module on the processor. The queuing delay in this case is obviously zero. There may be other delays, such as context-switching delay or delay due to communication; henceforth we discuss only the queuing delay.

CASE II: The module is present with other modules, all of which belong to the same CFG. To estimate the delay in this situation we examine the timing model we use. There is an (approximately) fixed timing relation between different modules, i.e. the time at which a module starts its execution can be fairly well estimated. Since the execution time is known, we can estimate the interval during which the module executes. A delay is incurred by a module if its execution interval overlaps with that of some other module; the delay at worst is equal to the overlap. Consider the execution of a module A as shown in Figure 4.4. The module starts its execution at time t. Let an invocation request for another module B arrive before A completes its execution. B will have to wait until A completes, and the delay is clearly equal to the overlap period. Hence the delay is estimated as the overlap of the execution period with other modules of the same CFG.

CASE III: The module is present with other modules, at least one of which belongs to a different CFG. We work backwards to estimate the delay for this situation. We start by assuming that our allocation is such that the latency and periodicity constraints are met. If the latency constraint is met for the application task, then it follows that the RT requirement of each module is met too.
[Figure 4.4: Delay is introduced when the execution durations of two modules (Process A and Process B) overlap.]

Since the periodicity constraints are met (which we ensure by not allowing the total execution cost of the modules of the same CFG on a single processor to exceed the invocation time-period of the application task), each process completes its execution before another invocation for the same module arrives. Hence the queuing delay of a module is given by

QDelay_i ≤ λ − E_i    (4.12)

and the maximum delay is given by

QDelay_i,max = λ − E_i    (4.13)

While estimating the delay of a module we assume that the module is subjected to the maximum delay; this is true when a processor is fully utilized, i.e. the total utilization of all the modules assigned to the processor equals the maximum utilization allowed. For the RT requirement of the module to be met, another obvious condition must hold:

RT − CET ≥ TP    (4.14)

which simply implies that, for the RT requirement of the module to be met, the slack remaining after the module is invoked must be at least the invocation time-period of the task. CET (Cumulative Execution Time) for a module denotes the time at which the module is invoked: the time taken by the execution of all the previous modules plus the time taken by communication between all previous module pairs.

CASE IV: The module is a Common Module (CM). Recall that a CM is a module which is invoked by modules of more than one CFG. A CM and all descendants of a CM are considered part of all CFGs of which the module is a CM. A CM can be of two types: OR or AND.

An OR CM: A CM is of type OR if it is enabled for execution by an invocation from any one of its parent modules.
The maximum delay such a module can be subjected to is given by:

QDelay ≤ (NumberOfParents − 1) * ExecutionTime    (4.15)

An invocation request may arrive at a time when the CM is already executing and other invocation requests for the CM are queued waiting to be satisfied. The maximum number of invocation requests that can be pending is one less than the total number of parents; hence the maximum delay is the number of pending invocation requests multiplied by the execution time of the CM.

An AND CM: A CM is of type AND if it is enabled only when it receives invocation requests from all its parents. The additional delay in such a case is given by

QDelay ≤ RT − ExecutionTime    (4.16)

The argument for this delay is similar to that for modules from different CFGs: the delay cannot be more than RT − ExecutionTime, because if it were, no possible assignment would be able to meet the latency constraint.

4.6 The Algorithm

The most important contribution of our algorithm is that it calculates the number of processors required to make the assignment. The number of processors depends not only on the execution cost of the modules but also on the latency and periodicity constraints. Processors are added to make sure that both the periodicity and latency constraints are met. We believe that for a fairly large system estimating the number of processors required is not an easy task, and making such an estimate for a real-time system is even more difficult. Also, the throughput of a system need not always increase with the number of processors; in many cases the overall throughput may even decrease. Therefore a good estimate of the processors required is necessary. Our scheme makes the assignment depending on the communication cost, the structure of the CFG, the maximum processor utilization, and the periodicity and latency constraints. Assignment of a module is made only if all the requirements are met.
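The delay bounds of Section 4.5 that such an assignment check needs can be collected into small helpers. This is a sketch in our own naming, not the thesis's code:

```python
def max_delay_other_cfg(period, exec_time):
    # Case III, Eq. (4.13): a module sharing a processor with modules
    # of other CFGs waits at most lambda - E_i.
    return period - exec_time

def max_delay_or_cm(n_parents, exec_time):
    # Case IV, Eq. (4.15): at most (parents - 1) invocations can be
    # queued ahead of a new request for an OR common module.
    return (n_parents - 1) * exec_time

def rt_feasible(rt, cet, period):
    # Eq. (4.14): the slack left after the module is invoked must be
    # at least the invocation time-period of the task.
    return rt - cet >= period

print(max_delay_other_cfg(100.0, 30.0))             # 70.0
print(max_delay_or_cm(3, 12.0))                     # 24.0
print(rt_feasible(rt=82.0, cet=40.0, period=50.0))  # False
```

Because each bound is a worst case over all invocation phasings, an assignment accepted by these checks remains safe however the periodic invocations happen to align at run time.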
There is another important difference: almost all algorithms have an objective cost function which they try to minimize; our scheme has no such objective function. We have just one objective: assignment of the modules such that both the periodicity and latency constraints are met, if possible on a minimum number of processors. Minimizing the number of processors is not an objective function. In schemes with an objective function, a number of heuristic or random solutions are generated and the solution with the best value of the objective is chosen. In our scheme, allocations of processes are made only if the allocation ensures that the deadlines of the CFGs will be met.

4.7 The Algorithm

1. Read the CFGs, IMCs and inter-processor distances.

2. Check the latency and periodicity constraints for each CFG read, using Eq. (4.1) and Eq. (4.3). If either constraint cannot be met, notify and exit.

3. Preprocess the CFGs (detailed in the next section).

4. For i = 1 to (number of graphs) do:

5. Allocate the current module to the current processor (CurrP).

6. Check whether allocating this module produces a delay on the already-allocated modules such that their Response Time (RT) constraints are no longer met. If there is no such module, the assignment is made; go to Step 11. Else go to Step 7.

7. For each processor p ∈ PU (the set of processors on which modules have already been assigned), find the utilization factor left and the increase in delay of the modules already present if this module were assigned to that processor. Select the processor which has the maximum utilization factor left and for which the total increase in delay of the already-present modules is minimum.
8. Assign the module to the processor chosen in Step 7 and check whether allocating it to that processor delays any already-present module beyond its RT. If there is no such module, the module is assigned to the processor; go to Step 11. If there is at least one module whose delay is increased beyond its RT, the assignment is not made.

9. A new processor is required (no processor already present can accommodate this module). Add a new processor and assign the module to it.

10. Check the latency constraint of the newly assigned module. If the constraint is met, go to Step 11; else notify that the allocation cannot be made and exit.

11. For j = 1 to (number of children of the just-assigned module): allocate each child by invoking Step 5.

12. Check whether the module is the last module of the graph to be allocated. If it is, check the latency constraint; if met, go to the next step, else notify the failure and exit.

13. Check whether all graphs are allocated. If so, exit; else go to Step 4.

Note that the process of assignment is recursive.

4.7.1 Preprocess CFGs

The CFGs are processed before they are used for assignment. Preprocessing performs four important functions which influence the assignment:

1. Set the response time requirement of each module: This is done by processing each CFG backwards, starting from the exit modules. The RT of an exit module is the RT of the application task exiting at that module (an application task can have more than one exit module and hence more than one RT). The CFG is then traversed in the upward direction and the RT of each module is set. To estimate the RT of a module, we take the RT of the application task and subtract from it the execution cost of the modules below it (the module's descendants) and an estimate of the communication time.

2. Adjust the execution cost of the modules: The descendants of an OR module are not invoked every time their parent is invoked, i.e. only one of the arcs is invoked each time the parent completes its execution.
of an OR module. The adjustment is made by decreasing the average execution cost: it is multiplied by the probability of its invocation.

3. Arrange the children: An OR or an AND module has more than one child. Which child arc is allocated first influences the assignment. We assign the arcs in decreasing order of the available time Tav. Tav is a measure of how difficult it is for the modules of an arc to meet the latency constraint; the difficulty depends on the amount of communication between the two modules and the RT of the module. We estimate the available time by

    Tav ≈ RT_child - ExecutionTime_child + CommunicationCost_(parent,child)    (4.17)

4. Arrange the CFGs: If the application has more than one CFG, the order in which the graphs are assigned influences the assignment. Hence we assign the CFGs in decreasing order of the difficulty with which their latency constraints are met. The difficulty is estimated by subtracting the CET of the longest path from the RT of the application task. Surprisingly, this heuristic does not influence the assignment a great deal, except that the modules end up on different processors. We believe that there should be CFGs for which this heuristic makes a remarkable difference.

4.7.2 Another important heuristic

As already pointed out, all arcs going out of an OR module are invoked one at a time, i.e. they are mutually exclusive. Using this, we try to assign modules which are OR-related to the same processor. This improves the assignment as a whole by using fewer processors, without much affecting the application task response time.

Module No.
Execution Time (μsec)

 1    25.0
 2    22.0
 3    12.0
 4    30.0
 5   112.0
 6    12.0
 7    82.0
 8    82.0
 9    64.0
10    10.0
11    18.0
12    22.0
13    20.0
14     8.0
15    10.0

Table 4.1: Execution time of the modules of CFG1

Two modules are OR-related if they have some common OR-parent.

4.8 Study of the Assignment Algorithm

To study the performance of our algorithm we consider a few applications. The assignment is studied for different response times and invocation times. To estimate the performance of the algorithm, our algorithm also computes the minimum number of processors required. This number is based on the execution cost alone; it considers neither the latency constraint nor the communication cost. To test the performance we also assign the tasks with a very large response time, in order to remove the latency constraint, and then compare the number of processors required.

Four different applications are considered. The first two applications have one CFG each and are shown in Figure 4.5 and Figure 4.6. The execution times and the IMC are

Figure 4.5: CFG1 for Test 1

Module From   Module To   IMC Communication Cost
 1     2    12.0
 2     3    12.0
 2     4    12.0
 2     5    12.0
 3     6    10.0
 5     8     8.0
 5     9     8.0
 8    12     8.0
 9    13     8.0
11    15     8.0
12    14     8.0
13    14     8.0
14    15     2.0

Table 4.2: Communication time of the modules of CFG1

Exp. No.   Response Time Required (μsec)   Invocation Period (μsec)
           Exit 1      Exit 2
 1          82.0        200.0               200.0
 2          82.0        200.0               100.0
 3          82.0        215.0               100.0
 4          82.0        230.0               100.0
 5          80.0        218.0                75.0
 6          80.0        320.0                75.0
 7         600.0       1023.0               100.0

Table 4.3: Response and Invocation Times for CFG1, Test 1

Exp. No.   Assignment   Estimated Response   Min.
Proc.   CPU1  CPU2  CPU3  CPU4  CPU5   Time (μsec)   Required

1    1   3   -   -   -     196.4   2
     2   4   -
     5   6   -   -
     6   7   -   -   -
    12   9   -   -
    14  10   -   -   -
    15  11   -   -   -
     -  13   -   -   -

2    could not be allocated

3    1   5   3   -         213.4   3
     3  12   4   -   -
     8  14   6   -
    15   9   -   -
     -   -  10   -   -
     -   -  11   -
     -   -  13   -   -

4    1   4   7   -   -     207.4   3
     2   5   8
     3  11  12   -   -
     6  15  14   -
     9   -  10   -
    10   -   -   -   -

5    1   5   8   3   7     216.4   4
     2   -  12   4   -
     -   -  14   6
     -   -   -  10   -
     -   -   -  11
     -   -   -  13   -
     -   -   -  15   -

6    1   8   5   7         216.4   4
     2  11  14   9   -
     3  12   -  13   -
     4  15   -
     6   -   -   -   -
    10   -   -   -   -

7    1   5   4   7   -     248.4   3
     2   7   8   9   -
     3  12  13   -
     6   -  13   -
     9   -  14   -   -
    10  15   -

Table 4.4: Assignment Results for CFG1, Test 1

shown in Table 4.1, Table 4.2 and in Table 4.5, Table 4.6, respectively. All times are in μsec.

Module No.   Execution Time (μsec)
 1   16.0
 2   14.0
 3   10.0
 4   10.0
 5    6.0
 6    4.0
 7    4.0
 8    6.0
 9   10.0
10    2.0

Table 4.5: Execution time of the modules of CFG2

For the study of the communication system we assume that the communication cost between any pair of processors is the same (1 unit delay). The algorithm developed can actually handle situations where the communication cost between processors varies.

The assignment is done for different response times and invocation periods, and the results are shown in Table 4.4 and Table 4.8 for the response times and invocation periods given in Table 4.3 and Table 4.7, respectively. The figures under the columns labeled CPU denote module numbers. If the application task consists of more than one CFG, then we use an ordered pair (x, y), where x denotes the module number of CFG y.

The third application consists of two independent application tasks. The CFGs of Figure 4.5 and Figure 4.6 are used. The assignment is done for various combinations of response times, and the result of the assignment is shown in Table 4.10 for the response times and invocation periods shown in Table 4.9.

The fourth application consists of two application tasks with a common module.
The application task and the common module are shown in Figure 4.7. Execution costs and IMC communication costs are the same as before. The assignment is done for various combinations of response times, and the result of the assignment is shown in Table 4.12 for the response times and invocation periods given in Table 4.11.

Figure 4.6: CFG2 for Test 2

Module From   Module To   IMC Communication Cost
 1     2    2.0
 2     3    2.0
 2     4    2.0
 3     5    2.0
 3     4    0.0
 4    10    8.0
 5     7    8.0
 6     8    8.0
 7     9    8.0
 8     9    8.0
 9    10    2.0

Table 4.6: Communication time of the modules of CFG2

Exp. No.   Response Time Required (μsec)   Invocation Period (μsec)
           Exit 1
 1          60.0                            20.0
 2          60.0                           100.0
 3          60.0                            25.0
 4         120.0                           100.0
 5         120.0                            20.0
 6          20.0                            50.0

Table 4.7: Response and Invocation Times for CFG2, Test 2

Exp. No.   CPU1  CPU2  CPU3  CPU4   Estimated Response Time (μsec)   Min. Proc. Required

1    1   2   4   5     56.0   3
     6   3   8   7
     -   -   9   -
     -   -  10   -

2    1   6   -   -     57.0   1
     2   7   -   -
     3   -   -   -
     4   -   -   -
     8   -   -   -
     9   -   -   -
    10   -   -   -

3    1   2   8   -     58.0   3
     6   3   -   -
     9   5   -   -
    10   7   -   -

4    1   6   -   -     57.0   1
     2   7   -   -
     3   -   -   -
     4   -   -   -
     5   -   -   -
     8   -   -   -
     9   -   -   -
    10   -   -   -

5    1   2   5   4     57.0   1
     6   3   7   -
     -   -   8   -
     -   -   9   -
     -   -  10   -

6    could not be allocated

Table 4.8: Assignment Results for CFG2, Test 2

Exp. No.   Response Time Required (μsec)              Invocation Period (μsec)
           CFG1 Exit 1   CFG1 Exit 2   CFG2 Exit 1    CFG1     CFG2
 1           80.0          320.0          60.0          75.0    30.0
 2           80.0          270.0          60.0         100.0    30.0
 3           80.0          310.0          60.0          75.0    30.0
 4           80.0          310.0          60.0          75.0    20.0
 5           70.0          380.0          70.0         120.0    80.0
 6           70.0          270.0          70.0          69.0    20.0

Table 4.9: Response and Invocation Times for CFG1 and CFG2, Test 3

Exp. No.   Assignment   Estimated Response   Min. Proc.
CPU1  CPU2  CPU3  CPU4  CPU5  CPU6  CPU7  CPU8   Time (μsec): CFG1, CFG2   Required

1    (1,1) (8,1) (5,1) (7,1) (1,2) (2,2)        59.8   216.4   54.0    6
     (2,1) (11,1) (14,1) (9,1) (4,2) (3,2)
     (3,1) (12,1) (13,1) (6,2) (5,2)
     (4,1) (16,1) (7,2) (7,2)
     (6,1) (10,2) (9,2)

2    (1,1) (4,1) (7,1) (1,2) (2,2)              52.8   207.4   54.0    5
     (2,1) (5,1) (8,1) (4,2) (3,2)
     (3,1) (11,1) (12,1) (6,2) (5,2)
     (6,1) (15,1) (13,1) (8,2) (7,2)
     (9,1) (14,1) (10,2) (9,2)
     (10,1)

3    (1,1) (8,1) (5,1) (7,1) (1,2) (2,2)        59.8   216.4   54.0    6
     (2,1) (11,1) (14,1) (9,1) (4,2) (3,2)
     (3,1) (14,1) (13,1) (6,2) (5,2)
     (4,1) (15,1) (8,2) (7,2)
     (6,1) (10,2) (9,2)
     (10,1)

4    (1,1) (8,1) (5,1) (7,1) (1,2) (2,2) (6,2) (4,2)   59.8   216.4   57.0   7
     (2,1) (11,1) (14,1) (9,1) (5,2) (3,2) (7,2)
     (3,1) (14,1) (13,1) (8,2)
     (4,1) (15,1) (9,2)
     (6,1) (10,2)
     (10,1)

5    (1,1) (5,1) (7,1) (1,2) (4,2)              59.0   191.0   52.0    4
     (2,1) (11,1) (9,1) (2,2)
     (3,1) (12,1) (13,1) (3,2)
     (4,1) (14,1) (5,2)
     (6,1) (15,1) (6,2)
     (8,1) (7,2)
     (10,1) (8,2)
     (9,2)
     (10,2)

6    (1,1) (8,1) (5,1) (9,1) (7,1) (1,2) (2,2) (6,2)   59.8   224.4   57.0   7
     (2,1) (11,1) (12,1) (5,2) (3,2) (7,2)
     (3,1) (15,1) (13,1) (8,2)
     (4,1) (14,1) (9,2)
     (6,1) (10,2)
     (10,1)

Table 4.10: Assignment Results for Test 3

Exp. No.   Response Time Required (μsec)   Invocation Period (μsec): CFG2, CFG1, CG
 1           70.0    320.0                   40.0    75.0    20.0
 2           70.0    320.0                   50.0    70.0    17.0
 3           70.0    320.0                   50.0   100.0   100.0
 4          100.0    270.0                  150.0    75.0    40.0

Table 4.11: Response and Invocation Times for CFG1, CFG2 and CG, Test 4

4.9 Discussion

The results obtained show that the allocation of processes varies a great deal with the response time and the invocation period. With large invocation periods the actual number of processors used is smaller than the number used for smaller invocation periods. This is as expected; the reason is the tighter periodicity constraints in the case of smaller invocation periods.
With an increase in the response time constraint, the tasks are allocated such that the estimated task response time is larger. This is because relaxing the response time constraint (by giving it a larger value) does not force the algorithm to produce an allocation with a smaller response time; in such a case it produces a more balanced allocation.

A point worth noting is that in applications with more than one CFG, it is very rarely seen that modules of two different CFGs reside on the same processor. The reason may be that the CFGs used in the tests utilize most of the processor's time on the processors they reside on, or that allocating modules of two different CFGs on the same processor incurs a large delay, discouraging them from being allocated together.

We also noted that when both the response time and the periodicity constraints are decreased, allocations could not be made. The reason is that, because of the tighter periodicity constraint,

Figure 4.7: CFG3 made out of CFG1, CFG2 and CG (Test 4)

Exp. No.   Assignment (CPU1 ... CPU8)   Estimated Response Time (μsec)   Min. Proc.
Required: CFG1, CFG2, CG

1    (1,1) (2,1)                            272.0   63.0   48.8    7
     (1,3) (2,3) (3,3)
     (5,1) (8,1) (12,1) (14,1)
     (4,1) (9,1) (13,1) (15,1)
     (7,1)
     (2,2) (3,2) (5,2) (6,2) (7,2) (9,2) (10,2)

2    (1,1) (2,1)                            213.4   58.0   48.8    7
     (1,3) (2,3) (3,3)
     (7,1)
     (5,1) (8,1) (12,1) (14,1)
     (4,1) (9,1) (13,1) (15,1)
     (1,2) (2,2) (10,2)
     (3,2) (5,2) (7,2) (9,2)
     (6,2) (8,2)

3    (1,1) (2,1)                            272.0   63.0   48.8    4
     (1,3) (2,3) (3,3)
     (8,1) (5,1) (12,1) (14,1) (15,1)
     (4,1) (6,1) (9,1) (13,1)
     (1,2) (2,2) (3,2) (5,2) (6,2) (7,2) (9,2) (10,2)

4    (1,1) (2,1)                            272.0   63.0   43.0    7
     (5,1) (8,1) (12,1) (14,1)
     (4,1) (11,1) (12,1) (15,1)
     (1,3) (2,3) (3,3)
     (7,1)
     (2,2) (1,2) (8,2)
     (3,2) (5,2) (6,2) (7,2) (9,2) (10,2)

Table 4.12: Assignment Results for CFG1, CFG2 and CG, Test 4

the algorithm tries to allocate the modules to different processors, thus increasing the response time (allocating the modules to different processors increases the IPC) beyond the response time constraint.

4.10 Conclusions

The results obtained show that the assignment algorithm allocates the processes on the minimum number of processors, or on one or two processors more than the minimum. The number of processors used depends on the tightness of the deadline. In 52.4% of the cases the algorithm allocated the processes to the minimum number of processors, in 42.4% of the cases it used one processor more than the minimum, and in 4.2% of the cases it used more than one processor above the minimum. Note that the number of processors actually used for an assignment depends upon the deadline constraint, the execution times of the processes and the period of invocation of the processes.

4.11 Summary

The design objective for RTDPS is to assign modules to processors such that all periodicity constraints are met. A new scheme to estimate the response time of an application task has been introduced. Based on this scheme, a heuristic technique has been described which assigns modules to processors, making sure that all periodicity and latency constraints are met. The technique uses simple heuristics to choose an assignment such that all constraints are met. The technique is then validated using a number of assignment problems.

Chapter 5

Conclusions and Future Research Work

5.1 Conclusions

5.1.1 Real-time communication

A new technique has been devised and developed to allow efficient and predictable communication among processes residing on different Transputers. The technique is novel in the sense that the point-to-point network formed by the links of the transputers is dynamically reconfigured. The interconnection network is dynamically reconfigured to enable messages to go directly to the destination node. In a fixed (static) interconnection network, a message may have to hop through intermediate nodes. We believe that our scheme is better than a store-and-forward network for the following reasons:

• The messages reach the destination node directly; retransmission from intermediate nodes is avoided, thus improving efficiency.

• In the static network a message has to pass through intermediate nodes, hence there is an extra overhead involved in buffering the messages at the intermediate nodes.

• In the static network the average number of hops required by a message increases rapidly with the number of processors. Such a system cannot be expanded very efficiently, whereas in our scheme the average message delay increases much more slowly. Hence our system can be expanded to a system with a much larger number of processors. Another master can be added to the system to improve speed and to allow more processors to be efficiently added to the system.
• Store-and-forward networks are subject to deadlocks, whereas our scheme avoids deadlocks by sending the message directly to the destination node. We believe that detecting deadlocks in a distributed system is a difficult problem. In real-time systems, since the messages have to reach their destination in predictable time, deadlocks cannot be allowed (deadlocks lead to unpredictable delays). There are deadlock avoidance schemes, but these do not allow the maximum bandwidth of the system to be used.

• Since our communication system is to be used for a real-time system, it should have a predictable delay for message transfers. This is true for our system. In a store-and-forward network the delay is unpredictable for the following reasons:

- the load at the intermediate nodes affects the delay of the messages,

- the path which a message takes to reach the destination node affects its travel time,

- a message's delay depends to a large extent on the other messages present in the system.

Analysis shows that our system performs better than a system with a static interconnection network, and this has been confirmed by experimental results.

5.1.2 Module Assignment

Response time is the most important criterion for real-time systems. Response time is greatly affected by the way modules are assigned to the processors; hence module assignment is an important part of a real-time system. We have developed an analytic technique to measure the delay in a module. The delay in a module is due to:

1. IPC,

2. the presence of other modules of the same CFG (since they contend for the same resources, such as the processor, I/O, etc.),

3. the presence of modules of some other CFG (since they contend for the same resources, such as the processor, I/O, etc.),

4.
the presence of common modules (a module common to more than one CFG).

The delay in a module is then used to estimate the response time of the application tasks. A heuristic algorithm has been developed which assigns modules to processors taking into account the structure of the CFG, the communication cost between processors and the volume of communication between modules. The novel characteristic of our algorithm is that it does not require the number of processors as an input; it estimates the minimum number of processors on which the modules can be assigned. We believe that for a complex system, estimating the number of processors required is as difficult a problem as the assignment problem itself. Hence our scheme can conveniently be used in situations in which the number of processors required is not known.

Our algorithm assigns only if an allocation is possible, that is, if the response times of all application tasks are met. Our scheme also takes care of situations in which a module or a CFG is part of more than one CFG.

The module assignment algorithm should serve as a valuable tool for designing distributed systems.

5.2 Future Research Work

5.2.1 Real-time communication system

A number of research areas need further investigation. They include the following:

1. A fault-tolerant system: It is obvious that in our system if the master fails, the whole system fails. Given the importance of real-time systems, the system can be made more reliable by adding a backup (slave) processor which remains dormant in normal operation, periodically collects the status of the network configuration and monitors the welfare of the master controller. When the backup process detects the failure of the master, it can take over the responsibilities of the master.

2. A faster, more versatile system: If our system is expanded to a very large system, chances are that the master controller would become the bottleneck of the system.
To avoid such a situation, more than one cooperating master can be used.

3. A hardware master controller: A chip performing the function of the master controller can be designed. Such a chip would reduce the switching overhead dramatically.

5.3 Assignment Problem

A number of research areas need further investigation. They include the following:

1. Development of a real-time scheduler which would work in close association with the assignment algorithm.

2. Development of a monitoring tool which periodically checks whether the response time of each task is met and dynamically reassigns some processes so that the response times are met.

3. Improvement of the algorithm.

Appendix A

The Test Program

/* usr.c : main program used to make the performance testing */
#include <conc.h>
#include <stdio.h>
#define USRDIR "/bon/fs0/robotics/sanjayc/communication/user/ma"
#include "lib.h"

extern int _process_id;
int MsgsTo[32];
int len;

/* Read the information about the length and the number of
   messages to be sent to each user */
int ReadMsgInfo(int IamUser)
{
    FILE *fp;
    int i, lineno = 0, from = -1, to, NoofMsgs;
    char line[80], directive[10];
    char Started = 0;

    for (i = 0; i < 32; i++)
        MsgsTo[i] = 0;
    if ((fp = fopen(USRDIR, "r")) == NULL) {
        printf("Could not open file\n");
        exit(1);
    }
    fgets(line, 80, fp);
    sscanf(line, "%d", &len);
    while (fgets(line, 80, fp) != NULL) {
        if (line[0] == '#')
            continue;
        if (Started && IamUser != from)
            return (0);
        sscanf(line, "%s", directive);
        if (!strcmp(directive, "From:")) {
            sscanf(line, "%*s %d", &from);
        } else if (IamUser == from) {
            Started = 1;
            if (sscanf(line, "%d %d", &to, &NoofMsgs) < 2)
                printf("Error on line %d", lineno);
            MsgsTo[to] = NoofMsgs;
        }
        lineno++;
    }
    return (0);
}

main()
{
    int i, j, k;
    int length;
    char *Msg;
    int fromuser;
    unsigned long time1, time2, totaltime;
    char Tag, Message[4096];
    int userno;

    userno = _process_id;
    ConnectHost(userno);
    ReadMsgInfo(_process_id);
    ReleaseHost(userno);
