UBC Theses and Dissertations


A topology independent parallel development environment Goldstein, Norman J. 1991


A TOPOLOGY INDEPENDENT PARALLEL DEVELOPMENT ENVIRONMENT

By Norman J. Goldstein
B.Sc. (Mathematics) McGill University
Ph.D. (Mathematics) Cornell University

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES, COMPUTER SCIENCE

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
April 1991
© Norman J. Goldstein, 1991

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of
The University of British Columbia
Vancouver, Canada

Abstract

This thesis describes a topology independent parallel programming environment, along with the motivation for its inception and design. The system is currently being used on a 74 node reconfigurable transputer network. A user's program is described to the system using a high level of abstraction. This speeds up program development and facilitates the modifying of program parameters such as "problem size". The system provides intelligent choices as the user's program is bound to the hardware. The binding may be controlled to any extent desired by the user. This allows the hardware to be used efficiently, while relieving the user from having to know intimate details of the underlying hardware architecture.
Table of Contents

Abstract
List of Tables
List of Figures
Acknowledgement
1 Introduction
  1.1 Description
  1.2 Other Work
2 Hardware and Maintenance Software
  2.1 Hardware
  2.2 System Software
    2.2.1 Resources
    2.2.2 Hardware Database
    2.2.3 Ports
    2.2.4 Crossbars
    2.2.5 Session Setup
3 Crossbar Control
  3.1 Introduction
  3.2 Description of the algorithm
  3.3 Derivation of the algorithm
4 System Design
  4.1 The Front end
  4.2 User Input
  4.3 The Back end
  4.4 Outline
5 Sample Session
6 Channels
  6.1 Events, Virtual circuits and ID's
  6.2 Local Channels
  6.3 Non-local Channels
  6.4 Channel Properties
7 Resource & Place Modules
  7.1 Resource module
    7.1.1 Problem formulation
    7.1.2 Constraints
    7.1.3 Cost function
    7.1.4 The constants
  7.2 Place module
    7.2.1 Outline
    7.2.2 Discussion
8 Setup & Load Modules
  8.1 Setup module
  8.2 Load module
9 Conclusions
  9.1 Experimental Results
    9.1.1 Spring
    9.1.2 Ring
  9.2 Future Work
    9.2.1 TROLLIUS
    9.2.2 LD-NET
    9.2.3 High level
Bibliography
Appendices
A TMAP Implementation
  A.1 tmap
  A.2 ch_send, ch_recv
B Prep-p Based Modules
  B.1 vgraph
  B.2 weight
  B.3 contract
C File Descriptions
  C.1 TMAP specific files
    C.1.1 SysInfo
    C.1.2 ParamLst
    C.1.3 LoadLst
    C.1.4 ChanVIds
    C.1.5 Resources
    C.1.6 LkData
    C.1.7 ChanLks
  C.2 Prep-p files
    C.2.1 ProcLst
    C.2.2 IOAdjLst
    C.2.3 CodeSize
    C.2.4 Degrees
    C.2.5 EdgeWts
    C.2.6 VipToRp
D Sample Source Files
  D.1 Control Information
  D.2 Host Code
  D.3 Root Code
  D.4 Node Code
  D.5 Substitutions

List of Tables

2.1 Ports to access the network
2.2 Reset Components
7.3 For the constant K_1 of the cost function
9.4 Spring example ITB times
9.5 Spring example program times
9.6 Ring example program times

List of Figures

3.1 The crossbar control chain
3.2 The 3 switch setting procedures
5.3 The graph of vnodes and channels

Acknowledgement

I take this opportunity to thank the many people who have helped me put this thesis together. Their support covers the broad range of moral, technical and financial. You know who you are. The thesis topic was suggested to me by my thesis advisor Alan Wagner, who offered his own insights and ideas throughout the whole process. I wanted a problem with a large implementation component and I got it. I am grateful that my work is forming a basis for further projects in this area at UBC. Alan's organizational skills and planning are responsible for all this coming together. My parents Henry and Ethel were always there for me. For the last year, I have been fortunate to have the friendship of Rivian Rimer. This made the long work hours more bearable and productive; and it looks good that the relationship will continue for a lot longer. The 4 people with whom I used the transputer hardware the most are Jie Jiang, Hilde Larsen, Ola Siksik and H. Sreekantaswamy. It was a pleasure to learn together and share with each other. A special thanks to Jie for imparting his hard-earned understanding of our transputer system. From my first days at the department, our Head, Maria Klawe, was appreciative of my being there and supportive of my efforts. To her a hearty "Thank you". Norm Hutchinson made several helpful suggestions, and the time he put into reading this thesis is much appreciated.
There are people who made the non-technical (and some technical) sides of life more pleasant - eating, hiking, canoeing, singing and just plain hanging out. They include Yoko Ando, Esfandiar Bandari, Andrew Csinger, Heidi Dangelmaier, Elliot Drobner, Mike Horsch, Maxime Kalfon, Jian & Ying Li, Steve & Sarah Mason, Valerie McRae, Manny Noik, Sue Rathie, Greg Rowland, David Sidilkover and Diana Weaver. There were many experts who were happy to help when approached. To name a few: Donald Acton, Barry Brachman, Greg Burns, Ian Cavers, Frank Flynn, Nadine Hofmann, Ming Lau, Marc Majka, Rick Morrison, George Phillips, Peter Phillips, Rick Sample, Bernd Stramm and Deborah Wilson. And how does one adequately thank the secretarial staff? They know all the answers and they know all the funny lines. Here they are: Gale Arndt, Evelyn Fong, Theresa Fong, Sunnie Khuman, Carol Whitehead, and Grace Wolkosky. Finally, in addition to my thanks, I offer my apologies to those whom I have forgotten to mention.

Chapter 1

Introduction

Parallel processing has been used to advantage for countless centuries. As discussed in [2], current supercomputers are incapable of analyzing static images and doing real-time analysis of motion, "Yet biological early vision processes are clearly able to perform the computations by exploiting the inherent parallelism of visual inputs in a truly concurrent fashion." The internal speeds at which a biological system functions are slow compared to a typical CPU. According to model parameters used in [1], neuronal signals travel typically between 1 and 10 m/sec, which is seven orders of magnitude slower than the speed of electric signals used in computers, and the refractory period (delay between synapse firings) is modelled at 10 ms, which is five orders of magnitude slower than the clock speeds of modern CPUs.
Obviously, combining the best of both worlds - massive parallelism and the internal speed available through current technology - offers computational power well beyond our present capabilities. A parallel program runs on a network of processors. In addition to the hardware, system software is required to help the programmer develop the program. This situation is analogous to the use of high level languages to program a standard computer, rather than machine instructions or assembly language. The added complications centre around deciding which processors should run which pieces of code, and along which channels they should communicate. This is known as the mapping problem, and it is NP-complete; some of the difficulties involved in mapping are discussed in Berman [11].

1.1 Description

This thesis describes the TMAP system (Transputer MAPping tool), a parallel programming environment, specifically designed and tailored for the multiprocessor network at the Computer Science Department of the University of British Columbia; the system is designed, however, so that porting to another site would require a minimum of time and effort. The hardware that comprises our network is centred around 74 transputers (cf [14]). Each transputer is a CPU with its own memory, and 4 serial data links. Memory is not shared amongst the transputers; inter-node communication is effected solely through the data links, which may be wired directly to each other. However, most data links are wired to crossbar switches. Our network contains 10 of these switches. Each switch has 32 data links and is capable of creating, under software control, 16 bidirectional connections. Other transputer links are used for I/O with the external world. Due to its size and to various hardware constraints, our network is irregular and complex. A more complete description is given in §2.1.
One property of TMAP is to hide this complexity from the user, and present the user with a manageable interface which is logically consistent with the machine architecture and topology. Moreover, the TMAP system incorporates the fact that our network is made up of a variable number of transputers and software-controlled switches, and provides software tools to configure and control the switches and manage user level access; cf §2.2. The scope of the TMAP project ranges from user program design to allocation of hardware resources and the mapping of the processes onto these resources in a way that is efficient with respect to the interconnection potential of the hardware. The building of TMAP required the integration of two major pieces of software. The first is Prep-p [12], which we modified and used to parse the application program's process graph, and to partially map the processes into the network. The second is Trollius [13], which is the operating system that we run on our processors. We rely on Trollius to boot the machine, load programs and do high-level message passing. These aspects are elaborated on in §4.3. The lowest level software model supported by our hardware is that of a collection of processes which send messages to one another; indeed, the transputer has hardware support for context switching and inter-process communication. The degree of synchrony of the communication is controllable from the application program on a per message basis via buffering. So, this is not communicating sequential processes in the strict sense of Hoare [21], where there is a static collection of processes with necessarily synchronous end-to-end communication. The software model, of communicating processes, is used by TMAP to present the programmer with a virtual machine, thus relieving the programmer from needing to know about the specific architectural details of our configuration.
An application program is described to TMAP as a graph, where each node is a process, and the labeled edges are the channels of communication that the programmer may reference in the application source code, for inter-process communication. A guiding light for the development of TMAP was to obtain coordinate-free communication without loss of efficiency in the run-time code. Frequently, after mapping a process graph to processors, channels end up being of three types.

• Local - The 2 processes are on the same processor.

• Adjacent - The 2 processes are on processors having a direct physical communication link.

• Distant - Communication between the 2 processes involves hopping through intermediate nodes.

Distant communication is expensive using Trollius, orders of magnitude greater than with channels of the first two types. Yet, even in these two cases, it is difficult to take advantage, in Trollius, of the efficient low level communication. The network information needed to do this is not immediately known to the programmer, but is known only after the mapping is completed. Given the immense gain possible for using low level communication, it was important to develop a system that would, almost completely, take care of the bookkeeping to allow the programmer to easily make use of the efficient communication offered by the hardware. This has been accomplished by TMAP. The virtual machine, as seen by the application programmer, is a black box on which a set of communicating processes may execute. These processes constitute the parallel program. It is important to note that this software model is at a sufficiently low level that TMAP is able to take full advantage of the most efficient communication supported by the hardware. Moreover, this is accomplished through high level send/receive library routines, which are easy to use.
This approach to programming allows applications to be

• Scalable - A single parameter could increase the size of the process graph,

• Coordinate free - Inter-process communication uses only the channel names occurring in the process graph,

• Topology independent - An application need not be recompiled, even if the network hardware interconnections are changed, and

• Portable - Once TMAP is ported to another platform, previously written applications for TMAP are able to run at the new site.

To keep the scope manageable, TMAP does not deal with the routing of distant channels, other than by using the Trollius router. With the advent of the T9000 transputer [3] (formerly known as the H1), even distant routing will be hardware supported, although placement will still be important. Despite this aversion towards distant communication, broadcasting, which is not implemented in this version of TMAP, may be naturally added to the system; cf §9.2.1. A broadcast occurs when one process sends the same message to a group of processes, often dispersed throughout the network. The TMAP system design is detailed in Chapter 4. A complete TMAP session allows a user to compile, link and load the code which implements a parallel program, and consists of the sequential execution of 7 modules. As input to the first module, the user describes the logical structure of the application program. The successive modules allocate resources, set up structures to implement the requested channels of communication, and eventually load the user's code onto the processors. Another important property of TMAP is that the programmer is able to specify as much hardware/network information as desired; the system supplies intelligent choices for whatever the programmer omits. This "freedom of information" does not encumber the programmer, and offers maximal flexibility in using the environment. This property was implemented using a specific design methodology.
Each module expects certain text files as input. All of these, except for the Graph Description File (cf §4.4) and the user's source code, are generated by the system, and contain only the minimal information needed by subsequent modules. The user may modify these output files as necessary, before they are used by the system. As an aid in bookkeeping, the system supplies a shell script with which to invoke TMAP with the specified file changes and flags; the user is expected to edit this script.

1.2 Other Work

Several topology-independent parallel programming environments for transputers have heretofore been developed. However, none of these systems may be easily ported to our multicomputer, and none of them automatically configure switchable network connections. Moreover, with TMAP, there is no restriction on the language being used to code the application. Four of these systems are described, briefly, below. We have not had direct experience with these systems; our summaries are based on descriptions that have appeared in the literature. The popular system Express [5] is not elaborated on here, as it is not topology-independent - the user routes messages by specifying the physical node numbers.

Helios

Helios [9] is actually an interactive distributed operating system, implementing a UNIX type interface that may run on the nodes. It allocates resources for a user program according to what is currently available in the network; however, the network is assumed to be static, i.e. the transputer interconnections are not switchable. The situation is reversed in TMAP - resource allocation is done statically (cf §4.4), but the transputer interconnections are set to best accommodate the user's requested communication configuration using the programmable switches (cf §4.2). Helios uses its own C compiler/linker.

A Neural Network Modeller

The system [6] is designed to model modular neural networks on a transputer network.
It is based on CSTools ([8]), where standard C or Fortran may be used by the programmer to implement parallel processes. The Compiler step translates the user's description of the neural network, to be used as input for the next module, the Splitter, which decomposes the problem. The third and last part is the Simulator, which loads the user's code, as directed by the Splitter. The system is claimed to be quite flexible, and should be able to handle problems which are not in the neural network domain. I could not ascertain whether the system deals with switchable transputer interconnections.

MARC

The system [4] accepts only application programs written in occam [10]. This language is a somewhat higher level replacement for the difficult RISC assembler of the transputers. It is limited as a production language, as it has a small selection of data types, no pointers, no stack, and no support for recursion or dynamic process creation; it does have built-in message passing and synchronization routines, which actually guided the design of the transputer, which, therefore, is able to efficiently implement the occam constructs. By using occam, the user's code is automatically decomposed into a fine set of communicating processes - indeed, an individual instruction may actually be a process. However, the placement of these processes into the network is left to the user (or to the environment in which the user is working). Moreover, I believe that at this stage of our understanding of the mapping problem, such a fine-grained description of the user's program is not sufficiently useful to offset the work required to place them into the network. MARC uses the transputers themselves, in a distributed fashion, to come up with a mapping for the processes, using a method based on diffusion. I believe this to be a good idea. Our network at UBC has variable connections; it would be interesting to devise such a method on our hardware.
In TMAP, all the work, except for actually running the user's code, is done by our host, a SUN SPARC station. In order to avoid deadlock, MARC considers only those process graphs that admit an Eulerian path. Messages are routed, bidirectionally, along the path. A short cut may occur if a node would be visited twice. In TMAP, the routing of channels which are between processes that are neither on the same nor neighbouring nodes is handled by Trollius. Although this is not guaranteed to be deadlock-free, this aspect has not been a problem for us. As in Helios, the system does not automatically configure the network to best accommodate the user program - the wiring must be done beforehand.

Shea

The system described in Shea [7] configures occam programs to run on a transputer network. The user is expected to contract (or partition) the processes to number no more than the number of available processors, and the system then places the processes into the network using a greedy algorithm. The system is centred about a routing mechanism which eliminates deadlock and maintains synchrony of all channels, even those that are dilated.

Chapter 2

Hardware and Maintenance Software

Our multicomputer system consists of various hardware components. These are managed by low level interface routines, by which all other software may access and control the various components. Section 2.1 describes the hardware in some detail. This sets the stage for describing the maintenance software in §2.2. Our numbering system is quite simple: if there are N pieces of a particular hardware type, they are numbered 0 ... N-1.

2.1 Hardware

The heart of our network consists of 74 INMOS transputers (cf [14]). Each is a 32 bit CPU (model T800) with either 1 or 2 Meg of RAM, and 4 data links for internode communication.
There are also 10 C004 programmable 32 x 32 crossbar switches, each having 32 data links and 1 control link (cf [14]). Pairs of data links on a C004 may be connected by sending appropriate software commands to the control link. Most of the transputer links in our system are connected to data links of the crossbars. In this way, transputer link interconnections may be varied through software commands, as requested by a user.

Port  Function  Location             Connection
0     Data      CSA board 0, link 0  Transputer 0, link 1
1     Data      CSA board 0, link 1  Transputer 2, link 1
2     Data      CSA board 0, link 2  Transputer 4, link 1
3     Data      CSA board 0, link 3  Transputer 6, link 1
4     Not used  CSA board 0, link 4  -
5     Not used  CSA board 0, link 5  -
6     Data      B011 board           Transputer 73, link 0
7     System    CSA board 1, link 0  Crossbar 0, link 30
8     System    CSA board 1, link 1  Crossbar 0, control link
9     Data      CSA board 1, link 2  Crossbar 1, link 18
10    Data      CSA board 1, link 3  Crossbar 1, link 19
11    Not used  CSA board 1, link 4  -
12    Not used  CSA board 1, link 5  -

Table 2.1: Ports to access the network

                       Transputers
Component  Port  Size  Old box  New box
0          0     18    0 1      9 ... 24
1          1     18    2 3      25 ... 40
2          2     18    4 5      41 ... 56
3          3     19    6 7 8    57 ... 72
4          6     1     73, on the B011 board

Table 2.2: Reset Components

The host machine through which the network is accessed is a SUN SPARC workstation, having 13 ports. Each functioning port is connected to a data link of either a transputer or a crossbar. Two ports, 7 and 8, are used exclusively for setting the crossbar switches. We have experienced problems with 4 of the ports, so we do not use them. The remaining ports may be used by application programs to implement communication between the transputers and our host. Table 2.1 contains the details. There are two types of interface boards - the B011 board of INMOS ([15]), and 2 boards from Computer System Architects ([16]).
Aside from the host, there are 2 other physical locations which house our network. The old box contains 9 transputers and 2 crossbar switches, and the new box contains 64 transputers and 8 crossbars. The network is divided into 5 reset components, as summarized in Table 2.2. Each component has a root transputer that is wired directly to a port. The remaining transputers in a component are arranged in a linear reset chain starting with the root; when the port is reset, all the transputers in the corresponding component are reset, and are ready to be loaded for the next session. In this way, our system supports up to 5 concurrent users, who have complete control over the transputers allocated to them. A user may be allocated as many components as desired.

2.2 System Software

2.2.1 Resources

Our goal for system management is to allow simultaneous access of the multicomputer by several users. We do not run a multiuser operating system on each transputer, so we decided that each user would "own" a certain subset of the transputers, and the system would guarantee that users be protected from one another. Each piece of network hardware has a number of hard links which may be connected to other links. To implement our management policy, we noticed that it suffices to view each link as the unit of resource allocation. In a natural way, the link allocation strategy varies according to the piece of hardware being discussed, as explained in the following paragraphs. On a transputer, a user gets either all the links, or none of them; this is implemented by allocating the transputer - the links are allocated implicitly. Actually, transputer allocation is even coarser; the unit is an entire reset component. As an example of the irregularity in our network, there is 1 transputer, number 8, for which link 3 is unavailable.
It is connected to a specialized circuit that implements a real-time monitor, to be used with the TMON system [19]. On our host, only data ports are allocated, and this is done individually. Even if one user gets all the ports, other users may still access the host, but they will be unable to access the transputers. On a crossbar, 3 data links (and the control link) are reserved for system use, for the setting of the switches. This is explained in detail in Chapter 3. The remaining 29 data links may be allocated to users. Some data links on some crossbars are wired to data links on other crossbars. This was done to enrich the set of possible interconnection networks. These inter-crossbar wires are treated as a unit for allocation purposes - a user is allocated either both links or neither link.

2.2.2 Hardware Database

As can be seen from the hardware description, the system and interconnections are complex and irregular. All of the link-to-link connections are maintained, hierarchically, in a text hardware configuration file. The information is converted by lex and yacc into a binary form, which is made available to system maintenance routines. Note that in many systems, the software often depends on some regular link to link connections. For example, link 0 of each transputer is connected to link 1 of the next, in order to form a chain. This regular use of links is not present in our system, nor should it be.

2.2.3 Ports

Simultaneous access of a particular port by more than 1 process is prevented through the use of lockfiles, there being 1 for each port. The standard functions of reading and writing with timeout, and resetting are provided.

2.2.4 Crossbars

The 10 crossbars described in §2.1 are managed, at the lowest level, by the software described in Chapter 3. In addition, each of links 0 ... 28 may be locked by a user (the other links are reserved for system use). The status of these locks, for all the crossbars, is maintained in a single lockfile.

2.2.5 Session Setup

The first step in setting up a prospective user's session is locking the required resources. If any such resource is unavailable, i.e. locked by some other user, all previous locks, if any, obtained by the prospective user are released, and the user must restart the session. This simple-minded strategy for resource management does make deadlock very unlikely. The possibility for deadlock still exists, since there is no a priori order in which to request the ports; however, for deadlock to occur, the sequence of restarts for each user would have to be carefully timed to lock a resource the other has not yet obtained. Not surprisingly, deadlock has never occurred. Moreover, even the remote possibility of deadlock may be eliminated, as was done for the crossbar links, by using a second lockfile for the ports.

Chapter 3

Crossbar Control

This chapter describes our current method of setting the crossbar switches. The method was chosen for 2 reasons. Each crossbar is treated the same way, so there is simplicity in the symmetry. More importantly, it minimized the number of wires necessary to cross between the old and new boxes of our network (cf §2.1); at that time, this seemed like an important criterion. The major disadvantage of this method is that 3 data links on each crossbar are unavailable to users for the forming of networks. Another theoretical problem is that the algorithm is exponential in time with respect to the number of switches being used; in practice, the numbers are small, and the time appears to be instantaneous. The method is to arrange the switches into a linear control chain, as described below. The algorithm is optimal, based on a complexity analysis, and was implemented using the crossbar switches described in §2.1.
In a larger system, this method could be used in conjunction with the method of "fanning out" from a single switch to the control links of other switches (at most 31 in a single fan). Fanning out is done in constant time, but requires a dedicated switch and probably several long wires. The discussion that follows relates to our system by taking N to be 9, port A as 8, and port B as 9. The specific connections are described in Table 2.1.

3.1 Introduction

The N+1 switches are labeled C_0 ... C_N. Each has 32 data links, labeled 0 ... 31, and 1 control link, denoted as 'L'. Any 2 data links may be connected to each other, by sending the appropriate command to the crossbar through the control link. In this way, up to 16 bidirectional connections may be formed. Link number i on switch C_n is denoted C_n.i. Three data links, say, 0, 1 and 2, are chosen, and together with the control link, are used solely to configure the switches in a chained fashion. For each switch, except the last one, link 2 is wired to the control link of the next switch. Port A is wired to C_0.L, and port B is wired to C_0.1. For each i from 0 ... (N-2), C_i.0 is wired to C_{i+1}.1. This configuration is illustrated in Figure 3.1. A switch is said to be controlled if its control link is communicating with either port A or port B, while using only links 0, 1 and 2 of any of the switches to attain the communication path. Due to the connection of port A, switch C_0 is always under immediate control of the host. For n > 0, switch C_n is controlled when port B is connected, via crossbar connections, to C_n.L. There is only 1 way for this to be the case; the path to control C_3 is shown with double arrows in Figure 3.1.

[Figure 3.1: The crossbar control chain]
The remaining links of each switch are used to configure the multiprocessor system; one must get control of each switch, and issue it whatever extra commands are necessary to effect the network configuration. In §3.2, an algorithm is described for doing this. In §3.3, this algorithm is shown to be optimal, under the constraints of how the switches are hardwired together.

3.2 Description of the algorithm

The main algorithm is described in terms of Procedures 3.1 and 3.2, which call each other recursively. It will be seen in §3.3 that Procedure 3.3 requires 2^N - 1 switch connections, and that this is optimal; of course, this number does not include the extra network configuration commands that are issued to each switch. The main algorithm could be implemented with a simple recursive function, but then it would be awkward to issue each switch its network configuration commands.

Procedure 3.1 (Get_Control_Absolute)
No assumption is made about the current switch settings.
Input: n - The switch number to get control of.
Output: port - The port number to use to send commands to the switch.
begin
  For i = 0 to n do
    Get_Control_Next(i, port)
end

Procedure 3.2 (Get_Control_Next)
The previous switch is assumed to be controlled, except, of course, if n = 0.
Input: n - The switch number to get control of.
Output: port - The port number to use to send commands to the switch.
begin
  If n = 0, set port = A and return.
  Set port = B.
  Connect links 1 and 2 in switch n-1.
  For i = n-2 down to 0 do
  begin
    Get_Control_Absolute(i, port)
    Connect links 0 and 1 in switch i.
  end
end

Procedure 3.3 (Configure_Switches)
Each switch is controlled, in turn, and issued its configuration commands.
Input: SwitchSettings - A file containing the switch settings to configure the network.
begin
    For i = 0 to N do
    begin
        Get_Control_Next(i, port)
        Use the data in SwitchSettings to issue the network configuration commands to this switch. Specifically, switch number 'i' is issued its commands through port number 'port'.
    end
end

Figure 3.2: The 3 switch setting procedures.

3.3 Derivation of the algorithm

Let u_n be the number of switch connections that are needed to guarantee getting control of switch C_n; this is the complexity of Procedure 3.1. As already noted, u_0 = 0. For n > 0, the previous switch must be controlled and issued 1 command to connect links 1 and 2, as depicted in Figure 3.1. Thereafter, all the previous switches must be controlled, in decreasing order, and issued 1 command to connect links 0 and 1. Thus, the following recurrence relation holds.

    u_n = (u_{n-1} + 1) + (u_{n-2} + 1) + ... + (u_0 + 1)    (3.1)

From this follows u_{n+1} = u_n + 1 + u_n = 2u_n + 1. The solution to this equation is, for all n ≥ 0,

    u_n = 2^n - 1 .    (3.2)

Equation 3.2 expresses the values that an algorithm would like to achieve. Such an algorithm is described with the help of an intermediate sequence. Let v_n be the number of switch connections necessary to get control of C_n, given that C_{n-1} is controlled; this is the complexity of Procedure 3.2. Of course, v_0 = 0. For n > 0, the relation is

    v_n = 1 + Σ_{i=0}^{n-2} (u_i + 1)
        = 1 + Σ_{i=0}^{n-2} 2^i = 2^{n-1} .    (3.3)

A consequence of Equation 3.3 is that

    Σ_{i=0}^{N} v_i = 2^N - 1 = u_N .

This last equation shows that Procedure 3.3 has the claimed complexity, and that this is the best lower bound, u_N.

Chapter 4 System Design

The user program is described to TMAP as a logical collection of communicating processes. The user supplies this decomposition directly, or it may be obtained from TRES [18], another project at UBC, currently under development.
This information is what we felt to be the minimal necessary to implement a configured system to efficiently run the user's program. Various graph structures are used throughout TMAP, and are described as they arise in the following discussion.

4.1 The Front end

As mentioned in §1.1, we have integrated Prep-p [12] and Trollius [13] into our system. The first module in Prep-p parses the user's description of the application program. The description is in terms of a process graph. The labeled edges correspond to channels of communication. The second module assigns communication weights to the channels mentioned in the process graph. The third module contracts (or partitions) the process graph to have a sufficiently small number of nodes, i.e. no more than the number of available processors. In this way, Prep-p separates the mapping problem (cf Chapter 1) into 2 stages: contraction followed by an embedding.

The basic system design of TMAP is modeled after Prep-p, e.g. using text files to communicate from module to module. In fact, the first 3 Prep-p modules have been modified to work in TMAP; cf §4.4 and Appendix B. TMAP diverges from Prep-p since the target hardware architecture of Prep-p is the CHiP [22], a synchronous multiprocessor, whose I/O methodology is quite different from our system's.

4.2 User Input

The main user input is a weighted undirected process graph. Each node of this graph is called a virtual node (vnode), in keeping with the Prep-p nomenclature, and represents a process that will be executing on some processor in the network. In addition, each edge has 2 labels, so that each vnode may have its own name for the edges incident to it. In this way, a vnode has a name of its own choosing for each of its neighbours, and this is used to implement the topology-free communication described in Chapter 6; each edge of the graph represents a communication channel. Each vnode is implemented as an executable file.
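The two-label edge scheme can be sketched as follows (a minimal illustration with assumed names; TMAP's actual data structures are not shown here):

```python
# Each undirected edge carries one name per endpoint, so a vnode refers to
# a neighbour by its own chosen channel name (e.g. "next"/"previous").

def add_channel(vgraph, a, name_a, b, name_b, weight):
    vgraph.setdefault(a, {})[name_a] = (b, weight)
    vgraph.setdefault(b, {})[name_b] = (a, weight)

ring = {}
add_channel(ring, 0, "next", 1, "previous", 100)
add_channel(ring, 1, "next", 2, "previous", 100)

assert ring[1]["previous"] == (0, 100)   # vnode 1's name for the edge to vnode 0
assert ring[0]["next"] == (1, 100)       # vnode 0's name for the same edge
```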
The user specifies whether it is to be run on the host or on a transputer. Thus, there are 2 types of vnodes, described using Trollius [13] terminology: ITB ("in the box") and OTB ("outside the box").

This virtual graph (Vgraph) is also weighted, each edge being labeled by an integer estimating the amount of communication that the user expects to occur along that edge. We decided that each edge should be bidirectional, as a transputer link may carry information in either direction, and this tends to simplify the graph.

As mentioned in §1.1, Prep-p is used as a basis for the front end of TMAP. Specifically, we use the first 3 modules of Prep-p to contract a process graph to a specified number of nodes. Changes that we made to Prep-p include specifying the processor type on which a process is to be run, contracting only the set of processes targeted for transputers, allowing the specification of more than 1 communication channel between 2 processes, and redefining how I/O is to be done. These changes were necessary, as the target hardware architecture of Prep-p is so different from our own.

Further information that may be supplied by the user includes the following.

• Communication channels (edges in the process graph). The user may use these channel names in the application source code for inter-process communication. The semantics are described in Appendix A.2. These names are assigned values when the executables are loaded onto the network. This relieves the user from needing to know about the hardware topology and the numbering of the links, and eliminates the need for recompilation when the topology changes. Also, only 1 copy of the executable need be maintained on disk, rather than all the patched copies that get loaded onto the various nodes!

• Number of transputers to use. The contraction module partitions the ITB vnodes into disjoint groups, each being called a real node (rnode).
Each transputer will receive at most 1 rnode, i.e. each transputer will receive the executables associated with at most 1 rnode. The edge weights are used to guide the contraction. The TRES project deals with the question of determining an optimal number of nodes, as well as other parameters such as topology, on which to run a specified set of user processes.

• Number of ports to use. In our system, it is the number of ports that limits the number of concurrent users of our system. For this reason, the default number of ports is one.

• Channel usage (edge weights). These weights are estimates by the user as to how much the communication channels will be used.

It is recommended that the user specify the first 3 of the above 4 categories, especially the first two. The user may also specify how to connect the transputers via the software configurable crossbar switches, how to contract the vnodes to rnodes, and how to place the rnodes onto the network. Some of this is accomplished with the help of intermediate languages. Obviously, most users would be happy to leave this work to the system.

4.3 The Back end

As mentioned in §1.1, a user's program, loaded by TMAP, is designed to run under the Trollius operating system. Trollius follows, at a low level, the same design constraint as TMAP, viz. hiding information while allowing the user to specify as much detail as desired.

Trollius also employs user-supplied/modifiable text files, e.g. the network configuration file (bnail), and this fits well into the TMAP design. However, it is a major chore to write up a bnail file in our system, as the interconnections are, typically, quite irregular. Furthermore, any change to our system, e.g. a minor rewiring, would require any affected bnail files to be edited and changed. The TMAP system automatically creates any bnail files that are needed. Trollius implements network-wide routing for message passing.
Under Trollius, to send a message to another process, the Trollius node number of the destination process must be known to the sending process. When the destination process is on an adjacent node, or on the same node as the sending process, there are more efficient message passing routines to use. However, to make use of them, the programmer must be acutely aware of the interconnection topology being used on the transputers, and of the mapping of processes onto this topology.

With Trollius, if the user chooses the same node numbers to use in various topologies, then the same code may be run without recompilation; however, the routing of messages will probably be inefficient, involving hops across nodes for messages to reach their destinations. On the other hand, if the user processes are loaded onto the nodes in a way which is logically more consistent with the communication patterns of the program, then the code must be recompiled with the correct network information; knowing this information is a burden for the user. The elimination of this problem was a chief motivator in the design of TMAP.

The TMAP project may be viewed as extending the information hiding of Trollius to include not needing to know the Trollius numbers of nodes running specific processes. However, the design of our system is such that it is amenable to modification for use with other low level transputer management systems, such as Logical Systems' ld-net [24]. This is explained in Chapter 9.

Only two functions were added to the Trollius libraries in order to implement the send/receive capabilities using the channels defined in the graph of vnodes. These are described in Chapter 6.

4.4 Outline

TMAP consists of a sequence of modules which, normally, are executed in order.

1. Vgraph module. User input consists of a graph description file (GDF), which is processed largely by the GraphCom module of the Prep-p system.
This GDF specifies the user processes, along with their communication channels, using a convenient graph description language. The source code for a process may reference a structure variable having the same name as the specified channel. At load time, the fields of this structure variable are patched to contain the Trollius number of the destination node and an integer event specifying the particular process on the destination node.

2. Weight module. The user may use the XX language of Poker [23] to describe, for each process, those aspects of the process that the user deems appropriate. This skeleton code is scanned by the CallXX module of Prep-p to generate weights for the edges.

3. Resource module. The resources in question are which transputers and ports to use. The user may have specified nothing about this, or partial information, or completely the desired hardware. Partial information may be simply how many nodes and ports to use, or even specifying some of them. In any case, we use the method of binary integer linear programming to satisfy the outstanding resource requirements; cf §7.1. Our current network is divided into 5 reset components; cf §2.1. Each component is assigned a binary integer variable to represent availability of the component and port. The objective function's main goal is to minimize the number of components being used, as it is the number of components which limits the number of concurrent users of our system.

4. Contract module. The contraction module of Prep-p is used to contract the vnodes to the required number of rnodes, as explained in §4.2.

5. Place module. The embedding of rnodes into the allocated transputers, and the formation of transputer connections, are accomplished by adapting a greedy algorithm of Smedley [20].

6. Setup module. Up to this point, the work done by TMAP has been independent of the low level software being used, in this case Trollius.
The purpose of the Setup module is to minimize the amount of work that the next module, Load, must do. To this end, the Setup module organizes the information that is needed by the Load module. This includes the creation of the Trollius files bnail (network topology) and route (routing).

7. Load module. The resources are locked exclusively for the user. If any of the resources is not available, all locks for this user are released, thus avoiding deadlock, and TMAP is exited; otherwise, the executable files are loaded onto the network.

Chapter 5 Sample Session

The example in this chapter is intended to help the reader form a single picture of the TMAP system, and does not cover all the features offered by TMAP, such as command line parameters that each vnode may receive at load time. Complete documentation may be found in the appendices.

The ITB nodes form a ring. Each experiment consists of passing a message of a specified length around the ring a specified number of times, using one of 3 different types of message passing, and printing how long it took to do this. The 3 levels of communication are described in Appendix A.2. Before each experiment, a control packet, specifying the 3 parameters, is sent to each node. Appendix D has all the code for this example. Here is the GDF that is used:

/* This is a Graph Description File. It is based on the GraphCom
   module of Prep-p. The first processing done on this file is by
   the C preprocessor, cpp, whose output is directed into GraphCom. */

/* Even though Prep-p is able to handle special graph types more
   optimally, this ring is specified as 'irregular', for simplicity. */

irregular {

/* The user may define variables and initialize them. */

/* This first variable is relevant to this example, only. It is the
   length of the ITB ring. To keep things uncomplicated in this
   example, it must have the value at least 3.
*/
length = 10;

/* These 2 variables must always be defined for Prep-p.
   nodemin   = smallest number of a vnode.
   nodecount = total count of vnodes. */
nodemin = 0;
nodecount = length + 1;   /* 1 more for the host node. */

/* These 3 variables are specific to TMAP, and take on default values
   if they are not defined:
     OTBmin     = -1, i.e. all vnodes are ITB.
     TRANScount = 18.   (18 is a good number for the UBC network.)
     PORTcount  = 1. */

/* All vnodes numbered above or equal to OTBmin will be loaded OTB. */
OTBmin = length;

/* At least this number of transputers will be allocated by the
   Resource module. */
TRANScount = length;
}

procedure Host
    nodetype : { i == OTBmin }
    port root : {0}

procedure root
    nodetype : { i == 0 }
    port host : {OTBmin}
    port next : {1}
    port previous : {length-1}

procedure node
    nodetype : { i > 0 && i < length }
    port next : { (i+1) % length }
    port previous : {i-1}

Figure 5.3: The graph of vnodes and channels.

The preceding GDF specifies that the ITB ring has 10 nodes, with each process being connected to its neighbours by channels it knows as "next" and "previous". Each node procedure is a template for 9 vnodes numbered 1 ... 9. The root procedure represents only vnode 0, which has an additional channel called "host" with the sole OTB process, number 10, represented by the procedure Host. Figure 5.3 depicts this arrangement.

The rest of this chapter explains the steps to actually run the program. Ideally, the command tmap is all that would be necessary. Unfortunately, reality sets in.

• We are insisting on a perfect embedding of the communication graph specified in the GDF; the Contract module must be told not to do any contraction.

• The embedding of the "contracted" graph into the hardware is based on heuristics; cf §4.4. In this example, it turns out, the algorithm must be told the mapping of
a few of the nodes, in order to avoid dilations. A dilated channel does not support the 2 lower levels of communication. Even so, most nodes are taken care of by the system, and at no point need one deal with link numbers. This is the only point where the user would need some familiarity with the hardware.

• The Load module must be told that the host process is to be run in the foreground, as it is accepting input from the keyboard. By default, host processes are run in the background.

• For reasons of efficiency, the Load module does not check that an executable to be loaded onto the network is up to date with respect to the user's source code. The user must ensure this and prepare the makefiles before invoking the Load module, in the manner described in Appendix A, paragraph "USER CODE". However, if an executable is not present, the Load module will automatically make it.

The complete command line is

    tmap -C n MapInit=MapInit.1 LoadLst=LoadLst.f

The syntax is described in Appendix A.

Chapter 6 Channels

As explained in §4.2, a program consists of a set of sequential processes, which are running on the various processors. During its execution, each process will be sending information to, and receiving information from, other processes in the network. The two communication routines are ch_send and ch_recv. Their semantics, basic functionality, and 3 levels of communication are described in Appendix A.2. The remainder of this chapter is concerned with the implementation of these 2 routines.

The channel structure consists of 2 integer fields, viz. node and id. The id field encodes whether the channel is local, i.e. whether the 2 ends of the channel are on the same physical node; this is known only after the contraction of the graph of virtual nodes. The first time a channel is used, it may initialize itself; however, the concerns are different for local and non-local channels.
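A rough model of this two-field channel structure can be sketched as follows. The exact bit layout is an assumption for illustration; §6.1 and §6.3 state only that the low 27 bits of id hold the channel id and that 4 bits are reserved for a link number.

```python
# Illustrative packing of the TMAP channel id field (layout assumed,
# not taken from the thesis): bits 0-26 channel id, bits 27-30 link
# number, bit 31 locality flag.
CHAN_ID_BITS = 27
CHAN_ID_MASK = (1 << CHAN_ID_BITS) - 1

def pack_id(chan_id, link=0, local=False):
    return (int(local) << 31) | (link << CHAN_ID_BITS) | (chan_id & CHAN_ID_MASK)

assert pack_id(10) & CHAN_ID_MASK == 10
assert (pack_id(10, link=3) >> CHAN_ID_BITS) & 0xF == 3
assert pack_id(10, local=True) >> 31 == 1
```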
6.1 Events, Virtual circuits and ID's

Trollius uses an integer called an event to distinguish receivers on the same physical node; thus, 2 integers are used for the routing of messages: the node number of the destination, and an event.

Trollius does offer an optimized level of communication, called a virtual circuit. The increased efficiency results from fewer inter-process calls being made. Normally, on each node, the link proprietors, the router, the kernel, the buffers and the user must coordinate. When two communicating processes set up a virtual circuit, no other events are allowed to use the hardware links that make up the circuit.

The lower 27 bits of the id field of the TMAP channel structure contain an integer which is called the channel id. These numbers are generated as the GDF is parsed; cf §4.4. The Trollius event which the channel uses for network level sending is 0x7FFFFFFF - id. In this way, TMAP uses only high positive events, allowing the user to use the remaining positive events (negative events are reserved for the Trollius kernel).

Unfortunately, this event cannot be safely used for receiving from a virtual circuit, since Trollius requires that the lower 4 bits of such events, being used by a process, be unique; these 4 bits are used to index into a virtual circuit descriptor table which has 16 slots. Fortunately, the largest degree of any of our nodes is no more than 16: for a transputer it is 4, and for our host it is currently 7 (the number of ports available to a user). Each link may support only 1 virtual circuit at a time. When the TMAP communication routines are setting up a virtual circuit, the table is searched for a high-numbered available slot, say number i. The event which is used for receiving on the virtual circuit would then be 0x7FFFFFF0 + i, corresponding to a channel id of 0xF - i. For this reason, the assignment of channel id's begins at a number greater than 0, currently 10.
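The event arithmetic above can be checked with a few lines (an illustrative model, not TMAP code):

```python
# Network-level send event for a channel, and the receive event for a
# virtual circuit using slot i of the 16-entry descriptor table.
MAX_EVENT = 0x7FFFFFFF

def net_event(chan_id):
    return MAX_EVENT - chan_id        # high positive events only

def vc_recv_event(slot):
    return net_event(0xF - slot)      # equals 0x7FFFFFF0 + slot

# The low 4 bits of the receive event identify the slot uniquely.
assert all((vc_recv_event(s) & 0xF) == s for s in range(16))
assert net_event(10) == 0x7FFFFFF5
```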
6.2 Local Channels

The purpose of initializing a local channel is to synchronize the two ends for communication that would bypass Trollius services. Thereafter, the user would be able to take advantage of this option by setting the flag parameter in the communication routines.

For local channels, only 2 levels of communication are implemented. The Trollius virtual circuit is not used; if requested, TMAP uses the hardware level instead. The main reason for this design decision was to not take up valuable slots in the Trollius virtual circuit descriptor table. This table must be used to effect low-level communication for the non-local channels, as described in the next section.

Initialization is relevant to ITB nodes only. A local OTB channel never initializes itself. This decision was taken for this first release of TMAP, since local OTB channels were not deemed sufficiently important. A future version of TMAP may use sockets or shared memory to implement "hardware" communication for local OTB channels.

The following initialization procedure is invisible to the user, and is handled entirely by the TMAP communication routines. When an ITB local channel is first used, the sender tags on to the message the address of the node field, and this address is placed into the node field at the receiver's end. The sender's node field may now be used for synchronization using the in/out machine instructions of the transputer. The sender's id field is coded to indicate that the node field is the channel synchronization word, while the receiver's id field is coded to indicate that the node field contains the address of the channel synchronization word.

6.3 Non-local Channels

Typically, to route a message at the network level, Trollius sends a message to the router daemon, which responds with the hardware link to use. Each process streamlines this protocol by caching a table of links for intended destinations.
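The link-caching idea can be sketched like this (hypothetical helper names; not the Trollius API):

```python
# Consult the router daemon only on a cache miss; afterwards the cached
# link number is used directly.
def route_link(dest, cache, ask_router):
    if dest not in cache:
        cache[dest] = ask_router(dest)
    return cache[dest]

calls = []
def fake_router(dest):
    calls.append(dest)
    return dest % 4        # a transputer has hardware links 0..3

cache = {}
assert route_link(7, cache, fake_router) == 3
assert route_link(7, cache, fake_router) == 3
assert calls == [7]        # the router daemon was consulted only once
```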
The purpose of initializing a non-local channel is to store, in the id field, the link number which is being used to route the channel to the other end. Once this is done, the communication routines may use the datalink layer of Trollius, rather than the slower network layer. On a transputer, the link number is between 0 and 3; on the host, the link number is between 0 and 10, these being the port numbers (cf §2.1). Four bits of the id field are reserved for this.

If the channel is not dilated, i.e. the 2 ends are on adjacent nodes (this is known after the contracted graph has been placed into the hardware), the link number is patched into the channel structure as the executable is being loaded onto the network. This is accomplished by the TMAP front end to the Trollius loader. For a dilated channel, initialization is done at run-time, by making a call to the Trollius router. This is done automatically by the TMAP communication routines.

TMAP does not support hardware communication over a dilated channel, and Trollius recommends that a virtual circuit not be used over a dilated channel, because of the likelihood of deadlock. So, the following discussion, on the 2 lower levels of communication, refers to channels between nearest neighbours.

The first step in setting up a hardware communication is the obtaining of a virtual circuit. The user initiates this by setting the appropriate bit of the flags parameter to the TMAP communication routines. The consequent flurry of activity, outlined below, to implement this request is invisible to the user. The sender obtains an event for receiving along the virtual circuit as explained in §6.1, and tags it along with the message. The receiver will use this event for sending along the virtual circuit. It is at this time that the receiver sends the sender an event that the sender should use for sending along the virtual circuit!
This convoluted logic occurs only during the setting up and dismantling of a virtual circuit. Once established, the user may choose to efficiently use the virtual circuit or the hardware, by setting the appropriate bit in the flag parameter of the communication routines, assuming that both ends are ITB; if one end is OTB, hardware communication is not an option. Chapter 9 contains the results of experiments comparing the throughputs of the 3 levels of communication.

6.4 Channel Properties

A channel may be shared amongst different processes. The first rule for doing this is that their channel addresses must point to the same physical channel structure. This is not a problem in the linear address space of a transputer. Secondly, if hardware communication is being used, only 1 process should be using the channel; the transputer hardware does not deal with 2 successive in()'s in a way that is consistent with channel usage.¹ On our host, a UNIX environment is used, and processes do not share the same address space. In this case, for example, a parent may pass a child the entire channel structure; however, thereafter, only 1 of these processes should be using the channel.

To date, programs have used only the channel graph set up at load time. However, channels may be dynamically created, destroyed and rerouted. This would require a high degree of cooperation amongst the processes comprising a program, and a lot of architectural details of the network would need to be known to the processes; for neither of these aspects would the current version of TMAP offer much help, nor does TMAP support dynamic switching of the crossbar links to better support a different topology of the communication channels. Section 9.2.3 discusses the topic of phases with respect to TMAP.

The blocking behaviour of channel communication is inherited from Trollius, which provides flexibility with optional buffering.
Generally, at the network level of communication, the sender does not block, but the receiver does block. With hardware communication, both sender and receiver block, according to the transputer architecture.

In the occam language, a channel has the same blocking behaviour as hardware communication in TMAP. In occam, a channel is not a variable, and cannot be shared amongst different processes; there is no notion of a channel address. These restrictions place a heavier burden on the occam programmer, in the same way that assembly programming is more burdensome than even a moderate language such as C.

¹The first in() would be treated as an out()! The functions in() and out() are machine instructions which implement message passing.

Chapter 7 Resource & Place Modules

The 3 modules Resource, Contract and Place do virtually all the work of choosing resources with which to implement the user's program; cf §4.4. The Prep-p module, Contract, is described in Appendix B.3.

7.1 Resource module

The Resource module reads the file SysInfo, described in Appendix C.1.1, and determines the number of transputers and ports that remain to be chosen; in that file, the user may have specified some of this hardware explicitly. The outstanding resource requirements, if any, are satisfied using the method of binary integer linear programming. The particular package that we use is XMIP83 [25]. The output of the Resource module is the file Resources, which is described in Appendix C.1.5. The reader is referred to Chapter 2 for those architectural details of our system that this section assumes.

7.1.1 Problem formulation

There are 7 data ports available to users. Five of them are associated to the 5 components of our system, and are represented by the variables c_0 ... c_4, corresponding to components 0 ... 4. The remaining 2 data ports are represented by variables p_0 and p_1, corresponding to ports 9 and 10.
Each of these 7 variables may take on only the values 0 and 1. Initially, a value of 0 for a variable indicates that the user has not explicitly requested that resource; a value of 1 indicates otherwise. After allocation, the values of these variables indicate which resources the user will have allocated for implementing the program.

Since this problem is small (only 2^7 = 128 possible solutions), an exhaustive search for the "best" solution is reasonable (a precise formulation of "best" is given in §7.1.3). However, we give a more general approach by which larger problems may also be handled. We will see that minimizing the cost function 7.8 subject to the constraints 7.4 solves our resource allocation problem.

7.1.2 Constraints

We define 3 functions of the 7 variables defined in the previous section.

    A_e = p_0 + p_1                                (the number of extra ports allocated)
    A_t = 18c_0 + 18c_1 + 18c_2 + 19c_3 + c_4      (the number of transputers allocated)
    A_c = c_0 + c_1 + c_2 + c_3 + c_4              (the number of components allocated)

Let N_p, N_t and N_c, respectively, be the minimum number of ports, transputers and components requested by the user, as specified in the file SysInfo. There are 3 constraints:

    A_c + A_e = c_0 + c_1 + c_2 + c_3 + c_4 + p_0 + p_1 ≥ N_p
    A_t = 18c_0 + 18c_1 + 18c_2 + 19c_3 + c_4 ≥ N_t            (7.4)
    A_c = c_0 + c_1 + c_2 + c_3 + c_4 ≥ N_c

7.1.3 Cost function

It is most important to minimize the number of components being used, A_c, as this determines the number of concurrent users of our system. Next in importance is to minimize the number of transputers, A_t, and finally, to minimize the number of extra ports, A_e. Rather than having 3 successive minimization problems, a single problem is constructed.

Lemma 1 Let f and g be real valued functions on a finite set S. Then there is a constant, K, such that if the function Kf + g has a minimum value at any point s ∈ S, then f has a minimum at the same point s.
The proof of this lemma consists of the derivation of Inequality 7.5, which gives a range for K. The value K satisfies the lemma if

    (for all s, t ∈ S)  f(t) > f(s)  implies  Kf(t) + g(t) > Kf(s) + g(s)

The right side of the above implication may be manipulated to give the following inequality (without the "max"), which gives a range of K that satisfies the lemma.

    K > max over f(t) > f(s) of  [ g(s) - g(t) ] / [ f(t) - f(s) ]    (7.5)

The cost function is constructed by first choosing a constant K_1, according to the lemma, for the function

    h = K_1 A_c + A_t .    (7.6)

Then choose a constant K_2 for the function

    COST = K_2 h + A_e .    (7.7)

We will see in the next section that we may take K_1 = 1 and K_2 = 3, in which case the cost function is

    COST = 3(A_c + A_t) + A_e
         = 3(19c_0 + 19c_1 + 19c_2 + 20c_3 + 2c_4) + p_0 + p_1    (7.8)
         = 57c_0 + 57c_1 + 57c_2 + 60c_3 + 6c_4 + p_0 + p_1 .

7.1.4 The constants

Columns 2 and 3 of Table 7.3 show the minimum and maximum number of transputers in the number of components specified in column 1.

    A_c | min A_t | max A_t
     0  |    0    |    0
     1  |    1    |   19
     2  |   19    |   37
     3  |   37    |   55
     4  |   55    |   73
     5  |   74    |   74

Table 7.3: For the constant K_2 of the cost function.

To calculate the constant K_1, we see from Table 7.3 that the largest value of the numerator in Inequality 7.5 is 0 (here, f = A_c and g = A_t); so K_1 may be any value larger than 0, say 1, to keep the function h of Equation 7.6 integral.

We apply the lemma to the function of Formula 7.7 to calculate K_2. Since the function A_e takes on values between 0 and 2, the numerator in Inequality 7.5 is bounded by 2; so K_2 may be any number larger than 2, say 3. We have used the fact that the denominator of Inequality 7.5 is at least 1, as it is an integer, being the difference of values of the function h.

7.2 Place module

The result of the Contract module is the graph of rnodes; cf §4.2. The Place module embeds this graph into the hardware specified in the file Resources.
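Stepping back to the allocation problem of §7.1: it is small enough (128 assignments) to check by brute force that minimizing COST respects the stated priorities. A sketch (illustrative only; the actual system uses the XMIP83 package):

```python
# Exhaustive search over the 2^7 binary assignments of §7.1.1.
# Component sizes (transputers per reset component) are from A_t.
from itertools import product

SIZES = [18, 18, 18, 19, 1]

def cost(c, p):
    Ac = sum(c)
    At = sum(s * ci for s, ci in zip(SIZES, c))
    return 3 * (Ac + At) + sum(p)       # Equation 7.8

def allocate(Np, Nt, Nc):
    best = None
    for c in product((0, 1), repeat=5):
        for p in product((0, 1), repeat=2):
            Ac = sum(c)
            At = sum(s * ci for s, ci in zip(SIZES, c))
            # Constraints 7.4
            if Ac + sum(p) >= Np and At >= Nt and Ac >= Nc:
                if best is None or cost(c, p) < cost(*best):
                    best = (c, p)
    return best

# Asking for 10 transputers allocates a single 18-transputer component.
c, p = allocate(Np=1, Nt=10, Nc=1)
assert sum(c) == 1
assert sum(s * ci for s, ci in zip(SIZES, c)) == 18
```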
This module is the weakest in terms of being user-friendly. This is not surprising, as the general mapping problem is intractable; the leading paragraphs of Chapter 1 discuss this briefly. Most TMAP sessions benefit from telling the system where to map some of the rnodes. An example of this is in the sample session of Chapter 5. However, we have not, to date, ever had to tell TMAP which link numbers of the transputers to use, although this could be done using a substitution on the file LkData, which is created by the Place module; cf Appendix A.1.

The actual embedding is done by adapting a greedy algorithm of Smedley [20], a former Masters student of A. Wagner; the latter is also the thesis advisor of the author, and did the coding for the adaptation.

7.2.1 Outline

This section gives a brief description of the Place module. First, RAM structures are initialized from the hardware database; cf §2.2. Next, the available hardware is obtained from the file Resources, and various heuristic parameters to guide the placement algorithm are calculated. These are briefly discussed in the description of the file HostMap in Appendix A.1, paragraph "SYSTEM FILES".

The graph of rnodes may have multiple edges between 2 given nodes. The first stage of the embedding algorithm begins by identifying multiedges as a single edge, and proceeds to embed this simpler graph into the hardware. An edge is mapped only if a direct hardware connection may be formed to carry the edge; dilations are not allowed. A dilation would occur if the edge would have to map to a path in the transputer network that contains transputers other than the 2 endpoints.

The second stage expands the mapped edges back to multiedges, and tries to further map these into the hardware, spreading multiedges over available links, where possible, and multiplexing over existing links, if necessary.
7.2.2 Discussion

This 2 stage approach was taken to increase the chance of avoiding unmapped edges (because of would-be dilations); the preference was to have multiplexing. However, examples have arisen where a "very" greedy algorithm would be more helpful, i.e. to do the multiplexing on the fly. This tends to occur when the user specifies not to contract the graph of vnodes, and wants separate transputer links to implement multiple channels between processes. TMAP does not currently support the option of doing the mapping in this "very" greedy way.

The handling of dilated edges is more in the domain of "routing" than "mapping", although the 2 problems are obviously closely linked. This version of TMAP does not do routing. Unmapped edges are routed at run-time using the Trollius router.

Chapter 8

Setup & Load Modules

The Setup and Load modules are the only modules which are specific to the low-level software being used, in this case, Trollius; cf §4.4. Appendix A.1 discusses all the files mentioned in this chapter.

8.1 Setup module

The primary purpose of the Setup module is to prepare for the Load module, stopping short of allocating resources and loading executables. This allows for the fastest repeated invocation of the Load module, assuming no changes are being made that require the attention of previous modules.

The file LkData is used to create the file bnail. . This is the Trollius file which details, amongst other things, the transputers and ports being used, and their link-to-link interconnections. The Trollius numbering for the nodes is chosen here. In this implementation of TMAP, the Trollius number of a node is the same as the physical number; cf Appendix C.1.6. The bnail syntax does allow the user to choose a different numbering; perhaps it would be helpful to use the numbering of the modes (the output of the Contract module). This question was deemed unimportant for this first version
of TMAP, as users are expected not to need to know about Trollius node numbers; in reality, they are helpful for debugging and accessing Trollius support routines, e.g. state.

The file ProcMap is created from various previous files. Each line specifies the Trollius node number to which the executable of a vnode is to be mapped.

The file ChanLks is created; cf Appendix C.1.7. This file is the most helpful in determining, by inspection, whether TMAP has done a good enough job, with respect to dilations, in mapping the vgraph into the hardware.

8.2 Load module

The main job of the Load module is to patch the channel structures in each executable, before invoking the Trollius loader. The Load module accesses the files ProcMap, ChanLks, ChanVIds, ProcLst.sys, LoadLst and ParamLst in order to load the user's executables, in increasing order of the vnode number.

A future version of TMAP should allow the user to specify which executables to load, and the order in which to load them. Specifying the order would make some start-up logic for a program easier: a program cannot claim links for a virtual circuit (cf Chapter 6) until they are no longer needed by Trollius for loading subsequent executables. Selective loading would allow a user, for example, to reload a process.

Chapter 9

Conclusions

The TMAP system is easy to use, both by the application programmer wishing to design, code and run a parallel program, and by the system developer wishing to enhance and expand the functionality offered by TMAP. The following two sections illustrate these properties. We emphasize here that a major improvement enjoyed by TMAP over Trollius is that application programs are topology-independent and, at the same time, they take advantage of efficient communication routines. Moreover, if the hardware topology changes, a program's source code need not be recompiled and no changes need be made to the Trollius router.
9.1 Experimental Results

The 2 sets of results described here are concerned with communication times within a ring. Comparisons are made of the times obtained when using the 3 different levels of communication; cf Appendix A.2. The basic functionality to achieve the apparent speed-up is already present in Trollius. The significance with respect to using TMAP is that the choice of communication level is easy to implement, both at run-time and when writing the application code; cf §4.3.

9.1.1 Spring

In this example, a round of communication consists, mostly, of all neighbours exchanging some information. The code was supplied by Russel Wvong. David Feldcamp added an X Windows interface, and I modified the communication calls to be used under TMAP. The ring has 18 nodes. Table 9.4 shows the times for the ITB nodes to complete their message passing.

    ITB flag | Network | Virtual | Hardware
    ITB time |  15.1   |   5.7   |   4.7

Table 9.4: Spring: ITB communication times (sec).

Table 9.5 was compiled using hardware communication amongst the ITB nodes, and shows the total program elapsed time, including sending the problem parameters to, and receiving the results from, the root transputer. The ITB-OTB communication is seen to be significant in this problem.

    OTB flag   | Network | Virtual
    Total time |  19.0   |   8.2

Table 9.5: Spring: Entire program times (sec).

9.1.2 Ring

This second example consists of messages being passed around a ring of 10 nodes. Each message is sent Count number of times and consists of Length number of bytes. This is the example of §5. Table 9.6 shows some recorded times, including the communication with the host process, and the initial control message.

    Count  | Length  | Network | Virtual | Hardware
    1      | 0x10000 |   1.15  |   0.70  |   0.68
    0x100  | 0x100   |   1.80  |   1.05  |   0.69
    0x1000 | 1       |  18.8   |   6.65  |   0.85

Table 9.6: Ring: Entire program times (sec).
As expected, hardware message passing was fastest, with the most marked differences between the 3 types of communication occurring when many messages are being sent, as the overhead is more burdensome.

9.2 Future Work

9.2.1 TROLLIUS

In Prep-p, each procedure in the graph description file is limited to having at most 8 channels, and this restriction is inherited by TMAP. Thus, as alluded to in §1.1, each process "knows" only a small number of channel names, viz. 8. Nonetheless, TMAP does allow a natural implementation of broadcasting, using the Trollius cast daemon. A broadcast occurs when a process sends a single message to a set of processes, usually covering several, if not all, nodes of the network.

A broadcast would be done by referencing a set of vnodes as the intended recipients. The vnode numbers are defined in the Graph Description File; cf §4.2. To implement this, each physical node of the network would run a TMAP broadcast process that would know about the vnodes in the network. Each such broadcast process would have a channel, invisible to the user, that would be shared amongst the vnodes on that physical node. Additionally, the broadcast process would interact with the Trollius cast daemon.

9.2.2 LD-NET

As mentioned in §4.3, the implementation of TMAP may be modified to work with other low level systems instead of Trollius, e.g. Logical Systems' ld-net. In this case, a process could be modeled as a set of object modules, or as a library. After contraction, the set of libraries mapping to a particular node would be linked into a single executable, as required by ld-net. This executable would also contain a router to implement network-wide message passing. The main() function in the executable would have the responsibility of creating the user's processes and the router process(es).

9.2.3 High level Bookkeeping

All the files mentioned in this section are discussed in Appendix A.1.
A complete TMAP session uses, at a minimum,

• 35 files in the main directory,
• 5 subdirectories, each containing its own files,
• an XX file for various procedures in the GDF, and
• directories containing the user's application code.

In addition, a session often uses

• substitution files created by the user to control the progress of the session (and copies of the standard files), and
• copies of the tm script, specific to particular configurations being used.

Needless to say, this is an overwhelming amount of data to keep track of, not for the system, but for the user who wants to feel that they are in control. A graphical interface is being developed to improve the ease with which information may be displayed for, and altered by, the user.

Phases

The current version of TMAP is static with respect to the communication graph. It is often the case during a program that a communication pattern changes sufficiently, and is constant over a long enough period, that a change is warranted in how TMAP supports the new message patterns. At a software level, this is relatively easy to do: just define channel structures which point to the desired processes. The more difficult aspect is to reconfigure the hardware interconnections to efficiently support the new communication pattern. A project is being discussed as to how to best implement phases in TMAP. It is hoped that Prep-p's usage of phases may be largely applicable.

Bibliography

[1] Matthew A. Wilson and James M. Bower. The simulation of large-scale neural networks. In "Methods in neuronal reasoning, from synapses to networks", Christof Koch and Idan Segev (Ed.), MIT Press, Cambridge MA, 1989, Ch. 9, p291-333.

[2] Massimo A. Sivilotti, Michelle A. Mahouwald and Carver A. Mead. Real-time visual computations using analog CMOS processing arrays. Advanced Research in VLSI: Proceedings of the 1987 Stanford Conference, P. Losleben (Ed.), Cambridge, MA.
MIT Press, p295-312. In "Neurocomputing: Foundations of Research", James A. Anderson and Edward Rosenfeld (Ed.), MIT Press, 1988, p701-711.

[3] Craig Davidson (Pacific Parallel). The HI: A formidable new parallel CPU. In "The Paragraph", Autumn 1990, Quantum Leap Systems, San Diego, California.

[4] Jacques E. Boillat, Nicolas Iselin and Peter G. Kropf. MARC: A tool for automatic configuration of parallel programs. University of Berne, Switzerland. From Transputing '91, Sunnyvale, California.

[5] EXPRESS, Logical Systems C Version. ParaSoft Corporation, Mission Viejo, California, 1988.

[6] T. Tollenaere & G.A. Orban. Transparent Problem Decomposition and Mapping: a CS Tools Based Implementation. Katholieke Universiteit Leuven, Belgium. From Transputing '91, Sunnyvale, California.

[7] Kai Ming Shea. A mechanism for mapping processes onto transputer networks. Masters thesis. Department of Computer Science, University of Hong Kong. July, 1990.

[8] Meiko Scientific "CSTools for SunOS". Meiko Limited, Aztec West, Bristol, 1990.

[9] The Helios Operating System. Perihelion Software Ltd. Prentice Hall, UK, 1989.

[10] OCCAM 2 Reference Manual. INMOS Limited, C. A. R. Hoare (Ed.). Prentice Hall, Hertfordshire, 1988.

[11] Francine Berman. Why is mapping hard for parallel computers? IEEE Parallel/Distributed Computing Networks Seminar, 1990, p89-112.

[12] Francine Berman and Bernd Stramm. Prep-p: evolution and overview. Technical report CS89-158, University of California, San Diego, 1989.

[13] Gregory D. Burns, Andrew K. Pfiffer, David L. Fielding and Alison A. Brown. The trillium operating system. In Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications. ACM Press, January 1988.

[14] Transputer technical notes. INMOS Limited. Prentice Hall, 1989.

[15] IMS B011 Transputer VMEbus board, User manual. INMOS Limited, Bristol, UK. January, 1988.

[16] PART.8, Installation guide and User manual. Computer System Architects, Provo, Utah.
March 1990.

[17] H. Sreekantaswamy, N. Goldstein, A. Wagner and S. Chanson. Resource management in a large reconfigurable transputer-based system. In proc. "Sixth Distributed Memory Computing Conference", OACIS and U. Michigan. Portland, Oregon, April 1991.

[18] A. Wagner, S. Chanson, N. Goldstein, J. Jiang, H. Larsen and H. Sreekantaswamy. TIPS: Transputer-based interactive parallelizing system. In proc. "Transputing '91", World transputer user group. Sunnyvale, Cal., April 1991.

[19] Jie Cheng Jiang. Performance Monitoring in Transputer-based Multicomputer Networks. Masters thesis. Dept. of Computer Science, University of British Columbia. August, 1990.

[20] Garth Smedley. Algorithms for embedding binary trees into hypercubes. Masters thesis. University of British Columbia, 1989.

[21] C. A. R. Hoare. Communicating sequential processes. Communications of the ACM 21, Aug 1978, p666-677.

[22] Lawrence Snyder. Introduction to the configurable, highly parallel computer. IEEE Computer, Jan 1982, p47-56.

[23] Lawrence Snyder. Parallel programming and the Poker programming environment. IEEE Computer, July 1984, p27-36.

[24] Transputer Toolset, version 89.1. Logical Systems, Corvallis, Oregon, 1989.

[25] Jaya Singhal, Roy E. Marsten and Thomas L. Morin. Fixed Order Branch-and-Bound Methods for Mixed-Integer Programming: The ZOOM System. ORSA Journal on Computing, 1, 1989, p44-51.

Appendix A

TMAP Implementation

The first 3 appendices consist of a collection of UNIX style manual pages, to flesh out the details of the TMAP implementation and operation. None of the Trollius manual pages are reproduced here, while some of the more relevant Prep-p pages are included. In the text of the pages, a reference such as "foobar(1)" signifies that there is a manual page for foobar; however, if the information is not TMAP specific, the manual page will not appear in these appendices. Appendix A.
1 describes the user interface to TMAP, when invoking it from the shell prompt, while Appendix A.2 describes the interface that a programmer must use to take advantage of the communication facilities offered by TMAP.

A.1 TMAP

NAME
    tmap - map processes and define a Trollius session.

SYNTAX
    tmap [-c] [-h] [-m <module>] [-f <first>] [-l <last>] [-C n] [-S n]
         [-L [sn]] [GDF] [StdFile=SubstFile ...]

OPTIONS
    -c          only delete most output files of the specified modules. Do
                not do any processing. Substitutions are to be done.

    -h          print a help screen, along with the module names, in the
                order in which they are executed. Nothing else is to be done.

    -m <module> execute only this module; this is equivalent to using the -f
                and -l options with the same module name specified. Module
                names are case insensitive.

    -f <first>  begin execution with the module having this name. The
                default is Vgraph, the 1st TMAP module.

    -l <last>   end execution with the module having this name. The default
                is Load, the last TMAP module.

    -C n        tell the Contract module not to do any contraction. This
                actually works by incrementing the number of nodes to which
                the module is allowed to contract, so the user should have
                specified at least the number of ITB vnodes that occur in
                the GDF. If more is specified, this flag has no effect. If
                fewer is specified, this flag may still be useful,
                considering how the OTB contraction is done; cf
                "OTB CONTRACTION", below.

    -S n        tell the Setup module not to make the Trollius route file.
                This might be preferred if the user anticipates substituting
                the route file for another. In any case, the Load module
                will make the route file if it is older than the bnail file.

    -L [sn]     send instructions to the Load module. The string parameter
                [sn] must not be empty, i.e. either 's' or 'n' or both must
                appear. The 's' forces a spread(1) to be done.
                By default, if TMAP detects that a Trollius session is
                active, it will use that session to load executables onto
                the network, and will not do a spread. If a Trollius session
                is not active, the 's' parameter has no effect. The 'n'
                prevents TMAP from running tm_pf, described in the section
                "ITB PRINTING", below. This only has an effect if TMAP is
                doing a spread. In this case, the user must run tm_pf in
                order to obtain the output from any tmprintf() calls
                occurring in the user's ITB code. Running tm_pf in a
                different window allows the ITB printing to be separated
                from the rest of the session.

    [<GDF>]     is the name of the Graph Description File. It is relevant
                only for the Vgraph module, and defaults to "GDF" if
                omitted.

    [StdFile=SubstFile]
                StdFile is expected to be one of the standard output files
                of the TMAP system. These files are listed in the next
                section, under the heading of its module. After completion
                of the module which produces StdFile, this file is copied to
                StdFile.STD, and the user's specified replacement SubstFile
                is copied to StdFile. Before the first specified module is
                executed, all substitutions of the previous modules are
                done. Any number of substitutions may be specified,
                separated by white space. There is no white space on either
                side of the '='.

DESCRIPTION
    TMAP is an environment in which to design and run a parallel program on
    the transputer-based multicomputer at UBC. The user must choose a
    separate directory for each TMAP environment, as fixed file names are
    used. A complete TMAP session consists of the execution of the following
    7 modules, in the specified order. Each module expects specific text
    files as input, and produces text files as output, to be used as input
    to subsequent modules.
    The substitution commands, described in the previous paragraph, allow
    the user to modify these files and fine tune the execution of TMAP. For
    each module, the output files, on which the substitution commands will
    work, are listed below.

    Vgraph
        The user supplies a Graph Description File (GDF), defined in
        vgraph(1). The GDF describes the abstract processes that comprise
        the user's program; each such process is called a "virtual process",
        ViP or vnode, in accordance with the terminology of prep-p(1).
        ViP's communicate with one another using channels (called "ports"
        in the GDF).

        ProcLst.sys -- specifies the procedure name of each vnode.

        IOAdjLst.sys -- specifies the communication between vnodes.

        The next 2 files reflect the changes to the above 2 files, after
        performing the OTB contraction, described in the section
        "OTB CONTRACTION", below.

        ProcLst(1) --

        IOAdjLst(1) --

        ParamLst(1) -- specifies the command line parameters to be passed to
        each ViP.

        callXX(1) -- compiles the XX files, which are described in the
        section "SPECIAL FILES", below.

        IOLst(1) -- should be empty. It is used by the Contract module. In
        the prep-p system, this specifies external IO ports; this aspect of
        prep-p is not used in the TMAP system.

        CHiPParams(1) -- is expected by the prep-p modules.

        ChanVIds(1) -- assigns a unique integer id to each channel.

        LoadLst(1) -- specifies loading instructions for each ViP.

    Weight
        The 3 output files of the Weight module are produced by the XX(1)
        compiler, using as input only the .x files, there being 1 for each
        OTB-contracted procedure. The numbers in these files do not reflect
        the actual code that will be executed by the TMAP system at Load
        time. (The prep-p system does use this code to execute on its target
        architecture; the object output is suppressed by TMAP.)
        These files are used to guide the modules Contract and Place,
        according to whatever heuristic is used. The user must decide just
        how much information is important to include in the .x source files.

        CodeSize(1) -- specifies the number of bytes of code that would be
        generated for the target architecture of the prep-p system. The user
        may use this to tell TMAP the relative code sizes of the various
        OTB-contracted ViP's.

        Degrees(1) -- specifies the degrees of the OTB-contracted ViP's.
        Channels between OTB processes are ignored until Load time.

        EdgeWts(1) -- specifies the communication weights as seen by each
        end of each channel. Presumably, whatever goes in 1 end of a channel
        comes out the other end. TMAP deals with this inconsistency by
        assigning a single weight to a channel, it being the sum of the 2
        weights at the ends.

    Resource
        The resources requested by the user are specified in the file
        SysInfo(1). The Resource module compares these requests with the
        hardware database to come up with a minimal allocation of resources
        which satisfy all the requests of this TMAP session, and lists these
        suggested resources in the file Resources(1).

        Cost.dat --

        Cost.log --

        Resources(1) --

    Contract
        VipToRp(1) -- This lists the intermediate contraction result
        produced by prep-p, and is used as input to produce the TMAP
        contraction, specified in RpLst, described below.

        RpLst -- This is produced from VipToRp by moving OTBproc to be the
        sole member of a (perhaps new) solitary real node. If this is
        already the case, no change is made to the prep-p contraction. The
        syntax for RpLst is opposite to VipToRp: each line of RpLst consists
        of 2 integer entries. The 1st is the ViP being referenced, and the
        2nd is the real node onto which it is mapped.
        MapInit -- This specifies the initial mapping of real nodes to host
        nodes. Each line of the file specifies the mapping of a real node to
        a physical node, and has the format: <rnum> -> <pnum>, where rnum is
        a real node, as defined in RpLst, and pnum is a physical node
        number. The host, logic, is numbered as -1.

    Place
        LkData(1) --

        MapAll -- This file extends MapInit, described in the previous
        module. Each line of the file has the form <rnode> -> <pnode>, where
        rnode is a real node, as defined in RpLst, and pnode is the physical
        number of a node in the network, as described in LkData(1).

    Setup
        bnail. -- is a network configuration file to set up a Trollius
        session. The syntax is described in nail(1).

        route. -- is produced by map(1) using bnail. as input. This file
        contains the routing tables that will be shipped to each transputer
        by Trollius while the system is being booted.

        ProcMap -- specifies the Trollius node number of the transputer to
        which each ViP is mapped.

        ChanLks(1) -- contains information about the non-local channels
        being used by the user's program. A channel is non-local if its 2
        ends are on different physical nodes of the network.

    Load
        star. -- This is not used by the TMAP system. It is produced by the
        UBC version of solder(1).

ITB PRINTING
    The present version of Trollius, 2.0, has a bug with the file system,
    and stdio in particular. Two routines are supplied with TMAP to work
    around this.

    int tmprintf(char*,...) is used like printf() except it is implemented
    with direct message passing to get its output to Logic, thus avoiding
    the filesys bug. This is available, also, OTB, but I suggest not using
    it there; in fact, I get system errors when using tmprintf() on Logic!
    I don't know why. The return value is EOF for error, or the no. of
    characters output.
    The include file /project/transputer/include/tmap.h has the prototype
    for this function, and defines an upper bound for the length of output
    for a single invocation of tmprintf(), currently 1K. This include file
    also defines tmprintf to be printf for an OTB compilation. So, if you
    really want to play around with tmprintf() on Logic, put
    #undef tmprintf at the top of the source file.

    The routine which actually does the printing to the screen is called
    tm_pf, and is in /project/transputer/bin. So, choose a window where you
    want your ITB printing to appear, and run tm_pf& . When you do a
    tkill(1), this routine is killed, too, just like any other OTB process
    which does a kinit(1). For this reason, the Trillium kernel must be
    running on Logic for tm_pf to install itself. tm_pf loops forever,
    receiving on a high-numbered event, currently 0x7ffffff5. When using
    tm_pf, don't otherwise use this event for receiving on Logic.

OTB CONTRACTION
    The contraction routines that are used by TMAP are from the prep-p(1)
    system, and were not designed to deal with more than 1 CPU type. For
    this reason, the TMAP system requires that the user specify, in the GDF,
    whether a process is to run on the host, or on a transputer, as
    explained in vgraph(1). I do not see this as a serious restriction, as
    the OTB processes tend to be IO servers for the ITB processes, anyways.

    The files ProcLst.sys and IOAdjLst.sys reflect the original GDF, whereas
    ProcLst and IOAdjLst reflect the OTB contraction. It is these latter 2
    files which are used as input to the other 2 prep-p based modules,
    Weight and Contract. The file ProcLst contains a phony procedure,
    OTBproc, which represents all the OTB processes defined in the GDF.
    The contraction output is adjusted so that OTBproc is not sharing a node
    with any other ViP. This is how the 2 CPU types are handled in the TMAP
    system.

    The file IOAdjLst has 2 minor syntactic differences with IOAdjLst.sys,
    which make parsing easier. First, each line ends with a semicolon, and
    second, there is a space between each destination port name and the
    closing parenthesis. Finally, each port of OTBproc is of the form
    <name>_<N>, which means that port name is attached to ViP number N in
    the GDF, and as reflected in IOAdjLst.sys.

SPECIAL FILES
    Certain files, once created, are never overwritten by the TMAP system.
    They are <GDF>, Makefile, bin/Makefile, tm, and all the XX files *.x.
    They are all created by the Vgraph module. The file SysInfo may be
    partially overwritten by the TMAP system, as explained in its own manual
    page.

SYSTEM FILES
    The remaining files of the TMAP system are all system files; the user is
    expected not to modify them, and need not be concerned with them. They
    are briefly described here, as an aid to understanding the workings of
    TMAP. The files are grouped according to the directory in which they
    exist.

    bin/
        In addition to the Makefile mentioned in the previous section, this
        directory contains the unpatched executable for each procedure in
        the GDF. As well, for each procedure, there is a TMAP symbol file
        having the extension .tym. This is used by the patch() routine in
        the Load module to find the correct offset in the executable, at
        which to write in the channel information. The compiler tcc has
        fewer capabilities than cc or gcc. For this reason, the linking of
        ITB and OTB executables is handled differently.

    OTBdir/
        Each OTB procedure has a C source file here, in which the channel
        variables are defined. The compiled object files are here, too.
        The TMAP symbol file is created after linking, by searching the
        executable's symbol table for the channel array.

    ITBdir/
        This is the corresponding directory for the ITB procedures. Each
        source file, in addition, contains a directive for the assembler to
        pass the address of the channel array up to the stage of linking the
        process' executable. At that point, the information is placed into
        the TMAP symbol file.

    include/
        Each procedure has an include file which declares the channels
        available to that procedure; these are here for the convenience of
        the user, to be included in the user's source code.

    sys/
        The files in this directory are either used only internally by
        TMAP, or are to be an aid in debugging. All of them, except for
        RawParamLst and OTBfg, are created by the Place module.

        RawParamLst -- This is created by the Vgraph module, and shows the
        parameters for each procedure, before variable substitution
        distinguishes the individual ViP's.

        Guest -- The 1st line has 2 integer entries. The 1st is the number
        of guest nodes to be embedded into the available host nodes. The
        guest node numbering is the same as in RpLst. The 2nd entry is the
        no. of edges in the guest graph. In this context, an edge
        corresponds to a non-local channel, as described for the file
        ChanLks. Each subsequent line is of the form: g1 g2 weight, where g1
        and g2 are 2 guest nodes, and weight is the communication weight, as
        described above for the file EdgeWts, of the edge connecting them.

        HostMap -- The 1st line of this file consists of 2 integer entries.
        The 1st entry is the number of available host nodes. These are
        listed in the TRANS= line of the file Resources, described above.
        They are renumbered to begin at 0, and are augmented by 1 more node
        to represent Logic. The 2nd entry is the number of pre-mapped guest
        nodes.
        The 2nd line consists of 3 integer entries, which are used to guide
        the embedding heuristic of the Place module. The 1st is the least
        no. of guest nodes that should be mapped into the old box (this is 1
        of 2 boxes housing transputers in the UBC network). The 2nd number
        is the total no. of available nodes in the old box. The 3rd entry is
        the no. of wires between the old box and the new box that may be
        used by the embedding.

        There are as many remaining lines as specified by the 2nd entry of
        the 1st line of this file. Each line is a pair of integers. The 1st
        is a guest node, and the 2nd is the host node to which it is to be
        mapped. There should be at least 1 such line, in order to ensure
        that the OTB procedures get mapped to Logic! These lines are merely
        translations of the information in MapInit.

        swdata -- This consists of the crossbar settings that would
        implement the connections specified by the Place module
        (cf switch_set(1)).

        Edges0 -- This file is produced by the algorithm which embeds the
        guest graph into the host graph. The 1st entry is an integer
        specifying the number of remaining lines, each having the format:
        index link1 link2, where index is the index of the edge, as listed
        in the file Guest, and link1 and link2 are the link numbers over
        which the channel (i.e. edge) is to be routed. If link1 and link2
        are -1, the edge must be either multiplexed over an existing edge,
        or dilated across intermediate nodes.

        Edges -- This is produced from Edges0 by doing the multiplexing,
        using a sub-optimal knapsack approach with the weights. Any edge
        having the links as -1 is necessarily dilated for this embedding.

        OTBfg -- This may be created by the Load module. When an OTB
        process is specified to be run in the foreground (cf LoadLst(1)),
        the patched executable is copied from the bin directory to this
        file, and is loaded, i.e. executed, last.

CODE
    For each Procedure in the GDF, the user is expected to supply TMAP with
    the directory and file name of the code to be run on the network, as
    explained in the Makefile in the main directory. The user must have a
    Makefile in each such directory; there need be only 1 such directory, if
    all the user's code is compiled in that directory. The channel variables
    will be linked in by doing a make in the main directory of the TMAP
    session, so the user-supplied code is not expected to be fully linked.
    The ITB compiler tcc (man page lstcc(1)) and the OTB compiler cc(1) (or
    gcc(1)) have different capabilities; the format of the code that TMAP
    expects from the user is described separately for the OTB and ITB cases.

    OTB: The default name for the code of procedure <name> is <name>.bin.
    The extension emphasizes that this is a partially linked executable. It
    may be produced by using 2 lines in the user's Makefile similar to:

        <name>.bin: file1.o file2.o
                ld -r -o $@ file1.o file2.o -lg

    The flag -lg should be used only in the case that the user is compiling
    with the debug -g option.

    ITB: The default name for the code of procedure <name> is <name>.ta.
    This is an archive. The current version of tcc is not able to combine
    separate object files into a partially linked executable. It may be
    produced by using 3 lines in the user's Makefile similar to:

        <name>.ta: file1.to file2.to
                rm $@
                trar cr $@ file1.to file2.to
A.2 Communication Routines

NAME
ch_send, ch_recv - TMAP message passing

SYNOPSIS
#include "tmap.h"
int ch_send(TMCHAN *chan, void *msg, int msg_len, int flags);
int ch_recv(TMCHAN *chan, void *msg, int msg_len, int flags);

This manual page is modelled on the Trollius manual page for nsend/nrecv.

DESCRIPTION
These 2 routines implement message passing in the TMAP system; cf. tmap(1). One end of the channel sends, and the other receives, the message.

The chan parameter points to a TMAP channel structure, described in the next section. Such channels are defined in the Vgraph(1) module of the TMAP system. A user need only reference a channel variable; the object code which implements the channels is linked to the executable by TMAP. The user should include the header file which defines these channels, as described in tmap(1), section "SYSTEM FILES", paragraph "include/".

A channel is said to be local if both ends are on the same physical node. This property of a channel depends on the mapping of the user's processes into the network.

The msg parameter points to the location in RAM of the 1st byte of the buffer to be used for sending/receiving.

The msg_len parameter specifies the no. of bytes of data to be sent/received, starting at the location msg. It is recommended that the sender and receiver specify the same message lengths; otherwise, the most efficient level of communication, at the hardware level, may not be used. The Trollius routines, which are used for higher level message passing, do allow length mismatching. The user may use a more complicated strategy at the hardware level by using multiple sends or receives to achieve matching lengths.
The flags parameter specifies how the message is to be sent/received. The flags are all from Trollius, except for THOLD. Their functions fall into 3 categories:

1) Buffers
   BYBUF — Don't use buffers on this node.
   NOBUF — Don't use buffers anywhere along the route.

2) Data translation
   This is described in the section "DATA REPRESENTATION", below.

3) Message level
   KHOLD — Use a Trollius virtual circuit.
   THOLD — Use hardware communication.

By default, both routines use the Trollius routines dsend(1) and drecv(1), unless a hardware virtual circuit is specified by the user.

A virtual circuit (VC) comes in 2 flavours — using Trollius, and using hardware. This version of TMAP implements hardware communication only on ITB nodes, in which case Trollius is dispensed with, and simple in/out pairs are used.

The current version of TMAP has restrictions on the use of VC's. A local OTB channel should not use a VC at all. If either end of a channel is OTB, do not use a hardware VC; even if a Trollius VC is being used, beware that this may interfere with other uses of the port, e.g. printing. If both ends of a channel are ITB, both types of VC may be used, and the hardware version is recommended. Actually, for a local ITB channel, a VC request is always implemented as the hardware type. It is recommended that the Trollius type of VC be used only for communicating with an OTB process.

THE TMAP MESSAGE STRUCTURE
Both functions accept a pointer to a channel structure which is defined in "tmap.h" as:

    typedef struct {
        int node;  /* Initially, the node number at the other end. */
        int id;    /* The channel id at that node (plus other info). */
    } TMCHAN;

The node field is patched at load time to be the Trollius node number at the other end of the channel.
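The id field described in the next section packs several pieces of information into a single integer. TMAP's actual encoding is internal to the system; the following sketch only illustrates the general bit-packing idea, with hypothetical bit positions, together with the low-bit toggle used to derive a receive id from a send id (cf. ChanVIds(1)):

```c
/* Hypothetical bit layout for a TMAP-style channel id (the real TMAP
 * encoding may differ): low 16 bits hold the Trollius event, bit 16
 * flags a local channel, bit 17 flags an initialized channel, and bits
 * 18-19 hold a hardware link number (0-3). */
#define ID_EVENT(id)  ((id) & 0xFFFF)
#define ID_LOCAL      (1 << 16)
#define ID_INIT       (1 << 17)
#define ID_LINK(id)   (((id) >> 18) & 0x3)

int make_id(int event, int local, int initialized, int link)
{
    int id = event & 0xFFFF;
    if (local)       id |= ID_LOCAL;
    if (initialized) id |= ID_INIT;
    id |= (link & 0x3) << 18;
    return id;
}

/* ChanVIds(1): the receive id is the send id with the lowest-order bit
 * toggled, so both directions of a channel get distinct ids. */
int recv_id(int send_id)
{
    return send_id ^ 1;
}
```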
The id field encodes the Trollius event to use in non-VC message passing, as well as whether or not the channel is local, and whether or not the channel is initialized. A local channel is initialized if the node field either is the channel synchronization word for hardware message passing, or it points to the node field at the other end, which is the synchronization word. In this version of TMAP, a local OTB channel is never initialized. A non-local channel is initialized if the id also encodes the hardware link to be used for the sending and receiving of messages.

DATA REPRESENTATION
On nodes of different types, data may have different representations. For example, integers may be stored with the most significant byte first in memory (big-endian) or with the most significant byte last in memory (little-endian).

The flags parameter of the sender can be set to the following data representation flags (they will have no effect when used with ch_recv). Each flag assumes a data type, and will make the appropriate change in the data representation of the message. They will have no effect if data conversion is not needed.

    DINTMSG  msg points to integers.
    DFLTMSG  msg points to single precision real numbers.
    DDBLMSG  msg points to double precision real numbers.
    DRAWMSG  msg representation will not be changed.

If msg contains a mixture of data types, the user will have to change the representation using the functions ltot(1), ttol(1), etc.

BLOCKING
The blocking behaviour is the same as for the Trollius routines nsend/nrecv, except if the THOLD flag is set, which is allowed only for ITB-ITB communication. In this case, either end will block until it has sent or received its specified length of message.
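The byte-order change implied by a flag such as DINTMSG can be illustrated with a small self-contained routine. This is a generic sketch of endian conversion, not TMAP's ltot/ttol:

```c
#include <stdint.h>
#include <string.h>

/* Generic byte swap for a buffer of 32-bit integers, as a data
 * representation layer might do when sender and receiver disagree on
 * endianness.  Illustration only; not TMAP's actual routines. */
static uint32_t swap32(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}

void swap_int_buffer(void *msg, int msg_len)
{
    unsigned char *p = msg;
    /* Swap each complete 4-byte word in place. */
    for (int i = 0; i + 4 <= msg_len; i += 4) {
        uint32_t v;
        memcpy(&v, p + i, 4);   /* avoid unaligned access */
        v = swap32(v);
        memcpy(p + i, &v, 4);
    }
}
```

Applying the swap twice restores the original buffer, which is why the same conversion can serve both directions of a transfer.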
RETURN VALUE
Upon successful completion, 0 is returned. Otherwise, -1 is returned and the global variable errno is set to indicate the error.

ERRORS
The following error codes are in addition to those returned by Trollius.

[ELINKUSED]    ch_send or ch_recv failed because the link required to set up a virtual circuit is already being used by another channel.

[ENOVEVENTS]   The Trollius kernel ran out of virtual event descriptors.

[ECHANLOCAL]   Trollius reports that an uninitialized non-local channel is actually local. Unless the user is creating channels at run-time, this indicates an error in the TMAP loader.

BUGS
Hardware VC's involving an OTB process are not implemented. The routines corresponding to Trollius' dtry_send/dtry_recv are not implemented.

Appendix B

Prep-p Based Modules

The manual pages in this appendix discuss the 3 modules of Prep-p that were modified for use in the TMAP system. The manual pages are slightly altered to reflect these changes.

B.1 Vgraph Module

NAME
GraphCom - Prep-P graph compiler

SYNOPSIS
GraphCom

DESCRIPTION
This man page is based on the original Prep-P man page for GraphCom(1), and describes the modified version which is used for the TMAP(1) system at UBC. For the sake of clarity, the differences with the original GraphCom are not described here.

GraphCom has two functions: (1) it serves as the first module of the Prep-P mapping preprocessor, and (2) it is used to define library graph descriptions for later use. Which of these is done depends on the first word in the input file (standard input). If this is the word "library", then function (2) is assumed. If it is anything else, it is interpreted as a graph type (e.g. tree, line, irregular), and function (1) is assumed.
As its main function (1 above), GraphCom reads a graph description from standard input and produces the standard output files IOAdjLst, ProcLst, IOAdjLst.sys, ProcLst.sys, IOLst, ParamLst, ChanVIds, LoadLst, the shell script callXX, the Poker file CHiPParams, and an XX(1) file corresponding to each procedure name in the GDF, for use in the Weight(1) module. If a particular XX file already exists, it is left alone. In this way, the system produces skeleton XX files which the user may modify. These files are essentially an expansion of the graph description given by the user. Their format is documented in separate man pages. Briefly, IOAdjLst(1) is an adjacency list, ProcLst(1) associates process code names with virtual process numbers, and IOLst(1) gives details of external IO streams, files and columns (indices) in files. CallXX(1) is a shell script which compiles the process XX(1) source code required. This XX code is used in the TMAP system only to produce the output of the Weight module, described in weight(1) and tmap(1). The actual executable code provided by the user is probably written in C. The user tells TMAP where this code is using the file Makefile, which is one of the non-standard files of the TMAP system, and is described in tmap(1).

GRAPH DESCRIPTION LANGUAGE
For a complete explanation of the graph description language, consult the document "A Child's Garden of GraphCom" by Dwight Newton. The following is just a brief overview, which does not attempt to describe all the features available. In particular, the library definition facility is not described here.

A graph description contains 3 main parts: a graph type, a global bind list, and a number of procedure declarations.

The graph type specification serves two functions.
First, the connectivity is checked by subsequent modules for some graph types. Second, the graph type determines which contraction and placement strategies will be used for the mapping of the parallel program described in the graph description. The graph types known at the time of this writing are line, loop, mesh (square mesh), hexmesh, torus, felem (finite element), bincube, shuffex, s4pin (4-pin shuffle), tree, treefold, butterfly, ccc (cube connected cycles), cyclic (cyclic contraction), even (even contraction), and irregular (probabilistic or graph traversal contraction).

The global bind list gives the values (as C-language expressions) for symbolic names used in the procedure declarations. The expressions can be in terms of other symbolic names defined later or earlier, and in terms of a node number i. The most important symbolic names are "nodemin", "nodecount", "OTBmin", "TRANScount" and "PORTcount". There are nodecount procedures numbered consecutively from nodemin, a non-negative integer; these 2 constants must be specified explicitly. All the procedures numbered from OTBmin upwards represent processes that will be run on the host, logic. All other processes will be mapped to a transputer. This distinction is made to facilitate the contraction of the processes onto the available hardware. The default value for OTBmin is -1, an invalid index, which specifies that there are no processes to run on the host.
TRANScount is the minimum number of transputers onto which the processes will be contracted; if, during resource allocation, more transputers happen to be available, the user may specify, in the file SysInfo, whether to allow the Contract module to use all the available transputers (TRANSfill = 1), or to restrict the contraction to at most TRANScount transputers (TRANSfill = 0). The default value for TRANScount is 18, the size of components 0, 1 and 2 in the transputer system. Similarly, PORTcount is the minimum number of ports (hardware links with logic) that the user wishes to obtain from the Resource module. The PORTfill variable in SysInfo specifies whether to use all available ports during the Place module, or just the number PORTcount. The default value for PORTcount is 1.

An example of a global bind list is:

    {
        k = 5;
        nodemin = 0;
        nodecount = n;
        OTBmin = n-1;  /* The last 2 processes will run on the host.
                          TRANScount and PORTcount revert to their
                          default values, described above. */
        n = 1<<k;      /* 2**k */
    }

which could be used when the number of nodes and the log (base 2) of that number are used later in the procedure declarations. It is generally preferable to use symbolic names to simplify the expressions in the procedure declarations (see below), since it improves readability of the description. Further, for each of the values of i (there are exactly nodecount of them), each symbolic name is evaluated at most once, no matter how many times it is used. If common subexpressions are given one symbolic name, considerable running time can be saved.

The procedure declarations specify the name of the source code files to be used for the process, and the connectivity of the processes.
A procedure declaration can be a use-clause which imports an entire predefined subgraph from a library definition, or it can be a simple procedure declaration for just one procedure. This is an example of a simple procedure declaration:

    procedure LEAF { i "I am a leaf" }
        nodetype : { i >= (nodecount+1)/2 && i < nodecount }
        port PARENT : { i/2 }
        when {leaf_io} port OUTPORT : output dataout

A simple declaration specifies a procedure name, 2 parameters to be passed at load time, a condition on the node number i under which the procedure runs as process i, and the ports through which the procedure communicates with other processes. The parameters are not separated by commas. The condition is an expression which contains C-operators, symbolic names defined in the global bind list, and the node number variable i. If the condition evaluates to a non-zero value for some particular i, then the source code for process i is taken from the XX file which has the procedure's name (extended by ".x") in the current directory. The example above would generate code from the file LEAF.x for the leaves of a full binary tree with 1-based numbering.

The port declarations contain the port name, the destination of the port, and possibly a guard expression if the port is used only under certain conditions. The destination is either a node number (if the port leads to another process), or an external IO stream/file name. Both the destination expression and the guard expression can contain symbolic names (from the global bind list), C-operators, and the node number variable i.

A use-clause contains a file name (i.e. where to read the library definition from), and a local bind list. The following is an example of a use-clause:

    use "/usr/local/prep-p/libdefs/tree" {
        k = ncl;
        nodemin = nml;
    }

where ncl and nml would be defined in the global bind list.
This says that you want whatever the tree library defines included in your graph, and that you want to replace the library's definitions of k and nodemin by ncl and nml, respectively. Note that changing nodemin (or k) in this local bind list does not affect what they are in the global bind list, or in other local bind lists. Thus, you can include different libraries, or the same library several times, each time with different values for the symbolic names.

AUTHOR
Darin Johnson wrote the original version. Dwight Newton completely revised the program. Bernd Stramm wrote this man page. Norman Goldstein modified GraphCom for use by the TMAP system at UBC.

B.2 Weight Module

NAME
callXX - a C shell script which calls the XX compiler to compile all procedures indicated in an algorithm's graph description.

DESCRIPTION
The UBC TMAP system uses a modification of the original Prep-P callXX(1). The assembler output files (.s) are suppressed. Parameters to the XX(1) procedures have no effect. Instead, parameters are passed to the procedures in the original Graph Description File, as described in vgraph(1).

The XX language is a simplified sequential programming language for defining the codes to be executed by the processing elements of the CHiP computer. The XX compiler is used to compile the program written in the XX language and to find all the possible syntactic errors.

In order to get a better contraction result, the contraction algorithm in Prep-P uses two additional pieces of information on the given graphs. First, the EdgeWts file shows the port usage in each process of an algorithm and the maximum degree of all the processes. Secondly, the CodeSize file shows the actual code size for a given code name.
The XX compiler has been modified by installing two separate passes to compute these two files.

The callXX script, created by the GraphCom module, consists of the command:

    XX <codename1> <codename2> ... <codenamen>

where <codename1> through <codenamen> are exactly those Dos Equis procedures which will be executed by Virtual Processors of the algorithm. The usual names for the <codenamei> procedures are ViP0.x, ViP1.x, etc. In addition, the callXX script checks that all procedures passed without error through the XX compiler, and sends an error message to the user if this is not the case.

FILES
Output Files:
./EdgeWts - the list of heavy and light edges, and the weights of the light edges.
./CodeSize - the code size for a given code name. This is the no. of bytes of Intel 8051 code that would be produced for the Poker system, and may be used as a relative estimate of the code size for any processor, as reflected in the XX source code, NOT in the source code that will eventually be loaded onto the transputer network by the TMAP system.
./Degrees - the number of channels possessed by each ViP.

AUTHOR
Susan Merritt. Shin-Tih. Modifications for the TMAP system at UBC were made by Norman Goldstein.

B.3 Contract Module

The minor change for TMAP is described in A.1, paragraph "OTB CONTRACTION".

NAME
contract - partition virtual processes into groups of real processes

SYNOPSIS
contract [-D] [-q] [-o] [-r]

DESCRIPTION
Contract is the module of Prep-P which partitions a group of virtual processes (vips) onto a smaller group of real processes (reps). Contract runs after GraphCom and callXX in the execution of Prep-P. It takes the files generated by GraphCom from the graph description file, and generates new files to reflect the mapping. These files are then later used by placement.
Contraction first determines whether a given graph needs to be contracted (i.e. the number of vips is greater than the number of reps). If so, it performs the contraction algorithm. If not, it just normalizes the node numbers for vips so that they are contiguous, generates the output files for a one-to-one mapping, and returns.

The contraction algorithm acts as follows: if the graph is a library graph for which a library contraction exists, this template will be used to generate the partitioning. If the graph is irregular in form (i.e. has no library contraction), contract uses a local neighborhood search (LNS) algorithm to determine a good configuration. LNS sets up an initial partitioning configuration, and then iteratively selects a vip at random, moves it from one rep to another, and compares the two configurations via a cost function. If the cost is lower for the new configuration, it is retained. Otherwise the older configuration is restored. The process ends when a threshold of repeated failures occurs.

Library contractions are defined procedurally, i.e. if the user wishes to add a library contraction, it must be hard coded.

For the LNS contraction, there are three cost functions available. The original cost function attempts to minimize the edges between reps, while attempting to evenly distribute vips among the available reps. In the second cost function, we differentiate between heavy and light edges. Light edges have a weight associated with them, which denotes the predetermined amount of traffic over the edge; heavy edges are assumed to bear a large amount of traffic, since the amount is not predeterminable. In this calculation, external edges to I/O pads are handled in the same fashion as internal I/O edges between vips.
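The LNS loop described above can be sketched in a few lines of C. This is a minimal illustration using only a simple cost function (cut edges plus an imbalance penalty, both hypothetical choices); the real contract module uses the richer cost functions described here:

```c
#include <stdlib.h>

/* Minimal LNS sketch: partition n_vips virtual processes onto n_reps
 * real processes by repeatedly moving a random vip to a random rep and
 * keeping the move only if the cost strictly decreases.  The cost used
 * here (cut edges + imbalance penalty) is an illustrative stand-in for
 * contract's actual tiered cost functions. */
typedef struct { int a, b; } Edge;

static int cost(const int *rep, int n_vips, int n_reps,
                const Edge *e, int n_edges)
{
    int c = 0;
    for (int i = 0; i < n_edges; i++)          /* edges between reps */
        if (rep[e[i].a] != rep[e[i].b]) c++;
    for (int r = 0; r < n_reps; r++) {         /* imbalance penalty  */
        int sz = 0;
        for (int v = 0; v < n_vips; v++)
            if (rep[v] == r) sz++;
        int over = sz - (n_vips + n_reps - 1) / n_reps;
        if (over > 0) c += 10 * over;
    }
    return c;
}

int lns_contract(int *rep, int n_vips, int n_reps,
                 const Edge *e, int n_edges, int max_failures)
{
    for (int v = 0; v < n_vips; v++)           /* initial configuration */
        rep[v] = v % n_reps;
    int best = cost(rep, n_vips, n_reps, e, n_edges);
    int failures = 0;
    while (failures < max_failures) {
        int v = rand() % n_vips;               /* random vip, random move */
        int old = rep[v];
        rep[v] = rand() % n_reps;
        int c = cost(rep, n_vips, n_reps, e, n_edges);
        if (c < best) { best = c; failures = 0; }
        else          { rep[v] = old; failures++; }  /* restore */
    }
    return best;
}
```

The "threshold of repeated failures" stopping rule corresponds to max_failures; note that with single-vip moves and strict improvement, the search can stall in a local minimum, which is why the final cost is only guaranteed to be no worse than the initial one.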
This cost function works using a three-tiered group of functions. If the two configurations are equivalent for function A, function B is used, and likewise for function C. All three functions attempt to yield an even distribution of vips on reps, but also attempt to control the distribution of edges between vips/reps. Function A attempts to maximize the number of heavy edges within a given rep; function B does the same for light edges, but attempts to maximize the sum of weights of the light edges within a given rep; and function C attempts to minimize the number of edges between reps.

The third cost function is available only to prove the validity of the first two. It randomly configures the graph, attempting only to ensure that the graph satisfies certain constraints (see below).

The -D flag causes a rather large amount of unintelligible gibberish to be printed to the screen; this is not highly recommended for the casual user. Normally, contract gives some information as to what it is doing, and prints out a summary of the results of its work. If the -q flag is used, this output will be silenced.

FILES
Input Files:
./EdgeWts - the list of heavy and light edges, and the weights of the light edges
./ProcLst - the name of the executable code for a given vip
./CodeSize - the actual code size for a given code name
./IOAdjLst - a list of the interconnectivity between vips

Output Files:
./ConPadMap - the mapping of virtual pads to real pads
./ConAdjLst - the mapping of virtual adjacency lists onto real adjacency lists
./VipToRp - the mapping of vips to reps

AUTHOR
Bernd Stramm
Ken Cooper

Appendix C

File Descriptions

The manual pages in this appendix discuss the formats of some of the files used in the TMAP system.
The first section consists of files that are unique to TMAP. The second section consists of Prep-P files which are used by TMAP as well.

C.1 TMAP specific files

C.1.1 SysInfo File

NAME
SysInfo - system information for a TMAP session.

DESCRIPTION
SysInfo is produced by the 1st module, Vgraph, and is used by most subsequent modules. The file consists of 11 fields, broken up into 4 blocks. The 1st 2 blocks are overwritten by the Vgraph module, using whatever is specified in the Graph Description File (GDF), while the last 2 blocks are only ever changed when the user explicitly edits SysInfo.

The 1st block summarizes information which should not be changed by the user, except by rerunning the Vgraph(1) module, and has the format:

    PROCcount = <PROCcount> ;
    nodemin = <nodemin> ;
    nodecount = <nodecount> ;
    OTBmin = <OTBmin> ;

where PROCcount is the number of procedures defined in the GDF, nodemin is the smallest index of a ViP, nodecount is the number of ViP's in the GDF, and OTBmin is the smallest index of an OTB ViP (cf. vgraph(1)); it is -1 if there are no OTB procedures. Both PROCcount and nodecount reflect the values before OTB contraction (cf. tmap(1)).

The 2nd block summarizes the resource requests that may be made from the GDF, and has the format:

    TRANScount = <TRANScount> ;
    PORTcount = <PORTcount> ;

where TRANScount is the least no. of transputers that the user will have allocated, and PORTcount is the least no. of ports that the user will have allocated.

The 3rd block allows the user to specify which ports and which components of the network to use (individual transputers may be specified using the file Resources(1)). The format is:

    COMPcount = <COMPcount> ;
    Components = COMPlist ;
    Ports = PORTlist ;

where COMPcount is the least no.
of components that the user will have allocated, and COMPlist and PORTlist are lists of the specific components and ports to use. The syntax is illustrated in the EXAMPLE, below.

The 4th, and last, block controls the usage of the allocated resources. The format is:

    TRANSfill = <TRANSfill> ;
    PORTfill = <PORTfill> ;

where TRANSfill and PORTfill are boolean variables that specify whether to use extra resources that may be the result of resource allocation; for example, transputers are allocated in blocks, each block being a reset component. A value of 0 means not to use more than is specified in the corresponding "count" variable. The default value for TRANSfill is 0, while the default value for PORTfill is 1.

EXAMPLE
Here is a sample SysInfo file.

    PROCcount = 3 ;
    nodemin = 1 ;
    nodecount = 7 ;
    OTBmin = 6 ;
    TRANScount = 20 ;
    PORTcount = 2 ;
    COMPcount = 1 ;
    Components = <3> ;
    Ports = <0> <-1> ;
    TRANSfill = 0 ;
    PORTfill = 1 ;

The user wishes to have at least 20 transputers and 2 ports. One component is specified, viz. number 3. Also, 2 ports are requested, but only 1 is specified, viz. number 0; the <-1> is used to terminate an under-specified list. Presumably, the components allocated would consist of components 0 and 3, having a total of 37 transputers. Since TRANSfill is 0, at most 20 of these would be used. Had the user not specified any particular components, the Resource module would have allocated components 0 and 4, possessing exactly 20 transputers and 2 ports. Users should refrain from restricting the choice of resources, unless there are good reasons to do otherwise.

C.1.2 ParamLst File

NAME
ParamLst - command line parameters.
This f i l e i s produced by the Vgraph module of the TMAP system, and consists of repeated l i n e s of the following format: vnum : count : ParameterList where vnum i s the number of the ViP being referred to, count i s the number of parameters being passed to t h i s process (count i s 0 i f no parameters are being supplied, although, i n a C program, argc i s 1), and ParameterList i s a sequence of count strings, each having the same format as described i n LoadLst(l) for the instructions to the loader. The ViP numbers are l i s t e d i n increasing order. C.1.3 LoadLst File NAME LoadLst- loader instructions. DESCRIPTION LoadLst i s the f i l e which instructs the Load module of the TMAP(l) system how to load the executables onto the nodes of a network. This f i l e i s produced by the Vgraph module, and consists of repeated l i n e s of the following format: : vnum <instructions> where vnum i s the number of the ViP being referred to, and instructions i s a text string to be interpreted by the loader. In t h i s s t r i n g , a pair of double quotes (") may be used to hide white space. The back slash (\) may be used to introduce a l i t e r a l (") or a l i t e r a l (\) . The meta s t r i n g must not be empty. The empty instruction should be written as "" , which i s , i n f a c t , the default instruction f or each process. For ITB loading, consult loadgo(l) for the allowable Appendix C. File Descriptions options. The only allowable option for OTB i s " f " . This instructs the Load module of TMAP to put that process i n the fore-ground; the default i s to run each executable i n the back-ground using the ampersand (&) . It i s an error to specify more than 1 process as foreground, as they would compete f o r stdin . The order of loading i s as specified i n th i s f i l e , with 1 exception. If any OTB process i s specified as foreground, i t i s saved t i l l the l a s t . This i s done as a convenience (to the design of TMAP). 
A user's program should be designed so that the order of loading is not significant.

    : 3 "-o OUT3"

Presumably, vnode 3 is an ITB process. Its stdout is to be redirected to the file OUT3.

    : 5 "f"

Again, presumably, vnode 5 is an OTB process, and it will be run in the foreground.

C.1.4 ChanVIds File

NAME
ChanVIds - channel virtual ID's.

DESCRIPTION
ChanVIds is the file containing the TMAP encoding of the channels which are specified in the Graph Description File. This file is produced by the Vgraph module, and is used by the Load module to patch the correct information into the executables on their way to their destination nodes. The file consists of repeated lines of the following format:

    : vnum ChannelList ;

where ChannelList specifies all the channels connecting with the ViP number vnum. ChannelList is a sequence of entries of the form:

    name = { vnum_dest , id_dest }

where name is the name of the channel, as known to ViP vnum, vnum_dest is the number of the ViP at the other end of the channel, and id_dest is the id which is used by ViP vnum to send along the channel. The id's are distributed in such a way that ViP vnum receives along the channel by using the id obtained by toggling the lowest order bit of id_dest. It is necessary to assign a different id to each direction of the channel, since both ends of the channel might be loaded onto the same physical node; this ensures that a receiver obtains only what the other end has sent.

C.1.5 Resources File

NAME
Resources - resources to be used by a TMAP session.

DESCRIPTION
The file Resources is produced by the Resource module, and completely lists all resources available to the Contract and Place modules.

The 1st line has the form

    Error= <Error>

where the integer value Error should be 0.

The next 3 lines specify how many transputers, ports and components will be used.
The format is

    TRANSplace= <TRANSplace>
    PORTplace= <PORTplace>
    Using <num_comps> components.

where TRANSplace and PORTplace are the numbers of transputers and ports that the Place module should use to actually implement the user's program, and num_comps is the number of components allocated.

The next 2 lines specify the components and ports, as in the file SysInfo(1). The format is

    Components: COMPlist ;
    Ports: PORTlist ;

where COMPlist and PORTlist have the same format as in the file SysInfo(1). Of course, neither list should now be under-specified.

The next field specifies the transputers to be used by the Place module. The format is

    TRANS= TRANSlist ;

where TRANSlist lists the physical numbers of transputers that the Place module may choose from. The EXAMPLE, below, illustrates the format.

The final field specifies the inter-crossbar wires that the Place module may choose from. The format is

    XXWIRE= <inter-box-wires> <other-wires>;

where inter-box-wires lists the available wires connecting the 2 boxes housing the transputers, and other-wires lists all the other inter-crossbar wires. Again, the EXAMPLE, below, has an illustration.

EXAMPLE
Here is a sample Resources file.

    Error= 0
    TRANSplace= 10
    PORTplace= 2
    Using 2 components.
    Components: <0> <4>;
    Ports: <0> <6>;
    TRANS= 0 73 1 24 9 17 21 20 16 12 13 19 22 14 23 15 11 10 18;
    XXWIRE= 2 9
            0 1 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25;

The system will be using only 10 transputers, but the user has requested that 2 ports be used. As a result, components 0 and 4 have been allocated. The transputers in these 2 components are listed in the "TRANS=" field. The ordering of the transputers in this list is significant to the heuristics of the Place module. Generally, the most "useful" transputers are listed first.
As seen in the 1st line of the "XXWIRE=" field, only 2 of the inter-box wires are allocated to this session. The next section explains this policy.

INTER-XBAR WIRES
The routine xxwires lists some hardware information, including the numbering of the inter-xbar wires. The wires are numbered upwards from 0. For each such index, a wire entry has the format

index=( xbar1,link1 ; xbar2,link2 )

where index is the wire number, xbar1,link1 is 1 end of the wire, and xbar2,link2 is the other end of the wire. There are only 8 inter-box wires. To avoid conflicts at load time, each of components 0, 1, 2 and 3 allows the user to use 2 of these wires, as listed in the following table:

Component   Inter-box wires
0           2,9
1           3,8
2           4,7
3           5,6

The policy is that if a user has not been allocated a component, then the associated inter-box wires should not be used.

C.1.6 LkData File

NAME
LkData - link interconnection specifications.

DESCRIPTION
LkData is produced by the Place module of the tmap(1) system. It has the format for specifying link interconnections in the transputer network at UBC, and consists of repeated lines of 4 integers:

node1 link1 node2 link2

where node1 and node2 are 2 physical node numbers in the network. Currently, transputers are numbered from 0 to 73, and the host, logic, is -1. Transputer link numbers range from 0 to 3, and host link numbers range from 0 to 10.

The line specifies that link1 on node node1 is to be connected to link2 on node node2, in the sense of being either hardwired, or by using up to 2 crossbars to implement the connection. As far as message-passing software is concerned, this is a direct physical connection. In reality, there is a delay incurred for each C004 crossbar used to effect the connection, of around 18%.
This would give a 50% reduction in throughput if 3 crossbars were to be used.

C.1.7 ChanLks File

NAME
ChanLks - channel initial links.

DESCRIPTION
ChanLks is produced by the Setup module of the tmap(1) system, and describes the initial routing of messages along each non-local channel (a channel is non-local if its 2 ends are on different physical nodes). This file is used by the Load module to patch the correct information into the executables on their way to their destination nodes. The file consists of repeated lines of the following format:

<vnode1> (<link1> <name1> , <name2> <link2>) vnode2

where the integers vnode1 and vnode2 are the ViP numbers at either end of the channel, the strings name1 and name2 are the names by which the channel is known at the respective ends, and the integers link1 and link2 are the links by which messages are received and sent at the respective ends of the channel. A value of -1 signifies that the channel is dilated, i.e. it does not connect 2 processes on nodes having a physical connection. This version of TMAP does not deal with the routing of dilated channels.

C.2 Prep-P files

C.2.1 ProcLst File

NAME
ProcLst - Virtual processor to Procedure name mapping file.

DESCRIPTION
ProcLst is the file used by the Prep-P system to map virtual processors (nodes in the original computation graph) to procedure names (name of the code to be executed by that virtual processor). The file consists of repeated lines of the following format:

: N : procedure-name

where N is the number of the virtual processor and procedure-name is the name of the XX routine to be executed by N. If an algorithm uses a graph generation program (such as genccc.c), then the XX code generated for execution by virtual processor N is usually called ViPN.x
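A ProcLst line of the form shown above can be picked apart with sscanf. The function below is only an illustrative sketch under that assumption; it is not code from Prep-P itself.

```c
#include <stdio.h>

/* Parse one ProcLst line of the form ": N : procedure-name".
 * On success, stores the ViP number in *vnum and the procedure name
 * in name, and returns 1; returns 0 on a malformed line.
 * (Illustrative sketch; not part of Prep-P.) */
int parse_procline(const char *line, int *vnum, char name[64])
{
    return sscanf(line, " : %d : %63s", vnum, name) == 2;
}
```

For example, the line ": 3 : ViP3.x" yields vnum 3 and the procedure name "ViP3.x".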
The ProcLst file is produced by the GraphCom routine. It is used by the genCN routine to generate the CodeNames file for Poker, and by Premux.

AUTHOR
Susan Merritt

C.2.2 IOAdjLst File

NAME
IOAdjLst - detailed adjacency list file for Prep-P graphs.

DESCRIPTION
IOAdjLst is the file used by the Prep-P system to describe the adjacencies of the virtual processors (ViPs) in a graph, as well as the ports which connect the adjacent ViPs or external IO pads. The file consists of repeated lines of the following format:

: V : (pname, pname) V1 : (pname, <) -m : (pname, pname) V2 : (pname, >) -n : ...

where V is the number of a ViP, the Vi are the numbers of ViPs adjacent to it, a negative number -m or -n represents an adjacent i/o pad (instead of another ViP), and the values in parentheses are the portnames on V and on Vi which form the two ends of the connection. Note that if V is connected to an i/o stream, we use the symbol "<" or ">" to distinguish the input port and the output port. IOAdjLst is generated by the Prep-P program GraphCom. The file is used as input for later stages of the Prep-P system such as contr and genPN.

AUTHOR
Shin-Tih Jing

C.2.3 CodeSize File

NAME
CodeSize - file containing the size of Intel 8051 code for each procedure node.

DESCRIPTION
CodeSize is the file used by the Prep-P system to evaluate contraction. The file consists of repeated lines of the following format:

procedure-node-name : S

where procedure-node-name corresponds to an XX file and S is the size of its Intel 8051 code. This file is produced by the XX compiler and is used by the contraction routine, among other files, to generate ConPadMap, ConAdjLst and VipToRp.

C.2.4 Degrees File

NAME
Degrees - the degree of each ViP.

DESCRIPTION
The file Degrees just lists the number of ports connected to each ViP in a TMAP session.
This file is produced by the Weight module, and consists of 1 integer entry for each ViP, listed in order of increasing ViP number.

C.2.5 EdgeWts File

NAME
EdgeWts - file containing internal and external edge weights for each procedure node.

DESCRIPTION
EdgeWts is the file used by the Prep-P system to evaluate contraction. The first line of the file indicates the number of procedure nodes. The rest of the file consists of repeated lines of the following format:

procedure-node-name : port-name weight | [port-name weight | ]

where procedure-node-name corresponds to an XX file, port-name is one of the defined ports of the node, and weight is an assigned value predicting the amount of traffic through this port. This file is produced by the XX compiler and is used by the contraction routine, among other files, to generate ConAdjLst, VipToRp and ConPadMap.

C.2.6 VipToRp File

NAME
VipToRp - Virtual Processor to Real Processor mapping file.

DESCRIPTION
VipToRp is the file used by the Prep-P system to map virtual processors (nodes in the original computation graph) to real processors (nodes in the contracted graph). The file consists of repeated lines of the following format:

: R : V1 V2 V3 ...

where R is the number of a real processor, and the Vn are numbers of virtual processors mapped to it. The VipToRp file is produced by the contraction module.

AUTHOR
Susan Merritt, N. Goldstein (minor deletion for TMAP).

Appendix D

Sample Source Files

This appendix contains the source code for the sample program of Chapter 5. As mentioned there, each message passing experiment is preceded by the passing of a control packet around the ring.

D.1 Control Information

The 3 fields of the control structure tell each node what it needs to know about the upcoming rounds of message passing.

/* Header file control.h for the sample TMAP program.
*/

struct control {
    int count,   /* The number of times to send the message. */
        length,  /* The number of bytes in each message. */
        type;    /* Either 0 , KHOLD or THOLD . */
};

D.2 Host Code

The host process performs an endless loop, setting up the message passing experiment as directed by the user, and printing the time to complete the experiment. The host and the transputer have different internal representations for storing integers in RAM, but by using the DINTMSG flag, Trollius does the conversion, and this is otherwise invisible to the user.

/* Source file Host.c for the sample TMAP program. */

#include <stdio.h>
#include <sys/time.h>   /* For time functions. */
#include "Host.h"       /* The vnode include file. */
#include "control.h"    /* Specific to this example. */

/* This is the OTB IO server for the example.
   The prompt is for 3 values:
     count (Hex integer)  -- The number of times to pass the packet
                             around the ring,
     length (Hex integer) -- The length of each packet,
     type (character)     -- The type of message passing to use:
         'n' - Trollius network level,
         'v' - Trollius virtual circuit, or
         'h' - Hardware.
   When done, the elapsed time, in seconds, that the OTB process
   waited, is printed.  The prompt then appears to begin the next
   experiment. */

main()
{
    /* Carries the control information described above.  The header
       control.h has the structure. */
    struct control control;

    /* Will store initial and final elapsed times. */
    struct timeval tv0, tv1;

    char c;

    /* An OTB process must call kinit() to use Trollius services. */
    if ( kinit(0) ) {
        terror("Host doing kinit(0)");
        exit(-1);
    }

    while(1) {
        printf("\nEnter HEX count length [nvh]: ");
        scanf("%x%x %c", &control.count, &control.length, &c);

        control.type = ( c == 'n' ? 0 :
                         c == 'v' ? KHOLD :
                         c == 'h' ?
                                    THOLD : -1 );
        if ( control.type == -1 )
            continue;   /* Error on input. */

        /* Record initial times. */
        gettimeofday(&tv0, NULL);

        /* Send the root node the control information. */
        ch_send(root, &control, sizeof(control), DINTMSG);

        printf("waiting...");
        fflush(stdout);

        /* Receive notification that the message passing is completed. */
        ch_recv(root, &c, 1, 0);

        /* Record final times. */
        gettimeofday(&tv1, NULL);

        /* Display OTB time calculation. */
        printf(" Elapsed time (sec)= %f \n",
               ((tv1.tv_sec - tv0.tv_sec) * 1000000
                + tv1.tv_usec - tv0.tv_usec) / 1.0e6);
    }
}/* main */

D.3 Root Code

/* Source file root.c for the sample TMAP program. */

#include "root.h"      /* The vnode header file. */
#include "control.h"   /* Specific to this example. */

main()
{
    /* Carries the control information described in Host.c . */
    struct control control;

    /* This is the type of message that is used.  Initially, it is
       the Trollius network level. */
    int type = 0;

    /* To act both as a message and the buffer to receive a message. */
    char *buf;

    while(1) {
        /* Get the control information from the host. */
        ch_recv(host, &control, sizeof(control), 0);

        /* Pass the control information to the next in the ring. */
        ch_send(next, &control, sizeof(control), type);

        /* Receive the same information back from the last in the ring!
           It is easier to do this, than to arrange for the last in the
           ring not to send it on to the root. */
        ch_recv(previous, &control, sizeof(control), type);

        /* Now that everyone has the control information, set the type
           of message level to use. */
        type = control.type;

        /* Create room for receiving. */
        buf = (char*) malloc(control.length);

        while( --control.count >= 0 ) {
            /* These 2 lines pass the message once around the ring. */
            ch_send(next, buf, control.length, type);
            ch_recv(previous, buf, control.length, type);
        }

        /* Send acknowledgement to host.
        */
        ch_send(host, &type, 1, 0);

        /* Prepare for the next round of communication. */
        free(buf);
    }/* while(1) */
}/* main */

D.4 Node Code

The code for the node processes differs from that for the root process in that it does not interact with the Host, and the recv-send pairs are reversed; otherwise all the nodes would block, each waiting for its neighbour to send!

/* Source file node.c for the sample TMAP program. */

#include "node.h"      /* The vnode header file. */
#include "control.h"   /* Specific to this example. */

main()
{
    /* Carries the control information described in Host.c . */
    struct control control;

    /* This is the type of message that is used.  Initially, it is
       the Trollius network level. */
    int type = 0;

    /* To act both as a message and the buffer to receive a message. */
    char *buf;

    while(1) {
        /* Receive and pass on the control information. */
        ch_recv(previous, &control, sizeof(control), type);
        ch_send(next, &control, sizeof(control), type);

        /* Now that everyone has the control information, set the type
           of message level to use. */
        type = control.type;

        /* Create room for receiving. */
        buf = (char*) malloc(control.length);

        while( --control.count >= 0 ) {
            /* These 2 lines pass the message once around the ring. */
            ch_recv(previous, buf, control.length, type);
            ch_send(next, buf, control.length, type);
        }

        /* Prepare for the next round of communication. */
        free(buf);
    }/* while(1) */
}/* main */

D.5 Substitutions

A substitution file is typically created by modifying the current file being used/output by TMAP.

MapInit.1

This first file, MapInit.1, was created by modifying MapInit, which consisted only of the first line, which maps the OTB processes to the host. The remaining 4 lines were added to tweak the embedding not to have any dilations.

10 -> -1
0 -> 2
9 -> 3
1 -> 25
8 -> 40

LoadLst.f

In the original file, LoadLst, each string is empty.
An 'f' was added for vnode 10, so that the host process would be loaded in the foreground.

0
1
2
3
4
5
6
7
8
9
10 f

with H. Sreekantaswamy, A. Wagner and S. Chanson. Resource management in a large reconfigurable transputer-based system. In proc. "Sixth Distributed Memory Computing Conference", OACIS and U. Michigan. Portland, Oregon, April-May 1991.

with A. Wagner, S. Chanson, J. Jiang, H. Larsen and H. Sreekantaswamy. TIPS: Transputer-based interactive parallelizing system. In proc. "Transputing '91", World transputer user group. Sunnyvale, Cal., April 1991.

The geometry of surfaces in the 4-quadric. Rend. Sem. Mat. Univers. Politecn. Torino 43, 467-499 (1985).

with I. Dolgachev, On the Springer resolution of the minimal conjugacy class. J. of Pure and Applied Algebra 32, 33-47 (1984).

A special surface in the 4-quadric. Duke Math. J. 50, 745-761 (1983).

Ampleness in complex homogeneous spaces and a second Lefschetz theorem. Pacific J. of Math. 106, 271-291 (1983).

Ampleness and connectedness in complex G/P. Trans. Amer. Math. Soc. 274, 361-373 (1982).

NSERC Post Graduate Scholarship, 1976-79. Held at Cornell University.

NSERC Post Doctoral Fellowship, 1979-81 and 1983-88. Held at UBC.

