UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

A hierarchical software development environment for performance oriented parallel programming Feldcamp, David A. 1992

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata


831-ubc_1993_spring_feldcamp_david.pdf [ 4.07MB ]
JSON: 831-1.0051189.json
JSON-LD: 831-1.0051189-ld.json
RDF/XML (Pretty): 831-1.0051189-rdf.xml
RDF/JSON: 831-1.0051189-rdf.json
Turtle: 831-1.0051189-turtle.txt
N-Triples: 831-1.0051189-rdf-ntriples.txt
Original Record: 831-1.0051189-source.json
Full Text

Full Text

A HIERARCHICAL SOFTWARE DEVELOPMENT ENVIRONMENT FORPERFORMANCE ORIENTED PARALLEL PROGRAMMINGbyDavid A. FeldcampB.S.E.E. University of Texas at AustinA THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFMASTER OF SCIENCEinTHE FACULTY OF GRADUATE STUDIESCOMPUTER SCIENCEWe accept this thesis as conformingto the required standardThe University of British ColumbiaDecember 1992© David A. Feldcamp, 1992In presenting this thesis in partial fulfilment of the requirements for an advanceddegree at the University of British Columbia, I agree that the Library shall make itfreely available for reference and study. I further agree that permission for extensivecopying of this thesis for scholarly purposes may be granted by the head of mydepartment or by his or her representatives. It is understood that copying orpublication of this thesis for financial gain shall not be allowed without my writtenpermission.(Signature)Department of (207,t g," Date  14 /3/ M3The University of^itish ColumbiaVancouver, CanadaDE-6 (2/88)AbstractThe future of high performance computing lies in massively parallel computers. In order tocreate software to utilize this ever more powerful and complex hardware, software designersmust reconcile the desire to provide simplifying abstractions with performance requirements.This thesis examines one approach to addressing this problem.It has been observed that most parallel programs are written so that they possess the structureof one of a relatively small number of paradigms which describe the essence of a virtualmachine. It is well known that better performance will be achieved when there is a closematch between the structure of a virtual machine and the algorithms of a program built usingthe machine. We have proposed the skeleton-template-module (STM) as a parameterized,partially implemented virtual machine, corresponding to a particular paradigm. An STM isimplemented by an expert system programmer and serves to hide the internal complexity ofthe virtual machine from application programmers who are only presented with an interface tothe missing components and parameters.To be truly useful a construct such as an STM must be supported by and fully integrated intoa viable programming environment. Such a system must support both the construction ofSTMs and their use to create applications as part of a general programming framework.However, the lessons of existing sequential and parallel programming systems must not beforgotten. "Ease of use" must remain a high priority. Reuse must be facilitated andencouraged wherever possible. Scalability in its several forms must exist. Finally, thesystem must also recognize that programming is but one task in the development of parallelprograms and so must support the existence and cooperative functioning of a variety of tools.Of these tools, the most important from the developer's point of view is the user interfacewhich provides both support for constructing programs and interacting with the rest of theenvironment.We have designed and built such an interface and environment (Parsec — the Parallel Systemfor Efficient Computation) along with prototype mapping and loading tools as a follow-on toour original TIPS environment. The system utilizes the process-port-channel model ofparallel program description and supports these primitives with a graphical interface. Thisinterface provides arbitrary hierachical organization within a program by the use of STMs andlogical subgraphs within STMs. In addition, objects such as module parameters, processtypes, process groups, and parameterized graphs are provided to both support STMs as well asmeet the objectives of "ease of use," reuse, and scalability.IITable of ContentsAbstract^ ii1 Introduction1. 1 Motivation^ 11.2 Reconciling Development Complexity and Performance1.2.1 Program Design^ 31.2.2 Environment Support^ 71.3 Objectives^ 81.4 Outline of Thesis 102 Related Work2.1 Paradigms, Skeletons, and Templates^ 112.2 Software Development Environments2.2.1 Similarities^ 142.2.2 Tools and Integration^ 162.3 Parallel Software Development Tools2.3.1 CODE^ 202.3.2 HeNCE 212.3.3 Paralex 222.3.4 Poker 232.3.5 ParaGraph^ 232.3.6 Frameworks 242.3.7 PIE 253 Primitives for a Parallel Environment3.1 Computation3.1.1 Choosing a Primitive^ 283.1.2 Implementation 293.2 Communication3.2.1 Complexity ^ 313.2.2 Ports 333.2.3 Channels 343.3 Summary^ 35iii4 Supporting the Process-Port-Channel Primitives4.1 Projects^ 374.2 Modules 374.3 Parameters 404.4 Process Types^ 424.5 Process Groups 444.6 Parameterized Process Graphs^ 464.7 Target Environments^ 484.8 Files^ 484.9 Summary 495 Interface Issues5.1 Base Program Representation^ 505.2 Controlling Display Complexity 525.3 Interface Levels^ 555.4 Auxiliary Views 565.5 Summary 566 Implementation and Example6.1 Defining Parameters^ 586.2 Defining a Parameterized Graph^ 606.3 Defining Process Types 636.4 Adding Processes^ 656.5 Establishing Port Bindings and Other DescriptiveInformation^ 656.6 Defining Process Creation Information^ 676.7 Creating a Reusable Module^ 686.8 Instantiating a Reusable Module 69iv7 Conclusions7.1 Synopsis^ 737.2 Current Status 747.3 Future Work7.3.1 Communication Support and Typing^ 747.3.2 Data Decomposition ^ 757.3.3 Derived Objects 767.3.4 Advisors^ 76References^ 781 Introduction1.1 MotivationThe future of high performance computing lies in massively parallel computers. Theforces of technological evolution clearly point in this direction [Ke921. Computers withhundreds of processors are being designed and built and systems with thousands of processorswill soon follow. However, just as in the past, this rapid and steady march of technology inthe hardware domain is being followed at a considerable distance by progress in the softwaredomain which is neither so steady or rapid.The essential problem facing software designers is how to reconcile two critical, butconflicting, objectives. The first is that software developed for these powerful machines mustmake reasonably efficient use of their power to justify their existence. There is, after all, littlepoint in using a 100 processor computer to obtain a ten percent performance improvement overa 10 processor system. The second objective requires that it must be reasonably easy to createprograms for these machines in order for them to be broadly useful. The obvious conflict is aclassic computer science dilemma: abstraction versus performance.In the sequential domain there is a relatively smooth correlation between increases inabstraction and decreases in performance. As a consequence it has been possible for a widespectrum of choices to be available to suit almost any requirement. Furthermore, performanceimprovements in sequential computers have consistently led to significant, albeit notproportional, increases in program performance. In contrast, in the parallel domain thecorrelation between abstraction and performance is far from smooth. It is not unusual forrelatively minute adjustments in a parallel program to produce dramatic speedups.12 CHAPTER 1Furthermore, parallel domain abstractions seeking to provide a level of simplificationcomparable to that in the sequential domain must hide a much greater underlying diversity.One only has to observe the enormous variety of parallel architectures as compared to that ofserial architectures to see the scope of the problem. And, of course, the less correlationbetween abstractions and reality, the greater the potential for performance degradation.It is also, of course, possible to take the view that future improvements in parallelhardware performance and design will make such complexities irrelevant. Based on thisassumption, the primary problem to solve is that of ease of use. There already exist a numberof high-level textual and graphical and textual languages for programming parallel machines(see Chapter 2). Many of these have been implemented, do produce significant simplifications,and in some cases, produce significant speedups. In essence, these systems seek to present asimple, universal virtual machine to the user. The assumption underlying these systems is thata generic code generator can with a few user supplied function create a complete and efficientparallel application.However, what if the virtual machine implemented by the system is not a good fit forthe machine the application is to run on? What if the application algorithms are not a good fitfor the virtual machine? Certainly the existing systems create complete applications, but theirperformance over a wide spectrum of applications is at best highly suspect. Even under thebest of circumstances it is fairly optimistic to hope that a generic code generator or compilerwill be able to convert an arbitrary application in a generic high level representation into anefficient program on a particular parallel architecture. Creating compilers to perform this task isone of the most difficult in parallel programming [Ke92]. Thus, its not surprising that the bestexisting parallel compilers are highly specialized to optimize very specific forms of parallelismfor particular architectures.Furthermore, there is no clear concept of precisely what amount of performance isexpendable for gains in ease of use. What also must be remembered is that the performance ofany software system is rarely uniform. While the worst case performance of a "universal"software system may indeed be unacceptable, it may also perform very well for a morerestricted set of applications or platforms. Thus, talking about the quality of performanceprovided by any given system is not simple.What is clear is that efficiency is desirable. Software that performs well today is verylikely to perform even better on tomorrow's hardware. History has shown that expectedINTRODUCTION^3improvements in hardware performance, when they do materialize, are frequently needed justto keep up with the demands of applications. It is unlikely, particularly in domain of high-performance computing, that in the foreseeable future such high performance hardware will beavailable that future improvement will genuinely be "surplus," available to be sacrificed forease of use. Thus, it is important to develop software systems that strive to maximizeefficiency while still achieving the goal of limiting development complexity to a reasonablelevel.1.2 Reconciling Development Complexity and Performance1.2.1 Program DesignA significant hint as to how parallel programming can be simplified without sacrificingperformance can be seen in one trend in current parallel programming. It has been observedthat most parallel programs are written so that they possess the structure of one of a relativelysmall number of paradigms [AhCaGe91, A191]. A parallel programming paradigm is a concisestrategy for parallelizing a program [F179]. Examples include processor farm, divide andconquer, multi-pipelines, and geometric decomposition.Another useful observation is that these paradigms describe the essence of a virtualmachine. Of course, seeking a universal virtual machine for all parallel programming isattractive. Many such machines have been proposed, some still primarily on paper such as thePRAMs and Valiant's model [Va90], others such as Linda [Ca89] already implemented and inuse. However, whether any of them will be able to provide generally acceptable performancefor all programs is as yet an open issue. Notwithstanding, it is well known that betterperformance will be achieved when there is a close match between the structure of a virtualmachine and the algorithms of a program built using the machine. Even if an acceptableuniversal machine is found, there will always be a significant community which will not findthe relatively modest performance costs associated with it acceptable. These facts inconjunction with the previous observation that programmers of parallel machines currently usea set of paradigms suggest that providing a choice of virtual machines from which the best fitcan be chosen is both natural and efficient and likely to remain desirable well into the future.The concept of a skeleton program has long existed in programming contexts wherethere is a significant amount of overhead involved in creating an application (i.e., code, data4 CHAPTER 1structures, and functionality required for a program to operate in a particular environment). Askeleton program implements the essential structural components required of any non-trivialprogram in the context. For example, there exist skeleton programs for Macintosh andXWindow applications. Such programs are frequently provided by vendors as programmingexamples. Another term for this type of program is application shell. The important attributeof such programs is that someone other than the application programmer has provided most ofthe system dependent infrastructure code in such a form that it needs merely to be fleshed outby the application programmer. The diversity of parallel programming is such, though, that nosingle skeleton can hope to simultaneously provide a complete framework while at the sametime still be general enough to be viable for the entire domain, or even a majority of it ICo891.A closely related concept is that of programming templates. This is typically seen as apartially coded program implementing a basic framework with annotated gaps left whereapplication specific code can be added to create a complete program (see §2.1 for examples).Frequently some sort of support mechanism is provided for moving between the gaps. Whilethis is not drastically different from the idea of program skeletons, it introduces the idea ofpresenting an interface for completely specifying a partially implemented program.A notable contribution to the unification and extension of these concepts is the work ofCole on algorithmic skeletons [Co891. Cole identifies the following two importantcharacteristics of an algorithmic skeleton:• it provides only an underlying structure that can be hidden from the user• it is incomplete.Much of the incompleteness of a skeleton can be expressed by parameterization. Thisparameterization need not just be by simple variables such as the number of processors, butalso by other factors, such as granularity and topology, which are machine dependent and canhave an enormous impact on performance [FeSrWa92].Consider now an implementation of the virtual machines borrowing from all of theseconcepts. In deference to its relationship to these concepts we call this object a skeleton-template-module or, more tersely, an STM. Such a construct could be a parameterized,partially implemented program hiding the skeleton structure from the user and only presentingan interface to the missing components and parameters. Such a concept provides a route tosatisfying the need to limit development complexity while at the same time not shutting the dooron performance concerns. In many respects, this approach to simplifying applicationINTRODUCTION^5specification provides the best of both worlds — a high degree of freedom and flexibility foroptimization to the expert developer of an STM and a quite simple instantiation interface for theapplication developer. Hiding underlying structure from the user by presenting a simpleinterface results in programs that are easier to understand and maintain, as well as less prone toerror. In particular, the programmer can focus on the computational task at hand rather than thecontrol and coordination of the parallelism. An STM encapsulates the control andcommunication primitives of the application into a single abstraction.While communication encapsulation can be made complete in all cases, encapsulation ofcontrol is less clear. It has been noted that there are actually two types of control structure invirtual machines: skeletal control and procedural control (e.g., IEpcc924). Skeletal control isthe case where all control decisions are made by the virtual machine and interaction between itand code that completes and utilizes the virtual machine only involves data. Examples of thiscase include processor farm, divide and conquer, and geometric decomposition. Proceduralcontrol exists in cases where the virtual machine is more flexible and the application can choosean operation to be applied to a set of data and/or the order of operations. A good example ofthis case is the vector virtual machine. There is not a clear delineation between these cases as isevident if one considers a processor farm where part of the data specifies which of a set ofoperations is to be performed on the data. We simply maintain that as much control as possibleshould be maintained inside a virtual machine and, in particular, all control involvingparallelism. For example, while an application developer should specify operations and theirorder for a vector virtual machine it should not be his responsibility to perform orderingoptimizations within those constraints or to worry about which instructions are compatible withpipelining.With respect to performance, the obvious advantage to our approach is that, because ofthe restricted nature of a paradigm, it is easier to make efficient use of a specific parallelmachine. And, because it is derived from a simple strategy, it is easier to provide applicationdevelopers with a machine-independent, easy to use interface. Since the underlying structure ishidden from the user it can be implemented in an optimized, machine dependent way withoutimpacting the portability of the applications built on top of it. Furthermore, it has been notedeven in the sequential domain that programming paradigms are easier to implement and debugthan normal, general purpose programs and, in turn, applications built out of theseimplementations also have a simpler structure and are easier to debug Va891.6^CHAPTER 1/ApplicationsVirtual MachinesOperating SystemHardwareFigure 13 Layers of abstraction for parallel program developmentOur approach implies that there are four primary layers of abstraction in parallelprogram development as depicted in Figure 1.1. The top two layers follow the abovediscussion. The bottom two layers exist in almost any system although the scope, complexity,and portability of the operating system can vary greatly. The nature of programming takingplace at our virtual machine and applications layers corresponds respectively to that in the Yand Z layers in Snyder's three layer decomposition of parallel programming [Sn911. Snyder'sX layer corresponds to serial programming.Ideally the virtual machine layer would be operating system independent to facilitateportability and simplify implementation. This could be achieved by some form of lowestcommon denominator operating system or interface to operating systems. While some systems(see Chapter 2) have attempted this type of generality, it fails to take into account the significanthardware differences which frequently, but not always, underlie operating system differences.Ignoring these differences means sacrificing performance. Thus, we take the position that thedeveloper of a virtual machine on a particular hardware architecture must have access to a fullyfeatured operating system for accessing the functionality of the architecture in order to insureadequate performance.INTRODUCTION^71.2.2 Environment SupportAnother means of simplifying the construction of efficient parallel programs is toprovide support in the form of an integrated environment for program development andexecution. As discussed in IChG091] in the context of our earlier TIPS environment, such anenvironment incorporating intelligent tools can simplify or take over many tedious and errorprone tasks while at the same time facilitating better performance and portability. There are alarge number of functionalities which such an environment should support and integrate; forexample,• structural specification• editing source code• program compilation• answering queries about available resources• mapping (contraction, placement, assignment)• module loading and execution• runtime support• multi-module scheduling• monitoring• visualization• debugging.This type of environment should not be confused with "programming systems" which provideonly a language or those that only provide some form of user interface for programspecification (a single component of a full environment).The potential to incorporate considerable amounts of intelligence into the system existsin several of the functionalities, such as mapping and scheduling. While such intelligence maybe considerable, it should never be seen as a replacement for the developer and should alwaysproduce results that can be examined and modified by the developer. Given the current levelsof algorithm sophistication, in the best cases such tools may remove the need for developerinvolvement, and in the remainder at least provide the developer a reasonably good startingpoint.One of the problems of intelligent tools is that they frequently require large amounts ofdata to make good decisions. For example, a mapper will need information about resourceavailability and can make use of statistics about those resources and performance information8 CHAPTER 1from a monitor to avoid creating bottlenecks. Generating and dealing with this data can, ofcourse, be a substantial burden on a developer. However, by integrating all functionalities in asingle environment it is possible that this data can be produced, managed, and consumedwithout developer involvement.The value of such integration should not be underestimated. As an example, considerthe problem noted in the last section of the need for virtual machine developers to directlyaccess the operating system in order to make efficient use of the hardware. This problemcommonly arises when there exist several different communications primitives which each havedifferent costs and restrictions. Making the best use of these is essential to obtaining goodperformance but is a burden to the developer and significantly inhibits flexibility andportability. However, making such optimizations is exactly the type of task a mapper canperform. With access to hardware descriptions and the program specification, a mapper candetermine optimized specifics (e.g, physical resources, routing, multiplexing) for anycommunication. These specifics can be passed to the program when it is loaded and utilized bythe runtime system. Of course, given the complexity of the optimization problems in parallelprogramming, no tool, even when utilizing a spectrum of algorithms, can guarantee to do aperfect job or even an acceptably good job in all cases. Thus, such tools must always providemechanisms to allow a developer to adjust or even ignore their decisions. Nevertheless, withthe bulk of these tasks handled by the environment behind the scenes, a developer is free tofocus his attention on programming at a higher, more portable level of primitives provided bythe runtime system — without sacrificing significant performance.In addition to the above general functionalities, an environment should reflect, support,and take advantage of programming models such as the one discussed in the previous section.For instance, it could reflect that model by possessing a hierarchy of programming levels andcould support it by incorporating mechanisms for declaring, updating, and utilizing STMparameters.1.3 ObjectivesAs noted in Chapter 2, the essence of the STM concept, both in parts and as a whole,has been proposed before. There do of course exist comprehensive parallel programmingenvironments and even vague proposals for such environments utilizing programmingINTRODUCTIONparadigms. However, there does not yet exist a system which turns these concepts into areality. Such a system must support both the construction of STMs and their use to createapplications. In order to be relatively flexible such programming should be part of a generalprogramming framework. Such an approach insures that the inevitable variations andexceptions can be accommodated cleanly within the system rather than as tacked on, case bycase extensions. Furthermore, building on top of a more general, lower level frameworkcreates the potential for greater consistency and homogeneity within the system.The system must also recognize that programming is but one task in the development ofparallel programs. The system must be an environment which supports the existence andcooperative functioning of a variety of tools which aid the developer in creating, debugging,tuning, and executing a parallel program. Of these tools, the most important from thedeveloper's point of view is the user interface which provides both support for constructingprograms and interacting with the rest of the environment.While the design of this interface must reflect all of these concerns, it also has theresponsibility for making the objective of "ease of use" a reality. While there is no simpleformula for performing this task, there are a number of issues which must be addressed:• interaction with the system should be as easy and intuitive as possible• reuse must be facilitated in order to minimize the amount of actualprogramming which must be done• scalability must exist in several formsit must be possible to move with little or no effort from smalldevelopment versions to massively parallel applications ofdifferent sizes to suit the resources of different machinesit must be possible to specify complex programs withoutbecoming overwhelmed by that complexity—it must be possible to compose existing programs into largerprograms with minimal effortThe objective of this thesis is to motivate and describe the design of an environment and userinterface which addresses all of these concerns. In doing so, we recognize the need to bothshare as much as possible in our context with conventional software developmentenvironments as well as borrow useful concepts from other parallel environments. Wherethese sources of inspiration do not suffice, novel approaches or modifications to existing1 0^CHAPTER 1approaches will be proposed. The initial implementation of this design is discussed whererelevant.1.4 Outline of ThesisThe remainder of this document is structured as follows. Chapter 2 outlines severalrelated topics, briefly surveys a number of related projects, and motivates a number of thedesign decisions in the following chapters. Chapter 3 discusses the basic parallel programmingmodel which serves as the foundation of our system, and particularly the interface. Chapter 4describes and motivates a number of objects which have been added to the system to make thebasic model more usable. Chapter 5 discusses the basic considerations in the design of theuser interface display to support user manipulation of the objects presented in the previous twosections. Chapter 6 presents an example of how the interface may be used to construct an STMwhile Chapter 7 presents conclusions and a brief view of directions for future work.2 Related Work2.1 Paradigms, Skeletons, and TemplatesThere are two different ways of looking at parallel programming systems. In one thesystem should provide useful, small set of hopefully simplifying primitives which the author ofa program incorporates into a program. The responsibility for structuring and implementingmost, if not all, parallel control and communication is still the user's responsibility. Linda[Ca89], Strand [Fo0v911, general purpose MIMD 'operating systems' (e.g., Trollius[BuPf88]), and operating system 'interfaces' (e.g., PVM [Su901, CHIMP [Wi91b1) areexamples of this approach. The other view, exemplified by our project, is that the systemshould provide everything possible, such as program organization, control, communication,and performance tuning facilities. The user's only task is to provide those few routines uniqueto his application such as computation and data generation. Of course, it is impossible for sucha system to possess the generality of the first type, but we see the benefits of restrictinggenerality to the level of a single paradigm far outweighing the costs.Numerous parallel programming systems are available [LeER92] and many of these doattempt to utilize parallel paradigms; however, none of these systems directly integrate theparadigms and their virtual machine implementations into the environment. For example, insome languages the characteristics of the paradigm are often not used by either the compiler orother tools in the environment. There are also compiler optimization tools that do source tosource translation of programs for parallel machines, but the underlying abstract model of themachine utilized for the optimization is more general and less accurate than the models weconsider [EiB191]. However, an approach similar to ours can be seen in Gelernter's proposal1112 CHAPTER 2for integrating the Linda Program Builder into a support environment [AhCaGe91]. Anothersimilar approach can be seen in PIE's [GrSe85, LeSeVr89] implementation machines[RaSeVr90] which are discussed in §2.3.7.Our approach was originally motivated by the algorithmic skeletons proposed by Cole1Co891. It differs from and extends Cole in that support from and integration into anenvironment is included in the definition of our STM. Furthermore, our focus is onperformance tuning instead of asymptotic behavior. Like Cole we see the potential for writinghighly portable applications using virtual machines. However, we believe that it is not possibleto implement virtual machines independently from the underlying hardware and environmentand that the quality of the virtual machine ultimately relies on the potential for it to be tuned tothe characteristics of the parallel system on which it is to execute. Maximizing this potentialwhile minimizing the user complexity requires that a virtual machine implementation bedesigned to take advantage of whatever performance information is available with as littleassistance from the application developer as possible. Such information may be obtained froma virtual machine specific source such as a parameterized model of its execution or a generalmechanism such as a performance monitor. The role of the support environment is to providethe general mechanisms, enable virtual machine specific extensions, and provide means bywhich they may be integrated with other sources of relevant information such as a mapper.We do not claim that those systems which only provide primitives are not capable ofpresenting a virtual machine to a user. For example, there is a project to build on top ofCHIMP libraries implementing different virtual machines (e.g., [Epcc92a]). Another exampleis the Linda Program Builder which provides a specified paradigm in the form of the full Lindacoding of the skeleton with marked areas which the user must fill in with application specificcode. This code must to some extent employ the Linda language (extensions). In contrast, ouruser components are kept distinct from system components and are defined so that the usershould need to use no communication or other, explicitly parallel, primitives. This contributesto portability since the user must only recompile and relink his components to migrate to adifferent platform which supports the same skeleton, even though it exists on differenthardware and uses a different language, communication technique, and/or algorithm.Providing source code templates to speed and assist user module coding does not compromisethis separation.RELATED WORK^13In addition to the relatively small number of libraries implementing virtual machines,there are a large number of general purpose libraries available for parallel programming.Some, such as those available with Logical C [LC891, provide little more than wrappers foraccess to the underlying hardware and are, thus, not at all portable. Others are part of atargeted for a particular type of parallel system. These may be part of a relatively completeoperating system as in the case of Trollius or part of a system to provide specific functionalityas in the case of PICL [HeGi91]. As in the case of Trollius, the more complete systems mayprovide full access to the hardware in addition to higher level, more flexible, and moreperformance costly features (e.g., non-local communication). Still other libraries such as PVMand CHIMP have been developed with the specific purpose of providing a uniform multi-platform programming environment on top of platform or architecture specific operatingsystems. However, it also true that these systems carry a performance cost which varies anddo not support all types of architectures, and do not always perform optimal communications(e.g., use a non-local communication primitive for a local communication when fasterprimitives for local communication are available). Some of the more widely used libraries forparallel programming, such as PICL and Trollius, are supported by monitoring andvisualization tools. ParaGraph [HeEt911, for example, allows users to visualize performanceinformation about communication intensive PICL function calls and is also supported as anoption in Trollius.The problem with these general purpose libraries from an application programmingpoint of view is that they are at too low a level, as are, necessarily, the information andvisualizations gained by monitoring them. Even with this degree of support there is largeamount trial and error involved in tuning a program since there is no direct, obvious correlationbetween the actual causes of problems and the effects which the user sees. One of the moresophisticated environments at this level is the work of Fox [FoKe91]. This system utilizesexperimentally measured "typical" times for all of its primitive operation and uses these topredict how long a given block of code will take to execute. Tools at this level help userslocally optimize their program, but they do not address the problem of global optimization. Incontrast, our use of an virtual machines with analytic models clearly indicates to a user whichproperties of a program most affect performance, perhaps even without running it once. Evenwhen analytic models do not exist for virtual machines, the fact that a virtual machine14^CHAPTER 2implements a restricted, parameterized model of computation serves to simplify and focuspotential performance issues.2.2 Software Development Environments2.2.1 SimilaritiesAny environment which supports the software development process can be seen as asoftware engineering environment. Software engineering environments in the sequentialdomain have evolved to the point where there is a considerable degree of standardization, atleast with respect to functionality and general design principles (see for example [EnWe91]).Since environments for both the sequential and parallel domains share the general purpose offacilitating the building of large, complicated programs with as much generality as possible, itis potentially useful to examine compatibilities between these "standard" features of sequentialenvironments and the requirements for parallel environments.However, one must be careful when making such an examination. While sequentialand parallel environments can be viewed as existing in two different domains, the sequentialcase can also be viewed as below the parallel case in a hierarchical organization since most, ifnot all, existing sequential environment functionality is applicable to the design andimplementation of the sequential components out of which a parallel/distributed program isconstructed. Notwithstanding this connection, the interesting question is to what extent theresults of work with sequential environments can be applied to parallel environments. Some ofthe compatibilities include:• a small set of primitive building blocks• a rich set of functionalities supporting the use of the primitive building blocks• a focus on modularity and reusability• a basic structural representation — graphs• a basic implementation structure consisting a graphical interface wellintegrated with a set of intelligent tools which perform specific functions.Conventional software engineering environments utilize functions (or some similar unitof code encapsulation) as the basic building blocks out of which programs are built andfunction invocations as the basic method for connecting these building blocks into largerstructures. Most environment operations deal with these objects or aggregates of them.RELATED WORK^1 5Keeping their number so small makes the environment simpler and homogeneous. Acorresponding set of primitives for a parallel environment are discussed in Chapter 3.Furthermore, just as much of the utility of conventional environments derives from their richset of features for defining and manipulating functions, a set of features to support parallelprimitives is needed to make a parallel environment useful. Such a set of features is discussedin Chapter 4.A common high priority on modularity and reusability is not at all surprising since bothshare the objective of developing large, complex programs. However, it should also be notedthat the units of modularity will be different, just as are the basic building blocks and thefeatures built on top of them. Also, reuse is capable of taking on a different meaning in theparallel context since much of the size and complexity derives from replication rather thaninclusion. Modularity and reusability are guiding motivations in the discussion of Chapter 3and particularly §4.2 and §4.4. An additional requirement for providing reusable modules,flexibility, is discussed in §4.3 and 0.5-6.It should also be noted that other high priorities in conventional environments such assupporting large work groups and documentation are still issues in the parallel context.However, their importance is considerably less in the parallel context because the currentlyenvisioned typical user (an individual in a research environment with a task specificapplication) does not have as great a need for these features as the typical users of sequentialenvironments (groups of programmers in a commercial environment creating applications withlong lifespans). The typical user of a parallel environment instead places a very high priorityon performance and scalability which are also issues in sequential environments, but not toppriorities there.Conventional software engineering environments rely heavily on graph representationsof structural information and graphical displays of this information. Given the importance ofstructure in parallel programs and the correlation of logical structure to physical structure, sucha choice makes even more sense in a parallel context. Graphs are of course widely used inmany different contexts as a convenient and understandable representation of structures andtheir internal dependencies. However, the use of a graph representation does not necessitatethe use of a graphical display and interface. For instance, both the Prep-p mapping tool[Br5t89] and the Oregami mapping system [LoRa90] use a textual language to specify a graphrepresentation of a parallel program. It may be argued that such an approach is more powerful16^CHAPTER 2and flexible than a graphical interface approach and is in many cases more terse. Even grantingthe validity of these arguments, a graphical interface has two surpassing advantages. First, itprovides a much easier to understand representation, and, second, it has the potential toprovide an intuitive set of manipulation operations instead of the implementation specific syntaxand semantics of a particular language. Some of the issues involved in specifying a graphicalrepresentation and interface for a parallel environment are discussed in Chapter Tools and IntegrationConventional software engineering environments must support a number of distinct butrelated activities. This is also true of environments supporting the development and executionof parallel programs although the number activities is potentially larger, particularly fordistributed memory systems. A minimal set of these activities will include:• structural specification• editing source code• program compilation• answering queries about available resources• mapping (contraction, placement, assignment)• program loading and execution.In addition, support for other activities such as• multi-program scheduling• monitoring• visualization• debuggingis highly desirable. While some of these activities parallel those in conventional environments,others are not relevant in the sequential domain (e.g., mapping) and others are handled byoperating systems (e.g., loading, scheduling). Another difference is that while these activitiesare typically implemented in a single, general purpose form in the conventional case, in theparallel case many will be implemented as a set of specialized variants (e.g., a set of mappingalgorithms which each perform well over a restricted domain). Furthermore, the level at whichthe activities are carried out will be different in many cases (e.g., debugging a single processvs. debugging a group of communicating processes).RELATED WORK^17What is common is the recognition of the fact that a significant number of distinctactivities must be supported and that any attempt to build them into a single tool would beunwieldy at best. However, even in conventional environments, there are no clear guidelinesas to where application boundaries should be. In a parallel environment, this problem is evenmore complicated because of the existence of additional activities and the fact that theimplementation of an activity may consist of a number of domain specific variants.The early implementations of our project relied heavily on sharing information via filesand, later, on combining multiple activities into one tool. However, there are severaldrawbacks with this approach:• a base interface for a system as complex as this is itself a large andcomplex program• a file interface for a system as complex as this is cumbersome and timeconsuming to maintain (e.g., establishing file formats, maintainingparsers, managing file locations and names)• including conceptually independent tools such as the mapper inanother program, regardless of design modularity, hinders theirdevelopment• a large amount of time and effort is required to create, maintain, andmodify complex language data structures and the operations that utilizethem.These observations motivated us to re-examine our design options. Once again, conventionalsoftware engineering environments pointed a way out. Better and more sophisticatedintegration allows greater decomposition in addition to other side benefits. While this is hardlysurprising, the concept that the overhead and restrictions implicit in such a decision might beacceptable and worthwhile was not intuitive.Tool integration is itself a very rich and complicated topic. However, the discussion ofit can be simplified by observing that it can be decomposed into a few orthogonal dimensionsas illustrated in Figure 2.1 1Wa891. Of these, clearly the most important with respect to ourimmediate problems was data integration.18^CHAPTER 2Data IntegrationAfter evaluating the data integration alternatives we chose to utilize a relational database.The prevailing opinion in the software engineering environment community is that these are notpowerful enough to handle the real demands of software engineering environments 1WiWo891.Nevertheless, they are still used by many commercial software engineering tools. They arecertainly acceptable, at least as an interim solution, because of our relative simplicity (comparedto research software engineering environments for sequential systems) and attractive becausethey are well understood and reliable. Furthermore, we found that the relational model mappedwell onto our data representation needs.A Presentation IntegrationStandard"look and feel"StandardtoolkitStandard windowmanagerStandard windowsystem^ Data IntegrationExplicit message^Message Shared files Database Object baseDaemonTriggerMessage serverControl IntegrationFigure 2.1 Three dimensions of tool integration [Wa89]RELATED WORK^19Before settling on relational databases a number of options were examined. Weconsidered implementing a simple, custom database system in order to minimize the impact onoverall project complexity and performance. However, we ultimately rejected this approachbecause of the effort required to implement and support such a system and because it wouldonly have distanced us from, but not removed, the maintenance of a file interface. Otherexisting simple Unix databases such as ndbm and xrdb were considered, but they do notprovide any real capability for expressing the structural richness of our data and our diversequery needs. We were unable to consider state of the art object databases (see ILo891 forexamples) because of they are not widely available. Furthermore, they are still a relativelynew, and thus unstable, technology.We initially used the Postgres relational database system [Postgres92], a follow-on toIngres being developed at U.C. Berkeley, because it is state of the art, public domain, andpromises "extended relational" features such as rules and an object oriented approach whichshould make it better able to address the needs of a software engineering environment than aconventional relational database. However, an initial implementation constructed on thisplatform encountered several problems. By far the most significant of these was very poorperformance, even on relatively fast hardware. Performance problems with relationaldatabases in general have been noted by other parallel interface system developers ISeRu851.However, the severity of our problems with Postgres can be attributed to its being primarily aresearch project as well as inefficient programming necessary to avoid bugs and incompletefeatures. Since the performance was unacceptable for general use a second implementationusing an Oracle V6 database was created. Even without performance tuning thisimplementation has much better performance and is fully acceptable for general use.Presentation IntegrationIn contrast to our difficulties with data integration, presentation integration has been amuch clearer issue. From the beginning we chose to develop the XWindows basedOpenWindowsTm environment which implements the OpenLookTm "look and feel" guidelines[0L89]. Choosing such a standard presentation environment is desirable both from a user'spoint of view as well as from a design and development point of view. Our decision was20^CHAPTER 2simple because of the ready availability of OpenWindows on our Sun workstations in additionto the availability of a well evolved interface prototyping tool (OpenWindows DevGuideTm).Control IntegrationControl integration between our tools is currently very simple because of the toolsinvolved: the interface, mapper, and loader. The mapper and loader programs are currentlydirectly invoked from the interface via Unix process control primitives. However, morecomplex control interactions may be required for the scheduler which will require repeatedinteractions with both the mapper and a performance prediction tool to arrive at reasonablyoptimal scheduling decisions. A more complex, event driven mechanism will certainly berequired to support user interaction with debugging and visualization tools.2.3 Parallel Software Development ToolsThere currently exist a number of substantial, relatively general purpose softwaredevelopment environments for parallel systems. Notwithstanding their desire to serve asgeneral purpose tools, most of these systems have a particular focus to their development andtend to treat remaining issues as secondary. A number of the more interesting and relevantsystems are discussed briefly in the following sections.2.3.1 CODEThe primary objective of CODE (Computation Oriented Display Environment) [Br89] isto present a unified, well-defined, comprehensive model for parallel programming. CODEattempts to do this by presenting control flow (dependency) graphs as a language. Instead ofutilizing the process as the basic building block for parallel computation CODE uses a"sequential unit of computation" [Br85].Such a computation unit consists of a firing rule and an action that transforms a set ofinput dependencies into a set of output dependencies. A computation unit is triggered whendata is available for all of its input dependencies. A major limitation of such computation unitsis that they, at least in theory, do not permit the existence of persistent state which is needed fora significant number of programs (e.g., search, geometric decomposition). Of course, it ispossible in practice, with knowledge of the implementation, to find ways to store persistentRELATED WORK 21implementation at the price of losing portability and violating the model. The developers ofCODE see the separation of dependency and firing rule specification from functionality as oneway of facilitating reuse.In addition, CODE provides objects such as merge and distribution filters. These areobjects, independent of computation units, which CODE computation units may either senddata to or receive data from. A merge filter provides a number of algorithms for merging manyinput channels into a single channel while a distribution filter provides several algorithms fordistributing a single input channel among many output channels. Such filters also provide ameans of constructing logic 'OR' dependencies.CODE is targeted to shared memory and tightly coupled MIMD computers and, thus,assumes a homogeneous environment. The main component of CODE is in essence a programspecification tool which primarily consists of a graphical interface for constructing dependencygraphs, sequential units of computation, their firing rules, and filters. The other maincomponents of CODE are source code generators to convert graphical specifications intocompliable conventional languages. The CODE interface does support hierarchical graphs andcomputation unit or subgraph reuse via its ROPE tool lLeLi88]. However, CODE subgraphsare purely a user convenience for display and manipulation; the system itself deals solely withthe flat, complete graph. CODE does not provide any support for special graph constructssuch as for loops which are at best awkward when implemented manually and in many casescause the system to fail. Some support for conditionals in the graph is available via specialfilters. CODE does provide very limited support for scalability in the form of 'replicationnodes.' Such a node is essentially a statically resizable vector of completely identical replicasto which a single parent can distribute different pieces of data. All output from replicas must besent to the same destination.2.3.2 HeNCEHeNCE (Heterogeneous Network Computing Environment) [BeDo9 1 a, BeDo9lb] issimilar in a number of ways to CODE. Both systems require the explicit specification ofparallelism within a program and do not facilitate scalability. Furthermore, both utilize agraphical interface to specify a directed control flow (dependency) graph in which a nodecorresponds to a computation which is triggered when all the input dependencies are availableand which, at least formally, may maintain no state information. Unlike CODE, however, the22 CHAPTER 2computation corresponding to a node in the graph may in HeNCE be a parallel computationitself as long as it executes on a single parallel machine. Although the issue is not discussed intheir work, such parallel nodes could be used to implement particular paradigms and socorrespond to unparameterized STMs.Unlike CODE, HeNCE supports graph primitives for describing loops, pipelines, andconditionals and so avoids some of CODE's more obvious limitations. Pipelines are ageneralization of the CODE's replication nodes and permit any delimited piece of graph to bereplicated a specified number of times although they retain the constraint that all replicas musthave the same input source and output destination. HeNCE also supports a 'fan' primitivewhich corresponds to a CODE replication node. HeNCE implements these graph primitives asgraph writing operations (e.g., a pair of loop primitives cause the enclosed graph to bereplicated the specified number of times when the full control graph is internally generated).However, in spite of its support for conditionals, HeNCE does not provide functionalityequivalent to CODE's filters. HeNCE programs are intended to be run on a heterogeneoussystem. To facilitate this, HeNCE supports the existence of multiply defined, architecturespecific node implementations with a cost matrix to help a user select the best for hisapplication and available resources.HeNCE programs run on a network of heterogeneous systems and are written usingPVM [Su90]. In a HeNCE program a tightly coupled parallel machine such as a hypercube ornetwork of transputers is treated as a single machine and, as a consequence, seems primarilyapplicable to coarse grained applications. Although HeNCE and PVM could conceivably beimplemented to work inside tightly coupled systems, no effort has been made to so, perhaps inrecognition of the significant differences in resource management, overheads, and performanceexpectations.2.3.3 ParalexParalex [BaA191] is a relatively new system which is similar to both CODE andHeNCE. Like both of these systems it utilizes a graphical interface to represent programs ascontrol flow graphs (the developers prefer to view them as coarse grain dataflow graphs),assumes coarse granularity, and provides minimal support for specifying computation units(e.g., assumes only one source file, no libraries). However, unlike the other systems, itprovides no support for scalability and neither supports or even permits cycles in its graphs.RELATED WORK 23Paralex does support filter nodes very similar to those in CODE although those in Paralex arerestricted to dividing output data. Like HeNCE, Paralex does not support subgraphs and issolely targeted to distributed systems rather than more tightly coupled architectures.Paralex is distinct from the other systems described in this section in its support forfault tolerance through replication, dynamic load balancing, and the sophistication of itsmapping system. In addition, Paralex supports debugging with monitoring and visualizationtools as well as a simple debugger which allows a user to stop execution when one or morenodes is reached, examine or modify node input data, and continue execution. By providing amonitor, visualization tools, a debugger, and a substantial mapper Paralex provides a muchmore complete system for parallel programming.2.3.4 PokerOne of the most notable early parallel programming environments with a graphicalinterface is Poker [Sn841. Poker was targeted towards a particular class of homogeneous,configurable, tightly coupled MIMD architectures. It provided a relatively simple graphicalinterface for drawing a program's communication structure. The communications graph iscompletely static with no provision for supporting the design of scalable software. Additionalpieces of information such as the names of source code files are attached as annotations to thecommunication graph. Poker was one of the earlier parallel programming systems to highlightprocess and process communication ports as primitives for describing a parallel program. Alarge amount of the effort invested in Poker was involved in developing tools to map parallelprograms onto real hardware architectures or simulations of them. Ultimately these tools forsuch tasks as describing a process graph, contracting it, placing processes onto processors,and assigning physical resources to communication channels evolved into the Prep-p mappingsystem [BeSt89].2.3.5 ParaGraphThe ParaGraph graphical interface [BaCuLo90] is primarily dedicated to supporting thescalable specification of parallel programs. The designers see this task requiring the ability tospecify graph families rather than graph instances, to compose graph, to annotate these graphswith labels or information (as in Poker), and to display such graphs in a usable manner.ParaGraph supports the specification of graph families by utilizing aggregate rewrite (AR)24^CHAPTER 2graph grammars which are well suited to interconnection graphs which the authors characterizeas being sparse, recursively constructed, nearly symmetrical, and of low radius MaCu871.The ParaGraph interface essentially allows a user to specify a base graph and a set ofAR productions which both utilize and rewrite graph node attributes in addition to the graph.Construction of graphs which are not easily produced with AR mechanism can be aided by`scripts' which allow the user to control graph generation directly (i.e., specify allowablederivation sequences). Support for composing graphs is not described in detail but relies tosome extent on matching attributes for different graphs. Mechanisms for displaying the largegraphs this system is capable of producing have not yet been presented. Like Poker,ParaGraph uses processes, ports, and channels as its underlying primitives for describing aparallel program. ParaGraph is currently used in conjunction with a parallel computersimulator and is not designed to support heterogeneous systems.2.3.6 FrameworksFrameworks [SiScGr91] is currently the only system which supports any sort ofparadigm based approach to parallel programming with a graphical interface. Frameworks isdesigned and implemented to run on a homogeneous network of workstations in a multi-usercontext. This system supports two built in variations on a processor farm virtual machine: a`manager template' which only supports static worker allocation and a 'contractor template'which supports dynamic worker allocation. The primary purpose of the graphical interface isto create multiple instances of these templates, attaching task specific application code to thetemplates, and connect the template instances to create a large application. These activities allcorrespond to our application level of parallel programming (see Figure 1.1).The graphs created by connecting templates are required to be acyclic and templateinstances must not require persistent state. Frameworks does provide a typing mechanism forconnections between template instances and does perform type checking on these connections.The developers of Frameworks have observed significant performance improvements forcoarse grained applications. The developers also claim that the system could be easily adaptedto a heterogeneous environment and to more tightly coupled parallel systems. While theimplementation would be straightforward, it is not clear how the significant differences inresource management, overheads, and performance expectations would be addressed.RELATED WORK 252.3.7 PIEThe PIE system [LeSeVr89] is primarily oriented towards debugging correctness andperformance by providing performance monitoring and graphical visualization for both sharedand distributed memory programs. PIE also provides graphical displays such as program`roadmaps' and invocation trees to aid user understanding of parallel programs. As with oursystem, PIE is interested in efficiently mapping parallel application onto specific architectures.However, with respect to our work, what is particularly notable about PIE is that itsearly design [GrSe85] called for a system built to support a level of abstraction which theycalled an 'implementation machine' (IM) between algorithm and 'realization machine'(architecture) levels. They saw these IMs as corresponding to parameterized parallelprogramming paradigms. They also noted the potential for simplifying development and, moreimportantly from their point of view, simplifying debugging of correctness and performance.More recently, in [RaSeVr90] they described the implementation of a dataflow virtual machinefor shared memory architectures using precompilers, code templates, and runtime supportlibraries. This paper also presents worst case performance estimates for the virtual machineand empirically validates their correctness.The motivation for IMs corresponds very closely that presented for our STMs. Thetwo approaches are similar in a number of ways and their levels of parallel programmingabstraction correspond to ours if the OS and hardware levels are seen as both residing in their'realization machine' level. However, like the Edinburgh virtual machine libraries and unlikeour system the PIE system provides no environment support for building, instantiating, andmaintaining virtual machines. Support to help users create virtual machine applications exists,but is designed and implemented on a case by case basis. There is no attempt to fit theseactivities into a coherent larger framework. Overall, there is no discussion of support forscalability other than the early mention of the need for IMs to be parameterized. IM levelparameters have not yet been described in detail or implemented. Furthermore, while PIE canand does collect performance information on virtual machines, no techniques have beenpresented for using this information to automatically tune performance.26^CHAPTER 23 Primitives for a Parallel EnvironmentAll programs constructed with Parsec are ultimately described in terms of a simple,standard, and relatively general model for describing a parallel program [BaCuLo90, Wi9 1 a,Sn84]. This model consists of three primitive types of objects and their relationships. Theseobjects are processes, ports, and channels. Processes possess communication ports which canbe bound to one end of a communication channel. Specifying these bindings defines the basicstructure of a parallel program. A graphical representation of the process-port-channelprimitives and their relationships is presented in Figure 3.1.Bound Ports^Unbound PortChannel^ProcessFigure 3.1 Parsec primitives2728^CHAPTER 33.1 Computation3.1.1 Choosing a PrimitiveThe process is probably the most widely used type of object in all types of multi-taskprogramming and is the most obvious parallel to the role of the function in sequentialenvironments. Even systems which do not present processes as objects visible to users (e.g.,parallel Prolog implementations), almost always employ them internally. Not only is theprocess a natural unit of decomposition for parallel programming, but hiding it from the useralso would mean that environment operations on processes such as mapping and schedulingwould also be completely hidden from the user. Furthermore, debugging would become amore difficult problem since many components of a program should either be hidden from theuser or mapped from the implementation representation to a user representation.The main alternative to the process as the basic building block for parallel computationis the "sequential unit of computation" [Br85, Br89]. This object has the useful property ofonly accepting input communications at its start and producing output communications when itcompletes. While each such an object can be viewed as a member of a restricted class ofprocesses, CODE makes no claims to the user as to how they will be implemented. Browne'sapproach is conceptually attractive since it completely separates computation andcommunication and essentially preserves the sequential concept of invocation. However, thisapproach has a number of problems.While the pleasing qualities of Browne's model are evident in programs in which dataflows from the top to the bottom of a program, its problems become apparent in any casewhere loops including both computation and communication exist. In the simplest case wherethere are a single computation and communication and the computation maintains no stateinformation (e.g., a simple server task), the loop can simply be viewed as repeatedinvocations. However, the model fails in the cases where state must be maintained since nomechanism exists in the model for specifying and preserving this information. One of theproposed solutions to this problem is for a such a unit of computation to send itself a messagecontaining its state, which in turn introduces the problem of initialization. Of course, withknowledge of the underlying implementation, one may use an appropriate means of preservingdata.PRIMITIVES FOR A PARALLEL ENVIRONMENT 29The real issue is that no guarantees exist at any substantial degree of detail as to how theunit will actually be implemented. While a user may not be interested in whether a particularpiece of code is running on a transputer, workstation, or supercomputer, he is likely to beinterested in whether it is going to be a distinct process or not, whether that process will existcontinually or be repeatedly restarted, etc. While a user may not be interested in or need toknow what low level primitives are being used to implement a communication, in many cases itis important to know whether that communication is buffered or not. It is one thing to takeaway complexity, it is another to simplify by imposing restrictions. It can be argued that suchdecisions should be left up to a system which will choose optimally, but currently no suchsystems exist. To perform most types of tuning one must have access to a model that moreclosely corresponds to the actual running implementation. It is important to remember,however, that not everyone need work at this level — if some expert programmers developefficient restricted virtual machines which provide a similar, or even simpler, interface than thegeneric systems both the objectives of ease of use and performance can be satisfied for theauthors of applications built on top of these restricted virtual machines.There are also problems of a more practical nature. If the system automaticallytranslates every unit of computation into a process, then for many reasonable granularitieswhere the units will be equivalent to single functions and in cases trivial functions performancesignificant performance overheads will be incurred. If the environment groups units ofcomputation into process, then, as noted earlier, there is the question of how a user will debugsuch a program or interact with tools such as schedulers that must operate on processes.Furthermore, many performance optimizations rely on closely coordinating computation andcommunications.3.1.2 ImplementationThe process object is, not surprisingly, the most complex of the three primitive objectsand a significant number of pieces of information involved in specifying one. Parsec dividesthis information into two groups — structural information and construction information.Structural information includes such things as process type, ports, port bindings, groupmembership, weight, and constraints. Construction information includes source files,libraries, compile configuration, and runtime parameter specification. Aside from providing alogical division in a large set of information, this separation makes the high level structure30^CHAPTER 3independent of the actual process instances to the point where the same structure could bereused by changing the construction information for the processes.3.2 CommunicationOnce a primitive building block is established, there still remains the problem of howthose blocks will be connected to form a larger structure. As in the conventional softwareengineering case this connection is a form of communication. The use of processes as thebuilding blocks means that the communication is simply a passing of data between existingobjects rather than an invocation of one object by another. This certainly has implications forprogram semantics, but not for program structuring.The need for reuse does, however, have a significant impact. Within parallel programsreuse takes the form of replication. Unlike distributed operating systems wherecommunications abstractions may hide replicas from the user, replicas must play the same roleas other processes in a parallel program. Just as for non-replicated processes, replicas must bebound into a program structure with explicit connections to other processes.Changing the structure of a parallel program means changing these connectionsbetween its processes. However, if a process is to remain replicatable, the internal processspecification must remain independent of the process's external connections. It is for thisreason that ports are necessary. A port is essentially a named variable specifying a logicaldestination and whose value specifies a physical destination and other implementation attributes(see §3.2.2). Regardless of the complexity of this information or future extensions, the userneed only deal with the name he has assigned to the port.If ports are not used, then the only way that the process can specify a communicationdestination is by explicitly naming the destination. Since communications destinations in aprogram must have unique names, this implies that one would have to change the code for aprocess each time the objects it communicates with change or, even more frequently, everytime the process is replicated. Thus, for example, it would not be possible in a mesh connectedsystem to have a number of identical processes which each communicate with their neighborsvia named objects north, south, east, and west. This is, of course, highly undesirable anddemands the establishment of an interface internal communication specification from externalspecification.PRIMITIVES FOR A PARALLEL ENVIRONMENT 31Channels are not strictly necessary to establish simple communication relationships inthis scheme since ports provide the necessary separation between logical and physicalcommunication. However, when stating that logical port 'north' on process A is physicallybound to a destination of port 'south' on process B one implicitly states that there is aconnection (or dependency) between the two ports and processes. In graph representation, thisrelationship is invariably represented as an edge. A channel is the primitive which correspondsto this relationship.While there is the similar concept of logical dependency in sequential environments, thecorrespondence of the logical to the physical is usually negligible. However, in the parallelcontext, the correspondence of this relationship to the physical will vary depending on thecontext. In distributed memory or point-to-point communication systems, it correspondsdirectly to a physical route between two processors. As such, a number of importantproperties, clearly not the property of either end of the connection, may be attached to it (e.g.,weight, routing information, protocol). On the other hand, in shared memory or purelybroadcast communication systems, the channel may only have its logical value although theremay frequently be software options for the communication to be specified (e.g., buffering,protocol, timeouts). Since the logical relationship will exist regardless of how it is treated,raising its status costs so little that it may be conveniently ignored where it is of little use.However, omitting channels would cost generality in those cases where they do have physicalparallel.3.2.1 ComplexityYet another advantage of ports is that they provide a means of hiding externalcommunications complexity from a process. For example, it is possible to have severalprocesses connected by several channels to a single input port which the process will simplysee as a single data stream. Furthermore, the number of connections to the port can be changedwith no impact on the process. Conversely, a process can send messages to an output portwithout regard to how many channels are connected to it and will not be affected if that numberchanges. The responsibility for dealing with multiple messages coming or going through asingle port belongs to the system communication functions which implement thecommunication primitives (see §3.2.2).32 CHAPTER 3Allowing multiple bindings to ports and channels provides a simple and abstract way ofproviding a useful and relatively general set of communication primitives: one-to-one, one-to-many (multicast/broadcast/scatter), and many-to-one (gather). Of course, this portfunctionality can be mostly duplicated by adding significant complexity to the user code for aprocess (e.g., an iteration over a set of connections to either send a message for a multicast orpoll for data). While the process-port-channel model may not be the best fit for all types ofsystems it can be mapped in a logical, easy to understand manner to such diverse architecturesas single processor multi-tasking, parallel shared memory, and MIMD in an efficient mannersince it does not require any sophisticated features be present but can take advantage of themwhere they are present and useful.Another approach to providing one-to-many and many-to-one communications can beseen in CODE's merge and distribution filters (see §2.3.1). This approach was motivated byCODE's design goal of isolating communication control from computation. It has theadvantages of achieving this objective plus eliminating the need for the user to provide anycommunication control code. However, this approach does not map clearly onto a processoriented model and has the considerable disadvantages of preventing the developer fromclosely integrating computation with communications since it both compartmentalizes thefunctionality provides it the form of system utilities which are not intended to be useraccessible. This integration is frequently necessary in order to provide optimal performance[ SrChWa9 1].Fundamental to the choice of the process-port-channel model is the design decision thata program developer should only be provided with a very simple and generic set ofcommunication primitives. This is in contrast to other environments such as Trollius [BuPf88]which provide numerous primitives, each with their own capabilities, limitations, advantages,and disadvantages. While this rich set of choices provides considerable potential for handoptimization of programs, it makes programming more complicated and can make programsvery platform and structure dependent (i.e., a program may assume not only that it will be runon a transputer network, but also a transputer network with a very specific interconnectionpattern). In contrast, using a small set of generic primitives supports our aims of programmingsimplicity, at least modest portability, and providing support for automatic communicationoptimization techniques. Performance is not sacrificed for simplicity and portability sincePRIMITIVES FOR A PARALLEL ENVIRONMENT^33automatic communication optimization is provided. Note also that this approach does not ruleout providing users with the option of specifying constraints on particular communications.3.2.2 PortsPorts are the external interface to and from a process and may be used for input, output,or both. Port names must only be unique to a process. They may be bound to one end of oneor more channels or left unused. Dynamic binding of ports at runtime is supported by theprocess-port-channel model and has been implemented in other systems (e.g., [Wi91 a]).However, dynamic binding is not currently supported in our implementation since there is noway to optimize communication over such connections with static analysis tools. As aconsequence all communications paths are specified as explicit port to port connections in thestructural specification.Processes communicate by either synchronously or asychronously sending data to orreceiving data from a specific port by invoking one of these four functions with the port as anargument:port_send(Port *port, void *data, int size)port_async send(Port *port, void *data, int size)port recv(Port *port, void *data, int *size)port_async_recv(Port *port, void *data, int *size)The code for a process contains variables of type Port whose names correspond tothose of the ports bound to channels in the structural specification. The information to fill thesestructures is automatically generated by environment tools (primarily the mapper) andautomatically passed to the process by the loader as extra 'command-line' parameters. Theseparameters are bound to the port structures of a process by a call to the functioninit_ports(int argc, char **argv, Port *pl, Port *p2, Port *pn, NULL)when the process is initialized. The loader orders the parameters by searching the source codefor a process for this function and matching the function arguments with the names of ports inthe specification.There are currently four parameters: an identifier for the destination node, an identifierfor the physical communication link locally associated with the port, a communication eventidentifier, and a set of flags specifying the implementation type of the connection (multiplexed,34 CHAPTER 3non-multiplexed, neighbor-to-neighbor, multicast, etc.). This set of parameters is specific toour initial target environment of a network of transputers running Trollius. A corresponding,but different set will be needed for different environments. Again, note that these variations arehidden from the user.This approach is not compatible with many current types of 'pragmatic' programmingseen in parallel programs. For example, in many programs a process obtains its logicalidentifier (usually an integer) from the operating system at runtime (e.g., by getnodeid ( ) inTrollius) and then establishes communications with a set of processes whose identifiers itderives from its own (e.g., the next and previous). Not only is this a problem because it is anexample of dynamic channel binding, but it also includes assumptions about the executionplatform and the overall mapping of the program. Such assumptions are not supported by oursystem since it is not aware of them and, thus, may invalidate them. Furthermore, even if thesystem were aware of them they would inhibit portability and optimization potential.3.2.3 ChannelsChannels are "pipes" down which data can flow between processes. As such theydefine the communication pathways within a program. Channels can be either unidirectional orbidirectional and can have one or more ports bound to each end. The implementation ofchannels is system dependent. It could be message passing over links between distributedmemory units, some form of shared memory IPC within a shared memory unit, or sockets on aworkstation. As noted in §3.2.1, even within a particular architecture, the operating systemlayer may provide several sets of communications primitives to choose from based on advicefrom tools that form part of the environment.Channels possess properties such as a weight reflecting the traffic density on thechannel and constraints. These properties are used by the mapping system to determine thebest way to implement the channel as part of an optimal communication strategy. If duplicateconnections between processes are not permitted, it is not necessary to name channels likeprocesses and ports since a channel is uniquely identified by the pair of processes it isconnected to. If duplicate connections are permitted as in our system, the port to which achannel is connected within each process is required for unique identification. It is attractive toavoid giving channels names since many users will want to ignore them and automaticallygenerated names (e.g., Channel X) would convey no useful information to a user whenPRIMITIVES FOR A PARALLEL ENVIRONMENT 35binding ports to channels. An intuitive approach for binding ports to channels is to let the userspecify a port and select a channel implicitly by specifying the neighbor process with whichcommunication is desired. Such a list of neighbor processes is simple to construct given a listof channels connected to a particular node. Neighbor/port pairs are required if duplicatechannels between processes are allowed.3.3 SummaryThis chapter has discussed the process-port-channel primitives utilized by this projectas well as some of their features in our implementation. The choice of these primitives wasmotivated by their flexibility and close correlation to existing implementation environmentswhich both influence performance. The numerous advantages of including ports in the setwere also discussed. In addition, the process-port-channel primitives were compared with theunit of computation/data dependency primitives utilized by several other systems. The nextchapter will discuss objects built on top of these primitives to facilitate their easy and effectiveuse.36^CHAPTER 34 Supporting the Process-Port-ChannelPrimitives4.1 ProjectsA project is the highest level structure in Parsec. It corresponds to the user concept ofan application or program and is the root module (0.2) of a (possibly trivial) tree of moduleinstances. It also has the special properties of existing in a special name space and of owningany miscellaneous global configuration options set by a user.4.2 ModulesA module is a named group of processes with an external interface. A module mayeither be an instance of an exported (public) module or an unique instance. In either case amodule may include (instances of) other modules. All objects belong to a module or one of itsinstances although this may be the special "project module" with an empty interface at the toplevel of a project hierarchy. Thus, modules are the primary mechanism in Parsec forhierarchical organization as well as process group level reuse and information hiding.While objects with such purposes are common in conventional programming contexts,in the parallel environment context reuse of subgraphs is only supported in CODE via itsassociated ROPE facility 1LeLi881. However, the subgraph reuse mechanism in CODE/ROPEonly makes copies of the data in the database, thus making the propagation of fixes orimprovements to the unit difficult. In the context of distributed operating systems there is therelated concept of a process cluster, although the members are usually replicas created to3738 CHAPTER 4enhance throughput or fault tolerance with perhaps a common manager. In many cases suchclusters either present a single public communication interface (e.g., through a managerprocess) or appear to do so because of a communication system based on one-to-many andmany-to-one interactions that are hidden from the user.Figure 4.1 Relationship of modules to module instancesCurrently, the module interface consists of parameters, ports, and source code filesselected by the module developer. This interface, while addressing communications, is moregenerally targeted to facilitating as much (or as little) module flexibility as the module developerdesires to provide. Enhancing the flexibility of a module increases its reusability, just as aSUPPORTING THE PROCESS-PORT-CHANNEL PRIMITIVES^39function may be generalized by adding parameterized options. We are not aware of othersystems that provide an external structural interface beyond communication binding for aheterogeneous group of processes.As noted previously, modules may contain other modules. Incorporating a module intoa larger program involves binding the module's external ports to channels belonging to themodule's parent. Ports are selected to be part of a module's interface by binding them to theplaceholder channel Ext ernal instead of a real channel when the module is created. At theinterface external ports must be referred to by the combination of the port name and the name ofthe process owning the port since port names are only unique to a process and any process in amodule may have any number of external ports. Otherwise, the binding of external ports tochannels is the same as ordinary port to channel binding.Modules either implement complete programs or STMs. Interface source code files(those created in the file category External) are the mechanism by which an instance of anSTM is filled in and which provides STMs with their functional flexibility. These files are thecollection of all files belonging to processes in the module created in the file categoryExternal. Many files are used by a number of processes because of the process type andfile reuse mechanisms. Interface files can contain any amount of code the module developerwishes to provide for the user of the module, but each user gets a copy of the original whichcan be fully modified. Future modifications to the interface source code files cannot be directlypropagated to existing instances since such changes are below the level of detail the system isaware of. Although it is not currently supported, it would be straightforward to warn the usersof existing instances that interface modifications have been made and present the updatedversion which can be selectively copied to their existing version.Parameters are the most general purpose component of the interface and provide STMswith their structural and performance flexibility. Most or all of the parameters for a moduleshould be part of the interface since they are the most important mechanism for quicklychanging a module without accessing its underlying representation. A number of uses forparameters within a module have been identified, while others have potential (§4.3).Parameters also provide a means by which information from outside the developmentenvironment can be used within the system (e.g., data from an analysis tool for a particularSTM written by its developer).40^CHAPTER 44.3 ParametersModule parameters are a key component of our environment for facilitating structuraland performance flexibility and, hence, scalability. While parameters are central to STMs, theiruse is not restricted to that context. Parameters are not new; however, their nature in ourenvironment is somewhat different from than in conventional contexts.Usually parameters are used to either pass data to a module or to modify its runtimebehavior. The former is not relevant to our system since data passing is handled via theseparate communication pathways. The latter is a subset of a more general type of usagedirected at modifying the structure and behavior of a module. However, many useful types ofmodification do not take place at runtime in a statically configured system (e.g., selecting thedimensions of a process graph structure). Furthermore, the computations determining theappropriate values for configuration parameters are not naturally located in the parent module,but rather in external analysis tools. Thus, parameters may, in our context, be thought of assettings rather than passed values.Parameters may be either internal to a module or part of its interface (0.2). Thisdifferentiation only extends as far as visibility — both variants may be used in exactly the sameways inside a module. Internally, parameters fulfill the role of program constants inconventional programming languages. Removed from the specific context of compiledlanguages, this role is essentially that of a parameter which is only available to the developer;hence, our treatment of them as variants of interface parameters.Module parameters have a name, data type (e.g., integer, string), data string, kind, anddescription. The parameter 'kind' is a specification of what the source of the actual parametervalue is and when it is obtained. This feature recognizes the fact that module configurationinformation may come from a number of sources and need to be obtained at different times inthe development-execution cycle. Currently, three types of parameter kinds are available:• static (user specified)• external program to provide a value to the environment• external program to provide a value at runtime.The term 'external program' is used here to refer to a utility program not developed as part ofthe environment; i.e., written by the developer of a module, or possibly, an application. Eachof these parameter 'kinds' has its own capabilities and limitations. Static parameters are theSUPPORTING THE PROCESS-PORT-CHANNEL PRIMITIVES^41default and are the value obtained by converting the data string to the parameter type. Thevalue may be specified either by the user directly or by another tool in the environment underuser direction, such as a general purpose performance analyzer. For the other two parameterkinds the data string is a host system command to invoke an external program which providesthe parameter value on a standard output stream.The last parameter kind is used as a runtime command line argument for processes andits value is never known by the environment. However, it is useful for machine generated,highly dynamic values which only influence the execution of the program. Many performancetuning parameters fall into this category since their values are process 'constants' for aparticular execution, have little meaning to a user and can change with each execution of aprocess. This mechanism allows these changes to take place without any intervention on thepart of the user or the environment.The second type of parameter is like the last except that the environment executes thecommand and stores the value it returns. The user may force it to do this to see the value andmay provide an alternative override value which will cause the program to be ignored. Thiskind is particularly useful for performance parameters which have effects on the structure of themodule which is maintained and updated by the environment (e.g., the number of workerprocesses in a module, the number of children each process has in a tree structuredcomputation).Module parameters are currently targeted for use in• process command line arguments• parameterized structures• weights• compiler flags.Other potential uses include• selecting the type of a process network structure (e.g., a ring vs. a tree)• selecting process types (e.g., to select a particular manager or workeralgorithm implementation)• setting pragma-type flags for processes (e.g., indicating priority, that aprocess should not share a processor with other processes).Another possible logical extension for parameters is to allow them to be bound to otherparameters or, even allowing any parameter to be replaced by a function of one or more42 CHAPTER 4parameters. All of these extensions are implementable within the current framework and havepotential uses; however, we require more experience with the current system beforedetermining if they justify their implementation costs.Notwithstanding the current limitations of parameters, no other existing parallelsoftware engineering environment provides parameters integrated into the environment, muchless with our degree of flexibility. ParaGraph does provide a facility for writing parameterizedcontrol scripts for graph structure definition, but the extent to which these parameters arevisible to or managed by the environment is unclear [BaCuLo90]. Certainly, their role isrestricted to the scripts and their means of definition restricted. Of course, most, if not all,systems permit parameters of varying types to be brought in through the back door of theunderlying application implementation. Aside from the extra effort for programmers, thisapproach sacrifices the advantages of having parameters that are standardized, visible in theenvironment, influence environment constructs, and abstracted from the processimplementation level.4.4 Process TypesUnlike reuse in conventional programming contexts, process reuse inparallel/distributed environments may take two forms. The first, consisting simply of using thesame process in several programs, parallels the conventional concept. However, inparallel/distributed programming, it is almost never the case that each process within a programis significantly unique. For example, master-slave models, domain decomposition models,and fault tolerant models all call for process reuse in the form of replication. In some restrictedcases such replication can be completely hidden by the system as in Paralex where replicas ofcomputation units are generated and managed by the system to create fault tolerance. Realproblems arise, however, when the user wishes to create one or more replicas of a process sothat the replicas are explicit components of an application, each with its own role andconnections to other processes.Process Type• Ports• Descriptive Information- weight- pragmas- color• Construction Information- files- arguments- compiler/environmentProcess• Port Bindings• Group MembershipsSUPPORTING THE PROCESS-PORT-CHANNEL PRIMITIVES^43Figure 42 Relationship of process types to processesAs noted in §4.2 both forms of reuse are complicated by the use of staticcommunication binding. The port primitive resolves this problem for our system andParaGraph. However, many other systems require that the user manually replicate theprocesses by specifying each instance for precisely this reason. More sophisticated systemsallow the user to copy a process specification and then modify it, but the copy becomes aunique object with no persistent connection to the original making modification of somecommon property tedious and error prone. Furthermore, there is no provision for handling theneed to have slight variations of basic type (e.g., for a worker at the end of a chain as opposedto the other workers in the chain).Our system addresses these issues by introducing the process type object. Processesare specified as an instance of a type or as a custom type (i.e., a unique instance). While the44^CHAPTER 4system internally treats custom types just like other types, custom types do not need to becreated by the user as a type; rather, the user need only manually fill in the necessary fields fora process specification. Process types specify most process structure and constructioninformation which may be applied to an instance in one of two ways. The information mayeither be mandatory for all instances• source code• port namesor it may be default information in those cases such as• compiler flags• target environment (§4.7)• pragma information (e.g., weight, priority)• runtime argument listwhere particular process instances may need to differ without affecting the basic identity of theprocess. The main piece of information which cannot be included in a process type is thebinding of ports to channels. Note that without the communication interface abstractionprovided by ports, process types would not be possible since each process would have to nameits unique external channels.Making some information only a type default and recognizing that other informationmust be specified as part of the process instance, permits the type specification to be ascomplete as possible without being as rigid as those systems that only permit exact replication.By using types creating a new process specification can be as quick as specifying the process(instance) name, choosing an existing type from a menu, and binding its ports to channels,thus satisfying the objective of easy replication. Furthermore, modifications to typeinformation immediately apply to all processes of the type making modification quick andreliable.4.5 Process GroupsGrouping objects is a successful technique for controlling complexity. Both themodule mechanism and the process type mechanism provide system support for groupingswith pre-defined meanings. However, in many cases structure, module, or application specificgroupings will need to be defined.SUPPORTING THE PROCESS-PORT-CHANNEL PRIMITIVES^45Our system provides a basic grouping mechanism for processes available to both otherfunctionalities (e.g., parameterized process networks) and directly to users. The underlyingrepresentation is the same in either case. Functionality is provided for the user to create,rename, or delete named groups, which processes can be added to or deleted from.The current primary role for groups is to serve as an alias for a set of processes. Forinstance, support is provided for binding communication ports to a particular port existing ineach member group of processes. As members are added or removed from the group, theactual port-to-port bindings are updated automatically. For example, consider a data collectorprocess that wants to receive results from all the leaves of a tree of processes. Not only is thespecification of the initial connections made easier by this functionality, but maintaining theconnections if the dimensions of the tree change is automatic instead of a tedious manualprocess. This latter functionality is necessary for parameterized process networks (§4.6).Currently, for this binding operation to be valid, all members of the group must be of the sameprocess type. Other possible alternatives include simply requiring that the port name used bygroup members for the connection be the same or, if port typing (*7.3.2) were to besupported, that these ports have the same type.Another possible use for groups is to simplify user modifications of a set of nodes suchas changing process weights or display characteristics. Other, not yet implemented, uses forgroups will include aiding and simplifying interactions with other tools such as thoseperforming trace analysis and visualization.In other systems, the only close parallel to our system's process groups exists inParaGraph. ParaGraph nodes/processes possess named attributes whose values are usuallyderived as part of the construction process. Information such as files can bound tonodes/processes based on the values of attributes or some function of them. A looserrelationship exists with process groups in Via [WM] which are directly supported by theruntime system and whose membership and utilization are more dynamic than ours. One of themain functions of process groups in Via is to transparently increase throughput. In both thispurpose and in their implementation they more closely resemble process clusters in distributedoperating systems.46^CHAPTER 44.6 Parameterized Process GraphsA parameterized process graph is some regular interconnection of processes whosedimensions are parameters. Each parameterized graph is a graph family. Theseinterconnection networks may vary from the simple, such as rings and chains, to themoderately complex, such as trees and meshes, to the even more complex, such ashypercubes, butterflies, and shuffle exchange networks. This functionality is critical toproviding scalable modules. As the size of networks grow, it will become increasing importantfor developers to build and test programs on a minimal network and then easily scale theprogram to take advantage of the full set of hardware [BaCuLo90]. A similar need to rescalemay exist when porting a program from one hardware platform to another since each is likelyto have distinct resource limits.In our system a parameterized process graph instance is created by selecting a topologytype from a list of over a dozen frequently used interconnection patterns. Each topology typehas one or two parameters which control the structure of the network (e.g., for a tree theparameters are degree and depth). These parameters may either be explicitly entered by theuser or bound to an environment parameter (§4.3). Once the parameters are specified thenetwork can be instantiated. In addition to the implicit description, the instantiation processinserts an explicit representation into the module process network. Changing the implicitdescription causes the old explicit representation to be removed and replaced with the onespecified by the new implicit description. The explicit representation may be modified directly,although any changes are lost if the implicit representation is changed. In addition to providingthis ability to tweak the parameterized network, having an explicit representation permits auniform representation of the module network to be maintained. From its point of view theparameterized network facility is simply a macro operation on the module network.The availability of process groups is necessary for parameterized process networks. Inaddition to parameters, each topology type has a set of predefined, frequently overlapping,process groups based on conventional labeling schemes. For instance, for a tree the groupsare: root, leaves, last (leaf), internal (i.e., not root or leaf), all, and remainder (i.e., all nodesnot in another selected group). The membership of the 'remainder' group will vary dependingon which other groups are selected for use and, thus, provides a mechanism to insure that theset of nodes can be cleanly and completely partitioned regardless of which of the other groups aSUPPORTING THE PROCESS-PORT-CHANNEL PRIMITIVES^47user wishes to use. In general, all topologies have the basic groups of first, last, all, andremainder. By binding communications from outside the parameterized network to groupsinside the network rather than specific processes, correct bindings can be preserved even if thestructural parameters are changed. These groups also provide an easy means of bindingprocess types to individual processes in such a way that process type assignments will also bepreserved in spite of parameter changes.While this feature in our system addresses the same scalability concerns as ParaGraph,there are a number of significant differences. In ParaGraph network scalability is intrinsic tothe network construction mechanism. Networks are specified in ParaGraph by specifying theproductions of a graph grammar and iteratively applying those productions to construct themembers of the graph family specified with the grammar [BaCu87]. This mechanism iscertainly more powerful and general than the one employed by our current implementation.However, it does have at least two drawbacks.The first is that constructing a graph grammar is not always an easy or intuitiveprocess. While a family of trees is relatively easy, even other simple structures such as meshesrequire a considerable amount of effort and understanding. However, there are otherapproaches to the generation of graph families such as the use of graph operators (e.g., likethose in [Ha69]). In many cases these operators offer a much simpler means of defining agraph family than do graph grammars. For example, the specification of a hypercube is almosttrivial using the graph-theoretic cross product operator (0K121). Many of these operators havethe advantage of mapping very simply to graphical interfaces, unlike graph grammers. Forexample, to build a hypercube a user could draw a graph consisting of two connected nodes,select these nodes, and then invoke the cross product operator as a menu command. Repeatingthe process of selecting the original nodes and those produced by the command and invokingthe command would allow a user produce in n steps a 2n hypercube. This type of functionalityis available in a few existing graph construction and editing packages. While such operatorsare not supported in our current implementation, similar support for quickly building completegraphs or subgraphs is provided (holding down one key when creating a node causes edges toall other nodes to be automatically generated, holding down a different key causes edges to allselected nodes to be created).The second is that of the considerable implementation complexity. It is not clear thatthe generality of ParaGraph is required by most users, particularly when the generality48^CHAPTER 4significantly increases the difficulty of taking advantage of the functionality. It is far from clearthat a set of predefined graph families frequently used will not satisfy the needs of the vastmajority of users. The considerable gains in both implementation and usage simplicity madepossible by taking this approach make testing its validity a worthwhile exercise beforeproceeding to more complex solutions. Both graph grammars and operator based approachescould be used to provide a more general and powerful implementation foundation than thecurrent C codings. Ultimately, however, the goal must be to provide the required generalitywith the simplest user interface or combination of interfaces possible.4.7 Target EnvironmentsA parallel application will, in many, if not most cases, run in a heterogeneousenvironment. For example, even relatively homogeneous machines such as a hypercube ortransputer network will be connected to some host system. This means that different processtypes, or even different instances of the same process type will need to be compiled fordifferent machines with different architectures. Alternatively, different process types may bewritten in different languages or require special support (e.g., extra header files and libraryfiles for a process with a graphical interface).The abstraction of a target environment configuration allows the creation of named,resuable configurations which may then be specified for each process type and overridden forindividual process instances. Information included in a compiler configuration includes acompile host, compiler name, source and target file suffixes, include file paths, library filepaths, basic libraries, and standard compiler flags.4.8 FilesA relatively simple type of object dealt with by the system is files. Currently the systemsupports three categories of files:• internal source code• external (module interface) source code• librariesSUPPORTING THE PROCESS-PORT-CHANNEL PRIMITIVES^49Files are all currently maintained on the host's network filesystem. As a consequencefilesystem specific path information is part of the file object specification and naming must becompatible with host conventions. An important feature related to managing files is supportingreuse at the file level. By doing so code updates are minimized and the impact of updates iswell defined and can be automatically managed as well.Ultimately it would be best to move files into the environment database once limitationsand performance issues are resolved. Such a move should prevent many problems withaccidental file corruption or loss, locking, and portability. For example, the Inversionfilesystem feature of Postgres supports storing large objects such as files in the database andpreserves all critical database functionality (e.g., transaction proctection, access via the querylanguage) for these objects, while also providing an alternate interface which appears to accessa filesystem [Postgres92].4.9 SummaryThis chapter has discussed a number of objects built on top of the process-port-channelprimitives presented in Chapter 3 in order to satisfy many of the design objectives discussed inthe first two chapters. Some of these objects such as projects and modules are structural unitsof organization for aggregates of primitives. Modules also facilitate reuse as does the processtype object. The related objectives of scalability and performance tunability are clearlyaddressed by objects such as module parameters, parameterized graphs, and process groups.The next chapter will discuss user interface design issues in which both the primitives ofChapter 3 and the modules described in this chapter will play a role.5 Interface Issues5.1 Base Program RepresentationA user interface and particularly a graphical user interface needs a base representation.The existence of a base representation provides a number of advantages. It can define or atleast help define the minimal information needed to specify a complete project. It also providesa default representation which is always applicable and valid. The existence of such arepresentation also gives a user's interactions with the interface a sense of rootedness since onewill always begin and normally return to the base view. Finally, it allows all other views to beas specific or general as desirable and to added or removed from the system without seriousconsequences. These additional views will be referred to as auxillary views.A basic choice when graphically representing a distributed program is whether torepresent the program as a process graph or a control flow graph. While these representationsare by no means mutually exclusive, each has its inherent strengths and weaknesses which tendto strongly influence the programming idiom being implemented.Process graphs are a common high level representation for multi-process programssuch as ParaGraph [BaCuLo90]. The nodes in such a graph represent processes and the edgesrepresent the existence of a communication/data dependency between two or more processes.Edges may be directed or undirected, depending on the amount of information which onedesires to reflect in the graph. A key aspect of process graphs is that they provide noinformation about when or how often communications take may take place and, in the case ofundirected graphs, the direction of the communication.50INTERFACE ISSUES^51Control flow graphs (also called dependency graphs) are commonly seen in existinggraphical interfaces for parallel programming environments such as CODE fl3r89] and HeNCEfBeDo9laj. The nodes in a control flow graph represent sequential units of computation whiledirected edges represent data inputs to or outputs from the unit of computations. While manysystems restrict their graphs to be DAGs for simplicity or do not support loops well (e.g.,CODE), the control flow graph model and more sophisticated systems do support loops(cycles) within graphs (e.g., HeNCE). A key feature of the control flow graph is that it pushescommunication specification to the graph level. That is, the direction, timing, and to someextent the frequency of communications are exposed since computations can only receiveinputs when they begin and produce outputs when they terminate.The process graph it is particularly well suited to process oriented paradigms such asCSP or client-server, it can represent all of the paradigms we are aware of with equal ease. Incontrast, the control flow graph is only simple for the generalized dataflow paradigm. Whileits full form has the same representational power as the process graph, in most cases thecontrol flow graph will be considerably more complicated both in terms of the number of nodesand edges and because of the various types of edges that must be employed. For example,most persistent, non-trivial processes must be represented by a number of nodes in a flowcontrol graph connected in a loop structure.This is not merely a representational complexity issue, but also a descriptive problem inthat the control flow graph provides no clear representation of what will become a process inthe final implementation. Thus, not only is an additional, non-trivial task created for theenvironment, but also a human comprehension problem. How can a person assist in finalinstantiation tasks such as mapping when there is no clear relation between his input and theobjects which the instantiation task is manipulating? In contrast, process graphs, while not inany way precluding dynamic assignment schemes, directly map onto static physical resourceallocations in distributed programs with the nodes/processes assigned to processors and theedges/channels assigned to communication pathways. While static resource assignment is notthe best policy in all situations, the costs of dynamic allocation are too high for smallergranularities (e.g., single processes) in MIMD architectures where, at least from a numericalpoint of view, most of the structural complexity lies.The simplicity of the process graph representation is also its weakness. A processgraph specifies the minimum amount of information needed to specify a distributed program;52 CHAPTER 5i.e., its processes and the existence of dependencies between them. Much information whichmight be required or useful to complete the program or advise tools (e.g., temporal orderinginformation, details about dependencies) is not present in this representation while it is presentin the control flow graph. Thus, the control flow graph is better suited for time dependentdescriptions and is widely used as an input to scheduling algorithms.We have adopted the process graph as the primary graphical representation because itssimplicity makes input, display, and user understanding easier. Furthermore, the majority ofgraph objects in any program (i.e., the lower levels with the smallest granularity) will bestatically scheduled in most cases. Finally, the control flow graph can be utilized as anauxiliary representation in those cases where it would be advantageous to represent theadditional information.5.2 Controlling Display ComplexityThe concept of the subgraph is a useful tool in dealing with complicated graphs.However, the general concept of a subgraph can itself be quite complex since a single nodemay be a member of any number of subgraphs. Restricting subgraphs to be strict partitions ofthe master graph does, on the other hand, provide a useful mechanism for controllingrepresentational complexity in graphs. Furthermore, the nested use of such subgraphs createsa simple, easy to grasp hierarchical grouping of nodes in the graph. While most graphpackages (e.g., EDGE [Ne88]) support the subgraph abstraction, most graphical parallelprogramming tools do not. The notable exception is CODE.The subgraphs of a larger graph can serve the two related, but distinct, functions ofdisplay organization and program decomposition. This difference is analogous to thedifference in programming languages between creating code blocks by the use of commentsand blank lines as opposed to making a code block a function or procedure. In other words,the former is purely a superficial decomposition to aid readability and programmerunderstanding while the latter is a structural decomposition recognized by the language andcompiler. CODE does not make this distinction and treats each visual subgraph as adecomposition of the program. In contrast, subgraphs in our interface may be either a view ora module, each of which is designated with a distinct symbol.INTERFACE ISSUES^53A view is a logical grouping of nodes and edges which has been contracted into asubgraph to represent a logical relationship and/or to control the complexity of the parent view.Views may of course contain other views. The term view is used to imply that the subgraphprimarily exists as a display convenience and has little or no meaning with respect theconstruction of the program (a view may be used as a pragma for contraction tools).A module is essentially a view which has been identified as a distinct structural unit.As described in §4.2 modules have an external interface and are a unit of reuse. Since modulesare also views there is no conflict between the two definitions. The fact that a module is simplya special kind of view is reflected in the fact that new modules (as opposed to imported moduleinstances) are created by modifying an existing view. Modules may contain any number ofviews and other modules. Thus, as described in §4.1 a project consists of a hierarchical graphof modules which may themselves be logically partitioned into a logical graph of views (seeFigure 5.1).Figure 5.1 Example of hierarchical structuring using modules and views54 CHAPTER 5Making the distinction between views and modules is useful in several ways. First,while it is possible using the CODE scheme to have subgraphs that are only for logicalpurposes and others that are actual program subunits and/or units of reuse, there is norepresentational or internal distinction of this real structural difference. This failure to make adistinction between unique and reused program components contributes to the problem inCODE of propagating updates to the master version of a reusable module. Finally, making thedistinction provides for greater functional flexibility since not every subgraph must be areusable program module. This is particularly important in our system because of the extrafeatures associated with modules (e.g., their interface) in addition to their additional descriptiveand internal overhead.While the utilization of subgraphs can control graph complexity, it does not solve theproblem in all cases. One basic aid provided by our system is the preservation of layoutinformation, even in cases of reuse, so that manual effort invested in a layout is not wasted.Also, because all modules are subgraphs there is no problem with mixing a newly createdmodule instance graph with that of the parent. In addition, our system provides a standardlayout with each of the parameterized graph structure types. This support is particularlyimportant for our system since these subgraphs are the most likely to be large as well asfrequently structurally modified.Nevertheless, there will inevitably arise large and complicated subgraphs which are notlogically decomposable. In these cases techniques for automatically laying out large andcomplex graphs are required. Many examples of such techniques can be found in graph toolssuch as EDGE [Ne88]. Each of these techniques attempts to optimize one or more particular"good" graph properties such as the number of edge crossings or proximity of connectednodes. However, most techniques are restricted to or only work well for particular classes ofgraphs since optimizing graph layout is in general an NP complete problem. ParaGraphintends to exploit its knowledge about the underlying structure of a graph (derived from the ARgrammar productions for the graph family) to choose techniques which will work well for thatparticular graph's family. In order to handle these cases, we also provide a general purposegraph layout algorithm which has been found to be successful in producing visually attractivelayouts for a wide variety of graphs [FrRe91]. This algorithm models the graph as a manybody problem where the edges are springs connecting the nodes which repel each other. Thelayout produced is an equilibrium point in the many body simulation.INTERFACE ISSUES^555.3 Interface LevelsAs noted in §1.2 parallel program development can be viewed as consisting of tasks atseveral distinct levels of abstraction. While the number and specific definitions of these levelsis an open issue, the concept of supporting more than one such level in a programming systemis not novel [Sn91]. However, there is little or no precedent for designing a smooth transitionbetween such levels or actually implementing such support, particularly using a graphicalinterface. Of course any system providing modularity and easy reusability can be said toprovide support for "higher level" programming in the sense that a C application programmercan avoid writing system code by using the standard Unix libraries. However, there is adifference between providing a set of utilities such as those in a Unix library and providing acomplete program skeleton which is only missing a few task specific routines. Furthermore,there is a significant difference in abstraction between building an application out of languageprimitives and functions provided by libraries and building an application out of a set ofskeletal programs joined by communication channels.As was also noted in Chapters 1 and 2 the concept of paradigm based programs is notnew, and, as noted in §2.3.6 the PIE project proposed the integration of paradigm basedprogramming into their environment as early as 1985. However, such integration has not beenseen until recently (e.g., the Linda Program Builder [AhCaGe91]) and has yet to be seen in agraphical environment. Public STMs provide this functionality in our system. By simplyimporting an existing public skeleton module, filling in the interface files, and setting theinterface parameters a user can quickly and easily create a complete parallel programincorporating considerable optimizations without knowing anything about parallelprogramming in general or the system on which he will be executing the program in particular.By importing multiple modules and connecting them via their external ports one can easilycreate an integrated, multi-function parallel program. Since a user can perform all of thesefunctions to create complete programs in the system without entering the lower level (thecontents of the modules) or possessing a significant amount of the expertise required toprogram at that level, Parsec does provide two different levels of interface. Furthermore, itdoes so transparently within a unified framework.56^CHAPTER 55.4 Auxiliary ViewsThe need for and establishment of a base representation and display in no way dimishesthe role of or need for auxiliary views, in fact, they are increased. A primary role for suchviews is to provide specialized interaction with other tools in the environment such asschedulers and mappers. For example, the control flow graph rejected as a base representationis precisely what is needed for specifying time orderings and interacting with a scheduler.5.5 SummaryThis chapter has discussed several user interface design issues. The chapter began bymotivating the need for a base representation and the choice of process graphs for thisrepresentation. The need for auxiliary, special purpose representations was also noted.Restricted subgraphs were presented as the means of hierarchically organizing a programwithin the user interface. These subgraphs correspond either to the modules presented in theprevious chapter or to logical groups of processes called views. It was also noted thathierarchically organized module subgraphs provide for multiple levels of programmingabstraction with smooth, consistent transitions between the levels. The following chapter will,in the context of a brief example, provide an overview of how the elements from the previousthree chapters have come together in the current implementation.INTERFACE ISSUES^576 Implementation and ExampleThis chapter presents as an example the specification of the processor farm STMdescribed in [FeSrWa92] and briefly describes how it may be included in an application. ThisSTM provides optimal performance on a balanced, rooted k-ary tree and has parameters for treedepth and granularity. The granularity parameter controls the aggregation of individual taskdata units into larger units for the purposes or communication and assignment to workerprocesses. The degree of the tree is also a potential parameter but is of less interest since forthis STM it is a system parameter rather than an application parameter because the optimal valueis the largest that can be physically realized on a given system.We will begin the process of defining the processor farm by defining its parameters andthen the parameterized graph controlled by one of the parameters. This parameterized graphconstitutes the main structural component of the module. The following step will be tocomplete the process type specifications for the processes in the parameterized graph. We willthen add a custom process to serve as the 'master' process for the processor farm and as part ofits definition establish its communication bindings to the parameterized graph. Finally, we willsee how the processor farm can be turned into a reusable module and included in otherprojects.6.1 Defining ParametersFigure 6.1 shows an empty project frame with its menu buttons and its 'Graph' menupulled down. Most of the important functions used in this example are available from thismenu. A logical place to begin defining a STM is its parameters since several other constructs58IMPLEMENTATION AND EXAMPLE 59utilize them. Figure 6.2 shows the parameter dialog accessed by selecting Parameters...' fromthe 'Graph' menu. This dialog utilizes a basic user interface construct employed throughoutthe interface — the add-change-delete list. In this case the list contains all of the parameterswithin the module from which it was activated (noted in the footer of the window). Items maybe added by filling in or setting the attribute fields below the list and then pressing the 'Add'button. The attributes for an existing item are displayed by selecting the item in the list. Theseitems may changed and the changes committed by pressing the 'Change' button. A new itemderived from an existing item may be easily created by selecting the original, making thechanges, and then pressing the 'Add' button. Pressing the 'Delete' button simply deletes theselected item in the list. 13^ Graph Editor — untitled.2(File v) (Edit v (-Elisi-31TW;)ffigEDLCV7----)liews (Project v)ToolsStandard..,Parameterized...CollapseMove LipAuto LayoutProcess Types...Process Groups...ONESEINDFigure 6.1 Empty frame and 'Graph' menu Depth( Add )(Change)( Delete )Internal: ^GranularityKind: 0 StaticName: DepthValue Type: d IntegerData: ^(Override:Description: Depth of processor farm tree..60^CHAPTER 677- ParametersModule: Base^Instance: <Orig nabFigure 6.2 Parameter dialogThe list in Figure 6.2 shows the two parameters for the STM and the attribute valuesfor the selected parameter, 'Depth.' For the purposes of this example, both parameters arespecified as being of kind 'static.' However, in the actual STM they would be specified asvalues provided to the environment by an external program. For this STM that program is aperformance analysis tool written by the author of the STM and based on the underlyinganalytic model. This tool provides, depending on its arguments, the appropriate parametervalue based on statistics stored in a file by a previous execution of the application. The nameof the analysis tool and its arguments would be specified in the field where the value of a staticparameter is specified (here '3').6.2 Defining a Parameterized GraphOnce the parameters have been specified, the next logical step is to specify theparameterized tree that will be the main part of the STM. Note that the STM design calls for thetree to consist of only workers, with the master process connected to the root of the tree.Figure 6.3 shows the dialog for specifying parameterized subgraphs, once again based on anadd-change-delete list.( Add )(Change)( Delete )Subgrap hs•f TreeDescription: Worker tree for processor farm..Degree: 2 Parameter Binding:^NoneLevels: 3^Parameter Binding: 113 DepthName:Groups: CI RemainderProcess Type: GI NewModule: Base Instance: <Original>XFile v Edit v Views vGraph v Project vDisplay vGraph Editor — untitled2 (edited)Graph Type: GI Rooted Tree[NoneGroup DepthBindings: Root <-- Worker (^Bind^)Leaf <-- Terminal WorkerRemainder <-- Worker I (Change)Delete )Base Name: Tree,IMPLEMENTATION AND EXAMPLE^61Figure 6.3 Parameterized subgraph dialog and its result for the processor farmBelow this list is the first group of attributes for a parameterized subgraph. Thesespecify the structure and dimensions of the graph family. The first field is a menu stack whichcurrently lists over a dozen standard interconnection networks. Here we have selected 'rootedtree.' There are two other pairs of fields for specifying the parameters of the selected graphfamily (only one pair may be visible if the graph family only has one parameter). The labels62 CHAPTER 6for the pairs depend on which graph family has been selected. The first item in the pair is thecurrent value of the parameter which may be set here manually or may be derived from thevalue of the second item. It is used to bind the graph family parameter to a module parametersuch as the one we created in the last step. If such a binding is made the first field shows thecurrent parameter value but is not editable. Here we bind the depth parameter to the moduleparameter 'Depth' we previously created but leave the degree dimension as a manually set valueof '2' to specify a binary tree. A parameterized subgraph will immediately reflect changesimplied by a change to a module parameter to which one of its parameters is bound. Manualchanges to unbound graph family parameters are made by changing the value and pressing the`Change' button.The second group of parameterized subgraph attribute fields comprise a variant of theadd-change-delete list used for establishing bindings. Here the function is to bind processtypes to groups of nodes within the subgraph. There is a menu stack which lists the groups forthe current interconnection structure (for a rooted tree: root, last, leaves, remainder, and all).The other menu stack lists existing process types. New types may be created by selecting`New' from the menu and filling a new name in the adjacent field. A group and process type isbound by pressing the 'Bind' button. For the processor farm there are two different processtypes for the workers in the tree. The workers for the leaves must be slightly different sincethey cannot pass work on to children. Those groups used here will be also available forcommunication bindings. Thus, we need to specifically note the 'Root' group for our exampleeven though it has the same process type as group 'Remainder.'Once the specification is complete, the 'Add' button may be pressed. This causes anumber of actions to take place. First, the group to process type bindings are validated to makesure every node in the graph has received a single process type binding. If this is correct, thesystem proceeds to create any new process types, their ports, and then the graph for the currentdimension settings as is seen in the main window in Figure 6.3. Currently, process types notcreated by the parameterized subgraph mechanism must already have the documented canonicalport names for the group the type is bound to. A future version will allow the user to map thecanonical port names to arbitrary port names in existing parameter types.IMPLEMENTATION AND EXAMPLE^63‘29^ Process TypesTypes. .___...!Terminal WorkerWorker (^Add^)'(Z-Fa-rTiW)( Delete )1=1Name: Terminal Worker^ ( Files,.. )Ports:Child°Child1Parento"^( Add^)_—I.^ (Z-riar-TW)TI^( Delete )Name:^ Flow:^In^F6' Out*Description: ...—■—Constraints: •^Anti —social^•^High PriorityWeight: 1^CICITypeDescription:Processor farm worker process without chi 1 dren.AModule: Base^ Instance: <Original>Figure 6.4 Process type dialog6.3 Defining Process TypesThe next step in specifying the processor farm is to complete the process typespecification for the process types automatically created by the parameterized subgraphmechanism. This is done by selecting 'Process Type...' from the 'Graph' menu to display thedialog in Figure 6.4. Initially the process types are already present in the list and the ports foreach of them already specified (ports are listed and modified in the first group of attributes64^CHAPTER 6below the main list). We may still, however, want to fill in some of the information the secondgroup of attributes to elaborate and refine the process specification.SD^Type Creation Information - Terminal WorkerInternal Files:prog.11c=*^( Add v)Islave.c^ I —.7_'^(Change)I CDelete )InternalExternalPath: /project/transputer/tips/tests/test2Source File Name: slaverLibrary Description: Core C implementation of processorfarm SIN worker for transputerunder Trollius..7h--—=Arguments:<Granularity>0-,.—7.-—II(Constant^)(processor id)(process name)<Depth><Granularity>Value:Target Environment: CI^Transputer (Trollius)Compiler Congifuration: 0^DefaultCompiler Flags: -D LAST -DDEGREE-Z,(Apply)^(Reset)Module: Base^ Instance; <Original>Figure 6.5 Process type creation information dialogAs noted in the discussion of process types, process type information is divided intotwo categories: descriptive and construction. The dialog in Figure 6.4 deals with thedescriptive. The dialog in Figure 6.5 deals with the construction information, including sourcefiles, and is accessed by pressing the "Files..." button in Figure 6.4. The first panel in thedialog is for specifying the files used by the process type: internal, external (interface), andlibrary. The second panel is for specifying the parameter list for processes of the type. The'Add' button has an attached menu which list of module parameters (those items in <>'s),special values to be filled in at load time (those items in O's), and the default item 'Constant'which installs the string in the 'Value' field. The final panel contains compilation and targetinformation.IMPLEMENTATION AND EXAMPLE^65r IA^Grap h Editor - untitled.8 (edited)(File v) (Edit v)^(Display 7) (Graph v) (Views v) (Project v)1=17.—__—Process 7 IExternal FilesInternal Files^r=iDeleteGeneral...Creation...Tree 0Tree 1^ Tree 20^Tools• efilliTree 3^Tree 4^Tree 5^Tree 6X)..1114 1^I JFigure 6.6 Manually creating a new process6.4 Adding ProcessesOnce the process type information for the parameterized tree is complete, the next stepis to add the master process. Nodes may be added manually by selecting appropriate tool fromthe tool palette shown in Figure 6.6 and then clicking the mouse where you want the iconplaced. This can be obtained either from the 'Graph' menu or by pressing a mouse button onan empty area of canvas. By clicking on an existing process with the appropriate mousebutton, you can obtain the process menu which allows you to alter or delete the process.6.5 Establishing Port Bindings and Other DescriptiveInformationBy selecting the general information option you can access the process informationdialog in Figure 6.7. This dialog allows you to name the process and specify its type if you66 CHAPTER 6wish it to be of a standard type. Since this dialog includes the information contained in theprocess type structural specification, that information will be reflected here if you make thatchoice. Unbound ports are displayed in the port binding list as bound to the special channel`None.' If, on the other hand, you desire a custom type, as in this case, type information canbe filled in here (changing type information for a non-custom process will cause that process tobecome custom). In addition, this dialog allows specification of information such as groupmemberships and, most importantly, port bindings that cannot be specified as part of a type.t9^ Process InformationType: 0 Custom^Name: MasterGroups: ...^(Add^ v)T^(Delete)1=1PortBindings: interface — External ,cn^—^Bind^)(C!Worker ==<Tree.Root:Parent>^I —(Change)ChannelGroup( Delete )rmBind^Local Port: Worker^Flow:^In^Outto^Group: a^Tree.Root^Port: ] ParentChild()Description: Communication of tasks and results withroot worker of parameterized worker treyChildlParentConstraints: ^^Anti—social^g^High PriorityWeight: 2^=--mammap.ftProcess/TypeDescription:Master process for processor farm STM.^Reponsi bl efor generating tasks for workers,^receivingresults, and communicating with STM parent..(Apply)^(Reset)Module: Base^ Instance: <Original>Figure 6.7 Process information dialog for master processTwo different categories of binding are possible here: channel and group. For ourexample, we need both a binding to the root node of the parameterized tree and to the STMinterface. The latter is relatively trivial and can be done by specifying a port name and bindingit to the special channel 'External' (a menu of available channels is displayed in place of the 'toIMPLEMENTATION AND EXAMPLE 67Group' menu if 'Channel' mode is active). The 'to Port' menu is inactive in channel modeunless there is more than one channel to the same node in which case the destination portnames are needed for unique identification. For the connection to the root process of the treewe could have drawn in a channel and used a channel binding. However, the channel wouldhave been lost if the parameterized tree were regenerated because of a change. To obtain apermanent binding we can use the 'Group' mode to bind this process to port 'Parent' of thegroup 'Root' using the group menu and port menu (which only displays the ports for selectedgroup) available in this mode. Channel edges in the graph are automatically maintained forgroup bindings.G^Process Creation Information - MasterExternal Files ,—^AcW. ;9!I user_mastenc^ Iuser.h-;—1=CZEMIDuser.huser_slave.cInternal Path: /project/transputer/ti ps/tests/test2Source File Name: user_master,c,ExternalLibraryDescription: C source template for processorfarm master process user functions..Arguments:<Granularity>  ( Add v)_e^( C hange)I ( Delete )Value:.Target Environment: 0^Transputer (Trolli us)Compiler Congifuration: 0^DefaultCompiler Flags:(Apply)^(Reset)Module: Base^ Instance: <Original>Figure 6.8 Process creation information dialog for master process6.6 Defining Process Creation InformationFigure 6.8 shows the process creation information dialog for the master process whichis accessed by selecting the creation information option from the process menu. This isidentical to that used for non-custom types. Notice that the files entered earlier are nowGraph Editor — untitled.8 (edited)(File v) (Edit v) (Display v) (Graph v) (Views v) ( Project v)7..mesteredata_ managerdebug_infodg_ processdo_taskget_resulthp_memIp_memmainprint_resultsrecv_resultresult_managerrr_processsend_data41 101Master IInfo^r>External Files c. GRZEWEEI prog.hDelete^ Calal68^CHAPTER 6available as items on the 'Add' button menu. Thus, we can with almost no effort re-use filesand prevent manual duplication errors. In this case we need to re-use the external file `user.h.'Figure 6.9 shows the completed processor farm graph. Also shown is a file accessmenu item for the master process. These menu items are maintained automatically based on theinformation supplied to the creation information dialog. From here you can directly access aspecific file or even function within the file in the environment's editor.Figure 6.9 Completed processor farm process graph6.7 Creating a Reusable ModuleOnce the description is completed the next step is to turn it into a reusable module. Thefirst step in this process is to turn the graph into a view subgraph. This is done by selecting aset of nodes (in this case all of them by using 'Select All' from the 'Edit' menu) and thenselecting 'Contract' from the 'Graph' menu. Figure 6.10 shows the resulting display. Likeprocesses, views have a menu accessible by clicking on the view icon. We could use the'View Subgraph' option to see the original graph, but right now we want to turn this view intoa module. To do this we select the 'Info...' option from the menu to get the dialog also shownAlli j( Info... View SubgraphExpand SubgraphDeleteModule: Base^ Instance:^ (Apply )Flags: ^ Ignore for Contraction^ (Reset )Subgraph0=1ill1111111111=1^View Name: Processor Far%Graph Editor — untitled.° (edited)(Edit v) (Views v) (Project v)Subgraph InformationBoundary: logical I ModuleFile vVisible Degree: 0^Degree: 0Show Description for: View I module InstanceIMPLEMENTATION AND EXAMPLE 69in Figure 6.10. To achieve the conversion we change the boundary type from 'Logical' to`Module' and fill in the appropriate name fields. Once this is done we have a custom modulewhich can be made public by pressing the 'Export' button (which is only enabled for custommodules). At this point we have completed our initial task and have a new module icon andaccess to its menu as displayed in Figure 6.12.Figure 6.10 Processor farm converted to a view subgraph and the subgraph dialog6.8 Instantiating a Reusable ModuleNow, suppose that we wish to use this STM in an application. We can do this bycreating an empty project (e.g., by using 'New' from the 'File' menu) and then selecting themodule instance tool from the tool menu (the square icon in Figure 6.6). Clicking on thedesired location on the canvas will cause the dialog in Figure 6.11 to be displayed. Selectingprocessor farm from the list and filling in the desired names will cause the icon in Figure 6.12to be displayed. As noted earlier, modules have their own distinct menu attached to their icons.ModulesProcessor Farm.-414^import ModuleInstance Name:, ^View Name:(Create)Info...View ModulDeletePort Bindings...P.DrrIOCrSource FilesConfigure Files..DepthGranularityProcessor Farm70^CHAPTER 6As an application developer the interface options are the most important. The port bindings areimportant for connecting processes or other templates to this one.Figure 6.11 Import module dialogFigure 6.12 Module symbol and menuInterface Parameter Setting — Processor FarmName: DepthKind: StaticValue Type: IntegerData: ^OVe.rYifiP: Description: Depth of processor farm tree.(Apply) (Reset)Module: Base^Instance (Original)Figure 6.13 Interface parameters submenu and dialogIMPLEMENTATION AND EXAMPLE 71The parameters menu lists those module parameters created for the module and notdeclared external. This menu and the dialog accessed by selecting one of the parameters areshown in Figure 6.13. Note that only the parameter value is editable at the interface. This isalso the main target of the object description since the person importing a module may havenothing to do with its implementation.la^/nfs/projectl/projectlAransputer/group/feldcarnp/graphCFTITC ^- 1(-7ag;T) (Grep v) (Match v) (Comment v)^Misc. v^ *ir.)^[41 1 1(Line:) q^(Windows v) Cil^1 =1/....--. Processor Farm STM - master task source template**^Called functions:**^[1]^data_generator()* args:^NONE* return value:^NONE*^[21^result_reciever()* args:^NONE* return value:^NONE*/Rinclude^n user.h"/** Generate task sized data units and submit them to the system.*^Must call:^do_task(x. y)* args:^[1]^pointer to user task data (MUST be malloc'd). [2]^size of user data**^return value:^NONE*^Memory allocation:^do_task() COPIES data passed to it so data_generator()* is responsible for any memory it allocates*/void data_generator(){user_data^data;/* load/generate complete user data, if necessary *//* decompose data into task units and submit them via 'do_task()' */while (TRUE /* decomposition not complete */){do_task(&data, sizeof(user_data));13./** Receive and process task results*^Must call:^get_result()args:^NONEreturn value:^pointer to malloc'd task result data structure*/void^result_receiver()„... File: /project/transputer/tips/tests/test2/user_master.c^ AFigure 6.14 Editor window with interface file72 CHAPTER 6The file menu lists those files declared as 'external' for the module in a menu just likethose for processes shown in Figure 6.10. Insuring clean reuse at the specification stageinsures no duplications here. Figure 6.14 shows one of these files as brought up by selectingit from the menu. The 'Configure Files...' option brings up a dialog very similar to the filespecification panel of the creation information dialog except that it has only two modes. One isfor adding libraries and one for source files. A user may utilize this dialog to change the pathof a file supplied to him and thus get his own copy of template file or to attach additional sourcefiles to a particular interface file. These 'attached' files will be included in the make processeverywhere the interface file is so that it will become a part of all of the same processes.7 Conclusions7.1 SynopsisThis thesis began by noting the need to reconcile the desire to provide simplifyingabstractions with performance requirements. Virtual machines implementing parallelprogramming paradigms were presented as a type of abstraction which does provide significantsimplification and can be made compatible with performance objectives. Compatibility isachieved by analyzing a virtual machine to determine the design and implementation attributeswhich can be tuned to improve the performance of a particular application of the virtualmachine and then parameterizing the virtual machine in terms these attributes. We haveproposed the skeleton-template-module (STM) as a parameterized virtual machine,corresponding to a particular paradigm, which a user can employ to create a particularapplication by filling its skeletal or template structure.It was also noted that to be truly useful a construct such as an STM must be supportedby and fully integrated into a viable programming environment. Such a system must supportboth the construction of STMs and their use to create applications as part of a generalprogramming framework. Other objectives for such a system, such as "ease of use," reuse,and scalability, common to both sequential and parallel environments, were also discussed.We have designed and built such an interface and environment, along with prototypemapping and loading tools. The system utilizes the process-port-channel model of parallelprogram description and supports these primitives with a graphical interface. This interfaceprovides arbitrary hierarchical organization within a program by the use of STMs and logicalsubgraphs within STMs. In addition, objects are provided to both support STMs and further7374 CHAPTER 7enhance "ease of use." Some of these objects such as projects and modules are structural unitsof organization for aggregates of primitives. Modules also support the objective of reuse asdoes the process type object. The related objectives of scalability and performance tunabilityare clearly addressed by objects such as module parameters, parameterized graphs, and processgroups.7.2 Current StatusAn alpha version of the user interface and database component of the environmentimplementing all of the major features discussed in this paper exists at this time. It consists ofnearly of 30,000 lines of C and SQL source code for the Oracle V6 database. Prototypeimplementations of the mapper, loader, and runtime support systems have also beenimplemented. Our system also includes a slightly modified version of the Tmon monitor[JiWaCh90]. In addition to more advanced versions of these tools, initial versions of tools formulti-module scheduling and communication pattern debugging are currently underdevelopment.The tools and database have been developed and currently run on Sun SPARCworkstations under SunOS 4.1.X. However, they have been designed and implemented to beportable to any Unix platform with access to a high-performance SQL database. The runtimesystem is built on top of Trollius 2.1 running on a network of 70+ transputers and a SPARChost. We are investigating extending the runtime system to support networks of workstationsby using PVM. The system is currently undergoing testing and has been used to implement theprocessor farm STM and a port of a many body simulation originally implemented directly inTrollius and using our original mapper, Tmap [Go91].7.3 Future Work7.3.1 Communication Support and TypingThere are a number of ways in which communication functionality can be simplifiedand enhanced. One obvious enhancement is providing a network data representation systemsuch as Sun's XDR [XDR87] or OSI's ASN.1 [IS087]. Doing so will solve several problemsCONCLUSIONS^75which must currently be dealt with by the user. The first of which is the existence of differentbyte orderings on different systems. Even when using relatively homogeneous systems thiscan be a very annoying problem, and can be serious impediment to portability. The secondproblem is that of requiring the user to pack data structures into and unpack data from messagedata units. This is an inescapable task, but tedious and error prone. Data structure compilerswhich automatically create these routines are available for both of the mentioned datarepresentation systems. Of the two, XDR is preferable in performance sensitive contexts likeparallel programs since it carries much less overhead than ASN.1 [PaRo89].Another potentially useful mechanism is providing some sort of typing for ports. Thiswould provide a more explicit specification of communication and, thus, make a greater degreeof structural validation possible. Furthermore, by tying port types to types declared with thedata representation support mentioned above, most of the coding overhead involved in dealingwith message passing systems could be eliminated. What are not clear are issues such as howstrict type checking should be, whether multiple types should be supported on a port, or, if thatis the case, whether time ordering information should be recognized as part of a 'port type.'The latter, for example, brings the whole problem into the domain of protocol specification andverification.7.3.2 Data DecompositionOne of the persistent parallel programming problems is that of data decomposition andits complement, result aggregation. The nature of these tasks is such that they are highlyapplication specific and, as such, are not able to be incorporated into the specification of avirtual machine or the implementation of an STM. In fact, two of the three most commonpieces of code required to instantiate an STM are function implementing these two tasks (thethird being a computation on the decomposed data). Reducing the amount of programmereffort involved in dealing with these tasks would, thus, be another major step in simplifyingparallel application implementation.There do exist some well defined standard decomposition strategies for some datastructures, particularly arrays (e.g., [MeVR91]), although they are not nearly as general asvirtual machines. However, even here there is the question of how these strategies can bepresented to the user and parameterized into a relatively general form. Even more challenging76^CHAPTER 7is the problem of how to deal with those situations where such strategies are not available orapplicable.7.3.3 Derived ObjectsAlthough reusable objects such as modules and process types save large amounts ofeffort, both in construction and maintenance, a significant duplication of effort is still requiredto create types and modules that are almost, but not quite, the same. In the case of processtypes we had initially considered having fields for individual processes to override process typeinformation to help address this problem. However, this solution is inadequate for preciselythe same reasons that process types are needed in the first place — reuse and maintainability.The same variant may occur a number of times and the information defining the variation maybe need to be updated at some point in the future.As a result, we are now considering the concept of derived objects to solve thisproblem. A derived object is defined as a base object plus some set of differences. This isclosely related to the concept of inheritance in object oriented systems. Applying inheritancethis domain raises a number of questions regarding the semantics for different objects (some ofwhich may be compositions of objects including derived objects), whether derivation chainsshould be permitted, and whether such modifications should encourage or discourage therelaxation of object scoping rules. Furthermore, any non-trivial definition would present anumber of challenges with regard to how it can be efficiently implemented in an environmentsuch as ours.7.3.4 AdvisorsEven with the best tools and environment for constructing parallel applicationspossible, converting a sequential application into a parallel application or writing one fromscratch are unlikely to become trivial tasks. There remains a significant amount of highlyspecialized expertise involved in such tasks as identifying the best virtual machine for analgorithm, choosing a good data decomposition scheme, or determining the optimal value for aparameter. This is fertile ground for technologies such as expert systems and user modelingwhich can incorporate such expertise and use it to guide and advise the user.Such a software 'advisor' was proposed as one of components of PIE IGrSe85J.However, the utilization of virtual machines which implement restricted models of computationCONCLUSIONS^77should assist and simplify such an effort since the restricted models clearly delineate differentareas of expertise and the developer of a particular virtual machine is likely to be an expert withrespect to its particular attributes. Also, using restricted models makes it more possible that`expertise' can be reduced to analytical models instead of general heuristics and techiques.Furthermore, while it may seem attractive, an interactive tool may not always be best orsimplest solution. For example, as already supported by our environment, a virtual machineimplementor utilize his knowledge to identify and parameterize performance sensitive aspectsof an implementation and then write programs to analyze information from a monitor tocompute the optimal values for these parameters. In such cases the application developer neednever deal with or even be aware of the expertise at work making adjustments to improve theexecution of a program.References[AhCaGe91] D. Gelernter, S. Ahmed, and N. Carriero, "The Linda Program Builder," in A.Nicolau, D. Gelernter, T. Gross and D. Padua, ed., Advances in Languages andCompilers for Parallel Processing, Research Monographs in Parallel and DistributedComputing, MIT Press, Cambridge. Massachusetts, 1991, pp. 71-87.[A191] G. Alverson, Abstractions for Effectively Portable Shared Memory Parallel Programs,Dept. of Computer Science and Engineering, University of Washington, PhD Thesis,Oct. 1990.[BaA191] O. Babaoglu, L. Alvisi, et al, Paralex: An Environmentfor Parallel Programming inDistributed Systems, University of Bologna, Dept. of Mathematics, TechnicalReport UB-LCS-91-01, Feb. 1991.[BaCu87] D. Bailey and J. Cuny, "Graph Grammar Based Specification of InterconnectionStructures for Massively Parallel Computation," Proc. of the Third Int' l. Workshop onGraph Grammars, Springer-Verlag, Berlin, 1987, pp. 73-85.[BaCuLo90] D. Bailey, J. Cuny, and C. Loomis, "ParaGraph: Graph Editor Support forParallel Programming Environments," International Journal of Parallel Programming,Vol. 19 (2), 1990, pp. 75-110.[BeDo9la] A. Beguelin, J. Dongarra, et al, Graphical Development Tools Jr Network-BasedConcurrent Supercomputing, Oak Ridge National Laboratory, 1991.[BeDo9 1 b] A. Beguelin, J. Dongarra, et al, HeNCE: A Users' Guide, Oak Ridge NationalLaboratory, 1991.[BeSt89] F. Berman and B. Stramm, Prep-p: Evolution and Overview, Department ofComputer Science, University of California at San Diego, Technical Report CS89-158,1989.[Br85] J.C. Browne, "Formulation and Programming of Parallel Computations: A UnifiedApproach, "Proc. of the Int' l Conf. on Parallel Processing, CS Press, Los Alamitos,Calif, 1985, pp. 624-631.78REFERENCES^79[Br89] J.C. Browne, M. Azam, and S. Sobek, "CODE: A Unified Approach to ParallelProgramming," IEEE Software, Jul. 1989, pp. 10-18.[BuPf88] G. Burns, A. Pfiffer, et al, "The Trillium Operating System," in Proceedings of theThird Conference on Hypercube Concurrent Computers and Applications, ACM Press,Jan. 1988.[Ca89] D. Gelernter and N. Carriero, "Linda in Context," Communications of the ACM, Vol.32 (4), Apr. 1989, pp. 444-458.[ChGo91] S. Chanson, N. Goldstein, et al, "TIPS: Transputer-based Interactive ParallelizingSystem, " Transputing '91 Conference Proceedings, Sunnyvale, Calif,  IOS Press,Apr. 1991.[Co89] M. Cole, Algorithmic Skeletons: Structured Management of Parallel Computation,MIT Press, Cambridge Massachusetts, 1989.[EiB191] R. Eigenmann and W. Blume, "An Effectiveness Study of Parallelizing CompilerTechniques," Proceedings of the ICPP, 1991, pp. 17-25.[EnWe91] A. Endres and H. Weber, eds., Software Development Environments and CASETechnology, Lecture Notes in Computer Science No. 509, Springer-Verlag, Berlin,1991[Epcc92a] Edinburgh Parallel Computing Centre, PUL-TF Prototype User Guide,EPCC-KTP-PUL-TF-PROT-UG 1.4, 1992.[FeSrWa92] D. Feldcamp, H. Sreekantaswamy, A. Wagner, S. Chanson, "Towards aSkeleton Based Parallel Programming Environment," Transputer Research andApplications 5, IOS Press, Apr. 1992.[F179] R. Floyd, "Paradigms of Programming," Communications of the ACM, Vol. 22 (8),Aug. 1979.[FoKe91] G. Fox, K. Kennedy, et al, "A Static Performance Estimator to Guide DataPartitioning Decisions," Proc. 3rd ACM Symp. on Principles and Practice of ParallelProgramming (PPOPP), Apr. 1991, pp. 213-223.[Fo0v91] I. Foster and R. Overbeek, "Bilingual Parallel Programming," Advances inLanguages and Compiler for Parallel Processing, Research Monographs in Parallel andDistributed Computing, MIT Press, Cambridge, Massachusetts, 1991, pp 24-43.[FrRe91] T. Fruchterman and E. Reingold, "Graph Drawing by Force-Directed Placement,"Software Practice and Experience, Vol. 21 (11), Nov. 1991, pp. 1129-1164.[Go91] N. Goldstein, A Topology Independent Parallel Development Environment, Dept. ofComputer Science, University of British Columbia, Masters Thesis, April, 1991.80^REFERENCES[GrSe851 F. Gregoretti and Z. Segall, Programming for Observability Support in a ParallelProgramming Environment, Department of Computer Science, Carnegie MellonUniversity, Technical Report CMU-CS-85-176, November 18, 1985.[Ha69] F. Harary, Graph Theory, Addison-Wesley, Reading Mass., 1969.[HeEt91] M. Heath, J. Etheridge, et al, ParaGraph: A Tool for Visualizing Performance ofParallel Programs, Oak Ridge National Laboratory, Technical Report.[HeGi91] M.T. Heath, G.A. Giest, et al, PICL — A Portable Instrumented CommunicationLibrary, Oak Ridge National Laboratory, ORNL/TM-11130.[1S087] CCITT, Recommendation X.208, "Specification of Abstract Syntax Notation One(ASM.1)," Geneva, Switzerland, 1987.[JiWaCh901 J. Jiang, A. Wagner and S. Chanson, "TMON — A Transputer PerformanceMonitor," in D. Fielding, ed., Transputer Research and Applications NA TUG 4, IOSPress, Oct. 1990.[Ke92] K. Kennedy, "Software for Supercomputers of the Future," The Journal ofSupercomputing, Kluwer Academic Publishers, Boston, 5 (1992), pp. 251-262.[LC891 Transputer Too/set, version 89.1, Logical Systems, Corvallis, Oregon, 1989.[LeLi88] T. Lee and C. Lin, ROPE User's Manual: A Reusability-Oriented ParallelProgramming Environment, Technical Report, Dept. of Computer Science, Universityof Texas at Austin, 1988.[LeER92] T. Lewis and H. El-Rewini, Introduction to Parallel Computing, Prentice-Hall,1992.[LeSeVr89] T. Lehr, Z. Segall, D. Vrsalovic, et al, "Visualizing Performance Debugging,"IEEE Computer, October 1989, pp. 38-51.[Lo89] F. Long, ed., Software Engineering Environments, Proc. of the Int'l Workshop onEnvironments, Chinon, France, Sept. 1989, Springer-Verlag, Berlin, 1990.[LoRa90] V. Loin S. Rajopadhye, et al, OREGAMI: Software Tools for Mapping ParallelComputations to Parallel Architectures, Dept. of Computer Science, University ofOregon, Technical Report CIS-TR-89-18, Jan. 1990.[MeVR91] P. Mehrotra and J. Van Rosendale, "Programming Distributed MemoryArchitectures Using Kali," Advances in Languages and Compiler for ParallelProcessing, Research Monographs in Parallel and Distributed Computing, MIT Press,Cambridge, Massachusetts, 1991, pp. 364-384.[Ne88] F. Newbery, "An Interface Description Language for Graph Editors," Proc. of theIEEE Workshop on Visual Languages, 1988, pp. 10-12.REFERENCES^81[RaSeVr90] M. Rao, Z. Segall, and D. Vrsalovic, "Implementation Machine Paradigm forParallel Programming," Proceedings Supercomputing '90, New York, NY,Nov. 1990, pp. 594-603.[OL89] Sun Microsystems, Inc., Open Look Graphical User Interface FunctionalSpecification, Addison-Wesley, Reading, Massachusetts, 1989.[PaRo89] C. Partridge and M. Rose, "A Comparison of External Data Formats," in E.Stefferud, O. Jacobsen, and P. Schicker, Message Passing Systems and DistributedApplications, Elsevier Science Publishers B.V. (North-Holland), 1989, pp. 233-245.[Postgres92] Postgres Reference Manual, Version 4.0, 1992, available via anonymous ftpfrom postgres.berkeley.edu .[SeRu85] Z. Segall and L. Rudolph, "PIE: A Programming and Instrumentation Environmentfor Parallel Processing," IEEE Software, Nov. 1985, pp. 22-37.[SiScGr91] A. Singh, J. Schaeffer and M. Green, "A Template-Based Approach to theGeneration of Distributed Applications Using a Network of Workstations," IEEETransactions on Parallel and Distributed Systems, Vol. 2 (1), Jan. 1991, pp. 52-67.[Sn84] L. Snyder, "Parallel Programming and the Poker Programming Environment," IEEEComputer, July 1984, pp. 27-36.[Sn91] L. Snyder, "The XYZ Abstraction Levels of Poker-like Languages," in A. Nicolau, D.Gelernter, and D. Padua, ed., Advances in Languages and Compilers for ParallelProcessing, Research Monographs in Parallel and Distributed Computing, MIT Press,Cambridge, Massachusetts, 1990.[SrChWa91] H. Sreekantaswamy, S. Chanson and A. Wagner, Performance PredictionModeling of Multicomputers, TR 91-27, Department of Computer Science, Universityof British Columbia, Vancouver, Canada, Nov. 1991.[Su90] V. Sunderam, "PVM: A Framework for Parallel Distributed Computing,"Concurrency: Practice and Experience, Vol. 2 (4), December 1990, pp. 315-339.[Wa89] A. Wasserman, "Tool Integration in Software Engineering Environments," in F.Long, ed., Software Engineering Environments, Proc. of the Int'l Workshop onEnvironments, Chinon, France, Sept. 1989, Springer-Verlag, Berlin, 1990, pp. 137-149.[WiWo89] A. Wileden and A. Wolf, "Object Management Technology for Environments," inF. Long, ed., Software Engineering Environments, Proc. of the Intl Workshop onEnvironments, Chinon, France, Sept. 1989, Springer-Verlag, Berlin, 1990,pp. 137-149.[Wi91 a] G. Wilson, Via: A Structured Message-Passing System for Parallel Computers,Edinburgh Parallel Computing Centre, TR91-16, 1991.82^REFERENCES[Wi9 1 IA G.Wilson, Seminar at the University of British Columbia, Dept. of ComputerScience, 1991.[XDR87] Sun Microsystems, Inc., XDR: External Data Representation Standard, (RFC1014), in Internet Working Group Requests for Comments, no. 1014, NetworkInformation Center, SRI International, Menlo Park, California, Jun. 1987.[Va90] L. Valiant, "A Bridging Model for Parallel Computation," Communications of theACM, Vol. 33 (8), pp. 103-111, 1990.[Za891 P. Zave, "A Compositional Approach to Multiparadigm Programming," IEEESoftware, Sept. 1989, pp. 15-25.


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items