IMPLEMENTATION OF THE CAMBRIDGE RING PROTOCOLS ON THE SUN WORKSTATION

By LINDA CHAN
Bachelor of Science, University of British Columbia, 1983

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES (Department of Computer Science)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
July 1985
(c) Linda Chan, 1985

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

The University of British Columbia
1956 Main Mall
Vancouver, Canada
V6T 1Y3

ABSTRACT

As Local Area Networks gain momentum in recent Computer Science research, protocol implementations also gain recognition. A protocol implementation is generally characterized by factors such as efficiency, reliability, error recovery, and synchronism; however, how well these issues can be addressed depends heavily on the facilities available in the implementation environment. Because of the recent popularity of message passing and concurrent processes, the UNIX 4.2bsd operating system, with its interprocess communication facility, was chosen as the implementation environment for the Cambridge Ring's Basic Block and Byte Stream Protocols. The Basic Block Protocol, implemented as a device driver in the system kernel, is the lowest level protocol and provides an unreliable datagram service, while the Byte Stream Protocol, implemented as multiple concurrent processes in user space, provides a reliable, full-duplex virtual circuit service on top of the service provided by the Basic Block Protocol. This thesis describes the protocol implementation on a 68000-based SUN workstation and discusses the results learnt from the experiment. The multiple concurrent processes approach is found to work adequately for a small number of clients, but incurs high overhead when the number of clients is large.

TABLE OF CONTENTS

Abstract
Table of Contents
List of Figures
Acknowledgements
Chapter I: Introduction
  1.1. Thesis Motivations and Objectives
  1.2. Thesis Outline
Chapter II: Cambridge Ring Hardware and Protocols
  2.1. Slotted Ring
  2.2. Cambridge Ring Hardware
    2.2.1. Repeater
    2.2.2. Station Unit
    2.2.3. Access Box
    2.2.4. Monitor
    2.2.5. Ring's Minipacket
    2.2.6. Ring Operation
  2.3. Cambridge Ring Protocols
    2.3.1. Basic Block Protocol
      2.3.1.1. How To Transmit A Basic Block
      2.3.1.2. How To Receive A Basic Block
    2.3.2. Byte Stream Protocol
      2.3.2.1. Byte Stream Packet
      2.3.2.2. Byte Stream Command
      2.3.2.3. State Transition Table
      2.3.2.4. Initial State
      2.3.2.5. Initial Connection
      2.3.2.6. Timeout
    2.3.3. Relationship between Basic Block and Byte Stream Protocols
Chapter III: Protocol Efficiency Versus Implementation Environment
  3.1. Where Should A Protocol Reside
  3.2. UNIX 4.2bsd Environment
    3.2.1. UNIX Process Control
    3.2.2. Interprocess Communication Facilities
    3.2.3. I/O System
  3.3. Overall Implementation Design of BBP and BSP
Chapter IV: Basic Block Protocol Implementation
  4.1. Existing Basic Block Protocol Implementation
  4.2. Basic Block Protocol on the SUN Workstation
    4.2.1. Device Drivers for the SUN Workstation
      4.2.1.1. The SUN Hardware and the Multibus
    4.2.2. The Implementation
      4.2.2.1. Modified Algorithm
      4.2.2.2. Ring Driver
Chapter V: Byte Stream Protocol Implementation
  5.1. Reader
  5.2. Writer
  5.3. Multiplexor
  5.4. BSP Server
    5.4.1. Data Structures
    5.4.2. BSP Server Administrator
    5.4.3. Secondary Server
    5.4.4. Basic Block Processing
    5.4.5. Timer Design
Chapter VI: Performance Measurements
  6.1. Experiment Descriptions and Results
    6.1.1. Performance of the Basic Block Protocol
    6.1.2. Performance of the Byte Stream Protocol
    6.1.3. Overall Performance of the Two Protocol Layers
  6.2. Evaluation
    6.2.1. Analysis of the Performance of the Protocols
    6.2.2. Brief Comparison to the Implementation on the LSI 11/23
Chapter VII: Conclusions
  7.1. Thesis Summary
  7.2. Future Work
Bibliography

LIST OF FIGURES

Figure II-1. Hardware Components [WiWh79]
Figure II-2. Cambridge Ring Local Area Network
Figure II-3. Minipacket
Figure II-4. Basic Block Format
Figure II-5. Header Minipacket [John80]
Figure II-6. Transmitter Algorithm
Figure II-7. Receiver Algorithm
Figure II-8. Byte Stream Packet
Figure II-9. Command Format
Figure II-10. State Transition Table [DaI81]
Figure II-11. Relationship between the Protocol Layers
Figure III-1. Server Process
Figure III-2. Client Process
Figure III-3. Overall Design of BSP and BBP
Figure IV-1. Modified Header Minipacket
Figure IV-2. Modified Basic Block Format
Figure IV-3. Declaration of the Character Device Switch
Figure IV-4. Initializing the cdevsw Table for the Ring Driver
Figure IV-5. Motorola Byte Ordering
Figure IV-6. IEEE-P796 Byte Ordering
Figure IV-7. Modified Transmitter Algorithm
Figure IV-8. Modified Receiver Algorithm
Figure V-1. Reader
Figure V-2. Writer
Figure V-3. Using the Select Primitive
Figure V-4. Byte Stream Packet Buffer
Figure V-5. Request Frame
Figure V-6. Server State Information Node
Figure V-7. BSP Server Administrator
Figure V-8. Secondary Server
Figure V-9. Basic Block Arrival
Figure V-10. Processing the OPEN Command
Figure V-11. Processing the Data Command
Figure V-12. Start Timer
Figure V-13. Stop Timer
Figure VI-1. Performance of the Basic Block Protocol
Figure VI-2. Result of UNIX IPC Communication Cost
Figure VI-3. Byte Stream Protocol Performance with Various Packet Sizes
Figure VI-4. Byte Stream Protocol Performance with Different Numbers of Clients
Figure VI-5. Byte Stream Protocol Performance with Various Timeout Values
Figure VI-6. Overall Performance of the Two Protocol Layers
Figure VI-7. Protocol System Performance with Different Numbers of Clients
Figure VI-8. Protocol System Performance with Six Clients Versus Various Timeout and Preallocated Buffer Values
Figure VI-9. BSP Implementation on the LSI 11/23 [Wong84]

ACKNOWLEDGEMENTS

I would like to extend my sincere thanks to Dr. Samuel Chanson for his guidance and his supervision of this thesis, to Dr. Son Vuong for reading this thesis, and to Mr. Patrick Boyle for his contribution in building the ring interface. I also deeply appreciate the help from Julian Wong during the initial stage of the thesis.
I am also grateful to the Natural Sciences and Engineering Research Council for its financial support. Lastly, a special thank you to my dear family members and friends for their continuous encouragement during the difficult times.

CHAPTER I
INTRODUCTION

1.1. Thesis Motivations and Objectives

With the ever-increasing demand for cost-effective data communication, the Local Area Network has become a key concern of much recent Computer Science research. A Local Area Network (LAN) can be defined as an interconnection of computing nodes via a communication network that covers a limited geographical area, less than a few kilometres in diameter. It is generally characterized by inexpensive transmission media, a high data transmission rate, and a high degree of interconnection between nodes on the network. Furthermore, with the advances in distributed computing, the many benefits of local area networking cannot be overlooked. These include [Clark78]:

• high data rates
• low transmission error
• low cost of connection and installation
• high reliability and availability
• flexible topology
• resource sharing and load levelling
• modular and incremental growth

A Local Area Network consists not only of the physical hardware but also of software to control communication among the connected nodes. The communication software implements one or more protocols. A protocol, by definition, is a set of rules which the communicating parties must follow to achieve a meaningful conversation. It also provides a more intelligent form of communication than that found in the physical hardware. Because raw transmission over a communication channel is susceptible to error, a protocol usually provides mechanisms for error detection, error recovery, traffic control and routing decisions to ensure the provision of a reliable, error-free communication channel to the network users.

In recent protocol implementations, protocol design follows a few design rules and the standard adopted by the International Standards Organization (ISO). Protocols are commonly implemented in a layered fashion. Each layer is self-contained and provides specific services to higher layers, interacting only with its immediate neighbours. The layer interface is designed in such a way that only minimal information crosses the data boundary. Consequently, the layered structure provides the benefits that the entire protocol can be implemented in a piecemeal manner and that each layer operates independently, so that modifications made to a specific layer need not affect the operation of other layers.

From a protocol implementor's point of view, factors such as efficiency, reliability, error recovery, and synchronism are key concerns. Unfortunately, how easily these issues can be addressed is often dictated by the facilities available in the implementation environment. The objective of this thesis is thus to investigate how good an environment UNIX 4.2bsd running on the SUN workstation provides for the implementation of the Cambridge Ring's Basic Block and Byte Stream Protocols, using the multiple concurrent processes approach. Performance measurements are collected to analyze the effectiveness of the selected approach.

1.2. Thesis Outline

Following this introductory chapter, readers will find a presentation of background information on the Cambridge Ring hardware and software protocols in Chapter II.
The controversial issue of where a protocol should reside within the operating system is discussed in detail in Chapter III. The next two chapters, IV and V, present the Basic Block and Byte Stream Protocol implementations respectively, followed by a performance analysis in Chapter VI. Finally, the results learnt from the study and suggestions for possible improvements to the implementations conclude the thesis in Chapter VII.

Since the protocol implementations are written in the C language, all coding illustrations presented throughout the thesis are also C-like. The reader is therefore assumed to have a basic understanding of C. Readers unfamiliar with the language are advised to refer to the C manual [Kern78].

CHAPTER II
CAMBRIDGE RING HARDWARE AND PROTOCOLS

This chapter provides background information on the slotted ring technology and an overview of the Cambridge Ring. The ring hardware and software protocols are discussed. This discussion forms the basic understanding of the protocol implementation which will be presented in later chapters.

2.1. Slotted Ring

The slotted ring can generally be described as a ring management technique which divides the total electrical length of the ring into a number of fixed-size slots. Each slot is used to transmit a message or data packet from one station to another. Since the slot is of fixed length, a long message must be transmitted in several slots. In order to distinguish an empty slot from an occupied one, a bit in the slot is used for this function. Whenever a station wishes to transmit, it awaits an empty slot by checking the slot-full indication bit. Once an empty slot arrives, it fills the slot with its packet. The packet can then be passed around the ring without collision. The packet, circulating around the ring, can be purged either by the destination node or by the source node. This slotted ring management technique also allows more than one station to transmit data simultaneously without collision, as long as there are empty slots available; thus the bandwidth is used efficiently. The Cambridge Ring, described in the next section, is an example of a local area network which employs the slotted ring technique.

2.2. Cambridge Ring Hardware

The Cambridge Ring [Coll82, NeHe82, Toltec, WiWh79], developed at the Computer Laboratory of Cambridge University, is now used in both commercial and experimental applications. It is a slotted ring with a typical raw data rate of 10 megabits/sec, using twisted pairs as the transmission medium. Attached to the ring are communicating entities known as stations. The station unit is connected to the ring by a repeater, and to the local host by an access box. The repeater and station unit are identical for all stations except that the station unit has inserted into it a plug specifying its address, while the access box is specially built to tailor the station interface to a particular type of computer or device on the ring. Moreover, on each data ring there is a monitor station whose duties include supervising start-up and handling certain types of error.

Figure II-1 : Hardware Components [WiWh79] (repeater, station unit, and access box connecting a device to the ring)

Figure II-2 : Cambridge Ring Local Area Network

2.2.1. Repeater

The ring is capable of transmitting information at a relatively high rate.
Information travels in only one direction, passing from one station down to the next. Thus, the repeater at each station node is mainly used to regenerate the signal to pass downstream and to pass the data to its station unit. Moreover, the repeaters include transformers which remove any common mode interference or low frequency noise, and screening of the twisted pairs has been found to be unnecessary. Although the repeaters have been thoroughly tested with 200 metres of cable between them, the upper limit remains at 100 metres, with many repeaters in practice connected much closer together than this [WiWh79].

2.2.2. Station Unit

The station unit provides the receiver and transmission logic to transmit data to and receive data from the ring. It has a source select register (SSR) which can be set to allow reception of data from any source, a specific source, or no source. Furthermore, the station unit watches the bit stream emitted by the repeater and detects the framing of the minipackets. It also has parity logic to check the parity of packets, and delay logic to delay passing unfavourable responses to the access box so that devices cannot jam the ring with continual retries.

2.2.3. Access Box

The access box provides the necessary interface to connect the computer host to the ring.

2.2.4. Monitor

The monitor station is used to start up the ring and to detect various types of errors. During start-up mode, the monitor sets up a packet structure, reads the switches giving the packet count, and also counts and holds the number of gap digits. Once run mode is entered, the error counters and indicators are cleared and the standard framing structure is maintained. The structure consists of a fixed number of packets.

2.2.5. Ring's Minipacket

The unit of transmission at the lowest level is the 40-bit minipacket shown in Figure II-3.

Figure II-3 : Minipacket (frame bit, monitor bit, usage bit, destination byte, source byte, two data bytes, control bits, response bits, parity bit)

It consists of a source address byte, a destination address byte, two data bytes, two response bits and six control bits. The first bit is the frame bit, which is always set to one to distinguish it from gap bits. Next comes the monitor bit, used to check whether the packet has passed the monitor station on a previous round. Following the monitor bit is the usage bit, which indicates whether the packet is in use. After that come the destination and source bytes, followed by two bytes of data. These are, in turn, followed by two control bits specifying the protocol type and two response bits, which are both initially set to one when the packet is launched. Upon return of the packet, the response bits may hold one of four values, representing the action taken by the destination station:

    00  accepted by the destination station
    01  destination station is busy
    10  destination station rejected the minipacket
    11  destination station ignored the minipacket

The last bit is the parity bit, which is used solely to detect transmission errors and is not copied by the destination station.
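To make the field layout concrete, the following is a minimal C sketch of one possible in-memory representation of the minipacket just described. It is purely illustrative and is not taken from the thesis code: the type, field and constant names are invented here, and a real interface would pack these fields into the 40-bit hardware format rather than use a plain struct.

    /* Illustrative only: a software-side view of the 40-bit minipacket.
     * The hardware packs these fields into 40 bits on the wire; all names
     * are hypothetical and do not come from the thesis implementation.   */

    #define RESP_ACCEPTED 0   /* 00: accepted by destination */
    #define RESP_BUSY     1   /* 01: destination busy        */
    #define RESP_REJECTED 2   /* 10: destination rejected    */
    #define RESP_IGNORED  3   /* 11: destination ignored     */

    struct minipacket {
        unsigned frame    : 1;   /* always 1, distinguishes packet from gap bits */
        unsigned monitor  : 1;   /* set/cleared as the packet passes the monitor */
        unsigned usage    : 1;   /* 1 if the slot is occupied                    */
        unsigned char dst;       /* destination station address                  */
        unsigned char src;       /* source station address                       */
        unsigned char data[2];   /* two data bytes                               */
        unsigned control  : 2;   /* protocol type bits                           */
        unsigned response : 2;   /* set by the destination; see RESP_* values    */
        unsigned parity   : 1;   /* transmission error detection only            */
    };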
2.2.6. Ring Operation

The ring operation is very simple. To send a minipacket from station P to station Q, it is necessary to load the destination byte and the data bytes into station P and indicate that the packet is ready for transmission. The station itself then loads in the source byte and initializes the control bits and response bits, leaving the complete minipacket waiting in a shift register.
As soon as an empty slot passes by station P, the station fills the slot with its minipacket from the shift register, marking the packet full. The minipacket travels around the ring until it reaches station Q, where it is acted upon accordingly, and then returns to station P.

Station P, the originating station, cannot transmit any more minipackets to any station until the minipacket sent to station Q has returned. The time taken for a minipacket to return is known as the ring delay, and is in the range of 10 microseconds [Need79]. Once the minipacket returns, the sending station is notified and its access box reads in the response bits to determine the fate of the minipacket. If retransmission is needed, the minipacket is still available in the shift register. However, the sending station is not permitted to reuse the empty slot immediately; the slot is marked empty and passed on to the next station. This mechanism automatically provides anti-hogging control, so that one particular station cannot monopolize the empty slot. With m stations on the ring, each station is guaranteed a minimum of 1/(m+1) of the ring capacity [Need79].

One final point regarding the ring hardware operation is that the error rate is very low (about 1 in 5E11 bits) [Need79]. A block of data being transferred is rarely damaged, which simplifies the development of higher level protocols.

After this brief description of the ring hardware and its operation, it is now appropriate to discuss the higher level protocols developed for the Cambridge Ring. Before proceeding any further, however, it is worth noting that the stations can communicate with each other using only the available minipacket, without any higher level protocols. If there is a very low level of intelligence at both ends, no error control or flow control is required, and data proceed at a reasonably predictable rate, communication using the low level minipacket would be sufficient. With most systems, however, there is sufficient intelligence available to handle higher level units; thus, higher level protocols such as the Basic Block Protocol [John80] and the Byte Stream Protocol [John80] were developed to allow the communicating entities on the ring to communicate more intelligently.

2.3. Cambridge Ring Protocols

The two lowest level protocols developed for the Cambridge Ring are the Basic Block and Byte Stream Protocols. The two protocol layers operate independently but in a cooperative fashion. They interact with each other through the Byte Stream Packet, which will be discussed in a later section.

2.3.1. Basic Block Protocol

The Basic Block Protocol interacts with the ring hardware. It provides functions such as the transfer of blocks of data (2 - 2048 bytes), delivery to a port within a destination computer, and detection of transmission errors with no recovery. In short, the Basic Block Protocol provides an unreliable datagram service to the protocol layer above it. The unit of information transfer in the Basic Block Protocol is known as a basic block, which has the format shown in Figure II-4.

    Header | Port | Data | ... | Data | Checksum

Figure II-4 : Basic Block Format

    1001 | TY | Length

    where TY = type of basic block
        0 : long block with checksum
        1 : long block with checksum zero
        2 : single 16-bit minipacket carrying data in the "Length" field
        3 : octet block with checksum
    Length = length of the data train

    A long block consists of:
        - header minipacket
        - port minipacket
        - length + 1 data minipackets
        - checksum minipacket

    An octet block consists of:
        - header minipacket, with Length field = 0010011001
        - size minipacket, a 16-bit count of the number of data octets
        - port minipacket
        - data minipackets
        - checksum minipacket

Figure II-5 : Header Minipacket [John80]

The basic block commences with a header minipacket, shown in Figure II-5, which consists of a fixed pattern of four bits, two control bits indicating the type of the block, and ten data bits specifying the number of data minipackets in the block. Following the header comes the port minipacket. It contains the port number, which gives the logical destination of the block in the destination station. A minipacket with an invalid port number is discarded by the recipient machine, which sets its SSR to be deaf to the sending station for a period of time so as to reject reception of the data minipackets. Next comes a train of data minipackets, up to the number specified in the header. To terminate a basic block, the checksum minipacket is sent for the purpose of detecting any transmission error.
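As a concrete illustration of the header layout in Figure II-5, the following hedged C sketch packs and unpacks the 16 data bits of a header minipacket (four-bit 1001 pattern, two type bits, ten length bits). The function names and exact bit positions chosen here are assumptions made for illustration; the thesis's own driver code may lay the bits out differently, and the modified header used in the UBC implementations (Section 4.1) replaces the 1001 pattern and embeds the port number, so it would be encoded differently again.

    /* Illustrative sketch of header minipacket encoding; bit layout assumed:
     * bits 15..12 = 1001 pattern, bits 11..10 = type, bits 9..0 = length.  */

    #define HDR_PATTERN   0x9      /* fixed 1001 marker             */
    #define BB_LONG_CKSUM 0        /* type 0: long block, checksum  */

    unsigned short bb_make_header(unsigned type, unsigned length)
    {
        return (unsigned short)((HDR_PATTERN << 12) |
                                ((type & 0x3) << 10) |
                                (length & 0x3ff));
    }

    int bb_parse_header(unsigned short w, unsigned *type, unsigned *length)
    {
        if ((w >> 12) != HDR_PATTERN)
            return -1;             /* not a header minipacket        */
        *type   = (w >> 10) & 0x3;
        *length = w & 0x3ff;
        return 0;
    }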
2.3.1.1. How To Transmit A Basic Block

Since the BBP is the lowest level protocol, it receives the data to be sent from the layer above, the Byte Stream Protocol. To transmit the data for the Byte Stream layer, the BBP breaks the data into data minipackets. In order to form a basic block, the BBP sends not only the data minipackets but also the header, port, and checksum minipackets. For each minipacket in the basic block, the BBP software loads the destination address byte and the two data bytes onto the interface and signals the interface to transfer the minipacket. The minipacket then travels around the ring and back to the sender with its response bits set accordingly. The algorithm used to transmit a basic block is outlined in the following figure.

    Transmitter ()
    {
        Start Timer;
        Init Checksum minipacket;
        Transmit Header;
        While response == 'busy' or 'rejected' {
            retransmit Header;
            If retries > MAX RETRY VALUE, Exit;
        }
        If response == 'ignored' or 'error', Exit;
        Transmit PORT;
        For (rest of block) {
            transmit DATA;
            While response == 'busy' {
                retransmit DATA;
                If retries > MAX RETRY VALUE, Exit;
            }
            If response == 'rejected', 'ignored' or 'error', Exit;
        }
        Transmit CHECKSUM;
        If TIMEOUT, Exit;
    }

Figure II-6 : Transmitter Algorithm

2.3.1.2. How To Receive A Basic Block

The receiver is supplied with a source select register (SSR), a received address register and a 16-bit data register. The source select register enables the receiver to select the station with which it wishes to communicate. It can be set to three different states:

• listen to all stations
• listen exclusively to one station
• listen to nobody

The received address register contains the address of the station which originated the transmission. To receive a basic block, the following algorithm is used:

    Receiver ()
    {
        Forever (;;) {
            Set SSR to listen to all stations;
            Wait for valid header;
            If source of header is ok {
                Set SSR to listen to source only;
                Init Checksum, counter, timer;
                Wait for port;
                If port ok {
                    Count in DATA;
                    Wait for Checksum;
                    If Checksum ok and no TIMEOUT
                        Deliver block to client;
                }
            }
        }
    }

Figure II-7 : Receiver Algorithm

During transmission and reception, a timer is set so that recovery is possible when a receiving or transmitting station crashes in the middle of a block. The timeout mechanism is useful in situations where ring errors might cause the minipacket never to return to its sender. A retry mechanism is also used to compensate for the slow speed of some stations. When a minipacket is marked "busy" upon its return, the same minipacket can be retransmitted up to, but not exceeding, the retry limit. The retry and timeout values are system dependent and are selected by the implementor.

2.3.2. Byte Stream Protocol

Above the Basic Block layer is the Byte Stream layer. Some of the Byte Stream services include:

• error correction (retries)
• loss detection (timeouts)
• duplicate detection (sequence numbers)
• flow control (acknowledgements)
• connection management (OPEN, CLOSE, RESET)

In short, the Byte Stream Protocol provides a reliable, full-duplex virtual circuit service. Once a byte stream is opened between two processes, each process considers the stream to be a full-duplex, bidirectional connection. Neither process is a master or a slave. The process acting as the receiver sends a 'no data' when requested to send data, while the transmitter sends a 'not ready' to notify the receiving process that it is not ready to accept any data. As is the case in most systems, data transfer usually takes place in one direction only.

2.3.2.1. Byte Stream Packet

The unit of information transfer in the Byte Stream layer is a Byte Stream Packet, as shown in Figure II-8.

    control/reception command | transmission command | data ... data (if any)
            16 bits                    16 bits           16 bits ... 16 bits

Figure II-8 : Byte Stream Packet

A reception command is information referring to the reception of data; the reception commands are Ready (RDY) and Not Ready (NOTRDY). A transmission command is information referring to the transmission of data; the transmission commands are Data (DATA) and No Data (NODATA). The control commands include RESET, CLOSE, OPEN and OPENACK.

Command Format

The format of a command is:

    command (8 bits) | flags (8 bits)

Figure II-9 : Command Format

Command contains the command bit pattern and the sequence number, if any. Flags contains the command parameters. Note: the sequence number associated with a command is represented by including it in parentheses () following the command name.
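A minimal C sketch of how a byte stream packet and its commands might be represented follows. The command codes and struct layout are assumptions made for illustration only; the thesis's BSP server defines its own buffer structures (Figure V-4), which are not reproduced here.

    /* Illustrative only: command codes and packet layout are assumed,
     * not taken from the thesis sources.                              */

    enum bsp_cmd {
        BSP_OPEN, BSP_OPENACK, BSP_RESET, BSP_CLOSE,   /* control commands      */
        BSP_RDY, BSP_NOTRDY,                           /* reception commands    */
        BSP_DATA, BSP_NODATA                           /* transmission commands */
    };

    struct bsp_command {
        unsigned char cmd;    /* command bit pattern and sequence number, if any */
        unsigned char flags;  /* command parameters                              */
    };

    struct bsp_packet {
        struct bsp_command ctl_or_recv;  /* control or reception command */
        struct bsp_command xmit;         /* transmission command         */
        unsigned short data[1];          /* 16-bit data words, if any    */
    };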
2.3.2.2. Byte Stream Command

Byte Stream commands are used by the communicating processes to exchange information and to achieve synchronization. The following is a summary of the various byte stream commands.

Open is a connection command requesting the opening of a port for a connection.

OpenAck is the acknowledgement of Open. It contains information about the request (whether or not it was successful) and, if the connection is successful, status information about the destination port.

Reset is a control command generated on request from higher level software or as a result of a serious error condition occurring within the Byte Stream layer. All received data should be ignored until a Reset Response is received in reply. At that time, the byte stream re-enters the initial state and resumes normal transactions.

Close is a control command requesting the termination of a byte stream connection. All transactions are terminated immediately. The reply to a Close is also a Close command.

Data(n) is a transmission command used to indicate that the packet contains data, with sequence number n.

NoData(n) is a transmission command used to inform the receiver that the sender has no data immediately available to send.

Rdy(n) is a reception command used to indicate that blocks with sequence numbers less than n have been successfully received, and that the receiver is ready to receive block n.

NotRdy(n) is a reception command used to acknowledge the receipt of all packets with sequence numbers less than n, while indicating that the receiver is unable to accept any more data.

Data(n) and Rdy(n) are considered essential elements because they must be acknowledged within a time interval. The sender of an essential element is responsible for retransmitting it whenever a timeout takes place. On the other hand, Nodata(n) and Notrdy(n) are classified as nonessential elements, since they act as acknowledgements and are not themselves acknowledged. The sender of a nonessential element is required to send the appropriate essential element at a later time.

2.3.2.3. State Transition Table

Byte Stream Protocol implementations are generally event driven. The protocol is best presented as a state transition table, shown in Figure II-10. Since transmission and reception are symmetrical, a single state transition table is presented for both. In the table there are three states and six events.

States:
    Essential (E)    : an essential element has been sent; an acknowledgement is awaited within a timeout interval.
    Nonessential (N) : a nonessential element has been sent. No acknowledgement is expected.
    Idle (I)         : a nonessential element has been received. No action can take place until the corresponding essential element is received.

Events:
    E[rep]         : a repetition of an essential element which has been sent previously.
    E[exp]         : an essential element with the expected sequence number.
    N[exp]         : a nonessential element with the expected sequence number.
    Timeout        : the timer expires while waiting for a response to an essential element.
    Buffer Ready   : there is buffer space to receive data or there are data to send.
    Idle Handshake : the idle handshake timer expires, implying that the previous essential element should be retransmitted.

Event E[rep], i.e. Data(n-1)/Rdy(n):
    State E : no action; next state E.
    State N : retransmit Notrdy(n)/Nodata(n); next state N.
    State I : protocol error.

Event E[exp], i.e. Data(n)/Rdy(n+1):
    State E : empty/fill buffer; n +:= 1; if buffer ready, transmit Rdy(n)/Data(n), start timer, next state E; otherwise transmit Notrdy(n)/Nodata(n), next state N.
    State N : protocol error.
    State I : empty/fill buffer; n +:= 1; if buffer ready, transmit Rdy(n)/Data(n), start timer, next state E; otherwise transmit Notrdy(n)/Nodata(n), next state N.

Event N[exp], i.e. Nodata(n)/Notrdy(n+1):
    State E : start idle handshake timer; next state I.
    State N : protocol error.
    State I : no action; next state I.

Event Timeout:
    State E : retransmit Rdy(n)/Data(n); start timer; next state E.
    State N : no action; next state N.
    State I : protocol error.

Event Buffer Ready:
    State E : no action; next state E.
    State N : transmit Rdy(n)/Data(n); start timer; next state E.
    State I : no action; next state I.

Event Idle Handshake Timer Expires:
    State E : protocol error.
    State N : no action; next state N.
    State I : retransmit Rdy(n)/Data(n); start timer; next state E.

Figure II-10 : State Transition Table [DaI81]
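To show how such an event-driven specification translates into code, here is a small, self-contained C sketch of the state/event dispatch implied by Figure II-10. It covers only the skeleton (states, events, and two representative rows of the table); the names and the helper functions transmit_essential(), transmit_nonessential(), start_timer() and protocol_error() are hypothetical and are not the thesis's BSP server code.

    /* Hedged sketch of an event-driven handler for Figure II-10.
     * States, events and helpers are hypothetical.               */

    typedef enum { ST_E, ST_N, ST_I } bsp_state;
    typedef enum { EV_E_REP, EV_E_EXP, EV_N_EXP,
                   EV_TIMEOUT, EV_BUF_READY, EV_IDLE_EXPIRED } bsp_event;

    struct stream {
        bsp_state state;
        int       seq;           /* current sequence number n             */
        int       buffer_ready;  /* nonzero if data/buffer space is ready */
    };

    extern void transmit_essential(struct stream *s);     /* Rdy(n)/Data(n)      */
    extern void transmit_nonessential(struct stream *s);  /* Notrdy(n)/Nodata(n) */
    extern void start_timer(struct stream *s);
    extern void protocol_error(struct stream *s);

    void bsp_handle_event(struct stream *s, bsp_event ev)
    {
        switch (ev) {
        case EV_E_EXP:                     /* essential element, expected seq */
            if (s->state == ST_N) { protocol_error(s); break; }
            s->seq += 1;                   /* empty/fill buffer, advance n    */
            if (s->buffer_ready) {
                transmit_essential(s);
                start_timer(s);
                s->state = ST_E;
            } else {
                transmit_nonessential(s);
                s->state = ST_N;
            }
            break;
        case EV_TIMEOUT:
            if (s->state == ST_E) {        /* retransmit and rearm the timer  */
                transmit_essential(s);
                start_timer(s);
            } else if (s->state == ST_I) {
                protocol_error(s);
            }
            break;
        /* the remaining events (E[rep], N[exp], Buffer Ready, Idle Handshake)
         * follow the corresponding rows of Figure II-10 in the same way      */
        default:
            break;
        }
    }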
2.3.2.4. Initial State

Initially, the receiver sets its sequence number to zero and remains in state N, while the transmitter puts itself in state I with sequence number -1. Once the receiver has room to receive data (i.e., a buffer ready event), it sends a Rdy(0) and goes to state E. Upon receipt of such a command, the transmitter regards this as event E[exp], increments its sequence number to 0, and transmits either Data(0) or Nodata(0).

2.3.2.5. Initial Connection

An initial connection is established using the OPEN and OPENACK packets. When a station wishes to establish a connection with another station, it sends an OPEN packet. The OPEN packet is directed to a ring service address, which is obtained by looking up a text name in the name server. The recipient station acknowledges its intention to communicate with the sender of the OPEN packet by transmitting an OPENACK packet. Once the OPENACK reaches the originating station, the byte stream is set up between the two stations.

2.3.2.6. Timeout

A timeout mechanism is employed to detect the loss of a packet during normal data transfer; the timer used for this purpose is known as the Normal timer. The timeout mechanism is also used during the idle handshake period to avoid a perpetual wait in the Idle state when the other station is no longer available. When the idle handshake timer expires, the previous essential element is retransmitted, putting the sender back in the Essential state.

The selected timeout values depend on various factors such as the machine speed and the desired throughput under error conditions. Therefore, the choice remains the implementor's decision. However, a typical value for the Normal timeout is one or two seconds, whereas one minute would be appropriate for the Idle handshake timer.

2.3.3. Relationship between Basic Block and Byte Stream Protocols

The following illustration best presents the relationship between the protocol layers. It shows the mapping from client data onto a Byte Stream packet and subsequently onto a Basic Block.

    client's data :  data ... data
    bsp           :  reception command | transmission command | data ... data
    bbp           :  header | port | data | data ... data | checksum

Figure II-11 : Relationship between the Protocol Layers

CHAPTER III
PROTOCOL EFFICIENCY VERSUS IMPLEMENTATION ENVIRONMENT

In this chapter, we will examine how an implementation environment affects protocol efficiency. A discussion of the controversial issue of where to place a protocol package within an operating system is presented first, followed by a brief study of the UNIX 4.2bsd environment in which the Cambridge Ring protocols are implemented. Finally, based on this discussion, an overall implementation design of the Basic Block and Byte Stream Protocols is given.
3.1. Where Should A Protocol Reside

Efficiency is a very important aspect of protocol implementation. Experience suggests that efficiency is affected by factors such as the design of the protocol and the underlying structure of the host operating system. Often, good modularization and structuring of the protocol do not necessarily lead to good performance. It is the designer's responsibility to determine the trade-off between good structure and good performance. However, the dominant factor which determines how well this conflict can be resolved is not the protocol but the operating system [Clark82]. Thus, the important issue of how the protocol implementation is integrated into the host operating system requires special attention.

To add a protocol to an operating system, one could implement it either in a process provided by the operating system or in the system kernel itself. This decision is strongly influenced by the hardware architecture and the operating system design. As with all methods, there are advantages and disadvantages associated with each approach; they are presented in the following discussion.

Implementing a protocol in a process gives a very simple view. Since no kernel modification is necessary, the task of implementing the protocol is alleviated. The implementor need not have a detailed knowledge of the kernel, and future maintenance of the protocol package is simplified. Furthermore, if the underlying operating system supports an adequate interprocess communication (IPC) facility, the protocol package can be implemented using many concurrent processes, each dedicated to a specific protocol function. The processes communicate with each other via the IPC provided by the operating system. This multiple concurrent processes approach further enhances the comprehensibility of the protocol, which certainly leads to easier development.

Unfortunately, there are various drawbacks associated with the process-oriented approach. One main disadvantage is that process scheduling can cause significant real-time delay. If the protocol must compete for CPU resources with other processes, it must await a scheduling quantum before it can run. Moreover, its performance can deteriorate further if the operating system does not have the priority tools to bring the process into execution whenever there is work to be done. With the multiple concurrent processes approach, there are additional problems such as synchronization, mutual exclusion and deadlock. However, these can be solved by the implementor's careful use of the IPC primitives and semaphores. Thus, the main disadvantage associated with the concurrent processes approach is the high communication cost associated with the IPC.

Obviously, the reason for putting the protocol package at the kernel level is that the overhead of process swapping and scheduling can be avoided. The communication cost is also reduced, so the performance of the protocol is improved. Moreover, from a structural viewpoint, it is reasonable to view the network as a device, and device drivers are usually implemented at the kernel level. Thus, it seems to be a very good idea to place the protocol package in the kernel.

However, there are also many disadvantages to the kernel approach. Since kernel code is modified, recompilation of the kernel is needed every time any change is made.
This disrupts the system's availability to other users on a multi-user system. Any mistake made in the kernel can affect system performance. This approach therefore makes future maintenance of the protocol package a potential problem.

Furthermore, a network protocol is usually a complicated piece of software and requires complex actions to service a timeout event. Most systems lack facilities at the kernel level to act on such an event, so the interrupt mechanism is used to bring the protocol into execution. However, running a protocol at interrupt time brings major drawbacks. Since the actions required are complex and possibly time consuming, problems can arise from masking interrupts for too long while servicing an interrupt. Also, because the code runs in an interrupt handler, the system scheduler cannot be invoked; thus, the system has no control over the percentage of system time used by the protocol handler. If a station is overwhelmed with packets from a malfunctioning station, the system will spend most of its time in the interrupt handler, causing the system to be unavailable to other processes [Clark82].

The above discussion presents the trade-offs between placing the protocol package in a system process and placing it at the kernel level. The choice, however, remains the responsibility of the implementor.

3.2. UNIX 4.2bsd Environment

3.2.1. UNIX Process Control

UNIX processes are created by the system primitive fork. Once a process is created, an entry is made in the system process table to keep track of the process when it is not active. Information such as the process name, the location of the other segments, and scheduling information is stored there. From the process table entry, all data associated with a process may be accessed.

Also associated with a process are the user data segment, the system data segment and the text segment. The system data segment contains the current status of the process and is not addressable from the user process. Processes are swapped to and from secondary memory by copying the user data segment, the system data segment and the text segment in and out. A kernel swapping process performs the swapping; it examines the process table to select a process to be swapped out when no more primary memory is available. The criteria for selecting which process to swap in and out are not discussed here.

Once a process is swapped in and ready to run, it has to wait for the scheduler to assign it the processor. The scheduler selects a process to run based on its priority. Process priorities are assigned by an algorithm based on the recent ratio of the amount of compute time to real time consumed by the process. A high ratio implies a low priority, and the compute-to-real-time ratio is updated every second. The scheduler simply selects the process with the highest priority to run. If a high priority process hogs the computer, its priority will drop, whereas a low priority process that has been ignored for a long time will have its priority raised [Thom78].
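As a brief illustration of the fork primitive mentioned above, the following minimal C fragment creates a child process and waits for it to finish. It is a generic UNIX example written with modern headers for portability, not code from the thesis.

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        pid_t pid = fork();              /* create a new process            */

        if (pid == 0) {                  /* child: would run one dedicated  */
            printf("child: pid %d\n", (int)getpid());   /* protocol task    */
            exit(0);
        } else if (pid > 0) {            /* parent                          */
            wait(NULL);                  /* wait for the child to terminate */
        } else {
            perror("fork");              /* fork failed                     */
            return 1;
        }
        return 0;
    }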
3.2.2. Interprocess Communication Facilities

In previous UNIX versions, interprocess communication was a very weak area. Only related processes could communicate with each other, through pipes. This proved to be inadequate, since it restricts communication to processes with a common ancestor, making it almost impossible to use in a distributed environment. The UNIX 4.2bsd release has enhanced the IPC facilities by allowing processes to communicate in a UNIX file-system-like name space and in a network name space. Now, unrelated processes can communicate in a distributed environment.

The IPC concept used by UNIX 4.2bsd revolves around the notion of a socket. A socket is an endpoint of communication where a conversation can take place. To create a socket, the system call

    s = socket (domain, type, protocol);

is issued. Currently, the IPC supports the UNIX and Internet domains, and three types of socket:

(1) stream socket : a bidirectional, reliable, sequenced, and unduplicated flow of data without record boundaries.

(2) datagram socket : a bidirectional, unreliable, unsequenced, and unduplicated flow of data with record boundaries.

(3) raw socket : provides users access to the underlying communication protocols which support the socket abstractions.

A particular protocol can also be selected by specifying its number. Usually a socket is bound to a name. To bind a socket to a name,

    bind (s, name, namelen);

is used. Once a socket is bound, the process can communicate with any unrelated process by listening on the bound socket and accepting connections:

    listen (s, max number of outstanding connections);
    snew = accept (s, &from, &fromlen);

The client requests a connection by issuing

    connect (s, "server-name", sizeof("server-name"));

Sending and receiving data on the socket can be done in various ways:

    write (s, buf, sizeof(buf));
    read (s, buf, sizeof(buf));

or

    send (s, buf, sizeof(buf), flags);
    recv (s, buf, sizeof(buf), flags);

where flags can be

    SOF_OOB        send/receive out-of-band data
    SOF_PREVIEW    look at data without reading
    SOF_DONTROUTE  send data without routing packets

To close down a connection, simply discard the socket by

    close (s);

This brief presentation of the UNIX 4.2bsd IPC provides a basic understanding of the available primitives. It is by no means complete; interested readers should consult the IPC Primer [LeFaJo] for further details. The following client and server processes serve as an illustration of how to use the IPC facilities.

    server ()
    {
        int s, snew;
        char from[MAX_SIZE];
        int fromlen;

        /* create a socket */
        s = socket (AF_UNIX, SOCK_STREAM, 0);

        /* bind a name to the socket */
        bind (s, "server", sizeof("server"));

        /* listen for connection requests */
        listen (s, 5);

        /* forever waiting for connection requests */
        for (;;) {
            snew = accept (s, &from, &fromlen);

            /* fork a new process to serve the client */
            if (fork () == 0) {
                close (s);
                doit (snew, &from);
            }
            close (snew);       /* no need to communicate with the */
        }                       /* client in the parent process    */
    }

Figure III-1 : Server Process

    client ()
    {
        int s, n_bytes;

        /* create a socket */
        s = socket (AF_UNIX, SOCK_STREAM, 0);

        /* request connection to the server */
        connect (s, "server", sizeof("server"));

        /* send a request to the server, and */
        /* block until the server replies    */
        for (;;) {
            send (s, &request, sizeof(request), 0);
            n_bytes = recv (s, &answer, sizeof(answer), 0);
        }
    }

Figure III-2 : Client Process

3.2.3. I/O System

In the UNIX system there are three types of files: ordinary disk files, directories, and special files. We will not examine each file type individually, but will concentrate on the special files. Each I/O device running on the UNIX system has a special file associated with it. An entry for the special file is created in the directory /dev.
All system calls used to access an ordinary file can also be performed on a special file, which of course activates the associated device:

• fd = open (name, mode)
• close (fd)
• bytes_read = read (fd, buffer, number_of_bytes)
• write (fd, buffer, number_of_bytes)
• ioctl (fd, command, data)

There are two classes of devices: block and character devices. Block devices are those which consist of randomly addressed secondary memory blocks of 512 bytes each, for instance disks and tapes. Any device which does not belong to the block devices is classified as a character device.

Devices of both types are characterized by a major and a minor device number. The major device number selects the device driver which deals with that class of devices, whereas the minor number is passed to the driver to select one of several identical physical devices in the class.

There are several advantages in treating I/O devices as special files. First, file and device I/O are as similar as possible. Second, file and device names have the same syntax and meaning, so that a device name can be passed as a parameter instead of a file name. Finally, a special file can be given the same protection as any ordinary file.
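The following short C fragment illustrates this device-as-file view by exercising the calls listed above on a hypothetical character special file. The device name /dev/ring and the buffer size are assumptions for illustration only; the actual ring driver interface is described in Chapter IV.

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        char buffer[2048];
        int  fd, bytes_read;

        /* open the (hypothetical) ring special file like any other file */
        fd = open("/dev/ring", O_RDWR);
        if (fd < 0) {
            perror("open /dev/ring");
            return 1;
        }

        /* a read on the special file is handled by the device driver */
        bytes_read = read(fd, buffer, sizeof(buffer));
        if (bytes_read > 0)
            write(fd, buffer, bytes_read);  /* echo the block back to the ring */

        close(fd);
        return 0;
    }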
3.3. Overall Implementation Design of BBP and BSP

As previously discussed in Section 3.1, the way a protocol package is integrated into the host operating system is an important aspect of protocol implementation. Consequently, before the implementation of BBP and BSP can take place, a design plan of where each protocol layer should reside and how the layers should interact needs to be devised.

Ideally, it would be best to place both layers at the kernel level to achieve good performance. Nevertheless, efficiency is not the only issue that has to be considered; there are also good structuring, ease of maintenance, comprehensibility and more. Therefore, in the BBP and BSP implementation, great emphasis is placed on the latter issues, at some cost in protocol efficiency.

Since the Cambridge Ring hardware is attached to the SUN workstation as a device, a device driver needs to be written to interact with this physical device. The Basic Block Protocol is best implemented as a device driver because it interacts directly with the ring hardware interface. Furthermore, once an incoming packet arrives, it should be dealt with without delay; placing the BBP at the kernel level bypasses the process scheduling delay. Also, the UNIX system calls used to access a file or device provide the interface to the upper layers, so the ring driver can be made invisible to its users.

Despite the various drawbacks associated with the process-oriented approach, the Byte Stream Protocol is designed to be implemented in processes supplied by the UNIX operating system. The BSP employs various concurrent processes working together to provide the protocol services to user processes. This multiple concurrent processes approach is made possible by the IPC supported in the UNIX 4.2bsd environment. The processes run independently of each other and are dedicated to specific functions. The functions of each of these processes will be presented in Chapter V.

Since the multiple concurrent processes approach enhances comprehensibility, ease of development, and future maintenance, it was considered acceptable to sacrifice some efficiency to achieve the design goals. Performance measurements will be presented in Chapter VI, so that we can decide whether the sacrifice was worth its cost.

As a result of the design plan, the Cambridge Ring protocol family has been split in two. The Basic Block Protocol layer resides at the kernel level, whereas the Byte Stream Protocol layer is found in various system processes, as shown in Figure III-3.

Figure III-3 : Overall Design of BSP and BBP (system processes exchange BSP service requests and outgoing/incoming BSP packets with the kernel-resident ring driver (BBP), which transmits and receives basic blocks on the ring; a parent process forks a child process for each client; arrows denote process communication)

CHAPTER IV
BASIC BLOCK PROTOCOL IMPLEMENTATION

In this chapter, we will examine the Basic Block implementation on the SUN workstation. Before we do so, the existing implementations of the Basic Block Protocol on a TI 990 and an LSI 11/23 are briefly presented, indicating their deviations from the original Basic Block Protocol proposed by Cambridge.

4.1. Existing Basic Block Protocol Implementation

Currently, here at UBC, the Basic Block Protocol has been implemented on a TI 990 running the VEREX operating system and on an LSI 11/23 running UNIX Version 7. The structure of a basic block was modified to enhance efficiency.

(1) Instead of sending a PORT minipacket, the port value is embedded in the HEADER minipacket. This modified header minipacket is used to maximize the use of the available data space in a minipacket.

    4 bits | 12 bits

Figure IV-1 : Modified Header Minipacket

The transmission and reception algorithms are thereby simplified, and throughput can be improved, since one less minipacket is sent whenever a basic block is transmitted. In this modified version, the header minipacket is not recognized by the leading bit pattern 1001, but by setting the two control bits of the 40-bit minipacket.

(2) In addition, the checksum minipacket is discarded from the basic block. Since the Local Area Network is known to have a low error rate, the bits in transmission are seldom corrupted, so the overhead of generating the checksum and the cost of transmitting the checksum minipacket are wasted. In experiments conducted on the TI 990, measurements indicate that the calculation of the checksum generally consumes a large amount of processor time during the reception and transmission of a basic block. Without the checksum minipacket, the throughput was found to increase by as much as 10% [Wong84].

With all these benefits, the basic block structure is thus modified as follows:

    Header | Data | Data | ...

Figure IV-2 : Modified Basic Block Format

4.2. Basic Block Protocol on the SUN Workstation

4.2.1. Device Drivers for the SUN Workstation

UNIX provides a device-independent view of the hardware to its application software. All application software accesses any piece of raw hardware in identical ways. The interface between UNIX application software and a given piece of raw hardware is provided by a device driver for that piece of hardware. A device driver provides an interface between the UNIX operating system's device-independent scheme of things and the special characteristics of a particular piece of hardware [Sun84]. New device drivers can be added to the SUN's configurable kernel with minimal effort.
As noted in Section 3.2.3, all devices and special files have entries in the directory /dev. When a file is opened, the corresponding device driver is invoked via the entry in the /dev directory. The connection between a specific device name in the /dev directory and its driver is made through two C structures, bdevsw (the block device switch table) and cdevsw (the character device switch table), found in the file conf.c. Whenever a new device driver is added, the corresponding entries in the appropriate switch table have to be updated.

Each character device may provide seven functions: open, close, read, write, ioctl, select, and mmap. The cdevsw table contains the interface routines present for character devices. When a file operation needs to be done on a special device, the operating system executes the code of the corresponding device driver for that particular device. The device switch tables enable the operating system to locate the device driver routines correctly. The declaration of the character device switch is shown in the following figure.

    struct cdevsw {
        int (*d_open)();      /* routine called to open the device   */
        int (*d_close)();     /* routine called to close the device  */
        int (*d_read)();      /* routine called to read from device  */
        int (*d_write)();     /* routine called to write to device   */
        int (*d_ioctl)();     /* special interface routine           */
        int (*d_stop)();      /* routine called to stop the device   */
        int (*d_reset)();     /* routine called to reset the device  */
        struct tty *d_ttys;   /* tty structure                       */
        int (*d_select)();    /* routine called to select the device */
        int (*d_mmap)();      /* routine called to mmap the device   */
    };

Figure IV-3 : Declaration of the Character Device Switch

It is through the entries in this structure that the link between the main UNIX code and the driver is established. For the ring driver, the initialization of the required pointers in the cdevsw structure is as follows:

    /* entries for devices with major device numbers less than the ring's */
    ...
    { ringopen, ringclose, ringread, ringwrite, nodev,
      nodev, nodev, 0, nodev, nodev },

Figure IV-4 : Initializing the cdevsw Table for the Ring Driver

4.2.1.1. The SUN Hardware and the Multibus

Although the SUN uses the Motorola MC68000 processor, its system hardware is built around the IEEE-P796 Multibus. Several conflicting issues arise between the MC68000 processor and the IEEE-P796 Multibus board, and these issues must be treated with great care when device drivers are written.

The MC68000 processor does not distinguish between memory and peripherals, whereas the Multibus makes this distinction. Thus the Multibus has the notion of two separate address spaces, namely the Multibus memory space and the Multibus I/O address space.

    Multibus memory space is simply used for memory or devices that look like memory, in that you talk to such devices simply by writing data to memory locations or reading from memory locations. Devices that look like memory are called 'memory mapped' devices. Multibus I/O address space is another 'space' that is typically used for device control registers. Devices using the I/O address space are said to be I/O mapped devices. [1]

The MC68000 processor, however, views the entire universe as one address space. Fortunately, the SUN memory management hardware provides mapping between the system's address space and the Multibus memory space or the Multibus I/O address space.

Another conflicting issue requiring special attention is the byte ordering issue.

[1] Sun System Internals Manual, "Writing Device Drivers for the SUN Workstation", page 7.
IEEE-P796 and Motorola do not agree on the addressing of bytes within a word; they have conflicting ideas of where to place byte 0 in a 16-bit word. For Motorola, the byte ordering is:

    15               8 7               0
    |     Byte 0      |     Byte 1      |

Figure IV-5 : Motorola Byte Ordering

As for IEEE-P796, it is as follows:

    15               8 7               0
    |     Byte 1      |     Byte 0      |

Figure IV-6 : IEEE-P796 Byte Ordering

Once a byte or word transfer crosses the interface between the MC68000 and the Multibus, special manipulation is required to ensure that the content of a byte or a word is interpreted correctly.
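To make the consequence concrete, here is a small, hedged C helper of the kind a driver might use to swap the two bytes of a 16-bit word when it crosses the MC68000/Multibus boundary. The function names and usage are invented for illustration; the actual ring driver may handle the reordering differently (for example, in the interface hardware or through the memory map).

    /* Illustrative helper: swap the two bytes of a 16-bit word so that a
     * value written by the MC68000 is seen in the order the Multibus
     * device expects (and vice versa).  Not thesis code.                */
    unsigned short swap_bytes(unsigned short w)
    {
        return (unsigned short)(((w & 0x00ff) << 8) | ((w >> 8) & 0x00ff));
    }

    /* Example: copying a buffer of 16-bit words across the interface */
    void copy_to_interface(volatile unsigned short *dst,
                           const unsigned short *src, int nwords)
    {
        int i;
        for (i = 0; i < nwords; i++)
            dst[i] = swap_bytes(src[i]);
    }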
All these routines comprise the ring driver and enable the Basic Block Protocol to function properly.

CHAPTER V
BYTE STREAM PROTOCOL IMPLEMENTATION

As mentioned in Section 3.3, the BSP handler employs several concurrent processes, namely the Reader, Writer, Multiplexor and Server, working together to provide the protocol services to the client layer. The concept is developed from the old saying "divide and conquer": a complicated task is best subdivided into smaller tasks so that each subtask can excel at its dedicated function, and this process is iterated until all subtasks are well defined and manageable. With the above motivation in mind, the BSP handler was designed and implemented. In this chapter, we will examine the functions and the implementation details of the processes. Prior to the presentation, a list summarizing the function of each process is provided for a quick overview.

• Reader - reads basic blocks from the ring device driver and passes them to the Multiplexor.

• Multiplexor - multiplexes the usage of the Reader process among the BSP Servers.

• BSP Server - provides byte stream services to its clients and ensures the protocol is functioning properly.

• Writer - writes basic blocks directly to the ring device driver.

5.1. Reader

The Reader's main task is to continuously read in basic blocks from the ring driver. Since it has no knowledge of which BSP Server a basic block should be passed to, the Reader relays the basic block to the Multiplexor. When a read fails at the ring driver level, the Reader is notified of the error occurrence and relays the message to the BSP Server via the Multiplexor.

Since the Reader performs relatively simple functions, its implementation is very straightforward. Initially, it establishes a connection with the Multiplexor process. Then, the Reader enters an infinite loop to perform its dedicated function: reading in basic blocks. The following illustration best describes the Reader's purpose in life.

main(argc, argv)
int argc;
char *argv[];
{
    int s;
    int s_multiplexor;
    int fromlen, bytes_read;
    int ring_fd;
    int status;
    BB_BUFFER bb;

    if ((ring_fd = open("/dev/ring", 0)) <= 0) {
        printf("Reader : unable to open ring for reading\n");
        exit(1);
    }
    s = socket(AF_UNIX, SOCK_STREAM, 0);
    if (bind(s, "reader", sizeof("reader")) < 0) {
        printf("READER : binding error\n");
        exit(1);
    }
    listen(s, 5);
    s_multiplexor = accept(s, 0, &fromlen);

    /* keep reading in basic blocks and pass them to the multiplexor */
    for (;;) {
        bytes_read = read(ring_fd, &bb, MAX_BUF_SIZE + CMD_LEN + CNTR_LEN);
        if (bytes_read > 0) {
            send(s_multiplexor, &bb, bytes_read, 0);
        }
    }
}

Figure V-1 : Reader

5.2. Writer

The Writer's function is analogous to that of the Reader. On behalf of the BSP Servers, the Writer sends the assembled byte stream packet to the ring driver to be transmitted. As implied by its name, the Writer does nothing other than write to the ring driver.

To perform its function, the Writer receives byte stream packets to be transmitted from its users. Since any number of users can utilize the Writer's service, the Writer has to establish an individual connection with each user and keep track of the socket descriptor values. Also, a multiplexing function needs to be performed to initiate reading from a specific socket with available data. The following piece of code tells the story for the Writer.
/* forever loop to multiplex among the user processes */
for (;;) {
    temp1 = rd_mask;                    /* save the read mask */
    rd_mask |= listen_mask;             /* set up the read mask */
    status = select(nfds, &rd_mask, 0, 0, NULL);
    if (status > 0) {
        /* connection request */
        if ((listen_mask & rd_mask) != 0)
            connect_req();
        rd_mask &= ~listen_mask;        /* mask out the listen bit */

        /* user has data to send */
        if (rd_mask != 0)
            user_data();                /* write packet to ring */

        if (exc_mask != 0)
            exception_handler();

        rd_mask = temp1;
    }
}

Figure V-2 : Writer

5.3. Multiplexor

As mentioned previously, the Multiplexor's function is to multiplex the use of the Reader process among the BSP Servers. It acts as a middleman, relaying incoming basic blocks from the Reader to the Servers. To accomplish this task, the Multiplexor has to establish connections with all the Servers as well as the Reader. With all these connections on hand, it is a non-trivial job to decide to which socket data should be sent and whether there is a pending connection request. Fortunately, the UNIX 4.2bsd IPC has a select primitive to aid the implementation.

select (nfds, &readfds, &writefds, &exceptfds, &timeout);

where

nfds        specifies the range of file descriptors specified in a mask
readfds     a bit mask of the set of file descriptors from which data can be read
writefds    a bit mask of the set of file descriptors to which data is to be written
exceptfds   a bit mask of the set of file descriptors for which exceptional conditions are pending
timeout     specifies that the selection is not to last more than the timeout value
            = 0 if the selection takes the form of a poll, returning immediately
            = NULL pointer if the selection is to block indefinitely

The bit masks are created by or-ing bits of the form "1 << file descriptor". When the select call returns, the bit masks are modified. A descriptor fd is selected if a 1 is present in the fd-th bit of the mask [LeFaJo]. A segment of code is extracted from the Multiplexor to illustrate the above concept.

/* ... set up various masks ... */

/* forever loop to multiplex among the user processes and the READER process */
for (;;) {
    temp1 = rd_mask;                    /* save the read mask */
    rd_mask |= listen_mask;
    rd_mask |= reader_mask;
    status = select(nfds, &rd_mask, 0, &exc_mask, NULL);

    if ((listen_mask & rd_mask) != 0)   /* connection request */
        connect_req();

    if ((reader_mask & rd_mask) != 0)   /* basic block arrival */
        bb_arrival();

    rd_mask &= ~reader_mask;            /* mask out the READER bit */
    rd_mask &= ~listen_mask;            /* mask out the listen bit */

    if (rd_mask != 0)                   /* user wishes to close socket */
        close_sock();

    if (exc_mask != 0)
        exception_handler();

    rd_mask = temp1;
}

Figure V-3 : Using the Select Primitive

As seen in the above code segment, the Multiplexor sits forever in an infinite loop waiting for connection requests, basic block arrivals, and byte stream packets to send.

In order to perform its multiplexing function, the Multiplexor needs a storage structure to remember the socket descriptors for all the connection requests. Furthermore, there is also a port number associated with each BSP Server. To provide a mapping between a Server's port number and its socket descriptor, an array indexed by the port number is used. Thus, every time a basic block arrives, the port number is translated into the socket descriptor value by accessing the above array.
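A minimal sketch of this mapping is shown below; the array name, its bound, and the helper routine are assumptions for illustration only and do not appear in the Multiplexor source.

    #define MAX_PORTS 256               /* assumed upper bound on BSP port numbers */

    int port_to_sock[MAX_PORTS];        /* socket descriptor of the Server owning each port */

    /* on basic block arrival, forward the block to the Server that owns its port */
    forward_bb(bb, len)
    BB_BUFFER *bb;
    int len;
    {
        int s = port_to_sock[bb->port & 0xff];   /* port assumed to fit in the array bound */

        if (s > 0)
            send(s, (char *)bb, len, 0);
    }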
5.4. BSP Server

Basically, the BSP Server is the heart of the entire protocol. In summary, it interacts with the User level above and the Basic Block level below. Its functions include processing the User's requests and basic blocks that have arrived, updating state variables to ensure that the BSP is functioning properly, and maintaining a timer for detecting lost and duplicate packets.

To accomplish all the above functions, the BSP Server is divided into many modules during the design phase. Each module has its own specific function and is designed with ease of future maintenance as a goal. Thus, the modules are well structured and easy to comprehend. Before we delve into the design details of the major modules, we will first examine the data structures that are introduced.

5.4.1. Data Structures

There are three main data structures used, namely the Byte Stream Packet Buffer, the Request Frame used in the Server's communication with the User and the Multiplexor, and the Server State Information Node. The following figures illustrate the format of these data structures.

typedef struct buf {
    short   len_of_data;
    char    port, dest_addr;
    char    recp_flag, recp_cmd;
    char    txmit_flag, txmit_cmd;
    char    data[MAX_BUF_SIZE];
} BB_BUFFER;

Figure V-4 : Byte Stream Packet Buffer

Definitions:
len_of_data     number of data bytes
port            destination port number
dest_addr       destination station address
recp_flag       reception command flag
recp_cmd        reception command
txmit_flag      transmission command flag
txmit_cmd       transmission command
data            user data

typedef struct {
    int     request_type;
    int     count;
} REQ_FRAME;

Figure V-5 : Request Frame

Definitions:
request_type    type of request
count           if the request contains data, count stores the length of the data; otherwise, it is set to zero

typedef struct {
    int     port_no;
    int     user_sock;
    int     multi_sock;
    int     writer_sock;
    int     r_state, frame_expected, r_tries, r_max_size;
    int     x_state, frame_just_sent, x_tries, x_max_size, x_port, x_addr;
    int     rbufstat;
    int     read_pending;
    int     writ_pending;
    int     proc_status;
    BB_BUFFER *recp_buf, *txmit_buf;
} USER_INFO_NODE;

Figure V-6 : Server State Information Node

Definitions:
port_no             server's own port number
user_sock           user's socket descriptor value
multi_sock          multiplexor's socket descriptor value
writer_sock         writer's socket descriptor value
r_state             reception state
frame_expected      sequence number of the expected frame
r_tries             number of retries for the receiver
r_max_size          maximum block size for the receiver
x_state             transmission state
frame_just_sent     sequence number of the preceding transmitted frame
x_tries             number of retries for the transmitter
x_max_size          maximum block size for the transmitter
x_port              destination port number
x_addr              destination station address
rbufstat            reception buffer status
read_pending        user wishes to do a read
writ_pending        user wishes to do a write
proc_status         server status
recp_buf            pointer to the reception buffer
txmit_buf           pointer to the transmission buffer

5.4.2. BSP Server Administrator

The BSP Server Administrator is the master process which delegates tasks to its slave processes. It simply sets up its communication socket, binds it to a well-known name and waits in an infinite loop for user connections. Once an incoming connection is received, it creates a new slave process to serve the client who has just made contact. Then, the administrator goes back to await another connection, forgetting the previous client.
s = socket(AF_UNIX, SOCK_STREAM, 0);

/* bind to a well-known name */
if (bind(s, "server", sizeof("server")) < 0) {
    printf("Server binding error\n");
    exit(1);
}
listen(s, 5);                           /* listen on the socket */

connected_call = 0;
for (;;)                                /* forever await connection requests */
{
    if ((snew = accept(s, 0, &fromlen)) < 0) {
        if (errno != EINTR)
            printf("Connection acceptance error\n");
        continue;
    }
    if ((pid[connected_call++] = fork()) == 0) {
        close(s);
        server1(snew);                  /* child process serves the client */
    }
    close(snew);
}

Figure V-7 : BSP Server Administrator

The advantage of having one process serve one client is the simplification of the data structures needed: the server process keeps status information for only one client. Also, the server code itself is less complicated, and a dedicated server for each client allows a certain degree of parallelism which should increase performance.

5.4.3. Secondary Server

Though it is named the secondary server, it is by no means unimportant. In fact, this is where all the communication and processing take place. Its main procedure is shown in the following figure.

/* ... initialization ... */

/* forever processing client requests and packets from the multiplexor */
for (;;) {
    /* see which socket has available data */
    u_or_multi(&read_user, &read_multi, sock, multi_sock);

    if (read_user) {                    /* process user request */
        get_user_req(sock, &request, unode);
        process_user_req(request, unode);
    }
    if (read_multi) {                   /* process arrived packet */
        get_multi_req(multi_sock, &request, unode);
        process_reader_req(request, unode);
    }
}

Figure V-8 : Secondary Server

The task of the Server is to initialize its communication links and then process requests in an infinite loop. Initialization involves establishing a connection with the Multiplexor and allocating space for the Server State Information Node. Requests can come from either the User or the Multiplexor; therefore, the select primitive is required to determine which socket contains an incoming request (a sketch of this check is given at the end of this section). User requests can be of various types, corresponding to the different Byte Stream Commands:

• REQ_OPEN
• REQ_CLOSE
• REQ_READ
• REQ_WRITE
• REQ_RESET
• CLOSE_SOCK, to indicate the end of communication

The processing of these requests involves setting up the byte stream packet with the appropriate command and transmitting it. Requests from the Multiplexor are of two types:

BB_ARRIVE : indicates a basic block arrival
READ_FAIL : indicates a ring read error has occurred

All of the above user and multiplexor requests come in the format specified in the REQ_FRAME data structure, as shown in Figure V-5.
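The u_or_multi () routine is not reproduced in the thesis text; the following is a plausible sketch of it, using the 4.2bsd select bit masks exactly as described in Section 5.3. The routine body and its details are assumptions made for illustration.

    /* a guess at u_or_multi(): block in select() until one of the two sockets is
       readable, then report which one(s) have data */
    u_or_multi(read_user, read_multi, sock, multi_sock)
    int *read_user, *read_multi;
    int sock, multi_sock;
    {
        int rd_mask, nfds;

        rd_mask = (1 << sock) | (1 << multi_sock);
        nfds = (sock > multi_sock ? sock : multi_sock) + 1;

        select(nfds, &rd_mask, 0, 0, 0);    /* null timeout pointer: block indefinitely */

        *read_user  = ((rd_mask & (1 << sock)) != 0);
        *read_multi = ((rd_mask & (1 << multi_sock)) != 0);
    }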
5.4.4. Basic Block Processing

When the Multiplexor sends a BB_ARRIVE request, it implies that the communicating party has sent a message. To decode the message, the Server needs to process the arrived basic block. The following figure outlines the decoding procedure.

frame_arrival(unode, bptr)
USER_INFO_NODE *unode;
BB_BUFFER *bptr;
{
    switch (bptr->recp_cmd) {
    case RESET_COM :
        rec_reset(unode, bptr);
        break;
    case CLOSE_COM :
        rec_close(unode, bptr);
        break;
    case OPEN_COM :
        rec_open(unode, bptr);
        break;
    case OPENACK_COM :
        openack_rec(unode, bptr);
        break;
    /* not a control basic block, must contain data */
    default :
        data_frame(unode, bptr);
        break;
    }
}

Figure V-9 : Basic Block Arrival

The algorithm first examines whether the basic block contains a control command. If it does, the appropriate control actions are performed. For instance, a received OPEN_COM implies that an OPENACK_COM needs to be sent as acknowledgement. Thus, the Server processes the OPEN_COM as follows:

open_req(unode)
USER_INFO_NODE *unode;
{
    BB_BUFFER *bptr;
    short *msg;
    int cntrl_timeout();

    if (unode->proc_status != NOT_OPENED_YET) {
        msg_to_user(unode->user_sock, ALREADY_OPENED, 0);
        return;
    }

    /* set up all the OPEN_COM parameters */
    bptr = unode->txmit_buf;
    setup_bb(bptr, unode->x_port);
    bptr->len_of_data = CNTR_DATA_LEN;
    msg = (short *)&bptr->port;
    *(msg + FIRST_COM) = OPEN_COM;
    *(msg + PORT_SPOT) = unode->port_no;
    *(msg + NO_OF_PARAM) = OPTIONAL_PARAM;
    *(msg + SVC_REQ) = 0;
    *(msg + REC_BLK_SIZE) = unode->r_max_size;
    *(msg + TX_BLK_SIZE) = unode->x_max_size;
    unode->proc_status = WAIT_OPENACK;
    unode->x_tries = MAX_TRIES;

    /* send the assembled block to the writer */
    send_bb(bptr, unode->writer_sock);
    start_timer(NORMAL, cntrl_timeout);
}

Figure V-10 : Processing the OPEN Command

All the non-control commands are processed according to the actions specified in the State Transition Table in Figure II-10. For instance, a DATA_COM is handled with the following piece of code.

data_cmd(unode, bptr)
USER_INFO_NODE *unode;
BB_BUFFER *bptr;
{
    int rdy_timeout();

    /* expected frame or non-expected frame? */
    switch (check_seq_no(unode->frame_expected, bptr->txmit_cmd)) {

    case E_rep :
        switch (unode->r_state) {
        case ESSENTIAL :
            return (OK);
        case NONESSENTIAL :
            return (OK);
        case IDLE :
            p_error("DATA_COM : State IDLE with E[rep]");
            return (ERROR);
        }

    case E_exp :
        stop_timer();
        switch (unode->r_state) {
        case ESSENTIAL :
        case IDLE :
            unode->frame_expected = (unode->frame_expected + 1) % MODULO;
            unode->rbufstat = FULL;
            if (unode->read_pending)    /* user waiting for data */
            {
                data_to_user(bptr, unode->user_sock);
                unode->rbufstat = EMPTY;
                rdy_txmit(unode);
                start_timer(NORMAL, rdy_timeout);
                unode->read_pending = FALSE;
            }
            else
                notrdy_txmit(unode);
            return (BB_OUT);
        case NONESSENTIAL :
            p_error("DATA_COM : State NONESSENTIAL with E[exp]");
            return (ERROR);
        }
    }
}

Figure V-11 : Processing the Data Command

Essentially, the actions taken are based on the packet sequence number received and the current BSP state. In the normal situation, the current state is updated and an acknowledgement packet is sent if necessary. However, no action may be needed in some situations, while others lead to a protocol error. The NODATA_COM, RDY_COM and NOTRDY_COM are processed in a manner similar to the DATA_COM.

5.4.5. Timer Design

As noted in Figure V-11, the timer is started and stopped as events occur; thus, a timer must be maintained by the Server. It would be rather awkward and silly to have the Server constantly checking the clock to determine whether the timeout interval has expired. To facilitate the Server's task, it would be ideal to set an alarm notifying the Server that a timeout event has taken place. Fortunately, UNIX 4.2bsd does provide a primitive, alarm, to perform the clock-checking function. A time interval value in seconds is passed as the parameter; once the interval has expired, the alarm goes off. For instance, invoking alarm(10) means "let me know when 10 seconds have expired". A parameter of zero turns off the alarm.

The UNIX alarm employs a signalling concept. Every time the alarm goes off, it sends an alarm signal to the invoking process.
The signalling concept works as follows:

signal (SIGALRM, alarm_handlr);

means that every time an alarm signal is received, control is transferred to the alarm_handlr routine. In our case, an appropriate timeout routine can be invoked, according to the actions specified in the State Transition Table.

However, the alarm feature of UNIX does not queue calls: a subsequent call supersedes the current outstanding alarm. This may cause problems, since in Section 2.3.2.6 we noted that two timers are needed, a Normal Timer and an Idle Handshake Timer. To overcome this problem, we could simulate a queue of alarm calls by using a linked list of timer nodes, each consisting of its expiry time, its timer type and the address of the interrupt routine. The alarm would be set to the expiry time of the first node; once that timer expired, the alarm would be set to the expiry time of the next node. This solution certainly solves the problem at hand; however, it involves many queue manipulation routines and thus complicates the situation and raises many unnecessary problems. We therefore look for a simpler solution.

By carefully examining the State Transition Table, we discover that the Normal Timer and the Idle Handshake Timer will never be set at the same time; they are mutually exclusive. The reason supporting this observation is that the Normal Timer is set only when the Server enters an Essential State, while the Idle Handshake Timer is set when the next state is an Idle State. Thus, they cannot both be active simultaneously. Also, we note that the previous timer is always stopped or expired before the next one is set; therefore, the current alarm call will never be superseded. From the above observations, we conclude that only one alarm call can be outstanding at any one time, making it possible to use the alarm feature to simulate the two timers without extra manipulation. Consequently, the appropriate timer is set by passing its associated timeout value and timeout handling routine.

start_timer(interval, timeout_proc)
int interval, (*timeout_proc)();
{
    extern int (*intrpt_routine)();

    alarm(interval);                    /* set the alarm */
    intrpt_routine = timeout_proc;      /* store the address of the interrupt routine */
}

Figure V-12 : Start Timer

Analogously, stopping the timer is done by turning off the alarm.

stop_timer()
{
    alarm(0);
}

Figure V-13 : Stop Timer

Aside from the above observations, this simple solution can be adopted because the BSP is implemented as a stop-and-wait protocol: only one packet can be in flight at a time, so only one timer is required. Moreover, since the Server is dedicated to serving only one client, it is not necessary to multiplex the timer among several clients, which also enables the simple solution to be used. The overall simplicity of the timer implementation stems from the good protocol design and well-structured modularization.
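For completeness, the alarm signal can be tied to the stored timeout routine with a handler of roughly the following shape. This is a sketch only: the handler body and the way intrpt_routine is declared are assumptions, since only start_timer and stop_timer are reproduced above.

    #include <signal.h>

    int (*intrpt_routine)();            /* set by start_timer() */

    /* invoked whenever SIGALRM is delivered to the Server */
    alarm_handlr()
    {
        signal(SIGALRM, alarm_handlr);  /* re-arm the handler */
        if (intrpt_routine != 0)
            (*intrpt_routine)();        /* run the Normal or Idle Handshake timeout action */
    }

    /* installed once during Server initialization:
           signal(SIGALRM, alarm_handlr);
    */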
CHAPTER VI
PERFORMANCE MEASUREMENTS

In this chapter, we will present the performance measurements collected for the Basic Block Protocol and Byte Stream Protocol implementations, followed by an evaluation of the protocol performance and a brief comparison to the implementation on the LSI 11/23.

6.1. Experiment Descriptions and Results

6.1.1. Performance of the Basic Block Protocol

The performance of the Basic Block Protocol implementation is measured in a loop-back fashion. A transmitter is started up to continuously issue writes to the ring. As its counterpart, a receiver is also started up on the same machine to receive all incoming basic blocks. The goal of this experiment is to determine the optimal number of preallocated buffers for storing incoming basic blocks. In order to have better control of the experiment, it is performed in a single-user environment, with each run transmitting an identical file of 358408 bytes. Results, based on 5 runs of each test, are presented in Figure VI-1. The clock used in all measurements has a resolution of 10 microseconds.

bytes transferred = 358408
bytes per basic block = 2048

No. of Preallocated   Writes   Writes    Time Required   Throughput    Standard
Buffers               Issued   Failed*   (sec)           (bytes/sec)   Deviation
1                     176      3         71.46           4958.07       62.53
2                     176      1         71.48           5013.99       48.68
3                     176      1         71.42           5018.20       51.64
4                     176      0         71.59           5034.89       35.64

Figure VI-1 : Performance of the Basic Block Protocol

* A failed write is due to the receiving station's inability to accept the incoming basic block; in this case, it is caused by not having an empty preallocated buffer to store the block.

As indicated by the measurements, the optimal number of preallocated buffers is 4. The number of buffers required is found to be small because, in the experiment, a receiver is continuously running to accept any incoming basic blocks.

6.1.2. Performance of the Byte Stream Protocol

The Byte Stream Protocol implementation on the SUN workstation is also measured in loop-back mode. The Writer, instead of sending the byte stream packet to the Basic Block Protocol for transmission to the ring, actually sends the packet directly to the Reader. In doing so, the Basic Block Protocol is bypassed and the measurements collected reflect solely the performance of the Byte Stream Protocol. We can then analyze the effectiveness of the multi-concurrent processes approach.

All measurements are collected in a single-user environment; thus, the system workload is kept to a minimum. Since the experiment is done in loop-back mode, a transmitter and a receiver are started up on the same machine for communication with itself. Consequently, there are at least two client processes employing the services provided by the Byte Stream Protocol in each measurement.

Before any statistics were collected for the Byte Stream Protocol implementation, we measured the communication cost of the UNIX 4.2bsd IPC. One thousand messages are transmitted from sender to receiver and immediately sent back to the sender; thus, the time required for a message to make a round trip (i.e., to travel from sender to receiver and back to the sender) can be calculated. The experiment is conducted by varying the message size to see its effect on communication cost. The results, based on 10 runs of each message size, are presented in the following table.

Message Size (bytes)   Roundtrip Time for 1000 Messages (sec)
100                    7.44043
200                    9.139648
300                    10.379883
500                    14.040039
600                    16.259766
1000                   21.900391
1500                   20.620117
1700                   23.019531
2000                   23.299805

Figure VI-2 : Result of UNIX IPC Communication Cost

Knowing that communication cost increases with message size, we then proceed to measure the effect that changing the amount of data in a byte stream packet has on the throughput of the Byte Stream Protocol. The experiment is carried out in a single-user environment, with a transmitter and a receiver residing on the same machine (i.e., 2 clients).
The result, based on 3 runs of each packet size with 3 seconds timeout value, is listed in the following table: Performance -67- Measurements Packet Bytes Time Throughput Standard Size (bytes) Transferred (sec) (bytes/sec) Deviation 500 184000 87.98 2091.38 108.17 1000 309000 86.99 3552.22 17.33 1500 358408 73.21 4896.35 69.32 2048 358408 53.59 6687.96 119.11 Figure VI-3 : Byte Stream Protocol Performance with Various Packet Size Even though communication cost for larger data size is higher, the throughput of the system increases with larger data size. Obviously, having more data bytes transferred with a single byte stream command reduces the total number of byte stream commands transmitted which in turn minimizes the overhead cost of byte stream com-munication. Thus, packet size should be made as large as possible. For the Basic Block Protocol, it is restricted to 2048 bytes of data per byte stream packet. Another- interesting parameter to measure is the effect of different number of client processes using the byte stream service on the throughput of the system. An experiment is performed by fixing the packet size at 2048 bytes* setting a 3 seconds timeout interval, and having all transmitters sending 358408 bytes to their corresponding receivers for 100 seconds. All client processes employing the byte stream services are running con-currently. For instance, if there are n client processes, there would be a total of 2n + 4 (n client processes + n server processes + 1 main server + 1 Writer + 1 Reader + 1 Multiplexor) processes started up for the byte stream transfer. However, due to the fact that there is a quota of 25 concurrent processes per user (imposed by UNIX 4.2bsd), we could then have a maximum of 10 client processes using the byte stream service in our experiment. The results, average of 3 runs of each client number, are presented in Fig-ure VI-4. Performance -68- Measurements Total No. Effective No. Time No. of of Bytes of Bytes Required Throughput Standard Client. Transferred Transferred* (sec) (bytes/sec) Deviation 2 358408 358408 53.59 6687.96 119.11 4 416427 416427 80.13 5132.20 144.59 6 294513 259697 81.14 3200.61 175.12 8 107561 63522 78.61 808.07 180.80 10 - -- 2 clients cannot establish initial connection — No. of No. of Packet No. of ' Bytes Client Transmitted Retransmit Retransmitted 2 537 0 0 4 629 0 0 6 395 44 34816 8 255 110 44039 Figure VI-4 : Byte Stream Protocol Performance with Different Number of Clients As expected, with more clients competing for the byte stream services, the effective throughput of the system decreases drastically. From the data collected, the byte stream protocol performs moderately well with 2 to 6 clients. Once it has to multiplex among 8 or more clients, the number of processes competing for CPU resources and the number of retransmits required significantly affect the system's performance. In a protocol implementation, the amount of retransmission is dependent on the timeout value. The timeout value should be as small as possible so that lost packets can be detected quickly. However, too small a timeout value can cause many unnecessary retransmissions just because the packet cannot reach the destination to trigger an ack-nowledgement to be sent back within the timeout period. Therefore, the timeout value should be chosen with care and is very much implementation dependent. 
The recom-• Effective number of bytes transferred implies the total amount of data sent from all the transmitters to the re-ceivers (excluding the amount of retransmitted data) Performance -69- Measurements mended value for the Byte Stream Protocol implementation is 1 to 2 seconds. However, we found the value to be too small for our implementation using the multi-concurrent processes approach. A timeout value of 3 seconds is found to perform well with as many as 4 clients. Another set of experiments with the timeout value raised is performed to examine whether the poor performance of 6 or more clients is caused by the small timeout value. The results with various timeout values are presented in the following table. No. of Timeout Bytes Time Throughput No. of Bytes Client Value Transferred (sec) (bytes/sec) Retransmit Retransmitted 6 5 268444 81.80 3281.72 18 26624 8 5 79872 86.38 924.66 44 36992 6 7 257058 78.16 3288.88 18 26624 8 7 80891 85.13 950.21 40 34816 6 9 266056 75.42 3527.67 8 12288 8 9 90112 79.92 1127.53 33 26697 Figure VI-5 : Byte Stream Protocol Performance with Various Timeout Values It is evident from the measurement data that the drastic decrease in performance with more than six clients is not solely caused by the small timeout value, but also by having too many processes competing for CPU resources. Since the number of clients performing concurrently on the SUN workstation is typically less than six, a timeout value of 3 seconds seems appropriate for our environment. 6.1.3. Overall Performance of the two Protocol Layers The overall performance of the two protocol layers cooperatively functioning as a unit is measured in similar manner, i.e., in the loop back mode, single-user environment. The goal of this experiment is to measure the throughput of the two protocol layers working as a unit. The number of preallocated buffers in the Basic Block Protocol is set Performance -70- Measurements to 4, while the byte stream packet size is set to 2048 bytes and the timeout value to 3 seconds. A transmitter is started up to send a file of 358408 bytes to its corresponding receiver on the same machine (i.e., 2 clients). The system throughput is measured and presented in the following table. Data Bytes Time Throughput Standard Transferred (sec) (bytes/sec) Deviation 20480 8.10 2570.11 338.76 61440 23.32 2642.65 142.80 102400 39.60 2588.57 83.72 143360 54.73 2621.74 75.11 204800 77.24 2653.09 65.64 Figure VI-6 : Overall Performance of the Two Protocol Layers As noted in the measurements, the system throughput is lower as compared to those of the Basic Block and Byte Stream Protocols. This is as expected since the packet has to be sent out to the ring through the Byte Stream and Basic Block Protocols in a sequential manner. The time required for a packet to be transmitted is the total time spent in the Byte Stream and Basic Block layers. Since more time is needed to pro-cess a packet, it is expected to find a decrease in the system throughput. The system performance is also measured with 2, 4 and 6 clients using the protocol system with 3 seconds timeout, 2048 packet size and 4 preallocated buffers. Result is tabulated as follow. Performance -71- Measurements No. of Bytes Time Throughput No. 
of Writes Client Transferred (sec) (bytes/sec) Retransmit* Failed** 2 218280 83.45 2626.54 0 0 4 180880 79.91 2264.79 0 0 6 75480 79.25 955.04 55 3 Figure VI-7 : Protocol System Performance with Different Number of Clients The measurements indicate that the protocol system works reasonably well up to 4 clients. Once there are 6 clients using the system, the throughput drops drastically. The reasons for this behaviour are suspected to be related to the 3 seconds timeout value and the number of preallocated buffers in the Basic Block Protocols. Since there is extra time spent in the Basic Block Layer to send a packet out to the ring, the timer value set for optimal performance with the Byte Stream Protocol in the loop back mode may now expire prematurely, causing unnecessary retransmissions. Furthermore, there are unsuc-cessful transmissions of basic block due to the receiving station not having empty buffer to store the incoming block. Therefore, there are retransmissions not only in the Basic Block Layer but also in the Byte Stream Layer. It is these unnecessary retransmissions which cause the throughput to drop so drastically. To confirm our suspicions, we conduct 3 more experiments by varying timeout values and number of preallocated buffers to see its effect on the system throughput with 6 clients using the protocol package. Results are presented in the following table. • Number of Retransmit in the Byte Stream Protocol itself (excluding those in Basic Block Protocol). • • Writes Failed implies unsuccessful basic block transmission. Performance -72- Measurements Timeout No. of System Value Preallocated Throughput No. of Writes (sec) Buffers (bytes/sec) Retransmit* Failed** 7 4 1669.44 22 0 7 5 1708.25 11 0 3 5 976.73 50 1 Figure VI-8 : Protocol System Performance with Six Clients Versus Various Timeout and Preallocated Buffer Values Figure VI-8 would indicate that raising the timeout value from 3 to 7 seconds has a more significant effect in improving throughput than increasing the number of buffers at the receiver. As mentioned previously, the number of clients using the SUN workstation tends to be less than 6; thus, the system parameter of 3 seconds timeout with 4 preallocated buffers is suitable. But, if the protocol system should be made to accommodate more clients, the timeout and the preallocated buffer values would have to be increased to pro-vide efficient service. 6.2. Evaluation 6.2.1. Analysis of the Performance of the Protocols Although the Basic Block Protocol is implemented in the system kernel, its perfor-mance is inferior to that of the Byte Stream Protocol. One major factor which contri-butes to this performance behaviour is that the basic block protocol has to service a pro-grammed interrupt for each incoming minipacket carrying only 2 data bytes. For a byte stream packet size of 2048 bytes, there are a total of 1024 interrupts generated. Once it is engaged in servicing an incoming minipacket, it is unable to accept other minipackets, Performance -73- Measurements thus causing retransmissions to take place. Furthermore, all incoming packets are stored in preallocated buffers in the kernel address space. When the Byte Stream Layer is ready to process more packets, the byte stream packet is then copied into the user address space for further processing. Therefore, with all these data copyings, it inevit-ably slows down the protocol performance. 
However, the performance of the Byte Stream Protocol implementation using the multi-concurrent processes approach is rather poor with more than 6 clients. This inadequate performance is caused by several factors.

In UNIX 4.2bsd, the IPC communication cost for a message of size 2000 bytes from sender to receiver is 0.01165 second (0.0232998 second round trip), calculated from Figure VI-2. This is quite a major overhead, since many concurrent processes work together to provide a particular service and the most suitable form of communication available to them is the costly IPC primitives. Thus, one way to improve efficiency is to minimize the number of messages sent; another obvious way is to reduce the communication cost so that processes can communicate more cheaply.

The UNIX heavy-weight process is another major source of inefficiency. A heavy-weight process implies that much effort is involved in saving the current values of the environment and restoring the environment of the next-to-run process during process switching, since there is a large amount of essential information associated with each process. In the multi-concurrent processes environment, the rate of process switching is necessarily high. Thus, the overhead cost for a heavy-weight process to do process switching turns out to be an expensive factor in a multi-concurrent processes environment.

Furthermore, as mentioned previously in Section 3.1, process scheduling causes a significant real-time delay in the process-oriented approach. As more processes are present in the system, competition for CPU resources is more intense; thus, the protocol performance is expected to deteriorate further when more users are logged on.

Despite the unsatisfactory efficiency with more than 6 clients, the multi-concurrent processes approach is by no means to be discarded. Though it fails the efficiency test, it succeeds in many other respects. Moreover, one must bear in mind that the operating system is partially responsible for the failure: if the operating system provided an inexpensive form of IPC and light-weight processes, the multi-concurrent processes approach would prove to be a favourable technique.

From the Byte Stream Protocol implementation, we have found that the multi-concurrent processes approach does enhance modularity and comprehensibility. The server code (approximately 1000 lines), along with the data structures, is simplified since it requires no multiplexing among clients. All the other processes (Multiplexor, Reader, Writer) are also very simple, each with approximately 150 - 200 lines of code. Consequently, future maintenance of the protocol system is greatly enhanced.

6.2.2. Brief Comparison to the BSP Implementation on the LSI 11/23

In the current BSP implementation on the LSI 11/23, multiplexing among clients is done within a single server. One server is present at all times to multiplex its services among various clients by storing each client instance in a data structure. Thus, it leads to more complicated server code to handle multiple client instances and timer queue simulation, as compared to the multi-concurrent processes approach. Unfortunately, no performance measurements were collected for the LSI 11/23 implementation for comparison. Even if such data were available, the difference in machine speed would favour the implementation on the SUN workstation.
Thus, we proceed to compare by estimating the performance of the LSI 11/23 implementation on the SUN workstation.

Figure VI-9 : BSP Implementation on the LSI 11/23 [Wong84]

The major difference between these two implementations lies in the number of servers present to provide the byte stream service. For the single-server approach, there would be n + 2 processes (n clients + 1 server + 1 reader) started up to serve n clients, as compared to the 2n + 4 processes needed in the multi-concurrent servers approach. Thus, for all n, there are always twice as many processes competing for CPU resources in the multi-concurrent processes case. With half as many processes running, the UNIX heavy-weight process switching cost is reduced for the single-server approach. However, in an implementation environment where process switching cost is low, the difference in the number of processes used would not be a significant factor.

As a consequence of the single-server approach, two IPC transfers are eliminated for each packet transmission (one with the Writer and one with the Multiplexor). For each IPC transfer of message size 2000 bytes, the communication cost is 0.01165 second, calculated from Figure VI-2. Thus, for each packet transmitted, 0.0233 second is saved, representing a saving of about 3% (from Figure VI-6, it takes about 0.79 second to send a packet from the sender to the receiver).

However, since multiplexing is done within the single server, more instructions are executed in order to access the required client data structure in a linked list and to handle the timer queue simulation. For each client request and basic block arrival, the client node needs to be retrieved to update its status. Also, for each transmitted and received packet, a timer node needs to be inserted into or removed from the timer queue; thus, linked list manipulation is required. On the SUN workstation, it is found to take 0.021 second to traverse a list of 1000 nodes. Therefore, the client node and timer queue manipulations have negligible cost if the number of clients employing the byte stream service is small.

From the above analysis of the differences in implementation, the version on the LSI 11/23 would outperform the multi-concurrent processes approach by at least 3%, because of the reduced process switching cost due to fewer processes and the eliminated IPC communication cost, assuming the rest of the protocol implementation is identical.

In summary, the multi-concurrent processes approach is a suitable technique in an environment with a small number (one to two) of clients. For a large number of clients, multi-concurrent processes perform best in an environment which provides low communication cost and minimal process switching overhead. With such an environment, the task of achieving an easily maintained and yet efficient protocol implementation is alleviated.

CHAPTER VII
CONCLUSIONS

7.1. Thesis Summary

With three months of implementation effort, the final product of the Basic Block and Byte Stream Protocols is currently running on a SUN workstation (ubc-andrew). Of the three-month period, two months were devoted to developing the Byte Stream Protocol; the other month went to the Basic Block Protocol implementation and the integration of the two layers. The layered structure does prove to be quite successful in protocol design, since the integration of the two layers is virtually painless.
In the future, higher layers with more sophisticated functions can be added to strengthen network communication with minimal integration effort.

As discussed in Chapter VI, UNIX 4.2bsd does not provide a very suitable environment for an implementation using the multi-concurrent processes approach. However, UNIX 4.2bsd does provide a very favourable environment for the implementation of the Basic Block Protocol as a device driver. The UNIX 4.2bsd kernel provides many facilities to aid the implementation; for instance, there are interface routines to the user processes, mapping of user address space to kernel address space, hardware interrupt handling, the timeout and untimeout mechanism, the sleep and wakeup mechanism, and the raising and lowering of interrupt priorities for processing critical sections. Moreover, a new device driver is made known to the kernel by merely adding its existence to a few configuration files. Thus, only minimal kernel modification is necessary to add a new device driver. During the implementation phase, no major software problem was encountered except for a few ring interface hardware problems. Implementation was smooth and straightforward.

7.2. Future Work

In this thesis, only the type 0 basic block is implemented. Other basic block types can be added to the implementation easily. Furthermore, the ring driver should be more thoroughly tested to ensure that it is stable under various error conditions. Future improvements to the Byte Stream Protocol implementation include more intensive testing under error conditions and adding more sophisticated error recovery actions to provide better system robustness. Currently, the initial connection of a byte stream communication and the port number assignment within a single host are done in an ad hoc manner; thus, future work is required in this area to provide a less ad hoc scheme for establishing the initial connection and assigning port numbers.

BIBLIOGRAPHY

[Clark82] Clark, D.D., "Modularity and Efficiency in Protocol Implementation", RFC 817, MIT Lab. for Computer Science, Computer Systems and Communications Group, July 1982.

[Clark78] Clark, D.D. et al., "An Introduction to Local Area Networks", Proc. IEEE, Nov. 1978, pp. 1497-1517.

[Coll82] Collinson, R.P., "The Cambridge Ring and UNIX", Software Practice and Experience, Vol. 12, 1982, pp. 24-39.

[Dal81] Dallas, I.N., "Transport Service Byte Stream Protocol", Report No. 1, Revision 3, UKC Computing Lab., University of Kent at Canterbury, Canterbury, August 1981.

[John80] Johnson, M.A., "Ring Byte Stream Protocol Specification", Systems Research Group Paper, University of Cambridge Computing Laboratory, April 1980.

[Kern78] Kernighan, B.W. and Ritchie, D.M., "The C Programming Language", Prentice-Hall, 1978.

[LeFaJo] Leffler, Fabry and Joy, "4.2bsd Interprocess Communications Primer", Computer Systems Research Group, University of California, Berkeley, Draft of March 1983.

[Lions77] Lions, J., "A Commentary on the UNIX Operating System", Dept. of Computer Science, University of New South Wales, 1977.

[Need79] Needham, R.M., "Systems Aspects of the Cambridge Ring", Cambridge University, ACM, 1979.

[NeHe82] Needham, R.M. and Herbert, A.J., "The Cambridge Distributed Computing System", Addison-Wesley, 1982, pp. 24-39.

[RiTh78] Ritchie, D.M. and Thompson, K., "The UNIX Time-Sharing System", Bell Sys. Tech. J., Vol. 57, No. 6, July 1978, pp. 1905-1929.
[Sun84] Sun System Internals Manual, "Writing Device Drivers for the Sun Workstation", Revision C of 7 January 1984.

[Tanen81] Tanenbaum, A.S., "Computer Networks", Prentice-Hall Inc., Englewood Cliffs, N.J. 07632, 1981.

[Toltec] Toltec Computer Limited, "Toltec's Cambridge Data Ring", pp. 1-12.

[Thom78] Thompson, K., "UNIX Implementation", Bell System Technical Journal, Vol. 57, No. 6, July 1978, pp. 1931-1946.

[WiWh79] Wilkes, M.V. and Wheeler, D.J., "The Cambridge Digital Communication Ring", Proc. Local Area Communication Network Symposium, Mitre Corp., May 1979.

[Wong84] Wong, J.H.P., "The Design and Implementation of Cambridge Ring Protocols", M.Sc. Thesis, Department of Computer Science, University of British Columbia, May 1984.
