LNTP - THE IMPLEMENTATION AND PERFORMANCE OF A NEW LOCAL AREA NETWORK TRANSPORT PROTOCOL

By JAMES BERESFORD ROBINSON
B.A.Sc., The University of Toronto, 1981

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES (DEPARTMENT OF COMPUTER SCIENCE)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
August 1987
© James Beresford Robinson, 1987

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Computer Science
The University of British Columbia
1956 Main Mall
Vancouver, Canada V6T 1Y3

Abstract

In the past it has been convenient to adopt existing long haul network (LHN) protocols for use in local area networks (LANs). However, due to the different operating parameters that exist between these two types of networks, it is not possible for a LHN protocol to fully exploit the characteristics of a LAN. Thus, the need arises for a protocol designed specifically for use in a LAN environment. LNTP is one such transport level protocol. It was designed for exclusive use in LANs, and thus does not incorporate those features which are not relevant to a LAN environment. The result of this is a simpler and more efficient protocol. As well, LNTP employs a novel deferred flow control strategy which minimizes the time that a transmitting process will be blocked.

This thesis examines the implementation of LNTP in the 4.2 BSD UNIX operating system. Various measurements are taken, and LNTP's performance is compared to that of TCP/IP, a LHN protocol which is often used in LAN environments. Several formulas are developed to determine the optimum values for various LNTP parameters, and these theoretical results are compared to the experimentally observed values. We conclude that LNTP does indeed outperform TCP/IP. However, due to the overhead of the non-LNTP specific protocol layers, this improvement is not as great as it might be. Nonetheless, LNTP proves itself to be a viable replacement for TCP/IP.

Table of Contents

Abstract
List of Tables
List of Figures
Acknowledgement
1 Introduction
1.1 Goals
1.2 Motivation
1.3 Thesis Summary
2 LNTP Design and Description
2.1 Design Decisions
2.2 Salient LNTP Features
2.2.1 Unsupported Features
2.2.2 Error Control
2.2.3 Checksums
2.2.4 Addressing
2.2.5 Packet Sequencing
2.2.6 Protocol Decoupling
2.2.7 Flow Control
2.2.8 Timer Structure
2.3 Description of LNTP
2.3.1 Packet Structure
2.3.2 Functional Components
2.3.2.1 Connection Establishment Phase
2.3.2.2 Execution Phase
2.3.2.3 Close Down of Connection Phase
2.3.3 Error Recovery
2.3.4 Flow Control
2.3.5 Connection Surveillance
2.4 Chapter Summary
3 System Environment
3.1 The Hardware Environment
3.1.1 The Processor
3.1.2 The Network
3.2 The Software Environment
3.2.1 The User Interface
3.2.2 Internal Layering
3.2.2.1 The Socket Layer
3.2.2.2 The Network Interface Layer
3.2.2.3 The Protocol Layer
3.3 Chapter Summary
4 The Protocol Implementation
4.1 General Protocol Implementation
4.2 Memory Management
4.3 Ethernet Interface Considerations
4.4 The Protocol Switch and Domain Tables
4.5 The Protocol Layer
4.6 Miscellaneous Implementation Problems
4.6.1 Kernel Debugging
4.6.2 Lack of Source Code
4.7 Chapter Summary
5 LNTP Timing Measurements
5.1 The Experimental Environment
5.2 Timing Measurements
5.3 Problems Encountered
5.4 Chapter Summary
6 LNTP Parameters
6.1 General
6.2 Buffer Sizes
6.3 Flow Control Parameters
6.3.1 Ntx
6.3.2 Ntr1
6.3.3 Ntr2
6.4 Effect of Checksumming on LNTP's Performance
6.5 Chapter Summary
7 Concluding Remarks
7.1 Summary
7.2 Possible Improvements
7.3 Future Work
Bibliography

List of Tables

5.1 Protocol Data Packet Processing Times
5.2 Ethernet Packet Processing Times
6.1 Throughput vs. Buffer Size
6.2 Throughput vs. Ntx (Sender Speed equals Receiver Speed)
6.3 Throughput vs. Ntx (Sender Speed exceeds Receiver Speed)
6.4 Throughput vs. Ntr1 (Sender Speed equals Receiver Speed)
6.5 Throughput vs. Ntr1 (Sender Speed exceeds Receiver Speed)
6.6 Throughput vs. Ntr2 (Sender Speed equals Receiver Speed)
6.7 Throughput vs. Ntr2 (Sender Speed exceeds Receiver Speed)

List of Figures

6.1 Throughput vs. Buffer Size

Acknowledgement

I would like to thank my supervisor, Dr. Sam Chanson, for his guidance and for the amazing amount of patience he has shown with me, and Dr. Gerald Neufeld for reading the final draft of this thesis. As well, I would like to thank K. Ravindran for his many helpful suggestions concerning LNTP, and Don Acton and Huay-Yong Wang for the readiness with which they shared their knowledge of computer systems with me.

CHAPTER 1
Introduction

This thesis describes the design, implementation, and performance of LNTP, a new transport level protocol for local area networks. The thesis goals and the motivation behind the creation of LNTP are considered in this chapter, and a brief look is taken at the protocol's design philosophy. Also, an outline of the entire thesis is presented.

1.1. Goals
In this thesis we examine the design, implementation, and performance of a new transport level protocol, known as LNTP, which was designed specifically for local area networks (LANs).
The protocol was implemented on a network of SUN-2 workstations running 4.2 BSD UNIX which were interconnected over an Ethernet [METC76]. Various performance measurements were taken and parameter tuning was performed. The throughput of the tuned protocol was then compared to that of DARPA's Transmission Control Protocol/Internet Protocol (TCP/IP) [POST81 and POST81b], which was originally designed for long haul networks (LHNs) but which is often used in LAN environments and comes with 4.2 BSD UNIX.

1.2. Motivation
The need for a protocol such as LNTP is readily understood if one considers the different operating parameters that characterize LANs as compared to LHNs [STAL84 and HOPP86]. Typically, relative to LANs, LHNs have a low bandwidth, a high error rate, a high propagation delay, and often have to deal with a hostile environment. In contrast, LANs offer a much higher bandwidth, a lower error rate, a lower propagation delay, and a controllable environment. In the past it has often been convenient to merely use an existing LHN protocol in a LAN environment. This "quick and dirty" approach, although possibly appropriate at the time, should now be given the burial that it deserves, as there is no valid reason not to design and implement LAN protocols which exploit the potentials of the different environment to give better performance.

When dealing with LHNs it is often the case that the network itself is the bottleneck, as would be expected with a medium that can span thousands of kilometres and is by its very nature the slowest component in a transport connection. Coupling this fact with the error prone nature of these networks results in protocols which are not designed with the goal of efficiency in mind. Instead, the LHN protocol designer invariably must choose to concentrate his attention on the creation of a robust, and thus complicated, error recovery scheme with mechanisms for congestion and flow control.

LNTP, on the other hand, takes almost the opposite approach. Simplicity and efficiency were the major design goals. The result of this is a protocol which is not only relatively easy to understand and maintain but also is characterized by a low processing time. In order to achieve these twin goals the designers of LNTP omitted any and all features which were irrelevant to the LAN environment, or which were rarely required, or which could be taken over by the application program itself. Thus, checksumming, for example, is specified as an option since LANs typically perform this operation in hardware. TCP/IP, however, not only does checksumming at the network level (IP), but also at the transport level (TCP) - an unnecessary expenditure of CPU cycles. Our original goal was to replace TCP/IP with a more efficient transport level protocol that will take advantage of the characteristics of local area networks.

At this point the existence of other "light weight" transport protocols should be noted. Both DARPA's User Datagram Protocol (UDP) [POST80] and the Versatile Message Transaction Protocol (VMTP) [CHER86] fall into this category.
UDP (which assumes that IP is used as the underlying protocol) is transaction oriented. It provides only a datagram service with no flow control and no guarantee of delivery. VMTP is basically a request-response protocol. It uses a message transaction model of a session and is an improvement over UDP in that it has a simple form of flow control as well as reliable delivery. It is designed to be used in distributed systems and assumes inter-node communications to be dominated by page-level file access, application-level remote procedure call, datagrams and multicasts. Most conventional operating systems do not fit this description. Furthermore, because it is a request-response protocol it is not by itself suitable for applications such as file transfer. The user must implement a higher level protocol on top of VMTP in order to achieve this end. This also applies to UDP; however, this protocol's limited nature results in an even more complicated user level protocol being required. Thus, it would appear that neither of these light weight protocols satisfies our objectives. Among other things, LNTP attempts to strike a balance between simplicity and functionality.

1.3. Thesis Summary
The remainder of this thesis will consist of:

Chapter 2: The design philosophy of LNTP and a description of the protocol.
Chapter 3: A description of the hardware and software environment provided by the SUN network in which our version of LNTP runs.
Chapter 4: The implementation of a protocol such as LNTP in 4.2 BSD UNIX, as well as the problems that were encountered in performing this particular implementation.
Chapter 5: An examination of the amount of time spent in the various protocol layers for both LNTP and TCP/IP.
Chapter 6: The effect of various parameters such as buffer size on LNTP's performance.
Chapter 7: Summary and recommendations for future considerations.

CHAPTER 2
LNTP Design and Description

This chapter examines the decisions influencing the design of LNTP. Specifically, it covers the reasons for the inclusion and exclusion of various features normally associated with protocols. As well, a brief description of LNTP will be included. The material for this chapter was based on [CHAN85].

2.1. Design Decisions
As mentioned previously, the operating environment of a LAN is significantly different from that of a LHN. This difference in operating environment results in a difference in design philosophy. Whereas the traditional LHN protocol was mostly concerned with error recovery, thus necessitating costly recovery algorithms, LNTP's underlying philosophy is simplicity. The result is not only improved understandability and ease of maintenance, but also a reduction in processing time. Thus, LNTP does not include any protocol features which are not strictly relevant to a LAN environment. Also, those functions which are only rarely needed, especially if they can be achieved by the application program, are excluded. Another design decision was to exploit the properties of the underlying network and the application environment as much as possible.

2.2. Salient LNTP Features

2.2.1. Unsupported Features
LNTP makes no provision for congestion control, routing, packet fragmentation/reassembly, packet self-destruction, or service options. These functions are typically irrelevant to a LAN environment and are thus unnecessary.
2.2.2. Error Control
The only mandatory error control feature of LNTP is the retransmission scheme, the main purpose of which is to handle packet loss due to buffer overflows. In keeping with the LNTP philosophy of simplicity, a go back n strategy is employed. (It should be noted that the original LNTP design called for the more complicated selective repeat strategy.) Since the bit error rate in a LAN is extremely low, of the order of 1 in 10^11 to 10^12 [CLAR78 and COTT80], more than likely any error recovery being done is a result of buffer overflow rather than due to a lost or damaged packet. Also, since once a buffer overflows it is quite likely that more than one packet will have to be rejected and resent, the go back n strategy represents an uncomplicated scheme which does not result in an undue loss of throughput.

2.2.3. Checksums
Checksumming is specified as an option as the possibility exists that the network interface performs this function in hardware.
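When the software option is enabled, the packet carries a ones complement checksum over the whole packet (the CHKSUM field described in section 2.3.1). The following is a minimal sketch of such a checksum in C, in the style of the familiar Internet checksum; the routine name and the handling of an odd trailing byte are illustrative assumptions and are not part of the LNTP specification.

    /* Ones-complement sum over 'len' bytes; returns the complemented result.
     * A packet whose CHKSUM field was filled in this way sums to 0xffff when
     * the receiver recomputes the sum over the whole packet. */
    unsigned short
    lntp_cksum(unsigned char *buf, int len)
    {
            unsigned long sum = 0;

            while (len > 1) {                       /* add 16-bit words */
                    sum += (buf[0] << 8) | buf[1];
                    buf += 2;
                    len -= 2;
            }
            if (len == 1)                           /* pad an odd trailing byte */
                    sum += buf[0] << 8;
            while (sum >> 16)                       /* fold carries back in */
                    sum = (sum & 0xffff) + (sum >> 16);
            return (unsigned short)~sum;
    }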
2.2.4. Addressing
The LNTP address space contains no internet portion and can thus be kept to a relatively short 16 bits. (Note that at present TCP/IP style addressing is still used so as to take advantage of the existing 4.2 BSD UNIX support; this results in a 32 bit internet address as well as a 16 bit port number.) This 16 bit address is subdivided into a host number and a local port number. The allocation of bits into these two fields is site dependent. One example is assigning 6 bits to the host number, providing for a maximum of 64 nodes, and assigning the remaining 10 bits to the local port number, resulting in a maximum of 1024 ports per node.

2.2.5. Packet Sequencing
LNTP employs a packet level sequence number scheme, thereby allowing the sequence number space to be kept small (4 bits). This scheme can be justified over the byte level sequencing with its correspondingly large number space used in TCP by noting that a large recycling interval is unnecessary in a non-multihop network where it is not possible for a "lost" packet to eventually show up. Another reason for using byte level sequencing in TCP is that it makes the coalescing of fragmented packets easier. Since packet fragmentation in a LAN environment is a non-issue there is no justification for not using the simpler packet sequencing scheme.

2.2.6. Protocol Decoupling
For the sake of simplicity and hence reduced processing time, the sender and receiver sections of LNTP are logically separated. This precludes the possibility of the receiver piggybacking control information on reverse data packets. However, since network bandwidth is not scarce in a LAN environment, no real hardship is caused.

2.2.7. Flow Control
The flow control mechanism employed in LNTP is quite novel and was designed with the intention of maximizing parallelism between the sender and receiver. The major thrust behind this scheme is that flow control is not initiated until certain thresholds have been crossed. Once again LNTP's simple design is observed when it is considered that there are but three flow control packets whose functions are simple and straightforward:

PRMPT: request information on the latest in-sequence packet received.
HOLD(n): stop sending (acknowledges up to packet n-1).
RESUME(n): start sending (acknowledges up to packet n-1).

The flow control of LNTP works as follows. Thresholds are defined in the windows of the sender and receiver. On the sender side, as long as the number of outstanding packets is less than its threshold (which should be true most of the time in normal operation) the sender does not initiate control. Once the threshold is exceeded, at the availability of the next data packet the sender initiates control by piggybacking the PRMPT control on the data packet. Similarly, on the receiver side, as long as the number of packets waiting to be consumed is below its threshold, the receiver does not acknowledge (unless an out-of-order packet arrives, which causes a RETXMT(n) to be sent, or unless the receiver is prompted by the sender). This deferred flow control scheme reduces the volume of control packets flowing in the network. When its threshold is exceeded, the receiver sends a HOLD(n) to suspend transmission. The receiver accepts any packets that might be in transit before the HOLD control is acted upon by the sender; thus the HOLD must be sent before the receiver's buffers are exhausted (see section 2.3.4). This feature maximizes the parallel operations of the sender and the receiver and minimizes packet loss. Furthermore, the ability of a heavily loaded receiver (loaded by unrelated host activities) to directly suspend the sender from sending more packets provides a certain degree of adaptation to the load at the receiver's end. When the receiver exits from the flow-controlled region, a hysteresis is provided in the sequence number space to absorb the sudden surge of packets that might be sent by the sender when it is unblocked. This eliminates the possibility of the receiver oscillating in and out of the flow-controlled region, a phenomenon known as the ping-pong effect.

2.2.8. Timer Structure
Every outgoing data packet initiates a timer which is reset either by the next outgoing data packet (the timer is also restarted) or by the arrival of an acknowledgement. When the timer expires (Ts), the packet is retransmitted. There is an associated timer at the receiver which is reset and restarted by every incoming packet; on time out (Tr), an explicit acknowledgement is sent. The timer values Ts and Tr must be chosen to satisfy the inequality:

    Ts > Tr + Ett(dat) + Ett(ack)        (2.1)

where Ett(dat) and Ett(ack) are the packet end-to-end transfer times of a data packet and of an acknowledgement. This timer has lower overhead than the traditional one in most protocols. For stream traffic, the next outgoing/incoming packet resets Ts/Tr so that no extra network traffic is generated. For interactive type traffic, the timers guard against possible deadlocks that might arise due to packet loss. This simpler scheme is not suitable for LHNs because the variance of Ett is high, necessitating the choice of a large value for Ts which makes the scheme inefficient.

Another motivation behind the design of this timer structure is simplicity. Almost all other protocols specify a separate timer for each outstanding packet, resulting in the need to manage multiple logical timers. LNTP's simpler approach, on the other hand, requires the use of only one logical timer.
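The single-timer discipline just described can be sketched as follows. This is an illustrative fragment only; the routine and variable names (lntp_send_data, lntp_slowtimo, and so on) are assumptions made for the example and are not taken from the LNTP sources.

    /* One logical retransmission timer per connection: each outgoing data
     * packet restarts it, an acknowledgement stops it, and expiry (after Ts)
     * triggers a go back n retransmission request. */
    struct lntp_conn {
            int     snd_una;        /* oldest unacknowledged packet */
            int     snd_nxt;        /* next sequence number to send */
            int     timer_ticks;    /* 0 = timer not running        */
    };

    #define TS_TICKS 4              /* Ts expressed in clock ticks (example value) */

    void lntp_send_data(struct lntp_conn *cp)
    {
            /* ... queue the packet on the network interface ... */
            cp->timer_ticks = TS_TICKS;     /* reset and restart the timer */
    }

    void lntp_ack_received(struct lntp_conn *cp, int ackno)
    {
            cp->snd_una = ackno;
            cp->timer_ticks = 0;            /* acknowledgement stops the timer */
    }

    void lntp_slowtimo(struct lntp_conn *cp)        /* called on each clock tick */
    {
            if (cp->timer_ticks && --cp->timer_ticks == 0) {
                    /* Ts expired: retransmit from snd_una, sending DATA+PRMPT
                     * to elicit an acknowledgement from the receiver. */
            }
    }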
As well, it should be noted that PRMPT and RESUME packets have associated timers. The RESUME timer is especially important as its purpose is to guard against deadlock in the event of the loss of a RESUME packet. Associating a timer with the RESUME packet works best in situations where there is a constant flow of data from the sender to the receiver (e.g., file transfer). To see why, consider the case where the receiver issues a RESUME but the sender has nothing to send. The sender receives the RESUME but does not acknowledge it. The RESUME timer times out and the receiver reissues the RESUME packet. This will continue until the sender finally has data to transmit. If, on the other hand, the sender always has data to send, this inefficiency will not occur.

An alternative approach would be to have the sender send the next packet it intends to transmit. There would be a timer associated with this packet, resulting in its retransmittal upon timeout. The receiver would continually receive (and possibly discard) the packet and emit a HOLD packet (assuming it is still in HOLD state). This would, however, only happen if the sender has data to send. If there is no data to send there will be no exchange of packets. Thus, this approach would be more efficient than the former in situations, such as remote login, where the data flow is inconsistent. The former approach was adopted by LNTP because it maximizes efficiency in the situation where it counts the most: the transfer of large quantities of data.

2.3. Description of LNTP
A brief description of LNTP is given below in terms of packet structure and the functional components of the protocol.

2.3.1. Packet Structure
The format of LNTP packets is denoted by a 'C'-like structure:

    pkt = struct {
        SRCADRS   16 bits;
        DSTADRS   16 bits;
        PKTID      4 bits;
        SEQNO      4 bits;
        OPT        8 bits;
        LEN       16 bits;
        CHKSUM    16 bits;
        SADDR     32 bits;   /* temporary */
        DADDR     32 bits;   /* temporary */
        DATA      <LEN> bytes;
    }

pkt.SRCADRS: Port address associated with the sending peer.
pkt.DSTADRS: Port address associated with the receiving peer.
pkt.PKTID: Packet identifier. The different types of packets are: DATA, DATA+PRMPT, PRMPT, RESUME, HOLD, RETXMT, CREQ, CCONF, DREQ, and DCONF.
pkt.SEQNO: Sequence number of the data packet from the sender. Wrap around occurs after every MAX_SEQNO packets. For a control packet sent by the receiver, the value of this field is one plus the sequence number of the in-sequence packet last received correctly.
pkt.OPT: Options flag. Presently only one bit is used, indicating whether the packet includes a valid checksum.
pkt.LEN: Data length in bytes.
pkt.CHKSUM: Ones complement checksum of the entire packet, including header, if the checksum option is used; else 0.
SADDR: The internet address of the source host. This field is temporary, and is to be removed when sufficient LNTP support becomes available.
DADDR: The internet address of the destination host. This field is temporary, and is to be removed when sufficient LNTP support becomes available.
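The abstract layout above maps naturally onto a C header declaration. The sketch below is only one possible rendering; the structure and field names, and the choice of which nibble holds PKTID and which holds SEQNO, are assumptions for illustration rather than the declaration used in the LNTP sources, but the field widths follow the list above.

    #include <sys/types.h>

    /* Illustrative LNTP packet header; field widths follow section 2.3.1. */
    struct lntp_hdr {
            u_short srcadrs;        /* port address of sending peer          */
            u_short dstadrs;        /* port address of receiving peer        */
            u_char  pktid_seqno;    /* packet type (high 4 bits) and         */
                                    /* sequence number (low 4 bits)          */
            u_char  opt;            /* options flag (checksum-valid bit)     */
            u_short len;            /* data length in bytes                  */
            u_short chksum;         /* ones-complement checksum, or 0        */
            u_long  saddr;          /* temporary: internet source address    */
            u_long  daddr;          /* temporary: internet destination addr. */
            /* <len> data bytes follow the header */
    };

    #define LNTP_PKTID(h)  (((h)->pktid_seqno >> 4) & 0x0f)
    #define LNTP_SEQNO(h)  ((h)->pktid_seqno & 0x0f)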
The variables describing the protocol machine are:

MAX_SEQNO: Maximum sequence number space (16).
SND.NXT: Sequence number of the next packet to be sent.
SND.UNA: Sequence number of the first packet in the sender waiting to be acknowledged.
Ntx: Sender's threshold in the protocol window.
RCV.NXT: Sequence number of the next packet to be received.
RCV.CNS: Sequence number of the first packet in the receiver waiting to be consumed.
Ntr1: Threshold value when the receive window decreases.
Ntr2: Threshold value when the receive window increases.

Two timers are defined (it is the same timer used differently at two different times):
(1) A timer is associated with the packets in the flow controlled region at the sender.
(2) Each of the outgoing packets at the sender also resets and restarts a timer (which expires after Ts) in the non flow-controlled region. The timer is reset (but not restarted) by an acknowledgement packet.

Similarly, there is a timer at the receiver which is reset and restarted by each incoming packet (the receiver timer expires after Tr). The expiry of Ts provokes a DATA+PRMPT from the sender. The expiry of Tr causes the receiver to send an acknowledgement. Ts and Tr must be chosen to satisfy the inequality:

    Ts > Tr + Ett(dat) + Ett(ack)

where Ett(dat) is the packet end-to-end transfer time.

2.3.2. Functional Components
The protocol is divided into three phases:
(1) Connection establishment.
(2) Execution.
(3) Close down of connection.

2.3.2.1. Connection Establishment Phase
The connection set up is fairly standard. A connect request may come from one of two sources:
(1) A local client may initiate a connection.
(2) A remote peer may initiate a connection request in a CREQ packet.
The peer that accepts the connection in response to the CREQ sends a CCONF packet, confirming the connection. A timer handles loss of CREQ and CCONF packets.

2.3.2.2. Execution Phase
This phase is concerned with the orderly transfer of data from the sender to the receiver. The sending peer can be in one of three states:
(1) NORMAL state
(2) PRMPT_SENT state
(3) HOLD state
In the NORMAL state, only data packets flow to the receiving peer. The transition to PRMPT_SENT state occurs when the window size becomes less than Ntx. This provokes a PRMPT control packet. In this state the protocol can still accept packets from the client layer and send DATA+PRMPT packets. In the HOLD state, no packets are sent except those provoked by a RETXMT(n) packet. The types of packets the sending peer may send are:
(1) DATA packets
(2) DATA+PRMPT packets
(3) PRMPT control packets
The function of the PRMPT packet is to force an acknowledgement packet from the receiving peer.

The receiving peer can exist in one of three states:
(1) NORMAL state
(2) RETXMT_SENT state
(3) HOLD_SENT state
In the NORMAL state, packets are accepted without generating any acknowledgement unless an out-of-order packet arrives, which provokes an immediate RETXMT(n) control packet. The receiver continues to accept packets in the HOLD_SENT state until its window closes. Three types of packets may be sent by the receiving peer:
(1) RETXMT(n) packet
(2) RESUME(n) packet
(3) HOLD packet
where 'n-1' is the sequence number of the last packet that has been received correctly. Their functions are:

RETXMT(n): Advises the sender to retransmit all packets subsequent to sequence number 'n-1'. This is sent when an out-of-order packet arrives at this layer. All out-of-order packets are discarded.
RESUME(n): Advises the sender to resume transmissions starting with the packet with sequence number 'n'. This is used when sufficient window space is available to reenable the sending peer. This is also sent in response to a PRMPT control packet when packets up to 'n-1' have been received correctly and the receiver is expecting packet 'n'.
HOLD(n): Instructs the sending peer to hold transmission until instructed to resume. This is sent when the window size falls below a threshold.

2.3.2.3. Close Down of Connection Phase
Before each peer can enter the DISCONNECTING state for the actual disconnect sequence, they must be brought to a consistent state. The protocol defines a FLUSHING state which is an intermediate state between the current state when the disconnect request arrives and the DISCONNECTING state. The disconnect request might originate from the local process or from the remote peer via a DREQ packet. There are three situations that the protocol might be in when a disconnect request arrives:
(1) Packets waiting to be sent. These are sent with PRMPT piggybacked on the last data packet.
(2) No packet waiting to be sent but packets are being held waiting to be acknowledged. PRMPT packets are sent to elicit acknowledgement.
(3) No packet waiting to be sent or acknowledged. In this case, the protocol immediately moves to the DISCONNECTING state, in which the sender sends a DREQ to the receiver and moves to the CLOSED state.
The protocol enters the FLUSHING state in the first two cases above. The rest of the disconnect phase is standard, and handled in a way similar to the connection establishment phase. The peer which accepts the disconnect request sends a DCONF packet confirming the disconnection. Timeout is provided to guard against lost packets.

2.3.3. Error Recovery
Recovery from various error conditions is performed as indicated:

Out-of-order packets: RETXMT control packets are used to initiate go back n retransmissions (missing packets are detected using sequence numbers assigned to packets).
Damaged packets: RETXMT packets (the error is detected by checksum validation).
Lost packets: RETXMT packets (the error is detected by timeouts provided in the different phases).

2.3.4. Flow Control
This is achieved by means of PRMPT packets from the sender, and HOLD and RESUME packets from the receiver. The sender's window size is defined as:

    Sender's window size (SWS) = MAX_SEQNO - (SND.NXT - SND.UNA)

(N.B. modulo arithmetic is used in the computation of sequence numbers and window size with MAX_SEQNO as the modulus.) Two regions are defined:

    SWS >= Ntx   ... Region 1
    SWS <  Ntx   ... Region 2

No flow control is initiated from the sender in region 1. Once the sender enters region 2 it initiates flow control measures while maintaining the data flow until the protocol stops when the window is closed. The receiver's window is defined as:

    Receiver's window size (RWS) = MAX_SEQNO - (RCV.NXT - RCV.CNS)

where RCV.NXT <= sequence number < RCV.CNS. Two regions are defined for RWS:

    RWS >= Ntr1   ... Region 1
    RWS <  Ntr1   ... Region 2

when the receiver is in the NORMAL state, and

    RWS >= Ntr2   ... Region 1
    RWS <  Ntr2   ... Region 2

when the receiver is in the HOLD_SENT state. (Note: in all cases, Ntr2 < Ntr1 < MAX_SEQNO.) The point at which flow control takes effect is:
(i) the time at which RWS crosses Ntr1, or
(ii) the time at which SWS crosses Ntx,
whichever occurs first. The hysteresis (Ntr1 - Ntr2) is needed to absorb any transient surge in packet arrivals when the sender moves from the HOLD state to the NORMAL state, thereby avoiding any ping-pong effect.

2.3.5. Connection Surveillance
This is provided by means of a simple asynchronous protocol running on top of LNTP at both the sender and the receiver. The protocol periodically exchanges AYT (acronym for "Are You There?") control packets in a symmetric fashion to ascertain that the connection is still alive. Absence of AYT from the other end after repeated attempts results in a forced shutdown of the connection.

2.4. Chapter Summary
In the past, LHN protocols have been used in LAN environments. However, LHN protocols contain many features which are irrelevant to a LAN environment. By taking this fact into consideration, it is possible to design a LAN protocol which is not only much simpler than the typical LHN protocol, but is also more efficient. In this chapter we examined LNTP, a new protocol designed specifically for use in a LAN environment, and attempted to show why this protocol should outperform a LHN protocol, such as TCP/IP, when used in the environment for which it was designed. LNTP's main features include the absence of elaborate error detection and correction mechanisms, simplified addressing, sequencing, and timer management schemes, and a flow control mechanism that minimizes interruption in the normal data flow and hence the use of control packets.

CHAPTER 3
System Environment

Not surprisingly, the system environment in which LNTP was developed can be divided into two areas: the hardware environment and the software environment. The hardware environment consisted of a number of SUN-2/120 workstations and a VAX-11/750 connected by a 10 Mbps Ethernet. All computers on the network ran 4.2 BSD UNIX. Since the implementation of a protocol is for the most part concerned with the software environment, this will be the focus of the chapter. In particular, the application interface, the socket layer, the network interface layer, and the protocol layer will be examined. A brief overview of the hardware environment is also included.

3.1. The Hardware Environment

3.1.1. The Processor
As previously mentioned, the workstation used for protocol development was the SUN-2/120. This workstation is based on a 10 megahertz Motorola 68010 processor and typically contains 2 to 4 megabytes of main memory. The workstations used in development and measurement experiments were diskless and thus depended upon a disk server to perform all disk I/O for them.

3.1.2. The Network
The local area network connecting the workstations was a 10 Mbit/s Ethernet [STAL85]. This network is based on a coaxial cable onto which each station is connected. Prior to attempting to transmit on the network a station first "listens" to determine whether any other station is already transmitting. If it appears that the network is free, the station in question commences transmitting. However, propagation delay makes it possible for a collision to occur. In order to detect this as soon as possible the station monitors the cable during transmission. If it should detect a collision it terminates transmission, waits a random amount of time, and then attempts to retransmit the aborted packet. This strategy represents a significant throughput improvement over those which do not monitor the cable during transmission. Not only will such systems send the entire packet although a collision has rendered it invalid, they must rely on a time-out or negative acknowledgement mechanism to detect the error in order to retransmit, a process which may take some time. The Ethernet, therefore, falls into the Carrier Sense Multiple Access with Collision Detection (CSMA/CD) class of networks. The Ethernet controller used by the SUNs was based on the Intel 82586 local communications controller chip.

An Ethernet packet consists of a 6 byte destination address field, a 6 byte source address field, a two byte type field which is used to demultiplex incoming packets to the correct protocol (e.g., the Internet Protocol (IP)), and a variable length message. The message can be up to 1500 bytes in length, but not less than 46 bytes. The reason for a minimum message length is that it is necessary to impose a minimum transmission time in order for collisions to be detectable during message transmission. This minimum transmission time must be at least as long as the end-to-end round-trip propagation delay of the network. The 46 bytes is based on a 10 Mbps transmission rate and an end-to-end distance of 1 km. Ethernets with other characteristics will have different minimum packet length requirements. Upon transmission the transceiver generates a 32 bit CRC checksum which is appended to the packet. When the packet reaches the receiver the checksum is recomputed and checked against the received checksum. If the checksums do not match, the packet is discarded as it has suffered corruption. Packets can also be lost if the receiver's buffers are full.

3.2. The Software Environment

3.2.1. The User Interface
The software environment in which protocol development was carried out was that provided by 4.2 BSD UNIX. In 4.2 BSD UNIX the basis of all communication is the socket, which represents one end of a communication channel. When creating a socket it is necessary to specify a socket type and a communication domain. The socket type merely determines the type of communication in which the program wishes to engage. The two most common socket types are the stream socket and the datagram socket. As suggested by its name, the stream socket provides a perfect channel in which all data that is written to the socket by the sender is guaranteed to arrive at the receiver with no errors or duplicates, and in the same sequence that it was written. Also, there are no record boundaries, merely a stream of data bytes, and the connection is full duplex. The datagram socket, on the other hand, is concerned with sending messages as distinct units and makes no guarantees regarding sequencing of the messages, lost packets, or duplicate packets. Unlike the stream socket, the datagram socket is not byte oriented. Of course, protocols supporting stream sockets are by necessity significantly more complicated than those supporting merely datagram sockets, since the former must be able to provide error detection and correction and flow control, as well as packet sequencing. TCP and LNTP are both protocols which support the stream socket abstraction.

In simple terms the communication domain specifies a naming convention which sockets may use when referring to each other. In vanilla 4.2 BSD UNIX there are two domains: the UNIX domain and the Internet domain. In accordance with long standing UNIX practice, UNIX communication domain sockets are named with UNIX path names, e.g. "/tmp/foobar". Sockets of this type, however, allow communication only between processes on the same machine, as might be expected with a naming convention that was never meant to span machines. On the other hand, names drawn from the Internet communication domain specify a particular application process (as specified by the port number - a 16 bit integer), as well as a particular machine (as specified by the machine address - a 32 bit integer). The reason that the machine address is so large is because the Internet communication domain was designed to provide addressing over the ARPANET, a long haul network that spans the United States and extends into several other countries. Since LNTP is a local area network protocol, it is not necessary for its address space to be as large as that of the Internet domain.

The following inter-process communication (IPC) system calls are available in 4.2 BSD UNIX:

socket: create a socket
bind: bind a name to a socket
connect: initiate a connection on a socket
listen: listen for a connection on a socket
accept: accept a connection on a socket
send, sendto, and sendmsg: send a message from a socket
recv, recvfrom, and recvmsg: receive a message from a socket
shutdown: shut down part of a full duplex connection
select: synchronous I/O multiplexing

As well, the following familiar UNIX system calls work with sockets:

read: read from a socket
write: write to a socket
close: close a socket

For a full explanation of the use of these calls see [LEFF83].
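The usual client-side call sequence built from these primitives is sketched below for an Internet-domain stream socket; an LNTP stream socket would be created the same way, naming the local net domain instead of AF_INET. The helper name open_stream is an assumption for this example, and error handling is kept minimal.

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    /* Create a stream socket and connect it to the given machine address
     * and port number (both supplied in network byte order). */
    int
    open_stream(u_long host, u_short port)
    {
            struct sockaddr_in sin;
            int s;

            s = socket(AF_INET, SOCK_STREAM, 0);    /* create a stream socket */
            if (s < 0)
                    return (-1);

            bzero((char *)&sin, sizeof(sin));
            sin.sin_family = AF_INET;
            sin.sin_addr.s_addr = host;             /* 32 bit machine address */
            sin.sin_port = port;                    /* 16 bit port number     */

            if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
                    close(s);
                    return (-1);
            }
            return (s);     /* ready for read()/write() on the connection */
    }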
I n t e r n a l L a y e r i n g A s can be inferred f r om the preceding sect ion, the user interface to the U N I X network ing faci l i t ies is bo th r ich and somewhat bewi lder ing. Needless to say, when add ing a new protocol it is necessary that these same I P C system calls be able to interface di rect ly w i th it . In order to accompl ish this goal i t is necessary to look below the user level. 26 Basically, the internal structure of the networking software in this environment can be divided into three layers. There is the socket layer, which is concerned with the implementation of the socket abstraction and interfaces with the application software; there is the protocol layer, which is concerned with the implementation of the communication protocol; and lastly, there is the network interface layer which is concerned with transmitting and receiving packets to/from the network. 3.2.2.1. The Socket Layer The socket layer is totally protocol independent. Therefore it is unnecessary to modify this layer when adding a new protocol. As long as the protocol in question correctly advertises its various characteristics (e.g., whether it requires a connection or not) the socket layer will function correctly unaltered. The reason for this is that it is the socket layer that defines the interface that the protocol must adhere to, and not the other way around. 3.2.2.2. The Network Interface Layer The network interface layer needs some knowledge of the protocol in order to function correctly. Specifically, there must be a separate ethernet type id for each available domain as well as a separate input queue. Whenever a packet that is destined for a given station is detected by that station, an interrupt is raised resulting in an interrupt handler being invoked. The incoming packet interrupt handler, after removing a packet from the network, would look at the type field to determine which domain the packet is associated with, and thus, which one of the several input queues to place the packet. After enqueueing the packet, which has had its ethernet header stripped off, the interrupt handler executes a software trap which 27 invokes the protocol 's queue handl ing rout ine. Because this t rap is at a lower pr ior i ty than the in terrupt handler the handler continues to operate, possibly removing more packets f rom the network, un t i l i t vo luntar i ly rel inquishes contro l . A t th is po in t , i f no other interrupts (or software traps) of a higher pr ior i ty are pend ing, the protocol 's dequeueing rout ine is invoked. A l t h o u g h i t was not st r ic t ly necessary to add a L c l n e t ( local net) domain for L N T P , i t was, nevertheless, very desirable. T h e al ternat ive wou ld have been to run L N T P in one of the exist ing domains, e.g., the I n t e r n e t domain . T h i s wou ld have resulted in encapsulat ing the L N T P packet in an I P packet and modi fy ing the T P layer to recognize L N T P packets and to invoke the correct rout ine. Na tu ra l l y , such a scheme wou ld involve an efficiency penal ty. L a s t l y , i t is necessary to make a smal l modi f icat ion to the network interface's ou tpu t rout ine in order to be able to handle the new packet type. F o r L N T P , the add i t iona l code was only a few lines since a l l tha t is needed is to prepend an ethernet header to the L N T P packet. 3 . 2 . 2 . 3 . T h e P r o t o c o l L a y e r It is in this layer tha t the protocol is actual ly implemented. 
A s previously ment ioned, the interface that this layer must prov ide to the socket layer is very wel l def ined. Implement ing this interface requires the addi t ion of another protocol swi tch table ( l c l n e t s w ) as we l l as a new doma in ( L c l n e t ) , and several rout ines. T h e purpose of the protocol swi tch table is to in fo rm the socket layer of certain at t r ibutes of the protocol and to provide hooks in to the protocol . T h u s , i n the l c l n e t s w table we have one entry wh ich states that the protocol in question ( L N T P ) belongs to the 28 l o c a l n e t ( L c l n e t ) protocol fami ly, supports s t r e a m sockets, requires a connect ion before commencing da ta transfer, and wants to be informed by the socket layer when the app l ica t ion process removes da ta f rom the input socket queue. Hooks found in the l c l n e t s w table include a rout ine that is cal led at system boot t ime to in i t ia l ize the protocol , a rout ine that is cal led every 500 ms, and the user request rout ine. It is this last routine wh ich provides the interface socke t /p ro toco l interface. A l l socket system cal ls, e.g., b i n d , w i l l eventual ly get mapped to th is rout ine w i t h one of the arguments specifying the par t icu lar operat ion to be per formed. It shou ld be noted that several other avai lable hooks, such as the " fas t " t imer , were not needed. T h e obvious reason for th is is that L N T P is a m u c h s impler protoco l than T C P / I P wh ich was the or ig inal target protocol for the whole B S D network ing in i t ia t ive. 3 . 3 . C h a p t e r S u m m a r y T h e sys tem envi ronment prov ided by 4 .2 B S D U N I X for network ing was examined. It was noted that the software environment was the ma in concern of the protoco l developer. A s we l l , the var ious layers compris ing the network ing faci l i t ies and their inter-dependence were discussed. T h e mater ia l presented in this chapter is needed i n the descr ipt ion of the L N T P implementat ion. C H A P T E R 4 The Protocol Implementation This chapter examines the implementation of protocols in general, and L N T P in particular, in the 4.2 B S D U N I X environment. Special attention is given to memory management, ethernet interface considerations, and the relevant system tables. Given the L N T P source code, the reader should be able to install it on a 4.2 B S D U N I X after going through this chapter. Some of the problems that were encountered while executing the L N T P implementation are also discussed. 4.1. General Protocol Implementation The kernel modifications that need to be made in order to add a new protocol are reasonably straightforward due to the fact that there already exists the T C P / D ? implementation which may be used as a guideline. Conceptually, the most difficult aspects to understand would be the memory management scheme and, somewhat less so, the software interrupt mechanism. Needless to say, the real difficulty rests with the implementation and debugging of the protocol itself. 4.2. Memory Management In order to implement a protocol in 4.2 B S D U N L X , it is necessary to have a good understanding of the memory management scheme since lack of such an understanding can result in some spectacular system crashes as well as more subtle problems. The reason that special memory management techniques are required is 2 9 30 due to the fact tha t the kernel has no means to acquire and deallocate new memory on the fly as an app l ica t ion program may. 
T h u s , the kernel has a fixed amount of memory (the size of wh ich was determined when the kernel was compi led) wh ich it must use to dynamica l ly create and destroy such entit ies as da ta buffers, protocol cont ro l b locks, and var ious tables. T o faci l i tate th is management, a new data structure known as the m b u f was added. T h e m b u f is an 128 byte structure wh ich is used for mul t ip le purposes. For tuna te ly , for our purposes i t is necessary to be concerned w i th only a few of i ts uses; the most impor tan t being i ts use as a packet header and i ts use in buffering da ta . Because the m b u f is such a mult i - facetted da ta type i t is by necessity self-descr ib ing. T h i s means that a por t ion of the 128 bytes is used as a header, w h i c h , among other th ings indicates the type of the m b u f (e.g., dynamic da ta a l locat ion, packet header, etc.), the amount of da ta associated w i t h i t , and an offset to the start of the date. T h u s , there are actual ly less than 128 bytes avai lable for da ta (112 bytes, to be precise). Na tu ra l l y , 112 bytes wou ld be inefficient for use in da ta buffering for certain appl icat ions, such as file transfer, whereas for conversat ional appl icat ions, such as remote logins, i t wou ld do qui te nicely. It is for this reason that there is ma in ta ined a fixed number of pages wh ich may also be used to buffer da ta . These pages are referenced using m b u f s , however, most of the t ime it is unnecessary to know whether the da ta perta in ing to an m b u f is to be found in the m b u f itself, or i n one of the pages. T h i s is because the manipu la t ion of the m b u f pool and the page pool is accompl ished by the use of var ious routines and macros wh ich hide 31 much of the gory details from the implementor. For example, in order to create a copy of an mbuf the m_copy call is used; a routine that works equally well on the "small data" mbufs as on the "large data" ones. It is of interest to note how it is determined where the mbuf data is kept. This is done by considering the sign of the offset field in the mbuf. This field is used to indicate the address of the data relative to the mbuf. When the offset is positive it signifies that the data is to be found in the mbuf itself. However, when the offset is negative the data is to be found in one of the reserved pages. In order for this scheme to work it is necessary for the page pool to be allocated in lower memory than the mbuf pool. In either case, finding the start of the data is done by adding the mbuf address to the data offset, i.e., address = (char *)((int)m + m->m_off) where m is a pointer to an mbuf. The mbuf length field will then indicate the amount of data that is being pointed to. The protocol implementor is therefore required, by necessity, to thoroughly understand how the various macros and routines are used, and to use them in the correct manner. Failure to do so can result in various problems. An example of one such problem that occurred was that an mbuf that should have been deallocated was not being freed up. The result was that after a while whenever an attempt was made to allocate an mbuf the kernel would fail to do so, and signify this failure by having the system call in question return error status and setting the application program's errno variable to E N O B U F S . Tracking down problems of this sort can be quite painful as it is not at all obvious which section of code is responsible. 32 4.3. 
E t h e r n e t I n t e r f a c e C o n s i d e r a t i o n s W h e n the ethernet interface accepts a packet f rom the ethernet i t raises the appropr ia te in terrupt . T h i s causes the ethernet interrupt handler to be invoked wh ich then proceeds to copy the ethernet packet out of the inpu t buffer into a newly al located mbuf , s t r ip off the ethernet header, enqueue the packet on the correct protoco l inpu t queue, and then schedule a software interrupt . T h e preceding is not s t r ic t ly t rue i n tha t often the copy ing of the packet is omi t ted and the ethernet dr iver leaves i t in the inpu t buffer. T h i s is done when there are " s tandby " buffers avai lable. In th is case, a " f u n n y " mbuf is created to point to the packet and the s tandby buffer is used to replace the now occupied input buffer. (Th is type of efficiency a id also exists i n the case of outgoing packets wh ich reside in addresses accessible by the ethernet interface chip. These packets do not have to be copied in order to be placed on the ethernet by the interface). Because the software in terrupt has a low pr ior i ty it w i l l not be serviced unt i l after the ethernet in terrupt handler has finished deal ing w i t h a l l the input packets. (Note that there are many other in ter rupts w i t h a higher pr ior i ty than a software in terrupt , such as the disk control ler in ter rupt ) . W h e n the software interrupt is eventual ly serviced it invokes the protocol 's in ter rupt rout ine whose purpose is to dequeue any packets residing on the in ter rupt queue and pass the packet up to the protocol itself. In order to add a new protocol it is necessary to modify both the ethernet ou tpu t rout ine and the ethernet in terrupt handler to recognize the new protocol i d , and add a new queue to wh ich a l l such incoming protocol packets may be appended. Mod i f y i ng the ou tpu t rout ine e c o u t p u t , i n file /sy8 / 8 u n i f / i f _ec . c , is quite 33 st ra ight forward. Ano the r case statement is added for L N T P packets whose only purpose is to specify the ethernet packet type and the ethernet dest inat ion address that are to be used in the ethernet header. A f te r this the contro l f low resumes w i t h previously exist ing code wh ich sets up the ethernet header, enqueues the ethernet packet on the ethernet interface's output queue, and then starts the ethernet interface i f i t is not already act ive. Mod i f y i ng the in ter rupt handler rout ine consists of going into the rout ine ecread (which is cal led by the interrupt rout ine, e c r i n t ) in the file /sys/sunif/if_ec.c and add ing a case statement for the new protocol id . In the case of L N T P , th is new case statement w i l l schedule the appropr iate software interrupt to occur and w i l l ident i fy the queue that the mbuf is to be appended to. Th i s is exact ly what is done for an Internet packet, except that a different input queue is used and a different b i t is turned on in the flag which is used by the network software in ter rupt handler to determine which protocol in terrupt routines are to be invoked. L a s t l y , i t is necessary to modify the software interrupt rout ine so that it can determine that i t has been invoked to service a L N T P packet. T h i s merely requires a few lines of assembler code i n the rout ine netintr in the file /sys/sun/locore.s. T h e code tha t is used for Internet packets can be used as a skeleton. 
T h e ma in difference being that the L N T P in terrupt rout ine is called instead of the IP in ter rupt rout ine. 34 4.4. T h e P r o t o c o l S w i t c h a n d D o m a i n T a b l e s In keeping w i t h the U N I X t rad i t ion , a l l protocol famil ies have their own swi tch tables. T h e purpose of the swi tch table is to provide a convenient and standard ized means of accessing the var ious routines and opt ions associated w i t h the protocol in quest ion. Since th is table was already touched upon in a previous chapter , suffice it to say tha t tha t i t is necessary to create a new file, /sys / l c l n e t / l c l _ p r o t o . c , wh i ch contains the protocol sw i tch , l c l n e t s w . If L N T P is the only protocol associated w i t h the L c l n e t protocol fami ly , then there w i l l be a single entry i n this table. In add i t ion to this swi tch table, there should be declared in the same f i le a L c l n e t d o m a i n table. L i k e the protocol swi tch table there is one of these tables per protocol fami ly . Its fields inc lude an integer wh ich is the protocol fami ly 's numer ic identi f ier, a character s t r ing wh ich is the name of the protocol fami ly (" localnet" i n th is case), the address of the protocol swi tch table, and , in an indirect way , the number of entries i n the protocol swi tch table. One other field i n this table is the " n e x t " field. It is used to cha in the various protocol fami ly domain tables together and , un l ike the other fields, is not in i t ia l ized stat ica l ly . It should be noted that i n order to conform to the exist ing system, the name of a protocol fami ly X ' s domain table should be X d o m a i n . T h u s , L c l n e t ' s domain table is cal led l c l n e t d o m a i n and I n t e r n e t ' s doma in tab le is cal led i n e t d o m a i n . In add i t i on to declar ing these two tables i t is necessary to modi fy the file / s y s / s y s / m p c _ d o m a i n . c by adding the single condi t ional ly compi led l ine " A D D O M A I N ( l c l n e t ) " to the rout ine d o m a i n i n i t ( ) . T h i s l ine is condi t ional ly compi led because it is only to be executed i f the kernel is to be configured w i th the 35 L c l n e t protoco l fami ly . A s wou ld be expected given the names of the rout ines in quest ion, A D D O M A I N ( ) adds the specified domain table to the chained l ist; and d o m a i n i n i t ( ) in i t ia l izes the protocol fami ly by cal l ing the in i t ia l izat ion rout ine (if i t exists) for each protocol in each protocol fami ly . 4 . 5 . The P r o t o c o l L a y e r Since there should exist a non-ambiguous specif ication for the protocol in quest ion, imp lementa t ion of th is layer is in theory rather mechanical . T h e one caveat being that i t is necessary for the implementor to adhere to the var ious constra ints tha t exist when wr i t i ng kernel code. F o r example, favour i te l ibrary rout ines w i l l p robably be miss ing and system calls are not al lowed to be used. T h e one other ma in considerat ion is ensuring that this layer interfaces correct ly w i t h the layers above and below it. In order to interface w i th the s o c k e t layer i t is necessary to create a user request rout ine that w i l l field a l l s o c k e t calls, such as a connect ion request. T h i s rout ine is the s o c k e t layer's hook in to the protocol and must therefore conform to a certa in specif ication [ L E F F 8 3 b ] . 
In order to interface w i t h the ethernet layer it is necessary to have an interrupt input rout ine whose purpose, as previously ment ioned, is to remove any input packets f rom the in terrupt queue. T h i s rout ine is qui te a bi t s impler than the user request rout ine, and to interface correct ly w i t h the ethernet layer, it merely has to know where the interrupt queue is and wh ich rout ines a n d / o r macros to ca l l in order to remove the packets. A s wou ld be expected, the addresses of these routines are entries i n the protocol swi tch table. 36 A t the t ime I took over the L N T P project a signif icant percentage of the imp lementa t ion of this layer had already been completed. T h e major missing features were error recovery and A Y T support . A l s o , flow contro l had been implemented but not thoroughly tested. T h e reason for th is was that previously L N T P had on ly been operat ing in loop-back mode; i.e., messages were routed r ight back to processes on the same machine and there was no inter-machine communica t ion . T h i s resul ted in a lock-step arrangement whereby the receiving L N T P wou ld receive a da ta packet and be able to act on i t before the sending L N T P cou ld send another packet. A f te r establ ishing inter-machine communica t ion , several experiments were performed and it was discovered that there was indeed a bug in the flow contro l imp lementa t ion wh ich resul ted in a system crash whenever a packet was discarded due to buffer overf low. T h e prob lem turned out to be that an a t tempt was being made to free a N U L L mbuf pointer result ing in a fai led consistency check wh ich caused the p a n i c () rout ine to be cal led. A n o t h e r interest ing prob lem that was encountered was that under certain c i rcumstances the sender was not responding to a H O L D packet as qu ick ly as i t should have been. Speci f ical ly , the scenario was that after receiving a R E S U M E packet the sender wou ld enter in to a loop in wh ich any and a l l packets wh ich had been buffered were t ransmi t ted to the receiver. If, however, the receiver in the meant ime sent a H O L D packet, it wou ld be ignored un t i l a f t e r a l l of the buffered packets had been t ransmi t ted . T h e result of th is delay was often buffer overf low on the receiver side, resul t ing in yet another H O L D - R E S U M E - o v e r f l o w cycle. 37 T o understand the reason that the incoming contro l packet was being ignored by the sender, i t is necessary to recal l that when a protocol is scheduled to be invoked by the ethernet in ter rupt handler it must first wai t un t i l the in ter rupt level drops to a sufficiently low value. T h u s , even though the contro l packet had been taken off the ethernet interface and placed on the appropr iate in terrupt queue (due to the h igh pr ior i ty of ethernet in terrupts) , i t was not un t i l the kernel had finished t ransmi t t ing the buffered packets and had vo luntar i ly rel inquished contro l that the pending network software in terrupt wou ld cause the receiver por t ion of L N T P to be invoked. T h i s prob lem, therefore, was a consequence of the relat ive independence that existed between the sender por t ion of L N T P and the receiver por t ion. U l t ima te l y , i t was deemed necessary to sacrif ice implementat ion pur i ty on the a l tar o f pract ica l i ty . 
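For reference, the receiver-side routine that the pending network software interrupt eventually runs is essentially a loop that drains the interrupt input queue and feeds each packet to the protocol. A sketch is shown below; the routine and queue names (lntp_intr, lntp_input, lclintrq) are assumed, but the splimp()/IF_DEQUEUE idiom is the standard 4.2 BSD one. It is precisely this routine that cannot run until the sender's transmission loop gives up control.

    extern struct ifqueue lclintrq;         /* filled by the ethernet interrupt handler */

    /* LNTP's software-interrupt routine (name assumed): drain the input
     * queue and pass each packet to the protocol proper. */
    lntp_intr()
    {
            register struct mbuf *m;
            int s;

            for (;;) {
                    s = splimp();                   /* exclude the ethernet interrupt */
                    IF_DEQUEUE(&lclintrq, m);       /* take the next queued packet */
                    splx(s);
                    if (m == 0)                     /* queue empty: nothing left to do */
                            return;
                    lntp_input(m);                  /* DATA or control packet processing */
            }
    }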
T h e resul t ing solut ion thus required some coupl ing between the sender and receiver por t ions of L N T P by having the sender check the in ter rupt queue pr ior to t ransmi t t ing each packet whenever it was in the aforementioned send loop. If i t shou ld notice that the queue was not empty i t wou ld exit the loop thereby a l lowing the receiver por t ion to execute. A n even more efficient approach would be for the ethernet in terrupt software to determine whether the incoming packet in quest ion is a H O L D packet and , if so, set a flag wh ich the sender por t ion could check. T h e ma in p rob lem w i t h this approach is that it wou ld require the ethernet layer to be aware of the higher level protocol and carry out cer ta in duties on its behalf, thus v io lat ing the concept of protocol layer ing. 38 4.6. Miscellaneous Implementation Problems 4.6.1. Kernel Debugging It is an unfortunate fact of life that kernel code debugging is signif icant ly more dif f icult t o per form in a n efficient manner than is appl icat ion program debugging. It is s imp ly no t possible to r un the kernel under the contro l of your favour i te debugger. T h i s leaves the judic ious use of p r in t statements as the major means of in forming the implementor of the var ious events that may interest h im. T h i s me thod , however, has drawbacks of i ts own. T h e reason for th is is because output t ing to the console is a fa i r ly s low and expensive process wh ich , due to these characterist ics, can in fact al ter the dynamics of an exper iment . T h i s is readi ly understood i f one considers that a receiving process which is causing a great deal of console output cou ld very we l l find itself unab le t o service a network software in terrupt , for example, i n a reasonable t ime. S u c h a delay may result i n a n unexpected buffer overf low. A l te rna te ly , consider the case of a sending process which is causing the kernel to generate a large amount of console ou tpu t . Whereas previously, when no debugging statements were being used, the receiving process was suffering f rom chronic buffer overf low; we now may find tha t this p rob lem has "myster ious ly" disappeared. T h e reason for th is , of course, is tha t the sending machine has been sufficiently slowed down to al low the receiving mach ine enough t ime to process a l l incoming packets. In order to c i rcumvent th is prob lem, an in ternal event buffer was used w i t h event t ime pr in t statements relegated to last resort status. T h e event buffer wou ld thus " l o g " the recept ion and t ransmi t ta l of a packet along w i t h the var ious sal ient features of the packet, e.g., packet type. T o further increase funct ional i ty , th is packet 39 logging feature can be tu rned on or off at the appl icat ion level. T h u s , a t yp ica l exper imenta l run wou ld consist of no pr in t statements un t i l after the connect ion had been closed down. A t this point the event buffer wou ld be dumped to the console and an analysis of the run made. A n o t h e r unfor tunate aspect of kernel code debugging is that one does not merely make a change to the source code, recompile, and then run your program, as is done w i t h appl icat ion programs. One makes a change, recompiles the file in quest ion and then l inks a l l the binaries compr is ing the operat ing system together; a process that may take up to 5 minutes. It is then necessary to br ing the system down and reboot w i th the new (and hopeful ly improved) version. 
Since, i t is usual ly the case that there are two or more machines invo lved, it is necessary to do this for a l l . B r i ng ing the system down fol lowed by a reboot usual ly takes about another 5 to 10 minutes. O n e saving grace is tha t because the machine is being brought down in tent ional ly i t is not usual ly necessary to per form a file system check; a procedure that adds tens of minutes to the rebooting process. Th i s , however, is not the case when a system crash occurs. In such a s i tuat ion it is necessary to spend the ext ra t ime per forming the file system check in order to determine whether there are any inconsistencies and to fix those that are found. Needless to say, one can expect one's share of sys tem crashes when deal ing w i t h kernel code changes. 4.6.2. Lack of Source Code D u e to the fact tha t S U N is a commercia l organizat ion wh ich sells a " rea l w o r l d " p roduc t i t is not possible for educat ional inst i tut ions to obta in source code on very favourable terms. T h e result of this was that a l though we had the source code 40 for release 1.4 we were not able to obta in the source code for release 2.0 wi thout a fur ther cash out lay. A l s o , even i f U B C h a d decided to purchase the latest release, a delay of many months wou ld have undoubtedly occurred before the code wou ld arr ive. T h i s had been our experience when purchasing the source for release 1.4. T h u s , when an upgrade to release 2.0 was effected it was necessary to go through a few contort ions in order to be able to cont inue work. It was decided that because of the dependence of the diskless workstat ions on the d isk server it wou ld not be advisable to cont inue running release 1.4 on the cl ient workstat ions whi le runn ing release 2.0 on the server. It was also not considered to be a real ist ic op t ion to reboot the server w i t h release 1.4 as there were several diskless workstat ions that relied on i t a n d such a move wou ld have been at best d isrupt ive to them and at worst disastrous i f subtle incompat ib i l i t ies existed. A n o t h e r al ternat ive was to use release 1.4 source for those files wh ich were to be modi f ied and release 2.0 object modules for everything else, result ing in a hyb r i d sys tem tha t wou ld essential ly be release 2.0. Na tu ra l l y , the danger existed that one of the release 1.4 source files used may have been cruc ia l to the upgrade. Indeed, it was found that this was the case. One of the source files conta ined a rout ine that had apparent ly changed signif icant ly enough that the cl ient machine was unable to communica te w i t h the server (and thus not able to boot) . It was at th is point that the aforementioned contort ions came into p lay . Instead of using only those release 2.0 object modules wh ich perta ined to unmodi f ied source files, a l l of the release 2.0 object modules were l inked together when creat ing a new kernel . In add i t ion , the object modules of the release 1.4 modif ied source files 41 were also linked in (as well as those totally new object modules that were concerned with LNTP). Normally, the loader would complain about multiply defined symbols in such a circumstance. 
To prevent this from happening, two steps were taken: the release 1.4 source files were editted and any routine that was not modified was deleted, and, a small change was made to each global symbol (only routine names were left after editting) defined in the release 1.4 source files; the letter 'X' was appended to each and every such symbol (the reason for doing this is explained in the next paragraph). By itself this would result in a kernel that was functionally equivalent to the original release 2.0 kernel though somewhat larger. In order to turn this kernel into one that supported L N T P it was necessary to then use the adb debugger on it. Adb is an extremely low level debugger which operates at the assembly language level. As such, it does not require that binaries be compiled with special flags, as is the case with the sdb and dbx debuggers. Some of the more interesting features of adb are that it allows the programmer to determine the address of a symbol, as well as alter the contents of any memory location. The significance of this is that it was possible to use adb to determine the addresses of the routines that are to be replaced with the modified release 1.4 'X' routines. For each such routine, the first few machine language instructions were replaced with a long jump to the corresponding 'X' routine. Hence, whenever a call was made to one of these "replaced" routines the jump instruction would be executed and the corresponding 'X' routine would be invoked with all the parameters correctly set up on the stack exactly as if it had be called normally. It was possible to automate this procedure by incorporating it into the kernel makefile. 42 Of course, it was still possible that one of the routines replaced with its modified release 1.4 equivalent may have been so significantly changed in the upgrade that it would not be possible to make the substitution. However, due to the fact that it was only relatively low level ethernet interface routines that had to be modified (as well as the presence of a bit of luck), this was found not to be the case. It was thus possible to boot and run the modified kernel with a working L N T P . 4.7. Chapter Summary We examined the implementation of L N T P in 4.2 B S D U N I X . It was shown that a good knowledge of both the memory management scheme as well as the software interrupt mechanism was required by the protocol developer. It was noted that although it was unnecessary to modify the socket layer (the layer above L N T P ) , it was necessary to modify the ethernet layer (the layer below L N T P ) somewhat. In addition, all of the relevant system tables (such as the protocol switch tables) and data structures were discussed. C H A P T E R 5 L N T P Timing Measurements T h i s chapter examines var ious t iming measurements wh ich are concerned w i t h the speed w i t h wh ich the different protocol layers process a packet. It starts off w i t h a discussion of the exper imenta l setup. It then proceeds to identi fy the impor tant measurements, elaborate on how they were made, and place the results in the appropr ia te context. A s wel l , i t is shown that a l though the LNTP specific processing t ime associated w i t h a da ta packet is much less than that of TCP/IP, other considerat ions come into play wh ich mit igate the significance of this improvement . 5.1. The Experimental Environment Dis t r i bu ted processing appl icat ions are often model led using the c l ient /server parad igm. 
In this model the server process is constant ly " l i s ten ing" for a service request. W h e n such a request is eventual ly made, the server establishes (and perhaps val idates) the connect ion, and typ ica l ly creates a new process wh ich actual ly services the request. T h e parent process is then able to resume its p r imary role of wa i t i ng for other such requests. W e have adopted th is model in this work. T h u s , to in i t ia te an exper imenta l " r u n " , a cl ient process w i l l make a request to a " l i s ten ing" server process. A f te r receiving the request, the server forks off a worker process. T h i s worker process then , in conjunct ion w i th the client process, performs a file transfer. It is dur ing the 43 44 file transfer tha t var ious measurements are made. A typ ica l r u n w i l l consist of the transfer of 1000 da ta packets, each one contain ing 1024 bytes. A n exper imental session wou ld consist of 50 such runs i n order to generate a to ta l of 50,000 1024-byte da ta packet transfers, thus ensuring a sufficiently high confidence level i n the measurements. 5.2. Timing Measurements In order to obta in meaningfu l measurements it is necessary to measure not just the aggregate packet processing t ime, but also the t ime the packet spends i n the var ious pro toco l layers. These results can be obtained by the jud ic ious use of t imestamps. T h e t imestamps are placed in the da ta part of the packet (which is of no concern since the ac tua l contents of the packet are un impor tant ) and are logged by the reader process in to a file. A t the end of the session a p rogram is r un on the logged da ta wh ich calculates the mean amount of t ime the packet spends i n the different sections of code. O the r impor tant stat is t ical calculat ions are made, such as sample var iances and confidence levels. T imes tamps were made at several strategic points. App rox ima te l y ha l f of these t imestamps were concerned w i th output processing t ime whereas the rest were concerned w i t h input processing t ime. T h e sender process was forced to work i n lockstep w i t h the receiver process; i.e., the sender wou ld only ou tpu t another packet after it had received an acknowledgement from the receiver signifying that i t had processed the last packet. T h e result of this was that a l l measurements wou ld refer to processing t ime only and not include queuing delays. 
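The measurement harness itself is ordinary user-level socket code. The sketch below shows the general shape of the lockstep sender loop just described; the names, the one-byte acknowledgement convention, and the use of gettimeofday are assumptions made for illustration (the actual timestamps came from the 1 msec kernel clock described in section 5.3, and were logged by the reader process).

    #include <sys/types.h>
    #include <sys/time.h>
    #include <stdio.h>

    #define PKTSIZE 1024
    #define NPKTS   1000

    /* Send NPKTS 1024-byte packets in lockstep over the connected socket s:
     * stamp each packet, write it, then wait for the reader's one-byte
     * acknowledgement so queueing delays stay out of the measurements. */
    send_run(s)
            int s;
    {
            char buf[PKTSIZE], ack;
            struct timeval tv;
            struct timezone tz;
            int i;

            for (i = 0; i < NPKTS; i++) {
                    gettimeofday(&tv, &tz);                 /* timestamp rides in the data part */
                    bcopy((char *)&tv, buf, sizeof (tv));
                    if (write(s, buf, PKTSIZE) != PKTSIZE) {
                            perror("write");
                            return (-1);
                    }
                    if (read(s, &ack, 1) != 1) {            /* lockstep with the receiver */
                            perror("read");
                            return (-1);
                    }
            }
            return (0);
    }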
T h e results obta ined f rom the t imestamps enabled the calculat ions of: 45 (1) the t ime i t takes the ethernet interrupt handler to process an incoming packet (contro l and data) (i.e., the t ime it takes the ethernet in terrupt handler to place the packet on the higher level protocol 's in terrupt input queue), (2) the t ime i t takes the protocol ( T C P / I P or L N T P ) to process an incoming packet (i.e., the t ime f rom when L N T P (or T C P / I P ) removes the packet f rom the in terrupt inpu t queue, to the t ime i t passes the da ta to the socket layer) , (3) the t ime i t takes for an outgoing packet to be processed by the s o c k e t level, (4) the t ime it takes the protocol ( T C P / I P or L N T P ) to process an outgoing packet (i.e., the t ime f rom when L N T P (or T C P / T P ) receives the da ta f rom the socket layer, to the t ime it calls the ethernet output rout ine) , (5) a n d , the t ime it takes the ethernet dr iver to process an outgoing packet (contro l and data) (i.e., the t ime i t takes the ethernet output rout ine to execute when cal led by L N T P (or T C P / I P ) ) , T h e protoco l processing t imes for L N T P , i tems (2) and (4) above, were also calculated w i t h checksurnrning enabled. T h e same checksum rout ine that T C P / I P uti l izes was used. Tab le 5.1 summarizes the protocol processing t ime results. F r o m this table we see that L N T P w i thout checksumming enabled greatly exceeds T C P / T P i n packet processing speed. In fact, da ta packet inpu t processing speed for L N T P is about 3.2 t imes as fast, and packet output processing speed is roughly twice as fast. However , when checksumming is enabled the results tu rn out to be not qui te so g lowing. Packe t input processing speed for L N T P becomes only 30% faster than that of T C P / I P , and packet output processing speed is only 7% faster. 46 If, on the other h a n d , the not unreasonable assumpt ion is made that T C P / I P spends as m u c h t ime calcu lat ing checksums as L N T P does (and since the same rout ine was being used by L N T P there is every reason to believe this to be the case), then we note that the da ta packet input processing t ime for non-checksummed L N T P is 1.7 ((2.06 - (1.60 - 0.64)) / 0.64) t imes as fast as that of non-checksummed T C P / I P a n d da ta packet ou tpu t processing t ime for non-checksummed L N T P is 1.12 ((1.87 - (1.75 - 0.95)) / 0.95) t imes that of non-checksummed T C P / I P . It is th is compar ison that is the most useful and reasonable to make because i t factors out the non-protoco l specific affects of checksumming leaving only the ac tua l t ime spent by the pro toco l in processing the packet. T a k e n in this context L N T P outperforms T C P / I P qui te impressively. T h e observant reader may have real ized that the checksum t imes for input packet processing and ou tpu t packet processing are different; 0.95 (1.60 - 0.64) msec vs. 0.80 (1.75 - 0.95) msec, respect ively. T h i s may seem puzzl ing since i t is the same da ta being checksummed in both cases. T h e cause of this can be traced to the fact that when the packet is being ou tpu t i t resides in 2 m b u f s whereas when i t is being inpu t i t resides in 3. 
If one looks at the code that implements the checksum algorithm, it is readily obvious that there is a not insignificant amount of overhead associated with the processing of each mbuf that makes up the packet; the result being that three mbufs containing roughly 1 kilobyte of data take longer to process than 2 mbufs containing the same byte count.

    Event            LNTP (msec)   LNTP(*) (msec)   TCP/IP (msec)
    Data Pkt (in)    0.64          1.60             2.06
    Data Pkt (out)   0.95          1.75             1.87

    Table 5.1  Protocol Data Packet Processing Times
    (*) LNTP results with checksumming enabled

The ethernet packet processing times, items (1) and (5) (Table 5.2), are quite interesting in that they indicate that there is no real difference in processing times for control and data packets. This observation initially appears non-intuitive given the difference in sizes of the packets involved. However, if one considers that the times listed do not include the time required by the ethernet interface to place a packet on the ethernet or pull one off of it, and if one realizes that often no copying of data is going on at that level (see section 4.2), then the results seem more plausible. The reason that the aforementioned times are not included is that once an outgoing packet is queued up by the ethernet handler and the interface is started, the interface works in parallel with the host machine, allowing the host to service any other outstanding requests. Similarly, it is only after the interface has pulled the packet off of the ethernet that the interrupt handler is invoked, resulting in an overlapping of host CPU and interface processing. This overlapping occurs even if there is only a single packet. However, in this case the host CPU will probably be involved in non-protocol related activities during the overlapping period.

    Event            Time (msec)
    Data Pkt (in)    0.51
    Data Pkt (out)   1.40
    Cntl Pkt (in)    0.49
    Cntl Pkt (out)   1.39

    Table 5.2  Ethernet Packet Processing Times

Lastly, there are several other miscellaneous measurements that are needed:

(1) The socket processing time for an output data packet (item (3)): 2.06 msec.

(2) The LNTP specific processing time for an outgoing control packet: 0.40 msec.

(3) The time between when an input data packet is queued on the protocol interrupt queue by the ethernet interrupt handler and when the protocol is actually invoked to process any queued packets: 0.68 msec. The reason for this delay is that a context switch occurs (section 4.2) when the interrupt handler is finished executing.

One aspect of packet processing time that did not lend itself to timestamping was system call overhead. To this end a new system call was implemented which did absolutely nothing. This no-op system call was then called a large number of times in an application program. The total time spent during the calls was divided by the number of calls, resulting in the mean system call overhead. A value of 0.2 msec was obtained as the overhead. Note that the no-op system call was not called in a loop, as this would have resulted in a somewhat inaccurate value which would have included the time required to perform the loop test.
Instead, the loop was unrolled, allowing the call to be sequentially executed the required number of times.

From the above measurements it is possible to calculate some useful derivations which are to be used in the tuning of LNTP performance (see chapter 6):

(1) Tdatoc is the total time it takes to output a data packet starting at the client level. It is composed of the system call overhead, the socket processing time, the LNTP processing time, and the ethernet driver processing time. This gives us:

    Tdatoc = 0.2 + 2.06 + 0.95 + 1.40 = 4.61 msec.

(2) Tdatip is the time it takes to input a data packet at the protocol level; i.e., the time it takes an incoming data packet to reach and be processed by LNTP. It is composed of the ethernet handler processing time, the time from when LNTP is scheduled by the ethernet handler to when LNTP is actually invoked, and the time it takes LNTP to process the packet. This results in:

    Tdatip = 0.51 + 0.68 + 0.64 = 1.83 msec.

(3) Tco is the output delay time for a control packet. It consists of the ethernet processing time and the LNTP processing time. The result being:

    Tco = 1.39 + 0.40 = 1.79 msec.

It should be noted that great care was taken to obtain accurate results; a large number of packets was used in each test session. The result of this was that the maximum 95 per cent confidence interval for the previously mentioned timing measurements was +/- 1.7 per cent and, indeed, most measurements had confidence intervals well below this figure.

It should also be noted that the protocol specific processing time is only a fraction of the time spent in processing a packet. Simple arithmetic yields a value of 20.6% (0.95 / (0.2 + 2.06 + 0.95 + 1.40)) of the time spent in LNTP code for outgoing packets. The result of this is that the magnitude of the gain in throughput that one would expect LNTP to exhibit over TCP/IP is significantly mitigated (more on this in chapter 6).

5.3. Problems Encountered

The main problem encountered when attempting to make the aforementioned measurements was that the SUN's clock resolution was 25 msec; a value much too large to be of any use. The solution was to adjust the clock period of the Am9513 System Timing Controller to 1 msec. The Am9513 is the device used to implement non-maskable interrupts (nmi). It generates an interrupt per clock period which is serviced by the appropriate handler. By causing this interrupt to occur every millisecond, as opposed to every 25 msec, it was possible to implement a new clock with an appropriate resolution. This necessitated the creation of a new assembly level routine to replace the existing handler. This routine would increment a clock variable every time it was invoked (i.e., every msec) and would then determine whether it was time to call the appropriate nmi routine; the latter function of the routine is necessary since the nmi routine expected to be called every 25 msec, and not once every millisecond. The result of this was a clock with a 1 millisecond period which was completely under our control. The penalty paid was that the host was being interrupted every millisecond instead of once every 25 milliseconds, causing some degradation in performance.
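The replacement handler was written in assembler, but its logic is simple enough to show in C. In the sketch below the variable and routine names are assumed, and 25 is simply the ratio of the old 25 msec period to the new 1 msec period:

    int     msclock;        /* milliseconds since boot: the new 1 msec resolution clock */
    int     subticks;       /* 1 msec interrupts seen since the last 25 msec tick       */

    /* Invoked on every (now 1 msec) Am9513 interrupt.  It maintains the
     * fine-grained clock and calls the original handler at the 25 msec
     * rate that handler still expects. */
    fastclock()
    {
            msclock++;
            if (++subticks >= 25) {
                    subticks = 0;
                    nmihandler();           /* original 25 msec routine (name assumed) */
            }
    }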
However , since the rout ine that implemented the clock consisted of only 12 assembly level inst ruct ions performing very st ra ight forward funct ions, i t is unl ike ly tha t the system performance suffered too greatly. 5.4. C h a p t e r S u m m a r y In th is chapter var ious protocol related t im ing measurements were reported. T h e t ime L N T P takes to process a data packet was compared to the t ime taken by T C P / I P . A l t h o u g h L N T P took signif icantly less t ime, the other layers cont r ibuted enough overhead to reduce the importance of L N T P ' s lesser processing t ime. It was also shown that checksumming protocol packets added signif icant processing t ime. C H A P T E R 6 L N T P Parameters In th is chapter we consider the effect of L N T P ' s var ious parameters on protoco l th roughput . T h e parameters under considerat ion are the buffer sizes and the flow cont ro l parameters: N t x , N t r l , and N t r 2 . S imple mathemat ica l formulas are developed to determine the o p t i m u m values of some of these parameters. A s wel l , the results of several exper iments wh ich were performed to empir ical ly determine L N T P ' s o p t i m u m parameter sett ings are discussed. Where appropr ia te , compar isons are made between the predicted and the observed results. 6.1. General In order to achieve a h igh degree of efficiency in any type of system a careful selection of operat ing parameters is necessary. S ince performance is a funct ion of the work load a n d work load varies considerably w i t h t ime and is not very predictable, it is often the case that tradeoffs of var ious k inds are required and that there does not exist any " o p t i m u m " a l l purpose set of values. T h i s is undoubtedly the case w i t h protocols in general and L N T P in par t icu lar . F o r example, we know that i t is possible to increase throughput up to a certain extent by merely increasing buffer size [ C H A N 8 4 ] . However , after a point the law of d imin ish ing returns starts to take i ts to l l and only very modest increases are experienced in contrast to the very real gains wh ich had previously been obtained. T h u s the question " W h a t is the buffer size tha t w i l l produce the highest throughput possible?" is more meaningful ly 52 53 expressed as " B e y o n d what buffer size do improvements in th roughput become ins ign i f icant?" T h i s mat ter of tradeoffs can become even more cruc ia l i f one or more of the resources in quest ion is scarce. W i t h regard to L N T P i t was found that those parameters wh ich have the most signif icant impact on performance were the size of the send and receive buffers, and the values of the threshold parameters N t x , N t r l , and N t r 2 . T h e t imeout values of the var ious t imers were not considered to be very impor tant since their purpose is, for the most par t , to correct for error condi t ions; condit ions wh ich w i l l rarely occur in a L A N envi ronment . It should be noted that i n the fo l lowing the main performance index is taken to be throughput . 6 . 2 . B u f f e r S i z e s B o t h the send and receive buffers are manipu la ted not only at the protocol level , bu t a lso, by necessity, at the socket level. T h e pract ica l result of this is tha t any new protoco l that wishes to make use of the exist ing socket layer is forced to accept the s tandard buffer structure and any l imi tat ions that accompany it. 
One such limitation is that there exists a maximum buffer size of 15 KB. Fortunately, it was observed that incremental increases in buffer size at the high end of the range resulted in negligible increases in throughput, meaning that this limitation is nonexistent for all intents and purposes (see Table 6.1).

In order to determine the optimal buffer size for LNTP a series of measurements was taken. The test setup for these measurements consisted of a sender process residing on a SUN-2 transmitting one thousand 1024-byte packets to a receiver process residing on another SUN-2.

    Buffer Size (kbytes)   TCP (kbits/s)   LNTP (kbits/s)
    2                       510             767
    4                       841            1056
    6                       897            1151
    8                       922            1188
    10                      961            1191
    12                      983            1217
    14                     1010            1226

    Table 6.1  Throughput vs. Buffer Size

The packet size of 1024 bytes was chosen because the networking software of 4.2 BSD UNIX has been optimized for that size [CABR85], and because it is also a frequently used unit of disk storage. It was decided to perform memory-to-memory rather than disk-to-disk transfers because it is possible to make the effect of disk I/O invisible by the use of an asynchronous process. This process would be charged with the task of fetching multiple disk blocks in a single request and passing them to the sender process for network transfer. The result of this would be that the sender process would rarely be blocked for want of data. Naturally, the preceding assumes that neither is disk I/O inordinately fast, nor is packet transmission time inordinately slow.

The throughput of a given run was calculated in the usual manner by dividing the number of bits transferred by the time taken for the transfer to complete. Connect and disconnect times were not included in the measurements. 50 such runs were made for each of a range of buffer sizes in order to achieve an acceptable statistical confidence level. The graph of average throughput versus buffer size (refer to figure 6.1) clearly indicates that increasing buffer sizes beyond about 8 KB results in only marginal increases in throughput. This led us to conclude that the optimal size for LNTP's buffers was in the 6 to 8 KB range. Somewhat arbitrarily, it was decided that a value of 8 KB would be used for further experimentation.

Referring to figure 6.1 we note that below buffer sizes of about 6 KB it is the buffers themselves which constitute the bottleneck (since throughput increases roughly linearly with buffer size) whereas above this value processing is the bottleneck.

    [Figure 6.1  Throughput vs. Buffer Size: average throughput (kbits/s) plotted against buffer size (kbytes)]

One would therefore expect that when operating in the buffer bottleneck region a protocol with a superior flow control mechanism would clearly out-perform a protocol utilizing a more traditional approach. Indeed, this would appear to be the case as it can be seen in table 6.1 that LNTP's greatest gains over TCP/IP occur in this region. Once into the region where the buffer limitations no longer limit the throughput, however, the flow control facilities of the protocols are not as heavily utilized and the higher throughput experienced by LNTP can be attributed to the reduced processing time associated with the less complex protocol.
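The buffer sizes of Table 6.1 are socket-level send and receive buffer allocations. Whether the experiments changed them through a compiled-in default or per socket is not spelled out; in 4.2 BSD the per-socket route would look roughly as follows, with AF_LCLNET standing in for whatever constant identifies the localnet address family:

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <stdio.h>

    /* Create an LNTP socket and request the 8 KB send and receive buffers
     * settled on above.  AF_LCLNET is a placeholder; protocol 0 selects the
     * single entry in lclnetsw. */
    int
    make_lntp_socket()
    {
            int s, bufsize = 8 * 1024;

            if ((s = socket(AF_LCLNET, SOCK_STREAM, 0)) < 0) {
                    perror("socket");
                    return (-1);
            }
            if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, (char *)&bufsize, sizeof (bufsize)) < 0 ||
                setsockopt(s, SOL_SOCKET, SO_RCVBUF, (char *)&bufsize, sizeof (bufsize)) < 0) {
                    perror("setsockopt");
                    return (-1);
            }
            return (s);
    }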
Note that as pointed out in chapter 5, even though the receiver processing time associated with LNTP is significantly less than that of TCP/IP (0.64 msec vs. 2.06 msec on a SUN-2), the overhead attributed to the ethernet and socket layers is enough that these gains are not as significant as one might expect.

6.3. Flow Control Parameters

It should be noted that, because the sender and receiver parts of an LNTP incarnation are logically separated, it is theoretically possible to set Ntx, the threshold associated with the sender part, independently of Ntr1 and Ntr2, the thresholds associated with the receiver part. Also note that in the following, all client data packets are 1024 bytes in size and that MAXPAKTLEN, the maximum packet size (excluding header) that LNTP can handle, is set to 1024 bytes. This allows us to simply and accurately calculate Nb (the total number of LNTP buffers available) as the total buffer size, in bytes, divided by 1024. For example, a total buffer size of 6 KB results in a value of 6 for Nb.

6.3.1. Ntx

As described in section 2.3.4, when the sender's window size crosses into region 2, it solicits one or more acknowledgements from the receiving peer in order to release some buffer space. These solicitations take the form of PROMPTs piggybacked on DATA packets, i.e., DATA-PROMPT packets. Since the reception of a PROMPT by the receiver results in a control packet being issued (either a HOLD packet or a RESUME packet), Ntx should be made as large as possible so as to minimize the number of reverse control packets dispatched. However, this must be balanced with the requirement that the sender not be blocked under normal conditions. Thus, it is also necessary for Ntx to be small enough so that the control packet output as a result of the first DATA-PROMPT packet reaches the sender in time to disallow the possibility of the sender becoming blocked due to lack of free buffers. The situation that would best satisfy these two conflicting requirements would have that first control packet arriving immediately after the sender has finished sending the last packet it can send before becoming blocked. This can be expressed deterministically as:

    (Nb - Ntx) Tdatoc >= Tdn + Tcn + Tdatip + Tco        (6.1)

where Nb >= Ntx, Nb is the number of send buffers, Tdatoc is the total time it takes to output a data packet starting at the client level, Tdn and Tcn are the network delay times for data and control packets respectively, Tdatip is the time it takes to input a data packet at the protocol level, and Tco is the output delay time for a control packet.

The RHS of this equation can be thought of as the time required for a DATA-PROMPT packet, once injected into the ethernet, to cause an update (in this case RESUME) packet to be output and to reach the sender and be processed. The LHS of the equation is simply the time it takes to output (Nb - Ntx) DATA-PROMPT packets. Note the following concerning the above relationship:

(1) For the sake of simplicity, queueing delays were totally ignored.

(2) The assumption is made that the sender is almost never blocked.

The basis for the assumption in point (2) is that this is the situation we are trying to achieve. It is also needed for us to be able to use Tdatoc.
If this were not the case it would be necessary to replace Tdatoc with some other value that would reflect the fact that the sender would occasionally be transmitting consecutive packets straight from the send buffer, and thus with none of the overhead associated with a system call and socket level packet manipulation. Needless to say, such a situation arises if the sender receives a RESUME packet after having been blocked for some time.

In order to use equation 6.1 it is necessary to make use of some of the timing measurements reported in the previous chapter; specifically, to determine the values of Tdatoc, Tdatip, and Tco. Recapitulating from that chapter we get:

    Tdatoc = 4.61 msec
    Tdatip = 1.83 msec
    Tco    = 1.79 msec.

The network delay parameters, Tdn and Tcn, are calculated by noting that a 10 Mb/sec ethernet requires 100 nsec to transmit 1 bit, and that a data packet consists of 1024 bytes of data plus 18 bytes of header, whereas a control packet consists of only 18 bytes of header. Notice that, in a LAN, the propagation delay is negligible compared to the transmission delay. Thus, simple arithmetic yields:

    Tdn = 0.83 msec
    Tcn = 0.14 msec.

Substituting these values into (6.1) yields Nb - Ntx >= 0.99, thus indicating that the value of Ntx should be Nb - 1.

The experimental setup was much the same as used in determining the optimum buffer size except that the varied parameter was Ntx. If we consider table 6.2 we see that this prediction for the optimal value of Ntx (i.e., 7) matches well with the recorded observations. Observations also indicated that the sender was rarely blocked, thus validating the above assumption concerning the frequency of this situation. It is also of interest to note that when Ntx equals Nb throughput drops off quite a bit. This is to be expected since that value of Ntx means that the sender will of necessity be blocked after sending a full set of buffers' worth of packets.

    Ntx   Throughput (kbits/s)
    3     1054
    4     1097
    5     1134
    6     1146
    7     1155
    8     1100

    Table 6.2  Throughput vs. Ntx (sender speed equals receiver speed)

The experiment was then repeated with an artificially slowed down receiver. (The reader process was made to execute a do-nothing loop between reads to achieve this.) The purpose of slowing down the receiver was to force the protocol into the flow-controlled region and thus simulate a slow machine. The results of this experiment (table 6.3) seem to indicate that in this case the only value for Ntx that produces a significant increase in throughput is Nb.

    Ntx   Throughput (kbits/s)
    3      862
    4      893
    5      911
    6      958
    7      976
    8     1085

    Table 6.3  Throughput vs. Ntx (sender speed exceeds receiver speed)

This is to be expected if one considers that:

(1) there is no advantage in the sender finding out about the state of the receiver by issuing a DATA-PROMPT prior to transmitting the Nbth packet, since it is probably going to block anyway, and

(2) the extra reverse control packet traffic that is generated when Ntx is less than Nb will most likely have a negative impact on throughput.

Naturally, equation 6.1 could not have been used to obtain this result.
This is because when the sender is much faster than the receiver, the sender will always have to wait for the receiver. This effect dominates the behaviour of the interaction, rendering the equation invalid.

Given that LNTP's two region flow control structure would most likely outperform a more conventional structure when the receiver and sender are evenly matched in speed (because this is the situation where the possibility of overlap is the greatest), it was decided to optimize Ntx for this possibility. Hence, a value of Nb - 1, i.e., 7, was chosen. Optimization for this situation also makes sense when one considers that most LANs connect homogeneous machines which would all be operating at roughly the same speed.

Lastly, another consideration to be made when choosing Ntx is that the value should be small enough so that it is not possible for the sender to send a full set of buffers' worth of packets without exceeding Ntx. If this constraint is not observed then it is possible for a fairly serious degradation in performance to occur. This is a result of the receiver not sending an acknowledgement, since none was requested, which in turn will result in a DATA packet timeout occurring. The sender will then resend the last packet, and it is only after the receiver gets this duplicate packet that the receiver will send the necessary update information. In order to prevent this unfortunate state of affairs from becoming possible it is merely necessary for Ntx to conform to the following relationship:

    Ntx <= ceil( sendspace / MAXPAKTLEN )        (6.2)

where ceil is the ceiling function (i.e., the smallest integer greater than or equal to its argument), sendspace is the total send buffer set size in bytes, and MAXPAKTLEN is the maximum packet size in bytes that LNTP can handle. Note that equation (6.2) can also be expressed as:

    Ntx <= Nb        (6.2.1)

where Nb is the number of send buffers available (this assumes that the client level packet size is MAXPAKTLEN).

6.3.2. Ntr1

Ntr1 is the parameter that determines if and when the receiver enters into the flow-controlled region. Since this change in protocol state is heralded by the issuance of a HOLD packet, it is desirable that the receiver only exceed Ntr1 when it becomes absolutely necessary. The consequence of an unduly low value for Ntr1 is that the sender will continually find itself in the HOLD state for no useful reason and a degradation in throughput will therefore occur.

In order to determine the optimal value for Ntr1 it is useful to consider the thought process that went into determining the optimal value for Ntx. Some reflection will reveal that almost the same formula can be applied to this problem, yielding:

    (Nb - Ntr1) Tdatoc >= Tdn + Tcn + Tdatip + Tco        (6.3)

The only difference is that Nb in this case refers to the number of receive buffers, not send buffers, and Ntx has been replaced by Ntr1. Substituting into the above reveals an optimal Ntr1 value of Nb - 1. A glance at table 6.4 appears to indicate that for the case of the speed of the sender equalling the speed of the receiver there is no real difference in throughput for any of the various values of Ntr1 used.
This result is quite reasonable if one considers that in the case of the sender and receiver being of equal speed, the receiver will rarely, if ever, go into the flow-controlled region, provided the sender and the receiver have the same amount of buffer. Thus, whatever inefficiencies may be experienced due to the unnecessarily low value of Ntr1 will occur sufficiently infrequently that the overall effect will be insignificant.

If we now consider the case where the sender is faster than the receiver we obtain what at first appears to be rather surprising results (table 6.5). It seems that throughput increases with Ntr1 up to, and including, the case where Ntr1 equals Nb. In order to place these results into perspective it is necessary to realize that:

(1) the receiver is often operating in the flow-controlled region, and

(2) the size of the send buffers equals the size of the receive buffers, and

(3) the value of Ntr2 is zero.

The last two points ensure that it is not possible for the receiver to have its receive buffers overflowed by an overactive sender on coming out of the flow-controlled region. Thus, since the possibility of buffer overflow has been eliminated, the only other factor that will have much of an effect is the reverse control packet traffic. Since the amount of reverse control packet traffic generated will be inversely proportional to the value of Ntr1, we find that the experimentally determined optimum value for Ntr1 is Nb.

    Ntr1   Throughput (kbits/s)
    4      1146
    5      1147
    6      1141
    7      1146
    8      1147

    Table 6.4  Throughput vs. Ntr1 (sender speed equals receiver speed)

    Ntr1   Throughput (kbits/s)
    4       950
    5       968
    6       977
    7       994
    8       998

    Table 6.5  Throughput vs. Ntr1 (sender speed exceeds receiver speed)

It would therefore appear that the calculated optimum value for Ntr1 did not apply to either the case where the sender was faster than the receiver, or to the case where the sender and the receiver were of the same speed. In fact, it would probably be necessary to be using a non-zero value for Ntr2 for this optimum value to demonstrate its superiority, since this would allow for the possibility of buffer overflow, which is just what Ntr2 is supposed to prevent. The reasoning behind a zero valued Ntr2 will be explained later. The situation where the receiver speed surpasses the sender speed is a non-issue since Ntr1 will hardly ever be exceeded.

6.3.3. Ntr2

The original purpose of Ntr2 was to provide a hysteresis for when the receiver exits from the flow-controlled region. Lack of such a hysteresis allows for the possibility of the receiver oscillating in and out of the flow-controlled region. Initial experimental observations indicated that for file transfers between two SUN-2s, operating at the same speed, a value 2 less than Ntr1 (i.e., Ntr1 - Ntr2 = 2) appeared to be sufficient. Under these conditions the receiver rarely ventured into the flow-controlled region, and on those occasions that it did, no problems were experienced when it moved out of said region.

Observations were also made with an artificially slowed down receiver process. It was discovered that in this circumstance the previously stated value of Ntr2 was inadequate. What tended to happen was that as soon as the protocol crossed the Ntr2 threshold and issued a RESUME it would find itself inundated with enough DATA packets to overflow its receive buffers.
The receiver would then issue a HOLD and eventually a RETXMIT. This situation was clearly not acceptable as it resulted in many packets having to be retransmitted.

The obvious remedy for this state of affairs is to reduce the value of Ntr2. However, the only value of Ntr2 that guarantees that a fast sender cannot overflow a slow receiver's buffers is zero. But a value of zero should also result in a reduced throughput for the case of the sender and receiver being of equal speed. In order to see how great this hypothesized reduction was, another experiment was performed. This experiment was similar to the one that was used to empirically determine the optimal value of Ntx except that the varying parameter was Ntr2. The results of this experiment (table 6.6) clearly indicate that the impact of a zero value for Ntr2 is negligible. In fact, the throughputs observed for the range of Ntr2 values from 0 to 5 were all within about 1 per cent of the highest throughput rate; a difference which is statistically insignificant. This result is totally consistent with what one would expect after considering the outcome of the experiment involving Ntr1: since the sender and receiver are equally fast, the receiver will hardly ever go into the flow-controlled region, and a poor value for Ntr2 is thus not likely to have much of an impact on throughput.

    Ntr2   Throughput (kbits/s)
    0      1152
    1      1150
    2      1152
    3      1148
    4      1147
    5      1139
    6      1129
    7      1099

    Table 6.6  Throughput vs. Ntr2 (sender speed equals receiver speed)

Repeating the experiment with a slow receiving process yields not totally unexpected results (table 6.7). We note that the non-optimality of a zero value for Ntr2 is now manifesting itself. This is due not only to the fact that the receiver is now spending a significant amount of time in the flow-controlled region, but also that the receiver is not that much slower than the sender. If the receiver was indeed that much slower than the sender, a value of zero would be optimal since it would not be possible for any packets to be consumed between the issuance of a RESUME and the reception of Nb packets, resulting in a receive buffer overflow. (The assumption being that the HOLD that would be sent when Ntr1 was exceeded would not get to the sender in time, since the sender would be transmitting previously buffered packets, whereas Ntr1 was optimized for the case where there was a system call between the transmission of consecutive packets.) However, in the case at hand it is possible for the receiver to consume one packet prior to the reception of the Nb packets, giving us an optimal value of one.
Due to the fact that this is the lowest pr ior i ty possible this process wou ld only r un when no other process wants to. U p o n measuring the transfer rate a value of 1083 K B / s was observed. T h i s represents a decrease of about 9% f rom the op t ima l ly recorded value of 1188 K B / s . T h i s result clearly indicates that machine work load has an appreciable affect on throughput . However, due to the diff icult ies inherent i n work load character izat ion, a more comprehensive s tudy was not done on th is aspect of the protocol 's performance. T h r u p u t vs. N t r2 N t r 2 T h r u p u t (kb i ts /s) 0 1012 1 1053 2 976 3 914 4 881 5 814 6 682 7 508 Tab le 6.7 Throughpu t vs. N t r2 Sender Speed exceeds Receiver Speed 68 As stated above the only N t r 2 value that guarantees that a fast sender cannot ever overflow a slow receiver's buffers is a value of zero. This statement rests on the assumption that the sender's send buffers are equal in size to the receiver's receive buffers. This state can easily be achieved by choosing a single total buffer size for both send and receive buffers and imposing that value on each machine that is part of the L A N . It is only because of the controllable environment that characterizes L A N s that such standardization of parameters is possible. The situation would be quite different in a L H N environment where the chances of getting all the concerned machines to conform to a single set of parameters is rather remote. 6.4. Effect o f C h e c k s u m m i n g o n L N T P ' s Performance A l l of the previous measurements were done with no checksumming being performed by L N T P . The reason for this is that the ethernet tranceivers already perform this operation in hardware rendering the need for L N T P to do so redundant. However, since calculating checksums represents a significant amount of overhead and since T C P / T P does perform these calculations another experiment was run in which L N T P calculated checksums for its packets using the same routines that T C P / I P uses. The results of this experiment were that L N T P experienced a decrease in throughput. The values obtained, using optimized parameters for L N T P , once again showed L N T P demonstrating a greater throughput than T C P / I P , with L N T P recording a throughput of 1046 K B / s versus T C P / I P ' s 931 K B / s . However, if one remembers that previously L N T P was enjoying a 28 per cent superiority in throughput when compared to T C P / I P . the newly measured 12 per cent superiority pales in comparison. 69 U p o n invest igat ion i t was discovered that performing checksumming increases that amount of incoming packet processing wh ich can be a t t r ibuted solely to L N T P f rom 0.64 msec to 1.60 msec for a 1024 byte D A T A packet; an increase of 250 per cent. T h i s s t i l l compares reasonably wel l to T C P / T P ' s processing t ime of 2.06 msec. However , when the socket and ethernet layers ' overhead are inc luded, the packet th roughput rate is only modest ly greater (12%) than T C P / I P ' s . 6.5. C h a p t e r S u m m a r y A s expected increasing buffer sizes beyond a certain value resulted in only marg ina l gains in throughput . T h u s , opt imizat ion of buffer sizes amounted to looking at the observed da ta and determin ing at what buffer sizes the increase in throughput became insigni f icant. A value of 8 K B was eventual ly chosen. 
Ideal ly, i t wou ld have been possible to assign a numer ica l cost for each uni t of buffer used and a numer ica l benefit for each uni t o f transfer rate observed. T h i s wou ld have al lowed the determinat ion of the o p t i m u m buffer sizes that took in to account the var ious needs and l im i ta t ions of the system under s tudy. Op t im i za t i on of the threshold parameters, N t x , N t r l , and N t r 2 was not as s t ra ight forward as opt imiza t ion of the buffer sizes. T h e reasons for this were: (1) th roughput d id not usual ly monotonical ly increase w i th a monotonical ly increasing (or decreasing) threshold paramenter, unl ike the case w i th buffer size, and (2) the o p t i m u m value for a threshold in the case where the sender speed is equal to that of the receiver speed is not necessarily the o p t i m u m value for the case when the sender is faster than the receiver. 70 T h u s , i t was necessary to make some compromises in order to arr ive at a set of parameters that wou ld be used regardless of the relat ive speeds of the sender and the receiver a n d the work load experienced at either machine. It was discovered that N t r l and N t r 2 had l i t t le affect on throughput when the sender and receiver were evenly matched , bu t tha t a wel l chosen N t x could result in a measurable increase in throughput . A formula was developed for calculat ing this o p t i m u m value wh ich exper imenta l evidence appeared to val idate (see table 6.2). Note that op t im iz ing N t x for the matched sender and receiver s i tuat ion makes sense since most L A N s are homogeneous; also, i t is the c i rcumstance where L N T P ' s flow contro l strategy is most l ikely to excel over t rad i t iona l strategies. Since the purpose of N t r l and N t r 2 is to prevent buffer overf low in the case where the sender is faster t han the receiver it seemed logical to opt imize these parameters for such a s i tuat ion. However , the p rob lem quick ly arose that there was at present no way of knowing either the relat ive speeds of the machines in quest ion or the work load they wou ld be experiencing at connect t ime (not to ment ion the p rob lem of a rapid ly va ry ing work load). T h u s , a N t r 2 value of zero was chosen w i t h the purpose of p lanning for the worst , even though such a value meant that L N T P wou ld be degenerating in to a modif ied blast protocol . For tuna te ly , table 6.7 seems to suppor t tha t this is indeed a reasonable choice. It shows that even though a value of zero is not op t ima l , i n th is instance, i t is nevertheless only s l ight ly worse than the op t ima l value but is better than a l l other values of N t r 2 . N t r l , on the other hand , turns out to be not very cr i t ica l given a zero value for N t r 2 and equal send and receive buffer sizes. Since these two condi t ions ensure that 71 buffer overf low cannot rout inely occur, the or ig inal ma in purpose of N t r 2 ( that of tel l ing the sender to stop sending) becomes less impor tant . Tab les 6.4 and 6.5 ind icate th is state of affairs. U l t ima te l y , a value of N b - 1 for N t r l was chosen in the hope tha t i n the case where the sender is only s l ight ly faster than the receiver th is " o p t i m u m " value may have a posit ive affect on throughput . F i n a l l y , i t was discovered that software checksum calculat ions in t roduce a signif icant amoun t of processing overhead. 
Thus, at least part of LNTP's superior throughput rate, when compared with that of TCP/IP, can be attributed to the fact that it does not calculate checksums.

CHAPTER 7

Concluding Remarks

In this, the final chapter, we consider where we have been with LNTP, and where we may take it in the future with regard to possible improvements and further work.

7.1. Summary

This thesis has presented a description of a new LAN protocol called LNTP. Its implementation in 4.2 BSD UNIX running on SUN workstations was examined and various measurements were made. Using these measurements it was possible to tune the protocol and achieve a greater throughput in data transfers than was obtainable with TCP/IP, which was originally designed for LHNs. It was also shown that the significant overhead incurred by the non-LNTP layers limited the throughput gains that LNTP could realize over TCP/IP.

7.2. Possible Improvements

As has been previously stated, one of the changes made to LNTP was replacing the original selective retransmission scheme with the less complicated go-back-n retransmission scheme. This change was justified on the basis that it was compatible with the LNTP philosophy of simplicity without undue loss of efficiency.

Similarly, it could be argued that the parameter Ntr2 could be discarded (i.e., hard-wired to a value of zero) with only a minimal effect on efficiency. Recalling the pertinent tables (Tables 6.6 and 6.7), we see empirical evidence that a zero value is optimal for the situation in which the sender and receiver are of equal speed, and that it is only slightly sub-optimal for the case in which the sender's speed exceeds the receiver's. This suggests that, in the interest of simplicity (the underlying philosophy of LNTP), it would indeed be reasonable to discard this parameter.

A third improvement would be to implement a more efficient technique to get around the lock-out problem discussed in section 4.4. (This was the problem whereby a sender, after receiving a RESUME packet, enters a loop transmitting any buffered packets while ignoring the presence of any queued input packets.) The present solution has the sender check for the existence of any type of packet on the input interrupt queue between sending the buffered packets. Clearly, this is not optimal, as it may result in a sender needlessly terminating its transmissions because of a non-HOLD packet which may not even be destined for the sender in question. A more efficient approach would have the Ethernet layer inspect the packet type and take some appropriate action on discovering a HOLD packet. Such action could merely involve incrementing a counter in the LNTP connection's protocol control block (PCB); a sketch of this bookkeeping is given below. This counter would naturally be decremented by LNTP whenever the input interrupt routine dequeued a HOLD packet. In this manner an LNTP connection could quickly discover whether there are any HOLD packets queued for it. The negative aspects of this proposed solution are two-fold. First, it requires an even greater degree of coupling between the LNTP layer and the Ethernet layer than exists currently, violating the principle of protocol layering for the sake of efficiency. Second, it requires the Ethernet interrupt handler to perform extra processing.
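The following sketch shows the kind of bookkeeping intended, and also makes the interrupt-level cost explicit: the extra work per HOLD packet is a single increment, but locating the right PCB requires a walk of the PCB list. All structure, field, and function names here are hypothetical and are not taken from the actual LNTP or 4.2 BSD sources.

    struct lntp_pcb {
        struct lntp_pcb *lp_next;   /* next PCB on the global list */
        u_short lp_lport;           /* local LNTP port */
        u_short lp_fport;           /* foreign LNTP port */
        /* ... remaining connection state ... */
        short   lp_holdcnt;         /* HOLD packets queued for this connection */
    };

    struct lntp_pcb *lntp_pcbs;     /* head of the PCB list */

    /* Called from the ethernet input interrupt handler when it
     * recognizes a HOLD packet for an established connection. */
    lntp_hold_note(lp)
        struct lntp_pcb *lp;
    {
        lp->lp_holdcnt++;
    }

    /* Called by the LNTP input routine each time it dequeues a HOLD
     * packet, so the counter reflects only what is still queued. */
    lntp_hold_taken(lp)
        struct lntp_pcb *lp;
    {
        if (lp->lp_holdcnt > 0)
            lp->lp_holdcnt--;
    }

    /* Locating the PCB at interrupt level requires a sequential scan
     * of the list, which is the cost discussed below. */
    struct lntp_pcb *
    lntp_pcb_lookup(lport, fport)
        u_short lport, fport;
    {
        register struct lntp_pcb *lp;

        for (lp = lntp_pcbs; lp; lp = lp->lp_next)
            if (lp->lp_lport == lport && lp->lp_fport == fport)
                return (lp);
        return ((struct lntp_pcb *)0);
    }

The sender's transmit loop would then simply test lp_holdcnt between buffered packets instead of scanning the entire input interrupt queue.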
Since this interrupt handler not only executes at a high priority but is also under certain time constraints (buffer overflow, and thus loss of Ethernet packets, is a possible consequence of a slow handler), it makes sense to have it execute as quickly as possible. Thus, any extra processing imposed by LNTP on the Ethernet interrupt handler would have to be kept to a minimum. However, a connection's PCB is found by a sequential search of a linked list of PCBs, which introduces the possibility that, when there is a large number of simultaneous LNTP connections, this search could take a long time. As some of the input data are difficult to obtain, a simulation study may not be appropriate. Thus, it may be necessary to implement this proposal and make the appropriate measurements to determine its feasibility.

7.3. Future Work

The examination of LNTP presented in this thesis was mostly concerned with the protocol's performance in a file transfer mode of operation. This leaves open such future research as an examination of its performance in an interactive mode of operation. In such a circumstance it would be necessary to employ a performance index other than throughput; the most likely candidates are response time and the number of control packets exchanged. Because of LNTP's deferred flow control scheme, it would be expected that LNTP would prove superior to those protocols, such as TCP/IP, which employ a more traditional strategy.

Another possible research project could involve measuring the performance of LNTP under varying workloads. The contrived workload could take the form of non-network-related tasks (such as a compile) as well as multiple LNTP connections engaged in intensive data transfers. The insights gained from these measurements could well prove valuable in refining LNTP.

Lastly, it is still necessary to develop addressing support for the localnet domain. At present LNTP makes use of the available TCP/IP support, necessitating the use of internet addressing. Undoubtedly, because LNTP is not a LHN protocol, the necessary support would be both less complicated and more efficient than that required by TCP/IP.
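As a purely hypothetical illustration of this last point, a localnet address might need to carry little more than an Ethernet station address and an LNTP port. The AF_LOCALNET constant and the field layout below are invented for illustration and do not exist in the current implementation.

    struct sockaddr_ln {
        short   sln_family;         /* hypothetical AF_LOCALNET */
        u_short sln_port;           /* LNTP port */
        u_char  sln_host[6];        /* 48-bit Ethernet station address */
        char    sln_zero[6];        /* pad to sizeof (struct sockaddr) */
    };

Because no routing or internetwork addressing is needed on a single LAN, such a structure would avoid much of the address resolution machinery that internet addressing requires.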
