UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

A hybrid hierarchical request-routing architecture for content internetworking Gan, Ming 2002

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata

Download

Media
831-ubc_2003-0009.pdf [ 3.82MB ]
Metadata
JSON: 831-1.0051732.json
JSON-LD: 831-1.0051732-ld.json
RDF/XML (Pretty): 831-1.0051732-rdf.xml
RDF/JSON: 831-1.0051732-rdf.json
Turtle: 831-1.0051732-turtle.txt
N-Triples: 831-1.0051732-rdf-ntriples.txt
Original Record: 831-1.0051732-source.json
Full Text
831-1.0051732-fulltext.txt
Citation
831-1.0051732.ris

Full Text

A Hybrid Hierarchical Request-Routing Architecture For Content Internetworking by MING GAN M.Sc, Shanghai Jiaotong University, 1998, China B.Eng., Xi'an Jiaotong University, 1995, China A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES (The Department of Computer Science) We accept this thesis as conforming to the required standard THE UNIVERSITY OF BRITISH COLUMBIA July 2002 ©Ming Gan, 2002 In presenting t h i s thesis i n p a r t i a l f u l f i l m e n t of the requirements for an advanced degree at the U n i v e r s i t y of B r i t i s h Columbia, I agree that the L i b r a r y s h a l l make i t f r e e l y a v a i l a b l e for reference and study. I further agree that permission for extensive copying of t h i s thesis for s c h o l a r l y purposes may be granted by the head of my department or by h i s or her representatives. I t i s understood that copying or p u b l i c a t i o n of t h i s thesis for f i n a n c i a l gain s h a l l not be allowed without my written permission. Department of The U n i v e r s i t y of B r i t i s h Columbia Vancouver, Canada Date (Ovhbesr 8, 2^>oZ Abstract Nowadays, the dominant majority of Internet traffic comprises of the distribution of static and media rich content. For efficient content access and delivery, contents from original (authoritative) servers are replicated in geographically distributed (surrogate) servers over the Internet. A content request is routed to an appropriate surrogate server "nearest" to the client in a content network. This thesis proposes a new hybrid request-routing architecture for Content Internetworking. By extending existing IP networking infrastructure to incorporate content intelligence, the new routing platform supports both IP-aware and content-aware services. The salient features of the architecture are two-fold: (i) the hierarchical content request-routing and resolution approach, and (ii) the hybrid rP-address based routing and name based routing. This novel architecture enjoys scalability and performance advantages, while maintaining compatibility with the traditional Internet architecture. A content networking test bed with a complete suite of content routing protocols is implemented to demonstrate the viability of the architecture and framework in terms of scalability and performance. i i Table of Contents Abstract i i Table of Contents i i i List of Tables vi List of Figures vii Acknowledgements viii Chapter 1 Introduction 1 1.1 Objective and Motivation 1 1.2 Background 3 1.2.1 Concept and Organization of Content Network 3 1.2.2 Origin of Content Networking 5 1.2.3 Content Internetworking Architecture 7 1.2.4 Review of Request-Routing Techniques 8 1.2.5 Known Request-Routing Approaches for Content Internetworking 9 1.3 Methodology 11 1.4 Thesis Contributions 14 1.5 Thesis Outline 15 Chapter 2 Content Request-Routing Architecture 16 2.1 System Architecture 16 2.2 Hybrid Routing Platform 19 2.3 Content Network's Internal Request-Routing 21 ii i 2.4 CNGP-Communications Between Content Network and Content Internetworking Gateway 21 2.5 CGIP-Communications Between Content Internetworking Gateways 22 2.6 CGBP-Communications Between Content Internetworking Gateway and Content Border Gateway 23 2.7 CBGP-Communications Between Content Border Gateways 26 2.8 CREP-Content Request Resolution Protocol 28 2.9 Content Routing Metric Specification 31 2.10 Content Route Caching 34 2.11 Distinguishing Handling of Replicated and Non-Replicated Content 36 2.12 Load Balancing Strategy 37 2.13 Deployment 39 Chapter 3 Protocol Specification 41 3.1 CGIP 41 3.1.1 Message Header Format 41 3.1.2 OPEN Message Format 42 3.1.3 UPDATE Message Format 43 3.1.4 KEEP A L I V E Message Format 47 3.1.5 NOTIFICATION Message Format 47 3.2 CBGP 48 3.2.1 UPDATE Message Format 48 3.3 CGBP 50 3.3.1 Message Header Format 50 3.3.2 OPEN Message Format 50 3.3.3 UPDATE Message Format 51 3.3.4 KEEP A L I V E Message Format 52 3.3.5 NOTIFICATION Message Format 53 3.4 CNGP 53 3.5 CREP 53 3.5.1 CREP DNS Query and Response 53 3.5.2 CREP Content Query and Response 53 iv Chapter 4 Simulation and Performance Analysis 55 4.1 Multi-threaded Prototype Implementation 55 4.1.1 Multi-threaded System Prototype 55 4.1.2 Protocol Prototype Configuration 57 4.2 Server Load Measurement and Routing Updates 59 4.2.1 Server Load Measurement 59 4.2.2 Request-Routing Agent and Routing Updates 60 4.3 Content Routing Decision Process and Routing Table 61 4.3.1 Content Routing Decision Process and Design of Routing Table 61 4.3.2 Implementation of Content Routing Table 62 4.4 Simulation 64 4.4.1 Simulation Network Topology 64 4.4.2 Testing Environment Configuration 65 4.5 Scalability and Performance Analysis 67 4.5.1 Scalability with Respect to Content 67 4.5.2 Scalability with Respect to Networks 68 4.5.3 Scalability with Respect to End-users 69 Chapter 5 Conclusions and Future Work 70 5.1 Conclusions 70 5.2 Future Work 76 Bibliography 78 Appendix A Abbreviations 80 v List of Tables 3.1 CGIP Message Header Format 41 3.2 CGIP OPEN Message Format 42 3.3 CGIP UPDATE Message Format 44 3.4 Withdrawn Content Routes Field Format 44 3.5 Advertised Content Routes Field Format 45 3.6 Content Name 2-Tuple Formats....... .-.".V. . 1 . . . ; 45 3.7 Content Route 4-Tuple Format 46 3.8 Attribute Type Format • 46 3.9 CGIP NOTIFICATION Message Format 47 3.10 CONTENT Attribute Format 49 3.11 CONTENT Attribute Flags Field Layout 49 3.12 CONTENT Attribute Value Field Format 49 3.13 CGBP OPEN Message Format 51 3.14 CGBP UPDATE Message Format 52 3.15 CREP Content Query/Response Message Format 54 4.1 Content Routing Table of CIG 61 4.2 Content Routing Table of C B G 62 4.3 Content Route Cache of CIG 62 4.4 Simulation Network Configuration 65 vi List of Figures 1.1 Distribution of Content Replicas in a Content Network 2 1.2 Un-interoperable Distinct Content Networks 6 1.3 Interoperable Distinct Content Networks 6 1.4 Logical Content Routing Tree 10 2.1 Content Request-Routing System Architecture 16 2.2 Traditional DNS Approach 28 2.3 Request-Routing System in Resolution Chain 29 2.4 Control Flow of Content Resolution Process 31 2.5 Content Routing Knowledge Situation on CIG and C B G 34 4.1 System Prototype Configuration 56 4.2 Implementation of Content Routing Table Data Structure .63 4.3 Simulation Network Topology 64 vii Acknowledgements I would like to express my gratitude to my supervisor, Dr. Son Vuong, for his essential guidance, inspiration and suggestions for this thesis. I would also like to thank Dr. Alan Wagner for his time and comments on this thesis work and suggestions for improvement. I would also like to thank my fellow graduate student Xiaojuan Cai for her inspiration and friendship. The financial support from the Computer Science Department at UBC in the forms of TA and RA as well as the partial support from the CITR Network of Centres of Excellence for my involvement in the SCE project are gratefully acknowledged. Finally, I would like to thank my family for giving valuable encouragement and support during my years of graduate study. MING GAN Department of Computer Science The University of British Columbia August 2002 viii Chapter 1 Introduction 1.1 Objective and Motivation This decade has seen the explosive growth of the Internet. The Internet has become a part of our daily life, moreover, is fast evolving into the world's economic infrastructure [1]. Some measurements indicate that 70 to 80 percent of wide-area Internet traffic is HTTP traffic, and much of the remainder consists of RealAudio streams and DNS [14]. From this, we can conclude that the primary Internet traffic is the distribution of content-plain and media rich content, and ancillary traffic to locate content. Millions of clients are accessing content on thousands of Web sites on a daily basis; enterprises heavily invest in e-Business for better profit margins; service providers promote content-based services, such as Web hosting, application hosting, content delivery, application infrastructure services, and broadband streaming services, to attract new customers and maintain customer loyalty. As a result, the volume of content on the Internet increases exponentially [2], and traffic delivering all kinds of content vigorously strives for bandwidth resources. The development of the Internet depends on facing the challenges of poor service performance, slow response time, low content availability, and huge bandwidth cost. 1. Since the Internet is not structured to meet scalable increases in needs, when the number of users accessing content on the Internet grows, it becomes increasingly difficult to provide a high level of content availability and rapid response from a single location on the network. A solution to this problem is to mirror content at multiple sites across the network and bring content closer to potential users to improve response time and content availability, and reduce bandwidth usage and cost. This is the origin of Content Networking. In order to organize and efficiently distribute replicated content to clients, the concept of Content Network (CN), or Content Delivery Network (CDN), is raised. A C N is a logical collection of software, hardware and data, which is under the management of a single set of administrative policies and techniques. Content are distributed and replicated on surrogate servers across a C N , and the end-user's requests for specific content are redirected to suitable surrogates that can best service these requests. Figure 1.1 shows the content distribution situation in a typical Content Network, with two surrogate servers for content A , one surrogate server for each of content B, D and E, and an original authoritative server for content C. However, each C N has its own limit of geographic scope. authoritative server surrogate server Figure 1.1: Distribution of Content Replicas in a Content Network A critical technology for the widespread adoption of content delivery services, Content Internetworking (CI), or Content Peering (CP), enables the CNs of multiple service providers to work in cooperation. The key element of CI is the deployment of the Content Interconnection Gateway (CIG), for interconnecting CNs in order to extend the reach of an individual C N . CPs Request-Routing (RR) system redirects a client's name request to an appropriate content server that is "closest" to the client, in terms of physical proximity and network delay, and 2 server response time. RR between multiple data centers or points of presence (POPs) ensures the fastest delivery of content regardless of location, providing the highest availability and site response. To our knowledge, main request-routing approaches for Content Internetworking can be categorized into two directions: an application-layer approach, represented by the Internet drafts [8] of the IETF CDI working group, and the pure name-based routing approach, represented by the Triad project [14, 15] of Stanford University. Unfortunately, none of these approaches provides an elegant solution to meet the requirements and challenges of scalability and performance, along with the scale of content, end-users and networks in Content Internetworking scenarios. This thesis aims to address the following problems and concerns in a content request-routing system for Content Internetworking: • System scalability along with the scale of content, end-users, networks, and services; • Routing and resolution performance; • Duplicate routing infrastructures at the network layer and the application layer; and • Difficulty of smooth and incremental deployment. Given the existing mature IP routing platform and network infrastructures, we aim to integrate content routing intelligence and facilities with IP routing infrastructures, and to build a content Request-routing architecture, which can scale gracefully, and be deployed incrementally. 1.2 Background 1.2.1 Concept and Organization of Content Network In recent couple of years, Content Networking has become a hot topic. Deemed as an opportunity for huge market profits, Content Networking techniques are becoming the focus of intensive research. A Content Network (CN), also called 3 the Content Distribution Network (CDN) , or Content Delivery Network, is a virtual network defined and implemented to meet today's stringent "return on investment" requirements. In Cisco 's definition, "content delivery network combines traditional routing and switching intelligence with content-aware technology at service provider distribution points or enterprise data centers. Cisco views a content delivery network as a system that requires a number of co-operating, content-aware network devices working in concert to distribute content closer to users and locate that content when requested" [1] . Content Networking technology is expected to solve scalability, availability, reliability and security problems in existing Internet architecture, and bring networking technologies into the next generation. "We're not talking about a small incremental advancement in service networking - we're talking about a titanic shift in how we view services in the next decade" [2]. Though Content Network techniques are not standardized, a C N is comprised principally of four core system elements: the Request-Routing System, the Distribution System, the Accounting System, and Surrogates [4, 5]. • Content Distribution System distributes and delivers content injected by the origin across a network. There are two common distribution strategies of content: push and pull. Content that is pre-populated within a C N is pro-actively injected, which we call "push", while content that is to be delivered on demand is injected at the time of the object is requested for delivery, which we call "pul l " . Content cache is also an important ingredient of a Content Distribution System. • Content Request-Routing directs an end user's request for a particular content to, among multiple sites with mirrored content, the "best" or "closest" site, based on a set of metrics on the server, and network variations, such as server and network health, server load, Web response time, network delay, topology, and content availability, and a set of user-defined content specific policies, 4 such as location of content, thus enabling the accelerated, or optimal, delivery of Web static content and streaming media. It performs a dynamic routing decision and binding process that maps a DNS U R L name to an LP address of the best site where copy of the requested content resides. • Content Accounting System is responsible for aggregating and distilling accounting information into statistics and content detailed records for use by the origin and billing organizations. • Surrogates are responsible for delivering requested content to the client, and sending accounting information for delivered content to the Accounting System. Many suppliers, for example, Cisco, Nortel, 3Com, Akamai, Inktomi, CacheFlow, and F5, have already announced the launch of commercial Content Networking products. The market for Content Networking products is reported growing rapidly, and the revenues for content-based services are expected to exceed revenues for access services in the coming few years. 1.2.2 Origin of Content Networking A Content Network efficiently organizes content replicas to serve a content request with the best content surrogates that are closer to clients. However, each individual C N has its reachable geographic limit. No individual C N can span all of the geographies and networks across the Internet. Likewise, a given C N cannot take advantage of all the value added functionalities, including speed, security, quality-of-service, on-premise delivery, and other features that individual service providers may implement. Figure 1.2 explains the disadvantage of the lack of interoperability of individual CNs. CN-1 has a replica of content A , B, C, D and E, and CN-2 has a replica of content B, C, D and F. When A client in C N - 2 sitting on the edge of CN-2 , requires for content A , for which CN-1 has a replica that is very close to that client, due to the lack of interoperability between CN-1 and C N -5 2, the request has to be directed by the DNS chain to the authoritative server of content A , which may locate very far from that client. •>Q authoritative server of A DNS Figure 1.2: Un-interoperable Distinct Content Networks Only by interconnecting individual CNs, can the reach and scale of CNs be extended, and the benefits of Content Networking techniques be completely deployed across the globe. That is the why the concept of Content Internetworking (CI) is developed. Content Internetworking allows individual CNs to interoperate, ensuring optimal performance by delivering content from resources located close to, physically or logically, the requesting users. With Content Internetworking, a Web site owner can work with their preferred hosting service provider, but gain the reach of combined peering Content Networks. Figure 1.3 explains the advantage of building interoperability between distinguished CNs. The name request of the client of CN-2 for content A can be directed to CN-1, and in particular, to be serviced by the surrogate of content A which is closest to that client. Compared to the approach in Figure 1.2, this approach significantly reduces response time by saving round-trips to the DNS server and the authoritative server of content A . DNS > Q authoritative server of A Figure 1.3: Interoperable Distinct Content Networks 6 1.2.3 Content Internetworking Architecture Initiated by Cisco Systems, the Content Alliance [3], initiated in August 2000, was formed to foster interoperability of Content Networks. With more than 100 members of service providers, technology providers and content providers, Content Alliance aims to speed up the development of open standards for the advancement of content networking and expand the market for content networking services. Proposed in the Internet drafts [7] of Content Alliance, three principal architectural elements that constitute the building blocks of the core Content Internetworking system are the Request-Routing Peering System, Distribution Peering System, and Accounting Peering System. The key idea of Content Internetworking is a peering relationship between CNs. A CN's direct C N peer does not have to be the direct neighbor in the IP layer. • Distribution Peering System [7, 9] distributes and delivers content between CN's internal Distribution System. As CN's internal Distribution System, two distribution modes and strategies are defined: push and pull. • Request-Routing Peering System [7, 8] directs end user's content request to a suitable, hopefully the "best" or "closest", surrogate in a peering C N . Request-Routing Peering Systems interact with each other to exchange content routing information to map a DNS U R L name into an IP address of the best surrogate in a peering C N . The content routing decision and binding process are based on similar user-defined content-specific policies and set of metrics as an internal Request-Routing System of a C N , possibly including network proximity, bandwidth availability, server health and load, content availability, network response delay, and so forth. 7 • Accounting Peering System [7, 10] aggregates and distills accounting information into statistics and propagates that information between CN's internal Accounting Systems. 1.2.4 Review of Request-Routing Techniques This research in this thesis focuses on architecture and protocols for content request-routing peering system in Content Internetworking. Before we delve into Content Request-Routing Peering, we briefly review C N content request-routing techniques first. The CN's request-routing system directs user requests for objects to a surrogate that could best serve that content. In principle, request-routing techniques can be classified under: DNS Request-Routing, Transport-layer Request Routing, and Application-layer Request Routing [6]. DNS based request-routing techniques have one or multiple specialized DNS servers inserted into the DNS resolution process. The most common mechanisms used in DNS based request-routing is the use of NS and C N A M E records. The transport-layer request-routing system inspects the information available in the first packet of the client's request to make surrogate selection decisions. The inspection of the client's requests provides data about the client's IP address, port information, and Layer 4 protocol. The acquired data can be used in combination with user-defined policies and other metrics to determine the selection of a surrogate that is best. suited to serve the request. The application-layer request-routing system performs a deeper examination of a client's packets beyond the transport layer header. Common application-layer request-routing approaches are: URL-Based Request-Routing, in which a U R L consists of a naming scheme, followed by a string whose format is a function of the naming scheme, and this information is sufficient to disambiguate the content and suitably direct the request; Redirection, in which the client's request is first resolved to a virtual surrogate, and consequently, the surrogate returns an application-specific code to redirect the client to the actual delivery node; In-Path 8 Element, in which an In-Path element is present in the network in the forwarding path of the client's request and provides transparent interception of the transport connection and performs routing decisions. Request-Routing Peering systems are essentially needed to enable interconnection of CNs. The fundamental function of the Request-Routing Peering system is to interconnect and exchange routing information with each other to route requests of an available copy of content between CNs based on policies and metrics. In the picture of the interconnection of Request-Routing Peering Systems, CN's internal topologies and request-routing schemes are hidden from the rest of the interconnected CNs, and CNs are considered to be black boxes. 1.2.5 Known Request-Routing Approaches for Content Internetworking Content Request-Routing has become an active research area in recent years. To our knowledge, two representative content routing approaches for Content Internetworking are: the Internet drafts [7, 8] produced by IETF CDI working group, and Stanford's Triad project [14, 15]. IETF CDI working group's Internet drafts [7, 8] define Request-Routing Peering infrastructure at the application layer. This approach requires all CIGs of specific content to maintain a logical routing and distribution tree. In particular, all name requests for specific content are first sent to the authoritative CIG of that content, which is globally centralized, for request resolution. The authoritative CIG's Request-Routing Peering system then forwards the name requests to the Request-Routing Peering system of an appropriate CIG peer that is considered best able to service the request, based on its knowledge of neighboring request-routing peers in its logical routing tree. If the selected CIG peer decides that a surrogate in its distribution domain is the best server to serve a request, it injects the request into its internal Request-Routing system to further resolve the request; otherwise, it continues to forward the request to an appropriate CIG peer on its 9 logical tree. The process repeats until the request is finally resolved. Figure 1.4 depicts the content routing tree of this approach. The arrows show the path of the request traversed. authoritative CIG CIG peer Figure 1.4: Logical Content Routing Tree The main disadvantage of the approach suggested by IETF CDI working group's Internet drafts is the cost of maintaining per content logic trees, whose roots are the authoritative CIGs. Meantime, the presence of globally centralized authoritative CIGs in the request resolution path implies weak system scalability, fault tolerance, flexibility and reliability. Central nodes are the bottleneck of performance and single points of failure, and make the system difficult to scale. Another disadvantage of this approach is the inefficiency of metric measurements. Client proximity and network delay are two of the major essential metrics in content routing in order to precisely reflect the degree of preference in routing decision processes. Network proximity is a built-in element in network layer IP routing infrastructure, and network delay measurement can be extracted from TCP packets. However, at the application layer, some additional mechanisms have to be deployed to acquire such proximity measurements, for example, to let each surrogate periodically ping potential clients. Though applications such as ping can precisely reflect and indicate a network delay and congestion situation, the huge traffic overhead imposed onto networks render them only feasible and practicable for small-scale Content Networks, not Content Internetworking scenarios. 10 Stanford's Triad project [14, 15] proposes an Internet routing architecture that is completely based on name addresses, instead of IP addresses, of nodes. In this approach, everything is referred to by its name, and IP is no longer the fundamental and exclusive routing scheme in the Internet. Though it is ideal to have one uniform routing protocol on the Internet enabling both IP routing and content routing, the main disadvantage of the Triad approach is the inefficiency of name aggregation and processing. Aggregation is an important technique for achieving system scalability for wide area networks. For example, efficient aggregation contributes greatly to BGP's scalability. However, the inherent drawback of different name lengths and the lack of standard naming rules make name aggregation extremely difficult and inefficient. Moreover, there is a concern of compatibility and the inter-operability problem of existing IP products, applications and techniques. This thesis aims to build a request-routing solution for Content Internetworking, which is scalable, and avoids duplicating and wasting existing IP network infrastructures. 1.3 Methodology This thesis proposes an innovative routing architecture, which naturally extends the existing IP networking infrastructure to incorporate content networking and content-^aware services to the Internet. By combining content routing into IP routing infrastructure to enable the communication of content state, the traditional IP network is leveraged with content intelligence to meet the increasing requirements for content delivery and content-based services. Considering the routing nature of content routing and clean layering architecture, and in order to take advantage of valuable built-in information and facilities in IP routing, the proposed approach delegates content routing from the application layer to the network layer. The existing IP routing infrastructure, which is IP-address based, is extended to accommodate content routing, which is 11 name-based. IP and content routing share routing-related information, and function in parallel. The new routing platform supports both IP-based and content-based services to achieve maximum return on investment on infrastructures. The features of the proposed approach are: a hybrid routing platform supporting both IP and content routing, a hierarchical routing architecture and hierarchical content request routing and resolution, a distributed system, a name plus IP dual address scheme, and the distinguishing of handling of content with replica and content without replica. These features lend the approach to good scalability and performance. The new routing architecture is highly.hierarchical. Content routing occurs at three levels: the C N internal router level, the Content Interconnection Gateway (CIG) level, and the Border Gateway level, from lowest to highest, respectively. C N internal router level content routing deals with request routing within one C N , and is considered as the black box in Content Internetworking. Content Interconnection Gateway level content routing deals with Request-Routing between C N peers within one administration domain. Border Gateway level content routing deals with request routing between administration domain peers. By constructing the hierarchical content interconnecting routing architecture, the reach and services of individual CNs are extended across the globe to achieve maximum services flexibility and profits, and effectively address the problems of system scalability, flexibility and reliability, service performance, and content availability. With appropriate local and global load balancing strategies and algorithms, content traffic can be further optimized for maximum performance. The hierarchical architecture enables hierarchical content request routing and hierarchical content request resolution, the key features of our approach. This approach deploys a distributed system design, eliminating globally centralized nodes, which are the potential bottlenecks for performance and scalability, from the picture of Content Internetworking Request-Routing. Unlike the request-routing approach proposed in IETF CDI working group's Internet 12 drafts, authoritative servers are no longer essentially needed for resolving content requests, in contrast, authoritative servers are treated equally as other surrogates in routing decision and resolution processes. By adopting the distributed system design approach, the request-routing system naturally acquires all the nice features of a distributed system - good scalability, fault tolerance, flexibility and reliability. One innovative feature of the approach is the addressing scheme-name plus IP dual addresses. One particular piece of content on the Internet can be uniquely identified by the content name and IP address of an authoritative server or surrogate where the content resides. The reasons for adopting this dual address scheme are first, that content routing aims to direct requests for content identified by name, to the best surrogates, through a binding process which maps the content request to the IP address of surrogate. Therefore, there is a natural and inherent relationship between name and IP address in content routing. Second, as discussed above, we want to integrate content routing with IP routing to make use of the existing valuable IP infrastructures. By bringing the fundamental addressing scheme of IP networking-IP address-into the content routing addressing scheme, we establish a bridge between IP and content networking. In addition to these features, load balancing and content route caching strategies and algorithms are applied as additional intelligence onto the basic request-routing system for further optimization of content routing and resolution processes. Dynamic local and global load-balancing mechanisms are integrated into the basic routing algorithms and protocols. We developed four load-balancing algorithms, including Random Selection, Round Robin, Dynamic Equality, and Possibility Bucket algorithms. The Possibility Bucket algorithm bases its decision on both the content's historical access frequency and server-side metrics, and experiments demonstrated its capability in adjusting and distributing traffic efficiently and smoothly. 13 The approach deploys caching mechanisms to optimize content accessing performance. Besides real content caching, content routing information is also cached at different routing levels. The Time To Live (TTL) value and refreshing rate of the cached routing information depends on factors such as the stability of surrogates' load, network delay, and so on. Like other caching mechanisms, content routing information caching is a tradeoff between service response performance and information freshness and preciseness. In this thesis, where the terminologies of Content Request-Routing, Distribution, and Accounting are used, except where the author makes explicit specification, Content Request-Routing Peering, Distribution Peering, and Accounting Peering, respectively, instead of CN's internal Request-Routing, Distribution and Accounting, are referred. 1.4 Thesis Contributions This thesis explores the situations and existing approaches of content routing in Content Internetworking, and based on the research, proposed an innovative Request-Routing architecture for Content Internetworking. Based on the architecture, a complete suite of content Request-Routing protocols is developed. One of the most important features of the proposed approach is the hybrid routing architecture, in which the approach delegates content routing to network layer, and integrates content intelligence with IP intelligence by naturally extending the existing IP infrastructure. The new routing architecture is hierarchical and compatible with traditional Internet architecture, with components distributed at CN's internal request-routing level, Content Network level, and Autonomous System level. These unique features provide our approach with good scalability and performance, as well as an incremental deployment path, which distinguishes our research from other approaches. A complete set of request-routing and request-resolution protocols, including CNGP, CGIP, CGBP, CBGP, and CREP are specified based on the 14 architecture. The prototypes of these protocols are implemented with C on Linux RedHat 7.2. With minor changes, the system can be migrated onto Unix Solaris. The implementation was IPv4/v6 compatible. A simulation test bed is established to evaluate the feasibility and performance of the approach. The test bed is composed of 14 Linux and 3 Windows NT machines, interconnected through Bay450 switch with V L A N capability, and connected to the outside world via Firewall and NAT. The network is configured with IP routing capabilities. CN's internal request-routing system, including server status daemon, is implemented to act as the engine of routing updates. Apache Web server is installed to simulate the content host, which hosts text, video, and audio content. The client program is developed to generate name resolution requests. Netscape and Internet Explorer are also installed to simulate end-user accessing content on the Web. Besides, Passler's Webserver stress tool [17] and OpenLoad Web Stress Tool [18] are used to simulate multiple users requesting for content simultaneously, randomly, or at a predefined pattern. Experiments are conducted to prove and evaluate the system's feasibility and capability to scale with the increase in the number of content, networks, and end-users. 1.5 Thesis Outline The rest of this thesis is organized as follows. Chapter 2 introduces the content request-routing architecture and protocols for Content Internetworking. First we give an overview of the system architecture. Then we describe the complete suite of content request routing protocols built on the routing architecture. Next, we discuss the features of the proposed approach, and its deployment. Chapter 3 provides the simulation and performance evaluation. First we introduce the system prototype and its development, then the overview of simulation, including experiment environment and methodologies, and then scalability and performance evaluation and analysis. Finally, in Chapter 4, we summarize the work accomplished and suggest future work that extends the results of this thesis. 15 Chapter 2 Content Request-Routing Architecture 2.1 System Architecture The new routing architecture integrates content routing with traditional IP routing in a hierarchical way, as shown in Figure 2.1. AS-n CBGP AS-m — — — — — — Figure 2.1: Content Request-Routing System Architecture The routing architecture consists of three levels of content routing mechanisms: the C N internal request-routing level, the C N request-routing level, and the AS request-routing level, named from lowest to highest. A Content Network (CN) functions under a single set of content management policies, normally using content routing and distribution solutions 16 and products from one single supplier. Routing units (for example, content routers) within one C N construct the lowest level of Content Internetworking-CN's internal request-routing level. Each C N has its physical span limitation. In order for the clients in one C N to access and use content and services provided by other CNs, CNs need to be able to connect to and communicate with each other. However, since CNs run on different content Request-Routing platforms, we need a standard language to inter-connect CNs, which is where "Gateway" is usually deployed. In a Content Internetworking scenario, the infrastructure interconnecting heterogeneous CNs is called a Content Internetworking Gateway (CIG). On one hand, a CIG extends the physical reachability of an individual C N ; on the other hand, a CIG hides behind the difference and complexity of a C N , allowing for the maximum flexibility of choice of CN's internal Request-Routing schemes. A C N interconnection, with only CIGs visible to each other, constructs the second level of Content Internetworking's routing architecture-CN Request-Routing level. Internet's fundamental element, Autonomous System (AS), is applied here to the scenarios of Content Internetworking to build a fully hierarchical content routing architecture. ASes are interconnected by border gateways, which are originally only IP aware. In our approach, Border Gateways are upgraded with content intelligence to support content routing. To differentiate from the original Border Gateway, we name the upgraded one the Content Border Gateway. Each AS may contain one or more CNs, and the Content Border Gateway hides the complexity of AS's internal structure and schemes from the outside world. AS interconnection, thereby, constitutes the highest level of Content Internetworking's routing architecture-AS Request-Routing level. To organize the C N interconnection hierarchically into the AS interconnection, we assume that each individual C N is confined within the boundary of one single administration domain-Autonomous System. This assumption is reasonable because each C N is under the administration of a single set of policies. It also confirms the assumption of the IETF CDI Internet draft [7, 17 8]. Under this presumption, there might be zero, one or multiple CNs within one AS, depending on the needs and configuration of the service providers. Each C N may install one or multiple CIGs. Similarly, each AS may install one or multiple Content Border Gateways (CBGs). The situations of a multi-homed CIG and Content Border Gateway are not this thesis' focus. Based on the architecture, content Request-Routing communications are structured in a hierarchical way, as follows: • Content Network (CN) 's internal Request-Routing. • Inter-CN Request-Routing, or Content Internetworking Gateway (CIG) intercommunication. Inter-CN request routing, or CIG intercommunication deals with routing communication and co-operation between C N peers within one administration domain-Autonomous System (AS). CGIP is developed to address the communications between CIGs. • Communication between CN's internal request-routing system and CIG. Intercommunication between CN's internal Request-Routing system and CIG deals with communications and routing requirements between CN's internal system level and C N level, and represents external requirements of our Request-Routing approach. CGNP is developed to address the communications between CN's internal request-routing system and CIG. • Inter-Autonomous System (AS) Request-Routing, or Content Border Gateway (CBG) intercommunication. Inter-AS request routing, or C B G intercommunication deals with routing communication and co-operation between AS peers. CBGP is developed to address the communications between CBGs. • Communication between CIG and CBG 18 Intercommunication between a CIG and C B G deals with routing requirements between the C N level and AS level. CGBP is developed to address communications between the CIG and CBG. • Content Request Resolution Process Communications for handling content name request resolution, which involves the content internal request-routing system, CIG and C B G . CREP is developed to address the name resolution communications requirements. 2.2 Hybrid Routing Platform The architecture deploys content routing at the network layer, and extends the existing network layer IP routing infrastructure to build a common routing platform that supports both IP and content routing. Content routing for Content Internetworking is a routing issue by nature, and shares many features with IP routing. Both IP and content routing aim to find the "best" or "shortest" path among multiple possible paths to a destination and redirects user requests to that resource to achieve the best performance. The only obvious difference is, for IP routing, the destination is a node on Internet, identified by its IP address; for content routing, the destination is a piece of content on Internet, identified by its name. To identify the "best" or "closest" target, metrics essentially need to be defined and referred to in the routing decision process. For Content Internetworking, important metrics normally involve physical proximity, server load, server response time, link congestion situation, local preference, and so on. From the standpoint of the metric definition and calculation, many same measurements and concerns apply to both IP and content routing. Therefore, it is efficient to bridge these two routing mechanisms together. The author believes instead, of being defined and implemented in the application layer, content routing can, and should be, delegated to the network 19 layer, and combined with IP routing. First, the observation that content routing is in nature, a routing issue, leads us to the network layer where the routing infrastructures traditionally sit to look for content routing solutions. Second, since well-defined and mature network layer routing techniques, schemes and infrastructures for IP routing already exist, it is more efficient for content routing to share with IP routing a common routing platform and infrastructures, rather than establishing separate facilities for content routing from scratch. For example, content routing needs the measurement of the physical proximity from a particular piece of content for existing and potential clients to make routing decisions. In BGP, the physical proximity measurement is a built-in character, similar to the AS path. Compared to the approaches of using application layer ad-hoc techniques, such as ping, to get the physical proximity information, or imposing an extra network layer scheme to measure the physical proximity, it is much more efficient to make use of the already built-in physical proximity measurement in IP routing. Third, it is not only the efficiency concern that pushes us to a network layer content routing solution, but also the network architecture consideration. To define and implement another set of routing infrastructures at the application layer is a duplication of network layer routing infrastructures and facilities, further, it is a violation of clean layering architecture. With the idea of incorporating content routing with IP routing in mind, we start to explore the existing Internet routing architecture for a suitable joint-point. Since AS is the basic component in Border Gateway Protocol (BGP), naturally we ask ourselves: "Can content routing build on the AS platform, and if the answer is yes, what are the benefits?" The fact that each Content Network is usually administered under a single set of policies, and therefore confined within one single AS, makes it possible to organize the C N interconnection further into the AS interconnection. The advantages of this hierarchical and hybrid architecture strategy are three-fold: first, content internetworking routing architecture builds on, and makes use of, existing AS infrastructures, including the AS definition, AS 20 division and AS management; second, it paves the way for the future deployment; and third, having the same building block in pictures, content routing can conveniently fit into the BGP framework to construct a common routing platform for both IP and content routing. 2.3 Content Network's Internal Request-Routing At the lowest level, CN's internal Request-Routing system functions as the engine for the whole Request-Routing system of Content Internetworking. Depending on the design and implementation of CN's internal routing system, routing information may be generated in different ways. For example, distributed content routers or a centralized content router may interact with surrogates, or agents sitting on surrogates, to collect content state information (content availability, and content type-static or streaming), server state information (server health, server load including C P U load, memory usage, swap load and number of processes, and server load variance and stability), as well as network state information (network and link health, network bandwidth usage and availability, and network congestion). Based on these metrics of content, and server and network variations, together with content-specific local policies, the content router accordingly generates content routing information. In the distributed design approach, the content routers within one C N exchange content routing information to keep the routing table refreshed and consistent. In the scenarios of Content Internetworking, CN's internal request-routing system is considered to be a black box, and hence, not of our research interest. 2.4 CNGP-Communications between Content Network and Content Internetworking Gateway To extend the reach of an individual C N , the CN's internal routing system needs to communicate with its respective CIG. CNGP (Content Network and content internetworking Gateway interconnection Protocol) addresses the routing 21 requirements and communication between CN's internal request-routing system and CIG. CN's internal content routing system communicates with CIG, the interface to the outside content routing system, to inject content routing information about content residing in the local C N into the outside world via the CIG. The routing updates should include content name, type and availability, IP address of content surrogate, server load, and effective time of routing information. Hence, a CIG has the knowledge of all the content that resides in the respective C N and their routing states. For example, if yahoo has two copies on two surrogates in one C N , the CIG is aware of the IP addresses of these two yahoo copies, and their dynamic states, such as content availability, surrogate health and load state, network and link health and delay, and so forth. 2.5 CGIP-Communications Between Content Internetworking Gateways Communications between Content Internetworking Gateways (CIGs) are needed when there are more than one CNs in one AS, in order for the CIGs in the same AS to keep track of the health of each other and exchange routing information to keep, the routing tables up-to-date. CGIP (Content internetworking Gateway Interconnection Protocol) is developed to address the routing requirements and communications between the CIGs in one AS. A CIG distributes its knowledge about the content in its respective C N to its direct neighboring CIG peers within its own distribution domain-the local AS, which continues forwarding the content advertisements to their CIG peers, until finally all the CIGs in the local AS acquire that knowledge. Thus, through inter-communication based on the CIG peering relationship, a CIG holds knowledge of all the content that resides in the local AS, including the content in its local C N , and content in foreign CNs within the local AS. In above example, one CIG has the knowledge of two "yahoo" copies residing in its local C N , and through CIG inter-communication, it learns about 22 three other yahoo copies that reside on some remote CNs in the local AS. Based on its local policies and a set of metrics, the CIG can make routing decisions about "yahoo" content, and keep the best N (N is implementation dependent) routes for "yahoo" content in its routing table. The routing information exchange and routing decision processes are dynamic, as the metrics and content availability situations vary all the time, and CN's internal routing system keeps injecting changes into the CIG. Accordingly, the CIG dynamically makes routing decisions in an effort to reflect the real content situation at that moment, as precisely as possible. Once a CIG finds that its routing table has changed, it propagates the content routing updates to its direct CIG peers. Upon receiving content updates from a CIG peer, the CIG compares the updates to related content routes in its own RIB to decide whether a modification to its routing table is needed. If the routing table is updated, the CIG further propagates the updates to its direct CIG peers, except for the sender CIG of the updates; otherwise the CIG ignores the updates. In this fashion, content updates are saturated throughout the CIGs in one AS. Loop protection mechanisms are deployed to avoid advertisement loop. CGIP is specified to address the routing requirements and communications between CIG peers in one AS. In the case of multiple CIGs in one C N , these special CIG peers need to communicate with each other to keep a consistent view of content routing state of the respective C N . 2.6 CGBP-Communications Between Content Internetworking Gateway and Content Border Gateway To extend the reach of the content and content services within an individual AS to the outside distribution domains, communication between a CIG and its respective C B G is required. CGBP (Content internetworking Gateway and content Border gateway interconnection Protocol) is developed to address the routing requirements and define the communication between a CIG and C B G . 23 A CIG injects its knowledge of the content routing states into the outside world, via its connected CBG, the interface of one AS domain to the outside AS domains. Initially at the startup phase, a CIG sends its whole routing table to its connected C B G ; afterwards, the CIG propagates the routing updates incrementally to the C B G . Hence a C B G has the knowledge of all the content that resides in the respective AS. Based on its content-specific local policies and metrics, a C B G can make routing decisions for a particular content. For example, CIG-1 advertises "google.com" with an overall server-side metric of 690, CIG-2 advertises "google.com" with an overall server-side metric of 518, but with bigger network proximity (e.g., more hops). We assume the higher the metric, the less preferable the route. The C B G may choose CIG-2's advertised route over CIG-1 's if it puts less weight on the server-side metric, or the opposite if it puts more weight on the network proximity. Each CIG in one AS can be initiated to inject content routing information into C B G . Having comprehensive content routing information from all the CIGs in its AS, the C B G can make routing decisions that are the most precise. However, since the CIGs in one AS have a dynamically consistent view of the content state in that AS, the content routing state information the CIGs propagate to the C B G is highly repetitive. In addition, we do not want to overwhelm CBGs, which are normally heavy loaded nodes, and hence most likely to be the bottle-necks in Internet routing, with too much similar information for pursuing precision. Therefore, we can select only one CIG as the representative of all the CIGs in that AS to communicate with the C B G to reduce the protocol traffic overhead. The choice of the CIG Representative Selection Algorithm is also a tradeoff between the traffic and computation overhead, and the system flexibility and reliability. The simplest algorithm is to designate one CIG to act as the representative. To increase reliability, another CIG can be designated to act as the secondary representative in case the primary representative fails. The advantages 24 of this Fixed Representative Algorithm are: simple protocol, and low traffic and computational overhead. The disadvantage is inflexibility, in that the representative is fixed even when the representative CIG is over-loaded, but in the meantime, there are other CIGs that are idle. Moreover, if the secondary representative fails, the CIG peering system fails hopelessly even if there are still other CIGs alive in that AS. Therefore, it would be desirable to have a dynamic representative selection algorithm that bases the decision on real-time information, such as server load and server health. The traditional coordinator election algorithm Bully [18] for group communications in distributed systems can be improved with the real-time server load information to cope with the particular needs of the CIG representative selection process. These two algorithms are flexible in picking a CIG that has low usage, and the system is transparent to CIG failures. The disadvantage of the dynamic algorithm is the traffic and computational overhead. The communications between a CIG and C B G also happen in the reverse direction-from the C B G to the CIG. With the content updates injected by the internal CIGs, and content updates learned via CBGP [see section 2.7], a C B G has the whole picture of the globally distributed content that resides in the local AS and remote ASes. Normally, in a CIG's RIB, the best surrogates for a particular content are located within the local AS, but during exceptional situations when the surrogates local to the AS become highly over-loaded, links are terribly congested, or surrogates are entirely unavailable due to server or link failures, the surrogates that locate in the remote ASes can become more preferable than the local surrogates. Since C B G continues to communicate with the CIG, the C B G can discover these exceptional situations immediately and inject routing information back into the CIG. Except for during the above exceptional situations, a C B G does not take the initiative to inject content state information back into a CIG. On one hand, it avoids overwhelming the CIG with routing information that may never be useful 25 for the CIG; on the other hand, it is a strategy for scalability purposes. This is where the hierarchical architecture benefits the system. Caching schemes are deployed to optimize the request response performance. During the request resolution process where a C B G is involved, the routes returned by the C B G can be cached in the CIG to speed up the response time of future possible requests. Therefore, CIGs may still hold knowledge of some content that resides in remote ASes. We discuss caching schemes in detail, in Section 2.10. 2.7 CBGP-Communications Between Content Border Gateways To extend the reach of content and content services within an individual AS to outside distribution domains, ASes need to communicate with each other to keep track of the health of each other and to exchange content state and routing information. CBGP (Content Border Gateway interconnection Protocol) is deployed to address the routing requirements and define the communications between ASes. Traditional Border Gateway is upgraded to incorporate content intelligence to become Content Border Gateway (CBG). BGP [11] is the industry standard for IP routing interaction between traditional Border Gateways. CBGP is a super set of BGP, in that CBGP supports both content and IP routing. In this thesis, our discussion only focuses on the content routing part of CBGP. The key point of C B G intercommunication is the peering relationship between CBGs. As there exists a peering relationship between Border Gateways in IP routing, peering relationship for content routing is established between CBGs. In Content Internetworking scenarios, a C B G is only aware of it direct C B G peers. A direct C B G peer does not have to be a direct neighbor in layer 3. Intercommunication between CBGs works in a fashion very similar to that of the intercommunication between CIGs [see Section 2.5]. Initially, a C B G has knowledge about all the content residing in the local AS learned via CGBP [see Section 2.6]. Through exchanging the knowledge of 26 the routing states of the content residing in their own ASes, all CBGs finally acquire the routing states of all the content that are available across the globe, in the local or remote ASes. Similar to CGIP, CBGP's routing process is dynamic. Since metrics and content availability situations vary all the time, and the CIG keeps injecting these changes into the C B G , the C B G dynamically makes routing decisions and distributes the routing updates throughout the interconnected ASes via CBGP. Upon receiving a content advertisement from a CIG or a C B G peer, a C B G checks the content advertisement against its content routing table, in other words, its routing table, to find out if this content update has an influence on its routing table. There are three situations where a content update can trigger a change in the routing table of the receiver of the update: • The content update announces a destination that the receiver does not have in its routing table. • The content update announces a "shorter" route to an existing destination. • The content update withdraws an inaccessible route. In one of the three above situations, the C B G modifies its routing table accordingly, and then propagates the changes to its direct C B G peers, except for the sender C B G peer if the update is received via CBGP. If the routing table is not changed, the C B G ignores the content advertisement. In this manner, content routing updates get distributed throughout the interconnected ASes. Appropriate loop-proof mechanisms are required to avoid an advertisement distribution loop. Our approach uses advertisement TTL timestamp and loop detection at both the receiver and sender ends to effectively suppress the loop of the distribution of content advertisements. In the case of existing multiple CBGs in one AS, these special C B G peers need to communicate with each other to keep the consistent view of the content routing state in the respective AS. 27 2.8 CREP-Content Request Resolution Protocol The traditional name request resolution process is based on the DNS [12, 13] resolution chain, in which all the requests for a particular content are directed to the authoritative server of the content to be serviced. Figure 2.2 shows the resolution process of a traditional DNS approach. request for content A f77\. client IP address DNS server DNS server authoritative server of content A Figure 2.2: Traditional DNS Approach In our approach, the hierarchical system architecture enables not only the hierarchical request-routing process, but also the hierarchical request resolution process, which provides the system with good scalability. In addition, the hierarchical name resolution approach completely eliminates the globally centralized authoritative servers from the resolution chain, which significantly improves the service response performance and system reliability. Corresponding to the hierarchical architecture, the name resolution is structured hierarchically into three levels: CN's internal content resolution level, the CIG level, and the C B G level, naming from the lowest to highest. With its knowledge of the content states and routing situations, each level gives its best effort to resolve a name request submitted by a client or a lower level. Only when a request cannot be resolved at that level, is the request escalated to the next higher level. In above discussion, we did not discriminate the content that are replicated across the Internet, from the content that do not have any replica on Internet. Actually, we treat these two kinds of content differently for the scalability concern. One presumption is that we already know the replication state of specific content, which can be acquired from the Content Distribution System [7, 9]. A 28 CIG has knowledge of all the content in the respective AS, including replicated content and non-replicated content, but only propagates the routing state of replicated content to the C B G . Thus, when a client requires non-replicated content, and the content's authoritative server-the only server that holds that content-is located in a remote server, the name request cannot be resolved via our three-phase resolution path. To solve this problem, we integrate the normal DNS approach with our approach. That is, if even the highest level, the C B G level, fails to resolve the request, the request is redirected to the normal DNS chain for final resolution. Figure 2.3 illustrates the proposed approach. j surrogate server of content}^ client • request for A IP address RR system DNS server rf= DNS server authoritative server of content A Figure 2.3: Request-Routing System in Resolution Chain Content requests originated within a C N follow a name resolution path described briefly below: a. Level 1-CN's Internal Resolution A client's name request originating in a C N is first sent to this CN's internal Request-Routing system for resolution. If the client request can be resolved through this CN's internal Request-Routing system, which means that the requested content has either the authoritative or replica copy (depending on the CN's internal Request-Routing scheme) residing in the local C N , the resolution process stops here; otherwise the request is forwarded by the CN's internal request-routing system to a higher level-the CIG resolution level-for further resolution. Go to b . l . 29 b. Level 2-CIG Resolution b.l CIG Resolution If the name request can be resolved by the CIG, which means that the requested content has either the authoritative or replica copy located within the local AS, or the CIG finds cached routes for the requested content that has a copy located in a remote AS, the CIG replies with the address of the "nearest" server, which can be either the authoritative server or a surrogate server of the requested content; otherwise, the name request is sent to a higher level-the C B G resolution level-for further resolution. Go to b.2. b.2 CIG Content Query & CBG Content Response b.2.1 The CIG sends a content query for the requested content to its connected C B G and waits for the content response from the C B G . Go to c. b.2.2 Upon receiving the content response from its connected CBG, if the response contains a content server's IP address, the CIG replies to the CN's internal Request-Routing system with the IP address of the "nearest" server. If there are multiple content servers' addresses in the response, the CIG selects the "best" server based on the load-balancing algorithms [see Section 2.12]. Otherwise, the CIG forwards the name request further to the normal DNS request resolution chain. c. Level3-CBG Resolution Upon receiving the content query from the CIG, the C B G checks its content routing table for the requested content. If routes are found in the routing table, which means that the requested content is replicated in remote ASes, the C B G replies to the CIG with the IP address of the "best" server, which can be either the authoritative server or a surrogate server of 30 the requested content. If load-balancing algorithms are deployed on top of the basic protocol, the CBG's content response may contain more than one route for multiple servers, if applicable. Otherwise, the C B G replies to the CIG, indicating that the requested content is not replicated in any remote AS. Go to b.2. Figure 2.4 describes the control flow of the resolution process. WS/ss/AsZ'/////, DNS chain CBG CIG CN internal RR Figure 2.4: Control Flow of Content Resolution Process .2.9 Content Routing Metric Specification In Content Internetworking, a routing decision process, a request resolution process and load-balancing mechanisms are essentially based on the metrics of the routes. Generally there are three types of metrics in content routing: content variations, server variations, and network variations. Content Variations • Content Availability A Content Distribution System may add or withdraw particular content on surrogates at anytime. Moreover, content copies may experience an 31 updating process from the origin copies, during which those content copies should be kept from being accessed. Content may also be damaged or destroyed, and become unusable or inaccessible. • Content Type Content can be classified into two types: static and stream content. These two types are treated differently by the routing system depending on their sensitivities to delay and performance. Server Variations • Server Health A content server can fail at anytime. Since server failure can be directly reflected in IP routing, by combining content routing with IP routing, we make efficient use of the existing valuable information in IP routing to serve the content routing purposes. • Server Load When other metrics are equal, those servers with high processing capability are more preferable than those over-loaded servers. Server load is a dynamic synthetic metric reflecting a server's real-time computing capability, which depends on many factors, such as a server's C P U load, memory usage, swap load, number of processes, and so forth. One problem is that it is very difficult to come up with a global standard in measuring server load. Network Variations • Network Proximity Network proximity in AS-interconnection scenarios can be naturally expressed with AS-path, which means the path of the ASes traversed. Similarly, in CN-interconnection, the network proximity can be expressed with CN-path, which means the path of the CNs traversed. With the hop count of AS-path or CN-path, we can get a coarse 32 measurement for the network proximity of the advertised content server to the local clients. • Network Health The intermediate links and nodes between the advertised content copies and clients can fail at anytime. Network health metric dynamically reflects the health situation of the concerned networks. Like the server health metric, network health is a built-in metric in IP routing, and thereby can be shared for content routing purposes. • Link Bandwidth Usage This metric reflects real-time network congestion situations. We want to route traffic away from congested networks. As QoS is becoming one of the main concerns of today's Internet, this metric is soon expected to play an important role in IP and content routing. • Network Delay Network delay is a metric directly decided by factors, such as network proximity, link congestion, and network health. The application layer utilities, such as "Ping", are suggested in the IETF CDI working group's Internet draft [9] to get this measurement. Such applications are useful in obtaining precise network delay measurements, at the price of extra traffic overhead. This overhead is acceptable for small-sized individual C N , but this approach apparently has difficulty coping with global Content Internetworking in terms of feasibility and scalability. The choice of the TTL value of metric and route has critical influence to the stability and performance of the routing system. On one hand, too small a TTL value generates routing updates unnecessarily frequently, rendering the routing system unstable, and wasting precious bandwidth and computing resources. On the other hand, too big a TTL value introduces the problem of stale information, which may degrade system performance. 33 2.10 Content Route Caching In the basic content routing protocols, a CIG has only routing information of content that reside in the local AS. Consequently, if a client requests some content that only has a copy in remote ASes, the request needs to be escalated to the C B G for resolution. To save the round trip from the CIG to C B G , caching techniques are applied here to improve performance. Unlike most known Web caching schemes that cache the content themselves, our caching scheme also caches the routes for content. Content caching is also an important mechanism in Content Networking, especially in a Content Distribution system. In this thesis we focus on route caching. When a CIG receives a content response from its respective C B G with the IP addresses of the best servers, the CIG may cache these routes for future use. Afterwards, if clients in that AS request the same content, the CIG can find the cached routes for that content, and thereby, the name can be resolved in that CIG right away without the need to ask the C B G for help. Figure 2.5 depicts the content routing knowledge situation in the CIG and CBG. A CIG has the knowledge of all the content, replicated or non-replicated, residing in its local AS via CNGP and CGIP, and part of the content, replicated or non-replicated, residing in remote ASes via caching and CGBP. A C B G has knowledge of all the replicated content residing in its local AS via CGBP, and part of all the replicated content residing in remote ASes via CBGP. o ^^^^ content query ^ content response content in local AS content in remote ASes Figure 2.5: Content Routing Knowledge Situation on CIG and C B G By adopting the route caching mechanism at the CIG, unless there is cache miss happening at the CIG, the C B G is not bothered for name resolution, which 34 greatly speeds up the request resolution process, and reduces the overall demand for the involvement of the C B G , the potential performance bottleneck. On the other hand, caching introduces the possibility of out-of-date routes. Smaller TTL values may compensate this shortage by trading off extra volume of traffic for route information freshness. Moreover, caching further complicates the global load-balancing schemes for content request distribution. Since cache has a size limit, we want to store the most useful routes in the cache. To evaluate the usefulness of the route of some content, we consider the content's access history and freshness, since the most frequently and recently accessed content are very likely to be requested and accessed again in the near future. The access history of a route is also considered: the fresher the route is, the more useful the route is; the more frequently the route is selected, the busier the content server is, and therefore, for load-balancing purpose, the less desirable the route is. The CIG Route Caching Algorithm is developed to assist with the cache decision process in the CIG. The preference p of a route r for particular content c is calculated as below (the higherp is, the more preferable a route is): p = Kl*content_access_counter - K2* route_access_counter + K3*content_access_timestamp + K4*route_timestamp content_access_counter. counter of times content c has been accessed; route _access_counter. counter of times route r has been accessed; content_access_timestamp: time-stamp of the last time content c was accessed; routejtimestamp: time-stamp of route r, which is generated by content server or CN's request-routing system. The bigger the time-stamp is, the fresher the route is; Kl, K2, K3, K4: coefficient; This calculation can be performed periodically or on-demand. When the CIG receives a content response from a C B G , it compares the weight of the new 35 routes against the existing ones, and replaces the most undesirable routes with the new routes if needed. As described in CGIP [see Section 2.5], CIGs in an AS exchange routing information to keep their routing tables up-to-date. Since the name resolution and route caching in a CIG occur much more frequently than the announcement of feasible content routes and withdrawal of unfeasible content routes, moreover, cached routes in one CIG may never be useful to another CIG, it is advisable not to propagate updates of the cached routing information to CIG peers for stability purpose. 2.11 Distinguishing Handling of Replicated and Non-Replicated Content Having routing states of both replicated and non-replicated content residing in the respective CNs, CIGs only distribute the routing states of the replicated content to CBGs. The strategy of discriminating the handling of replicated content and that of non-replicated content is one unique feature of our approach for achieving good scalability without sacrificing performance. Name requests requiring replicated content can always be resolved though the three-phase path: CN's internal system->CIG->CBG. For a request requiring non-replicated content, when the requested content's authoritative server, which is the only server that holds the content across the globe, resides in the local AS, or there is a route for that content in the cache, the request can be resolved by CIG directly. Only when the requested content's authoritative server resides in a remote AS, and the cache has not stored any route for the content, can the request be unresolved through the three-phase path, and is finally redirected to the normal DNS chain for resolution. Compared to the traditional DNS resolution approach, our approach significantly reduces the response time for name requests requiring replicated content, as well as non-replicated content whose authoritative servers are located in the local ASes. At the price of this performance gain, the approach imposes a 36 small penalty on those requests requiring non-replicated content whose authoritative servers are located in remote ASes. However, since the overhead is only one round trip from a CIG to its respective C B G , which is located in the same AS with the CIG, the extra overhead is reasonable and acceptable. Additionally, the reason that some particular content is not replicated across the Internet is most likely because the content is not popular and seldom requested and accessed. Now that the chance of the non-replicated content being requested is very low, it would not noticeably degrade the overall service performance to put some extra delay into those requests requiring non-replicated content. Statistics indicate that requests for 20 to 30 percent of content comprise 80 percent of all web access activities [15], which conforms to our congestion on content access patterns, and justifies our design rationale. Furthermore, since a C B G stores the routing information of all the content available across the interconnected content networks, the size of the routing table of a C B G is the potential bottleneck for good system scalability. By populating C B G with routing information of only replicated content, we significantly cut down the size of the routing table of a C B G . 2.12 Load Balancing Strategy The content routing system that builds on the metrics, such as server load and network delay, inherently possesses load adjusting capabilities. For example, a server advertising a very low load attracts most requests. However, as its load increases, the routing system, aware of this metric change, routes requests away from this server. As a result, the server's load goes down, and the routing system starts to direct requests to it again. The same adjustment principles apply to the network congestion metric. However, to allow content to be distributed more efficiently across the Internet and protect the routing system against oscillation, which seriously degrades system performance, appropriate load balancing strategies are essential 37 in a content routing system. Load-balancing mechanisms complement the basic content routing protocols to optimize the system's performance and availability. Our architecture adopts a distributed design strategy, which gains the advantages of good system scalability and fault tolerance by eliminating the globally centralized nodes from the routing system, but in the meantime, complicates the load-balancing concerns. Additionally, the deployment of the route-caching algorithm in CIG makes efficient global load balancing more difficult. Four route selection algorithms are developed for the C B G and CIG to select the most desirable route from N (N>=1) feasible routes for particular content destination. • Random Selection Algorithm Selecting one of the feasible N routes randomly. A random algorithm has the lowest computing overhead, but has the disadvantage of the least contribution for systematically optimizing the traffic distribution. • Round Robin Algorithm Selecting one route from the N feasible routes in turn. This algorithm is simple to implement, but it does not take metrics into the consideration of a decision, therefore, is not able to offer the most efficient load distribution solution. • Dynamic Equality Algorithm Calculating the synthetic metric based on the set of metrics and local policies, plus the access history of each candidate route, then selecting the currently best surrogate to serve the requests until the synthetic metric of the best one reaches the synthetic metric of the second best one. Then the second best route is chosen to serve the following requests for that content until the synthetic metric of the second best route reaches the synthetic metric of the third best route... The process repeats until each of the N feasible routes are 38 selected for service, and then it starts with the best route again. The advantage of this algorithm is that it embeds real-time metrics into the decision process, but the disadvantage is high computing overhead. • Possibility Bucket Algorithm Suppose the synthetic metric of each of the N routes is M l , M2, ... Mn. If M = M l + M2 + ... + Mn, the preference of each route can be expressed as the possibility of a request falling into one of the N buckets: (M-M1)/M, (M-M2)/M, ... (M-Mn)/M. The experiments indicate that the Possibility Bucket algorithm has the most desirable load-balancing effects. The disadvantage is that this algorithm has the highest computing overhead. A conclusion of the effectiveness and efficiency of each load-balancing algorithm needs more experiments with various large-scale network topologies and simulations of Internet traffic situations. 2.13 Deployment Our approach allows for the all-at-once deployment, as well as the incremental evolution, based on service needs. At the CN's internal request-routing level, service providers are free to choose the appropriate request-routing schemes and products. To incorporate the individual C N into the Content Internetworking picture, the only requirement is to place a CIG at the edge of the C N , and deploy CNGP to enable the inter-operation between the CN's internal Request-Routing system and the CIG. Without CIG and CNGP, the individual C N still functions as usual, but is left out of the Content Internetworking system. At the C N interconnection level, to enable CIG intercommunication, only CGIP is required. CGIP allows for one C N to locate other CIG peers. Since a CN's direct C N peers do not have to be direct neighbors at the third layer, there is no topology constraint on constructing the C N interconnection. C N can join the 39 C N interconnection system whenever it is ready, by installing a CIG, CNGP and CGIP. CGIP allows the newly added C N to locate and connect to C N peers spontaneously To extend the C N interconnection to the global scale, the AS interconnection, CGBP is required to be deployed at the CIG and C B G sides of the respective AS. Similarly, the CIG and C B G do not have to be directly connected at the third layer, thus there is no topology constraint. At the CGBP interconnection level, enabling AS interconnection only requires upgrading the existing Border Gateway that only supports BGP to the Content Border Gateway (CBG) that supports CBGP, a superset of the original BGP. A C B G running CBGP and CGBP supports both IP and content routing. A CBG's peers do not have to be the direct neighbors at the third layer. An original Border Gateway that has only IP intelligence ignores the content attributes in routing advertisements and pass the advertisements along to its BGP peers, and thereby, can work compatibly with the CBGs. CNGP, CGIP and CGBP all sit on top of the transport layer, and there is no change to the existing IP infrastructure required. Being a superset of BGP, the CBGP upgrades the existing IP routing protocol BGP to add content intelligence to the existing IP infrastructure. The incremental and smooth deployment path greatly facilitates the deployment the request-routing architecture and protocols for Content Internetworking. 40 Chapter 3 Protocol Specification 3.1 CGIP CGIP specification is based on BGP [11]. CGIP peering builds on TCP. 3.1.1 Message Header Format Each CGIP message has a fixed-size header. There may or may not be a data portion following the header, depending on the message type. The layout of these fields is shown in Table 3.1: 0 1 2 3 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 -;:- + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - H — + - + - + - + -+'-•}•--!--;:-r- • + - + - + + • + I I + + I M a r k e r | + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - H — + - + - + - + - + | L e n g t h | Type | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Table 3.1: CGIP Message Header Format Marker: This 16-octet field contains a value that the receiver of the message can predict. The Marker can be used to detect the loss of synchronization between a pair of CGIP peers, and to authenticate incoming CGIP messages. 41 Length: This 2-octet unsigned integer indicates the total length of the message, including the header, in octets. The value of the Length field must always be at least 19, and no greater than 4096, and may be further constrained, depending on the message type. Type: This 1-octet unsigned integer indicates the type code of the message. The following type codes are defined: 1 - OPEN 2 - UPDATE 3 - NOTIFICATION 4 - KEEP A L I V E 3.1.2 OPEN Message Format After a transport protocol connection is established, the first message sent by each side is an OPEN message. If the OPEN message is acceptable, a KEEP ALIVE message confirming the OPEN is sent back. Once the OPEN is confirmed, UPDATE, KEEPALIVE, and NOTIFICATION messages may be exchanged. An OPEN message format is shown in Table 3.2. 0 1 2 3 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 +-+-+-+-+-+-+-+-+ | Version | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | My Content Network | + - + - + - + - + - + - + - + - + - + - + - + - + - + - H — + - + | Hold Time | +-+-+-+-+-+-+-+-+•-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | CGIP I d e n t i f i e r | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+.-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Opt Parm Len | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Optional Parameters | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Table 3.2: CGIP OPEN Message Format Version: This 1-octet unsigned integer indicates the protocol version number of the message. The current CGIP version number is 1. My Content Network: This 2-octet unsigned integer indicates the Content Network number of the sender. 42 Hold Time: This 2-octet unsigned integer indicates the number of seconds that the sender proposes for the value of the Hold Timer. Upon receipt of an OPEN message, a CGIP speaker calculates the value of the Hold Timer by using the smaller of its configured Hold Time, and the Hold Time received in the OPEN message. The calculated value indicates the maximum number of seconds that may elapse between the receipt of successive K E E P A L I V E , and/or U P D A T E messages by the sender. CGIP Identifier: This 4-octet unsigned integer indicates the CGIP Identifier of the sender. A given CGIP speaker sets the value of its CGIP Identifier to an IP address assigned to that CGIP speaker. Optional Parameters Length: This 1-octet unsigned integer indicates the total length of the Optional Parameters field in octets. The value of this field is set to zero in current version. Optional Parameters: The field may contain a list of optional parameters. 3.1.3 U P D A T E Message Format UPDATE messages are used to transfer content routing information between CGIP peers. An U P D A T E message is used to advertise multiple feasible routes to a peer, or to withdraw multiple unfeasible routes from service. An UPDATE message may simultaneously advertise and withdraw multiple routes. The UPDATE message format is shown in Table 3.3. Unfeasible Content Routes Length: This 2-octets unsigned integer indicates the total length of the Withdrawn Content Routes field in octets. A value of 0 indicates that no content routes are being withdrawn from service, and that the Withdrawn Content Routes field is not present in this UPDATE message. Withdrawn Content Routes: This is a variable length field that contains a list of content addresses for the routes that are being withdrawn from service. A content address uses a dual addressing scheme: IP address plus content name. The layout of this field is described in Table 3.4. 43 + + I Unfeasible Content Routes Length (2 octets) | + + | Withdrawn Content Routes (variable) | + + | Feasible Content Routes Length (2 octets) | + • + | Advertised Content Routes (variable) | + + | Total Path A t t r i b u t e Length (2 octets) | + + | Path At t r i b u t e s (variable) | + + Table 3.3: CGIP UPDATE Message Format Feasible Content Routes Length: This 2-octets unsigned integer indicates the total length of the Advertised Content Routes field in octets. A value of 0 indicates that no content routes are advertised, and that the Advertised Content Routes field is not present in this UPDATE message. Advertised Content Routes: This is a variable length field that contains a list of content routes that are advertised. The layout is shown in Table 3.5. Total Path Attribute Length: This 2-octet unsigned integer indicates the total length of the Path Attributes field in octets. Path Attributes: A variable length sequence of path attributes is present in every UPDATE. Each path attribute is a triple <attribute type, attribute length, attribute value> of variable length. Attribute Type is a two-octet field that consists of the Attribute Flags octet, followed by the Attribute Type Code octet. The Attribute Type layout is shown in Table 3.8. + + | P r e f i x Length (1 octet) | + + | P r e f i x (variable) | + + | Total Content Names Length (2 o c t e t ) | + + | Content Names | + + Table 3.4: Withdrawn Content Routes Field Format 44 + + I P r e f i x Length (1 octet) | + + | P r e f i x (variable) | + + |Total Content Routes Length (2 o c t e t ) | + + | Content Routes | + + Table 3.5: Advertised Content Routes Field Format Prefix Length: The Length field indicates the length in bits of the IP address prefix. Prefix: The Prefix field contains IP address prefixes followed by enough trailing bits to make the end of the field fall on an octet boundary. Total Content Names Length: This 2-octet unsigned integer indicates the total length of the Content Names field in octets. This field is used to facilitate the aggregation of multiple content names with the same IP prefix Content Names: This is a variable length field that contains a list of content names that are being withdrawn from service. Each content name is encoded as a 2-tuple of the form <content name length, content name>, whose fields are described in Table 3.6. Total Content Routes Length: This 2-octet unsigned integer indicates the total length of the Content Routes field in octets. This field is used to facilitate aggregation of multiple content routes with the same IP prefix. Content Routes: This is a variable length field that contains a list of content routes that are advertised. Each content name is encoded as a 4-tuple of the form <content name length, content name, metric, TTL>, whose fields are described in Table 3.7. + + | Content Name Length (1 octet) | + + | Content Name (variable) | + + Table 3.6: Content Name 2-Tuple Format 45 + + I Content Name Length (1 octet) | + + | Content Name (variable) | + + | Metric (4 octets) | + + | TTL (4 octets) | + + Table 3.7: Content Route 4-Tuple Format Content Name Length: This field indicates the length in characters of the Content Name field. Content Name: This field contains the characters of the name of the content. Metric: This 4-octet unsigned integer indicates the metric of the advertised route. TTL: This 4-octet unsigned integer indicates the expiration time of the advertised route. o 1 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | A t t r . Flags |Attr. Type Code| H — + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + Table 3.8: Attribute Type Format Attribute Flags: This field is unused in the current version. Attribute Type Code: This 1-octet unsigned integer represents how the remaining octets of the Path Attribute represented are interpreted. The supported attribute type in the current version includes: 0 - CN_PATH: This attribute is composed of a sequence of AS path segments. Each AS path segment is represented by a triple <path segment type, path segment length, path segment value>. • Path Segment Type: 0 - CN_SET: the unordered set of CNs a route in the UPDATE message has traversed. 46 1 - CN_SEQUENCE: the unordered set of CNs a route in the UPDATE message has traversed. • Path Segment Length: This 1-octet long field contains the number of CNs in the Path Segment Value field. • Path Segment Value: This variable length field contains one or more C N numbers, each encoded as a 2-octets long field. 3.1.4 K E E P A L I V E Message Format K E E P A L I V E messages are exchanged between peers often enough to keep the Hold Timer from expiring. A K E E P A L I V E message consists of only a message header, and has a length of 19 octets. 3.1.5 NOTIFICATION Message Format A NOTIFICATION message is sent when an error condition is detected. The BGP connection is closed immediately after sending it. The message format is shown in Table 3.9. o 1 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Error Code | Error Subcode | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Table 3.9: CGIP NOTIFICATION Message Format Error Code: This 1-octet unsigned integer indicates the type of CGIP NOTIFICATION. The following Error Codes are defined: 1 - Message Header Error 2 - Open Message Error 3 - UPDATE Message Error 4 - Hold Timer Expired Error subcode: This 1-octet unsigned integer provides more specific information about the nature of the reported error. Each Error Code may have one or more Error Subcodes associated with it. • Message Header Error subcodes: 47 1 - Connection Not Synchronized. 2 - Bad Message Length. 3 - Bad Message Type. • OPEN Message Error subcodes: 1 - Unsupported Version Number. 2 - Bad Peer C N . 3 - Bad CGIP Identifier. 6 - Unacceptable Hold Time. • UPDATE Message Error subcodes: 1 - Malformed Attribute List. 2 - Unrecognized Attribute. 4 - Attribute Flags Error. 5 - Attribute Length Error. 7 - C N Routing Loop. 11 - Malformed C N _ P A T H . 12 - Malformed Content Name List 13 - Malformed Content Route List 3.2 CBGP CBGP is a super set of BGP [11]. Similar to BGP, CBGP peering builds on a TCP transport layer connection, and defines OPEN, UPDATE, K E E P A L I V E and NOTIFICATION messages. The formats of a CBGP message header, OPEN, K E E P A L I V E , and NOTIFICATION message formats are the same as their formats in BGP, respectively. CBGP integrates content components into the original BGP UPDATE message to make BGP content aware. 3.2.1 UPDATE Message Format The content routing information is encoded as an attribute of the Path Attributes field of a BGP UPDATE message, which contains a list of path attributes. This new attribute is named CONTENT. The CONTENT attribute format is shown in Table 3.10. C O N T E N T Attribute Flags: This 1-octet field defines CONTENT attribute as optional transitive. The bit layout is shown in Table 3.11. 48 Bit 1 (Optional Bit): 1 (Optional) Bit 2 (Transitive Bit): 1 (Transitive) Bit 3 (Partial Bit): 1 (Partial) Bit 4 (Extended Length Bit): 1 (Attribute Length field is 2 octets) + + | CONTENT A t t r i b u t e Flags (1 octet) | + • + | CONTENT A t t r i b u t e Type Code (1 oc t e t ) | + + | CONTENT A t t r i b u t e Length (2 octets) | + + | CONTENT A t t r i b u t e Value (variable) | + + Table 3.10: CONTENT Attribute Format 0 1 2 3 4 5 6 7 0 + + + h + + - + - + + + | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | H H + h 1 + 1 1 + Table 3.11: CONTENT Attribute Flags Field Layout CONTENT Attribute Type Code: This 1-octet unsigned integer is set to 20. CONTENT Attribute Length: This 2-octet unsigned integer indicates the total length of the CONTENT Attribute field in octets. CONTENT Attribute Value: This is a variable length field that contains a list of content routes that are advertised to a CBGP peer, or are being withdrawn from service. The layout of the field is shown in Table 3.12. + + | Withdrawn Content Length (2 octets) | + + | Withdrawn Content (Variable) . | + + | Announced Content Length (2 octets) | + + | Announced Content (Variable) | + + Table 3.12: CONTENT Attribute Value Field Format Withdrawn Content Length: This 2-octet unsigned integer indicates the total length of the Withdrawn Content field in octets. 49 Withdrawn Content: This is a variable length field that contains a list of content that is being withdrawn from service. Each content item is encoded as a 2-tuple of the form <content name length, content name>, whose layout is described in Table 3.6. Announced Content Length: This 2-octet unsigned integer indicates the total length of the Announced Content field in octets. Announced Content: This is a variable length field that contains a list of content that is advertised. Each content item is encoded as a 4-tuple of the form <content name length, content name, metric, TTL>, whose layout is described in Table 3.7. The matching IP address prefix for the content that is announced or being withdrawn from service listed in the CONTENT attribute is fetched from the Network Layer Reachability Information field of an UPDATE message of BGP/CBGP. If the Unfeasible Routes Length field of the UPDATE message is not zero, there are IP prefixes that are being withdrawn from service. For each IP prefix being withdrawn, CBGP considers all the content that is associated with that IP prefix to become unfeasible. 3.3 CGBP CGBP specification is based on BGP [11]. CGBP peering relationships builds on TCP transport layer connection. 3.3.1 Message Header Format A CGBP message header format is the same as the CGIP message header format, described in Section 3.1.1. Similar to CGIP [see section 3.1], CGBP defines four message types: OPEN, UPDATES, NOTIFICATION, KEEP ALIVE. 3.3.2 OPEN Message Format 50 After a TCP connection is established, the first message sent by each side is an OPEN message. If the OPEN message is acceptable, a K E E P A L I V E message confirming the OPEN message is sent back. Once the OPEN is confirmed, UPDATE, K E E P A L I V E , and NOTIFICATION messages may be exchanged. An OPEN message format is shown in Table 3.13. 0 1 2 3 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 +-+-+-+-+-+-+-+-+ | Version | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | My Autonomous System | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Hold Time | + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - H — + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + | CGBP I d e n t i f i e r | + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - H — + - + - + - H — + - + - + | Opt Parm Len | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Optional Parameters | + - + - + - + - + - + - + - + - + - + - H — + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + Table 3.13: CGBP OPEN Message Format M y Autonomous System: This 2-octet unsigned integer indicates the Autonomous System number of the sender. C G B P Identifier: This 4-octet unsigned integer indicates the CGBP Identifier of the sender. A given CGBP speaker sets the value of its CGBP Identifier to an IP address assigned to that CGBP speaker. The formats of the Version, Hold Time, CGBP Identifier, Optional Parameters Length, and Optional Parameters fields are very similar to their formats in a CGIP OPEN message. See Section 3.1.2 for detailed description. 3.3.3 UPDATE Message Format UPDATE messages are used to transfer content routing information between a CIG and C B G . An UPDATE message is used to advertise multiple feasible routes, or to withdraw multiple unfeasible routes from service. An UPDATE 51 message from a CIG to a C B G may simultaneously advertise and withdraw multiple routes. An U P D A T E message from a C B G to a CIG never advertises routes, and only withdraws routes from services. The U P D A T E message format is shown in Table 3.14. Optional Parameters Length: This 1-octet unsigned integer indicates the total length of the Optional Parameters field in octets. The value of this field is set to zero in current version. + + | Unfeasible Content Routes Length (2 octets) | + + | Withdrawn Content Routes (variable) | + + | Feasible Content Routes Length (2 octets) | + + | Advertised Content Routes (variable) | + + | Optional Parameters Length (2 octets) | + + | Optional Parameters (variable) + Table 3.14: CGBP UPDATE Message Format Optional Parameters: The field may contain a list of optional parameters. In the current version, no parameter is defined. The formats of the Unfeasible Content Routes Length, Withdrawn Content Routes, Feasible Content Routes Length, and Advertised Content Routes fields are very similar to their formats in the CGIP UPDATE message. See Section 3.L3 for a detailed description. In a CGBP UPDATE message from a C B G to a CIG, the Unfeasible Content Routes Length is set to 0, and the Withdrawn Content Routes field is not present in the UPDATE message. 3.3.4 KEEPALIVE Message Format KEEP A L I V E messages are exchanged between the CIG and C B G often enough to keep the Hold Timer from expiring. The CGBP K E E P A L I V E message format is the same as the CGIP K E E P A L I V E message [see Section 3.1.4]. 52 3.3.5 NOTIFICATION Message Format A NOTIFICATION message is sent when an error condition is detected. The CGBP connection is closed immediately after sending it. The CGBP NOTIFICATION message format is the same as the CGIP NOTIFICATION message format [see Section 3.1.5 and Table 3.9]. The current CGBP version defines UPDATE Message Error subcodes as a subset of CGLP U P D A T E Message Error subcodes: 12 - Malformed Content Name List 13 - Malformed Content Route List 3.4 CNGP CNGP propagates CN's internal content routing information to the CIG. The transport layer connection can be either UDP or TCP. The UDP scheme defines the UPDATE message. The TCP scheme needs OPEN, KEEP A L I V E and NOTIFICATION messages, besides the UPDATE message. The formats of the CNGP message header, OPEN, UPDATE and NOTIFICATION messages are very similar to their formats in CGBP. See Section 3.3.1 for detailed description. 3.5 CREP CREP deals with the name resolution process. CREP communications between a CIG and CN's internal request-routing system are addressed by DNS query and response. CREP communications between a CIG and C B G are addressed by content query and response. 3.5.1 CREP DNS Query and Response A name resolution query sent to a CIG, and a response returned from a CIG follows the standard DNS message format [15]. 3.5.2 CREP Content Query and Response 53 The content query that a CIG sends to a CBG, and the response that the C B G returns to the CIG, are built on UDP. The content query and response message format is shown in Table 3.15. + + | Query/Response Type (1 b i t ) | + + | Queried Content Name Length (2 octets) | + + | Queried Content Name (variable) | + + | Content Response Length (1 octet) | + + | Content Response (variable) | + + Table 3.15: CREP Content Query/Response Message Format Query/Response Type: This 1-bit flag indicates the type of the message. 0 - Content Query 1 - Content Response Queried Content Name Length: This 2-octet unsigned integer indicates the length of a queried name in characters. Queried Content Name: This variable length field consists of a list of characters of the queried content name. Content Response Length: This 1-octet unsigned integer indicates the length of content response in 32-bit words. Content response is a list of IP addresses of content surrogates with replica for the queried content. This length is set to zero when C B G fails to find any content surrogate that has replica for the queried content. Content Response: This variable length field contains a list of IP addresses of feasible content surrogates. Each IP address is a 4-octet unsigned integer. 54 Chapter 4 Simulation and Performance Analysis 4.1 Multi-threaded Prototype Implementation 4.1.1 Multi-threaded System Prototype A system prototype with a complete suite of content request-routing and resolution protocols, and Content Network's internal request-routing system, is implemented. The implementation language choice is C, in light of C's strength in meeting system-level software's requirement for speed. The implementation platform is L I N U X RedHat 7.2. With minor changes to the syntax of a few system calls, and the data type integer's bit order in memory, the implementation can be migrated to UNIX. The content routing and resolution functionalities on the CIG are provided by a server process, which accommodates CIG-side CNGP, CGIP, CIG-side CGBP, and CIG-side CREP. Similarly, a server process that accommodates C B G -side CGBP, CBGP and CBG-side CREP sitting on the C B G enables the C B G of the content routing and resolution capabilities. A request-routing agent sitting on a content server is implemented to simulate CN's internal request-routing system. The request-routing agent communicates with CIG-side CNGP to propagate routing updates to the CIG. 55 The configuration of the system prototype is shown in Figure 4.1. The solid dark lines indicate routing-related activities, and the solid grey lines indicate resolution-related activities. Figure 4.1. System Prototype Configuration One of the most important features of the implementation is multi-threading. Theoretically, one CIG can establish CGIP peering relationships with an arbitrary number of CIG peers, and one C B G can establish CBGP peering relationships with arbitrary number of C B G peers. In practice, the maximum peering connections for each CIG or C B G are limited by the number of available network interfaces of the CIG or C B G . The system call pthread is used to start a new thread for each TCP peering connection. The advantage of the implementation choice of a single process with multithreading is the low system overhead of a light-weight thread. The communication-intensive feature of these protocols makes multi-process implementation an unsuitable solution due to the expensive Inter-Process Communication (IPC) mechanisms. The CGIP, CGBP, CBGP peering relationships can be initiated at any time, which greatly facilitates the incremental deployment of the proposed architecture. For example, a Content Network can join existing Content Internetworking when it is ready, by installing CIG and initiating a CGBP peering relationship with its respective C B G , and CGIP peering relationships with its 56 direct CGIP neighbors, if any. The C B G or a CGIP neighbor can either accept the peering connection request or reject it, depending on local policies and implementation requirement. A CIG or C B G can be the initiators of peering connections with some CIGs or CBGs (the active mode), and at the same time, be the accepter of peering connections with other CIGs or CBGs (the passive mode). 4.1.2 Protocol Prototype Configuration The protocols of CIG and C B G can be initially configured in two ways: i) configuration menu, ii) configuration file. In particular, in the implementation of a CIG, the CIG protocols server process has the option to periodically read the configuration file to see if there is any newly configured CGIP or CGBP peer. If there is, the CIG initiates a CGIP or CGBP peering connection with the specified CIG or CBG. The option is implemented as an ON/OFF switch variable. This implementation further facilitates the deployment path. For example, an already running C N can modify its configuration file to establish a CGBP or CGIP peering connection to join an already running Content Internetworking without shutting down its current operation and services. On CIG, after the server process starts, a CIG main thread runs and starts following threads using pthread(): one CNGP main thread, one CGIP main thread, one CGBP thread and one CREP main thread. Then the CIG main thread performs the initialization configuration via either the menu or reading the configuration file, and then runs at the background, and if necessary, reads the configuration file periodically (the implementation's time interval is 30 minutes). The CNGP main thread runs while waiting for peering connection requests (for the TCP scheme) or content routing updates (for UDP scheme) from the CN's internal request-routing system. For the TCP scheme, if a peering connection request is accepted by the CIG, the CNGP main thread starts a thread for that CNGP peering connection. The CGIP main thread first starts one thread for each CGIP peer configured in the configuration menu or file, if there is any, for initiating a TCP 57 peering connection with the CGIP peer. Then the CGIP main thread waits for peering connection requests from other CIGs. If a CGIP peering connection request is accepted by the CIG, the CGIP main thread starts a CGIP child thread for that CGIP peering connection. Each CIG keeps a list of its living CGIP peers. The CGBP main thread is responsible for the CIG representative election. The CGBP main threads of all interconnected CIGs in one AS communicate to elect a new CIG representative based on the Bully Coordinator Election Algorithm [19] when the current CIG representative fails, or the current CIG representative's load goes beyond a threshold. The threshold is self-adjusted based on the average load of all participating CGIP peers. When a CIG is elected to be the new CIG representative for an AS , the CGBP main thread on that CIG starts a CGBP child thread to initiate and maintain a CGBP TCP connection with the respective C B G . The CGBP child thread is killed until the CIG loses the title of CIG representative in the election process. Due to the expensive TCP connection setup and teardown, along with a new CIG representative decision, the threshold of re-election is set to be high enough to avoid unnecessary switch. The CREP main thread runs while waiting for name resolution requests submitted by CN's internal system. For each resolution request, the CREP main thread starts one CREP child thread to service that request. Once the response to the request is sent to the CN's internal system, the CREP child thread exits. The maximum number of simultaneous name resolution requests a CIG can handle is only limited by the system's limitation of the maximum number of TCP connections a process can have. CBGP is implemented by extending the BGP implementation on the platform of M R T to reduce the implementation time. Similar to the way CGIP works, the M R T main thread starts a CBGP child thread for each CBGP peering connection, and a CGBP child thread for each CGBP peering connection. The CBG-side CREP main thread waits for name resolution requests submitted by CIG-side CREP. For each resolution request, the CBG-side CREP main thread starts a CREP child thread to service the request. Once the response 58 to the request is sent to CIG-side CREP, the CBG-side child thread exits. Similarly, the maximum number of simultaneous name resolution requests a C B G can handle is only limited by the system's limitation of the maximum number of TCP connections a process can have. The peering connections for CNGP (in TCP scheme), CGIP, CGBP and CBGP are on TCP. A CIG or C B G can actively initiate a TCP peering connection with a CIG or C B G peer, and in the meantime, passively listen and accept a TCP peering connection from other CIG or C B G peer. When the protocol's Hold Timer expires because it did not receive the K E E P A L I V E packet for a certain period of time, the connection is torn down, and the associated thread exits. The socket API is used for implementing the TCP connection establishment. 4.2 Server Load Measurement and Routing Updates 4.2.1 Server Load Measurement The server load metric plays an important role in routing decision processes and load-balancing mechanisms, as well as the CIG representative election algorithm. The implementation takes the server's CPU load, memory usage, swap usage and number of processes into consideration when calculating the server load metric [21]. struct sysinfo *cs_info = (struct sysinfo*)malloc(sizeof(struct sysinfo)); if(sysinfo(cs_info)) { return 0; } curr_mem_usage = (double)(cs_info->totalram - cs_info->freeram) / (double)cs_info->totalram; curr_swap_usage = (double)(cs_info->totalswap - csjnfo->freeswap) / (double)cs_info->totalswap; curr_num_processes = cs_info->procs; curr_load = cs_info->loads[0]; *cs_metric_ptr = (double)curr_load * ( M E M _ V A R * curr_mem_usage + S W A P _ V A R * curr_swap_usage + P R O C _ V A R * curr_num_processes); 59 M E M _ V A R , SWAP_VAR, and P R O C _ V A R represent how much the memory usage, swap usage and number of processes contribute to calculating the load status. The implementation of the algorithm sets these three variables to 0.2, 0.7 and 0.1 respectively. 4.2.2 Request-Routing Agent and Routing Updates In Content Internetworking, CN's internal request-routing system is considered to be a black box. However, for testing system feasibility and performance purposes, CN's internal request-routing system is essential for building a complete Content Internetworking content routing system. Hence, a simple CN's internal request-routing mechanism is designed and implemented. The mechanism adopts the request-routing agent that sits on each content server to monitor the server load status, and periodically report the server load status and content state to the respective CIG. Besides the server load metric, the content routing update message a request-routing agent sends to the CNGP of a CIG also tags a TTL value to the corresponding advertised content and associated server load. TTL is the sum of the current system time and EXPIRE_TrME, the effective time of an advertised route. EXPIREJITME is the valid time period of the advertised status of a content server, in other words, it is the frequency of the routing update, and is dynamically adjusted based on the server's stability in terms of load. If the difference between the current sampled load and previous sampled load is greater than a threshold, which means the server's load is changing drastically, the updating frequency is decreased by first being halved, and then substracted by the percentage of the change to the threshold. Otherwise, the frequency is increased by the percentage of the change to the threshold. Further, the frequency is protected by predefined MrN_FREQ and M A X _ F R E Q values. The DEFAULT_FREQ at the startup phase is set to 60 seconds in the prototype implementation. 60 The content available in a content server is stored in a configuration file, which can be dynamically changed to reflect the adding or deletion of content on the server. The content routing agent sitting on that server periodically reads the content configuration file, and accordingly announces newly available content or withdraws unfeasible content from service using routing updates sent to the CIG. 4.3 Content Routing Decision Process and Routing Table 4.3.1 Content Routing Decision Process and Design of Routing Table The content routing table of CIG and C B G are the core data structures of a content request-routing system. The structures of the content routing table of CIG, the content routing table of CBG, and the content route cache of CIG, are shown in Table 4.1, Table 4.2, and Table 4.3, respectively. For each content destination identified by its domain name, the routing system stores routing information for up to N feasible best surrogates. In the implementation, N is set to 4. The difference between a content routing table of CIG and a content routing table of C B G is that a CIG remembers whether each piece of content is replicated so that CGBP does not propagate content state of non-replicated content to the C B G . A C B G remembers whether a route is local to its own AS, in case it needs to propagate better routes back to the CIG when the difference of metric between all the local routes and a remote route is beyond a threshold. Compared to the content routing table of CIG, the content route cache of CIG only misses the "replicated" tag for each content item, since cache does not need to remember the replication status of content. Content Replicated Surrogate #1 Surrogate Surrogate Name IP Metric TTL #2 #N Address Server-load Link-weight www.yahoo.com Y 192.168.3.12 96 20 28309180 www.google.com N 192.168.3.13 36 15 28311506 Table 4.1: Content Routing Table of CIG 61 Content Name Surrogate #1 Surrogate #2 Surrogate #N IP Address Metric TTL Local Server-load Link-weight www.yahoo.com 192.168.3.12 96 20 28309180 Y www.disney.com 192.168.8.11 150 81 28001925 N Table 4.2: Content Routing Table of C B G Content Surrogate #1 Surrogate Surrogate Name IP Metric TTL #2 #N Address Server-load link-weight www.google.com 192.168.8.11 150 81 28001925 www.abc.com 192.168.1.11 129 168 28175399 Table 4.3: Content Route Cache of CIG The content routing decision process occurs when a CIG receives a routing advertisement via CNGP, CGIP and CGBP, or when a C B G receives a routing advertisement via CBGP and CGBP. A content routing advertisement can announce a new route or a route with an updated metric, or it can withdraw an unfeasible route. The CIG or C B G compares the announced or withdrawn route against the feasible routes in the content routing table. The routing preference decision is made based on local policies and the synthetic metric of a server load and link weight. Link weight is calculated based on the length of CN-path in a CIG or AS-path in a C B G , and the weight of the intermediate network link. TTL of a route is checked each time the entry of the route is accessed in a content routing or resolution process. If the TTL of a route expires, it is deleted from the routing table. 4.3.2 Implementation of Content Routing Table The efficiency of the implementation of the data structures of the content routing table and the route cache data has a significant impact on the performance of content routing system. A hash function and a linked list data structure are used to optimize the searching and access speed. 62 A Same Compliment Pseudo Random Number hash function is used to calculate the hash key of content name: int hashKey (const char *contentName, int hashtableLen) { int c, index = *contentName++; while (c = *contentName++) index = (index * 257 + c) % hashtableLen; return index; } A Bi-direction linked-list [20] is used to construct the content routing table and route cache data structure, shown in Figure 4.2. The content name is first hashed with the Same Compliment Pseudo Random Number function to get the index. Each index points to a linked-list of a domain node. Each domain node is a data structure for a particular content, in which a pointer points to a linked-list of IP nodes. An IP node linked-list is the list of available content servers for that particular content. IP node Hash Index Tnrirx Index TnHpv Indpx Tnrip.x TnHpy o - t rp IP <— IP metric metric metric *prev *prev *prev *next — • *next — • *next 0-*pDomain >=pIP *pNumber *pCurrent *rPrev *rNext *cPrev *cNext Domain Node •"pDomaid >=pIP *pNumbei *pCurrent *rPrev *rNext *cPrev "cNext Figure 4.2: Implementation of Content Routing Table Data Structure In the multi-threaded system prototype, the content routing table and route cache are implemented as public data structures that are shared among all threads 63 that access and manipulate these tables. C N G P , C G I P and C G B P have read/write access to the content routing table of a C IG , C B G P and C G B P have read/write access to the content routing table of a C B G , and C R E P has read/write access to the content route cache of a C IG . The lock mechanism for pthread is used for synchronization of execution of critical section among threads. 4.4 Simulation 4.4.1 Simulation Network Topology A simulation test-bed is set up for testing the feasibility and performance of proposed content routing architecture and protocols. A typical testing network topology is shown in Figure 4.3. The topology composes of 7 ASes and 14 Linux machines. The test-bed connects to the L A N of the Computer Science Department of U B C via a firewall running N A T . Nortel Bay 450-T switch with V L A N capability is used for setting up a virtual L A N environment. Each machine is optionally configured with C N G P , CGIP , C G B P , C B G P , C R E P and C N ' s internal request-routing agent, and is manually configured with IP routing capability. The configuration for this topology is shown in Table 4.4. Firewall C B G C I G D content server Figure 4.3: Simulation Network Topology 64 4.4.2 Testing Environment Configuration Testing strategies and mechanisms enabling a complete resolution path are designed to measure the performance of content request resolution, in order to evaluate the routing system's overall performance and scalability, including the efficiency of load-balancing and content route caching algorithms. Machine Name CPU (MHz) RAM (M) Network Interface IP Address CNGP CGIP CGBP CBGP CREP CN's Internal RR Venus 1.800 512 . 0 192.168.5.2 X X X X X X 1 192.168.3.2 2 192.168.6.2 3 192.168.8.2 Goodearth 800 256 0 192.168.1.7 X X X X X X 1 192.168.5.7 Moonlight 800 256 0 192.168.4.8 X X X X X X 1 192.168.7.8 Ariel 200 64 0 192.168.6.9 X X X X X Robson 200 64 0 192.168.6.10 X X X X X X 1 192.168.4.10 Celestial 200 96 0 192.168.8.25 X X X X X X 1 192.168.7.25 Cupid 200 64 0 192.168.9.10 X X X X X Grimface 200 16 0 192.168.1.12 X X X X X X 1 192.168.3.12 2 192.168.5.12 Slesse 200 16 0 192.168.3.15 X X X X X Wedge 200 16 0 192.168.6.11 X x • X X X X 1 192.168.8.11 Jackhamer 200 16 0 192.168.7.4 X X X X X Hozameen 200 16 0 192.168.8.16 X X X X Outrage 200 16 0 192.168.5.15 X X X X X Price 200 16 0 192.168.2.15 X X X X Table 4.4: Simulation Network Configuration • Configuring content server with Apache Web server 65 The Apache Web server is installed and configured to simulate a content server that hosts both plain and media-rich content. In testing, up to 1 0 0 0 different content are simulated and strategically distributed across the test-bed. The machine is, in the meantime, configured with the request-routing agent. Hence, the request-routing agent sits on the same machine with the content server, monitoring the content server's status and propagating the status to a CIG as routing updates. Overriding original DNS server with CIG Traditionally, client (browser or other applications such as "Ping") domain name requests are forwarded to a DNS server configured in the file "/etc/resolv.cfs" in Linux/Unix or in the network settings menu in Windows NT. We replaced the IP address of the original default DNS server of the CS department of U B C with the IP address of the respective CIG. CIG-side CREP runs as a thread of the CIG server process, listening on port 53, the well-known DNS port, for incoming name resolution requests. The CIG-side CREP configures the macro variable-DNS server, to which the CIG forwards the unresolved name requests for non-replicated content, with the address of the default DNS server of the CS department of UBC. Thus, all name resolution requests originating in a particular C N are captured and redirected to the respective CIG of that C N first, and only when a request cannot be resolved through CIG-side and C B G -side CREP, is the request forwarded to a traditional DNS chain for a final resolution. Simulating clients and web stress testing A client accessing content is simulated in four approaches: Self-developed client program: The client program uses a while loop to read the domain name to be queried from a pre-configured 66 file, and sends a standard DNS query [13] about the domain name to port 53 of its respective CIG, which the CIG-side CREP listens to. Applications: Applications requiring name resolution service, such as "ping" and "nslookup". Browser: Internet Explorer and Netscape in Linux and Windows NT. Three Windows NT machines are connected to the testing network to provide a client under Linux and Windows environment. The caching functions of the browsers are disabled to eliminate factors that would otherwise interfere with normal content request resolution. Web stress tool: Passler's Webserver Stress Tool [17] for Windows and OpenLoad Web Stress Tool [18], are used to simulate intensive web access activities. Web stress testing is useful in simulating real Web access traffic and large-scale network client and service environment, and is essential in Web related protocol performance and scalability evaluation. These Web stress tools can simulate multiple clients simultaneously accessing the same content, selected content in a pre-defined pattern, or selected content randomly. 4.5 Scalability and Performance Analysis Scalability experiments tested the proposed content routing system's feasibility and scalability along with the increase in the number of content, networks including CNs and ASs, and end-users. 4.5.1 Scalability with Respect to Content In the experiments, content size ranged from KB to 10A2 KB, and content type included plain text, audio, video, and mixed media types. As expected, content size and type did not directly impact the performance of the content request routing system. Instead, the size and type of content are expected to be two key 67 factors that significantly influence the performance of a content distribution system. The most stringent limitation that the scale-up of the number of content brings to system scalability lies in the processing capability of a content routing table, which involves two aspects: the size of the content routing table and the access speed. On one hand, the hierarchical content routing and resolution architecture efficiently reduces the size of content routing tables. The content routing table of a CIG stores routing information of all the available content, replicated and non-replicated, that reside only in the local AS. C B G does not initiate to propagate routing information of content that is replicated in remote ASes. In the meantime, the C B G only stores routing information of content that is replicated across the interconnected content networks, as CIG does not propagate the routing information of non-replicated content to the C B G . Thus the size of content routing tables of a CIG and C B G are significantly reduced. On the other hand, the implementation of the routing table data structure using a hash function and linked list greatly speeds up searching and accessing operations on the routing table. Moreover, as computer memory is becoming cheaper and C P U is becoming faster nowadays, the size of a routing table is not as stringent as it used to be, though memory is not getting faster at the same rate. In the testing experiments, we configured 100 different content on Apache Web servers, and configured 2000 content items in content routing table. The results indicated good scalability. 4.5.2 Scalability with Respect to Networks The scaling of networks includes the increase in number of CNs and ASes. The scaling of Content Networks involves the increase in the number of CNs in one AS and the total number CNs in all interconnected ASes. The latter can be addressed by combining scalability with the scaling of CNs in one single AS, and the scaling of ASes. The content routing system's scalability, along with the scaling of CNs, is directly decided by the properties of CGIP, which deals with 68 the intercommunications between CNs. Following the design style, CGIP is expected to have the same level of performance, such as convergence behavior, as BGP. Similarly, the content routing system's scalability with the scaling of ASes directly depends on the properties of CBGP, which is designed based on BGP and expected to have the same level of performance as BGP. In our experiments, the number of CNs in one AS is increased from 1 to 8, and the total number of CNs is increased from 1 to 14. The number of ASes is increased from 1 to 7. No significant performance loss is noticed with the scale of networks. 4.5.3 Scalability with Respect to End-Users The increase in the number of end-users is reflected through the increase in the number of clients and name resolution requests. We used Web stress tools, IE, Netscape, self-developed client program and applications such as "Ping", to simulate clients simultaneously accessing content. The number of simultaneous clients is increased from 1 to 100 for testing. The results of the experiments showed that the system scales well, along with the increase in the number of end-users. The hierarchical and distributed nature of the routing architecture and protocol CREP enables a content request to be resolved without going through globally centralized nodes, which leads to good scalability. Further, the multi-threaded prototype assigns one thread for each name request, and hence, the only limitation to the maximum number of simultaneous clients is the resource limitation imposed by the underlying system. With a higher version of the Web stress tools of Passler and OpenLoad, this number can be scaled largely to better meet the requirements for scalability testing. 69 Chapter 5 Conclusions and Future Work 5.1 Conclusions The explosive growth of content-based Internet services has driven the development of content replication and content networking techniques. The deployment of Content Networks dramatically speeds up content delivery performance by distributing content replicas and redirecting clients' requests to surrogates "closest" to clients. However, each individual Content Network has its physical limit. In order to build interoperability of distinct Content Networks to allow individual Content Networks to expand across domain boundaries, the concept of Content Internetworking has been developed and is fast becoming the focus of intensive research. With Content Internetworking, distinct, independent, Content Networks can be organized into a global network infrastructure to optimize content-based service performance. Request-routing in Content Internetworking is responsible for directing a client's content request to a suitable Content Network in which a surrogate server can best serve the request, based on local policies and a set of metrics, including network proximity, network congestion and health, server load and health, and so forth. Known Content Internetworking request-routing approaches unfortunately are not able to provide an elegant solution for the scalability problem. Some of 70 these define content routing at the application layer, which wastes the existing network routing infrastructure and incurs a performance penalty. This thesis proposes an innovative request-routing architecture and a comprehensive suite of content routing and resolution protocols for Content Internetworking, in an effort to address scalability and performance concerns in the development of a content request-routing system. The design features of the proposed approach are summarized below: • Hybrid The thesis proposes a routing architecture for the next generation Internet, which supports both traditional IP routing and emerging content routing. Content routing is a routing issue in nature, sharing many fundamental features with IP routing. In particular, request routing in Content Internetworking works in a very similar fashion as BGP. The thesis extends the existing networking infrastructure to enable the communication of content state, and integrate the content routing system with the IP routing platform, to construct a hybrid platform and add content intelligence into the IP-aware network. In the proposed hybrid routing architecture, IP and content routing function in parallel. • Hierarchical The proposed request routing architecture is highly hierarchical, with routing components organized into three levels. Based on the assumption that an individual CN is always confined within the boundary of one single administrative domain-Autonomous System (AS), the architecture makes use of the AS-interconnection infrastructure to extend the CN-interconnection to achieve efficient global Content Internetworking. The hierarchical architecture enables the hierarchical content routing process. The lowest level is the CN's internal request-routing 71 component that is considered to be a black box in Content Internetworking scenarios, allowing the service providers maximum freedom in choosing the appropriate CN's internal content routing system. The second higher level is comprised of CIGs, enabling content interconnection of CNs. The third and highest level comprises of CBGs, enabling content interconnection of ASes. The hierarchical architecture also enables the hierarchical content request resolution process. A client's content request is first sent to the lowest level-CN internal's Request-Routing system for resolution. If unsuccessful, the request is sent to the second level-CIG for resolution. The CIG checks its RIB and route cache, attempting to resolve the request. If unsuccessful, the CIG sends a content query to the highest level-CBG for resolution. The CBG checks its RIB and sends a content response back to the CIG. If the content response contains a content server's IP address, the request is resolved; otherwise, the CIG forwards the request further to the normal DNS resolution chain. Distributed System Design The proposed content request routing system is distributed, and hence, inherits the advantages of a distributed system-good scalability, flexibility and reliability. But meantime, more complicated management mechanisms are required. The architecture eliminates the globally centralized nodes-authoritative servers-from the picture of request routing in Content Internetworking. For load sharing and fault tolerance purposes, each CN may have multiple CIGs, provided that they keep the consistent view of the CN's content routing situations. ICGIP addresses the communication between CIGs within the same CN. Similarly, each AS may have multiple CBGs, and ICBGP addresses the communication 72 between the CBGs within the same AS. The CIG representative election algorithm in CGBP allows for flexible representative selection based on the server load and health information. Name Plus IP Dual Address Routing Building on the hybrid routing platform, the addressing scheme of our content routing system adopts a dual address scheme: name plus IP dual address. Name plus IP dual address can uniquely identify a content copy across the globe. More importantly, this addressing scheme closely relates content routing with IP routing, thereby naturally bridging the gap between the content routing protocols and infrastructure, and existing IP routing protocols and infrastructure. Discrimination of Handling of Replicated and Non-Replicated Content Content with replica and content without replica are treated differently in our routing system. CIGs aware of the content's replication status do not distribute the routing information of non-replicated content to CBGs. Thus, the AS-interconnection level does not have the knowledge of non-replicated content, and the size of content routing tables in CBGs can be significantly reduced. The scheme provides the system with good scalability, yet maintains an acceptable overall system performance. The reason that some content is not replicated is mostly because this content is seldom requested and accessed by clients, and hence, not worth replicating. Therefore, although the majority of the content on the Internet is non-replicated content, most of the access requests are for content with replica, and the penalty imposed for requests requiring non-replicated content would not cause the degradation of overall performance. Additionally, with content route caching mechanisms, a CIG can cache routes for requests requesting non-replicated content, which further 73 complements the scheme of discriminating the replicated content from the non-replicated content. Content Route Caching To further reduce the response time to the client's content request, caching mechanisms are deployed. Similar to other caching schemes that optimize performance by bringing requested information closer to the user, our caching mechanisms cache the routes to content destinations, in addition to the basic caching mechanisms that cache content. With route caching, the resolution rate at the CIG is noticeably increased. Resolving content requests locally at the CIG saves the round-trip overhead from the CIG to the C B G , and thus greatly speeds up the response time and reduces traffic and computational overhead. Intelligent Load Balancing Load-balancing algorithms are built to complement the basic content routing algorithms to achieve optimal load distribution and system performance. The load-balancing algorithms are based on local content-specific policies, real-time metrics of content, server and network variations, and server-specific access history. The Possibility Bucket Algorithm demonstrates good load-balancing capability, at the price of more computing overhead. Scalability With the growth of content quantity and requirements for content services, plus the exponential expansion of networks and end users, scalability has become one of the major concerns of today's Internet. The features of the proposed approach, including hierarchical routing architecture, distributed system design, and the discrimination of the handling of content with replica and handling of content without replica, enable the content routing system to scale gracefully with 74 respect to the size of content, networks (including CNs and ASs) and users. • Incremental Deployment Path The proposed content request-routing architecture and protocols can be deployed all-at-once, as well as incrementally. A traditional Border Gateway that is only IP-aware is compatible with the upgraded one-Content Border Gateway (CBG), which is both IP and content aware, and hence, can be upgraded incrementally with CBGP to support content routing. The gateway CIG, the content routing protocols CNGP, CGP? and CGBP, as well as the content resolution protocols CRRP, can be installed or deployed incrementally, and spontaneously. A complete multi-threaded system prototype is implemented with C on Linux RedHat 7.2, and a simulation experiment test-bed is set up for feasibility evaluation and performance and scalability analysis of the proposed content routing architecture and protocols. The testing network topology is composed of 14 Linux and 2 NT machines. Apache Web servers are installed to simulate servers hosting content. IE and Netscape browsers are used as clients. Passler's Web Stress Tool [17] for Windows and OpenLoad Web Stress Tool [18] for Linux are used to simulate clients' parallel, and intensive, access to content. The request's response latency, server load, and traffic volume are measured for analysis. Server-side metrics that directly participate in routing decisions are monitored and the configurable variables are tested with different sets of values. The scalability evaluation focuses on the routing system's scalability with the increase in the amount of content, the number of networks (including CNs and ASes), and end-users. Experiments indicate that response delay is almost unnoticeable when simultaneous end-users scale from 1 to 100, content scale from 1 to 2000, CNs scale from 1 to 14, and ASes scale from 1 to 7. Compared to a traditional DNS approach, the proposed approach incurs one extra round trip from the CIG to the CBGP router for resolving requests for replicated content of which 75 the authoritative servers are located in remote ASes, and for which there are no cached routes in the CIG. The experiments demonstrate that this overhead is reasonable and acceptable, as compared to significant overall performance gain. More experiments with a larger scale of end-users, content and networks are still essential to draw a conclusion on the scalability and performance of this approach. 5.2 Future Work Metric collecting is a performance-critical aspect for a content routing system. The properties of mobility and flexibility of mobile intelligent agent techniques, such as WAVE [22], qualify these as good candidates for collecting metrics, especially server-side metrics, in Content Internetworking request-routing systems. This thesis investigated and proposed four load-balancing algorithms for content routing purpose. Further work can be carried out on the refinement of load balancing mechanisms to deal with the influence of a local cache on CIG's load-balancing decisions, and the cooperation between local and global load-balancing components of a CIG and CBG, respectively. Moreover, how to effectively and efficiently integrate the load-balancing efforts for content routing with the load-balancing efforts for IP routing could be an interesting and challenging topic. The thesis research on the request-routing system in Content Internetworking presumes that the distribution system already distributed content replicas across the networks, and content is aware of their replication state. To build an intelligent Content Internetworking system, a proper definition of the interface and interaction between the request-routing system and the distribution system is essential. For example, the real-time client request and routing situations of the content routing system can feedback to the content distribution system as a guide for more efficient content distribution and configuration. The mechanism described in this thesis basically belongs to a client-server model: the content server functions to deliver the best services to a client. 76 Comparatively, in a collaborative environment, each principal can hold content, in which it may be best described as Peer-to-Peer (P2P) architecture. The design of a scalable P2P content routing and distribution architecture can be a research interest when building a scalable collaborative environment. In evaluating content-based services in Content Networking, security, QoS and fault tolerance concerns are among the most important factors. To integrate security, QoS and fault tolerance functions and components with content request routing is the direction of the development of Content Networking techniques. Related to this topic, the concept of Virtual Private Content Network (VPCN) [23, 24] combines a Virtual Private Network (VPN) and Content Networking to accommodate security, QoS and performance issues in providing secure and QoS guaranteed content networking services. 77 Bibliography [I] Bringing Content Closer. Packet Magazine, Dec 2000. [2] T. Nolle. Why Is Content Networking So Hot? Network Magazine, Dec 5, 2000. [3] Content Alliance. http://www.content-peering.org, accessed in Mar 2002. [4] M . Day, B. Cain, G. Tomlinson, P. Rzewski. A Model for Content Internetworking. http://www.ietf.org/internet-draftsdraft-ietf-cdi-model-02.txt, Feb 22, 2002. [5] M. Day, B. Cain, G. Tomlinson, P. Rzewski. Content Internetworking Scenarios. http://www.ietf.ors/internet-drafts/draft-ietf-cdi-scenarios-01.txt, Feb 25, 2002. [6] A . Barbir, B. Cain, F. Douglis, M . Green, M. Hofmann, R. Nair, D. Potter, O. Spatscheck. Known C N Request-Routing Mechanisms. http://www.ietf.org/internet-drafts/draft-ietf-cdi-known-request-routing-Ol.txt, Feb 22, 2002. [7] M. Green, B. Cain, G. Tomlinson, S. Thomas, P. Rzewski. Content Internetworking Architectural Overview. http://www. ietf. org/internet-drafts/draft-ietf-cdi-architecture-6l.txt, Feb 22, 2002. [8] B. Cain, O. Spatscheck, M . May, A. Barbir. Request-Routing Requirements for Content Internetworking, http://www.ietf.org/internet-drafts/draft-ietf-cdi-request-routing-reqs-01.txt, Feb 22, 2002. [9] L. Amini, S. Thomas, O. Spatscheck. Distribution Requirements for Content Internetworking, http://www.ietf.org/internet-drafts/draft-ietf-cdi-distribution-reqs-00. txU Feb 22, 2002. [10] D. Gilletti, R.Nair. Content Internetworking Authentication, Authorization, and Accounting Requirements, http://www.ietf.org/internet-drafts/draft-ietf-cdi-aaa-reqs-01. txt. Feb 22, 2002. [II] Y . Rekhter, T. Ii. A Border Gateway Protocol 4 (BGP-4). RFC 1771, Mar 1995. 78 [12] P. Mockapetris. Domain Names - Concepts and Facilities. RFC 1034, Nov 1987. [13] P. Mockapetris. Domain Names - Implementation and Specification. RFC 1035, Nov 1987. [14] M. Gritter, D. R. Cherition. Triad: A New Next-Generation Internet Architecture. http://sresorio.stanford.edu/slides/triad-content-netseminar/, accessed in Aug 2001. [15] Mark Gritter, David R. Cherition. An Architecture for Content Routing Support in the Internet. USENIX Symposium On Internet Technologies and Systems, Mar 2001. [16] Multithreaded Routing Toolkit (MRT)-A Partnership between University of Michigan and Merit Network, http://www.mrtd.net, accessed in Aug 2001. [17] Paessler Web Stress Tool. http://www.paessler.com/WebStress/webstress.htm, accessed in Mar 2002. [18] OpenLoad Web Stress Tool 0.1.2 for Linux. http.V/openload.sourceforge.net/, accessed in Mar 2002 [19] A. Tanenbaum, M. V. Steen. Distributed Systems Principles and Paradigms. Prentice Hall, 2001. [20] J. Li, M. Gao, X. Huang. CPSC 527 Course Project Report. May 2002. [21] P. Au. CPSC 527 Course Project Presentation Slide. May 2002. [22] S.T. Vuong, I. Ivanov. Mobile Intelligent Agent System: WAVE vs. JAVA. The First International Conference on Emerging Technologies and Applications in Communications, May 1996. [23] A. Barbir, N. Mistry, R. Penno, D. Kaplan. A Framework for OPES End to End Data Integrity: Virtual Private Content Networks (VPCN). http://www.ietf-opes.org/documents/draft-barbir-opes-vpcn-00.txt. Nov. 2001. [24] Context Media's Virtual Private Content Network (VPCN) Solutions. http://www. contextmedia. com/solutions/vpcn. html, accessed in May 2002. [25] S.T. Vuong, M. Gan, X. Cai. In proceedings of 2nd Military Communications Conference, Oct 2002. 79 Appendix A Abbreviations AS Autonomous System BGP Border Gateway Protocol CBG Content Border Gateway CBGP Content Border Gateway interconnection Protocol CDI Content Distribution Internetworking CDN Content Distribution Network CGBP Content internetworking Gateway Border gateway interconnection Protocol CGIP Content internetworking Gateway Interconnection Protocol CI Content Internetworking CIG Content Internetworking Gateway CN Content Network CNGP Content Network content internetworking Gateway Protocol CP Content Peering CREP Content request REsolution Protocol DNS Domain Name Services IP Internet Protocol J HTTP Hyper Text Transmission Protocol POP Point Of Presence 80 QoS Quality Of Services RR Request-Routing TCP Transport Connection Protocol T T L Time To Live 81 

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0051732/manifest

Comment

Related Items