UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Proximity : subtitle a scalable P2P architecture for latency sensitive massively multiplayer online games Malik, Kamran S. 2005

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata


831-ubc_2005-0544.pdf [ 4.68MB ]
JSON: 831-1.0051311.json
JSON-LD: 831-1.0051311-ld.json
RDF/XML (Pretty): 831-1.0051311-rdf.xml
RDF/JSON: 831-1.0051311-rdf.json
Turtle: 831-1.0051311-turtle.txt
N-Triples: 831-1.0051311-rdf-ntriples.txt
Original Record: 831-1.0051311-source.json
Full Text

Full Text

Proximity: A Scalable P2P Architecture for Latency Sensitive Massively Multiplayer Online Games by Kamran S. Malik B.Sc, Lahore University of Management Sciences, 2002  A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science  in THE FACULTY OF GRADUATE STUDIES (Computer Science)  The University of British Columbia October, 2005 © Kamran S. Malik, 2005  Abstract P2P overlays are a natural architecture for scalably supporting Massively Multiplayer Online Games. However, computing consistent game state in a distributed fashion without compromising responsiveness places very hard to meet bandwidth, latency and scalability requirements on the architecture. We present Proximity - a system that utilizes the limited Area of Interest of game entities to arrange player contributed machines in an overlay that optimizes bandwidth consumption, reduces event propagation delays and scales dynamically with an increasing number of players. Network proximity of nodes is also taken into account during the overlay construction process. In this thesis, we describe how to split game state management over unreliable player machines while meeting the fault tolerance, state persistency and consistency requirements of games. We anticipate our system, with its increased scalability and lower latency, opening the possibility of completely new genre of MMOGs with fast paced reflexive action at a fraction of a cost of the more traditional client server architecture.  ?  ii  Contents Abstract  »  Contents  i»  List of Tables  vii  List of Figures  viii  Acknowledgements  x  Chapter 1  1  Introduction  1.1  Motivation  2  1.2  Thesis Contributions  4  1.3  Thesis Organization  6  Background and Related Work  7  Chapter 2 2.1  Scalability and Peer Architectures  7  2.2  Peer-to-Peer Architectures  8  2.3  Related Work 2.3.1  2.4 Chapter 3  ,  9  Problems with exiting P2P solutions  10  Goals of this Architecture  12  System Overview  13  3.1  Direct Player to Player connections  13  3.2  Neighbor Discovery  14  3.3  Game State  15  3.4  Fault Tolerance  16  3.5  Persistence  17  iii  3.6  Coordinator Overlay  18  3.7  Overlay Construction  18  3.8  Routing  19  3.9  Transparent Borders  21  3.10  Splitting & Merging  22  3.11  Dynamic Area of Interest (Aol)  23  3.12  Overload Detection  24  3.13  Topology Aware Overlay Construction  24  3.14  Replica Replacement  25  Overlay Network Management  ..26  Chapter 4 4.1  Routing Table  26  4.2  Handling Splits  27  4.3  Intelligent Split  33  4.4  Bootstrapping the child  33  4.5  Merge Algorithm  34  4.6  Locking  35  4.6.1  Locks and Splits  38  4.6.2  Locks and Merges  38  Chapter 5  Overlay Routing  41  5.1  Basic Routing Algorithm  5.2  Grid  5.3  Big Hop  45  Loop Avoidance Rule  48  5.3.1  41 .....  iv  44  5.4  Lies Within  49  5.5  Reverse BigHops  50  5.5.1  Reverse Tree Construction  52  5.5.2  Routing with Reverse BigHops  53  Chapter 6  Architecture  56  6.1  Player Code  6.2  Coordinator....  59  6.3  Central Server  60  6.4  Inter-Component communication  61  6.5  Removing CS Bottleneck  64  6.5.1  Secondary Overlay  64  6.5.2  Reusing Primary Overlay  65  Comparison with other D H T based Schemes  65  6.6 Chapter 7  ...56  Performance Evaluation  67  7.1  System Properties  69  7.2  Scalability  73  7.3  Routing  75  Chapter 8  Conclusions  78  8.1  Future Work: Service Discovery  78  8.2  Conclusion  80  Appendix A A. 1 Appendix B  Derivations  .  Expected Average Subscribers per Player Bounded Multicast  81 81 83  v  B.l  Bounded Multicast and Range Queries  B. 2  Query Termination  Bibliography  vi  List of Tables Table 3-1 Neighbor Registry and Routing  20  Table 4-1 Symbols used in the Locking Algorithm  36  Table 5-1 Definitions used in Reverse Description  54  Table 6-1 Message Types  64  Table 7-1 Routing Comparison  77  vii  List of Figures Figure 2-1: Kawahara's Node Blindness Figure 3-1 Neighbor Registry and Routing  11 ..  20  Figure 3-2 Transparent Boundaries  21  Figure 4-1 Vertical Split  28  Figure 4-2 Post Split Configuration  30  Figure 4-3 Horizontal Split  31  Figure 4-4 BoundaryMatcher (Region 3) - Horizontal Split  31  Figure 4-5 Split with Trespasser  32  Figure 4-6 Merge - CAN_announceReplace messages  34  Figure 4-7 Inconsistencies caused without Locks  35  Figure 4-8 Merges and Deadlocks  39  Figure 4-9 Why parent should unlock neighbors locked by child  40  Figure 5-1 Boundary Assignment  42  Figure 5-3 Zone Origin  43  Figure 5-4 Effect of origin choice  44  Figure 5-5 Grid and Loops  44  Figure 5-6 Relative Location and BigHop Links  46  Figure 5-7 Hopping Subtrees  47  Figure 5-8 Loop Avoidance Rule  48  Figure 5-9 Lies Within  49  Figure 5-10 Subspaces not connected by BigHop  50  viii  Figure 5-11 Available BigHop Links  *  51  Figure 5-12 Reverse BigHop (shows selected Coordinators)  52  Figure 5-13 Secondary Tree Construction  53  Figure 6-1 M D C Subcomponents  58  Figure 6-2 Deployment Diagram  60  Figure 7-2 Players per Zone  69  Figure 7-3 BW requirement per Player at M D C  70  Figure 7-4 Subscribers per Player  71  Figure 7-5 Neighbors per Coordinator  72  Figure 7-6 Coordinators per Player  72  Figure 7-7 Neighbors per Coordinator  73  Figure 7-8 Average Instantaneous Coordinator Bandwidth  73  Figure 7-9 Players per Zone  74  Figure 7-10 Bandwidth per Node  75  Figure 7-11 Subscribers per Player  75  Figure 7-12 Routing Comparison.  ....76  Figure 7-13 Reverse BigHop - Logarithmic Scale  77  Figure B-1 Bounded Broadcast  84  Figure B-2 Redundant Queries Prevention  85  Figure B-3 Iterative Queries  i  ix  86  Acknowledgements I would like to thank my supervisor, Dr. Son Vuong, for his expert guidance, unending support and encouragement. None of this could have been possible without his backing and sincere advice regarding all aspects of my studies at UBC. I thank Dr. Charles Krasic for always keeping his door open to students like myself seeking advice and feedback. I would also like to express my heartfelt gratitude to Michael Blackstock. He acted as sieve to help me sift through ideas through out the course of the project. A number of proposed improvements to the overlay network were stimulated by the many long discussions with him. I wish him the best of luck with his Doctorate work. Finally, I thank my family for their understanding and always being there whenever I needed them.  Kamran S. Malik The University of British Columbia  September 2005  x  Chapter 1 Introduction Massively Multiplayer Online Games (MMOGs) are games played over the Internet that involve thousands of concurrent players who interact in real time with other players and entities that share the game world. MMOGs are distinguished from other multiplayer games by the large number of players and the long term persistent storage of the game state. Games like Lineage have recorded 180000 concurrent players and the number of its subscribers exceeds two million [1]. In a MMOG, game entities are always present regardless of the presence of the player that introduced them. Changes made to the game world in a certain area by any player are recorded and must be visible to any player that might pass the same part of the world later even after the player who made the change has left the game. The ability of a player making lasting changes to the world makes the game feel more realistic and appealing. In summary, two properties that differentiate MMOGs from other multiplayer games are the number of players involved and the persistence of game state in large virtual worlds. Most commercial MMOGs available in the market use the traditional client-server model that limits the scalability of the system and places an enormous financial burden 1  on the publishing company to furnish enough resources to accommodate the large number of players. In addition, it also restricts the type of games that can be supported. In recent years there has been a push towards P 2 P architectures because of their inherent scalability. Peer-to-peer architectures allow games to utilize the storage, bandwidth and computing power available on the player machines. However, a number of technical issues are impeding the commercial popularity of this approach.  1.1  Motivation Traditionally, M M O G s have used a client server architecture where the server is  responsible for storing and computing the state of the shared world based on the inputs gathered from all the participating players. The client running on the player's machine can range from being as a simple text terminal to a complex three dimensional rendering system that shows the player a window into the virtual world. The clients receive input from the user, convert them into commands and pass them along to the server. The server collects the commands from player machines on the network and runs a game world specific simulation algorithm on the gathered commands to generate the next state of the game world. It then sends updates based on the current state of the game world to the clients depending on their area of interest. Since the state of the world is derived from input from all the clients together at a single central location, the.players can see the effect of the combination of each other's actions. A client may maintain a cache of the state of the virtual world and its objects within the area of interest of a player's avatar. The server sends them updates only for the objects within this perceivable area. The clients apply these updates to the objects in the cache and update the player's view of the world.  2  Since only the server is involved in deciding the state of the entire world it is much simpler to maintain consistency and deal with delayed and lost messages from clients. The game company can also easily control player admission and game state through the central server. However, as all player machines have to communicate with the server it can quickly become a computing and network bottleneck because of the potentially large number of players involved in M M O G s and the size of the game state. Scalability of the client server model may be improved by employing groups of servers arranged in a cluster [24] or a grid [25]. The game world is split and assigned to different servers on the cluster. A server thus maintains only a part of the virtual world and only players that reside in that portion of the game'world connect to that server. As players move through the world they may have to migrate to different servers. Terazone using a cluster of servers that individually could only handle 2000 to 6000 clients was able to support 32000 concurrent players over the cluster [24]. Other schemes divide the server workload by assigning players based on their physical network proximity to different servers running parallel independent worlds. This essentially side steps the problem of scalability as it limits interaction to only players connected to a shared server. Players in different "parallel" worlds cannot interact with one another. Although these architectures can be made to scale with the number of players by adding more centralized resources, it fundamentally lacks flexibility and the centralized server(s) must be over-provisioned to handle peak loads. To do this, game publishers are forced to estimate in advance the number of players they expect to play their games. To reduce costs providers often under-provision, limiting the growth of appealing games,  3  which translates into lost revenue for the company. Even if the company is able to predict and provision the right amount of resources, these resources remain under utilized while the game popularity rises to the predicted level. Similarly, when a game's popularity starts to wane the gaming company is unnecessarily forced to maintain the original level of resources. i  Moreover, in all forms of client "server architecture there exists an upper limit on the responsiveness of the user actions as they affect the overall game state since clients have to wait for commands to propagate to a possibly busy server and the resulting updates back to the client. This limits user responsiveness and even prevents certain genres of games like first person shooter games) from becoming massively multiplayer.  1.2  Thesis Contributions This thesis discusses these impediments and as a solution proposes an architecture  that allows player machines to contribute their resources for game state management by becoming a part of an overlay that dynamically distributes the load amongst their machines. The architecture exploits area of interest management to maximize the utilization of the pooled resources. It also describes methods for using existing distributed game state management algorithms with the proposed architecture. The biggest contribution of this thesis, we hope, will be to revive interest in a genre of overlay networks that allows semantic placement of data on the overlay better than other more popular overlays. We exploit this property to map regions of the game space on the overlay so that close-by entities in the virtual world are co-located on the overlay, as well. To highlight, this thesis:  4  •  Provides a description of a completely distributed dynamic algorithm for assigning game state management responsibilities, to share the load over a set of machines and thus, improve the scalability of the MMOG infrastructure. The division of the game state is completely transparent to the game entities as their perception is not limited by any artificial boundary imposed by the assignment algorithm.  •  Reduces infrastructure cost by providing ways to utilize resources available on the participating players' machine.  •  Allows direct communication between the source of an event and all entities affected by it to reduce communication latency and improve response time. It utilizes Area of Interest (Aol) to transmit an event to only those entities that need to be aware of it.  •  Lays out methods for maintaining the consistency of the game state spread over a set of peers and means for guaranteeing state persistence in the face of peer failure.  •  Describes distributed algorithms and synchronization issues involved in the topology-aware construction of the overlay. These details are not available in other published work on similar overlays.  •  Proposes utilization of the network topology aware construction of the overlay to assign coordinating responsibilities to reduce network traffic and latency.  •  Presents schemes for bounded multicast over well defined subspaces of the overlay coordinate space. Existing multicast algorithm cannot guarantee  5  non-duplication and only support broadcast of the message to all nodes in the overlay. •  Describes how bounded multicast can be used to make iterative range based queries.  •  Explains how a tree can be laid on top of the overlay and used to route messages leading to a reduction of routing cost from O(N 0.5) to O (log N). A  •  Describes an extension of the infrastructure for Service Discovery another application that can greatly benefit from the semantic placement of information.  1.3  Thesis Organization The rest of the thesis is organized as follows: Chapter 2 makes a case for a P2P  based approach, discusses some recently proposed attempts to adopt this technology and lists their shortcomings. In Chapter 3, we provide the high level details of Proximity, our proposed gaming architecture. Chapter 4 presents detailed algorithms for constructing the underlying P2P overlay. In Chapters 5, we describe our proposed improvements to the overlay to allow more efficient routing. The main components of the gaming architecture are presented in Chapter 6. We evaluate our proposals in Chapter 7 and conclude in Chapter 8 with a word on possible extensions of the architecture in the realm of Service Discovery. We present details of the multicast routing and iterative range queries in Appendix B.  6  Chapter 2 Background and Related Work This chapter makes a case for a peer-to-peer architecture for MMOGs, introduces some recently proposed peer architectures and delineates their major shortcomings.  2.1  Scalability and Peer Architectures For a system to be scalable it should allow large number of components to join and  leave the system arbitrarily. Resources on any given node are limited and are consumed as components join the system. A system can allow new comers to join only when it has enough spare resource to accommodate the new members. A system can continue doing this as long as resources are not depleted when new nodes join. To prevent resource depletion two strategies can be exploited: Resource-growing: resources increase at the accepting node with the addition of the new node.  7  Decentralized end point resource consumption: addition of new members should not use up any centralized resource.  2.2  Peer-to-Peer Architectures Peer to peer architecture comes naturally to mind as a system which can be built  around both these two strategies. A number of structured P2P overlays have been proposed. C A N  [13], Chord [14], Pastry [11], and HyperCast [15] being the more popular  ones. These networks focus on setting up a distributed hash table that maps values to keys and provides means to route through the network to locate those keys to retrieve the associated values. Recently, there has been considerable research published on building games on top of these DHTs with varied success. Early multiplayer games and even more recent research and commercial work rely on a simple P2P model. Games like MiMaze [16] and Age of Empires [17] implement a completely decentralized unstructured P2P system where every node receives moves from all the players in the game and decides the game state on its own. These games depend on the underlying network to support broadcasting or multicasting. All nodes are equal and model the entire world based on the messages received from all the other players. Since all nodes run the exact same algorithm on identical sets of inputs, all players see the same world on their screens. Other P2P systems such as AMaze [19] and Mercury [18] utilize multicast groups to send messages only within groups of nearby players to minimize the bandwidth requirements. The applicability of multicast is severely limited as IP multicast is still not supported widely and architectures that try to mimic multicasting by the use of unicast links introduce too much latency.  8  However, to adapt P2P technology for massively multiplayer gaming, which involves large number of players spread all over the Internet and not necessarily on the same L A N a number of enhancements are necessary. We can no longer assume that (i) messages can be easily broadcasted to all participants (ii) all the intended recipients receive all the messages within guaranteed time constraints. These systems should also be able to support large number of shared mutable environment object in addition to the player entities both of which should be stored persistently To remove the need for broadcasting all messages, nodes have to be arranged in a topology where an avatar's limited area of interest can be exploited to minimize the number of connections between nodes. Sophisticated consistency management techniques must be used to deal with delayed and lost messages.  2.3  Related W o r k In recent years a number of researchers have suggested improvements to P2P  architectures to meet MMOG requirements. Knutsson et al. [2] propose making all player machines members of a Pastry Overlay [11]. The game map is split into zones and all players within a zone form a group by joining the Scribe [12] multicast tree for the zone. Zone IDs are used to hash onto Pastry nodes to select zone coordinators, which also serve as roots for the multicast trees. Game state is updated by broadcasting the moves on the multicast tree using the coordinator's ED. Kawahara et al [7] advocate a fully distributed scheme where all nodes are equal. A player node maintains a list of a fixed number of nearest neighbours and forms direct  9  connections with the neighbours included in the list. Neighbours periodically exchange their lists to help each other detect new incoming nodes. In Solipsis [8] nodes attempt to maintain links with a ring of nodes around themselves instead of just connecting to the N nearest nodes. The nodes in the ring act as sentinels and warn the node when an unknown node approaches the central node from any direction.  2.3.1  Problems with exiting P2P solutions  Knutsson et al. [2] describe the most complete system of the three architectures as Solipsis and Kawahara et al. [7] do not consider the issue of game state management. However, Knutsson's approach still has some undesirable properties. The cell sizes do not correspond to the area of interest of the players. As it is, players cannot see across artificial cell boundaries introduced by the game architecture otherwise non-existent in the game world. The authors recommend players with greater visibility requirements to join multiple zones. But this results in a node'receiving unnecessary messages from locations beyond its area of interest. More seriously, the scheme seems to go against the very reasons that make the peer approach appealing. Player updates are routed through the coordinator, which is not much different from a client sending updates to a server. Secondly, the location of players and proximity of neighbouring virtual world zones are not used in deciding how to stitch nodes together while constructing the overlay. As a result, messages have to be routed through multiple nodes that may not even be interested in the contents of a message. There may be as many as 50 virtual hops [5] in a decent sized game. To make matters worse, each hop between nodes handling neighbouring virtual world zones translates to multiple hops on the physical network. Moreover, as  10  Pastry tries to randomize neighbours in the overlay to improve fault tolerance, the messages have to haphazardly bounce all over the Internet before it gets to the destination. All of this adds to the delivery time of updates, adversely affecting the responsive of the application. The main problem with [2] is that it does not fully utilize the possibility of exploiting direct connections made available by the P2P approach. On the other hand, Kawahara's method maximizes connections  the hop count efficiency  by maintaining direct  between players at the cost of increased network traffic. Moreover,  maintaining a list of a fixed number of nearest nodes does not guarantee global connectivity. For instance, if all the N closest nodes crowd together on only one side of the node, the player is rendered completely oblivious to a player approaching from the other side. This situation is depicted in the following diagram (Figure 2-1) where N=3 and A is unable to detect B as it was not connected to C:  Figure 2 - 1 : Kawahara's Node Blindness Like Kawahara et al., Solipsis utilizes direct links between neighbours improving the event delivery delay. It also solves the problem of node blindness due to overcrowding on one side, depicted in Figure 2-1, by ensuring that there is always a ring  11  of neighbours directly connected to the node regardless of their distances. However, proper neighbour discovery is still not guaranteed when players move too fast. Secondly, to maintain global connectivity players have to maintain connections even if they are too far apart to be visible to each other.  2.4  G o a l s o f this A r c h i t e c t u r e  Keeping all of these considerations in mind, we designed our system with the following objectives: •  Maximize the responsiveness of the game by reducing the amount of time packets take to get to recipients  •  Incorporate techniques for maintaining the consistency of the game across all clients  •  Minimize the bandwidth and computational requirements for a single client.  •  Provide fault tolerance against peers unpredictably leaving the system  •  Provide means to manage the game state for player controlled avatars (PC), player controlled entities (NPC), immutable and mutable terrain objects.  •  Provide distributed persistent storage of game state.  •  Where possible ensure the security of the system.  •  Allow the game publisher to retain admission control.  •  Make no assumption about the existence of a multicast infrastructure.  12  non-  Chapter 3 System Overview This chapter presents the high level details of our system called Proximity. We describe how the locality properties of the game are leveraged to distribute game state management over an overlay of player machines without imposing any visible sideeffects to the game play. We also discuss how game level semantic information can be utilized to structure the overlay to minimize the management overhead, network traffic and  latency.  3.1  D i r e c t P l a y e r to P l a y e r connections The  design exploits the limited area of interest of a player to directly interconnect  small subset of player machines. Each node (or player machine) is responsible for just the avatar of the machine's owner. Avatars have a limited area in which they can bring about changes. This is known as the publish area. An avatar also has a subscription area that conforms to the sensing abilities of the avatar. These areas can have different dimensions  13  depending on the capability and sensing ability of the player's avatar. A player can see a player, B, only if its subscription area overlaps the publish area of player B. All players whose areas overlap are connected directly with each other. Every time a player moves it sends a move event to all the connected peers and checks if the areas still overlap. If they do not, the peers drop the connection and are no longer aware of each other. Using publish/subscribe areas is the most efficient and flexible interest management technique. It ensures that only participants who are interested receive only the events that can possibly affect them.  3.2  Neighbor Discovery As players move around they require some mechanism to discover their new  neighbours. In Proximity, some of the player machines are promoted to play the role of a zone coordinator. Zone coordinators are assigned fixed portions of the game world called zones to manage. The coordinators have knowledge of all the avatars currently residing in their zone. They keep a record of their precise location and publish/subscribe areas, so that they can inform players when they come close enough to each other to interact. It is up to the players to decide when to stop communicating with a neighbour Zone coordinators form a structured peer-to-peer overlay. The entire game world is split up into non-overlapping rectangular regions/zones. Each coordinator knows about the zone coordinators of bordering zones and maintains direct unicast links with the neighbouring coordinators. Through these neighbouring coordinators it has indirect links to coordinators covering the entire game space. These indirect links can be used to route a message to any point in the virtual world. In this way, coordinators maintain global  14  connectivity by only keeping a record of a small subset of neighbouring zone coordinators.  3.3  Game State A successful MMOG middleware should provide support for maintaining  consistency and persistence for the game state. This becomes even more important when game state management is delegated to player machines that can arbitrarily join/leave the system. Game state can be broken down into the following classes with interaction between different classes of objects requiring varying level of latency and consistency guarantees: Player controlled Character (PC) Non-Player Controlled Character (NPC) Immutable terrain objects Mutable terrain objects Terrain objects are always located at fixed points on the map. The zone coordinator for that position of the map computes and stores the states of these objects. As an avatar moves through the world the coordinator informs its PlayerMD (the middleware component running on the player's machines) of the positions and states of the all the terrain objects in the avatar's surrounding. The player middleware caches these values and informs the relevant coordinator whenever its avatar interacts with any of the terrain objects. The coordinator resolves any conflict associated with concurrent updates to a terrain object and refreshes the cached value of the shared environment object in all the players in the vicinity.  15  For more latency sensitive interactions like PC-to-PC interactions, PlayerMDs are allowed to optimistically synchronize with each other, involving the coordinator only when necessary to resolve inconsistencies. For two players to interact they must be in close proximity. As discussed in the previous section these players maintain direct connections between them. Using these links a player informs all nearby players of its actions. The player nodes gather these events and periodically recalculate the new state of the world applying the Trailing State Synchronization algorithm [4]. The same algorithm is run on the coordinator but with a greater lag, which ensures the delivery of more events hence allowing the coordinator's simulation engine to arrive at a more consistent state some time later. The state computed at the coordinator is considered to be more authoritative. Whenever an inconsistency is detected between the states maintained by players they revert to the state of their coordinator. Direct connections between players and the resulting lower consistency delays improve the responsiveness of the clients and allow players in fast paced games to take a series of quick reflex actions. The resulting events are only exchanged between interacting players or transmitted to other nearby players within viewing distance of the action. The higher level of network communications is limited to this small set of players.  3.4  Fault Tolerance To protect against ungraceful node failures, every coordinator maintains a chain of  replicas, which are periodically updated. Only one of the replicas actively exchanges messages with the players and neighbouring coordinators, and computes the states of the  16  objects located in the zone. At a fixed interval the active coordinator updates one of the replicas, which relays the updates to the remaining replicas. Since the replicas expect to be updated at regular intervals, if a replica does not receive an update for some time it conclude that an upstream replica has gone down and initiate a process to select new PlayerMD to replace the lost replica, which could possibly be an active coordinator. Replicas save information regarding the players in the region, the active neighbouring coordinators, their replicas and the latest state of the game objects located in the region. When a replica detects a failure of the active coordinator, it informs all the constituent players, neighbouring coordinators and its own replicas that it is taking over the zone from the lost coordinator. The PlayerMDs resend all the events from their cache that the replica might have missed because of the old coordinator failing before updating its replicas.  3.5  Persistence To  insure tolerance for catastrophic failure and allow a minimum level of  persistence there is a lightweight background process, which backs up the state of the different zones of the game world onto the central server. The publishing company can keep a copy of the state of the entire world secure on its own machine. This copy may be out of date by several minutes but it provides a minimum level of protection against a situation where there is a mass exodus of players. In the event of such a catastrophic failure the game will lose the last few minutes of moves. Given the non-critical nature of the application, this is fairly acceptable. We foresee that only in very rare circumstances the game will have to be reset to a slightly outdated state saved at a backup server. Most failures will be sufficiently handled by the replicas of the active zone.  17  3.6  Coordinator Overlay Our design of the overlay centers on a 2 dimensional Cartesian coordinate space.  The entire coordinate space is dynamically partitioned among the coordinators as needed. Each coordinator is assigned its own independent zone within the overall space. A coordinator discovers and tracks other nodes that coordinate zones bordering its own zone. Using this set of coordinators a node can route messages to any random point on the virtual coordinate space.  3.7  Overlay Construction The overlay is constructed incrementally with the initial coordinator started by the  central server on a computer provided by the game publisher. The first coordinator manages the entire game space and maintains a list of all player machines currently playing the game. It can run on the same or a different machine as the central server depending on the resources available to the game company. As players join the game the initial coordinator can become overloaded. When that happens, the system dynamically selects one of the player nodes to share the coordinator's load. The overloaded coordinator splits its zone in half and transfers control of one half to the selected player machine while retaining the other half. This is achieved through the following multi-step process: 1.  A coordinator, C, detects an overload locally and sends a distress signal to the central server.  18  2.  The central server chooses from its list of available player machines the best candidate to share C s load and sends it a signal to spawn a coordinator process on its machine.  3.  The spawned coordinator, D, informs the overloaded coordinator that it is ready to share the load.  4.  The overloaded coordinator decides the orientation of the split and passes the dimensions and coordinates of the newly created zone to the new coordinator, D.  5.  C (the original coordinator) informs its neighbours of the split and passes D information about its inherited neighbours. Based on this information new links are established between D and the old neighbours of C.  6.  C also passes on to D states of all the players and objects that lie within its zone. These transferred players are asked to establish connections with D.  7.  As a last step, D asks the central server to pick player nodes to act as replicas and copies its acquired game object states to all the replicas.  3.8  Routing Each node divides its neighbours in four sets as shown in Table 3 - 1 . On receiving a  message to route to a position (x,y) the node first decides which neighbour set it should search for the next hop.  19  Figure 3-1 Neighbor Registry and Routing Neighbor Registry at Node NI (Figure 3-1) Region  Registered  Trespasser  Neighbors 0  N2, N3  False  1  NULL  True  2  N9, N5, N4  False  3  NO  False  Table 3-1 Neighbor Registry and Routing Within a set the neighbours are sorted in increasing order of their origins - with the origin being the coordinates of top left corner of the zone. The node goes through this list looking for a coordinator that has the smallest Euclidean distance to the message's destination and forwards the message to the selected coordinator. Figure 3-1 shows an interesting twist to this otherwise simple routing mechanism. Neighbour N 3 , has its origin coordinates in Region 0 but it crosses into and covers the entire Region 1. If node A , were to follow the simple routing algorithm mentioned above it would return an error because (x,y) lies in region 1 but the coordinator has no neighbour in that region. To counter this situation, the routing mechanism differentiates N3 as a trespassing neighbour for Region 1. So whenever there is a message destined to some region the algorithm checks to see if there is a trespasser for that region and if so, is it the closest coordinator to the message's destination.  20  3.9  Transparent Borders The routing mechanism is used most extensively when a new player logs in. The  joining player contacts the central server to join the game. At this point the server can refuse the player's entry into the game based on the company's admission control policy. Each player starts off with some starting position, which lies within some coordinator's zone. The server generates a register message for the player with its starting position as the destination of the message and injects the message into the overlay. The message' ultimately ends up at the coordinator that oversees the starting position of the player. The coordinator decides, based on the information in the message, whether the player's publish/subscribe areas encroach into the zones of neighbouring coordinators. It generates a list including all such coordinators and itself. This list is sent to the joining player asking it to register with all the coordinators in the list. Players forward their actions to all the coordinators they are registered with so that their movements can be tracked. This makes the boundaries fairly transparent. In all other zoning techniques proposed so far the perception of the players is limited by the boundaries of the zone in which its origin lies. In these schemes two nearby players separated by the artificial boundary are oblivious to each other's presence.  - CC s Zone  A  ;C1 s Zone  B  Zone Boundary  Subscribe Area  Publish Aree  Figure 3-2 Transparent Boundaries  21  However, in our scheme these boundaries are made completely transparent through a combination of the publish/subscribe areas and the requirement for a player to register with all coordinators whose space the player's publish/subscribe areas overlap. In Figure 3-2, since B's Aol encroaches into C l ' s zone, B sends position updates to CI as well, even though B itself resides in CO. Based on this information CI can track B's location and informs A when B's subscribe area starts overlapping its own publish area.  x  3.10  Splitting & Merging A zone is partitioned in half with the split occurring along a line running through  the center either horizontally or vertically. When the height and width of a zone are equal it is split vertically resulting in two rectangular zones. If the height and width are unequal the zone is split horizontally. In this way, a coordinator that is repeatedly split alternates between rectangular and square zones. This results in more regularly shaped regions. Merging is slightly more complicated. The relationship between coordinators can be classified as "parent-child. (The parent being the coordinator that split and handed over half of its space to the child coordinator.) Each coordinator stores the ED of its parent and a list of its children in the order in which they are 'created'. Whenever a coordinator becomes under utilized it sends a merge request to its parent if it does not have any children. Otherwise, it has to send a merge request to the last created child. A child can turn down the request under two circumstances (i) it is overloaded and merging will, in all likelihood, only result in an even more overloaded coordinator (ii) the child has other children. If this is the case, the parent has to wait for its child to become available. Only  22  once a node has recombined with all of its children in reverse order of creation can it initiate a merge with its parent or accept a parent's invitation to join. The architecture is flexible enough to support more complicated splits. For example, instead of just splitting in half, the vertical or horizontal line could be drawn away from the center and the direction of the split could be altered to equally split the number of players. Alternatively, game semantics can also be involved in the decision to coincide the zone boundaries along natural obstructions in the game like walls or mountain ranges.  3.11 Dynamic Area of Interest (Aol) As the zone split and become progressively smaller to handle higher player densities it becomes increasingly likely that a player's A o l will overlap multiple regions. To prevent this from happening the A o l of a player is resized proportionally to the dimensions of the zone in which it lies. As a result, a player passing through small zones will have its A o l decreased. This scheme is an acceptable adaptation of the cocktail party effect [2] already well established in interest management systems. Even in real life a person in a crowded space tends to only concentrate on listening to few close-by people and blanks out the background noise. Since smaller zones mean higher number of players in the region, one can get the same effect as above by simply reducing the A o l on entering smaller regions.  23  3.12  Overload Detection All along, the system is under pressure to perform its task against hard time  constraints so that the latency and responsiveness demands of the application are adequately met. We use this to decide when a coordinator needs to shed some of its workload. The coordinator times how long it takes to perform state computations and dissemination of the updates. If it sees that it is unable to process the state before the next state is due, it deduces that it is overloaded. The processing time also serves as a weak indicator of the load the node is putting on its network connection. As overloading the network forces the network to drop packets some of which have to be retransmitted by the coordinator resulting in higher processing time.  3.13  Topology Aware Overlay Construction  While choosing a child for an overloaded parent the central server considers amongst other factors the network proximity of the child and the parent. Choosing a child which is closest in network proximity to the parent results in lower latency of message exchanges, leading to better consistency and lower aggregate network consumption. Since neighbouring coordinators need to exchange large amounts of data, this can have a considerable effect on performance. Other P 2 P systems that we have studied do not assign any importance to the underlying network topology. Knutsson et al. [2] for instance uses Pastry, which separates neighbouring coordinators by many overlay hops over the overlay and ignores network topology in deciding the nodes to interconnect to form the overlay. Given the amount of traffic and the latency sensitive nature of the application this is completely unacceptable.  24  In  our scheme, replicas and neighbours are selected to be in close network  proximity. To gauge the network proximity we use the idea of network coordinates proposed by [10]. Basically, each node is assigned network coordinates on the basis of ping times to certain landmarks on the network. Two nodes are considered to be close when the Euclidean distance between their network coordinates is small. By selecting children optimally in network terms we make the entire overlay topologically aware and efficient.  3.14 R e p l i c a R e p l a c e m e n t This is a fairly simple procedure since the replicas keep a record of their original coordinator's neighbours, constituent players and object states. Based on this information they inform the neighbour coordinators and the players that it is taking over the coordination responsibilities. The players update the new coordinator with any events that it might have missed because the original coordinator died before passing them to the replicas. As there is now one fewer replica, the promoted replica asks the central server to nominate another player node to serve as a replica for that zone.  25  Chapter 4 Overlay Network Management This chapter describes a completely distributed algorithm for maintaining the topology of the overlay of coordinators that can dynamically add or remove nodes ' depending on the load distribution. Using this topology the coordinators maintain global connectivity to the entire game while being aware of only their immediate neighbours.  4.1  Routing Table Coordinators maintain information about all of its neighbouring zones in a routing  table. This information consists of the zone's origin, dimensions and the IP and port numbers of its coordinator. Whenever a zone's dimensions are changed only its immediate neighbours need to be notified of the change to maintain a consistent view of how the overlay space is divided amongst the coordinators. The routing table arranges the neighbours in four sets depending on their location relative to the coordinator's own zone. These sets correspond to the regions defined  26  around a zone by superimposing the grid, (described in Section 3.8). A neighbour is inserted into a set if its origin lies in the corresponding region. Within a set, neighbours are sorted by their origin. If the set runs along the y axis (Sets 1 and 3), the neighbours are sorted by the y coordinate of their origin. Otherwise, they are sorted by the x coordinate (Sets 0 and 2). Arranging neighbours in regions simplifies splitting and merging as the neighbours are arranged in a definite order. Coordinators exploit this order to decide how and when to update neighbours of changes in their zones. This arrangement of neighbours also has the following additional advantages, which are elaborated upon in the indicated sections: •  While forwarding packets a node has to search for the next hop through a smaller set of nodes  4.2  •  It prevents routing loop (Section 5.2)  •  During message multicast it prevents duplicate messages (Appendix B.l).  Handling Splits The current implementation divides overloaded coordinators in half by splitting  zones horizontally or vertically. Both kinds of splits are essentially the same. During a split a node (parent) reduces its chosen dimension to half and assigns the remaining half to another machine, which is referred to as the child. Once the child has been bootstrapped with information about its inherited neighbours, the parent informs its neighbours about the new zone so that they can update their routing tables. The type of update depends on the set of the neighbour. This section discusses the process for a vertical split. The process can be extended by symmetry to horizontal splits.  27  For a vertical split as in the case of node A shown in Figure 4-1, nodes in: Set 3: need only to be informed about the reduced width of A's zone. They are not informed about A's new child B as A separates them from B. Node A sends a CAN_announceReplace (please see Table 6-1 for details of this and other message types) to these neighbours asking its entry to be replaced with almost the same information as the original entry except the chosen dimension which is halved.  Split  Boundary Figure 4-1 Vertical Split Set 1: will no longer abut A after the split. So Node A sends all its neighbours in the set a CAN_announceReplace message to have all entries referring to A replaced by B. Sets 0 & 2: These sets run perpendicular to the direction of the split and updating them is not as straight forward as in the first two cases. The neighbours in these two sets can further be classified into four categories:  28  NoChange: These are neighbours like node 1 which lie completely on A's side of the boundary separating A from B. The NoChange nodes only have to be informed about the reduced dimensions of A by using a CAN_announceReplace message.  Boundary Matchers: These are neighbours whose boundary neatly coincides with the line splitting A and B. Neighbours 6 and 7 fall in this category for the split shown in Figure 4-1. A sends them a special message, CAN_announceSplit, telling them to reduce the dimensions of A in their routing records and inserting a new neighbour B in their neighbour list. A and B might be recorded in different neighbour sets of a Boundarymatcher.  BoundaryStradlers: Very similar to boundary matcher, these nodes also lie in sets running perpendicular to the direction of the split. The difference is that the zones of BoundaryStradlers extend on both sides of the split. These nodes are also sent a CAN_announceSplit message by A but in their case both A and B end up in the same neighbour set of the BoundaryStradler. Neighbour 2 in the diagram is an example of a BoundaryStradler.  ReplaceNeighbor: These neighbour lie completely on B's side of the split and are required to replace A by B in their records. Neighbours like node 3 which fall in this category receive a CAN_announceReplace message from the splitting node. These nodes replace the parent by its child with half of its parent's zone.  29  As a last step, after having informed all of its neighbours and bootstrapping a child the splitting node updates its own routing information to reflect the split. Set 3 remains intact with no changes. The neighbour sets 0 and 2 are trimmed to include only NoChange and BoundaryStradlers. Set  1 is reset to include the child and any  BoundaryMatchers, like 6. The BoundaryMatcher has to be moved from set 2 to set 1 because it lies in Region 1 of the resized parent node as shown in the following diagram.  Split Boundary  Region 2 to Region 1  Figure 4-2 Post Split Configuration For horizontal splits BoundaryMatchers from set 3 are moved to the new set 2 as in the diagram (Figure 4-3):  30  Split Boundary Figure 4-3 Horizontal Split  Split Boundary  9 moved from Region 3 to 2 Figure 4-4 BoundaryMatcher (Region 3) - Horizontal Split  31  Presence of a trespasser complicates splitting to some extent. Figure 4-5 shows a 1  node A with a trespasser node 3 from set 0 going over into region 1. When node A splits horizontally it sends a CAN_announceSplit message to node 1 even though it lies in set 0 which is an exception to the update rules defined above. (According to the update rules only CAN_announceReplace messages can be sent to nodes in sets running parallel to the split.) So we introduce the following exceptions in the update rules to accommodate trespassers: •  In case of a horizontal split send a set 0 trespasser a CAN_announceSplit message if the trespasser's lower limits extend below or matches the parent-child boundary.  •  For vertical splits, a CAN_announceSplit message is sent to a trespasser from set 3 if it overlaps with the child.  Boundary Figure 4-5 Split with Trespasser  Trespasser is a neighbour whose zone extends beyond the region of its origin. Please see Section 193.8 for more information on trespassers. 1  32  In the case of a horizontal split roles of the set are exchanged. Sets 0 and 2 get CAN_announceReplace  messages  while  neighbours  in  sets  1 and  3  get  CAN_announceReplace or CAN_announceSplit messages depending on what subcategory of neighbours they fall in.  4.3  Intelligent Split More efficient load balancing can be achieved by dynamically selecting the  orientation and location of the split to cut the number of players exactly in half. Also, to reduce the communication cost of moving state from the parent to the child the boundary between the parent and the child can be initially drawn closer to one side and then slowly moved towards the center transferring the state of the objects over a period of time as objects fall over into the child's half. In this way, a patent does not have to move half of its entire state at once to the child. The boundary can also be made fluid by allow it to move in either direction to balance out the workload on the coordinators. The neighbours can be informed of the corresponding changes by a combination of CAN_announceSplit and CAN_announceReplace messages.  4.4  Bootstrapping the child A parent has to inform its child of the neighbours it will inherit after a split. Using  the example of a vertical split of node A from Figure 4-1, node A sends to node B the following neighbours in a NeighborList message: •  All of its set 2 neighbours.  33  •  A l l neighbours in set 1 and 3 which lie on B's side of the boundary, BoundaryMatchers and BoundaryStradlers.  •  Any trespasser from 0 if it exists and its region crosses the boundary between A and B.  The child is bootstrapped before informing A's neighbours of the split to ensure that the child does not start receiving overlay messages before it has a complete list of its neighbours.  4.5  Merge Algorithm When two nodes recombine they send a replace message to all neighbours around  them except in the direction in which they combine. The parent advertises itself with the bigger combined sized using CAN_announceReplace messages; whereas the child informs its neighbours to replace it with its parent in their routing tables. The child's and the parent's sets that lies along the boundary of the merge are treated differently from other sets as one of the neighbours in the set is always a participant in the merge. In addition to that participant, there can at most be one other node that meets the node at a corner. These are shown below for both horizontal and vertical merges.  t  -p  t  c  1\  P  7T  I c  Figure 4-6 Merge - CAN_announceReplace messages  34  There is guaranteed to be at most only one such neighbour because the parent and the child have the same dimension along the edge that is merged and there is space left for a node to border the node only at the comer.  4.6  Locking Before merging/splitting nodes ensure that none of their neighbours are also in the  process of changing their boundaries. If two neighbours were allowed to split or merge at the same time it would yield inconsistent topologies as in the case of the following diagram:  T1  F1  F2  Start splitting almost at the s a m e time  FI  F2  P a s s their children a list of their inheritec neighbors  CI  C2  T2  announceReplace  /  X  F1  T2  F2  x  y  announceReplace  CI  C2  The parents inform each other of the split A n d form connections with the children  Children unaware of each FI  -J  IP  F2  T4 CI  C2  other as they are not included in the list of neighbors received from the parent  Connections between nodes  Figure 4-7 Inconsistencies caused without Locks The  exact final configuration depends on whether the parent receives the  CAN_announceReplace before or after sending the neighborList to the child. Similar, inconsistencies can arise when two pairs of neighbours merge or one pair merges while  35  the other splits at the same time. To counter these situations nodes send a special message to lock their neighbours before initiating a merge or a split. The Coordinator, as explained in Chapter 6, has sub-components called the Overlay Manager (OM) and the Lock Manager (LM). The Overlay Manager is responsible for maintaining routing table, connections with neighbouring coordinators and routing messages over the overlay. It also handles the splitting and merging of its own zone and keeping its' routing table updated when its neighbouring zones change. The Overlay Manager requests a lock from the Lock Manager before initiating a split or a merge. The Lock Managers working in tandem with Lock Managers in the neighbouring coordinators ensure that only one split/merge is active at a time in a 'neighbourhood'. The Lock Manager utilizes the set of variables shown in the table to ensure this property. Symbol  Description  locksGranted  Number of L M s granted a lock by this L M  requestsSent  Number of pending lock requests sent by this L M  requestNotSent  Flag that is set when a lock request could not be sent to neighbours because the requesting L M has already granted the lock to some other L M  requestTime  A Lamport timestamp assigned to a lock request to unambiguous establish its priority  requestLock  Message sent to neighbouring L M s to request a lock  lockGranted  Message sent by an L M to grant a lock to a requesting L M  blockedNeighbors  List of neighbouring L M s that have requested a lock but have not been granted a lock  grantedBy  List of L M s that have granted this L M a lock  grantedNeighbors  List of L M s that have been granted a lock by this L M Table 4-1 Symbols used in the Locking Algorithm  36  Each Lock Manager maintains two variables locksGranted and requestsSent. LocksGranted corresponds to the number of nodes that have been granted the lock by the node while requestsSents counts the neighbours that have been asked for a lock. To acquire a lock a L M sends a requestLock message to its neighbours incrementing requestsSent once for each node requested. A node can send requestLock only if it has not already granted the lock to another node (i.e. whenever locksGranted is zero). If locksGranted is non-zero the Lock Manager sets requestNotSent and waits for all granted nodes to release their locks before sending its own request for a lock. On receiving a lock request, a node will block the requestor if the requested node currently holds the lock or itself has a pending request with a higher precedence. The precedence of requests is found by first ordering them in decreasing order of requestTimes and then by the requestor's ID. Ids of requestors that are declined a lock are saved by the node in a list of blockedNeighbors. (A node declines a request by not responding to it.) In all other circumstances, a node accepts the lock requests and responds by sending a lockGranted message back to the requestor and incrementing its locksGranted counter. It also adds the granted node's Id to grantedNeighbors list. Every time a Lock Manager receives a lockGranted message it decrements the requestsSent counter. When the counter hits zero it means that all the requested nodes have granted it the lock and the Lock Manager can invoke the call-back function registered by the Overlay Manager. Again, a node records Ids of all nodes that have granted it the lock in a list named grantedBy. When the Lock Manager's releaseLock() method is called by the O M , the L M sends unlock messages to all neighbours in grantedBy and follows that by sending lockGranted messages to neighbours in the blockedNeighbors list. These are nodes that had requested a lock while the node was  37  acquiring or using the lock. The nodes from the blockedNeighbors list are moved to the grantedNeighbors list and the requestsGranted is incremented once for each of them. The unlock message can trigger L M s that had earlier suppressed their lock requests because of non-zero requestsGranted counters at the time of the earlier attempt to restart their suspended lock requests. These nodes will have their requestNotSent set to true by the first call to requestLock function. When nodes with requestNotSent flag set receive an unlock message they make an other attempt to acquire the lock. To prevent starvation the time at the instant of the original (first) requestLock function call is reused as the requestTime.  4.6.1  Locks and Splits  An Overlay Manager detecting an overload sets a flag splitlnProgress, requests a lock and registers its processLock() function with the Lock Manger as a call-back. SplitlnProgress is used to prevent an O M for making multiple split calls while it waits to receive a lock. The Lock Manager invokes OM's processLock() on securing the lock. processLock() sees that splitlnProgress is set so it concurs that the lock was requested for a split and initiates the aforementioned split algorithm but only if the node is still overloaded. Otherwise, it just lets the lock lapse. Once the split is complete, the node resets splitlnProgress and gives up the lock by calling releaseLock().  4.6.2  Locks and Merges  Merging is a bit more involved as two entities- the parent and the child both have to acquire locks from their neighbours before proceeding with the merge. There are some  38  situations in which this can result in loops of lock request which cause deadlocks as shown in the diagram below:  G r a n t e d by all e x c e p t for 1  F i g u r e 4-8  Merges and Deadlocks  The diagram (Figure 4-8) shows a case where P and C have agreed to merge but node 1 is also in the process of changing its zone and has acquired the lock from C and is waiting for the lock from P. P will not grant 1 the lock until it has merged with C which cannot happen as long as there are outstanding locks granted by C. This results in a deadlock as there is a cycle of dependencies. In the more complete algorithm a parent and a child decide to merge by exchanging CANMerge and CANMergeConfirmation messages. Once they have decided to merge the parent locks all of its neighbours and sends the child a list of its locked neighbours. The child, in turn, locks all its neighbours not in the list received from the parent and returns to the parent a special message neighborList containing a list of all its neighbours. Once all neighbours have been locked the parent sends CAN_announceReplace messages to the child's and its own neighbours to announce the merge followed by unlock  39  m e s s a g e s to a l l the n e i g h b o u r s l o c k e d b y i t s e l f a n d the c h i l d . T h e parent u n l o c k s the c h i l d ' s n e i g h b o u r s to protect against the f o l l o w i n g s i t u a t i o n :  Split & announceReplace  Figure 4-9 Why parent should unlock neighbors locked by child T h e n u m b e r s i n F i g u r e 4-9  at the start o f an e d g e s h o w w h e n the m e s s a g e w a s sent  a n d the n u m b e r at the e n d ( c l o s e to the a r r o w h e a d ) s h o w s i n w h a t s e q u e n c e it w a s r e c e i v e d . In the a b o v e case the n e i g h b o r L i s t f r o m the c h i l d is d e l a y e d a n d is r e c e i v e d at t i m e 10 after the s p l i t update f r o m n o d e N . W h e n the n e i g h b o r L i s t f i n a l l y a r r i v e s f r o m the c h i l d , it o v e r w r i t e s a m o r e recent update o f n o d e N ' s d i m e n s i o n s . B y m a k i n g the parent u n l o c k its c h i l d ' s n e i g h b o u r , w e ensure that n o d e N is not a l l o w e d to s p l i t u n t i l the parent has r e c e i v e d the n e i g h b o r L i s t f r o m the c h i l d .  40  Chapter 5 Overlay Routing In this chapter we present detailed algorithms for routing messages over the overlay. We begin by presenting an algorithm inspired by C A N . In the subsequent sections, we introduce incremental improvements to the algorithm which finally yield an algorithm with logarithmic routing cost.  5.1  Basic Routing Algorithm Irrespective of the routing algorithm, points on zone boundaries have to be  assigned deterministically to at most one of the neighbouring coordinators sharing the boundary to unambiguously route messages. A coordinator owns all points on the top and left boundaries of the zone starting from the top left corner to (but not including) top right and bottom left comers, respectively. It does not own any point on the right and the bottom boundaries.  41  Figure 5-1 Boundary Assignment  In this manner all boundaries are allocated to only one coordinator. Messages are routed through the overlay by including the coordinates of the destination in the message header. Nodes along the route of a message decide the region for the message's destination relative to themselves. The following algorithm returns a unique region for a message at a particular node:  Region(mx,my) { //CNode is the current Coordinator if (mx<CNode.x && my< CNode.y+ CNode.height) return 3; if(mx>= CNode.x && my< CNode.y) return 0; if(mx>= CNode.x+ CNode.width && my>= CNode.y) return 1; if(mx< CNode.x+ CNode.width && my>- CNode.y + CNode.height) return 2; else ' return -I; }  Figure 5-2 Region Calculation  Note how the function utilizes the assignment of boundaries to determine the region for message destinations.  42  As described in Section 4.1 the routing table is divided into four sets. Based on the value returned by the above function a node looks for the next hop in the corresponding routing table set. Within the set it picks the coordinator whose origin is closest in Euclidean space to the destination of the message. If the destination lies within the overlay coordinator's own space it passes the message on to the application dependent game middleware coordinator. This algorithm is what we refer to as the basic C A N routing algorithm in the thesis.  Origin of the zone  Figure 5-3 Zone Origin At first blush the top left corner of a coordinator's zones appears to be a reasonable choice for the coordinator's origin. But this choice of the origin results in non-optimal number of routing hops. For instance, in Figure 5-4 the message will be routed through A , B, and C because at each step their origin is closest to the destination. However, had D been picked by S as the next hop, it would have resulted in much fewer hops. So it appears that the dimensions of zones should also be considered while picking the next hop. Priority should be given to a node that will help the message move the farthest distance towards the destination with the minimum number of overlay hops.  43  J  D  •  SRC  A  e  c  -•  DST  Figure 5-4 Effect of origin choice To take advantage of bigger zones its better to use the center of the coordinator's zone as its origin. We refer to this scheme as 'Center Origin' in this thesis. The scheme incorporates into the origin's value a measure of the dimensions of the coordinator's zone yielding fewer hops. The evaluation sections provide empirical evidence for supporting this choice.  5.2  Grid The existence of the four sets of neighbours/regions plays a very subtle but  important role in routing as well. If the nodes were not split into sets (i) search for the next hop would take longer as the search space would be fours times as big; (ii) more crucially it also results in routing loops as depicted in Figure 5 - 5 :  D  D  •  C  •  c  SRC  SRC  B  G  A  A  DSl^  Figure 5-5 Grid and Loops  44  Without the grid A and B would repeatedly exchange the message between themselves because both their origins are closer to the destination compared to C or D. When the grid is laid on top of A to split the space into regions, A is forced to pick a neighbour in the same region as the destination which in the Figure 5-5 would be neighbour C. In other words, the grid forces the message to always move in the direction of its final destination.  5.3  Big Hop The routing algorithm as is requires N 0.5 number of hops in the worst case where A  N is the number of nodes connected in the overlay. To allow even better performance each node maintains links called BigHops to its parent and its first two children. The BigHop links inter-connect all the overlay nodes in the form of a tree. Links to only two children are kept to limit the number connections per node. The first two children are selected as they are always the farthest children of a node and maintaining connections with them allows parents a means to forward a message a long distance in fewer hops towards its destination in the overlay space. To route over the BigHop tree the node has to compare distances of its parent and children to the destination. The algorithm uses the top left corner of a zone as its origin for this purpose. The top left comer of a zone always remains the same regardless of the number of times it splits and merges. Any other choice of origin would have required the coordinator changing its zone dimensions to send an update to its parent and children. This not only requires extra communication but is also impossible to accomplish when parents with more than two children split or merge. As explained above, a parent does not maintain links to its children that are created after the first two splits. So it has no way of communicating changing values for its origin.  45  While routing a message a node checks if any of the nodes in the tree are closer to the destination compared to its best choice neighbour and if so uses the branch in a tree to make what is called a "BigHop'. This allows messages that have to travel long distances to take fewer larger hops to get to their destination. This algorithm just costs three additional links at the most for every node. It also offers the possibility of a message jumping sub-trees using the normal neighbour to neighbour links without reaching the common ancestors of the two trees Without BigHop links the message from the source to destination in Figure 5 - 6 would have to pass through all the small zones that lie between these two points. However, with BigHops node A will see that its child B is closer to destination then any of its neighbours so it will forward the message to B from where it takes only a single hop to get to its final destination. The message would take the same route if the roles of the source and destination were reversed.  E  SRC  •  A  B  GJ F  H  DSl  •  D  c Bidirection Parent-Chile To other possible subtrees^ Unidirectional Child to Parent  Figure 5-6 Relative Location and BigHop Links  46  The depicted space could possibly be divided further. However, the diagram only shows the nodes, links and zones that are important for the sake of this example. The dashed links represent unidirectional connections from a child to a parent. The parent in these cases does not have a direct BigHop link to the child because it already had two other children by the time this child was created. The dotted links represent points where other subtrees can be grafted if the zones where split further. When C receives a message whose final destination is a point in J's space it can send it to either its parent in the BigHop tree, node B, or it can send the message to one of its neighbours. If the neighbour is closer than the parent, it will chose the neighbour effectively jumping sub-trees by using one of its neighbour links. We use the top left corner of the zone as the coordinate to measure distance from while routing within the BigHop tree.  s N A DST •  Figure 5 - 7 Hopping Subtrees In Figure 5-7 since N is closer to DST then S's parent, S will forward the message by using the neighborLink. N will in turn use its neighborLink to forward the message to A which is in the same subtree as DST.  47  5.3.1  Loop Avoidance Rule  In the case of both parent and child being closer to the destination from each other than their closest neighbour they would indefinitely pass the message between themselves. One way to stop a child from sending a message that has already been to its parent is to add a field to signify that the packet has reached the current node using a link from its parent. This field is set to false whenever a non-child BigHop link is used. But this will not work when a packet is routed to another node before making its way back to the child. Since the parent field would have been cleared by the time packet gets back to the child it has no way of knowing that the packet has already been to its parent. As a result, the message will indefinitely loop between the parent and the child through possibly other intermediary nodes. To prevent this, a node only takes a BigHop if doing so will take the message closer to the destination; otherwise it is forced to pick the closest neighbouring node.  Without Loop Avoidance Loop Rule the. message will loop between P and C  Figure 5-8 Loop Avoidance Rule  48  5.4  Lies Within A node cannot blindly send a message to the closest node in the same region as the  destination. If it did so it would be stuck in a loop as in the case depicted in Figure 5 - 9 , as the origins of both A and B are closer to the destination than the owner of the destination's zone, C. This peculiarity can only arise between coordinators of zones that border the zone in which the destination lies. It is easily avoided by checking if the point lies in one of the neighbour's zone before forwarding it to the closest coordinator.  (  c  DST  •  B  A  Node  DST in Region R Neighbors in Region R  Closest Neighbor in Region R  A  0  C,B  B  B  3  C,A  A Figure 5-9 Lies Within  49  5.5  Reverse BigHops  *  " >  A  Figure 5-10 Subspaces riot connected by BigHop  Improvements through BigHop routing are significantly diminished when messages are routed in certain directions e.g. when a message is routed from region A to B in Figure 5-10 it will take a number of small hops through immediate neighbours until it reaches region B from where the power of BigHop routing can be fully realized. This happens because of two reasons; (i) coordinators in the two regions are not related by an ancestor-descendant relationship, hence there are no BigHop interconnections between these two regions (ii) the choice of the top left corner as the origin does not make going through the available links available in Primary Tree attractive enough for the greedy 2  routing algorithm as the message has to be temporarily routed away from the final destination. Figure 5-11 shows an example where S can possibly route a message to DST through three BigHops by choosing a locally non-optimal first step. However, this is disallowed by the greedy routing algorithm. Both these problems can be solved by overlaying an inverted tree consisting of Reverse BigHop links with its root at the bottom right comer of the overlay space. This Secondary Tree (or Reverse Tree) uses the bottom right corners of the zones it stitches together as their origin in contrast to the top left comer as in the case of the Primary or 2  See Table 5-1 for a description of terms used in this section.  50  BigHop tree. Such a tree does not only relate regions A and B of Figure 5-10 but also allows the greedy algorithm to take longer hops in the opposite direction of BigHop Links. The Primary coordinator for a zone cannot act as the coordinator in the secondary tree because when its zone splits it would lose authority over the bottom right corner to its new child. This would necessitate informing all children of the coordinator in the secondary tree to point their reverse links to the new child coordinator. However, as a parent only records links to its first two children, it would not be able to update any other children that it might have. Therefore, each zone is assigned a secondary coordinator whose origin lies in the bottom right corner of the zone. When a region does split the secondary coordinator need not be moved. The splitting primary coordinator is assigned a new Primary Tree Partner (PTPartner) for its smaller zone. And the new primary tree coordinator takes over the remaining half with a 'preinstalled' secondary coordinator. The new PTPartner of the parent is grafted into the secondary tree as a child of its former PTPartner. Figure 5-12 shows the reverse tree added to Figure 5-11.  •  9  A .Uni-—.  » \  .• • 9S•  BigHop Links linking S with Dst BigHop Links accessible at S  / /  •  Uni  DST  Uni-directional BigHop Link from S to its parent  •  Figure 5-11 Available BigHop Links  51  Reverse BigHop Links nking S with DST Zone Reverse BigHop Links accessible at S BigHop Links linking S with DST Zone BigHop Links accessible at S Un-directional BigHop Link from S to its parent Primary Coordinator Secondary Coordinator  Figure 5-12 Reverse BigHop (shows selected Coordinators)  5.5.1  Reverse Tree Construction  The reverse tree is constructed using the following algorithm: •  When there is only one zone the same M D C acts as the coordinator in both trees.  •  On split: •  the child becomes a secondary coordinator for the parent's half of the zone  •  while the parent's original secondary coordinator is inherited by the child  •  the reverse BigHop links are established between the new and the old secondary tree coordinators as needed  The above algorithm abstracts away some of the important details. A splitting coordinator sends its PTPartner information about its newest child so that PTPartner can associate itself with the new coordinator. The child inherits the parent's old PTPartner and sets itself as the PTPartner of its parent in the primary tree. The child is also added as a secondary coordinator for its parent's new smaller zone. The new secondary coordinator becomes the child of parent's original (pre-split) PTPartner. The salient  52  I  feature of the scheme is that all changes in the Primary and Secondary tree remain local to the coordinators directly involved in the split. Other, coordinators need not be informed of the change.  C/2  C/2 C/1  1/C  ©  1/C  1/C 2/3 3/1  2/1  C/2  3/4 2/3  1/C  4/1  ^>  -Bidirection Parent-Child • Unidirectional Child to Parent Primary / Secondary Coorc  Figure 5 - 1 3 Secondary Tree Construction  5.5.2  Routing with Reverse BigHops  The complete routing algorithm which incorporates reverse BigHops is given below. A node receiving a message needs to first check if it received the message because of its role in the primary or secondary tree. If the destination lies within its secondary zone or its secondary origin is closer than all BigHop, Reverse BigHop and ordinary neighbours then it forwards the message to its Secondary Tree Partner (STPartner). If the destination does not lie in the STPartner's zone it can, in turn, send the message using any of its three types of links. If the STPartner chooses one of its Reverse BigHop links the message will continue its journey on the inverted tree. The secondary coordinator is essentially  a form of indirection. It forwards  incoming messages to its associated STPartner which has more accurate information of  53  how its neighbours are arranged. The STPartner can, if required, send a message along the reverse BigHop to another secondary coordinator.  Term  Description  Primary or BigHop Tree  Tree connecting Primary Coordinator with BigHop Links  Secondary or Reverse Tree  Tree connecting Secondary Coordinators with Reverse BigHop Links  Primary Coordinator  Owner of a zone with its origin located at the top left corner of the zone  Secondary Coordinator  Relays messages on the Reverse Tree to its STPartner and is located at the bottom right corner of its zone  PTPartner  Is for a Primary coordinator the secondary coordinator assigned to its zone  STPartner  Is for a Secondary coordinator the Primary coordinator assigned to its zone Table 5-1 Definitions used in Reverse Description  The figure on the next page shows the complete routing algorithm. It makes a locally optimal choice to pick a message's next hop from ordinary neighbour-toneighbour, BigHop and ReverseBigHop links to reach the destination in logarithmic number of hops. The experimental proof for this improvement is furnished in Chapter 7.  54  Route(mssg){ minDist=curDist=distance(CNode.origin, mssg.Dst); STHop=T2Closer=false; STRegion=ST. region (mssg. Dst); BigHopIndex=-l; If(STRegion==-l){  II mssg.Dst lies in STPartner  ST.Send(,mssg); return; }  if(ST.distance(mssg.Dst)<ininDist){  II mssgDst closer to the STOrigin than PTOrigin  minDst-ST.distance(mssg.Dst); }  if(PTbigHop.distance(inssg.Dst)<ininDist){  ,  His the Dst Closer to Primary Tree bigHops < minDst  minDist=PTbigHop.distance(inssg.Dst); bigHoplndex= PTbigHop.closest(mssg.Dst); }  if(PTPbigHop.distance(inssg.Dst)<jninDst)f  His the Dst closer to Primary Tree Partner's bigHops < minDst  STHop=true; BigHopIndex= PTPbigHop.closest(mssg.Dst); minDst-PTPbigHop.distance(mssg.Dst); }  if(Neighbors[region].distance<ininDst \\ curDist==minDist){ II is one of the immediate neighbors or the current coordinator itself Neighbors[region].sendToClosestNeighbor(mssg);  l/closest to the destination. The distance function returns -1 if the  return;  //destination lies within one of the neighbors. curDist==minDist enforces lithe Loop Avoidance Rule  }  if(bigHopIndex==-l){ ST.send(mssg);  //send to Secondary tree partner as it is closer than BigHops, II ReverseBigHop and ordinary Neighbors  return; }  if(STHop) {  II use reverse BigHop  PTPbigHop[BigHopIndex]. send(mssg); }  else  II using BigHop PTbigHop[BigHopIndex].send(mssg);  Figure 5-14 Routing Algorithm  55  ,  Is  Chapter 6 Architecture In this chapter, we elaborate upon Proximity's software architecture by presenting the main components of the game middleware, namely: •  PlayerMD  •  Game Console  •  Zone Coordinator (Game Middleware & Overlay Coordinator)  •  Central Server  6.1  Player Code The code running on the player machine consists of a game console that sits on top  of the middleware, PlayerMD. The console is mostly game specific and is responsible for gathering user input, simulating the effects of user actions and visualization of the virtual world. The game console passes all the events generated by a player to the PlayerMD,  56  which runs underneath it on each player's computer and is the representation of the player in the gaming middleware. The PlayerMDs maintain connections with all the players whose area of interest overlaps with its player's publish area. These players are known as a player's subscribers. Every time it receives an event from the game console the PlayerMD forwards the event to all of its subscribers and also the zone coordinators with whom it is registered (registered coordinators). PlayerMDs periodically remove subscribers and registered coordinators that no longer overlap with its Aol. It adds new coordinators and subscribers to its lists on being informed by a coordinator of a new overlap. Whenever, a PlayerMD receives an event from another player it updates the player's location record and passes the event on to the game console. The game console and the PlayerMD exchange T C P messages on the loopback address. The game console starts the PlayerMD as a separate process and passes to it as arguments the port number for its socket and its player's starting coordinates. The PlayerMD connects to the passed port number. This connection is used to exchange messages between the game console and the PlayerMD. Having a socket based interface allows the Middleware to be used with Game Consoles written in any other language. The small size of the message set allows even existing games to be easily modified to work with the proposed architecture. On start up the -PlayerMD sends a register message to the central server. The central server runs on the game publisher's computer and is used for admission control for the game. The server also keeps a record of all the players currently playing the game. It essentially works as a rendezvous for the peer-to-peer overlay of the coordinators. On receiving a register message from the PlayerMD, the central server injects the register message into the overlay with the player's starting location as the destination of the  57  message. The coordinator responsible for the zone covering that part of the map becomes the primary coordinator for the player. The Primary Coordinator returns the dimensions of the zone, IP and Ports for itself and any other (Secondary) coordinators whose zones the player might infringe. The PlayerMD adds the Primary Coordinator to its list of registered coordinators and subsequently adds all the Secondary Coordinators after registering with them. Registering with a Secondary Coordinator might reveal more Secondary Coordinators that the PlayerMD will have to register with. It is the PlayerMDs responsibility to broadcast the events generated by its Game Console to all its registered coordinators.  MDC  Epoch () Player. Manager  MDC  Initialize)) -._ ProcessNextBatch(timer) . Send(mssg)  k.  A  i  Overloaded)) AddPlayer)) Underloaded))  AddPlayer() MovePlayer(ID) NetworkManager Merge)) — 1  Route(msg)-  LockGranted)), :LockRequest() i ReleaseLock)) ;LockGranted(); : iRequestLock))  Forward(msg)" * Overlay Manager  |4———LockReceived)} -—  RequestLock))—  Figure 6-1 M D C Subcomponents  58  Lock Manager.  6.2  Coordinator The coordinators are composed of the subcomponents shown in Figure 6-1. Unlike  the PlayerMD-console interface, the coordinator interfaces  are more conventional  function based interfaces. The Player Manager is the only application specific component while all other components will exist in one form or another for any applications based on the overlay network. The Player Manager performs the game specific activities like registering players, keeping tabs on their location and informing them of overlaps. It utilizes the services of the Overlay Manager which is responsible for maintaining the overlay topology, routing and communicating any changes to its zone to the Overlay Managers of the neighbouring coordinators. Before initiating a split/merge the Overlay Manager has to request a lock from the Lock Manager. The Lock Manager works with neighbouring Lock Managers to ensure that only one Overlay Manager is modifying its zone in the neighbourhood at a time. The network Manager delivers (/receives) messages for various subcomponents to (/from) other coordinators while hiding the networking details. M D C (subcomponent) determines the control flow for the entire coordinator. It periodically issues an Epoch() event to the Player Manager so that it can update certain time dependent information e.g. checking the status of a player that has been inactive for some time. Between 'epochs' the coordinator waits for messages to be delivered to the Network Manager which calls the appropriate functions in the subcomponents based on the type of the incoming messages. The Player Manager following a game specific logic signals to the Overlay Manager when to split or merge. During testing and evaluation we used number of registered players and bandwidth usage for deciding when to merge. However, Section 3.12 discusses a better more practical metric.  59  6.3  Central Server In the current implementation the central server helps coordinators locate close-by  player machines that do not already have a coordinator running on them. In order to do this the central server has to keep separate lists for PlayerMDs running and not running coordinators. Whenever a child coordinator merges with the parent it informs the central server that its PlayerMD has become available again to spawn another coordinator.  • GC  GMDC  •PMC"  OMDC Ear  a GC  nr GMDC  'PMC-  OMDC Mi .  Figure 6-2 Deployment Diagram  60  4, P  6.4  Inter-Component communication The central server has a single port on which it waits for messages from both  coordinators and PlayerMDs. While coordinators have three different ports (i) for communication with registeredPlayers  (playerport) (ii) for overlay maintenance and  routing (overlayPort) (iii) and for communication between neighboring middleware coordinators (coordinatorPort). A PlayerMD has two other ports in addition to the port used for communication with game console. One is used for inter-player communication while the second port is used to communicate with the coordinating infrastructure which includes both the central server and the coordinators. The following table shows the different message types used by the middleware  Message  Purpose  CAN_announceSplit  Overlay Maintenance Inform boundaryMatchers & BoundaryStradlers of a split  Own and new coordinator's IP, port and new dimensions  CAN_announceRepla ce  Overlay Maintenance Inform Neighbors to replace sender with the coordinator whose information is contained in the message  CAN_NeighbourList  Information From Contained Coordinator Messages  To  Response  Splitting Coordinator  Neighboring Coordinators that meet the requirements  Update routing table  Own and new coordinator's IP, port and new dimensions  Splitting and merging coordinator  Neighboring coordinators  Update routing table  Overlay Maintenance Bootstrapping a new coordinator  Information about neighbors inherited by a new Coordinator  Newly Spawned Coordinator  Build routing table  CAN_Route  Overlay Routing  Message to be routed through the Overlay  Splitting parent to child Merging child to parent Created by the source of the message  Passed through the Overlay until it reaches the destination  Destination: Extract payload and deliver to application Intermediate: Find and pass to next hop  CAN_Split  Overlay maintenance Spawn coordinator requesting bootstrap information  ID, IP and Port of the new coordinator  New Coordinator  Overloaded Coordinator  Initiate Split and return bootstrap information  61  CAN_Dimensions  Overlay Maintenance Partial information for bootstrapping a new coordinator  Origin and dimensions of the inherited zone  Overloaded Coordinator (parent)  Newly spawned coordinator which inherits half of overloaded coordinator's zone  Initialize Coordinator information  CAN_Merge  Overlay Maintenance Request a Merge  Sender's ID  Under-loaded Coordinator  Under-loaded Coordinator's parent/child  Accepted/Declined  Requested Coordinator  Requester  Initiate a merge if also under-loaded and Sender's ID matches next expected merge else decline request. Initiate merge if request accepted  Information about neighbors  Merging child  Merging Parent  Update routing table  Information about registered Player  Merging child  Merging Parent  Update Player Registry  Current Lamport timestamp and ID  Requesting Lock Manager  Grant Lock or force to wait  Lock Management receive lock grant Lock Management release locked neighbors  Granting Lock Manager's ID Releasing Lock Manager's ID  Neighboring Lock Managers Requesting Lock Manager Neighboring Lock Manger  Call registered call back function Record release and initiate own request if there is a pending request  CAN_RequestNeighb ourList  Overlay Maintenance To request NeighborList from Child once the parent has locked its neighbors for a merge  List of neighbors locked by parent  Neighboring Lock Manager Releasing Lock Manager which had formally requested and received a lock Merging Merging Child Parent  CAN_ChangePartner  Overlay Maintenance Information about (Secondary Tree) receivers new Inform Primary Tree secondary tree partner Partner to change its Secondary Tree Partner  Splitting/Mergi Primary Tree ng Coordinator Partner  Update Secondary Tree Partner  InitialRegistration  Player registration  Move  Player Management New location and Inform coordinator of a player ID move  CAN_MergeConfirma Overlay Maintenance tion Response to Merge Request CAN_NeighbourList Overlay Maintenance Merge pass neighbour information during merge CAN_MergeSubList Player Management To pass list of registered player LockRequest Lock Management request a lock LockResponse Unlock  Requests locks from its neighbors not included in the list received from parent  Pla^^erMD to Coord inator Player's coordinates dimensions of the publish & subscribe areas and connection information  62  Player  Coordinator (player manager)  Add player to list of registered players and return coordinators for other overlapping regions  Player  Registered Coordinators  Update Player location  LostRegistration  Player Management Re-register with the proper coordinator  Current Location  Lost Player  LeaveGame  Player Management leave game  Player ID  Player leaving the game  AddCoordinatorlnit  Player Management Inform new player about overlapping coordinators  AddCoordinator (afterRequest)  Player Management Inform player of other overlapping coordinators if a player moves into a neighboring zone or in response to initialRegistration Neighbor Discovery Inform publisher of an overlap with another player's subscription region Inform registered players if the coordinator splits  Last nonoverlapping registered coordinator in the list Registered Coordinators  Register player if it lies within own region else forward through the overlay Remove player from list of registered players  Coordinator Ids and Primary connection information Coordinator  New Player  Coordinator IDs and Registered connection information Coordinator  Player  Add primary coordinator and initiate registration with other coordinators in the list Register with listed coordinators  Subscriber's playerlD and connection information  Registered Coordinator  Player (publisher)  Add player to subscribers list  Registered Coordinator  Player  Update Registered Coordinator list  Registered Coordinator  Player  Update Registered Coordinator list  Primary Coordinator  Player  Add Coordinator to Registered Coordinators and initiate with other listed overlapping coordinators  Coordinator to Pla^perMD  AddSubsriber  CoordinatorSplit  CoordinatorMerge  ReregisterCoord (after LostRegistration)  Coordinator's new dimensions and connection information for coordinator's child Inform registered Parent's new player if the dimensions, connection coordinator merges information and merged child's ID Registration Primary coordinator's information in response and other overlapping coordinator's Ids, for a LostRegistration dimensions and message connection information  Coorc inator to Central Server Overloaded  Overlay Maintenance Connection ask CS for an available information and PlayerMD to share load network coordinates  Overloaded Coordinator  Merged&available  Overlay Management Inform of a merge and availability Register new Player  Merged Child  RegisterPlayer (from CS to Coordinator)  PlayerlD  PlayerlD, Central Server publish/subscribe areas and connection information  63  Finds an available PlayerMD with similar network coordinates Central Server Adds PlayerMD to list of available Players Primary Add player to list Coordinator of registered for the player - players and return message routed coordinators for through the other overlapping overlay regions Central Server  Central Server to PlayerMD SpawnCoord  Ask PlayerMD to start a Coordinator on its machine  Register  Register a new Player  Move  Direct player to player event delivery  Connection Central server information of overloaded coordinator  Available PlayerMD  Starts a coordinator which contacts the overloaded coordinator for bootstrap information  Central Server  Adds player to list of available PlayerMDs and forwards an InitialRegistration message through the Overlay  Subscribing Player  Passes event to its simulation engine  Player to Central Server Player's coordinates dimensions of the publish & subscribe areas and connection information  New player joining the game  PlayerMD to Player MD Event type and related properties  Publishing player  Table 6-1 Message Types  6.5  Removing C S Bottleneck Other than being a rendezvous, the central server's main responsibility is keeping  track of PlayerMDs that are already running a coordinator. There are a number of ways of further distributing this load as well.  6.5.1  Secondary Overlay The available PlayerMDs can form an overlay similar to the coordinators overlay.  The main difference being that all participating nodes are inserted into the overlay on the basis of their network coordinates. The PlayerMDs are arranged in the overlay according to their network coordinates. Whenever, an overloaded coordinator needs another machine it routes a message in this secondary overlay to a point (x, y) where (x, y) are the network coordinates of the overloaded coordinator. The owner of the zone in which (x, y) lies is chosen as the PlayerMD to spawn a new coordinator. The owner is also removed  64  from the secondary overlay as it is no longer available to spawn a coordinator. Because o f the properties of the overlay the chosen node is guaranteed to be close in terms of network coordinates to the overloaded coordinator.  6.5.2  Reusing Primary Overlay  Alternatively, PlayerMD's network coordinates can be mapped onto the overlay of MDCs. A translation from network to game coordinates is required to handle the difference in dimensions of the game map and the limits of the network coordinates. The overlay space is still split according to the loads on the coordinators. A coordinator may store multiple available PlayerMDs whose translated coordinates lie within its zone. To search for an available PlayerMD a coordinator first checks its own zone to see if there are any available PlayerMDs. If not, it can initiate iterative search queries (using techniques discussed in Appendix B) asking all coordinators within the bounded rectangle to return a PlayerMD closest to the overloaded coordinator's network coordinates from the set of available PlayerMDs registered with them. As  mentioned above the current implementation uses the Central Server to locate  available PlayerMDs and it does not support any of the techniques described in Sections 6.5.land 6.5.2.  6.6  C o m p a r i s o n w i t h o t h e r DHT b a s e d S c h e m e s ' The  salient features of the architecture are distributed dynamic load balancing and  more efficient placement of zones on the overlay nodes. There a number of schemes that propose using fixed sized zones/cells. The problem with these systems is that the zone  65  coordinator for all purposes is very much like a mini-server that can become a bottleneck. Moreover, having unnecessary partitions even in times of low load results in a higher possibility of inconsistency and redundant coordination effort between the zone coordinators. The DHTs also do not use any game level or network topological information while assigning zones to nodes in the overlay. Therefore, neighboring zones in the game space will usually end up many overlay hops away from each other. Some proposals counter that by adding links in addition to the overlay connections between neighboring coordinators. In contrast, for the architecture proposed in this thesis the overlay links are alone sufficient to route messages efficiently with the minimum possible delay. Even in the topology aware Pastry, a node only selects the next hop for a message based on the network information. The nodelDs are still assigned randomly to nodes. Therefore, even if similar Ids were assigned to neighboring coordinators in the game space, the corresponding nodes might still be far away from each other in the network. It is also not possible for a Pastry based gaming system to dynamically change the dimensions of the zones as these schemes assume that an objects coordinates can be mapped to a cell_id by using a predetermined function. If the zone sizes and boundaries change dynamically a straight forwarding mapping function can no longer determine the correct cell_id. This problem does not arise in the above architecture because a player only needs its own coordinates to locate its primary coordinator or as these schemes would like to call them 'Home Nodes.'  66  Chapter 7 Performance Evaluation We have implemented a prototype that allows the coordinators to dynamically split and merge as needed while maintaining a consistent overlay topology. The prototype also defines protocols used between the components of the architecture to handle player registration,  player  movement,  Aol  overlap  notification  and  inter-coordinator  communication. The current prototype only maintains simple state in the form of player location and their Aols. We have also implemented all the advanced overlay routing mechanisms discussed in Chapter 5. In this chapter, we discover system properties, study how they are affected by increasing number of players and finally, validate our claim of the routing improvements. We conducted extensive tests involving up to a 32000 overlay nodes to verify our routing algorithm and topology construction visually using the visualization tool, shown in Figure 7-1.  v  Simulations were run with 1000, 2000, 3000, 4000 and 5000 players. The purpose of these tests was to see how evenly the workload and number of links required by each entity are spread. The experiments simulated lmillion second of game play with each  67  player issuing an event every 500 millisecond. The statistics were collected once the simulation had run for the intended period of time and taken as being a sample for the entire duration of the simulation. This section also includes experimental results as evidence supporting routing improvement through the alterations suggested in this thesis.  Figure 7-1 Visualization Tool - Screenshot  68  7.1  System Properties 120  Figure 7-2 Players per Zone  Since all coordinators were simulated on the same machine we could not use the automatic load detection algorithm of Section 3.12 instead using weighted averages we measured the bandwidth consumed at each coordinator and set 1 and 3.5 KB/s as the merge and split thresholds, respectively. This led to a very fair distribution of players over the coordinators. Figure 7-2 and Figure 7-3 show that the splitting and merging algorithms are able to effectively limit the load on the coordinators within desired thresholds. The mechanism is quick to respond to any uneven load distribution as there are no coordinators that lie too far outside both the thresholds. The bandwidth requirement for coordinators grows with the number of registered players. Figure 7-3 shows the strong correlation between the two. On an average an additional player registered with a coordinator requires roughly 49 bytes per second.  69  0.5  20  40  60  80  Registered Players/Zone  Figure 7-3 B W requirement per Player at M D C  The expected number of subscribers per player is given by the equation (see appendix A l for derivation of this formula): f  E(Subscribers)  i  N xAx •yJ2xAoI  2 D  W  ~  \  + AoI  2  D  2  n  Where: Dimensions of the virtual world - W xW D  D  Number of players = N Dimensions of Aol = Aol x Aol D  D  For the simulation discussed in this section the values of the variables are A =5000, 7  W = 1500, AoI -l5. Inserting these values in the above equation yields an average D  D  2.287 subscribers per player. The average for the experiment was 2.08 connections per  70  player which is very close to the theoretical estimate derived from the equation. Figure 7-4 shows the distribution of inter-player connections. The graph shows that almost 70 percent of the players required two or less connections while no one was connected to more than 7. This is a marked improvement over Solipsis [8] and Voronoi [26] which require connections to all players around the player for neighbor discovery. The number of player-to-player connections in the case of Voronoi climbs rapidly as the number of players is increased. [26] reports an average as high as 14 connections per player for the Voronoi scheme with a total of only 250 players. The authors of Solipsis did not publish any experimental results but it is clear that it will require much more than 2 connections per player to maintain the convex hull around each player.  120  ,  0 1  2  3  4  5  6  7  8  9  10  Subscribers  Figure 7-4 Subscribers per Player The authors of [27] contend that one of the strengths of their system, M O P A R , is a fixed number of neighboring coordinators. Even though our scheme dynamically and distributedly draws the zones it still yields an average of 7 neighbors per zone. In the distribution of Figure 7-5 82% of the nodes have 7 or less neighbors.  71  Neighbors  Figure 7-5 Neighbors per Coordinator  Coordinators  Figure 7-6 Coordinators per Player  The above figure (Figure 7-6) shows that having less neighbors might not be that important. As players on an average stay within the interior of the zone and establish connections with multiple coordinators only for fleeting periods of time when they cross borders. Almost 80 % of the players are connected only to their primary coordinator  72  7.2  Scalability The  experiments  returned very encouraging results vis-a-vis  scalability. The  simulation was run with different number of players ranging from 1000 to 5000. The neighbors/coordinator,  bandwidth/coordinator,  and  players/zone  stayed  relatively  constant over the range of players. Figures Figure 7-7, Figure 7-8 and Figure 7-9'show that the scheme is .very scalable from the perspective of the coordinating infrastructure.  8 ,  :  0 -I  , 1000  ,  ,  2000  3000  .  , 4000  5000  # Players  Figure 7-7 Neighbors per Coordinator  2.20  2.00 ~  1.80  m *  1.60  m  1.40 1.20 1.00  Figure 7-8 Average Instantaneous Coordinator Bandwidth  73  45  „  a.  25 20  -I  , 1000  , 2000  , 3000  , 4000  5000  Total Players  Figure 7-9 Players per Zone However, the average bandwidth used by players consistently increases at the rate a constant  rate (Figure 7-10). This increase  is caused  because of the  increased  communication between players with overlapping Aols. Since the size of the virtual world was fixed as the number of players is increased their density rises as well leading to a higher likelihood of overlapping publish/subscribe areas. This reasoning is supported by the higher average number of subscribers per node for simulations with higher number of total players as depicted in Figure 7-11. The bandwidth per player grows at a rate slower rate than the average number of subscribers. The bandwidth shown in Figure 7-10 includes the playerMDs' communication with the coordinating infrastructure in addition to its subscribers. Since bandwidth consumed for publishing events to subscribers is only a portion of the sum, the rate of growth of the Total Bandwidth is slower than the growth rate of the average number of subscribers per player.  74  0.16 0.14  g m  <u  0.12 0.08 0.06  a>  £ 0.04 0.02 0 1000  2000  3000  4000  5000  Player Count Figure 7-10 Bandwidth per Node 2.5 ,  Si  "5 1.5in a  3 W  1 -  D)  ^ 0.50 1000  2000  3000  4000  5000  Player Count Figure 7-11 Subscribers per Player  7.3  Routing The original CAN scheme requires O(N 0.5) hops in the worst case to route a A  message where N is the number of nodes in the overlay. This is one of the main reasons for researchers to favor Pastry which requires only 0(log N) hops. Pastry however, is  75  much less flexible in terms of how values are placed on the overlay nodes and it does not allow semantically aware placement. Our improvements to the routing scheme led to considerable reduction in the routing cost. Because of the tree structure superimposed on the overlay by the Reverse BigHop, the overlay can now route messages within 0(log N) hops. The values for Figure 7-12 and Table 7-1 where collected by routing messages between a 100 randomly selected source and destination coordinates in the overlay. 60  0 -I  ,  2000  ,  4000  ,  6000  ,  8000  ,  10000 12000  ,  ,  16000 32000  N u m b e r of N o d e s  Figure 7-12 Routing Comparison 2K  Avg CAN Original CAN Center BigHop Reverse BigHop  4K  8K  6K  Max  Sum  Avg  Max  Sum  Avg  Max  Sum  Avg  10.59  24  1059  14.1  33  1412  19.3  • 43  1934  10.28  23  1028  13.8  33  1383  18.9  43,  8.94  22  894  11.8  29  1183  15.1  8.65  21  865  10.8  32  1076  13.1  76  Max  Sum  21.78  51  2178  1893  21.04  47  2104  35  1509  16.92  46  1692  37  1314  14.41  50  1441  8K Max  Sum  Avg  21.78  51  2178  26,24  21.04  47  2104  16.92  46  1692  14.41  50  1441  Avg CAN Original CAN Center BigHop Reverse BigHop  10K Max  1 Sum Avg  12K Max  Sum  Avg  32K Max  Sum  51.61  106  5161  16K Max  Sum  Avg  32.93  65  3293  56  2624|  26  64  2630  25.73  55  25731  26  61  2595  32.46  65  3246  49.96  105  4996  20.31  45  2031  20  55  1954  25.76  56  2576  35.44  95  3544  15.99  39  1599  16  56  1604  20.2  53  2020: 25.41  103  2541  Table 7-1 Routing Comparison 30 j 25 20 in  §• 15 -  x  10 -5 0 7  -1  8  1  9  1  10  1  1  11  12  1  13  1  14  Log(nodes)  Figure 7-13 Reverse BigHop - Logarithmic Scale Figure 7-13 depicts how the number of routing hops increases with the log of the number of nodes in the overlay. The best fit line shows that the routing cost is indeed logarithmic as claimed. In general, our experiments prove that the architecture places an acceptable amount of uniformly distributed load over all components of the system and that the system properties identified in Section 7.1 remain stable as the number of players are increased which augurs well for the scalability of middleware.  77  Chapter 8 Conclusions 8.1  Future Work: Service Discovery An entity's position is the most important attribute in a game. Most of the functions  are highly correlated to an item's location. We optimized the overlay topology by arranging the nodes according to their location in the virtual world. This.also resulted in a simple direct mapping from an entity's location in the game space to the overlays key space and allowed efficient range queries based on the location because of the way overlay nodes are arranged. Service Discovery [20] is another application that depends on the location of mobile devices  and services. Recently, there has been considerable research on  incrementally deployable distributed service discovery infrastructure. A number of them propose using P2P overlays [21, 22] that do not exploit any location information in their construction. Moreover, these overlays are highly inefficient in performing range queries,  78  which are very common in service discovery. Therefore, we feel that our architecture is well suited for such an application. Our architecture can be easily extended by representing mobile devices and services as players. Mobile service providers would publish services in a 'publication area' and clients will be returned all matching services whose publish area overlaps with the client's 'subscription area'. One difference will be in selecting where to run the coordinator for an area. The choice largely depends on the environment where the application is running and the types of services being supported. In an environment with a large number of mobile devices connected by an ad-hoc network some of the mobile devices themselves will serve as coordinators. A second, more widely accepted model, assumes the presence of coordinating infrastructure. There are two variations of the type of infrastructure assumed to be available. One assumes each point of presence deploys a coordinator [24], while in the other each provider/consumer of a service provides a coordinator [23]. The later requires simple wireless access to the Internet for the mobile devices to connect to the service discovery system; whereas, the former necessitates both an access point and a server to provide service discovery in a given area. In either of these two variations the coordinator could be run on the machines currently performing service matching. Regardless of where the coordinators are run, they will be arranged in a peer-to-peer overlay and assigned ownership of portions of the physical space. As future work, we plan to adapt the architecture for scalable global service discovery. We are currently leaning towards the model that assumes wireless access to the Internet for mobile devices where service matching is performed by a group of peers. Nonetheless, our architecture is flexible enough to be useful in all types of environments.  79  8.2  Conclusion This thesis has provided a general overview of MMOG, its requirements and  various architectures that have been used to meet these requirements. We also proposed an architecture that utilizes the players' machines to allow gaming companies to scalably reach out to larger audiences without a huge investment in the infrastructure. The proposed ideas open the possibility of bringing MMOG to a completely new genre of games involving fast paced action that have very strict latency requirements. By adopting the publish-subscribe mechanism we guarantee near optimal area of interest management at the same time as partitioning the workload dynamically without introducing unnatural, in-transparent boundaries in the virtual world. We also show how the tried and tested Trailing State Consistency protocol nicely fits on to the architecture to provide persistent and fault tolerant management of players and terrain objects and propose an extension of the architecture for service discovery. We hope that the improvements in the routing algorithm and the addition of support for bounded multicast to the underlying overlay will encourage numerous exciting P 2 P applications.  80  Appendix A Derivations A.l  Expected Average Subscribers per Player  Dimensions of the virtual world * = W  xW  D  D  Number of players = N N  Density of Players = D = Dimensions of Aol * = Aol  D  X Aol  D  Maximum possible distance, A, between two players with overlapping Aols = diagonal of the Aol = ^2xAoI (when the line joining two players runs along their diagonals) 2  D  Minimum Distance, B, between two non-overlapping player = Aol  D  (when the players Aols align perfectly horizontally and vertically) Average Distance, R —  A +B  =  J2xAoI  +AoI  2 D  D  81  Since a player is connected to at most all players that lie within a circular space of radius R around itself the average number of subscribers per player = expected number of players with this circular space. Expected number of players within this space = Player Density DxAxR  2  =  N  r  xAx  i  •J^xAolJ  T  \  + AoI  2  L  assuming square dimensions for the virtual world and the Aol  82  x  Area of Space =  Appendix B Bounded Multicast B.l  Bounded Multicast and Range Queries The  overlay provides the facility to broadcast a message to a bounded rectangular  region within the coordinate overlay space. A message can be broadcasted to all coordinators whose zones overlap a rectangle (x, y, x+width, y+height) by sending a special broadcast message to the broadcast-origin coordinator, BOCoordinator. This is the coordinator that owns the point (x,y), the top left corner of the broadcast rectangle. The  broadcast message also contains the xlimit (x+width) and the ylimit (y+width). The  BOCoordinator sends a broadcast_X message to all neighbors in its Set 1 and any trespasser from Set 0 whose zone overlaps with the broadcast rectangle. Nodes receiving a broadcast_X message forward it to all their neighbors in Set 1 that overlap with the broadcast rectangle. A node forwards a broadcast_X message to its Set 0 trespasser only if the node's zone lies or crosses the boundary of the broadcast rectangle. If it does not then there is guaranteed to be another neighbor above it, which will forward the message to the trespasser. This rule prevents the transmission of duplicate messages to the trespassers and the nodes downstream from them. The nodes keep forwarding the broadcast_X message until it reaches a node whose zone crosses the xLimit of the broadcast rectangle.  83  Broadcast Origin Coordinator  Broadcast Bounding Rectangle \  Broadcast Bounding Rectangle • |  Broadcast_X Broadcast_Y  1. sends to 5 even though it's a trespasser because its own zone lies on the boundary of the broadcast rectangle. 2. Sends Broadcast_Y to only its first neighbor in its region 2 that overlaps the broadcast rectangle. 3. Does not forward Broadcast_X message to 4 because it does not lie on the top boundary of the bounding rectangle. 4. Forwards Broadcast_X message to multiple nodes as they overlap with the bounding rectangle. (  Figure B-1 Bounded Broadcast The BOCoordinator also sends a broadcast_Y message to the first node in its Set 2. A node receiving a broadcast_Y message in addition to propagating the message to its first neighbor in Set 2, generates a broadcast_X message for nodes in its Set 1. The propagation of broadcast_X and broadcast_Y stops when xLimit and yLimit are reached, respectively. This algorithm guarantees that no node receives a duplicate message. This is achieved without keeping any message histories and the need for generating and maintaining any sequence numbers. The approach is also more resilient to faults as there is no risk of losing sequence number records due to faulty nodes. It also allows bounded multicast. None of the known schemes [28] for similar overlays provide any of these benefits.  84  Bounded multicasts can be used to search for values stored in sections of the overlay. It can support range queries like 'find all objects with a property X that lie within a bounded rectangle.' For a gaming middleware, this facility can be used for multicasting an event like a nuclear detonation to all the players/objects within the area affected by the explosion. There are applications that search for objects within some area and expand the search space when the smaller area fails to return any useful results. More often than not the expanded area includes the area that has already been explored in the first failed query. So optimization can be made to prevent redundant searches in an already explored area.  Figure B-2 Redundant Queries Prevention  Querying the expanded area would require searching the shaded area A, again. To prevent that we split the expanded search space as show below:  85  1  4  2  3  Figure B-3 Iterative Queries Four separate bounded broadcast messages are sent to BOCoordinators for 1, 2, 3 and 4 with the corresponding rectangular bounds. The four smaller rectangles together with A make up the whole area of the expanded search space. This also results in faster query processing time as the query is computed in parallel in all the rectangles. If the expanded search space does not return any satisfactory results, the search space is expanded again around the old expanded search spaces. This provides a mechanism for iteratively expanding search spaces without redundant computation and messages which is important for a number of applications including service discovery.  B.2  Query Termination Another non-trivial problem associated with range queries is determining query  termination at the querier. Every node forwarding a broadcast message sends to the querier a list of IDs of all the coordinators to which it is forwarding the message. Based on this list the querier can keep track of all nodes that have not responded. Keeping simple counts of nodes receiving the multicast would have been enough to determine when a query is complete as each node is guaranteed to receive only one broadcast  86  message/range query. Having richer information (ID and coordinates) of the involved nodes provides alternatives in case of nodes not returning any results for the query. Using the coordinate information the query can be restarted from where it failed.  87  Bibliography [1] Honghui L u , Bjorn Knutsson, Margaret Delap, John Fiore, and Baohua Wu. The Design of Synchronization Mechanisms for Peer-to-Peer Massively Multiplayer Games. [2] Bjorn Knutsson, Honghui Lu, Wei Xu, and Bryan Hopkins. Peer-to-Peer Support for Massively Multiplayer Games. [3] Jouni Smed, Timo Kaukoranta, and Harri Hokonen. Aspects of Networking in Multiplayer Computer Games. [4] Chris Dickey, Daniel Zappala, and Virginia Lo. A Distributed Architecture for Massively Multiplayer Online Games. [5] Shun-Yu Hu, and Guan-Ming Liao. Scalable Peer to Peer Networked Virtual Environment. [6] Eric Conan, Burton Filstrup, and Anthony Kurc. A Distributed Multiplayer Game Server System. [7] Nobutaka Matsumoto, Yoshihiro Kawahara, Hiroyuki Morikawa, and Tomonori Aoyama. A Scalable and Low Delay Communication Scheme for Networked Virtual Environments. [8] Joaquin Keller, and Gwendal Simon. Solipsis: A Massively Multi-Participant Virtual World. [9]K.L. Morse. Interest management in large scale distributed simulations  88  [10] T. S. Eugene Ng, and Hui Zhang. Predicting Internet network distance with coordinates-based approaches. [11] Anthony Rowstron and Peter Druschel. Pastry: scalable, decentralized object location and routing for large scale peer-to-peer systems. [12] Miguel Castro, Michael B. lones, Anne-Marrie Kermarrec, Anthony Rowstron, Marvin Theimer, Helen Wang, and Alec Wolman. An evaluation of scalable applicationlevel multicast built using peer-to-peer overlays. [13]Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, and Scott Shenker. A Scalable Content Addressable Network. [14] Ion Stoica, Robert Morris, David Liben-Nowell, David Karger, M. Frans Kaashoek, Frank Dabek, and Hari Balakrishnan. Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications. [15] I. Liebeherr, and Tyler K, Beam. HyperCast: A Protocol for Maintaining Multicast Group Members in a Logical Hypercube Topology. [16] C. Diot, and L. Gautier. A distributed architecture for multiplayer interactive applications on the internet. [17] Paul Bettner, and Mark Terrano. GDC2001: 1500 Archers on a 28.8: Network Programming in Age of Empires and Beyond. [18] Ashwin R. Bharambe, Sanjay Rao, and Srinivasan Seshan. Mercury: a scalable publish-subscribe system for internet games. [19] E. J. Berglund and D. R. Cheriton. Amaze: A multiplayer computer game.  89  [20] Feng Zhu, Matt Mutka, and Lionel Ni. Classification of Service Discovery -in Pervasive Computing Environments. [21] Filipe Araujo, and Luis Rodrigues. GeoPeer: A Location Aware Peer to Peer System. [22] Magdalena Balazinska, Hari Balakrishnan, and David Karger. INS/TWLNE: A Scalable Peer-to-Peer Architecture for Intentional Resource Discovery. [23] Mike Blackstock, and Charles Kraisic. Towards an Architecture for Scalable Location-based Service Discovery. [24] Zona, Inc. Terazona: Zona application frame work white paper, 2002. [25] Butterfly.net, Inc. The butterfly grid: A distributed platform for online games, 2003. [26] Shun Yun Hu, and lui-Fa Chen. Scalable Network Virtual Environment, 2005. [27] Anthony Yu, and Son Vuong. MOPAR: a Mobile Peer-to-Peer Overlay Architecture for Interest Management of Massively Multiplayer Online Games, 2005. [28] Sylvia Ratnasamy, Mark Handley, Richard Karp, and Scott Shenker, Application Level Multicast Using Content Addressable Networks, 2002.  90  


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items