UBC Theses and Dissertations
Reliable client-server communication in distributed programs
Ravindran, K.
Remote procedure call (RPC) and shared variables are communication abstractions that allow the processes of a distributed program, often modelled as clients and servers, to communicate with one another across machine boundaries. A key requirement of these abstractions is to mask the machine and communication failures that may occur during client-server communication. In practice, many distributed applications can inherently tolerate failures in certain situations. If such application-layer information is made available to the client-server communication layer (RPC and shared variables), the failure-masking algorithms in that layer can relax the constraints under which they would otherwise have to operate. This relaxation significantly simplifies both the algorithms and the underlying message transport layer, and allows efficient algorithms to be formulated. This application-driven approach forms the backbone of the failure-masking techniques described in the thesis, as outlined below.

Orphan handling in RPCs: Using the application-driven approach, the thesis introduces a new technique of adopting the orphans caused by failures during RPCs. Adoption is preferable to orphan killing because killing wastes any work already completed and requires rollback, which may be expensive and is sometimes not meaningful. The thesis incorporates orphan adoption into two schemes for replicating a server:

i) Primary-secondary scheme, in which one replica of the server acts as the primary and executes RPCs from clients while the other replicas stand by as secondaries. When the primary fails, one of the secondaries becomes the primary, restarts the server execution from the most recent checkpoint, and adopts the orphan.

ii) Replicated execution scheme, in which an RPC on the server is executed by more than one replica of the server. When any of the replicas fails, the orphan generated by the failure is adopted by the surviving replicas.

Both schemes employ call re-executions by servers based on the application-level idempotency properties of the calls.

Access to shared variables: Contemporary distributed programs deal with a new class of shared variables, such as information on name bindings, distributed load, and leadership within a service group. Since the consistency constraints on such system variables need not be as strong as those for user data, the access operations on these variables can be made simpler using this application-layer information. Along this direction, the thesis introduces an abstraction, which we call the application-driven shared variable, to govern access operations on such variables. The algorithms for the access operations on a variable use intra-server group communication and enforce consistency of the variable only to the extent required by the application.

The thesis describes complete communication models incorporating the application-driven approach to mask failures.
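
As a rough illustration of call re-execution driven by application-level idempotency, the following Python sketch simulates a client retrying a call whose reply was lost. It is not the thesis's algorithm: the `Server` class, the reply cache keyed by `call_id`, and the `idempotent` flag supplied by the application are all assumptions made for this example. An idempotent call is simply executed again on retry, while a non-idempotent call is answered from the cached reply so that its side effect is not repeated.

```python
class Server:
    """Toy server (hypothetical); state is a single account balance."""

    def __init__(self):
        self.balance = 100
        self.replies = {}    # call_id -> cached reply (non-idempotent calls only)
        self.executions = 0  # how many times an operation body actually ran

    def handle(self, call_id, op, idempotent):
        # A retried non-idempotent call is answered from the cache
        # instead of being executed a second time.
        if not idempotent and call_id in self.replies:
            return self.replies[call_id]
        self.executions += 1
        result = op(self)
        if not idempotent:
            self.replies[call_id] = result
        return result


def read_balance(server):
    # Idempotent: repeating it has no additional effect.
    return server.balance


def deposit_10(server):
    # Not idempotent: each execution changes the state.
    server.balance += 10
    return server.balance


def call_with_retry(server, call_id, op, idempotent, lose_first_reply=True):
    """Simulate a lost reply: the server runs the call, the client never
    sees the answer, and so retries with the same call_id."""
    attempts = 0
    while True:
        attempts += 1
        result = server.handle(call_id, op, idempotent)
        if lose_first_reply and attempts == 1:
            continue  # reply lost in transit; try again
        return result


if __name__ == "__main__":
    s = Server()
    print(call_with_retry(s, "c1", read_balance, idempotent=True), s.executions)
    # prints: 100 2  (the read was executed twice, harmlessly)
    print(call_with_retry(s, "c2", deposit_10, idempotent=False), s.executions)
    # prints: 110 3  (the deposit ran once; the retry used the cached reply)
```

In this sketch the application, not the transport, declares which calls are safe to repeat, so the communication layer only keeps reply state for the calls that need it.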