UBC Theses and Dissertations
Generalized high availability via virtual machine replication Cully, Brendan
Allowing applications to survive hardware failure is a expensive undertaking, which generally involves re-engineering software to include complicated recovery logic as well as deploying special-purpose hardware; this represents a severe barrier to improving the dependability of large or legacy applications. We describe the construction of a general and transparent high-availability service that allows existing, unmodified software to be protected from the failure of the physical machine on which it runs. Remus provides an extremely high degree of fault tolerance, to the point that a running system can transparently continue execution on an alternate physical host in the face of failure with only seconds of downtime, completely preserving host state such as active network connections. We describe our approach, which encapsulates protected software in a virtual machine, asynchronously propagates VM state to a backup host at frequencies as high as forty times a second, and uses speculative execution to concurrently run the active VM slightly ahead of the replicated system state.
Item Citations and Data