UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Distributed random load balancing Wang, Chunpu


Low latency is highly desirable for cloud services spanning thousands of servers. With the rapid development of cloud market, the size of server farms grows fast. Hence, stringent timing requirements are needed for task scheduling in a large-scale server farm. Conventionally, the Join-the-Shortest-Queue (JSQ) algorithm, which directs an arriving task to the least loaded server, is adopted in scheduling. Despite its excellent delay performance, JSQ is throughput-limited, and thus doesn't scale well with the number of servers. There are two distributed algorithms proposed as "approximations" of the idealized JSQ. The first one is the Power-of-d-choices (Pod) algorithm, which selects d servers at random and routes a task to the least loaded server of the d servers. Despite its scalability, Pod suffers from long tail response times. The second one is the distributed Join-the-Idle-Queue (JIQ), which take advantages idle servers for task scheduling. In this thesis, we are interested in exploring Pod and JIQ further. First, a hybrid scheduling strategy called Pod-Helper is proposed. It consists of a Pod scheduler and a throughput-limited helper. Hybrid scheduling takes the best of both worlds, enjoying scalability and low tail response times. In particular, hybrid scheduling has bounded maximum queue size in the large-system regime, which is in sharp contrast to the Pod scheduling whose maximum queue size is unbounded. Second, we conduct an in-depth analysis for distributed Join-the-Idle-Queue (JIQ), a promising new approximation of an idealized task-scheduling algorithm. In particular, we derive semi-closed form expressions for the delay performance of distributed JIQ. Third, we propose a new variant of distributed JIQ that offers clear advantages over alternative algorithms for large systems.

Item Media

Item Citations and Data


Attribution-NonCommercial-NoDerivatives 4.0 International