UBC Theses and Dissertations
Software-hardware co-design for energy efficient datacenter computing Hetherington, Tayler Hicklin
Datacenters have become commonplace computing environments used to offload applications from distributed local machines to centralized environments. Datacenters offer increased performance and efficiency, reliability and security guarantees, and reduced costs relative to independently operating the computing equipment. The growing trend over the last decade towards server-side (cloud) computing in the datacenter has resulted in increasingly higher demands for performance and efficiency. Graphics processing units (GPUs) are massively parallel, highly efficient accelerators, which can provide significant improvements to applications with ample parallelism and structured behavior. While server-based applications contain varying degrees of parallelism and are economically appealing for GPU acceleration, they often do not adhere to the specific properties expected of an application to obtain the benefits offered by the GPU. This dissertation explores the potential for using GPUs as energy-efficient accelerators for traditional server-based applications in the datacenter through a software-hardware co-design. It first evaluates a popular key-value store server application, Memcached, demonstrating that the GPU can outperform the CPU by 7.5x for the core Memcached processing. However, the core processing of a networking application is only part of the end-to-end computation required at the server. This dissertation then proposes a GPU-accelerated software networking framework, GNoM, which offloads all of the network and application processing to the GPU. GNoM facilitates the design of MemcachedGPU, an end-to-end Memcached implementation on contemporary Ethernet and GPU hardware. MemcachedGPU achieves 10 Gbit line-rate processing at the smallest request size with 95-percentile latencies under 1.1 milliseconds and efficiencies under 12 microjoules per request. GNoM highlights limitations in the traditional GPU programming model, which relies on a CPU for managing GPU tasks. Consequently, the CPU may be unnecessarily involved on the critical path, affecting overall performance, efficiency, and the potential for CPU workload consolidation. To address these limitations, this dissertation proposes an event-driven GPU programming model and set of hardware modifications, EDGE, which enables any device in a heterogeneous system to directly manage the execution of pre-registered GPU tasks through interrupts. EDGE employs a fine-grained GPU preemption mechanism that reuses existing GPU compute resources to begin processing interrupts in under 50 GPU cycles.
Item Citations and Data
Attribution-NonCommercial-NoDerivatives 4.0 International