UBC Theses and Dissertations
Inter-core locality aware memory access scheduling Li, Dongdong
Graphics Processing Units (GPUs) run thousands of parallel threads and achieve high Memory Level Parallelism (MLP). To support high MLP, a structure called a Miss-Status Holding Register (MSHR) handles multiple in-flight miss requests. When multiple cores send requests to the same cache line, the requests are merged into one last level cache MSHR entry and only one memory request is sent to the Dynamic Random-Access Memory (DRAM). We call this inter-core locality. The main reason for inter-core locality is that multiple cores access shared read-only data within the same cache line. By prioritizing memory requests that have high inter-core locality, more threads resume execution. Many memory access scheduling policies have been proposed for general-purpose multi-core processors and GPUs. However, some of these policies do not consider the characteristic of GPUs and others do not utilize inter-core locality information. In this thesis, we analyze the reasons that inter-core locality exists and show that requests with more inter-core locality have a higher impact performance. To exploit inter-core locality, we enable the GPU DRAM controller to be aware of inter-core locality by using Level 2 (L2) cache MSHR information. We propose a memory scheduling policy to coordinate the last level cache MSHR and the DRAM controller. 1) We introduce a structure to enable the DRAM to be aware of L2 cache MSHR information. 2) We propose a memory scheduling policy to use L2 cache MSHR information. 3) To prevent starvation, we introduce age information to the scheduling policy. Our evaluation shows a 28% memory request latency reduction and an 11% performance improvement on the average for high inter-core locality benchmarks.
Item Citations and Data
Attribution-NonCommercial-NoDerivs 2.5 Canada