Inter-core locality aware memory access scheduling

UBC Theses and Dissertations

Featured Collection

UBC Theses and Dissertations

Inter-core locality aware memory access scheduling Li, Dongdong

Abstract

Graphics Processing Units (GPUs) run thousands of parallel threads and achieve high Memory Level Parallelism (MLP). To support high MLP, a structure called a Miss-Status Holding Register (MSHR) handles multiple in-flight miss requests. When multiple cores send requests to the same cache line, the requests are merged into one last level cache MSHR entry and only one memory request is sent to the Dynamic Random-Access Memory (DRAM). We call this inter-core locality. The main reason for inter-core locality is that multiple cores access shared read-only data within the same cache line. By prioritizing memory requests that have high inter-core locality, more threads resume execution. Many memory access scheduling policies have been proposed for general-purpose multi-core processors and GPUs. However, some of these policies do not consider the characteristic of GPUs and others do not utilize inter-core locality information. In this thesis, we analyze the reasons that inter-core locality exists and show that requests with more inter-core locality have a higher impact performance. To exploit inter-core locality, we enable the GPU DRAM controller to be aware of inter-core locality by using Level 2 (L2) cache MSHR information. We propose a memory scheduling policy to coordinate the last level cache MSHR and the DRAM controller. 1) We introduce a structure to enable the DRAM to be aware of L2 cache MSHR information. 2) We propose a memory scheduling policy to use L2 cache MSHR information. 3) To prevent starvation, we introduce age information to the scheduling policy. Our evaluation shows a 28% memory request latency reduction and an 11% performance improvement on the average for high inter-core locality benchmarks.

Item Metadata

Title	Inter-core locality aware memory access scheduling
Creator	Li, Dongdong
Publisher	University of British Columbia
Date Issued	2015
Description	Graphics Processing Units (GPUs) run thousands of parallel threads and achieve high Memory Level Parallelism (MLP). To support high MLP, a structure called a Miss-Status Holding Register (MSHR) handles multiple in-flight miss requests. When multiple cores send requests to the same cache line, the requests are merged into one last level cache MSHR entry and only one memory request is sent to the Dynamic Random-Access Memory (DRAM). We call this inter-core locality. The main reason for inter-core locality is that multiple cores access shared read-only data within the same cache line. By prioritizing memory requests that have high inter-core locality, more threads resume execution. Many memory access scheduling policies have been proposed for general-purpose multi-core processors and GPUs. However, some of these policies do not consider the characteristic of GPUs and others do not utilize inter-core locality information. In this thesis, we analyze the reasons that inter-core locality exists and show that requests with more inter-core locality have a higher impact performance. To exploit inter-core locality, we enable the GPU DRAM controller to be aware of inter-core locality by using Level 2 (L2) cache MSHR information. We propose a memory scheduling policy to coordinate the last level cache MSHR and the DRAM controller. 1) We introduce a structure to enable the DRAM to be aware of L2 cache MSHR information. 2) We propose a memory scheduling policy to use L2 cache MSHR information. 3) To prevent starvation, we introduce age information to the scheduling policy. Our evaluation shows a 28% memory request latency reduction and an 11% performance improvement on the average for high inter-core locality benchmarks.
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2015-04-08
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivs 2.5 Canada
DOI	10.14288/1.0166115
URI	http://hdl.handle.net/2429/52650
Degree (Theses)	Master of Applied Science - MASc
Program (Theses)	Electrical and Computer Engineering
Affiliation	Applied Science, Faculty of; Electrical and Computer Engineering, Department of
Degree Grantor	University of British Columbia
Graduation Date	2015-05
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/2.5/ca/
Aggregated Source Repository	DSpace

Open Collections

UBC Theses and Dissertations

UBC Theses and Dissertations

Inter-core locality aware memory access scheduling Li, Dongdong

Abstract

Item Metadata

Item Media

Item Citations and Data

Rights