Cache and branch prediction improvements for advanced computer architecture

UBC Theses and Dissertations

Featured Collection

UBC Theses and Dissertations

Cache and branch prediction improvements for advanced computer architecture Chu, Yul

Abstract

As the gap between memory and processor performance continues to grow, more and more programs will be limited in performance: by the memory latency of the system and by the branch instructions (control flow of the programs). Meanwhile, due to the increase in complexity of application programs over the last decade, object-oriented languages are replacing traditional languages because of convenient code reusability and maintainability. However, it has also been observed that the run-time performance of object-oriented programs can be improved by reducing the impact caused by the memory latency, branch misprediction, and several other factors. In this thesis, two new schemes are introduced for reducing the memory latency and branch mispredictions for High Performance Computing (HPC). For the first scheme, in order to reduce the memory latency, this thesis presents a new cache scheme called TAC (Thrashing-Avoidance Cache), which can effectively reduce instruction cache misses caused by procedure call/returns. The TAC scheme employs N-way banks and XOR mapping functions. The main function of the TAC is to place a group of instructions separated by a call instruction into a bank according to the initial and final bank selection mechanisms. After the initial bank selection mechanism selects a bank on an instruction cache miss, the final bank selection mechanism will determine the final bank for updating a cache line as a correction mechanism. These two mechanisms can guarantee that recent groups of instructions exist in each bank safely. A simulation program, TACSim, has been developed by using Shade and Spixtools, provided by SUN Microsystems, on an ultra SPARC/10 processor. Our experimental results show that TAC schemes reduce conflict misses more effectively than skewed-associative caches in both C (9.29% improvement) and C++ (44.44% improvement) programs on LI caches. In addition, TAC schemes also allow for a significant miss reduction on Branch Target Buffers (BTB). For the second scheme to reduce branch mispredictions, this thesis also presents a new hybrid branch predictor called the GoStay2 that can effectively reduce misprediction rates for indirect branches. The GoStay2 has two different mechanisms compared to other 2-stage hybrid predictors that use a Branch Target Buffer (BTB) as the first stage predictor: First, to reduce conflict misses in the first stage, an effective 2-way cache scheme is used instead of a 4-way set-associative scheme. Second, to reduce mispredictions caused by an inefficient predict and update rule, a new selection mechanism and update rule are proposed. A simulation program, GoS-Sim, has been developed by using Shade and Spixtools, provided by SUN Microsystems, on an Ultra SPARC/10 processor. Our results show significant improvement with these mechanisms compared to other hybrid predictors. For example, the GoStay2 improves indirect misprediction rates of a 64-entry to 4K-entry BTB (with a 512- or lK-entry PHT) by 14.9% to 21.53% compared to the Cascaded predictor (with leaky filter).

Item Metadata

Title	Cache and branch prediction improvements for advanced computer architecture
Creator	Chu, Yul
Publisher	University of British Columbia
Date Issued	2001
Description	As the gap between memory and processor performance continues to grow, more and more programs will be limited in performance: by the memory latency of the system and by the branch instructions (control flow of the programs). Meanwhile, due to the increase in complexity of application programs over the last decade, object-oriented languages are replacing traditional languages because of convenient code reusability and maintainability. However, it has also been observed that the run-time performance of object-oriented programs can be improved by reducing the impact caused by the memory latency, branch misprediction, and several other factors. In this thesis, two new schemes are introduced for reducing the memory latency and branch mispredictions for High Performance Computing (HPC). For the first scheme, in order to reduce the memory latency, this thesis presents a new cache scheme called TAC (Thrashing-Avoidance Cache), which can effectively reduce instruction cache misses caused by procedure call/returns. The TAC scheme employs N-way banks and XOR mapping functions. The main function of the TAC is to place a group of instructions separated by a call instruction into a bank according to the initial and final bank selection mechanisms. After the initial bank selection mechanism selects a bank on an instruction cache miss, the final bank selection mechanism will determine the final bank for updating a cache line as a correction mechanism. These two mechanisms can guarantee that recent groups of instructions exist in each bank safely. A simulation program, TACSim, has been developed by using Shade and Spixtools, provided by SUN Microsystems, on an ultra SPARC/10 processor. Our experimental results show that TAC schemes reduce conflict misses more effectively than skewed-associative caches in both C (9.29% improvement) and C++ (44.44% improvement) programs on LI caches. In addition, TAC schemes also allow for a significant miss reduction on Branch Target Buffers (BTB). For the second scheme to reduce branch mispredictions, this thesis also presents a new hybrid branch predictor called the GoStay2 that can effectively reduce misprediction rates for indirect branches. The GoStay2 has two different mechanisms compared to other 2-stage hybrid predictors that use a Branch Target Buffer (BTB) as the first stage predictor: First, to reduce conflict misses in the first stage, an effective 2-way cache scheme is used instead of a 4-way set-associative scheme. Second, to reduce mispredictions caused by an inefficient predict and update rule, a new selection mechanism and update rule are proposed. A simulation program, GoS-Sim, has been developed by using Shade and Spixtools, provided by SUN Microsystems, on an Ultra SPARC/10 processor. Our results show significant improvement with these mechanisms compared to other hybrid predictors. For example, the GoStay2 improves indirect misprediction rates of a 64-entry to 4K-entry BTB (with a 512- or lK-entry PHT) by 14.9% to 21.53% compared to the Cascaded predictor (with leaky filter).
Extent	7190701 bytes
Genre	Thesis/Dissertation
Type	Text
File Format	application/pdf
Language	eng
Date Available	2009-10-07
Provider	Vancouver : University of British Columbia Library
Rights	For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.
DOI	10.14288/1.0065352
URI	http://hdl.handle.net/2429/13689
Degree	Doctor of Philosophy - PhD
Program	Electrical and Computer Engineering
Affiliation	Applied Science, Faculty of; Electrical and Computer Engineering, Department of
Degree Grantor	University of British Columbia
Graduation Date	2001-05
Campus	UBCV
Scholarly Level	Graduate
Aggregated Source Repository	DSpace

Item Media

ubc_2001-714489.pdf -- 6.86MB

Item Citations and Data

Rights

For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.

Open Collections

UBC Theses and Dissertations

Cache and branch prediction improvements for advanced computer architecture Chu, Yul

Abstract

Item Metadata

Item Media

Item Citations and Data

Rights