UBC Theses and Dissertations
Cache and branch prediction improvements for advanced computer architecture
Chu, Yul
As the gap between memory and processor performance continues to grow, the performance of more and more programs will be limited by the memory latency of the system and by branch instructions (the control flow of the programs). Meanwhile, as application programs have grown more complex over the last decade, object-oriented languages have been replacing traditional languages because of their convenient code reusability and maintainability. However, it has also been observed that the run-time performance of object-oriented programs can be improved by reducing the impact of memory latency, branch misprediction, and several other factors. In this thesis, two new schemes are introduced for reducing memory latency and branch mispredictions in High Performance Computing (HPC). For the first scheme, which targets memory latency, this thesis presents a new cache scheme called TAC (Thrashing-Avoidance Cache), which can effectively reduce instruction cache misses caused by procedure calls and returns. The TAC scheme employs N-way banks and XOR mapping functions. Its main function is to place a group of instructions separated by a call instruction into a bank according to the initial and final bank selection mechanisms. After the initial bank selection mechanism selects a bank on an instruction cache miss, the final bank selection mechanism acts as a correction mechanism and determines the final bank in which the cache line is updated. Together, these two mechanisms ensure that recent groups of instructions are kept safely in each bank. A simulation program, TACSim, has been developed using Shade and Spixtools, provided by Sun Microsystems, on an Ultra SPARC/10 processor. Our experimental results show that TAC schemes reduce conflict misses more effectively than skewed-associative caches in both C (9.29% improvement) and C++ (44.44% improvement) programs on L1 caches. In addition, TAC schemes also allow for a significant miss reduction on Branch Target Buffers (BTB).
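The abstract does not spell out the XOR mapping functions or the bank selection rules, so the following is only a minimal Python sketch of the general idea behind XOR-based index mapping in a banked cache. The line size, the number of sets per bank, and the per-bank rotation are illustrative assumptions, not values from the thesis, and the initial and final bank selection mechanisms of TAC are not modelled.

```python
# Illustrative sketch only: all constants and the per-bank rotation are
# assumptions made for this example, not parameters taken from the thesis.
LINE_BYTES = 32                               # assumed cache line size
SETS_PER_BANK = 256                           # assumed sets per bank (power of two)
INDEX_BITS = SETS_PER_BANK.bit_length() - 1   # 8 index bits


def modulo_index(addr: int) -> int:
    """Conventional mapping: the low bits of the line address pick the set."""
    return (addr // LINE_BYTES) % SETS_PER_BANK


def xor_index(addr: int, bank: int = 0) -> int:
    """XOR mapping: combine the low index field with the next-higher field.

    The higher field is rotated by a bank-dependent amount so that each
    bank uses a different mapping (a hypothetical choice for this sketch).
    """
    line = addr // LINE_BYTES
    mask = SETS_PER_BANK - 1
    low = line & mask
    high = (line >> INDEX_BITS) & mask
    rot = bank % INDEX_BITS
    high = ((high << rot) | (high >> (INDEX_BITS - rot))) & mask
    return low ^ high


if __name__ == "__main__":
    a, b = 0x10000, 0x12000
    # Both addresses collide under the conventional mapping...
    print(modulo_index(a), modulo_index(b))
    # ...but land in different sets under the XOR mapping.
    print(xor_index(a), xor_index(b))
```

Two addresses that share the same low index bits compete for one set under the conventional mapping, while the XOR mapping spreads them out using higher address bits. This is the kind of conflict that code on either side of a procedure call can produce.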
For the second scheme, which targets branch mispredictions, this thesis presents a new hybrid branch predictor called GoStay2 that can effectively reduce misprediction rates for indirect branches. GoStay2 differs in two ways from other 2-stage hybrid predictors that use a Branch Target Buffer (BTB) as the first-stage predictor. First, to reduce conflict misses in the first stage, an effective 2-way cache scheme is used instead of a 4-way set-associative scheme. Second, to reduce mispredictions caused by an inefficient predict and update rule, a new selection mechanism and update rule are proposed. A simulation program, GoS-Sim, has been developed using Shade and Spixtools, provided by Sun Microsystems, on an Ultra SPARC/10 processor. Our results show significant improvement with these mechanisms compared to other hybrid predictors. For example, GoStay2 improves indirect misprediction rates of a 64-entry to 4K-entry BTB (with a 512- or 1K-entry PHT) by 14.9% to 21.53% compared to the Cascaded predictor (with leaky filter).
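To make the idea of a 2-stage hybrid predictor for indirect branches concrete, here is a minimal Python sketch of a generic design: a BTB-like first stage indexed by the branch address, and a second-stage table indexed by the branch address XORed with a history of recent targets. This is not the GoStay2 selection mechanism or update rule, which the abstract does not describe. The class name, table sizes, history hash, and the simple "allocate in the second stage when the first stage mispredicts" filter are all assumptions made for illustration.

```python
class TwoStagePredictor:
    """Generic two-stage indirect branch predictor (illustrative only)."""

    def __init__(self, btb_entries: int = 64, pht_entries: int = 512):
        # Direct-mapped tables; each entry is None or a (pc, target) pair.
        self.btb = [None] * btb_entries   # first stage: last target per branch
        self.pht = [None] * pht_entries   # second stage: history-indexed targets
        self.history = 0                  # hash of recent targets (assumed form)

    def _btb_slot(self, pc: int) -> int:
        return (pc >> 2) % len(self.btb)

    def _pht_slot(self, pc: int) -> int:
        return ((pc >> 2) ^ self.history) % len(self.pht)

    def predict(self, pc: int):
        """Prefer a second-stage hit, fall back to the first stage."""
        entry = self.pht[self._pht_slot(pc)]
        if entry is not None and entry[0] == pc:
            return entry[1]
        entry = self.btb[self._btb_slot(pc)]
        if entry is not None and entry[0] == pc:
            return entry[1]
        return None

    def update(self, pc: int, target: int) -> None:
        btb_slot = self._btb_slot(pc)
        pht_slot = self._pht_slot(pc)
        first_ok = self.btb[btb_slot] == (pc, target)
        second = self.pht[pht_slot]
        # Simple filter: refresh an existing second-stage entry, or allocate
        # one only when the first stage got this branch wrong.
        if (second is not None and second[0] == pc) or not first_ok:
            self.pht[pht_slot] = (pc, target)
        self.btb[btb_slot] = (pc, target)
        # Fold the new target into a 16-bit history.
        self.history = ((self.history << 4) ^ (target >> 2)) & 0xFFFF


def run_alternating(iterations: int = 20):
    """One indirect branch whose target alternates between two addresses."""
    predictor = TwoStagePredictor()
    pc, targets = 0x4000, (0x5004, 0x5008)
    outcomes = []
    for i in range(iterations):
        target = targets[i % 2]
        outcomes.append(predictor.predict(pc) == target)
        predictor.update(pc, target)
    return outcomes


if __name__ == "__main__":
    print(run_alternating())
```

A first stage that only remembers the last target mispredicts every occurrence of this alternating branch. The history-indexed second stage learns the pattern after a short warm-up, which is the basic motivation for adding a second stage.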