Analytical modeling of modern microprocessor performance

UBC Theses and Dissertations

Analytical modeling of modern microprocessor performance Chen, Xi

Abstract

As the number of transistors integrated on a chip continues to increase, a growing challenge is accurately modeling performance in the early stages of processor design. Analytical modeling is an alternative to detailed simulation with the potential to shorten the development cycle and provide additional insight. This thesis proposes hybrid analytical models to predict the impact of pending cache hits, hardware prefetching, and realistic miss status holding register (MSHR) resources on superscalar performance. We propose techniques to model the non-negligible influences of pending hits and of the fine-grained selection of instruction profile window blocks on the accuracy of hybrid analytical models. We also present techniques to estimate the performance impact of data prefetching by modeling the timeliness of prefetches, and to account for a limited number of MSHRs by restricting the size of profile window blocks. As with earlier hybrid analytical models, our approach is roughly two orders of magnitude faster than detailed simulation. Overall, our techniques reduce the error of our baseline from 39.7% to 10.3% when the number of MSHRs is unlimited. When modeling a processor with data prefetching, a limited number of MSHRs, or both, our techniques result in average errors of 13.8%, 9.5%, and 17.8%, respectively.

Moreover, this thesis proposes analytical models for predicting the cache contention and throughput of heavily fine-grained multithreaded architectures such as Sun Microsystems' Niagara. We first propose a novel probabilistic model that takes statistics characterizing individual threads run in isolation as inputs and accurately predicts the number of extra cache misses due to cache contention among a large number of threads. We then present a Markov chain model for analytically estimating the throughput of multicore, fine-grained multithreaded architectures. Combined, the two models accurately predict system throughput obtained from a detailed simulator, with an average error of 8.3% for various cache configurations. We also show that our models can find the same optimized design point of fine-grained multithreaded chip multiprocessors for application-specific workloads 65 times faster than detailed simulation. Furthermore, we show that our models accurately predict cache contention and throughput trends across varying workloads on real hardware, a Sun Fire T1000 server.

Item Metadata

Title	Analytical modeling of modern microprocessor performance
Creator	Chen, Xi
Publisher	University of British Columbia
Date Issued	2009
Extent	1122882 bytes
Genre	Thesis/Dissertation
Type	Text
File Format	application/pdf
Language	eng
Date Available	2009-06-11
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0065506
URI	http://hdl.handle.net/2429/8979
Degree	Master of Applied Science - MASc
Program	Electrical and Computer Engineering
Affiliation	Applied Science, Faculty of; Electrical and Computer Engineering, Department of
Degree Grantor	University of British Columbia
Graduation Date	2009-11
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
