Open Collections
UBC Theses and Dissertations
Design and performance evaluation of a superscalar digital signal processor
Bagnordi, Hani
Abstract
Multimedia applications are compute-intensive and often contain multiple streams of operations, such as audio and video, running in parallel. These characteristics make them well suited to multi-stream parallel processing. In the first part of the thesis we evaluate the instruction and data bandwidth requirements of typical signal processing and multimedia functions. In the second part we compare the advantages and disadvantages of programmable and algorithm-specific integrated circuit digital signal processors. To accommodate the high instruction and data bandwidth found in multimedia applications, we propose a superscalar digital signal processor (DSP) that can execute multiple instructions in parallel. A second feature of the proposed DSP is scalability, which is needed to accommodate future increases in the performance requirements of multimedia applications. The proposed DSP is also compatible with existing instruction-set architectures, which eliminates the need for a specialized compiler.
Next, we present a detailed analysis of the availability and distribution of instruction-level parallelism in ten existing multimedia applications, and we discuss the relationship between instruction-level parallelism and machine parallelism. We then turn to machine parallelism and describe the superscalar techniques used to fetch, decode, and execute multiple instructions per cycle. These techniques include out-of-order issue with out-of-order completion, register renaming, and branch prediction. A parameterizable superscalar simulator is used to simulate ten real-world multimedia applications on five models of parallelism. The five models cover a wide range of machine parallelism, from a bare scalar machine to an ideal superscalar machine with unlimited parallelism.
Results of the instruction-level parallelism study show that the simulated multimedia benchmarks contain a high level of parallelism in their code, which makes them suitable candidates for multiple-issue machines. In addition, this parallelism proves to be evenly distributed throughout the program, which helps maintain a good balance between available parallelism and on-chip resources. Furthermore, the machine parallelism simulation results show that instruction fetching places the ultimate limit on speedup and is the most critical factor in determining overall performance. However, by using aggressive branch prediction and out-of-order issue with out-of-order completion, we obtain a two- to fourfold increase in performance compared to a single-issue scalar machine. Overall, the results show that abundant instruction parallelism combined with adequate machine parallelism substantially improves performance for multimedia applications.
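The distinction the abstract draws between instruction-level parallelism (a property of the code) and machine parallelism (a property of the processor) can be illustrated with a small sketch. The code below is not the thesis's simulator. It is a hypothetical toy model that assumes unit instruction latency, perfect branch prediction, and perfect register renaming, so only true read-after-write dependences delay an instruction. The names `schedule`, `trace`, and `issue_width` are illustrative assumptions, and the trace is a made-up two-tap multiply-accumulate fragment rather than one of the ten benchmarks.

```python
from typing import Dict, List, Optional, Tuple

# One instruction: (destination register, list of source registers).
Instr = Tuple[str, List[str]]


def schedule(trace: List[Instr], issue_width: Optional[int] = None) -> int:
    """Return the number of cycles needed to run `trace`.

    Assumptions (hypothetical toy model): unit latency, out-of-order issue,
    and perfect renaming, so only read-after-write dependences matter.
    `issue_width=None` models an ideal machine with unlimited parallelism.
    """
    ready: Dict[str, int] = {}   # register -> cycle its value becomes available
    issued: Dict[int, int] = {}  # cycle -> instructions already issued in it
    total_cycles = 0
    for dest, srcs in trace:
        # Earliest cycle in which every source operand is available.
        cycle = max((ready.get(s, 0) for s in srcs), default=0)
        # A finite issue width may push the instruction to a later cycle.
        if issue_width is not None:
            while issued.get(cycle, 0) >= issue_width:
                cycle += 1
        issued[cycle] = issued.get(cycle, 0) + 1
        ready[dest] = cycle + 1
        total_cycles = max(total_cycles, cycle + 1)
    return total_cycles


# Made-up fragment: two multiplies feeding one add (registers a0..a3 are inputs).
trace: List[Instr] = [
    ("r1", ["a0"]),        # load sample 0
    ("r2", ["a1"]),        # load coefficient 0
    ("r3", ["r1", "r2"]),  # multiply
    ("r4", ["a2"]),        # load sample 1
    ("r5", ["a3"]),        # load coefficient 1
    ("r6", ["r4", "r5"]),  # multiply
    ("r7", ["r3", "r6"]),  # accumulate
]

scalar_cycles = schedule(trace, issue_width=1)
for width in (1, 2, None):
    cycles = schedule(trace, issue_width=width)
    label = "unlimited" if width is None else str(width)
    print(f"issue width {label}: {cycles} cycles, "
          f"speedup {scalar_cycles / cycles:.2f}")
```

In this toy trace the dependence chain alone bounds the ideal machine, while narrower issue widths are bounded by machine parallelism instead. Fetch limits and branch mispredictions, which the abstract identifies as the critical factor, are deliberately left out of the model.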
Item Metadata
Title |
Design and performance evaluation of a superscalar digital signal processor
|
Creator |
Bagnordi, Hani
|
Publisher |
University of British Columbia
|
Date Issued |
1997
|
Extent |
4188927 bytes
|
Genre | |
Type | |
File Format |
application/pdf
|
Language |
eng
|
Date Available |
2009-03-24
|
Provider |
Vancouver : University of British Columbia Library
|
Rights |
For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.
|
DOI |
10.14288/1.0065125
|
URI | |
Degree | |
Program | |
Affiliation | |
Degree Grantor |
University of British Columbia
|
Graduation Date |
1997-11
|
Campus | |
Scholarly Level |
Graduate
|
Aggregated Source Repository |
DSpace
|