Broadening the applicability of FPGA-based soft vector processors

UBC Theses and Dissertations

Featured Collection

UBC Theses and Dissertations

Broadening the applicability of FPGA-based soft vector processors Severance, Aaron

Abstract

A soft vector processor (SVP) is an overlay on top of FPGAs that allows data- parallel algorithms to be written in software rather than hardware, and yet still achieve hardware-like performance. This ease of use comes at an area and speed penalty, however. Also, since the internal design of SVPs are based largely on custom CMOS vector processors, there is additional opportunity for FPGA-specific optimizations and enhancements. This thesis investigates and measures the effects of FPGA-specific changes to SVPs that improve performance, reduce area, and improve ease-of-use; thereby expanding their useful range of applications. First, we address applications needing only moderate performance such as audio filtering where SVPs need only a small number (one to four) of parallel ALUs. We make implementation and ISA design decisions around the goals of producing a compact SVP that effectively utilizes existing BRAMs and DSP Blocks. The resulting VENICE SVP has 2x better performance per logic block than previous designs. Next, we address performance issues with algorithms where some vector elements ‘exit early’ while others need additional processing. Simple vector predication causes all elements to exhibit ‘worst case’ performance. Density time masking (DTM) improves performance of such algorithms by skipping the completed elements when possible, but traditional implementations of DTM are coarse-grained and do not map well to the FPGA fabric. We introduce a BRAM-based implementation that achieves 3.2x higher performance over the base SVP with less than 5% area overhead. Finally, we identify a way to exploit the raw performance of the underlying FPGA fabric by attaching wide, deeply pipelined computational units to SVPs through a custom instruction interface. We support multiple inputs and outputs, arbitrary-length pipelines, and heterogeneous lanes to allow streaming of data through large operator graphs. As an example, on an n-body simulation problem, we show that custom instructions achieve 120x better performance per area than the base SVP.

Item Metadata

Title	Broadening the applicability of FPGA-based soft vector processors
Creator	Severance, Aaron
Publisher	University of British Columbia
Date Issued	2015
Description	A soft vector processor (SVP) is an overlay on top of FPGAs that allows data- parallel algorithms to be written in software rather than hardware, and yet still achieve hardware-like performance. This ease of use comes at an area and speed penalty, however. Also, since the internal design of SVPs are based largely on custom CMOS vector processors, there is additional opportunity for FPGA-specific optimizations and enhancements. This thesis investigates and measures the effects of FPGA-specific changes to SVPs that improve performance, reduce area, and improve ease-of-use; thereby expanding their useful range of applications. First, we address applications needing only moderate performance such as audio filtering where SVPs need only a small number (one to four) of parallel ALUs. We make implementation and ISA design decisions around the goals of producing a compact SVP that effectively utilizes existing BRAMs and DSP Blocks. The resulting VENICE SVP has 2x better performance per logic block than previous designs. Next, we address performance issues with algorithms where some vector elements ‘exit early’ while others need additional processing. Simple vector predication causes all elements to exhibit ‘worst case’ performance. Density time masking (DTM) improves performance of such algorithms by skipping the completed elements when possible, but traditional implementations of DTM are coarse-grained and do not map well to the FPGA fabric. We introduce a BRAM-based implementation that achieves 3.2x higher performance over the base SVP with less than 5% area overhead. Finally, we identify a way to exploit the raw performance of the underlying FPGA fabric by attaching wide, deeply pipelined computational units to SVPs through a custom instruction interface. We support multiple inputs and outputs, arbitrary-length pipelines, and heterogeneous lanes to allow streaming of data through large operator graphs. As an example, on an n-body simulation problem, we show that custom instructions achieve 120x better performance per area than the base SVP.
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2015-03-24
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivs 2.5 Canada
DOI	10.14288/1.0167154
URI	http://hdl.handle.net/2429/52490
Degree (Theses)	Doctor of Philosophy - PhD
Program (Theses)	Electrical and Computer Engineering
Affiliation	Applied Science, Faculty of; Electrical and Computer Engineering, Department of
Degree Grantor	University of British Columbia
Graduation Date	2015-05
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/2.5/ca/
Aggregated Source Repository	DSpace

Open Collections

UBC Theses and Dissertations

UBC Theses and Dissertations

Broadening the applicability of FPGA-based soft vector processors Severance, Aaron

Abstract

Item Metadata

Item Media

Item Citations and Data

Rights