UBC Theses and Dissertations

Broadening the applicability of FPGA-based soft vector processors

Severance, Aaron

Abstract

A soft vector processor (SVP) is an overlay on top of FPGAs that allows data-parallel algorithms to be written in software rather than hardware while still achieving hardware-like performance. This ease of use comes at an area and speed penalty, however. Also, since the internal design of SVPs is based largely on custom CMOS vector processors, there is additional opportunity for FPGA-specific optimizations and enhancements. This thesis investigates and measures the effects of FPGA-specific changes to SVPs that improve performance, reduce area, and improve ease of use, thereby expanding their useful range of applications. First, we address applications needing only moderate performance, such as audio filtering, where SVPs need only a small number (one to four) of parallel ALUs. We make implementation and ISA design decisions around the goals of producing a compact SVP that effectively utilizes existing BRAMs and DSP Blocks. The resulting VENICE SVP has 2x better performance per logic block than previous designs. Next, we address performance issues with algorithms where some vector elements ‘exit early’ while others need additional processing. Simple vector predication causes all elements to exhibit ‘worst case’ performance. Density time masking (DTM) improves performance of such algorithms by skipping the completed elements when possible, but traditional implementations of DTM are coarse-grained and do not map well to the FPGA fabric. We introduce a BRAM-based implementation that achieves 3.2x higher performance over the base SVP with less than 5% area overhead. Finally, we identify a way to exploit the raw performance of the underlying FPGA fabric by attaching wide, deeply pipelined computational units to SVPs through a custom instruction interface. We support multiple inputs and outputs, arbitrary-length pipelines, and heterogeneous lanes to allow streaming of data through large operator graphs. As an example, on an n-body simulation problem, we show that custom instructions achieve 120x better performance per area than the base SVP.
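
The difference between simple predication and skipping completed elements can be illustrated with a toy cycle-count model. This is only a sketch under stated assumptions, not the implementation described in the thesis: it assumes one cycle per group of `lanes` elements (called a wavefront here), ignores all pipeline and control overhead, and uses hypothetical function names and a made-up workload.

```python
from math import ceil


def predicated_cycles(masks, lanes):
    """Simple predication: every iteration issues every wavefront,
    whether or not any element in it is still active."""
    return sum(ceil(len(mask) / lanes) for mask in masks)


def skipping_cycles(masks, lanes):
    """Idealized skipping: an iteration issues only the wavefronts
    that contain at least one active element."""
    total = 0
    for mask in masks:
        for start in range(0, len(mask), lanes):
            if any(mask[start:start + lanes]):
                total += 1
    return total


def early_exit_masks(iterations_needed):
    """Build one mask per iteration; element i stays active
    until it has run iterations_needed[i] iterations."""
    longest = max(iterations_needed)
    return [[need > step for need in iterations_needed]
            for step in range(longest)]


# Hypothetical workload: most elements exit after one iteration,
# while one element needs four.
needs = [1, 1, 1, 1, 1, 1, 4, 2]
masks = early_exit_masks(needs)

baseline = predicated_cycles(masks, lanes=2)  # 4 iterations x 4 wavefronts
skipping = skipping_cycles(masks, lanes=2)    # only wavefronts with live work

print(baseline, skipping)  # prints "16 7"
```

In this model the predicated version pays for the slowest element in every wavefront, while the skipping version pays only for wavefronts that still hold unfinished work. Real speedups depend on how the active elements are distributed and on the cost of locating the next active wavefront.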

Rights

Attribution-NonCommercial-NoDerivs 2.5 Canada