UBC Theses and Dissertations
An FPGA memory architecture to enable efficient weight implementations for machine learning applications
Chua, Martin
Abstract
Hardware acceleration for machine learning applications has become increasingly important as models grow and evolve rapidly. FPGAs can adapt to these changes quickly because they are hardware reconfigurable, while also providing low latency, high throughput, and efficiency. The efficiency of machine learning acceleration is intrinsically tied to memory access latency, capacity, and bandwidth. On an FPGA, fine-grained resources like flip-flops and LUTRAMs provide lower-latency access but offer limited storage capacity. Dedicated on-chip BRAMs provide higher density but are still finite in capacity. Off-chip DRAM suffers from increased latency and constrained bandwidth, which limits the throughput of model training and inference.
Prior work has proposed an architectural enhancement that allows the user to re-purpose unused configuration bits as user-accessible memory. In typical FPGA implementations, a significant portion of routing segments is left unused. By modifying the switch block architecture, the configuration bits controlling unused segments can be re-purposed as user storage. Inspired by this work and the growing demand for machine learning acceleration, we present three research contributions.
The first contribution is an FPGA architecture enhancement, called switch block memory, that allows the user to re-purpose unused FPGA switch block configuration bits to implement weight memory in machine learning applications. The second contribution is a comprehensive analysis of machine learning memory utilization to identify the specific contexts where our switch block memories are most effective. The third contribution is an augmented CAD flow, integrated into the open-source VTR CAD suite, to evaluate the proposed architecture. When applied to selected machine learning workloads, our approach achieves up to a 9% improvement in Fmax, a 3% reduction in total wire length, and enables up to 80 Mb of additional on-chip storage for large FPGA devices.
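To make the capacity claim concrete, the sketch below is a back-of-envelope model, not taken from the thesis, of how much user storage unused switch block configuration bits might yield. Every parameter (grid dimensions, channel width, mux fan-in, routing utilization) and the function itself are hypothetical assumptions for illustration only.

```python
import math

# Hypothetical model (not from the thesis): estimate how many configuration
# bits in unused switch block routing multiplexers could be re-purposed as
# user-accessible weight storage.

def switch_block_memory_bits(grid_w, grid_h, channel_width,
                             mux_fanin, routing_util):
    """Rough estimate of re-purposable config bits across an FPGA fabric.

    grid_w, grid_h : switch blocks in each dimension (assumed)
    channel_width  : routing tracks per channel (assumed)
    mux_fanin      : inputs per routing mux; sets config bits per mux
    routing_util   : fraction of routing muxes the design actually uses
    """
    # Assume an encoded mux select: ceil(log2(fan-in)) config bits per mux.
    bits_per_mux = math.ceil(math.log2(mux_fanin))
    # Very rough: about one mux per track on each of the four sides.
    muxes_per_sb = 4 * channel_width
    total_muxes = grid_w * grid_h * muxes_per_sb
    unused_muxes = int(total_muxes * (1.0 - routing_util))
    return unused_muxes * bits_per_mux

# Example with made-up numbers for a large device:
bits = switch_block_memory_bits(grid_w=200, grid_h=200, channel_width=300,
                                mux_fanin=16, routing_util=0.5)
print(f"{bits / 1e6:.1f} Mb of potential user storage")
```

With these made-up parameters the model lands in the tens of megabits, the same order of magnitude as the 80 Mb figure reported for large devices; the real number depends on the actual switch block topology and the design's routing utilization.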
Item Metadata
| Title | An FPGA memory architecture to enable efficient weight implementations for machine learning applications |
| Creator | |
| Supervisor | |
| Publisher | University of British Columbia |
| Date Issued | 2026 |
| Genre | |
| Type | |
| Language | eng |
| Date Available | 2026-04-09 |
| Provider | Vancouver : University of British Columbia Library |
| Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
| DOI | 10.14288/1.0451848 |
| URI | |
| Degree (Theses) | |
| Program (Theses) | |
| Affiliation | |
| Degree Grantor | University of British Columbia |
| Graduation Date | 2026-05 |
| Campus | |
| Scholarly Level | Graduate |
| Rights URI | |
| Aggregated Source Repository | DSpace |