Scalable and deterministic timing-driven parallel placement for FPGAs

UBC Theses and Dissertations

Featured Collection

UBC Theses and Dissertations

Scalable and deterministic timing-driven parallel placement for FPGAs Wang, Chao Chris

Abstract

This thesis describes a parallel implementation of the timing-driven VPR 5.0 simulated-annealing placement engine. By partitioning the grid into regions and allowing distant data to grow stale, it is possible to consider a large number of non-conflicting moves in parallel and achieve a deterministic result. The full timing-driven placement algorithm is parallelized, including swap evaluation, bounding-box calculation and the detailed timing-analysis updates. The partitioned region approach slightly degrades the placement quality, but this is necessary to expose greater parallelism. We also suggest a method to recover the lost quality. In simulated annealing, runtime can be shortened at the expense of quality. Using this method, the serial placer can achieve a maximum speedup of 100X while quality metrics degrades as much as 100%. In contrast, the parallel placer can scale beyond 500X with all quality metrics degrading by less than 30%. Specifically, at the point where the parallel placer begins to dominate over the serial placer, the post-routing minimum channel width, wirelength and critical-path delay degrades 13%, 10% and 7% respectively on average compared to VPR’s original algorithm,while achieving a 140X to 200X speedup 25 threads. Finally, it is shown that the amount of degradation in the parallel placer is independent of the number of threads used.

Item Metadata

Title	Scalable and deterministic timing-driven parallel placement for FPGAs
Creator	Wang, Chao Chris
Publisher	University of British Columbia
Date Issued	2011
Description	This thesis describes a parallel implementation of the timing-driven VPR 5.0 simulated-annealing placement engine. By partitioning the grid into regions and allowing distant data to grow stale, it is possible to consider a large number of non-conflicting moves in parallel and achieve a deterministic result. The full timing-driven placement algorithm is parallelized, including swap evaluation, bounding-box calculation and the detailed timing-analysis updates. The partitioned region approach slightly degrades the placement quality, but this is necessary to expose greater parallelism. We also suggest a method to recover the lost quality. In simulated annealing, runtime can be shortened at the expense of quality. Using this method, the serial placer can achieve a maximum speedup of 100X while quality metrics degrades as much as 100%. In contrast, the parallel placer can scale beyond 500X with all quality metrics degrading by less than 30%. Specifically, at the point where the parallel placer begins to dominate over the serial placer, the post-routing minimum channel width, wirelength and critical-path delay degrades 13%, 10% and 7% respectively on average compared to VPR’s original algorithm,while achieving a 140X to 200X speedup 25 threads. Finally, it is shown that the amount of degradation in the parallel placer is independent of the number of threads used.
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2011-10-24
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0072367
URI	http://hdl.handle.net/2429/38168
Degree (Theses)	Master of Applied Science - MASc
Program (Theses)	Electrical and Computer Engineering
Affiliation	Applied Science, Faculty of; Electrical and Computer Engineering, Department of
Degree Grantor	University of British Columbia
Graduation Date	2011-11
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace

Open Collections

UBC Theses and Dissertations

UBC Theses and Dissertations

Scalable and deterministic timing-driven parallel placement for FPGAs Wang, Chao Chris

Abstract

Item Metadata

Item Media

Item Citations and Data

Rights