UBC Theses and Dissertations
Scalable and deterministic timing-driven parallel placement for FPGAs Wang, Chao Chris
This thesis describes a parallel implementation of the timing-driven VPR 5.0 simulated-annealing placement engine. By partitioning the grid into regions and allowing distant data to grow stale, it is possible to consider a large number of non-conflicting moves in parallel and achieve a deterministic result. The full timing-driven placement algorithm is parallelized, including swap evaluation, bounding-box calculation and the detailed timing-analysis updates. The partitioned region approach slightly degrades the placement quality, but this is necessary to expose greater parallelism. We also suggest a method to recover the lost quality. In simulated annealing, runtime can be shortened at the expense of quality. Using this method, the serial placer can achieve a maximum speedup of 100X while quality metrics degrades as much as 100%. In contrast, the parallel placer can scale beyond 500X with all quality metrics degrading by less than 30%. Specifically, at the point where the parallel placer begins to dominate over the serial placer, the post-routing minimum channel width, wirelength and critical-path delay degrades 13%, 10% and 7% respectively on average compared to VPR’s original algorithm,while achieving a 140X to 200X speedup 25 threads. Finally, it is shown that the amount of degradation in the parallel placer is independent of the number of threads used.
Item Citations and Data
Attribution-NonCommercial-NoDerivatives 4.0 International