Memory optimizations for hardware accelerated ray tracing on GPUs

UBC Theses and Dissertations

Memory optimizations for hardware accelerated ray tracing on GPUs Chou, Yuan Hsi

Abstract

Ray tracing is a photorealistic rendering technique that has been gaining adoption in real-time graphics applications since the introduction of ray tracing accelerators in modern GPUs. However, current hardware still struggles to meet the frame rate demands of real-time ray tracing. The primary bottleneck is acceleration structure traversal, where rays traverse a bounding volume hierarchy (BVH) tree to find the closest intersecting primitive. Irregular memory access patterns and ray divergence contribute to low traversal performance on GPUs, and this dissertation addresses these issues through hardware and software co-design. The first work develops Vulkan-Sim, an architectural simulator to study ray tracing performance on GPUs. Given the introduction of ray tracing accelerators in GPUs and the limited public information about them, Vulkan-Sim models the ray tracing accelerator in detail and provides a platform for hardware research. We also evaluate two prior ray tracing hardware optimizations, function call coalescing and independent thread scheduling, showing that despite their SIMT efficiency benefits, they provide limited performance benefits for ray tracing workloads. The second work introduces a traversal algorithm leveraging treelets (subtrees of the BVH tree), where threads fully traverse a treelet before moving to the next. It also proposes a treelet hardware prefetcher, complementing the treelet traversal algorithm by fetching treelet nodes in advance to hide memory latency. This achieves a 32.1% speedup over a baseline GPU using a standard DFS traversal order. The third work proposes a treelet-based ray tracing unit that dynamically switches between treelet-stationary and ray-stationary traversal modes to maximize data reuse from the treelet traversal algorithm. We also propose ray virtualization and warp repacking to refill inactive threads with rays, increasing hardware utilization and SIMT efficiency. Results show our design achieves a 95% speedup.
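
The difference between a standard DFS traversal order and a treelet traversal order can be illustrated with a small software sketch. This is a toy model under stated assumptions, not the dissertation's algorithm or hardware: the "ray" is a 1D point query, node bounds are 1D intervals, and the names `Node`, `dfs_order`, `treelet_order`, and `treelet_switches` are hypothetical. It ignores closest-hit early termination and only shows that deferring nodes of other treelets visits the same nodes with fewer treelet switches.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    lo: float          # 1D bounds stand in for a bounding box (assumption)
    hi: float
    treelet: int       # id of the treelet (BVH subtree) this node belongs to
    children: list = field(default_factory=list)

def hits(node, x):
    """Toy intersection test: the query point lies inside the node bounds."""
    return node.lo <= x <= node.hi

def dfs_order(root, x):
    """Standard depth-first traversal using a single stack."""
    order, stack = [], [root]
    while stack:
        n = stack.pop()
        if not hits(n, x):
            continue
        order.append(n)
        stack.extend(reversed(n.children))
    return order

def treelet_order(root, x):
    """Finish every reachable node of the current treelet before moving on."""
    order = []
    pending = {root.treelet: [root]}   # treelet id -> deferred entry nodes
    while pending:
        tid = next(iter(pending))      # oldest deferred treelet first
        stack = pending.pop(tid)
        while stack:
            n = stack.pop()
            if not hits(n, x):
                continue
            order.append(n)
            for c in reversed(n.children):
                if c.treelet == tid:
                    stack.append(c)    # same treelet: keep traversing
                else:
                    pending.setdefault(c.treelet, []).append(c)  # defer
    return order

def treelet_switches(order):
    """Count how often consecutive visited nodes lie in different treelets."""
    return sum(1 for a, b in zip(order, order[1:]) if a.treelet != b.treelet)

# Hypothetical BVH: the root treelet (0) holds R, A, B; leaves sit in treelets 1-3.
a1, a2 = Node("A1", 0, 3, 1), Node("A2", 2, 5, 2)
b1, b2 = Node("B1", 4, 7, 3), Node("B2", 7, 10, 3)
a, b = Node("A", 0, 5, 0, [a1, a2]), Node("B", 4, 10, 0, [b1, b2])
root = Node("R", 0, 10, 0, [a, b])

dfs = dfs_order(root, 4.5)
tre = treelet_order(root, 4.5)
print([n.name for n in dfs], treelet_switches(dfs))
print([n.name for n in tre], treelet_switches(tre))
```

In this toy scene both orders visit the same set of nodes, but the treelet order groups the visits to treelet 0 together, which is the kind of locality a treelet prefetcher could exploit.
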
The final work presents contributions to scene reconstruction, addressing the non-determinism in neural network training that arises from atomic operation ordering on GPUs. We propose relaxed warp schedulers that sequence atomic operations only within a warp, and atomic buffers to efficiently handle reductions. We also propose a handshake mechanism that enforces the ordering of packets arriving from the interconnect across different cores. Our design outperforms a state-of-the-art deterministic GPU baseline by 2-4x.
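
One common reason that atomic operation ordering leads to non-deterministic results is that floating-point addition is not associative, so the same set of updates can produce different sums depending on arrival order. The sketch below illustrates this in software only; it is not the dissertation's hardware mechanism, and the names `accumulate` and `accumulate_ordered` as well as the `(thread_id, value)` update format are assumptions made for illustration.

```python
def accumulate(updates):
    """Apply updates in arrival order, like unordered atomic adds to one address."""
    total = 0.0
    for _thread_id, value in updates:
        total += value
    return total

def accumulate_ordered(updates):
    """Apply updates in a fixed order (by hypothetical thread id) for determinism."""
    return accumulate(sorted(updates, key=lambda u: u[0]))

# The same four updates, as (thread_id, value) pairs.
updates = [(0, 1e16), (1, 1.0), (2, -1e16), (3, 1.0)]

# Two possible arrival orders of those updates.
arrival_a = [updates[0], updates[1], updates[2], updates[3]]
arrival_b = [updates[0], updates[2], updates[1], updates[3]]

print(accumulate(arrival_a), accumulate(arrival_b))                  # sums differ
print(accumulate_ordered(arrival_a), accumulate_ordered(arrival_b))  # sums match
```

In the first arrival order, adding 1.0 to 1e16 is lost to rounding, while in the second order the large values cancel first and both small updates survive. Fixing the order in which updates are applied removes the run-to-run variation, at the cost of constraining how the reductions are scheduled.
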

Item Metadata

Title	Memory optimizations for hardware accelerated ray tracing on GPUs
Creator	Chou, Yuan Hsi
Supervisor	Aamodt, Tor M.
Publisher	University of British Columbia
Date Issued	2025
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2025-04-01
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0448289
URI	http://hdl.handle.net/2429/90572
Degree	Doctor of Philosophy - PhD
Program	Electrical and Computer Engineering
Affiliation	Applied Science, Faculty of; Electrical and Computer Engineering, Department of
Degree Grantor	University of British Columbia
Graduation Date	2025-05
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
