UBC Theses and Dissertations

Supernormals: a floating-point format with fine-grained range control and tapered precision for DNN training

Pun, Shing Wai

Abstract

The scaling of modern deep neural network (DNN) models has driven the adoption of smaller, lower-precision floating-point formats to meet computational, memory, and energy demands. However, low-precision formats impose severe range and precision restrictions compared to traditional standards such as IEEE 754 binary32 (FP32), which can degrade DNN training results or cause outright divergence. This thesis addresses the problem with Supernormals, a novel tapered-precision encoding that enables parameterizable range extension by trading precision for range in specific regions of the encoding space. By systematically varying the independent range and precision requirements of DNN training at large, middle, and small magnitudes, the thesis ultimately demonstrates a 6-bit Supernormal format that nearly matches 8-bit floating-point formats in DNN training performance.

The first part of the thesis explores the range-extension capability and potential logic-area savings of using Supernormals to replace traditional floating-point subnormals and special encodings. The approach replaces both the smallest and largest exponent encodings of a floating-point format with a configurable 0-bit-mantissa (M0) regime. This trades precision for extended range, reducing subnormal logic overhead while capturing the magnitudes of the infrequent tail values seen during DNN training (e.g., small gradients) to improve training stability.

The second part of the thesis explores the numerical requirements of DNN training under tensor-level scaling. It systematically quantifies the independent impacts of range and precision across numerical magnitudes on DNN training stability, and then evaluates a 6-bit Supernormal floating-point format against existing 6- and 8-bit formats.
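To make the M0 regime concrete, below is a minimal, hypothetical Python decoder for a toy 6-bit format (1 sign, 3 exponent, 2 mantissa bits) in the spirit of the abstract's description: the smallest and largest exponent encodings are repurposed as 0-bit-mantissa regimes whose mantissa bits act as extra power-of-two exponent steps. The bit widths, bias, and regime boundaries here are illustrative assumptions, not the thesis's exact parameterization.

# Illustrative sketch of a Supernormal-style decode; the parameterization
# (which codes are repurposed, and how far the range extends) is assumed.
def decode_supernormal6(code: int) -> float:
    """Decode a 6-bit word laid out as [s | eee | mm]."""
    s = (code >> 5) & 0x1   # sign bit
    e = (code >> 2) & 0x7   # 3-bit exponent field (0..7)
    m = code & 0x3          # 2-bit mantissa field (0..3)
    sign = -1.0 if s else 1.0
    bias = 3                # 2**(3-1) - 1, IEEE-style biasing

    if 1 <= e <= 6:
        # Ordinary normal encodings: 1.mm * 2**(e - bias)
        return sign * (1.0 + m / 4.0) * 2.0 ** (e - bias)
    if e == 0:
        # Small M0 regime replaces subnormals: the mantissa bits
        # become extra exponent steps below the normal minimum.
        if m == 0:
            return sign * 0.0                  # keep one code for zero
        return sign * 2.0 ** (1 - bias - m)    # 2**-3, 2**-4, 2**-5
    # e == 7: large M0 regime replaces the Inf/NaN encodings,
    # extending the top of the range with power-of-two steps.
    return sign * 2.0 ** (6 - bias + 1 + m)    # 2**4 .. 2**7

Decoding all 64 codes shows the taper: two fraction bits of precision across the normal band (0.25 to 14), but pure power-of-two steps at the extremes, stretching the representable range down to 2**-5 and up to 128 at zero mantissa precision.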

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International