UBC Theses and Dissertations

Accelerating input dispatching for deep learning recommendation models training

Adnan, Muhammad


Deep-learning and time-series-based recommendation models require copious amounts of compute for the deep learning part and large memory capacities for their embedding table portion. Training these models typically involves using GPUs to accelerate the deep learning phase while restricting the memory-intensive embedding tables to the CPUs. This causes data to be constantly transferred between the CPU and GPUs, which limits the overall throughput of the training process. This thesis offers a heterogeneous acceleration pipeline, called Hotline, that leverages the insight that only a small number of embedding entries are accessed frequently and that these entries can easily fit in a single GPU’s local memory. Hotline aims to pipeline the training mini-batches by efficiently utilizing (1) the main memory for infrequently accessed embeddings and (2) the GPUs’ local memory for frequently accessed embeddings and their compute for the entire recommender model, while stitching their execution together through a novel hardware accelerator that gathers the required working parameters and dispatches training inputs. The Hotline accelerator processes multiple input mini-batches to collect the inputs that access the frequently accessed embeddings and dispatch them directly to the GPUs. For inputs that require infrequently accessed embeddings, Hotline hides the CPU-GPU transfer time by proactively obtaining those embeddings from the main memory. This enables recommendation system training, for the entirety of its mini-batches, to be performed on low-capacity, high-throughput GPUs. Results on real-world datasets and recommender models show that Hotline reduces the average training time by 3.45× in comparison to an XDL baseline when using 4 GPUs. Moreover, Hotline increases the overall training throughput to 20.8 epochs/hr, in comparison to 5.3 epochs/hr, for the Criteo Terabyte dataset.
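As a rough illustration only, the input split described in the abstract could be sketched in software as follows. Everything here is an assumption made for illustration and is not taken from the thesis: the function names `select_hot_ids` and `split_inputs`, the choice of a simple frequency count to pick the hot set, the rule that an input counts as "hot" only when all of its lookups fall in the hot set, and the toy data. Hotline itself performs the gathering and dispatching in a hardware accelerator.

```python
from collections import Counter

def select_hot_ids(samples, capacity):
    """Return the `capacity` most frequently accessed embedding IDs.

    Hypothetical helper: `capacity` stands in for how many embedding
    rows fit in a GPU's local memory.
    """
    counts = Counter(idx for sample in samples for idx in sample)
    return {idx for idx, _ in counts.most_common(capacity)}

def split_inputs(samples, hot_ids):
    """Partition inputs by whether every lookup hits the hot set.

    Assumption: an input is 'hot' only if all of its embedding IDs
    are in `hot_ids`; all other inputs are treated as 'cold'.
    """
    hot, cold = [], []
    for sample in samples:
        if all(idx in hot_ids for idx in sample):
            hot.append(sample)   # could be dispatched to the GPUs directly
        else:
            cold.append(sample)  # needs embeddings fetched from main memory
    return hot, cold

# Toy data: each input is a list of embedding IDs it looks up.
samples = [[1, 2], [1, 3], [2, 1], [7, 1], [2, 2], [9, 8]]
hot_ids = select_hot_ids(samples, capacity=2)
hot_inputs, cold_inputs = split_inputs(samples, hot_ids)
```

In this toy example, IDs 1 and 2 account for most lookups, so inputs that touch only those IDs land in `hot_inputs` while the rest land in `cold_inputs`. In a pipelined design the cold inputs' embeddings would be fetched while the hot inputs are being trained on.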

Attribution-NonCommercial-NoDerivatives 4.0 International