UBC Theses and Dissertations
Accelerating input dispatching for deep learning recommendation models training
Adnan, Muhammad
Abstract
Deep-learning and time-series based recommendation models require copious amounts of compute for their deep learning part and large memory capacities for their embedding tables. Training these models typically involves using GPUs to accelerate the deep learning phase while restricting the memory-intensive embedding tables to the CPUs. This causes data to be constantly transferred between the CPU and GPUs, which limits the overall throughput of the training process. This thesis offers a heterogeneous acceleration pipeline, called Hotline, that leverages the insight that only a small number of embedding entries are accessed frequently and can easily fit in a single GPU's local memory. Hotline pipelines the training mini-batches by efficiently utilizing (1) the main memory for infrequently accessed embeddings and (2) the GPUs' local memory for frequently accessed embeddings and their compute for the entire recommender model, while stitching their execution together through a novel hardware accelerator that gathers the required working parameters and dispatches training inputs.

The Hotline accelerator processes multiple input mini-batches to collect the ones that access only the frequently accessed embeddings and dispatch them directly to the GPUs. For inputs that require infrequently accessed embeddings, Hotline hides the CPU-GPU transfer time by proactively obtaining them from main memory. This enables recommendation system training, for the entirety of its mini-batches, to be performed on low-capacity, high-throughput GPUs. Results on real-world datasets and recommender models show that Hotline reduces the average training time by 3.45× in comparison to an XDL baseline when using 4 GPUs. Moreover, Hotline increases the overall training throughput to 20.8 epochs/hr, compared to 5.3 epochs/hr, on the Criteo Terabyte dataset.
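The dispatching idea in the abstract can be illustrated with a minimal sketch. The names below (`HOT_EMBEDDINGS`, `dispatch_minibatch`) are hypothetical, not the thesis's actual API: inputs whose sparse features all hit the small hot embedding set cached in GPU memory are dispatched to the GPU immediately, while the remaining inputs have their cold embedding rows gathered for a prefetch from CPU main memory, hiding the CPU-GPU transfer behind ongoing GPU work.

```python
# Illustrative sketch of Hotline-style input dispatching (assumed names,
# not the thesis's implementation).

# Assumed "hot" embedding rows: accessed frequently enough to be kept
# resident in a single GPU's local memory.
HOT_EMBEDDINGS = {0, 1, 2, 5, 8}

def dispatch_minibatch(batch):
    """Split a mini-batch into GPU-ready inputs and inputs needing a prefetch.

    batch: list of inputs, each a list of embedding-row indices.
    Returns (gpu_ready, needs_prefetch, cold_rows_to_fetch).
    """
    gpu_ready, needs_prefetch, cold_rows = [], [], set()
    for inp in batch:
        cold = [idx for idx in inp if idx not in HOT_EMBEDDINGS]
        if cold:
            needs_prefetch.append(inp)
            cold_rows.update(cold)   # rows to copy from CPU main memory
        else:
            gpu_ready.append(inp)    # all accesses hit the GPU-resident hot set
    return gpu_ready, needs_prefetch, cold_rows

batch = [[0, 1], [2, 9], [5, 8], [3]]
gpu_ready, needs_prefetch, cold_rows = dispatch_minibatch(batch)
# gpu_ready -> [[0, 1], [5, 8]]; cold rows {9, 3} would be prefetched
# while the GPU trains on the popular inputs.
```

In the thesis this classification and gathering is done by the hardware accelerator across several mini-batches at once, so the prefetch of cold rows overlaps with GPU training on the popular inputs.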
Item Metadata
Title |
Accelerating input dispatching for deep learning recommendation models training
|
Creator |
Adnan, Muhammad
|
Supervisor | |
Publisher |
University of British Columbia
|
Date Issued |
2021
|
Genre | |
Type | |
Language |
eng
|
Date Available |
2021-10-25
|
Provider |
Vancouver : University of British Columbia Library
|
Rights |
Attribution-NonCommercial-NoDerivatives 4.0 International
|
DOI |
10.14288/1.0402605
|
URI | |
Degree | |
Program | |
Affiliation | |
Degree Grantor |
University of British Columbia
|
Graduation Date |
2021-11
|
Campus | |
Scholarly Level |
Graduate
|
Rights URI | |
Aggregated Source Repository |
DSpace
|