UBC Theses and Dissertations


Accelerating recommendation system training by leveraging popular choices

Ebrahimzadeh Maboud, Yassaman

Abstract

Recommendation systems are deployed in e-commerce and online advertising to surface items that users are likely to want. To this end, various deep learning based recommendation models have been employed, such as Facebook's Deep Learning Recommendation Model (DLRM). The inputs to such models fall into dense and sparse categories: dense features are continuous inputs such as time or age, whereas sparse features are the categorical representations of items and users, encoded as discrete parameters. These models comprise two main components: compute-intensive components such as the multilayer perceptron (MLP), and memory-intensive components such as the embedding tables that store the numerical representations of the sparse features. Training these large-scale recommendation models requires ever-increasing amounts of data and compute resources. The highly parallel neural network portion of these models can benefit from GPU acceleration; however, large embedding tables often cannot fit in the limited-capacity GPU device memory. Hence, this thesis dives deep into the semantics of the training data and the feature access, transfer, and usage patterns of these models. We observe that, due to the popularity of certain inputs, accesses to the embeddings are highly skewed: a few embedding entries are accessed up to 10,000× more often than the rest. In this thesis, we focus on improving end-to-end training performance using this insight and offer a framework called Frequently Accessed Embeddings (FAE), which provides a hot-embedding-aware data layout for training recommender models. This layout uses the scarce GPU memory to store the highly accessed embeddings, thus reducing data transfers from CPU to GPU. We choose DLRM~\cite{dlrm} and XDL~\cite{xdl} as baselines; both models have been commercialized and are well established in industry, with DLRM deployed by Facebook and XDL by Alibaba. We choose XDL because of its high CPU utilization and its notably scalable approach to training recommendation models. Experiments on production-scale recommendation models with real-world datasets show that FAE reduces overall training time by 2.3× and 1.52× compared to XDL CPU-only and XDL CPU-GPU execution, respectively, while maintaining baseline accuracy.
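
The abstract describes FAE's hot-embedding-aware layout only at a high level; the short Python sketch below illustrates the general idea rather than the thesis's actual implementation. The class and function names, the 5% hot fraction, and the Zipf-distributed synthetic access counts are assumptions made purely for illustration: frequently accessed ("hot") rows of an embedding table are kept resident in GPU memory, while the remaining cold rows stay in CPU memory and are transferred only when accessed.

import numpy as np
import torch

def select_hot_rows(access_counts: np.ndarray, hot_fraction: float = 0.05) -> set:
    # Rank rows by access count and keep the top hot_fraction as "hot".
    num_hot = max(1, int(hot_fraction * len(access_counts)))
    return set(np.argsort(access_counts)[::-1][:num_hot].tolist())

class HotColdEmbedding:
    # Hot rows live on the GPU; the full table stays in host (CPU) memory.
    def __init__(self, weight: torch.Tensor, hot_rows: set, device: str):
        self.device = device
        hot_sorted = sorted(hot_rows)
        self.hot_index = {row: i for i, row in enumerate(hot_sorted)}
        self.hot = weight[hot_sorted].to(device)   # small slice kept resident in GPU memory
        self.cold = weight                         # cold rows are looked up from CPU memory

    def lookup(self, indices):
        rows = []
        for idx in indices:
            if idx in self.hot_index:
                rows.append(self.hot[self.hot_index[idx]])    # no CPU-to-GPU transfer needed
            else:
                rows.append(self.cold[idx].to(self.device))   # occasional transfer for cold rows
        return torch.stack(rows)

# Example usage with synthetic, popularity-skewed access counts.
device = "cuda" if torch.cuda.is_available() else "cpu"
table = torch.randn(100_000, 16)            # 100k embedding rows, 16 dimensions
counts = np.random.zipf(2.0, size=100_000)  # skewed access pattern, mimicking popular inputs
emb = HotColdEmbedding(table, select_hot_rows(counts), device)
vectors = emb.lookup([3, 42, 99_999])       # mixed hot/cold lookups

The hot/cold split shown here is static; the thesis targets a similar placement decision, but the concrete policy and data layout used by FAE are not specified in the abstract.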


Rights

Attribution-NonCommercial-NoDerivatives 4.0 International