UBC Theses and Dissertations
Offloading embedding lookups to processing-in-memory for deep learning recommender models
Zarif, Niloofar
Abstract
Recommender systems are an essential part of many industries and businesses. Generating accurate recommendations is critical for user engagement and business revenue. Deep learning recommender models are now widely used, but they face challenges in processing and representing categorical data, which makes up a significant portion of the data these models consume. Embedding layers are often used to handle these complications by storing numerical representations of a feature's categories in a reduced vector space. The vectors representing all the categories of a feature are stored in a tabular structure called an embedding table. The operation of fetching the vector representations of categories from the embedding table and pooling them is called an embedding lookup. However, embedding lookups have large memory footprints and require high memory bandwidth, leading to high latency and low throughput. We have developed a new system called PIM-Rec to address these challenges by using the first commercially available Processing-In-Memory (PIM) capable DRAM modules for embedding lookups. PIM-Rec is the first system to use such DRAM modules, and it shows an 80% decrease in end-to-end inference cycle latency and an 80% increase in latency-bound throughput compared to a standard CPU-only implementation. PIM DRAM modules are a good candidate for handling embedding lookups, especially given the recent drastic growth of embedding tables. Although PIM-Rec faced obstacles, it offers a realistic solution and analysis, identifying those obstacles and projecting their impact. This system provides a promising path toward improving the efficiency of recommender systems and reducing the load they impose on data centers.
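The embedding lookup operation described in the abstract can be illustrated with a minimal sketch. The Python/NumPy snippet below is illustrative only and is not taken from the thesis; the table size, vector dimension, and sum pooling are assumptions (sum pooling is typical in DLRM-style models). In PIM-Rec, it is this gather-and-pool step that is offloaded to the PIM-capable DRAM modules.

import numpy as np

# Hypothetical sizes; real recommender embedding tables can hold millions of rows.
num_categories = 100_000   # rows in the embedding table (one per category of a feature)
embedding_dim = 64         # size of the reduced vector space

# The embedding table: one learned vector per category, stored as a 2-D array.
rng = np.random.default_rng(0)
table = rng.standard_normal((num_categories, embedding_dim)).astype(np.float32)

def embedding_lookup(indices: np.ndarray) -> np.ndarray:
    """Fetch the vectors for the given category indices and pool (sum) them."""
    return table[indices].sum(axis=0)

# One sparse feature of a single inference request: a handful of category IDs.
pooled = embedding_lookup(np.array([42, 7, 98_765]))
print(pooled.shape)  # (64,)

The memory behavior this sketch implies (many scattered row reads followed by a reduction) is what makes embedding lookups bandwidth-bound and a natural fit for processing in memory.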
Item Metadata
Title | Offloading embedding lookups to processing-in-memory for deep learning recommender models
Creator | Zarif, Niloofar
Publisher | University of British Columbia
Date Issued | 2023
Language | eng
Date Available | 2023-08-21
Provider | Vancouver : University of British Columbia Library
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International
DOI | 10.14288/1.0435518
Degree Grantor | University of British Columbia
Graduation Date | 2023-11
Scholarly Level | Graduate
Aggregated Source Repository | DSpace