Open Collections
UBC Faculty Research and Publications
A Multimodal Recommender System Using Deep Learning Techniques Combining Review Texts and Images
Jeong, Euiju; Li, Xinzhe; Kwon, Angela Eunyoung; Park, Seonu; Li, Qinglong; Kim, Jaekyeong
Abstract
Online reviews consisting of texts and images are an essential source of information for alleviating data sparsity in recommender systems. Although texts and images convey different types of information, they can offer complementary or substitutive advantages. However, most studies fail to capture the complementary effect between texts and images in recommender systems: they overlook the informational value of images and propose recommenders based solely on textual representations. To address this research gap, this study proposes a novel recommender model that captures the dependence between texts and images. It uses the RoBERTa and VGG-16 models to extract textual and visual information from online reviews and applies a co-attention mechanism to capture the complementarity between the two modalities. Extensive experiments on Amazon datasets confirm the superiority of the proposed model. Our findings suggest that the complementarity of texts and images is crucial for enhancing recommendation accuracy and performance.
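The co-attention step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes pre-extracted feature matrices standing in for RoBERTa token features and VGG-16 region features, with hypothetical dimensions and a random projection matrix, and computes one common form of co-attention (a tanh affinity matrix followed by softmax-weighted pooling of each modality).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(T, V, W):
    """Pool text features T (n_t, d) and visual features V (n_v, d)
    through a learned affinity W (d, d); returns a joint vector (2d,)."""
    C = np.tanh(T @ W @ V.T)        # affinity matrix, shape (n_t, n_v)
    a_t = softmax(C.max(axis=1))    # attention over text tokens, (n_t,)
    a_v = softmax(C.max(axis=0))    # attention over image regions, (n_v,)
    t_vec = a_t @ T                 # attended text representation, (d,)
    v_vec = a_v @ V                 # attended visual representation, (d,)
    return np.concatenate([t_vec, v_vec])

# Hypothetical stand-ins for extracted review features.
rng = np.random.default_rng(0)
T = rng.standard_normal((4, 8))     # 4 "token" features of dim 8
V = rng.standard_normal((6, 8))     # 6 "region" features of dim 8
W = rng.standard_normal((8, 8))
rep = co_attention(T, V, W)
print(rep.shape)                    # (16,) joint multimodal representation
```

In the paper's setting the joint vector would feed a rating-prediction layer; here it simply shows how each modality's pooling weights depend on the other modality, which is the complementarity the abstract describes.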
Item Metadata
Title | A Multimodal Recommender System Using Deep Learning Techniques Combining Review Texts and Images
Publisher | Multidisciplinary Digital Publishing Institute
Date Issued | 2024-10-10
Language | eng
Date Available | 2024-10-28
Provider | Vancouver : University of British Columbia Library
Rights | CC BY 4.0
DOI | 10.14288/1.0447147
Citation | Applied Sciences 14 (20): 9206 (2024)
Publisher DOI | 10.3390/app14209206
Peer Review Status | Reviewed
Scholarly Level | Faculty; Researcher
Aggregated Source Repository | DSpace