Representation learning for computational sociopragmatics
Zhang, Chiyu
Abstract
Natural Language Processing (NLP) has emerged as a critical solution for automatically and computationally analyzing, manipulating, and understanding human language, enabling the swift processing of vast amounts of language data. Computational NLP systems take numerical matrices or vectors as inputs, necessitating the conversion of discrete language symbols into a continuous representation space. The efficacy of these continuous representations is pivotal for developing successful NLP systems. With the advent of attention mechanisms, attention-based models have been adopted to learn contextual language representations by pre-training with language modeling (LM) objectives on extensive textual corpora. Despite the proven effectiveness of attention-based pre-trained language models (PLMs) in learning sequence-level representations for various NLP tasks, the integration of social aspects into representation learning remains underexplored. Recent efforts have applied PLMs to derive user-level representations, aiming to enhance the transferability and precision of content-based recommendation systems. However, challenges persist in encoding lengthy user engagement histories, capturing users' diverse interests, and generating precomputable user-level representations. This dissertation focuses on advancing language representation learning for sequence-level sociopragmatic meaning (SM) comprehension and user-level content-based recommendation. For sequence-level SM, we introduce a novel weakly supervised method for pre-training and fine-tuning language models (Chapter 2). To further enhance representation quality, we propose a new contrastive learning framework for pre-training LMs (Chapter 3). We then extend our work to the multilingual domain, presenting a unified, massively multilingual evaluation benchmark for SM (Chapter 4), alongside a comprehensive evaluation of state-of-the-art large language models for SM understanding. Addressing the challenges of learning user-level representations for recommendation systems, Chapter 5 introduces a novel framework that incorporates multiple poly-attention layers and sparse attention mechanisms; this framework hierarchically fuses the token-level embeddings that a PLM produces for session-based user history texts, tackling the intricacies of recommendation systems.
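The contrastive pretraining idea mentioned for Chapter 3 can be illustrated with a minimal InfoNCE-style sketch: two views of the same text should embed close together while the other texts in the batch act as negatives. The toy encoder, dimensions, and random token ids below are illustrative assumptions, not the dissertation's actual model or data.

```python
# Minimal InfoNCE-style contrastive sketch over sequence embeddings.
# ToyEncoder stands in for a pretrained language model; all sizes are illustrative.
import torch
import torch.nn.functional as F
from torch import nn

class ToyEncoder(nn.Module):
    """Embeds token ids and mean-pools a small Transformer into one vector per sequence."""
    def __init__(self, vocab_size=1000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, token_ids):                       # (batch, seq_len)
        hidden = self.encoder(self.embed(token_ids))    # (batch, seq_len, dim)
        return hidden.mean(dim=1)                       # (batch, dim) sequence embedding

def info_nce(z1, z2, temperature=0.05):
    """(z1[i], z2[i]) is a positive pair; every other row in the batch is a negative."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature                  # (batch, batch) cosine similarities
    return F.cross_entropy(logits, torch.arange(z1.size(0)))

encoder = ToyEncoder()
anchor = torch.randint(0, 1000, (8, 32))                # a batch of token-id sequences
positive = torch.randint(0, 1000, (8, 32))              # paired views of the same texts
loss = info_nce(encoder(anchor), encoder(positive))
loss.backward()
```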
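Likewise, the poly-attention fusion described for Chapter 5 can be sketched as a set of learned "interest codes" that each attend over the token-level PLM embeddings of a user's engagement history, yielding several precomputable user-interest vectors. The module below is a generic poly-attention sketch under assumed dimensions and names; it omits the sparse attention and hierarchical session-level fusion of the actual framework.

```python
# Generic poly-attention sketch: learned interest codes attend over token-level
# embeddings of a user's history, producing several user-interest vectors.
# Dimensions, names, and the random inputs are assumptions for illustration.
import torch
from torch import nn

class PolyAttention(nn.Module):
    def __init__(self, dim=128, num_codes=4):
        super().__init__()
        self.codes = nn.Parameter(torch.randn(num_codes, dim))      # learned query codes
        self.proj = nn.Linear(dim, dim, bias=False)

    def forward(self, token_emb):                                    # (batch, tokens, dim)
        # One attention distribution over history tokens per interest code.
        scores = self.codes @ self.proj(token_emb).transpose(1, 2)   # (batch, codes, tokens)
        weights = scores.softmax(dim=-1)
        return weights @ token_emb                                   # (batch, codes, dim)

history_tokens = torch.randn(2, 256, 128)   # token-level PLM embeddings of history texts
user_interests = PolyAttention()(history_tokens)
print(user_interests.shape)                 # torch.Size([2, 4, 128])
```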
Item Metadata
Title | Representation learning for computational sociopragmatics
Creator | Zhang, Chiyu
Supervisor |
Publisher | University of British Columbia
Date Issued | 2024
Genre |
Type |
Language | eng
Date Available | 2024-08-28
Provider | Vancouver : University of British Columbia Library
Rights | Attribution-NoDerivatives 4.0 International
DOI | 10.14288/1.0445200
URI |
Degree |
Program |
Affiliation |
Degree Grantor | University of British Columbia
Graduation Date | 2024-11
Campus |
Scholarly Level | Graduate
Rights URI |
Aggregated Source Repository | DSpace