- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Methods for design of efficient on-device natural language...
Open Collections
UBC Theses and Dissertations
UBC Theses and Dissertations
Methods for design of efficient on-device natural language processing architectures Jawahar, Ganesh
Abstract
Deep learning based models often achieve state-of-the-art performance in a wide range of natural language processing (NLP) tasks, which include open-ended tasks (e.g., story generation, brainstorming, and chat) and closed-ended tasks (e.g., summarization, question answering, and rewriting). To further enhance quality, there is a growing interest in scaling the model size and the amount of data used for training. These research efforts often overlook the impact of footprint metrics, such as high latency, high memory usage, and high energy consumption, on these deep learning models. A high footprint makes these models significantly inefficient for deployment on servers and devices such as tablets, handhelds, and wearables. Methods for improving model efficiency often come at the cost of degrading model quality. In this dissertation, we address the central question: how can we push the envelope in improving the efficiency-quality tradeoff of deep learning models for on-device NLP tasks? To this end, we propose methods that take on-device efficiency constraints (e.g., ≤ 16 MB memory or ≤ 200 ms latency) to inform the design of the model architecture. We propose methods for the manual design of architecture for the auto-completion task (generate continuations for user-written prompts) that enjoy a better memory-accuracy tradeoff than existing auto-completion models (Chapter 2). Additionally, we introduce methods that can directly take efficiency constraints to automatically search for efficient sparsely activated architectures for machine translation tasks (Chapter 3) and efficient pretrained (task-agnostic) language modeling architectures (Chapter 4). Finally, in Chapter 5, we explore a novel use case of employing large language models to speed up architecture search, while maintaining the efficiency and quality of state-of-the-art neural architecture search algorithms.
Item Metadata
Title |
Methods for design of efficient on-device natural language processing architectures
|
Creator | |
Supervisor | |
Publisher |
University of British Columbia
|
Date Issued |
2024
|
Description |
Deep learning based models often achieve state-of-the-art performance in a wide range of natural language processing (NLP) tasks, which include open-ended tasks (e.g., story generation, brainstorming, and chat) and closed-ended tasks (e.g., summarization, question answering, and rewriting). To further enhance quality, there is a growing interest in scaling the model size and the amount of data used for training. These research efforts often overlook the impact of footprint metrics, such as high latency, high memory usage, and high energy consumption, on these deep learning models. A high footprint makes these models significantly inefficient for deployment on servers and devices such as tablets, handhelds, and wearables. Methods for improving model efficiency often come at the cost of degrading model quality.
In this dissertation, we address the central question: how can we push the envelope in improving the efficiency-quality tradeoff of deep learning models for on-device NLP tasks? To this end, we propose methods that take on-device efficiency constraints (e.g., ≤ 16 MB memory or ≤ 200 ms latency) to inform the design of the model architecture. We propose methods for the manual design of architecture for the auto-completion task (generate continuations for user-written prompts) that enjoy a better memory-accuracy tradeoff than existing auto-completion models (Chapter 2). Additionally, we introduce methods that can directly take efficiency constraints to automatically search for efficient sparsely activated architectures for machine translation tasks (Chapter 3) and efficient pretrained (task-agnostic) language modeling architectures (Chapter 4). Finally, in Chapter 5, we explore a novel use case of employing large language models to speed up architecture search, while maintaining the efficiency and quality of state-of-the-art neural architecture search algorithms.
|
Genre | |
Type | |
Language |
eng
|
Date Available |
2024-04-16
|
Provider |
Vancouver : University of British Columbia Library
|
Rights |
Attribution-NonCommercial-NoDerivatives 4.0 International
|
DOI |
10.14288/1.0441384
|
URI | |
Degree | |
Program | |
Affiliation | |
Degree Grantor |
University of British Columbia
|
Graduation Date |
2024-05
|
Campus | |
Scholarly Level |
Graduate
|
Rights URI | |
Aggregated Source Repository |
DSpace
|
Item Media
Item Citations and Data
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International