UBC Theses and Dissertations

Methods for design of efficient on-device natural language processing architectures

Jawahar, Ganesh

Abstract

Deep learning based models often achieve state-of-the-art performance on a wide range of natural language processing (NLP) tasks, including open-ended tasks (e.g., story generation, brainstorming, and chat) and closed-ended tasks (e.g., summarization, question answering, and rewriting). To further enhance quality, there is growing interest in scaling the model size and the amount of training data. These research efforts often overlook the footprint of the resulting models, measured by metrics such as latency, memory usage, and energy consumption. A high footprint makes these models inefficient to deploy on servers and on devices such as tablets, handhelds, and wearables. Methods for improving model efficiency, in turn, often degrade model quality. In this dissertation, we address the central question: how can we push the envelope in improving the efficiency-quality tradeoff of deep learning models for on-device NLP tasks? To this end, we propose methods that use on-device efficiency constraints (e.g., ≤ 16 MB memory or ≤ 200 ms latency) to inform the design of the model architecture. We propose methods for manually designing architectures for the auto-completion task (generating continuations for user-written prompts) that achieve a better memory-accuracy tradeoff than existing auto-completion models (Chapter 2). Additionally, we introduce methods that take efficiency constraints directly as input to automatically search for efficient sparsely activated architectures for machine translation tasks (Chapter 3) and efficient pretrained (task-agnostic) language modeling architectures (Chapter 4). Finally, in Chapter 5, we explore a novel use case of employing large language models to speed up architecture search, while maintaining the efficiency and quality of state-of-the-art neural architecture search algorithms.

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International