Open Collections
UBC Theses and Dissertations
Towards Afrocentric natural language processing
Adebara, Ifeoluwanimi
Abstract
This dissertation centers on Natural Language Processing (NLP) for African languages, endeavoring to unravel the progress, challenges, and future prospects within this linguistic context. The research encompasses language identification and Natural Language Understanding (NLU), Natural Language Generation (NLG), and culminates in a comprehensive case study on machine translation.
The first chapter introduces the problem statement, articulates the motivation for addressing the issue, and presents the innovative solutions developed throughout this research. Chapter two discusses intricate details of African languages, offering insights into the genealogical classification, linguistic landscape, and the challenges of multilingual NLP. Building upon this foundation, the third chapter advocates for an Afrocentric approach to technology development, emphasizing the significance of aligning technology with the cultural values and linguistic diversity of African communities. It addresses challenges such as data scarcity and representation bias, spotlighting community-driven initiatives aimed at advancing NLP in the region.
The fourth chapter unveils AfroLID, a neural language identification tool designed for 517 African languages and language varieties, establishing itself as the new state-of-the-art solution for African language identification.
Chapter five introduces SERENGETI, a massively multilingual language model tailored to support 517 African languages and language varieties. Evaluation on AfroNLU, an extensive benchmark for African NLP, showcases SERENGETI’s superior performance, thereby paving the way for transformative research and development across a diverse linguistic landscape.
The sixth chapter addresses NLG challenges in African languages, presenting Cheetah, a language model designed for 517 African languages. Comprehensive evaluations underscore Cheetah’s capacity to generate contextually relevant text across various African languages.
The seventh chapter presents a case study on machine translation, focusing on Bare Nouns (BNs) translation from Yorùbá to English. This study illuminates the challenges posed by information asymmetry in machine translation and provides insights into the linguistic capabilities of Statistical Machine Translation (SMT) and Neural Machine Translation (NMT) systems. Emphasizing the importance of fine-grained linguistic considerations, the study encourages further research in addressing translation challenges faced by languages with BNs, analytic languages, and low-resource languages. In chapter eight, I conclude and discuss possible directions for future work.
Item Metadata
Title | Towards Afrocentric natural language processing
Creator | |
Supervisor | |
Publisher | University of British Columbia
Date Issued | 2024
Genre | |
Type | |
Language | eng
Date Available | 2024-02-27
Provider | Vancouver : University of British Columbia Library
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International
DOI | 10.14288/1.0440415
URI | |
Degree | |
Program | |
Affiliation | |
Degree Grantor | University of British Columbia
Graduation Date | 2024-05
Campus | |
Scholarly Level | Graduate
Rights URI | |
Aggregated Source Repository | DSpace