Representation learning for Arabic dialect identification

UBC Theses and Dissertations

Representation learning for Arabic dialect identification Sullivan, Peter

Abstract

Arabic dialect identification (ADI) is an important component of the Arabic speech processing pipeline, particularly for dialectal Arabic automatic speech recognition (ASR) models. In this work, we present an overview of corpora and methods applicable to both ADI and dialectal Arabic ASR, then benchmark two approaches to using pre-trained speech representation models for ADI: direct fine-tuning, and using fixed representations extracted from pre-trained models as an intermediate step in the ADI process. We train and evaluate our models on the fine-grained ADI-17 Arabic dialect corpus (92% F1 for our fine-tuned HuBERT model), and further probe generalization by evaluating the trained models on the coarse-grained ADI-5 (80% F1 for fine-tuned HuBERT).

Item Metadata

Title	Representation learning for Arabic dialect identification
Creator	Sullivan, Peter
Supervisor	Abdul-Mageed, Muhammad
Publisher	University of British Columbia
Date Issued	2022
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2022-08-22
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0417468
URI	http://hdl.handle.net/2429/82448
Degree (Theses)	Master of Library and Information Studies - MLIS
Program (Theses)	Library and Information Studies
Affiliation	Arts, Faculty of; iSchool (Library, Archival and Information Studies)
Degree Grantor	University of British Columbia
Graduation Date	2022-11
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
