A Survey of Grapheme-to-Phoneme Conversion Methods

UBC Faculty Research and Publications

Cheng, Shiyang; Zhu, Pengcheng; Liu, Jueting; Wang, Zehua

Abstract

Grapheme-to-phoneme conversion (G2P) is the task of converting letters (grapheme sequences) into their pronunciations (phoneme sequences). It plays a crucial role in natural language processing, text-to-speech synthesis, and automatic speech recognition systems. This paper provides a systematic overview of G2P conversion from several perspectives. It first presents the conversion methods, with detailed discussion of those based on deep learning. For each method, the key ideas, advantages, disadvantages, and representative models are summarized. The paper then covers learning strategies and multilingual G2P conversion. Finally, it summarizes the commonly used monolingual and multilingual datasets, including Mandarin, Japanese, and Arabic, among others. Two tables report the performance of various methods on the relevant datasets. After this general overview of G2P conversion, the paper concludes with the current issues and future directions of deep learning-based G2P conversion.
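
To make the input and output of the task concrete, the following is a minimal sketch of a lexicon lookup with a crude letter-by-letter fallback. It is an illustration only and is not taken from the paper: the names TOY_LEXICON, LETTER_RULES, and g2p are hypothetical, the phoneme symbols are ARPAbet-style labels without stress marks, and the fallback rules are assumed purely for demonstration.

```python
from typing import Dict, List

# Hypothetical toy lexicon (assumption, not from the paper).
# Phonemes are ARPAbet-style symbols without stress marks.
TOY_LEXICON: Dict[str, List[str]] = {
    "cat": ["K", "AE", "T"],
    "phone": ["F", "OW", "N"],
    "speech": ["S", "P", "IY", "CH"],
}

# Hypothetical one-letter-to-one-phoneme fallback rules (assumption).
# Real orthographies are not this regular; this only shows the idea.
LETTER_RULES: Dict[str, str] = {
    "a": "AE", "b": "B", "d": "D", "g": "G", "i": "IH", "m": "M",
    "n": "N", "o": "AA", "p": "P", "s": "S", "t": "T",
}


def g2p(word: str) -> List[str]:
    """Map a grapheme sequence (a word) to a phoneme sequence."""
    key = word.lower()
    # Prefer an exact lexicon entry when one exists.
    if key in TOY_LEXICON:
        return list(TOY_LEXICON[key])
    # Otherwise map letter by letter, skipping letters with no rule.
    return [LETTER_RULES[ch] for ch in key if ch in LETTER_RULES]


if __name__ == "__main__":
    print("phone", "->", g2p("phone"))  # lexicon hit: 5 letters, 3 phonemes
    print("mat", "->", g2p("mat"))      # fallback: letter-by-letter rules
```

The "phone" entry shows why the task is not a simple character substitution: the grapheme and phoneme sequences can differ in length, so the alignment between them has to be stored or learned.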

Item Metadata

Title	A Survey of Grapheme-to-Phoneme Conversion Methods
Creator	Cheng, Shiyang; Zhu, Pengcheng; Liu, Jueting; Wang, Zehua
Publisher	Multidisciplinary Digital Publishing Institute
Date Issued	2024-12-17
Subject	grapheme-to-phoneme conversion; speech synthesis; machine learning; deep learning
Genre	Article
Type	Text
Language	eng
Date Available	2025-01-10
Provider	Vancouver : University of British Columbia Library
Rights	CC BY 4.0
DOI	10.14288/1.0447724
URI	http://hdl.handle.net/2429/90095
Affiliation	Applied Science, Faculty of; Non UBC; Electrical and Computer Engineering, Department of
Citation	Applied Sciences 14 (24): 11790 (2024)
Publisher DOI	10.3390/app142411790
Peer Review Status	Reviewed
Scholarly Level	Faculty; Researcher
Rights URI	https://creativecommons.org/licenses/by/4.0/
Aggregated Source Repository	DSpace
