UBC Theses and Dissertations

Speech recognition & diphone extraction for natural speech synthesis

Darbandi, Hossein B.

Abstract

Modern speech synthesizers concatenate words and sub-word segments, such as diphones, to produce natural-sounding speech. Synthesizers available today can produce speech only in the limited selection of voices provided by their vendors. The voice segments (e.g., words and diphones) are often created using semi-manual processes that are prone to human error and yield non-uniform segments. The main goal of this thesis is to develop an automatic method to segment and label natural speech into words, diphones, and phonemes. To segment speech into words and sub-words, I use a speech recognition engine. Commercially available speech recognition engines do not provide all the functionality needed to segment speech into diphones accurately. I have therefore developed my own segmentation engine, built with the freely available HTK tools provided by Cambridge University.
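
To illustrate what segmenting speech into diphones can involve, here is a minimal Python sketch. It assumes a phoneme-level time alignment is already available (for example, from a recognizer) and cuts each diphone from the midpoint of one phoneme to the midpoint of the next. The midpoint rule, the function name `phones_to_diphones`, and the sample timings are illustrative assumptions, not the method or data described in the thesis.

```python
from typing import List, Tuple

# (label, start_seconds, end_seconds)
Segment = Tuple[str, float, float]


def phones_to_diphones(phones: List[Segment]) -> List[Segment]:
    """Derive diphone segments from consecutive phoneme segments.

    Assumption: each diphone runs from the midpoint of the left phoneme
    to the midpoint of the right phoneme. This is a common simplification,
    not necessarily the boundary rule used in the thesis.
    """
    diphones: List[Segment] = []
    for (left, l_start, l_end), (right, r_start, r_end) in zip(phones, phones[1:]):
        start = (l_start + l_end) / 2.0
        end = (r_start + r_end) / 2.0
        diphones.append((f"{left}-{right}", start, end))
    return diphones


# Hypothetical alignment for the word "cat", padded with silence.
alignment: List[Segment] = [
    ("sil", 0.00, 0.10),
    ("k", 0.10, 0.18),
    ("ae", 0.18, 0.34),
    ("t", 0.34, 0.42),
    ("sil", 0.42, 0.50),
]

diphones = phones_to_diphones(alignment)
for label, start, end in diphones:
    print(f"{label}: {start:.2f}s to {end:.2f}s")
```

Cutting at phoneme midpoints keeps the transition between two sounds inside a single unit, which is the usual motivation for concatenating diphones rather than whole phonemes.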

Rights

For non-commercial purposes only, such as research, private study, and education. Additional conditions apply; see the Terms of Use: https://open.library.ubc.ca/terms_of_use.