UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Speech recognition & diphone extraction for natural speech synthesis Darbandi, Hossein B.


Modern speech synthesizers use concatenated words and sub-word segments, such as diphones, to synthesize natural speech. Synthesizers available today can synthesize speech with only a limited selection of voices provided by the vendors. The voice segments (e.g. words & diphones) are often created using semi-manual processes that are prone to human error and make the segments non-uniform. The main goal of this thesis is developing an automatic method to segment and label a natural speech into words, diphones, and phonemes. To segment speech into words and sub-words, I use a speech recognition engine. The commercially available speech recognition engines do not provide all the necessary functionality to segment the speech into diphones accurately. As a result, I have developed an engine to segment speech. For developing the engine, I have employed HTK tools provided by Cambridge University, available for free.

Item Media

Item Citations and Data


For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.