Open Collections
UBC Theses and Dissertations
Speech recognition & diphone extraction for natural speech synthesis
Darbandi, Hossein B.
Abstract
Modern speech synthesizers concatenate words and sub-word segments, such as diphones, to synthesize natural speech. Synthesizers available today can synthesize speech only in the limited selection of voices provided by their vendors. The voice segments (e.g. words and diphones) are often created using semi-manual processes that are prone to human error and make the segments non-uniform. The main goal of this thesis is to develop an automatic method to segment and label natural speech into words, diphones, and phonemes. To segment speech into words and sub-words, I use a speech recognition engine. The commercially available speech recognition engines do not provide all the functionality needed to segment speech into diphones accurately. As a result, I have developed my own segmentation engine, built with the freely available HTK tools from Cambridge University.
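As a rough illustration of the segmentation goal described in the abstract, the sketch below derives diphone spans from a phoneme-level time alignment by cutting at the midpoint of each phone, which is one common convention for diphones. It is not taken from the thesis: the function name `diphones_from_phones`, the (start, end, label) tuple layout in milliseconds, and the sample timings are all hypothetical, and the thesis may place its boundaries differently.

```python
from typing import List, Tuple

# (start_ms, end_ms, label); this layout is an assumption for illustration.
Segment = Tuple[int, int, str]


def diphones_from_phones(phones: List[Segment]) -> List[Segment]:
    """Build diphones that run from the midpoint of one phone
    to the midpoint of the following phone."""
    diphones = []
    for (s1, e1, p1), (s2, e2, p2) in zip(phones, phones[1:]):
        mid1 = (s1 + e1) // 2  # midpoint of the left phone
        mid2 = (s2 + e2) // 2  # midpoint of the right phone
        diphones.append((mid1, mid2, f"{p1}-{p2}"))
    return diphones


# Hypothetical phone alignment, e.g. as a recognizer might produce.
phones = [
    (0, 100, "sil"),
    (100, 200, "k"),
    (200, 400, "ae"),
    (400, 500, "t"),
    (500, 600, "sil"),
]

for start, end, label in diphones_from_phones(phones):
    print(f"{label}: {start}-{end} ms")
```

Cutting at phone midpoints keeps each phone-to-phone transition inside a single segment, which is the usual motivation for concatenating diphones rather than whole phonemes.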
Item Metadata
Title | Speech recognition & diphone extraction for natural speech synthesis
Creator | Darbandi, Hossein B.
Publisher | University of British Columbia
Date Issued | 2002
Extent | 7268301 bytes
Genre |
Type |
File Format | application/pdf
Language | eng
Date Available | 2009-08-12
Provider | Vancouver : University of British Columbia Library
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.
DOI | 10.14288/1.0065384
URI |
Degree |
Program |
Affiliation |
Degree Grantor | University of British Columbia
Graduation Date | 2002-05
Campus |
Scholarly Level | Graduate
Aggregated Source Repository | DSpace