UBC Theses and Dissertations
Automatic speech quality analysis with application to speech training
Exner, Rolf
A number of aspects of speech training involve assessing the quality of the student's speech. It is of interest to determine whether such speech quality analysis can be done automatically. This thesis provides a preliminary answer to that question by proposing and then evaluating a set of quality measures for comparing the quality of two segments of speech. Speech quality is taken to be the lack of defects in the articulatory and prosodic components of speech. It is a non-quantitative definition from speech pathology that can meet the needs of speech training. Speech defects common among deaf children and students of English as a second language are reviewed, and classified according to this scheme. The speech quality measures are based on a linear prediction model of speech, and adapt several techniques from the field of speech recognition. Evaluations using speech with known quality defects show that the articulatory measures are effective in detecting most of the common errors of articulation, with the exception of ones between nasal sounds. The prosodic quality measures of loudness and timing give very useful indications of syllable stress and voicing errors. The timing measure is derived from the optimal time-warping curve between the two utterances, and provides an accurate means of tracking speed variations in speech. Differences between speakers tend to mask articulatory quality errors, but have little effect on the prosodic quality measures. An articulatory distance measure is proposed that partly counters these interspeaker differences. Work remains to be done in a number of key areas, but the results of this preliminary investigation suggest that automatic speech quality analysis by computer is practical and may one day become a versatile tool for speech training.
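The timing measure mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the thesis's exact formulation, so everything below is an assumption: a textbook dynamic time warping alignment with unit steps, scalar frame values standing in for linear prediction features, an absolute-difference frame distance, and a windowed slope of the warping curve as the speed estimate. The function names `dtw_path`, `warp_curve`, and `local_slope` are hypothetical.

```python
from math import inf


def dtw_path(ref, test, dist=lambda a, b: abs(a - b)):
    """Minimum-cost monotone alignment of two sequences as (ref_idx, test_idx) pairs."""
    n, m = len(ref), len(test)
    # cost[i][j] = best accumulated distance aligning ref[:i] with test[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(ref[i - 1], test[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def warp_curve(path, n_ref):
    """Mean aligned test index for each reference frame."""
    sums = [0.0] * n_ref
    counts = [0] * n_ref
    for i, j in path:
        sums[i] += j
        counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


def local_slope(curve, half_window=1):
    """Slope of the warping curve around each reference frame.

    Values above 1 mean the test utterance is slower than the reference
    at that point; values below 1 mean it is faster.
    """
    n = len(curve)
    slopes = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n - 1, i + half_window)
        slopes.append((curve[hi] - curve[lo]) / (hi - lo) if hi > lo else 1.0)
    return slopes


# Toy data (assumed): the test utterance repeats every reference frame twice,
# i.e. it is spoken at half speed.
reference = [0, 1, 2, 3]
slow_test = [0, 0, 1, 1, 2, 2, 3, 3]
path = dtw_path(reference, slow_test)
slopes = local_slope(warp_curve(path, len(reference)))
```

In this toy case the slope is 2 at every reference frame, reflecting the uniformly halved speaking rate. A real utterance would give a slope that varies along the curve, which is the kind of speed variation the abstract says the timing measure tracks.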