UBC Theses and Dissertations
Investigation of time-domain measurements for analysis and machine recognition of speech
Ito, Mabo Robert
At present, in speech analysis and mechanical speech recognition work, spectral measurements are the conventional form of signal representation, and acoustical descriptions of speech sounds are usually given in terms of this representation. In this thesis, certain time-domain measurements are investigated as an alternative form of signal representation and as a basis for acoustical characterization of speech sounds. The primary measurements studied are the short-time averages of the zero-crossing rate of the acoustic waveform and the distribution patterns of the time intervals between zero-crossings. These measurements are found to be easy to implement with digital techniques and are implemented through digital computer simulation. Other advantages of these measurements include effectiveness in handling the large intensity range of speech sounds and the ability to track rapid transient phenomena such as the release of unvoiced stops. Computer software for an interactive graphics facility was developed for acquisition, presentation, manipulation and analysis of the acoustic speech data. One of the pattern analysis programs, for the display of time-interval distribution data, yielded a visual presentation which could be compared to frequency spectrograms. Theoretical expressions are developed to relate the time-domain and spectral representations for some phone types, and these relationships are compared with experimental results. These expressions show that important spectral characterization features are accounted for. These findings, combined with empirical observation of the utility of the time-domain signal representation in phonetic characterization, indicate that this form of representation is a useful alternative to the spectral representation. The speech materials employed were selected to study temporal structures and contextual variations of acoustic properties and to provide quantitative data useful for word recognition applications.
The vowels, fricatives and stops were the main phoneme classes studied. Quantitative data on the acoustic properties of the selected phonemes are presented and discussed in terms of i) our own spectral data, ii) other data reported in the literature and iii) simple production models. The time-domain signal representation was found to provide an effective means of analyzing and characterizing the acoustically complex stops and voiced fricatives. For the vowels and unvoiced fricatives, which are well suited to spectral analysis, the time-domain measurements were found to yield very simple and direct characterization features. Some limited phonemic decomposition and machine recognition work is described, which demonstrates the design of useful characterization features and provides a basis for further work.
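
The two primary measurements named in the abstract, the short-time zero-crossing rate and the distribution of intervals between zero-crossings, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the frame length and hop, the treatment of exact-zero samples, and the test tone are all assumptions made for illustration.

```python
import math
from collections import Counter


def zero_crossings(samples):
    """Return each index i at which the sign differs from the previous nonzero sample."""
    crossings = []
    prev_positive = None
    for i, x in enumerate(samples):
        if x == 0:
            continue  # assumption: an exact zero carries no sign of its own
        positive = x > 0
        if prev_positive is not None and positive != prev_positive:
            crossings.append(i)
        prev_positive = positive
    return crossings


def short_time_zcr(samples, sample_rate, frame_len, hop):
    """Zero crossings per second for each frame of frame_len samples."""
    crossings = zero_crossings(samples)
    frame_seconds = frame_len / sample_rate
    rates = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        # Count crossings detected at indices start+1 .. start+frame_len-1.
        count = sum(1 for i in crossings if start < i < start + frame_len)
        rates.append(count / frame_seconds)
    return rates


def interval_histogram(samples):
    """Count how often each interval (in samples) between successive crossings occurs."""
    crossings = zero_crossings(samples)
    return Counter(b - a for a, b in zip(crossings, crossings[1:]))


if __name__ == "__main__":
    fs = 8000  # hypothetical sampling rate in Hz
    # Hypothetical 100 Hz test tone; the phase offset avoids exact-zero samples.
    tone = [math.sin(2 * math.pi * 100 * n / fs + 0.3) for n in range(fs)]
    print(short_time_zcr(tone, fs, frame_len=800, hop=800)[:3])
    print(interval_histogram(tone).most_common(3))
```

For a pure tone of frequency f, the crossing rate is close to 2f per second and the interval histogram concentrates near half a period, which gives some intuition for how such measurements can carry spectral information. Both quantities depend only on the sign of the waveform, not its amplitude, which is consistent with the abstract's remark about handling the large intensity range of speech.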