UBC Theses and Dissertations
Identification of invariant acoustic cues in stop consonants using the Wigner distribution
Garudadri, Harinath
It is a common belief that there are invariant acoustic patterns in speech signals, which can be related to their phonetic description. These patterns are expected to remain invariant, independent of the language, speaker, phonetic context, etc. Although many investigations based on short-time spectral analysis have established the feasibility of extracting invariant cues in certain contexts, they could not provide a set of invariant cues in any given phonetic context. In this thesis, the Wigner distribution (WD) was used to analyze speech signals for the first time, to investigate acoustic invariance. The WD, like the spectrogram, provides a time-frequency description of the signal. Unlike the spectrogram, it provides correct marginals in the time and frequency domains, but it is not a positive distribution. It is demonstrated here that the partially smoothed WD, in which both the properties of positivity and correct marginals are sacrificed to some extent, provides a better time-frequency resolution than short-time spectral analysis methods. An implementation and an interpretation of the partially smoothed WD are presented. The choice of smoothing parameters and the nature of cross-term suppression in a partially smoothed WD are discussed in detail. It is shown that the cross-terms in a partially smoothed WD do not mask the underlying nature of a signal in the time-frequency plane. A partially smoothed WD was used to investigate acoustic invariance in voiceless, unaspirated stop consonants spoken by native speakers of English, Telugu and French. Contrary to reports in the literature, it was shown that the features "diffuse-rising" and "compact" spectral shapes were not unique to alveolar and velar places of articulation, respectively, but depended on the vowel context.
The resulting ambiguities when specifying the place of articulation were resolved using Formant Onset Duration (time taken for the steady state formants to occur in the vocal tract after the consonantal release) and F₂ of the following vowel. The place of articulation was specified correctly for 86% of the tokens. Unlike in other investigations, the errors in specifying the place of articulation were uniformly distributed over all vowel contexts.
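The properties of the WD mentioned in the abstract (a correct time marginal, negative values, and cross-terms that smoothing suppresses) can be illustrated with a minimal NumPy sketch. This is not the implementation presented in the thesis: the function names `wigner_distribution`, `gaussian_kernel` and `smooth_wd`, the separable Gaussian smoothing, the two-tone test signal and the values of `sigma_t` and `sigma_f` are all assumptions chosen only for illustration. Only the time marginal is checked here, and the frequency axis is smoothed with zero padding rather than circularly.

```python
import numpy as np


def wigner_distribution(x):
    """Discrete Wigner distribution of a complex (analytic) signal.

    Returns a real (time, frequency) array. Bin k corresponds to
    k / (2 * len(x)) cycles per sample, because the lag step is two samples.
    """
    x = np.asarray(x, dtype=complex)
    n = len(x)
    w = np.zeros((n, n))
    for t in range(n):
        # Largest symmetric lag that stays inside the signal at time t.
        max_lag = min(t, n - 1 - t)
        lags = np.arange(-max_lag, max_lag + 1)
        # Local autocorrelation x[t + m] * conj(x[t - m]), stored in FFT order.
        kernel = np.zeros(n, dtype=complex)
        kernel[lags % n] = x[t + lags] * np.conj(x[t - lags])
        # The kernel is Hermitian in m, so its transform is real.
        w[t] = np.fft.fft(kernel).real
    return w


def gaussian_kernel(sigma):
    """Unit-sum Gaussian weights truncated at about three standard deviations."""
    half = int(np.ceil(3 * sigma))
    g = np.exp(-0.5 * (np.arange(-half, half + 1) / sigma) ** 2)
    return g / g.sum()


def smooth_wd(w, sigma_t, sigma_f):
    """Smooth a WD along time (axis 0) and frequency (axis 1).

    Assumption: a separable Gaussian stands in for the thesis's smoothing.
    """
    gt = gaussian_kernel(sigma_t)
    gf = gaussian_kernel(sigma_f)
    out = np.apply_along_axis(lambda c: np.convolve(c, gt, mode="same"), 0, w)
    out = np.apply_along_axis(lambda r: np.convolve(r, gf, mode="same"), 1, out)
    return out


# Hypothetical test signal: two complex tones whose auto-terms fall on
# bins 32 and 64, with the cross-term midway between them at bin 48.
n = 128
t = np.arange(n)
x = np.exp(2j * np.pi * 16 * t / n) + np.exp(2j * np.pi * 32 * t / n)

w = wigner_distribution(x)
smoothed = smooth_wd(w, sigma_t=4.0, sigma_f=2.0)
```

In this sketch, summing `w` over frequency and dividing by `n` recovers the instantaneous power `np.abs(x) ** 2`, while the oscillating cross-term at bin 48 takes `w` negative at some instants. The smoothed version gives up the exact marginal in exchange for a much weaker cross-term.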