UBC Theses and Dissertations
Assessment of vocal pathology through computerized analysis of perturbation in vowels
Cox, Neil Bernard
This thesis involved the development, validation and "calibration" of computerized methodologies for analysis of short-time perturbations in vowels, including mathematical analyses of the effect of measurement errors, verification using synthesized data, and evaluation using real data. Such methodologies have been proposed for improved diagnosis and management of laryngeal pathology.
Significant effects were observed in mathematical analyses of quantization and pitch-period demarcation for three popular algorithms: the harmonics-to-noise ratio (HNR), the relative average perturbation (RAP) and the directional perturbation quotient (DPQ). A severe underestimation of the HNR caused by such errors was demonstrated. The effect was shown to depend on high frequency components of the vowel. Errors affecting the use of the RAP in measurement of jitter and shimmer were quantified, and methods of compensation were proposed. The DPQ demonstrated a dependence on perturbation magnitude. Such errors influence the interpretation and comparison of results.
A number of new measures were developed. The RAP and the DPQ were generalized for variation of the number and spacing of points. The HNR was modified to account for a data offset and for reduction of the influence of jitter and shimmer. A new measure of time domain noise called the correlation factor (CF) was introduced, along with new measures of cyclic perturbation. Issues in Fourier spectrum analysis that affect measures of spectral noise were discussed. Methods for taking advantage of fast Fourier transforms and window tapering were described, along with methods for reducing dependence on formant structure. A new method for "optimizing" pitch-period demarcation markers was shown to be effective at reducing errors for all but the most severely perturbed waveforms. Cross-correlation was combined with parabolic interpolation to obtain high resolution pitch-period demarcation at moderate sampling frequencies.
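The combination of cross-correlation and parabolic interpolation mentioned above can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the fixed-frame correlation, and the search-range parameters `f_lo` and `f_hi` are assumptions chosen for illustration. The idea is to locate the integer lag with the largest cross-correlation and then refine it to sub-sample resolution by fitting a parabola through the peak and its two neighbours.

```python
import numpy as np

def parabolic_offset(y_prev, y_peak, y_next):
    """Sub-sample offset (in samples) of the vertex of the parabola
    through three equally spaced points centred on the peak sample."""
    denom = y_prev - 2.0 * y_peak + y_next
    if denom == 0.0:
        return 0.0  # flat neighbourhood: no refinement possible
    return 0.5 * (y_prev - y_next) / denom

def estimate_period(x, fs, f_lo, f_hi):
    """Estimate the pitch period (seconds) of signal x sampled at fs Hz.

    Hypothetical helper: correlates a fixed-length reference frame with
    shifted copies of the signal, then refines the best integer lag.
    Assumes f_hi < 2 * f_lo so the two-period lag stays outside the search.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    min_lag = max(1, int(fs / f_hi))
    max_lag = int(np.ceil(fs / f_lo))
    frame = len(x) - max_lag - 1          # longest frame that fits every lag
    if frame <= 0:
        raise ValueError("signal too short for the requested search range")
    ref = x[:frame]
    # One extra lag on each side so the peak always has two neighbours.
    lags = np.arange(min_lag - 1, max_lag + 2)
    r = np.array([np.dot(ref, x[k:k + frame]) for k in lags])
    i = 1 + int(np.argmax(r[1:-1]))       # best integer lag (interior only)
    delta = parabolic_offset(r[i - 1], r[i], r[i + 1])
    return (lags[i] + delta) / fs
```

For a 123 Hz tone sampled at 10 kHz the true period is about 81.3 samples, so an integer-lag search alone can do no better than 81. The parabolic refinement recovers most of the fractional part without raising the sampling frequency.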
An analysis of synthetic vowels was used to comparatively evaluate the influences of fundamental frequency, vowel type, perturbation type, perturbation level, pitch-period demarcation and quantization. Some findings were: 1) Interpolation is recommended for most measures when the sampling frequency is 20 kHz or less. 2) Optimization of pitch-period markers significantly improved the analyses. 3) Both the offset and the accuracy of pitch-period demarcation can significantly affect measures of time domain noise. 4) Measures of shimmer and noise were affected by fundamental frequency and vowel type. 5) Jitter affected measures of other characteristics. 6) Window tapering reduced the sensitivity of measures of spectral noise to pitch-period demarcation errors. 7) Measures of spectral noise were far more sensitive to jitter than measures of time domain noise.
Prolongations of /a/ from 206 male subjects and 194 female subjects were analyzed. The computed measures were correlated with subjective judgements of hoarseness, and used to discriminate among pathologies. Some findings were: 1) Logarithmic transformation was recommended for measures of jitter and shimmer. 2) Measures of time domain noise were generally superior to measures of jitter, shimmer or spectral noise. 3) The best single measure was the correlation factor (CF). 4) The correlation with hoarseness was improved through linear combination of the CF with a measure of jitter, leading to r ≈ 0.84 for males and r ≈ 0.80 for females. 5) Segregation of sexes was recommended. 6) Improved classification for males was obtained through separation into four diagnostic classes. 7) Improved classification for both males and females was obtained through inclusion of measures of perturbation patterns. 8) In an open test, the best classifiers had an average recognition rate of approximately 74% for distinguishing normal speakers, and 71% for detecting cancer subjects. 9) Computer classification matched or exceeded the ability of trained listeners.
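As a rough illustration of the jitter and shimmer measures and the logarithmic transformation recommended above, the sketch below computes the commonly used three-point form of the relative average perturbation (RAP) and its logarithm. It is not the generalized RAP developed in the thesis. The function names, the base-10 logarithm, and the `floor` value that guards against taking the log of zero are all assumptions made for this example.

```python
import numpy as np

def rap(values):
    """Three-point relative average perturbation of a sequence.

    `values` holds successive pitch periods (for jitter) or successive
    peak amplitudes (for shimmer). Each interior value is compared with
    the mean of itself and its two neighbours; the mean absolute
    difference is divided by the mean of the whole sequence.
    """
    p = np.asarray(values, dtype=float)
    if p.size < 3:
        raise ValueError("need at least three values")
    smoothed = (p[:-2] + p[1:-1] + p[2:]) / 3.0
    return float(np.mean(np.abs(smoothed - p[1:-1])) / np.mean(p))

def log_rap(values, floor=1e-6):
    """Base-10 log of the RAP; `floor` is an assumed guard so that a
    perfectly regular sequence does not produce log(0)."""
    return float(np.log10(max(rap(values), floor)))
```

A perfectly regular sequence gives a RAP of zero, while a sequence that alternates between two values gives a clearly positive one. Taking the logarithm compresses the long upper tail that such ratios tend to have before they are correlated with listener ratings.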