ASSESSMENT OF VOCAL PATHOLOGY THROUGH COMPUTERIZED ANALYSIS OF PERTURBATION IN VOWELS

By

NEIL BERNARD COX

B.A.Sc., University of Alberta, 1978
M.A.Sc., University of British Columbia, 1981

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES
(Department of Electrical Engineering)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
December 1988
© Neil B. Cox, 1988

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Electrical Engineering
The University of British Columbia
Vancouver, Canada

ABSTRACT

This thesis involved the development, validation and "calibration" of computerized methodologies for analysis of short-time perturbations in vowels, including mathematical analyses of the effect of measurement errors, verification using synthesized data, and evaluation using real data. Such methodologies have been proposed for improved diagnosis and management of laryngeal pathology.

Significant effects were observed in mathematical analyses of quantization and pitch-period demarcation for three popular algorithms: the harmonics-to-noise ratio (HNR), the relative average perturbation (RAP) and the directional perturbation quotient (DPQ).
A severe underestimation of the HNR caused by such errors was demonstrated. The effect was shown to depend on high frequency components of the vowel. Errors affecting the use of the RAP in measurement of jitter and shimmer were quantified, and methods of compensation were proposed. The DPQ demonstrated a dependence on perturbation magnitude. Such errors influence the interpretation and comparison of results.

A number of new measures were developed. The RAP and the DPQ were generalized for variation of the number and spacing of points. The HNR was modified to account for a data offset and for reduction of the influence of jitter and shimmer. A new measure of time domain noise called the correlation factor (CF) was introduced, along with new measures of cyclic perturbation.

Issues in Fourier spectrum analysis that affect measures of spectral noise were discussed. Methods for taking advantage of fast Fourier transforms and window tapering were described, along with methods for reducing dependence on formant structure.

A new method for "optimizing" pitch-period demarcation markers was shown to be effective at reducing errors for all but the most severely perturbed waveforms. Cross-correlation was combined with parabolic interpolation to obtain high resolution pitch-period demarcation at moderate sampling frequencies.

An analysis of synthetic vowels was used to comparatively evaluate the influences of fundamental frequency, vowel type, perturbation type, perturbation level, pitch-period demarcation and quantization. Some findings were:
1) Interpolation is recommended for most measures when the sampling frequency is 20 kHz or less.
2) Optimization of pitch-period markers significantly improved the analyses.
3) Both the offset and the accuracy of pitch-period demarcation can significantly affect measures of time domain noise.
4) Measures of shimmer and noise were affected by fundamental frequency and vowel type.
5) Jitter affected measures of other characteristics.
6) Window tapering reduced the sensitivity of measures of spectral noise to pitch-period demarcation errors.
7) Measures of spectral noise were far more sensitive to jitter than measures of time domain noise.

Prolongations of /a/ from 206 male subjects and 194 female subjects were analyzed. The computed measures were correlated with subjective judgements of hoarseness, and used to discriminate among pathologies. Some findings were:
1) Logarithmic transformation was recommended for measures of jitter and shimmer.
2) Measures of time domain noise were generally superior to measures of jitter, shimmer or spectral noise.
3) The best single measure was the correlation factor (CF).
4) The correlation with hoarseness was improved through linear combination of the CF with a measure of jitter, leading to r = .84 for males and r = .80 for females.
5) Segregation of sexes was recommended.
6) Improved classification for males was obtained through separation into four diagnostic classes.
7) Improved classification for both males and females was obtained through inclusion of measures of perturbation patterns.
8) In an open test, the best classifiers had an average recognition rate of approximately 74% for distinguishing normal speakers, and 71% for detecting cancer subjects.
9) Computer classification matched or exceeded the ability of trained listeners.
TABLE OF CONTENTS

CHAPTER 1: INTRODUCTION AND BACKGROUND
  1.1 BACKGROUND
  1.2 FACTORS IN VOICE ANALYSES
  1.3 OVERVIEW OF PREVIOUS WORK
    1.3.1 INVERSE FILTERING
    1.3.2 PATHOLOGY CLASSIFICATION
CHAPTER 2: OBJECTIVES AND THESIS OUTLINE
  2.1 FOCUS AND RATIONALE
  2.2 SPECIFIC ISSUES ADDRESSED
  2.3 THESIS ORGANIZATION
CHAPTER 3: VOWEL SYNTHESIS
  3.1 FILTER DESIGN
    3.1.1 ADAPTATION FOR HIGH FREQUENCY SYNTHESIS
  3.2 IMPLEMENTATION OF PERTURBATIONS
  3.3 POSITIONING OF PITCH-PERIOD DEMARCATION MARKERS
  3.4 IMPLEMENTATION CONSIDERATIONS
  3.5 DISCUSSION
    3.5.1 ALTERNATIVE APPROACHES
CHAPTER 4: PITCH-PERIOD DEMARCATION
  4.1 PITCH-PERIOD DEMARCATION PROCEDURE
  4.2 PITCH-PERIOD MARKER OPTIMIZATION
    4.2.1 EVALUATION
  4.3 DISCUSSION
CHAPTER 5: ALGORITHMS FOR MEASURING VOWEL PERTURBATIONS
  5.1 THE RELATIVE AVERAGE PERTURBATION (RAP)
    5.1.1 CONSIDERATIONS FOR PARAMETER SELECTION
  5.2 TIME DOMAIN HARMONICS-TO-NOISE RATIOS (HNRS)
    5.2.1 SINGLE PASS FORMULATIONS
    5.2.2 RELATIONSHIP WITH A CORRELATION COEFFICIENT
    5.2.3 RELATIONSHIP WITH MILENKOVIC'S MEASURE
  5.3 THE PITCH-PERIOD CORRELATION FACTOR (CF)
  5.4 THE SPECTRAL HARMONICS-TO-NOISE RATIO (SHNR)
    5.4.1 DESCRIPTION OF SHNR ALGORITHMS
    5.4.2 ESTIMATION OF FOURIER SERIES COEFFICIENTS
    5.4.3 OTHER CONSIDERATIONS
  5.5 THE DIRECTIONAL PERTURBATION QUOTIENT (DPQ)
  5.6 CYCLIC PERTURBATION FACTORS
  5.7 SUMMARY
CHAPTER 6: ANALYSIS OF ERRORS IN VOWEL PERTURBATION MEASURES
  6.1 THE RELATIVE AVERAGE PERTURBATION (RAP)
    6.1.1 EXPECTED VALUES FOR ANALYSIS OF JITTER
      6.1.1.1 PREDICTING AND COMPENSATING FOR ERROR EFFECTS
    6.1.2 EXPECTED VALUES IN ANALYSIS OF SHIMMER
    6.1.3 VERIFICATION
    6.1.4 RECOMMENDED SAMPLING CONDITIONS; JITTER ANALYSIS
    6.1.5 RECOMMENDED SAMPLING CONDITIONS; SHIMMER ANALYSIS
      6.1.5.1 LEVEL QUANTIZATION
      6.1.5.2 TIME QUANTIZATION
    6.1.6 SUMMARY OF CONSIDERATIONS FOR THE RAP
  6.2 THE HARMONICS-TO-NOISE RATIO (HNR)
    6.2.1 ANALYSIS OF THE EFFECTS OF MEASUREMENT ERRORS
    6.2.2 SUMMARY OF CONSIDERATIONS FOR THE HNR
  6.3 THE DIRECTIONAL PERTURBATION QUOTIENT (DPQ)
    6.3.1 EFFECTS OF QUANTIZATION AND A CENTER-LIMIT
    6.3.2 RECOMMENDED SAMPLING CONDITIONS
    6.3.3 ERROR REJECTION THROUGH A CENTER-LIMIT
      6.3.3.1 RESULTS
    6.3.4 SUMMARY OF CONSIDERATIONS FOR THE DPQ
CHAPTER 7: CALIBRATION OF PERTURBATION MEASURES
  7.1 CALIBRATION USING SYNTHETIC VOWELS
    7.1.1 ORGANIZATION OF FIGURES
    7.1.2 MEASURES OF JITTER
    7.1.3 MEASURES OF SHIMMER
    7.1.4 MEASURES OF TIME DOMAIN NOISE
      7.1.4.1 COMPARISON OF HNR CONFIGURATIONS
    7.1.5 MEASURES OF SPECTRAL NOISE
      7.1.5.1 EFFECTS OF VARYING THE PROCESSING CONDITIONS
      7.1.5.2 DATA-SEGMENT DEMARCATION EFFECTS
      7.1.5.3 RELATIONSHIPS WITH VOWEL CHARACTERISTICS
      7.1.5.4 WINDOW TAPERING FOR SPECTRAL LEAKAGE
    7.1.6 MEASURES OF PERTURBATION PATTERNS
  7.2 VERIFICATION USING REAL VOWELS
  7.3 SUMMARY AND DISCUSSION
    7.3.1 SENSITIVITY TO PITCH-PERIOD DEMARCATION
    7.3.2 SAMPLING FREQUENCY AND INTERPOLATION
    7.3.3 DEPENDENCE ON VOWEL CHARACTERISTICS
CHAPTER 8: APPLICATION TO NORMAL AND PATHOLOGICAL SPEAKERS
  8.1 DATA DESCRIPTION
  8.2 SUBJECTIVE ASSESSMENTS
    8.2.1 RESULTS
  8.3 COMPUTER ASSESSMENTS
    8.3.1 DATA ACQUISITION
    8.3.2 PRELIMINARY INSPECTION OF THE MEASURES
    8.3.3 RELATIONSHIPS AMONG THE MEASURES
    8.3.4 CORRELATION WITH SEVERITY JUDGEMENTS
      8.3.4.1 INDIVIDUAL MEASURES
      8.3.4.2 PRINCIPAL COMPONENTS
    8.3.5 SINGLE MEASURE DISTINCTIONS AMONG PATHOLOGIES
    8.3.6 AUTOMATIC CLASSIFICATION OF PATHOLOGY
      8.3.6.1 ACCOUNTING FOR PERTURBATION MAGNITUDE
      8.3.6.2 CLASSIFIER DESIGN AND EVALUATION
      8.3.6.3 DIMENSIONALITY REDUCTION
      8.3.6.4 MEASURE SELECTION
      8.3.6.5 CLOSED TEST RESULTS
      8.3.6.6 OPEN TEST RESULTS
  8.4 SUMMARY
    8.4.1 SUBJECTIVE JUDGEMENTS
    8.4.2 DISTRIBUTION OF MEASURES
    8.4.3 CORRELATION WITH JUDGED SEVERITY
    8.4.4 SEPARATION OF SEXES
    8.4.5 CLASSIFICATION OF PATHOLOGY
    8.4.6 COMPARISON OF HUMAN AND COMPUTED CLASSIFICATION
    8.4.7 RELATIONSHIP TO OTHER STUDIES
CHAPTER 9: SUMMARY AND FUTURE DIRECTIONS
  9.1 MATHEMATICAL ANALYSES
  9.2 MEASURE DEVELOPMENT
  9.3 PITCH-PERIOD DEMARCATION
  9.4 VOWEL SYNTHESIS AND MEASURE CALIBRATION
  9.5 MEASURE EVALUATION
  9.6 FUTURE DIRECTIONS
BIBLIOGRAPHY
APPENDIX A: GLOSSARY OF MEDICAL TERMS
APPENDIX B: PARAMETERS FOR VOWEL SYNTHESIS
APPENDIX C: REVIEW OF FOURIER SPECTRUM ANALYSIS
APPENDIX D: DIGITAL RESAMPLING WITH LAGRANGE INTERPOLATION
APPENDIX E: STRATEGIES FOR HNR ERROR ANALYSIS
APPENDIX F: SUMMARY OF VOWEL PERTURBATION MEASURES

LIST OF TABLES

3.1 Formant Frequency and Bandwidth Specifications
3.2 LDELAY for Pitch-Period Marker Offsets
4.1 Pitch-Period Marker Optimization; Analysis Parameters
4.2 Pitch-Period Marker Optimization; Error Variances
5.1 Computational Requirements for HNR Formulations
6.1 Minimum Sampling Frequencies for Jitter Analysis using the RAP
6.2 Minimum Data Means for Shimmer Analysis using the RAP
6.3 Time Quantization in Shimmer Analysis using the RAP
6.4 Minimum Data Means for Shimmer Analysis using the DPQ
6.5 Minimum Sampling Frequencies for Jitter Analysis using the DPQ
6.6 The Range over which the ERF[u] Series is Accurate
7.1 Figure Numbers for Measure Calibration Figures
7.2 Comparison of HNR Estimates for Synthetic /a/ Vowels
7.3 PDAVDB for Real /a/ Vowels
7.4 PAAVDB and PSAVDB for Real /a/ Vowels
7.5 Comparison of HNR Estimates for Real /a/ Vowels
7.6 SHNR_ros for Real /a/ Vowels
7.7 SHNR_sor for Real /a/ Vowels
7.8 Summary of Dependencies on Pitch-Period Demarcation
7.9 Summary of Recommended Sampling Frequencies
8.1 Age and Pathology Incidence within Groups
8.2 Subjective Judgments of Severity; Summary
8.3 Subjective Judgments of Severity; Class Means
8.4 Subjective Judgments of Pathology Type; Repeatability
8.5 Subjective Judgments of Pathology Type; Males
8.6 Subjective Judgments of Pathology Type; Females
8.7 Details of Subjects Identified as Outliers
8.8 Correlations among Measures; Males
8.9 Correlations among Measures; Females
8.10 Overall Correlation within Types of Measures
8.11 Correlation with Subjective Judgements of Severity; Individual Measures
8.12 Backward Elimination Results; Principal Components
8.13 Summary of Principal Components
8.14 Correlation with Subjective Judgements of Severity; Principal Components
8.15 Univariate F-Ratios for Differences among Classes
8.16 Closed test Classification Results; Measure Types
8.17 Measures Excluded from the Classifiers in Table 8.16
8.18 Closed test Classification Results; Perturbation Types
8.19 Measures Excluded from the Classifiers in Table 8.18
8.20 Open test; Log-Magnitude Measures; Males 4-Class
8.21 Open test; Log-Magnitude Measures; Males 2-Class
8.22 Open test; Selected 4-Class Classifiers; Males
8.23 Open test; Selected 2-Class Classifiers; Males
8.24 Open test; Log-Magnitude Measures; Females
8.25 Open test; Selected Class Classifiers; Females
B1 Parameters for Synthesis of /a/
B2 Parameters for Synthesis of /i/
B3 Parameters for Synthesis of /u/
C1 Relationships between the Time and Frequency Domains
D1 Summation Weights for Lagrange Interpolation
F1 Summary of Names of Vowel Measures

LIST OF FIGURES

1.1 The Linear Voice Production Model
3.1 Sampling Frequency Variation in Vowel Synthesis
3.2 Spectral Match of High Frequency Vowel Synthesizers
4.1 Pitch-Period Marker Optimization
4.2 Pitch-Period Marker Optimization: Variation of the Reference Segment Length
4.3 Pitch-Period Marker Optimization: The Effect of a Demarcation Offset
6.1 Center-Limits to Counteract Quantization in the RAP
6.2 Quantization Effects in the RAP
6.3 Predicted Quantization Effects in the HNR
6.4 Predicted Quantization Effects in the DPQ
6.5 Comparison of Expected Values for the DPQ
6.6 The Probability of a Sign Error in the DPQ
7.1 PDAVDB; Sampling and Interpolation Effects
7.2 PDAVDB; Fundamental Frequency Dependence
7.3 PDAVDB; Perturbation Profile
7.4 PAAVDB & PSAVDB; Sampling and Interpolation Effects
7.5 PAAVDB & PSAVDB; Fundamental Frequency Dependence
7.6 PAAVDB & PSAVDB; Perturbation Profile
7.7 PAAVDB & PSAVDB; Dependence on Vowel Type
7.8 PAAVDB & PSAVDB; Pitch-Period Demarcation Offset
7.9 HNRN & CF; Sampling and Interpolation Effects
7.10 HNRN & CF; Fundamental Frequency Dependence
7.11 HNRN & CF; Perturbation Profile
7.12 HNRN & CF; Dependence on Vowel Type
7.13 HNRN & CF; Pitch-Period Demarcation Offset
7.14 SHNR_ros & SHNR_sor; Sampling and Interpolation Effects
7.15 SHNR_ros & SHNR_sor; Variation of Data-Segment Length
7.16 SHNR_ros & SHNR_sor; Data-Segment Demarcation Errors
7.17 SHNR_ros & SHNR_sor; Data-Segment Demarcation Offset
7.18 SHNR_ros & SHNR_sor; Dependence on Vowel Type
7.19 SHNR_ros & SHNR_sor; Fundamental Frequency Dependence
7.20 SHNR_ros & SHNR_sor; Perturbation Profile
7.21 SHNR_ros & SHNR_sor; The Effect of Window Tapering
7.22 SHNR_ros & SHNR_sor; Pitch-Period Demarcation Errors with a Hanning Window
7.23 PSAVDB; Real Vowels
7.24 HNRN & CF; Real Vowels
7.25 SHNR_ros & SHNR_sor; Real Vowels
7.26 Summary of Sensitivity to Formant Structure and Fundamental Frequency
7.27 Summary of Influences of Jitter, Shimmer and Noise
8.1 Subjective Judgements of Severity; Histograms
8.2 Pooled Histograms; Log-Magnitude Measures
8.3 Pooled Histograms; Pattern Measures
8.4 Pooled Scatter Plots; Log-Magnitude Measures
8.5 Inter-Class Differences; Log-Magnitude Measures
8.6 Inter-Class Differences; Pattern Measures
8.7 Inter-Class Differences; Principal Components
B1 Parameters for Synthesis of /a/
B2 Parameters for Synthesis of /i/
B3 Parameters for Synthesis of /u/
B4 Error Variances for Vowel Synthesis Filters
C1 The Frequency Spectra of Windowed Sinusoids

ACKNOWLEDGEMENT

This research was supported in part by a Science Council of B.C. GREAT scholarship and by a National Health Research and Development pre-doctoral fellowship. Support arranged by Dr. Murray D. Morrison and Dr. Mabo R. Ito is also acknowledged. Equipment was purchased with the aid of grants from the B.C. Health Care Research Foundation and the P.A. Woodward Foundation.

The cooperation of Dr. Morrison in providing access to equipment and data at the Voice Laboratory at Vancouver General Hospital made this project possible. The clinical input from Dr. Morrison and Ms. Linda A. Rammage was invaluable. The supervision of Dr. Ito and Dr. James A. McEwen is gratefully acknowledged.

Finally, acknowledgement of my wife Leila for her support, devotion and occasional editorial suggestions is anything but a token gesture.

CHAPTER 1: INTRODUCTION AND BACKGROUND

This research explores methodology for computerized assessment of vocal pathology through analysis of sustained vowels. Techniques and terminology from seemingly diverse areas of Engineering, Medicine and Speech Pathology were adopted. A brief glossary of medical terms (Appendix A) is included for readers that do not have a medical background. Appendices that provide background information for some of the processing techniques are also included.

"Vocal pathology" can loosely be defined as a condition that causes disruption of the normal mechanisms of voice production. A "laryngeal pathology" is a subset of those conditions for which there is a functional or organic disturbance to the vocal cords. However, the distinction is often not clear in practice, and the two terms were used interchangeably here.

One major application for computerized assessment is screening for laryngeal pathology. The major symptom leading a person with vocal pathology to seek medical attention is often a persistent degradation of voice quality. Unfortunately, most General Practitioners (GPs) lack the training, experience and equipment to perform many of the current assessment techniques. Because many such voice disorders either disappear over time or can be successfully treated through behavioral advice or antibiotics, there are often significant delays before specialized medical attention is sought. Computerized analyses can potentially provide GPs with information that facilitates prompt and appropriate referral.

In diseases such as laryngeal cancer, delays can greatly increase the cost of treatment, decrease the chances of survival and lead to excessive suffering. If detected and treated at an early stage, laryngeal cancer can be cured more than 90% of the time through radiation therapy alone, and the patient is left with no physical disability (Harwood et al. 1983, Kaplan et al. 1983).
The rate of survival drops quickly for later stages of cancer, and it is generally necessary to remove the patient's larynx surgically (i.e., laryngectomy). A conservative estimate of the added cost of laryngectomy, including three to four weeks of hospitalization, surgical fees and rehabilitation, is approximately $22,000. In addition, few of these patients are able to return to the work force. The time it takes to progress to the later stages is highly variable, but can be as little as a few months from the onset of symptoms (Morrison 1988).

Computerized methods of evaluation can also be usefully applied to the management of vocal pathology. A lingering question in the treatment of any disease is whether or not the treatment is effective. With voice problems this question is often difficult to answer, as universally accepted methods for objective measurement of vocal abnormality do not presently exist. Objective measures provided by computerized analysis would facilitate comparison of clinical findings. Such measures could also be used in a biofeedback capacity, allowing patients to improve their vocal habits through trial and error. Finally, objective measures provide insight into relationships between symptoms, physiology and pathology.

1.1 BACKGROUND

This section provides a general overview of methods for evaluating pathological voice. Current clinical practices are briefly discussed, along with various research methodologies. A more comprehensive overview can be found in Childers (1977). Descriptions of acoustic theory and physiology of voice production can also be found elsewhere (e.g., Fant 1960; Rabiner & Schafer 1978).

When a patient consults a GP regarding a voice problem, demographic information accompanied by a general external physical examination are the primary tools for determining the cause. Otolaryngologists are able to more precisely identify the problem through indirect visualization of the larynx (i.e., indirect laryngoscopy). In the last 10 to 15 years, fiber-optic laryngoscopes have become popular (Fujimura 1981; Morrison 1984). These instruments allow for closer and more detailed visualization of the larynx and can be connected to video cassette recorders. Laryngostroboscopes (a fiber-optic laryngoscope with a light source strobed at approximately the same frequency as the vocal cord vibration) make it possible to visualize vocal fold vibratory patterns. Nasolaryngoscopes (a flexible fiber-optic laryngoscope inserted through a nostril) allow for laryngeal visualization during continuous speech. Depending on the condition, a number of other methods are used, including biopsy and histological analysis, electromyography, ultrasound imaging and radiography.

Other methods for obtaining information about laryngeal function have been proposed. Vocal cord vibration has been studied through high speed cinematography (Timcke, Moore & Von-Leden 1958, 1959; Von-Leden, Moore & Timcke 1960; Booth & Childers 1979; Childers, Mott & Moore 1980). While useful for research purposes, this approach is limited by the high cost of equipment and the danger of tissue damage from high intensity light sources. Electroglottography (i.e., measurement of the electrical resistance between two surface electrodes placed either side of the larynx) is simple, inexpensive and non-invasive. It has been shown to be correlated with vocal cord activity (Fourcin 1981; Childers 1984).
Photoglottography (e.g., Harden 1975) measures the modulation of light shone through the glottis (i.e., the space between the vocal cords). Relevant information about vocal cord function is provided, but the method is more invasive than electroglottography. Finally, measurement of oral airflow provides an indication of phonatory function and efficiency (Isshiki 1981).

1.2 FACTORS IN VOICE ANALYSES

Desirable properties of an evaluation method are:
1) It should be quickly and easily performed without specialized training.
2) It should cause a minimum of discomfort to the patient.
3) Results should be replicable under a variety of conditions and at a variety of institutions.
4) It should provide information to the clinician that is useful in diagnosis or management of vocal pathology.
5) It should be unaffected or consistently affected by factors that are unrelated to the pathology.
6) The overall cost per test should be justified in terms of improved diagnosis, treatment or documentation.

Computer analysis of voice recordings is well suited to meeting these criteria. The acquisition of test samples is painless, quick and simple. The analysis can potentially be performed without the patient being present, thus providing service to rural areas without the cost and inconvenience of travel. The cost of suitable equipment has fallen dramatically in the past decade, and the increased use of computers in the medical profession means that much of the required hardware is often available in the physician's office. If a centralized computer is used, test samples could be communicated over a telephone. If the bandwidth and noise restrictions are significant, audio recordings made on a home stereo system could be delivered by mail.

Control of external sources of variability is a significant problem in voice analysis. The characteristics of a person's voice are affected by a variety of external factors, including phonetic context, intonation, age, sex, anatomy, lifestyle, language, emotions, vocal fatigue, mental fatigue and coexisting ailments. The voice can vary greatly within a recording or between recordings. It is generally accepted that the ability to alter voice quality through muscle action is unconsciously used to compensate for a vocal pathology. This effect is exacerbated by the patients' desire to "make a nice sounding ah" for the doctor. Also, there is a tendency for subjects to mimic the clinician and to place their own interpretation on the task being asked of them. Finally, a voice sample made prior to the onset of the pathology is generally not available for comparison.

The consistency and quality of signals being analyzed are also significant concerns. Because data is acquired by different clinicians at different institutions, precise control of variables such as microphone-to-mouth distance, recorder gain and acoustical environment cannot be expected. System calibration and equipment quality will also vary. Methods that are highly sensitive to these influences have limited usefulness.

An additional consideration for the cancer screening application is the low incidence of this disease. There are fewer than ten new cases of laryngeal cancer per 100,000 population per year (National Institutes of Health 1975). Consequently, the rate of false positive errors in a practical screening system must be low.
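
The effect of low incidence on screening can be illustrated with a short calculation. The following is a minimal sketch, not part of the original analysis: it applies Bayes' rule to estimate the fraction of positive screening results that would be true cases. The function name, the detection rate of 90% and the specificity values are hypothetical, and the quoted incidence of ten cases per 100,000 is treated, as a simplifying assumption, as the prevalence in the screened population.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Fraction of positive screening results that are true cases (Bayes' rule)."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical figures, for illustration only.
PREVALENCE = 10 / 100_000   # upper bound quoted in the text, treated as prevalence
SENSITIVITY = 0.90          # assumed rate of detecting true cancer cases

for specificity in (0.90, 0.99, 0.999):
    ppv = positive_predictive_value(PREVALENCE, SENSITIVITY, specificity)
    print(f"specificity = {specificity:.3f}   positive predictive value = {ppv:.4f}")
```

Under these assumptions, a screen that wrongly flags 10% of healthy subjects would yield fewer than one true case per thousand positive results, and even a false positive rate of 0.1% would yield only roughly 8% true cases among the positives. In practice the screened population would consist of people presenting with voice complaints, for whom the prevalence is presumably higher than in the general population.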

1.3 OVERVIEW OF PREVIOUS WORK

The voice signals from which information related to vocal pathology can potentially be extracted range from sustained vowels to continuous speech. Sustained vowels have received the greatest attention in the literature, and are the focus of the present project. The underlying hypothesis is that a laryngeal pathology disrupts the normal vibration of the vocal cords and produces a consistent and measurable change in the acoustic output. Because it can be assumed that the supra-laryngeal structures remain relatively stationary throughout the vowel, signal fluctuations can be intuitively attributed to laryngeal phenomena. Inverse filtering techniques can be employed in an attempt to remove the supra-laryngeal effects (e.g., Koike & Markel 1975).

Vocal exercises can potentially be designed to circumvent the effects of muscular compensation and establish the limits of vocal performance. They can also be used to observe characteristics that occur only under certain conditions.

Continuous speech can potentially provide the most information, and it has been argued that the characteristics of many vocal pathologies are not represented in simpler speech segments (Askenfelt & Hammarberg 1980, 1981; Frokjaer-Jensen & Pritz 1976; Lofqvist & Manderson 1987). However, segmentation and removal of articulatory components are challenging problems, and the added variability of pathological speech exacerbates the problems.

It is useful at this point to introduce certain definitions for vowels. The acoustic signal generated during the phonation of a vowel is roughly characterized by quasi-periodic bursts of sound energy, each followed by a period of resonant decay. A "pitch-period" is frequently defined as the time delay between two successive bursts, although depending on the context it can also refer to the waveform that occurs during that time. To avoid this ambiguity here, the time delay was redefined as the "pitch-period duration". "Pitch" was defined here as the inverse of the pitch-period duration. While defining pitch in this manner suffices for the present study, it is recognized that this definition is not suitable for all areas of speech science. Pitch is a perceptual concept that is closely but not solely related to vocal cord periodicity.

Rapid fluctuations in the timing and amplitude of the sound bursts of a vowel are common. These fluctuations are defined in the literature as "jitter" and "shimmer", respectively. The causes of these perturbations are not well understood. Laryngeal asymmetry, random variations in tissue innervation, reflex mechanisms and airflow characteristics have been implicated (Baer 1981; Heiberger & Horii 1982). Nonetheless, it has been demonstrated using synthesized vowels that hoarseness or roughness is perceived when there are increased amounts of jitter and shimmer (Wendahl 1963, 1966) or when there are gross cycle-to-cycle waveform changes (Coleman 1971). It is not surprising, therefore, that increased levels of jitter and shimmer are observed in pathological voices (e.g., Lieberman 1961, 1963; Crystal, Montgomery, Jackson & Johnson 1970; Koike, Takahashi & Calcaterra 1977; Heiberger & Horii 1982; Zyski, Bull, McDonald & Johns 1984; Wolfe & Steinfatt 1987).
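
The definitions above can be made concrete with a short sketch. The code below is an illustration under stated assumptions rather than any of the measures developed in this thesis: it takes a hypothetical list of pitch-period marker times and per-cycle peak amplitudes, derives the pitch-period durations and the pitch, and summarizes jitter and shimmer as the mean absolute cycle-to-cycle difference divided by the mean value. That simple ratio is a stand-in chosen for brevity; it is not the RAP or the DPQ, and all names and data values are invented for the example.

```python
def pitch_period_durations(marker_times):
    """Time delay between successive pitch-period markers, in seconds."""
    return [later - earlier for earlier, later in zip(marker_times, marker_times[1:])]


def mean_relative_perturbation(values):
    """Mean absolute cycle-to-cycle difference divided by the mean value.

    Illustrative measure only; not the RAP or DPQ discussed in the text.
    """
    if len(values) < 2:
        raise ValueError("at least two cycles are required")
    differences = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_difference = sum(differences) / len(differences)
    mean_value = sum(values) / len(values)
    return mean_difference / mean_value


# Hypothetical marker times (seconds) and peak amplitude of each cycle.
marker_times = [0.0000, 0.0100, 0.0201, 0.0300, 0.0402, 0.0500]
peak_amplitudes = [1.00, 0.97, 1.02, 0.99, 1.01]

durations = pitch_period_durations(marker_times)
pitch_hz = [1.0 / d for d in durations]               # pitch as the inverse of duration
jitter = mean_relative_perturbation(durations)        # timing perturbation
shimmer = mean_relative_perturbation(peak_amplitudes) # amplitude perturbation

print(f"mean pitch = {sum(pitch_hz) / len(pitch_hz):.1f} Hz")
print(f"jitter  = {100 * jitter:.2f} %")
print(f"shimmer = {100 * shimmer:.2f} %")
```

For these invented values the pitch is close to 100 Hz, the jitter is roughly 2.5% and the shimmer is roughly 3.3%. A perfectly periodic signal with constant amplitude would give zero for both.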
It has also been suggested that various types of laryngeal pathologies have characteristic patterns of jitter and shimmer, and patterns associated with laryngeal cancer have been reported (Koike 1968; Von-Leden & Koike 1970; Iwata & Von-Leden 1970; Iwata 1972).

Noise components are also common in vowels, and are attributable to turbulent airflow and inadequate closure of the glottis. Spectrographic analyses have shown that hoarseness is manifested by increased spectral noise throughout the speech frequency range (Yanagahara 1967). Some investigators have developed quantitative measures of spectral noise and have achieved some success in gauging levels of hoarseness (Kojima, Gould, Lambaise & Isshiki 1980; Kitajima 1981). Strong correlations with subjective judgements of hoarseness have also been reported for measures of time domain noise (Yumoto, Gould & Baer 1982; Yumoto 1983; Yumoto, Sasaki & Okamura 1984). There is some question as to the relationship between noise in the time domain and noise in the frequency domain (Klingholtz & Martin 1986; Cox et al. 1989a). Further discussion can be found in Section 7.3.3.

1.3.1 INVERSE FILTERING

Inverse filtering through linear prediction has been an important tool in the study of speech (Childers 1977; Monsen 1981; Rabiner & Schafer 1978, pp. 396-461; Rothenberg 1977, 1981). The basis for this approach is the linear model for voice production (Fant 1960; Rabiner & Schafer 1978, pp. 38-115) that is illustrated in Figure 1.1. The excitation for this model is a spectrally flat signal that has no precise physiological analog. For vowels the excitation is approximated by a string of impulses.
The "Glottal Shaping" filter converts the impulses into triangular-shaped pulses representing the glottal volume-velocity (V-V) waveform. The vocal tract and lip radiation filters represent the modulation imposed by supra-laryngeal anatomy.

FIGURE 1.1: THE LINEAR VOICE PRODUCTION MODEL
[Block diagram: EXCITATION, GLOTTAL SHAPING, VOCAL TRACT, LIP RADIATION, SYNTHESIZED SPEECH. The output of the glottal shaping stage is the GLOTTAL VOLUME VELOCITY WAVEFORM. The filter responses are plotted against FREQUENCY (kHz).]

With inverse filtering, the goal is to model and remove the influence of supra-laryngeal structures. Two methods have been described, depending on the desired output. For "residue inverse filtering", the desired output is an estimate of the excitation waveform. For "glottal inverse filtering", the desired output is an estimate of the glottal V-V waveform.

It stands to reason that the relatively chaotic characteristics of pathological speech compound the problems encountered for normal speakers. For example, the observation that the glottal source interacts with vocal tract resonances (Fant & Ananthapadmanabha 1982; Fant 1983) has prompted researchers to focus on closed portions of the vocal cord vibratory cycle for parameter estimation. For pathological speakers it is difficult to reliably identify the closed portion, and it is not unusual for a closed portion to be absent. Furthermore, the perturbations associated with pathologies affect the estimated filter parameters, theoretically resulting in removal of relevant information.

These concerns aside, two approaches for obtaining measures of laryngeal pathology through inverse filtering have been described.
Firstly, the output of the inverse filter, if it performs its job properly, should be relatively free of supra-laryngeal effects, and deviations of this output from a "normal" waveform may be good indicators of laryngeal pathology (Koike & Markel 1975; Davis 1976, 1981). Secondly, the pathological components may produce meaningful changes in the resulting filter coefficients (Deller 1979; Deller & Anderson 1980; Smith 1980, 1981; Smith & Childers 1983).

The choice of method is not clear. On one hand, an estimate of the glottal V-V waveform can be compared with physiologically observable phenomena for model validation and for identification of meaningful parameters. If the model is an accurate and complete representation of the events in pathological voice production, then all information of interest should be present. However, Davis (1976) argued that the residual signal should theoretically contain more information because the glottal V-V is a low-pass filtered version of the excitation. This added information may be relevant when the model is inaccurate. Furthermore, estimates of the glottal V-V have been found to be highly sensitive to the phase distortion observed in audio cassette tape recorders (Holmes 1975; Berouti, Childers & Paige 1977) and to recording conditions such as microphone position and external acoustic environment. While special purpose hardware and techniques have been developed to reduce these effects (Rothenberg 1977, 1981; Sondhi 1975), the cost of such equipment is difficult to justify for the present application.

1.3.2 PATHOLOGY CLASSIFICATION

Pattern classification techniques have been applied to the task of distinguishing laryngeal pathology.
Crystal, Montgomery, Jackson and Johnson (1970) applied linear discriminant analysis to separate normal from pathological speakers. Data were recorded using a throat microphone and twenty measures of jitter and shimmer magnitude were used. A closed test yielded approximately 90% accuracy at separating 72 normal males from 33 pathological males. The result for females was approximately 74% correct. There were 100 normal and 19 pathological females.

Zyski, Bull, McDonald and Johns (1984) used a discriminant analysis on various measures of jitter and shimmer. While no classification results were presented, significant differences were observed between 20 normal and 52 pathological subjects.

Ludlow, Bassich, Connor, Coulter and Lee (1985) used multiple regression in an attempt to derive a normative range for jitter and shimmer. The utility in separating 99 normal and 34 pathological subjects was evaluated. Regression was used to account for age, sex, smoking and drinking. A recognition rate of approximately 70% was reported.

Laver, Mackenzie, Hiller and Rooney (1985) reported closed test recognition rates of approximately 90% for both males and females. There were approximately 80 normal and 100 pathological samples for each sex. Seven measures of jitter and shimmer magnitude were combined with two directional perturbation quotients (Hecker & Kreul 1971). It was suggested that linear discriminant analysis was superior to a maximum likelihood approach. However, the differences in performance can also be attributed to increased dimensionality problems in the maximum likelihood classifier, leading to increased bias in the closed test.

Pattern recognition has also been applied to measures derived through inverse filtering.
Davis (1976, 1981) used six heuristic measures of the linear prediction residual and reported an open test performance of approximately 70% for distinguishing normal and pathological samples. It was also suggested that the classification performance improved with the use of the inverse filter. However, the measures which contributed most in the classifier could equally well have been computed without the inverse filter. Furthermore, an all-in-one strategy was used for automatic pitch-period demarcation and measure computation, but no data were given on how this program was tested and verified. This may be important, as a "glitch" of unknown origin was frequently observed in the residual signal when the same program was used in another study (Cox & Morrison 1983). This glitch affected the pitch-period demarcation and the computed measures.

Hiki, Imaizumi, Hirano, Matsushita & Kakita (1976) applied an approach similar to that of Davis to the linear prediction estimate of the glottal V-V waveform. Ten parameters were reduced to three dimensions using a principal components analysis. Results suggested that the cancer samples had increased levels of noise, and that non-cancerous pathologies had greater pitch-period aperiodicity. The normalized distance between groups suggested closed recognition rates of approximately 84% for detection of laryngeal pathology and for detection of cancer.

Deller (1979, 1980) pursued the hypothesis that pathological speech components are reflected in the coefficients of the inverse filter. Correspondingly, he defined statistical measures based on the scatter of the coefficients in the Z domain. A clustering analysis yielded promising results for synthesized vowels.
Smith (1980, 1981, 1983) modified Deller's method for application to real vowel samples, but obtained discouraging results (approximately 63% accuracy in an open test on 10 normal and 10 pathological males). Superior performance (approximately 75% accuracy) was obtained by Smith when the method was applied to electroglottographic waveforms.

CHAPTER 2: OBJECTIVES AND THESIS OUTLINE

2.1 FOCUS AND RATIONALE

The overall mission of this project was to develop and evaluate computerized methodologies for use by clinicians in the diagnosis and treatment of vocal pathologies. Four major applications for this research were identified in the introduction: screening for laryngeal pathology, charting of the progress of therapy, therapeutic feedback, and documentation and education. However, it is worth emphasizing that the methodologies are intended for use in conjunction with existing techniques, rather than as a replacement of those techniques.

The focus of this thesis is on the development and evaluation of measures of short-time perturbation in audio-recordings of isolated /a/ prolongations. Such measures are limited to pathologies for which a persistent perturbation is present, clinically noted as hoarseness. Intermittent problems such as phonation breaks and problems with voicing onset are excluded.

The rationale for using isolated vowel prolongations was discussed in the introduction. Briefly, the vocal tract resonances for isolated vowels can reasonably be assumed to be invariant, and the vocal cords are the only major constrictor and vibrator. The isolation provides a relatively long data segment that is relatively free of contextual and co-articulation effects. The use of /a/ has additional advantages.
It is a natural sound for most people to produce and prolong, regardless of linguistic background. It also provides a convenient separation of the first formant and the fundamental frequency, thus simplifying pitch-period demarcation and other analyses.

The focus on short-time perturbations also has advantages. Numerous previous studies have observed increased perturbation in pathological speakers. Such perturbations are simply measured and are relatively insensitive to phase distortion of recording equipment. Finally, modulation of these perturbations is not generally a controlled part of speech communication.

2.2 SPECIFIC ISSUES ADDRESSED

Two basic questions need to be asked when evaluating diagnostic methodologies:
1) Does it effectively measure the characteristic for which it was designed?
2) Does it add useful clinical information?
If the first question is not adequately answered, conclusions made regarding the second can be misleading.

Much of the work for this thesis was directed at the first question. A mathematical analysis was used to estimate the effect of data labelling and quantization on various perturbation measures. This highlighted the importance of accurate pitch-period demarcation, and prompted the development of an improved demarcation method. A systematic evaluation of the influence of various vowel characteristics was performed. This entailed the development of suitable methodology for vowel synthesis, generation of a database of test waveforms and comparison of measures computed from those waveforms. Specific questions that were addressed are summarized at the start of Chapter 7.

The question of clinical information was also addressed. Voice samples from 206 male and 194 female subjects were analyzed.
Vowel segments were randomly distributed on audio test tapes, and two experienced listeners (an Otolaryngologist and a Speech Pathologist) judged the severity of hoarseness and the probable pathology. The computed measures were then extracted from the same vowel samples and compared with the listeners' judgements. Pattern classification techniques were employed to evaluate the usefulness of the computed measures for discrimination among pathologies. Specific questions that were addressed are summarized later in this section and at the start of Chapter 8.

Fully automated methods of data acquisition, pitch-period demarcation and measure computation were not used. While automation is necessary for implementation of a clinically effective system, it was felt that interaction was needed for verification. Methods for automatic selection of representative portions of the signal are not simple. Such methods should be evaluated in isolation before incorporation into the overall system.

In summary, the following goals were defined:
1) Analyze the effect of quantization and data labelling on various measures of vowel perturbation.
2) Develop new measures with certain theoretical advantages over existing ones.
3) Develop and evaluate a method for precise demarcation of pitch-periods.
4) Develop methodology for synthesis of vowel waveforms that are suitable for "calibrating" the perturbation measures.
5) Perform a systematic evaluation of the sensitivity of the perturbation measures to various vowel characteristics.
6) Evaluate the perturbation measures individually and in combination with respect to their correlation with subjective judgements of hoarseness.
7) Assess the utility of the perturbation measures for determining the presence or type of laryngeal pathology.

2.3 THESIS ORGANIZATION

This thesis was organized into nine chapters. The first two introduce the subject area, provide background information and terminology, discuss the major application areas, discuss complicating factors and overview previous work. Chapter 3 discusses the synthesis of vowel test waveforms and provides specifications for synthesis of /a/, /i/ and /u/ at sampling frequencies between 10 kHz and 100 kHz. Chapter 4 presents the methodology for pitch-period demarcation. Chapter 5 describes the algorithms for measure computation. Chapter 6 summarizes the mathematical analysis of quantization and data labelling in three of the algorithms. Chapter 7 presents a calibration of the measures using synthetic vowel data. Chapter 8 presents an evaluation using real vowel samples. The final chapter provides an overall summary of the findings, and discusses directions for further research.

Six appendices were included. Appendix A compiles a list of medical terms. Appendix B contains parameters for vowel synthesis. Appendix C reviews relevant aspects of Fourier analysis. Appendix D describes Lagrange resampling. Appendix E details one stage of the error analysis for the harmonics-to-noise ratio. Appendix F summarizes the computed vowel measures.

CHAPTER 3: VOWEL SYNTHESIS

This chapter describes methods used for synthesizing vowel waveforms. The intended application is for calibration of computed measures of sustained vowels. A digital vowel synthesizer was designed to produce data with the following characteristics:
1) A "close" match with the spectral envelope of a vowel between 0 and 4.7 kHz.
2) Sharp attenuation above 5 kHz to permit downsampling without low-pass filtering.
3) Pitch-periods that do not all start at integer sampling intervals.
4) Pitch-period demarcation that is synchronized with the driving function of the synthesizer.
5) Pseudo-random jitter, shimmer and noise perturbations that can be independently varied over a range observed in vowels.
6) High temporal resolution for low levels of jitter.
7) Data generation at sampling frequencies between 10 kHz and 100 kHz.

3.1 FILTER DESIGN

A digital vowel synthesizer described in Rabiner and Schafer (1978, pp. 379-380) was adapted for this application. This synthesizer is based on digital filtering of an impulse train. The filter function is:

\[ H(z) = V(z)\,S(z) \tag{3.1} \]

where

\[ V(z) = \prod_{i=1}^{N} \frac{1 - 2e^{-\pi b_i T}\cos(2\pi f_i T) + e^{-2\pi b_i T}}{1 - 2e^{-\pi b_i T}\cos(2\pi f_i T)\,z^{-1} + e^{-2\pi b_i T}\,z^{-2}} \]

\[ S(z) = \frac{(1 - e^{-2\pi a T})(1 + e^{-2\pi b T})}{(1 - e^{-2\pi a T} z^{-1})(1 + e^{-2\pi b T} z^{-1})} \]

a = the first compensation pole (Hz)
b = the second compensation pole (Hz)
T = the sampling period (seconds/sample)
a = 200 Hz and b = 2500 Hz when T = 100 μsec
Filter sampling frequency = 1/T
N = the number of formants
f_i = the resonant frequency of the i'th formant (Hz)
b_i = the bandwidth of the i'th formant (Hz)

The vocal tract resonances were modelled as a cascade of digital resonators (V(z)). A fixed spectral compensation (S(z)) was added to approximate the glottal pulse and radiation contributions. Spectral compensation recommended for vowel synthesis at a sampling frequency of 10 kHz is given in Eq. (3.1). Formant frequencies and bandwidths for /a/, /i/ and /u/ can be found in Table 3.1 (Rabiner & Schafer 1978, pp. 74-77).

TABLE 3.1: FORMANT FREQUENCY AND BANDWIDTH SPECIFICATIONS

This table compiles formant specifications for synthesis of /a/, /i/ and /u/ vowels.
f_i = the frequency of the i'th formant
b_i = the bandwidth of the i'th formant

          /a/             /i/             /u/
 i    f_i    b_i      f_i    b_i      f_i    b_i
      (Hz)   (Hz)     (Hz)   (Hz)     (Hz)   (Hz)
 1     650    94       223    53       232    61
 2    1076    91      2317    59       597    57
 3    2463   107      2974   388      2295    66
 4    3558   199      3968   174      3850    43
 5    4631    90      4424   870

3.1.1 ADAPTATION FOR HIGH FREQUENCY SYNTHESIS

Increasing the sampling frequency (i.e., lowering T) in Eq. (3.1) results in an attenuation of higher frequency components. The significance of this effect is illustrated in Figure 3.1. Changing the sampling frequency by as little as 2 kHz caused a large attenuation effect. The attenuation can be attributed to an altered interaction of the filter poles with their reflections in the Z domain. Increasing the sampling frequency increases the separation of the filter poles from their reflections, leading to a reduction of the interaction. This is similar to the higher pole correction issue encountered for analog vowel synthesizers (e.g., Gold & Rabiner 1968).

FIGURE 3.1: SAMPLING FREQUENCY VARIATION IN VOWEL SYNTHESIS. Frequency responses of synthesis filters for /a/, /i/ and /u/ vowels are plotted for sampling frequencies between 10 kHz and 30 kHz. The filter is defined in Eq. (3.1). Formant specifications are in Table 3.1. The filter was designed for a sampling frequency of 10 kHz.

The following procedure was used to counteract the high frequency attenuation. The formant specifications in Table 3.1 were used in Eq. (3.1) to obtain reference filters with the desired frequency responses. A design filter was then constructed by changing the sampling period (T) and replacing S(z) with the following generalized spectral compensation filter:

\[ S'(z) = \frac{1 - e^{-2\pi a T}}{1 - e^{-2\pi a T} z^{-1}} \; \prod_{k=1}^{K} \frac{1 + 2e^{-\pi b_k T}}{1 + 2e^{-\pi b_k T} z^{-1}} \; \prod_{j=1}^{J} \frac{1 - 2e^{-\pi b_j T}\cos(2\pi f_j T) + e^{-2\pi b_j T}}{1 - 2e^{-\pi b_j T}\cos(2\pi f_j T)\,z^{-1} + e^{-2\pi b_j T}\,z^{-2}} \tag{3.2} \]

where
K = the number of high frequency first order poles
b_k = the k'th high frequency first order pole
J = the number of compensation resonators
f_j = the frequency of the j'th compensation resonator
b_j = the bandwidth of the j'th compensation resonator

Parameters for Eq. (3.2) were selected to obtain a "good" spectral match with the reference between 0 and 4.7 kHz, sharp attenuation above 5 kHz, and no added spectral peaks. The initial configuration at each sampling frequency was obtained by trial and error. For sampling frequencies of less than 11.2 kHz, satisfactory results were obtained with J=0 and K=1 or 2. The use of up to four resonators with f_j = 5 kHz was more appropriate at higher sampling frequencies (i.e., K=0 and 1 ≤ J ≤ 4). The inclusion of one additional resonator with f_j between 0 and 5 kHz allowed for "fine tuning" of the spectral match in some cases.

Compensation parameters were optimized using a "sectioning" method (Wilde & Beightler 1967). The average squared-magnitude-difference between the frequency responses of the design filter and the reference filter was minimized. This distance metric was chosen over the average squared-dB-difference because it places greater emphasis on the match at high power frequency components. The frequency responses were computed for 100 evenly spaced points between 0 and 4.7 kHz using a double precision (64-bit) Goertzel DFT algorithm (Oppenheim & Schafer 1975, pp. 287-291). The following parameters were adjusted: the low frequency first order pole (a), the high frequency first order poles (b_k), the bandwidth of the 5 kHz resonators (b_j), the bandwidth of the fine tuning resonator (b_j), and the frequency of the fine tuning resonator (f_j).
Only integer values for each parameter were considered. Dimensionality was reduced by equating the bandwidths of the 5 kHz resonators when more than one was present. Added spectral peaks were avoided by constraining the bandwidth of all resonators to greater than 400 Hz. Sharp attenuation above 5 kHz was assured, as the resonant frequency of all resonators was 5 kHz or less.

Compensation parameters for synthesis of /a/, /i/ and /u/ vowels at sampling frequencies between 10 kHz and 100 kHz are summarized in Appendix B. The maximum mismatch was 0.8 dB, 1.1 dB and 3.8 dB for /a/, /i/ and /u/, respectively. These mismatches occurred at low power frequency components. As an example, Figure 3.2 illustrates the spectral match of filters designed for a sampling frequency of 80 kHz.

3.2 IMPLEMENTATION OF PERTURBATIONS

Zero mean evenly distributed pseudo-random numbers for implementing the perturbations were obtained using the FORTRAN library function RAN(). Seed values were chosen to produce sequences with means and variances that closely approximate the expected values. Separate seed values were used for each type of perturbation ((30000,20000) for jitter, (20000,30000) for shimmer, and (30000,30000) for noise). 500 points from each sequence were inspected to ensure the absence of obvious patterns or interrelationships. A start-up transient in the first 10 points of the series was observed for the seed values recommended in the FORTRAN documentation (0,0).

The maximum absolute values for jitter and shimmer perturbations were specified as percentages of the impulse spacing and impulse size, respectively. Random noise was added to the output. Noise level was specified as a percentage of the maximum steady-state output amplitude.
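The specification of jitter and shimmer as maximum absolute percentages can be sketched as follows. This is an illustration under stated assumptions rather than the thesis implementation: Python's `random` module stands in for the FORTRAN RAN() function, the function names and seed handling are hypothetical, and each impulse spacing and impulse size is assumed to be perturbed independently by a zero mean, evenly distributed value.

```python
import math
import random


def perturbed_impulse_train(n_pulses, period_s, jitter_pct, shimmer_pct, seed=0):
    """Return (times, amplitudes) of an impulse train with jitter and shimmer.

    jitter_pct and shimmer_pct are MAXIMUM absolute perturbations, given as
    percentages of the nominal impulse spacing and impulse size.
    """
    # Separate generators per perturbation type (an assumption that mirrors
    # the use of separate seed values for jitter and shimmer).
    rng_jitter = random.Random(seed)
    rng_shimmer = random.Random(seed + 1)
    times, amplitudes = [], []
    t = 0.0
    for _ in range(n_pulses):
        spacing = period_s * (1 + rng_jitter.uniform(-1, 1) * jitter_pct / 100)
        amplitude = 1 + rng_shimmer.uniform(-1, 1) * shimmer_pct / 100
        t += spacing
        times.append(t)
        amplitudes.append(amplitude)
    return times, amplitudes


def uniform_std(max_abs):
    """Standard deviation of a zero mean uniform variable on [-max_abs, +max_abs]."""
    return max_abs / math.sqrt(3)
```

For example, `perturbed_impulse_train(200, 1 / 128, 2.0, 5.0)` produces 200 impulses at a nominal 128 Hz with spacings within 2% and sizes within 5% of their nominal values.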
The steady-state was estimated by exciting the filter with three impulses that were free of jitter and shimmer. It is worth emphasizing that the perturbation level specifies the maximum absolute value of the perturbation rather than its standard deviation. The standard deviations can be obtained from the maximums by dividing by √3.

FIGURE 3.2: SPECTRAL MATCH OF HIGH FREQUENCY VOWEL SYNTHESIZERS. Frequency responses of vowel synthesis filters designed for a sampling frequency of 80 kHz are compared with reference filters designed for a sampling frequency of 10 kHz. Specifications can be found in Eq. (3.1), Eq. (3.2) and Table 3.1. [Horizontal axis: FREQUENCY (kHz).]

3.3 POSITIONING OF PITCH-PERIOD DEMARCATION MARKERS

Markers representing the start of each pitch-period were synchronized with the driving impulses of the synthesizer, and offset to be aligned with specific waveform events. The following offset was used to position the markers.

\[ \mathrm{DFACT} = \mathrm{LDELAY} - \left[ \frac{\mathrm{NPOLE}}{2} \cdot \frac{10^{6}}{\mathrm{FSAMP}} \right] \tag{3.3} \]

where
LDELAY = a constant for aligning markers with a particular waveform event (microseconds)
NPOLE = the number of filter poles
FSAMP = the filter sampling frequency (Hz)

Compensation was included in this expression to remove dependence on the filter delay. A common model for the vocal tract is an N-section concatenated lossless tube. The transfer function for such a model takes the form of Eq. (3.4) (Rabiner & Schafer 1978, p. 95).

\[ H(z) = \frac{C\,z^{-\mathrm{NPOLE}/2}}{1 - \sum_{n=1}^{\mathrm{NPOLE}} a_n z^{-n}} \quad \text{for constants } C \text{ and } a_n \tag{3.4} \]

The delay term in the numerator of Eq. (3.4) gives rise to the compensation in Eq. (3.3), as that is the only difference between it and the transfer function of an all-pole digital filter.
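A minimal sketch of Eq. (3.3) follows. The function name is hypothetical, and the example values (LDELAY of 450 microseconds and a 12-pole filter at 10 kHz, i.e. five resonators plus two first order compensation poles) are illustrative assumptions rather than settings quoted from the thesis.

```python
def marker_offset_us(ldelay_us, npole, fsamp_hz):
    """Marker offset DFACT of Eq. (3.3), in microseconds.

    The second term removes the delay of NPOLE/2 samples that separates the
    lossless-tube model of Eq. (3.4) from an all-pole digital filter.
    """
    filter_delay_us = (npole / 2) * 1e6 / fsamp_hz
    return ldelay_us - filter_delay_us


# Illustrative values only: 12 poles at 10 kHz correspond to a 600 microsecond delay.
example_offset = marker_offset_us(450, 12, 10_000)
```

Because the delay term scales with 1/FSAMP, the same LDELAY constant yields markers aligned with the same waveform event at every sampling frequency.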
Delay constants (LDELAY) for alignment with three waveform events are summarized in Table 3.2. These values were obtained through inspection of perturbation-free waveforms that had a fundamental frequency of 128 Hz.

TABLE 3.2: LDELAY FOR PITCH-PERIOD MARKER OFFSETS

This table contains LDELAY constants, in microseconds, for positioning of pitch-period demarcation markers using Eq. (3.3).
Location 1 = the waveform transition preceding the first large pitch-period oscillation
Location 2 = the peak of the first large oscillation
Location 3 = the zero crossing following the first peak

             LDELAY (μsec) FOR VARIOUS VOWELS
 LOCATION      /a/      /i/      /u/
    1          450      450      600
    2          900     1850     1700
    3         1200     3000     3050

3.4 IMPLEMENTATION CONSIDERATIONS

When the sampling frequency is increased, the filter poles become more closely grouped and move towards the unit circle in the Z domain. This results in increased sensitivity to errors caused by limited computational precision. When normal (32-bit) precision was used and the sampling frequency was greater than 40 kHz, significant errors were observed in the filter frequency responses. These effects were successfully removed by using double (64-bit) precision and by separating the synthesis filter into a cascade of subfilters, where the order of each subfilter was less than 11.

Data were stored as 16-bit integers with a maximum amplitude of less than 1500. This simulates digitization using a 12-bit A/D converter. The first three pitch-periods were not stored in order to avoid the filter start-up transient.

3.5 DISCUSSION

Use of a filtered impulse train has a number of advantages. The synthesis filter can be designed to obtain a good match with the spectral envelope of a vowel.
The technique is well known, and implementation is relatively straightforward. There is a convenient separation of spectral and excitation issues. The impulses provide a solid anchor point for synchronization of pitch-period demarcation markers. Finally, independent control of jitter and shimmer is easily obtained through simple modifications to the impulse train.

One must accept a number of limitations when using the filtered impulse train described here. The excitation signal bears little relationship to observable physiological phenomena, and results cannot be directly related to specific characteristics of the glottal volume-velocity waveform. Issues such as source-tract interaction and variability of vocal tract resonances were not simulated. Since source-tract interaction was not modelled, the data had a larger degree of pitch-period superposition than would be expected in real speech samples (Fant & Ananthapadmanabha 1982). Finally, a number of issues were not considered in implementation of the perturbations. Noise and distortion during data recording and playback, environmental noise, room acoustics and electrical interference were not modelled. Time dependencies within perturbations, correlations among perturbations, correlation with the harmonic component, variability of the relative levels of the perturbations and deviations from the distribution assumptions were not considered.

While the spectral matches obtained through parameter optimization were adequate for the present application, further refinement is possible.
Under certain conditions, such as a sharp ridge in the solution surface, the optimization procedure can be inefficient, or can stop before the optimal solution is found. Other procedures are less affected by these problems (e.g., Wilde & Beightler 1967, pp. 271-344). Furthermore, closer spectral matches can probably be obtained by considering non-integer parameter values, by varying the frequency of the 5 kHz resonators or by allowing the bandwidths of multiple 5 kHz resonators to differ.

3.5.1 ALTERNATIVE APPROACHES

An alternative method for increasing the sampling frequency of the vowel synthesizer is to modify the filter delay terms rather than the filter coefficients. For example, the following two filters have identical frequency responses between 0 and FSAMP/2 Hz.

\[ H_1(z) = \frac{\sum_i N_i z^{-i}}{\sum_i D_i z^{-i}} \tag{3.5} \]

where filter sampling frequency = FSAMP Hz

\[ H_M(z) = \frac{\sum_i N_i z^{-Mi}}{\sum_i D_i z^{-Mi}} \tag{3.6} \]

where
M = a positive integer
filter sampling frequency = M*FSAMP Hz

The high frequency filter (Eq. (3.6)) effectively inserts M-1 zeros between each point of the original impulse response. Thus, a low-pass filter at FSAMP/2 Hz is required for interpolation. The required level of computational precision is independent of M when this approach is used, and the low-pass filter can be implemented at the output sampling frequency (Crochiere & Rabiner 1981). However, 2 stages of filtering are required, and the technique is appropriate only when M is an integer.

A method for obtaining high resolution jitter perturbation without increasing the sampling frequency was described by Milenkovic (1987).
The impulses were replaced with a string of band-limited excitation pulses (e(t)) defined as

\[ e(t) = A \, \frac{\sin\left( (2N+1) \pi t / T_0 \right)}{\sin\left( \pi t / T_0 \right)} \tag{3.7} \]

where
e(t) = the excitation pulse
A = amplitude scaling
T_0 = the fundamental period
N = the number of harmonics

Simulation of an excitation that occurs between sample points is obtained by sampling the excitation pulse at appropriate offsets. Such an approach can be applied with the synthesis filters described here to obtain an arbitrary resolution for jitter perturbation. However, anticipatory oscillations (i.e., oscillations of increased amplitude that precede the major excitation) may lead to confusing indications of pitch-period interaction.

CHAPTER 4: PITCH-PERIOD DEMARCATION

This chapter describes the methods for pitch-period demarcation. A new method is presented for reducing quantization and other errors in pitch-period demarcation. This method was also described elsewhere (Cox et al. 1986b, 1989b).

The analysis of sustained vowels is frequently preceded by a pitch-period demarcation process. Unfortunately, most existing methods do not provide adequate precision for the present application, and significant errors have been observed in measures of jitter (Horii 1979; Heiberger & Horii 1982; Titze et al. 1987), time domain noise (Cox et al. 1986b, 1989b; Hillenbrand 1987) and spectral noise (Cox et al. 1989a). Manual marking through inspection of waveforms on a video terminal is tedious, and accuracy is highly dependent on the resolution of the display and the diligence of the operator.
Noise, distortion and other perturbations have been identified as significant sources of error (e.g., Hillenbrand 1987), and inter-subject variability makes it difficult to define a suitable waveform event.

A variety of strategies for automated marking have been devised (e.g., Hess 1983). However, most of these strategies were designed for applications where small marking errors are acceptable. Priority has been given to computation speed and ease of implementation.

A new method is proposed here for reducing quantization and demarcation errors in pitch-period markers. The method uses parabolic interpolation of a cross-correlation function to "optimize" the markers. It is not restricted to a particular waveform event, and is relatively unaffected by local perturbations. This method provides a means for obtaining high resolution in pitch-period markers without using high sampling frequencies, thus facilitating efficient and accurate analyses on commonly available equipment.

4.1 PITCH-PERIOD DEMARCATION PROCEDURE

Pitch-period markers were positioned at the waveform transition preceding the first large pitch-period oscillation. The first two markers were manually positioned using a crosshair cursor on a video terminal. 40 msec of data were displayed at a time, and the horizontal resolution of the display was 1000 pixels. For subsequent markers, an initial estimate was obtained using

\[ t_3 = 2 t_2 - t_1 \]
\[ t_i = t_{i-1} + t_{i-2} - t_{i-3}, \qquad i > 3 \tag{4.1} \]

where
t_i = the position of the i'th pitch-period marker

This estimate was refined using the "optimization" procedure described below. The refined estimate was displayed for verification by the operator.
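The initial estimate of Eq. (4.1) amounts to repeating the period before last. A minimal Python sketch follows; the function name `initial_marker_estimate` and the example marker values are illustrative assumptions, not part of the original procedure.

```python
def initial_marker_estimate(markers):
    """Predict the next pitch-period marker from the markers found so far (Eq. 4.1).

    markers: list of marker positions in any consistent time unit.
    The first two markers are assumed to have been placed manually.
    """
    if len(markers) < 2:
        raise ValueError("the first two markers must be placed manually")
    if len(markers) == 2:
        # t3 = 2*t2 - t1: only one period is known, so repeat it
        return 2 * markers[-1] - markers[-2]
    # ti = t(i-1) + t(i-2) - t(i-3): repeat the length of the period before last
    return markers[-1] + markers[-2] - markers[-3]


# Hypothetical example: periods alternating between 8 ms and 7 ms.
markers_ms = [0.0, 8.0, 15.0]
print(initial_marker_estimate(markers_ms))
```

With alternating period lengths, repeating the period before last predicts the next marker better than repeating the most recent period would.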
Use of the i-2'th pitch-period for initial estimation of the i'th pitch-period, as in Eq. (4.1), was found to be advantageous. This accommodates instances of alternate cycle periodicity, where the i'th pitch-period had a closer resemblance to the i-2'th pitch-period than to its nearest neighbors.

4.2 PITCH-PERIOD MARKER OPTIMIZATION

The following method attempts to reduce errors in a set of pitch-period markers. The method is illustrated in Figure 4.1. A short waveform segment following the first marker is used as a reference. The covariance of the reference with the waveform around the second marker is determined for a number of offsets, and a function called the Signed Squared Correlation Coefficient (SSCC) is computed at each offset. Parabolic interpolation (Markel & Gray 1976) is applied to locate the fractional offset at which the SSCC is maximized, and the marker is adjusted accordingly. The process is repeated, without changing the reference, for all subsequent markers.

The SSCC is defined as the square of the correlation coefficient multiplied by its sign. The SSCC avoids computation of square roots and produces a sharper peak for interpolation. A secondary use for the SSCC is identification of pitch-periods for which there is an increased probability of error. Such cases can indicate that the reference is no longer appropriate or that an intermittent disturbance has occurred (e.g., a pitch break or phonation break).

An occasional update of the reference can be used to track gradual changes in the data. However, a cumulative error was observed when the reference was updated on each iteration, indicating a bias in interpolation of the SSCC function.
Evidence of a similar interpolation bias was observed in Milenkovic (1987).

FIGURE 4.1: PITCH-PERIOD MARKER OPTIMIZATION. This figure illustrates the optimization of pitch-period markers through cross-correlation. Δ_old is the offset of the original marker; Δ_new is the optimized offset. [Figure not reproduced. Panel labels: reference, search region, correlation function.]

The range over which the correlations are computed (search region) directly determines the maximum size of demarcation error that can be corrected. However, a large search region can lead to confusion of intra-period oscillations. This problem is avoided if the length of the search region is less than one period of the first formant frequency. A search region of 1 millisecond was used for this study.

Interpolation of cross-correlation functions has been used in other applications (e.g., Hertz 1986). For vowel analysis, similar procedures have been described by Smith (1980, pp. 75-76) and Milenkovic (1987). Parabolic interpolation was not used in Smith's procedure, and evaluations using synthetic vowels were not described. A comparison with Milenkovic's method can be found in Section 4.3.

4.2.1 EVALUATION

The optimization algorithm was evaluated using synthetic /a/, /i/ and /u/ vowels. Each waveform contained equal percentages of jitter, shimmer and additive noise. Data were generated at a sampling frequency of 80 kHz and downsampled to 10 kHz or 20 kHz. The fundamental frequency was 128 Hz. "Exact" pitch-period markers were synchronized with the driving impulses of the synthesizer, and offset to be aligned with the waveform transition preceding the first large pitch-period oscillation. Further detail can be found in Chapter 3.
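The marker optimization of Section 4.2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the evaluation: the names `sscc`, `optimize_marker` and `make_pulse_train`, the toy damped-sinusoid pulse train, and the sample-based parameter values are all hypothetical. The SSCC is computed from mean-removed (covariance) sums, and a three-point parabola through the largest SSCC value and its two neighbors gives the fractional offset.

```python
import math


def sscc(ref, seg):
    """Signed squared correlation coefficient of two equal-length sequences."""
    n = len(ref)
    mean_r = sum(ref) / n
    mean_s = sum(seg) / n
    cov = sum((r - mean_r) * (s - mean_s) for r, s in zip(ref, seg))
    var_r = sum((r - mean_r) ** 2 for r in ref)
    var_s = sum((s - mean_s) ** 2 for s in seg)
    if var_r == 0.0 or var_s == 0.0:
        return 0.0
    # r*|r| equals cov*|cov| / (var_r*var_s), so no square root is needed
    return cov * abs(cov) / (var_r * var_s)


def optimize_marker(signal, ref_start, marker, ref_len, half_search):
    """Refine an integer marker estimate to a fractional sample position.

    Returns (refined_marker, peak_sscc). The reference segment starts at
    ref_start; the search region is marker - half_search .. marker + half_search.
    """
    if marker - half_search < 0 or marker + half_search + ref_len > len(signal):
        raise ValueError("search region extends beyond the signal")
    ref = signal[ref_start:ref_start + ref_len]
    offsets = list(range(-half_search, half_search + 1))
    scores = [sscc(ref, signal[marker + k:marker + k + ref_len]) for k in offsets]
    peak = max(range(len(scores)), key=scores.__getitem__)
    shift = float(offsets[peak])
    if 0 < peak < len(scores) - 1:
        y0, y1, y2 = scores[peak - 1], scores[peak], scores[peak + 1]
        curvature = y0 - 2.0 * y1 + y2
        if curvature < 0.0:
            # vertex of the parabola through the three points around the peak
            shift += 0.5 * (y0 - y2) / curvature
    return marker + shift, scores[peak]


def make_pulse_train(n_samples, onsets):
    """Toy 'vowel': one damped sinusoid per pitch-period onset (assumption)."""
    sig = [0.0] * n_samples
    for p in onsets:
        for n in range(p, n_samples):
            t = n - p
            sig[n] += math.exp(-t / 20.0) * math.sin(2.0 * math.pi * t / 16.0)
    return sig


signal = make_pulse_train(300, [0, 100, 200])
# The initial estimate (103) is deliberately three samples late.
refined, peak_value = optimize_marker(signal, ref_start=0, marker=103,
                                      ref_len=40, half_search=5)
print(round(refined, 2), round(peak_value, 3))
```

Returning the peak SSCC alongside the marker is a design choice of this sketch; it supports the secondary use described above, where a low value flags a pitch-period with an increased probability of error.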
The optimization algorithm was repeatedly applied to the test vowels to determine suitable analysis parameters. The variance of the difference between the "optimized" markers and the "exact" markers was minimized. Variances were computed over 50 consecutive pitch-periods from each test vowel.

The results of varying the length of the reference segment (REF) are plotted in Figure 4.2. Initially, the error variance decreased as REF increased, but in general there was little benefit in extending REF above 2 msec. The same conclusion could be drawn from data sampled at 10 kHz.

Exceptionally good performance was observed for /i/ data. The algorithm was evidently able to take advantage of the strong high frequency formants. However, relatively large variances (>4400 μsec²) were observed for waveforms with 20 percent perturbation when REF was less than 2 msec. Apparently a number of cycles of the high frequency resonance are required in order to overcome large perturbations.

The increase in error variance with increasing perturbation was greatest for /u/ data. Probable causes for this are the absence of high frequency components and the relatively large degree of pitch-period superposition (Fant & Ananthapadmanabha 1982). The lack of source-tract interaction in the vowel synthesizer leads to a probable exaggeration of the superposition effect.

An increase in error variances occurred for /a/ and /i/ when REF was 8 msec. This is attributable to overlap of the reference region with the next pitch-period, as the average pitch-period length for the data was approximately 7.8 msec. Fortunately, such an overlap seldom occurs for real vowels when the recommended REF of 2 msec is used.

FIGURE 4.2: PITCH-PERIOD MARKER OPTIMIZATION; VARIATION OF THE REFERENCE SEGMENT LENGTH.
This figure plots the error variance for "optimized" pitch-period markers as a function of the length of the reference segment of the optimization algorithm. Data are presented for synthetic /a/, /i/ and /u/ vowels with various levels of random perturbation. The sampling frequency was 20 kHz. [Plot not reproduced. Horizontal axis: reference segment length (msec); curves: percent perturbation.]

Preprocessing through nonlinear clipping was considered, but was found to degrade performance. This form of preprocessing has been successfully applied for correlation based pitch estimation (Rabiner 1977; Rabiner & Schafer 1978, pp. 141-158). It provides a simple means for flattening the vowel spectrum while preserving pitch periodicity, thus reducing the size of erroneous peaks in the correlation function. However, the erroneous peaks can be avoided in the present algorithm by choosing a small search region.

Preprocessing through linear filtering was also considered. High frequency pre-emphasis (Eq. (4.2)) counteracts the spectral rolloff in vowels, and can be advantageous when high frequency formants are important. On the other hand, the low-pass filtering effect of the moving average in Eq. (4.3) can be useful for improving the signal-to-noise ratio.

\[ Y(i) = X(i) - 0.01 \cdot \mathrm{PRE} \cdot X(i-1) \tag{4.2} \]

where
PRE = percent high frequency pre-emphasis

\[ Y(i) = \frac{1}{\mathrm{NMAVE}} \sum_{j=1}^{\mathrm{NMAVE}} X\left( i + j - 1 - \mathrm{NMAVE}/2 \right) \tag{4.3} \]

where
NMAVE = the number of points in the moving average

Various combinations of high frequency pre-emphasis and moving average filtering were evaluated. In general, the moving average reduced error variances for severely perturbed waveforms but often increased error variances for moderately perturbed data (i.e., 3% or 5% perturbation).
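The two prefilters of Eq. (4.2) and Eq. (4.3) can be sketched directly. The following Python is illustrative only. The function names are hypothetical, samples outside the sequence are taken as zero, and NMAVE/2 is truncated to an integer; the source does not state either of these two conventions, so both are assumptions of the sketch.

```python
def pre_emphasis(x, pre_percent):
    """Eq. (4.2): y[i] = x[i] - 0.01*PRE*x[i-1], with x[-1] taken as 0 (assumption)."""
    y = []
    previous = 0.0
    for sample in x:
        y.append(sample - 0.01 * pre_percent * previous)
        previous = sample
    return y


def moving_average(x, nmave):
    """Eq. (4.3): y[i] = (1/NMAVE) * sum over j=1..NMAVE of x[i + j - 1 - NMAVE//2].

    Out-of-range samples count as 0 and NMAVE/2 is truncated (assumptions).
    """
    half = nmave // 2
    n = len(x)
    y = []
    for i in range(n):
        total = 0.0
        for j in range(1, nmave + 1):
            k = i + j - 1 - half
            if 0 <= k < n:
                total += x[k]
        y.append(total / nmave)
    return y


# Hypothetical usage with pre-emphasis of 70 percent and a 3-point average.
samples = [0.0, 1.0, 0.5, -0.5, -1.0, 0.0]
filtered = moving_average(pre_emphasis(samples, 70), 3)
print(filtered)
```

Both filters are linear and time-invariant, so the order in which they are applied affects only the samples near the ends of the sequence. A negative PRE value turns Eq. (4.2) into a two-point sum, which is itself a low-pass filter.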
Performance improvements through pre-emphasis, when they occurred, tended to affect the low or moderately perturbed data.

Parameter selections judged to have the best overall performance are summarized in Table 4.1, and the associated error variances are summarized in Table 4.2. Except for the most severely perturbed data, the error variances were much less than the variance caused by quantization of markers to the nearest sample point. The net effect of the recommended preprocessing is a slight emphasis of high frequencies for /a/, a gentle low-pass filter for /i/, and a sharper low-pass filter for /u/.

As summarized in Table 4.2, good results were obtained when data were sampled at 10 kHz. However, without preprocessing the error variances for /i/ varied widely between 110 μsec² and 10,000 μsec². This was corrected by applying a 2-point moving average, leading to error variances of below 80 μsec² for all but the most severely perturbed data. This problem can be attributed to sampling of the high frequency formants at near the Nyquist frequency. The strength of the cross-correlation is highly dependent on the phase of the sampled waveforms for such circumstances.

Data in Figure 4.3 simulate the effect of varying the waveform instant used for pitch-period demarcation. A constant offset of up to 1 msec was added to (or subtracted from) each pitch-period marker prior to optimization. Relatively low variances were obtained at all offsets for /a/ and /i/. Increased variances at negative offsets were expected, as larger portions of the preceding pitch-period are included. The following two factors contributed to the increased variances at positive offsets.
Firstly, the amplitude of the harmonic component decays with time, so the relative influence of noise is greater. Secondly, the rate of decay of the current pitch-period is greater than that of superimposed portions of preceding pitch-periods, so the superimposed data has more influence. The decline in variance at positive offsets for /a/ with 10 percent perturbation is difficult to explain. It may be that influential segments of superimposed data are avoided.

TABLE 4.1: PITCH-PERIOD MARKER OPTIMIZATION; ANALYSIS PARAMETERS
This table summarizes recommended analysis parameters for pitch-period marker optimization.
"REF" = the length of the reference segment
"PRE" = the percent pre-emphasis (Eq. (4.2))
"NMAVE" = the size of the moving average (Eq. (4.3))

SAMPLING FREQUENCY    VOWEL    REF (msec)    PRE (%)    NMAVE (pts)
10 kHz                /a/      2.0             35       2
                      /i/      5.0              0       2
                      /u/      1.5             60       4
20 kHz                /a/      2.0             70       3
                      /i/      2.0              0       2
                      /u/      1.5           -100       4

TABLE 4.2: PITCH-PERIOD MARKER OPTIMIZATION; ERROR VARIANCES
This table summarizes error variances, in μsec², for pitch-period markers in synthetic vowels. Error variances were computed with reference to the driving impulses of the vowel synthesizer.
"PTB" = the level of perturbation in the vowels
"QUANT" = markers rounded to the nearest sample point
"OPT" = markers obtained from the optimization algorithm
"PREPROCESSED" = prefiltering as in Table 4.1

ERROR VARIANCES (μsec²) FOR VARIOUS MARKER TYPES

SAMPFREQ            QUANT    OPT                        PREPROCESSED+OPT
AND PTB                      /a/     /i/      /u/       /a/     /i/      /u/
10 kHz    0%        866      16      6573     0         3.0     37       0.1
          1%        723      12      2486     4.7       3.8     42       4.6
          3%        710      28      7247     45        17      50       43
          5%        813      47      116      146       45      65       142
         10%        716      155     9730     1127      167     75       1017
         20%        889      316     8980     5858      286     8549     3995
20 kHz    0%        192      0.3     2.4      0         0.6     1.7      0
          1%        183      1.5     3.7      3.1       1.5     3.1      3.1
          3%        204      5.8     2.8      39        4.4     2.5      36
          5%        179      24      2.4      141       10      2.5      132
         10%        192      85      3.7      1190      51      3.4      1060
         20%        149      238     12       3894      193     14       3940

FIGURE 4.3: PITCH-PERIOD MARKER OPTIMIZATION; THE EFFECT OF A DEMARCATION OFFSET. Error variances for "optimized" pitch-period markers are plotted as a function of an additive offset (X) to the pitch-period demarcation markers. X was constant within each estimate. The sampling frequency was 20 kHz. [Plot not reproduced. Horizontal axis: X (msec), from -1 to 1; curves: perturbation level (%).]

4.3 DISCUSSION

The method for "optimizing" pitch-period markers was evaluated using synthetic /a/, /i/ and /u/ vowels. It was shown to be effective at reducing demarcation errors due to quantization for all but the most severely perturbed waveforms. Demarcation errors from other sources are also corrected, provided that the correct location is included in the search region. The ability to obtain high resolution pitch-period markers at moderate sampling frequencies is useful for analysis of jitter and noise with low cost computer hardware.
While preprocessing through linear filtering improved the performance of the pitch-period optimization algorithm, good results were also obtained without it. Thus, it is not essential that separate analysis parameters be used for different vowels and sampling frequencies. One exception, as exemplified by data for /i/ sampled at 10 kHz, is when a strong formant is sampled at near the Nyquist frequency. This leads to a severe dependence on the phase of the sampled waveform. The use of a low-pass filter such as a 2-point moving average is recommended when this is a possibility.

There are some differences in interpretation between the pitch-period optimization procedure discussed here and a similar procedure discussed in Milenkovic (1987). Milenkovic suggested that the reference segment be as long as the pitch-period. Results presented here indicate that a shorter reference segment is preferable. This provides faster computation, permits the use of the same length of reference segment for all subjects and prevents overlap with the onset of the next pitch-period. The last point is significant, as reference segment overlap degraded performance here. Furthermore, the harmonic component is not ergodic; rather, it decays with time. This, combined with the possibility of increased noise during the open phase of the vocal cord vibratory cycle, suggests that the signal-to-noise ratio can vary considerably over the course of a pitch-period. Thus, a longer reference segment is not necessarily superior. A possible modification of Milenkovic's procedure is to determine the pitch-period demarcation using a short reference segment, then recompute local correlations for estimation of shimmer and noise.

The desirability of a broad peak for parabolic interpolation depends on the application.
If the goal is to estimate the height of the peak, then a broad peak is advantageous. However, the same is not true if the goal is to locate the peak, as too broad a peak leads to increased sensitivity to noise or measurement error. This explains Milenkovic's results for perturbation-free synthetic vowels, where pitch-period and jitter estimates were better for /a/, but shimmer and noise estimates were better for /i/.

In contrast with the present study, Milenkovic did not report large errors for /i/ data sampled at low frequencies. This can be attributed to differences in the synthetic vowels. An attenuation of high frequencies is implicit in the design of his synthesizer, as the excitation signal did not contain harmonics greater than 3.5 kHz. Furthermore, problematic phases may have been avoided through the use of alternate cycle or square wave perturbations. Nonetheless, phase sensitivity is a possible explanation for the nonlinear relationship between measured jitter and actual jitter observed by Milenkovic for /i/ (Figure 1, p. 535).

Because of limitations of the vowel synthesizer discussed in Chapter 3, further evaluations of the pitch-period optimization procedure are recommended. Effects attributed to pitch-period superposition were exaggerated here by the lack of source-tract interaction in the synthesizer. Time dependencies within perturbations, interrelationships among perturbations, correlation with the harmonic component, variability of the relative level of each perturbation and deviations from the distribution assumptions were not considered.
Other potential sources of error include natural variations in the formants and the fundamental frequency, noise and distortion during recording and playback, environmental noise, room acoustics, electrical interference, etc. Additional evaluations would serve to establish relationships between the "optimized" pitch-period markers and the vocal cord vibratory cycle, and to identify factors which influence that relationship. Such evaluations appropriately include comparison with other measures of laryngeal function, such as electroglottography.

CHAPTER 5: ALGORITHMS FOR MEASURING VOWEL PERTURBATIONS

This chapter describes algorithms used for measuring perturbations in sustained vowels. A mathematical expression for each algorithm is presented, along with a brief summary of its origins. Formulations for efficient implementation are described, and considerations for parameter selection are discussed.

5.1 THE RELATIVE AVERAGE PERTURBATION (RAP)

A number of studies have applied the relative average perturbation (RAP), or other similar algorithms, to sustained vowels for measurement of jitter and shimmer (e.g., Hecker & Kreul 1971; Heiberger & Horii 1982; Horii 1979; Koike 1973; Lieberman 1963; Zyski et al. 1984). These measures have been important components in computer-based detection of laryngeal pathology (Davis 1981), and have shown a significant correlation with subjective judgements of hoarseness level (Yumoto et al. 1984). Measures of jitter have generally been found to be superior to measures of shimmer.

The RAP was designed to measure the average magnitude of the deviation of a data point from its predicted value. The predicted values are computed from an average of nearby data samples. Normalization with respect to the mean of the sequence provides independence of scale.
Such normalization also decreases dependence on the fundamental frequency for analysis of jitter (Horii 1979). The following is a generalized formulation for the RAP.

\[ \mathrm{RAP}(\mathrm{CTR}, M, K) = \frac{\dfrac{1}{N - K(M-1)} \displaystyle\sum_{i=1+K \cdot \mathrm{MID}}^{N - K \cdot \mathrm{HI}} \left| X(i) - X_p(i,M,K) \right|_{\mathrm{CTR}}}{\dfrac{1}{N} \displaystyle\sum_{i=1}^{N} X(i)} \tag{5.1} \]

where
X(i) = a numeric sequence to be analyzed
N = the number of points in X(i)
CTR = a center-limit constant (see |x|_CTR below)
M = the number of points in the moving average
K = increment between points in the moving average
|x|_CTR = a center-limited absolute value of x
        = |x| when |x| > CTR
        = 0 when |x| ≤ CTR
X_p(i,M,K) = M-point moving average prediction of X(i)
        = X(i-K)                                   M=1
        = 0.5*[ X(i) + X(i-K) ]                    M=2
        = Σ_{j=LO}^{HI} a_j X(i+K*j)               M>2
          where a_j ≥ 0 and Σ a_j = 1
MID = int( M/2 )
HI = int( (M-1)/2 )
LO = -int( M/2 )
int(x) = truncation to the nearest lower integer

The algorithm proposed by Koike (1973) is obtained by setting CTR=0, M=3, K=1 and a_j=1/3.

5.1.1 CONSIDERATIONS FOR PARAMETER SELECTION

Because the RAP can be viewed as the average of N samples of a random variable, its standard deviation is roughly inversely proportional to the square root of N. N greater than 20 has been recommended (Titze et al. 1987). However, the implicit assumption of stationarity becomes increasingly tenuous at large values of N.

The parameters "CTR" and "K" were included for specialized applications of the RAP. "CTR" can be used for compensating for quantization effects (see Section 6.1.5), or for removing small perturbations. "K" allows the RAP to be applied to sequences with periodically varying expected values or periodic perturbations (see Section 5.6).

Lower values of M reduce storage requirements and computational complexity, and provide greater localization of individual perturbations.
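A minimal Python sketch of the generalized RAP of Eq. (5.1) follows. It is illustrative only: the function name `rap`, the default of uniform weights a_j, and the example period lengths are assumptions. The sum runs over every index for which all points of the moving average exist, and it is divided by the number of terms summed, which equals N-K*(M-1) for M of 2 or more.

```python
def rap(x, m=3, k=1, ctr=0.0, weights=None):
    """Generalized relative average perturbation (sketch of Eq. 5.1).

    x: sequence of period lengths (jitter) or amplitudes (shimmer)
    m: points in the moving average, k: spacing between those points
    ctr: center-limit constant; deviations of magnitude <= ctr count as zero
    weights: optional a_j values (non-negative, summing to 1); uniform if omitted
    """
    n = len(x)
    if m < 1 or k < 1:
        raise ValueError("m and k must be positive integers")
    if m == 1:
        offsets = [-1]                      # prediction is x[i-k]
    else:
        offsets = list(range(-(m // 2), (m - 1) // 2 + 1))
    if weights is None:
        weights = [1.0 / len(offsets)] * len(offsets)
    first = -k * offsets[0]                 # first index with all neighbors present
    last = n - 1 - k * max(offsets[-1], 0)  # last such index
    if last < first:
        raise ValueError("sequence too short for the chosen m and k")
    total = 0.0
    for i in range(first, last + 1):
        predicted = sum(a * x[i + k * j] for a, j in zip(weights, offsets))
        deviation = abs(x[i] - predicted)
        if deviation > ctr:                 # center-limited absolute value
            total += deviation
    count = last - first + 1
    return (total / count) / (sum(x) / n)


# Hypothetical pitch-period lengths in msec; defaults give the Koike (1973) settings.
periods_ms = [7.80, 7.90, 7.70, 7.80, 7.90, 7.75]
print(rap(periods_ms))
```

With the default arguments (m=3, k=1, ctr=0, uniform weights) the function reduces to the three-point form attributed above to Koike (1973).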
However, when M equals 1 or an even number, the center point of X_p(i,M,K) is not aligned with X(i). This results in an increased sensitivity to linear variations in the data.

If the perturbation is stationary, then choosing a_j=1/M produces the best linear unbiased estimate of the mean. However, depending on the data, there may not be an intuitive relationship between the mean and the expected (unperturbed) value. The mean is appropriate when the unperturbed value can be assumed to be constant. If M is odd and greater than 2, this statement also holds when the unperturbed value varies linearly with time. However, alternative weighting factors in the moving average may be preferred if a nonlinear variation in the expected value is suspected, or if the perturbation is suspected to be nonstationary.

5.2 TIME DOMAIN HARMONICS-TO-NOISE RATIOS (HNRS)

An algorithm for measurement of time domain noise (the HNR) was developed by Yumoto et al. (1982, 1983, 1984). The correlation with subjective judgements of hoarseness was reported to be superior to measures of jitter (Yumoto et al. 1984). To derive the HNR, it was assumed that the vowel can be modelled as the sum of a harmonic component that repeats in each pitch-period, and a nonrepetitive noise component. The vowel (v_i(τ)) is segmented into individual pitch-periods, where "i" is the period number and "τ" is the time offset from the start of the period. The harmonic component is estimated by averaging v_i(τ) across "i". The noise component is obtained by subtracting this average from each pitch-period of the original signal. The following is the HNR algorithm described in Yumoto et al. (1982).
\[ \mathrm{HNR}_{\mathrm{Yumoto}} = 10 \log_{10} \frac{N \displaystyle\int_0^{T_{i\max}} v_h^2(\tau)\, d\tau}{\displaystyle\sum_{i=1}^{N} \int_0^{T_i} \left[ v_i(\tau) - v_h(\tau) \right]^2 d\tau} \tag{5.2} \]

where
v_i(τ) = the i'th period of the signal
T_i = the length of the i'th period
T_imax = the length of the longest period
N = the number of periods
v_h(τ) = (1/N) Σ_{i=1}^{N} v_i(τ) = the harmonic component estimate
v_i(τ) = 0 when T_i < τ < T_imax

While its name suggests that it is primarily a measure of noise, this HNR is also intrinsically sensitive to jitter and shimmer. This is appropriate when it is used as a single-feature index of hoarseness, as jitter and shimmer are common in hoarse speakers. However, given the multifaceted nature of vocal pathology, the availability of separate measures of jitter, shimmer and noise is advantageous.

Inspection of Eq. (5.2) reveals that offset adjustments in digitization equipment can affect results, as such an offset affects the numerator but not the denominator. In addition, the influence of variation in T_i (jitter) depends on the event chosen for pitch-period demarcation. For example, if the markers are aligned with pitch-period peaks, the harmonic component will be underestimated when zeros are appended during its computation, leading to an overestimate of the noise component. This effect is less when the markers are aligned with a zero crossing.

Eq. (5.3) is a generalized HNR algorithm that addresses the issues discussed above.

\[ \mathrm{HNR} = 10 \log_{10} \frac{N \displaystyle\int_0^{T_h} v_h^2(\tau)\, d\tau - N C^2 T_h}{\displaystyle\sum_{i=1}^{N} \int_0^{T_h} \left[ M_i v_i(\tau) - v_h(\tau) - C (M_i - 1) \right]^2 d\tau} \tag{5.3} \]

where
T_h = the integration range
M_i = a multiplicative scaling constant
C = (1/T_h) ∫_0^{T_h} v_h(τ) dτ = an additive data offset

"C" allows for removal of the dependence on the data offset. "M_i" allows for amplitude normalization to reduce the influence of shimmer.
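In discrete form the integrals of Eq. (5.3) become sums over samples, and the sample interval cancels in the ratio. The following Python sketch covers only the case M_i = 1 and truncates every period to the length of the shortest one; the function name `hnr_db`, the `remove_offset` switch and the toy data are illustrative assumptions.

```python
import math


def hnr_db(periods, remove_offset=True):
    """Two-pass HNR in dB (sketch of Eq. 5.3 with M_i = 1).

    periods: list of pitch-periods, each a list of samples. All periods are
    truncated to the shortest length, which plays the role of T_h here.
    """
    n = len(periods)
    nh = min(len(p) for p in periods)
    # Pass 1: harmonic component estimate v_h = average across periods
    vh = [sum(p[t] for p in periods) / n for t in range(nh)]
    c = sum(vh) / nh if remove_offset else 0.0
    harmonic = n * sum(v * v for v in vh) - n * c * c * nh
    # Pass 2: noise energy = squared deviation of each period from v_h
    noise = sum((p[t] - vh[t]) ** 2 for p in periods for t in range(nh))
    if noise == 0.0:
        return math.inf
    return 10.0 * math.log10(harmonic / noise)


# Toy data: the same four-sample period with a small disturbance in sample 0.
clean = [[0.1, 1.0, 0.0, -1.0], [-0.1, 1.0, 0.0, -1.0]]
shifted = [[s + 5.0 for s in p] for p in clean]   # same data with a DC offset
print(hnr_db(clean), hnr_db(shifted), hnr_db(shifted, remove_offset=False))
```

Disabling the offset term on the shifted data inflates the result, which illustrates the remark that a data offset affects the numerator but not the denominator.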
The sensitivity to jitter caused by choosing T_h = T_imax as in Eq. (5.2) can be reduced by using the alternative T_h = T_imin, where T_imin is the minimum pitch-period length. Similar methods for reduction of jitter and shimmer effects were discussed in Cox et al. (1986b, 1989b) and Hillenbrand (1987).

A method for extending short periods is required in Eq. (5.3) for instances when the integration range (T_h) exceeds T_imin. However, as previously noted, the originally proposed method of appending zeros leads to a variable sensitivity to jitter, depending on the waveform event used for pitch-period demarcation. An intuitive alternative is to extend short pitch-periods with the average of all longer pitch-periods. It can be shown that the following HNR implicitly adopts this assumption.

\[ \mathrm{HNR}' = 10 \log_{10} \frac{\displaystyle\int_0^{T_{i\max}} n(\tau)\, v_h^2(\tau)\, d\tau - N C^2 T_{i\mathrm{ave}}}{\displaystyle\sum_{i=1}^{N} \int_0^{T_i} \left[ M_i v_i(\tau) - v_h(\tau) - C (M_i - 1) \right]^2 d\tau} \tag{5.4} \]

where
T_iave = the average T_i for all i
n(τ) = N minus the number of pitch-period endpoints that occur in the time interval from 0 to τ
v_h(τ) = (1/n(τ)) Σ_{i=1}^{N} v_i(τ)
v_i(τ) = 0 when T_i < τ < T_imax
C = (1/(N*T_iave)) ∫_0^{T_imax} n(τ) v_h(τ) dτ

5.2.1 SINGLE PASS FORMULATIONS

Implementation of the HNR directly from Eq. (5.2), Eq. (5.3) or Eq. (5.4) requires two passes through the data, one for estimation of the harmonic component, followed by one for noise component and energy computations. Formulations are presented in this section that can be computed in a single data pass under certain assumptions. These formulations are useful, as storage and recall of data are often time consuming, and multiple data passes prevent implementation of a real-time analysis program.
The following two formulations were derived by assuming that M_i=1 and completing the square in the denominator of Eq. (5.3) and Eq. (5.4).

\[ \mathrm{HNR} = 10 \log_{10} \frac{N \displaystyle\int_0^{T_h} v_h^2(\tau)\, d\tau - N C^2 T_h}{\displaystyle\sum_{i=1}^{N} \int_0^{T_h} v_i^2(\tau)\, d\tau - N \int_0^{T_h} v_h^2(\tau)\, d\tau} \qquad \text{when } M_i = 1 \tag{5.5} \]

\[ \mathrm{HNR}' = 10 \log_{10} \frac{\displaystyle\int_0^{T_{i\max}} n(\tau)\, v_h^2(\tau)\, d\tau - N C^2 T_{i\mathrm{ave}}}{\displaystyle\sum_{i=1}^{N} \int_0^{T_i} v_i^2(\tau)\, d\tau - \int_0^{T_{i\max}} n(\tau)\, v_h^2(\tau)\, d\tau} \qquad \text{when } M_i = 1 \tag{5.6} \]

These formulations tend to be more sensitive to precision errors for instances when the noise component is small relative to the harmonic component. However, little error was observed in HNR values computed for this paper, in which real (32-bit) precision was used.

Normalization of the energy in each pitch-period makes Eq. (5.7) a logical choice for applications that require a reduction of the influence of shimmer.

\[ \mathrm{HNR} = -10 \log_{10} \left[ \frac{1}{\displaystyle\int_0^{T_h} v_h^2(\tau)\, d\tau - C^2 T_h} - 1 \right] \tag{5.7} \]

where
v_h(τ) = (1/N) Σ_{i=1}^{N} M_i v_i(τ)
C = (1/T_h) ∫_0^{T_h} v_h(τ) dτ
M_i² = 1 / [ ∫_0^{T_h} v_i²(τ) dτ - C_i² T_h ]
C_i = (1/T_h) ∫_0^{T_h} v_i(τ) dτ

Two modifications of the general form in Eq. (5.3) have been incorporated: an estimate of the offset constant (C_i) was separately computed for each period, and the amplitude scaling factor (M_i) was included in the harmonic component estimate (v_h(τ)). These changes have little effect on the final result if the variation of C_i and M_i is small relative to their respective means. Eq. (5.7) can be computed in a single pass, but does not incorporate the pitch-period extension assumption adopted for Eq. (5.4) and Eq. (5.6). A single-pass formulation that adopts this assumption cannot be derived, as knowledge of the extension is required for determination of M_i.

Table 5.1 lists the number of computations required for various HNR formulations. It was assumed that integrations from τ=0 to τ=T_h are implemented as summations of N_h values.
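With the integrals implemented as summations, the single-pass form of Eq. (5.5) needs only two running quantities: the per-offset sum of the periods and the total energy. The Python sketch below is illustrative; the class name `SinglePassHNR`, its method names and the toy data are assumptions, and every period is assumed to supply at least N_h samples.

```python
import math


class SinglePassHNR:
    """Running sums for the single-pass HNR (sketch of Eq. 5.5, M_i = 1)."""

    def __init__(self, nh):
        self.nh = nh                  # number of samples per integration range
        self.n = 0                    # number of periods seen so far
        self.sum_v = [0.0] * nh       # running sum of v_i at each time offset
        self.energy = 0.0             # running sum of v_i^2 over all periods

    def add_period(self, period):
        """Fold one pitch-period into the running sums; it need not be stored."""
        for t in range(self.nh):
            self.sum_v[t] += period[t]
            self.energy += period[t] * period[t]
        self.n += 1

    def result_db(self):
        n = self.n
        vh = [s / n for s in self.sum_v]
        c = sum(vh) / self.nh
        vh_energy = n * sum(v * v for v in vh)
        harmonic = vh_energy - n * c * c * self.nh
        # Completed square: the noise energy is a difference of two large sums
        noise = self.energy - vh_energy
        return 10.0 * math.log10(harmonic / noise)


accumulator = SinglePassHNR(nh=4)
for period in ([0.1, 1.0, 0.0, -1.0], [-0.1, 1.0, 0.0, -1.0]):
    accumulator.add_period(period)
print(accumulator.result_db())
```

The noise term is obtained by subtracting two nearly equal quantities when the noise is small, which is where the sensitivity to precision errors noted above arises.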
If time for data storage and recall is ignored, the reduction in computational load obtained by using Eq. (5.5) or Eq. (5.6) is small. However, Eq. (5.7) reduces the computational load by a factor of approximately 1.5.

TABLE 5.1: COMPUTATIONAL REQUIREMENTS FOR HNR FORMULATIONS

This table compares the number of mathematical operations required for HNR estimation under various analysis conditions. Eq. (5.5) through Eq. (5.9) are compared with equivalent forms of Eq. (5.3).
"Mi" = the amplitude scaling parameter for the HNR
"N" = the number of pitch-periods
"Nh" = the number of points for integrations of 0 ≤ τ ≤ Th

            ADDITIONS        MULTIPLICATIONS       DIVISIONS   √    LOG

Mi = 1
  Eq. 5.3   3*N*Nh - N+1     (N+2)*Nh + 4          3           0    1
  Eq. 5.5   2*N*Nh - N+1     (N+2)*Nh + 4          3           0    1

Mi = normalized energy for each period
  Eq. 5.3   6*N*Nh - N       (3N+2)*Nh + N+4       N+3         N    1
  Eq. 5.7   (N+3)*Nh - 1     (2N+3)*Nh + 2N+3      N+3         0    1
  Eq. 5.9   3*N*Nh + N       (2N+2)*Nh + 4         N+3         N    1

Mi = maximized contribution of each period
  Eq. 5.3   7*N*Nh - 2N      (4N+2)*Nh + N+4       N+3         0    1
  Eq. 5.8   3*N*Nh + N-1     (2N+2)*Nh + 4         N+3         0    1

5.2.2 RELATIONSHIP WITH A CORRELATION COEFFICIENT

The formulations in this section show a relationship between the HNR and a correlation coefficient when the scale factor $M_i$ is used for amplitude normalization. The following expression, obtained through differentiation of Eq. (5.3), uses an $M_i$ that maximizes the contribution of each period to the result.

\[
\mathrm{HNR} = -10\log_{10}\left[1 - \frac{1}{N}\sum_{i=1}^{N} r_i^2\right]
\quad\text{when}\quad
M_i = \frac{\displaystyle\int_0^{T_h} (v_i(\tau)-C)\,(v_h(\tau)-C)\,d\tau}{\displaystyle\int_0^{T_h} (v_i(\tau)-C)^2\,d\tau} \tag{5.8}
\]

where

\[
r_i = \frac{\displaystyle\int_0^{T_h} (v_i(\tau)-C)\,(v_h(\tau)-C)\,d\tau}{\sqrt{\displaystyle\int_0^{T_h} (v_i(\tau)-C)^2\,d\tau \;\int_0^{T_h} (v_h(\tau)-C)^2\,d\tau}}
\]

Alternatively, the $M_i$ used for Eq. (5.9) is functionally equivalent to the one used for Eq. (5.7), in that period-to-period variations in amplitude are normalized.
\[
\mathrm{HNR} = -10\log_{10}\left[2 - \frac{2}{N}\sum_{i=1}^{N} r_i\right]
\quad\text{when}\quad
M_i^2 = \frac{\displaystyle\int_0^{T_h} (v_h(\tau)-C)^2\,d\tau}{\displaystyle\int_0^{T_h} (v_i(\tau)-C)^2\,d\tau} \tag{5.9}
\]

While these formulations are not directly computable in a single data pass, data in Table 5.1 suggest that they reduce the computational load by a factor of approximately 2.

5.2.3 RELATIONSHIP WITH MILENKOVIC'S MEASURE

The following makes a mathematical connection between the HNR and a measure described in Milenkovic (1987). If $N = 2$, $C = 0$ and if $M_i$ is included in estimation of the harmonic component, then the HNR in Eq. (5.3) becomes

\[
\mathrm{HNR}_2 = 10\log_{10}\left[\frac{\displaystyle\int_0^{T_h} \bigl[M_i v_i(\tau) + M_{i+1} v_{i+1}(\tau)\bigr]^2 d\tau}{\displaystyle\int_0^{T_h} \bigl[M_i v_i(\tau) - M_{i+1} v_{i+1}(\tau)\bigr]^2 d\tau}\right] \tag{5.10}
\]

Through differentiation it can be shown that Eq. (5.10) is maximized when

\[
M_i^2 = \frac{1}{\displaystyle\int_0^{T_h} v_i^2(\tau)\,d\tau} \tag{5.11}
\]

Now, using the notational conventions of this paper, the measure described by Milenkovic is

\[
\mathrm{HNR}_{Milenkovic} = 10\log_{10}\left[\frac{(1+M^2)\displaystyle\int_0^{T_h} v_i^2(\tau)\,d\tau}{\displaystyle\int_0^{T_h} \bigl[v_i(\tau) - M v_{i+1}(\tau)\bigr]^2 d\tau}\right]
\approx 10\log_{10}\left[\frac{1+M^2}{4}\cdot\frac{\displaystyle\int_0^{T_h} \bigl[v_i(\tau) + M v_{i+1}(\tau)\bigr]^2 d\tau}{\displaystyle\int_0^{T_h} \bigl[v_i(\tau) - M v_{i+1}(\tau)\bigr]^2 d\tau}\right] \tag{5.12}
\]

where

$M = A + \sqrt{A^2 + 1}$

$A = \dfrac{0.5}{r_{i,i+1}}\left(X - \dfrac{1}{X}\right)$

$r_{i,i+1} = \dfrac{\displaystyle\int_0^{T_h} (v_i(\tau)-C_i)\,(v_{i+1}(\tau)-C_{i+1})\,d\tau}{\sqrt{\displaystyle\int_0^{T_h} (v_i(\tau)-C_i)^2\,d\tau \;\int_0^{T_h} (v_{i+1}(\tau)-C_{i+1})^2\,d\tau}}$

$C_i = \dfrac{1}{T_h}\displaystyle\int_0^{T_h} v_i(\tau)\,d\tau$

$X^2 = \dfrac{\displaystyle\int_0^{T_h} v_i^2(\tau)\,d\tau}{\displaystyle\int_0^{T_h} v_{i+1}^2(\tau)\,d\tau}$

For a vowel waveform from a normal speaker, $r_{i,i+1}$ is close to 1, and the variation in $M_i$ is relatively small.
Under those conditions it can be shown that $M \approx M_{i+1}/M_i$ and

\[
\mathrm{HNR}_2 \approx \mathrm{HNR}_{Milenkovic} + 3 \tag{5.13}
\]

The difference of 3 dB between $\mathrm{HNR}_2$ and $\mathrm{HNR}_{Milenkovic}$ is not surprising, as it was assumed for the original HNR that $N$ is large enough to effectively remove noise from the harmonic component estimate. Small values of $N$ lead to overestimation of harmonic component energy and underestimation of noise component energy. The difference will be less when $N$ is larger, provided that the assumption of harmonic component invariance is valid.

5.3 THE PITCH-PERIOD CORRELATION FACTOR (CF)

The correlation factor (CF) defined below is a new measure of noise that has theoretical advantages over existing measures.

\[
\mathrm{CF}(K) = 10\log_{10}\left[\frac{1}{N}\sum_{i=1}^{N} R_{i,i+K}\right] \tag{5.14}
\]

where

$R_{i,i+K} = \dfrac{1 + r_{i,i+K}}{1 - r_{i,i+K}}$

$N$ = the number of pitch-period pairs to be included

$K$ = the period separation constant

Substitution of Eq. (5.11) into Eq. (5.10), followed by removal of the logarithm and rearrangement of terms, leads to $R_{i,i+K}$. Gradual changes in the data have less effect on the CF than the HNR, as the harmonic component need not be assumed to be invariant over a number of pitch-periods. In addition, the separation constant "K" accommodates periodic perturbations such as alternate cycle periodicity or one-half sub-harmonic (see Section 5.6).

5.4 THE SPECTRAL HARMONICS-TO-NOISE RATIO (SHNR)

The evenly spaced peaks in the frequency spectrum of a vowel signal are collectively referred to as the spectral harmonic component. It has been observed in narrow band spectrographic displays that the spectral harmonic component becomes increasingly obscured by noise for vowel samples from hoarse speakers (Yanagihara 1967). This provides motivation for quantitative measurement of the relative level of spectral noise.
Kitajima (1981) developed a measure that utilized a frequency domain filtering operation to isolate the noise component. A significant correlation with the level of noise in power spectra obtained from a sound spectrograph was reported. However, the analysis was not synchronized with the pitch-periods, and methods used to minimize leakage were unclear.

Kojima, Gould, Lambiase and Isshiki (1980) developed a pitch synchronized SHNR and reported a significant correlation with subjectively judged hoarseness level. However, their algorithm is time consuming to compute and is limited to short data-segments. This algorithm was generalized here for longer data-segments, and methods for reducing computation time are proposed.

Klingholtz and Martin (1985) developed a SHNR to study the effects of jitter and shimmer on the spectrum of a vowel-like synthetic waveform. It was reported that the amount of spectral noise produced by jitter is much larger than the noise produced by similar levels of shimmer. Because this algorithm uses the spectrum of an unperturbed waveform as a reference, it cannot be directly applied for analysis of real vowel samples.

5.4.1 DESCRIPTION OF SHNR ALGORITHMS

The following rationale is used for pitch synchronous SHNR algorithms. It is assumed that a sustained vowel can be modelled as the sum of an aperiodic noise component, and a harmonic component that is repeated in each pitch-period. The vowel is broken into data-segments ($w_j(k)$), each spanning an integer number of pitch-periods (NPP). If the pitch-periods are all approximately the same length, then the energy from the harmonic component appears at every NPP'th coefficient of the Fourier series of each data-segment.
The sum of the squares of these coefficients provides an estimate of the overall harmonic energy. The sum of the squares of the other coefficients, excluding the zero frequency coefficient, provides an estimate of the noise energy. The following is the algorithm proposed by Kojima et al. (1980).

\[
\mathrm{SHNR}_{Kojima} = 10\log_{10}\left[\frac{\displaystyle\sum_{j=1}^{jmax}\sum_{n=1}^{nmax} |W_j(3n)|^2}{\displaystyle\sum_{j=1}^{jmax}\sum_{n=1}^{nmax} \Bigl(|W_j(3n-1)|^2 + |W_j(3n-2)|^2\Bigr)}\right] \tag{5.15}
\]

where

Each data-segment contains 3 pitch-periods

$w_j(k)$ = the j'th data-segment, $k = 1, 2, \ldots, K$

$W_j(m)$ = the m'th coefficient of the Fourier series for the j'th data-segment, $m = 0, 1, \ldots, K/2$

jmax = the total number of data-segments

nmax = the highest harmonic component in the analysis

For Kojima's analysis, "jmax" was set so that the analysis spanned 325 milliseconds, "nmax" was set for a maximum included frequency of 5500 Hz and the sampling frequency was 22,026 Hz.

Eq. (5.16) and Eq. (5.17) are generalized SHNR algorithms.

\[
\mathrm{SHNR}_{ros}(NPP,NSKIP) = \frac{10}{jmax}\sum_{j=1}^{jmax}\log_{10}\left[\frac{\displaystyle\sum_{n=1}^{nmax} |W_j(NPP\cdot n)|^2}{\displaystyle\sum_{n=1}^{nmax}\;\sum_{m=NSKIP+1}^{NPP-NSKIP-1} |W_j(NPP\cdot n - m)|^2}\right] \tag{5.16}
\]

where

NPP = the number of pitch-periods in each data-segment

NSKIP = number of excluded components each side of the harmonic peaks

$NPP \ge 2\cdot NSKIP + 2$

\[
\mathrm{SHNR}_{sor}(NPP,NSKIP) = \frac{10}{jmax}\sum_{j=1}^{jmax}\log_{10}\left[\sum_{n=1}^{nmax}\frac{|W_j(NPP\cdot n)|^2}{\displaystyle\sum_{m=NSKIP+1}^{NPP-NSKIP-1} |W_j(NPP\cdot n - m)|^2}\right] - 10\log_{10}[\,nmax\,] \tag{5.17}
\]

"NPP" allows for variation of the number of pitch-periods per data-segment. "NSKIP" provides a means for reducing the demand for high spectral resolution. The difference between the two formulations, as indicated by their subscripts, is that Eq. (5.16) is computed as a ratio-of-sums, while Eq. (5.17) is computed as an averaged sum-of-ratios. Because small spectral components are overwhelmed by large components in the summations of Eq.
(5.16), it is primarily determined by the level of noise near the dominant formant frequency. Eq. (5.17) does not have this problem, and consequently is more representative of the harmonics-to-noise ratio throughout the frequency spectrum.

5.4.2 ESTIMATION OF FOURIER SERIES COEFFICIENTS

A brief nontechnical review of resolution, leakage, pitch synchronization and other issues affecting spectrum analysis of vowels can be found in Appendix C.

Kojima et al. (1980) were prevented from using fast Fourier transforms (FFTs) in their analysis, as an algorithm that can be applied to data-segments that contain an arbitrary number of points was required. The usual practice of appending zeros was not appropriate, as this alters the spacing of components in the computed spectrum estimate. Digital resampling through Lagrange interpolation was used here as a means for simultaneously meeting data-segment length requirements of the FFT and the SHNR algorithms. Specifically, each data-segment was "resampled" through interpolation in order to obtain new sequences that spanned L points, where L is a power of 2. This method significantly reduced computation time with little loss of accuracy (see Section 7.5.2). Details of Lagrange resampling can be found in Appendix D.

Spectral leakage is an important consideration in SHNR algorithms, as the harmonic peaks are frequently much larger than the noise. Pitch synchronization serves to control spectral leakage in Eq. (5.16) and Eq. (5.17). Additional control can be obtained by using a nonrectangular window function and altering the NSKIP parameter to account for the associated loss of spectral resolution (see Section 7.5.6).
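The pitch synchronous SHNR idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: each data-segment is assumed to hold exactly NPP equal-length periods, a direct DFT stands in for the FFT with resampling, the segment-averaged forms follow a reading of Eq. (5.16) and Eq. (5.17), and the names `dft_power`, `shnr` and `make_segments` together with the synthetic signal are hypothetical.

```python
import cmath
import math
import random


def dft_power(seg):
    """|W(m)|^2 for m = 0..K/2 by direct DFT (adequate for short segments)."""
    k_len = len(seg)
    return [abs(sum(x * cmath.exp(-2j * math.pi * m * k / k_len)
                    for k, x in enumerate(seg))) ** 2
            for m in range(k_len // 2 + 1)]


def shnr(segments, npp, nskip, nmax):
    """Return (ratio-of-sums, sum-of-ratios) SHNR in dB, averaged over segments."""
    assert npp >= 2 * nskip + 2, "need at least one noise bin between harmonics"
    ros = sor = 0.0
    for seg in segments:
        p = dft_power(seg)
        # Harmonic energy sits at every NPP'th coefficient.
        harm = [p[npp * n] for n in range(1, nmax + 1)]
        # Noise energy: bins below each harmonic, skipping NSKIP on each side.
        noise = [sum(p[npp * n - m] for m in range(nskip + 1, npp - nskip))
                 for n in range(1, nmax + 1)]
        ros += math.log10(sum(harm) / sum(noise))
        sor += math.log10(sum(h / q for h, q in zip(harm, noise)))
    jmax = len(segments)
    return 10 * ros / jmax, 10 * sor / jmax - 10 * math.log10(nmax)


def make_segments(noise, jmax=8, period=16, npp=4, seed=0):
    """Hypothetical test signal: three harmonics plus scaled Gaussian noise."""
    rng = random.Random(seed)
    return [[math.sin(2 * math.pi * k / period)
             + 0.5 * math.sin(4 * math.pi * k / period)
             + 0.25 * math.sin(6 * math.pi * k / period)
             + noise * rng.gauss(0, 1)
             for k in range(period * npp)] for _ in range(jmax)]


for level in (0.01, 0.1):
    ros, sor = shnr(make_segments(level), npp=4, nskip=1, nmax=3)
    print(f"noise={level}: ratio-of-sums={ros:.1f} dB, sum-of-ratios={sor:.1f} dB")
```

Raising the noise amplitude by a factor of 10 lowers both estimates by roughly 20 dB. The two forms differ in value because the ratio-of-sums is dominated by the strongest harmonic, whereas the sum-of-ratios weights each harmonic equally.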
5.4.3 OTHER CONSIDERATIONS

Misleading SHNR estimates can result when the number of pitch-periods per data-segment (NPP) is small. It was shown in Section 7.5.5 that SHNR estimates are highly dependent on jitter. Because there is an increased probability of encountering data-segments that are free of jitter perturbation when NPP is small, misleadingly large SHNR estimates for some data-segments can occur, resulting in a positive bias to the averages in Eq. (5.16) and Eq. (5.17).

Eq. (5.17) was proposed as a means for reducing dependence on the formant structure of the vowel. Other alternatives include logarithmic transformation prior to SHNR computation or pre-filtering to flatten the spectrum. Such approaches have an increased sensitivity to external noise artifact, as low-power regions of the frequency spectrum are emphasized. A possible means for controlling this sensitivity is to limit the frequency range of computation.

Through the use of a nonrectangular window, it is possible to develop a measure of spectral noise that does not require pitch synchronization. The performance of such an algorithm depends on the Fourier transform of the window function. For example, the main lobe of the Fourier transform of a Hanning window is 4 DFT points wide (see Appendix C, Figure C1). Thus, if a Hanning window is used, the harmonic peaks must be separated by at least 5 DFT points in order to resolve a noise component. This implies that the data-segments must be at least 5 pitch-periods long.

5.5 THE DIRECTIONAL PERTURBATION QUOTIENT (DPQ)

The following is a generalized formulation for the directional perturbation quotient (DPQ) (Hecker & Kreul 1971).
\[
\mathrm{DPQ}(CTR) = \frac{1}{N}\sum_{i=1}^{N} \mathrm{SIGNCHANGE}[\,Z(i),CTR\,] \tag{5.18}
\]

where

$Z(i)$ = the perturbation of the i'th sample point from its predicted value

CTR = a center-limit constant (see SIGN[] below)

SIGNCHANGE[ Z(i),CTR ] = sign change indicator
  = 1 if SIGN[Z(i),CTR] = -SIGN[Z(i-1),CTR]
  = 0 otherwise

SIGN[Z(i),CTR]
  = 1 when $Z(i) > CTR$
  = -1 when $Z(i) < -CTR$
  = SIGN[Z(i-1),CTR] when $-CTR \le Z(i) \le CTR$

This algorithm measures the rate of change in the sign of the perturbation, rather than its magnitude. Hecker and Kreul reported that the DPQ for jitter perturbation was superior to a measure of jitter magnitude at identifying subjects with laryngeal cancer.

The following measure of perturbation is the same as was used in the RAP algorithm (Eq. (5.1)).

\[
Z(i) = X(i) - X_p(i,M,K) \tag{5.19}
\]

where

$X(i)$ = the i'th value of the sequence
  = pitch-period duration for jitter
  = pitch-period peak amplitude for shimmer
  = $R_{i,i+K}$ for noise (see Eq. (5.14))

$X_p(i,M,K)$ = moving average defined in Eq. (5.1)

$K$ = increment between points in the moving average

$M$ = number of points in the moving average

Hecker and Kreul's DPQ is obtained by setting CTR=0, M=1 and K=1.

5.6 CYCLIC PERTURBATION FACTORS

The development of the cyclic perturbation factors was motivated by the observation that not all perturbations in pathological speakers are random. A greater degree of periodicity in jitter and shimmer perturbations was reported for subjects with laryngeal cancer (Koike 1968; Von-Leden & Koike 1970; Iwata 1972). In addition, it is not uncommon to encounter vowels with alternate cycle periodicity or one-half sub-harmonic, where the i'th pitch-period has a closer resemblance to the (i-2)'th pitch-period than to its nearest neighbors. The following are new algorithms for quantifying cyclic perturbations.
\[
\mathrm{CPF}_{RAP}(K) = 10\log_{10}\left[\frac{\mathrm{RAP}(CTR,M,1)}{\mathrm{RAP}(CTR,M,K)}\right] \tag{5.20}
\]

where RAP(CTR,M,K) is defined in Eq. (5.1)

\[
\mathrm{CPF}_{CF1}(K) = 10\log_{10}\left[\frac{1}{N}\sum_{i=1}^{N}\frac{R_{i,i+K}}{R_{i,i+1}}\right] \tag{5.21}
\]

\[
\mathrm{CPF}_{CF2}(K) = 10\log_{10}\left[\frac{\displaystyle\sum_{i=1}^{N} R_{i,i+K}}{\displaystyle\sum_{i=1}^{N} R_{i,i+1}}\right] = \mathrm{CF}(K) - \mathrm{CF}(1) \tag{5.22}
\]

where $R_{i,i+K}$ is defined in Eq. (5.14)

These measures are approximately zero for random perturbations, and greater than zero in the presence of a cyclic perturbation. The parameter "K" determines the repetition frequency that the algorithm is sensitive to.

The term "diplophonia" was deliberately avoided in naming of the CPFs. While quantification of diplophonia is an obvious application, it is a perceived quality and the correlation should be established experimentally.

5.7 SUMMARY

The moving average in the relative average perturbation (RAP) (Koike 1973) was generalized for variation of its length and for measurement of cyclic perturbations. Considerations for selection of analysis parameters were discussed.

The harmonics-to-noise ratio (HNR) (Yumoto et al. 1982) was modified for removal of a data offset and for reduction of the influence of jitter and shimmer. Formulations for efficient computation in a single pass through the data were presented, thus simplifying the implementation of a real-time analysis. A relationship with a correlation coefficient was established, along with a relationship with a measure described in Milenkovic (1987). Finally, a new measure of noise was presented that does not require the harmonic component to be invariant. This measure was called the Correlation Factor (CF).

The measure of spectral noise described in Kojima et al.
(1980) was generalized for variation of the length of the analysis window and for reduction of dependence on the spectral characteristics of the harmonic component. Considerations for spectral estimation and control of spectral leakage were discussed. A method for reducing the computation time through use of an FFT was described.

The directional perturbation quotient (DPQ) (Hecker & Kreul 1971) was generalized in the same manner as the RAP.

Finally, new measures called cyclic perturbation factors (CPFs) were introduced for measurement of cyclic or periodic perturbations. These measures were based on the RAP and the CF.

CHAPTER 6: ANALYSIS OF ERRORS IN VOWEL PERTURBATION MEASURES

This chapter provides an analysis of the effect of additive measurement errors on three measures of vowel perturbation. The relative average perturbation (RAP) (Eq. (5.1)), the harmonics-to-noise ratio (HNR) (Eq. (5.3)), and the directional perturbation quotient (DPQ) (Eq. (5.18)) were analyzed. Expressions for estimating these algorithms from the level of perturbation and error were derived. Sampling conditions and analysis techniques to minimize errors were recommended.

6.1 THE RELATIVE AVERAGE PERTURBATION (RAP)

Concern has been expressed in the literature regarding the effects of measurement errors on magnitude-based measures of jitter and shimmer such as the RAP (e.g., Heiberger & Horii 1982; Horii 1979; Titze et al. 1987). It has been shown experimentally that time quantization tends to cause an overestimation in measures of jitter when the perturbation level is low. Recording conditions and equipment have also been shown to contribute error (Doherty & Shipp 1988). Inevitable errors in pitch-period demarcation and the wide variety of demarcation methods further complicate the comparison of results.
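The overestimation caused by time quantization is easy to reproduce with a small simulation. The Python sketch below is illustrative only: it assumes the three-point form of the RAP (M=3, K=1, a_j=1/3, CTR=0) as a stand-in for Eq. (5.1), generates period lengths with a uniformly distributed perturbation, and rounds the pitch-period markers to the nearest sample. The function names, seed and parameter values are hypothetical.

```python
import random


def rap3(x):
    """Three-point RAP (assumed form: M=3, K=1, a_j=1/3, CTR=0)."""
    n = len(x)
    # Mean absolute deviation of each point from its 3-point moving average...
    dev = sum(abs(x[i] - (x[i - 1] + x[i] + x[i + 1]) / 3.0)
              for i in range(1, n - 1))
    # ...normalized by the mean of the sequence.
    return (dev / (n - 2)) / (sum(x) / n)


def simulate(mean_period, level, n=500, seed=1):
    """Return (error-free RAP, RAP after rounding markers to integer samples)."""
    rng = random.Random(seed)
    periods = [mean_period * (1.0 + level * rng.uniform(-1.0, 1.0))
               for _ in range(n)]
    # Pitch-period markers are the cumulative sum of the period lengths.
    markers = [0.0]
    for p in periods:
        markers.append(markers[-1] + p)
    quantized = [round(m) for m in markers]
    q_periods = [b - a for a, b in zip(quantized, quantized[1:])]
    return rap3(periods), rap3(q_periods)


clean, quantized = simulate(mean_period=39.7, level=0.005)
print(f"error-free RAP = {clean:.4f}, quantized-marker RAP = {quantized:.4f}")
```

With a mean period of about 40 samples and a 0.5% perturbation, the quantized estimate is several times larger than the error-free one, which is the low-perturbation overestimation described above.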
The analysis presented in this section serves to quantify the effects of measurement errors. Expressions are provided for estimating the RAP based on the level of perturbation and error. Sampling conditions required to keep quantization effects at an acceptable level are recommended. Methods of compensating for measurement errors are described. These results were also presented in Cox et al. (in press). The RAP was defined in Eq. (5.1).

6.1.1 EXPECTED VALUES FOR ANALYSIS OF JITTER

When the RAP is used for measurement of jitter, the sequence being analyzed ($X(i)$) is derived from the difference of pitch-period markers. The effect of an independent additive random error ($\delta_i$) in the markers can be expressed as follows.

For M = 1, 2:

\[
\mathrm{RAP}_j(\delta,CTR,M,K) = \frac{\displaystyle\sum_{i=K+1}^{N} \Bigl| X(i) - X(i-K) + \delta_{i+1} - \delta_i + \delta_{i-K} - \delta_{i-K+1} \Bigr|_{CTR}}{\displaystyle M\,(N-K)\,\frac{1}{N}\sum_{i=1}^{N}\bigl[X(i) + \delta_{i+1} - \delta_i\bigr]}
\]

For M > 2:

\[
\mathrm{RAP}_j(\delta,CTR,M,K) = \frac{\displaystyle\sum_{i=1+K\cdot MID}^{N-K\cdot HI} \Bigl| (1-a_0)X(i) - \sum_{j=LO}^{-1} a_j X(i+Kj) - \sum_{j=1}^{HI} a_j X(i+Kj) + (1-a_0)(\delta_{i+1}-\delta_i) - \sum_{j=LO}^{-1} a_j(\delta_{i+1+Kj}-\delta_{i+Kj}) - \sum_{j=1}^{HI} a_j(\delta_{i+1+Kj}-\delta_{i+Kj}) \Bigr|_{CTR}}{\displaystyle \bigl(N-K(M-1)\bigr)\,\frac{1}{N}\sum_{i=1}^{N}\bigl[X(i) + \delta_{i+1} - \delta_i\bigr]} \tag{6.1}
\]

where

$\mathrm{RAP}_j(\delta,CTR,M,K)$ = RAP measurement of jitter

$X(i)$ = sequence of pitch-period durations

$\delta_i$ = the error in the i'th pitch-period marker

$N$ = the number of points in $X(i)$

CTR = a center-limit constant (see $|x|_{CTR}$ below)

$M$ = the number of points in the moving average

$K$ = increment between points in the moving average

$|x|_{CTR}$ = a center-limited absolute value of $x$
  = $|x|$ when $|x| > CTR$
  = 0 when $|x| \le CTR$

MID = int( M/2 )

HI = int( (M-1)/2 )

LO = -int( (M+1)/2 )

int(x) = truncation to the nearest lower integer

When K=1, a number of common terms must be merged.
Thus, for M = 1, 2:

\[
\mathrm{RAP}_j(\delta,CTR,M,1) = \frac{\displaystyle\sum_{i=2}^{N} \Bigl| X(i) - X(i-1) + \delta_{i+1} - 2\delta_i + \delta_{i-1} \Bigr|_{CTR}}{\displaystyle M\,(N-1)\,\frac{1}{N}\sum_{i=1}^{N}\bigl[X(i) + \delta_{i+1} - \delta_i\bigr]}
\]

and for M > 2:

\[
\mathrm{RAP}_j(\delta,CTR,M,1) = \frac{\displaystyle\sum_{i=1+MID}^{N-HI} \Bigl| (1-a_0)X(i) - \sum_{j=LO}^{-1} a_j X(i+j) - \sum_{j=1}^{HI} a_j X(i+j) + (1-a_0+a_1)\,\delta_{i+1} - (1-a_0+a_{-1})\,\delta_i - a_{HI}\,\delta_{i+HI+1} + a_{LO}\,\delta_{i+LO} - \sum_{j=LO+1}^{-1}(a_{j-1}-a_j)\,\delta_{i+j} - \sum_{j=2}^{HI}(a_{j-1}-a_j)\,\delta_{i+j} \Bigr|_{CTR}}{\displaystyle (N-M+1)\,\frac{1}{N}\sum_{i=1}^{N}\bigl[X(i) + \delta_{i+1} - \delta_i\bigr]} \tag{6.2}
\]

According to the central limit theorem, the probability density function (pdf) of the numerator of Eq. (6.1) or Eq. (6.2) can be modeled as the positive half of a zero mean Gaussian if it is assumed that $X(i)$ and $\delta_i$ are mutually independent random variables of $i$. If $\delta_i$ has negligible effect on the denominator, then

\[
E[\,\mathrm{RAP}_j(\delta,CTR,M,K)\,] \approx \frac{1}{XM}\cdot\frac{2}{\sqrt{2\pi\sigma_j^2}}\int_{CTR}^{\infty} x\,e^{-x^2/2\sigma_j^2}\,dx
= \frac{1}{XM}\sqrt{\frac{2\sigma_j^2}{\pi}}\;e^{-CTR^2/2\sigma_j^2} \tag{6.3}
\]

where

$E[\,]$ = the expected value

$XM = E[X(i)] \approx \dfrac{1}{N}\displaystyle\sum_{i=1}^{N} X(i)$

$\sigma_j^2 = C_x\,\sigma_x^2 + C_\delta\,\sigma_\delta^2$

$\sigma_x^2$ = error-free variance of $X(i)$

$\sigma_\delta^2$ = variance of pitch-period demarcation errors

$C_x = 2/M^2$ for M = 1, 2

$C_x = (1-a_0)^2 + \displaystyle\sum_{j=LO}^{-1} a_j^2 + \sum_{j=1}^{HI} a_j^2$ for M > 2

$C_\delta = 6/M^2$ for K = 1, M = 1, 2

$C_\delta = 4/M^2$ for K > 1, M = 1, 2

$C_\delta = (1-a_0+a_1)^2 + (1-a_0+a_{-1})^2 + a_{HI}^2 + a_{LO}^2 + \displaystyle\sum_{j=LO+1}^{-1}(a_{j-1}-a_j)^2 + \sum_{j=2}^{HI}(a_{j-1}-a_j)^2$ for K = 1, M > 2

$C_\delta = 2\left[(1-a_0)^2 + \displaystyle\sum_{j=LO}^{-1} a_j^2 + \sum_{j=1}^{HI} a_j^2\right]$ for K > 1, M > 2

A common selection for $a_j$ is $a_j = 1/M$. For this special case

\[
\sigma_j^2 = \frac{M-1}{M}\,\sigma_x^2 + \frac{2M^2+2}{M^2}\,\sigma_\delta^2 \qquad K=1,\; M>2,\; a_j = 1/M
\]
\[
\sigma_j^2 = \frac{M-1}{M}\,\bigl(\sigma_x^2 + 2\sigma_\delta^2\bigr) \qquad K>1,\; M>2,\; a_j = 1/M \tag{6.4}
\]

6.1.1.1 PREDICTING AND COMPENSATING FOR ERROR EFFECTS

Some useful expressions can be obtained through manipulation of Eq. (6.3). When CTR=0, the square of Eq. (6.3) can be separated into a term dependent on $\sigma_x^2$, and a term dependent on $\sigma_\delta^2$. It follows that

\[
\mathrm{RAP}_j^2(0,0,M,K) \approx \mathrm{RAP}_j^2(\delta,0,M,K) - \frac{2\,C_\delta\,\sigma_\delta^2}{\pi\,XM^2} \tag{6.5}
\]

Given $\sigma_\delta^2$, Eq. (6.5) can be used to compensate for a demarcation error. In addition, rearrangement of Eq.
(6.5) leads to a quadratic function of $Q_j$, where $Q_j$ is the percent overestimation in RAP estimates. The solution to that quadratic (Eq. (6.6)) can be used to predict the effect of an error.

\[
Q_j = 100\sqrt{1 + \frac{2\,C_\delta\,\sigma_\delta^2}{\pi\,XM^2\,\mathrm{RAP}_j^2(0,0,M,K)}} \;-\; 100 \tag{6.6}
\]

where

$Q_j$ = percent overestimate in jitter analysis
  $= 100\cdot\dfrac{\mathrm{RAP}_j(\delta,0,M,K) - \mathrm{RAP}_j(0,0,M,K)}{\mathrm{RAP}_j(0,0,M,K)}$

An alternative method of compensating for measurement errors is to use a center-limit. Inspection of Eq. (6.3) reveals that CTR reduces the expected value of the RAP. Thus, the overestimation caused by quantization or other errors can be counteracted by a suitable choice of CTR. In mathematical terms, the goal is to choose $CTR = CTR_j$ such that $\mathrm{RAP}_j^2(\delta,CTR_j,M,K) \approx \mathrm{RAP}_j^2(0,0,M,K)$. It follows from Eq. (6.3) that

\[
CTR_j^2 \approx C_x\,\sigma_x^2\,(1+R_j)\,\ln[\,1+R_j\,] \tag{6.7}
\]

where

$CTR_j$ = center-limit for error compensation for jitter

$R_j = C_\delta\,\sigma_\delta^2 \,/\, (C_x\,\sigma_x^2)$

The center-limits recommended in Eq. (6.7) are plotted in Figure 6.1. Since the perturbation standard deviation ($\sigma_x$) is not generally known a priori, a constant center-limit must be chosen from the theoretically optimal values. Choice of a value that is optimal for $\sigma_x < 2\sigma_\delta$ is not advisable, as a bias will be introduced for sequences with larger perturbations. Another consideration is that perturbations in quantized data are also quantized. Thus, values of CTR between k/M and (k+1)/M are effectively equivalent for any integer value of k. A reasonable compromise is to choose a center-limit that is optimal when $\sigma_x \approx 2\sigma_\delta$, leading to $CTR_j \approx 0.5$ quantization levels.

FIGURE 6.1: CENTER-LIMITS TO COUNTERACT QUANTIZATION IN THE RAP

Center-limits recommended for counteracting the effects of quantization on the RAP are plotted as a function of the standard deviation of the perturbation.
The recommendations were obtained using Eq. (6.7) and Eq. (6.11), with $\sigma_\delta^2$ and $\sigma_\varepsilon^2$ set equal to 1/12 to simulate quantization. The RAP is defined in Eq. (5.1). Data are for RAP with CTR=0, K=1 and $a_j = 1/M$.
"NMAVE" = the number of points in the moving average
"Qnt-lvls" = quantization levels
"Std-dev" = standard deviation

[Figure 6.1 plot: panel title SHIMMER ANALYSIS; curves for NMAVE = 2, 3, 4, 5; x-axis PERTURBATION STD-DEV (qnt-lvls).]

6.1.2 EXPECTED VALUES IN ANALYSIS OF SHIMMER

The RAP is typically applied to sequences of pitch-period peak amplitudes for measurement of shimmer. If it is assumed that the error is independent of the peak amplitude, the following expressions can be obtained.

\[
E[\,\mathrm{RAP}_s(\varepsilon,CTR,M,K)\,] \approx \frac{1}{XM}\sqrt{\frac{2\sigma_s^2}{\pi}}\;e^{-CTR^2/2\sigma_s^2} \tag{6.8}
\]

where

$\mathrm{RAP}_s(\varepsilon,CTR,M,K)$ = RAP measurement of shimmer

$\varepsilon_i$ = the error in amplitude of the i'th pitch-period

$\sigma_s^2 = C_x\,(\sigma_x^2 + \sigma_\varepsilon^2)$

$\sigma_\varepsilon^2$ = the variance of the error

\[
\mathrm{RAP}_s^2(0,0,M,K) \approx \mathrm{RAP}_s^2(\varepsilon,0,M,K) - \frac{2\,C_x\,\sigma_\varepsilon^2}{\pi\,XM^2} \tag{6.9}
\]

\[
Q_s = 100\sqrt{1 + \frac{2\,C_x\,\sigma_\varepsilon^2}{\pi\,XM^2\,\mathrm{RAP}_s^2(0,0,M,K)}} \;-\; 100 \tag{6.10}
\]

where

$Q_s$ = percent overestimate in shimmer analysis
  $= 100\cdot\dfrac{\mathrm{RAP}_s(\varepsilon,0,M,K) - \mathrm{RAP}_s(0,0,M,K)}{\mathrm{RAP}_s(0,0,M,K)}$

\[
CTR_s^2 \approx C_x\,\sigma_x^2\,(1+R_s)\,\ln[\,1+R_s\,] \tag{6.11}
\]

where

$CTR_s$ = center-limit for error compensation in shimmer

$R_s = \sigma_\varepsilon^2 / \sigma_x^2$

Inspection of Figure 6.1 reveals that the center-limit recommended by Eq. (6.11) is approximately 0.25 quantization levels when $\sigma_x \approx 2\sigma_\varepsilon$. For quantized data, this value is not meaningful unless M is greater than 3.

6.1.3 VERIFICATION

The preceding analysis was verified using pseudo-random sequences. Each sequence was comprised of 500 real (32-bit) numbers. The sequences had a fixed mean and an additive evenly distributed perturbation. The perturbations were produced using the FORTRAN library function RAN(30000,20000). The maximum
The maximum CH 6: MATH ANALYSIS 80 excursion of the perturbation was a percentage of the mean. The sequences had perturbation l e v e l s between 0.5% and 20%. Level quantization and pitch-period marker quantization were simulated by rounding to the nearest integer. Sequences with means of 19.7, 39.7, 59.7, ..., 199.7 were used. F r a c t i o n a l means were used so that there was v a r i a t i o n i n the alignment with quantization boundaries for simulations of pitch-period marker quantization. The correspondence between the predicted and measured RAP i s i l l u s t r a t e d i n Figure 6.2. As expected, quantization lead to an overestimation of the RAP. A l l error-free RAP estimates were within 4% of t h e i r predicted values, and there was a close match for quantized data when the mean was greater than 80. However, the errors at lower means suggest that the assumptions i n the analysis are fa u l t y when the standard deviation of the perturbation i s les s than 0.5 quantization l e v e l s . The "compensated" and "center-limited" curves i n Figure 6.2 i l l u s t r a t e the methods for counteracting quantization e f f e c t s . The estimation bias was e f f e c t i v e l y removed for data means (XM) greater than 80 quantization l e v e l s . The c e n t e r - l i m i t approach was roughly comparable to the expected value approach. While the ce n t e r - l i m i t was s l i g h t l y l e s s e f f e c t i v e at removing the RAPj overestimate, i t was also l e s s prone to overcompensation. CH 6: MATH ANALYSIS 81 FIGURE 6.2: QUANTIZATION EFFECTS IN THE RAP. Predicted and measured values of the RAP for data quantized to the nearest integer are plotted as a function of the data mean. The perturbation i n the data was evenly d i s t r i b u t e d with a bound that was proportional to the mean. The RAP was defined i n Eq. (5.1). Data are for RAP with CTR=0, M=3, K=l and aj=l/3. 
"RAPS" = analysis of shimmer using the RAP "RAPj" = analysis of j i t t e r using the RAP "Compensated" = obtained using Eq. (6.5) and Eq. (6 .9) "Center-Limited" = obtained using a c e n t e r - l i m i t of 0.5 i i i i i i i i i i i i ~ T i i i i — i — " — i u •—i T T i i f r ^ r ^ * ' i ' i I i ) I i i i i 0 100 200 0 100 2 0 0 0 . 5 % P e r t u r b a t i o n DATA MEAN ( q n t - l v l s ) DATA MEAN ( q n t - l v l s ) CH 6: MATH ANALYSIS 6.1.4 RECOMMENDED SAMPLING CONDITIONS; JITTER ANALYSIS 82 Ignoring for the moment other sources of error, i t would be useful to know the sampling frequency at which rounding of pitch-period markers to the nearest sample point i s acceptable. By assuming that the error i s evenly d i s t r i b u t e d between - l / 2 f s a m p and l / 2 f s a m p , where f s a m p i s the sampling frequency, i t follows from Eq. (6.5) that C 6 * F 0 m a x * * 10000 S F m i n 2 _ (6.12) 6TT *RAPj 2(0,0,M,K) m i n * ( 2 0 0 * f i j m a x + Qj max 2 > where R A P j ( 0 , 0 , M , K ) = smallest value of RAPj Qjmax = maximum acceptable overestimation i n RAPj (%) RAPj(a,0,M,K) m i n - RAPj(0,0,M,K) m i n = ioo * -RAPj(0,0,M,K) m l n F 0 m a x = the maximum fundamental frequency S F m i n = the minimum recommended sampling frequency for j i t t e r analysis when pitch-period markers are rounded to the nearest sample point S F m i n for various l e v e l s of error, F 0 m a x and RAP are compiled i n Table 6.1. At low l e v e l s of RAP, the sampling frequencies recommended to keep the error below 15% were well above the sampling frequencies t y p i c a l l y used i n published data. High re s o l u t i o n and i n t e r p o l a t i o n are c l e a r l y indicated, as the tabulated RAP values are i n the range of normal human phonation (e.g., T i t z e et a l . 1987). 
TABLE 6.1: MINIMUM SAMPLING FREQUENCIES FOR JITTER ANALYSIS USING THE RAP

This table compiles the minimum recommended sampling frequencies, in kHz, for analysis of jitter using the RAP. Pitch-period markers were assumed to be quantized to the nearest sample point. The RAP is defined in Eq. (5.1). Recommendations were obtained using Eq. (6.12), and are for RAP with CTR=0, M=3, K=1 and $a_j = 1/3$. "F0" is the fundamental frequency.

                RECOMMENDED SAMPLING FREQUENCIES (kHz)
LOWEST          FOR VARIOUS ERROR LEVELS (%)
LEVEL           F0 = 150 Hz        F0 = 200 Hz        F0 = 250 Hz
OF RAP          5%   15%   25%     5%   15%   25%     5%   15%   25%

0.001           161   91    69     215  121    92     269  152   115
0.003            54   31    23      72   41    31      90   51    39
0.005            33   23    14      43   25    19      54   31    23

6.1.5 RECOMMENDED SAMPLING CONDITIONS; SHIMMER ANALYSIS

Level quantization and time quantization both contribute error in analyses of shimmer. Malalignment with the pitch-period peaks due to time quantization leads to underestimation of the peaks. Level quantization adds further error. In the analyses that follow, it was assumed that the two errors are independent and that other influences can be ignored. Thus,

\[
\sigma_\varepsilon^2 \approx \sigma_a^2 + \sigma_\rho^2 \tag{6.13}
\]

where

$\sigma_a^2$ = variance in $X(i)$ due to level quantization

$\sigma_\rho^2$ = variance in $X(i)$ due to time quantization

6.1.5.1 LEVEL QUANTIZATION

If errors in the pitch-period peak amplitudes are exclusively due to level quantization, and if the units of $X(i)$ are quantization levels, then $\sigma_\varepsilon^2 = \sigma_a^2 = 1/12$. It follows from Eq. (6.9) that

\[
XM_{min}^2 \approx \frac{C_x \cdot 10000}{6\pi\,\mathrm{RAP}_s^2(0,0,M,K)_{min}\,\bigl(200\,Q_{smax} + Q_{smax}^2\bigr)} \tag{6.14}
\]

where

$Q_{smax}$ = maximum acceptable overestimation in $\mathrm{RAP}_s$ (%)
  $= 100\cdot\dfrac{\mathrm{RAP}_s(\varepsilon,0,M,K)_{min} - \mathrm{RAP}_s(0,0,M,K)_{min}}{\mathrm{RAP}_s(0,0,M,K)_{min}}$

$\mathrm{RAP}_s(0,0,M,K)_{min}$ = smallest value of $\mathrm{RAP}_s$

$XM_{min}$ = minimum recommended data mean for $\mathrm{RAP}_s$

Table 6.2 contains $XM_{min}$ for various levels of error and $\mathrm{RAP}_s$.
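Eq. (6.12) and Eq. (6.14) are simple enough to evaluate directly. The Python sketch below is offered as a convenience, assuming the special case $a_j = 1/M$ with K=1 and M>2, for which the coefficients reduce to $C_x = (M-1)/M$ and $C_\delta = (2M^2+2)/M^2$ as in Eq. (6.4). The function names are hypothetical, and the rounding conventions (up for kHz, nearest for quantization levels) are an assumption chosen because they appear to match the tabulated values.

```python
import math


def c_x(m):
    """C_x for a_j = 1/M and M > 2: (M - 1) / M."""
    return (m - 1) / m


def c_delta_k1(m):
    """C_delta for K = 1, M > 2 and a_j = 1/M: (2*M^2 + 2) / M^2."""
    return (2 * m * m + 2) / (m * m)


def sf_min(rap_min, q_max, f0_max, m=3):
    """Eq. (6.12): minimum sampling frequency in Hz for jitter analysis."""
    return math.sqrt(c_delta_k1(m) * f0_max ** 2 * 1e4
                     / (6 * math.pi * rap_min ** 2
                        * (200 * q_max + q_max ** 2)))


def xm_min(rap_min, q_max, m=3):
    """Eq. (6.14): minimum data mean in quantization levels for shimmer."""
    return math.sqrt(c_x(m) * 1e4
                     / (6 * math.pi * rap_min ** 2
                        * (200 * q_max + q_max ** 2)))


# First row of each table: lowest RAP = 0.001, error levels 5%, 15%, 25%.
print([math.ceil(sf_min(0.001, q, 150.0) / 1000) for q in (5, 15, 25)])
print([round(xm_min(0.001, q)) for q in (5, 15, 25)])
```

Because $SF_{min}$ scales linearly with $F0_{max}$ and both quantities scale inversely with the lowest RAP level, the remaining table entries follow from the first row by proportion.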
The data indicate that level quantization alone introduces less than 5% error when the data mean is greater than 587 quantization levels. If X(i) is measured from a signed signal for which the largest peak amplitude is less than twice the mean, then 2348 quantization levels are adequate. This range is easily obtained using a 12-bit digitizer.

TABLE 6.2: MINIMUM DATA MEAN FOR SHIMMER ANALYSIS USING THE RAP

This table compiles the minimum data mean, in quantization levels, recommended in Eq. (6.14) for analysis of shimmer using the RAP. The RAP is defined in Eq. (5.1). Recommendations are for RAP with CTR=0, M=3, K=1 and a_j=1/3. Error was assumed to be entirely due to level quantization.

LOWEST LEVEL     FOR VARIOUS ERROR LEVELS (%)
OF RAP             5%     15%     25%
0.001             587     331     251
0.003             196     110      84
0.005             117      66      50

6.1.5.2 TIME QUANTIZATION

For time quantization, assume that the pitch-period peaks can be modeled as sinusoids over the range of malalignment. Thus,

$$X(i) = A_i \cos(2\pi f_p \delta_i) + B_i \tag{6.15}$$

where
$X(i)$ = peak amplitude of the i'th pitch-period
$f_p$ = frequency of the sinusoid used to model the peak
$A_i + B_i$ = the error-free peak amplitude
$\delta_i$ = error in alignment with the i'th peak

It follows that

$$p_i = -A_i \left(1 - \cos(2\pi f_p \delta_i)\right) \tag{6.16}$$

where
$p_i$ = error in peak estimation due to time quantization

If variation of $A_i$ is ignored (i.e., if $A_i = A$), then

$$f_p(p_i) = f_\delta(\delta_i) \left| \frac{d\delta_i}{dp_i} \right| = \frac{f_s}{\pi f_p \sqrt{A^2 - (A + p_i)^2}}, \qquad -A\left(1 - \cos(\pi f_p / f_s)\right) \le p_i \le 0 \tag{6.17}$$

where
$f_p(p_i)$ = the probability density of $p_i$ (written with an argument; distinct from the frequency $f_p$)
$f_\delta(\delta_i)$ = the probability density of $\delta_i$ = $f_s$ for $-1/(2f_s) \le \delta_i \le 1/(2f_s)$
$f_s$ = the sampling frequency

Thus, the error in X(i) due to time quantization can be characterized by

$$E[p_i] = -A \left(1 - \mathrm{sinc}(\pi R)\right) \tag{6.18}$$

$$\sigma_p^2 = A^2 \left(0.5 + 0.5\,\mathrm{sinc}(2\pi R) - \mathrm{sinc}^2(\pi R)\right) \tag{6.19}$$

where
$E[p_i]$ = the expected value for $p_i$
$\sigma_p^2$ = the variance of $p_i$
$R = f_p / f_s$
$\mathrm{sinc}(u) = \sin(u)/u$

RAPs estimates from perturbation-free synthetic vowels can be used in Eq. (6.20) to determine suitable values for $f_p$. If the vowel is free of jitter, shimmer and noise perturbations, and if level quantization is insignificant, it follows from Eq. (6.9) and Eq. (6.19) that

$$RAP_s^2(\beta,0,M,K) = \frac{C_x}{\pi} \left(1 + \mathrm{sinc}(2\pi R) - 2\,\mathrm{sinc}^2(\pi R)\right) \quad \text{when } A = XM,\ \sigma_x^2 = \sigma_a^2 = 0 \tag{6.20}$$

Waveforms with fundamental frequencies of 103 Hz, 128 Hz, 153 Hz, 178 Hz, and 203 Hz were synthesized for this purpose using techniques described in Chapter 3. The RAPs estimates averaged 0.0053, 0.013, and 0.001 for /a/, /i/ and /u/, respectively, where the /a/ and /i/ data were downsampled to 20 kHz, and the /u/ data were downsampled to 10 kHz. It follows that suitable values for $f_p$ are 1500 Hz for /a/, 2300 Hz for /i/ and 300 Hz for /u/.

Table 6.3 summarizes the predicted overestimation in RAPs estimates for /a/, /i/ and /u/. The data indicate that time quantization can have a significant effect. RAPs as low as 0.005 have been reported (e.g., Titze et al. 1987). At that level, the predicted error for /u/ sampled at 10 kHz is small. However, significant errors are predicted for /i/ and, to a lesser extent, for /a/. It is recommended that data be sampled at a minimum of 20 times the frequency of the oscillation producing the peak ($f_p$). This implies a sampling frequency of 30 kHz for /a/ and greater than 40 kHz for /i/.

TABLE 6.3: TIME QUANTIZATION IN SHIMMER ANALYSIS USING THE RAP

This table lists the predicted overestimation due to time quantization when using the RAP in shimmer analysis. The RAP is defined in Eq. (5.1). Data are for RAP with CTR=0, M=3, K=1 and a_j=1/3.
The predictions were obtained using Eq. (6.11) and Eq. (6.19) with fp=1500 Hz for /a/, fp=2300 Hz for /i/ and fp=300 Hz for /u/. It is assumed that peak interpolation is not used.
"fs" = the sampling frequency

PERCENT OVERESTIMATION DUE TO TIME QUANTIZATION

LOWEST       fs = 10 kHz           fs = 20 kHz      fs = 30 kHz
LEVEL
OF RAP       /a/    /i/    /u/     /a/    /i/       /a/    /i/
0.001       2024   4785    28      446   1159       159    470
0.003        614   1531    3.5     105    330        28    112
0.005        336    882    1.3      47    170        11     50
0.010        135    399    0.3      13     60       2.8     15

6.1.6 SUMMARY OF CONSIDERATIONS FOR THE RAP

A mathematical analysis of the effect of measurement errors on the Relative Average Perturbation (RAP) was presented. Expressions were derived that relate the RAP to the variance of the perturbation and the measurement error. A close correspondence between the predicted and measured RAP was demonstrated.

The predicted effect of level quantization was relatively minor. Errors should be less than 5% when data are sampled using a 12-bit digitizer, provided that the majority of the range of the digitizer is utilized. On the other hand, time quantization was identified as a potential source of significant error in both the analysis of jitter and the analysis of shimmer. If pitch-period markers for jitter analyses are rounded to the nearest sample point, the sampling frequencies recommended for maintaining an error level of less than 15% were higher than the sampling frequencies typically used. This is in agreement with published experimental results. For shimmer analysis, the predicted error for /u/ sampled at 10 kHz was small, but the sampling frequencies recommended for /a/ (30 kHz) and /i/ (>40 kHz) exceed those commonly used.

Two methods of compensating for measurement errors in the RAP were presented; one based on expected values, and the other based on a center-limit.
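The time-quantization predictions for shimmer summarized in this section can be approximated with a short calculation. The sketch below is an illustration under stated assumptions, not the original procedure: `c_x = 2/3` is an assumed value of the constant in Eq. (6.20), the function names are hypothetical, and the error is combined with the true RAP as a root sum of squares because Eq. (6.11) is not reproduced in this section.

```python
import math

def sinc(u):
    """sin(u)/u with the removable singularity at u = 0 handled."""
    return 1.0 if u == 0.0 else math.sin(u) / u

def rap_time_quantization(fp, fs, c_x=2.0 / 3.0):
    """RAPs of a perturbation-free vowel caused by time quantization alone,
    following the form of Eq. (6.20). c_x is an assumed constant."""
    r = fp / fs
    bracket = 1.0 + sinc(2.0 * math.pi * r) - 2.0 * sinc(math.pi * r) ** 2
    return math.sqrt(max(0.0, (c_x / math.pi) * bracket))

def percent_overestimation(rap_true, fp, fs, c_x=2.0 / 3.0):
    """Predicted overestimation (%) of a true RAPs value, assuming the
    quantization error adds in quadrature (an assumption of this sketch)."""
    err = rap_time_quantization(fp, fs, c_x)
    return 100.0 * (math.sqrt(rap_true ** 2 + err ** 2) / rap_true - 1.0)

print(rap_time_quantization(1500.0, 20000.0))          # /a/ at 20 kHz
print(rap_time_quantization(2300.0, 20000.0))          # /i/ at 20 kHz
print(percent_overestimation(0.005, 1500.0, 20000.0))  # /a/, RAP = 0.005
```

The error term grows roughly as $(f_p/f_s)^2$ for small ratios, which is why the vowel with the highest peak-producing frequency is the most sensitive to the sampling frequency.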
The effectiveness of these methods was roughly comparable. While the center-limit was less effective at removing the overestimate, it was also less prone to overcompensation. Both methods were effective at compensating for quantization effects when the data mean was greater than 80 quantization levels. For analysis of jitter, this suggests that unbiased RAP estimates can be obtained from data sampled at 20 kHz, provided that the fundamental frequency is less than 250 Hz.

The compensation methods presented here provide a means for removing the bias introduced by an error. Errors from any source can be counteracted, provided that they can be assumed to be independent and to have a constant variance. For example, tape recording has been shown to introduce error to measures of shimmer (Titze et al. 1987). An estimate of the variance of this error can be obtained from Eq. (6.5) or Eq. (6.9) through comparison of matched vowel samples, one of which has been tape recorded. The compensation methods can then be applied under the assumption that the variance remains constant for all subsequent recordings.

It should be emphasized that there remains a need to use high resolution sampling and demarcation techniques. The compensation methods remove the bias introduced by measurement errors, but nothing is done to reduce the variance. Thus, methods of interpolation of pitch-period demarcation are still recommended for analysis of jitter (e.g., Chapter 4; Cox et al. 1986b, in press; Titze et al. 1987), and methods for reducing errors in analysis of shimmer should be considered (e.g., Section 7.3).

6.2 THE HARMONICS-TO-NOISE RATIO (HNR)

Yumoto et al. (1982) partially addressed the issue of sampling adequacy for the HNR by analyzing sinusoidal waveforms with frequencies between 120 Hz and 300 Hz.
The measured HNR values were well differentiated from those obtained for normal speech samples, suggesting that the data acquisition methodology was adequate. However, significant underestimation of the HNR caused by errors in pitch-period demarcation has been observed (Cox et al. 1986b, Hillenbrand 1987). This apparent discrepancy is explained in the analysis presented here. A verification of the results can be found in Chapter 7. The analysis and verification were also discussed in Cox et al. (1989b).

6.2.1 ANALYSIS OF THE EFFECTS OF MEASUREMENT ERRORS

Generalized expressions for the HNR were given in Chapter 5. For the present analysis, it is convenient to omit the logarithm. The resulting relation was defined as the component energy ratio (CER). Assuming that M_=1 and C=0, it follows from Eq. (5.4) that

$$\mathrm{CER}_a = \frac{\displaystyle\int_0^{T_{imax}} n(x)\, vh^2(x)\, dx}{\displaystyle\sum_{i=1}^{N} \int_0^{T_i} vn_i^2(x)\, dx} \tag{6.21}$$

where
$\mathrm{CER}_a$ = the error-free or "actual" component energy ratio
$n(x)$ = the number of pitch-periods defined at offset x
$v_i(x)$ = the i'th pitch-period
$vh(x) = (1/n(x)) \sum_{i=1}^{N} v_i(x)$ = the harmonic component
$vn_i(x) = v_i(x) - vh(x)$ = the noise component of the i'th pitch-period
$N$ = the total number of pitch-periods
$T_i$ = the length of the i'th pitch-period
$T_{imax}$ = the maximum pitch-period length

Errors in pitch-period demarcation can be reflected in Eq. (6.21) by replacing all occurrences of $x$ with $x + \delta_i$, where $\delta_i$ is the error in marking the "start point" of the i'th pitch-period.
Assume that variation of the pitch-period length ($T_i$) due to the demarcation errors ($\delta_i$) is not significant, and that $vn_i(x)$ is a zero mean random variable of i. For large N it follows that

$$\mathrm{CER}_e = \frac{\displaystyle\int_0^{T_{imax}} n(x) \left[ vh(x) + \frac{1}{n(x)} \sum_{i=1}^{N} E_i(x) \right]^2 dx}{\displaystyle\sum_{i=1}^{N} \int_0^{T_i} \left[ vn_i(x) + E_i(x) - \frac{1}{n(x)} \sum_{j=1}^{N} E_j(x) \right]^2 dx} \tag{6.22}$$

where
$\mathrm{CER}_e$ = the CER obtained when pitch-period demarcation errors are present
$\delta_i$ = error in demarcation of the i'th pitch-period
$E_i(x) = v_i(x + \delta_i) - v_i(x)$

$E_i(x)$ in Eq. (6.22) can be approximated by substituting a Taylor series expansion for $v_i(x + \delta_i)$. In addition, Eq. (6.22) can be generalized for data quantized to discrete levels by adding a random error variable ($T(x)$) to $v_i(x)$. If $T(x)$ is an independent zero mean random variable of x, then it can be isolated in the denominator and, for large N, it has minimal effect on the numerator. Thus, using methods described in Appendix E, it can be shown that

$$\mathrm{CER}_e \approx \frac{\mathrm{CER}_a \left[ 1 - \displaystyle\sum_{j=1}^{\infty} CH_j \cdot \mathrm{RATIOH}_j \right]}{1 + \mathrm{CER}_a \cdot \mathrm{QUANT} + \mathrm{CER}_a \displaystyle\sum_{j=1}^{\infty} CH_j \cdot \mathrm{RATIOH}_j} \tag{6.23}$$

where

$$CH_j = 2 \sum_{k=1}^{j} \frac{(-1)^{k-1} \left( M_{j-k} M_{j+k} - M_j^2 \right)}{(j-k)!\,(j+k)!}$$

$$M_j = \text{the j'th moment of } \delta_i = \frac{1}{N} \sum_{i=1}^{N} \delta_i^{\,j}$$

$$\mathrm{QUANT} = \frac{E[\,T^2(x)\,]}{E[\,vh^2(x)\,]}$$

$E[\,]$ = the expected value or mean
$T(x)$ = a random variable representing level quantization

$$\mathrm{RATIOH}_j = \frac{\displaystyle\int_0^{T_{imax}} n(x) \left[ vh^{(j)}(x) \right]^2 dx}{\displaystyle\int_0^{T_{imax}} n(x)\, vh^2(x)\, dx}$$

$vh^{(j)}(x)$ = the j'th derivative of $vh(x)$

Finally, through application of Parseval's theorem, $\mathrm{RATIOH}_j$ can be estimated from the frequency spectrum of the harmonic component using

$$\mathrm{RATIOH}_j = \frac{(2\pi)^{2j} \displaystyle\int_{-f_{max}}^{f_{max}} f^{2j}\, |VH(f)|^2\, df}{\displaystyle\int_{-f_{max}}^{f_{max}} |VH(f)|^2\, df} \tag{6.24}$$

where
$VH(f)$ = the Fourier transform of the harmonic component
$f_{max}$ = the maximum frequency component of $VH(f)$

Rapid convergence of the series in Eq. (6.23) can be expected when the demarcation error is small. For a band-limited signal, $\mathrm{RATIOH}_j$ increases by at most $(2\pi f_{max})^2$ with each derivative, where $f_{max}$ is the highest frequency component. The change in $M_j$ with increasing j depends on the characteristic function of the distribution of $\delta_i$. However, if $\delta_i$ is bounded such that $|\delta_i| < T_{err}$, then $M_j$ changes by a factor of less than $T_{err}$ with each increment of j. Thus, a loose bound for the product $|\mathrm{RATIOH}_j \cdot CH_j|$ is

$$|\mathrm{RATIOH}_j \cdot CH_j| < 2 K^{2j} \sum_{k=1}^{j} \frac{(-1)^{k-1}}{(j-k)!\,(j+k)!} = \frac{K^{2j}}{(j!)^2} \tag{6.25}$$

where
$K = 2\pi f_{max} T_{err}$
$f_{max}$ = the highest frequency component
$T_{err}$ = the bound for $\delta_i$

If Eq. (6.23) is used to model marker quantization, then $\delta_i$ is assumed to be evenly distributed between $-1/(2f_{samp})$ and $1/(2f_{samp})$, where $f_{samp}$ is the sampling frequency. It follows that the maximum $K = (1/(2f_{samp})) \cdot (\pi f_{samp}) = \pi/2$, and the product $|\mathrm{RATIOH}_j \cdot CH_j|$ drops to less than 3 percent of its initial value after three series elements.

6.2.2 SUMMARY OF CONSIDERATIONS FOR THE HNR

Eq. (6.23) indicates that the effects of level quantization are relatively minor. If there are no demarcation errors (i.e., $\delta_i = 0$ for all i), then $\mathrm{CER}_e \approx 1/\mathrm{QUANT}$ for large $\mathrm{CER}_a$. To simulate level quantization, assume that $T(x)$ is evenly distributed between -0.5 and 0.5. If $vh(x)$ is also evenly distributed, the bound of its distribution should be greater than 50 to obtain an upper limit for $\mathrm{CER}_e$ that is greater than 10,000 (HNRe greater than 40 dB).
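The arithmetic behind this limit can be made explicit. The sketch below is a minimal worked example using only the assumptions stated in the text: the quantization error is uniform on [-0.5, 0.5], so its variance is 1/12, and the harmonic component is uniform on [-bound, bound], so its mean square is bound²/3. The function name is illustrative.

```python
import math

def hnr_ceiling_db(bound):
    """Upper limit of the measured HNR (dB) imposed by level quantization
    alone, using CERe ~ 1/QUANT for large CERa (from Eq. 6.23)."""
    quant = (1.0 / 12.0) / (bound ** 2 / 3.0)   # QUANT = 1 / (4 * bound^2)
    return 10.0 * math.log10(1.0 / quant)

print(round(hnr_ceiling_db(50.0), 1))   # 40.0
print(round(hnr_ceiling_db(64.0), 1))   # bound of a 7-bit signed range
```

Under these assumptions the ceiling is $4 \cdot bound^2$, so each additional bit of resolution raises it by about 6 dB.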
While a 7-bit sampler provides an adequate number of quantization levels, a resolution of at least 10 bits is recommended to accommodate harmonic component distributions that are not flat and to allow for incomplete utilization of the digitization range.

In contrast with level quantization, errors in pitch-period demarcation can have a significant effect. When demarcation errors are small, the series in Eq. (6.23) is positive and much less than 1. Thus, there is little effect on the numerator, but a significant overestimation of the denominator can occur when $\mathrm{CER}_a$ is large. The size of the effect depends on the frequency spectrum of the harmonic component. It is interesting that, under the zero mean and independence assumptions, the noise spectrum does not play a part.

Eq. (6.23) was used to estimate the effects of quantizing pitch-period markers for /a/, /i/ and /u/ data. Estimates of the vowel spectra were obtained from digital filters described in Chapter 3. $\delta_i$ was assumed to be evenly distributed between $-1/(2f_{samp})$ and $1/(2f_{samp})$, where $f_{samp}$ is the sampling frequency. Three series elements were used in Eq. (6.23), and rapid convergence was observed.

Results are plotted in Figure 6.3. The data suggest that accurate pitch-period demarcation is critical, and that interpolation is required for HNR estimation at commonly used sampling frequencies. Large underestimations are predicted for /i/ and /a/ when HNRa is greater than 15 dB, even when the sampling frequency is 30 kHz. HNR levels for normal speakers are frequently above 15 dB (Yumoto et al. 1982).

FIGURE 6.3: PREDICTED QUANTIZATION EFFECTS IN THE HNR.
This figure plots the predicted relationship between the "Quantized" HNR and the "Actual" or unquantized HNR. Data are presented for /a/, /i/ and /u/ vowels at sampling frequencies between 10 kHz and 30 kHz. The predictions were obtained using Eq. (6.23). Quantization implies that pitch-period markers are rounded to the nearest sample point.

[Figure 6.3 plot; x-axis: Actual HNR (dB), 0 to 30]

6.3 THE DIRECTIONAL PERTURBATION QUOTIENT (DPQ)

This section contains an analysis of the effects of quantization and a center-limit on the DPQ. A general formulation for the DPQ can be found in Eq. (5.18). This analysis was also presented in Cox et al. (in press).

6.3.1 EFFECTS OF QUANTIZATION AND A CENTER-LIMIT

Quantization and a center-limit can be expected to have the following effects on the DPQ. A center-limit reduces the probability of a sign change, thus reducing the expected value. Quantization also tends to reduce the expected value, as it results in a finite probability that the perturbation is zero. The effect of quantization on the symmetry of the pdf of the perturbation may influence the DPQ. Finally, artificial perturbations, dependent on the alignment with quantization boundaries, can be introduced when the data are computed from the difference of quantized values, such as for jitter analysis.

The DPQ can be viewed as a compilation of results from repeated performances of a two outcome experiment, where the two possible outcomes are signchange and no-signchange. When the repetitions are independent, such a process produces a binomial distribution. For a large number of repetitions, a binomial distribution can be approximated by a Gaussian with mean N*P and variance N*P*(1-P), where "N" is the number of repetitions and "P" is the probability of signchange.
Thus, the pdf of the DPQ can be approximated by a Gaussian with the following statistics:

$$E[\mathrm{DPQ}] = P, \qquad \sigma_{DPQ}^2 = \frac{P(1-P)}{N-1} \tag{6.26}$$

where
$P$ = the probability of a sign change
$E[\mathrm{DPQ}]$ = the expected value for the DPQ
$\sigma_{DPQ}^2$ = the variance of the DPQ

The signchange probability can be estimated using

$$P_i = P[+,i] \,\&\, P[\mathrm{neg},i-1] + P[-,i] \,\&\, P[\mathrm{pos},i-1] \tag{6.27}$$

where
$P_i$ = probability of a sign change at the i'th data point
$P[+,i]$ = probability that the i'th perturbation is positive
$P[-,i]$ = probability that the i'th perturbation is negative
$P[0,i]$ = probability that the i'th perturbation is zero
$\&$ = logical AND with priority over addition

$$P[\mathrm{pos},i-1] = P[+,i-1] + P[0,i-1] \,\&\, \Big( P[+,i-2] + P[0,i-2] \,\&\, \big( \cdots \,\&\, ( P[+,i-NZ-1] + 0.5\,P[0,i-NZ-1] ) \cdots \big) \Big)$$

$$P[\mathrm{neg},i-1] = P[-,i-1] + P[0,i-1] \,\&\, \Big( P[-,i-2] + P[0,i-2] \,\&\, \big( \cdots \,\&\, ( P[-,i-NZ-1] + 0.5\,P[0,i-NZ-1] ) \cdots \big) \Big)$$

Now, assume that the sequence being analyzed is an independent random variable, and that the perturbation is measured with a moving average, as in Eq. (5.19). If P = 0.5 after NZ successive occurrences of a null perturbation, it follows that, for M = 1, 2,

$$
\begin{aligned}
P ={}& \int_{-\infty}^{\infty} f_x(x_1)\,dx_1 \int_{-\infty}^{\mathrm{limm}(2)-\epsilon} f_x(x_2)\,dx_2
\Bigg[ \int_{\mathrm{limp}(3)+\epsilon}^{\infty} f_x(y_3)\,dy_3
+ \int_{\mathrm{limm}(3)}^{\mathrm{limp}(3)} f_x(x_3)\,dx_3 \bigg[ \cdots \\
&\qquad \bigg[ \int_{\mathrm{limp}(NZ+3)+\epsilon}^{\infty} f_x(y_{NZ+3})\,dy_{NZ+3}
+ 0.5 \int_{\mathrm{limm}(NZ+3)}^{\mathrm{limp}(NZ+3)} f_x(x_{NZ+3})\,dx_{NZ+3} \bigg] \cdots \bigg] \Bigg] \\
&+ \int_{-\infty}^{\infty} f_x(x_1)\,dx_1 \int_{\mathrm{limp}(2)+\epsilon}^{\infty} f_x(x_2)\,dx_2
\Bigg[ \int_{-\infty}^{\mathrm{limm}(3)-\epsilon} f_x(y_3)\,dy_3
+ \int_{\mathrm{limm}(3)}^{\mathrm{limp}(3)} f_x(x_3)\,dx_3 \bigg[ \cdots \\
&\qquad \bigg[ \int_{-\infty}^{\mathrm{limm}(NZ+3)-\epsilon} f_x(y_{NZ+3})\,dy_{NZ+3}
+ 0.5 \int_{\mathrm{limm}(NZ+3)}^{\mathrm{limp}(NZ+3)} f_x(x_{NZ+3})\,dx_{NZ+3} \bigg] \cdots \bigg] \Bigg]
\end{aligned}
$$

and, for M > 2,

$$
\begin{aligned}
P ={}& \int_{-\infty}^{\infty} f_x(x_1)\,dx_1 \cdots \int_{-\infty}^{\infty} f_x(x_{M-1})\,dx_{M-1} \int_{-\infty}^{\mathrm{limm}(M)-\epsilon} f_x(x_M)\,dx_M
\Bigg[ \int_{\mathrm{limp}(M+1)+\epsilon}^{\infty} f_x(y_{M+1})\,dy_{M+1} \\
&\qquad + \int_{\mathrm{limm}(M+1)}^{\mathrm{limp}(M+1)} f_x(x_{M+1})\,dx_{M+1} \bigg[ \cdots \bigg[ \int_{\mathrm{limp}(M+NZ+1)+\epsilon}^{\infty} f_x(y_{M+NZ+1})\,dy_{M+NZ+1} \\
&\qquad + 0.5 \int_{\mathrm{limm}(M+NZ+1)}^{\mathrm{limp}(M+NZ+1)} f_x(x_{M+NZ+1})\,dx_{M+NZ+1} \bigg] \cdots \bigg] \Bigg] \\
&+ \int_{-\infty}^{\infty} f_x(x_1)\,dx_1 \cdots \int_{-\infty}^{\infty} f_x(x_{M-1})\,dx_{M-1} \int_{\mathrm{limp}(M)+\epsilon}^{\infty} f_x(x_M)\,dx_M
\Bigg[ \int_{-\infty}^{\mathrm{limm}(M+1)-\epsilon} f_x(y_{M+1})\,dy_{M+1} \\
&\qquad + \int_{\mathrm{limm}(M+1)}^{\mathrm{limp}(M+1)} f_x(x_{M+1})\,dx_{M+1} \bigg[ \cdots \bigg[ \int_{-\infty}^{\mathrm{limm}(M+NZ+1)-\epsilon} f_x(y_{M+NZ+1})\,dy_{M+NZ+1} \\
&\qquad + 0.5 \int_{\mathrm{limm}(M+NZ+1)}^{\mathrm{limp}(M+NZ+1)} f_x(x_{M+NZ+1})\,dx_{M+NZ+1} \bigg] \cdots \bigg] \Bigg]
\end{aligned}
\tag{6.28}
$$

where
$f_x(x)$ = the pdf of the sequence being analyzed
CTR = the center-limit for the DPQ
M = number of points in the moving average
$\mathrm{limp}(i) = \mathrm{lim}(i) + M \cdot \mathrm{CTR}$
$\mathrm{limm}(i) = \mathrm{lim}(i) - M \cdot \mathrm{CTR}$

$$
\mathrm{lim}(i) =
\begin{cases}
x_{i-1} & M = 1, 2 \\
2x_{i-1} - x_{i-2} & M = 3 \\
(M-1)\,x_{i-M/2} - \displaystyle\sum_{j=i+1-M}^{i-1-M/2} x_j - \sum_{j=i+1-M/2}^{i-1} x_j & M > 3
\end{cases}
$$

$\epsilon$ = a small number used to indicate that the range of integration approaches but does not include the limit. $\epsilon$ matters only when $f_x(x)$ is quantized.

A program for estimating Eq. (6.28) was implemented. $f_x(x)$ was assumed to be a Gaussian bounded to 3 standard deviations. A quantized version of this pdf was substituted, and integrations were converted to summations. The ERF[] function, required for computing the quantized pdf, was implemented using a power series approximation (Ziemer & Tranter 1976, p. 495). Finally, the maximum number of repeated occurrences of a null perturbation (NZ) was 3.
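For the unquantized case with no center-limit, the sign-change probability of an independent Gaussian sequence can be cross-checked without the nested integrals. The sketch below is not the program described above. It assumes that the perturbation of a point is its deviation from an M-point, equal-weight moving average containing that point, with the assessed point placed M/2 samples (integer division) behind the newest sample, as suggested by the definition of lim(i). Successive perturbations are then jointly Gaussian with correlation rho, and the standard bivariate-Gaussian result P = 1/2 - arcsin(rho)/pi applies. All function names are illustrative.

```python
import math
import random

def perturbation_weights(m):
    """Weights applied to m consecutive samples (oldest first) to form the
    perturbation of the assessed point. The position of the assessed point
    is an assumption of this sketch."""
    if m < 2:
        raise ValueError("m must be at least 2")
    centre = m - 1 - m // 2
    w = [-1.0 / m] * m
    w[centre] += 1.0
    return w

def sign_change_probability(m):
    """Closed-form probability that successive perturbations differ in sign
    for an independent Gaussian sequence (no quantization, CTR = 0)."""
    w = perturbation_weights(m)
    var = sum(a * a for a in w)
    cov = sum(w[k] * w[k + 1] for k in range(m - 1))  # lag-one covariance
    return 0.5 - math.asin(cov / var) / math.pi

def simulate_dpq(m, n=20000, seed=1):
    """Monte Carlo estimate of the same quantity from n Gaussian samples."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    w = perturbation_weights(m)
    p = [sum(w[k] * x[t + k] for k in range(m)) for t in range(n - m + 1)]
    changes = sum(1 for a, b in zip(p, p[1:]) if a * b < 0.0)
    return changes / (len(p) - 1)

for m in (2, 3, 4, 5):
    print(m, round(sign_change_probability(m), 3), round(simulate_dpq(m), 3))
```

The closed form depends only on the lag-one correlation of the perturbation, so it does not extend to quantized data or to a non-zero center-limit, where zero-valued perturbations must be handled as in Eq. (6.27).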
DPQ predictions from Eq. (6.28) are compared with measured values in Figure 6.4. The measured values were obtained from the random sequences described in Section 6.1.3. There was a close match between predicted and measured values, and the expected tendency to underestimate the DPQ was observed. As indicated by the shaded areas in Figure 6.4, the alignment of quantization boundaries affects the DPQ when the number of quantization levels per standard deviation is less than 1.

The effects of a center-limit are illustrated in Figure 6.5. Without a center-limit, significant underestimation occurred when the standard deviation of the perturbation was less than one quantization level. The range of underestimation increased when a center-limit was applied. When the center-limit was 1-1/M quantization levels, underestimation occurred when the standard deviation of the perturbation was less than three quantization levels. 1-1/M is a germane choice of center-limit, as that is the minimum value that rejects the single level perturbations introduced when data are computed from the difference of quantized values.

FIGURE 6.4: PREDICTED QUANTIZATION EFFECTS IN THE DPQ.
Expected values for the DPQ (E[DPQ]) at various center-limits are plotted as a function of the quantization resolution. The discrete points are DPQ estimates computed from sequences with random perturbation. The lines are predicted values from Eq. (6.28). Shaded areas are the range of values that arise when the quantization of the probability density function is varied. The DPQ is defined in Eq. (5.18). Resolution was specified with reference to the standard deviation of the perturbation.
"NMAVE" = the number of points in the moving average

FIGURE 6.5: COMPARISON OF EXPECTED VALUES FOR THE DPQ.
Expected values for the DPQ (E[DPQ]) at various values of NMAVE are plotted as a function of quantization resolution. The expected values were computed using Eq. (6.28). The DPQ is defined in Eq. (5.18). Resolution was specified with reference to the standard deviation of the perturbation.
"NMAVE" = the number of points in the moving average

[Figure 6.5 plots; x-axes: RESOLUTION (qnt-lvls per std-dev)]

It is interesting that high resolution DPQ estimates approached 0.66, 0.73, 0.63, and 0.6 for moving average sizes of 2, 3, 4, and 5, respectively. One might initially expect that an independent perturbation would produce a DPQ of 0.5. The higher values can be explained by the fact that perturbations are measured with respect to a moving average and not the data mean. Because each point in the sequence tends to draw the moving average in the direction that it is perturbed, the probability of a change in sign is increased. For example, a data point with a negative perturbation causes an increased probability that the sign of the perturbation for nearby points is positive. As M is increased, the number of surrounding points affected by a given perturbation is increased, leading to a higher expected DPQ when M=3 than when M=2. The decrease in the expected DPQ for M>3 is due to a reduction of the amount of variation of the moving average about the mean.

6.3.2 RECOMMENDED SAMPLING CONDITIONS

In a typical scenario of usage for the DPQ, the quantization resolution, the number of points in the moving average, and the center-limit are fixed, but data with a wide range of perturbation magnitudes are encountered. If the DPQ is to be primarily dependent on perturbation patterns, the variations in perturbation magnitude should not affect the result.
Sampling conditions required for magnitude independence can be determined from knowledge of typical levels of perturbation. It has been suggested that perturbations in pitch-period duration or amplitude can be modeled as Gaussian (Horii 1979). While the standard deviation is not generally measured, it can be estimated from published RAP data through rearrangement of Eq. (6.3) or Eq. (6.5). Thus, the minimum recommended data mean for shimmer analysis can be estimated using Eq. (6.29), and the minimum recommended sampling frequency for jitter analysis can be estimated using Eq. (6.30). The units for $XM_{min}$ in Eq. (6.29) and Eq. (6.30) are quantization levels.

$$XM_{min} = \frac{NQU_{min} \sqrt{2 C_x / \pi}}{RAP_{min}} \tag{6.29}$$

where
$XM_{min}$ = minimum recommended data mean
$NQU_{min}$ = minimum number of quantization levels per standard deviation (determined from Figure 6.5)
$RAP_{min}$ = minimum RAP likely to be encountered

$$SF_{min} = XM_{min} \cdot F0_{max} \tag{6.30}$$

where
$SF_{min}$ = the minimum recommended sampling frequency
$F0_{max}$ = the highest fundamental frequency

Table 6.4 and Table 6.5 compile the recommended $XM_{min}$ and $SF_{min}$ for magnitude independence in the DPQ. While the $XM_{min}$ levels can be obtained using a 12-bit linear A/D converter, the recommended sampling frequencies are higher than those generally used in vowel analysis. Thus, high resolution in pitch-period demarcation is needed, and interpolation is recommended.

TABLE 6.4: MINIMUM DATA MEAN FOR SHIMMER ANALYSIS USING THE DPQ

This table compiles the minimum data mean, in quantization levels, recommended in Eq. (6.29) for analysis of shimmer using the DPQ. The DPQ was defined in Eq. (5.18). Error was assumed to be entirely due to level quantization.
"M" = the number of points in the moving average
"NQUmin" = the minimum number of quantization levels per perturbation standard deviation

MINIMUM DATA MEAN (qnt-lvls)

LOWEST       NQUmin = 1            NQUmin = 3
LEVEL
OF RAP       M=2    M=3    M=4     M=2    M=3    M=4
0.001        565    652    691    1692   1955   2073
0.003        189    218    231     565    652    691
0.005        113    131    139     339    391    415

TABLE 6.5: MINIMUM SAMPLING FREQUENCIES FOR JITTER ANALYSIS USING THE DPQ

This table compiles the minimum recommended sampling frequencies, in kHz, for analysis of jitter using the DPQ. Pitch-period markers were assumed to be quantized to the nearest sample point. This should provide independence of perturbation magnitude.
"M" = the number of points in the moving average
"NQUmin" = the minimum number of quantization levels per perturbation standard deviation
"F0max" = the maximum fundamental frequency

MINIMUM SAMPLING FREQUENCIES (kHz) WHEN F0max = 200 Hz

LOWEST       NQUmin = 1            NQUmin = 3
LEVEL
OF RAP       M=2    M=3    M=4     M=2    M=3    M=4
0.001        113    131    139     339    391    415
0.003         38     44     47     113    131    139
0.005         23     27     28      68     79     83

6.3.3 ERROR REJECTION THROUGH A CENTER-LIMIT

This section contains the derivation of an expression for computing the probability of a sign error within the DPQ. A sign error is defined as an erroneous determination of perturbation sign (+, 0, or -) caused by the presence of an additive measurement error. This expression was used to predict the effects of an error, and to evaluate the use of a center-limit for reducing those effects.

Assume that the perturbation sequence is independent of an additive error, and that the pdfs of these two variables ($f_z(x)$ and $f_\delta(x)$) are evenly symmetric about a mean of zero.
If $f_\delta(x)$ is bounded between -B and B, and if the preceding sign in the DPQ can be positive or negative with equal probability, then

for B > 2*CTR:
$$
\begin{aligned}
P[\mathrm{signerror}] ={}& \int_0^{CTR} f_z(x) \left[ \int_{-x+CTR}^{B} f_\delta(y)\,dy + \int_{x+CTR}^{B} f_\delta(y)\,dy \right] dx \\
&+ 2 \int_{CTR}^{B-CTR} f_z(x) \left[ \int_{x+CTR}^{B} f_\delta(y)\,dy + 0.5 \int_{x-CTR}^{x+CTR} f_\delta(y)\,dy \right] dx \\
&+ \int_{B-CTR}^{B+CTR} f_z(x) \int_{x-CTR}^{B} f_\delta(y)\,dy\,dx
\end{aligned}
$$

for CTR < B < 2*CTR:
$$
\begin{aligned}
P[\mathrm{signerror}] ={}& \int_0^{B-CTR} f_z(x) \left[ \int_{-x+CTR}^{B} f_\delta(y)\,dy + \int_{x+CTR}^{B} f_\delta(y)\,dy \right] dx \\
&+ \int_{B-CTR}^{CTR} f_z(x) \int_{-x+CTR}^{B} f_\delta(y)\,dy\,dx
+ \int_{CTR}^{B+CTR} f_z(x) \int_{x-CTR}^{B} f_\delta(y)\,dy\,dx
\end{aligned}
$$

for B < CTR:
$$
P[\mathrm{signerror}] = \int_{CTR-B}^{CTR} f_z(x) \int_{-x+CTR}^{B} f_\delta(y)\,dy\,dx
+ \int_{CTR}^{CTR+B} f_z(x) \int_{x-CTR}^{B} f_\delta(y)\,dy\,dx
\tag{6.31}
$$

where
$f_z(x)$ = the pdf for the perturbation
$f_\delta(x)$ = the pdf for the additive error
CTR = the center-limit for the DPQ
B = the bound for the error

If $f_z(x)$ is a zero mean Gaussian with variance $\sigma_z^2$ and if $f_\delta(y)$ is evenly distributed, then

$$
P[\mathrm{signerror}]_e =
\begin{cases}
0 & CTR \ge B,\ \sigma_z^2 = 0 \\[4pt]
0.5 \left[ 1 - CTR/B \right] & CTR < B,\ \sigma_z^2 = 0 \\[4pt]
\dfrac{1}{4 B_z \sqrt{\pi}} \left\{ \exp[-(B_z - C_z)^2] + \exp[-(B_z + C_z)^2] - 2\exp[-C_z^2] \right\} & \\[6pt]
\quad + \dfrac{1}{4 B_z} \left\{ (B_z - C_z)\,\mathrm{erf}[B_z - C_z] + (B_z + C_z)\,\mathrm{erf}[B_z + C_z] - 2 C_z\,\mathrm{erf}[C_z] \right\} & \sigma_z^2 > 0
\end{cases}
\tag{6.32}
$$

where
$B_z = B / \sqrt{2\sigma_z^2}$
$C_z = CTR / \sqrt{2\sigma_z^2}$
$\sigma_z^2$ = the variance of the perturbation

Alternatively, if $f_\delta(y)$ is Gaussian with variance $\sigma_\delta^2$, then

for B > CTR:
$$
\begin{aligned}
P[\mathrm{signerror}]_g ={}& 0.25\,\mathrm{erf}[B_z + C_z] + 0.25\,\mathrm{erf}[B_z - C_z] \\
&- A \int_0^{CTR} \exp\!\left[ \frac{-x^2}{2\sigma_z^2} \right] \mathrm{erf}\!\left[ \frac{-x}{\sqrt{2\sigma_\delta^2}} + C_\delta \right] dx \\
&- A \int_{CTR}^{B+CTR} \exp\!\left[ \frac{-x^2}{2\sigma_z^2} \right] \mathrm{erf}\!\left[ \frac{x}{\sqrt{2\sigma_\delta^2}} - C_\delta \right] dx \\
&- A \int_0^{B-CTR} \exp\!\left[ \frac{-x^2}{2\sigma_z^2} \right] \mathrm{erf}\!\left[ \frac{x}{\sqrt{2\sigma_\delta^2}} + C_\delta \right] dx
\end{aligned}
$$

for B ≤ CTR:
$$
\begin{aligned}
P[\mathrm{signerror}]_g ={}& 0.25\,\mathrm{erf}[B_z + C_z] + 0.25\,\mathrm{erf}[B_z - C_z] \\
&- A \int_{CTR-B}^{CTR} \exp\!\left[ \frac{-x^2}{2\sigma_z^2} \right] \mathrm{erf}\!\left[ \frac{-x}{\sqrt{2\sigma_\delta^2}} + C_\delta \right] dx \\
&- A \int_{CTR}^{CTR+B} \exp\!\left[ \frac{-x^2}{2\sigma_z^2} \right] \mathrm{erf}\!\left[ \frac{x}{\sqrt{2\sigma_\delta^2}} - C_\delta \right] dx
\end{aligned}
\tag{6.33}
$$

where
$A = 1 / \left( 2 \sqrt{2\pi\sigma_z^2}\; \mathrm{erf}[B_\delta] \right)$
$B_\delta = B / \sqrt{2\sigma_\delta^2}$
$C_\delta = CTR / \sqrt{2\sigma_\delta^2}$
$\sigma_\delta^2$ = the variance of the error

Direct determination of Eq. (6.33) is not possible. However, by substituting a power series approximation for erf[], expanding terms involving powers of (x±CTR) and solving integrals of the form $x^n \exp[-a x^2]$, one obtains

$$
P[\mathrm{signerror}]_g =
\begin{cases}
0 & CTR \ge B,\ \sigma_z^2 = 0 \\[4pt]
0.5 \left[ 1 - \mathrm{erf}[C_\delta] / \mathrm{erf}[B_\delta] \right] & CTR < B,\ \sigma_z^2 = 0 \\[4pt]
0.25\,\mathrm{erf}[B_z + C_z] + 0.25\,\mathrm{erf}[B_z - C_z] & \\[4pt]
\quad - P0 \displaystyle\sum_{i=0}^{\infty} \frac{(-1)^i (2i)!}{i!}\, C_\delta^{2i} \sum_{j=0}^{i} \left[ \frac{P1(j)}{(2i-2j+1)!} + \frac{P2(j)}{(2i-2j)!} \right] & \sigma_z^2 > 0
\end{cases}
\tag{6.34}
$$

where

$$P0 = \frac{\sigma_z}{2\pi\,\sigma_\delta\,\mathrm{erf}[B_\delta]}$$

$$P1(j) = C_z \sqrt{\pi} \left\{ \mathrm{erf}[B_z - C_z] + 2\,\mathrm{erf}[C_z] - \mathrm{erf}[B_z + C_z] \right\}, \qquad j = 0$$

$$
\begin{aligned}
P1(j) = \frac{1}{2^{2j+1}\, C_z^{2j-1}\, j!} \Bigg\{ & 2\sqrt{\pi} \left( \mathrm{erf}[B_z - C_z] + 2\,\mathrm{erf}[C_z] - \mathrm{erf}[B_z + C_z] \right) \\
& - 2 \exp[-(B_z - C_z)^2] \sum_{k=1}^{j} \frac{(B_z - C_z)^{2k-1}\, 2^{2k}\, k!}{(2k)!} \\
& - 4 \exp[-C_z^2] \sum_{k=1}^{j} \frac{C_z^{2k-1}\, 2^{2k}\, k!}{(2k)!} \\
& + 2 \exp[-(B_z + C_z)^2] \sum_{k=1}^{j} \frac{(B_z + C_z)^{2k-1}\, 2^{2k}\, k!}{(2k)!} \Bigg\}, \qquad j > 0
\end{aligned}
$$

$$
\begin{aligned}
P2(j) = \frac{j!}{C_z^{2j}\, (2j+1)!} \Bigg\{ & - \exp[-(B_z - C_z)^2] \sum_{k=0}^{j} \frac{(B_z - C_z)^{2k}}{k!}
+ 2 \exp[-C_z^2] \sum_{k=0}^{j} \frac{C_z^{2k}}{k!} \\
& - \exp[-(B_z + C_z)^2] \sum_{k=0}^{j} \frac{(B_z + C_z)^{2k}}{k!} \Bigg\}
\end{aligned}
$$

Eq. (6.34) is accurate only when B is specified such that $u = B_\delta$ does not exceed the maximum values in Table 6.6.
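The sensitivity of a truncated erf series to its argument can be examined numerically. The sketch below is a minimal illustration, assuming the standard Maclaurin series for erf and counting terms from i = 0. The function names are hypothetical, and `math.erf` from the Python standard library is used as the reference value.

```python
import math

def erf_series(u, terms):
    """Truncated Maclaurin series for erf(u) using the first `terms` terms."""
    total = 0.0
    for i in range(terms):
        total += (-1) ** i * u ** (2 * i + 1) / (math.factorial(i) * (2 * i + 1))
    return 2.0 / math.sqrt(math.pi) * total

def relative_error(u, terms):
    """Relative error of the truncated series against math.erf."""
    return abs(erf_series(u, terms) - math.erf(u)) / math.erf(u)

for terms in (8, 10, 12, 14):
    row = [round(100.0 * relative_error(u, terms), 3) for u in (1.5, 2.0, 2.5)]
    print(terms, row)   # percent error at u = 1.5, 2.0, 2.5
```

Because the terms alternate in sign and grow before they decay once u exceeds about 1.7, the truncation error rises steeply with u for a fixed number of terms, which is why the usable range of B is limited.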
The series used to approximate erf[u] is not well behaved for large values of u; it initially diverges when $u > \sqrt{3}$, and can produce very poor results when it is truncated to a computationally convenient number of terms. Table 6.6 compiles values of u for which estimation errors are less than 1%. The error increases rapidly when u exceeds these limits.

TABLE 6.6: THE RANGE OVER WHICH THE ERF[u] SERIES IS ACCURATE

$$\mathrm{erf}[u] = \frac{2}{\sqrt{\pi}} \sum_{i=0}^{\infty} \frac{(-1)^i\, u^{2i+1}}{i!\,(2i+1)}$$

# terms        maximum u
in series      for error < 1%
 8             1.7
10             1.9
12             2.1
14             2.3

6.3.3.1 RESULTS

Figure 6.6 presents results from Eq. (6.34). A center-limit reduced the probability of a sign error. Without a center-limit, P[signerror] rose to 0.5 as the perturbation magnitude dropped. The maximum P[signerror] was less than 0.2 for a center-limit that equals or exceeds the standard deviation of the error. If the measurement error is due to quantization, then the center-limit of 1-1/M that was used in Figure 6.5 corresponds to CTR/SDerror ≈ 2 in Figure 6.6. The maximum P[signerror] under these conditions was less than 0.1.

FIGURE 6.6: THE PROBABILITY OF A SIGN ERROR IN THE DPQ.
The probability of a sign error in the DPQ is plotted for various center-limits. The probabilities were computed using Eq. (6.34) under the assumption that the perturbation and the error are both Gaussian. The DPQ is defined in Eq. (5.18).
"SDperturb" = standard deviation of the perturbation
"SDerror" = standard deviation of the error
"P[signerror]" = probability of an erroneous change in sign

[Figure 6.6 plot; axis label: CENTER LIMIT / STANDARD DEVIATION RATIO (CTR/SDerror)]

6.3.4 SUMMARY OF CONSIDERATIONS FOR THE DPQ

The effects of quantization and a center-limit on the expected value for the DPQ were analyzed.
The expected value for the DPQ depended on the number of points in the moving average, and approached 0.66, 0.73, 0.63, and 0.6 for moving average sizes of 2, 3, 4, and 5, respectively. Quantization led to an underestimation, especially when the standard deviation of the perturbation was less than 1 quantization level. A center-limit increased the range over which underestimation occurred. A center-limit also reduced the influence of measurement errors, and removed artificial perturbations that can be introduced when data are computed from the difference of quantized values.

Independence of perturbation magnitude is required if data are to be used to support the hypothesis that certain laryngeal pathologies cause characteristic patterns of perturbation. The present analysis indicated that sufficient resolution for shimmer analysis can be obtained using a 12-bit linear A/D converter. However, the sampling frequencies recommended for jitter analysis were higher than those typically used.

The effect of quantization in Hecker and Kreul (1971) is difficult to judge, as their study was not performed on a computer. However, it is likely that limited resolution in the display apparatus introduced a dependence on perturbation magnitude. Thus, the elevated DPQs observed for pathological speakers can be attributed to perturbation magnitude as well as perturbation patterns.

CHAPTER 7: CALIBRATION OF PERTURBATION MEASURES

This chapter contains a description and "calibration" of computed measures of sustained vowels. The measures are related to the algorithms in Chapter 5. A calibration using synthetic vowels was performed, and results were verified in real vowels. Various parts of this chapter were also presented in Cox et al. (1986a, 1986b, in press, 1989a, 1989b).
The names are summarized in Appendix F.

7.1 CALIBRATION USING SYNTHETIC VOWELS

The calibration involved an analysis of synthesized vowels to test the influence of formant structure, fundamental frequency, perturbation type, perturbation level, pitch-period demarcation and quantization. The following questions were addressed:

1) What processing conditions are required? Specifically:
   a) Can pitch-period demarcation markers be rounded to the nearest sample point?
   b) Is pitch-period marker optimization (Chapter 4) beneficial?
   c) What sampling frequency is needed?
   d) Is interpolation necessary?
2) What is the relative influence of vowel characteristics? Specifically:
   a) Do the measures depend on the formant structure or the fundamental frequency?
   b) What is the relative influence of random jitter, shimmer and additive noise?

Methods described in Chapter 3 were used to synthesize /a/, /i/ and /u/ waveforms. Data were generated at a sampling frequency of 80 kHz and downsampled to 10 kHz or 20 kHz to obtain pitch-periods that did not all conveniently start at integer sampling intervals. Waveforms with fundamental frequencies of 103 Hz, 128 Hz, 153 Hz, 178 Hz and 203 Hz were generated. The waveforms contained equal percentages of jitter, shimmer and additive noise. Additional waveforms with only one type of perturbation were generated for question 2b.

"Exact" pitch-period markers were synchronized with the driving impulses of the synthesizer, and offset to be aligned with the waveform transition preceding the first large pitch-period oscillation. "Optimized" markers were obtained using techniques described in Chapter 4. "Quantized" markers were rounded to the nearest sample point.

Unless otherwise specified, the following conditions applied.
60 successive pitch-periods from each waveform were analyzed. Optimized markers were used. Data were resampled using second order Lagrange interpolation. The sampling frequency was 20 kHz and the fundamental frequency was 128 Hz.

Ideally, all estimates of a given measure from a given waveform should be the same. Estimates from data sampled at 10 kHz should duplicate those from data sampled at 20 kHz. Estimates from perturbation-free waveforms should be well differentiated from those from perturbed data.

7.1.1 ORGANIZATION OF FIGURES

The results of the calibration are illustrated in the first 22 figures of this chapter. Table 7.1 summarizes the organization of these figures. The "quantization" figures illustrate the effects of quantization and other processing conditions. The "fundamental frequency" figures illustrate the effect of varying the fundamental frequency between 103 Hz and 203 Hz. The "perturbation profile" figures illustrate the relative influence of the various types of perturbation by presenting data for vowels that contain only one type of perturbation. Finally, the "vowel type" figures reproduce data from the quantization figures in order to facilitate comparison of results for /a/, /i/ and /u/.

The effects of varying the offset of pitch-period demarcation markers are illustrated by the "demarcation offset" figures. Data are plotted as a function of an additive offset "X" in the markers, where "X" was constant within each estimate. A zero offset implies alignment with the waveform transition preceding the first large pitch-period oscillation. Pitch-period marker optimization was applied prior to addition of the offset.

Three additional types of figures were included for the spectral noise measures.
The "data-segment length" figure illustrates the effect of varying the number of pitch-periods in each data-segment. The "demarcation error" figures illustrate the effect of small variations in the length of the data-segment caused by errors in pitch-period demarcation. Each data-segment spanned 3 or 4 pitch-periods plus "X" sample points, where "X" was constant within each estimate. Finally, the "window tapering" figure demonstrates the effect of applying various percentages of half-cosine taper to the data-segments.

TABLE 7.1: FIGURE NUMBERS FOR MEASURE CALIBRATION FIGURES

                                        MEASURE TYPE
    FIGURE TYPE             JITTER   SHIMMER   TIME-NOISE   SPECTRAL-NOISE
    QUANTIZATION               1        4          9             14
    FUNDAMENTAL FREQ           2        5         10             19
    PERTURBATION PROFILE       3        6         11             20
    VOWEL TYPE                          7         12             18
    DEMARCATION OFFSET                  8         13             17
    DATA-SEGMENT LENGTH                                          15
    DEMARCATION ERROR                                            16, 22
    WINDOW TAPERING                                              21

7.1.2 MEASURES OF JITTER

"PDAV" is the measure of jitter magnitude described in Koike (1973). The RAP algorithm (Eq. (5.1)) was used, with M=3, CTR=0, K=1 and aj=1/3. PDAVDB was derived from PDAV using

$$\mathrm{PDAVDB} = -10 \log_{10}[\,\mathrm{PDAV}\,] \tag{7.1}$$

Figure 7.1 compares the error-free PDAVDB with estimates obtained using optimized pitch-period markers and estimates obtained using quantized pitch-period markers. There was a close correspondence between PDAVDB estimates at all tested levels of perturbation. The predictions from Eq. (6.3) were within 0.5 dB of the estimates. However, the quantized estimates from perturbation-free data were low, indicating that accuracy is questionable when PDAVDB is greater than 25 dB (PDAV < 0.003). The use of optimized pitch-period markers provided a considerable improvement.

Data in Figure 7.2 test for dependency of PDAVDB on the fundamental frequency (f0). In general there was little dependence. However, higher values of PDAVDB were observed for /u/ when f0 was high.
Perturbations were apparently masked by resonance with the first formant. This is consistent with the evaluations of Chapter 4, where increased error variances were observed for /u/ data. The absence of this effect for /i/ was attributed to the presence of strong high frequency formants.

Figure 7.3 illustrates the relative influences of jitter, shimmer and additive noise. PDAVDB provided excellent isolation of jitter. In all cases it took more than twenty times the amount of shimmer or noise to produce the same effect as jitter. As expected, a larger dependence on shimmer and noise was observed for /u/.

FIGURE 7.1: PDAVDB; SAMPLING AND INTERPOLATION EFFECTS
This figure illustrates sampling and interpolation effects in a log-magnitude measure of jitter. The appended number in the labels is the sampling frequency.
"Exact" = pitch-period markers from the vowel synthesizer
"Quantized" = markers rounded to the nearest sample point
"Optimized" = markers optimized as in Chapter 4
(Plot: vertical axis PDAVDB (dB).)

FIGURE 7.2: PDAVDB; FUNDAMENTAL FREQUENCY DEPENDENCE
This figure illustrates the dependence of a log-magnitude measure of jitter on the fundamental frequency.
(Plot: PDAVDB (dB) versus F0*.01 (Hz), with panels for perturbation levels of 1%, 3%, 5% and 10%.)

FIGURE 7.3: PDAVDB; PERTURBATION PROFILE
This figure illustrates the relative influences of random jitter, shimmer and additive noise on a log-magnitude measure of jitter.
(Plot: horizontal axis perturbation level (%).)

7.1.3 MEASURES OF SHIMMER

Two approaches to measurement of shimmer were implemented. The first used the traditional method of measuring pitch-period peak amplitude.
Parabolic interpolation (Markel & Gray 1976, p. 167) was applied in an attempt to reduce the effects of time quantization (see Section 6.1.8). The second approach redefined the pitch-period amplitude as the standard deviation of each pitch-period computed over a time interval that is equal to the shortest pitch-period duration. Alignment with the peaks should not be as critical using the second approach, provided that data are sampled at greater than the Nyquist rate.

The two approaches were called "amplitude shimmer" and "stddev shimmer", and the corresponding names were "PAAV" and "PSAV". As for PDAV, the RAP algorithm (Eq. (5.1)) was used, with M=3, CTR=0, K=1 and an=1/3. PAAVDB and PSAVDB were derived using Eq. (7.2) and Eq. (7.3), respectively.

$$\mathrm{PAAVDB} = -10 \log_{10}[\,\mathrm{PAAV}\,] \tag{7.2}$$

$$\mathrm{PSAVDB} = -10 \log_{10}[\,\mathrm{PSAV}\,] \tag{7.3}$$

Figure 7.4 illustrates the effects of time quantization and interpolation on PAAVDB and PSAVDB. The expected value for PAAVDB, computed using Eq. (6.3), is compared with estimates obtained with and without parabolic interpolation. Without interpolation, a large deviation from the expected value was observed for /i/ and to a lesser extent /a/. Interpolation improved the situation, but accuracy remains questionable at low levels of perturbation for /i/.

In contrast with PAAVDB, time quantization had little effect on PSAVDB. In fact, resampling of data to resolve real valued pitch-period markers was detrimental, particularly for /i/. Evidently, errors due to pitch-period marker quantization are reduced during computation of the standard deviations, but errors introduced by interpolation are not.

Figure 7.5 illustrates the effect of varying the fundamental frequency. There was up to 6 dB variation in both measures when f0 was varied.
Both measures tended to be higher for /i/ and /u/ when the fundamental frequency approached the first formant frequency. The variation for PSAVDB was less than for PAAVDB when f0 was low. These effects can be attributed to the combined influence of pitch-period superposition and jitter.

The relative influences of jitter, shimmer and additive noise on PAAVDB and PSAVDB are illustrated in Figure 7.6. Surprisingly, jitter was the most influential type of perturbation for both measures. This is attributable to pitch-period superposition. The influences of shimmer and noise on PAAVDB were roughly equal, but noise effects were largely reduced for PSAVDB.

Data from Figure 7.4 were reproduced in Figure 7.7 for comparisons between vowels. The differences for PSAVDB at low levels of perturbation were less than for PAAVDB. Furthermore, in Figure 7.6 the relative influences of jitter, shimmer and noise were more consistent for PSAVDB. This suggests that PSAVDB is less affected by the formant structure than PAAVDB.

FIGURE 7.4: PAAVDB & PSAVDB; SAMPLING AND INTERPOLATION EFFECTS
This figure illustrates sampling and interpolation effects in log-magnitude measures of shimmer. PAAVDB is based on peak amplitudes and PSAVDB is based on standard deviations. The appended number in the labels is the sampling frequency.
"Expected" = prediction of PAAVDB from Eq. (6.3)
"Interpolated" = parabolic peak interpolation
"Exact" = pitch-period markers from the vowel synthesizer
"Quantized" = markers rounded to the nearest sample point
"Optimized" = markers optimized as in Chapter 4
(Plot: curves labelled Quantized-20kHz, Optimized-10kHz, Optimized-20kHz and Exact-20kHz versus perturbation level (%).)

FIGURE 7.5: PAAVDB & PSAVDB; FUNDAMENTAL FREQUENCY DEPENDENCE
This figure illustrates the dependence of log-magnitude measures of shimmer on the fundamental frequency. PAAVDB is based on peak amplitudes and PSAVDB is based on standard deviations.
(Plot: panels by perturbation level (%).)

FIGURE 7.6: PAAVDB & PSAVDB; PERTURBATION PROFILE
This figure illustrates the relative influences of random jitter, shimmer and additive noise on log-magnitude measures of shimmer. PAAVDB is based on peak amplitudes and PSAVDB is based on standard deviations.
(Plot: horizontal axis perturbation level (%).)

FIGURE 7.7: PAAVDB & PSAVDB; DEPENDENCE ON VOWEL TYPE
This figure illustrates the dependence of log-magnitude measures of shimmer on the type of vowel (i.e., the formant structure). PAAVDB is based on peak amplitudes and PSAVDB is based on standard deviations.

The effect on PSAVDB of changing the waveform event used for pitch-period demarcation was simulated in Figure 7.8. A constant offset was added to or subtracted from pitch-period markers. PSAVDB tended to be lower at positive offsets. This can be attributed to the omission of large-amplitude portions of the pitch-periods when jitter is present, as the integration range is determined by the shortest pitch-period duration. Thus, the waveform event should be chosen such that this does not occur.
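The "stddev shimmer" computation described in this section can be sketched as follows. This is only an illustration under stated assumptions: pitch-period markers are integer sample indices (the thesis resamples to resolve real-valued markers), and the RAP of Eq. (5.1) with M=3, CTR=0 and K=1 is assumed to reduce to a Koike-style three-point relative average perturbation. The function names are hypothetical.

```python
import math


def period_stddevs(samples, markers):
    """Standard deviation of each pitch-period, computed over a window whose
    length equals the shortest pitch-period (integer markers assumed)."""
    lengths = [b - a for a, b in zip(markers, markers[1:])]
    window = min(lengths)
    amplitudes = []
    for start in markers[:-1]:
        segment = samples[start:start + window]
        mean = sum(segment) / window
        variance = sum((v - mean) ** 2 for v in segment) / window
        amplitudes.append(math.sqrt(variance))
    return amplitudes


def rap3(values):
    """Assumed Koike-style relative average perturbation: mean absolute
    deviation from a 3-point moving average, divided by the mean value."""
    n = len(values)
    deviation = sum(
        abs((values[i - 1] + values[i] + values[i + 1]) / 3.0 - values[i])
        for i in range(1, n - 1)
    ) / (n - 2)
    return deviation / (sum(values) / n)


def to_db(measure):
    """Log-magnitude form used for PDAVDB, PAAVDB and PSAVDB."""
    return -10.0 * math.log10(measure)
```

With these helpers, `to_db(rap3(period_stddevs(samples, markers)))` gives a PSAVDB-like value. Because every window has the length of the shortest period, a marker placed too late drops part of the longer periods, which is consistent with the lower values seen at positive offsets.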
FIGURE 7.8: PSAVDB; PITCH-PERIOD DEMARCATION OFFSET
This figure illustrates the effect of varying the offset of pitch-period demarcation on a log-magnitude measure of shimmer that is based on pitch-period standard deviation. The offset "X" was constant for each estimate.
(Plot: PSAVDB (dB) versus X (msec) from -1 to 1, with panels for perturbation levels of 1%, 3%, 5% and 10%.)

7.1.4 MEASURES OF TIME DOMAIN NOISE

Three measures of time domain noise were defined. "HNRL" is the harmonics-to-noise ratio (Eq. (5.3)), computed with Th=Timax and Mi=1. This configuration is roughly equivalent to Yumoto's HNR (Eq. (5.2)). "HNRN" is Eq. (5.3) computed with Th=Timin and Mi chosen to normalize the power in each pitch-period. Finally, "CF" used the correlation factor (Eq. (5.14)), with K=1 and Th=Timin.

The effects of pitch-period demarcation on HNRN and CF are illustrated in Figure 7.9. As predicted in Section 6.2, large underestimation occurred for /a/ and /i/ data when pitch-period markers were quantized. The mathematical predictions were within 1 dB of the measured values. The absence of a steep negative slope at low levels of perturbation indicates that the ability to resolve low perturbation data is compromised.

The underestimation was reduced through pitch-period marker optimization. However, underestimation remained for /a/ and /i/ data sampled at 10 kHz. Since this underestimation also occurred when "exact" markers were used, it is probably due to interpolation errors. Thus, oversampling or a more sophisticated approach to interpolation is recommended.

The relative influences of jitter, shimmer and noise on HNRN and CF are illustrated in Figure 7.11. Whereas shimmer had little effect, these measures were strongly influenced by both jitter and noise. The relative influence of jitter was less for CF than for HNRN.

As with PAAVDB and PSAVDB, the influence of jitter can be attributed to pitch-period superposition. The same can be said for the dependence on fundamental frequency that is illustrated in Figure 7.10. HNRN varied by up to 6.3 dB, and CF varied by up to 4.4 dB.

Data from Figure 7.9 were reproduced in Figure 7.12 for comparisons among vowels. Lower HNR estimates were obtained for vowels with large high frequency components. A number of factors contribute to this dependence. The influence of jitter caused by pitch-period superposition is greater when high frequency components are present. Interpolation errors can contribute to the difference, particularly for /i/. However, errors in the pitch-period optimization algorithm cannot be blamed, as the same dependence was observed when "exact" markers were used.

An extreme sensitivity to the choice of event for pitch-period demarcation is illustrated in Figure 7.13. HNR estimates were maximized when the offset was zero, i.e., when pitch-period markers were aligned with the waveform transition preceding the first large pitch-period oscillation. The sharp decrease at positive offsets can be explained as follows. When the offset is positive, two impulses of the vowel synthesizer make a major contribution to each pitch-period, resulting in an increased sensitivity to jitter. The implication for real vowels is that pitch-period markers should be aligned with, or slightly precede, the instant of glottal closure. The pitch-period peak, or a zero crossing following the peak, is not recommended.
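To make the time-domain noise measures more concrete, the sketch below computes a Yumoto-style harmonics-to-noise ratio from a list of already demarcated pitch-periods. Eq. (5.2) and Eq. (5.3) are not reproduced in this section, so the form used here (energy of the averaged period, scaled by the number of periods, over the energy of the deviations from that average) is an assumption, as are the function name and the choice to truncate every period to the shortest length.

```python
import math


def hnr_db(periods):
    """Yumoto-style HNR in dB for a list of pitch-periods (lists of samples).

    Assumed form: H = n * sum(avg^2), N = sum over periods of sum((p - avg)^2).
    All periods are truncated to the shortest length (assumption).
    Identical periods give zero noise energy and are not handled here.
    """
    n = len(periods)
    length = min(len(p) for p in periods)
    # Average waveform across periods: the "harmonic" component estimate.
    average = [sum(p[t] for p in periods) / n for t in range(length)]
    harmonic_energy = n * sum(a * a for a in average)
    # Residual of each period around the average: the "noise" component.
    noise_energy = sum(
        (p[t] - average[t]) ** 2 for p in periods for t in range(length)
    )
    return 10.0 * math.log10(harmonic_energy / noise_energy)
```

Any misalignment of the periods before averaging blurs the averaged waveform and inflates the residual, which is one way to picture why the estimates are so sensitive to marker placement.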
FIGURE 7.9: HNRN & CF; SAMPLING AND INTERPOLATION EFFECTS
This figure illustrates sampling and interpolation effects in measures of time domain noise. The appended number in the labels is the sampling frequency.
"Exact" = pitch-period markers from the vowel synthesizer
"Quantized" = markers rounded to the nearest sample point
"Optimized" = markers optimized as in Chapter 4
(Plot: horizontal axis perturbation level (%).)

FIGURE 7.10: HNRN & CF; FUNDAMENTAL FREQUENCY DEPENDENCE
This figure illustrates the dependence of measures of time domain noise on the fundamental frequency.
(Plot: measures versus F0*.01 (Hz), with panels for perturbation levels of 1%, 3%, 5% and 10%.)

FIGURE 7.11: HNRN & CF; PERTURBATION PROFILE
This figure illustrates the relative influences of random jitter, shimmer and additive noise on measures of time domain noise.

FIGURE 7.12: HNRN & CF; DEPENDENCE ON VOWEL TYPE
This figure illustrates the dependence of measures of time domain noise on the type of vowel (i.e., the formant structure).
(Plot: horizontal axis perturbation level (%).)

FIGURE 7.13: HNRN & CF; PITCH-PERIOD DEMARCATION OFFSET
This figure illustrates the effect of varying the offset of pitch-period demarcation on measures of time domain noise. The offset "X" was constant for each estimate.
(Plot: measures versus X (msec) from -1 to 1, with panels for perturbation levels of 1%, 3%, 5% and 10%.)

7.1.4.1 COMPARISON OF HNR CONFIGURATIONS

Knowledge of the effect of varying analysis parameters in the HNR is useful for comparing results from different studies. Table 7.2 compares the correlation factor (Eq. (5.14)) and three configurations of the HNR (Eq. (5.3)).
Data for HNRL and HNRS reveal that shortening of the range of integration (Th) had little effect. Amplitude normalization, as in HNRN, resulted in an increase in the HNR estimate of approximately 1 dB, probably due to a reduction of the influence of shimmer. Finally, as discussed in the comparison with HNRM-Qen^Qyic (Section 5.2.3), the difference of over 3 dB between HNRN and the correlation factor is attributable to retention of noise in the harmonic component estimate.

Underestimation caused by marker quantization was observed for all measures in Table 7.2. However, the effect appears to be less for the correlation factor than for the HNRs. It is worth noting that the mathematical analysis of marker quantization is not directly applicable for the correlation factor, as it was assumed that N is large.

The same conclusions can be drawn from similar data for /i/ and /u/ vowels, and the effects observed in Figures 7.9 through 7.13 were present for all HNR configurations. However, the difference between HNRS and HNRN for /u/ was approximately 3 dB. This can be explained by an increased variation of pitch-period amplitude caused by a combination of jitter and pitch-period superposition.

TABLE 7.2: COMPARISON OF HNR ESTIMATES FOR SYNTHETIC /a/ VOWELS
This table compares the correlation factor (Eq. (5.14)) and 3 configurations of the HNR (Eq. (5.3)) for synthetic /a/ vowels.
"Quantized" = markers rounded to the nearest sample point
"Optimized" = markers obtained as described in Chapter 4
"PTB" = the level of perturbation in the vowels
"HNRL" = HNR with Th=Timax and Mi=1
"HNRS" = HNR with Th=Timin and Mi=1
"HNRN" = HNR with Th=Timin and Mi=normalized energy

              OPTIMIZED MARKERS               QUANTIZED MARKERS
    PTB    HNRL   HNRS   HNRN   CF(1)      HNRL   HNRS   HNRN   CF(1)
    (%)    (dB)   (dB)   (dB)   (dB)       (dB)   (dB)   (dB)   (dB)
      0    39.4   39.4   39.5   43.3       21.3   21.3   21.3   27.0
      1    30.3   30.4   31.5   35.9       21.0   21.0   21.2   31.3
      3    22.2   22.3   23.4   27.0       18.4   18.4   18.8   23.7
      5    18.7   18.8   19.8   23.2       16.3   16.6   17.2   21.0
     10    13.9   14.2   14.9   18.1       12.4   12.7   13.2   16.4
     20     8.5    9.1    9.5   13.0        7.4    8.1    8.4   11.8

7.1.5 MEASURES OF SPECTRAL NOISE

"SHNRros" and "SHNRsor" are the spectral harmonics-to-noise ratios defined in Eq. (5.16) and Eq. (5.17), respectively. The subscript "ros" stands for ratio-of-sums, and the subscript "sor" stands for sum-of-ratios. NPP was varied between 2 and 5, NSKIP was 0 and NMAX was set so that the analysis included frequency components between 0 and 5,000 Hz.

Prior to estimation of Fourier coefficients, data-segments were resampled using third order Lagrange interpolation so that they spanned an integer number of sample points. The Fourier coefficients were estimated using one of the following methods: a direct DFT algorithm (i.e., a DFT algorithm implemented such that there is no restriction on the number of points in an analysis window), or an FFT algorithm preceded by second order Lagrange resampling. A rectangular analysis window was used, and trigonometric lookup tables were used to speed computations. Further detail on the spectrum analysis can be found in Section 5.4.2.

7.1.5.1 EFFECTS OF VARYING THE PROCESSING CONDITIONS

Figure 7.14 plots SHNRros and SHNRsor estimates obtained under various processing conditions. The correspondence between DFT-based and FFT-based SHNR estimates was close. The FFT-based SHNRros estimates from data sampled at 10 kHz tended to be higher than the DFT-based estimates, and FFT-based SHNRsor estimates tended to be lower. The differences were largest for /i/ data (2 dB for SHNRros, -0.6 dB for SHNRsor) and smallest for /u/ data (less than 0.2 dB for both measures).
All such differences were less than 0.3 dB for data sampled at 20 kHz.

The above differences can be attributed to a low-pass filtering effect of Lagrange interpolation (Schafer & Rabiner 1973). Since the noise components in the data had a greater preponderance of high frequency than the harmonic component, the low-pass filter reduced the relative level of noise. Differences were up to 5 times larger for first order interpolation. There was little benefit in using third order interpolation.

FIGURE 7.14: SHNRros & SHNRsor; SAMPLING AND INTERPOLATION EFFECTS
This figure illustrates the effects of sampling, interpolation and processing mode for measures of spectral noise. The appended number in the labels is the sampling frequency. Each analysis window spanned 3 pitch-periods.
"DFT" = a direct DFT computation
"FFT2" = a fast DFT preceded by 2nd order Lagrange resampling
"DFTQ" = data-segment demarcation rounded to the nearest sample
(Plot: horizontal axis perturbation level (%).)

The estimates from perturbation-free data suggest that oversampling provided an improvement for vowels with a predominance of high frequency. Doubling the sampling frequency increased the perturbation-free SHNRros from 21.9 dB to 41.2 dB for /i/, and from 29.4 dB to 49.0 dB for /a/. The increase for SHNRsor was not as large.

FFT-based computation of SHNRros and SHNRsor was between 9 and 17 times faster than DFT-based computation. The time savings were roughly the same for data sampled at 10 kHz and data sampled at 20 kHz.
This is not surprising, as the number of multiplications required for DFT computation is proportional to Nin*Nout, and the number of multiplications required for FFT computation is proportional to Nin*log2[Nin], where "Nin" is the number of input data points and "Nout" is the number of computed spectral coefficients. When the sampling frequency is doubled, Nin is doubled but Nout remains constant.

The plots in Figure 7.15 reveal a downward shift in SHNRros and SHNRsor as the number of pitch-periods per data-segment (NPP) increases. Estimates for all three vowels shifted downwards by roughly 3 dB, 2 dB and 1 dB as NPP increased from 2 to 5. This trend is reduced if the summations are computed as averages. After factoring and eliminating common terms, it follows that

$$\mathrm{SHNR}_{ros}'(\mathrm{NPP},\mathrm{NSKIP}) = \mathrm{SHNR}_{ros}(\mathrm{NPP},\mathrm{NSKIP}) + C \tag{7.4}$$

$$\mathrm{SHNR}_{sor}'(\mathrm{NPP},\mathrm{NSKIP}) = \mathrm{SHNR}_{sor}(\mathrm{NPP},\mathrm{NSKIP}) + C \tag{7.5}$$

where $C = 10 \log_{10}[\,\mathrm{NPP} - 2\,\mathrm{NSKIP} - 1\,]$

For NSKIP=0, C is 0, 3.0, 4.8 and 6.0 dB for NPP = 2, 3, 4 and 5, which accounts for successive shifts of roughly 3 dB, 2 dB and 1 dB.

FIGURE 7.15: SHNRros & SHNRsor; VARIATION OF DATA-SEGMENT LENGTH
This figure illustrates a dependence on window length in two measures of spectral noise. The number of pitch-periods in each analysis window was varied between 2 and 5.
(Plot: horizontal axis perturbation level (%).)

The values produced by Kojima's algorithm (Eq. (5.15)) were comparable to those produced by SHNRros when NPP=3 and NSKIP=0; SHNRros was less than 1.5 dB higher at low levels of perturbation. Conclusions drawn here about the performance of the SHNRros apply for both algorithms.

7.1.5.2 DATA-SEGMENT DEMARCATION EFFECTS

When the markers used to demarcate data-segments lie between sample points, one can either interpolate the data or round the markers to the nearest point. The curves labelled DFT and DFTQ in Figure 7.14 represent both cases.
There was less than 0.4 dB difference between these curves at all tested levels of perturbation. However, SHNRros and SHNRsor were 17 dB higher for perturbation-free data when interpolation was used. Thus, interpolation provides a greater range for vowels with low levels of perturbation.

If pitch synchronization were not important, then small errors in the length of the data-segments would have little effect on the computed result. In Figure 7.16, the sensitivity of the SHNRs to such errors was tested by appending or deleting up to 5 sample points for each data-segment. The number of points added or subtracted was constant for a given SHNR estimate. The sharp peaks in this figure indicate an extreme sensitivity to errors in the length of the data-segments. The sensitivity decreased as NPP increased, but remained large for all NPP between 2 and 5. This effect was present for all three vowels.

In contrast with demarcation errors, the addition of an offset to the demarcation markers had little effect. This was tested by adding or subtracting a constant of up to 1 msec to each pitch-period marker. Results for /a/ are summarized in Figure 7.17. There was little variation in SHNRros, and variations in SHNRsor were less than 2 dB. The same conclusions can be drawn from similar plots for other vowels and other values of NPP.

FIGURE 7.16: SHNRros & SHNRsor; DATA-SEGMENT DEMARCATION ERRORS
This figure illustrates the effect of errors in pitch-period demarcation on measures of spectral noise. Each analysis window spanned 3 pitch-periods plus "X" sample points, where "X" was constant within each estimate.
(Plot: measures versus X (pts) from -5 to 5, with panels for perturbation levels of 0%, 1%, 3%, 5% and 10%.)

FIGURE 7.17: SHNRros & SHNRsor; DATA-SEGMENT DEMARCATION OFFSET
This figure illustrates the effect of the offset of pitch-period demarcation on measures of spectral noise. A constant offset ("X") was added to or subtracted from the pitch-period demarcation markers. Each analysis window spanned 3 pitch-periods.
(Plot: measures versus X (msec) from -1 to 1, with panels for perturbation levels of 0%, 1%, 3%, 5% and 10%.)

7.1.5.3 RELATIONSHIPS WITH VOWEL CHARACTERISTICS

Data from Figure 7.14 were reproduced in Figure 7.18 for comparisons between vowels. SHNRros varied by as much as 15 dB at a given level of perturbation, depending on whether /a/, /i/ or /u/ was being analyzed. In addition, the functional relationship with perturbation level was different for /i/ than for /a/ or /u/. The differences were less for SHNRsor; the functional relationship with perturbation level was the same for all three vowels, and there was less than 4 dB difference between the curves.

FIGURE 7.18: SHNRros & SHNRsor; DEPENDENCE ON VOWEL TYPE
This figure illustrates the dependence of measures of spectral noise on the type of vowel (i.e., the formant structure). Each analysis window spanned 3 pitch-periods.
(Plot: curves for /a/, /i/ and /u/ versus perturbation level (%).)

In Figure 7.19, the SHNRs tended to increase with increasing fundamental frequency. Variation of the fundamental frequency from 103 Hz to 203 Hz led to variations of over 6 dB in SHNRros.
SHNRsor provided a slight reduction in this sensitivity for /a/ data but not for /i/ or /u/ data. It should be noted that the differing dependencies on fundamental frequency for /a/, /i/ and /u/ data affect the separation between the curves in Figure 7.18.

Figure 7.20 indicates that jitter has a much larger influence on the SHNRs than shimmer or noise. It took more than 10 times the percentage of shimmer or noise to produce a given SHNR level in /a/ data. The dominance of jitter for SHNRros was greater for /i/ and less for /u/. The dominance of jitter for SHNRsor was roughly the same for all vowels.

FIGURE 7.19: SHNRros & SHNRsor; FUNDAMENTAL FREQUENCY DEPENDENCE
This figure illustrates the dependence of measures of spectral noise on the fundamental frequency. Each analysis window spanned 3 pitch-periods.

[Figure 7.19 plots: SHNRros and SHNRsor (dB) versus F0 x 0.01 (Hz) from 1 to 2, one panel per perturbation level (1%, 3%, 5%, 10%).]

FIGURE 7.20: SHNRros & SHNRsor; PERTURBATION PROFILE
This figure illustrates the relative influences of random jitter, shimmer and additive noise on measures of spectral noise. Each analysis window spanned 3 pitch-periods.

[Figure 7.20 plots: SHNRros and SHNRsor versus perturbation level (%) for /a/, with separate curves for jitter, shimmer and noise.]

7.1.5.4 WINDOW TAPERING FOR CONTROL OF SPECTRAL LEAKAGE

Figure 7.21 illustrates the effect of using the window function defined in Eq. (7.6). This is referred to as a Tukey window in spectrum analysis literature. The taper regions are specified as a percentage of the total window length, and a taper of 50 percent produces a Hanning window.
$$
w(k) =
\begin{cases}
0.5\left[\,1 - \cos\!\left(\pi (k-1)/(A-1)\right)\right], & 1 \le k \le A \\
1, & A < k < N-A+1 \\
0.5\left[\,1 - \cos\!\left(\pi (N-k)/(A-1)\right)\right], & N-A+1 \le k \le N
\end{cases}
\tag{7.6}
$$

where
A = PCT * N
N = the number of points in the window
PCT = size of the taper regions expressed as a percentage of the window length

FIGURE 7.21: SHNRros; THE EFFECT OF WINDOW TAPERING
The effect of window tapering (Eq. (7.6)) on a measure of spectral noise. Each analysis window spanned 4 pitch-periods. "NSKIP" is a spacing parameter in the SHNR algorithm.

[Figure 7.21 plots: SHNRros versus perturbation level (%).]

The tapering caused a large reduction in the sensitivity of SHNRros to perturbation level. When NSKIP was 0, a large reduction was observed for all nonzero taper percentages. When NSKIP was 1, the desired sensitivity to perturbation level was obtained for both a 0 percent (rectangular) taper and a 50 percent (Hanning) taper. These results can be predicted by inspecting the Fourier transform of the Tukey window (Childers & Durling 1975, p. 436). Harmonic energy leaks into spectral components associated with noise in all cases for which there was a reduction in sensitivity to perturbation level. The effects decreased when NPP was increased, but remained large for all NPP between 2 and 5. Similar effects can be observed for SHNRsor and for the other vowels.

The data in Figures 7.14 through 7.20 were recalculated with NPP=4, NSKIP=1 and a Hanning window. The most notable change was a reduction in sensitivity to errors in data-segment demarcation. This becomes apparent when one compares Figure 7.22 with Figure 7.16. The peaks in Figure 7.22 are less pronounced, and when the error (X) was 3 sample points or less, the SHNRros and SHNRsor estimates from perturbation-free data exceeded all estimates from perturbed data.
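The window of Eq. (7.6) can be written out directly. The following is a minimal sketch rather than the code used for the analyses above: the function name is hypothetical, PCT is passed as a fraction (0 to 0.5) instead of a percentage, and rounding A = PCT * N to the nearest integer is an assumption, since Eq. (7.6) does not say how a non-integer A is handled.

```python
import math


def tukey_window(n, pct):
    """Tapered window following Eq. (7.6).

    n   -- number of points in the window (N)
    pct -- size of each taper region as a fraction of the window length
    """
    if not 0.0 <= pct <= 0.5:
        raise ValueError("pct must lie between 0 and 0.5")
    a = int(round(pct * n))  # assumption: A is rounded to an integer
    w = [1.0] * n
    if a < 2:
        return w  # rectangular window; also avoids dividing by A - 1 = 0
    for k in range(1, n + 1):  # 1-based index, as in Eq. (7.6)
        if k <= a:
            w[k - 1] = 0.5 * (1.0 - math.cos(math.pi * (k - 1) / (a - 1)))
        elif k >= n - a + 1:
            w[k - 1] = 0.5 * (1.0 - math.cos(math.pi * (n - k) / (a - 1)))
    return w


if __name__ == "__main__":
    print([round(v, 3) for v in tukey_window(16, 0.25)])
```

A call such as tukey_window(n, 0.0) gives the rectangular case and tukey_window(n, 0.5) gives the fully tapered case described in the text as a Hanning window.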
A second difference is a reduction in the dependence of SHNRsor on the fundamental frequency for /a/ data; the variation was less than 3 dB for fundamental frequencies between 103 Hz and 203 Hz. However, similar reductions were not observed for /i/ or /u/ data.

FIGURE 7.22: SHNRros & SHNRsor; DATA-SEGMENT DEMARCATION ERRORS WITH A HANNING WINDOW
This figure illustrates the effect of errors in pitch-period demarcation on measures of spectral noise when a Hanning window is used. Each analysis window spanned 4 pitch-periods plus "X" sample points, where "X" was constant within each estimate. The NSKIP parameter was 1.

[Figure 7.22 plots: SHNRros and SHNRsor for /a/ versus X (pts) from -5 to 5, one panel per perturbation level (0%, 1%, 3%, 5%, 10%).]

A by-product of reduced sensitivity to demarcation errors is that errors associated with rounding of demarcation markers to the nearest sample point become tolerable. The error was less than 0.6 dB for all SHNRros and SHNRsor estimates from perturbed data, and quantized estimates from perturbation-free data were more than 13 dB higher than estimates from perturbed data.

Use of the Tukey window in this section was convenient for illustrating the effects of spectral leakage. However, other windows described in the literature may be preferred (e.g., Childers & Durling 1975, pp. 433-440). For example, a Hamming window provides a superior reduction in spectral leakage without a further decrease in spectral resolution.

7.1.6 MEASURES OF PERTURBATION PATTERNS

The Cyclic Perturbation Factors (CPFs) used the algorithms in Eq. (5.20), Eq. (5.21) and Eq. (5.22), with M=3, CTR=0, K=2 and aj=1/3. The names are "xxCPF", where "xx" specifies the type of perturbation (see Appendix F).
These measures are sensitive to alternate cycle periodicity in the perturbations. As expected, the CPFs were approximately zero for the test vowels. Additional vowels that contained alternate cycle periodicity were generated, and suitably large values for the CPFs were obtained.

The directional perturbation quotients (DPQs) used the algorithm in Eq. (5.18), with M=3, CTR=0, K=1 and aj=1/3. The names of the DPQs take the same form as the CPFs (see Appendix F). The effects predicted in Section 6.3 were observed in the test data; error-free DPQ estimates were approximately 0.7, and quantization of pitch-period markers led to underestimation of PDDPQ. Pitch-period marker optimization removed the underestimation. Problems of underestimation were not observed for PADPQ, PSDPQ or CFDPQ.

The correlation measures are correlation coefficients between time functions of perturbations. The time functions for jitter and shimmer were obtained from PDAV, PAAV and PSAV. The time function for noise was obtained from CF with its logarithm removed. The mean of each time function was removed.

The names of the correlation coefficients are "xxXyy", where "xx" and "yy" specify the type of perturbation. "PD" = jitter, "PA" = amplitude shimmer, "PS" = stddev shimmer, and "CF" = time domain noise. Thus, PDXPA is the correlation coefficient of the time function of perturbation in pitch-period duration with the time function of perturbation in pitch-period peak amplitude.

As expected, there was a high positive correlation between shimmer sequences (E[PAXPS]=0.78). The correlation was lower for /i/ (E[PAXPS]=0.53) than for /a/ and /u/, probably due to errors in peak interpolation.
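The correlation measures defined above amount to a standard correlation coefficient computed after removal of the mean of each time function. A minimal sketch follows. The function name and the sample sequences are hypothetical, and the derivation of the time functions themselves (PDAV, PAAV, PSAV and CF) is not reproduced here.

```python
import math


def perturbation_correlation(x, y):
    """Correlation coefficient of two perturbation time functions.

    The mean of each sequence is removed before the normalized cross
    product is formed, as described for the "xxXyy" measures.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    denom = math.sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    if denom == 0.0:
        raise ValueError("a constant sequence has no defined correlation")
    return sum(a * b for a, b in zip(dx, dy)) / denom


if __name__ == "__main__":
    # Hypothetical per-period perturbation values, for illustration only.
    peak_amplitude = [0.02, -0.01, 0.03, -0.02, 0.01, -0.03]
    stddev = [0.015, -0.012, 0.028, -0.018, 0.006, -0.019]
    print(perturbation_correlation(peak_amplitude, stddev))  # plays the role of PAXPS
```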
There was considerable variation in the correlation between jitter and shimmer, depending on the vowel and fundamental frequency. Correlations ranged from 0.65 to -0.75.

Finally, there was little correlation between jitter and noise (PDXCF), and a small negative correlation between shimmer and noise (PAXCF and PSXCF). The following results were obtained: E[PDXCF]=0.05 and E[|PDXCF|]=0.13, E[PAXCF]=-0.09 and E[|PAXCF|]=0.15, and E[PSXCF]=-0.13 and E[|PSXCF|]=0.18, where E[|x|] is the mean of the absolute value of x.

The correlations are probably due to the combined influence of pitch-period superposition and jitter on the shimmer and noise sequences. Thus, PDXPA and PDXPS appear to be sensitive to superposition, while the other measures are not.

7.2 VERIFICATION USING REAL VOWELS

An analysis of /a/ vowels from 3 female and 3 male speakers was undertaken to verify the results of the calibration. Vowels with relatively low levels of perturbation were selected, as such data are most likely to be affected by quantization. All but one of the selected subjects had no apparent laryngeal pathology. The exception had mild vocal cord inflammation. Procedures and equipment for data acquisition are described in Chapter 8.

Table 7.3 summarizes estimates of PDAVDB for the speakers. As expected, quantization of pitch-period markers caused a tendency to underestimate this measure. Predictions of the quantization effect were within 2.1 dB for female speakers and within 0.9 dB for male speakers. However, compensation for quantization effects based on the predictions produced disappointing results. This is attributable to a magnification of the significance of prediction errors, as the compensation relies on subtraction of numbers with similar magnitudes.
While an improvement can theoretically be obtained by increasing the number of pitch-periods, the use of optimized markers is clearly preferable. High fundamental frequencies led to the larger discrepancies for female speakers.

Table 7.4 summarizes values of PAAVDB computed with and without parabolic interpolation. Time quantization caused a small underestimation (up to 1.5 dB) when data were sampled at 10 kHz. Little error was observed when the sampling frequency was 20 kHz. Parabolic interpolation reduced the underestimation, but the change was greater than 1 dB for only one subject. If these subjects are indicative of normal speakers, then interpolation is not necessary for /a/ data sampled at 20 kHz, and the recommendations in Section 6.1.8.2 are pessimistic. However, this may not be true for other studies, as shimmer levels in the present data are likely inflated by tape recorder distortion.

TABLE 7.3: PDAVDB FOR REAL /a/ VOWELS
This table compares PDAVDB estimates for /a/ segments from three female speakers (F1-F3) and three male speakers (M1-M3). PDAVDB is defined in Eq. (7.1), and was computed with M=3, K=1, CTR=0 and aj=1/3. Dashed lines denote the log of a negative number.

"PPLEN" = the pitch-period length
"OPT" = optimized pitch-period markers
"QUANT" = quantized pitch-period markers
"PREDICT" = predicted quantization effect (Eq. (6.5))
"COMP" = compensated estimates based on Eq. (6.5)

SAMPLING FREQUENCY   MEAN PPLEN   ESTIMATES OF PDAVDB (dB)
AND SPEAKER          (pts)        OPT     QUANT   PREDICT   COMP
10 kHz   F1           79.19       27.0    18.6    20.5      19.7
         F2           53.54       26.7    18.7    18.9      23.8
         F3           62.29       26.8    18.4    19.6      20.2
         M1          133.55       27.9    23.5    22.7      ---
         M2          149.51       28.2    22.3    23.2      24.4
         M3          101.86       28.0    22.4    21.6      ---
20 kHz   F1           39.59       28.7    25.5    23.4      ---
         F2           26.77       27.0    20.2    21.7      21.6
         F3           31.64       27.5    22.9    22.4      ---
         M1           66.78       27.9    25.3    25.2      28.3
         M2           74.75       28.3    26.2    25.6      31.3
         M3           50.93       28.4    23.9    24.4      26.3

TABLE 7.4: PAAVDB AND PSAVDB FOR REAL /a/ VOWELS
This table compares PAAVDB and PSAVDB estimates for /a/ segments from three female speakers (F1-F3) and three male speakers (M1-M3). PAAVDB and PSAVDB were defined in Eq. (7.2) and Eq. (7.3), and were computed with M=3, K=1, CTR=0 and aj=1/3.

"INTERP" = Parabolic peak interpolation for PAAVDB
         = Lagrange resampling for PSAVDB
"QUANT" = no interpolation or resampling

SAMPLING FREQUENCY   PAAVDB (dB)        PSAVDB (dB)
AND SPEAKER          INTERP   QUANT     INTERP   QUANT
10 kHz   F1          20.6     18.8      21.0     21.3
         F2          20.8     20.0      19.7     19.6
         F3          17.9     17.9      20.3     20.2
         M1          18.8     18.5      19.5     19.5
         M2          19.2     19.0      19.7     19.7
         M3          18.0     17.7      19.6     19.6
20 kHz   F1          21.2     21.2      21.4     21.4
         F2          20.6     20.5      19.7     19.7
         F3          19.4     19.5      20.2     20.2
         M1          18.8     18.9      19.5     19.5
         M2          19.0     18.9      19.6     19.6
         M3          17.8     17.7      19.6     19.6

Figure 7.23 illustrates the effect on PSAVDB of the offset of the pitch-period markers. The variation was less than 2 dB in each case. The absence of underestimation at positive offsets, as observed for synthetic vowels in Figure 7.8, is attributable to relatively low levels of jitter for these subjects. Quantization of pitch-period markers had a negligible effect, as did sampling at 10 kHz, with all differences being less than 0.4 dB (see Table 7.4).
FIGURE 7.23: PSAVDB; REAL VOWELS
This figure illustrates the effect of demarcation offset on a measure of shimmer. A constant offset "X" was added to the demarcation markers.
F1, F2, F3 = 3 female subjects
M1, M2, M3 = 3 male subjects

[Figure 7.23 plots: PSAVDB for /a/ versus X (msec) from -1 to 1, one panel per subject (F1, F2, F3, M1, M2, M3).]

HNRN and CF estimates for the 6 subjects are plotted in Figure 7.24. The predicted effects of marker quantization and marker offset are readily apparent. With the exception of the second male speaker, the underestimation caused by marker quantization was 1 to 2 dB less than that predicted in Figure 6.3. This difference is not alarming, as no attempt was made to model the spectral characteristics of each speaker, and the actual distribution of quantization errors was not determined. The difference of approximately 4 dB for the second male speaker was due to a coincidental synchronization with the fundamental frequency. The average pitch-period duration for this subject was 149.51 sample points, so the distribution of quantization errors deviated from the assumed distribution.

FIGURE 7.24: HNRN & CF; REAL VOWELS
The effect of quantization and pitch-period demarcation on measures of time domain noise. A constant offset "X" was added to the demarcation markers.
F1, F2, F3 = 3 female subjects
M1, M2, M3 = 3 male subjects
Plotted series: optimized pitch-period markers; markers rounded to the nearest sample point; predicted quantization effect from Figure 6.3.

[Figure 7.24 plots: HNRN and CF versus X (msec) from -1 to 1, one panel per subject (F1, F2, F3, M1, M2, M3).]

Conclusions drawn from Table 7.2 about the various HNRs are supported by data in Table 7.5. The differences between the measures are comparable, as are the effects of marker quantization.
An exception is that the CF is between 4 dB and 7 dB higher than HNRN, rather than approximately 3 dB. This can be attributed to the structure of the CF algorithm. Because it is the log of an average of HNRs, the CF places greater emphasis on pitch-periods with low levels of perturbation. Furthermore, gradual changes in the data have less influence on the CF, as it is based on comparisons of pairs of pitch-periods rather than a long term average. The HNR proposed in Milenkovic (1987) (Eq. (5.12)) may also be affected, as it is computed in a similar manner.

Estimates of SHNRros and SHNRsor for the 6 subjects are summarized in Table 7.6 and Table 7.7. Accurate SHNR estimates were obtained using an FFT and there was little benefit in oversampling at 20 kHz. Quantization of pitch-period markers caused underestimation of up to 7 dB in the SHNRs when a rectangular window was used. Application of a Hanning window reduced these errors to less than 2.4 dB. These results are in agreement with data for synthetic vowels in Figure 7.14.

TABLE 7.5: COMPARISON OF HNR ESTIMATES FOR REAL /a/ VOWELS
This table presents HNR estimates for /a/ segments from three female speakers (F1-F3) and three male speakers (M1-M3). The parameters and labelling are the same as for Table 7.2.

SAMPLING FREQUENCY   OPTIMIZED MARKERS (dB)          QUANTIZED MARKERS (dB)
AND SPEAKER          HNRL   HNRg   HNRN   CF(1)      HNRL   HNRg   HNRN   CF(1)
10 kHz   F1          27.4   28.0   29.8   33.9       14.8   14.8   14.9   21.0
         F2          25.7   25.8   29.4   34.0       14.2   14.5   14.7   24.8
         F3          26.4   26.6   27.6   32.0       15.2   15.4   15.5   20.5
         M1          25.6   25.7   27.2   33.1       18.5   18.6   18.8   28.3
         M2          27.1   27.2   28.9   35.0       17.2   17.5   17.6   26.8
         M3          24.3   24.4   28.9   35.6       14.5   14.6   14.9   31.3
20 kHz   F1          27.7   28.4   30.4   37.2       20.4   21.0   21.3   32.8
         F2          26.0   26.1   29.8   34.5       20.3   20.5   21.2   26.2
         F3          27.2   27.3   28.4   34.9       20.8   21.0   21.3   28.6
         M1          25.5   25.6   27.2   33.0       21.4   21.5   22.0   30.6
         M2          26.1   27.0   28.5   34.7       23.1   23.6   24.3   32.8
         M3          24.4   24.5   29.2   36.1       20.7   20.8   22.2   32.7

TABLE 7.6: SHNRros FOR REAL /a/ VOWELS
This table compares SHNRros estimates for /a/ segments from three female speakers (F1-F3) and three male speakers (M1-M3). "SHNRros" was defined in Eq. (5.16), and was computed with NPP=4.

"RECTANGULAR" = a rectangular window and NSKIP=0
"HANNING" = a Hanning window and NSKIP=1
"DFT" = direct DFT estimation
"FFT" = fast DFT estimation with Lagrange resampling
"FFTg" = fast DFT estimation without Lagrange resampling

SAMPLING FREQUENCY   SHNRros (dB) FOR VARIOUS MODES
                     RECTANGULAR            HANNING
AND SPEAKER          DFT    FFT    FFTg     DFT    FFT    FFTg
10 kHz   F1          29.1   29.1   23.2     33.9   34.0   33.4
         F2          27.8   27.8   21.9     31.8   31.8   39.9
         F3          28.5   28.1   22.4     33.8   33.4   31.0
         M1          26.0   26.1   21.6     30.4   30.5   30.4
         M2          28.2   28.2   23.8     33.5   33.5   32.9
         M3          27.9   27.9   20.9     32.9   32.9   31.4
20 kHz   F1          29.3   29.3   24.6     34.2   34.2   33.8
         F2          27.8   27.8   24.7     31.8   31.8   31.0
         F3          28.7   28.7   24.7     34.1   34.2   33.8
         M1          26.9   26.9   25.4     32.0   32.0   31.8
         M2          28.2   28.2   27.2     33.5   33.5   33.5
         M3          27.9   27.9   24.4     33.0   33.0   32.4

TABLE 7.7: SHNRsor FOR REAL /a/ VOWELS
This table compares SHNRsor estimates for /a/ segments from three female speakers (F1-F3) and three male speakers (M1-M3). Labelling is the same as for Table 7.6.

SAMPLING FREQUENCY   SHNRsor (dB) FOR VARIOUS MODES
                     RECTANGULAR            HANNING
AND SPEAKER          DFT    FFT    FFTg     DFT    FFT    FFTg
10 kHz   F1          26.3   26.3   21.5     33.9   33.5   33.4
         F2          27.0   27.0   21.1     33.3   33.2   31.7
         F3          26.4   26.3   21.6     33.6   33.7   33.9
         M1          23.0   23.0   20.2     30.4   30.5   30.0
         M2          22.2   22.2   19.3     29.1   29.0   30.2
         M3          24.4   24.4   19.5     31.9   31.8   31.9
20 kHz   F1          26.6   26.6   23.2     35.5   35.0   35.2
         F2          27.0   27.0   23.6     33.7   33.7   33.6
         F3          26.6   26.6   24.2     35.1   35.3   33.8
         M1          24.1   24.1   23.4     30.2   30.2   30.7
         M2          22.2   22.2   21.8     29.7   29.7   30.4
         M3          24.4   24.4   22.2     32.3   32.3   32.2

The data in Figure 7.25 give further evidence that window tapering reduces the effects of pitch-period demarcation. As in Figures 7.16 and 7.22, the peaks in these figures were much sharper when a rectangular window was used, indicating a greater sensitivity to demarcation errors.

The offset of pitch-period demarcation had little effect on the SHNRs. The variability increased when a Hanning window was applied, and was greater for SHNRsor. However, in all cases the variation was less than 1.6 dB. This agrees with the conclusions drawn from Figure 7.17.

FIGURE 7.25: SHNRros & SHNRsor; REAL VOWELS
The effect of pitch-period demarcation on measures of spectral noise. A constant offset "X" was added to the demarcation markers.
F1, F2, F3 = 3 female subjects
M1, M2, M3 = 3 male subjects
Upper plot = Hanning window, 4 PP/window, NSKIP=1
Lower plot = rectangular window, 4 PP/window, NSKIP=0

7.3 SUMMARY AND DISCUSSION

7.3.1 SENSITIVITY TO PITCH-PERIOD DEMARCATION

Table 7.8 summarizes the sensitivity of the computed vowel measures to pitch-period demarcation. The sensitivities predicted in Chapter 6 were observed. Measures of time domain noise (HNRN and CF) were highly sensitive to marker quantization and other demarcation errors. Measures of jitter (PDAVDB) were moderately sensitive. The pitch-period optimization methods described in Chapter 4 reduced such errors to an acceptable level.

The high sensitivity of the spectral noise measures (SHNRros and SHNRsor) to demarcation errors is not surprising, given the reliance on pitch synchronization to control spectral leakage. Use of a nonrectangular (Hanning) window reduced this sensitivity by providing additional control of leakage. As a result, errors caused by quantization of demarcation markers become tolerable.
The data suggested that pitch-period demarcation markers should be aligned with the instant of glottal closure for measures of time domain noise (HNRN and CF). If a different alignment is used, then two glottal closures have a major influence on each demarcated pitch-period. A large underestimation of the noise measures results when jitter is present. To a lesser extent, the same argument applies for PSAVDB. Thus, the pitch-period peak, or the zero crossing following the peak, should not be used.

TABLE 7.8: SUMMARY OF DEPENDENCIES ON PITCH-PERIOD DEMARCATION
This table provides a qualitative summary of dependencies of vowel measures on pitch-period demarcation. Conclusions were drawn from an analysis of synthetic vowel waveforms. The measures are defined in Appendix F.

"SHNR-Rect" = SHNRros or SHNRsor with a rectangular window
"SHNR-Hann" = SHNRros or SHNRsor with a Hanning window

             DEMARCATION CHARACTERISTIC
MEASURE      ERRORS      OFFSET
PDAVDB       Moderate    None
PSAVDB       Low         Moderate
HNRN         High        High
CF           High        High
SHNR-Rect    High        Low
SHNR-Hann    Moderate    Low

7.3.2 SAMPLING FREQUENCY AND INTERPOLATION

A summary of recommended sampling frequencies is given in Table 7.9. Without interpolation, high sampling frequencies were required for most of the measures. Requirements were reduced to 20 kHz or less through interpolation. A notable exception was PSAVDB, where interpolation was detrimental. Apparently errors introduced by time quantization are minimized during computation of PSAVDB, but errors introduced by interpolation are not.

The sampling conditions recommended in Table 7.9 also apply to the associated pattern and correlation measures. For example, the recommended conditions for PDCPF and PDDPQ are the same as for PDAVDB.

TABLE 7.9: SUMMARY OF RECOMMENDED SAMPLING FREQUENCIES
This table summarizes recommended sampling frequencies for vowel measures. Conclusions were drawn from an analysis of synthetic vowels. The measures are defined in Appendix F. Recommendations are for analysis of /i/. Lower sampling frequencies are adequate for vowels with smaller high frequency components.

"QUANTIZED" = rounding to the nearest sample point
"INTERPOLATED" = parabolic interpolation for PAAVDB
               = Lagrange interpolation for the others
"SHNR-Rect" = SHNRros or SHNRsor with a rectangular window
"SHNR-Hann" = SHNRros or SHNRsor with a Hanning window

             RECOMMENDED SAMPLING FREQUENCY
MEASURE      QUANTIZED     INTERPOLATED
PDAVDB       > 20 kHz      10 kHz
PAAVDB       >> 20 kHz     20 kHz
PSAVDB       10 kHz        20 kHz
HNRN         >> 20 kHz     20 kHz
CF           >> 20 kHz     20 kHz
SHNR-Rect    > 20 kHz      20 kHz
SHNR-Hann    20 kHz        20 kHz

It is worth emphasizing that accurate pitch-period demarcation is implicitly assumed for Table 7.9. Clearly, no sampling frequency can provide the desired accuracy if the methodology for data acquisition and labelling is flawed.

7.3.3 DEPENDENCE ON VOWEL CHARACTERISTICS

Figure 7.26 summarizes the relative sensitivities of the vowel measures to the formant structure and the fundamental frequency. All measures except PDAVDB were affected. SHNRros was the most affected by the formant structure. This effect was reduced in SHNRsor. PSAVDB and CF provided a reduction in dependence on fundamental frequency when compared to their counterparts PAAVDB and HNRN.

Figure 7.27 summarizes the relative influences of jitter, shimmer and noise. PDAVDB was effective at isolating jitter. However, jitter had a large influence on all other measures as well. At a given level of perturbation, jitter was the most influential contributor to so-called measures of shimmer (PAAVDB and PSAVDB).
For HNRN, jitter and noise were equally influential. The jitter effect was reduced in CF, but it still had a moderate influence. These effects can be attributed to pitch-period superposition.

The results indicated that jitter is the prime determinant of the measures of spectral noise (SHNRros and SHNRsor). This was expected, as clear definition of the harmonic components in the frequency domain depends on the regularity of repetition of the harmonic component in the time domain. This suggests that a distinction should be made between noise in the frequency domain and noise in the time domain. The waveform characteristics and underlying physiological phenomena influencing algorithms designed for measurement of time domain noise may be different from those influencing measures of spectral noise.

FIGURE 7.26: SUMMARY OF SENSITIVITY TO FORMANT STRUCTURE AND FUNDAMENTAL FREQUENCY
This figure is a qualitative summary of the sensitivity of computed vowel measures to fundamental frequency and formant structure. Conclusions were drawn from an analysis of synthetic vowels. The measures are defined in Appendix F.

[Figure 7.26 chart: CF, SHNRros, HNRN, PSAVDB, PAAVDB, SHNRsor and PDAVDB placed on axes of formant structure sensitivity (low to high) and fundamental frequency sensitivity (low to high).]

FIGURE 7.27: SUMMARY OF INFLUENCES OF JITTER, SHIMMER AND NOISE
This figure is a qualitative summary of the relative influences of jitter, shimmer and additive noise on computed vowel measures. Measures are defined in Appendix F. "SHNR" denotes either SHNRros or SHNRsor.

[Figure 7.27 chart: the measures placed relative to JITTER, SHIMMER and NOISE, with PDAVDB at JITTER.]

If "independent" measures of jitter, shimmer and noise are desired, then a logical choice is PDAVDB, PSAVDB and CF. PSAVDB was preferred over PAAVDB because of its reduced dependence on noise and its reduced sensitivity to quantization. CF was preferred over HNRN because of its reduced dependence on jitter.
While Figure 7.27 indicates considerable interaction among the measures, there may be greater independence in real vowels. Specifically, the sensitivity of PAAVDB, PSAVDB, HNRN and CF to jitter and fundamental frequency may be overestimated, as the lack of source-tract interaction in the vowel synthesizer results in an excessive amount of pitch-period superposition.

The issue of measurement interaction was also discussed in Hillenbrand (1987). His study involved the analysis of synthetic /a/ data sampled at 20 kHz. The HNR was considered, along with standard measures of jitter and amplitude shimmer. Results were in agreement with data presented here.

A number of issues warrant further consideration. Tests using a synthesizer that models source-tract interaction would provide a better indication of the degree to which pitch-period superposition affects results. Other potential sources of error were not modelled, and a number of simplifying assumptions were made in implementation of the perturbations (see Section 3.5). Finally, further validation of results should be performed for real data through comparison with other measures of vocal function, such as the electroglottograph.

CHAPTER 8: APPLICATION TO NORMAL AND PATHOLOGICAL SPEAKERS

This chapter details an evaluation of the utility of the computed vowel measures described in Chapter 7 for judging the level of hoarseness and discriminating among laryngeal pathologies. The following questions were addressed:

1) Is it reasonable to assume that the probability densities of the computed measures are normal?
2) What is the correlation between the computed measures and subjective judgements of hoarseness?
3) Is there an advantage in considering a number of measures and if so, which measures should be included?
4) Should data be separated according to age and sex?
5) How do log-magnitude, CPF, DPQ, and correlation measures compare for the purpose of determining the presence and/or type of laryngeal pathology? Which type of measure individually has the best performance and which measures contribute?
6) Are classifications based primarily on differences in the overall magnitude of perturbation?
7) If the goal is to detect laryngeal pathology or to detect laryngeal cancer, is there an advantage in using a 4-class classifier and partitioning the result, or is a 2-class classifier suitable?
8) How does the computer's classification performance compare with the performance of trained listeners?

8.1 DATA DESCRIPTION

A large database of subjects with various laryngeal disorders has been established at the Voice Lab at Vancouver General Hospital. Over 1,500 patients have been assessed, and the database has expanded at a rate of approximately 300 subjects per year. The standard assessment procedure included documentation of a personal and medical history, a physical examination performed by an Otolaryngologist, a videotape recording of a laryngoscopic examination and an audio-tape recording of the patient's voice. The voice recordings included three prolongations of the phoneme /a/, a number of vocal exercises and continuous speech (the rainbow passage).
A Speech Pathologist made a subjective appraisal of each patient's voice. Significant findings were recorded on a patient evaluation form.

The following equipment was used. A Wolf 5010 rigid fiber-optic laryngoscope was connected to a Sanyo color television camera and a Sony 3/4 inch videocassette recorder. Audio recordings were made using an Optonica RT-6501C recorder and an AKG Acoustics D190E microphone positioned approximately 15 centimeters in front of the mouth. The subjects were located in a separate room designed to minimize external noise and room reverberation. Recordings were made using high bias and 70 µsec equalization, and the gain was adjusted to obtain a VU meter reading of approximately 0. Dolby noise reduction was not used.

Two groups of subjects were selected. The first group consisted of 206 male subjects with ages between 40 and 75 years. The second group consisted of 194 female subjects with ages between 20 and 40 years. The ages within each group were restricted to minimize the associated variability (Ramig & Ringel 1983). A summary can be found in Table 8.1.

TABLE 8.1: AGE AND PATHOLOGY INCIDENCE WITHIN GROUPS
This table summarizes the age and incidence of pathologies for a group of 206 male speakers and a group of 194 female speakers. The pathologies are described in Appendix A.

                                   CLASS             AGE (years)
PATHOLOGY                          CODE    NUMBER    MIN   MAX   MEAN

GROUP 1: 206 males between 40 and 75 years old
Normal (N)                         N        22       40    74    57.0
Functional Dysphonia (F)           F        45       40    71    54.1
Benign Lesion (B)                  B        72       40    75    53.2
T1 Glottic Cancer (T1G)            C        44       47    75    61.8
T2 Glottic Cancer (T2G)            C        23       42    74    57.3

GROUP 2: 194 females between 20 and 40 years old
Normal (N)                         N        22       20    40    29.5
Functional Dysphonia (F)           F        40       22    40    33.3
Muscle Tension Dysphonia
  - Normal Larynx (MTD1)           M        18       22    39    29.0
  - Vocal Nodules (MTD2A)          M        62       20    39    28.5
  - Laryngitis (MTD2B)             M         4       26    33    29.0
  - Vocal Polyps (MTD2C)           M        18       22    39    30.0
Other Pathology (O)                O        30       20    37    29.5

Subjects were excluded if they were unable to sustain a stable voiced /a/ for over 1 second. Stable implies a segment free of intermittences, such as pitch breaks or phonation breaks. Voiced implies the exclusion of subjects with inhalation phonation or whisper phonation.

8.2 SUBJECTIVE ASSESSMENTS

Segments of /a/ prolongations were randomly distributed on audio-tapes for subjective assessment. One segment from each subject was used. Each segment was approximately two seconds in duration. The onset and offset of voicing were excluded, and obvious intermittences were avoided. The segments were repeated twice in rapid succession to provide listeners with a longer sample on which to make their evaluations. A separate tape was produced for each sex.

An Otolaryngologist and a Speech Pathologist judged the probable pathology and the severity of breathiness and stridency. Each listener had more than four years of experience in evaluating and treating vocal pathology. Each test tape was graded twice, with a delay of over one month between the first and second grading.

The listeners were seated in a quiet room. Data were presented at a comfortable volume using a Sony STR-V1 receiver connected to a Sony 3-way stereo speaker. No discussion or comparison of judgements was permitted during a listening session, but pauses and replays of samples were permitted.
They were informed of the sex and age range of the subjects, but did not know the relative frequency of each type of pathology. They were informed that the voice onset and offset were excluded, and that the samples were truncated to approximately 2 seconds. In order to simulate the process used in the computer analyses, the listeners were asked to judge the "clearest" portion of a sample for instances when a variation in the severity was perceived. Finally, each session was preceded by at least 10 practice samples.

Breathiness and stridency were graded on an integer scale from 0 to 7, where 0 = none, 1 or 2 = slight, 3, 4, or 5 = moderate and 6 or 7 = severe. The "composite severity" was defined as the maximum of the breathiness and stridency judgements. The probable pathology was categorized into four diagnostic classes. For the males, the classes were normal (N), functional dysphonia (F), benign lesion (B) and cancer (C). For the females, the classes were normal (N), functional dysphonia (F), muscle tension dysphonia (M) and "other" (O).

8.2.1 RESULTS

A summary of the judgements of pathology severity can be found in Tables 8.2 and 8.3. The significance of differences between correlation coefficients was tested using a Student's t statistic after application of a Fisher's Z transform (Spiegel 1975, pp. 267-268). The Student's t is a test of equality of two means under the assumption that the two samples are normally distributed with equal variance. The Z transform compensates for the lack of normalcy in correlation coefficients when they are significantly different from zero (Sachs 1984, pp. 427-432).

The overall breathiness and composite severity were roughly the same for both sexes, but stridency tended to be lower for the females. Judgements of breathiness and stridency for Listener 1 were approximately equal.
Listener 2 had higher breathiness and lower stridency judgements, leading to a composite severity that averaged approximately 1 level higher than Listener 1.

The correlation coefficients in Table 8.2 indicate that judgments of pathology severity are repeatable between sessions and between judges. All correlations were significant at the 0.1% level. The correlations were stronger for the males (r ≈ 0.8) than for the females (r ≤ 0.76). The correlations for judgements of stridency by the second listener were lower.

Table 8.3 provides a breakdown of the severity judgements according to the type of pathology. As expected, the average severities for the normal class were lower than for the pathological classes, and the severities were greater at later stages of cancer. Both listeners perceived an increased proportion of breathiness for MTD. This is attributable to leakage through the posterior glottic chink that characterizes MTD.

Histograms of the severity judgements are presented in Figure 8.1. The tendency toward higher assessments from Listener 2 can be observed. The histograms suggest that it is reasonable to assume a normal distribution for breathiness and composite severity. The assumption of normalcy is more tenuous for stridency, particularly for Listener 2, as there was an increased incidence of stridency=0. This contributes to the differences in mean stridency noted above.

TABLE 8.2: SUBJECTIVE JUDGEMENTS OF SEVERITY; SUMMARY

This table summarizes subjective judgments of pathology severity made by two listeners (L1 and L2). Each sex group in Table 8.1 was graded twice by each listener. Breathiness and stridency were graded on an integer scale between 0 and 7 inclusive. The composite severity is the maximum of the breathiness and stridency determinations. All correlations are significant (p<.001). Differences between correlations of 0.08 or more are significant (t>2.2, p<.02). Differences between means of 0.4 or more are significant (t>2.2, p<.02).

            MALES (N=206)                    FEMALES (N=194)
            BREATHY  STRIDENT  COMPOSITE     BREATHY  STRIDENT  COMPOSITE

OVERALL MEAN
L1          2.3      2.6       2.8           2.2      2.2       2.8
L2          3.2      2.0       3.7           3.1      1.6       3.5

OVERALL STANDARD DEVIATION
L1          1.60     1.67      1.66          1.39     1.37      1.34
L2          1.91     1.91      1.62          1.66     1.56      1.42

INTER-SESSION CORRELATION (Pearson's r)
L1          0.82     0.80      0.84          0.70     0.70      0.76
L2          0.83     0.59      0.79          0.76     0.64      0.76

INTER-JUDGE CORRELATION (Pearson's r)
            0.74     0.52      0.78          0.72     0.48      0.68

TABLE 8.3: SUBJECTIVE JUDGEMENTS OF SEVERITY; CLASS MEANS

This table breaks down the means of subjective judgments of pathology severity according to the type of pathology. The pathologies are described in Table 8.1. The grading procedure is described in Table 8.2.

CLASS       LISTENER 1                       LISTENER 2
CODE        BREATHY  STRIDENT  COMPOSITE     BREATHY  STRIDENT  COMPOSITE

MALES (N=206)
N           1.0      1.3       1.4           1.4      1.8       2.4
F           2.1      2.4       2.7           3.0      1.8       3.5
B           2.0      2.2       2.5           3.0      1.8       3.5
T1G         2.6      2.9       3.1           3.7      2.1       4.1
T2G         3.8      3.9       4.3           5.1      3.0       5.1
(C)         2.9      3.2       3.4           4.0      2.4       4.4

FEMALES (N=194)
N           1.1      1.6       1.8           1.8      1.3       2.4
F           1.9      2.5       2.9           3.1      2.0       3.7
MTD1        2.0      1.7       2.3           3.1      1.1       3.2
MTD2A       2.3      2.0       2.8           3.2      1.3       3.5
MTD2B       3.5      2.1       3.5           4.3      0.5       4.3
MTD2C       2.7      2.3       3.1           3.5      1.4       3.6
(M)         2.4      2.1       2.9           3.3      1.3       3.5
O           2.7      2.6       3.3           3.7      2.1       4.1

FIGURE 8.1: SUBJECTIVE JUDGEMENTS OF SEVERITY; HISTOGRAMS

This figure contains histograms of judgments of pathology severity made by two listeners (L1 and L2) on 206 males and 194 females. Each sex group was graded twice by each listener. Breathiness and stridency were graded on an integer scale between 0 and 7 inclusive. The composite severity is the maximum of the breathiness and stridency determinations.
[Figure 8.1 panels: BREATHINESS, STRIDENCY and COMPOSITE SEVERITY histograms for Listener 1 and Listener 2; horizontal axis: SEVERITY.]

Confusion matrices and summary statistics for judgements of pathology type can be found in Table 8.4, Table 8.5 and Table 8.6. The chi-squared statistics compare the confusion matrices with the result expected when the row variables are independent of the column variables (Sachs 1984, pp. 474-476). Large chi-squared values indicate dependence. The total percent correct is the overall incidence of correct classification. The average percent correct is the average of classification results for each class.

Imbalance in the number of subjects per class affects the interpretation of the percent agreement statistics. For example, the high total percent correct for Listener 2 in identifying normal subjects (N/FBC and N/FMO) is due in part to the tendency of this listener to perceive pathology. The average percent correct provides a less flattering indication of performance.

Table 8.4 summarizes inter-session and inter-judge statistics for judgements of pathology type. A significant level of repeatability was observed. The first listener was particularly adept at consistently judging male subjects.

Data in Table 8.5 and Table 8.6 indicate a significant relationship between actual and judged pathology. However, the level of performance was not sufficient to warrant the use of subjective assessment as the sole means of diagnosis.

For males, both listeners had moderate success at identifying normal and cancer subjects. Diagnosis of functional dysphonia and benign lesion was more difficult.
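The summary statistics defined above can be illustrated with a short sketch. It assumes the counts reported for Listener 1 on the male group in Table 8.5 (rows are actual classes, columns are judged classes, with T1G and T2G pooled as C); the function names are illustrative and are not taken from the original FORTRAN programs.

```python
# Rows = actual class, columns = judged class, in the order N, F, B, C.
# Counts are Listener 1's judgements of the male group (Table 8.5).
LISTENER1_MALES = [
    [30, 7, 5, 2],     # actual N
    [23, 22, 20, 25],  # actual F
    [37, 34, 41, 32],  # actual B
    [16, 26, 34, 58],  # actual C (T1G and T2G pooled)
]


def total_percent_correct(matrix):
    """Overall incidence of correct classification."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


def average_percent_correct(matrix):
    """Unweighted average of the per-class percent correct."""
    rates = [100.0 * row[i] / sum(row) for i, row in enumerate(matrix)]
    return sum(rates) / len(rates)


def pearson_chi_squared(matrix):
    """Pearson's chi-squared against row/column independence."""
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    total = sum(row_sums)
    chi2 = 0.0
    for i, row in enumerate(matrix):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def collapse(matrix, groups):
    """Merge classes, e.g. [[0], [1, 2, 3]] gives the N/FBC partition."""
    return [[sum(matrix[i][j] for i in gi for j in gj) for gj in groups]
            for gi in groups]


n_fbc = collapse(LISTENER1_MALES, [[0], [1, 2, 3]])
print(round(total_percent_correct(LISTENER1_MALES), 1))    # 4-class total%
print(round(average_percent_correct(LISTENER1_MALES), 1))  # 4-class ave%
print(round(pearson_chi_squared(LISTENER1_MALES), 1))      # 4-class chi-squared
print(round(total_percent_correct(n_fbc), 1))              # N/FBC total%
```

For this matrix the sketch gives approximately 36.7 (total%), 41.1 (ave%) and 67.8 (chi-squared), which agree with the N/F/B/C row for Listener 1 in Table 8.5. The gap between the total and average figures for the N/FBC partition shows how class imbalance inflates the total percent correct.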
Classification of the cancer subjects was easier at later stages of the disease.

For females, both listeners had moderate success at distinguishing the normal subjects. Listener 1 tended to classify samples as functional, and Listener 2 tended to classify samples as "other". Listener 2 had a greater tendency to perceive a pathology than Listener 1.

TABLE 8.4: JUDGEMENTS OF PATHOLOGY TYPE; REPEATABILITY

This table summarizes total percent correct and Pearson's chi-squared statistics for subjective judgements of pathology type made by two listeners (L1 and L2) on a group of 206 male speakers and a group of 194 female speakers.

N/F/B/C, N/F/M/O = 4-class confusion matrices
N/FBC, N/FMO = normal/pathological matrix partitions
NFB/C = non-cancer/cancer matrix partition
NFO/M = non-MTD/MTD matrix partition

For chi-squared statistics, 4-class matrices have nine degrees of freedom and 2-class partitions have 1 degree of freedom. All statistics are significant at the 0.1% level. Random chance produces 25% agreement for 4-class matrices, and 50% agreement for the 2-class matrices.

            MALES (N=206)                 FEMALES (N=194)
LISTENER    N/F/B/C   N/FBC   NFB/C       N/F/M/O   N/FMO   NFO/M

BETWEEN SESSIONS CHI-SQUARED
L1          237       117     97          80        20      34
L2          92        21      56          58        19      25

BETWEEN LISTENERS CHI-SQUARED
            209       62      125         96        57      28

BETWEEN SESSIONS TOTAL PERCENT CORRECT
L1          68        89      85          52        81      78
L2          49        87      75          52        91      69

BETWEEN LISTENERS TOTAL PERCENT CORRECT
            47        79      78          39        86      66

TABLE 8.5: SUBJECTIVE JUDGEMENTS OF PATHOLOGY TYPE; MALES

This table presents confusion matrices for subjective judgements of pathology type made by two listeners on a group of 206 males. Class codes are described in Table 8.1. All chi-squared statistics are significant at the 0.1% level.

"%="     = the percentage of correct determinations
"χ²"     = the Pearson's chi-squared for the matrix
"ave%"   = the average percent correct
"total%" = the total percent correct

            LISTENER 1                     LISTENER 2
ACTUAL      N    F    B    C    %=         N    F    B    C    %=
N           30   7    5    2    68%        14   12   10   8    32%
F           23   22   20   25   24%        6    29   32   23   32%
B           37   34   41   32   29%        22   29   46   47   32%
T1G         12   21   24   31   35%        5    12   30   41   47%
T2G         4    5    10   27   59%        0    4    8    34   74%
(C)         16   26   34   58   43%        5    16   38   75   56%

MATRIX      χ²     ave%   total%           χ²     ave%   total%
N/F/B/C     67.8   41.1   36.7             61.2   40.0   39.8
N/FBC       46.5   73.8   78.2             20.3   61.4   84.7
NFB/C       21.6   61.1   67.2             30.2   64.0   66.7

TABLE 8.6: SUBJECTIVE JUDGEMENTS OF PATHOLOGY TYPE; FEMALES

This table presents the same data as Table 8.5 for a group of 194 females. All chi-squared statistics are significant at the 0.1% level.

            LISTENER 1                     LISTENER 2
ACTUAL      N    F    M    O    %=         N    F    M    O    %=
N           19   17   4    4    43%        12   15   11   6    27%
F           7    45   10   18   56%        4    23   25   28   29%
MTD1        7    19   9    1    25%        1    2    21   12   58%
MTD2A       16   52   41   15   33%        8    23   56   37   45%
MTD2B       0    1    7    0    88%        0    1    6    1    75%
MTD2C       6    6    14   10   39%        2    3    16   15   44%
(M)         29   76   71   24   35%        11   29   99   61   49%
O           10   21   14   15   25%        1    8    21   30   50%

MATRIX      χ²     ave%   total%           χ²     ave%   total%
N/F/M/O     52.3   40.0   38.7             59.5   38.9   42.3
N/FMO       24.9   64.9   81.7             29.8   61.3   87.6

8.3 COMPUTER ASSESSMENTS

Computer processing was performed on a Digital Equipment Corporation (DEC) PDP11/23 microcomputer, with an ADV11-A 12-bit linear A/D converter, an AAV11-A D/A converter and a DEC VT100 graphics terminal enhanced for compatibility with Tektronix graphics commands. A commercially available software package (Interactive Laboratory System (ILS) Version 3) was extensively revised and expanded. Programs were written under the RT-11 Version 4 operating system in the FORTRAN programming language (DEC Version 2.5). Computations were performed using normal (32-bit) precision.
8.3.1 DATA ACQUISITION

A 0.75 second segment of each vowel prolongation was sampled at 20 kHz and stored on the computer. Data were band-pass filtered between 80 Hz and 9 kHz using a Wavetek 452 Dual HI/LO filter. The "clearest" portions of the prolongations were sampled for instances when a variation was apparent. The gain was adjusted so that the amplitude was approximately 1,000 quantization levels.

The procedure for demarcation of pitch-periods was discussed in Chapter 4. The waveform transition preceding the first large pitch-period oscillation was marked. The markers were visually verified on the video terminal and adjusted when necessary.

A description of the measures was provided in Chapter 7. A summary of the names and analysis parameters can be found in Appendix F. 64 successive pitch-periods from each vowel segment were analyzed.

A number of computer programs were written for computation and evaluation of the measures. The graphical displays produced by the programs included pseudo 3-dimensional plots of the pitch-periods and the spectra, a time series display of the perturbations and Fourier transforms of the perturbation series. These displays were useful for program verification and initial inspection of data. It has also been suggested that these types of displays may be clinically useful (e.g., Davis 1976; Deller 1979). However, further work is needed to optimize for specific clinical applications.

8.3.2 PRELIMINARY INSPECTION OF THE MEASURES

As summarized in Table 8.7, seven male subjects and five female subjects were identified as outliers. These subjects were initially identified through inspection of histograms and listings of extreme values. The following T statistic was then used to test the probability that the suspected outliers were from a different population than the other samples (Sachs 1984, pp. 279-281).

    T = \frac{|x - \mu|}{\sigma}    (8.1)

where x = the suspected outlier
      \mu = the sample mean with x excluded
      \sigma = the sample standard deviation with x excluded

Histograms suggested that the requirement that the sample distribution be approximately normal was satisfied. A table of critical T values can be found in Sachs (1984).

TABLE 8.7: DETAILS OF SUBJECTS IDENTIFIED AS OUTLIERS

This table provides details about subjects identified as outliers in a group of 206 male speakers and a group of 194 female speakers. The groups are described in Table 8.1. The measures are described in Appendix F. Subjective assessments are summarized in Tables 8.2 through 8.6. The "T" statistic is defined in Eq. (8.1). All statistics are significant at the 3% level.

                                            SUBJECTIVE ASSESSMENTS
OUTLIER AND     ABERRANT                    PATHOLOGY      COMPOSITE
PATHOLOGY       MEASURE             T       TYPE           SEVERITY

MALE
1  T2G          CFCPF   = 12.8      9.69    C C C C        6 6 6 6
2  T1G          SHNRsor = 8.2       3.74    C C C C        6 7 6 7
3  T1G          PDCPF   = 9.0       7.66    C B C C        5 4 4 4
4  BL           PDCPF   = 9.5       8.06    C C C C        4 6 3 5
5  BL           PSCPF   = 4.9       4.92    C C B F        3 6 5 6
6  T1G          PDCPF   = 10.7      9.10    F B F F        3 3 2 2
7  BL           PDCPF   = 11.3      9.64    C C C B        4 5 5 4

FEMALE
1  BL           PDCPF   = 10.5      10.57   F F F F        4 6 5 5
2  MTD2A        PSCPF   = 4.9       4.70    O M M M        2 2 3 2
3  FD           PSAVDB  = 8.8       4.26    O O O O        6 6 5 6
4  MTD2A        PDCPF   = 6.5       6.37    O F F M        5 3 4 3
5  FD           CFCPF   = 8.6       8.76    M M M M        2 3 2 3

The majority of the outliers were rejected because of high levels of alternate cycle periodicity. This also led to high perturbation magnitudes and DPQs. With the exception of the second male, the probability that the samples were outliers was greater than 99% (i.e., T>3.9). The probability for the exception was greater than 97% (T>3.7). The estimated fundamental frequency for the second male was abnormally low (f0=49 Hz). This suggests that noise combined with extreme alternate cycle periodicity led to demarcation of every second pitch-period.
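The outlier statistic of Eq. (8.1) can be sketched in a few lines. This is a minimal illustration rather than the program used in the study: the function name and the toy data are hypothetical, and the use of the n-1 (sample) form of the standard deviation is an assumption, since the text does not state which form was used. Critical values would still be taken from the table in Sachs (1984).

```python
import statistics


def outlier_t(values, suspect_index):
    """T statistic of Eq. (8.1): distance of the suspected outlier from the
    mean of the remaining samples, in units of their standard deviation."""
    x = values[suspect_index]
    rest = values[:suspect_index] + values[suspect_index + 1:]
    mean_without_x = statistics.mean(rest)
    # Assumption: sample (n-1) standard deviation of the remaining values.
    sd_without_x = statistics.stdev(rest)
    return abs(x - mean_without_x) / sd_without_x


# Hypothetical measure values; the last one is the suspected outlier.
toy_values = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
print(round(outlier_t(toy_values, 5), 2))
```

Excluding the suspect before computing the mean and standard deviation matters: an extreme value would otherwise inflate the standard deviation and mask itself.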
These subjects were excluded from subsequent analyses, as they would have undue influence on parametric methods based on means and variances.

It is interesting that the majority of male outliers had cancer, and that 20 of 28 subjective assessments (71%) were cancer. Furthermore, all of the benign lesions were unilateral (i.e., primarily affecting one side of the larynx). This suggests that extreme alternate cycle periodicity in males is a hallmark of unilateral pathology.

The histograms in Figure 8.2 and Figure 8.3 indicate that a normal distribution can be assumed for the log-magnitude, pattern and correlation measures. The implicit bounding of the DPQ measures between 0 and 1 should not be problematic. Similarly, only one of the correlation measures (PAXPS) required a Fisher's Z transform to account for the bounding between -1 and 1, as this transformation is roughly linear for correlation coefficients between -0.9 and 0.9. The transformation of PAXPS is indicated in the remaining text by the label PAXPS'.

The distributions in Figure 8.2 imply that the magnitude measures (PDAV, PAAV and PSAV) are more appropriately modelled as log-normal. This agrees with observations made in Koike, Takahashi & Calcaterra (1977) and argues against correlating such measures with subjective judgments of pathology severity (e.g., Wolfe & Steinfatt 1987). It is assumed for the standard correlation model that the variables have a bivariate normal distribution (Wonnacott & Wonnacott 1981, pp. 172-173). Furthermore, linear regression and correlation test only for a linear relationship. Such a relationship would not be expected when one variable is normal and the other is log-normal. Finally, the lack of normalcy limits the suitable options for automated classification.

FIGURE 8.2: POOLED HISTOGRAMS; LOG-MAGNITUDE MEASURES

This figure presents histograms of the log-magnitude measures described in Appendix F. Data included 189 females and 199 males, and excluded the outliers in Table 8.7.

[Figure 8.2 panels recoverable from the original: PDAVDB (dB), PAAVDB (dB), PSAVDB (dB).]

FIGURE 8.3: POOLED HISTOGRAMS; PATTERN MEASURES

This figure presents histograms of the cyclic perturbation, directional perturbation and correlation measures described in Appendix F. Data included 189 females and 199 males, and excluded the outliers in Table 8.7.

[Figure 8.3 panels recoverable from the original: PDCPF, PACPF, PSCPF, CFCPF, CFCPF2, PDDPQ, PADPQ, PSDPQ, CFDPQ, PDXPA, PDXCF, PAXCF, PSXCF.]

8.3.3 RELATIONSHIPS AMONG THE MEASURES

The scatter plots in Figure 8.4 suggest that the relationships among the log-magnitude measures are approximately linear, and that the variation about the principal axis is relatively constant. It is also apparent for the pair (PAAVDB, PSAVDB) that a high level of PAAVDB implies a high level of PSAVDB but not vice versa. To a lesser extent, the same holds for the pairs (HNRL, CF), (HNRN, CF) and (SHNRros, CF). This testifies to the reduction of external influences on PSAVDB and CF. Similar plots that use various subsets of the database support these conclusions.

Linear correlations among measures are summarized in Table 8.8 and Table 8.9. There were relatively strong correlations among the log-magnitude measures and moderate correlations among the CPF measures. A reduction of the influence of noise on PSAVDB relative to PAAVDB is evidenced by lower correlations with noise measures. The strong correlations for the pairs (PAAVDB,PSAVDB), (HNRL,HNRN), (HNRL,CF), (HNRN,CF) and (SHNRros,SHNRsor) were expected, as these combinations were designed to measure similar waveform characteristics. The same applies for the pairs (PACPF,PSCPF), (CFCPF,CFCPF2), (PADPQ,PSDPQ), (PDXPA,PDXPS) and (PAXCF,PSXCF).
Finally, the higher correlations for (PDCPF,PDDPQ), (PACPF,PADPQ), (PSCPF,PSDPQ), (PACPF,PSDPQ) and (PSCPF,PADPQ) were expected, as the CPF and the DPQ are both sensitive to cyclic perturbations.

FIGURE 8.4: POOLED SCATTER PLOTS; LOG-MAGNITUDE MEASURES

This figure presents 2-dimensional scatter plots of the log-magnitude measures described in Appendix F. Data included 189 females and 199 males, and excluded the outliers in Table 8.7.

TABLE 8.8: CORRELATIONS AMONG MEASURES; MALES

This table summarizes correlations among the computed vowel measures described in Appendix F. Correlation coefficients for 199 male subjects are presented. The subjects are described in Table 8.1. With the exception of PDAVDB*PDCPF (r=-0.524), PAAVDB*PDCPF (r=-0.422), PDXPA*PDXPS (r=0.689) and PAXCF*PSXCF (r=0.626), all excluded correlations were less than 0.38.

          PDAVDB   PAAVDB   PSAVDB   HNRL     HNRN     CF       SHNRros
PAAVDB    0.776
PSAVDB    0.701    0.835
HNRL      0.781    0.784    0.753
HNRN      0.763    0.783    0.723    0.990
CF        0.798    0.871    0.782    0.908    0.920
SHNRros   0.765    0.853    0.861    0.892    0.890    0.950
SHNRsor   0.678    0.671    0.751    0.704    0.690    0.736    0.769

          PDCPF    PACPF    PSCPF    CFCPF    CFCPF2   PDDPQ    PADPQ
PACPF     0.301
PSCPF     0.344    0.680
CFCPF     0.480    0.421    0.414
CFCPF2    0.425    0.499    0.471    0.817
PDDPQ     0.743    0.219    0.227    0.313    0.274
PADPQ     0.206    0.773    0.467    0.363    0.441    0.171
PSDPQ     0.270    0.494    0.761    0.375    0.444    0.165    0.399
CFDPQ     -0.118   -0.143   -0.108   -0.304   -0.340   -0.044   -0.113

TABLE 8.9: CORRELATIONS AMONG MEASURES; FEMALES

This table summarizes correlations among the computed vowel measures described in Appendix F. Correlation coefficients for 189 female subjects are presented. The subjects are described in Table 8.1. With the exception of PDAVDB*PDCPF (r=-0.524), PAAVDB*PDCPF (r=-0.422), PDXPA*PDXPS (r=0.689) and PAXCF*PSXCF (r=0.626), all excluded correlations were less than 0.37.

          PDAVDB   PAAVDB   PSAVDB   HNRL     HNRN     CF       SHNRros
PAAVDB    0.745
PSAVDB    0.535    0.828
HNRL      0.693    0.717    0.664
HNRN      0.683    0.727    0.628    0.946
CF        0.739    0.859    0.701    0.771    0.837
SHNRros   0.739    0.877    0.817    0.787    0.816    0.946
SHNRsor   0.714    0.811    0.766    0.693    0.673    0.783    0.837

          PDCPF    PACPF    PSCPF    CFCPF    CFCPF2   PDDPQ    PADPQ
PACPF     0.208
PSCPF     0.046    0.668
CFCPF     0.437    0.346    0.289
CFCPF2    0.387    0.340    0.275    0.876
PDDPQ     0.702    0.080    -0.028   0.314    0.250
PADPQ     0.127    0.704    0.406    0.193    0.229    0.073
PSDPQ     -0.008   0.489    0.793    0.215    0.175    -0.083   0.381
CFDPQ     -0.083   -0.072   -0.117   -0.327   -0.251   -0.070   -0.034

The strength of the correlations of SHNRros with measures of shimmer and time domain noise is surprising, given the theoretical dependence on jitter observed in Chapter 7. Further investigation as to the cause is warranted.

The correlations among measures were greater for males than for females. Excluding SHNRsor, all but one of the correlations among log-magnitude measures were higher for males, and of those all but five of the increases were significant at the 10% level (t>1.3). The correlations among the time domain noise measures (HNRL, HNRN, CF) were notably greater for males (t>3.6, p<.001). This suggests a higher incidence of gradual changes in the females' vowels, as the long term average in HNRL and HNRN makes these measures more sensitive to such changes. As with subjective assessments, significance was tested using a Student's t statistic after application of a Fisher's Z transform.

A measure of overall correlation among measures is

    r' = \text{inverse Fisher's Z transform of } z' = \frac{e^{2z'} - 1}{e^{2z'} + 1}    (8.2)

where z' = the average of absolute values of Fisher's Z transforms of correlations among measures
      Fisher's Z = 0.5 \ln[(1+r)/(1-r)]

r' assumes values between 0 and 1, and high values indicate a strong overall correlation. The Z transform was used for its desirable statistical properties (Sachs 1984, pp. 427-432).

Table 8.10 summarizes values of r' for various subsets of the data. Results suggest that the degree of correlation among the log-magnitude measures is higher for unilateral pathologies. In the female group the correlations were highest for the "other" class. In the male group the correlations were highest for the benign lesion and cancer classes. The pathologies within each of these classes tend to be unilateral.

TABLE 8.10: OVERALL CORRELATION WITHIN TYPES OF MEASURES

This table summarizes the overall correlation among the computed vowel measures described in Appendix F. Correlations for various subsets of the data are presented. The subsets were described in Table 8.1. Outliers in Table 8.7 were excluded. Overall correlation (r') is defined in Eq. (8.2).

                 TYPE OF MEASURE
SUBSET           LOG-MAGNITUDE   CYCLIC   DIRECTIONAL   CORRELATION
MALE             0.827           0.504    0.162         0.243
FEMALE           0.781           0.429    0.132         0.221
BOTH             0.810           0.466    0.135         0.230

SPECIFIC MALE PATHOLOGIES
N    22          0.795           0.332    0.234         0.271
F    45          0.807           0.418    0.140         0.265
B    69          0.836           0.535    0.199         0.216
C    63          0.834           0.596    0.176         0.256

SPECIFIC FEMALE PATHOLOGIES
N    22          0.781           0.436    0.234         0.200
F    38          0.797           0.440    0.123         0.313
M    100         0.748           0.461    0.206         0.219
O    29          0.846           0.437    0.218         0.223

8.3.4 CORRELATION WITH SEVERITY JUDGEMENTS

8.3.4.1 INDIVIDUAL MEASURES

Correlations of the log-magnitude measures with subjective judgments of composite severity are summarized in Table 8.11. In general, CF had the highest correlations, followed by PDAVDB and PAAVDB. The relative superiority of CF over HNRL and HNRN indicates that the long term average in the latter two measures detracts from their usefulness. The low correlations for PSAVDB suggest that shimmer is the least useful type of perturbation. The high correlations of PAAVDB relative to PSAVDB can be attributed to the influence of noise discussed in Section 7.1.2.
The poor performance of SHNRsor relative to SHNRros may indicate that spectral noise near the dominant formant frequency has greater importance. However, this difference can also be attributed to the increased susceptibility to noise artefact discussed in Section 5.4.3. Further study using higher quality recordings is warranted.

Separation of the data according to sex and pathology affected the correlations. The correlations were generally higher for the males than for the females. Low correlations for the normal class can be attributed to a small range of severities, leading to greater sensitivity to experimental errors. The high correlation of CF relative to the other measures within the female MTD class suggests that hoarseness in MTD is associated with relatively high levels of noise. This supports the intuitive notion that breathiness and time domain noise are closely related, as the mean breathiness was also elevated for the MTD subjects.

TABLE 8.11: CORRELATION WITH SUBJECTIVE JUDGEMENTS OF SEVERITY; INDIVIDUAL MEASURES

This table compiles the Pearson's correlation coefficients between the log-magnitude measures described in Appendix F and subjective judgements of composite severity. The subsets of the database are described in Table 8.1. The outliers in Table 8.7 were excluded. The PSAVDB correlation for the female functional subset is significant at the 5% level. All other correlations are significant at the 1% level.

Number in parentheses = rank ordering of the top measures for each subset

             MEASURE
SUBSET       PDAVDB      PAAVDB      PSAVDB      HNRL        HNRN        CF          SHNRros     SHNRsor
MALE         -0.770 (3)  -0.774 (2)  -0.639      -0.731      -0.735      -0.806 (1)  -0.747      -0.621
FEMALE       -0.712 (3)  -0.692      -0.486      -0.641      -0.679      -0.781 (1)  -0.713 (2)  -0.647
BOTH         -0.747 (2)  -0.729 (3)  -0.577      -0.650      -0.647      -0.758 (1)  -0.704      -0.612

SPECIFIC MALE PATHOLOGIES
N    22      -0.587 (3)  -0.585      -0.612 (2)  -0.531      -0.530      -0.568      -0.544      -0.618 (1)
F    45      -0.835 (1)  -0.814 (2)  -0.636      -0.687      -0.699      -0.801 (3)  -0.719      -0.689
B    69      -0.787 (2)  -0.744      -0.641      -0.755 (3)  -0.754      -0.820 (1)  -0.752      -0.552
C    63      -0.752      -0.759 (2)  -0.653      -0.699      -0.692      -0.768 (1)  -0.755 (3)  -0.624

SPECIFIC FEMALE PATHOLOGIES
N    22      -0.716 (3)  -0.693      -0.513      -0.616      -0.652      -0.861 (1)  -0.838 (2)  -0.715
F    38      -0.573 (1)  -0.504 (3)  -0.291      -0.465      -0.505      -0.564 (2)  -0.485      -0.462
M    100     -0.663      -0.686 (3)  -0.500      -0.655      -0.567      -0.803 (1)  -0.718 (2)  -0.664
O    29      -0.849 (1)  -0.732      -0.462      -0.721      -0.749 (3)  -0.755 (2)  -0.718      -0.676

8.3.4.2 PRINCIPAL COMPONENTS

Given the apparent linear relationships among log-magnitude measures observed in Section 8.3.3, it is expected that there exists a single dimension in the measurement space that accounts for most of the variation in perturbation magnitude. It would be useful to know if this "magnitude dimension" has a higher correlation with severity judgements than individual measures, and if so, which measures should be included.

A principal components analysis or Karhunen-Loève (KL) Transformation (Young & Calvert 1974, pp. 228-233) provides a means for deriving the magnitude dimension. The principal components are the eigenvectors of a covariance matrix, ordered according to the size of their eigenvalues. The magnitude dimension is the first principal component, and is the single dimension that accounts for the largest amount of variance in the data. The analysis was applied to the correlation matrix here to minimize dependence on the scale of the measures (Srivastava & Khatri 1979, pp. 274-275).

Linear regression could also be used to combine the measures. The resulting correlations are guaranteed to equal or exceed those produced for the principal components. However, the relatively high correlations among the measures (i.e., multicollinearity) adversely affect the variance of the solution (Wonnacott & Wonnacott 1981, pp. 84-88). In addition, the regression parameters are optimized for the listeners' judgements and may differ in other experiments with other judges.
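As a rough illustration of deriving the magnitude dimension from a correlation matrix, the sketch below estimates the first principal component by power iteration. Power iteration is an assumption made here for brevity and is not necessarily the numerical method used in the study; the function names are illustrative. The correlation values are those reported for CF, PDAVDB and PAAVDB in Tables 8.8 and 8.9.

```python
def leading_component(corr, iterations=500):
    """Largest eigenvalue and eigenvector of a symmetric correlation matrix,
    estimated by power iteration (an assumed method, chosen for brevity)."""
    n = len(corr)
    # An uneven start vector avoids the degenerate case where an all-ones
    # start is orthogonal to the leading eigenvector (for example, two
    # negatively correlated measures).
    v = [1.0 / (i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    rv = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * rv[i] for i in range(n))  # Rayleigh quotient
    return eigenvalue, v


def explained_percent(corr):
    """Percent of variance explained by the first principal component.
    The trace of a correlation matrix equals the number of measures."""
    eigenvalue, _ = leading_component(corr)
    return 100.0 * eigenvalue / len(corr)


# Correlations from Table 8.8 (males) and Table 8.9 (females).
males_cf_pdavdb = [[1.0, 0.798], [0.798, 1.0]]
females_cf_pdavdb = [[1.0, 0.739], [0.739, 1.0]]
males_cf_pdavdb_paavdb = [
    [1.0, 0.798, 0.871],
    [0.798, 1.0, 0.776],
    [0.871, 0.776, 1.0],
]

for name, corr in [("males, 2 measures", males_cf_pdavdb),
                   ("females, 2 measures", females_cf_pdavdb),
                   ("males, 3 measures", males_cf_pdavdb_paavdb)]:
    print(name, round(explained_percent(corr), 1))
```

For two measures with correlation r the first eigenvalue of the correlation matrix is 1 + |r|, so the explained variance is (1 + |r|)/2. With r = 0.798 and r = 0.739 this gives 89.9% and about 87.0%, and the three-measure male case gives about 87.7%, consistent with the explained variances listed for components A1 and B1 in Table 8.13.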
A sequential backward elimination procedure (Hand 1981, pp. 145-150) was used to search for the "best" combination of log-magnitude measures. While identification of the best subset is not guaranteed, this procedure greatly reduces the number of required iterations. Initially, the magnitude dimension derived from all eight log-magnitude measures was correlated with the judgements of composite severity. The "best" combination of seven measures was then identified by an exhaustive search, and the excluded measure was removed from future consideration. The process was repeated until all but two measures were excluded.

Results of the backward elimination are summarized in Table 8.12. Except for one transposition, the order of elimination was the same for both sexes. Early rejection of PSAVDB is consistent with the conclusion that shimmer is the least useful type of perturbation. The measures of spectral noise were also eliminated early. In each case the first principal component accounted for over 79% of the variance.

The two-measure combination of CF and PDAVDB had the highest correlations. While inclusion of a third measure (PAAVDB) provided a marginal improvement for the males, such an improvement was not observed for the females.

Component weights for the top two combinations are given in Table 8.13. The relative weighting of measures within the 2-measure combination was virtually the same for males and females. A larger difference was observed when the sexes were pooled. The weights for the 3-measure combinations are not directly comparable because of inclusion of HNRN rather than PAAVDB for the females.

Table 8.14 compares the components in Table 8.13 with the best single measure from Table 8.11 (CF). Except for the female normal class, the correlations for the first principal components (A1 and B1) were higher than for the CF.
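The backward elimination search described in this section can be sketched as follows. This is a minimal sketch under stated assumptions: the `score` callable is hypothetical and stands in for the correlation between the first principal component of a subset and the composite severity judgements, and the toy values are illustrative only and are not results from this study.

```python
def backward_elimination(measures, score, min_size=1):
    """Greedy backward elimination: at each step drop the measure whose
    removal leaves the highest-scoring subset."""
    current = list(measures)
    history = [(tuple(current), score(current))]
    while len(current) > min_size:
        # Exhaustively try every subset with exactly one measure removed.
        candidates = [[m for m in current if m != dropped]
                      for dropped in current]
        current = max(candidates, key=score)
        history.append((tuple(current), score(current)))
    return history


# Hypothetical scoring function: illustrative weights, not thesis data.
TOY_VALUE = {"CF": 3.0, "PDAVDB": 2.0, "PSAVDB": -1.0, "HNRL": 0.5}


def toy_score(subset):
    return sum(TOY_VALUE[m] for m in subset)


history = backward_elimination(["CF", "PDAVDB", "PSAVDB", "HNRL"], toy_score)
for subset, value in history:
    print(len(subset), subset, value)
```

With eight measures this search scores 8 + 7 + ... + 2 = 35 candidate subsets after the initial one, compared with 255 non-empty subsets for a fully exhaustive search. The trade-off is that a measure removed early is never reconsidered, which is why the best subset is not guaranteed.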
Correlations for the other components (A2, B2, B3) were less than 0.2 in magnitude for the males and less than 0.4 in magnitude for the females. This supports the assumption of a linear relationship between the magnitude dimension and subjective judgements of severity.

TABLE 8.12: BACKWARD ELIMINATION RESULTS; PRINCIPAL COMPONENTS

This table summarizes the backward elimination process used to maximize correlation of log-magnitude measures with judgements of composite severity. Outliers in Table 8.7 were excluded.
"NEXTOUT" = next measure to be excluded in the search
"σ²" = variance explained by the first principal component
"r" = correlation with severity

NUMBER OF      MALES (N=199)              FEMALES (N=189)
MEASURES     NEXTOUT    σ²     r        NEXTOUT    σ²     r
   8         PSAVDB     83%   0.802     PSAVDB     79%   0.752
   7         HNRL       84%   0.809     HNRL       81%   0.771
   6         SHNRros    83%   0.815     SHNRros    82%   0.776
   5         SHNRsor    82%   0.822     SHNRsor    81%   0.782
   4         HNRN       87%   0.830     PAAVDB     82%   0.789
   3         PAAVDB     88%   0.837     HNRN       84%   0.792
   2         PDAVDB     90%   0.832     PDAVDB     87%   0.800
   1         CF        100%   0.806     CF        100%   0.781

TABLE 8.13: SUMMARY OF PRINCIPAL COMPONENTS

This table summarizes component weights and percent of variance accounted for by principal components of the two best combinations from Table 8.12.

COMPONENT WEIGHTS

             MALES (N=199)          FEMALES (N=189)        BOTH (N=388)
COMPONENT   CF  PDAVDB  PAAVDB     CF  PDAVDB  HNRN       CF  PDAVDB  PAAVDB
A1           1    1.75    0         1    1.78    0         1    1.85    0
A2           1   -1.75    0         1   -1.78    0         1   -1.85    0
B1           1    1.68    2.51      1    1.67    1.09      1    1.77    2.46
B2           1   -4.48    3.76      1   -5.77    2.30      1   -3.89    2.48
B3           1   -0.22   -2.26      1   -0.38   -0.92      1   -0.01   -2.46

EXPLAINED VARIANCE (%)

COMPONENT   MALES   FEMALES   BOTH
A1           89.9     87.0    87.2
A2           10.1     13.0    12.8
B1           87.7     83.6    85.8
B2            8.1     11.2     9.3
B3            4.2      5.2     4.9

TABLE 8.14: CORRELATION WITH SUBJECTIVE JUDGEMENTS OF SEVERITY: PRINCIPAL COMPONENTS

This table compiles the Pearson's correlation coefficients between the principal components in Table 8.13 and subjective judgements of composite severity. The CF from Table 8.11 was also included for comparison.
The subsets of the database are described in Table 8.1. Outliers in Table 8.7 were excluded.

                        COMPONENT
SUBSET      CF       A1       B1       A2       B2       B3
MALE      -0.806   -0.831   -0.837   -0.057    0.017   -0.049
FEMALE    -0.781   -0.800   -0.792   -0.096    0.052   -0.140
BOTH      -0.758   -0.801   -0.801   -0.016    0.058   -0.055

SPECIFIC MALE PATHOLOGIES
N  22     -0.568   -0.629   -0.641   -0.047   -0.039    0.027
F  45     -0.801   -0.865   -0.877   -0.049   -0.050    0.080
B  69     -0.820   -0.837   -0.829    0.038    0.121   -0.052
C  63     -0.768   -0.807   -0.817    0.105    0.177   -0.012

SPECIFIC FEMALE PATHOLOGIES
N  22     -0.861   -0.832   -0.845   -0.396   -0.057   -0.324
F  38     -0.564   -0.599   -0.601    0.166    0.149    0.118
M 100     -0.803   -0.805   -0.793   -0.194   -0.039   -0.225
O  29     -0.755   -0.853   -0.832    0.290    0.326    0.268

8.3.5 SINGLE MEASURE DISTINCTIONS AMONG PATHOLOGIES

Figures 8.5 through 8.7 allow comparison of intra-class means and standard deviations. No single measure provided a clear separation of classes. All log-magnitude measures provided some separation of the normal class from the other classes, attributable to the lower levels of hoarseness. The normal males also tended to have elevated values of CFCPF, CFCPF2 and PDXPS. PAAVDB and all measures of noise magnitude provided some separation of the cancer class. Finally, with the possible exception of PDCPF and some of the correlation measures, there was little separation among the female pathological classes.

Contrary to results in Hecker and Kreul (1971), the DPQ did not separate the cancer class. As discussed in Section 6.3.1, the probable cause for this discrepancy is the sensitivity of low resolution DPQ estimates to perturbation magnitude.

There was a difference between the males and females for some of the measures. The measures of perturbation magnitude, especially the noise measures, tended to be lower for the males. This is surprising, as the average judged severity for the males and females was approximately the same (Table 8.2).
The listeners evidently accepted as normal a greater amount of noise in male voices. Possible explanations include differences in fundamental frequency and differing expectations for males and females. Further exploration using synthetic vowels and randomly mixed male and female recordings is warranted.

FIGURE 8.5: INTER-CLASS DIFFERENCES; LOG-MAGNITUDE MEASURES

This figure presents intra-class means and standard deviations for the log-magnitude measures described in Appendix F. The classes are described in Table 8.1. Outliers in Table 8.7 were excluded. The bars extend 1 standard deviation each side of the mean.

[Figure 8.5 panels. Horizontal axis: classes N, F, B, C (MALE) and N, F, M, O (FEMALE).]

FIGURE 8.6: INTER-CLASS DIFFERENCES; PATTERN MEASURES

This figure presents intra-class means and standard deviations for the cyclic perturbation, directional perturbation and correlation measures described in Appendix F. The classes are described in Table 8.1. Outliers in Table 8.7 were excluded. The bars extend 1 standard deviation each side of the mean.

[Figure 8.6 panels, including the continuation. Horizontal axis: classes N, F, B, C (MALE) and N, F, M, O (FEMALE).]

FIGURE 8.7: INTER-CLASS DIFFERENCES; PRINCIPAL COMPONENTS

This figure presents intra-class means and standard deviations for the principal components in Table 8.13. The classes are described in Table 8.1. Outliers in Table 8.7 were excluded. The bars extend 1 standard deviation each side of the mean.
[Figure 8.7 panels. Horizontal axis: classes N, F, B, C (MALE) and N, F, M, O (FEMALE).]

As illustrated in Figure 8.7, the principal components from Table 8.13 demonstrated some separation of classes. The first principal components (A1 and B1) had class distributions similar to those of the measures of noise magnitude, with some separation of normal and cancer subjects. The separation of males and females was greatest in the second principal components (A2 and B2). This indicates that sex differences are not simple differences in perturbation magnitude, and emphasizes the need to segregate the data for pattern classification.

In Table 8.15, the significance of differences among the class means was tested with an F-ratio (Sachs 1984, pp. 501-509). As with the Student's t test, it is assumed that the classes are normally distributed with equal variance. Significant differences were observed for most of the log-magnitude measures and principal components, and for the pattern measures PACPF, CFCPF, CFCPF2 and PDXPS. When the normal class was removed, the differences generally remained significant for males but not for females.

Based on the F-ratios, CF provided the best separation of classes, followed by PAAVDB, SHNRros and HNRN. Each of these is either a measure of noise magnitude, or is sensitive to such perturbation. The least useful of the log-magnitude measures was PSAVDB, indicating that shimmer is relatively unimportant on its own.

Finally, significant differences between the sexes were observed for: 1) all log-magnitude measures except PDAVDB, 2) all principal components and 3) the pattern measures PDCPF, CFCPF2, PSDPQ, PDXPA and PDXPS.

TABLE 8.15: UNIVARIATE F-RATIOS FOR DIFFERENCES AMONG CLASSES

This table compiles univariate F-ratios for testing the significance of differences among means.
The F-ratios have NCLS-1 and Ntot-NCLS degrees of freedom, where Ntot is the number of subjects and NCLS is the number of classes. There were 199 males and 189 females, including 22 normal males and 22 normal females.
"M vs F" = comparison of males and females
"ALL" = includes three pathological classes and one normal class
"notN" = excludes the normal class
"*" = significance at the 5% level
"#" = significance at the 1% level

              PDAVDB  PAAVDB  PSAVDB  HNRL    HNRN    CF      SHNRros SHNRsor
M vs F         3.34   17.71#   5.36*  71.04#  99.02#  48.37#  41.11#  32.55#
MALES   ALL    3.20*   6.56#   2.22    5.01#   6.11#   9.53#   5.66#   2.51
        notN   1.18    5.43#   2.69    3.87*   4.22*   7.78#   5.15#   1.70
FEMALES ALL    3.51*   5.71#   3.86*   2.23    3.22*   7.54#   5.35#   4.84#
        notN   1.70    0.90    0.19    0.40    0.89    1.32    0.28    0.08

              A1      B1      A2      B2      B3
M vs F        20.99#  21.52#  53.62#  29.01#  23.84#
MALES   ALL    6.38#   6.91#   4.81#   3.14*   1.25
        notN   4.06#   4.87#   6.51#   4.45*   1.28
FEMALES ALL    5.87#   5.28#   2.89*   0.49    5.24#
        notN   1.51    1.26    1.60    0.49    4.17*

              PDCPF   PACPF   PSCPF   CFCPF   CFCPF2  PDDPQ   PADPQ   PSDPQ
M vs F        19.46#   0.63    0.51    2.12   10.85#   2.63    0.53    4.97*
MALES   ALL    1.74    2.41    1.32    2.17    3.39*   0.48    2.49    0.10
        notN   0.33    1.80    0.01    0.11    3.30*   4.06#   1.09    2.11
FEMALES ALL    1.31    1.82    4.90#   1.01    2.71    1.92    0.04    0.87
        notN   1.39    0.12    0.35    0.01    1.18    1.52    0.16    0.32

              CFDPQ   PDXPA   PDXPS   PAXPS   PDXCF   PAXCF   PSXCF
M vs F         0.37   20.69#  33.24#   0.41    1.49    1.87    0.54
MALES   ALL    1.40    1.84    4.36#   2.51    0.83    2.18    0.33
        notN   0.86    0.88    2.07    1.89    0.13    1.12    0.05
FEMALES ALL    0.27    1.98    2.26    1.46    0.58    2.06    0.69
        notN   0.23    2.04    3.13*   1.90    0.86    2.28    0.31

8.3.6 AUTOMATIC CLASSIFICATION OF PATHOLOGY

8.3.6.1 ACCOUNTING FOR PERTURBATION MAGNITUDE

One question that was addressed in this chapter is whether the measures provide relevant information that is unrelated to hoarseness level. This was prompted by concerns that have been expressed in the literature about undue reliance on measures of hoarseness for automated classification (e.g., Ludlow 1981).
Since it is not uncommon to observe hoarseness without laryngeal pathology, or conversely to observe pathology without undue hoarseness, classification based solely on such measures may be limited. Furthermore, performance evaluations of these classifiers are affected by severity of hoarseness in the selected speakers, and may not be representative of the population as a whole.

Information that is unrelated to hoarseness level is evidenced if good classification performance can be obtained using measures that are not strongly correlated with it. This includes all of the CPF, DPQ and correlation measures. In addition, it was observed in Section 8.3.4.2 that only the first principal component of the KL transformation of log-magnitude measures is strongly correlated with hoarseness. Thus, a classifier that uses the other components is also indicative of such information.

For the remaining discussion, the process of omitting the first principal component was called "magnitude compensation". Two approaches were considered. For "pooled magnitude compensation", the magnitude dimension was assumed to be the same for each class, and the analysis was applied to the pooled correlation matrix. For "separate magnitude compensation", a separate magnitude dimension was determined for each class, thus allowing for different orientations in the measurement space.

Magnitude compensation has the following interpretation for automated classification. Many classification strategies, including the one used here, characterize each class as a point in the measurement space about which the probability of class membership decreases as a function of the intra-class covariance. This characterization may be inappropriate when some of the measures depend on an external factor, as a curve is traced in the measurement space when the external factor is varied.
By excluding the first principal component, one essentially assumes that the curve traced by variation of perturbation magnitude is a line about which the variance is independent of the position along the line. Results in Section 8.3.3 suggest that for the log-magnitude measures this assumption is reasonable.

8.3.6.2 CLASSIFIER DESIGN AND EVALUATION

Options for classifier design were limited both by the data and by the magnitude compensation methods. The range of perturbation magnitudes is greater for pathological speakers than for normal speakers (e.g., Zyski et al. 1984). While results in Section 8.3.5 indicate that a logarithmic transformation reduces the difference, this argues against pooling of the covariance matrix. Furthermore, the process of separate magnitude compensation produces a different feature set for each class, thus making it impossible to pool the covariance. This also complicates the comparison of class-conditional probabilities when a maximum likelihood classifier is used. Specifically, Bayes' theorem dictates that

\[ P(j \mid x_j) = \frac{f(x_j \mid j)\, P(j)}{\sum_{k=1}^{J} f(x_j \mid k)\, P(k)} \tag{8.3} \]

where
x_j = a feature vector for probability tests in class j
J = total number of classes
P(j) = probability of class j
P(k|x_j) = conditional probability of class k given x_j
f(x_j|k) = probability density of x_j in class k.

It unfortunately does not follow that f(x_j|k) = f(x_k|k) when the features in x_j and x_k are different. The implication is that the denominator of Eq. (8.3) can depend on j, and direct comparison of f(x_j|j) P(j) for all j is not sufficient.

A Maximum Likelihood approach was adopted (Hand 1981, pp. 50-54). The class-conditional probability densities were assumed to be Gaussian. The means and variances were estimated from pre-classified training vectors, and each test vector was then associated with the class for which it had the highest probability of membership.
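The decision rule described above can be sketched as follows for the simpler case in which every class uses the same feature vector, so that the denominator of Eq. (8.3) is common to all classes. This is a minimal sketch, not the thesis implementation: the function names and training values are hypothetical, and a diagonal covariance is assumed to keep the example short.

```python
import math

def fit_gaussian(vectors):
    """Per-feature mean and variance (diagonal covariance is an assumption)."""
    n, p = len(vectors), len(vectors[0])
    means = [sum(v[k] for v in vectors) / n for k in range(p)]
    variances = [sum((v[k] - means[k]) ** 2 for v in vectors) / (n - 1)
                 for k in range(p)]
    return means, variances

def log_density(x, model):
    """Log of the Gaussian class-conditional density f(x|j)."""
    means, variances = model
    return sum(-0.5 * math.log(2 * math.pi * s2) - (xi - m) ** 2 / (2 * s2)
               for xi, m, s2 in zip(x, means, variances))

def classify(x, models, priors=None):
    """Label maximizing log f(x|j) + log P(j); equal priors when none are given."""
    best_label, best_score = None, -math.inf
    for label, model in models.items():
        score = log_density(x, model)
        if priors is not None:
            score += math.log(priors[label])
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical pre-classified training vectors for two classes.
training = {
    "N": [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (0.1, 0.2)],
    "C": [(3.0, 2.9), (3.2, 3.1), (2.8, 3.0), (3.1, 3.3)],
}
models = {label: fit_gaussian(vectors) for label, vectors in training.items()}
label_a = classify((0.05, 0.0), models)
label_b = classify((3.0, 3.1), models)
```

With separate magnitude compensation the feature vector differs from class to class, so the scores compared in `classify` would no longer share a common denominator; the sketch does not address that case.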
Results in Section 8.3.5 support the assumption of Gaussianity.

The various subsets of features were initially evaluated using a closed (resubstitution) test (Hand 1981, p. 186). Because the classifier is trained and tested using the same data, the closed test provides an optimistic indication of classifier performance. It was assumed here that this bias is comparable for a given database of subjects when the number of features being considered is the same.

A leave-one-out open test (Hand 1981, pp. 187-188) was applied to classifiers that demonstrated good performance in the closed test. This is an iterative procedure where one subject is excluded for training of the classifier, then that subject alone is used for testing. The process is repeated until each subject has been excluded once. While the leave-one-out method produces a performance estimate that is relatively unbiased, the need for repeated retraining of the classifier made it impractical for use on all feature combinations. In addition, because the method for dimensionality reduction described in Section 8.3.6.3 utilizes differences among classes, it also had to be recomputed on each iteration.

Classification results were rank ordered using a Pearson's chi-squared statistic (Sachs 1984, pp. 474-476). This was preferred over the total percent correct because of the bias introduced by imbalance in the number of subjects per class (see Section 8.2.1).

It is worth noting that the optimistic bias of the closed test affects the interpretation of the chi-squared statistic as a test of significance. A significant chi-squared can be attributed to the bias as well as the features.
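The summary statistics used to rank classifiers can be illustrated with the "direct" 4-class confusion matrix reported for males in Table 8.20. The sketch below is an illustration under stated assumptions, not the thesis software: the function names are hypothetical, the chi-squared is computed as an ordinary Pearson statistic on the confusion matrix treated as a contingency table, and rows are taken to be the true classes because they sum to the class sizes (22, 45, 69, 63).

```python
def summarize(confusion):
    """Return (average % correct, total % correct, Pearson chi-squared).

    confusion[i][j] = number of class-i subjects assigned to class j.
    """
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    total = sum(row_totals)
    correct = [confusion[i][i] for i in range(len(confusion))]
    # APC averages the per-class rates, so small classes count as much as large ones.
    apc = 100.0 * sum(c / r for c, r in zip(correct, row_totals)) / len(confusion)
    tpc = 100.0 * sum(correct) / total
    chi2 = 0.0
    for i, row in enumerate(confusion):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return apc, tpc, chi2

def partition(confusion, group):
    """Collapse to two classes: indices in `group` versus all other classes."""
    out = [[0, 0], [0, 0]]
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            out[0 if i in group else 1][0 if j in group else 1] += count
    return out

# Direct log-magnitude classifier, males, classes in the order N, F, B, C.
direct = [[11, 4, 3, 4],
          [19, 13, 2, 11],
          [22, 9, 12, 26],
          [8, 4, 8, 43]]

apc4, tpc4, chi4 = summarize(direct)                    # N/F/B/C
apc_n, tpc_n, chi_n = summarize(partition(direct, {0}))  # N/FBC
apc_c, tpc_c, chi_c = summarize(partition(direct, {3}))  # NFB/C
```

These calls give values that round to the figures printed in Table 8.20 (41.1, 39.7 and 41.2 for N/F/B/C; 61.2, 69.8 and 4.6 for N/FBC; 69.1, 69.3 and 25.6 for NFB/C), which suggests that the tabulated chi-squared statistics were computed without a continuity correction.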
8.3.6.3 DIMENSIONALITY REDUCTION

It is well known that the number of samples required to accurately characterize a multi-variate probability density increases rapidly as the number of dimensions increases (e.g., Hand 1981, p. 123). This is problematic for pattern classification, as the number of samples available for training is generally limited. The result is an increasing sensitivity to peculiarities in the data used for training. In addition, the increased uncertainty about the prior parameter distributions causes the error rate initially to decrease but then to increase as features are added (Van-Campenhout 1982).

No general rule for optimum determination of the number of features has been published. However, a recommended rule-of-thumb is that the ratio of the number of samples per class to the number of features should be five or more (Jain & Chandrasekaran 1982). In keeping with this guideline, the classifiers evaluated here each used four features.

The first four components from a KL transformation were used as features when consideration of more than four measures was desired. The transformation was applied to a hybrid of the correlation matrix, defined here as the "interrelation" matrix. Specifically, if $x_{j,k}(i)$ is the i'th sample of the k'th variable of the j'th class, then one element of the interrelation matrix is

\[ \phi_{k,m}^2 = \frac{\beta_{k,m}^2}{\sigma_{k,k}\,\sigma_{m,m}} \tag{8.4} \]

where the pooled intra-class covariance is

\[ \sigma_{k,m}^2 = \sum_{j=1}^{NCLS} \frac{W_j}{N_j - 1} \sum_{i=1}^{N_j} \bigl(x_{j,k}(i) - \bar{x}_{j,k}\bigr)\bigl(x_{j,m}(i) - \bar{x}_{j,m}\bigr) \]

and the pooled overall covariance is

\[ \beta_{k,m}^2 = \sum_{j=1}^{NCLS} \frac{W_j}{N_j - 1} \sum_{i=1}^{N_j} \bigl(x_{j,k}(i) - \bar{x}_{\cdot,k}\bigr)\bigl(x_{j,m}(i) - \bar{x}_{\cdot,m}\bigr) \]

with
$\bar{x}_{j,k}$ = class mean = mean of $x_{j,k}(i)$ with respect to i
$\bar{x}_{\cdot,k}$ = pooled mean of $\bar{x}_{j,k}$ with respect to j
$V_j$ = weights for pooling of means, $0 \le V_j \le 1$, $\sum_{j=1}^{NCLS} V_j = 1$
$W_j$ = weights for pooling of covariances, $0 \le W_j \le 1$, $\sum_{j=1}^{NCLS} W_j = 1$
NCLS = total number of classes
$N_j$ = number of samples in the j'th class

The interrelation matrix is similar to a correlation matrix in that normalization with respect to scale is provided through division by standard deviations ($\sigma_{k,k}$ and $\sigma_{m,m}$). The significant difference is that variance due to differences among the means is preserved in $\beta_{k,m}^2$.

As an aside, the diagonal of the interrelation matrix has a useful relationship with a univariate F statistic. It can be shown that

\[ \beta_{k,m}^2 = \sigma_{k,m}^2 + \sum_{j=1}^{NCLS} \frac{W_j N_j}{N_j - 1} \bigl(\bar{x}_{j,k} - \bar{x}_{\cdot,k}\bigr)\bigl(\bar{x}_{j,m} - \bar{x}_{\cdot,m}\bigr) \tag{8.5} \]

\[ \phi_{k,k}^2 = 1 + \frac{\sum_{j=1}^{NCLS} \frac{W_j N_j}{N_j - 1} \bigl(\bar{x}_{j,k} - \bar{x}_{\cdot,k}\bigr)^2}{\sum_{j=1}^{NCLS} \frac{W_j}{N_j - 1} \sum_{i=1}^{N_j} \bigl(x_{j,k}(i) - \bar{x}_{j,k}\bigr)^2} \tag{8.6} \]

Now, the F-ratio that is used for testing for differences among class means in 1-way analysis of variance is

\[ F_{NCLS-1,\,Ntot-NCLS} = \frac{Ntot - NCLS}{NCLS - 1} \cdot \frac{\sum_{j=1}^{NCLS} N_j \bigl(\bar{x}_{j,k} - \bar{x}_{\cdot,k}\bigr)^2}{\sum_{j=1}^{NCLS} \sum_{i=1}^{N_j} \bigl(x_{j,k}(i) - \bar{x}_{j,k}\bigr)^2} \tag{8.7} \]

where Ntot = total number of samples. Thus

\[ \frac{Ntot - NCLS}{NCLS - 1} \bigl(\phi_{k,k}^2 - 1\bigr) \tag{8.8} \]

is distributed as $F_{NCLS-1,\,Ntot-NCLS}$ if $V_j = N_j / Ntot$ and $W_j = (N_j - 1) / (Ntot - NCLS)$. This relationship was used to derive the data in Table 8.15.

8.3.6.4 MEASURE SELECTION

The features produced by the KL transformation are not necessarily the best choices for pattern classification. The variance added by redundant or unnecessary measures can cause relevant variance to be displaced to dimensions that are excluded from the classifier. This concern makes it desirable to search for a subset of measures on which to apply the transformation.
The backward elimination procedure described in Section 8.3.4.2 was used when the number of measures was eight or less. The procedure was used to optimize normal/non-normal and non-cancer/cancer performance as well as the 4-class performance. The additional trials that were required provided some protection against finding a sub-optimal solution.

A sequential forward selection procedure (Hand 1981, p. 147) was used for searches that involved a larger number of measures. This was preferred over backward elimination because the likelihood that relevant variance will be allocated to discarded components increases with the number of measures. This results in an increased likelihood of inappropriately excluding a measure.

Forward selection is flawed in that it fails to consider potentially useful interactions among measures that remain to be selected. Other approaches do not have this problem (e.g., Hand 1981, pp. 138-150), but generally require a considerably greater computational effort.

Two approaches to forward selection were used. The first started with one measure and added other measures one at a time until further additions did not yield an improvement. The second started with the best combinations of log-magnitude measures, and proceeded in the same manner to select pattern and correlation measures. As before, the added trials required for optimization of three performance criteria (i.e., 4-class, normal/non-normal and non-cancer/cancer) provided some protection against finding a sub-optimal solution. Further protection was obtained by "stepping forward" from more than one result when performances were comparable.

8.3.6.5 CLOSED TEST RESULTS

Table 8.16 summarizes closed classification results for various types of measures. The log-magnitude measures produced the best results, followed in turn by the correlation, CPF and DPQ measures.
The best classifiers for males had an average percent correct (APC) and total percent correct (TPC) of 56.2 and 50.8 for the 4-class decision, 81.9 and 81.9 for the normal/non-normal decision, and 75.0 and 74.4 for the non-cancer/cancer decision. The best classifiers for females had an APC and TPC of 56.2 and 47.6 for the 4-class decision and 78.6 and 76.2 for the normal/non-normal decision.

Classification performance for males improved when magnitude compensation was applied. This argues strongly that the classes are separated by more than differences in hoarseness level. The improvement occurred only for the 4-class classifiers, and there was little advantage in using the separate magnitude compensation approach.

The classification performance declined with magnitude compensation for females, suggesting a greater reliance on perturbation magnitude. However, separate magnitude compensation was apparently superior to pooled magnitude compensation.

Since the KL transformation was a part of the magnitude compensation process, the magnitude compensated features were ordered according to a variance criterion. However, it is not entirely redundant to reapply the KL transformation to the interrelation matrix for dimensionality reduction, as differences among class means would then be considered. Results did not improve when this was done.

For males, partitioning of the output of the 4-class classifiers was superior to the 2-class classifiers for both normal and cancer detection. However, caution is warranted when comparing these results, as the bias of the closed test may be reduced when subjects are pooled in the 2-class classifiers. An added practical advantage of the 4-class approach to cancer detection is that erroneous classification as functional dysphonia or benign lesion would still lead to medical consultation.
Thus, this type of error is less critical than a simple false negative.

Table 8.17 summarizes the log-magnitude measures excluded in designing the classifiers for Table 8.16. For males there was a greater tendency to exclude measures of jitter and shimmer, and to retain the measures of noise. CF was retained in all male classifiers, whereas PAAVDB was excluded from all but two of them. The noise measures were not as dominant in the females. HNRL was excluded from all but two of the classifiers, and PDAVDB was included along with HNRN and SHNRsor in all of them. Finally, magnitude compensation made PAAVDB dispensable for females.

TABLE 8.16: CLOSED TEST CLASSIFICATION RESULTS; MEASURE TYPES

This table compiles the Pearson's chi-squared statistics for a closed test of classifiers that are limited to a single type of measure.
"POOLED" and "SEPARATE" = types of magnitude compensation
"DIRECT" = no magnitude compensation (see Section 8.3.6.3)
"N" = normal
"F" = functional dysphonia
"B" = benign lesion
"C" = cancer
"M" = muscle tension dysphonia
"O" = "other"
N/F/B/C and N/F/M/O = 4-class classifiers (χ² df=9)
N/notN, notC/C = 2-class classifiers (χ² df=1)
N/FBC, NFB/C, N/FMO = 2-class partitions of 4-class classifiers (χ² df=1)

                                        LOG-MAGNITUDE
                                                COMPENSATED
CLASSIFIER    DPQ     CPF    CORREL    DIRECT   POOLED   SEPARATE

MALES
N/F/B/C      24.5    34.1    41.2      74.3     90.0     90.8
N/FBC         1.9    16.8    23.5      25.7     42.3     38.2
N/notN        9.3    15.2    19.6      27.8     26.5     29.8
NFB/C         6.9     8.7    11.6      43.6     39.5     40.7
notC/C        7.5    13.1    12.7      29.9     31.1     29.9

FEMALES
N/F/M/O      22.1    27.0    55.6      73.2     47.5     60.0
N/FMO         0.8    14.3    29.9      28.1     18.4     23.7
N/notN        4.6    15.4    23.4      29.7     17.7     22.3

TABLE 8.17: MEASURES EXCLUDED FROM THE LOG-MAGNITUDE CLASSIFIERS

This table compiles the log-magnitude measures that were excluded through backward elimination to arrive at the results in Table 8.16.
CLASSIFIER   DIRECT   POOLED   SEPARATE

MALES
N/F/B/C   PDAVDB PAAVDB, PSAVDB PDAVDB, PAAVDB PAAVDB, PSAVDB HNRL
N/FBC     PAAVDB PAAVDB, PSAVDB PAAVDB, PSAVDB SHNRsor
N/notN    PDAVDB, PAAVDB PDAVDB, PAAVDB PAAVDB HNRN, SHNRros HNRN SHNRros, SHNRsor
NFB/C     PDAVDB, PAAVDB PDAVDB PAAVDB, PSAVDB HNRL, HNRN SHNRros
notC/C    PSAVDB, HNRN PAAVDB PAAVDB SHNRros, SHNRsor HNRL HNRL

FEMALES
N/F/M/O   HNRL PAAVDB PAAVDB SHNRros HNRL, SHNRros
N/FMO     HNRL PAAVDB PAAVDB HNRL, SHNRros HNRL, CF
N/notN    PSAVDB PAAVDB PAAVDB, PSAVDB HNRL, CF CF, SHNRros HNRL

It is relevant to ask if good performance can be obtained using fewer types of perturbation, as measures of noise are more complex and time consuming to compute than measures of jitter or shimmer. To address this question, classifiers were constructed that were restricted to measures of two types of perturbation. The measurement sets comprised the log-magnitude, DPQ and CPF measures for each perturbation type along with the appropriate correlation measure. The first four principal components were used as features, and unnecessary measures were removed through backward elimination.

Results are summarized in Table 8.18. Inclusion of the time domain noise measures generally produced superior results. Two exceptions were "PD+PA" for cancer classification and "PA+PS" for females. All pairs that included jitter were inferior for females, indicating that shimmer gains significance when combined with other measures and is more useful for classification. It is of practical significance that results comparable to the best in Table 8.16 were obtained without including the spectral noise measures.

Table 8.19 summarizes the measures excluded in deriving the classifiers for Table 8.18. The pattern measures, especially the DPQ measures, were the most frequently excluded.
However, one of the log-magnitude measures was dropped for all but three of the normal/non-normal optimizations, and the correlation measure was dropped from all but one of the non-cancer/cancer optimizations.

TABLE 8.18: CLOSED TEST CLASSIFICATION; PERTURBATION TYPES

This table compiles the Pearson's chi-squared statistics for a closed test of classifiers, where each classifier was limited to measures of two types of perturbation.
"PD" = jitter
"PA" = amplitude shimmer
"PS" = standard deviation shimmer
"CF" = time domain noise
"N" = normal
"F" = functional dysphonia
"M" = muscle tension dysphonia
"B" = benign lesion
"C" = cancer
"O" = "other"
N/F/B/C and N/F/M/O = 4-class classifiers (χ² df=9)
N/FBC, NFB/C, N/FMO = 2-class partitions of 4-class classifiers (χ² df=1)

                          MEASURE SET
CLASSIFIER   PD+PA   PD+PS   PA+PS   PD+CF   PA+CF   PS+CF

MALES
N/F/B/C      53.8    42.6    50.0    73.5    76.6    73.0
N/FBC        20.5    21.7    26.1    45.2    47.9    45.0
NFB/C        35.9    15.9    24.0    36.7    34.3    35.8

FEMALES
N/F/M/O      52.2    50.3    74.0    63.2    79.5    67.6
N/FMO        25.0    26.0    37.4    30.4    47.8    40.6

TABLE 8.19: MEASURES EXCLUDED FROM THE "PERTURBATION PAIRS" CLASSIFIERS

This table compiles the measures that were excluded through backward elimination to arrive at the results in Table 8.18.

                          MEASURE SET
CLASSIFIER   PD+PA   PD+PS   PA+PS   PD+CF   PA+CF   PS+CF

MALES
N/F/B/C   PDDPQ PSDPQ PADPQ PDDPQ CF PSAVDB PACPF PSDPQ CFDPQ CFDPQ CFDPQ PDXPA CFCPF2 CFCPF
N/FBC     PAAVDB PSDPQ PSAVDB PDAVDB PAAVDB PSAVDB PADPQ PACPF PDCPF PACPF PSCPF PAXPS PDDPQ PADPQ CFCPF2 CFDPQ
NFB/C     PDDPQ PSDPQ PSAVDB PDXCF PAXCF PSXCF PACPF PDCPF PSDPQ PDCPF CFCPF CFDPQ PDXPA PDXPS PDDPQ CFCPF2 CFDPQ

FEMALES
N/F/M/O and N/FMO   PDAVDB PDAVDB PAAVDB CFCPF2 CFCPF2 CF PADPQ PDDPQ PACPF CFDPQ PACPF PSDPQ PDXPA PDCPF PADPQ CFDPQ CFDPQ

8.3.6.6 OPEN TEST RESULTS

Open tests of selected classifiers are summarized in Tables 8.20 through 8.25.
Included in Tables 8.20, 8.21 and 8.24 are the log-magnitude classifiers from Table 8.16 that produced the highest chi-squared statistics. Tables 8.22, 8.23 and 8.25 summarize the best classifiers found through forward selection.

The log-magnitude classifiers in Table 8.20, as expected, performed poorly at identifying functional dysphonia and benign lesion subjects. The ability to distinguish normal males was also discouraging, with χ²=4.6, p<.05 and APC=61%. Only a slight improvement was obtained by using the classifier from Table 8.16 that was optimized for detection of normal speakers (χ²=5.7, p<.02, APC=63%). The performance was substantially better for cancer detection, with χ²=25.6, p<.001 and APC=69%. The classifier from Table 8.16 that was optimized for cancer detection did not yield superior results.

The apparent benefits of magnitude compensation for males were not reflected in Table 8.20. However, significant distinctions were observed. Most notable is that the ability to distinguish cancer remained significant, with χ²=19.8, p<.001 and APC=67%. Thus, the conclusion that the classes are separated by more than differences in hoarseness level remains valid.

The 2-class classifiers in Table 8.21 were superior at separating normal speakers from pathological speakers; the APC increased to 65%. However, the 4-class approach was superior for cancer detection. This supports the conjecture that extra pathological classes are useful for accounting for non-cancerous subjects that have significant levels of perturbation.

The classifiers in Table 8.22 demonstrate that improved performance can be obtained through inclusion of pattern and correlation measures. The first classifier provided a substantial improvement in the normal/non-normal distinction, with χ²=28.0, p<.001 and APC=74%.
The third classifier improved the non-cancer/cancer performance to χ²=33.3, p<.001 and APC=71%. It is convenient that all but the third classifier in this table did not make use of the spectral noise measures.

The second classifier was included because it produced the best performance in the closed tests (4-class χ²=108 and N/FBC χ²=67.2). Evidently, the closed test bias was particularly large. Nonetheless, a factor that argues for the use of this classifier is the low incidence of mistaking cancer as normal.

The fourth classifier is the one recommended in Table 8.18 for cancer detection using measures that are simple to compute. While a significant ability to distinguish cancer was observed, it was inferior to the third classifier.

The 2-class classifiers in Table 8.23 were inferior to the classifiers in Table 8.22. For separating the normal speakers, the extra pathological classes reduced the false negative error rate while causing only one extra false positive. The extra classes for cancer detection accounted for non-cancerous pathologies and reduced the practical risk of a false negative.

The results for females in Table 8.24 and Table 8.25 were similar to those for the males. The reliance on perturbation magnitude was larger for the females, as magnitude compensation produced a greater reduction in performance. However, a significant ability to separate MTD subjects was apparent after magnitude compensation (χ²=8.7, p<.005). A significant ability to detect the "other" class was apparent for the direct classifier in Table 8.24 (χ²=13.8, p<.001), but this ability was removed by magnitude compensation.
As with the males, improved performance was obtained through inclusion of pattern measures, leading to an APC of 71% for normal/non-normal distinctions using a 4-class classifier, and an APC of 74% using a 2-class classifier. A lack of clear distinctions among the pathological classes accounts for the apparent superiority of the 2-class classifier. The extra classes increase the opportunities for false positive errors.

The second and fourth classifiers in Table 8.25 were included because they had promising performance in closed tests and used inexpensive measures. While it failed in its original purpose of distinguishing normal speakers, the second classifier showed some promise for identifying MTD (χ²=7.3, p<.01). The fourth classifier had a significant level of performance (p<.001), but was inferior to the third classifier.

TABLE 8.20; OPEN TEST RESULTS; LOG-MAGNITUDE MEASURES; MALES

This table summarizes an open test of 4-class classifiers that use log-magnitude measures. The measures, confusion matrices and summary statistics are presented. The best classifiers from Table 8.16 were tested. The chi-squared for the N/FBC matrices are significant at the 5% level. Other chi-squared statistics are significant at the 0.1% level.

              DIRECT                        MAGNITUDE COMPENSATED
              HNRL, HNRN,                   PDAVDB, HNRL, HNRN,
              CF, SHNRros, SHNRsor          CF, SHNRros, SHNRsor

                 N    F    B    C              N    F    B    C
            N   11    4    3    4         N    9    5    4    4
            F   19   13    2   11         F   17   11    6   11
            B   22    9   12   26         B   13   16   14   26
            C    8    4    8   43         C    8    7    8   40

                χ²    ave%  total%            χ²    ave%  total%
    N/F/B/C    41.2   41.1   39.7            29.2   37.3   37.2
    N/FBC       4.6   61.2   69.8             4.1   59.7   74.4
    NFB/C      25.6   69.1   69.3            19.8   66.7   67.8

TABLE 8.21; OPEN TEST RESULTS; LOG-MAGNITUDE MEASURES; MALES

This table summarizes an open test of 2-class classifiers that use log-magnitude measures. The best classifiers from Table 8.16 were tested.
The chi-squared for the N vs P matrices are significant at the 1% level. The chi-squared statistics for cancer detection are significant at the 0.1% level.

              DIRECT                        MAGNITUDE COMPENSATED
              PSAVDB, HNRL, CF,             PSAVDB, HNRL, CF,
              SHNRros                       SHNRros, SHNRsor

                     N     P                       N     P
               N    14     8                 N    14     8
               P    61   116                 P    62   115

                χ²    ave%  total%            χ²    ave%  total%
                7.1   64.6   65.3             6.8   64.3   64.8

              PDAVDB, PAAVDB,               PDAVDB, PSAVDB, HNRN,
              HNRL, CF                      CF, SHNRros, SHNRsor

                   not C    C                    not C    C
           not C    84     52            not C    83     53
               C    20     43                C    21     42

                χ²    ave%  total%            χ²    ave%  total%
               15.6   65.0   63.8            13.2   63.8   62.8

TABLE 8.22; OPEN TEST RESULTS FOR SELECTED CLASSIFIERS; MALES

This table summarizes an open test of selected classifiers. The 4-class chi-squared statistics for classifiers (1) and (3) are significant at the 0.1% level, as were the N/FBC statistic for (1) and the NFB/C statistics for (2), (3) and (4). The 4-class and N/FBC statistics for (2) were significant at the 1% level. The same statistics for (4) were significant at the 5% level.

              (1) CF, CFCPF,                (2) PAAVDB, PSAVDB,
                  PDXCF, PAXCF                  CFCPF, CFCPF2, PDXCF

                 N    F    B    C              N    F    B    C
            N   14    2    3    3         N    8    6    3    5
            F    8   15    9   13         F    8   12   12   13
            B   14   28    7   20         B   13   23   15   18
            C    5   27    9   22         C    5   14    9   35

                χ²    ave%  total%            χ²    ave%  total%
    N/F/B/C    35.0   35.5   29.1            22.5   35.1   35.2
    N/FBC      28.0   74.2   82.4             6.5   60.8   79.9
    NFB/C       1.5   54.2   61.3            15.9   64.5   67.8

              (3) CF, PDCPF,                (4) PDAVDB, PAAVDB,
                  SHNRros, SHNRsor              PDCPF, PADPQ

                 N    F    B    C              N    F    B    C
            N   10    7    2    3         N   14    3    1    4
            F   17   12    6   10         F   22    9    5    9
            B   24   17   10   18         B   33   13    8   15
            C    9    6    7   41         C   18    8    7   30

                χ²    ave%  total%            χ²    ave%  total%
    N/F/B/C    37.1   37.9   36.7            18.8   35.7   30.7
    N/FBC       2.8   58.6   68.8             4.0   61.2   59.3
    NFB/C      33.3   71.1   73.4            15.2   63.5   69.3

TABLE 8.23; OPEN TEST RESULTS FOR SELECTED CLASSIFIERS; MALES

This table summarizes an open test of selected 2-class classifiers found through forward selection. All chi-squared statistics are significant at the 0.1% level.
              (1) CF, CFCPF,                (2) PAAVDB, HNRL, PDXPA,
                  PDXCF, PAXCF, PSXCF           PSCPF, CFCPF2

                     N     P                     not C    C
               N    15     7             not C    96     40
               P    37   140                 C    23     40

                χ²    ave%  total%            χ²    ave%  total%
               22.7   73.6   77.9            20.8   67.0   68.3

TABLE 8.24; OPEN TEST RESULTS; LOG-MAGNITUDE MEASURES; FEMALES

This table summarizes an open test of classifiers that use log-magnitude measures. The best classifiers from Table 8.16 were tested. The chi-squared for the direct classifiers are significant at the 0.5% level.

              DIRECT                        MAGNITUDE COMPENSATED
              PDAVDB, PAAVDB, PSAVDB,       PDAVDB, PSAVDB,
              HNRN, CF,                     HNRL, HNRN, CF,
              SHNRros, SHNRsor              SHNRsor

                 N    F    M    O              N    F    M    O
            N   11    2    6    3         N    7    4    4    7
            F    8   11    9   10         F    9    6    7   16
            M   24   22   28   26         M   16   16   40   28
            O    2    5    5   17         O    2    8    7   12

                χ²    ave%  total%            χ²    ave%  total%
    N/F/M/O    24.6   41.4   35.4            15.6   37.2   34.4
    N/FMO       9.4   64.8   76.2             3.2   57.8   77.8

              PDAVDB, PAAVDB,               PDAVDB, PSAVDB,
              HNRN, SHNRros, SHNRsor        HNRL, HNRN, SHNRsor

                     N     P                       N     P
               N    13     9                 N     7    15
               P    43   124                 P    38   129

                χ²    ave%  total%            χ²    ave%  total%
               10.4   66.7   72.5             0.9   54.5   72.0

TABLE 8.25; OPEN TEST RESULTS FOR SELECTED CLASSIFIERS; FEMALES

This table summarizes an open test of selected classifiers found through forward selection. All chi-squared statistics except those for classifier 2 are significant at the 0.1% level.

              (1) CF, PAXCF, PDDPQ,         (2) PSAVDB, PADPQ, PSCPF,
                  PAXPS'                        PSDPQ, PAXPS'

                 N    F    M    O              N    F    M    O
            N   13    4    2    3         N    6    3    5    8
            F    9    9   10   10         F    8    5   13   12
            M   17   23   27   33         M   13   18   52   17
            O    2    5    5   17         O    7    5   11    6

                χ²    ave%  total%            χ²    ave%  total%
    N/F/M/O    30.5   42.1   34.9            13.2   28.3   36.5
    N/FMO      20.5   71.2   80.4             1.5   55.3   76.7

              (3) CF, SHNRros,              (4) PSAVDB, HNRL,
                  PACPF, PAXCF                  CFCPF, PAXPS'

                     N     P                       N     P
               N    15     7                 N    14     8
               P    35   132                 P    44   123

                χ²    ave%  total%            χ²    ave%  total%
               22.3   73.6   77.8            12.7   68.6   72.5

8.4 SUMMARY

8.4.1 SUBJECTIVE JUDGEMENTS

Subjective judgements of breathiness and stridency for 206 male and 194 female subjects were presented.
Composite severity was defined as the maximum of the breathiness and stridency judgements. The repeatability between sessions was higher for males (r≈.82) than for females (r≈.76). Similarly, the inter-judge correlations were higher for males (r≈.75 vs r≈.70). The correlations were lower for stridency than for breathiness, partially because of an increased tendency to judge stridency=0.

Histograms supported the assumption of normal distributions for breathiness and composite severity. The tendency to judge stridency=0 made this assumption less valid for stridency. As expected, the mean severities were lower for normal subjects and higher for cancer subjects. MTD subjects had an increased proportion of breathiness, attributable to the presence of a posterior glottic chink.

The listeners were also asked to judge the probable type of pathology. For males the types were normal, functional dysphonia, benign lesion and cancer. For females the types were normal, functional dysphonia, muscle tension dysphonia and "other". There was a significant relationship between the judgements and the actual pathologies (χ²≈65, p<.001). The repeatability of the judgements between listeners and between sessions was higher for males (χ²≈210, p<.001) than for females (χ²≈90, p<.001). The listeners had moderate success at distinguishing normal subjects (χ²≈30, p<.001) and cancer subjects (χ²≈25, p<.001).

8.4.2 DISTRIBUTION OF MEASURES

Seven male subjects and five female subjects were rejected as outliers. In most cases this was because of extreme alternate cycle periodicity. The fact that all of the male outliers had unilateral pathology suggests a possible relationship.

With the exception of PAXPS, histograms for all log-magnitude, CPF, DPQ and correlation measures supported the assumption of normal distributions. It follows that the magnitude measures (PDAV, PAAV and PSAV) are better modelled with a log-normal distribution.
This argues against correlating or regressing such measures with subjective judgements of hoarseness, and limits their use in pathology classification. A Fisher Z transform resolved the lack of normality for PAXPS.

Strong linear correlations among the log-magnitude measures were observed (r=.67 to r=.99). These correlations were greater for males, particularly among the time domain noise measures. The correlations between SHNRros and measures of time domain noise (CF, HNRL and HNRN) were surprisingly high, given the theoretical dependences on jitter observed in Chapter 7. Common factors contributing to the perturbations are evidently emphasized in SHNRros. Correlations among most other measures were relatively weak. The exceptions could be attributed to similarities in the algorithms.

8.4.3 CORRELATION WITH JUDGED SEVERITY

The best single measure for correlation with subjective judgements of composite severity was CF, followed by PDAVDB and PAAVDB. The correlations were in general higher for males (r≈.81) than for females (r≈.78). The relative superiority of CF over HNRL and HNRN suggests that the long term average in the latter two measures detracts from their usefulness. The large correlation of CF relative to the other measures within the female MTD class supports the notion that breathiness and time domain noise are closely related. Finally, the results suggested that shimmer was the least useful type of perturbation.

Higher correlations were obtained through a principal components analysis of the log-magnitude measures. The first principal component of the pair (CF, PDAVDB) had higher correlations for males (r≈.83) and females (r≈.80). Further addition of PAAVDB produced a small increase for males (r≈.84) but not for females.

8.4.4 SEPARATION OF SEXES

A number of the measures were significantly different for the males and females, indicating that data should be segregated.
PAAVDB and all log-magnitude measures of noise were significantly lower for males (p<.01), and the difference for PSAVDB was significant at p<.05. This suggests that listeners accept a larger amount of shimmer and noise as normal for males, because the mean severity judgements for males and females were approximately the same. The differences for PDCPF, CFCPF2, PDXPA and PDXPS were also significant at the 1% level.

8.4.5 CLASSIFICATION OF PATHOLOGY

No single measure provided a clear separation of classes. However, significant differences (p<.05) were observed for the log-magnitude measures, particularly within the males. The pattern measures PACPF, CFCPF, CFCPF2 and PDXPS also had significant differences. The two measures with the largest class separation were CF and PAAVDB. PSAVDB, SHNRsor and PDAVDB had comparatively poor class separation for males. HNRL, HNRN, PDAVDB and PSAVDB had poor separation for females.

The class differences generally remained significant for males when the normal class was removed. This indicates some promise for distinguishing among pathologies. Unfortunately, with the exception of PDXPS, the differences for females were not significant after removal of the normal class.

The major observations from the classifiers were as follows:

1) The log-magnitude measures were generally the best for classification, followed by correlation, CPF and DPQ measures.

2) Measures of time domain noise were generally better than other measures. Also, measures of shimmer gained importance when combined with other measures, and tended to be superior to measures of jitter.

3) The classes are separated by more than a simple difference in the overall level of perturbation, particularly for males.
This was supported by significant differences among classes for features that were not strongly correlated with hoarseness level, and by the significant level of performance obtained from the "magnitude compensated" classifiers.

4) There was little difference in performance for the two methods of magnitude compensation.

5) The distinctions among the pathological classes were clearer for males than for females. Functional dysphonia and benign lesion subjects were poorly classified. However, a significant ability to distinguish normal speakers, male cancer speakers, and female MTD speakers was observed. An ability to detect the female "other" class was also observed, but results of magnitude compensation suggest that this was largely due to differences in perturbation magnitude.

6) Partitions of the output of 4-class classifiers were generally superior to 2-class classifiers. The 4-class classifier reduced the false negatives for pathological males with only one extra false positive. For detection of cancer, the 4-class classifier was superior on two counts. The performance was superior and the extra pathological classes reduce the practical risk of false negatives. The 2-class classifiers were superior for detecting pathological females.

7) Superior classification was obtained through inclusion of pattern measures. Not only was performance improved, but the results were achieved without including spectral noise measures, thus reducing the computational demands.

8) The best open test performance for separating normal speakers from pathological speakers was χ²=28.0, p<.001 and APC=74.2% for males, and χ²=22.3, p<.001 and APC=73.6% for females. The best open test performance for cancer detection was χ²=33.3, p<.001 and APC=71.1%.
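The summary statistics quoted in this section can be recomputed from a confusion matrix. The following is a minimal sketch rather than the procedure used in the thesis. It assumes that rows are actual classes and columns are assigned classes, that the chi-squared value is the uncorrected Pearson statistic, and that APC is the mean of the per-class percent correct. The function names `partition` and `summarize` are illustrative only. Under these assumptions, the direct 4-class classifier for males in Table 8.20 yields values that agree with the tabulated ones.

```python
import numpy as np

def partition(confusion, groups):
    """Merge classes of a confusion matrix (rows: actual, columns: assigned).

    groups is a list of index lists, one per merged class.
    """
    c = np.asarray(confusion, dtype=float)
    return np.array([[c[np.ix_(gi, gj)].sum() for gj in groups] for gi in groups])

def summarize(confusion):
    """Return (Pearson chi-squared, average percent correct, total percent correct)."""
    c = np.asarray(confusion, dtype=float)
    n = c.sum()
    expected = np.outer(c.sum(axis=1), c.sum(axis=0)) / n
    chi2 = ((c - expected) ** 2 / expected).sum()      # no continuity correction (assumed)
    apc = 100.0 * np.mean(np.diag(c) / c.sum(axis=1))  # mean of per-class rates
    total = 100.0 * np.trace(c) / n                    # overall rate
    return chi2, apc, total

# Direct classifier, Table 8.20 (class order N, F, B, C)
table_8_20_direct = [[11, 4, 3, 4],
                     [19, 13, 2, 11],
                     [22, 9, 12, 26],
                     [8, 4, 8, 43]]

four_class = summarize(table_8_20_direct)                                # approx. 41.2, 41.1, 39.7
n_vs_fbc = summarize(partition(table_8_20_direct, [[0], [1, 2, 3]]))     # approx. 4.6, 61.2, 69.8
nfb_vs_c = summarize(partition(table_8_20_direct, [[0, 1, 2], [3]]))     # approx. 25.6, 69.1, 69.3
```

Because APC weights every class equally, it is less affected than the total percent correct by the small size of the normal group (22 of 199 male subjects).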
8.4.6 COMPARISON OF HUMAN AND COMPUTER CLASSIFICATION

The performance of computer classification was generally superior to that of the listeners. The APC was compared, as duplication of the listening trials causes the chi-squared statistics not to be directly comparable. For males, the computer matched the performance of the best listener at distinguishing normal subjects (APC=74%), and exceeded the performance at detecting cancer subjects (APC=71% vs APC=64%). Similarly, for females, the ability of the computer to distinguish normal subjects exceeded that of the listeners (APC=71% vs APC=65%).

8.4.7 RELATIONSHIP TO OTHER STUDIES

Direct comparison of the results in this chapter with other studies is complicated not only by differences in the algorithms and processing techniques, but also by the background and training of the listeners, the subjects in the database, and the nature of the speech samples.

These concerns aside, other studies have reported similar correlations among measures (e.g., Davis 1981; Yumoto 1984; Wolfe & Steinfatt 1987; Hirano et al. 1988). The correlations with severity judgements were also in general agreement. For example, in a group of 87 speakers, Yumoto et al. (1984) reported correlations of 0.809, 0.805 and 0.712 for a harmonics-to-noise ratio (HNRL), spectrographic classification and a non-logarithmic measure of jitter, respectively. A correlation of 0.868 was reported for a group of 58 speakers by Kojima et al. (1980) for a spectral harmonics-to-noise ratio that was largely equivalent to SHNRros. Wolfe et al. (1987) reported a correlation of 0.78 for spectrographic analysis of a group of 51 speakers, and lower correlations (less than 0.68) for non-logarithmic measures. However, increases in the correlations were observed in a multiple regression analysis when data were separated according to voice type.
Further evaluation would be useful to determine if similar increases can be obtained for the logarithmic measures presented here.

A number of other factors combine with the concerns expressed at the beginning of this section to make direct comparison of recognition rates of limited value. Some of the previous studies extract measures from sentences and use specialized methods for automatically rejecting the non-vowel portions (Crystal et al. 1970; Laver et al. 1985). Other studies focus on short vowel segments (Davis 1976, 1981; Hiki et al. 1976; Smith 1980, 1983). A number of the studies did not apply a logarithmic transform to the magnitude measures, making it likely that assumptions of normality were invalid (Davis 1976, 1981; Ludlow et al. 1985; Laver et al. 1985). Some studies separated sexes (Crystal et al. 1970; Laver et al. 1985) while others did not. Finally, the methods of classifier evaluation frequently differed, with some studies using closed tests and others using open tests. Closed tests in particular can be significantly affected by the number of measures, the number of subjects and the distributional assumptions of the classifier.

Nonetheless, the results presented here compare favorably. Most other work has focussed on measures of jitter and shimmer magnitude. Results presented here indicate that the inclusion of measures of noise and perturbation patterns improves performance. The tendency for cancer subjects to have relatively high levels of noise that was suggested in Hiki et al. (1976) was also observed here.

The absence of a significant separation for the DPQ contradicts observations made by Hecker and Kreul (1971) and Laver et al. (1985). As discussed in Section 6.3.1, this is explained by sensitivity of this algorithm to quantization and center-limiting.
Hecker and Kreul's data was acquired using low resolution equipment, and Laver's measure incorporated a three percent threshold.

CHAPTER 9: SUMMARY AND FUTURE DIRECTIONS

This chapter provides a general summary of results and recommendations. Detailed summaries were also provided within other chapters. Specifically, Section 3.5 summarizes the vowel synthesis, Section 4.3 summarizes pitch-period demarcation, Sections 6.1.9, 6.2.2 and 6.3.4 summarize the mathematical analyses, Section 7.3 summarizes the calibration results and Section 8.4 summarizes the evaluation using real vowel samples. In addition, the measures were summarized in Appendix F.

9.1 MATHEMATICAL ANALYSES

The mathematical analyses provided new insight into the effects of quantization and pitch-period demarcation in this area. These effects in the relative average perturbation (RAP) algorithm have been studied experimentally in the past (e.g., Titze et al. 1987). Results presented here quantify the observed effects and provide a mathematical basis for identifying adequate sampling conditions. In addition, methods of compensation for such errors were derived.

The harmonics-to-noise ratio (HNR) demonstrated an extreme sensitivity to quantization and pitch-period demarcation. Vowels with large high frequency components are the most severely affected. Thus, highly accurate pitch-period demarcation is essential, particularly if various vowels are to be compared. It was concluded that interpolation is required for accurate HNR estimation at practical sampling frequencies.

Two new insights into the directional perturbation quotient (DPQ) were identified. Firstly, the expected value for random perturbation is not 0.5 as might intuitively be expected. Rather, it approaches 0.66, 0.73, 0.63 and 0.6 for moving-average lengths of 2 through 5, respectively. Secondly, when resolution is low or when a center-limit is used, the DPQ becomes sensitive to perturbation magnitude.
This indicates that consistent sampling conditions are required for meaningful comparisons of results. In addition, the elevated DPQs observed for cancer subjects in Hecker and Kreul (1971) can be attributed to low resolution as well as perturbation patterns.

9.2 MEASURE DEVELOPMENT

The moving average predictors in the relative average perturbation (RAP) (Koike 1973) and the directional perturbation quotient (DPQ) (Hecker & Kreul 1971) were generalized for variation of the number and spacing of points. A center-limit was also incorporated.

A number of modifications were made to the harmonics-to-noise ratio (HNR) to obtain certain theoretical improvements. The modifications reduced the influence of jitter and shimmer, and removed a dependence on the data offset. A relationship between the HNR and a correlation coefficient was derived, and configurations that can be computed in a single pass through the data were presented.

A new measure of time domain noise, called the correlation factor (CF), was developed. Its advantage is removal of the need for a long term average, thus relieving the constraint that the harmonic component be invariant throughout the analysis. The CF also allows for measurement of cyclic noise perturbations.

The performance characteristics of measures of spectral noise were related to common issues in Fourier spectrum analysis. One such measure (Kojima et al. 1980) was shown to be strongly influenced by pitch synchronization and spectral leakage. This algorithm was generalized for greater flexibility in choosing an analysis window to control leakage. An approach for taking advantage of fast Fourier transforms was shown to produce a significant decrease in computation time with little loss of accuracy. Finally, an algorithm change designed to reduce dependence on the formant structure was proposed.
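For orientation, the two moving-average measures named at the start of this section can be sketched in their simplest forms. This is an illustrative sketch only: it shows the commonly cited three-point RAP and a DPQ taken as the fraction of sign changes between successive period differences, not the generalized versions (variable number and spacing of points, center-limit) developed in this thesis. The function names and the synthetic period sequences are assumptions made for the example.

```python
import numpy as np

def rap(periods):
    """Three-point relative average perturbation (simplest form, assumed)."""
    p = np.asarray(periods, dtype=float)
    smoothed = (p[:-2] + p[1:-1] + p[2:]) / 3.0          # 3-point moving average
    return np.mean(np.abs(smoothed - p[1:-1])) / np.mean(p)

def dpq(periods):
    """Fraction of sign changes between successive period differences (assumed form)."""
    d = np.diff(np.asarray(periods, dtype=float))
    return np.mean(d[1:] * d[:-1] < 0)                   # zero differences count as no change

# Hypothetical pitch-period sequences in milliseconds
rng = np.random.default_rng(0)
random_periods = 10.0 + 0.1 * rng.standard_normal(20000)  # uncorrelated jitter
alternating_periods = np.tile([10.0, 12.0], 50)           # strict alternate-cycle pattern

dpq_random = dpq(random_periods)
dpq_alternating = dpq(alternating_periods)
rap_alternating = rap(alternating_periods)
rap_constant = rap(np.full(100, 10.0))
```

For independent random periods, successive differences change sign with probability 2/3, so `dpq_random` settles near 0.67 rather than 0.5. This appears consistent with the value of 0.66 quoted in Section 9.1 for the shortest moving average. A strictly alternating sequence gives a DPQ of 1.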
New measures for quantifying cyclic variations in jitter, shimmer and time domain noise were derived from the RAP algorithm and the CF algorithm. Such measures have an intuitive relationship with voice abnormalities such as diplophonia.

9.3 PITCH-PERIOD DEMARCATION

Cross-correlation was combined with parabolic interpolation to obtain high resolution pitch-period demarcation at moderate sampling frequencies (i.e., 10 kHz or 20 kHz). This is useful for accurate analysis of jitter and noise with low-cost computing equipment. The method was shown to be effective at reducing errors for all but the most severely perturbed waveforms.

9.4 VOWEL SYNTHESIS AND MEASURE CALIBRATION

Vowel waveforms for calibrating the measures were synthesized by digital filtering of an impulse train. This application required synthesis at a high sampling frequency to obtain high resolution for jitter perturbations and pitch-period markers that did not all conveniently occur at integer sampling intervals. However, alteration of the sampling period of the filter was shown to produce a significant attenuation of high frequency components. Methods of compensation for this effect were devised. Specifications for synthesis of /a/, /i/ and /u/ data at sampling frequencies between 10 kHz and 100 kHz were derived.

The importance of unintended dependences of a measure on vowel characteristics such as fundamental frequency, formant structure or the relative level of jitter, shimmer and noise depends on its application. For assessment of laryngeal pathology and monitoring of disease progression, dependences on these characteristics increase the variance of the measures, but do not preclude their use. The problem is more critical if the goal is to gain insight into laryngeal dynamics or to quantify specific acoustical attributes of speech.
For example, an abnormally low measure of noise may be clinically meaningful, but measurement interactions can complicate inferences about the underlying physiological cause.

The calibration results provide a comparative evaluation of the influences of fundamental frequency, vowel type, perturbation type, perturbation level, pitch-period demarcation and quantization. Some of these issues were addressed in Hillenbrand (1987). However, the present study is broader in scope, and preliminary results were published in Cox et al. (1986a, 1986b).

In general, the measures had a significant sensitivity to quantization and pitch-period demarcation. The effects predicted in the mathematical analyses were confirmed. Interpolation and pitch-period marker optimization were recommended for accurate analyses at sampling frequencies of 20 kHz or below. A notable exception was PSAVDB (i.e., shimmer based on pitch-period standard deviation). Interpolation in this case degraded performance. The sensitivity of the spectral noise measures to demarcation errors was shown to be reduced through the use of a tapered window function.

In addition to sensitivity to demarcation errors, the measures of time domain noise were shown to be highly sensitive to the demarcation offset. This was attributed to the combined influence of two vocal tract excitations on each demarcated pitch-period, leading to increased sensitivity to jitter and shimmer. This indicates that markers should be aligned with the instant of glottal closure. For this reason, alignment with a convenient peak or zero-crossing is not recommended.

All of the measures of shimmer, time domain noise and spectral noise were affected by fundamental frequency and vowel type. For shimmer analyses, a reduction of both influences was obtained by measuring standard deviations rather than peak amplitudes (i.e., PSAVDB rather than PAAVDB).
Similarly, these influences were reduced for the correlation factor (CF) when compared to the other time domain noise measures (HNRL and HNRN). SHNRros (spectral noise measured as in Kojima et al. (1981)) was the most severely affected by vowel type. The modifications leading to SHNRsor effectively reduced this dependence.

A general conclusion with regard to perturbation type is that jitter can affect measures of the other perturbations. The modifications to the original harmonics-to-noise ratio (HNRL) that led to HNRN and CF were effective at reducing this effect. Because pitch-period superposition was the hypothesized cause, the effects may be exaggerated here due to the lack of source-tract interaction in the vowel synthesizer.

Jitter was shown to have a major influence on the measures of spectral noise. This is not surprising, as clear definition of the harmonic component in the frequency domain depends on the regularity of repetition in the time domain. This indicates that equating spectral noise with time domain noise can be misleading when drawing conclusions from published studies. The same holds for comparisons with spectrographic analyses.

9.5 MEASURE EVALUATION

Histograms supported the assumption of a normal distribution for the log-magnitude, directional perturbation, cyclic, and correlation measures. It follows that traditional measures of jitter and shimmer are better modelled as log-normal. This argues against correlating or regressing such measures with subjective judgements of hoarseness, as normality is an implicit assumption.

The log-magnitude measures had a strong correlation with subjective judgements of hoarseness. The CF was the best single measure, followed by PDAVDB and PAAVDB. However, a further improvement was obtained by combining CF and PDAVDB in a principal components analysis, leading to an overall correlation of r≈.84 for the males, and r≈.80 for the females.
Some conclusions were: 1) the long term average in the HNR detracted from its performance, 2) time domain noise was the most useful single type of perturbation, 3) shimmer was the least useful type of perturbation, and 4) performance can be improved through combination of measures of jitter and time domain noise.

Significant differences between the sexes were observed for a number of the measures. Higher levels of shimmer and noise were observed for the male speakers. This indicates that the listeners accepted as normal for males a higher level of these perturbations, because the severity judgements were approximately the same for both groups. Segregation of data according to sex was recommended.

The males and females were each separated into four classes according to the primary diagnosis. For males, the classes were normal, functional dysphonia, benign lesion and cancer. For females the classes were normal, functional dysphonia, muscle tension dysphonia and other.

For the computed measures the distinctions among the male classes were larger than those for the females. The log-magnitude measures had the largest differences. Of those, the measures of time domain noise were generally preferred, and the CF was the best single measure. Also, the measures of shimmer gained importance when combined with other measures, and were preferred over the measures of jitter.

A method of "magnitude compensation" through principal components analysis was used to determine if the class distinctions were primarily due to overall perturbation magnitude. This was demonstrated to be false for males. The decline in classification performance was small. The dependence on perturbation magnitude was greater for the females.

Partitions of the output of 4-class classifiers were preferred over 2-class classifiers for males. The ability to distinguish normal subjects and cancer subjects was superior.
In addition, the risk of a false negative for cancer detection is less when the mis-classification is to another pathological class, as further medical attention would nonetheless be recommended. For females, the 2-class classifiers were superior.

Classification performance was improved through inclusion of the pattern measures. In a "leave-one-out" open test, the average percent correct for detection of normal speakers was approximately 74% for both males and females. The best open test performance at detection of cancer was approximately 71%.

The computer classification matched the ability of the best listener at distinction of normal subjects, and exceeded the listeners' abilities at detecting cancer. Thus, it may be overly optimistic to hope for significant improvement in classification performance using perturbation measures extracted from vowel segments. Other types of measures extracted from more complex speech segments should be considered.

9.6 FUTURE DIRECTIONS

This section provides a general discussion of considerations for future work. Specific extensions of the work presented in this thesis were discussed in greater detail in the summary sections listed at the start of this chapter.

An important consideration for future research is careful calibration of methodologies. Sources of error and variability must be identified for meaningful comparisons of results. Automated methods used at various stages in the analysis should be thoroughly tested in isolation for pathological speakers before incorporation into the overall system. For example, further evaluation is warranted for automated strategies for pitch-period demarcation that have been developed for normal speakers (e.g. Hess 1983). As was demonstrated in this thesis, synthesized test waveforms can be useful for performing such evaluations.

Another direction is to expand on the types of data from which measures are extracted.
Relevant demographic information would help to account for external sources of variability. Some concern was expressed by the listeners about the concept of focussing on the "clearest" portion of a vowel when it is the vocal disturbance that is at issue. Thus, quantification of intermittences or other characteristics of the unclear portions may be useful. Certain aspects of voice onset and offset are also used clinically. Data obtained through inverse filtering could logically be incorporated. Measures of vocal exercises or continuous speech may provide additional relevant information. Finally, data from other equipment, such as an electroglottograph or oral airflow measurements, could be usefully integrated.

It may be unrealistic to expect that a single number will be sufficient to characterize the perturbations in speech segments. It is likely that the amount of perturbation in the vowels of a sentence varies considerably. Even within an isolated vowel the degree of perturbation can vary, particularly near its onset or end. Thus, measures of the time course of perturbations warrant further investigation.

There are a number of possibilities for further exploration in pattern classification. Methods of measure selection and dimensionality reduction are available that make better use of multi-variate relationships (e.g. Hand 1981). Pooling of the covariance matrices would reduce dimensionality problems and make linear discriminant analysis an option. Results in Section 8.3.6 indicate that a logarithmic transformation of measures of perturbation magnitude reduces the difference in variance for normal and pathological classes. This suggests that pooling of the covariance is appropriate. Finally, other classifiers with different statistical assumptions may improve performance.
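As an illustration of the pooling suggested above, the standard pooled within-class covariance estimate is the sum of the class scatter matrices divided by N - K, where N is the total number of subjects and K is the number of classes. The sketch below assumes this textbook estimator; the function name and the two-measure data are hypothetical and are not taken from the thesis database.

```python
import numpy as np

def pooled_covariance(samples_by_class):
    """Pooled within-class covariance: sum of class scatter matrices / (N - K)."""
    scatter = 0.0
    n_total = 0
    for samples in samples_by_class:
        x = np.asarray(samples, dtype=float)   # one row per subject, one column per measure
        centered = x - x.mean(axis=0)          # remove the class mean
        scatter = scatter + centered.T @ centered
        n_total += len(x)
    return scatter / (n_total - len(samples_by_class))

# Hypothetical values of two measures for two classes
class_a = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
class_b = [[10.0, 10.0], [12.0, 12.0]]

pooled = pooled_covariance([class_a, class_b])
```

A single pooled matrix requires far fewer parameters to be estimated than one matrix per class, which is the sense in which pooling eases dimensionality problems when some classes contain few subjects.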
Some of the difficulty in classifying pathologies can be attributed to ambiguity in the definition of the problem. The definition of what constitutes a "normal" speaker is vague. For example, a trained singer may seek medical attention because of "loss of high notes" or "fatigue at the end of a performance". Should such a patient be classified as pathological when such symptoms would not be considered problematic for anyone but a singer? Conversely, should subjects that have been "hoarse all of their life" be classed as normal? Even on physical examination the distinction is frequently not clear, as a wide range of sub-optimal conditions are accepted as normal, and the progression to pathological status is often a gradual one.

Distinctions among pathologies are also vague. While organic hallmarks of certain pathologies can be identified through physical and laryngoscopic examination, the pathologies often coexist. For example, muscle tension dysphonia has a large functional component, yet it leads to the development of organic problems on the vocal cords. The physical manifestations of each pathology can vary considerably. Vocal nodules, for example, can be fleshy or firm, vary in size and can coexist with general vocal cord edema or a posterior glottic chink. Thus, it may be an over-simplification to categorize pathological subjects according to the primary diagnosis.
In conclusion, this thesis provided methodologies for extraction of relevant information from perturbations in vowel recordings. While the results have indicated promise in the major areas of clinical application, further work is needed. It is recommended that other measures extracted from more complex speech segments be considered. However, because effective utilization depends on the ease with which a clinician is able to combine the various sources of relevant data, further research should also be directed towards effective presentation of information, as well as improvement of the specificity of that information.

BIBLIOGRAPHY

ASKENFELT A. & HAMMARBERG B. (1980). "Speech Waveform Perturbation Analysis". Speech Transmission Laboratory - Quarterly Progress and Status Report, 40-48.
ASKENFELT A. & HAMMARBERG B. (1981). "Speech Waveform Perturbation Analysis Revisited". Speech Transmission Laboratory - Quarterly Progress and Status Report, 49-68.
BAER T. (1981). "Investigation of the Phonatory Mechanism". In C. Ludlow (Eds.), Proceedings of the Conference of the Assessment of Vocal Pathology, ASHA Reports, 11, 38-48.
BEROUTI M., CHILDERS D.G. & PAIGE A. (1977). "Correction of tape recorder distortion". IEEE Intl. Conf. on ASSP, 397.
BOOTH J.R. & CHILDERS D.G. (1979). "Automated Analysis of High Speed Laryngeal Films". IEEE Trans. on BME, 26, 185-192.
CHILDERS D.G. & DURLING A. (1975). "Digital Filtering and Signal Processing". New York: West Publishing.
CHILDERS D.G. (1977). "Laryngeal Pathology Detection". CRC Critical Reviews in Bioengineering, 2, 375-425.
CHILDERS D.G., MOTT J.S. & MOORE G.P. (1980). "Automatic Parameterization of Vocal Cord Motion from Ultra High Speed Films". IEEE Intl. Conf. on ASSP, 65-68.
CHILDERS D.G. (1984). "A Critical Review of Electroglottography". CRC Critical Reviews in Bioengineering, 12, 131-161.
COLEMAN R.F. (1971). "Effect of Waveform Changes upon Roughness Perception". Folia Phoniatrica, 23, 314-322.
CRYSTAL T.H., MONTGOMERY W.W., JACKSON C.L. & JOHNSON N. (1970). "Methodology and Results on Laryngeal Disorder Detection through Speech Analysis (Final Report, Contract PH-86-68-192)", Rockville Maryland: Public Health Services and Mental Health Administration.
COX N.B. & MORRISON M.D. (1983). "Acoustic Analysis of Voice for Computerized Laryngeal Pathology Assessment". J. Otolaryngol., 12, 295-301.
COX N.B., ITO M.R. & MORRISON M.D. (1986a). "Calibration of Computed Features of Isolated Vowels using Synthetic Vowel Waveforms". J. Acoust. Soc. Am. Suppl. 1, 79, S95.
COX N.B., MORRISON M.D. & ITO M.R. (1986b). "Optimizing Pitch-Period Markers Prior to Extracting Features from Isolated vowels". Proceedings of the 12th International Congress on Acoustics. A1-7.
COX N.B., ITO M.R. & MORRISON M.D. (1989a). "Technical Considerations in Computation of Spectral Harmonics-to-Noise Ratios for Sustained Vowels". Journal of Speech and Hearing Research. 203-218.
COX N.B., ITO M.R. & MORRISON M.D. (1989b). "Data Labelling and Sampling Effects in Harmonics-to-Noise Ratios". J. Acoust. Soc. Am., April.
COX N.B., ITO M.R. & MORRISON M.D. (in press). "Quantization and Measurement Errors in the Analysis of Short-time Perturbations in Sampled Data". J. Acoust. Soc. Am.
CROCHIERE R.E. & RABINER L.R. (1981). "Interpolation and Decimation of Digital Signals - A Tutorial Review". Proceedings of the IEEE, 69, 300-331.
DAVIS S.B. (1976). "Computer Evaluation of Laryngeal Pathology based on Inverse Filtering of Speech". SCRL Monograph, 13, Santa Barbara, CA.
DAVIS S.B. (1981). "Acoustic Characteristics of Normal and Pathological Voices". In C. Ludlow (Eds.), Proceedings of the Conference of the Assessment of Vocal Pathology, ASHA Reports, 11, 97-112.
DELLER J.R. (1979). "Acoustic Analysis of Laryngeal Dysfunction using the Systems Identification Properties of the Digital Inverse Filter". Ph.D. Dissertation, University of Michigan.
DELLER J.R. & ANDERSON D.J. (1980). "Automatic Classification of Laryngeal Dysfunction using the Roots of the Digital Inverse Filter". IEEE Trans. on BME, 27, 714-721.
DOHERTY E.T. & SHIPP T. (1988). "Tape Recorder Effects on Jitter and Shimmer Extraction". Journal of Speech and Hearing Research, 31, 485-490.
FANT G. (1960). "Acoustic Theory of Speech Production". The Hague, Netherlands: Mouton.
FANT G. & ANANTHAPADMANABHA T.V. (1982). "Truncation and Superposition". Speech Transmission Laboratory - Quarterly Progress and Status Report 2-3, 1-18.
FANT G. (1982). "Preliminaries to Analysis of the Human Voice Source". Speech Transmission Laboratory - Quarterly Progress and Status Report 4, 1-28.
FOURCIN A. (1981). "Laryngographic Assessment of Phonatory Function". In C. Ludlow & M. O'Connell-Hart (Eds.): Proceedings of the Conference of the Assessment of Vocal Pathology, ASHA Reports, 11, 116-125.
FROKJAER-JENSEN B. & PRYTZ S. (1976). "Registration of Voice Quality". Bruel and Kjaer Technical Review.
FUJIMURA O. (1981). "Fiber-optic Observation and Measurement of Vocal Fold Movement". In C. Ludlow & M. O'Connell-Hart (Eds.): Proceedings of the Conference of the Assessment of Vocal Pathology, ASHA Reports, 11, 59-68.
GOLD B. & RABINER L.R. (1968). "Analysis of Digital and Analog Formant Synthesizers", IEEE Trans. on Audio and Electroacoustics, 16, 81-94.
HAND D.J. (1981). "Discrimination and Classification". New York: John Wiley and Sons.
HARDEN R.J.R. (1975). "Comparison of glottal area changes as measured from ultra-high speed photographs and photoelectric glottographs". Journal of Speech and Hearing Research, 18, 728.
HARWOOD A.R., BEALE F.A., CUMMINGS B.J., KEANE T.J., PAYNE D.G. & RIDER W.D. (1983). "Management of Early Supraglottic Laryngeal Carcinoma by Irradiation with Surgery in Reserve". Arch. Otolaryngol., 109, 583-585.
HECKER M.H.L. & KREUL E.J. (1971). "Descriptions of the Speech of Patients with Cancer of the Vocal Folds". J. Acoust. Soc. Am., 49, 1275-1282.
HEIBERGER V.L. & HORII Y. (1982). "Jitter and Shimmer in Sustained Phonation". Speech and Language: Advances in Basic Research and Practice, 7, 299-333.
HERTZ D. (1986). "Time Delay Estimation by Combined Efficient Algorithms and Generalized Cross-Correlation Methods". IEEE Trans. on ASSP, 34, 1-7.
HESS W. (1983). "Pitch Determination of Speech Signals; Algorithms and Devices". New York: Springer Verlag.
HIKI S., IMAIZUMI S., HIRANO M., MATSUSHITA H. & KAKITA Y. (1976). "Acoustical Analysis of Voice Disorders". IEEE Intl. Conf. on ASSP, 613.
HILLENBRAND J. (1987). "A Methodological Study of Perturbation and Additive Noise in Synthetically Generated Voice Signals". Journal of Speech and Hearing Research, 30, 448-461.
HIRANO M., HIKI S., YOSHIDA T., HIRADE Y., KASUYA H. & KIKUCHI Y. (1988). "Acoustic Analysis of Pathological Voice; Some Results of Clinical Application". Acta Otolaryngology, 105, 432-438.
HOLMES J.N. (1975). "Frequency phase distortion of speech recordings". J. Acoust. Soc. Am., 53, 39.
HORII Y. (1979). "Fundamental Frequency Perturbation Observed in Sustained Phonation". Journal of Speech and Hearing Research, 22, 5-19.
ISSHIKI N. (1981). "Vocal Efficiency Index". In K.N. Stevens & M.
Hirano (Eds.), Vocal Fold Physiology. Japan: University of Tokyo, 193-209.
IWATA S. & VON-LEDEN H. (1970). "Pitch Perturbations in Normal and Pathologic Voices". Folia Phoniatrica, 22, 413-424.
IWATA S. (1972). "Periodicities of Pitch Perturbations in Normal and Pathologic Larynges". Laryngoscope, 82, 87-96.
JAIN A.K. & CHANDRASEKARAN B. (1982). "Dimensionality and Sample Size Considerations in Pattern Recognition Practice", in P.R. Krishnaiah & L.N. Kanal (Eds.). Handbook of Statistics 2; Classification, Pattern Recognition and Reduction of Dimensionality. New York: North Holland. 835-856.
KAPLAN M.J., JOHNS M.E., MCLEAN W.C., FITZ-HUGH G.S., CLARK D.A., BOYD J.C. & CANTRELL R.W. (1983). "Stage II Glottic Carcinoma: Prognostic Factors and Management". Laryngoscope, 93, 725-728.
KITAJIMA K. (1981). "Quantitative Evaluation of the Noise Level in the Pathological Voice". Folia Phoniatrica, 33, 115-124.
KLINGHOLTZ F. & MARTIN F. (1985). "Quantitative Spectral Evaluation of Shimmer and Jitter". Journal of Speech and Hearing Research, 28, 169-174.
KOIKE Y. (1968). "Vowel Amplitude Modulations in Patients with Laryngeal Diseases". J. Acoust. Soc. Am., 45, 839-844.
KOIKE Y. (1973). "Application of some Acoustic Measures for the Evaluation of Laryngeal Dysfunction". Studia Phonologica, 7, 17-23.
KOIKE Y. & MARKEL J. (1975). "Application of Inverse Filtering for Detecting Laryngeal Pathology". Ann. Otol. Rhinol. Laryngol., 84, 117-124.
KOIKE Y., TAKAHASHI H. & CALCATERRA T.C. (1977). "Acoustic Measures for Detecting Laryngeal Pathology". Acta. Otolaryngol., 84, 105-117.
KOJIMA H., GOULD W.J., LAMBIASE A. & ISSHIKI N. (1980). "Computer Analysis of Hoarseness". Acta Otolaryngologica, 89, 547-554.
LAVER J., MACKENZIE J., HILLER S. & ROONEY E. (1985). "Acoustic Screening for Vocal Pathology". Transcripts of the 14th Symposium of the Voice Foundation; Care of the Professional Voice, Part 2; Pedagogy and Medical. Denver Center for the Performing Arts, 241-251.
LIEBERMAN P. (1961). "Perturbations in Voice Pitch". J. Acoust. Soc. Am., 33, 597-603.
LIEBERMAN P. (1963). "Some Acoustical Measures of the Fundamental Periodicity of Normal and Pathological Larynges". J. Acoust. Soc. Am., 35, 344-353.
LOFQVIST A. & MANDERSSON B. (1987). "Long-Time Average Spectrum of Speech and Voice Analysis". Folia Phoniatrica, 39, 221-229.
LUDLOW C.L. (1981). "Research Needs for the Assessment of Phonatory Function". In C. Ludlow & M. O'Connell-Hart (Eds.), Proceedings of the Conference of the Assessment of Vocal Pathology, ASHA Reports, 11, 3-8.
LUDLOW C.L., BASSICH C.J., CONNOR N.P., COULTER D.C. & LEE Y.L. (1985). "The Validity of using Phonatory Jitter and Shimmer to Detect Laryngeal Pathology". Proceedings of the Fourth International Vocal Fold Physiology Conference, New Haven, Connecticut.
MARKEL J.D. & GRAY A.H. (1976). "Linear Prediction of Speech", (p. 167). New York: Springer Verlag.
MILENKOVIC P. (1987). "Least Mean Square Measures of Voice Perturbation". Journal of Speech and Hearing Research, 30, 529-538.
MONSEN R.B. (1981). "The Use of the Reflectionless Tube to Assess Vocal Function". In C. Ludlow & M. O'Connell-Hart (Eds.), Proceedings of the Conference of the Assessment of Vocal Pathology, ASHA Reports, 11, 141-148.
MORRISON M.D. & COX N.B. (1983). "The Otolaryngologist and the Voice; Computer Assessment". Annals of the Royal College of Physicians and Surgeons of Canada, 16, 569-574.
MORRISON M.D. (1984). "A Clinical Voice Laboratory, Video Tape and Stroboscopic Instrumentation". Otolaryngology - Head and Neck Surgery, 487-488.
MORRISON M.D. (1988). Personal communication.
NATIONAL INSTITUTES OF HEALTH. (1975). "Third National Cancer Survey, Incidence Data". National Cancer Institute Monograph, 41, Washington DC: DHEW Publication number (NIH) 75-787.
OPPENHEIM A.V. & SCHAFER R.W. (1975). "Digital Signal Processing". Englewood Cliffs, New Jersey: Prentice Hall.
RABINER L.R. (1977). "On the use of Autocorrelation Analysis for Pitch Detection". IEEE Trans. on ASSP, 25, 24-33.
RABINER L.R. & SCHAFER R.W. (1978). "Digital Processing of Speech Signals". Englewood Cliffs, New Jersey: Prentice-Hall.
RAMIG L. & RINGEL R. (1983). "Effects of Physiological Aging on Selected Acoustic Characteristics of Voice". Journal of Speech and Hearing Research, 26, 22-30.
ROTHENBERG M. (1977). "Measurement of Airflow in Speech". Journal of Speech and Hearing Research, 20, 155-176.
ROTHENBERG M. (1981). "Some Relations between Glottal Airflow and Vocal Fold Contact Area". In C. Ludlow & M. O'Connell-Hart (Eds.), Proceedings of the Conference of the Assessment of Vocal Pathology, ASHA Reports, 11, 88-95.
SACHS L. (1984). "Applied Statistics; A Handbook of Techniques". New York: Springer-Verlag.
SCHAFER R.W. & RABINER L.R. (1973). "A Digital Signal Processing Approach to Interpolation". Proceedings of the IEEE, 61, 692-702.
SMITH A.M. (1980). "Linear Prediction Analysis/Synthesis Techniques Applied to the Speech Acoustic and Electroglottographic Waveforms". Ph.D. Dissertation, University of Florida.
SMITH A.M. (1981). "Feature Extraction for Laryngeal Evaluation". IEEE Intl. Conf. on ASSP, 137-140.
SMITH A.M. & CHILDERS D.G. (1983). "Laryngeal Evaluation Using Features from Speech and the Electroglottograph", IEEE Trans. on BME, 30, 755-759.
SONDHI M.M. (1975). "Measurement of the Glottal Waveform". J. Acoust. Soc. Am., 57, 228-232.
SPIEGEL M.R. (1975). "Schaum's Outline Series; Theory and Problems of Probability and Statistics". New York: McGraw Hill.
SRIVASTAVA M.S. & KHATRI C.G. (1979). "An Introduction to Multivariate Statistics". New York: North Holland.
TIMCKE R., MOORE G.P. & VON-LEDEN H. (1958). "Laryngeal Vibrations, Measurements of the Glottic Wave I. The Normal Vibratory Cycle". Arch. Otolaryngol., 68, 1.
TIMCKE R., MOORE G.P. & VON-LEDEN H. (1959). "Laryngeal Vibrations, Measurements of the Glottic Wave II. Physiologic Vibrations". Arch. Otolaryngol., 69, 438.
TITZE I. (1981). "Biomechanics and distributed mass models of the vocal fold vibration". In K.N. Stevens & M. Hirano (Eds.), Vocal Fold Physiology. Japan: University of Tokyo, 245-264.
TITZE I.R., HORII Y. & SCHERER R.C. (1987). "Some Technical Considerations in Voice Perturbation Measurements". Journal of Speech and Hearing Research, 30, 252-260.
VAN-CAMPENHOUT J.M. (1982). "Topics in Measurement Selection", in P.R. Krishnaiah & L.N. Kanal (Eds.). Handbook of Statistics 2; Classification, Pattern Recognition and Reduction of Dimensionality. New York: North Holland. 793-803.
VON-LEDEN H., MOORE G.P. & TIMCKE R. (1960). "Laryngeal Vibrations, Measurements of the Glottic Wave III. The Pathologic Larynx". Arch. Otolaryngol., 71, 16.
VON-LEDEN H. & KOIKE Y. (1970). "Detection of Laryngeal Disease by Computer Technique". Arch. Otolaryngol., 91, 3-10.
WENDAHL R.W. (1963). "Laryngeal Analog Synthesis of Harsh Voice Quality". Folia Phoniatrica, 15, 241.
WENDAHL R.W. (1966). "Laryngeal Analog Synthesis of Jitter and Shimmer, Auditory Parameters of Harshness". Folia Phoniatrica, 18, 98.
WILDE D.J. & BEIGHTLER C.S. (1967). "Foundations of Optimization", (p. 296). Englewood Cliffs, New Jersey: Prentice Hall.
WOLFE V.I. & STEINFATT T.M. (1987). "Prediction of Vocal Severity Within and Across Voice Types".
Journal of Speech and Hearing Research, 30, 230-240.
WONNACOTT T.H. & WONNACOTT R.J. (1981). "Regression: A Second Course in Statistics". New York: John Wiley and Sons.
YANAGIHARA N. (1967). "Significance of Harmonic Changes and Noise Components in Hoarseness". Journal of Speech and Hearing Research, 10, 531-541.
YOUNG T.Y. & CALVERT T.W. (1974). "Classification, Estimation and Pattern Recognition". New York: Elsevier.
YUMOTO E., GOULD W. & BAER T. (1982). "Harmonics-to-Noise Ratio as an Index of the Degree of Hoarseness". J. Acoust. Soc. Am., 71, 1544-1550.
YUMOTO E. (1983). "The Quantitative Evaluation of Hoarseness: A new Harmonics-to-Noise Ratio Method". Arch. Otolaryngol., 109, 48-52.
YUMOTO E., SASAKI Y. & OKAMURA H. (1984). "Harmonics-to-Noise Ratio and Psycho-physical Measurement of the Degree of Hoarseness". Journal of Speech and Hearing Research, 27, 2-6.
ZEIMER R.E. & TRANTER W.H. (1976). "Principles of Communications: Systems, Modulation and Noise". Boston: Houghton Mifflin.
ZYSKI B.J., BULL G.L., MCDONALD W.E. & JOHNS M.E. (1984). "Perturbation Analysis of Normal and Pathologic Larynges". Folia Phoniatrica, 36, 190-198.

APPENDIX A: GLOSSARY OF MEDICAL TERMS

This appendix describes medical terms that are relevant to detection and management of vocal pathologies. The definitions have been simplified and are intended as a brief introduction for non-medical readers.

BENIGN LESION = A non-cancerous organic disorder affecting the functioning of the larynx and vocal cords.
BILATERAL PARALYSIS = Paralysis of the controlling muscles of both vocal cords.
BREATHINESS = A whisper-like quality that is sometimes perceived in vowels. Increased breathiness is a symptom of certain vocal pathologies.
CANCER = see glottic cancer
CHRONIC LARYNGITIS = A condition characterized by a general reddening and thickening of the vocal cords. It is usually caused by some sort of vocal abuse.
CONTACT ULCER/GRANULOMA = An ulceration that generally appears at the posterior end of the glottis, and is generally caused by chronic irritation of the vocal cords. A common source of this irritation is gastroesophageal reflux. A granuloma is the scar tissue produced as the body attempts to heal the ulcer.
CONVERSION REACTION = A functional dysphonia that is a physiological reaction to a sudden psychologically-based event. Sometimes it is associated with coping or avoidance strategies regarding the psychological conflict. Sub-classifications are the same as for functional dysphonia.
FALSE CORDS = see ventricular bands
FORMANT = A resonance of the vocal tract. The frequencies and bandwidths of formants of a vowel largely determine its type.
FUNCTIONAL DYSPHONIA = A broad class of vocal dysfunctions for which there is no apparent organic origin. The mechanism is the manner of use of the anatomy of voice production. Functional dysphonia is sometimes precipitated by psychological or behavioral factors. The following sub-classifications are used:
FD(b) = A functional dysphonia with bowing of the vocal cords (i.e., contact at both ends but not the middle).
FD(ha) = A functional dysphonia with hypo-adducted vocal cords (i.e., vocal cords that do not come together fully during voicing).
FD(ns) = A functional dysphonia with a variety of "non-specific" manifestations.
FD(vb) = A functional dysphonia with hyper-adducted ventricular bands (i.e., ventricular bands brought too tightly together).
FUNDAMENTAL FREQUENCY = The frequency of oscillation of the vocal cords.
GASTRO-ESOPHAGEAL REFLUX = Mild regurgitation of the acidic contents of the stomach.
GLOTTIC CANCER = A cancerous growth primarily invading laryngeal structures. Stage 1 (T1G) is the least severe and is generally limited to the surface tissue of the vocal cords.
With increasing stage, the cancer invades deeper into the laryngeal tissue, and increasingly limits the mobility of the vocal cords. In later stages, the cancer spreads to surrounding structures.
GLOTTIS = The space between the vocal cords, although in some contexts it is used synonymously with vocal cords (e.g. glottic cancer). Only the posterior end of the glottis opens for breathing, thus forming a triangular shaped airway. The posterior end is farthest from the point where the larynx can be felt on the front of the neck.
GLOTTAL VOLUME-VELOCITY = The airflow immediately above the glottis.
HOARSENESS = A perceived persistent roughness in vowels that is often a symptom of laryngeal pathology.
HYPERKERATOSIS = A diffuse white plaque-like thickening of the mucosal lining of the vocal cords. Hyperkeratosis is thought to be a pre-cancerous formation.
HYPO-PHARYNGEAL CANCER = A cancerous growth originating in the swallowing portion of the throat around the posterior aspect of the larynx.
JITTER = Rapid fluctuation of pitch-period duration.
LARYNGEAL CANCER = see glottic cancer
LARYNGEAL PATHOLOGY = A vocal pathology for which the normal functioning of the larynx and vocal cords is disrupted. However, laryngeal pathology and vocal pathology are often used synonymously.
LARYNGEAL TRAUMA = Fracture or other trauma to the larynx. Trauma from an external source, such as a hockey puck, is labelled LTE. Trauma from an internal source, such as surgical misadventure or endotracheal intubation, is labelled LTI.
LARYNGECTOMY = Surgical removal of the larynx.
LARYNGOSCOPE = An instrument for visualization of the larynx. Modern laryngoscopes use glass fibers and mirrors for both illumination and visualization.
LARYNGOSTROBOSCOPE = A fiber-optic laryngoscope for which the light source is "strobed" at approximately the frequency of vocal cord vibration. This facilitates inspection of the vocal cord vibratory cycle.
LARYNX = A collection of muscle and cartilage located in the neck at the upper end of the trachea. The larynx includes the vocal cords and is used, in part, for voicing.
LARYNX RISE = A rise in larynx position as a person increases vocal pitch. This is often associated with increased muscular tension during voicing.
MUSCULAR TENSION DYSPHONIA = A vocal disorder characterized by a posterior glottic chink, increased suprahyoid tension, and larynx rise with pitch rise on phonation. Depending on the presence of other laryngeal conditions (often caused by MTD), the following sub-classification is used:
MTD 1 = MTD with normal vocal cords
MTD 2A = MTD with vocal nodules
MTD 2B = MTD with chronic laryngitis
MTD 2C = MTD with polypoid degeneration
NASOLARYNGOSCOPE = A flexible laryngoscope that can be inserted through a nostril. This allows laryngeal visualization without obstruction of the vocal tract.
OTOLARYNGOLOGY = A medical specialty that is primarily concerned with problems of the ear, nose and throat.
PHONATION BREAK = see pitch break
PITCH = For this document the pitch is the inverse of the average pitch-period duration. For general speech science, it is also a complex quality of voice that is related to but not solely determined by the frequency of vocal cord vibration.
PITCH BREAK = A brief interruption of voicing during production of a vowel.
PITCH-PERIOD = For this document a pitch-period is the acoustic waveform produced during one cycle of the vocal cords. It is also frequently equated to pitch-period duration.
PITCH-PERIOD DURATION = The time required for one cycle of the vocal cords.
POSTERIOR GLOTTIC CHINK = A triangular shaped gap that results when the cartilages forming the posterior third of the vocal cords fail to come together during voicing.
SHIMMER = Rapid period-to-period fluctuation of pitch-period amplitude.
SPEECH PATHOLOGY = The study and treatment of disorders affecting the production of speech.
STRIDENCY = A strained or forced quality that is sometimes perceived in vowels.
SUPRA-GLOTTIC CANCER = A cancerous growth originating in the upper portion of the larynx. Sub-classifications are the same as for glottic cancer.
SUPRAHYOID MUSCLES = The set of muscles that can be felt (palpated) by placing a thumb underneath the chin. These connect the larynx to the jaw and tongue.
T1G/T2G/T3G/T4G = see glottic cancer
T1SG/T2SG/T3SG/T4SG = see supra-glottic cancer
UNILATERAL PARALYSIS = Paralysis of the controlling muscles of one vocal cord.
VENTRICULAR BANDS = A part of the supra-glottic larynx. Tissue and muscle located immediately above the vocal cords. The ventricular bands form an airway constriction used normally for coughing and swallowing. Although they may vibrate in some pathological speakers, they are not a useful alternative to the vocal cords for voicing.
VOCAL BANDS = see vocal cords
VOCAL FOLDS = see vocal cords
VOCAL CORDS = Membranous folds of tissue in the larynx that vibrate during production of vowels and other voiced sounds.
VOCAL NODULES = Callous-like growths that appear near the center of the vocal cords. When accompanied by a posterior glottic chink, they often appear at the anterior end of the chink (i.e., where the cords first make contact). Vocal nodules generally appear on both vocal cords.
VOCAL PATHOLOGY = A condition that causes disruption of the normal mechanisms of voice production.
VOCAL POLYPS = A degeneration or breakdown of the tissue below the membranous cover of the vocal cords. Vocal polyps look like a "blister". When the "blister" appears on one cord only, the condition is labelled unilateral vocal polyp (UVP), otherwise it is labelled bilateral vocal polyp (BVP).
VOCAL TRACT = Anatomy in the airway between the larynx and the mouth.
VOICED SOUNDS = Components of human voice for which the vocal cords vibrate.
VOICING = The production of sounds by humans for which the vocal cords vibrate.
VOWEL = A component of human voice that is produced when the vocal tract is relatively unobstructed, but the vocal cords are held together so that they vibrate with exhalation. A sustained vowel is produced in isolation from other speech sounds.

APPENDIX B: PARAMETERS FOR VOWEL SYNTHESIS

This appendix summarizes parameters for synthesis of /a/, /i/ or /u/ waveforms at sampling frequencies between 10 kHz and 100 kHz. The parameters are for Eq. (3.2). The synthesizers are constructed by substituting Eq. (3.2) into Eq. (3.1). Formant specifications are in Table 3.1.

Tables B1 through B3 contain parameters for synthesis of /a/, /i/ and /u/, respectively. "a", "fj" and "bj" are defined in Eq. (3.2). "b1" and "b2" are values of the parameter bk in Eq. (3.2). The error statistics give the maximum (max) and variance (var) of the difference between the frequency responses of the design filter and the reference filter between 0 and 4.7 kHz. The reference filter was designed for a sampling frequency of 10 kHz.

Figures B1 through B3 plot the synthesis parameters from Tables B1 through B3 as a function of the sampling frequency. Figure B4 plots error variances for these filters.

TABLE B1: PARAMETERS FOR SYNTHESIS OF /a/

FILTER' SAMPLE FREQ (kHz ) 1st ORDER POLES a t>i D2 (Hz) (Hz) (Hz) I RESONATORS 5k"Hz TUNING f NUM in ) BER (Hz (H^) dB-difference max (dB) ERROR STATISTICS mag-diT?erence var (dB») max (mag) var (magJ) 10.00 10.03 10.05 10.08 10.10 ******* 10.10 10.15 10.20 10.30 10.40 10.50 10.60 10.70 10.75 ******* 10.75 10.80 10.90 11.00 11.10 11.20 *******.. 11.20 11.30 11.40 11.50 11.60 11.70 11.80 11.90 12.00 12.10 12.20 12.
30 12.40 12.50 12.60 12.70 12.80 12.90 13.00 13.10 13.20 13.20 13.30 13.40 13.50 ******* 200 2500 200 2321 200 2162 200 2020 200 1891 ************************************** 00 10 18 26 32 0.000 0.003 0.010 0.019 0.028 O.O0E+00 3.42E-03 6.08E-03 8.26E-03 1.00E-02 00E+00 90E-07 27E-06 03E-06 8.70E-06 200 1493 200 1326 200 1192 200 984 200 626 200 685 200 558 200 435 200 400 1968 2003 2054 2199 2401 2680 3064 3642 4508 6483 6444 6463 6610 6873 7222 7616 8064 8742 ************************************** 200 457 3207 200 400 2931 200 400 2287 200 400 1810 200 400 1422 200 438 1050 ************************************** ******************************** 0.18 0.005 4.83E-03 1.47E-06 0.24 0.008 6.83E-03 2.30E-06 0.27 0.011 8.11E-03 2.96E-06 0.28 0.011 9.04E-03 3.36E-06 0.25 0.009 8.46E-03 2.91E-06 0.19 0.006 7.00E-03 2.08E-06 0.13 0.003 5.03E-03 1.28E-06 0.08 0.002 3.28E-03 6.64E-07 0.09 0.001 3.90E-03 4.46E-07 ******************************** 0.17 0.004 5.53E-03 1.49E-06 5.13E-03 15 27 35 34 27 003 016 024 019 014 60E-03 08E-02 71E-03 06E-02 21E-06 30E-06 15E-06 83E-06 14E-06 ..******************************** 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 201 201 201 201 1279 1187 1096 1020 949 880 824 781 748 718 690 660 635 606 578 550 515 481 445 410 400 444 409 400 400 2288 6287 2500 6639 2735 6893 3015 7128 3305 7214 3562 7104 3818 6939 4067 6719 37 35 31 27 23 19 4296 4461 4579 4634 4659 4658 4646 4623 4584 4549 4515 4482 4475 4592 4545 4521 4506 6447 6097 5725 5326 4923 4546 4193 3862 3560 3288 3044 2820 2608 2685 2490 2298 2121 0.16 0.14 0.12 0.11 10 10 08 08 0.07 0.06 06 06 09 07 0.19 0.09 0.11 0.19 0.30 020 016 012 009 007 006 005 004 003 002 0.002 0.002 0.001 0.001 0.001 0.000 0.001 0.001 0.002 0.003 0.006 0.002 0.004 0.009 0.018 1.31E-02 1.29E-02 1.21E-02 1.12E-02 1.01E-02 8.58E-03 .27E-03 •36E-03 .70E-03 •10E-03 4.40E-03 3.51E-03 3.07E-03 .30E-03 .64E-03 .04E-03 .19E-04 .67E-03 .12E-03 .77E-03 .81E-03 .08E-03 .72E-03 
.22E-03 8.95E-03 7 .09E-06 6.03E-06 5.04E-06 4.17E-06 •39E-06 .75E-06 .27E-06 .86E-06 .50E-06 .21E-06 .53E-07 •39E-07 .22E-07 •33E-07 .69E-07 7.66E-08 1.26E-07 .86E-07 .87E-07 .08E-06 .11E-06 .80E-06 .80E-06 4.78E-06 7.42E-06
**************************************************************************************

TABLE B1: (continued)

FILTER 1st ORDER POLES         RESONATORS          ERROR STATISTICS
SAMPLE                              5 kHz TUNING   dB-difference  mag-difference
FREQ   a     b1    b2    b3    NUM                 max    var     max        var
(kHz)  (Hz)  (Hz)  (Hz)  (Hz)  BER  (Hz)   (Hz)    (dB)   (dB²)   (mag)      (mag²)

13.50  199   1124              2    2769   5397    0.40   0.020   1.72E-02   1.12E-05
13.75  199   1060              2    3000   5282    0.34   0.015   1.53E-02   9.69E-06
14.00  199   1016              2    3244   5135    0.32   0.011   1.46E-02   6.52E-06
14.00  200   1040              2    3352   5654    0.33   0.018   1.51E-02   8.75E-06
14.50  200   951               2    3715   4939    0.27   0.011   1.24E-02   5.87E-06
15.00  200   665               2    3873   4143    0.20   0.006   9.30E-03   3.21E-06
15.50  200   778               2    3908   3458    0.18   0.004   5.60E-03   1.28E-06
16.00  200   704               2    3923   2925    0.18   0.006   3.65E-03   1.76E-06
16.00  201   741               2    4043   3046    0.12   0.002   9.33E-03   4.01E-06
16.50  201   661               2    4000   2597    0.17   0.005   6.15E-03   3.23E-06
17.00  201   583               2    3962   2254    0.27   0.016   1.11E-02   8.45E-06
**************************************************************************************
17.00  200   1188              3    3075   4898    0.55   0.043   2.26E-02   1.94E-05
18.00  200   1100              3    3363   4282    0.43   0.026   1.96E-02   1.36E-05
19.00  200   1025              3    3510   3683    0.36   0.014   1.66E-02   8.53E-06
20.00  200   951               3    3552   3181    0.28   0.011   1.27E-02   5.23E-06
21.00  200   884               3    3561   2797    0.34   0.018   8.12E-03   4.72E-06
21.00  201   925               3    3692   2919    0.29   0.005   1.32E-02   6.02E-06
22.00  201   856               3    3652   2594    0.29   0.011   7.43E-03   4.12E-06
23.00  201   799               3    3628   2346    0.45   0.028   8.21E-03   5.99E-06
24.00  201   750               3    3611   2150    0.61   0.055   1.20E-02   1.19E-05
25.00  201   708               3    3600   1994    0.77   0.088   1.63E-02   2.19E-05
**************************************************************************************
25.00  200   1202              4    3182   3742    0.60   0.043   2.63E-02   2.27E-05
26.00  200   1178              4    3238   3561    0.57   0.035   2.56E-02   1.99E-05
27.00  200   1159              4    3289   3401    0.56   0.028   2.54E-02   1.77E-05
28.00  200   1140              4    3323   3256    0.55   0.023   2.47E-02   1.59E-05
29.00  200   1118              4    3333   3121    0.51   0.021   2.33E-02   1.44E-05
30.00  200   1097              4    3337   3004    0.48   0.020   2.16E-02   1.32E-05
32.00  200   1061              4    3341   2811    0.41   0.022   1.87E-02   1.20E-05
34.00  200   1031              4    3342   2660    0.40   0.026   1.58E-02   1.20E-05
34.00  201   1053              4    3421   2771    0.41   0.016   1.89E-02   1.18E-05
35.00  201   1044              4    3431   2701    0.41   0.014   1.87E-02   1.09E-05
40.00  201   999               4    3438   2445    0.34   0.020   1.56E-02   9.29E-06
45.00  201   965               4    3433   2285    0.43   0.038   1.22E-02   1.04E-05
50.00  201   941               4    3429   2177    0.52   0.048   1.27E-02   1.29E-05
50.00  202   958               4    3486   2250    0.39   0.023   1.20E-02   1.34E-05
60.00  202   928               4    3480   2108    0.49   0.039   1.32E-02   1.45E-05
70.00  202   909               4    3475   2029    0.56   0.052   1.54E-02   1.65E-05
80.00  202   896               4    3469   1978    0.63   0.065   1.65E-02   1.83E-05
90.00  202   887               4    3465   1944    0.66   0.075   1.74E-02   1.99E-05
100.0  202   861               4    3463   1920    0.71   0.082   1.81E-02   2.12E-05

TABLE B2: PARAMETERS FOR SYNTHESIS OF /i/

FILTER 1st ORDER POLES         RESONATORS          ERROR STATISTICS
SAMPLE                              5 kHz TUNING   dB-difference  mag-difference
FREQ   a     b1    b2    b3    NUM                 max    var     max        var
(kHz)  (Hz)  (Hz)  (Hz)  (Hz)  BER  (Hz)   (Hz)    (dB)   (dB²)   (mag)      (mag²)

10.00 10.03 10.05 10.08 10.10 ******* 10.10 10.20 10.30 10.40 10.50 10.60 10.70 10.75 10.80 ******* 10.80 10.90 11.00 11.10 ******* 11.10 11.20 11.30 11.40 11.50 11.60 11.70 11.80 11.90 12.00 12.10 12.20 12.
30 12.40 12.50 12.60 12.70 12.80 12.90 13.00 13.10 13.20 ******* 200 2500 200 2361 200 2234 200 2115 200 2003 ************************************** 0.00 0.12 0.22 0.32 0.40 0.000 0.001 0.002 0.005 0.007 O.OOE+00 1.83E-03 3.51E-03 4.96E-03 6.19E-03 O.00E+00 2.54E-07 8.72E-07 1.67E-06 2.51E-06 200 1484 200 1238 200 1042 200 898 200 790 200 681 200 575 200 526 200 488 2031 2129 2260 2459 2765 3183 3760 4175 5400 5605 5661 5834 6169 6762 7280 7682 7867 8179 ..******************************** 0.09 0.003 1.60E-03 . 6.22E-07 0.21 0.21 0.18 0.16 0.10 0.04 0.04 0.03 0.004 0.004 0.003 .002 ,001 .001 .000 0. 0. 0. 0. 0.000 31E-03 26E-03 78E-03 55E-03 22E-03 56E-03 57E-03 89E-03 8.46E-07 9.51E-07 7.94E-07 .99E-07 .16E-07 .43E-07 .06E-07 .06E-07 4, 3. 2. 2. 1, ************************************** 200 554 2749 200 479 2288 200 447 1859 200 524 1401 ************************************** ******************************** 0.12 0.001 0.10 0.001 0.12 0.003 0.18 0.007 2.96E-03 3.65E-03 5.74E-03 9.18E-03 4.25E-07 5.17E-07 1.17E-06 3.03E-06 .******************************** 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 1525 1409 1325 1237 1163 1091 1027 976 935 902 875 851 822 794 761 727 691 641 591 532 470 406 2261 2423 2634 2876 3165 3457 3735 3992 4245 4443 4587 4697 4737 4752 4733 4714 4666 4613 4562 4506 4460 4419 5464 5756 6145 6451 6721 6793 6712 6526 6315 6009 5651 5287 4897 4523 4164 3842 3534 3267 3028 2817 2630 2463 41 38 40 37 34 30 25 22 19 18 17 0.17 0.15 13 19 08 08 03 04 0.08 0.15 0.22 0.009 0.007 0.006 0.004 0.003 0.003 0.002 0.002 0.001 0.001 0.001 0.001 0.001 ,000 .000 ,000 .000 .000 .001 0.001 0.003 0.005 0. 0. 0. 0. 0. 0. 6.37E-03 6.04E-03 6.34E-03 5.85E-03 5.41E-03 4.69E-03 .93E-03 .48E-03 .01E-03 .87E-03 2.76E-03 2.63E-03 •33E-03 .08E-03 .75E-03 .27E-03 .24E-03 1.50E-03 2.68E-03 3.96E-03 5.55E-03 7.49E-03 3. 3. 3, 2. 2. 2. 1, 1, 1. 
2.47E-06 2.10E-06 74E-06 44E-06 19E-06 84E-07 8.46E-07 7.59E-07 46E-07 51E-07 62E-07 57E-07 60E-07 96E-07 22E-07 26E-06 96E-08 8.41E-08 2.56E-07 6.21E-07 1.29E-06 2.39E-06 6. 5. 4. 3. 2. 1. 1. 6. 3. 4 * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ******************************** 9 0 - 3 9 £ ' Z zo-3z .9'z 890*0 L*'0 0X8X 9Z9E * £X6 zoz o*oox S 0 - 3 6 S ' Z Z O - 3 / . * * Z * 9 0 * 0 9 * * 0 6Z8X 9Z9E * 6X6 z o z 00 -06 9 0 - 3 9 E " Z Z O - 3 X E * Z 8 S 0 ' 0 * » ' 0 £98X ZZ9E * 9Z6 zoz 00*08 9 0 - 3 9 0 ' Z 2 0 - 3 6 0 - 3 150*0 Z * ' 0 £68X 8X9E * LZ6 zoz 00*0*. S0-L390-Z zo - a 9 x*z 690*0 9 * ' 0 6£8X 0X9E * ZE6 xoz 0 0 * 0 £ so-a>9* t ' Z0-368-T 8*0*0 X * ' 0 9*6X L09Z > 096 xoz 00 -09 s o - 3 8 £ * x Z 0 - 3 T Z . - T X * 0 * 0 8 £ * 0 E66X 909E * 996 xoz 0 0 - 9 9 SO -30T-T Z 0 - 3 6 * * X * £ 0 * 0 * £ ' 0 090Z E09E * Z86 xoz 00*09 9 0 - 3 6 X * 8 z o - a z z - x 9 Z 0 * 0 OE'O E9XZ 669E * 900X xoz 00*9* 9 0 - 3 9 9 ' S £ 0 - 3 6 0 * 6 8 T 0 ' 0 X * ' 0 E6ZZ E69C * 6E0T xoz 0 0 * 0 * 90 -3Z6*9 zo - a e o * T ZZO'O X * ' 0 *£ZZ 689E * £EOX ooz 0 0 * 0 * 9 0 - 3 6 6 * £ £ 0 - 3 * 8 * 8 * X 0 * 0 * S ' 0 9 8 * Z 999E * 080X ooz 0 0 - 9 E 9 0 - 3 6 £ ' £ Z0-300*T ZTO*0 9 9 ' 0 9 £ 9 Z 9 * 9 £ * OXXX ooz 09-ZE 9 0 - 3 8 E * * Z 0 - 3 i T * T ZTO'O 9L'0 6E8Z EX9E * £*XX ooz OO 'OE 9o-ase -9 Z 0 - 3 6 £ * X 9 1 0 * 0 X 6 * 0 98XE ***£ 90ZX ooz 0 0 ' LZ 9 0 - 3 6 9 * 8 Z 0 - 3 £ S * X ZZO'O E0*X 6 £ * £ 9*EE » £9ZX ooz 00*9Z 9 0 - 3 6 6 - 6 Z 0 - 3 9 9 ' X Z.ZO'0 0X*X 8*9E E£ZE * X8ZX ooz 0 0 * * Z * * * * * * * ¥ ¥ * ¥ ¥ * ¥ ¥ ¥ ¥ « * ¥ * * ¥ « ¥ * ¥ ¥ * ¥ * « ' ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ * ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ •»»***»» S 0 - 3 9 * ' T z o - a z a - i 9 £ 0 * 0 E E ' O 900Z L9LZ E Z6L xoz 0 0 ' * Z 9 0 - 3 £ 0 * 8 Z 0 - 3 9 E * T 0Z0*0 9 Z * 0 *£XZ ££££ E X*8 xoz 00-£Z 90-300*> E 0 - 3 6 V 8 0X0*0 6 X ' 0 Z8EZ *£££ E 968 xoz o o * z z 9 0 - 3 8 f » £ 0 - 3 * 6 * 6 EXO'O OZ'O Z9EZ 0££E £ *68 ooz oo - z z 
90-3Z . 8 -T £ 0 - 3 9 0 * S £ 0 0 * 0 LZ'O LZ9Z Z9LZ E 096 ooz oo- xz 9 0 - 3 8 9 * 1 £0 -3£8*9 *00*0 * * ' 0 X£6Z £*££ E 9X0X ooz oo - o z 9 0 - 3 I Z . - Z £ 0 - 3 £ 0 * 6 9 0 0 * 0 es'o 9 X * £ 989E £ 980X ooz 00-6X 90-309•£ Z 0 - 3 * 0 ' T 600*0 £9*0 *£9E 0E9£ E *ZXX ooz 09-8X 9 0 - 3 0 9 * * Z 0 - 3 6 I * T ZXO'O LL'Q *£6E 099£ £ 89XX ooz 00-8X 9 0 - 3 S £ * 9 Z 0 - 3 9 Z * X 9X0*0 Z 8 * 0 * * Z * 9 Z * £ E 90ZX ooz 0 9 * £ X 90-386*9 Z 0 - 3 S E * t 6X0*0 68*0 8 Z 9 * E9ZE E 09ZX ooz 00*£X 9 0 - 3 E * * 8 Z0-3SS*T 9 Z 0 * 0 ZO'X E9£» 090E £ 90£X ooz 09*9X ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ * ¥ ¥ ¥ « ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ * ' ¥ ¥ ¥ ¥ * ¥ ¥ ¥ ¥ ¥ ¥ * * « ¥ ¥ * ¥ ¥ ¥ ¥ ¥ * ¥ * ¥ ¥ ¥ ¥ ¥ ¥ ¥ ¥ * ¥ ¥ ¥ ¥ •»*»»»»* 9 0 " 3 » Z . * * Z0-360*T 110*0 8 X ' 0 9*£Z * X 0 * Z *£9 ooz 0 S - 9 X 9 0 - 3 9 T - T E O - a o o * s £ 0 0 * 0 OX'O 0 0 £ Z 0*0* z 99L ooz 0 0 ' 9 X i 0 - 3 6 8 ' E E 0 - 3 T T * E XOO'O 0Z*0 X £ X E 090V* z 098 ooz 09-9X £ 0 - 3 9 9 * 6 £ 0 - 3 * 8 * * ZOO'O X£*0 06£E ZEO* z 8*6 ooz 00-9X 90 -3£E*X EO-311'9 £ 0 0 * 0 6 £ * 0 E9X* 966E z £66 ooz 9£**X 9 0 - 3 9 8 ' T E0-3E6*9 900*0 * * * 0 8 * 9 * *06E z 0*0X ooz 09**X 9 0 - 3 * E * Z £0-3*9**. 9 0 0 * 0 6**0 Z £ 6 * £9£E z E80X ooz 9 Z * * X 9 0 - 3 9 8 ' Z £ 0 - 3 6 1 * 8 £ 0 0 * 0 £ 9 * 0 99Z9 099E z 9ZXX ooz oo - * x 90-aee-c £ 0 - 3 9 8 * 8 600*0 £ 9 * 0 0E*S 09EE z £9XX ooz 08*£X 9 0 - 3 9 8 ' E £ 0 - 3 0 £ * 6 XXO'O 09 *0 96*9 £XXE z XOZX ooz 0 9 * £ X 9 0 - 3 8 * * » £ 0 - 3 1 8 * 6 EXO'O £9-0 *9 *9 0£8Z z 9*ZX ooz O V E X 90 -36T-S Z0-3ET*T iLXO'O * £ ' 0 *0*9 099Z z 60EX ooz O Z * E X ( ,6BUI) (6eui) («ap) O p ) 'ts1 (-JH) aaa (ZH) (ZH) (ZH) (ZH^() JBA xeui . xeui wnN Zq e 03H3 Z H * 9 - r J 37dWVS SOIXSIXVXS aonaa SH0I.YN0S3H sa ic -d asaao i s x n a n 13 (panuxq.uoo) :zs. 
TABLE B3: PARAMETERS FOR SYNTHESIS OF /u/

FILTER 1st ORDER POLES         RESONATORS          ERROR STATISTICS
SAMPLE                              5 kHz TUNING   dB-difference  mag-difference
FREQ   a     b1    b2    b3    NUM                 max    var     max        var
(kHz)  (Hz)  (Hz)  (Hz)  (Hz)  BER  (Hz)   (Hz)    (dB)   (dB²)   (mag)      (mag²)

10.00  200   2500                                  0.00   0.000   0.00E+00   0.00E+00
10.03  200   2415                                  0.05   0.000   4.33E-06   7.45E-13
10.05  200   2331                                  0.09   0.000   1.21E-05   4.52E-12
10.06  200   2248                                  0.11   0.001   2.31E-05   1.47E-11
10.10  200   2165                                  0.12   0.001   3.84E-05   3.68E-11
**************************************************************************************
10.10  200   2240  8547                            0.25   0.003   1.07E-05   3.63E-12
10.20  200   2002  7259                            0.37   0.006   1.08E-05   6.59E-12
10.30  200   1827  6196                            0.48   0.009   1.03E-05   7.56E-12
10.40  200   1680  5443                            0.55   0.012   8.04E-06   6.67E-12
10.50  200   1552  4857                            0.58   0.012   8.19E-06   5.69E-12
10.60  200   1440  4367                            0.57   0.011   6.85E-06   5.24E-12
10.70  200   1350  3926                            0.56   0.011   7.96E-06   4.45E-12
10.80  200   1280  3520                            0.55   0.010   8.65E-06   4.64E-12
10.90  200   1255  3111                            0.60   0.013   8.58E-06   5.09E-12
11.00  200   1277  2692                            0.68   0.017   1.08E-05   9.50E-12
11.10  200   1377  2224                            0.74   0.020   2.02E-05   2.87E-11
**************************************************************************************
11.10  200   3318              1    2278   9965    1.18   0.060   2.02E-05   3.79E-11
11.20  200   3110              1    2430   9602    1.14   0.052   1.57E-05   2.39E-11
11.30  200   2907              1    2552   9187    1.09   0.046   1.51E-05   1.53E-11
11.40  200   2704              1    2650   8816    1.01   0.037   1.40E-05   1.38E-11
11.50  200   2538              1    2766   6592    0.96   0.033   1.50E-05   1.79E-11
11.60  200   2369              1    2867   8359    0.91   0.027   2.37E-05   2.73E-11
11.70  200   2224              1    2986   6204    0.86   0.023   3.01E-05   3.76E-11
11.80  200   2100              1    3121   8090    0.61   0.019   3.46E-05   4.58E-11
11.90  200   1989              1    3266   7988    0.76   0.016   3.69E-05   5.12E-11
12.00  200   1895              1    3416   7873    0.73   0.014   3.88E-05   5.44E-11
12.10  200   1799              1    3546   7710    0.67   0.012   4.08E-05   5.91E-11
12.20  200   1716              1    3674   7533    0.62   0.010   4.18E-05   6.11E-11
12.30  200   1672              1    3848   7398    0.64   0.010   3.83E-05   5.18E-11
12.40  200   1635              1    4011   7226    0.66   0.011   3.45E-05   4.27E-11
12.50  200   1599              1    4148   7012    0.68   0.011   3.12E-05   3.54E-11
12.60  200   1569              1    4264   6769    0.70   0.012   2.77E-05   2.86E-11
12.70  200   1543              1    4362   6509    0.73   0.013   2.36E-05   2.20E-11
12.80  200   1512              1    4425   6231    0.74   0.014   2.03E-05   1.71E-11
12.90  200   1488              1    4479   5952    0.77   0.015   1.60E-05   1.22E-11
13.00  200   1469              1    4517   5670    0.81   0.017   1.14E-05   8.41E-12
13.10  200   1401              1    4485   5399    0.75   0.014   1.06E-05   6.43E-12
13.20  200   1321              1    4439   5147    0.65   0.009   9.17E-06   4.51E-12
13.30  200   1223              1    4382   4918    0.47   0.004   6.76E-06   2.86E-12
13.40  200   1134              1    4334   4703    0.30   0.002   4.38E-06   1.51E-12
13.50  200   1029              1    4261   4508    0.15   0.002   3.15E-06   1.60E-12
13.60  200   949               1    4244   4319    0.24   0.004   4.26E-06   2.78E-12
13.70  200   870               1    4209   4141    0.37   0.009   9.11E-06   6.82E-12
13.80  200   802               1    4179   3971    0.50   0.013   1.67E-05   1.49E-11
13.90  200   740               1    4151   3609    0.64   0.018   2.58E-05   2.89E-11
14.00  200   698               1    4127   3651    0.67   0.017   3.66E-05   5.15E-11
**************************************************************************************

TABLE B3: (continued)

FILTER 1st ORDER POLES         RESONATORS          ERROR STATISTICS
SAMPLE                              5 kHz TUNING   dB-difference  mag-difference
FREQ   a     b1    b2    b3    NUM                 max    var     max        var
(kHz)  (Hz)  (Hz)  (Hz)  (Hz)  BER  (Hz)   (Hz)    (dB)   (dB²)   (mag)      (mag²)

14.00  200   2159              2    2661   7135    1.71   0.092   5.11E-05   1.17E-10
14.50  200   2017              2    3116   7100    1.74   0.093   4.38E-05   8.81E-11
15.00  200   1898              2    3509   6725    1.73   0.089   3.62E-05   6.54E-11
15.50  200   1786              2    3763   6128    1.69   0.082   2.81E-05   4.59E-11
16.00  200   1655              2    3852   5482    1.53   0.062   2.16E-05   3.07E-11
16.50  200   1505              2    3839   4919    1.21   0.034   1.67E-05   1.73E-11
17.00  200   1358              2    3798   4459    0.81   0.013   1.13E-05   5.98E-12
17.50  200   1205              2    3743   4090    0.27   0.006   6.86E-06   8.22E-12
18.00  200   1068              2    3697   3780    0.60   0.027   2.63E-05   4.04E-11
18.50  200   952               2    3662   3511    1.04   0.068   5.30E-05   1.29E-10
19.00  200   609               2    3617   3294    2.04   0.188   7.68E-05   2.93E-10
**************************************************************************************
19.00  200   2109              3    3569   6316    3.09   0.326   3.63E-05   8.34E-11
20.00  200   1935              3    3549   5585    2.75   0.238   3.47E-05   6.30E-11
21.00  200   1789              3    3522   5060    2.37   0.161   3.06E-05   5.02E-11
22.00  200   1666              3    3498   4663    1.97   0.099   2.60E-05   3.57E-11
23.00  200   1560              3    3474   4350    1.56   0.054   2.10E-05   2.15E-11
24.00  200   1465              3    3450   4099    1.11   0.024   1.53E-05   1.38E-11
25.00  200   1380              3    3426   3693    0.65   0.013   1.06E-05   1.77E-11
26.00  200   1302              3    3403   3722    0.61   0.020   1.75E-05   3.73E-11
27.00  200   1240              3    3389   3572    0.69   0.039   3.40E-05   7.80E-11
28.00  200   1189              3    3379   3441    0.87   0.061   5.18E-05   1.44E-10
29.00  200   1156              3    3376   3320    0.93   0.067   7.31E-05   2.56E-10
30.00  200   1122              3    3376   3211    1.07   0.087   1.03E-04   3.96E-10
**************************************************************************************
30.00  200   2012              4    3320   5038    3.80   0.490   4.53E-05   1.61E-10
35.00  200   1841              4    3301   4550    3.01   0.260   3.75E-05   7.41E-11
40.00  200   1735              4    3272   4247    2.51   0.161   3.21E-05   4.57E-11
45.00  200   1668              4    3257   4055    2.18   0.112   2.84E-05   3.34E-11
50.00  200   1621              4    3247   3925    1.93   0.082   2.55E-05   2.84E-11
60.00  200   1558              4    3232   3765    1.55   0.049   2.09E-05   3.05E-11
70.00  200   1519              4    3222   3673    1.29   0.034   1.77E-05   3.85E-11
80.00  200   1492              4    3213   3616    1.09   0.027   2.04E-05   4.69E-11
90.00  200   1472              4    3206   3579    0.93   0.025   2.35E-05   5.42E-11
100.0  200   1457              4    3200   3553    0.80   0.024   2.06E-05   5.86E-11

FIGURE B1: PARAMETERS FOR SYNTHESIS OF /a/
[Panels: DC Pole; Rolloff Poles; Rolloff Resonator Bandwidth. Horizontal axis: SAMPLING FREQUENCY (kHz).]

FIGURE B2: PARAMETERS FOR SYNTHESIS OF /i/
[Panels: DC Pole; Rolloff Poles; Rolloff Resonator Bandwidth; Fine-Tuning Resonator Frequency; Fine-Tuning Resonator Bandwidth. Horizontal axis: SAMPLING FREQUENCY (kHz).]

FIGURE B3: PARAMETERS FOR SYNTHESIS OF /u/
[Panels: DC Pole; Rolloff Poles;
Rolloff Resonator Bandwidth; Fine-Tuning Resonator Frequency; Fine-Tuning Resonator Bandwidth. Horizontal axis: SAMPLING FREQUENCY (kHz).]

FIGURE B4: ERROR VARIANCES FOR VOWEL SYNTHESIS FILTERS
[Panel shown: /a/. Horizontal axis: SAMPLING FREQUENCY (kHz).]

APPENDIX C: REVIEW OF FOURIER SPECTRUM ANALYSIS

This appendix briefly reviews some issues in Fourier transform spectrum analysis. A number of generalizations and omissions have been made in the interests of brevity and simplicity. Readers are urged to consult a general text on digital signal processing, such as Childers and Durling (1975), Oppenheim and Schafer (1975) or Rabiner and Schafer (1978), for a more comprehensive treatment.

Fourier spectrum analysis using a digital computer typically involves the following steps. Sampling is performed to convert the data to a form that can be manipulated by a computer. A segment of data is selected for analysis. The data segment is optionally multiplied by a window function to reduce spectral leakage. Finally, the spectrum is estimated using a discrete Fourier transform (DFT) algorithm. Critical issues and trade-offs encountered at each stage of this process are discussed below.

THE FOURIER SERIES AND THE FOURIER TRANSFORM

An underlying principle in Fourier spectrum analysis is that any periodic waveform can be expressed as a weighted sum of sinusoidal components. Thus, it is always theoretically possible to derive a set of magnitude coefficients $X_k$ and phase coefficients $\phi_k$ such that:

$$ v(t) = \sum_{k=0}^{\infty} X_k \sin( k \Omega t + \phi_k ) \tag{C1} $$

where
    v(t) = a periodic waveform
    T = the length of one period of v(t)
    $\Omega$ = $2\pi/T$

Only one cycle of v(t) needs to be analyzed in order to obtain $X_k$ and $\phi_k$, as knowledge of that cycle completely defines v(t) for all time.
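The claim that one cycle suffices can be checked numerically. The following is a minimal sketch, not taken from the thesis: it assumes the cycle is sampled at evenly spaced points, uses a direct DFT over exactly one period, and recovers $X_k$ and $\phi_k$ for k >= 1 (the k = 0 term is omitted because $X_0$ and $\phi_0$ cannot be separated from a single constant). The function names and the three test harmonics are hypothetical.

```python
import cmath
import math


def synthesize_cycle(components, num_samples):
    """Sample one period of v(t) = sum_k X_k * sin(k*Omega*t + phi_k).

    `components` maps the harmonic number k to the pair (X_k, phi_k).
    """
    cycle = []
    for n in range(num_samples):
        angle = 2.0 * math.pi * n / num_samples  # Omega * t across one period
        cycle.append(sum(x_k * math.sin(k * angle + phi_k)
                         for k, (x_k, phi_k) in components.items()))
    return cycle


def fourier_series_coefficients(cycle, num_harmonics):
    """Estimate (X_k, phi_k) for k = 1..num_harmonics from one sampled cycle.

    Requires num_harmonics < len(cycle) / 2 so that no harmonic is aliased.
    """
    m = len(cycle)
    estimates = {}
    for k in range(1, num_harmonics + 1):
        # Complex DFT coefficient of harmonic k, normalized by the cycle length.
        c_k = sum(cycle[n] * cmath.exp(-2j * math.pi * k * n / m)
                  for n in range(m)) / m
        magnitude = 2.0 * abs(c_k)
        # sin(theta + phi) = cos(theta + phi - pi/2), hence the pi/2 shift.
        raw_phase = cmath.phase(c_k) + math.pi / 2.0
        phase = math.atan2(math.sin(raw_phase), math.cos(raw_phase))
        estimates[k] = (magnitude, phase)
    return estimates


# Hypothetical test waveform: three harmonics with known magnitude and phase.
true_components = {1: (1.0, 0.3), 2: (0.5, -1.0), 3: (0.25, 2.0)}
one_cycle = synthesize_cycle(true_components, 64)
estimated = fourier_series_coefficients(one_cycle, 3)
for k in sorted(estimated):
    print(k, estimated[k])
```

Under these assumptions the recovered pairs agree with the values used for synthesis to within rounding error.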
The series in Eq. (C1) is called a Fourier series, and the plot of $|X_k|$ as a function of k is called the amplitude frequency spectrum of v(t).

The Fourier transform is an extension of the Fourier series for a nonperiodic signal. The nonperiodic signal is simply assumed to be one complete period from a longer periodic signal. The coefficients of the associated Fourier series form the Fourier transform.

SAMPLING

"Sampling" is performed to convert a waveform to a representative series of numbers. The most common method of sampling is to observe and store the instantaneous level of the waveform at evenly spaced points in time. If it is known that the signal is "band-limited", that is, that the spectrum of the signal has no frequency components above some frequency ($f_{max}$), then it can be shown that sampling the signal at a frequency greater than $2 f_{max}$ will provide a unique representation. The limiting value $2 f_{max}$ is called the Nyquist rate.

If the sampling frequency is below the Nyquist rate, then a phenomenon called "aliasing" occurs. High frequency components become indistinguishable from some of the lower frequency components. The standard method for preventing aliasing is to low-pass filter the signal prior to sampling. A low-pass filter "passes" low frequency components and attenuates high frequency components.

WINDOWING, LEAKAGE, AND RESOLUTION

If the signal is very long, or if the spectrum of the signal varies with time, then it is necessary to choose a segment of data for analysis. This is equivalent to multiplication of the signal by a rectangular window function (Eq. (C2)). The effects of segmentation are described below, along with common methods for minimizing these effects.

It can be shown that the Fourier transform of the product of two waveforms can be obtained from the Fourier transform of each of the waveforms through a process called convolution.
When a signal is multiplied by a window, this implies that each frequency component in the signal spectrum is replaced with a copy of the Fourier transform of the window function, centred on that component and scaled by its amplitude. The resulting spectrum estimate is the sum of overlapping contributions from each of the substituted window transforms.

$$ w(t) = \begin{cases} 0, & t < t_1 \\ 1, & t_1 \le t \le t_2 \\ 0, & t > t_2 \end{cases} \tag{C2} $$

Insight into the influence of a window function on a Fourier spectrum estimate can be gained from Figure C1. The spectrum of a sinusoid of infinite duration is simply an impulse at the frequency of oscillation. The spectra in plot A and plot B result when the sinusoid is viewed through a rectangular window. If the frequency of the sinusoid is aligned with one of the points in the computed spectrum, as in plot A, then an impulse is obtained from a DFT. However, if the sinusoid is not conveniently aligned, as in plot B, then the spectrum estimate deviates from the desired impulse. This can be a significant problem, as spectral component alignment is not guaranteed in most applications.

The spreading of spectral energy caused by the window function is called spectral leakage. A common method for reducing leakage is to taper the ends of the window function. Unfortunately, tapering also results in a loss of spectral resolution. For example, one popular window function, called a Hanning window, is

$$ w(t) = \begin{cases} 0, & t < t_1 \\ 0.5 \left[ 1 - \cos\!\left( 2\pi \, \dfrac{t - t_1}{t_2 - t_1} \right) \right], & t_1 \le t \le t_2 \\ 0, & t > t_2 \end{cases} \tag{C3} $$

The effect of the Hanning window on the spectrum of a sinusoid is presented in plot C and plot D of Figure C1. The attenuation of the side lobes in these plots, when compared to plot A and plot B, indicates that a reduction of leakage has been achieved. However, the increased width of the main lobe implies a concomitant reduction in the spectral resolution.

FIGURE C1: THE FREQUENCY SPECTRA OF WINDOWED SINUSOIDS
The plots labelled A and B contain the spectra of 5 Hz and 5.5 Hz sinusoids when a rectangular window is applied. The plots labelled C and D provide spectra for the same sinusoids when a Hanning window is applied. Both windows are 1 second in duration. The evenly spaced points in these plots are the values produced by a discrete Fourier transform.
[Four panels, A to D. Horizontal axis: Frequency (Hz).]

RELATIONSHIPS BETWEEN THE TIME AND FREQUENCY DOMAINS

Table C1 summarizes some relationships between the frequency domain and the time domain. The first relationship arises from the observation that the highest frequency that can be uniquely represented is one half of the sampling frequency, or the Nyquist frequency. Thus, the highest meaningful frequency component in the spectrum of a sampled data-segment is the Nyquist frequency. The "spectral bandwidth" (i.e., the range of frequencies that can be meaningfully represented) is directly determined by the sampling frequency.

TABLE C1: RELATIONSHIPS BETWEEN THE TIME AND FREQUENCY DOMAINS

Each time domain parameter is paired with the frequency domain characteristic that it influences. Examples are included in brackets below each pair.

TIME DOMAIN                     FREQUENCY DOMAIN
Sampling Frequency       <->    Spectral Bandwidth
(20 kHz)                        (0 to 10 kHz)
Analysis Window Length   <->    Spectral Component Spacing
(0.5 sec)                       (2 Hz)
Analysis Window Shape    <->    Spectral Resolution
(see Figure C1 and associated text)

It is significant that spectral component spacing and spectral resolution are listed as separate issues in Table C1. Spectral resolution relates to the ability to separate a signal into its component frequencies. It is easy to see that the spacing of components in a computed spectrum limits the maximum attainable spectral resolution. However, as discussed above in conjunction with Figure C1, spectral resolution is also affected by the shape of the window function.
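The behaviour summarized in Figure C1 can be reproduced with a short numerical experiment. This is a sketch under stated assumptions rather than the procedure used in the thesis: the 5 Hz and 5.5 Hz sinusoids and the 1 second window come from the figure caption, while the 32 Hz sampling frequency, the direct (slow) DFT, the helper names and the "far leakage" measure (largest magnitude at least 4 bins from the peak, relative to the peak) are illustrative choices.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return |X[k]| for k = 0 .. N/2 using a direct (slow) DFT."""
    n_points = len(samples)
    mags = []
    for k in range(n_points // 2 + 1):
        coeff = sum(samples[n] * cmath.exp(-2j * math.pi * k * n / n_points)
                    for n in range(n_points))
        mags.append(abs(coeff))
    return mags


def hanning_window(n_points):
    """Sampled form of a Hanning window spanning the whole data-segment."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / n_points))
            for n in range(n_points)]


def far_leakage_ratio(frequency_hz, window, sampling_hz):
    """Largest magnitude at least 4 bins from the peak, relative to the peak."""
    n_points = len(window)
    samples = [window[n] * math.sin(2.0 * math.pi * frequency_hz * n / sampling_hz)
               for n in range(n_points)]
    mags = dft_magnitudes(samples)
    peak_bin = max(range(len(mags)), key=lambda k: mags[k])
    far = [mags[k] for k in range(len(mags)) if abs(k - peak_bin) >= 4]
    return max(far) / mags[peak_bin]


SAMPLING_HZ = 32   # assumed sampling frequency (not from the thesis)
N_POINTS = 32      # 1 second of data, so the DFT bins are 1 Hz apart
rectangular = [1.0] * N_POINTS
hanning = hanning_window(N_POINTS)

results = {}
for name, window in (("rectangular", rectangular), ("hanning", hanning)):
    for freq in (5.0, 5.5):
        results[(name, freq)] = far_leakage_ratio(freq, window, SAMPLING_HZ)
        print(f"{name:12s} {freq:4.1f} Hz  far-leakage ratio = "
              f"{results[(name, freq)]:.2e}")
```

With these settings the aligned 5 Hz sinusoid shows essentially no energy far from its peak for either window, whereas the 5.5 Hz sinusoid leaks noticeably under the rectangular window and much less under the Hanning window, at the cost of a wider main lobe.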
There are many instances where the resolution is lower than that indicated by the component spacing.

DISCRETE FOURIER TRANSFORM ALGORITHMS

Many algorithms have been devised for computing the discrete Fourier transform (DFT). Algorithms derived directly from the mathematical formulation for the DFT are inefficient, with the number of required computations being roughly proportional to the square of the number of data points. Fortunately, fast Fourier transform (FFT) algorithms have been developed for which the number of required computations is roughly proportional to N*log(N). FFTs are made possible by requiring that the number of sample points (N) be a highly composite number (i.e., a number that can be split into many integer factors). Most popular FFTs require that N be an integer power of 2.

A standard method for analyzing a data-segment that does not meet the length restriction of FFTs is to append zeros. This will alter the spectral component spacing, but should not change the overall shape of the spectral estimate. If a nonrectangular window function is used to reduce leakage, it is important that it be applied prior to appending zeros.

PITCH SYNCHRONIZATION AND MEASUREMENT OF SPECTRAL NOISE IN VOWELS

The need for pitch synchronization in the spectral analysis of vowels arises from the implied periodic extension of the data-segment. If the data-segment spans an integer number of pitch-periods, then periodic extension produces an intuitively appealing waveform. The result is not so appealing when the number of pitch-periods spanned by the data-segment is not conveniently an integer.

Another way of viewing pitch synchronization is in terms of the alignment of harmonic components in the computed spectrum estimate. When the data-segment spans N pitch-periods, where N is an integer, the harmonic peaks are aligned with every N'th coefficient in the computed spectrum.
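The alignment of harmonics with every N'th coefficient can be illustrated with a synthetic, perfectly periodic waveform. The sketch below is illustrative only: the 16-sample pitch-period, the three harmonic amplitudes, the segment lengths (4 and 4.5 pitch-periods) and the function names are assumptions, and a direct DFT is used in place of an FFT.

```python
import cmath
import math

PITCH_PERIOD = 16                       # samples per pitch-period (assumed)
HARMONIC_AMPLITUDES = (1.0, 0.5, 0.25)  # hypothetical harmonic magnitudes


def vowel_like(num_samples):
    """A strictly periodic test waveform built from three harmonics."""
    return [sum(a * math.sin(2.0 * math.pi * (h + 1) * n / PITCH_PERIOD)
                for h, a in enumerate(HARMONIC_AMPLITUDES))
            for n in range(num_samples)]


def bin_energies(segment):
    """Return {k: |X[k]|^2} for k = 1 .. N/2 using a direct DFT."""
    n_points = len(segment)
    energies = {}
    for k in range(1, n_points // 2 + 1):
        coeff = sum(segment[n] * cmath.exp(-2j * math.pi * k * n / n_points)
                    for n in range(n_points))
        energies[k] = abs(coeff) ** 2
    return energies


def aligned_energy_fraction(segment, spacing):
    """Fraction of spectral energy found in bins that are multiples of `spacing`."""
    energies = bin_energies(segment)
    aligned = sum(e for k, e in energies.items() if k % spacing == 0)
    return aligned / sum(energies.values())


synchronous = vowel_like(4 * PITCH_PERIOD)                        # 4 pitch-periods
asynchronous = vowel_like(4 * PITCH_PERIOD + PITCH_PERIOD // 2)   # 4.5 pitch-periods

frac_sync = aligned_energy_fraction(synchronous, 4)
frac_async_4 = aligned_energy_fraction(asynchronous, 4)
frac_async_5 = aligned_energy_fraction(asynchronous, 5)
print("4 periods, every 4th bin:  ", frac_sync)
print("4.5 periods, every 4th bin:", frac_async_4)
print("4.5 periods, every 5th bin:", frac_async_5)
```

For the pitch-synchronous segment essentially all of the energy falls on every fourth coefficient. For the 4.5-period segment no regular bin spacing captures the harmonics, so much of the energy appears between the expected positions.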
Alignment is lost for non-integer values of N, and the leakage problems illustrated in Figure C1 reappear.

APPENDIX D: DIGITAL RESAMPLING WITH LAGRANGE INTERPOLATION

This appendix gives details of digital resampling through Lagrange interpolation. A variety of methods for digital resampling have been developed (e.g., Crochiere & Rabiner 1981). Lagrange interpolation (Schafer & Rabiner 1973) was chosen because it is relatively simple to implement and is well known.

Digital resampling is a process where interpolation is used to simulate a change in the sampling frequency. A given data-segment w(j) that spans J points is "resampled" in order to obtain a new sequence v(k) that spans K points. This is achieved by using interpolation to estimate w(x) at K values of x that are evenly spaced between 1 and J (Eq. (D1)).

$$ v(k) = w(x), \qquad k = 1, 2, \ldots, K \tag{D1} $$

where
    w(j) = the original data sequence, j = 1, 2, ..., J
    v(k) = the new data sequence, k = 1, 2, ..., K
    x = the real valued offset into the original sequence
      = (k-1)*(J-1)/(K-1) + 1

Lagrange interpolation (Eq. (D2)) was used to estimate w(x). The method is based on fitting a Q'th order polynomial through Q+1 sample points surrounding the point to be estimated. The order of the interpolator is the order of the fitting polynomial. Thus, a Q'th order Lagrange interpolator estimates each unknown point using a weighted sum of Q+1 nearby points. Formulas for the summation weights ($L_i$) are given in Table D1.

$$ w(x) = \sum_{i=1}^{Q+1} L_i \, w(j_0 + i + C), \qquad 1 \le x \le J \tag{D2} $$

where
    C = 1 if $1 \le x < 2$ and $Q \ge 2$
      = -1 if $J-1 \le x \le J$ and Q = 3
      = 0 otherwise
    Q = the order of the interpolator ($Q \le 3$)
    $L_i$ = the Lagrange weighting factor (see Table D1)
    $j_0$ = int(x) - int(Q/2) - 1

TABLE D1: SUMMATION WEIGHTS FOR LAGRANGE INTERPOLATION

This table compiles formulas for determining the Lagrange summation weights ($L_i$) for Eq. (D2).
"Q" is the order of the interpolator.
"$\delta$" is the fractional part of the real valued offset into the sample sequence.
The following modifications are required near the ends of a finite sample sequence.
If $Q \ge 2$ and $1 \le x < 2$, then replace all occurrences of $\delta$ with $\delta - 1$.
If Q = 3 and $J-1 \le x \le J$, then replace all occurrences of $\delta$ with $\delta + 1$.

Q   L1                                   L2                                   L3                            L4
1   $1-\delta$                           $\delta$
2   $-\delta(1-\delta)/2$                $(1+\delta)(1-\delta)$               $\delta(1+\delta)/2$
3   $-\delta(1-\delta)(2-\delta)/6$      $(1+\delta)(1-\delta)(2-\delta)/2$   $(1+\delta)\delta(2-\delta)/2$   $-(1+\delta)\delta(1-\delta)/6$

APPENDIX E: STRATEGIES FOR HNR ERROR ANALYSIS

This appendix contains a description of manipulations for obtaining Eq. (6.23) from Eq. (6.22). Terms of the form $x^{(j)}$ denote the j'th derivative of x.

Substitution of a Taylor series expansion of $v_i(\tau + \delta_i)$ for estimation of $e_i(\tau)$ in Eq. (6.22) results in a rather unwieldy expression. Fortunately, a number of the elements can be eliminated. Firstly, because the mean of the product of independent random variables is the product of the means, many terms can be eliminated when the noise component ($vn_i(\tau)$) and/or the error variable ($\delta_i$) are assumed to be independent. Secondly, through integration by parts it can be shown that

$$ \int_0^{T_i} v_i^{(j-k)}(\tau)\, v_i^{(j+k)}(\tau)\, d\tau = \sum_{m=1}^{k} (-1)^{k-m} \Big[ v_i^{(j-m)}(\tau)\, v_i^{(j+m-1)}(\tau) \Big]_0^{T_i} + (-1)^{k} \int_0^{T_i} \Big[ v_i^{(j)}(\tau) \Big]^2 d\tau \tag{E1} $$

and

$$ \int_0^{T_i} v_i^{(j-k)}(\tau)\, v_i^{(j+k-1)}(\tau)\, d\tau = \sum_{m=2}^{k} (-1)^{k-m} \Big[ v_i^{(j-m)}(\tau)\, v_i^{(j+m-2)}(\tau) \Big]_0^{T_i} + 0.5\,(-1)^{k-1} \Big[ \big( v_i^{(j-1)}(\tau) \big)^2 \Big]_0^{T_i} \tag{E2} $$

Since the endpoints of the pitch-periods line up, it follows that $v_i^{(j)}(T_i) = v_{i+1}^{(j)}(0)$ for all i and j, so the terms in Eq. (E1) and Eq. (E2) can be usefully rearranged. If one further assumes that $v_1^{(j)}(0) = v_N^{(j)}(T_N)$ for all j, then all terms can be paired in the rearrangement. Finally, if it is assumed that $E[ v_i^{(j)}(0) ] = E[ v_i^{(j)}(T_i) ]$ for all i and j, then all but the integral term in Eq. (E1) and all terms in Eq. (E2) are zero.
DECOMPOSITION OF THE NUMERATOR OF EQ. (6.22)

Expansion of the square in the numerator of Eq. (6.22), followed by substitution of the Taylor series for $e_i(\tau)$, leads to

$$
\begin{aligned}
\mathrm{NUMER} ={}& \int_0^{T_{imax}} n(\tau)\, vh^2(\tau)\, d\tau
 + 2 \sum_{j=1}^{\infty} \frac{1}{j!} \sum_{i=1}^{N} \delta_i^{\,j} \int_0^{T_{imax}} vh(\tau)\, v_i^{(j)}(\tau)\, d\tau \\
&+ \sum_{j=1}^{\infty} \frac{1}{(j!)^2} \int_0^{T_{imax}} \frac{1}{n(\tau)} \left[ \sum_{i=1}^{N} \delta_i^{\,j}\, v_i^{(j)}(\tau) \right]^2 d\tau \\
&+ 2 \sum_{j=2}^{\infty} \sum_{k=1}^{j-1} \int_0^{T_{imax}} \frac{1}{n(\tau)}\,
   \frac{ \left[ \sum_{i=1}^{N} \delta_i^{\,j-k}\, v_i^{(j-k)}(\tau) \right] \left[ \sum_{i=1}^{N} \delta_i^{\,j+k}\, v_i^{(j+k)}(\tau) \right] }{ (j-k)!\,(j+k)! }\, d\tau \\
&+ 2 \sum_{j=2}^{\infty} \sum_{k=1}^{j-1} \int_0^{T_{imax}} \frac{1}{n(\tau)}\,
   \frac{ \left[ \sum_{i=1}^{N} \delta_i^{\,j-k}\, v_i^{(j-k)}(\tau) \right] \left[ \sum_{i=1}^{N} \delta_i^{\,j+k-1}\, v_i^{(j+k-1)}(\tau) \right] }{ (j-k)!\,(j+k-1)! }\, d\tau
\end{aligned}
\tag{E3}
$$

Now, separate the terms involving $v_i^{(j)}(\tau)$ into harmonic component and noise component derivatives, and assume that the noise component and its derivatives ($vn_i^{(j)}(\tau)$) are zero mean random variables of i and $\tau$. The desired result for Eq. (6.23) can be obtained by assuming that the harmonic component ($vh(\tau)$), the noise component ($vn_i(\tau)$) and the marking error ($\delta_i$) are mutually independent, and that the derivatives of the harmonic component at its start point are equal to the same derivatives at its endpoint.

DECOMPOSITION OF THE DENOMINATOR OF EQ. (6.22)

After expanding the squares, separating into harmonic and noise components, and recalling the assumptions of independence and zero mean described during decomposition of the numerator, the following expression for the denominator of Eq. (6.22) can be obtained.

$$
\begin{aligned}
\mathrm{DENOM} ={}& \sum_{i=1}^{N} \int_0^{T_i} vn_i^2(\tau)\, d\tau
 + 2 \sum_{j=1}^{\infty} \frac{1}{j!} \sum_{i=1}^{N} \delta_i^{\,j} \int_0^{T_i} vn_i(\tau)\, vn_i^{(j)}(\tau)\, d\tau \\
&+ \sum_{i=1}^{N} \int_0^{T_i} e_i^2(\tau)\, d\tau
 - \int_0^{T_{imax}} \frac{1}{n(\tau)} \left[ \sum_{i=1}^{N} e_i(\tau) \right]^2 d\tau
\end{aligned}
\tag{E4}
$$

Assume that $vn_1^{(j)}(0) = vn_N^{(j)}(T_N)$ for all j. Application of Eq. (E1) and Eq. (E2) to the second term of Eq. (E4) leads to summations with respect to i that contain the factor $\delta_i^{\,j} - \delta_{i+1}^{\,j}$. Since $\delta_i$ was assumed independent of the other components, these summations can be eliminated. The denominator of Eq. (6.23) can then be obtained using a similar approach as for the numerator.
APPENDIX F: SUMMARY OF VOWEL PERTURBATION MEASURES

This appendix compiles the names of the vowel perturbation measures and relates them to the algorithms in Chapter 5. The names are summarized in Table F1. The measures were separated into four groups: "magnitude", "log-magnitude", "pattern" and "correlation". Perturbation magnitude is the degree to which a value deviates from its predicted "normal" state. The pattern and correlation measures are theoretically independent of perturbation magnitude. Their value lies in the detection of abnormal time dependencies and interrelationships.

MAGNITUDE AND LOG-MAGNITUDE MEASURES

All measures of jitter or shimmer magnitude were based on the RAP (Eq. (5.1)), with M=3, CTR=0, K=1 and $a_j = 1/3$. "Jitter" (PDAV) is perturbation to pitch-period duration. "Amplitude shimmer" (PAAV) is perturbation to pitch-period peak amplitude. "Stddev shimmer" (PSAV) is perturbation to pitch-period standard deviation, where the standard deviation is computed over a time interval that is equal to the shortest pitch-period duration. The log-magnitude measures of jitter and shimmer were obtained using

$$ \mathrm{P\%AVDB} = -10 \log_{10} \left[ \mathrm{P\%AV} \right] \tag{F1} $$

where % = "D", "A" or "S", depending on the type of perturbation.

Three measures of time domain noise were defined. "HNRL" is the harmonics-to-noise ratio (Eq. (5.3)), computed with $T_h = T_{imax}$ and $M_i = 1$. This configuration is roughly the same as Yumoto's HNR (Eq. (5.2)). "HNRN" is Eq. (5.3) computed with $T_h = T_{imin}$ and $M_i$ chosen as in Eq. (5.9) to normalize the power in each pitch-period. Finally, "CF" is the Correlation Factor in Eq. (5.14), computed with K=1 and $T_h = T_{imin}$.

"SHNRros" and "SHNRsor" are the spectral harmonics-to-noise ratios defined in Eq. (5.16) and Eq. (5.17), respectively. The subscript "ros" stands for ratio-of-sums, and the subscript "sor" stands for sum-of-ratios.
For calibration in Chapter 7, NPP was varied between 2 and 5, NSKIP was 0 and NMAX was set so that the analysis included frequency components between 0 and 5000 Hz. Fourier coefficients were computed using an FFT algorithm preceded by second order Lagrange resampling, and a rectangular window was applied. For real vowels in Chapter 8, NPP was 4, NSKIP was 1, and a Hanning window was applied.

PATTERN AND CORRELATION MEASURES

The cyclic perturbation factors PDCPF, PACPF and PSCPF used the algorithm in Eq. (5.20) with M=3, CTR=0, K=2 and $a_j = 1/3$. CFCPF is associated with Eq. (5.21) and CFCPF2 is associated with Eq. (5.22). These measures should be sensitive to alternate cycle periodicity in the perturbations.

The directional perturbation quotients (PDDPQ, PADPQ, PSDPQ and CFDPQ) used the algorithm in Eq. (5.18), with M=3, CTR=0, K=1 and $a_j = 1/3$.

The perturbation correlations are correlation coefficients between time functions of the perturbations. Time functions for jitter and shimmer were obtained from PDAV, PAAV and PSAV. The time function for noise was obtained from CF with its logarithm removed. The mean of each perturbation sequence was removed. Each name takes the form "xxXyy", where "xx" and "yy" specify the perturbation type: "PD" = jitter, "PA" = amplitude shimmer, "PS" = stddev shimmer, and "CF" = time domain noise.

TABLE F1: SUMMARY OF NAMES OF VOWEL MEASURES

This table summarizes names of measures of vowel perturbation and groups them according to the characteristic that they were designed to measure.
TYPE OF MEASURE | JITTER       | SHIMMER (AMPLITUDE) | SHIMMER (STDDEV) | NOISE (TIME DOMAIN)   | NOISE (SPECTRAL)
MAGNITUDE       | PDAV         | PAAV                | PSAV             |                       |
LOG-MAGNITUDE   | PDAVDB       | PAAVDB              | PSAVDB           | HNR_L, HNR_N, CF      | SHNR_ros, SHNR_sor
PATTERN         | PDCPF, PDDPQ | PACPF, PADPQ        | PSCPF, PSDPQ     | CFCPF, CFCPF2, CFDPQ  |
CORRELATION     | PDXPA, PDXPS, PDXCF, PAXPS, PAXCF, PSXCF (all perturbation types)

PUBLICATIONS

COX N.B. (1981). "The Development of a Microcomputer-Based Peri-operative Patient Monitoring System", M.A.Sc. Thesis, University of British Columbia.

SMALL C.F., MCEWEN J.A., COX N.B. & JOHNSON D.L. (1981). "Evaluating Automated Blood Pressure Measuring Devices", The Lancet, Oct. 10.

MORRISON M.D. & COX N.B. (1983). "The Otolaryngologist and the Voice; Computer Analysis", Annals R.C.P.S.C., 16, 569-574.

COX N.B. & MORRISON M.D. (1983). "Acoustic Analysis of Voice for Computerized Laryngeal Pathology Assessment", Canadian Journal of Otolaryngology, 12, 295-302.

COX N.B., ITO M.R. & MORRISON M.D. (1986). "Calibration of Computed Features of Isolated Vowels using Synthetic Vowel Waveforms", Journal of the Acoustical Society of America, Suppl. 1, S95.

COX N.B., MORRISON M.D. & ITO M.R. (1986). "Optimizing Pitch-Period Markers prior to Extracting Features from Isolated Vowels", Proceedings of the 12th International Congress on Acoustics, A1-7.

DURHAM J.S., MORRISON M.D., RAMMAGE L.A. & COX N.B. (1986). "Voice Analysis in T1 Glottic Carcinoma", Annual Meeting of the Canadian Otolaryngological Society, Winnipeg, Manitoba.

COX N.B., ITO M.R. & MORRISON M.D. (1989). "Technical Considerations in Computation of Spectral Harmonics-to-Noise Ratios for Sustained Vowels", Journal of Speech and Hearing Research, 203-218.

COX N.B., MORRISON M.D. & ITO M.R. (1989). "Data Labelling and Sampling Effects in Harmonics-to-Noise Ratios", Journal of the Acoustical Society of America, April.

COX N.B. & MORRISON M.D. (in press). "Acoustic Analysis of Voice for Computerized Laryngeal Pathology Assessment", Extracta, Acron Publishers, Berlin.

WONG D., ITO M.R., COX N.B. & TITZE I.R. (in press). "A Hybrid Model of Vocal Fold Vibration, Part 1: Application to the Vibration of the Normal Vocal Fold", Journal of the Acoustical Society of America.

COX N.B., ITO M.R. & MORRISON M.D. (in press). "Quantization and Measurement Errors in the Analysis of Short-Time Perturbations in Sampled Data", Journal of the Acoustical Society of America.

PRESENTATIONS

COX N.B. & ALEXANDER P. (1982). "Acoustic Analysis of Voice using a Computer: Theory and Practice", Canadian Speech and Hearing Association AGM, Vancouver.

COX N.B. & MORRISON M.D. (1983). "Computerized Laryngeal Pathology Assessment: Physical Characteristics and Acoustic Analysis", Canadian Acoustical Association AGM.

COX N.B. (1984). "Evaluation and Development of Computerized Laryngeal Pathology Assessment Techniques", Ph.D. Candidacy Exam, University of British Columbia.

COX N.B. (1988). "Evaluation and Development of Computerized Laryngeal Pathology Assessment Techniques", Ph.D. Departmental Exam, University of British Columbia.

TECHNICAL REPORTS

SMALL C.F., COX N.B. & McEWEN J.A. (1981). "An Autosphygmomanometer Tester, Final Report to the Canadian Bureau of Medical Devices", Vancouver General Hospital.

SMALL C.F., COX N.B. & McEWEN J.A. (1981). "An Autosphygmomanometer Tester - Supplementary Report to the Canadian Bureau of Medical Devices", Vancouver General Hospital.
Item Metadata
Title | Assessment of vocal pathology through computerized analysis of perturbation in vowels |
Creator | Cox, Neil Bernard |
Publisher | University of British Columbia |
Date Issued | 1988 |
Description | This thesis involved the development, validation and "calibration" of computerized methodologies for analysis of short-time perturbations in vowels, including mathematical analyses of the effect of measurement errors, verification using synthesized data, and evaluation using real data. Such methodologies have been proposed for improved diagnosis and management of laryngeal pathology. Significant effects were observed in mathematical analyses of quantization and pitch-period demarcation for three popular algorithms; the harmonics-to-noise ratio (HNR), the relative average perturbation (RAP) and the directional perturbation quotient (DPQ). A severe underestimation of the HNR caused by such errors was demonstrated. The effect was shown to depend on high frequency components of the vowel. Errors affecting the use of the RAP in measurement of jitter and shimmer were quantified, and methods of compensation were proposed. The DPQ demonstrated a dependence on perturbation magnitude. Such errors influence the interpretation and comparison of results. A number of new measures were developed. The RAP and the DPQ were generalized for variation of the number and spacing of points. The HNR was modified to account for a data offset and for reduction of the influence of jitter and shimmer. A new measure of time domain noise called the correlation factor (CF) was introduced, along with new measures of cyclic perturbation. Issues in Fourier spectrum analysis that affect measures of spectral noise were discussed. Methods for taking advantage of fast Fourier transforms and window tapering were described, along with methods for reducing dependence on formant structure. A new method for "optimizing" pitch-period demarcation markers was shown to be effective at reducing errors for all but the most severely perturbed waveforms. Cross-correlation was combined with parabolic interpolation to obtain high resolution pitch-period demarcation at moderate sampling frequencies. 
An analysis of synthetic vowels was used to comparatively evaluate the influences of fundamental frequency, vowel type, perturbation type, perturbation level, pitch-period demarcation and quantization. Some findings were: 1) Interpolation is recommended for most measures when the sampling frequency is 20 kHz or less. 2) Optimization of pitch-period markers significantly improved the analyses. 3) Both the offset and the accuracy of pitch-period demarcation can significantly affect measures of time domain noise. 4) Measures of shimmer and noise were affected by fundamental frequency and vowel type. 5) Jitter affected measures of other characteristics. 6) Window tapering reduced the sensitivity of measures of spectral noise to pitch-period demarcation errors. 7) Measures of spectral noise were far more sensitive to jitter than measures of time domain noise. Prolongations of /a/ from 206 male subjects and 194 female subjects were analyzed. The computed measures were correlated with subjective judgements of hoarseness, and used to discriminate among pathologies. Some findings were: 1) Logarithmic transformation was recommended for measures of jitter and shimmer. 2) Measures of time domain noise were generally superior to measures of jitter, shimmer or spectral noise. 3) The best single measure was the correlation factor (CF). 4) The correlation with hoarseness was improved through linear combination of the CF with a measure of jitter, leading to r≈.84 for males and r≈.80 for females. 5) Segregation of sexes was recommended. 6) Improved classification for males was obtained through separation into four diagnostic classes. 7) Improved classification for both males and females was obtained through inclusion of measures of perturbation patterns. 8) In an open test, the best classifiers had an average recognition rate of approximately 74% for distinguishing normal speakers, and 71% for detecting cancer subjects. 
9) Computer classification matched or exceeded the ability of trained listeners. |
Genre | Thesis/Dissertation |
Type | Text |
Language | eng |
Date Available | 2010-10-11 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0065577 |
URI | http://hdl.handle.net/2429/29081 |
Degree | Doctor of Philosophy - PhD |
Program | Electrical and Computer Engineering |
Affiliation | Applied Science, Faculty of; Electrical and Computer Engineering, Department of |
Degree Grantor | University of British Columbia |
Campus | UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |