Open Collections

UBC Theses and Dissertations

Towards an objective measure of speakers' intelligibility derived from the speech wave envelope
Hoek, Dorothy Christine, 1988


TOWARDS AN OBJECTIVE MEASURE OF SPEAKERS' INTELLIGIBILITY DERIVED FROM THE SPEECH WAVE ENVELOPE
by
DOROTHY C. HOEK
B.A., University of British Columbia, 1985
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE SCHOOL OF AUDIOLOGY AND SPEECH SCIENCES, THE FACULTY OF MEDICINE
We accept this thesis as conforming to the required standard
THE UNIVERSITY OF BRITISH COLUMBIA
August, 1988
© Dorothy Christine Hoek, 1988
In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.
Department
The University of British Columbia
1956 Main Mall
Vancouver, Canada
V6T 1Y3
ABSTRACT
This study investigates the possibility of a relationship between amplitude modulation in the speech envelope and a speaker's intelligibility or articulatory clarity. It aims to develop an intelligibility measure called the Modulation Index (MI). Speech samples from several English speakers and one French speaker were recorded and digitized. Speakers were asked to produce speech under three articulatory conditions: Underarticulated, Normally Articulated, and Overarticulated. A computer program was developed to calculate the MI from the depth of amplitude modulation in the envelope of each digitized speech sample.
The MI values so obtained were compared with the corresponding ratings from English-speaking listeners who judged the articulatory clarity of the recorded utterances. Results indicate that the relationship between the perceptual data and the Modulation Index in its present form is weak and non-monotonic. Several factors may have affected the results of this comparison. There are indications that speakers were not always successful in producing the intended articulatory conditions. Also, despite precautions, there were some differences in intensity and duration between utterances from the three conditions. It is concluded that there is some correlation between amplitude modulation in speech envelopes and speakers' intelligibility or articulatory clarity. However, the Modulation Index will require modification before it can become a useful tool. Some modifications were briefly explored, and further modifications to both the Modulation Index and the experimental design are suggested for future investigations.
TABLE OF CONTENTS
ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF APPENDICES
ACKNOWLEDGEMENT
1. INTRODUCTION
2. LITERATURE REVIEW
2.1 Introduction
2.2 Factors Affecting Speech Intelligibility
2.21 Factors involving the speaker
2.22 Factors involving the transmission system
2.3 Speech Intelligibility Measures Based on Listener Judgments
2.31 Intelligibility scales
2.32 Detailed judgment-based intelligibility test
2.4 Acoustic Intelligibility Measures
2.41 The Articulation Index (AI)
2.42 The Speech Transmission Index (STI, RASTI)
2.43 Modified Speech Transmission Index (mSTI)
2.44 Articulation Loss of Consonants (ALcons)
2.45 Direct-to-Reverberant Intensity Method (SRR)
2.46 Speech Communication Index (SCI)
2.47 Pattern Correspondence Index (PCI)
2.48 Kondraske's Method
2.49 Monsen's Formula
2.5 Comparisons of Acoustic Indices
2.6 Further Developments of the Modulation Transfer Function
2.7 Conclusion
3. METHODS AND MATERIALS
3.1 An Overview of the Experimental Design
3.2 Preparation of Speech Samples
3.21 Speech material
3.22 Speakers
3.23 Recording of speech samples
3.3 Design of the Listening Test
3.31 Preparation of the Listening Test tape
3.32 Listeners
3.33 Procedures for the Listening Test
3.4 The Modulation Index
3.41 Description of the Index
3.42 Digitization of speech samples
4. RESULTS
4.1 Results of the Listening Test
4.11 Consistency of listener judgments
4.12 Comparison of listener judgments with speaker intentions
4.13 Effect of language on listener judgments
4.14 Effect of utterance duration variability on listener judgments
4.2 Results of the MI Calculations
4.3 Comparison of Modulation Index Values with Perceptual Data
4.31 Comparison of Modulation Index Values with listener judgments
4.32 Comparison of Modulation Index Values with visual identification of waveforms
5. DISCUSSION AND CONCLUSIONS
BIBLIOGRAPHY
APPENDICES
LIST OF TABLES
I. Information regarding the speakers selected
II. Duration in seconds of the utterances selected
III. Information regarding the listeners
IV. Standard deviations for listener judgments across the articulatory conditions intended by the speakers, for Speakers 0 to 6 inclusive
V. Standard deviations for listener judgments by ten listeners across the articulatory conditions intended by the speakers
VI. Repeatability of listener judgments as a function of the speaker
VII. Repeatability of listener judgments across listeners
VIII. Modulation Index values for utterances by six speakers
IX. Modulation Index rankings of utterances by articulatory condition
X. Confusion matrix comparing visual identification of articulatory condition with speaker intentions
XI. Confusion matrix comparing visual identification of articulatory condition with the articulatory condition suggested by the perceptual ranking order according to listener judgments
XII. Comparison of rankings according to the original MI values with rankings according to MImod
LIST OF FIGURES
1. Illustration of the reduction in modulation of a speech signal caused by background noise and reverberation
2. The National Technical Institute for the Deaf (NTID) Scale of Intelligibility
3. A percent rating scale of intelligibility
4. Relation between Articulation Index scores and PB-word scores
5. The way in which the original Speech Transmission Index accounted for the effects of background noise
6. Relation between STI and PB-word scores
7. Labelling of peaks and troughs in the amplitude envelope of a speech sample
8. Speech sample digitization scheme
9. An example of start and end point locations chosen for Modulation Index analysis of the amplitude envelope of a speech sample
10. Listener judgments as a function of the speaker
11. Listener judgments as a function of the listener
12. The relationship between Modulation Index values and listener judgments (panels 12a, 12b, 12c, and 12d)
LIST OF APPENDICES
A. Intelligibility tests based on listener judgments
B. The English and French sentences selected
C. Listening test instructions and answer sheet
D. Listing of the program for calculation of the Modulation Index
ACKNOWLEDGEMENT
I would like to thank everyone who has had a part in this thesis. In particular, I would like to thank:
Dr. Andre-Pierre Benguerel for sharing a small part of his knowledge with me, and for his assistance throughout the preparation of this thesis.
Noelle Lamb for her encouragement and interest, and for serving on my committee.
My subjects for their kind cooperation.
My family and Mike for their patience and support.
My classmates, especially Charlotte, for their friendship.
CHAPTER ONE
INTRODUCTION
Intelligibility of speech refers to the ease with which speech may be understood by normal-hearing listeners. Speech that is unintelligible may have been affected by interfering conditions, such as reverberation or noise, or it may already have been unintelligible as it left the speaker's lips. Various methods have been developed in attempts to measure speech intelligibility. Some of these, such as the NTID scale (cited in Doyle, 1987), are subjective and based on listener ratings. Others, such as the Articulation Index (French & Steinberg, 1947), are objective measures based on acoustic analyses. Furthermore, some methods were designed or adapted to measure speech intelligibility at the source (Kondraske, 1985, for example), while others, including the Articulation Index, assume full intelligibility at the source and measure the degradation of signals over speech transmission systems. Speech intelligibility measures will be discussed in detail in Chapter Two, but some general observations which motivated the present study are in order.
Methods involving listener judgments have various disadvantages. The experience and biases of listeners affect their judgments, necessitating large sample groups and increased analysis time (cf. Doyle, 1987). In addition, while listeners may make gross judgments about the adequacy of the speech, they cannot make the fine acoustic analyses necessary for pinpointing the particular attributes of a speech signal which contribute to its relative intelligibility. Without knowledge of the aspects of the acoustic signal which underlie perceptual errors, attempts to remedy the problems in the system will at best be based on educated guesses. For example, if the source of poor speech intelligibility in an auditorium is known to be reverberation, engineers can improve the situation by installing acoustic tiles, or by other means specific to the problem. If, on the other hand, the factors detrimental to speech transmission in that auditorium are unknown, the solution can only be found by trial and error. Even those tests which identify perceptual errors affecting individual speech sounds do not reveal the acoustic correlates of those errors. Acoustic indices, in contrast, give consistent ratings from measurement to measurement, are less time-consuming to score, and give some information about the physical qualities contributing to the intelligibility of speech.
The purpose of the present study was to explore whether the concept behind an acoustic index, the Modulation Transfer Function (MTF), could be applied to an index of intelligibility at the source, i.e. the speaker. Previously, the use of the MTF concept has been confined to the study of speech transmission systems, or to psychoacoustic perceptual studies. The MTF reflects the reduction in modulation depth of an acoustic wave propagating from source to receiver when the intervening transmission system introduces reverberation, noise, and frequency filtering as contaminants (see Figure 1 for an illustration of the principle of modulation reduction). Houtgast and Steeneken (1973) compared MTF values obtained from various transmission systems to speech intelligibility measures obtained using Phonetically Balanced (PB) word scores. Finding a strong correlation between the two measures, the authors proceeded to incorporate the MTF as an integral part of the Speech Transmission Index (Steeneken & Houtgast, 1980; Houtgast & Steeneken, 1971, 1985). Houtgast and Steeneken (1973) found a near-linear relationship between speech intelligibility scores and the retention of amplitude modulation in the received signal as measured by the MTF. Thus, depth of modulation of a speech signal seems to correlate with speech intelligibility. In the present study, the goal was to develop a rating tool for assessing intelligibility as affected by a speaker's articulatory clarity.
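To make concrete what the MTF quantifies, the sketch below uses the closed-form transfer factor commonly quoted in the STI literature for an idealized channel with exponentially decaying reverberation (reverberation time T) and stationary noise at a given signal-to-noise ratio. It illustrates the idea under those idealizing assumptions and does not reproduce Houtgast and Steeneken's measurement procedure. The function names are hypothetical. The mapping of m onto a 0-to-1 transmission index (apparent S/N clipped to plus or minus 15 dB, then rescaled) follows common STI practice and is an assumption here.

```python
import math


def modulation_transfer(mod_freq_hz: float, rt60_s: float, snr_db: float) -> float:
    """Transfer factor m(F) for an idealized channel with exponential
    reverberation (reverberation time rt60_s) and stationary noise (snr_db)."""
    # Reverberation term; 13.8 is approximately ln(10**6), i.e. a 60 dB energy decay
    reverb = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * mod_freq_hz * rt60_s / 13.8) ** 2)
    # Noise term: noise adds unmodulated intensity, shrinking the relative depth
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb * noise


def transmission_index(m: float) -> float:
    """Map m to an apparent S/N in dB, clip to [-15, 15] dB, rescale to [0, 1]."""
    if m >= 1.0:
        return 1.0
    if m <= 0.0:
        return 0.0
    apparent_snr = 10.0 * math.log10(m / (1.0 - m))
    apparent_snr = max(-15.0, min(15.0, apparent_snr))
    return (apparent_snr + 15.0) / 30.0


# Hypothetical room: 1.2 s reverberation time, negligible noise, 4 Hz modulation
m = modulation_transfer(4.0, 1.2, 100.0)
print(round(m, 2), round(transmission_index(m), 2))
```

For the hypothetical room in the last lines, the script prints 0.42 and 0.45. Reverberation alone has therefore removed more than half of the 4 Hz modulation depth, which is the kind of loss Figure 1 depicts.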
The scores obtained from a measure of modulation depth (the Modulation Index, or MI) were compared to listeners' perceptual judgments of articulatory clarity for recorded speech samples from several speakers. Efforts were made to minimize contaminating variables which might affect the results. Background noise and reverberation, both of which could decrease the inherent amplitude modulation in the produced speech samples, were minimized by making all of the recordings in a soundproof booth with acoustic tiling on the walls and ceiling. High-quality recording equipment was used to avoid frequency filtering and other effects which would reduce the recording quality. Furthermore, in its present formulation, the new measure (MI) is sensitive to absolute intensities. The effect of timing differences on the measure, whether in the duration of samples, the spacing of words, or the duration of individual phonemes, is difficult to predict. Therefore, to minimize these effects, speakers practised keeping the intensity and rate of their speech as constant as possible. They received feedback on their performance from the experimenter and were urged to maintain this constancy during recording sessions.
Figure 1. Illustration of the reduction in modulation of a speech signal caused by background noise and reverberation (reproduced from Steeneken and Houtgast, 1985). [Panels: transmitted speech signal; received speech signal, with modulation index m < 1.]
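The principle shown in Figure 1 can be checked numerically. The sketch below is not the MI formulation used in this thesis. It takes the global modulation depth of an intensity envelope as (max - min)/(max + min), the classic modulation index m. For an envelope S(1 + m cos 2πFt), adding a constant noise intensity N reduces that depth to mS/(S + N). Because the measure is a ratio, it is also unaffected by overall level. A measure built on absolute peak and trough intensities would not share that property. The function name and the synthetic 4 Hz envelope are illustrative assumptions.

```python
import numpy as np


def modulation_depth(intensity: np.ndarray) -> float:
    """Global modulation depth (max - min) / (max + min) of a non-negative
    intensity envelope; 1.0 means the envelope dips to zero between peaks."""
    peak = float(np.max(intensity))
    trough = float(np.min(intensity))
    return (peak - trough) / (peak + trough)


# Idealized "transmitted" envelope: 4 Hz sinusoidal intensity modulation, depth 1
t = np.linspace(0.0, 1.0, 8001)
transmitted = 1.0 + np.cos(2.0 * np.pi * 4.0 * t)

# "Received" envelope: stationary noise adds unmodulated intensity equal to
# the mean signal intensity (0 dB signal-to-noise ratio)
received = transmitted + 1.0

print(modulation_depth(transmitted))         # depth of the clean envelope
print(modulation_depth(received))            # reduced depth after adding noise
print(modulation_depth(10.0 * transmitted))  # overall level does not change it
```

The three printed values are approximately 1.0, 0.5, and 1.0. This matches mS/(S + N) with m = 1 and S = N.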
These precautions were taken in order to maximize the influence of the variables of interest: amplitude modulation depth and articulatory clarity. Nevertheless, other contaminants, unrecognized and unaccounted for, may have been present. This is a hazard of undertaking an exploratory study using natural speech samples.
CHAPTER TWO
LITERATURE REVIEW
2.1 INTRODUCTION
This chapter surveys the available measures of speech intelligibility and related literature on the subject. Tests of speech recognition aimed at quantifying perception of speech by impaired listeners are not reviewed here. (These include tests of word discrimination such as the CID W-22 word lists, among others.) The study of speech intelligibility has many useful applications. Knowledge of the factors which degrade intelligibility, and of which factors are most influential in a particular setting, can contribute to the design and construction of acoustically optimal auditoria, lecture halls, classrooms, telephones, public address systems, and hearing aids. Speech-language pathologists may employ these methods when making diagnoses or assessing progress. Thus, the potential applications of speech intelligibility measures are many and, not surprisingly, many different measures have been developed.
2.2 FACTORS AFFECTING SPEECH INTELLIGIBILITY
2.21 FACTORS INVOLVING THE SPEAKER
Several studies of the acoustic variables which affect the intelligibility of an individual's speech are to be found in the literature. An exhaustive list of these studies is not necessary for this survey, but a description of particularly relevant studies will illustrate some of the approaches which have been explored. Monsen (1978) studied the speech of hearing-impaired children. Three acoustic variables accounted for 73% of the variance in normal listeners' judgments of the children's intelligibility. These variables were: (1) the difference in voice onset time between /t/ and /d/ (accounting for 48.5% of the variance); (2) the second formant difference between /i/ and /o/ (accounting for 20.5% of the variance); and (3) the presence (normal) or absence of rapid spectral change between a syllable-initial liquid or nasal and the following vowel. The other phonetic variables, found to contribute little or nothing to intelligibility, were voice onset time differences between /k/ and /g/ and between /p/ and /b/, first formant differences in vowels, and the extent of the second formant frequency change in the diphthong /ai/.
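The three percentages fit a stepwise regression in which each variable adds to the variance already explained. Assuming the reported figures are successive increments in R² (the summary above does not state this explicitly), the contribution of the third variable follows by subtraction:

```latex
R^2_{\text{total}} = \Delta R^2_{1} + \Delta R^2_{2} + \Delta R^2_{3}
\quad\Rightarrow\quad
\Delta R^2_{3} = 0.73 - 0.485 - 0.205 = 0.04
```

On that assumption, roughly 4% of the variance would be attributable to the presence or absence of rapid spectral change at liquid- or nasal-vowel transitions.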
In addition, Monsen cited previous studies whose authors claimed that acoustic prosodic parameters such as duration, rate, and fundamental frequency (Voelker, 1938; Hudgins and Numbers, 1942; Hudgins, 1960; John and Howarth, 1965; Brannon, 1966; Ando and Canter, 1969) contribute significantly to the relative intelligibility of speech. These findings, however, were not confirmed by Monsen (1978).
Approaching the subject differently, Ananthapadmanabha (1983) regards the speech signal as the convolution of the source (glottis) and the vocal tract filter, following Fant's (1960) model of speech production. The source-originating parameters considered by Ananthapadmanabha, collectively designated "source dynamics", are voicing and plosive contrasts, intensity changes, and natural pitch variations. Formant information is imposed on the source dynamics by the vocal tract filter. To isolate the source dynamics and exclude formant information, Ananthapadmanabha passed a speech signal through a so-called "epoch" filter. The input to this kind of filter is voiced speech, while the output consists of pulses corresponding to each peak of vocal source excitation in the signal. Each peak is termed an epoch. The signal is first passed through a third-octave band-pass filter centred at 4 kHz, then rectified and low-pass filtered with a 340 Hz cutoff frequency.
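The epoch-filter chain just described (third-octave band-pass at 4 kHz, rectification, 340 Hz low-pass) can be sketched as follows. This is an illustrative reconstruction, not Ananthapadmanabha's implementation. Ideal FFT-domain (brick-wall, zero-phase) filters stand in for whatever filters he used. Full-wave rectification is assumed because the rectifier type is not specified. The function name `epoch_envelope` and the impulse-train input are hypothetical.

```python
import numpy as np


def epoch_envelope(x: np.ndarray, fs: float,
                   centre_hz: float = 4000.0, lowpass_hz: float = 340.0) -> np.ndarray:
    """Third-octave band-pass around centre_hz, full-wave rectification,
    then low-pass at lowpass_hz, using ideal FFT-domain filters."""
    n = len(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)

    # Third-octave band edges lie a sixth of an octave either side of the centre
    lo_edge = centre_hz * 2.0 ** (-1.0 / 6.0)
    hi_edge = centre_hz * 2.0 ** (1.0 / 6.0)
    spectrum = np.fft.rfft(x)
    spectrum[(freqs < lo_edge) | (freqs > hi_edge)] = 0.0
    band = np.fft.irfft(spectrum, n=n)

    # Full-wave rectification (assumed; the rectifier type is not specified)
    rectified = np.abs(band)

    # Ideal low-pass filter keeps only the slow envelope of the rectified band
    spectrum = np.fft.rfft(rectified)
    spectrum[freqs > lowpass_hz] = 0.0
    return np.fft.irfft(spectrum, n=n)


# Usage: a synthetic 100 Hz impulse train stands in for glottal excitation
fs = 16000.0
excitation = np.zeros(16000)
excitation[::160] = 1.0            # one pulse every 10 ms
envelope = epoch_envelope(excitation, fs)
print(envelope[0] > envelope[80])  # output peaks at a pulse, not between pulses
```

With real voiced speech, the 4 kHz band carries bursts of energy at each excitation instant, whatever the formant pattern. The smoothed output therefore peaks once per glottal cycle, which is the sense in which it retains source dynamics.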
Even without the formant information, enough phonetic information remained for the speech to be understood. Ananthapadmanabha therefore concluded that source dynamics play a much stronger role in the perception of phonetic information than previously believed. Taken together with Monsen's (1978) results, Ananthapadmanabha's conclusion is difficult to evaluate, because the two authors' groupings of variables overlap. Since source dynamics included Monsen's most important variable (consonant voicing contrasts, or voice onset time differences), Monsen would have predicted that intelligibility would be maintained in Ananthapadmanabha's processed speech. On the other hand, Monsen would have incorrectly predicted a loss of intelligibility when Ananthapadmanabha excluded formant information, which Monsen considered important.
Metz, Samar, Schiavetti, Sitler, and Whitehead (1985) also investigated factors affecting the intelligibility of hearing-impaired speakers. These authors replicated Monsen's study and confirmed his major findings. They agree with Monsen in labeling segmental information as the primary dimension of speech intelligibility. Contrary to Monsen, however, and in concert with previous authors, Metz et al. suggest that prosodic features are an important secondary dimension.
The studies mentioned to this point have dealt with inter-speaker differences in intelligibility.
Picheny, Durlach and Braida (1985, 1986) instead investigated differences in the intelligibility of speech produced by the same speakers in different situations. In particular, these studies focused on the acoustic characteristics of "clear" speech, defined as speech intended for hearing-impaired listeners or produced in noisy environments. The authors contrasted clear speech with "conversational" speech, the latter being speech intended for normal-hearing listeners in the absence of competing noise. In their 1985 study, Picheny et al. found the following differences between clear and conversational speech: (1) sentences produced in clear speech lasted almost twice as long as sentences produced in conversational speech, and this difference reflected both additional pauses and increased duration of individual speech sounds in the clear condition; (2) there were more instances of vowel reduction and consonant deletion in conversational speech than in clear speech; and (3) the short-term spectra of individual speech sounds differed between the conditions; for example, in clear speech, consonant intensities tended to be greater relative to neighboring vowels than in conversational speech. As a cautionary note, however, the results of this study may have been affected by some of the preliminary instructions given to speakers before recording the "clear" condition, as the following excerpt from Picheny et al. (1985) indicates:
"The talkers were also told to enunciate consonants more carefully and with greater (vocal) effort than in conversational speech and to avoid slurring the words together." (p. 97)
These instructions probably introduced a disproportionate number of pauses and other artifacts into the speech samples. Picheny et al. (1986) confirmed their 1985 results. They also found that the long-term RMS (root mean square) spectrum level was not substantially different between clear and conversational speech, and that the range of fundamental frequencies was wider in clear than in conversational speech. Again, Picheny et al.'s work supports Metz et al.'s (1985) view that prosodic features play an important part in the relative intelligibility of speech.
2.22 FACTORS INVOLVING THE TRANSMISSION SYSTEM
French and Steinberg (1947) listed a number of factors which can affect intelligibility. These include the intensity of the signal, background noise in the system, reverberation, and phase distortion. Miller and Nicely (1955) added low- and high-pass filtering to the list. They pointed out that, because the high-frequency components of speech are lower in intensity, low-pass filtering affects speech in roughly the same way as background noise. Three considerations suggest that low-pass, rather than high-pass, filtering is the more important factor in speech intelligibility outside acoustic laboratories: most sensorineural hearing losses involve the greatest damage to high-frequency hearing; low-fidelity audio systems have a low-pass filtering effect; and background noise is a prevalent barrier to speech transmission.
Miller and Nicely defined five "articulatory" dimensions in speech: voicing, nasality, affrication, duration, and place of articulation. Voicing and nasality were the most robust when speech was subjected to noise or low-pass filtering, whereas place of articulation was the most easily disrupted dimension under these conditions. None of the dimensions was particularly resistant to high-pass filtering, since in this case most of the acoustic energy in the consonants was removed. The remaining information was then at very low intensity and consequently inaudible for the purposes of speech perception.
Intelligibility of speech is also adversely affected by reverberation. Rettinger (1968) defines reverberation as "sound persistence due to repeated boundary reflections after the source of sound has stopped" (p. 85). Boundaries in this case are surfaces such as walls or ceilings, or any object in an enclosed space.
Reverberation reduces speech intelligibility because persisting sound energy causes successive speech sounds to overlap, blurring the signal. The reverberation time has traditionally been identified as the factor determining how susceptible speech is to degradation through reverberation. Morse and Ingard (1968) define reverberation time as "the length of time it takes the mean energy of the wave to reduce to a millionth part of its initial mean value" (p. 558), in other words, the time taken for the wave energy to decrease by 60 decibels. Lochner and Burger (1964) concluded that speech is unaffected by reverberation only at reverberation times below 0.3 seconds, and that the signal and its reflections are partially integrated at reverberation times between 0.3 and 0.8 seconds. Morse and Ingard add that if the speech signal changes significantly in less than one tenth of the reverberation time, reflected sound energy will blur the original signal. Furthermore, Crum (1974) stated that a reverberation time of 1.2 seconds or more decreased intelligibility for normal-hearing adults in quiet. Crum also found that background noise and reverberation together reduced speech recognition performance more than the sum of their individual effects would predict.
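The derivation below shows why the "millionth part" definition corresponds to a 60-decibel drop, and what time scale Morse and Ingard's one-tenth rule implies. The exponential form of the decay is an idealizing assumption (a diffuse sound field) and is not stated in the sources quoted above:

```latex
\[ 10 \log_{10}\!\left(\frac{E(T)}{E_0}\right) = 10 \log_{10}\!\left(10^{-6}\right) = -60~\text{dB} \]

\[ E(t) = E_0\, e^{-t/\tau}, \quad E(T) = 10^{-6} E_0
   \;\Rightarrow\; T = \tau \ln 10^{6} \approx 13.8\,\tau
   \;\Rightarrow\; E(t) = E_0\, e^{-13.8\, t / T} \]

\[ T = 1.2~\text{s} \;\Rightarrow\; \tfrac{T}{10} = 0.12~\text{s} \]
```

By the one-tenth rule, at the 1.2-second reverberation time cited by Crum, speech features that change within about 120 ms would be expected to be blurred by reflected energy.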
2.3 SPEECH INTELLIGIBILITY MEASURES BASED ON LISTENER JUDGMENTS

2.31 INTELLIGIBILITY SCALES

Perhaps the most subjective measures of intelligibility are intelligibility scales such as the two employed by Doyle (1987), which may be used, for example, for screening a population of speakers for intelligibility deficits. Figure 2 and Figure 3 describe these scales, which give no details as to the factors affecting relative intelligibility, but are quick to score and administer. Doyle studied the use of these scales by audiologists assessing hearing impaired children's speech. The results indicated good intra-rater reliability, but poor inter-rater reliability, particularly for the scores assigned to the speech of certain individuals. Thus, listener bias is a concern in the use of intelligibility scales.

Figure 2. The National Technical Institute for the Deaf (NTID) Scale of Intelligibility (after Doyle, 1987). The scale points, from least to most intelligible, are:
- Speech is completely unintelligible.
- Speech is very difficult to understand; only isolated words or phrases are intelligible.
- With difficulty, the listener can understand about half of the message (intelligibility may improve after a listening period).
- Speech is intelligible with the exception of a few words and phrases.
- Speech is completely intelligible.

[Figure 3: a scale marked from 0 to 100 in steps of 10, running from COMPLETELY UNINTELLIGIBLE (0) to COMPLETELY INTELLIGIBLE (100).]
Figure 3. A percent rating scale of intelligibility (Doyle, 1987).
2.32 DETAILED JUDGMENT-BASED INTELLIGIBILITY TESTS

Black (1957) reviewed early intelligibility tests (see Appendix A for a list of the tests) and advocated the use of a multiple choice format with answer forms provided. For example, a listener would be presented with a choice of four possible words, and he would circle the word he thought he heard. He compared the multiple choice format to tests where listeners simply wrote down the words they heard. The multiple choice tests had the advantage of reducing both the demands on the scorer's phonetic knowledge and the scoring time, but otherwise they were no more reliable (nor necessarily more valid) than the write-down tests.

Williams and Hecker (1968) compared the results of four tests aimed at the assessment of speech transmission systems (listed in Appendix A). These authors used various types of speech distortion (additive speech-shaped noise, peak clipping, and vocoding), and two different speakers for the test conditions. They confirmed the earlier finding of Hirsh et al. (1954) that individual intelligibility test scores varied relative to one another depending on the type of speech distortion introduced into the same transmission system. Furthermore, the scores obtained for their two speakers were more similar for some distortion types than for others.

More recently, Newman (1979) wrote the introduction to a chapter consisting of a collection of reviews of articulation tests used by speech-language pathologists. The tests are listed in Appendix A. These tests were developed to replace spontaneous speech samples with the idea of decreasing testing and analysis times, and ensuring that all phonemes were sampled in a given session.
In contrast to intelligibility scales and the intelligibility tests discussed by Black (1957), articulation tests, as well as phonological analyses performed on spontaneous speech samples, give attention to the specific phoneme confusions which adversely affect intelligibility. Still, these tests provide only superficial descriptive information, since even the ear of a well-trained phonetician cannot analyze phoneme errors acoustically. To summarize, Newman criticized these tests for their questionable validity and reliability, although he acknowledged recent efforts to improve this situation. In addition, he applauded the inclusion of suprasegmental phonemes in the content of some tests, which should help to improve test validity.

2.4 ACOUSTIC INTELLIGIBILITY MEASURES

2.41 THE ARTICULATION INDEX (AI)

The Articulation Index (AI) was conceived by French and Steinberg (1947). Although originally intended for assessment of telephone systems, the AI has been revised for several purposes, and it is well enough established to be described in an American National Standards Institute standard (ANSI, 1969). The Articulation Index was an innovation in that it summarized speech intelligibility in one number, independent of listener judgments, and was firmly based on acoustics. As originally formulated by French and Steinberg (1947), the Articulation Index is a ratio obtained by comparing an ideal input speech spectrum (empirically determined) with the actual output of a speech transmission system, considering the effects of noise and band-pass filtering.
The speech spectrum is divided into twenty frequency bands in the range 200-6100 Hz, each of which is considered to contribute equally to speech intelligibility. The use of twenty frequency bands provides a measure with good frequency resolution, useful in situations where sharp filtering of the speech signal takes place, or where there is narrow band or frequency specific background interference. However, later authors (e.g., Kryter, 1962a; ANSI, 1969; Humes, Dirks, Bell, Ahlstrom, and Kincaid, 1986) have used 1/3 octave bands or octave bands, together with weighting factors, and obtained similar AI scores. Pavlovic (1987) provided a summary of the various modifications which have been applied to the AI since 1947, as well as updated tables of variables for AI calculations.

The Articulation Index is calculated by means of the following two equations:

$$A = P \sum_{i} W_i I_i, \qquad s = F(A),$$

where A is the Articulation Index, P is the proficiency function (a measure taking into account the enunciation of the speaker and the familiarity of the listener with the materials), i is the frequency band under consideration, W_i is the proportion of the speech dynamic range within frequency band i which contributes to overall speech intelligibility over the transmission system, I_i is the ideal contribution of that frequency band, and s is speech intelligibility, related to the AI through the empirical transfer function F given in the second equation.
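A minimal sketch of the first AI equation follows. The rule used for W_i, namely that speech peaks lie 12 dB above the band's long-term speech level and that the usable speech dynamic range spans 30 dB, is the convention commonly associated with the 1969 ANSI procedure; treat it as an assumption, as are the equal ideal contributions I_i = 1/n and the function names. The empirical transfer function F is not modelled, since it depends on the speech materials.

```python
def band_audibility(snr_db: float) -> float:
    """W_i: fraction of a 30 dB speech range audible above noise in one band.

    Assumes speech peaks 12 dB above the band's long-term speech level.
    """
    return min(max((snr_db + 12.0) / 30.0, 0.0), 1.0)


def articulation_index(band_snrs_db: list[float], proficiency: float = 1.0) -> float:
    """A = P * sum_i W_i * I_i, with equal ideal contributions I_i = 1/n."""
    n = len(band_snrs_db)
    ideal = 1.0 / n  # 0.05 per band for the original twenty bands
    return proficiency * sum(band_audibility(s) * ideal for s in band_snrs_db)


if __name__ == "__main__":
    # Twenty equal-contribution bands: half at +18 dB SNR, half masked at -15 dB
    snrs = [18.0] * 10 + [-15.0] * 10
    print(round(articulation_index(snrs), 3))  # 0.5
```

Equal I_i values reproduce the equal-contribution assumption of the twenty-band method; the octave and 1/3 octave variants mentioned above would replace them with band-specific weighting factors.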
The original Articulation Index contained no provision for reverberant room conditions. Kryter (1962a) partly remedied this situation by providing correction factors based on reverberation time which, in modified form, were incorporated into the ANSI standard. However, Humes et al. (1986) found that these corrections were inadequate at signal-to-noise ratios worse than zero decibels, and Kryter himself stressed that the corrections were based on the results of a single study.

The validity of the AI has been the subject of numerous investigations. French and Steinberg (1947) provided a chart of AI scores compared to listener judgment scores obtained using a variety of speech materials (see Figure 4). As Kryter (1962a,b) pointed out, greater semantic redundancy of the speech materials results in a smaller AI score for any given intelligibility score; that is, more redundant materials reach a given intelligibility score at a lower AI. By semantic redundancy, he meant material in which meaning can be gleaned from syntactic cues, or in which a few words are repeated often, which allows subjects to guess at words more accurately. This means that an AI score has limited meaning in isolation from any specification of the speech material to which it applies.

[Figure 4: PB-word score plotted against Articulation Index, with the AI axis running from 0 to 1.0 in steps of 0.1.]
Figure 4. Relation between Articulation Index scores and PB-word scores (reproduced from French and Steinberg, 1947).
Pavlovic (1984), Kamm, Dirks and Bell (1985), and Dirks, Bell, Rossman and Kincaid (1986) have investigated the validity of the AI when applied to hearing impaired listeners. They found that the AI was a good predictor of the performance of most listeners, with some exceptions among subjects with severe, sloping high frequency sensorineural hearing losses.

2.42 THE SPEECH TRANSMISSION INDEX (STI, RASTI)

Houtgast and Steeneken (1971) introduced an acoustic index having one major advantage over the Articulation Index: provisions for taking into account peak clipping, band-pass filtering, and reverberation, as well as background noise, were built in, rather than added on as clumsy correction factors. Instead of natural speech signals, the STI employed artificial signals, another difference from the Articulation Index.

The original Speech Transmission Index (STI) calculates a weighted sum of spectrum differences between the two levels of an alternating signal in each of the five octave bands centred at frequencies ranging from 250 to 4000 Hz. The signal level difference at the input to a speech transmission system was compared to the difference at the output. This principle, as applied to background noise, is illustrated in Figure 5. The two levels of the alternating signal are referred to as two separate signals in the discussion which follows. At the input, the two signals consisted of noise shaped to resemble the average long-term speech spectrum. One of the signals (Sound 1) was more intense than the other (Sound 2), but since the signals had the same spectral shape at the input, the intensity difference between them was equal across the frequency range.
If, however, the signals were passed through a transmission system containing background noise approximately equal in intensity to Sound 2, their intensity level difference (ΔL) at the output would be changed. Sound 2, combined with the noise, would result in an output signal (Sound 3) having higher intensity and a spectral shape different from that of Sound 2. Sound 1, however, being significantly more intense than the background noise, would be essentially unaffected by it. Comparison of the spectrum levels of Sound 3 and Sound 1 at the output, in each of the octave bands, would reveal the effects of the background noise, since the intensity differences between the two signals would no longer be equal across the frequency range. A weighted sum of these differences would take into account the different contribution of each octave band. If reverberation was present in a system, the alternation rate (3 Hz at input) would be changed after transmission.

[Figure 5: block diagram. At the INPUT, Sound 1 and Sound 2 alternate; they pass through the SPEECH TRANSMISSION SYSTEM, where BACKGROUND NOISE is added; at the OUTPUT, Sound 1 alternates with Sound 3, the sum of Sound 2 and the background noise. Each panel shows a spectrum over 0.25 to 4 kHz.]
Figure 5. The way in which the original Speech Transmission Index accounted for the effects of background noise.

Reverberation was measured by quantifying the change in alternation rate from input to output.
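The background-noise example of Figure 5 can be checked numerically. The sketch below assumes that Sound 2 and the noise are uncorrelated, so that their intensities (not their pressures) add, and uses the 20 dB initial level difference given for the original STI; the band levels and function names are hypothetical illustration values.

```python
import math


def add_levels_db(*levels_db: float) -> float:
    """Level of the sum of uncorrelated sounds, combined on an intensity basis."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))


def output_level_difference_db(sound1_db: float, sound2_db: float, noise_db: float) -> float:
    """Delta L at the output: Sound 1 versus Sound 3 (Sound 2 plus background noise).

    Sound 1 is assumed to be far enough above the noise to be left unchanged.
    """
    sound3_db = add_levels_db(sound2_db, noise_db)
    return sound1_db - sound3_db


if __name__ == "__main__":
    # Hypothetical octave-band levels: Sound 1 is 20 dB above Sound 2 at the input
    for noise in (30.0, 60.0, 70.0):
        delta = output_level_difference_db(80.0, 60.0, noise)
        print(f"noise {noise:.0f} dB -> Delta L = {delta:.1f} dB")
```

With noise equal in level to Sound 2, the two intensities double, Sound 3 lies about 3 dB above Sound 2, and ΔL in that band shrinks from 20 dB to about 17 dB; bands where the noise is weaker keep a difference closer to 20 dB.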
The signals, and the analysis procedure which followed, were developed and modified on the basis of comparisons to Phonetically Balanced (PB) word scores measured over fifty transmission channels. Various degrees and combinations of reverberation, band-pass filtering, interfering noise, and peak clipping were used as contaminants in the channels. The formula for the STI in this early form was:

$$\mathrm{STI} = \frac{1}{5} \sum_{i=1}^{5} \frac{\Delta L_i}{20\ \mathrm{dB}},$$

where i is the octave band index, ΔL_i is the output intensity level difference in band i (which incorporates alternation rate differences, if any), and 20 dB is the initial intensity level difference between the two signals. Limitations indicated by the authors included inability to account for frequency distortion, center clipping, or extremes of intensity.

The next step in the evolution of the Speech Transmission Index was the incorporation of the Modulation Transfer Function (MTF) (Houtgast & Steeneken, 1973; Steeneken & Houtgast, 1980). The MTF concept, whereby the fluctuations in the envelope of an input signal are smoothed in the output signal by the effects of reverberation and background noise, originated in studies of visual perception (see Figure 1). In the revised STI, the Modulation Transfer Functions for seven frequency bands of a sinusoidally modulated input signal of shaped pink noise are combined into a single score. The calculation scheme was adapted from the Articulation Index.
The correlation of revised STI scores with (Dutch) Phonetically Balanced word scores is illustrated in Figure 6. Non-linear distortion, changing background noise during measurements, and frequency dependent reverberation times will result in invalid STI scores. A review of the development and applications of the Speech Transmission Index is presented in Houtgast and Steeneken (1985).

The authors have moved in three directions with the Speech Transmission Index. The first is the development of measurement devices. Steeneken and Agterhuis (1978) described an STI meter used in field studies. More recently (Houtgast & Steeneken, 1984; Steeneken & Houtgast, 1985; Dareham, 1986), RASTI (Rapid Speech Transmission Index) was introduced. This is a screening meter in which only two of the octave frequency bands (centered at 500 Hz and 2000 Hz) are included, but the method of calculation is otherwise similar to the Speech Transmission Index. The second direction is the design of acoustically optimal auditoria, using a desired STI value as a starting point (Houtgast, Steeneken and Plomp, 1980; Plomp, Houtgast and Steeneken, 1980). The specifications given were for the volume of the room, the reverberation time, the ambient noise level, the original signal intensity, and the distance between speaker and listener.

[Figure 6: PB-word score (%) plotted against STI (%), both axes 0 to 100, with separate symbols for the degradation types and a single best-fitting curve.]
Figure 6. Relation between STI and PB-word score (Dutch words) for the conditions with noise, bandpass limiting, peak clipping, automatic gain control, and reverberation. The curve represents the best-fitting curve for all these data points. (Reproduced from Steeneken and Houtgast, 1980.)

The third new direction for the STI is a computer ray-tracing model designed to provide an STI score for each individual audience position rather than merely one score for the entire auditorium (van Rietschote, Houtgast and Steeneken, 1981, 1983). In this model, the speaker is simulated by a point source which emits a signal to each audience position.

2.43 MODIFIED SPEECH TRANSMISSION INDEX (mSTI)

The mSTI is a hybrid of the AI and the STI in that it combines the modulation transfer function approach with Articulation Index weighting factors for one-third octave band frequency analysis. Humes et al. (1986) created the mSTI as an improvement on its two predecessors for prediction of speech intelligibility performance by normal and hearing impaired listeners when speech was temporally and spectrally distorted. The mSTI did indeed prove superior to the AI and to the STI. The mSTI scores matched best with scores obtained by hearing impaired listeners on a speech recognition test.

2.44 ARTICULATION LOSS OF CONSONANTS (ALcons)

Peutz (1971) and Klein (1971) developed a measure called the Articulation Loss of Consonants (ALcons), so called because Peutz regarded the degradation of intelligibility over a transmission system as a loss of information, and because the measure proved to be much more consistent for transmission of consonants than of vowels. Like the STI, ALcons directly accounted for reverberation effects, but, like the AI, it employed natural speech as an input signal.
Peutz suggested that up to a certain critical distance (d_c), speech intelligibility is partially dependent on the speaker-to-listener distance. Beyond d_c, intelligibility is independent of distance, and varies with the reverberation time of the room. The following equations are used to obtain the ALcons measure:

$$\text{for } d < d_c: \quad \mathrm{ALcons} = \frac{200\, d^2 T^2}{V} + a \quad (\%),$$
$$\text{for } d \ge d_c: \quad \mathrm{ALcons} = 9T + a \quad (\%),$$
$$\text{with} \quad d_c = \left(0.2\ \mathrm{s^{1/2}\,m^{-1/2}}\right) \sqrt{V/T}.$$

In these equations, d_c is the critical distance in meters, d is the distance to the listener in meters, V is the room volume in m³, T is the reverberation time (at 1400 Hz) in seconds, ALcons is the intelligibility score, and a is a correction for the skills of the listener, as measured by a speech recognition test. A modification is made to the measure if there is competing background noise in the room. Peutz (1971) used very small groups of listeners (five to ten people) when validating the ALcons. Still, according to Lundin (1982), even though its validity is not well established, this measure is widely accepted.

2.45 DIRECT-TO-REVERBERANT INTENSITY METHOD (SRR)

The Direct-to-Reverberant Intensity method, better known as SRR, for Signal to Reverberation Ratio (Lundin, 1986), combines features of the ALcons with a frequency band approach similar to the Articulation Index. It was formulated to combine the built-in consideration of reverberation and the simplicity of the ALcons with the frequency specificity of the AI.
The SRR may be calculated according to the following formulae:

$$\mathrm{SRR} = -20 \log_{10} \frac{d}{r_r} \ \ (\mathrm{dB}), \qquad r_r = \sqrt{\frac{QA}{16\pi}} = 0.057 \sqrt{\frac{QV}{T}},$$

where d is the distance in meters between the source and the listener, r_r is the reverberation radius in meters (defined as the distance between the source of the original signal and the point where the original signal and the reverberant signal are equally intense, i.e., SRR = 0 dB), Q is the directivity of the source (which corresponds to the proportion of sound from the source which actually reaches the listeners when the rest of the sound energy is dissipated around the room), A is the absorption of the room in metric sabins*, V is the volume of the room in m³, and T is the reverberation time of the room in seconds. The two formulae for calculation of r_r are related via Sabine's formula:

$$A = 0.163\ (\mathrm{s/m})\ \frac{V}{T},$$

where the absorption coefficient, α, is assumed to be equal to one.

* The unit of absorption, the sabin, is named in honor of W.C. Sabine, and has the dimensions of one square foot. A metric sabin has the dimensions of one square meter, and is therefore equal to 10.76 sabins, since there are 10.76 square feet in one square meter.

Lundin (1986) found, however, that the SRR failed to provide better predictions of intelligibility than the ALcons, the AI, or the STI. He concluded that all four measures gave roughly equivalent predictions of the performance of normal hearing listeners in adverse conditions, but that these predictions overestimated intelligibility as measured by listener judgments.
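Both room-acoustic measures of sections 2.44 and 2.45 reduce to a few lines of arithmetic. The sketch below transcribes the ALcons equations (with d_c = 0.2·√(V/T)) and the SRR equations with Sabine's formula as given above. The listener correction a defaults to zero, the competing-noise modification is omitted, and the function names and example room are hypothetical.

```python
import math


def critical_distance_m(volume_m3: float, t60_s: float) -> float:
    """Peutz's critical distance d_c = 0.2 * sqrt(V / T), in meters."""
    return 0.2 * math.sqrt(volume_m3 / t60_s)


def alcons_percent(d_m: float, volume_m3: float, t60_s: float, a: float = 0.0) -> float:
    """Articulation Loss of Consonants (%), without the background-noise modification."""
    if d_m < critical_distance_m(volume_m3, t60_s):
        return 200.0 * d_m ** 2 * t60_s ** 2 / volume_m3 + a
    return 9.0 * t60_s + a


def sabine_absorption(volume_m3: float, t60_s: float) -> float:
    """Room absorption A in metric sabins from Sabine's formula A = 0.163 V / T."""
    return 0.163 * volume_m3 / t60_s


def reverberation_radius_m(q: float, volume_m3: float, t60_s: float) -> float:
    """r_r = sqrt(Q A / (16 pi)), approximately 0.057 * sqrt(Q V / T)."""
    return math.sqrt(q * sabine_absorption(volume_m3, t60_s) / (16.0 * math.pi))


def srr_db(d_m: float, q: float, volume_m3: float, t60_s: float) -> float:
    """Direct-to-reverberant ratio: 0 dB at r_r, falling about 6 dB per doubling of distance."""
    return -20.0 * math.log10(d_m / reverberation_radius_m(q, volume_m3, t60_s))


if __name__ == "__main__":
    v, t, q = 1000.0, 1.0, 1.0  # hypothetical room: 1000 m^3, T = 1 s, omnidirectional source
    for d in (2.0, 5.0, 10.0):
        print(f"d = {d:4.1f} m  ALcons = {alcons_percent(d, v, t):4.1f} %  "
              f"SRR = {srr_db(d, q, v, t):5.1f} dB")
```

Substituting d = d_c into the near-distance ALcons expression gives 200 × 0.04 (V/T) × T²/V = 8T rather than 9T, so the two branches join only approximately at the critical distance.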
2.46 SPEECH COMMUNICATION INDEX (SCI)

The SCI (Kryter and Ball, 1964) has not been widely adopted because its use requires sophisticated instrumentation. An intelligibility score is calculated on the basis of the signal-to-noise ratio in nine frequency bands, frequency shift, and peak clipping in the system under study.

2.47 PATTERN CORRESPONDENCE INDEX (PCI)

The PCI (Licklider, Bisberg and Schwartzlander, 1959) is another index which has only specific applications in systems analysis because of the complex instrumentation required. In this case, the pattern of the running power spectrum of a real speech signal is compared before and after passage through a transmission system. This method inspired Houtgast and Steeneken (1971) when they originally created the Speech Transmission Index.

2.48 KONDRASKE'S METHOD

A recent attempt to quantify intelligibility of speech at its source is Kondraske's (1985) measure. He intends this method, which is still in its infancy, to be used by speech clinicians for assessment of patients with disorders such as dysarthria. A microphone is connected to a microcomputer which digitizes numerals spoken by the patient. The measure considers peak amplitude, average amplitude, peak-to-average amplitude ratio, inter-syllable time, and speed of articulation measured as the number of syllables produced in ten seconds.
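Several of the quantities that Kondraske's measure considers are simple statistics of the digitized waveform. The sketch below is not Kondraske's implementation: it assumes that "average amplitude" means the mean absolute sample value and that syllable counts and durations are already available, it omits inter-syllable time, and its function names are hypothetical.

```python
def amplitude_summary(samples: list[float]) -> tuple[float, float, float]:
    """Return (peak amplitude, mean absolute amplitude, peak-to-average ratio)."""
    if not samples:
        raise ValueError("need at least one sample")
    magnitudes = [abs(x) for x in samples]
    peak = max(magnitudes)
    average = sum(magnitudes) / len(magnitudes)
    ratio = peak / average if average > 0 else float("inf")
    return peak, average, ratio


def syllables_per_ten_seconds(n_syllables: int, duration_s: float) -> float:
    """Speed of articulation expressed as syllables produced in ten seconds."""
    return 10.0 * n_syllables / duration_s


if __name__ == "__main__":
    print(amplitude_summary([0.5, -1.0, 0.5, 0.0]))  # (1.0, 0.5, 2.0)
    print(syllables_per_ten_seconds(12, 4.0))         # 30.0
```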
2.49 MONSEN'S FORMULA

Having identified a small number of especially influential variables in the determination of speech intelligibility, Monsen (1978) (discussed above in section 2.21) developed the following formula to be applied to the speech of the hearing impaired:

$$I = 0.91\,(T_t - T_d) + 0.0214\,(F_i - F_{\text{ɔ}}) + 4.78\,(L,N) + 54.57,$$

where I is the index of intelligibility, T_t is the mean voice onset time of /t/, T_d is the mean voice onset time of /d/, F_i is the mean second formant frequency for /i/, F_ɔ is the mean second formant frequency for /ɔ/, L and N are numerical variables reflecting the presence or absence of rapid spectral change following syllable-initial liquids and nasals, and 54.57 is an empirically determined constant. Monsen tested the validity of his formula, and found a correlation of 0.86 between predicted and obtained intelligibility scores assigned by normal hearing listeners. This formula has not been widely adopted elsewhere, however, probably in part due to the time-consuming spectrographic measurements required.

2.5 COMPARISONS OF ACOUSTIC INDICES

Some of the indices described (Monsen's, PCI, SCI) have limited applications and have not gained popularity since their introduction. The applicability of Kondraske's method, which has the promise of being available to clinical speech-language pathologists through office microcomputers, has yet to be determined. But what of the mSTI, the STI, the AI, the ALcons, and the SRR? None of these measures is equipped to deal with non-linear frequency or amplitude distortion, or with extremes of signal intensity.
The STI, mSTI, ALcons, and SRR are superior to the AI for reverberant conditions, but the ALcons and the SRR require an external correction in the presence of interfering noise. Humes et al. (1986) found the mSTI to be superior to both the AI and the STI in prediction of intelligibility scores in the presence of temporal and spectral distortion. However, they also found that all three of these measures tended to underestimate loss of intelligibility in some hearing impaired subjects, a finding which is in agreement with those of Kamm et al. (1986) and Pavlovic (1984). Similarly, Lundin (1986) found that the AI, the STI, the SRR, and the ALcons all predicted higher intelligibility scores than those obtained through listener judgments.

2.6 FURTHER DEVELOPMENTS OF THE MODULATION TRANSFER FUNCTION

The Modulation Transfer Function has aroused the interest of others besides Steeneken, Houtgast and their colleagues. In 1981, for instance, Schroeder described the Complex Modulation Transfer Function (CMTF), which involves the use of Fourier transforms, and includes consideration of phase differences together with reduction in modulation depth in its calculation. Elsewhere, Ahlstrom and his colleagues (Ahlstrom & Humes, 1983, 1985; Ahlstrom, Boney & Humes, 1985) have developed a method for assessing psychoacoustic MTFs by obtaining behavioural thresholds for temporal probe tones (tone pips at peaks or valleys of sinusoidally modulated speech noise). They have investigated Modulation Transfer Functions in subjects with normal hearing and sensorineural hearing losses, and in subjects using compression and non-compression hearing aids.

2.7 CONCLUSION

Even given all of these limitations, the Modulation Transfer Function seems to be the most versatile of the measures on which indices have been based.
It can account for both reverberation and background noise effects without external corrections, and it has warranted the attention of many authors working in several different directions, all with promising results. Perhaps in a form yet to be determined, the Modulation Transfer Function may well be the intelligibility measurement tool of the future.

CHAPTER THREE

METHODS AND MATERIALS

3.1 OVERVIEW OF THE EXPERIMENTAL DESIGN

The objective of this investigation was to explore the possibility of devising an acoustic measure of speech intelligibility when intelligibility depends only on the articulatory clarity of the speaker. The merit of this computed measure (henceforth referred to as the Modulation Index, or MI) was evaluated by comparison with listeners' perceptual judgments of the same speech materials.

For the purposes of this experiment, the range of articulatory clarity was divided into three "articulatory conditions". At the low end of the range, there was the "Underarticulated" (U) or mumbled condition, which was intended to correspond to poor intelligibility. In the middle of the range, there was the "Normally Articulated" (N) condition. At the top of the range, there was the "Overarticulated" (O) condition. This condition corresponded to maximally intelligible speech, such as that intended for hard of hearing listeners in noisy conditions. Speakers producing sentence-length utterances were recorded. They were asked to produce the sentences in each of the three conditions mentioned.
In most cases, but not all, the intended level of articulatory clarity was attained. The performance of the speakers will be discussed in detail in Section 4.12. MI values calculated for the speech samples were compared with the perceptual data, which were quantified in the form of listener judgments of articulatory clarity.

3.2 PREPARATION OF THE SPEECH SAMPLES

3.21 SPEECH MATERIALS

Initially, nine English and nine French sentences were composed. Each sentence contained nine syllables, and for each language there were three sentences containing predominantly labial consonants, three containing predominantly alveolar and palatal consonants, and three containing predominantly velar consonants. An effort was made to represent as many French and English phonemes in the sentences as possible.

3.22 SPEAKERS

Ten English speakers and one French speaker were recorded. Among the English speakers, five were male and five were female. The French speaker was male. Eight of the English speakers (four males and four females) were long-time or native Western Canadian residents who spoke the standard Western Canadian dialect. One male English speaker had a British (Received Pronunciation) accent, and one female speaker had a Newfoundland accent. The French speaker was a native of Lausanne, Switzerland; he also spoke English, but, unlike the other ten speakers, he recorded French sentences. All of the speakers were judged to have normal speech.
3.23 RECORDING OF SPEECH SAMPLES

Speech samples were recorded in a soundproof booth with acoustic tiling, using a Scully model 280 tape recorder and an AKG D202 dynamic microphone. Each speaker produced nine sentences under each articulatory condition: Underarticulated (U condition), Normally Articulated (N condition), and Overarticulated (O condition). The order of recording was Normal, Overarticulated, Normal, Underarticulated. The second Normal condition allowed the speaker to return to his or her baseline after the Overarticulated condition, in preparation for recording the Underarticulated condition. Utterances from the second Normal condition were used neither in the MI calculations nor in the Listening Test. The Underarticulated condition was recorded last because the experimenters felt that it would be the most difficult condition to produce intentionally, and that recording the other conditions first might help the speaker form an idea of what was wanted.

The nature of the Overarticulated and Underarticulated conditions was not explained before the first condition (Normal) was recorded, because the Normal condition was intended to reflect the natural articulatory patterns of the speaker, and anticipation of the other conditions might have resulted in articulatory changes. Instructions for the Normal condition were simply to read the sentence through, without further prompting.
For the Overarticulated condition, speakers were asked to "exaggerate" their articulation and to "speak very clearly, as if for someone with a hearing loss". For the Underarticulated condition, the instruction was to "mumble". If the desired contrasts were still unclear to the speakers, the experimenter demonstrated the three conditions. In addition, subjects were asked to watch the VU meter of the tape recorder and to keep the deflection of its needle within a narrow range around the 0 dB mark. In this way, the average intensity of each sample was kept approximately equal across conditions and sentences. Using a metronome and a stopwatch, the speakers also practiced keeping their speaking rates approximately equal across conditions. Speakers found that monitoring intensity and timing across conditions was difficult, since their natural tendency was to increase the intensity and duration of utterances in order to achieve articulatory clarity. Each sentence was produced two or three times consecutively, until the speaker was satisfied with at least one utterance under each condition. Each speaker was permitted to rehearse as much as desired before recording commenced but, even so, most speakers reported that they found the task difficult.
The labelling of samples throughout the rest of this paper as Underarticulated, Overarticulated or Normally Articulated should therefore be taken to refer to the speakers' intentions rather than to the condition actually achieved.

3.3 DESIGN OF THE LISTENING TEST

3.31 PREPARATION OF THE LISTENING TEST TAPE

First, the best of the two or three tokens of each utterance was selected and isolated, based on the absence of hesitations, misarticulations and timing irregularities. Even so, the quality of the tokens varied across speakers and sentences, mainly because of unsuccessful rate control: generally, utterances intended to be underarticulated were shortest, while those intended to be overarticulated were longest. Because of the variable quality of the tokens, and because the length of the listening test had to be limited so that the listeners could maintain their concentration, a subset of the best recorded tokens was selected for the Listening Test and the MI computations. The basis of selection was similarity of utterance duration across the three conditions for each speaker, while retaining as many phonemes as possible in the sentence material. One speaker, however, was excluded because she was unable to produce contrasts of articulatory clarity to her own or to the experimenters' satisfaction.
Also, one speaker (Speaker 1) was included in spite of the variability of his utterances, because his productions represented extremes in timing differences, and it was desirable to discover the effect of this variability on the MI values and on the listener judgments.

The sentences selected are listed in Appendix B, and information about the speakers is given in Table I. Eventually, three sentences each from the French speaker (Subject 6) and from six English speakers (Subjects 0 to 5) were selected, for a total of 63 tokens (3 sentences x 3 conditions x 7 speakers). See Table II for the utterance durations and ranges of durations of the samples selected.

Speaker  Sex  Language  Dialect Area
S0       F    English   Western Canadian
S1       M    English   Received Pronunciation
S2       F    English   Western Canadian
S3       F    English   Newfoundland
S4       M    English   Western Canadian
S5       M    English   Western Canadian
S6       M    French    Lausanne, Switzerland
Table I. Information regarding the speakers selected.

                    Articulatory Condition
Speaker  Sentence   U      N      O      (Range)
S0       1          2.6    2.6    2.8    (0.2)
         2          2.4    2.5    2.9    (0.5)
         3          2.5    2.7    2.6    (0.2)
S1       1          1.6    2.1    2.8    (1.2)
         2          1.8    2.9    3.7    (1.9)
         3          2.0    2.5    3.5    (1.5)
S2       1          2.1    2.2    2.5    (0.4)
         2          2.1    2.5    2.6    (0.5)
         3          2.5    2.6    2.8    (0.3)
S3       1          2.3    2.3    2.2    (0.1)
         2          2.1    2.4    2.5    (0.4)
         3          2.3    2.5    2.5    (0.2)
S4       1          2.2    2.3    2.4    (0.2)
         2          2.3    2.3    2.6    (0.3)
         3          2.5    2.3    2.4    (0.2)
S5       1          2.7    3.0    2.5    (0.5)
         2          2.6    2.5    2.9    (0.4)
         3          2.9    2.4    2.7    (0.5)
S6*      1          1.9    1.9    2.0    (0.1)
         2          1.9    1.9    2.0    (0.1)
         3          1.9    2.0    2.0    (0.1)
* The durations listed for Speaker 6 are for the corresponding French sentences.
Table II. Duration in seconds of the utterances selected and, in parentheses, the duration difference (in seconds) between the shortest and longest token of each set of three.

Once the selection process was completed, each selected token was copied twice onto the listening test tape in pseudo-random order, each item and its duplicate being separated by at least one other item. The beginning of each utterance was separated from the beginning of the next by nine seconds. Since each utterance was approximately 2.5 seconds long, this resulted in about 6.5 seconds of silence between successive utterances, an amount which was found to be satisfactory in a pilot test.

So that the recording order could be checked, the speech samples were recorded on Channel 1 of the two-track tape, and identification numbers for each test item were recorded on Channel 2. When the tape was played to the listeners, only the speech samples on Channel 1 were heard, but by setting the recorder to play Channels 1 and 2 simultaneously, the experimenters could hear each item together with its identification number whenever they wished to identify a sample on the tape.

To summarize, the listening test tape consisted of 126 test items (3 sentences x 3 conditions x 7 speakers x 2 tokens of each utterance), plus ten practice items at the beginning and four dummy items (with response spaces on the answer sheet but no items recorded) at the end. The running time of the complete tape was 20.4 minutes.

3.32 LISTENERS

Ten English-speaking listeners, five male and five female, took part in the Listening Test. No French listeners were included.
The hearing of all listeners was tested beforehand using standard audiometric procedures. One male candidate listener was found to have a previously undetected hearing loss and was therefore replaced. Two of the listeners had no knowledge of French, but the remaining eight had some knowledge, ranging from elementary knowledge to good fluency. Table III provides information regarding the listeners used in the perceptual test.

Subject  Sex  Age (years)  Knowledge of French
L1       M    26           NONE
L2       F    25           SOME
L3       F    36           SOME
L4       F    26           SOME
L5       F    25           SOME
L6       F    33           SOME
L7       M    27           SOME
L8       M    29           SOME
L9       M    26           NONE
L10      M    22           SOME
Table III. Information regarding the listeners.

3.33 PROCEDURES FOR THE LISTENING TEST

The test tape was presented over Sennheiser HD 420 headphones. Listeners were asked to rate each utterance on a seven-point scale. An example of the response sheet is given in Appendix C, together with the instructions. The low (left) end of the scale was labeled "Underarticulated", the midpoint "Normal", and the high (right) end "Overarticulated". After some practice items had been presented, the listeners had the opportunity to stop the tape recorder and ask questions about these items, or about any part of the test, if they wished. Following this, the tape was rewound to the beginning, and the listeners were encouraged to continue through the entire test without stopping, if possible. The "dummy" items provided at the end of the tape were aimed at avoiding any end effects, such as rushing through in anticipation of finishing.
They consisted of items numbered on the answer sheet which were not actually presented on the tape.

3.4 THE MODULATION INDEX

3.41 DESCRIPTION OF THE INDEX

A program was developed to compute a measure of amplitude modulation depth in the digitized envelopes of the speech samples; the program is listed in Appendix D. Figure 7 illustrates the typical peaks and troughs found in a speech signal envelope. The program first identifies the peaks (a's) and troughs (b's) of the waveform envelope, and the amplitude and location of each peak and trough are stored. In addition, the highest peak (a_max) and the lowest trough (b_min) are determined, as well as their average, av = (a_max + b_min)/2.

[Figure 7. Labelling of peaks and troughs in the amplitude envelope of a speech sample.]

The ratios of trough-to-adjacent-peak amplitudes are then calculated, and the product of these ratios is taken to be the basic measure of amplitude modulation in the sample: as amplitude modulation depth increases (i.e. with greater articulatory clarity), the trough-to-peak ratios decrease, and therefore the MI decreases. The geometric average of this product is taken in order to normalize MI values for tokens of different lengths. The basic formula for the calculation of modulation depth is thus:

$$MI^2 = \sqrt[n]{\frac{av}{a_1}\cdot\frac{b_1}{a_1}\cdot\frac{b_1}{a_2}\cdot\frac{b_2}{a_2}\cdot\frac{b_2}{a_3}\cdots\frac{b_{n-1}}{a_{n-1}}\cdot\frac{b_{n-1}}{a_n}\cdot\frac{av}{a_n}}$$

or, more simply,

$$MI^2 = \sqrt[n]{\left(\frac{av\cdot b_1\cdot b_2\cdots b_{n-1}}{a_1\cdot a_2\cdots a_n}\right)^{2}}$$

The squaring of terms is then eliminated by taking the square root of both sides of the equation, to yield finally the MI:

$$MI = \sqrt[n]{\frac{av}{a_1}\cdot\frac{b_1}{a_2}\cdot\frac{b_2}{a_3}\cdots\frac{b_{n-1}}{a_n}}$$

where n is the number of peaks, av = (a_max + b_min)/2, the a's are the peak values, and the b's are the trough values.

3.42 DIGITIZATION OF TOKENS FOR MI CALCULATIONS

The use of various versions of the digitized envelope signal was investigated. The basic method is illustrated in Figure 8. After rectification, the signal was low-pass filtered with a cutoff frequency of either 25 Hz or 75 Hz. In some cases, this smoothing was followed by logarithmic amplification; in others, by linear amplification. Eventually, the method resulting in the least smoothing of the envelope was chosen, i.e. 75 Hz low-pass filtering followed by linear amplification. It was reasoned that if smoothing was kept to a minimum, loss of amplitude modulation in the digitized envelopes would be avoided.

The signal was sampled on a PDP-12 computer at 200 Hz and stored on LINC tape, using a set of programs developed by Lloyd Rice at UCLA. Once stored, the signals could be displayed on the oscilloscope screen, in wave (graphic) form or in numerical form. Each signal was inspected individually, and start and end points for the MI computation were chosen. The start point was always on the rising slope of the first peak of the utterance, and the end point was always on the falling slope of the last peak (see Figure 9). Trough amplitudes with negative values had to be avoided because the geometric averaging implicit in the MI formula cannot deal with them.
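As a minimal sketch of the computation described in Section 3.41, the Python functions below derive the MI from a list of envelope samples. They assume that the final MI is the n-th root of av·b_1·b_2···b_{n-1} divided by a_1·a_2···a_n, which equals the geometric mean of the 2n trough-to-adjacent-peak ratios when av stands in for the missing troughs before the first peak and after the last. The peak-picking rule in the hypothetical helper `find_peaks_and_troughs` is an assumption and is not the Appendix D program.

```python
import math
from typing import List, Sequence, Tuple


def find_peaks_and_troughs(env: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return peak amplitudes (a's) and the troughs (b's) lying between them.

    Hypothetical rule: a sample is a peak if it rises above its left
    neighbour and is not exceeded by its right neighbour; each trough is
    the minimum of the envelope between two consecutive peaks.
    """
    idx = [i for i in range(1, len(env) - 1) if env[i - 1] < env[i] >= env[i + 1]]
    peaks = [env[i] for i in idx]
    troughs = [min(env[p:q + 1]) for p, q in zip(idx, idx[1:])]
    return peaks, troughs


def modulation_index(peaks: Sequence[float], troughs: Sequence[float]) -> float:
    """MI = n-th root of (av * b_1 * ... * b_{n-1}) / (a_1 * ... * a_n)."""
    n = len(peaks)
    if n < 2 or len(troughs) != n - 1:
        raise ValueError("need n >= 2 peaks and exactly n - 1 troughs")
    if min(troughs) <= 0:
        # Zero or negative troughs make the geometric mean zero or undefined.
        raise ValueError("trough amplitudes must be positive")
    av = (max(peaks) + min(troughs)) / 2.0
    # Work in logarithms so that long utterances do not underflow the product.
    log_sum = math.log(av) + sum(map(math.log, troughs)) - sum(map(math.log, peaks))
    return math.exp(log_sum / n)


if __name__ == "__main__":
    envelope = [1.0, 5.0, 2.0, 6.0, 3.0, 4.0, 1.0]  # toy envelope samples
    a, b = find_peaks_and_troughs(envelope)
    print(a, b, round(modulation_index(a, b), 4))
```

Deeper troughs relative to the peaks lower every ratio and therefore lower the MI, which is the direction stated in Section 3.41. Working with logarithms is a design choice of this sketch: the product of many ratios below one would otherwise drift toward zero for long utterances.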
A trough amplitude of zero is also undesirable, since it would result in a calculated MI value of zero. For these reasons, each digitized envelope was manipulated through a program so that it contained no digitized trough amplitudes that were negative or equal to zero. At the same time, since the intensities of the samples were found to vary somewhat despite efforts to ensure uniformity through appropriate instructions to the speakers, each digitized envelope was adjusted upward or downward in such a way that the average amplitude of all its peaks had approximately the same value for all utterances. MI values were obtained and analyzed only for those utterances selected for the listening test.

[Figure 8. Speech sample digitization scheme. Blocks shown, from input (full speech) to output (digitized signal envelope): Revox tape recorder, full-wave rectifier, 75 Hz low-pass filter, linear amplifier, 80 Hz low-pass filter, A/D converter, PDP-12 computer.]

[Figure 9. An example of start and end point locations chosen for Modulation Index analysis of the amplitude envelope of a speech sample.]

CHAPTER FOUR
RESULTS

4.1 RESULTS OF THE LISTENING TEST

4.11 CONSISTENCY OF LISTENER JUDGMENTS

Listeners were asked to judge the articulatory clarity of the speech samples. They gave their judgments on an integer scale from 1 (Mumbled) to 7 (Overarticulated). The answer sheet given to the listeners is illustrated in Appendix C.
The correlation between the computed MI values and the perceptual data could therefore be checked. The perceptual data also provided a check of how well the speakers performed, since their intentions were not necessarily realized in every case. Each speaker's performance was evaluated by analyzing how the listeners judged his or her utterances. Each listener made 18 judgments about a given speaker's productions, since each speaker produced a total of 9 tokens (3 sentences x 3 conditions) and each token was presented twice. These 18 judgments were grouped, according to the three conditions intended by the speakers, into three sets of six judgments each.

The standard deviations of listener judgments for utterances produced by Speakers 0 to 6 are shown in Table IV as a function of the articulatory condition intended by the speaker, i.e. Underarticulated (U), Normally Articulated (N), and Overarticulated (O). Ten listeners made two judgments for each of three sentences per speaker, for a total of sixty judgments per speaker per condition. The listener judgments of speech samples from Speakers 2 and 6 were the least variable, and those of Speaker 5 were the most variable. These results indicate that listeners found utterances produced by Speaker 5 more difficult to judge than those of the other speakers. For this reason, the data from Speaker 5 were excluded from further analyses.
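The per-condition standard deviations of Table IV can be sketched as follows. Two assumptions are made: the sample standard deviation (n - 1 denominator) is used, since the text does not say which estimator was applied, and the "combined conditions" column is taken as the plain mean of the three per-condition values, which agrees with the printed rows (for Speaker 0, (0.58 + 0.86 + 0.72)/3 = 0.72). The function name and the ratings in the example are hypothetical.

```python
import statistics
from typing import Dict, List


def sd_by_condition(ratings: Dict[str, List[int]]) -> Dict[str, float]:
    """Sample s.d. of the 1-7 ratings for each intended condition (U, N, O),
    plus the plain mean of the three values under the key "combined"."""
    sds = {cond: statistics.stdev(values) for cond, values in ratings.items()}
    sds["combined"] = statistics.mean(sds[c] for c in ("U", "N", "O"))
    return sds


if __name__ == "__main__":
    # Hypothetical ratings for one speaker; the study had 60 per condition
    # (10 listeners x 3 sentences x 2 presentations).
    example = {"U": [2, 2, 3, 1, 2, 2], "N": [4, 4, 5, 3, 4, 4], "O": [6, 7, 6, 5, 6, 6]}
    print({k: round(v, 2) for k, v in sd_by_condition(example).items()})
```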
A similar analysis of standard deviations, this time as a function of the listener who made the judgments, revealed that judgments by Listener 6 were considerably less consistent than those by the other listeners. Table V shows the standard deviations of judgments for each listener across the articulatory conditions intended by the speakers (Speaker 5 excluded). In addition, the performance of the listeners themselves was evaluated by analyzing the consistency of their judgments for repeated items (hereafter "repeatability"). The results of the repeatability analyses are shown in Table VI as a function of the speaker whose utterances were judged, and in Table VII as a function of the listener who made the judgments. On the basis of these results, no further exclusions of speakers or listeners were necessary.

         Under-        Normal        Over-         Mean s.d.
         articulated   articulation  articulated   (combined conditions)
Speaker  s.d. (n=60)   s.d. (n=60)   s.d. (n=60)   (n=180)
0        0.58          0.86          0.72          0.72
1        0.83          0.70          0.60          0.71
2        0.41          0.72          0.59          0.57
3        0.56          0.76          0.91          0.74
4        0.67          0.61          1.02          0.77
5        0.73          0.87          0.96          0.85
6        0.53          0.51          0.64          0.56
Table IV. Standard deviations for listener judgments across the articulatory conditions intended by the speakers, for Speakers 0 to 6 inclusive. (The data were drawn from 10 listeners, with two judgments per listener per sentence.)

          Under-        Normal        Over-         Mean s.d.
          articulated   articulation  articulated   (combined conditions)
Listener  s.d. (n=42)   s.d. (n=42)   s.d. (n=42)   (n=126)
1         0.54          0.59          0.64          0.59
2         0.47          0.84          0.92          0.74
3         0.81          0.60          0.63          0.68
4         0.58          0.68          0.63          0.63
5         0.44          0.23          0.54          0.40
6         0.79          1.18          1.18          1.05
7         0.61          0.73          0.90          0.75
8         0.75          0.67          0.81          0.74
9         0.63          0.90          0.67          0.73
10        0.57          0.73          0.85          0.72
Table V. Standard deviations for listener judgments by ten listeners across the articulatory conditions intended by the speakers. (The data are drawn from Speakers 0 to 6 inclusive, and from two judgments per sentence for three sentences per speaker.)

         Number of differences greater   Percent differences greater
         than 1 between repeated         than 1 between repeated
Speaker  judgments                       judgments (n=81)
0        9                               11.1
1        8                               9.9
2        4                               4.9
3        5                               6.2
4        9                               11.1
6        11                              13.6
Table VI. Repeatability of listener judgments as a function of the speaker. (The data are drawn from three articulatory conditions, three sentences per speaker, and nine listeners, Listener 6 excluded.)

          Number of differences greater   Percent differences greater
          than 1 between repeated         than 1 between repeated
Listener  judgments                       judgments (n=54)
1         1                               1.9
2         8                               14.8
3         5                               9.3
4         6                               11.1
5         1                               1.9
7         3                               5.6
8         6                               11.1
9         10                              18.5
10        10                              18.5
Table VII. Repeatability of listener judgments across listeners. (The data are drawn from three articulatory conditions, and three sentences for each of six speakers: Speakers 0, 1, 2, 3, 4, and 6.)

For all other analyses of the perceptual data, the mean of the two judgment scores from each listener for each token replaced the individual scores.
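Repeatability in Tables VI and VII counts the repeated pairs whose two ratings differ by more than one scale point. A minimal sketch is shown below, together with the averaging of each pair used for the later analyses. The token-to-ratings mapping and the function names are hypothetical; the example only reproduces the arithmetic of one printed entry (9 discrepant pairs out of 81 gives 11.1%).

```python
from typing import Dict, Tuple

# Hypothetical layout: token id -> (first rating, repeat rating) on the 1-7 scale
Pairs = Dict[str, Tuple[int, int]]


def repeatability(pairs: Pairs, threshold: int = 1) -> Tuple[int, float]:
    """Return the number and percentage of pairs differing by more than `threshold`."""
    if not pairs:
        raise ValueError("no repeated judgments supplied")
    misses = sum(1 for first, repeat in pairs.values() if abs(first - repeat) > threshold)
    return misses, 100.0 * misses / len(pairs)


def mean_scores(pairs: Pairs) -> Dict[str, float]:
    """Replace each pair of repeated judgments by its mean (Section 4.11)."""
    return {token: (first + repeat) / 2.0 for token, (first, repeat) in pairs.items()}


if __name__ == "__main__":
    # 81 hypothetical pairs: 9 differ by 2 points, the other 72 by 1 point
    pairs = {f"t{i}": (4, 6) if i < 9 else (4, 5) for i in range(81)}
    count, pct = repeatability(pairs)
    print(count, round(pct, 1))
```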
4.12 COMPARISON OF LISTENERS' JUDGMENTS WITH SPEAKERS' INTENTIONS

In Figure 10, the means of the listener judgment scores are plotted separately for each speaker as a function of the articulatory condition intended by the speaker. The same data are arranged in Figure 11 to display the means of the listener judgment scores, all speakers pooled, separately for each listener. For every speaker, the mean judgment scores in each condition can be seen to correlate well with the speakers' intentions. When individual sentences were considered, however, some disagreement between the speakers' intentions and the listeners' perception became apparent. In addition, the agreement between speakers' intentions and listeners' perception was better for some speakers and some listeners than for others; Speakers 2 and 6, and Listeners 1, 3, and 5, were the best subjects in this respect.

For the three-way contrast of articulatory conditions (U vs. N vs. O), there were 162 sets of judgments, since 6 speakers each produced 3 sentences, each of which was judged by 9 listeners. The listeners' perception agreed with the speaker's intentions in 117 of the 162 sets (72%). Agreement for the two-way contrasts was 88% (143/162) for U vs. N, 83% (135/162) for N vs. O, and 99% (161/162) for U vs. O. Thus there was, in general, a monotonic relationship between the speakers' intentions and the listeners' perception of increasing articulatory clarity.

[Figures 10 and 11. Mean listener judgments as a function of the articulatory condition intended by the speaker, plotted separately for each speaker (Figure 10) and for each listener (Figure 11).]

4.13 EFFECT OF LANGUAGE ON LISTENER JUDGMENTS

The small size of the corpus limited the conclusions that could be drawn about the effects of language on the data. However, based on the data available, agreement between the speakers' intentions and the perceptual data was slightly better for the English than for the French speech. Of the 27 sets of listener judgments applying to the three French sentences (9 listeners x 3 sentences), 17 (63%) showed complete agreement between the speaker's intentions and the relative ranking of articulatory clarity assigned by the listeners for all three articulatory conditions. For the English speakers, there was 73% agreement (99/135 judgments) between speakers' intentions and listeners' judgments.

The familiarity of the listeners with French also seems to affect their judgments. When only the French utterances are considered, the two listeners who knew no French were in 100% agreement (6/6 sentences) with the speaker's intentions, whereas the 7 listeners with some knowledge of French agreed with the speaker's intentions for only 11 of 21 sentences (52%).
Perhaps the listeners with no French were able to concentrate more on the articulation of the speaker than the listeners with some French, since the latter group may have been distracted by attempts to decipher the semantic content of the sentences. The small sample size, however, does not allow statistical analysis of the observed difference or pursuit of this possible explanation.

4.14 EFFECT OF UTTERANCE DURATION VARIABILITY ON LISTENER JUDGMENTS

Speaker 1 was selected for inclusion in the Listening Test because he, of all the speakers, produced the greatest contrasts of utterance duration across the three articulatory conditions. However, speaker/listener agreement for Speaker 1 (22/27 judgments, or 81%) was not significantly different from the agreement for the other English speakers (77/108 judgments, or 71%) (z = 1.55, two-tailed test of proportions; not significant at the 0.05 level).

4.2 RESULTS OF THE MI CALCULATIONS

Various combinations of low-pass filtering (25 Hz, 75 Hz) and amplification (logarithmic vs. linear) were explored in digitizing the data for the MI computation. The parameters selected for the final measurements were low-pass filtering with a cutoff frequency of 75 Hz and linear amplification. Of all the variations explored, this combination retained the greatest amount of amplitude modulation in the digitized samples. It was reasoned that setting the cutoff frequency of the filter as high as possible (given equipment limitations) and using linear rather than logarithmic amplification should avoid the loss of contrast in amounts of amplitude modulation between samples that might otherwise result from excessive smoothing of the envelope.
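The reasoning about filter cutoffs can be illustrated with a small simulation. The sketch below is not the analog chain of Figure 8: it substitutes a single-pole (RC-style) digital low-pass filter for the analog filters, whose type and order are not given, omits the separate 80 Hz anti-aliasing stage, and uses a synthetic 1000 Hz tone amplitude-modulated at 20 Hz as a hypothetical stand-in for speech. It rectifies, smooths, keeps every k-th sample to approximate 200 Hz sampling, and measures modulation depth as (max - min)/(max + min). All function names are illustrative.

```python
import math
from typing import Dict, List, Sequence


def one_pole_lowpass(x: Sequence[float], cutoff_hz: float, fs: float) -> List[float]:
    """First-order (RC-style) low-pass filter, a stand-in for the analog filter."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = (1.0 / fs) / (rc + 1.0 / fs)
    out, state = [], 0.0
    for sample in x:
        state += alpha * (sample - state)
        out.append(state)
    return out


def envelope(x: Sequence[float], fs: float, cutoff_hz: float,
             out_rate: float = 200.0) -> List[float]:
    """Full-wave rectify, smooth, then keep every k-th sample (about out_rate Hz)."""
    smoothed = one_pole_lowpass([abs(s) for s in x], cutoff_hz, fs)
    step = max(1, round(fs / out_rate))
    return smoothed[::step]


def modulation_depth(env: Sequence[float]) -> float:
    """(max - min) / (max + min) of an envelope segment."""
    hi, lo = max(env), min(env)
    return (hi - lo) / (hi + lo)


def compare_cutoffs(fs: float = 8000.0, mod_hz: float = 20.0,
                    depth: float = 0.5) -> Dict[float, float]:
    """Depth retained after 25 Hz and 75 Hz smoothing of a synthetic AM tone."""
    n = int(fs)  # one second of signal
    sig = [(1.0 + depth * math.sin(2 * math.pi * mod_hz * i / fs))
           * math.sin(2 * math.pi * 1000.0 * i / fs) for i in range(n)]
    result = {}
    for fc in (25.0, 75.0):
        env = envelope(sig, fs, fc)
        steady = env[len(env) // 2:]  # skip the filter's start-up transient
        result[fc] = modulation_depth(steady)
    return result


if __name__ == "__main__":
    print({fc: round(d, 3) for fc, d in compare_cutoffs().items()})
```

Under these assumptions the 75 Hz setting keeps noticeably more of the 20 Hz modulation than the 25 Hz setting, which is the direction of the argument above; the specific depths depend entirely on the hypothetical filter and test signal.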
The MI values obtained with this analysis are listed in Table VIII as a function of the speaker's intention for each utterance. The MI values were then arranged into MI ranking orders (see Table IX). A smaller MI value indicates more amplitude modulation in the signal. Consequently, in accordance with the hypothesis that amplitude modulation increases with articulatory clarity, MI ranks were assigned in order of decreasing MI value. The order of increasing articulatory clarity intended by the speaker was assumed to correspond to the order UNO for each sentence. Each token retained its label according to the speaker's intentions, but the order of the three tokens per sentence could be rearranged according to their MI values. For example, a sentence with MI values of 0.789 for the U condition, 0.876 for the N condition, and 0.688 for the O condition would be assigned an MI ranking of NUO. As can be seen in Table IX, the MI ranking order failed to match the speakers' intentions consistently.

                    Modulation Index Values
Speaker  Sentence   Under-        Normal        Over-
                    articulated   articulation  articulated
0        1          0.819         0.906         0.923
         2          0.800         0.884         0.926
         3          0.936         0.865         0.832
1        1          0.886         0.848         0.897
         2          0.860         0.872         0.868
         3          0.871         0.847         0.902
2        1          0.767         0.586         0.683
         2          0.885         0.813         0.842
         3          0.779         0.808         0.828
3        1          0.894         0.762         0.860
         2          0.896         0.885         0.909
         3          0.860         0.844         0.788
4        1          0.821         0.674         0.699
         2          0.889         0.822         0.833
         3          0.831         0.795         0.791
6        1          0.842         0.875         0.778
         2          0.923         0.916         0.752
         3          0.839         0.868         0.751
Table VIII. Modulation Index values for utterances by six speakers. (The division of values by articulatory condition reflects the intentions of the speakers.)

Speaker  Sentence  Modulation Index ranking
0        1         ONU
         2         ONU
         3         UNO
1        1         OUN
         2         NOU
         3         OUN
2        1         UON
         2         UON
         3         ONU
3        1         UON
         2         OUN
         3         UNO
4        1         UON
         2         UON
         3         UNO
6        1         NUO
         2         UNO
         3         NUO
Table IX. Modulation Index rankings of utterances by articulatory condition. (The label for each token corresponds to the articulatory condition intended by the speaker. According to the experimental hypothesis, all MI ranking orders would be UNO if the speakers had been completely successful at producing the desired articulatory contrasts.)

Agreement between MI ranking and speakers' intentions was 22% (4/18 sentences) when the three-way contrast (U vs. N vs. O) was considered. For the three two-way contrasts, U vs. N, N vs. O, and U vs. O, agreement was 72% (13/18), 39% (7/18), and 61% (11/18), respectively. Sample sizes were too small to allow meaningful analysis of the effects of language or of utterance duration variability on MI values.

4.3 COMPARISON OF MI VALUES WITH PERCEPTUAL DATA

4.31 COMPARISON OF MI VALUES WITH LISTENER JUDGMENTS

MI ranking orders were compared with the ranking orders derived from the listener judgments; each token still retained its label according to the speaker's intentions. There were 18 possible comparisons, one for each of the three sentences produced by each of the six speakers. However, a way of selecting the sentences on which the listeners agreed most was needed. The only sentences of the original eighteen included in the analysis were those for which all nine listeners had previously shown agreement with the speakers' intentions, or those for which one listener out of nine had judged two tokens to have the same degree of articulatory clarity. In other words, at worst, one listener had judged two different tokens of the same sentence as equivalent. Five sentences (one each produced by Speakers 0 and 1, and three produced by Speaker 2) met these criteria in all three conditions. This meant that, in the three-way contrast U vs. N vs.
O, a perceptual ranking order of UNO could be assigned on the basis of the judgments of at least eight listeners. None of the five sentences had an MI ranking order of UNO (0% agreement). For the two-way contrasts U vs. N, N vs. O, and U vs. O, agreement was 71% (10/14 sentences), 27% (3/11 sentences), and 61% (11/18 sentences), respectively. Both the perceptual data and the MI values indicate that the speakers produced more contrast in articulatory clarity between the U and N conditions than between the N and O conditions.

In order to better evaluate the relationship between the MI values and the perceptual data, data from the best speakers (Speakers 2 and 6) were plotted against data from the best listeners (Listeners 1, 3, and 5) (see Figure 12). Unexpectedly, the trends suggested a non-monotonic relationship, with lower MI values for tokens judged by listeners to be over- or underarticulated than for tokens judged to be normally articulated. There was, however, a great deal of scatter in these plots, indicating that the relationship between the MI and these perceptual data is not particularly strong.

Figure 12. The relationship between Modulation Index values and listener judgments (horizontal axis). MI values are for utterances by Speakers 2 and 6. Panels: (a) Listener 1; (b) Listener 3; (c) Listener 5; (d) Listeners 1, 3 & 5 combined, where the line indicates the non-monotonic relationship suggested by the points. In each panel, u marks an utterance intended by the speaker to be Underarticulated, n one intended to be Normally Articulated, and o one intended to be Overarticulated.

4.32 COMPARISON OF MI VALUES WITH VISUAL IDENTIFICATION OF WAVEFORMS

In an informal blind test, the author attempted to identify visually the intended articulatory condition (U, N, or O) from the waveforms of randomly selected samples displayed on an oscilloscope after amplitude equalization. The visual identifications matched the speakers' intentions in only 11 of 30 trials. A confusion matrix (Table X) reveals that the author could readily discern the U vs. O contrast intended by the speakers, but not the three-way U vs. N vs. O contrast. The visual identifications were also compared to the perceptual ranking orders based on the listeners' judgments. For this comparison, a confusion matrix (Table XI) again showed that tokens judged by the listeners to have low or high articulatory clarity were those visually identified by the author as U or O, respectively, whereas tokens rated by the listeners in the middle of the articulatory clarity scale were visually identified as O most often, as U some of the time, but seldom as N. Three differences were observed in the shapes of the waveforms from the three articulatory conditions.
First, Overarticulated tokens, as hypothesized, showed the greatest depth of modulation, which could be clearly observed by comparing the excursions of adjacent peaks and troughs.

                                      Articulatory condition visually identified
                                      U      N      O
Articulatory condition          U     5      5      0
intended by the speakers        N     4      1      5
                                O     1      4      5

Table X. Confusion matrix comparing visual identification of articulatory condition with speaker intentions.

                                      Articulatory condition visually identified
                                      U      N      O
Articulatory condition based    U     55     47     6
on rankings of tokens           N     16     8      30
according to listener judgments O     10     34     62

Table XI. Confusion matrix comparing visual identification of articulatory condition with the articulatory condition suggested by the perceptual ranking order according to the listeners' judgments.

Second, whereas Underarticulated tokens appeared smooth and rounded, with a few main peaks and troughs, Overarticulated tokens tended to have jagged peaks, and the top of each main peak contained several smaller peaks and troughs. The Normally Articulated tokens had features of both of the other conditions. The third observable difference deserves more discussion. The Overarticulated tokens and, to a lesser extent, the Normally Articulated tokens had plateaus in their envelopes where the intensity dropped to zero for a short period of time. These plateaus corresponded to pauses between words.
This resulted in a "bottom-clipping" effect, since the downward excursion of troughs was limited to zero intensity. As can be seen from the formula for calculating the MI, the MI was not designed to take bottom clipping into account. Recall that, in the MI formula, $n$ is the number of peaks, $a_v = (a_{max} + b_{min})/2$, $a_{max}$ is the value of the highest peak, $b_{min}$ is the value of the lowest trough, the $a_i$ are the peak values, and the $b_i$ are the trough values. Samples were manipulated prior to MI calculation so that zero trough amplitudes would be eliminated. Therefore, since MI values depend on trough-to-peak amplitude ratios, if trough amplitudes are limited by bottom clipping while peak amplitudes are not, MI values will be artificially elevated for tokens containing pauses. This elevation of MI values would apply mainly to tokens that are more clearly articulated, for which low MI values are expected according to the hypothesis motivating this study. It is possible, then, that this elevation of MI values weakened the apparent relationship between the MI ranking order and the perceptual ranking order by reducing contrasts between MI values for tokens judged by the listeners to differ in articulatory clarity. This possibility was explored with a subset of the data.
It was reasoned that MI values would be less affected by the limited excursion of troughs if the MI formula were arithmetic instead of geometric. The MI formula was therefore modified to use differences between peak and trough values rather than ratios. The modified MI formula is:

$$MI_{mod} = \frac{1}{n}\left[a_v - a_n + \sum_i (b_i - a_i)\right], \qquad a_v = \tfrac{1}{2}\left(a_{max} + b_{min}\right)$$

MI was then recalculated for a subset of tokens that contrasted in the number of pauses they contained. The resulting MI ranking orders are shown in Table XII.

Speaker  Sentence  MI ranking  MImod ranking
0        1         ONU         ONU
2        2         UON         UON
3        1         UON         UON
4        3         UNO         UON
6        1         NUO         NUO

Table XII. Comparison of rank orders according to the original MI values with rankings according to MImod. Peak/trough ratios are the basis for MI calculations; peak/trough differences are the basis for MImod scores.

The modification does not change the MI ranking orders significantly. For the two sentences for which there is a change in ranking (Speaker 3, Sentence 1 and Speaker 4, Sentence 3), the separation in the original MI values between the two conditions that reversed was very small, and consequently the changes in rank cannot be considered significant. In most cases, this modification improved the separation between MI values calculated for tokens of contrasting articulatory clarity. Even so, the lack of improvement in the rank ordering of conditions suggests that some other method of dealing with bottom clipping, such as elimination of pauses from the tokens analyzed, may be a better solution.
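The rank orders in Tables IX and XII follow the single rule stated in Section 4.2: the three tokens of a sentence are ordered by decreasing MI value, so that the token with the least modulation comes first. A minimal Python sketch of that rule, applied to the Table VIII values, reproduces the Table IX rankings and the 4/18 three-way agreement. The helper names are illustrative, and the MI computation itself is not implemented here.

```python
def mi_ranking(mi_by_condition):
    """Order condition labels by decreasing MI value (least modulation first)."""
    return "".join(sorted(mi_by_condition, key=mi_by_condition.get, reverse=True))


# Table VIII: speaker -> one (U, N, O) tuple of MI values per sentence.
TABLE_VIII = {
    0: [(0.819, 0.906, 0.923), (0.800, 0.884, 0.926), (0.936, 0.865, 0.832)],
    1: [(0.886, 0.848, 0.897), (0.860, 0.872, 0.868), (0.871, 0.847, 0.902)],
    2: [(0.767, 0.586, 0.683), (0.885, 0.813, 0.842), (0.779, 0.808, 0.828)],
    3: [(0.894, 0.762, 0.860), (0.896, 0.885, 0.909), (0.860, 0.844, 0.788)],
    4: [(0.821, 0.674, 0.699), (0.889, 0.822, 0.833), (0.831, 0.795, 0.791)],
    6: [(0.842, 0.875, 0.778), (0.923, 0.916, 0.752), (0.839, 0.868, 0.751)],
}

rankings = {
    (speaker, sentence): mi_ranking(dict(zip("UNO", values)))
    for speaker, rows in TABLE_VIII.items()
    for sentence, values in enumerate(rows, start=1)
}

# Three-way agreement: the MI ranking matches the intended order UNO.
three_way = sum(r == "UNO" for r in rankings.values())

print(mi_ranking({"U": 0.789, "N": 0.876, "O": 0.688}))     # NUO
print(f"three-way agreement: {three_way}/{len(rankings)}")  # three-way agreement: 4/18
```

Because the rule depends only on the order of the three values, any rescaling of the MI that preserves order (as the MImod comparison in Table XII probes) changes a ranking only where two values are close.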
CHAPTER FIVE

DISCUSSION AND CONCLUSIONS

The purpose of this study was to explore a method for estimating the intelligibility of different speakers as affected by the articulatory clarity of their speech. The basic premise behind this computed measure, the MI, is that the amplitude envelope of the waveform for intelligible speech is characterized by greater amplitude modulation than the amplitude envelope for less intelligible speech; consequently, a measure of amplitude modulation should provide a measure of speech intelligibility. This idea has previously been applied to the more general case of speech intelligibility in listening spaces by Houtgast, Steeneken, and their colleagues (Houtgast and Steeneken, 1973, 198-1; Houtgast, Steeneken and Plomp, 1980; Plomp, Houtgast and Steeneken, 1980; Steeneken and Houtgast, 1980; van Reitschote, Houtgast and Steeneken, 1981, 1984; Steeneken and Houtgast, 1983). These authors developed the Modulation Transfer Function as the basis for the second version of the Speech Transmission Index. In room acoustics, the amplitude spectrum of a speech sample may be compared at source and receiver, that is, before and after passage through a speech transmission system. Reductions in amplitude modulation, and the corresponding reductions in intelligibility, have several causes, in particular noise, reverberation, and the filtering effect of the system.
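The modulation-transfer idea summarized above can be sketched in a few lines of Python. The sketch follows the usual definitions in the MTF literature cited here: the modulation index of an intensity envelope at modulation frequency $F$ is twice the magnitude of its $F$ component divided by its zero-frequency component; the MTF is the ratio of output to input modulation index; and for exponential reverberant decay with reverberation time $T$ plus stationary noise, $m(F) = \left[1 + (2\pi F T/13.8)^2\right]^{-1/2}\left[1 + 10^{-(S/N)/10}\right]^{-1}$, where 13.8 is $\ln(10^6)$, the intensity ratio of a 60 dB decay. The function names are hypothetical, the sketch omits the octave-band weighting and corrections of the full STI, and it is not the MI used in this study.

```python
import cmath
import math


def modulation_index(intensity, fs, f_mod):
    """Modulation index of an intensity envelope at f_mod (Hz): twice the magnitude
    of its f_mod Fourier component divided by its zero-frequency component."""
    dc = sum(intensity)
    component = sum(v * cmath.exp(-2j * math.pi * f_mod * k / fs)
                    for k, v in enumerate(intensity))
    return 2.0 * abs(component) / dc


def measured_mtf(env_in, env_out, fs, f_mod):
    """Reduction factor: output modulation index over input modulation index."""
    return modulation_index(env_out, fs, f_mod) / modulation_index(env_in, fs, f_mod)


def predicted_mtf(f_mod, rt60_s=0.0, snr_db=math.inf):
    """Reduction factor for exponential reverberant decay and stationary noise."""
    reverb = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_mod * rt60_s / 13.8) ** 2)
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb * noise


def transmission_index(m):
    """STI-style mapping: apparent S/N clipped to +/-15 dB, scaled to 0..1."""
    m = min(max(m, 1e-9), 1.0 - 1e-9)  # keep the logarithm finite
    snr_app = min(max(10.0 * math.log10(m / (1.0 - m)), -15.0), 15.0)
    return (snr_app + 15.0) / 30.0


# Usage: a 4 Hz intensity envelope, then the same envelope plus stationary noise of
# equal mean intensity (0 dB S/N). Two seconds span a whole number of 4 Hz cycles.
fs, f_mod = 1000, 4.0
env_in = [1.0 + 0.9 * math.cos(2 * math.pi * f_mod * k / fs) for k in range(2 * fs)]
env_out = [v + 1.0 for v in env_in]

m_measured = measured_mtf(env_in, env_out, fs, f_mod)   # about 0.5
m_predicted = predicted_mtf(f_mod, snr_db=0.0)           # 0.5
print(m_measured, m_predicted, transmission_index(m_predicted))
```

In the example, stationary noise with the same mean intensity as the signal halves the modulation index, so the measured and predicted reduction factors agree at 0.5.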
In the second version of the STI, Houtgast and Steeneken (Houtgast and Steeneken, 1973, 1985; Steeneken and Houtgast, 1980, 1983) used a simple artificial signal consisting of a sum of sinusoidally modulated bands of pink noise as the input signal for measuring the Modulation Transfer Function. The signal is thus shaped across the frequency range of 125 to 8000 Hz to resemble an average speech amplitude spectrum, modulated at the modulation frequencies found in natural speech. In contrast, when comparing the intelligibility of individual speakers, natural speech must be used, since it is not the average properties of speech that are of interest. This makes the task of the proposed measure, the MI, somewhat more complicated than that of the Modulation Transfer Function. The modulation frequencies in natural speech are unlikely to remain constant across speakers and across speech samples, and a measure using natural speech must also be able to cope with intensity and timing differences. In order to minimize the effects of these other factors in the present study, a design was used in which speakers attempted to produce the same sentences in three different ways: Underarticulated (or mumbled), Normally Articulated, and Overarticulated. Efforts were made to minimize differences in the intensity and duration of tokens through instructions to the speakers, selection of the tokens with the least variability in duration across articulatory conditions, and manipulation of samples to equalize intensities.
Even so, contaminants such as intensity and timing variations within the tokens remained. One source of error in this study therefore remained: the speakers, or more specifically, the type of speech they produced. Although speakers were allowed rehearsal time and given feedback on the adequacy of their productions, producing the articulatory contrasts proved difficult, and some speakers were more successful than others. This problem was compounded when the MI values were checked against perceptual data. Normal-hearing listeners were asked to rate the speakers' utterances for the articulatory clarity they perceived. Results indicate that the listeners were fairly successful in this respect; much more so, in fact, than the MI. However, some listeners were much better (and more consistent) than others at guessing the speakers' intentions. Perhaps some of the listeners were responding to additional cues in the speech tokens (e.g., timing or intensity differences) which the MI was not designed to detect. The observation that listeners with no knowledge of French seemed better at judging the articulatory clarity intended by a French speaker than listeners with some knowledge of French suggests that the listeners could be distracted by factors other than those of interest to the experimenters.
In spite of all the difficulties encountered, a relationship was observed between the MI values (calculated for the utterances of the best speakers) and the judgments of the best listeners (see Figure 12). The trend was toward a non-monotonic relationship, with lower MI values for tokens judged to be Underarticulated or Overarticulated than for tokens judged to be Normally Articulated. The nature of this relationship presents a problem. According to the non-monotonic curve suggested by the distribution of the points, low MI values correlate with both extremely high and extremely low articulatory clarity, whereas high MI values correlate with average articulatory clarity. Clearly, some modifications to the measure are required to establish a monotonic relationship between MI values and articulatory clarity if the MI is to be a useful speech intelligibility rating tool. In order to discover what some of these modifications should be, the experimenter studied the envelopes of the tokens displayed on an oscilloscope. Several cues served to identify the articulatory conditions. As expected, the modulation depth observed in the tokens increased with the articulatory clarity perceived by the listeners. In addition to this expected contrast, however, the number of pauses between words increased as perceived articulatory clarity increased. Although, according to Picheny, Durlach and Braida (1985, 1986), the number of pauses is a factor contributing to speech intelligibility, for the purposes of MI calculations of amplitude modulation the pauses were contaminants.
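One way to act on this observation, as suggested later in this chapter, is to exclude pauses before computing the MI. The following Python sketch assumes a digitized intensity envelope; the silence threshold (2% of the envelope maximum) and the minimum pause length (50 ms) are hypothetical parameters, not values used in this study.

```python
def find_pauses(env, fs, rel_threshold=0.02, min_pause_s=0.05):
    """Return (start, stop) sample ranges where the envelope stays at or below
    rel_threshold * max(env) for at least min_pause_s seconds."""
    threshold = rel_threshold * max(env)
    min_len = max(1, int(round(min_pause_s * fs)))
    pauses, start = [], None
    for i, value in enumerate(env):
        if value <= threshold:
            if start is None:
                start = i              # a quiet run begins
        elif start is not None:
            if i - start >= min_len:
                pauses.append((start, i))
            start = None               # the quiet run ended (kept only if long enough)
    if start is not None and len(env) - start >= min_len:
        pauses.append((start, len(env)))
    return pauses


def remove_pauses(env, pauses):
    """Return the envelope with all pause samples removed."""
    drop = set()
    for start, stop in pauses:
        drop.update(range(start, stop))
    return [v for i, v in enumerate(env) if i not in drop]


# Usage: two "words" separated by a 100 ms silent gap, sampled at 100 Hz.
fs = 100
env = [1.0] * 30 + [0.0] * 10 + [0.5] * 30
gaps = find_pauses(env, fs)          # [(30, 40)]
trimmed = remove_pauses(env, gaps)   # 60 samples, no zero-intensity plateau
```

Splicing the remaining segments together creates abrupt junctions between words; under the same assumptions, computing the index separately for each pause-free segment and averaging would be an alternative.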
Since pauses are brief periods of silence, the intensity at a pause descends to zero and then remains there until the next word begins. This results in a "bottom clipping" effect which is not accounted for in the calculation of the MI. A modification to the MI formula designed to reduce the effect of bottom clipping on MI scores was tested without satisfactory results. A better solution may be to select speech tokens that do not contain pauses for MI analysis, or to add a correction factor to MI values obtained from speech tokens containing pauses. A final modification to the MI, which we were unable to explore owing to limitations in equipment and time, was frequency-dependent analysis. French and Steinberg (1947) were the first to employ frequency-dependent analysis, when calculating the Articulation Index. They divided the speech spectrum into twenty frequency bands, each of which was calculated to contribute equally to intelligibility in the ideal case. A more modern approach has been to employ octave bands or third-octave bands, and to compare the signal within each of the selected bands before and after passage through a transmission system. The advantage of this division of the signal is that frequency-specific effects, such as low-pass filtering, interfering narrow-band noise, or frequency-specific amplitude modulation, can be measured with more precision.
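Frequency-dependent analysis starts by defining the bands. The short Python sketch below computes octave and third-octave band edges over 125 to 8000 Hz using exact base-2 spacing; the standard nominal labels (such as 160 Hz or 315 Hz) are rounded versions of these centers. Filtering the signal into each band and computing an MI per band is the untested extension suggested above and is not shown.

```python
def band_edges(center_hz, bands_per_octave=1):
    """Lower and upper edges of a 1/bands_per_octave-octave band (base-2 spacing)."""
    half_width = 2.0 ** (1.0 / (2 * bands_per_octave))
    return center_hz / half_width, center_hz * half_width


# Octave-band centers from 125 Hz to 8 kHz.
OCTAVE_CENTERS = [125.0 * 2 ** k for k in range(7)]
# Third-octave centers over the same range, spaced by 2**(1/3) around 1 kHz.
THIRD_OCTAVE_CENTERS = [1000.0 * 2 ** (k / 3) for k in range(-9, 10)]

for fc in OCTAVE_CENTERS:
    lo, hi = band_edges(fc)
    print(f"{fc:6.0f} Hz octave band: {lo:7.1f} to {hi:7.1f} Hz")
```

Adjacent bands share edges by construction, so the seven octave bands (or nineteen third-octave bands) tile the 125 to 8000 Hz range without gaps or overlaps.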
Such frequency-specific effects may be important to the intelligibility of the signal, but their significance may be lost in wide-band analysis, which averages the frequency band(s) of interest with the other, unaffected bands.

To summarize, the concept of rating the speech intelligibility of speakers by quantifying amplitude modulation in speech amplitude envelopes seems promising, based on the results of this exploratory study. However, the MI needs modification before it becomes a useful tool. In particular, the effects of inter-speaker variations in timing and intensity need to be adequately overcome. Although some suggestions for modifications to the MI have been presented here, further research will be necessary to discover the most effective form of the Index.

BIBLIOGRAPHY

Ahlstrom, C. and Humes, L.E. (1983). "Normal modulation transfer functions in noise using temporal probe tones," ASHA 25(10):67.
Ahlstrom, C. and Humes, L.E. (1984). "Psychoacoustic modulation transfer functions of impaired ears using probe tones," ASHA 26(10):175.
Ahlstrom, C., Boney, S.F. and Humes, L.E. (1984). "Modulation transfer functions of hearing aids," ASHA 26(10):124.
Ananthapadmanabha, T.V. (1982). "Intelligibility carried by speech source functions: implications for a theory of speech perception," Speech Transmission Laboratory, Quarterly Report, STL-QPSR 4/1982.
Ando, K. and Canter, G.J. (1969). "A study of stress in some English words by deaf and hearing speakers," Lang. Speech 12:247-255.
ANSI (1969). American National Standard methods for the calculation of the articulation index (ANSI S3.5-1969). New York: ANSI.
Black, J.W. (1957). "Multiple choice intelligibility tests," J. of Speech and Hearing Dis. 22(2):213-240.
Brannon, J.B. (1966). "The speech production and spoken language of the deaf," Lang. Speech 12:247-255.
Crum, M.A. (1974). "Effects of speaker to listener distance upon speech intelligibility in reverberation and noise," Doctoral dissertation, Northwestern University.
Dareham, J.R. (1986). "Measuring speech intelligibility using RASTI," Sound Video Contractor 3(11):50, 52, 54, 56-57.
Dirks, D.D., Bell, T.S., Rossman, R.N. and Kincaid, G.E. (1986). "Articulation index predictions of contextually independent words," J. Acoust. Soc. of America 80:82-92.
Doyle, J. (1987). "Reliability of audiologists' ratings of the intelligibility of hearing impaired children's speech," Ear and Hearing 8(3):170-174.
Dunn, H.K. and White, S.D. (1940). "Statistical measurements on conversational speech," J. Acoust. Soc. of America 11:278-288.
Fant, G. (1960). Acoustic Theory of Speech Production. Mouton, The Hague. Second Edition 1970.
French, N.R. and Steinberg, J.C. (1947). "Factors governing the intelligibility of speech sounds," J. Acoust. Soc. of America 19:90-119.
Hirsh, I.J., Reynolds, E.G. and Joseph, M. (1954). "Intelligibility of different speech materials," J. Acoust. Soc. of America 26:530-538.
Houtgast, T. and Steeneken, H.J.M. (1971). "Evaluation of speech transmission channels by using artificial signals," Acustica 25:355-367.
Houtgast, T. and Steeneken, H.J.M. (1973). "The modulation transfer function in room acoustics as a predictor of speech intelligibility," Acustica 28:66-73.
Houtgast, T. and Steeneken, H.J.M. (1985). "A review of the MTF concept in room acoustics and its use for estimating speech intelligibility in auditoria," J. Acoust. Soc. of America 77:1069-1077.
Houtgast, T. and Steeneken, H.J.M. (1984). "A multi-language evaluation of the RASTI-method for estimating speech intelligibility in auditoria," Acustica 54:185-199.
Houtgast, T., Steeneken, H.J.M. and Plomp, R. (1980). "Predicting speech intelligibility in rooms from the modulation transfer function I. General room acoustics," Acustica 46:60-72.
Hudgins, C.V. (1960). "The development of communication skills among profoundly deaf children in an auditory training programme," In I.R. Ewing (Ed.), Modern Educational Treatment of Deafness. Manchester (Eng.): Manchester University Press.
Hudgins, C.V. and Numbers, M. (1942). "An investigation of the intelligibility of the speech of the deaf," Genet. Psychol. Monogr. 25:289-392.
Humes, L.E., Dirks, D.D., Bell, T.S., Ahlstrom, C. and Kincaid, G.E. (1986). "Application of the Articulation Index and the Speech Transmission Index to the recognition of speech by normal-hearing and hearing impaired listeners," J. of Speech and Hearing Res. 29:447-462.
John, J.E.J. and Howarth, J.N. (1965). "The effect of time distortions on the intelligibility of deaf children's speech," Lang. Speech 8:127-134.
Kamm, C.A., Dirks, D.D. and Bell, T.S. (1985). "Speech recognition and the articulation index for normal and hearing-impaired listeners," J. Acoust. Soc. of America 77:281-288.
Klein, W. (1971). "Articulation loss of consonants as a basis for the design and judgment of sound reinforcement systems," J. Audio Eng. Soc. 19(11):920.
Kondraske, G.V. (1985). "Quantitative assessment of speech function," Proc. 7th Annual Conference of the IEEE Medical Biological Society, Chicago, Illinois, September 27-30, 1985. Volume 2:671-674.
Kryter, K.D. (1962a). "Methods for the calculation and use of the articulation index," J. Acoust. Soc. of America 34:1689-1697.
Kryter, K.D. (1962b). "Validation of the articulation index," J. Acoust. Soc. of America 34:1698-1702.
Kryter, K.D. and Ball, J.H. (1964). "SCIM - A meter for measuring the performance of speech communication systems," Decision Science Laboratory, Electronic Systems Division, Air Force Systems Command. Report ESD-TDR-64-674.
Licklider, J.C.R., Bisberg, A. and Schwarzlander, H. (1959). "An electronic device to measure the intelligibility of speech," Proceedings of the National Electronics Conference 15:329.
Lochner, J. and Burger, J. (1964). "The influence of reflections on auditorium acoustics," J. Sound Vibr. 4, 426-454.
Lundin, F.J. (1982). "The influence of room reverberation on speech - an acoustical study of speech in a room," Speech Transmission Laboratory, Quarterly Report, STL-QPSR 4/1982.
Lundin, F.J. (1986). "A study of speech intelligibility over a public address system," Speech Transmission Laboratory, Quarterly Report, STL-QPSR 1/1986.
Metz, D.E., Samar, V.J., Schiavetti, N., Sitler, R. and Whitehead, R.L. (1985). "Acoustic dimensions of hearing impaired speakers' intelligibility," J. of Speech and Hearing Res. 28:345-355.
Miller, G.A. and Nicely, P.E. (1955). "An analysis of perceptual confusions among some English consonants," J. Acoust. Soc. of America 27:338-352.
Monsen, R.B. (1978). "Toward measuring how well deaf children speak," J. of Speech and Hearing Res. 21:197-219.
Morse, P.M. and Ingard, U.K. (1968). Theoretical Acoustics. New York: McGraw-Hill.
Newman, P.W. (1979). "Appraisal of articulation," Part II in Darley, F.L. (Editor), Evaluation of Appraisal Techniques in Speech and Language Pathology. Don Mills, Ontario: Addison-Wesley Publishing Company.
Pavlovic, C.V. (1987). "Derivation of primary parameters and procedures for use in speech intelligibility predictions," J. Acoust. Soc. of America 82:413-422.
Pavlovic, C.V. and Studebaker, G.A. (1984). "An evaluation of some assumptions underlying the articulation index," J. Acoust. Soc. of America 75:1606-1612.
Peutz, V.M.A. (1971). "Articulation loss of consonants as a criterion for speech transmission in a room," J. Audio Eng. Soc. 19(11):915.
Picheny, M.A., Durlach, N.I. and Braida, L.D. (1985). "Speaking clearly for the hard of hearing I: Intelligibility differences between clear and conversational speech," J. of Speech and Hearing Res. 28:96-103.
Picheny, M.A., Durlach, N.I. and Braida, L.D. (1986). "Speaking clearly for the hard of hearing II: Acoustic characteristics of clear and conversational speech," J. of Speech and Hearing Res. 29:434-446.
Plomp, R., Houtgast, T. and Steeneken, H.J.M. (1980). "Predicting speech intelligibility in rooms from the modulation transfer function II. Mirror image computer model applied to rectangular rooms," J. Acoust. Soc. of America 46:73.
Rettinger, M.A. (1968). Acoustics: Room Design and Noise Control. New York: Chemical Publishing Co., Inc.
Schroeder, M.R. (1981). "Modulation transfer functions," Acustica 49:179.
Steeneken, H.J.M. and Agterhuis, E. (1978). "Description of STIDAS II-c (Speech Transmission Index Device using Artificial Signals) Part I," Report 1978-19, IZF-TNO, Soesterberg, The Netherlands.
Steeneken, H.J.M. and Houtgast, T. (1980). "A physical method for measuring speech-transmission quality," J. Acoust. Soc. of America 67:318-326.
Steeneken, H.J.M. and Houtgast, T. (1983). "The temporal envelope spectrum of speech and its significance in room acoustics," In Proceedings of the Eleventh International Congress on Acoustics 7:85-88.
Steeneken, H.J.M. and Houtgast, T. (1985). "RASTI: A tool for evaluating auditoria," Bruel & Kjaer Tech. Rev. 3.
van Reitschote, H.F., Houtgast, T. and Steeneken, H.J.M. (1981). "Predicting speech intelligibility in rooms from the modulation transfer function IV: A ray-tracing computer model," Acustica 49:245-252.
van Reitschote, H.F., Houtgast, T. and Steeneken, H.J.M. (1984). "Predicting speech intelligibility in rooms from the modulation transfer function V: The merits of a ray-tracing model versus general room acoustics," Acustica 53:72-78.
Voelker, C.H. (1938). "An experimental study of the comparative rate of utterance of deaf and normal hearing speakers," Am. Ann. Deaf 83:274-284.
Williams, C.E. and Hecker, M.H.L. (1968). "Relation between intelligibility scores for types of speech distortion," J. Acoust. Soc. of America 44:1002.

APPENDIX A

INTELLIGIBILITY TESTS BASED ON LISTENER JUDGMENTS

A. TESTS CITED BY BLACK (1957)

Bell Telephone's Tests
reference: Fletcher, H. and Steinberg, J.C. (1929). "Articulation testing methods," Bell Syst. Tech. J. 8:806-854.
Harvard's Phonetically Balanced (PB) Tests
references: a) Egan, J.P. (1944). "Articulation testing methods II," Psycho-Acoustic Laboratory, Harvard University, Nov. OSRD Rept. no. 3820.
b) Egan, J.P. (1948). "Articulation testing methods," Laryngoscope 58:955-991.
Voice Communication Laboratory's Test
reference: Haagen, C.H. (1944). "Intelligibility measurement: Techniques and procedures used by the Voice Communication Laboratory," Psychological Corporation, New York, OSRD Rept. no. 3748.

B. TESTS REVIEWED IN NEWMAN'S (1979) CHAPTER

Drumwright, A.F. (1971). The Denver Articulation Screening Examination (DASE). Ladoca Project and Publishing Foundation, Inc.
Fisher, H.A. and Logemann, J.A. (1971). The Fisher-Logemann Test of Articulation Competence. Houghton Mifflin Company.
Fudala, J.B. (1963). Arizona Articulation Proficiency Scale. Western Psychological Services (later editions 1970, 1974).
Goldman, R. and Fristoe, M. (1969). Goldman-Fristoe Test of Articulation (GFTA). American Guidance Service, Inc. (later edition published 1972).
Hejna, R.F. (1968). Developmental Articulation Test. Speech Materials.
Irwin, O.C. (1972). "Integrated articulation test," In Orvis C. Irwin, Communication Variables of Cerebral Palsied and Mentally Retarded Children. Springfield, Ill.: Charles C. Thomas.
McDonald, E.T. (1964). A Deep Test of Articulation, Picture Form. Stanwix House, Inc. (The Sentence and Screening Forms of the same test are also available from the same publisher.)
Mecham, J.L.J. and Jones, J.D. (1970). Screening Speech Articulation Test (SSAT). Communication Research Associates, Inc.
Pendergast, K., Dickey, S., Selmar, J.W. and Soder, A.L. (1969). Photo Articulation Test (PAT). Interstate Printers and Publishers, Inc.
Templin, M.C. and Darley, F.L. (1969). The Templin-Darley Tests of Articulation. University of Iowa Bureau of Educational Research and Service, Second edition.
Toronto, A.S. (1977). Southwestern Spanish Articulation Test (SSAT). Academic Tests, Inc.
Van Riper, C. and Erickson, R.L. (1973). Predictive Screening Test of Articulation (PSTA). Western Michigan University, Continuing Education Office, Third Edition.

C. TESTS CITED BY WILLIAMS AND HECKER (1968)

Harvard PB-Word Intelligibility Test and Harvard Sentence Tests
reference: Egan, J.P. (1948). "Articulation testing methods," Laryngoscope 58:955-991.
Fairbanks Rhyme Test
reference: Fairbanks, G. (1958). "Test of phonemic differentiation: The Rhyme Test," J. Acoust. Soc. of America 30:596-600.
Modified Rhyme Test
reference: House, A.S., Williams, C.E., Hecker, M.H.L. and Kryter, K.D.
(1965) " A r t i c u l a t i o n t e s t i n g methods: Consonantal d i f f e r e n t i a t i o n with a c l o s e d - r e s p o n s e s e t , " J.Acoust. Soc. Amer. 37:158-166. APPENDIX B THE ENGLISH AND FRENCH SENTENCES SELECTED ENGLISH 1. P a t t y put f i v e pennies i n her purse. 2. Shallow seas are not shark i n f e s t e d . 3. Green grapevines grow i n country gardens. FRENCH 1. II n'y a jamais de fumee sans f e u . 2. I I joue du trombone tous l e s l u n d i s . 3. Un grand c o c a - c o l a sans g l a c o n s . 89 APPENDIX C LISTENING TEST INSTRUCTIONS AND ANSWER SHEET PART 1 - LISTENING TEST ANSWER SHEET Subject ID: L In t h i s experiment, you w i l l be h e a r i n g a number of sentences spoken by s i x E n g l i s h speakers and one French speaker. Sometimes the sentences are mumbled ( u n d e r a r t i c u l a t e d ) , sometimes they are a r t i c u l a t e d normally, and sometimes they are o v e r a r t i c u l a t e d . Your task i s to a s s i g n a number on a seven p o i n t s c a l e f o r each sentence i n d i c a t i n g how i t sounds to you. For example, i f you were f a i r l y c e r t a i n t h a t a sentence was normally a r t i c u l a t e d , you would c i r c l e the number 4 as shown below: mumbled normal o v e r a r t i c u l a t e d 1 4 1 . 1 2 3 4 5 6 7, whereas i f you thought t h a t item 142 was mumbled, you might c i r c l e 1 as shown below: mumbled normal o v e r a r t i c u l a t e d 142. 1 2 3 4 5 6 7 The s c a l e r e p r e s e n t s a continuum between mumbled (number 1) and o v e r a r t i c u l a t e d (number 7). You may choose any number which you th i n k i s a p p r o p r i a t e to what you hear. 
Please try to attend only to whether the sentence sounds mumbled, normally articulated, or overarticulated, and ignore other variables such as differences in recording or voice quality, speed of articulation, language spoken, or sentence content. Please begin by listening to the first ten sentences without marking the paper, and then stop the tape and ask questions if you need clarification about any part of the task. Following this, the tape will be rewound to the beginning and you will be asked to listen to the entire tape and mark your answer sheet, without rewinding or stopping the tape if possible.

Subject ID: ________

          mumbled          normal          overarticulated
  1.      1    2    3    4    5    6    7
  2.      1    2    3    4    5    6    7
  3.      1    2    3    4    5    6    7
  4.      1    2    3    4    5    6    7
  5.      1    2    3    4    5    6    7
  6.      1    2    3    4    5    6    7
  7.      1    2    3    4    5    6    7
  8.      1    2    3    4    5    6    7
          ( ... etc. ... )
129.      1    2    3    4    5    6    7
130.      1    2    3    4    5    6    7
131.      1    2    3    4    5    6    7
132.      1    2    3    4    5    6    7
133.      1    2    3    4    5    6    7
134.      1    2    3    4    5    6    7
135.      1    2    3    4    5    6    7
136.      1    2    3    4    5    6    7
137.      1    2    3    4    5    6    7
138.      1    2    3    4    5    6    7
139.      1    2    3    4    5    6    7
140.      1    2    3    4    5    6    7
          mumbled          normal          overarticulated

APPENDIX D
LISTING OF THE FOCAL-12 PROGRAM FOR CALCULATING THE MODULATION INDEX

PART A

1.01 C PROGRAM DHPP12A
1.03 E
1.05 O C
1.07 L O,F0,I,INPUT,1
1.08 L O,F0,I,OUTPUT,0
1.10 A "FIRST SAMPLE",SF,!,"LAST SAMPLE",SL,!;S I=SF+2;S K=1
1.20 D 2;G 4.1
2.10 S A=F0(I)-F0(I-1);S B(0)=F0(I)-F0(I+1)
2.20 S B(1)=F0(I+1)-F0(I+2);S B(2)=F0(I+2)-F0(I+3);S B(3)=F0(I+3)-F0(I+4);S B(4)=F0(I+4)-F0(I+5)
4.10 I (A)4.2,4.3;I (B(0))5.4,5.2,5.1
4.20 I (B(0))5.6,5.3,5.4
4.30 I (B(0))5.2,5.5,5.2
5.09 C IT IS A PEAK
5.10 S F1(K)=F0(I);S F1(K+1)=I;S K=K+2;G 6.1;C STORING AMPLITUDE AND LOCATION OF PROUGHS
5.19 C SHOULD NOT HAPPEN
5.20 G 7.1
5.29 C LOOK ONE, TWO, OR THREE AHEAD
5.30 I (A*B(1))5.8,5.31;I (A)5.34,7.1,5.34
5.31 I (A*B(2))5.85,5.32;I (A)5.35,7.1,5.35
5.32 I (A*B(3))5.9,5.33;I (A)5.37,7.1,5.37
5.33 I (A*B(4))5.95,7.1;I (A)5.38,7.1,5.38
5.34 S F1(K)=F0(I);S F1(K+1)=I+0.5;S I=I+1;S K=K+2;G 6
5.35 S F1(K)=F0(I);S F1(K+1)=I+1.0;S I=I+2;S K=K+2;G 6
5.37 S F1(K)=F0(I);S F1(K+1)=I+1.5;S I=I+3;S K=K+2;G 6
5.38 S F1(K)=F0(I);S F1(K+1)=I+2.0;S I=I+4;S K=K+2;G 6
5.39 C CONTINUE
5.40 G 6.1
5.49 C SHOULD NOT HAPPEN
5.50 G 7.1
5.59 C IT IS A TROUGH
5.60 S F1(K)=F0(I);S F1(K+1)=I;S K=K+2;G 6.1
5.80 S I=I+1;G 6.1
5.85 S I=I+2;G 6.1
5.90 S I=I+3;G 6.1
5.95 S I=I+4;G 6.1
6.10 S I=I+1;I (I-SL+5)1.2,1.2
6.20 T %2.01,!!,I,!,K,!!!,"TYPE RETURN TO CONTINUE";A !
6.30 S F1(0)=(K-1)/2;C STORES IN F1(0) THE NUMBER OF "PROUGHS"
6.40 L C,F1
6.50 L O.DHPP12B.0

PART B

1.01 C PROGRAM DHPP12B
1.02 E
1.03 O C
1.05 L O,F1,F,OUTPUT,0
1.10 S MAX=0;S MIN=1000;S K=F1(0)
1.20 F I=1,2,K*2;D 2
1.30 D 3
1.40 S PR=AV/F1(IS);F I=IS+2,4,IL-2;S PR=PR*F1(I)/F1(I+2)
1.50 S MI=FEXP((1/NP)*FLOG(PR))
1.60 T %7.06,"MI = ",MI,!!
1.90 Q
2.01 C DETERMINING MAX AND MIN
2.10 I (F1(I)-MAX)2.2,2.2;S MAX=F1(I)
2.20 I (MIN-F1(I))2.3,2.3;S MIN=F1(I)
2.30 S AV=(MA+MI)/2;R
3.01 C DETERMINING THE LOCATION OF THE FIRST AND LAST PEAKS
3.10 I (F1(1)-F1(3))3.2,3.9;S IS=1;G 3.3
3.20 S IS=3
3.30 I (F1(K*2-1)-F1(K*2-3))3.4,3.9;S IL=K*2-1;G 3.5
3.40 S IL=K*2-3
3.50 S NP=(IL-IS)/4+1;S NT=NP-1;R
3.90 T "TROUBLE",!;Q
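For readers without a FOCAL-12 interpreter, the logic of the two programs can be sketched in Python. This is a minimal re-expression under stated assumptions, not the author's program. The function names `find_extrema` and `modulation_index` are hypothetical, and a Python list stands in for the digitized sample file F0. The error routine at line 7.1, which is not included in the listing, is replaced by simply skipping the sample. Plateaus are resolved the same way for rising and falling slopes. Part B is assumed to receive peaks and troughs that alternate, and Python's `max`/`min` replace the MAX=0, MIN=1000 initialisation, which gives the same result for amplitudes between 0 and 1000.

```python
import math
from typing import List, Sequence, Tuple


def find_extrema(x: Sequence[float], first: int, last: int) -> List[Tuple[float, float]]:
    """Part A sketch: return (amplitude, location) pairs for the peaks and
    troughs ("proughs") of x, scanning indices first+2 .. last-5."""
    extrema: List[Tuple[float, float]] = []
    i = first + 2
    while i <= last - 5:
        a = x[i] - x[i - 1]   # difference into sample i (A in the listing)
        b0 = x[i] - x[i + 1]  # difference out of sample i (B(0))
        if a == 0:
            # The listing branches to an error routine here; assumption: skip.
            i += 1
        elif b0 != 0:
            # Same sign: rise-then-fall (peak) or fall-then-rise (trough).
            if (a > 0) == (b0 > 0):
                extrema.append((x[i], float(i)))
            i += 1
        else:
            # Plateau: find the first non-zero difference up to four ahead.
            for k in range(1, 5):
                bk = x[i + k] - x[i + k + 1]
                if bk != 0:
                    if (a > 0) == (bk > 0):
                        # Extremum placed at the plateau midpoint, i + k/2.
                        extrema.append((x[i], i + k / 2))
                    i += k + 1
                    break
            else:
                i += 5  # Hypothetical choice: over-long plateaus are skipped.
    return extrema


def modulation_index(amps: Sequence[float]) -> float:
    """Part B sketch: NP-th root of AV/(first peak) times each
    trough/(following peak) ratio, where AV = (max + min) / 2."""
    n = len(amps)
    if n < 3:
        raise ValueError("need at least three extrema")
    if amps[0] == amps[1] or amps[-1] == amps[-2]:
        raise ValueError("TROUBLE: cannot tell peak from trough")
    first = 0 if amps[0] > amps[1] else 1           # IS: first peak
    last = n - 1 if amps[-1] > amps[-2] else n - 2  # IL: last peak
    n_peaks = (last - first) // 2 + 1               # NP
    av = (max(amps) + min(amps)) / 2                # AV
    pr = av / amps[first]
    for j in range(first + 1, last, 2):  # each trough between the end peaks
        pr *= amps[j] / amps[j + 1]      # trough / following peak
    return math.exp(math.log(pr) / n_peaks)
```

Calling `modulation_index([a for a, _ in find_extrema(x, 0, len(x) - 1)])` on a list of samples `x` corresponds to running Part A and then Part B. Line 1.50 takes the NP-th root of the product, using FEXP and FLOG, and the product has exactly NP factors. The result is therefore a geometric mean of per-peak ratios and does not grow with the number of peaks in the sample.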
