UBC Theses and Dissertations


Perception of coarticulated lip rounding Adelman, Sharon 1974


PERCEPTION OF COARTICULATED LIP ROUNDING

by

SHARON ADELMAN
B.Sc., McGill University, 1972

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

in the Department of Paediatrics
Division of Audiology and Speech Sciences

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
July, 1974

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of
The University of British Columbia
Vancouver 8, Canada

ABSTRACT

The present study investigates the perceivability of coarticulated lip rounding in French. Nine utterances containing the clusters /kstr/, /rstr/, and /rskr/ followed by one of the vowels /i/, /y/, or /u/ in all possible combinations, were truncated at 4 different points before the vowel. Test items in each of the 4 groups therefore contained different amounts of information regarding the nature of the following vowel, due to coarticulatory influences of the vowel on the preceding consonants. Subjects were asked to predict the identity of the missing vowel on hearing the truncated utterances. Subjects were native speakers of either French or English; some of them had a knowledge of phonetics.
Results show that when segments up to and including at least half of the final consonant of the cluster are present, subjects correctly identify the missing vowel well above chance levels. Several individuals were able to identify the vowel even when presented with shorter versions of the utterances. No significant difference in performance was found between French and English subjects, nor between subjects with and without phonetic training. Perceivability of individual features of the missing vowel is discussed.

It is concluded that coarticulatory effects due to lip rounding (as well as to horizontal tongue position) provide perceivable information at a level significantly above chance, and that this information may be used by the perceptual mechanism as an aid in speech sound identification.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENT
1. INTRODUCTION
2. REVIEW OF THE LITERATURE
    2.1 Introduction
    2.2 Coarticulation: The Acoustic Level
    2.3 Coarticulation: The Articulatory Level
    2.4 Coarticulation: The Perceptual Level
3. AIMS OF THE EXPERIMENT
4. MATERIALS AND METHODS
    4.1 Pilot Study
    4.2 Main Study
5. RESULTS
    5.1 Correlation Between Relative Transmission (T_rel) and Score (S)
    5.2 Identification of the Missing Vowel
    5.3 Identification of Individual Features of the Missing Vowel
    5.4 Differences Between Subject Groups
    5.5 Speaker Differences
6. DISCUSSION
    6.1 Identification of the Missing Vowel
    6.2 Identification of Individual Features of the Missing Vowel
    6.3 Differences Between Subject Groups
    6.4 Subjects' Comments
    6.5 Conclusions
BIBLIOGRAPHY
APPENDIX I - Utterances Used in the Experiment
APPENDIX II - Instructions

LIST OF TABLES

I. Groups of Edited Stimuli with Constant Clusters /kstr/, /rstr/, and /rskr/
II. Differences in T_rel and in S between Each Group of Items, for French and English Subjects
III. Percent of Items Answered Correctly in Each Vowel Category for Each Group of Items, All Subjects Pooled Together
IV. Percent of Items in Each Group for Which Various Vowel Confusions Were Made, All Subjects Pooled Together
V. Mean T_rel Values (x) and Sample Standard Deviations (s) for Each Group of Items, Shown for French and English Subjects
VI. Mean T_rel Values (x) and Sample Standard Deviations (s) for Each Group of Items, Shown for Phonetically Trained and Phonetically Naive Subjects

LIST OF FIGURES

1. Mingogram of one of the test utterances, "la dextre universelle," showing the 4 points of truncation
2. Distribution of T_rel based on random responses to a 27-item test
3. Distribution of Scores based on random responses to a 27-item test
4. Mean values of T_rel and S, plus or minus one standard deviation, for each group of items
5. T_rel and S values for each group of items for three different subjects
6. Mean values of T_rel and S for front-vs-back distinctions, and unrounded-vs-rounded distinctions, shown for three groups of items

ACKNOWLEDGEMENT

I would like to thank all those who have had a part in this thesis:

• Dr. Andre-Pierre Benguerel for his guidance during the research and writing of the thesis.
• Dr. Joyce D. Edwards for serving on my committee.
• My subjects for their kind cooperation.
• My parents for their encouragement over the past two years.
• Ingrid, Betty, Lynne, Pat, and Meralin, for much friendship.

CHAPTER 1
INTRODUCTION

The production of speech is a complex process, and its complexities necessitate a unique and equally complex perception process. It would be interesting to know whether the subtleties and variations in the production process are noted in, perhaps even necessary to, the speech perception process.

Speech is not merely a sequence of independent sounds produced by independent gestures. As the motor gestures producing speech overlap in time and change with context, so do the acoustic cues in the speech signal. It is on this ever-changing signal that speech perception is based. The listener must abstract the appropriate cues from the mass of acoustic information to correctly identify the signal, to understand spoken language. How he recognizes the appropriate cues, indeed even what these cues may be, is far from completely understood.

In examining speech production, one sees that if an articulator, such as the tongue tip or the velum, is free to move during production of a particular sound, it may initiate movement towards its target position for the subsequent phone, or for a phone several segments ahead.
Also, an articulator may still be moving from its position for the preceding phone while a current phone is already being produced. This overlapping of speech gestures in time is referred to as coarticulation. This characteristic of speech production results in one phoneme being acoustically different virtually each time it is produced. It also means that the units of production overlap to an extent whereby cues for a phoneme may be found several phones preceding and several phones following the one in question. Does a listener make use of these characteristics of speech production in identifying the speech signal? At least, can he make use of them if required to, for example, when other cues are masked or missing? Or are these cues irrelevant to the perception process, merely a byproduct of the complex workings of the articulators, without perceptual correlates?

The present study looks at utterances in which cues for a certain vowel are known to exist several phones preceding the vowel. Subjects are asked to predict the identity of the upcoming vowel after hearing only part of the utterance. This study therefore gives an indication as to how much use a listener makes, or at least can make, of coarticulated information in the speech signal.

CHAPTER 2
REVIEW OF THE LITERATURE

2.1 Introduction

Studies of coarticulation have been carried out on several levels. Section 2.2 discusses coarticulation at the acoustic level. Section 2.3 discusses coarticulation at the articulatory level and outlines several theories that have been proposed to explain the phenomenon.
Section 2.4 reviews studies on the perceptual correlates of coarticulation. A discussion of the possible units of speech perception is included in this section.

2.2 Coarticulation: The Acoustic Level

Earliest indications of the phenomenon of coarticulation came from acoustic studies. It has long been known that the acoustic value of a vowel is influenced by the vowel's phonetic context. For example, vowel duration, intensity, and fundamental frequency are known to vary with changes in consonantal environment [House and Fairbanks, 1953].

Stevens and House [1963] examined changes in vowel formant frequency and formant bandwidth with context. Three speakers produced various /həC1VC2/ utterances, in which C is a consonant and V is a vowel. In these utterances C1 = C2. When the first formant frequency (F1) was plotted against the second formant frequency (F2) for each vowel, it was seen that quite appreciable differences occurred with changes in consonantal context. In addition, several of the uttered vowels did not fall within the F1 vs F2 contours established by Peterson and Barney [1952]. These contours had been determined using several productions of the utterance /hVd/. Stevens and House showed that the vowel in such an environment is not unlike the vowel produced in a null environment (/#V#/). The major discrepancy between the F1 - F2 values noted by Stevens and House, and those found by Peterson and Barney, then, was due to the influence of the consonantal environment imposed by the /həCVC/ production. Looking at differences within their own data, Stevens and House found further evidence for the relationship between phonetic context and a vowel's acoustic value.
Consonantal context was seen to cause systematic shifts in the vowel's formant frequencies, particularly depending on the place of articulation, manner of articulation, and the voicing characteristic of the consonant involved. For example, in an environment of labial or post-dental consonants, front vowels showed more of a downward shift of F2 than they did in a back environment. Fricatives produced greater shifts in interconsonantal vowel formants than did stops. Voiced consonants produced a lowering effect on F1 of the vowel while F2 was not as appreciably affected.

These changes in the acoustic value of a vowel are explained by Stevens and House in articulatory terms. In the production of a C1VC2 syllable, the structures of the vocal tract assume position for C1, then maneuver towards position for V. During this movement, instructions for C2 are initiated. Vowel modifications are therefore due to overlapping of timing of neural instructions, which may result in anticipation of the upcoming phoneme, and the sluggishness or dynamic constraints (i.e. mass and inertia) of the system. Fricatives, for example, requiring carefully controlled positioning and target approach, would tend to infringe on the neighbouring vowel's articulation more than would a quickly executed stop.

Ohman [1966] looked at the influence of both preceding and following phones on a phoneme. Whereas Stevens and House used symmetrical CVC utterances and were unable to separate the influence C1 had on the vowel from the influence of C2, Ohman used V1CV2 disyllables.
Utterances were spoken by speakers of three different languages and employed vowels particular to each language. Spectrographic analysis yielded measurements of formant frequencies at two points along the VC and CV transitions. Ohman found that, not only did V1 affect the following CV2 transition (as might be expected due to mechanoinertial factors), but as well, V2 influenced the preceding V1C transition. As noted by previous investigators, it was F2 that showed the most variation.

Ohman's work yielded results different from that of previous workers, whose studies of CV transitions had led to the formation of the "locus theory" [Delattre, Liberman, and Cooper, 1955]. This theory states that for each consonant there exists a characteristic frequency position (or positions), or locus, from which formant transitions begin or to which they point. Delattre et al. had found fixed loci for the second formant of /b/ and /d/, and two loci for /g/ (depending on context). Ohman found that the transition loci for /b/ and /d/ are not fixed, but are dependent on context. For example, in a /V1bV2/ utterance, the bV2 transition originates at 500 Hz if V1 is /u/, but at 1300 Hz if V2 is /y/. Delattre et al. had postulated a fixed locus for bV transitions at 720 Hz.

The articulatory basis behind the locus theory is that formant transitions are reflections of the change in size and shape of the vocal tract as it moves from one target position to another. Delattre et al.
state:

    Since the articulatory place of production of each consonant is, for the most part, fixed, we might expect to find that there is correspondingly a fixed frequency position -- or "locus" -- for its second formant; we could . . . describe the various second-formant transitions as movements from this acoustic locus to the steady state level of the vowel. . . . [Delattre et al., 1955, p. 769]

What the theory does not take into account is that if previous and/or succeeding articulations have an appreciable effect on the vocal tract configuration for any given consonant, the locus of a consonant produced in one environment will not be identical to that of a consonant produced in another environment.

Ohman, like Stevens and House, attributes the effect of preceding context on an upcoming phoneme (or in this case, its effect on the upcoming CV transition) to mechanoinertial factors. This type of coarticulation has since been referred to as carryover coarticulation. To explain the influence of succeeding context on preceding events, or anticipatory coarticulation, Ohman points out that speech gestures are not independent and linearly sequenced. Often the vocal tract can vary a great deal without introducing a phonemic change in the sound produced. For example, the tongue is free to move during the production of a bilabial stop; the lips are free to move during the production of a velar stop or liquid. In general, if an articulator is free to move during production of one phoneme, it will initiate movement toward its target position for the next upcoming phoneme.
    Since traces of the final vowel are observable already in the transition from the initial vowel to the consonant, it must be concluded that a motion toward the final vowel starts not much later than, or perhaps even simultaneously with, the onset of the stop-consonant gesture. A VCV utterance of the kind studied here can, accordingly, not be regarded as a linear sequence of three successive gestures. [Ohman, 1966, p. 165]

Ohman also indicates the possible language-dependent nature of coarticulation. Russian stops must be coarticulated with one of only two vowels, whereas American English and Swedish stops enjoy more freedom of coarticulation.

2.3 Coarticulation: The Articulatory Level

With investigations at the acoustic level, the complexity of the coarticulation process began to come to light. Two major approaches to the study of articulatory behaviour, electromyography and cineradiography, began to yield evidence of coarticulation for various articulators, and several models have been advanced to account for the phenomenon. Such models are necessarily related to basic questions of speech organization.

Electromyography (EMG) has been employed to great advantage in coarticulation studies. Electrodes are introduced into the articulator in question, and muscle action potentials are recorded during utterance production. In this way muscle activity during production of any phone can be measured. A major problem in the interpretation of EMG studies is that the activity of one muscle is often closely related to that of others.
A given amount of contraction in one muscle may therefore produce different amounts of movement of an articulator, depending on the position and activity of other muscles [MacNeilage and DeClerk, 1969]. Therefore, investigations into the EMG activity of only one muscle do not necessarily reflect all that is happening to the articulator in question. However, EMG studies allow individual muscles to be studied and correlations between neuromuscular activity and linguistic units to be made.

Cineradiography has been used to a great extent as well. Movements of lips, tongue, jaw, velum, and pharynx can be made visible by various methods of cineradiography, and correlated with acoustic output. However, the resulting picture is a two-dimensional display of the vocal tract and so has limitations. It also can only yield information at the motor level, whereas EMG studies give insight into neuromuscular commands. Perkell states:

    Although a cineradiograph contains a large amount of one type of information, it is obvious that many other types of parameters should be examined and correlated with the cineradiography data before a comprehensive description of vocal-tract function can be obtained. [Perkell, 1969, p. 2]

Kozhevnikov and Chistovich [1965] examined coarticulation of lip movements in Russian, measuring electrical activity of the orbicularis oris muscle and correlating it with utterance production. One speaker produced CV and CCV syllables in which V was a rounded vowel.
Results show lip protrusion to begin almost simultaneously with the beginning of the first consonant, even if a word or syllable boundary falls within the CC sequence. Thus lip rounding was found to coarticulate over an entire CCV unit. The authors postulate an "articulatory syllable" model of speech production in which commands for the entire syllable are initiated simultaneously and executed simultaneously as long as they are noncompeting. Competing commands, such as lip retraction vs lip rounding, are executed in sequence. Therefore commands for an /i/ in one environment would be different from commands for an /i/ in another environment. Coarticulation would be maximum within the articulatory syllable, and minimum across such syllable boundaries. Such a syllable is described by Kozhevnikov and Chistovich as the CC...V unit, which has been found by themselves and others to be a strongly cohesive unit and to exhibit strong coarticulation effects within itself.

Fromkin [1966] used electromyography to study action of the orbicularis oris muscle for production of /b/, /p/, and various rounded and unrounded vowels in English. Her results, obtained from three speakers, show that no simple correspondence exists between phoneme and motor command; different muscle action potentials are responsible for producing an initial /b/ or /p/ and a final /b/ or /p/. However, further contextual aspects have no effect on the muscle gesture for these phonemes, at least as far as this muscle is concerned.
Muscle action potentials are relatively invariant for production of the /b/ in a /bVC/ syllable, regardless of the values of the following phones. Similarly, action potentials for final /b/ are unaffected by preceding phones in a /CVb/ syllable. The same results apply to initial and final /p/.

Looking at EMG activity of the same muscle during vowel production, Fromkin did note influence of adjacent phonemes. The rounded vowels /u/ and /o/ show appreciably lower peak amplitude of EMG activity when following initial /b/, which itself involves contraction of the orbicularis oris muscle, than when following initial /d/. Muscle activity for a rounded vowel is uninfluenced in amplitude or duration by the following consonant of a CVC syllable, be it /b/ or /d/. Thus it seems that some aspects of context somehow restrict or reorganize the neuromuscular commands and gestures for some phonemes, while other aspects do not. Just what the nature of the reorganization is, is not known, Fromkin states. Her findings lead her to put forth two suggestions. Perhaps the minimal linguistic unit at the motor command level is larger than the phoneme, possibly, in her words, of the order of a syllable. This theory agrees with the Kozhevnikov-Chistovich model of speech organization. However, Fromkin does not give any indication of the size or nature of the syllable proposed. The second possibility is that motor commands are altered with context by a feedback system concerning the existing state of muscle position and activity, or by information held in short-term memory.
This theory is consistent with the idea that the phoneme is a basic unit of speech production at the neuromuscular level. Both theories proposed by Fromkin are able to account for the coarticulation effects she observed.

Ohman [1966] describes the coarticulated VCV utterance as follows:

    We have clear evidence that the stop-consonant gestures are actually superimposed on a context-dependent vowel substrate that is present during all of the consonantal gesture. [Ohman, 1966, p. 165]

Production of the consonant in such a syllable involves three separate, but probably overlapping, sets of muscles in the tongue, each of which has separate neural representation in the motor control networks of the brain. The response of the tongue to articulatory commands coming independently over three different channels is a summation of the components of the instructions. As the tongue is executing commands for one phone, certain subsets of muscles are left free to anticipate the following phone, instructions for which are also coming down independently. Therefore, consonant production is accomplished by articulatory adjustments that partially anticipate the configuration of the succeeding vowel, though certain components of the vowel are inhibited during C production.

Henke [1966] proposes a system whereby production is programmed phoneme by phoneme, but there is a scanning of upcoming feature specifications. If a phoneme has no specification for a particular feature, such as lip rounding, the system looks ahead to the next phoneme for which that feature is specified, and the articulators initiate movement toward that goal.
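
The look-ahead scanning attributed to Henke can be illustrated with a minimal sketch. The Python fragment below is an illustration only and is not taken from Henke [1966]: the feature table ROUNDING, the use of None to mark a phoneme with no rounding specification, and the function name look_ahead are all assumptions made for this example. It treats the consonants of a /kstr/ cluster as unspecified for lip rounding and lets each one adopt the value of the next specified phoneme.

```python
from typing import Dict, List, Optional

# Hypothetical feature table (an assumption for illustration):
# True = rounded, False = unrounded, None = rounding left unspecified.
ROUNDING: Dict[str, Optional[bool]] = {
    "i": False, "y": True, "u": True,
    "k": None, "s": None, "t": None, "r": None,
}


def look_ahead(phonemes: List[str]) -> List[Optional[bool]]:
    """Return the lip-rounding state adopted during each phoneme."""
    states: List[Optional[bool]] = []
    for index, phoneme in enumerate(phonemes):
        state = ROUNDING[phoneme]
        if state is None:
            # Scan forward to the next phoneme that specifies the feature.
            for upcoming in phonemes[index + 1:]:
                if ROUNDING[upcoming] is not None:
                    state = ROUNDING[upcoming]
                    break
        states.append(state)
    return states


print(look_ahead(["k", "s", "t", "r", "y"]))  # rounding spreads back over the cluster
print(look_ahead(["k", "s", "t", "r", "i"]))  # the cluster stays unrounded
```

Under these assumptions the rounding of /y/ is already in force at the first consonant of the cluster, which is the kind of anticipatory spread the model is meant to capture. A consonant with no specified phoneme after it simply remains unspecified in this sketch.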
MacNeilage and DeClerk [1969] questioned whether changes in motor gesture with context are due to changes in underlying neurological control or to mechanical constraints and modifications on an invariant phoneme command. Examination of cinefluorograms of the vocal tract and EMG tracings from nine articulatory locations showed that both left-to-right (carryover) effects and right-to-left (anticipatory) effects of adjacent phonemes on each other are present in CVC syllables. They state:

    It is quite clear from these results that the command system responsible for CVC syllables does not consist of a series of context-independent phoneme commands that retain their independence all the way down to the level of muscle contraction. [MacNeilage and DeClerk, 1969, p. 1228]

They hypothesize three mechanisms at work to account for these effects. First is an anticipatory mechanism, in which the greater the amount of muscle contraction required for a certain phoneme, the greater the amount of anticipatory contraction of that muscle in the preceding phoneme. An inhibitory component against muscle contraction antagonistic to the muscular movement required for the upcoming phoneme might also be involved in the anticipatory mechanism. Such a system can explain right-to-left coarticulatory effects. The second mechanism at work is a compatibility mechanism. Since more or less contraction is necessary to assume a particular articulatory position, depending on the previous position of the articulator, upcoming commands for contraction might be made compatible with the existing state of muscle contraction. This would be accomplished via a feedback system involving the cerebellum. Such a system is able to account for the strong left-to-right influence imposed by context. This mechanism is somewhat similar to one proposed by Fromkin [1966]. The third suggested mechanism at work is a gamma-loop mechanism. In this case commands are sent down for a muscle to assume a particular length, regardless of its existing length, by the gamma system of motoneurons which innervate stretch-receptive spindles within the muscles. Thus commands would be invariant, but EMG activity necessary to achieve the specified length would show the context-dependent variety seen in several studies. This model seems appropriate for speech production, which involves approximation of target positions regardless of context.

MacNeilage and DeClerk point out that joint action of the three mechanisms outlined above on invariant phoneme commands cannot account for all the coarticulation effects seen. The authors cite two further mechanisms that do not necessitate ruling out invariant phoneme commands as the basis of production. At least they may be present at certain levels of the speech production system. The first possibility is that other modification mechanisms, such as the use of somesthetic information, are at work.
The second possibility is that to a certain, maybe considerable, extent, motor commands are organized in units larger than the phoneme; perhaps as suggested by others, commands are issued for a syllable at a time. However, since they were unable to observe effects of initial and final consonants on each other, MacNeilage and DeClerk suggest that the CVC unit does not qualify as the unit of command organization. They feel that the CV segment, which shows more right-to-left coarticulation effects than the VC segment, is a more cohesive unit.

Daniloff and Moll [1968] extended Kozhevnikov and Chistovich's 1965 work on lip protrusion to the production of strings of one to four consonants followed by the rounded vowel /u/. The sequences were embedded in meaningful English sentences and spoken by three subjects. Though the utterances contained the phonemes /r/ and /l/, which themselves may involve lip protrusion, the authors noted that such an amount of protrusion was small. Cineradiography was used to evaluate articulatory behavior. Findings show that lip protrusion extends over as many as four consecutive consonants before a rounded vowel, and that the extent of coarticulation is not affected by word or syllable boundaries within the consonant string. Results are in general agreement with those of Kozhevnikov and Chistovich. However, Daniloff and Moll observed onset of protrusion before contact for the first consonant was achieved, whereas Kozhevnikov and Chistovich noted protrusion onset at the time of contact for the first consonant.
In a number of cases noted by Daniloff and Moll, protrusion began even before movement toward the first consonant was initiated, that is, outside the boundary of the CC...V unit. Cowan [1973] found similar coarticulation effects for lip protrusion in French utterances. Six native French speakers produced utterances containing strings of four and six consonants before a rounded vowel. She found that in almost all cases, protrusion for the upcoming vowel began with production of the first consonant of the cluster, and in approximately half the cases, protrusion began during the production of the vowel preceding the consonant cluster.

Coarticulation effects have been observed in the motion of the lateral pharyngeal wall [Kelsey et al., 1969]. An ultrasonic method of data collection was used, in which a pulsed ultrasonic signal was beamed toward the pharyngeal wall and the time of echo return provided a measure of displacement of the articulator. Three speakers produced VCV utterances. Data show that displacement during production of /a/ varies as a function of phonetic context.

Amerman et al. [1970] investigated coarticulation effects in jaw and lip movements by cineradiography. Four speakers produced meaningful utterances which included segments of one to four consonants preceding the vowel /æ/. Jaw lowering and lip retraction are two gestures involved in the production of this vowel. Jaw lowering was found to coarticulate over two and sometimes three phones before /æ/, and could presumably extend over all four consecutive consonants, had not one of the consonants consistently been /s/. Amerman et al.
found /s/ production antagonistic to jaw lowering; this gesture was never initiated during /s/ production, but began immediately after it. Similarly, lip retraction seemed to be inhibited by /s/ production and was never initiated during it. However, a good /s/ can be produced with retracted lips, and the authors suggest that perhaps inhibition of one gesture for /æ/ production facilitates inhibition of another gesture related to /æ/ production. In general, lip retraction was not as extensively coarticulated as jaw lowering. Though it sometimes extended two and three consonants before the vowel, several of the cases showed retraction beginning with the start of the vowel and not before. However, the lip retraction measure was not considered by the authors as reliable a measure as jaw lowering, due for instance to some lip protrusion during />/ production. The authors feel that inconsistencies in the synchrony and starting points of the two gestures are not predicted by the Kozhevnikov-Chistovich model, which states that commands for the syllable are specified simultaneously and synchronously. The nature of the coarticulatory unit found in this study is in agreement with that model's articulatory syllable, i.e. a CC..V unit. The data fit Henke's model of production equally well.

Carney and Moll [1971] extended Ohman's 1966 study of coarticulation in VCV utterances. Whereas Ohman had examined coarticulation of vowels and stop consonants, Carney and Moll looked at fricative-vowel interactions.
MacNeilage [1963] had previously shown acoustic properties of the fricative /f/ to be context dependent; specifically, duration of /f/ in final position was twice as great as for /f/ embedded in a consonant cluster. However, electromyograms taken at the lips during /f/ production did not show pattern changes with context, except to some extent for onset of activity. Carney and Moll placed fricatives in a vowel rather than a consonant environment, and looked at effects on the tongue as well as the lips. They analyzed cineradiographs of two speakers producing /hVCV/ utterances, in which C was the fricative /f/, /v/, /s/, or /z/. Unlike MacNeilage, they found muscle gestures for production of fricatives to be influenced by context. Their results agree with Ohman's [1966] description of a consonantal gesture superimposed on a basic vowel-to-vowel diphthongal gesture. The findings show that if an articulator is free, as the tongue body and tip are during /f/ or /v/ production, then coarticulation is seen in the tongue and in the lips throughout the vowel-to-vowel movement.

Coarticulation effects have been observed in velar movements by Moll and Daniloff [1971]. Four subjects produced English sentences containing various combinations of nasal consonants, non-nasal consonants, and vowels. Examination of cineradiograms showed that movement towards velar opening in a CVN or CVVN (where N = nasal) sequence begins after contact for the initial consonant. Thus nasality is coarticulated over the VN or VVN unit. Similarly, for NVC sequences, movement towards velar closure begins during the approach to the vowel, and sometimes even during the nasal itself. The unit over which coarticulation extends in this case is the VC unit. These results directly contradict Kozhevnikov and Chistovich's hypothesis that CV is the basic unit of production within which coarticulation is strongest. Moll and Daniloff tend to support a model such as Henke's where commands are specified phoneme by phoneme.

Thus three major systems have been put forth to account for coarticulatory behaviour. One is the Kozhevnikov-Chistovich "articulatory syllable" model, in which neural commands are organized in syllable-like units. Though this model accounts for much of the observed data, the articulatory syllable is described as a CC..V group, whereas studies indicate that coarticulation may extend back to encompass a VCC..V group [Daniloff and Moll, 1968] or a VC or CVC group [Moll and Daniloff, 1971]. However, MacNeilage [1972] cites evidence that, in a CVC syllable, there is weaker coarticulation within the VC segment than within the CV segment, indicating that CV is a strongly cohesive unit. The second major model is that of Henke, whereby a forward scanning system allows a free articulator to begin movement towards position for an upcoming phoneme. Such a system would be operative during anticipatory coarticulation. MacNeilage and DeClerk [1969] point out that such an anticipatory mechanism may be one of several at work during speech production. Ohman [1966, 1967] describes a third model of coarticulation, in which a consonantal gesture is superimposed on a diphthongal vowel-to-vowel movement. The phoneme command for consonant production is invariant, but the vocal tract shape during its production is a result of an overlap of vocal tract shape assumed for the consonant and the varying shape due to vowel environment. Thus contextual modifications take place at the motor level. Carryover coarticulation is accounted for in most models by mechano-inertial factors, or by the compatibility mechanism [MacNeilage and DeClerk, 1969] described earlier.

2.4  Coarticulation: The Perceptual Level

Recent studies have examined the perceptual correlates of coarticulation. The question asked is whether the acoustic and articulatory modifications due to coarticulation in an utterance provide information utilizable by the listener. Ali et al. state:

It is uncertain in most specific cases if coarticulation on the articulatory level results in perceptible differences on the perceptual level.... If the answer is affirmative, then it can be said that speech perception 'follows' speech production and makes use of its idiosyncracies. [Ali et al., 1971, p. 538]

A point to keep in mind when studying the perceptual correlates of coarticulation is that the subject is being asked to make subphonemic discriminations, subtle distinctions that do not affect the value he assigns to a phone. To what extent can we realistically expect him to do so? It is known that subphonemic detail (one form of which is allophonic variation) can be distinguished within a single phoneme category, even though speech perception is itself to some extent a categorical process. For example, Liberman et al. [1957] showed that listeners can make subphonemic distinctions when they are presented with synthetic speech sounds varying along an acoustic continuum. Stimuli were produced by a pattern playback, consisted of first and second formant patterns, and varied in direction and extent of the second-formant transition. This variable is a cue which has been found to be instrumental in making /b,d,g/ distinctions. Fourteen different stimuli were produced, and presented to subjects in an ABX arrangement. In a separate test, subjects were asked to make phonemic judgments of the same stimuli, that is, to state whether each was /b/, /d/, or /g/. Comparing the results of both studies, the authors determined that (1) phonemic distinctions along the continuum are categorical, the point at which a response changes from one phoneme to another being abrupt and consistent, (2) subphonemic discriminations across phoneme boundaries can be made to some extent, and (3) discriminations across phoneme boundaries are better and more consistently made than discriminations within a phoneme category.

Fry points out that

... a pair of utterances may appear indistinguishably the same to a listener of one nationality and indisputably different to a listener of another nationality.... [Fry, 1964, p.
60]

This is another point to consider in evaluating perceptual studies of coarticulation. Fry cites work by Lotz et al. [1960] on phonemic labelling of the same set of stimuli by different language groups. Fortis aspirated, fortis unaspirated, and lenis unaspirated stops were presented to speakers of various languages. The stimuli were placed into phonemic categories as follows: by English speakers, into /p,t,k/, /b,d,g/, and /b,d,g/ groups respectively; by Hungarian and Spanish speakers, into /p,t,k/, /p,t,k/ and sometimes /b,d,g/, and /b,d,g/ groups; by Thai speakers (in whose language aspiration is phonemic), into /p,t,k/, / p ^ t * , ^ /, and /b,d,k/ groups. For the velar case, Thai speakers assigned the lenis unaspirated stop to the /k/ category, there being no /g/ in Thai, though the possibility of the /g/ label was available to them.

Thus it seems that perceptions are influenced by language learning. In considering this point in relation to coarticulation studies, one might ask whether French listeners, for example, make finer judgments regarding lip rounding than do English ones. It has already been seen that coarticulation on the articulatory level may be language dependent [Ohman, 1966].

Findings on phonemic labelling opposite to those described above emerged in a study of cross-language vowel perception carried out by Stevens et al. [1969]. Thirteen unrounded and thirteen rounded vowels were synthesized on the OVE II speech synthesizer, with the first three formants varying along an acoustic continuum. Two ABX discrimination tests were administered, one for the unrounded and one for the rounded vowels, to a group of Swedish and a group of American English speakers.
Two phonemic identification tests were administered for the same stimuli to the same subject groups. The rounding feature is phonemic in Swedish, but not in English. Results show that for vowels presented in isolation, the listener's linguistic experience has essentially no effect on his ability to make subphonemic discriminations, nor does it appreciably affect his identification of phonemic categories. Little difference was seen in the phoneme boundaries determined by the Swedes and those determined by the Americans. The boundaries assigned by these groups differed by no more than one step along the acoustic continuum for the unrounded vowel series, and one to two steps for the rounded vowels.

These findings in a sense do not contradict the language-dependence found by Lotz and his colleagues [1960]. Subjects were presented with different tasks in these two studies. There is no reason to assume that, given the same series of fortis aspirated, fortis unaspirated, and lenis unaspirated stimuli, and asked to place each into one of three categories (a situation similar to the identification task presented by Stevens et al.), speakers of all languages investigated by Lotz et al. would not be able to assign each phone to its appropriate category. For some of these speakers, some of the category assignments would be based on a phonemic distinction, and some would be based on a subphonemic distinction. English speakers involved in the experiment by Stevens et al.
placed rounded vowels into phoneme categories not appreciably different from (although somewhat less consistent than) those chosen by the Swedes, though for the English speakers the placements were based on subphonemic discriminations. What Lotz's experiment does show is that, depending on his linguistic experience, a listener may choose to ignore some of the distinctions he is capable of making.

In addition to their study of vowel discriminations described above, Stevens et al. [1969] replicated the experiment on consonant discrimination done by Liberman et al. [1957, also described above]. Synthetic stop consonants, for which the first three formants varied along an acoustic continuum, were presented in an ABX situation. Stevens and his coworkers found that subphonemic discriminations along a physical scale were better made for vowels than for stop consonants. For example, correct discrimination could be made within a vowel phoneme category 80-90% of the time (depending on how far apart along the acoustic continuum the stimuli were), but within a consonant phoneme category only 60-65% of the time. The authors cite the suggestion that different mechanisms may be involved in vowel and consonant perception. In addition, investigators have found that vowels are not perceived as categorically as consonants [Kozhevnikov and Chistovich, 1965; Liberman et al., 1967], also suggesting that separate perceptual processes may be at work for these two classes of phones. However, Liberman et al.
point out that vowels studied in isolation, or the "unencoded" state, as in the above studies, may not trigger perception in the speech mode, and that evidence exists that vowels embedded in phonetic context are more nearly categorically perceived than are unencoded vowels.

Liberman et al. [1967] discuss subphonemic perception as being essential to speech perception:

That subphonemic features are present both in production and perception has by now been quite clearly established... we must deal with the phonemes in terms of their constituent features because the existence of such features is essential to the speech code and to the efficient production and perception of language. ... high rates of speech would overtax the temporal resolving power of the ear if the acoustic signal were merely a cipher on the phonemic structure of the language. [Liberman et al., 1967, p. 446]

It should be noted that the "features" discussed above are not the distinctive features discussed by Jakobson and his colleagues, but are constituent motor gestures and neural commands of phonemes. These researchers support the motor theory of speech perception, which states that speech is perceived in reference to the motor gestures that can produce it. They showed that acoustic signals may vary greatly and still produce the same perceptual effect. For example, the frequency of the starting point of the second formant transition from /d/ to a following vowel can vary by as much as 1000 Hz, depending on the vowel, yet a /d/ is perceived in all cases. Since a phoneme's acoustic signal varies not only with context but also from speaker to speaker, it is necessary to explain how the listener identifies the phoneme each time. Liberman et al.
propose that the listener traces the variable acoustic signal back to the less variable articulatory gestures with which he himself would produce the signal. He then identifies the signal in reference to these motor gestures. Since the motor gesture for a particular phone can be broken down into several elements (e.g. raising the velum, raising or lowering the tongue, initiating vibration of the vocal cords), perception of the phone's constituent features can in some manner occur.

To what extent the listener may perceive subphonemic, or allophonic, variations has been examined by Wickelgren [1969]. He cites the context-sensitive allophone as a unit of perception. Such a unit is one which specifies its right-hand and left-hand neighbours. Thus the word "tap" would be coded as three units, #tæ, tæp, and æp#, where the middle symbol of each unit is the phone itself, the outer symbols are its neighbours, and # marks a word boundary. The input to the perceptive mechanism could thus be an unordered set of symbols, the coding system allowing correct order to be recovered from such a set. The context to which such allophones are sensitive is limited to one preceding and one following phone, in Wickelgren's model. As we have seen, such is not the case in production, where a phoneme such as a rounded vowel may exert an effect on another phoneme as many as six sounds removed from itself [Cowan, 1973]. Perhaps, though, an allophone is sensitive to an extent which is perceivable only to adjacent phonemes. A major problem with Wickelgren's hypothesis is the extremely large number of neural units that must be available and through which all acoustic input must be channeled, for it is assumed that each context-sensitive allophone has its own neural representation.
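The claim that serial order can be recovered from an unordered set of context-sensitive units can be illustrated with a short sketch. This is not Wickelgren's own formulation: the function names, the use of "#" as a boundary marker, and the spelling "ae" for the vowel of "tap" are assumptions made here for illustration only.

```python
# Illustrative sketch only: each unit is a (left, phone, right) triple,
# with "#" (an assumed marker) standing for the utterance boundary.

def encode(phones):
    """Return the unordered set of context-sensitive triples for a phone list."""
    padded = ["#"] + list(phones) + ["#"]
    return {
        (padded[i - 1], padded[i], padded[i + 1])
        for i in range(1, len(padded) - 1)
    }


def decode(triples):
    """Recover phone order by chaining triples whose contexts overlap.

    Raises ValueError when the chain is ambiguous or incomplete, which
    happens if the same triple occurs twice in the original utterance.
    """
    remaining = set(triples)
    starts = [t for t in remaining if t[0] == "#"]
    if len(starts) != 1:
        raise ValueError("expected exactly one utterance-initial unit")
    current = starts[0]
    remaining.remove(current)
    order = [current[1]]
    while current[2] != "#":
        # The next unit must have the current phone on its left and
        # must itself be the phone the current unit expects on its right.
        candidates = [
            t for t in remaining if t[0] == current[1] and t[1] == current[2]
        ]
        if len(candidates) != 1:
            raise ValueError("ambiguous or broken chain")
        current = candidates[0]
        remaining.remove(current)
        order.append(current[1])
    if remaining:
        raise ValueError("units left over after reaching the final boundary")
    return order


units = encode(["t", "ae", "p"])
print(decode(units))
```

The sketch also exposes a limit of the scheme: because each unit sees only one neighbour on either side, an utterance that repeats the same three-phone window cannot be decoded unambiguously, and `decode` raises an error rather than guessing.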
It may be appropriate here to point out the arguments that exist for various other perceptual units. Speech perception may take place on several levels. Though subphonemic distinctions can be made, the fact that consonants show strong and definite categorical perception [Liberman et al., 1957], and that the same is true of vowels to a lesser extent [Stevens et al., 1969], provides evidence for the phoneme as a basic speech perception unit. Savin and Bever [1970], however, believe that individual phonemes are identified only after perception on yet another level has been carried out. They asked subjects to monitor a speech sample for a particular unit, either a syllable or a phoneme within a syllable. Results showed that response times for syllable identification were faster than for identification of a particular phoneme, suggesting the syllable was first perceived as a unit, before the phoneme itself was identified.

Certain syntactic sequences may be perceived as units. By presenting extraneous sounds (clicks) during sentences, Ladefoged and Broadbent [1960] found that listeners tend to locate the clicks far removed from their actual location. They argue that subjective displacement of clicks is towards boundaries of perceptual units. Several further studies on click displacement, outlined by Lehiste [1972], have been carried out with inconsistent results. Subjective location of extraneous sounds is also related to stress, intonation, and other surface phenomena.
However, it is clear that acoustic cues alone do not determine the boundaries of perceptual units, and that higher level sequences are somehow perceived as units.

Lehiste [1972] sums up a discussion on perceptual units by saying that two basic steps exist in speech perception: primary processing, consisting of auditory and phonetic processing, and linguistic processing, consisting in part of phonological and syntactic processing. Though the auditory level must precede other levels of processing, it is possible that phonetic and linguistic processing may proceed concurrently. Units at different levels differ in size.

Thus we see that, though perception is primarily a categorical process on one level, and though higher level sequences may act as units in perception, listeners are indeed capable of making subphonemic distinctions. It is this type of discrimination that subjects are asked to make in the coarticulation studies outlined below. It is possible that many of the large number of distinctions a listener can make when hearing a speech sample are ignored, in favor of grouping several different, but somehow similar, sounds into a single category for quicker processing. Whether subphonemic perception is of primary importance in the speech perception process is not clear, since discrimination is consistently poorer within a phoneme category than across its boundaries. However, in times of unfavorable conditions, for example a noisy environment, or a large amount of information having to be processed quickly, it may be that subphonemic nuances are used by the perceptual mechanism to provide additional cues.
Perceptual reality of coarticulatory effects would mean that, on hearing one sound, the listener not only has acoustic information on its value, but has information to verify the value he has assigned to the preceding phone(s), and to tentatively anticipate the value of the upcoming phone(s). Such a process would facilitate correct identification of any one speech sound. Let us now examine the few studies that have been done on the perception of coarticulatory effects.

Moll and Daniloff [1971] had shown that velopharyngeal opening in CVN and CVVN sequences (where N is a nasal consonant) almost always begins during the CV transition. To test the perceivability of this coarticulated nasality, Ali et al. [1971] spliced the final consonant and its VC transition from English CVC and CVVC utterances, in which the final consonant was sometimes a nasal and sometimes not. Twenty-two subjects were presented with the spliced utterances and asked to identify the missing consonant as nasal or non-nasal. Results show that nasal stimuli were correctly identified significantly above chance level. There was no significant difference between correct perception of /n/ and /m/. Stop consonants were identified as nasals more frequently than were fricatives. Consonants following the vowel /a/ were perceived as nasal more often than consonants following other vowels. Significant individual subject differences were found. The authors believe that in the case of nasality, the perceptual mechanism does make use of coarticulated information.
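What "significantly above chance level" means for a forced-choice identification task can be made concrete with an exact one-sided binomial test. The sketch below is an illustration under stated assumptions, not a reconstruction of the analysis in any study cited here: the function name and the counts are hypothetical, chance is taken as 0.5 for a two-way nasal/non-nasal judgment, and trials are assumed to be independent.

```python
from math import comb


def binomial_tail(successes: int, trials: int, chance: float) -> float:
    """Exact P(X >= successes) for X ~ Binomial(trials, chance)."""
    return sum(
        comb(trials, k) * chance**k * (1.0 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical counts, for illustration only (not data from any cited study).
n_trials = 40   # judgments made by one listener
n_correct = 28  # judgments that were correct
p_value = binomial_tail(n_correct, n_trials, chance=0.5)
print(f"P(at least {n_correct} of {n_trials} correct by guessing) = {p_value:.4f}")
```

For a task with three equally likely response alternatives, such as choosing among three vowels, the same function would be called with `chance=1/3`.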
Lehiste and Shockey [1972] tested the perceivability of vowels removed from a VCV utterance (where C is a stop consonant). Ohman [1966] had previously shown that the VC and CV transitions in such an utterance are influenced by the transconsonantal vowel. For the perceptual test, VCV utterances were cut in two during the consonant closure. Over twenty subjects were asked to identify the missing initial or final vowel. Though Lehiste and Shockey noted the same coarticulation effects spectrographically for their utterances as did Ohman, they found that these contextual effects are not sufficient for identification of the deleted segment. Nor was enough information present in the spliced utterances to identify a feature of the deleted phone, such as high/low or front/back; incorrect responses did not tend to share a feature with the correct response. The authors conclude that "whatever the effects of coarticulation in terms of their influence on formant transitions, these effects are not sufficient to have an influence on perception" [Lehiste and Shockey, 1972, p. 84]. Lehiste [1972] cites these results as evidence against Wickelgren's [1969] model of speech perception, which involves coding of context-sensitive allophones.

The physical modifications are undoubtedly there, but if the context of a context-sensitive allophone is not perceptible, it seems unjustified to assume that context-sensitive allophones are the basic units of speech perception. [Lehiste, 1972, p.
5] Lehiste and Shockey's [1972] f i n d i n g s are contrary to those of Kuehn [1970], as c i t e d by Carney and Moll [1971], who l i s t e n e r s were able to p r e d i c t  found that  of a V - j ^ utterance above chance  l e v e l , when they were given the i n i t i a l segments of the utterance. However, Carney and Moll do not discuss the t e s t s i t u a t i o n used by Kuehn, and therefore s t r i c t comparisons between the two  studies  cannot be made. In comparing the A l i et al_., and Lehiste and Shockey s t u d i e s , we see that context of the CV- and CVV-  u n i t s was  recoverable,  but  29 that context of the VC- or -CV u n i t was not. the  I t may be noted that i n  f i r s t case, the subphonemic cues r e l a t i n g to context must be  elicited  from the preceding vowel, and i n the second case, from the  preceding or f o l l o w i n g VC or CV t r a n s i t i o n .  I t has already been  seen that subphonemic d i s c r i m i n a t i o n s are more e a s i l y made f o r vowels than f o r consonants [Stevens et al_., 1969], and i f we f o r a moment consider the CV or VC t r a n s i t i o n as part of the consonant, or at l e a s t as behaving as a consonant i n t h i s respect, then we may adduce an explanation f o r the above f i n d i n g s :  c o a r t i c u l a t i o n e f f e c t s on a  vowel are more e a s i l y perceived than those on a consonant.  However,  i t must be kept i n mind that there i s i n d i c a t i o n that vowels i n phonetic context are not as d i f f e r e n t l y perceived from consonants as data on i s o l a t e d vowels suggests [Liberman ejt al_., 1967].  Also,  the motor gestures involved i n the c o a r t i c u l a t o r y e f f e c t s of the two cases described above are d i f f e r e n t -- the f i r s t involves lowering of the  velum, the second involves tongue movement.  the  e f f e c t s of these two motor gestures are perceived to d i f f e r e n t  extents.  
I t may be that  Human l i s t e n e r s may be inherently more aware of s l i g h t  changes i n one type of gesture than i n another. Clark and Sharf [1973] looked at c o a r t i c u l a t o r y e f f e c t s of  o n  s n o r  t  term r e c a l l  of V-j i n V^CV utterances. By presenting 2  l i s t s of VC/V ( f i n a l vowel d e l e t e d ) , VCV ( f i n a l vowel r e t a i n e d ) , and VC# (no f i n a l vowel produced and thus no c o a r t i c u l a t i o n present) utterances to subjects, they found that the presence of c o a r t i c u l a t i o n influenced the % c o r r e c t r e c a l l  of the i n i t i a l vowel.  They determined  that the c o a r t i c u l a t i o n e f f e c t s i n question are perceived by the  30 l i s t e n e r and r e g i s t e r e d i n short term memory.  Previous i n v e s t i g a t o r s  have suggested that the l i s t e n e r remembers f o r a c e r t a i n time the spectral c h a r a c t e r i s t i c s of the phone he hears, and on i d e n t i f y i n g i t as a phoneme, uses the necessary [ L e h i s t e , 1972].  information and discards the r e s t  In other words, he r e t a i n s subphonemic information  in his memory f o r some unspecified length of time.  Whether the  process as described by Clark and Sharf i s n a t u r a l l y operative i n speech perception i s not c l e a r , s i n c e , though r e c a l l f o r the VC/V c o n d i t i o n was f a c i l i t a t e d over the VC# c o n d i t i o n , the VCV c o n d i t i o n did not have the same f a c i l i t a t i v e e f f e c t .  The authors  attribute  t h i s to a possible perceptual overloading, the subject hearing twice as many vowels i n the VCV than i n the VC/V c o n d i t i o n .  They suggest  that even i n the VCV c o n d i t i o n , the e f f e c t s may be r e g i s t e r e d but ignored. Sharf and Ostreicher [1973] looked a t the e f f e c t s of c o a r t i c u l a t i o n on i d e n t i f i c a t i o n of nasal consonants i n noise. 
Using utterances of the form C1VNC2V, where C2 consists of 0, 1, or 2 non-nasal consonants, they found that identification of N was significantly better when all the post-nasal sounds were retained than when they were deleted. That is, when the carryover coarticulation effects present in the post-nasal sounds were available, subjects scored better in nasal identification in noise than when these effects were removed. By asking subjects to identify the final vowel from the same truncated utterances, the authors noted a better than chance level of correct identification if no consonant had originally intervened between N and V, and a consistent but nonsignificant trend for the number of correct vowel identifications to decrease as the number of intervening consonants increased from 0 to 2. This seems to indicate that anticipatory coarticulation effects of the vowel on the nasal aid in identification of the deleted vowel, but that as nasal and vowel move farther apart, the weakened coarticulatory effect becomes imperceptible. Thus they conclude that anticipatory coarticulation produces a strong enough cue in the nasal to facilitate identification of the upcoming vowel, and that cues present in the vowel due to carryover coarticulation with the preceding nasal aid in the correct perception of the nasal.

It remains to be seen which coarticulatory influences are perceivable and which are not, and over how long a sequence of phones coarticulatory information is usable.
CHAPTER 3
AIMS OF THE EXPERIMENT

Some major questions in the study of speech perception are: What features and cues does the listener abstract from the speech signal in attempting to identify it? Is all the acoustic information present in the signal utilizable for the perception process? Are all the fine, as well as gross, motor adjustments involved in the production of the speech signal recognized and interpreted by the listener? Research has shown that neither the acoustic value of a phoneme, nor the motor gesture that produced it, is invariant across different contexts. How much of this variation is perceivable, and to what extent does it actually provide cues for perception?

Studies on the perceptual correlates of coarticulation have begun to indicate that the listener may use some of the ever-present contextual variation as an aid in identifying speech sounds. The present experiment attempts to provide further information in this area. It asks whether coarticulation provides perceivable information, that is, whether it contains cues usable in the speech perception process. Utterances containing the sequence -C1C2C3C4V- (where Cn is a consonant and V a rounded or unrounded vowel), in which coarticulated lip rounding is known to occur when V is a rounded vowel, are truncated at four points before the vowel. Edited versions thus contain different amounts of coarticulated information. By presenting these stimuli to phonetically trained and phonetically naive native French and native English speakers, the present experiment attempts to do the following:

1.
To discover whether coarticulation of lip rounding in French produces perceivable information, by asking subjects to identify a missing vowel for which coarticulation is present.

2.  To discover over how many segments such information is perceivable. Coarticulation on the articulatory level is known to extend over all four consonants in the type of utterance described above.

3.  To investigate the language-dependent nature of the perception of coarticulated information, by comparing results from French and English speakers; and to reveal whether perception of these cues plays a normal part in the speech perception process, or whether they may nevertheless be abstracted from speech by a suitably trained listener, by comparing results from phonetically trained and phonetically naive subjects.

CHAPTER 4
MATERIALS AND METHODS

4.1  Pilot Study

Preparation of Test Tapes

Two pilot test tapes were constructed. The items of the first test contained the consonant cluster /kstr/ followed by each of the three vowels /i/, /y/, and /u/. The sequences were derived from the three French utterances "la dextre inimitable," "la dextre universelle," and "la dextre outragée." These utterances were recorded during the course of a previous experiment [Cowan, 1973] in a non-soundproof environment. A wide-band hum due to the operation of a graphic recorder during their recording produced distracting background noise on the original tapes.
However, it was decided to use these recordings because the speech wave, the duplex oscillogram, the log intensity of the speech signal, and a graphic representation of the speaker's upper lip protrusion were all available, displayed on separate channels of a Siemens Oscillomink graphic recorder. The speaker was a male native speaker of French, from Lausanne, Switzerland.

The utterances were edited at three points each, on a PDP-12 digital computer, using a set of computer programs written by L. Rice at the UCLA Phonetics Laboratory. (The editing process will be described in Section 4.2.) Three edited versions were made:

    /ladekstr/
    /ladekst/
    /ladeks/

The test items were recorded onto a Revox A77 tape recorder. (This procedure is also described in Section 4.2.) The pilot test tape consisted of three samples of each of the three utterances truncated at each of three points, for a total of 27 items. The test was constructed so that the longest of the edited versions made up the first third of the test, the next longest the second third, and the shortest the last third, i.e.:

    Group 1:  9 items of /ladekstr/
    Group 2:  9 items of /ladekst/
    Group 3:  9 items of /ladeks/

However, the order of presentation with respect to the missing vowel was random within each group, with each vowel being represented 1/3 of the time.

The second pilot test tape was made in response to some subjects' comments that the first tape was noisy and distracting, and that they had felt unsure of the task required of them until at least one or two utterances had been played. It was constructed similarly, except that the utterances were recorded under soundproof conditions, using an Altec 681A LO microphone and a Scully 280 tape recorder.
The same speaker recorded the same utterances as used in the first test. These speech samples were edited with the same set of computer programs and at the same three points as described above. It was proposed that the results of the first and second tests be compared to determine whether background noise on Cowan's tapes produced a sufficiently lower score to warrant the use of new tapes recorded under soundproof conditions for the main experiment. In response to the comment that subjects were not sure of the task until at least two items had been played, the second test contained 29 items, the first two being practice items whose results were not considered in the analysis.

Subjects

Subjects were six adults (3 male, 3 female), all of whom had some knowledge of phonetics. Only one subject, who was also the speaker on the tapes, was a native speaker of French. One subject was a native speaker of German, a language which makes use of the three vowels under study. The same 6 subjects took part in both Tests I and II.

Test Procedure

Subjects were seated, one at a time, alone in a quiet room. The test items were presented over headphones at a comfortable listening level. Subjects were asked to indicate in writing whether the missing vowel was /i/, /y/, or /u/. They were first told what the original utterances had been.

Test I was given in one session and Test II in another. At the time that Test II was administered, Test I was readministered to see if familiarity with the test situation affected test results. The tests are hereafter referred to as Test Ia (first session), Test Ib (second session), and Test II (second session).

Results

Values for relative transmission (T_rel) of information (a measure to be discussed in Section 4.2) and % correct score were calculated.
Scores were generally higher for Test II than for Test Ia or Ib. Since no significant differences were noted between Tests Ia, given in the first session, and Ib, given in the second session, it was assumed that no practice effect was contributing to the increase in T_rel and score from Test I to Test II. This suggests that improvement from Test I to Test II was probably due to the better listening conditions on the second tape.

A distribution of T_rel based on random responses to a 27-item test was calculated. This distribution is shown in Figure 2 and described in detail in Section 4.2. From the distribution, the value of T_rel which a subject responding at random would reach or exceed only 10% of the time was determined. T_rel values above this level were considered significant values of information transmission, and the following was observed: all subjects obtained significant T_rel values for Group 1 items; 2 out of 6 obtained significant values for Group 2; no subject obtained a significant score for Group 3.

Responses were also analyzed to see if they tended to have a feature in common with the stimulus. The ability to perceive front/back distinctions and unrounded/rounded distinctions was examined. For all groups of items, the front/back distinction was made more often than the unrounded/rounded distinction. Both distinctions were made more often for Group 1 than for Group 2, and for Group 3, which contained the shortest edited versions, subjects were giving responses no different from random guessing.

4.2  Main Study

Speech Samples

Because scores were generally higher on Pilot Test II than on Test I (a or b), it was decided to use utterances recorded under soundproof conditions for the main study. Three male native speakers of French recorded the utterances.
Speaker #1 was born in Lausanne, Switzerland, and had been in North America for 14 years. Speaker #2 was born in Grenoble, France, and had been in North America for 4 years. Speaker #3 was born in Albi, France, and had been in North America for 9 years.

Fifteen utterances were recorded by each speaker, at least twice each. Each utterance contained one of the consonant sequences /kstr/, /rstr/, /rskr/, followed by one of the three vowels /i/, /y/, or /u/ in all possible combinations. Cowan [1973] had shown that, for the utterances described above, upper lip protrusion most often begins with the approach to the first consonant in the cluster, if the cluster is followed by the rounded vowel /y/ or /u/. Cowan's findings also applied to utterances with 6-consonant clusters. Such utterances were considered for use in the experiment, but since the pilot test had shown no significant information to be available when the utterance was truncated after the second consonant of a 4-consonant cluster, utterances with 6-consonant clusters were not used.

Recordings for the present experiment were made in an IAC 1204 soundproof room using an Altec 681A LO microphone and a Scully 280 tape recorder.

One set of 9 utterances, consisting of examples of each of the three clusters followed by each of the three vowels, was chosen from each speaker. Utterances were chosen on the subjective bases of clarity of the speaker's voice, absence of background noise, similarity of intonation patterns of utterances containing the same cluster, and presence of all phonemes in the cluster. These 9 utterances are listed in Appendix I.
The remaining 6 utterances from each speaker contained additional samples of clusters which were present in the other utterances, and these samples were not used. Spectrograms, on a Kay Sona-Graph Model 7029A, and mingograms, on a Siemens Oscillomink graphic recorder, were made of all utterances, for reference in the editing process.

Editing of Speech Samples and Preparation of Test Tapes

Editing of utterances was carried out using a set of computer programs written by Lloyd Rice for a PDP-12 digital computer. This set of programs digitizes the speech signal and displays it on the computer oscilloscope screen, and allows the speech waveform data to be manipulated in various ways. The speech signal was first low pass filtered at 6 kHz to prevent aliasing of the input signal. It was intended to digitize the speech wave at 12 kHz; however, limitations of the equipment meant that the computer could not keep up with such a fast transfer rate for the length of time it took to sample the utterance. The computer was therefore skipping some samples, once the core buffer had been filled, and notable distortion resulted. To overcome this problem, each utterance was played at half speed and digitized with a 10 bit analog-to-digital converter at 6 kHz sample frequency, for an equivalent of 12,000 samples per second. The digitized speech wave thus produced was stored on digital tape and could be displayed on the computer screen. A knob controlled the velocity of the speech waveform data as it moved backward or forward across the screen. The waveform could also be made stationary on the screen. In this way, the speech wave could be visually examined as the operator saw fit.
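The half-speed workaround described above can be checked with simple arithmetic. The following is a minimal sketch and not part of the original editing programs; the function names and parameters are assumptions introduced here for illustration only.

```python
def effective_sample_rate(converter_rate_hz: float, playback_speed: float) -> float:
    """Samples captured per second of the ORIGINAL (full-speed) speech.

    converter_rate_hz: rate at which the A/D converter actually samples.
    playback_speed:    tape speed relative to normal (0.5 = half speed).
    """
    if playback_speed <= 0:
        raise ValueError("playback_speed must be positive")
    return converter_rate_hz / playback_speed


def nyquist_frequency(sample_rate_hz: float) -> float:
    """Highest frequency representable without aliasing at this rate."""
    return sample_rate_hz / 2


# Values taken from the text: 6 kHz converter rate, tape played at half speed.
rate = effective_sample_rate(6000, 0.5)   # 12000.0 samples per second of speech
limit = nyquist_frequency(rate)           # 6000.0 Hz, the low pass filter setting
levels = 2 ** 10                          # 1024 amplitude steps for a 10 bit converter

print(rate, limit, levels)
```

Under these assumptions, the 6 kHz converter fed at half speed yields the intended 12,000 samples per second of original speech, and the corresponding Nyquist limit coincides with the 6 kHz low pass filter.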
The speech wave was then edited as follows: the speech wave of the whole utterance was displayed on the screen, and the operator marked the desired initial point of the truncated utterance by a command on the teletype. In all cases, this point was marked just before the onset of phonation at the beginning of the utterance. The waveform was then moved slowly across the screen until the desired endpoint was visible. This point was also entered by a teletype command. The entire edited segment was then stored elsewhere on the digital tape. In this way, an edited utterance could be obtained, leaving the original utterance intact and available for making further editions. Each utterance was truncated at four different points, producing the four groups of stimuli shown in Table I. Since results of the pilot study showed that no significant information is available when truncation takes place after the second consonant of the cluster, the shortest group of stimuli for the main experiment were truncated after release of the third consonant (either a /t/ or a /k/) of the cluster. With three speakers, 9 utterances per speaker, and 4 truncation points per utterance, a total of 108 test items was available.

TABLE I

Groups of Edited Stimuli with Consonant Clusters /kstr/, /rstr/, and /rskr/. Each Sample as Described Above has 3 Versions, One of Which Originally had the Following Vowel /i/, One /y/, and One /u/. Original Utterances From Which the Edited Stimuli Were Derived Are Listed in Appendix I

GROUP I.    Truncation immediately after the final consonant of the cluster:
            /ladekstr/   /laverstr/   /lamorskr/

GROUP II.   Truncation in the middle of the final consonant:
            /ladekst(r)/   /laverst(r)/   /lamorsk(r)/

GROUP III.  Truncation immediately after aspiration of the third consonant:
            /ladekstʰ/   /laverstʰ/   /lamorskʰ/

GROUP IV.   Truncation immediately after release of the third consonant, before aspiration:
            /ladekst/   /laverst/   /lamorsk/

Truncation points were identified primarily by visual examination of the speech wave on the computer oscilloscope screen. Spectrograms and mingograms were examined for additional cues when necessary. Figure 1 shows a mingogram of one of the utterances, and the four points of truncation. Identification of truncation points proved difficult for only one case: the identification of the end of aspiration of the third consonant. As displayed on the computer screen and the mingograph, aspiration was not easily separated from the following final consonant, /r/. Spectrograms were heavily relied upon for this information. Each edited utterance was checked by two listeners for auditory confirmation of the point of truncation.

Truncated utterances were played back from the computer through a digital-to-analog converter, low pass filtered at 6 kHz to remove high frequency digital noise generated by the computer, and recorded onto both channels of a two-channel Scully 280 tape recorder. The computer program also controlled the operation of the tape recorder; it was set so that 3.25 seconds of silence was recorded before and after each utterance, for a total of 5.5 seconds of silence between each item.

The order of taping items was randomized with respect to speaker, cluster, and vowel.
Three practice items, picked at random from among the 108 test items, were recorded at the beginning of the tape. Two buffer items, also chosen at random from among the test items, were also recorded, one before the test items and one after the test items. Thus the tape contained three practice items, followed by utterances #1 to 110, of which #2 to 109 were the test items, and #1 and #110 were buffer items whose results were not considered in the analysis.

[Figure 1. Mingogram of one of the test utterances, "la dextre universelle", showing the 4 points of truncation: 1. after the final consonant of the cluster (/r/); 2. in the middle of the final consonant; 3. after aspiration of the third consonant of the cluster (/tʰ/); 4. after release of the third consonant, before aspiration. Traces labelled: Speech Wave, Duplex Oscillogram, Log Intensity of Speech Signal.]

Two tapes were made from the original tape, using two Revox A77 tape recorders. Tape A contained items in the original random order. Tape B contained the same practice items, but the two halves of the test (i.e. #1 to 55, and #56 to 110) were interchanged. Thus two test tapes were available. The test was recorded on both tracks I and II of each tape.

Numbers were recorded before each test item. A non-native speaker of French recorded French numbers on channel I of each test tape, and an English speaker recorded English numbers on channel II.

Subjects

A group of 10 native French speaking adults and a group of 10 native English speaking adults participated in the experiment.

Four females and six males made up the French speaking group. They had been in North America from 3 to 14 years.
Seven subjects had been born in France, two in Switzerland, and one in Haiti. One of the French-born subjects had lived in several places in Europe as a child, but had always spoken French in the home. One of the Swiss subjects had grown up speaking both French and German, though French was her mother language. The Haitian subject had grown up speaking both French and Spanish. All subjects had at least a working knowledge of English. Four subjects within the French speaking group had no knowledge of phonetics, while three had had formal phonetic training and three were teachers of the French language with some informal phonetic background. Three of the subjects had served as the speakers on the test.

Six females and four males made up the English speaking group. All subjects had had approximately 4 years of high school French in Canada, while two had had additional French courses in university, also in Canada, and had each spent several months in France. None of the subjects considered himself fluent in French. Six subjects had no knowledge of phonetics while the other four had some degree of phonetic training.

All 20 subjects passed a pure tone hearing screening test at 15 dB HL for the frequencies 500, 1000, 2000, 4000, and 6000 Hz.

Test Procedure

The subjects were seated, one at a time, in a soundproof room with the experimenter. The test tape was played on a Scully 280 tape recorder and presented over TDH-39 Maico headphones at a level of 60-70 dB SPL as measured on a Bruel and Kjaer 2203 precision sound level meter with a Bruel and Kjaer 4152 artificial ear (6 cm³). The experimenter monitored the test over headphones and controlled movement of the tape in the soundproof room by a remote control unit.
Subjects were instructed in writing to listen to each utterance and to mark the missing vowel on an answer sheet. The missing vowels were described as "i" as in "dites," "u" as in "une," and "ou" as in "bout." (See Appendix II for complete instructions.) The vowels were phonetically transcribed as /i/, /y/, and /u/ for those who had a knowledge of phonetics. English subjects were first asked whether they were familiar with the vowels as represented in French orthography. The experimenter then pronounced each vowel in isolation for the English subjects.

Included in the instructions were the nine whole utterances from which the edited versions had been taken. Inclusion of this list was meant to show subjects that each truncated utterance could in fact be followed by each of the three vowels. Subjects were told that the vowels were represented in approximately equal proportion on the test (that is, that each vowel appeared approximately 1/3 of the time). Guessing was strongly encouraged. Subjects were asked to indicate the confidence they had in their answers by marking each response with a 1 (for most confident), 2, or 3, but only if they felt they had time to make this judgment.

The tape track containing French numbers was played for all but three subjects. It was one of these subjects, a French speaker who was one of the speakers on the test, who suggested that numbering be done in French instead of the original English. Subsequently all French subjects heard French numbers. Each English subject was asked whether he preferred to hear the numbers in French or English, and each chose French.

The tape was stopped after the three practice items and subjects were given the opportunity to hear these items again.
Measures of Perceivability

The two measures described below, relative transmission (T_rel) and correct score (S), were used in analyzing the results of both the pilot and the main experiment.

The relative transmission is a measure of covariance between input (the stimulus) and output (the subject's response) [Miller and Nicely, 1955]. This measure was used to describe the amount of transmissible information available in the truncated utterances, and is given by

$$T_{\mathrm{rel}}(x;y) = \frac{-\sum_{i,j} p_{ij}\,\log_2\!\left(\dfrac{p_i\,p_j}{p_{ij}}\right)}{-\sum_{i} p_i\,\log_2 p_i}$$

where the input variable is x, with any one input x_i having the probability p_i, and the output variable is y, with any one output y_j having the probability p_j. The symbol p_ij represents the probability that a particular input x_i occurs and elicits the particular response y_j, that is, the joint probability of that stimulus and response. The more consistently a response can be predicted from the stimulus, that is, the better the transmission of information, the closer T_rel is to a value of 1. If the transmission of information is poor, then stimulus and response are unrelated, and T_rel has a value near 0.

Values of relative transmission for a series of computer-generated random responses were calculated. Figure 2 shows the distribution of T_rel based on random responses (with equal probabilities of 0.333 each) to 1000 27-item tests. In this graph, the bins for T_rel values from 0 to 50% represent intervals of 1%, the nth bin representing the number of cases where T_rel has a value between 0.01 × n and 0.01 × (n+1). Each asterisk represents 2 cases. A bin with fewer than 2 cases shows one asterisk.
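The relative transmission measure defined above can be illustrated with a short computation on a stimulus-by-response confusion matrix. This is a minimal sketch following the Miller and Nicely definition, not the program used in the study; the function names and the example matrices are assumptions made here for illustration.

```python
import math


def relative_transmission(matrix):
    """Relative transmission T_rel for a confusion matrix of counts.

    matrix[i][j] is the number of times stimulus i drew response j.
    Returns transmitted information divided by input (stimulus) entropy.
    """
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    n_responses = len(matrix[0])
    p_in = [sum(row) / total for row in matrix]
    p_out = [sum(row[j] for row in matrix) / total for j in range(n_responses)]

    transmitted = 0.0
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            if count:  # empty cells contribute nothing to the sum
                p_ij = count / total
                transmitted += p_ij * math.log2(p_ij / (p_in[i] * p_out[j]))

    input_entropy = -sum(p * math.log2(p) for p in p_in if p > 0)
    if input_entropy == 0:
        raise ValueError("only one stimulus category was presented")
    return transmitted / input_entropy


def correct_score(matrix):
    """Correct score S: number of responses on the main diagonal."""
    return sum(matrix[k][k] for k in range(len(matrix)))


# Hypothetical 27-item matrices (rows: stimulus /i/, /y/, /u/; columns: response).
all_correct = [[9, 0, 0], [0, 9, 0], [0, 0, 9]]
swapped = [[0, 9, 0], [9, 0, 0], [0, 0, 9]]    # /i/ and /y/ consistently exchanged
unrelated = [[3, 3, 3], [3, 3, 3], [3, 3, 3]]  # responses independent of stimulus

for name, m in [("all correct", all_correct), ("swapped", swapped), ("unrelated", unrelated)]:
    print(name, abs(round(relative_transmission(m), 3)), correct_score(m))
```

The "swapped" matrix gives the same T_rel as the "all correct" matrix although its score is only 9 out of 27, which is why T_rel and S are reported as separate measures.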
The correct score, either in % or absolute value, was also calculated for each subject. Figure 3 shows the distribution of correct scores, based on computer-generated random responses (with equal probabilities of 0.333 each) to 1000 27-item tests. Each asterisk represents 2 cases, a bin with fewer than 2 cases showing one asterisk.

The above distributions are based on tests of 27 items for comparison with each group of edited utterances, as there were 27 items of Group I utterances, 27 items of Group II utterances, and so on.

[Figure 2. Distribution of T_rel (%) based on random responses to a 27-item test. Number of cases in each 1% bin, bins 0 to 32: 22, 81, 75, 98, 88, 104, 61, 65, 72, 48, 45, 36, 47, 26, 14, 30, 17, 12, 7, 11, 12, 9, 2, 3, 4, 0, 3, 1, 1, 2, 1, 0, 2; bins 33 to 49: 0.]
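The chance distributions of T_rel and S described above can be approximated by simulation. The sketch below is only an illustration of the idea and not the original program: it assumes that each simulated test presents each vowel 9 times and that every response is drawn with probability 1/3; the function names, the seed, and the quantile rule are all choices made here.

```python
import math
import random


def relative_transmission(matrix):
    """T_rel of a count matrix: transmitted information / stimulus entropy."""
    total = sum(sum(row) for row in matrix)
    p_in = [sum(row) / total for row in matrix]
    p_out = [sum(row[j] for row in matrix) / total for j in range(len(matrix[0]))]
    transmitted = sum(
        (c / total) * math.log2((c / total) / (p_in[i] * p_out[j]))
        for i, row in enumerate(matrix)
        for j, c in enumerate(row)
        if c
    )
    entropy = -sum(p * math.log2(p) for p in p_in if p > 0)
    return transmitted / entropy


def simulate_chance(n_tests=1000, items_per_vowel=9, n_vowels=3, seed=0):
    """Return (T_rel values, scores) for tests answered purely at random."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    t_values, scores = [], []
    for _ in range(n_tests):
        matrix = [[0] * n_vowels for _ in range(n_vowels)]
        for stimulus in range(n_vowels):
            for _ in range(items_per_vowel):
                matrix[stimulus][rng.randrange(n_vowels)] += 1
        t_values.append(relative_transmission(matrix))
        scores.append(sum(matrix[k][k] for k in range(n_vowels)))
    return t_values, scores


def chance_level(values, proportion):
    """Empirical cut-off: the value at the (1 - proportion) quantile."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, math.ceil((1 - proportion) * len(ordered)))
    return ordered[index]


t_values, scores = simulate_chance()
for proportion in (0.10, 0.05, 0.01):
    print(
        f"{proportion:.0%} chance level:",
        f"T_rel = {100 * chance_level(t_values, proportion):.1f}%,",
        f"S = {chance_level(scores, proportion)}",
    )
```

A subject's T_rel or S for a 27-item group can then be compared with these cut-offs; the exact values depend on the random seed and on the assumptions stated above.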
[Figure 3. Distribution of scores based on random responses to a 27-item test. Axes: number of cases per bin; score (maximum 27).]

CHAPTER 5
RESULTS

5.1  Correlation Between Relative Transmission (T_rel) and Score (S)

Because the relative transmission is a measure of information transmission, and not necessarily correct information transmission, a high T_rel value does not always reflect a high score. For example, if a subject consistently responds with the vowel /i/ when the stimulus is /y/, he is receiving predictable information from the stimulus. Although he misinterprets this information consistently, he obtains a high value of T_rel (assuming other responses are also highly predictable from their stimuli). If a subject responds in a random manner, using no information from the stimulus, S will be at chance level, for example in the test described here, distributed around 9 (out of a maximum of 27), as shown in Figure 3; T_rel will be relatively low, distributed as shown in Figure 2. The better a subject performs on the test, the better one would expect T_rel and S to correlate.

To determine whether this was so for performances on the present test, Pearson correlation coefficients were calculated between T_rel and S for each of the 4 groups of items and also for a series of random responses.
Correlations were as follows:

    Group I             R =  0.91
    Group II            R =  0.73
    Group III           R =  0.76
    Group IV            R =  0.50
    Random Responses    R = -0.08

From the above, one sees that, in general, the longer the portion of the cluster in the test item, the better the correlation between the two measures used here to describe performance. It will also be seen in Section 5.2 that the longer the test item, the better the subjects' performance. Therefore, as expected, the highest correlations between T_rel and S occur for the items for which the subject does best. In examining performance for Group IV items, one measure is not a good indicator of the other measure.

5.2  Identification of the Missing Vowel

Responses were tabulated in 3 x 3 confusion matrices, one matrix per group of items and per subject. There were therefore 4 matrices per subject, each with 27 items. T_rel and S were calculated for each matrix. Figure 4 shows mean T_rel and S values for each of the four groups of items, displayed separately for French and English speakers. T_rel and S values one standard deviation about the mean are also shown. Levels that one subject would obtain by chance 1%, 5%, and 10% of the time (obtained from Figures 2 and 3, Chapter 4) are also shown in Figure 4. As can be seen in that figure, all subjects showed a downward trend in both T_rel and S, from Group I to Group IV. That is, the farther from the vowel the utterance was truncated, the less able subjects were to correctly identify the vowel. An analysis of variance showed a significant treatment effect among the groups of items for both T_rel and S for both French and English subjects.
This effect is significant at the levels indicated in the table below.

    Treatment Effects On:    French       English
    T_rel                    p < 0.001    p < 0.05
    S                        p < 0.005    p < 0.001

The Newman-Keuls test, which indicates between which groups of items significant differences exist [Winer, 1971, pp. 191-196], was also applied to the data. Significant differences were found between several pairs of groups of items, for both French and English speakers, as shown in Table II.

Results show a great deal of individual variation. Figure 5 shows T_rel and S values for each group of utterances, for 3 different subjects. Levels that one subject would obtain by chance 1%, 5%, and 10% of the time are indicated. Subject AS scored consistently higher than any other subject. She was a native French speaker who was a teacher of French, but had had no formal phonetic training. Subject CM was a female native English speaker who had had some phonetic training. Subject CB was a male native speaker of French and one of the speakers on the test; he had also had some phonetic training. Because of her high performance relative to other subjects, subject AS was retested. On the second run of the test she maintained her high levels, scoring slightly higher than she had on the first run.

Results as shown in Figures 4 and 5 indicate that, for several of the item groups, subjects were able to identify the missing vowel above chance levels. For example, on the average, English subjects obtained T_rel and/or S values higher than those one subject would obtain by chance 5% of the time, for items of Group I, and French subjects, on the average, obtained similarly high levels for items of Groups I and II. Several individuals of both languages performed well above the 5% chance levels for Groups I and II, and 5 individuals did so for Group III. In general though, Group III and IV performances were at the level which one subject would obtain by chance 80% of the time. It is interesting to note that individual variations were so great that some subjects were able to identify the vowel for Group III and IV items better than others were able to identify vowels for Groups I and II.

[Figure 4. Mean values of T_rel and S, plus or minus one standard deviation, for each group of items. Panels a, b: French speakers; panels c, d: English speakers. Chance levels of 1%, 5%, and 10% are marked.]

TABLE II

Differences in T_rel and in S Between Each Group of Items, for French and English Subjects

FRENCH SPEAKERS

            T_rel (%)                                 Score
    Group   I           II          III       Group   I       II      III
    II       35.18                            II      10
    III     132.03**     96.85*               III     46**    36**
    IV      172.97**    137.79**    40.92     IV      58**    48**    12

ENGLISH SPEAKERS

            T_rel (%)                                 Score
    Group   I           II          III       Group   I       II      III
    II       60.99                            II      19
    III     121.33*      60.34                III     41**    22
    IV      123.27*      62.28       1.94     IV      40**    21*      1

    ** 0.01 level of significance.
    *  0.05 level of significance.

In general, French subjects tended to distribute their responses evenly, responding approximately 1/3 of the time with each vowel. This tendency was somewhat weaker for the English subjects, who, for Groups III and IV, tended to make more /u/ and /i/ responses respectively.

Correct answers were not evenly distributed for either language group, for any group of items.
For all groups except Group IV, correct /y/ responses were less frequent than correct /i/ or /u/ responses. French subjects did not show a different pattern of correct responses from English subjects. The percent of correct responses for all items of a particular stimulus, pooled for all subjects, is shown in Table III.

TABLE III

Percent of Items Answered Correctly in Each Vowel Category for Each Group of Items, All Subjects Pooled Together

                     STIMULUS
    GROUP    /i/       /y/       /u/
    I        64.4%     40.5%     63.9%
    II       51.7      40.0      61.1
    III      41.2      38.3      41.5
    IV       45.0      35.6      33.4

Examination of the confusion matrices showed certain confusions to be more common than others. Confusions between /i/ and /y/, and /y/ and /u/, were more common than the /i/-/u/ confusion. This is not surprising when one considers that while /i/ and /y/ share the front feature, and /y/ and /u/ share the rounding feature, /i/ and /u/ share neither of these. Confusions between all pairs of vowels increased as the items got shorter, the only marked exception to this trend being the /y/-/u/ confusion, which was made less often in Group IV than in Group III. Table IV shows the percent of time each confusion was made. For example, in Group I the /i/-/y/ confusion was made on 27.5% of the items for which either /i/ or /y/ was the stimulus.

Several subjects reported that they were most often undecided as to whether the missing vowel was /i/ or /y/, or /y/ or /u/, and several reported that they were never confused between /i/ and /u/. Performances seem consistent with the first observation, but not strictly so with the second.
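The confusion percentages described above can be sketched as follows. The sketch assumes that a confusion between two vowels is counted in both directions and expressed as a percent of the items for which either vowel was the stimulus, as in the Group I example. The function name and the 27-item matrix are hypothetical and are not data from the study.

```python
VOWELS = ("i", "y", "u")

def confusion_percent(matrix, a, b):
    """Percent of items with stimulus a or b that were answered with the other vowel."""
    ia, ib = VOWELS.index(a), VOWELS.index(b)
    confused = matrix[ia][ib] + matrix[ib][ia]        # a heard as b, plus b heard as a
    presented = sum(matrix[ia]) + sum(matrix[ib])     # all items with stimulus a or b
    return 100.0 * confused / presented

# Hypothetical matrix; rows = stimulus (/i/, /y/, /u/), columns = response.
example = [[6, 2, 1],
           [2, 4, 3],
           [1, 3, 5]]

for a, b in (("i", "y"), ("y", "u"), ("i", "u")):
    print(f"/{a}/-/{b}/: {confusion_percent(example, a, b):.1f}%")
```

For this made-up matrix the /i/-/u/ confusion comes out rarest, which is the same ordering reported for the pooled data.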
TABLE IV

Percent of Items in Each Group for Which Various Vowel Confusions Were Made, All Subjects Pooled Together

                   VOWEL CONFUSIONS
    GROUP    /i/-/y/    /y/-/u/    /i/-/u/
    I        27.5%      27.8%      10.3%
    II       27.5       27.2       16.1
    III      30.8       37.2       21.7
    IV       40.6       28.1       24.2

Further findings on feature relationships between stimulus and response are discussed in Section 5.3 below.

5.3  Identification of Individual Features of the Missing Vowel

To examine the perceivability of a particular feature, responses were grouped in the following ways: /i/ and /y/ vs /u/ (front-vs-back), and /i/ vs /y/ and /u/ (unrounded-vs-rounded). That is, if the stimulus was a front vowel, and the response either the same or the other front vowel, the response was considered correct in the front/back analysis. A similar procedure was employed for the unrounded/rounded analysis.

When a 27-item 3 x 3 confusion matrix, for which each row has a total of 9 entries, is collapsed in the manner described above, a 2 x 2 matrix results in which one row has 9 entries, and the other row 18 entries. In such a matrix, the feature for which the data are grouped forms 2/3 of the total data. Therefore an error among the ungrouped data affects the score more than does an error among the grouped data. To overcome this imbalance, values in the row containing 18 entries were halved before T_rel and S were calculated.

Figure 6 shows mean values of T_rel and S for perception of the front/back distinction and for the unrounded/rounded distinction. Values shown are mean values for all 20 subjects. Only responses for Groups I to III are shown, as Group III responses are already at chance levels.

[Figure 6. Mean values of T_rel and S for front-vs-back distinctions, and unrounded-vs-rounded distinctions, shown for three groups of items.]

An analysis of variance showed that, for items of Group I, subjects did not make front/back distinctions significantly better than they made rounded/unrounded distinctions. This was the case when either T_rel or S was taken as an indication of performance. Because differences were greatest for Group I but yet were not significant, analysis of variance between feature distinctions in the other groups was not carried out.

Much individual variation was seen, both in ability to make a feature distinction, and in which distinction, either front/back or unrounded/rounded, was more easily made. The table below shows the wide range of T_rel values for feature distinctions for 4 French subjects. It also shows that some subjects made the front/back distinction more often, some made the unrounded/rounded distinction more often, and some made both distinctions equally. Similar individual differences were observed for English subjects.

    Subject    T_rel for Unrounded/Rounded    T_rel for Front/Back
               Distinction                    Distinction
    DN          0.30%                          2.19%
    EA          8.17                          25.33
    CB         10.52                          11.24
    PC         65.49                          29.07

5.4  Differences Between Subject Groups

Tables V and VI compare mean T_rel values and standard deviations for the 10 French and 10 English subjects, and for the 7 phonetically trained and 13 phonetically naive subjects, respectively. Treatment-by-levels analyses of variance showed no significant differences between French and English subjects, for either T_rel or for S, and no significant differences between phonetically trained and phonetically naive subjects, for T_rel or for S.

TABLE V

Mean T_rel Values (x) and Sample Standard Deviations (s) for Each Group of Items, Shown for French and English Subjects. Differences Between the Two Language Groups are not Significant. N = Number of Subjects in Each Group

                                    GROUP
    SUBJECT    N          I       II      III     IV
    French     10    x    25.4    21.9    12.2     8.1
                     s    17.3    14.9    10.8     7.1
    English    10    x    22.7    16.6    10.5    10.3
                     s    14.8    11.2    11.7     7.2

TABLE VI

Mean T_rel Values (x) and Sample Standard Deviations (s) for Each Group of Items, Shown for Phonetically Trained and Phonetically Naive Subjects. Differences Between the Two Groups With Different Phonetic Backgrounds are not Significant. N = Number of Subjects in Each Group

                                                GROUP
    SUBJECT                 N          I          II      III     IV
    Phonetically Trained     7    x    26.2       22.4    13.2    11.1
                                  s     6.2 [?]    7.4     7.5     4.9
    Phonetically Naive      13    x    22.8       17.5    10.4     8.2
                                  s    14.3       10.6     7.9     4.9

5.5  Speaker Differences

Responses were examined to see if subjects performed better on items spoken by one of the three speakers than by the others. No significant differences were found between results for items by each speaker. However, again some individual variations were observed. Several subjects stated that items spoken by one or another of the speakers were easiest to answer. Such remarks were usually consistent with the subjects' better performance for that particular speaker, but there was no consistent trend as to who the "best" speaker was.

All of the three speakers served as subjects in the test. They did not perform consistently differently from the other subjects. Nor did they perform best on the items for which they themselves were speaking. In fact, two of the speakers performed somewhat worse on items for which they were the speakers, than for items uttered by another speaker.

CHAPTER 6

DISCUSSION

6.1  Identification of the Missing Vowel

French utterances containing the sequence -C1C2C3C4V- were truncated at four points before the vowel, as shown in Table I (Chapter 4). Cowan [1973] has shown that for production of these utterances by native French speakers, upper lip protrusion most often begins with the first consonant of the cluster or earlier, when the vowel following the cluster is a rounded one. Items of each of the four groups prepared for the present experiment therefore contained different amounts of information as to the nature of the following vowel.

Subjects were able to predict the upcoming vowel above chance levels for items in Groups I and II. These items had been truncated after C4, and in the middle of C4 respectively. In all cases, C4 was the phoneme /ʁ/. In general, subjects were unable to predict the upcoming vowel for items in Groups III and IV.
These items had been truncated after aspiration of C3 (C3 being /k/ or /t/), and after release of C3 but before aspiration, respectively.

One sees from these results, represented graphically in Figures 4 and 5 (Chapter 5), that the segments up to and including C4 contain information about the following vowel that is utilizable in the perception process. This information is not restricted to the C4V juncture, since the vowel can be correctly predicted when segments up to only the middle of C4 are heard. Though on the articulatory level the influence of the vowel is apparent as far back as the first consonant of the cluster or earlier, the information present in C3 of the cluster and before is not by itself utilizable by the listener as an aid in identifying the upcoming vowel. It is not apparent whether the perceivable information is restricted to C4 or whether it is the cumulative information present in the whole cluster up to and including at least half of C4 which is used in perception. However, because coarticulation due to the vowel may begin by the first consonant of the cluster, it seems likely that several segments, and not just C4, contain information which, when it is all available to the listener, can be used in the perception process, but when only early segments are available, is not perceptually useful. However, there are great individual differences, and one subject at least was able to consistently predict the upcoming vowel above the level she would obtain by chance 5% of the time even for items of Group IV.
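The notion of a level that a subject "would obtain by chance 5% of the time" can be made concrete with a small sketch. The study reads these levels from the distributions of random responses in Figures 2 and 3; the version below instead assumes, purely for illustration, that a guessing subject answers each of the 27 items independently with probability 1/3 of being correct, so that S follows a binomial distribution. The function names are hypothetical.

```python
from math import comb

N_ITEMS = 27       # items per confusion matrix
P_GUESS = 1 / 3    # three response alternatives (assumed equally likely when guessing)

def tail_probability(k, n=N_ITEMS, p=P_GUESS):
    """P(S >= k) for a subject guessing independently on every item."""
    return sum(comb(n, s) * p**s * (1 - p)**(n - s) for s in range(k, n + 1))

def chance_threshold(alpha, n=N_ITEMS, p=P_GUESS):
    """Smallest score that pure guessing reaches or exceeds at most alpha of the time."""
    for k in range(n + 2):                 # k = n + 1 gives an empty tail, so a value is always returned
        if tail_probability(k, n, p) <= alpha:
            return k

for alpha in (0.10, 0.05, 0.01):
    print(f"{alpha:.0%} chance level for S: {chance_threshold(alpha)} of {N_ITEMS}")
```

Under this assumption the expected score of a guessing subject is 27 x 1/3 = 9, in line with the distribution described for Figure 3, and the thresholds rise as the chance level is made stricter.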
Lehiste and Shockey [1972] have determined that coarticulatory effects in VCV utterances are not perceivable, whereas Sharf and Ostreicher [1973] cite evidence that these effects are perceivable in CVNV utterances. It is not clear why, for some utterances, coarticulated information is perceivable, while for others it is not. The extent of coarticulation may depend on several factors: on the articulators involved [for example, Carney and Moll, 1971], the place, manner, and voicing characteristics of the neighbouring phonemes [Stevens and House, 1963], and the language being spoken [Ohman, 1966]. Depending on factors such as these, coarticulation at the articulatory level may be of an extent to produce more or less perceivable effects, or none at all. It may be, for instance, that coarticulatory influences of a vowel on a preceding nasal (as in Sharf and Ostreicher's study) are perceivable, whereas coarticulatory influences of a vowel on a preceding stop consonant (as in Lehiste and Shockey's study) are less so. Further studies comparing perceivability of coarticulatory influences on fricatives, nasals, stops, and glides, voiced and unvoiced, would yield results relevant to this matter. In comparing such studies to the present one, it should be noted that the /ʁ/ used here is the uvular fricative, as opposed to the English retroflexed sonorant.

Table II (Chapter 5) shows between which groups of items performance differed significantly.
One sees that for both French and English subjects, and for both T_rel and S, no significant differences were found between Groups I and II, although the trend was a slight decrease from I to II. Subjects were able to identify the vowel when segments only up to the middle of C4 were present almost as well as they could identify it when they heard all segments including the entire consonant. Similarly, no significant differences were noted between Groups III and IV, indicating that hearing all of C3 did not increase a listener's performance over hearing only part of that consonant. It seems that without the information present in C4, the amount of other preceding information present makes no difference to a listener's ability to identify the following vowel.

6.2  Identification of Individual Features of the Missing Vowel

Ali et al. [1971] found that coarticulated nasality was perceivable in CVN and CVVN utterances from which the final nasal was deleted. The present study found that coarticulation of two other features, front/back and unrounded/rounded, also has perceivable effects.

As shown in Tables III and IV (Chapter 5), /i/-/u/ confusions were less frequent than /i/-/y/ or /y/-/u/ confusions, and the vowel /y/ was correctly identified less of the time than the other vowels. These results are probably due to the fact that, while /i/ and /y/ share the feature value front, and /y/ and /u/ the feature value rounded, /i/ and /u/ share neither of these. Thus on hearing an item containing information for a /y/, a subject may misinterpret it as either an /u/ or an /i/, based on his perception of the shared features discussed above.
Similarly, he may misinterpret an /i/ as a /y/, but is less likely to misinterpret it as an /u/; he may misinterpret an /u/ as a /y/, but is less likely to misinterpret it as an /i/. Because /y/ shares features with both other vowels, misinterpretations of the kind described here are more likely to occur for the vowel /y/ than for the other vowels.

Figure 6 (Chapter 5) compares perceivability of the front/back and unrounded/rounded distinctions. Individual features are known to differ significantly in intelligibility, some being more readily perceivable than others [Wang and Bilger, 1973]. However, the features in question here are equally perceivable, though several individuals were better able to make one distinction than the other.

As expected, perception of either feature decreased as the test item grew shorter, performances for Groups III and IV being at the chance levels indicated at the right of each graph in Figure 6. One sees that, just as segments preceding C4 provide no usable information regarding the vowel on their own, they also provide no usable information regarding a feature of the vowel.

Comparison of Figures 4 and 6 shows that scores (converted to %) were considerably higher for feature identification than for vowel identification, in part a consequence of collapsing the matrix and including entries off the diagonal of the 3 x 3 matrix. However, T_rel values for feature identification were slightly lower than for vowel identification.
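The collapsing of the matrix referred to here (Section 5.3) can be sketched as follows: the 3 x 3 vowel matrix is pooled into a 2 x 2 feature matrix, and the row that pools two vowels is halved so that both rows carry equal weight. The function names and the 27-item matrix are hypothetical, and the score is expressed in percent as one plausible reading of "converted to %".

```python
VOWELS = ("i", "y", "u")
FRONT = {"i", "y"}      # front-vs-back grouping
ROUNDED = {"y", "u"}    # unrounded-vs-rounded grouping

def collapse(matrix, group):
    """Pool a 3 x 3 vowel matrix into 2 x 2: index 0 = vowel in group, 1 = not in group."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for i, stimulus in enumerate(VOWELS):
        for j, response in enumerate(VOWELS):
            r = 0 if stimulus in group else 1
            c = 0 if response in group else 1
            out[r][c] += matrix[i][j]
    return out

def balance_rows(matrix2):
    """Halve the row with 18 entries so that it weighs the same as the row with 9."""
    heavy = max(range(2), key=lambda r: sum(matrix2[r]))
    return [[v / 2 for v in row] if r == heavy else list(row)
            for r, row in enumerate(matrix2)]

def percent_correct(matrix2):
    """Diagonal entries as a percent of all entries."""
    total = sum(sum(row) for row in matrix2)
    return 100.0 * (matrix2[0][0] + matrix2[1][1]) / total

# Hypothetical matrix; rows = stimulus (/i/, /y/, /u/), columns = response.
example = [[6, 2, 1],
           [2, 4, 3],
           [1, 3, 5]]

vowel_percent = 100.0 * sum(example[k][k] for k in range(3)) / 27
front_back = percent_correct(balance_rows(collapse(example, FRONT)))
rounding = percent_correct(balance_rows(collapse(example, ROUNDED)))
print(vowel_percent, front_back, rounding)
```

For this made-up matrix both feature scores exceed the vowel score, because responses that name the wrong vowel but the right feature value move onto the diagonal when the matrix is collapsed.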
One expects feature identification to be better than vowel identification, the subject being presented with a two-way discrimination task in the first case, and a three-way task in the second. The slight decrease in T_rel from the vowel to the feature condition shows that, when both feature distinctions are considered together, as is necessary for correct vowel identification, slightly more information is abstracted from the stimulus than when either feature is considered on its own.

The distinctions front/back and unrounded/rounded are the manifestations of specific articulatory gestures, necessary for the production of the vowels described above. Coarticulation in the utterances described here causes the articulators to initiate these gestures in anticipation of the upcoming vowel. To an extent, this effect is perceivable. In the case of rounding, such anticipatory coarticulation is known to occur as early as the first consonant of the C1C2C3C4V sequence, yet in general, is perceivable only if segments up to and including at least half of C4 are present, and not if less than this amount of information is available. It is not known how extensively fronting is coarticulated in these utterances, but similar to rounding, it is perceivable only if all segments including at least half of C4 are present.

6.3  Differences Between Subject Groups

Results show that the perceivability of coarticulated information does not seem to be related to the listener's native language, even though one of the vowels employed in the study (/y/) is not an English phoneme.
Such findings are in agreement with the findings of Stevens et al. [1969], that the listener's linguistic background, be it English or Swedish, did not affect his ability to make subphonemic distinctions, even among vowels that were not present in his language. Though the pattern of coarticulation may be language-dependent, its perception does not seem to be.

Phonetic training did not affect the test results. The subjects with phonetic background in general were not better able to identify the missing vowel than the phonetically naive subjects. This suggests that the ability to make use of coarticulatory information does not depend on specific training. However, large individual variations in test performance indicate that not all listeners make the subphonemic distinctions necessary to predict the missing vowel. Whether they are completely unable to do so, or whether several subjects were not sufficiently motivated or did not completely understand the task, is not clear. Other researchers of speech perception abilities have also noted considerable individual differences [Ali et al., 1971; Liberman et al., 1957; Stevens et al., 1969]. It is likely that some subjects in these tasks are more motivated than others; however, it also seems possible that some individuals possess keener powers of discrimination than others. Several of the poorer performers in the present study had in fact shown keen interest and motivation in the task.

6.4  Subjects' Comments

Without exception, all subjects reported that they found the test difficult. Most felt certain they had performed badly (though they may or may not have), and that they had guessed a large proportion of the time. The fact that subjects thought the test was a difficult one and that they had "only guessed" does not necessarily mean that the perceptual mechanism, to a large extent working subconsciously, could not handle the task. However, only one of the twenty subjects consistently gave an indication of his confidence in each of his responses, as was suggested in the instructions. This seems to indicate that the task of identifying the vowel was sufficiently difficult to impede subjects from making the further decision of how confident they were in each response. This may mean that, though the vowel was perceivable to an extent, use of coarticulatory information is not a process used in everyday speech perception.

As noted in Section 5.5, some subjects tended to do better on items spoken by one speaker than the others, though there was no general trend for all subjects to perform best for one particular speaker. Subjects were usually correct when they stated they had performed best for one of the speakers. Familiarity with one or more of the speakers did not affect a subject's performance, and the three speakers, who also served as subjects, did not perform best on their own utterances.

Most subjects could not describe the strategy they had used in responding. However, several subjects were seen to repeat the test item subvocally two or three times before choosing their response. Another comment some subjects made was that their choice was sometimes influenced by a vowel heard in the test item. Specifically, for an item containing the sequence /lamɔrsk/, they would tend to choose the vowel /u/ as the missing one, because of the back vowel /ɔ/ in the test item. Other subjects reported that they tended to choose /u/ for items of the speaker with the lowest voice, and one phonetically trained subject said she often chose /i/ and /y/ for a speaker who she judged to have "more fronted speech." Subjects were sometimes, but not always, accurate in their descriptions of their response tendencies. Thus it seems several factors may have influenced a subject's response, perhaps sometimes masking out the perceivable effect due to coarticulation. However, none of the factors described above was looked at specifically in the analysis.

6.5  Conclusions

Ali et al. [1971] hypothesize that if the effects of coarticulation are perceivable, then speech perception can be said to follow speech production and make use of its idiosyncrasies. This relationship is predicted by the motor theory of speech perception [Liberman et al., 1967]. Results of the present study suggest that such a relationship between production and perception exists to an extent. The perception process can make use of some of the idiosyncrasies of production; coarticulated information is only sometimes perceivable in -C1C2C3C4V- utterances, notably when all segments up to and including C4 are present. The present results seem to be predicted by Wickelgren's [1969] model of context-sensitive coding, in which each unit specifies its right- and left-hand neighbours.
The final consonant of the cluster contains information which specifies the immediately following vowel, but segments preceding the final consonant seem to contain no perceivable information regarding the vowel. However, as discussed previously, it is likely that it is the cumulative information present in all preceding segments that is used perceptually. Also, there is no reason to assume that coarticulatory influences of a vowel could never be strong enough to produce a completely perceivable effect on a phoneme more than one removed from the vowel. Some subjects were able to identify the vowel when hearing utterances truncated after C3 of the cluster, suggesting that, for them at least, context sensitivity is not limited to the immediately neighbouring phoneme. In addition, C3 in the utterances used here was always a voiceless stop. It is not known what the coarticulatory influence of the vowel on a nasal or fricative in that position may be.

The fact that subjects can use subphonemic coarticulatory information to identify an upcoming vowel does not mean that the perception process necessarily incorporates this ability. There is evidence that subphonemic distinctions are not as well perceived as phonemic ones [Liberman et al., 1957; Stevens et al., 1969], and speech perception seems to be primarily a categorical process. But it is possible that in unfavorable conditions, such as a noisy environment or a large amount of information having to be processed quickly, coarticulatory effects are used as cues by the perceptual mechanism.
Use of such redundant cues would facilitate correct identification of any one speech sound. It is clear that some coarticulatory effects provide significantly perceivable information to the listener.

BIBLIOGRAPHY

ALI, L., GALLAGHER, T., GOLDSTEIN, J., and DANILOFF, R. (1971). "Perception of Coarticulated Nasality," J. Acoust. Soc. Amer. 49: 538-540.

AMERMAN, J.D., DANILOFF, R., and MOLL, K.L. (1970). "Lip and Jaw Coarticulation for the Phoneme /æ/," J. Speech Hearing Res. 13: 174-161.

CARNEY, P.J., and MOLL, K.L. (1971). "A Cinefluorographic Investigation of Fricative Consonant-Vowel Coarticulation," Phonetica 23: 193-202.

CLARK, M., and SHARF, D.J. (1973). "Coarticulation Effects of Post-Consonantal Vowels on the Short-Term Recall of Pre-Consonantal Vowels," Language and Speech 16: 67-76.

COWAN, H.A. (1973). "A Study of Upper Lip Protrusion in French," Master's Thesis, University of British Columbia.

DANILOFF, R., and MOLL, K.L. (1968). "Coarticulation of Lip Rounding," J. Speech Hearing Res. 11: 707-721.

DELATTRE, P.C., LIBERMAN, A.M., and COOPER, F.S. (1955). "Acoustic Loci and Transitional Cues for Consonants," J. Acoust. Soc. Amer. 27: 769-773.

FROMKIN, V.A. (1966). "Neuro-Muscular Specifications of Linguistic Units," Language and Speech 9: 170-199.

FRY, D.B. (1964). "Experimental Evidence for the Phoneme," in In Honour of Daniel Jones, David Abercrombie, D.B. Fry, P.A.D. MacCarthy, N.C. Scott, and J.L. Trim, Eds. (Longmans, London), 59-72.

HENKE, W.L. (1966). "Dynamic Articulatory Model of Speech Production Using Computer Simulation," Doctoral Thesis, M.I.T.

HOUSE, A.S., and FAIRBANKS, G. (1953). "The Influence of Consonant Environment upon the Secondary Acoustical Characteristics of Vowels," J. Acoust. Soc. Amer. 25: 105-113.

KELSEY, C.A., WOODHOUSE, R.J., and MINIFIE, F.D. (1969). "Ultrasonic Observations of Coarticulation in the Pharynx," J. Acoust. Soc. Amer. 46: 1016-1018.

KOZHEVNIKOV, V.A., and CHISTOVICH, L.A. (1965). Speech, Articulation, and Perception (translated from Russian), Joint Publications Research Service, U.S. Dept. Commerce No. 30,543 (Washington).

KUEHN, D. (1970). "Perceptual Effects of Forward Coarticulation," M.A. Thesis, University of Iowa.

LADEFOGED, P., and BROADBENT, D.E. (1960). "Perception of Sequence in Auditory Events," Quart. J. Exp. Psych. 12: 162-170.

LEHISTE, I. (1972). "The Units of Speech Perception," Working Papers in Linguistics No. 12, The Ohio State University, 1-32.

LEHISTE, I., and SHOCKEY, L. (1972). "On the Perception of Coarticulation Effects in English VCV Syllables," Working Papers in Linguistics No. 12, The Ohio State University, 78-86.

LIBERMAN, A.M., COOPER, F.S., SHANKWEILER, D.P., and STUDDERT-KENNEDY, M. (1967). "Perception of the Speech Code," Psych. Review 74: 431-461.

LIBERMAN, A.M., HARRIS, K.S., HOFFMAN, H.S., and GRIFFITH, B.C. (1957). "The Discrimination of Speech Sounds within and across Phoneme Boundaries," J. Exper. Psych. 54: 358-368.

LOTZ, J., ABRAMSON, A., GERSTMAN, L., INGEMANN, F., and NEMSER, W.J. (1960). "The Perception of English Stops by Speakers of English, Spanish, Hungarian, and Thai," Language and Speech 3: 71-77.

MACNEILAGE, P.F. (1963). "Electromyographic and Acoustic Study of the Production of Certain Final Clusters," J. Acoust. Soc. Amer. 35: 461-463.

MACNEILAGE, P.F. (1972). "Speech Physiology," in Speech and Cortical Functioning, John H. Gilbert, Ed. (Academic Press, New York & London), 1-72.

MACNEILAGE, P.F., and DECLERK, J.L. (1969). "On the Motor Control of Coarticulation in CVC Monosyllables," J. Acoust. Soc. Amer. 45: 1217-1233.

MILLER, G.A., and NICELY, P.E. (1955). "An Analysis of Perceptual Confusions Among Some English Consonants," J. Acoust. Soc. Amer. 27: 338-352.

MOLL, K.L., and DANILOFF, R. (1971). "Investigation of the Timing of Velar Movements During Speech," J. Acoust. Soc. Amer. 50: 678-684.

OHMAN, S.E.G. (1966). "Coarticulation in VCV Utterances: Spectrographic Measurements," J. Acoust. Soc. Amer. 39: 151-168.

OHMAN, S.E.G. (1967). "Numerical Model of Coarticulation," J. Acoust. Soc. Amer. 41: 310-320.

PERKELL, J.S. (1969). Physiology of Speech Production: Results and Implications of a Quantitative Cineradiographic Study (M.I.T. Press, Cambridge).

PETERSON, G.E., and BARNEY, H.L. (1952). "Control Methods Used in a Study of the Vowels," J. Acoust. Soc. Amer. 24: 175-184.

SAVIN, H.B., and BEVER, T.G. (1970). "The Nonperceptual Reality of the Phoneme," J. Verb. Learning Verb. Behavior 9: 295-302.

SHARF, D.J., and OSTREICHER, H. (1973). "Effect of Forward and Backward Coarticulation on the Identification of Speech Sounds," Language and Speech 16: 196-206.

STEVENS, K.N., and HOUSE, A.S. (1963). "Perturbation of Vowel Articulations by Consonantal Context: An Acoustical Study," J. Speech Hearing Res. 6: 111-128.

STEVENS, K.N., HOUSE, A.S., and PAUL, A.P. (1966). "Acoustical Description of Syllabic Nuclei: An Interpretation in Terms of a Dynamic Model of Articulation," J. Acoust. Soc. Amer. 40: 123-132.

STEVENS, K.N., LIBERMAN, A.M., STUDDERT-KENNEDY, M., and OHMAN, S.E.G. (1969). "Cross Language Study of Vowel Perception," Language and Speech 12: 1-23.
WANG, M.D., and BILGER, R.C. (1973). "Consonant Confusions in Noise: A Study of Perceptual Features," J. Acoust. Soc. Amer. 54: 1248-1266.
WICKELGREN, W.A. (1969). "Context-Sensitive Coding in Speech Recognition, Articulation, and Development," in Information Processing in the Nervous System, K.N. Leibovic, Ed. (New York-Heidelberg-Berlin: Springer), 85-95.
WINER, B.J. (1971). Statistical Principles in Experimental Design (McGraw-Hill Book Company, New York).

APPENDIX I

Utterances Used in the Experiment

la dextre inimitable      /ladekstrinimitabl/
la dextre universelle     /ladekstryniversel/
la dextre outragée        /ladekstrutraʒe/
l'averse tribale          /laverstribal/
l'averse truquée          /laverstryke/
l'averse troublée         /laverstruble/
l'amorce criptique        /lamorskriptik/
l'amorce cruciforme       /lamorskrysiform/
l'amorce croupissante     /lamorskrupisant/

APPENDIX II

Instructions

You will be hearing a tape of a series of short French utterances. The end of each utterance has been deleted. Listen carefully and decide what vowel will follow the truncated utterance. The possible answers are the French vowels "i" as in "dites," "u" as in "une," and "ou" as in "bout" (that is, the phonetic symbols /i/, /y/, /u/). For example, the utterance may be:

    la dextre inimitable
    or la dextre universelle
    or la dextre outragée

However, you will hear the phrase cut off before the vowel:

    la dextr(e)--

In all cases, your task is to decide if the missing vowel is "i," "u," or "ou." Choose your answer on the basis of what you hear, and what vowel sounds as if it is coming up. Do not be concerned with the meaning of the utterance.

The next sheet contains a list of all the utterances. Remember, you will not be hearing the whole utterance, only a shortened form.
The list is meant to familiarize you with all the possible answers. Your task is to identify only the missing vowel.

Mark your answer in the appropriate column on the answer sheet. If you feel you do not know the answer, it is important that you guess. Approximately 1/3 of the answers are "i," 1/3 "u," and 1/3 "ou." These numbers are only approximate, so listen carefully and mark your answer as the vowel you feel most sure is the missing one.

If you like, you can mark an indication of the confidence you have in your choice. If you are reasonably sure you have answered correctly, mark a '1' beside your answer. If you are not too sure of the answer you have put down, or if you have no confidence at all in your response, mark a '2' or a '3' respectively beside the answer. You need not make this judgment for each response if you feel you do not have the time.

There are 110 items on the test. It will last approximately 20 minutes. You will first be hearing three practice items, after which the tape will be stopped in case you have any questions. You may ask to stop the tape any time during the test if you feel you need a break, but no item will be repeated.

Choose your answer on the basis of what you hear, and what vowel sounds as if it is coming up.
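
The answer-sheet format described in these instructions (a forced choice among three vowels, with an optional confidence mark of 1, 2, or 3) could be tabulated along the following lines. This is only an illustrative sketch and not the scoring procedure used in the study: the vowel codes, the `score` function, and the sample responses are all hypothetical, and the chance level of 1/3 simply follows from there being three possible answers.

```python
from collections import Counter

# Hypothetical codes for the three response options: "i", "u" (/y/), "ou" (/u/).
VOWELS = ("i", "y", "u")
CHANCE = 1 / len(VOWELS)  # guessing level for a three-way forced choice


def score(responses):
    """Tabulate (intended, chosen, confidence) triples.

    confidence is 1, 2, or 3 as on the answer sheet, or None when the
    listener left the optional rating blank.
    """
    # Confusion counts keyed by (intended vowel, chosen vowel).
    confusion = Counter((intended, chosen) for intended, chosen, _ in responses)
    n_correct = sum(
        count for (intended, chosen), count in confusion.items()
        if intended == chosen
    )
    proportion_correct = n_correct / len(responses)
    # Average only over the responses that carry a confidence mark.
    rated = [conf for _, _, conf in responses if conf is not None]
    mean_confidence = sum(rated) / len(rated) if rated else None
    return confusion, proportion_correct, mean_confidence


# Made-up responses for illustration only; these are not data from the thesis.
sample = [
    ("i", "i", 1), ("y", "y", 2), ("u", "y", 3),
    ("u", "u", None), ("i", "y", 2), ("y", "y", 1),
]
confusion, proportion_correct, mean_confidence = score(sample)
print(proportion_correct > CHANCE)
```

Keeping the full confusion counts, rather than only a total score, makes it possible to see which vowel a wrong answer was mistaken for.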
