UBC Theses and Dissertations

Perception of coarticulated lip rounding Adelman, Sharon 1974

Full Text

PERCEPTION OF COARTICULATED LIP ROUNDING

by

SHARON ADELMAN
B.Sc., McGill University, 1972

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE
in the Department of Paediatrics, Division of Audiology and Speech Sciences

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
July, 1974

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of
The University of British Columbia
Vancouver 8, Canada

ABSTRACT

The present study investigates the perceivability of coarticulated lip rounding in French. Nine utterances containing the clusters /kstr/, /rstr/, and /rskr/ followed by one of the vowels /i/, /y/, or /u/, in all possible combinations, were truncated at 4 different points before the vowel. Test items in each of the 4 groups therefore contained different amounts of information regarding the nature of the following vowel, due to coarticulatory influences of the vowel on the preceding consonants. Subjects were asked to predict the identity of the missing vowel on hearing the truncated utterances. Subjects were native speakers of either French or English; some of them had a knowledge of phonetics. Results show that when segments up to and including at least half of the final consonant of the cluster are present, subjects correctly identify the missing vowel well above chance levels.
Several individuals were able to identify the vowel even when presented with shorter versions of the utterances. No significant difference in performance was found between French and English subjects, nor between subjects with and without phonetic training. Perceivability of individual features of the missing vowel is discussed. It is concluded that coarticulatory effects due to lip rounding (as well as to horizontal tongue position) provide perceivable information at a level significantly above chance, and that this information may be used by the perceptual mechanism as an aid in speech sound identification.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENT
1. INTRODUCTION
2. REVIEW OF THE LITERATURE
2.1 Introduction
2.2 Coarticulation: The Acoustic Level
2.3 Coarticulation: The Articulatory Level
2.4 Coarticulation: The Perceptual Level
3. AIMS OF THE EXPERIMENT
4. MATERIALS AND METHODS
4.1 Pilot Study
4.2 Main Study
5. RESULTS
5.1 Correlation Between Relative Transmission (T_rel) and Score (S)
5.2 Identification of the Missing Vowel
5.3 Identification of Individual Features of the Missing Vowel
5.4 Differences Between Subject Groups
5.5 Speaker Differences
6. DISCUSSION
6.1 Identification of the Missing Vowel
6.2 Identification of Individual Features of the Missing Vowel
6.3 Differences Between Subject Groups
6.4 Subjects' Comments
6.5 Conclusions
BIBLIOGRAPHY
APPENDIX I - Utterances Used in the Experiment
APPENDIX II - Instructions

LIST OF TABLES

I. Groups of Edited Stimuli with Constant Clusters /kstr/, /rstr/, and /rskr/
II. Differences in T_rel and in S between Each Group of Items, for French and English Subjects
III. Percent of Items Answered Correctly in Each Vowel Category for Each Group of Items, All Subjects Pooled Together
IV. Percent of Items in Each Group for Which Various Vowel Confusions Were Made, All Subjects Pooled Together
V. Mean T_rel Values (x) and Sample Standard Deviations (s) for Each Group of Items, Shown for French and English Subjects
VI. Mean T_rel Values (x) and Sample Standard Deviations (s) for Each Group of Items, Shown for Phonetically Trained and Phonetically Naive Subjects

LIST OF FIGURES

1. Mingogram of one of the test utterances, "la dextre universelle," showing the 4 points of truncation
2. Distribution of T_rel based on random responses to a 27-item test
3. Distribution of Scores based on random responses to a 27-item test
4. Mean values of T_rel and S, plus or minus one standard deviation, for each group of items
5. T_rel and S values for each group of items for three different subjects
6. Mean values of T_rel and S for front-vs-back distinctions and unrounded-vs-rounded distinctions, shown for three groups of items

ACKNOWLEDGEMENT

I would like to thank all those who have had a part in this thesis:
• Dr. Andre-Pierre Benguerel for his guidance during the research and writing of the thesis.
• Dr. Joyce D. Edwards for serving on my committee.
• My subjects for their kind cooperation.
• My parents for their encouragement over the past two years.
• Ingrid, Betty, Lynne, Pat, and Meralin, for much friendship.

CHAPTER 1

INTRODUCTION

The production of speech is a complex process, and its complexities necessitate a unique and equally complex perception process.
It would be interesting to know whether the subtleties and variations in the production process are noted in, perhaps even necessary to, the speech perception process.

Speech is not merely a sequence of independent sounds produced by independent gestures. As the motor gestures producing speech overlap in time and change with context, so do the acoustic cues in the speech signal. It is on this ever-changing signal that speech perception is based. The listener must abstract the appropriate cues from the mass of acoustic information to correctly identify the signal, to understand spoken language. How he recognizes the appropriate cues, indeed even what these cues may be, is far from completely understood.

In examining speech production, one sees that if an articulator, such as the tongue tip or the velum, is free to move during production of a particular sound, it may initiate movement towards its target position for the subsequent phone, or for a phone several segments ahead. Also, an articulator may still be moving from its position for the preceding phone while a current phone is already being produced. This overlapping of speech gestures in time is referred to as coarticulation. This characteristic of speech production results in one phoneme being acoustically different virtually each time it is produced. It also means that the units of production overlap to an extent whereby cues for a phoneme may be found several phones preceding and several phones following the one in question.

Does a listener make use of these characteristics of speech production in identifying the speech signal? At least, can he make use of them if required to, for example, when other cues are masked or missing?
Or are these cues irrelevant to the perception process, merely a by-product of the complex workings of the articulators, without perceptual correlates?

The present study looks at utterances in which cues for a certain vowel are known to exist several phones preceding the vowel. Subjects are asked to predict the identity of the upcoming vowel after hearing only part of the utterance. This study therefore gives an indication as to how much use a listener makes, or at least can make, of coarticulated information in the speech signal.

CHAPTER 2

REVIEW OF THE LITERATURE

2.1 Introduction

Studies of coarticulation have been carried out on several levels. Section 2.2 discusses coarticulation at the acoustic level. Section 2.3 discusses coarticulation at the articulatory level and outlines several theories that have been proposed to explain the phenomenon. Section 2.4 reviews studies on the perceptual correlates of coarticulation. A discussion of the possible units of speech perception is included in this section.

2.2 Coarticulation: The Acoustic Level

The earliest indications of the phenomenon of coarticulation came from acoustic studies. It has long been known that the acoustic value of a vowel is influenced by the vowel's phonetic context. For example, vowel duration, intensity, and fundamental frequency are known to vary with changes in consonantal environment [House and Fairbanks, 1953].

Stevens and House [1963] examined changes in vowel formant frequency and formant bandwidth with context. Three speakers produced various /həC1VC2/ utterances, in which C is a consonant and V is a vowel. In these utterances C1 = C2. When the first formant frequency (F1) was plotted against the second formant frequency (F2) for each vowel, it was seen that quite appreciable differences occurred with changes in consonantal context.
In addition, several of the uttered vowels did not fall within the F1 vs F2 contours established by Peterson and Barney [1952]. These contours had been determined using several productions of the utterance /hVd/. Stevens and House showed that the vowel in such an environment is not unlike the vowel produced in a null environment (/#V#/). The major discrepancy between the F1-F2 values noted by Stevens and House and those found by Peterson and Barney, then, was due to the influence of the consonantal environment imposed by the /həCVC/ production.

Looking at differences within their own data, Stevens and House found further evidence for the relationship between phonetic context and a vowel's acoustic value. Consonantal context was seen to cause systematic shifts in the vowel's formant frequencies, depending particularly on the place of articulation, manner of articulation, and voicing characteristic of the consonant involved. For example, in an environment of labial or post-dental consonants, front vowels showed more of a downward shift of F2 than they did in a back environment. Fricatives produced greater shifts in interconsonantal vowel formants than did stops. Voiced consonants produced a lowering effect on F1 of the vowel, while F2 was not as appreciably affected.

These changes in the acoustic value of a vowel are explained by Stevens and House in articulatory terms. In the production of a C1VC2 syllable, the structures of the vocal tract assume position for C1, then maneuver towards position for V. During this movement, instructions for C2 are initiated. Vowel modifications are therefore due to overlapping of timing of neural instructions, which may result in anticipation of the upcoming phoneme, and the sluggishness or dynamic constraints (i.e. mass and inertia) of the system.
Fricatives, for example, requiring carefully controlled positioning and target approach, would tend to infringe on the neighbouring vowel's articulation more than would a quickly executed stop.

Ohman [1966] looked at the influence of both preceding and following phones on a phoneme. Whereas Stevens and House used symmetrical CVC utterances and were unable to separate the influence C1 had on the vowel from the influence of C2, Ohman used V1CV2 disyllables. Utterances were spoken by speakers of three different languages and employed vowels particular to each language. Spectrographic analysis yielded measurements of formant frequencies at two points along the VC and CV transitions. Ohman found that not only did V1 affect the following CV2 transition (as might be expected due to mechanoinertial factors), but V2 as well influenced the preceding V1C transition. As noted by previous investigators, it was F2 that showed the most variation.

Ohman's work yielded results different from those of previous workers, whose studies of CV transitions had led to the formation of the "locus theory" [Delattre, Liberman, and Cooper, 1955]. This theory states that for each consonant there exists a characteristic frequency position (or positions), or locus, from which formant transitions begin or to which they point. Delattre et al. had found fixed loci for the second formant of /b/ and /d/, and two loci for /g/ (depending on context). Ohman found that the transition loci for /b/ and /d/ are not fixed, but are dependent on context. For example, in a /V1bV2/ utterance, the bV2 transition originates at 500 Hz if V1 is /u/, but at 1300 Hz if V2 is /y/. Delattre et al. had postulated a fixed locus for bV transitions at 720 Hz.
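The size of the disagreement can be made concrete with the three frequencies just quoted. The following is a minimal sketch in Python, not part of either study: the dictionary layout and the function name are illustrative assumptions, and only the values 720 Hz, 500 Hz, and 1300 Hz come from the text above.

```python
# Fixed second-formant locus postulated for bV transitions (Delattre et al., 1955),
# as quoted in the text.
FIXED_LOCUS_B_HZ = 720

# Transition origins reported by Ohman (1966) for two vowel contexts, as quoted
# in the text. The keys are just labels for the two contexts (an assumption of
# this sketch, not a data format used in the study).
OBSERVED_ORIGINS_HZ = {"/u/": 500, "/y/": 1300}


def deviation_from_fixed_locus(observed_hz, locus_hz=FIXED_LOCUS_B_HZ):
    """Return how far an observed transition origin lies from the fixed locus (Hz)."""
    return observed_hz - locus_hz


# Signed deviation of each observed origin from the single locus the theory predicts.
deviations = {
    context: deviation_from_fixed_locus(origin)
    for context, origin in OBSERVED_ORIGINS_HZ.items()
}

for context, delta in deviations.items():
    print(f"{context} context: {delta:+d} Hz relative to the fixed locus")
```

Under a strictly fixed locus both deviations would be near zero; here the two contexts fall on opposite sides of the postulated value and lie 800 Hz apart.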
The articulatory basis behind the locus theory is that formant transitions are reflections of the change in size and shape of the vocal tract as it moves from one target position to another. Delattre et al. state:

Since the articulatory place of production of each consonant is, for the most part, fixed, we might expect to find that there is correspondingly a fixed frequency position -- or "locus" -- for its second formant; we could . . . describe the various second-formant transitions as movements from this acoustic locus to the steady state level of the vowel. . . . [Delattre et al., 1955, p. 769]

What the theory does not take into account is that if previous and/or succeeding articulations have an appreciable effect on the vocal tract configuration for any given consonant, the locus of a consonant produced in one environment will not be identical to that of a consonant produced in another environment.

Ohman, like Stevens and House, attributes the effect of preceding context on an upcoming phoneme (or in this case, its effect on the upcoming CV transition) to mechanoinertial factors. This type of coarticulation has since been referred to as carryover coarticulation. To explain the influence of succeeding context on preceding events, or anticipatory coarticulation, Ohman points out that speech gestures are not independent and linearly sequenced. Often the vocal tract can vary a great deal without introducing a phonemic change in the sound produced. For example, the tongue is free to move during the production of a bilabial stop; the lips are free to move during the production of a velar stop or liquid. In general, if an articulator is free to move during production of one phoneme, it will initiate movement toward its target position for the next upcoming phoneme.
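The free-articulator principle in the preceding sentence can be sketched as a right-to-left scan over feature specifications. The Python sketch below is only an illustration under stated assumptions: the feature table is hypothetical, it treats every consonant of the cluster as unspecified for lip rounding, and it ignores timing and mechanical constraints entirely. It is not an implementation taken from any of the studies reviewed here.

```python
from typing import Dict, List, Optional

# Hypothetical lip-rounding specifications (an assumption of this sketch):
# True = rounded, False = unrounded, None = unspecified, i.e. the lips are
# free to move during that phone.
ROUNDING: Dict[str, Optional[bool]] = {
    "k": None, "s": None, "t": None, "r": None,
    "i": False, "y": True, "u": True,
}


def anticipated_rounding(
    phones: List[str],
    spec: Dict[str, Optional[bool]] = ROUNDING,
) -> List[Optional[bool]]:
    """For each phone, return its own rounding value or, when it has none,
    the value of the next upcoming phone that does specify rounding."""
    result: List[Optional[bool]] = [None] * len(phones)
    upcoming: Optional[bool] = None
    # Scan from the end so each phone can see the nearest specified value ahead.
    for idx in range(len(phones) - 1, -1, -1):
        own = spec.get(phones[idx])
        if own is not None:
            upcoming = own
        result[idx] = upcoming
    return result


# The same consonant cluster before a rounded and before an unrounded vowel.
lips_before_y = anticipated_rounding(["k", "s", "t", "r", "y"])
lips_before_i = anticipated_rounding(["k", "s", "t", "r", "i"])
print(lips_before_y)
print(lips_before_i)
```

In this idealized form the lip state during every consonant of the cluster already matches the upcoming vowel, so the two clusters differ from their first segment onward.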
Since traces of the final vowel are observable already in the transition from the initial vowel to the consonant, it must be concluded that a motion toward the final vowel starts not much later than, or perhaps even simultaneously with, the onset of the stop-consonant gesture. A VCV utterance of the kind studied here can, accordingly, not be regarded as a linear sequence of three successive gestures. [Ohman, 1966, p. 165]

Ohman also indicates the possible language-dependent nature of coarticulation. Russian stops must be coarticulated with one of only two vowels, whereas American English and Swedish stops enjoy more freedom of coarticulation.

2.3 Coarticulation: The Articulatory Level

With investigations at the acoustic level, the complexity of the coarticulation process began to come to light. Two major approaches to the study of articulatory behaviour, electromyography and cineradiography, began to yield evidence of coarticulation for various articulators, and several models have been advanced to account for the phenomenon. Such models are necessarily related to basic questions of speech organization.

Electromyography (EMG) has been employed to great advantage in coarticulation studies. Electrodes are introduced into the articulator in question, and muscle action potentials are recorded during utterance production. In this way muscle activity during production of any phone can be measured. A major problem in the interpretation of EMG studies is that the activity of one muscle is often closely related to that of others. A given amount of contraction in one muscle may therefore produce different amounts of movement of an articulator, depending on the position and activity of other muscles [MacNeilage and DeClerk, 1969].
Therefore, investigations into the EMG activity of only one muscle do not necessarily reflect all that is happening to the articulator in question. However, EMG studies allow individual muscles to be studied and correlations between neuromuscular activity and linguistic units to be made.

Cineradiography has been used to a great extent as well. Movements of lips, tongue, jaw, velum, and pharynx can be made visible by various methods of cineradiography, and correlated with acoustic output. However, the resulting picture is a two-dimensional display of the vocal tract and so has limitations. It also can only yield information at the motor level, whereas EMG studies give insight into neuromuscular commands. Perkell states:

Although a cineradiograph contains a large amount of one type of information, it is obvious that many other types of parameters should be examined and correlated with the cineradiography data before a comprehensive description of vocal-tract function can be obtained. [Perkell, 1969, p. 2]

Kozhevnikov and Chistovich [1965] examined coarticulation of lip movements in Russian, measuring electrical activity of the orbicularis oris muscle and correlating it with utterance production. One speaker produced CV and CCV syllables in which V was a rounded vowel. Results show lip protrusion to begin almost simultaneously with the beginning of the first consonant, even if a word or syllable boundary falls within the CC sequence. Thus lip rounding was found to coarticulate over an entire CCV unit. The authors postulate an "articulatory syllable" model of speech production in which commands for the entire syllable are initiated simultaneously and executed simultaneously as long as they are noncompeting.
Competing commands, such as lip retraction vs lip rounding, are executed in sequence. Therefore commands for an /i/ in one environment would be different from commands for an /i/ in another environment. Coarticulation would be maximum within the articulatory syllable, and minimum across such syllable boundaries. Such a syllable is described by Kozhevnikov and Chistovich as the CC...V unit, which has been found by themselves and others to be a strongly cohesive unit and to exhibit strong coarticulation effects within itself.

Fromkin [1966] used electromyography to study action of the orbicularis oris muscle for production of /b/, /p/, and various rounded and unrounded vowels in English. Her results, obtained from three speakers, show that no simple correspondence exists between phoneme and motor command; different muscle action potentials are responsible for producing an initial /b/ or /p/ and a final /b/ or /p/. However, further contextual aspects have no effect on the muscle gesture for these phonemes, at least as far as this muscle is concerned. Muscle action potentials are relatively invariant for production of the /b/ in a /bVC/ syllable, regardless of the values of the following phones. Similarly, action potentials for final /b/ are unaffected by preceding phones in a /CVb/ syllable. The same results apply to initial and final /p/.

Looking at EMG activity of the same muscle during vowel production, Fromkin did note influence of adjacent phonemes. The rounded vowels /u/ and /o/ show appreciably lower peak amplitude of EMG activity when following initial /b/, which itself involves contraction of the orbicularis oris muscle, than when following initial /d/.
Muscle activity for a rounded vowel is uninfluenced in amplitude or duration by the following consonant of a CVC syllable, be it /b/ or /d/. Thus it seems that some aspects of context somehow restrict or reorganize the neuromuscular commands and gestures for some phonemes, while other aspects do not.

Just what the nature of the reorganization is, is not known, Fromkin states. Her findings lead her to put forth two suggestions. Perhaps the minimal linguistic unit at the motor command level is larger than the phoneme, possibly, in her words, of the order of a syllable. This theory agrees with the Kozhevnikov-Chistovich model of speech organization. However, Fromkin does not give any indication of the size or nature of the syllable proposed. The second possibility is that motor commands are altered with context by a feedback system concerning the existing state of muscle position and activity, or by information held in short-term memory. This theory is consistent with the idea that the phoneme is a basic unit of speech production at the neuromuscular level. Both theories proposed by Fromkin are able to account for the coarticulation effects she observed.

Ohman [1966] describes the coarticulated VCV utterance as follows:

We have clear evidence that the stop-consonant gestures are actually superimposed on a context-dependent vowel substrate that is present during all of the consonantal gesture. [Ohman, 1966, p. 165]

Production of the consonant in such a syllable involves three separate, but probably overlapping, sets of muscles in the tongue, each of which has separate neural representation in the motor control networks of the brain. The response of the tongue to articulatory commands coming independently over three different channels is a summation of the components of the instructions.
As the tongue is executing commands for one phone, certain subsets of muscles are left free to anticipate the following phone, instructions for which are also coming down independently. Therefore, consonant production is accomplished by articulatory adjustments that partially anticipate the configuration of the succeeding vowel, though certain components of V are inhibited during C production.

Henke [1966] proposes a system whereby production is programmed phoneme by phoneme, but there is a scanning of upcoming feature specifications. If a phoneme has no specification for a particular feature, such as lip rounding, the system looks ahead to the next phoneme for which that feature is specified, and the articulators initiate movement toward that goal.

MacNeilage and DeClerk [1969] questioned whether changes in motor gesture with context are due to changes in underlying neurological control or to mechanical constraints and modifications on an invariant phoneme command. Examination of cinefluorograms of the vocal tract and EMG tracings from nine articulatory locations showed that both left-to-right (carryover) effects and right-to-left (anticipatory) effects of adjacent phonemes on each other are present in CVC syllables. They state:

It is quite clear from these results that the command system responsible for CVC syllables does not consist of a series of context-independent phoneme commands that retain their independence all the way down to the level of muscle contraction. [MacNeilage and DeClerk, 1969, p. 1228]

They hypothesize three mechanisms at work to account for these effects. First is an anticipatory mechanism, in which the greater the amount of muscle contraction required for a certain phoneme, the greater the amount of anticipatory contraction of that muscle in the preceding phoneme.
An inhibitory component against muscle contraction antagonistic to the muscular movement required for the upcoming phoneme might also be involved in the anticipatory mechanism. Such a system can explain right-to-left coarticulatory effects.

The second mechanism at work is a compatibility mechanism. Since more or less contraction is necessary to assume a particular articulatory position, depending on the previous position of the articulator, upcoming commands for contraction might be made compatible with the existing state of muscle contraction. This would be accomplished via a feedback system involving the cerebellum. Such a system is able to account for the strong left-to-right influence imposed by context. This mechanism is somewhat similar to one proposed by Fromkin [1966].

The third suggested mechanism at work is a gamma-loop mechanism. In this case commands are sent down for a muscle to assume a particular length, regardless of its existing length, by the gamma system of motoneurons which innervate stretch-receptive spindles within the muscles. Thus commands would be invariant, but EMG activity necessary to achieve the specified length would show the context-dependent variety seen in several studies. This model seems appropriate for speech production, which involves approximation of target positions regardless of context.

MacNeilage and DeClerk point out that joint action of the three mechanisms outlined above on invariant phoneme commands cannot account for all the coarticulation effects seen. The authors cite two further mechanisms that do not necessitate ruling out invariant phoneme commands as the basis of production. At least they may be present at certain levels of the speech production system. The first possibility is that other modification mechanisms, such as the use of somesthetic information, are at work.
The second possibility is that to a certain, maybe considerable, extent, motor commands are organized in units larger than the phoneme; perhaps, as suggested by others, commands are issued for a syllable at a time. However, since they were unable to observe effects of initial and final consonants on each other, MacNeilage and DeClerk suggest that the CVC unit does not qualify as the unit of command organization. They feel that the CV segment, which shows more right-to-left coarticulation effects than the VC segment, is a more cohesive unit.

Daniloff and Moll [1968] extended Kozhevnikov and Chistovich's 1965 work on lip protrusion to the production of strings of one to four consonants followed by the rounded vowel /u/. The sequences were embedded in meaningful English sentences and spoken by three subjects. Though the utterances contained the phonemes /r/ and /l/, which themselves may involve lip protrusion, the authors noted that such an amount of protrusion was small. Cineradiography was used to evaluate articulatory behavior. Findings show that lip protrusion extends over as many as four consecutive consonants before a rounded vowel, and that the extent of coarticulation is not affected by word or syllable boundaries within the consonant string. Results are in general agreement with those of Kozhevnikov and Chistovich. However, Daniloff and Moll observed onset of protrusion before contact for the first consonant was achieved, whereas Kozhevnikov and Chistovich noted protrusion onset at the time of contact for the first consonant. In a number of cases noted by Daniloff and Moll, protrusion began even before movement toward the first consonant was initiated, that is, outside the boundary of the CC...V unit.

Cowan [1973] found similar coarticulation effects for lip protrusion in French utterances.
Six native French speakers produced utterances containing strings of four and six consonants before a rounded vowel. She found that in almost all cases, protrusion for the upcoming vowel began with production of the first consonant of the cluster, and in approximately half the cases, protrusion began during the production of the vowel preceding the consonant cluster.

Coarticulation effects have been observed in the motion of the lateral pharyngeal wall [Kelsey et al., 1969]. An ultrasonic method of data collection was used, in which a pulsed ultrasonic signal was beamed toward the pharyngeal wall and the time of echo return provided a measure of displacement of the articulator. Three speakers uttered VCV utterances. Data show that displacement during production of /a/ varies as a function of phonetic context.

Amerman et al. [1970] investigated coarticulation effects in jaw and lip movements by cineradiography. Four speakers produced meaningful utterances which included segments of one to four consonants preceding the vowel /æ/. Jaw lowering and lip retraction are two gestures involved in the production of this vowel. Jaw lowering was found to coarticulate over two and sometimes three phones before /æ/, and could presumably extend over all four consecutive consonants, had not one of the consonants consistently been /s/. Amerman et al. found /s/ production antagonistic to jaw lowering; this gesture was never initiated during /s/ production, but began immediately after it. Similarly, lip retraction seemed to be inhibited by /s/ production and was never initiated during it. However, a good /s/ can be produced with retracted lips, and the authors suggest that perhaps inhibition of one gesture for /æ/ production facilitates inhibition of another gesture related to /æ/ production.
In general, lip retraction was not as extensively coarticulated as jaw lowering. Though it sometimes extended two and three consonants before the vowel, several of the cases showed retraction beginning with the start of the vowel and not before. However, the lip retraction measure was not considered by the authors to be as reliable a measure as jaw lowering, due for instance to some lip protrusion during />/ production. The authors feel that inconsistencies in the synchrony and starting points of the two gestures are not predicted by the Kozhevnikov-Chistovich model, which states that commands for the syllable are specified simultaneously and synchronously. The nature of the coarticulatory unit found in this study is in agreement with that model's articulatory syllable, i.e. a CC..V unit. The data fit Henke's model of production equally well.

Carney and Moll [1971] extended Ohman's 1966 study of coarticulation in VCV utterances. Whereas Ohman had examined coarticulation of vowels and stop consonants, Carney and Moll looked at fricative-vowel interactions. MacNeilage [1963] had previously shown acoustic properties of the fricative /f/ to be context dependent; specifically, duration of /f/ in final position was twice as great as for /f/ embedded in a consonant cluster. However, electromyograms taken at the lips during /f/ production did not show pattern changes with context, except to some extent for onset of activity. Carney and Moll placed fricatives in a vowel rather than a consonant environment, and looked at effects on the tongue as well as the lips. They analyzed cineradiographs of two speakers producing /hVCV/ utterances, in which C was the fricative /f/, /v/, /s/, or /z/. Unlike MacNeilage, they found muscle gestures for production of fricatives to be influenced by context.
Their results agree with Ohman's [1966] description of a consonantal gesture superimposed on a basic vowel-to-vowel diphthongal gesture. The findings show that if an articulator is free, as the tongue body and tip are during /f/ or /v/ production, then coarticulation is seen in the tongue and in the lips throughout the vowel-to-vowel movement.

Coarticulation effects have been observed in velar movements by Moll and Daniloff [1971]. Four subjects produced English sentences containing various combinations of nasal consonants, non-nasal consonants, and vowels. Examination of cineradiograms showed that movement towards velar opening in a CVN or CVVN (where N = nasal) sequence begins after contact for the initial consonant. Thus nasality is coarticulated over the VN or VVN unit. Similarly, for NVC sequences, movement towards velar closure begins during the approach to the vowel, and sometimes even during the nasal itself. The unit over which coarticulation extends in this case is the VC unit. These results directly contradict Kozhevnikov and Chistovich's hypothesis that CV is the basic unit of production within which coarticulation is strongest. Moll and Daniloff tend to support a model such as Henke's, where commands are specified phoneme by phoneme.

Thus three major systems have been put forth to account for coarticulatory behaviour. One is the Kozhevnikov-Chistovich "articulatory syllable" model, in which neural commands are organized in syllable-like units. Though this model accounts for much of the observed data, the articulatory syllable is described as a CC..V group, whereas studies indicate that coarticulation may extend back to encompass a VCC..V group [Daniloff and Moll, 1968] or a VC or CVC group [Moll and Daniloff, 1971].
However, MacNeilage [1972] cites evidence that, in a CVC syllable, there is weaker coarticulation within the VC segment than within the CV segment, indicating that CV is a strongly cohesive unit.

The second major model is that of Henke, whereby a forward scanning system allows a free articulator to begin movement towards position for an upcoming phoneme. Such a system would be operative during anticipatory coarticulation. MacNeilage and DeClerk [1969] point out that such an anticipatory mechanism may be one of several at work during speech production.

Ohman [1966, 1967] describes a third model of coarticulation, in which a consonantal gesture is superimposed on a diphthongal vowel-to-vowel movement. The phoneme command for consonant production is invariant, but the vocal tract shape during its production is a result of an overlap of the vocal tract shape assumed for the consonant and the varying shape due to vowel environment. Thus contextual modifications take place at the motor level.

Carry-over coarticulation is accounted for in most models by mechano-inertial factors, or by the compatibility mechanism [MacNeilage and DeClerk, 1969] described earlier.

2.4 Coarticulation: The Perceptual Level

Recent studies have examined the perceptual correlates of coarticulation. The question asked is whether the acoustic and articulatory modifications due to coarticulation in an utterance provide information utilizable by the listener. Ali et al. state:

    It is uncertain in most specific cases if coarticulation on the articulatory level results in perceptible differences on the perceptual level. . . . If the answer is affirmative, then it can be said that speech perception 'follows' speech production and makes use of its idiosyncracies. [Ali et al., 1971, p. 538]
A point to keep in mind when studying the perceptual correlates of coarticulation is that the subject is being asked to make subphonemic discriminations, subtle distinctions that do not affect the value he assigns to a phone. To what extent can we realistically expect him to do so?

It is known that subphonemic detail (one form of which is allophonic variation) can be distinguished within a single phoneme category, even though speech perception is itself to some extent a categorical process. For example, Liberman et al. [1957] showed that listeners can make subphonemic distinctions when they are presented with synthetic speech sounds varying along an acoustic continuum. Stimuli were produced by a pattern playback, consisted of first and second formant patterns, and varied in direction and extent of the second-formant transition. This variable is a cue which has been found to be instrumental in making /b,d,g/ distinctions. Fourteen different stimuli were produced, and presented to subjects in an ABX arrangement. In a separate test, subjects were asked to make phonemic judgments of the same stimuli, that is, to state whether each was /b/, /d/, or /g/. Comparing the results of both studies, the authors determined that (1) phonemic distinctions along the continuum are categorical, the point at which a response changes from one phoneme to another being abrupt and consistent, (2) subphonemic discriminations across phoneme boundaries can be made to some extent, and (3) discriminations across phoneme boundaries are better and more consistently made than discriminations within a phoneme category.

Fry points out that

    . . . a pair of utterances may appear indistinguishably the same to a listener of one nationality and indisputably different to a listener of another nationality . . .
. [Fry, 1964, p. 60]

This is another point to consider in evaluating perceptual studies of coarticulation. Fry cites work by Lotz et al. [1960] on phonemic labelling of the same set of stimuli by different language groups. Fortis aspirated, fortis unaspirated, and lenis unaspirated stops were presented to speakers of various languages. The stimuli were placed into phonemic categories as follows: by English speakers, into /p,t,k/, /b,d,g/, and /b,d,g/ groups respectively; by Hungarian and Spanish speakers, into /p,t,k/, /p,t,k/ and sometimes /b,d,g/, and /b,d,g/ groups; by Thai speakers (in whose language aspiration is phonemic), into /p,t,k/, /pʰ,tʰ,kʰ/, and /b,d,k/ groups. For the velar case, Thai speakers assigned the lenis unaspirated stop to the /k/ category, there being no /g/ in Thai, though the possibility of the /g/ label was available to them. Thus it seems that perceptions are influenced by language learning. In considering this point in relation to coarticulation studies, one might ask whether French listeners, for example, make finer judgments regarding lip rounding than do English ones. It has already been seen that coarticulation on the articulatory level may be language dependent [Ohman, 1966].

Findings on phonemic labelling opposite to those described above emerged in a study of cross-language vowel perception carried out by Stevens et al. [1969]. Thirteen unrounded and thirteen rounded vowels were synthesized on the OVE II speech synthesizer, with the first three formants varying along an acoustic continuum. Two ABX discrimination tests were administered, one for the unrounded and one for the rounded vowels, to a group of Swedish and a group of American English speakers. Two phonemic identification tests were administered for the same stimuli to the same subject groups.
The rounding feature is phonemic in Swedish, but not in English. Results show that for vowels presented in isolation, the listener's linguistic experience has essentially no effect on his ability to make subphonemic discriminations, nor does it appreciably affect his identification of phonemic categories. Little difference was seen in the phoneme boundaries determined by the Swedes and those determined by the Americans. The boundaries assigned by these groups differed by no more than one step along the acoustic continuum for the unrounded vowel series, and one to two steps for the rounded vowels.

These findings in a sense do not contradict the language-dependence found by Lotz and his colleagues [1960]. Subjects were presented with different tasks in these two studies. There is no reason to assume that, given the same series of fortis aspirated, fortis unaspirated, and lenis unaspirated stimuli, and asked to place each into one of three categories (a situation similar to the identification task presented by Stevens et al.), speakers of all languages investigated by Lotz et al. would not be able to assign each phone to its appropriate category. For some of these speakers, some of the category assignments would be based on a phonemic distinction, and some would be based on a subphonemic distinction. English speakers involved in the experiment by Stevens et al. placed rounded vowels into phoneme categories not appreciably different from (although somewhat less consistent than) those chosen by the Swedes, though for the English speakers the placements were based on subphonemic discriminations. What Lotz's experiment does show is that, depending on his linguistic experience, a listener may choose to ignore some of the distinctions he is capable of making.
In addition to their study of vowel discriminations described above, Stevens et al. [1969] replicated the experiment on consonant discrimination done by Liberman et al. [1957, also described above]. Synthetic stop consonants, for which the first three formants varied along an acoustic continuum, were presented in an ABX situation. Stevens and his coworkers found that subphonemic discriminations along a physical scale were better made for vowels than for stop consonants. For example, correct discrimination could be made within a vowel phoneme category 80-90% of the time (depending on how far along the acoustic continuum they differed), but within a consonant phoneme category only 60-65% of the time. The authors cite the suggestion that different mechanisms may be involved in vowel and consonant perception. In addition, investigators have found that vowels are not perceived as categorically as consonants [Kozhevnikov and Chistovich, 1965; Liberman et al., 1967], also suggesting that separate perceptual processes may be at work for these two classes of phones. However, Liberman et al. point out that vowels studied in isolation, or the "unencoded" state, as in the above studies, may not trigger perception in the speech mode, and that evidence exists that vowels embedded in phonetic context are more nearly categorically perceived than are unencoded vowels.

Liberman et al. [1967] discuss subphonemic perception as being essential to speech perception:

    That subphonemic features are present both in production and perception has by now been quite clearly established . . . we must deal with the phonemes in terms of their constituent features because the existence of such features is essential to the speech code and to the efficient production and perception of language. . . .
    high rates of speech would overtax the temporal resolving power of the ear if the acoustic signal were merely a cipher on the phonemic structure of the language. [Liberman et al., 1967, p. 446]

It should be noted that the "features" discussed above are not the distinctive features discussed by Jakobson and his colleagues, but are constituent motor gestures and neural commands of phonemes. These researchers support the motor theory of speech perception, which states that speech is perceived in reference to the motor gestures that can produce it. They showed that acoustic signals may vary greatly and still produce the same perceptual effect. For example, the frequency of the starting point of the second formant transition from /d/ to a following vowel can vary by as much as 1000 Hz, depending on the vowel, yet a /d/ is perceived in all cases. Since a phoneme's acoustic signal varies not only with context but also from speaker to speaker, it is necessary to explain how the listener identifies the phoneme each time. Liberman et al. propose that the listener traces the variable acoustic signal back to the less variable articulatory gestures with which he himself would produce the signal. He then identifies the signal in reference to these motor gestures. Since the motor gesture for a particular phone can be broken down into several elements (e.g. raising the velum, raising or lowering the tongue, initiating vibration of the vocal cords), perception of the phone's constituent features can in some manner occur.

To what extent the listener may perceive subphonemic, or allophonic, variations has been examined by Wickelgren [1969]. He cites the context-sensitive allophone as a unit of perception. Such a unit is one which specifies its right and left-hand neighbours.
Thus the word "tap" would be coded as /#tæ, tæp, æp#/, each unit being a single phone marked with its left-hand and right-hand neighbours. The input to the perceptive mechanism could thus be an unordered set of symbols, the coding system allowing correct order to be recovered from such a set. The context to which such allophones are sensitive is limited to one preceding and one following phone, in Wickelgren's model. As we have seen, such is not the case in production, where a phoneme such as a rounded vowel may exhibit an effect on another phoneme as many as six sounds removed from itself [Cowan, 1973]. Perhaps, though, an allophone is sensitive to an extent which is perceivable only to adjacent phonemes. A major problem with Wickelgren's hypothesis is the extremely large number of neural units that must be available and through which all acoustic input must be channeled, for it is assumed that each context-sensitive allophone has its own neural representation.

It may be appropriate here to point out the arguments that exist for various other perceptual units. Speech perception may take place on several levels. Though subphonemic distinctions can be made, the fact that consonants show strong and definite categorical perception [Liberman et al., 1957], and that the same is true of vowels to a lesser extent [Stevens et al., 1969], provides evidence for the phoneme as a basic speech perception unit. Savin and Bever [1970], however, believe that individual phonemes are identified only after perception on yet another level has been carried out. They asked subjects to monitor a speech sample for a particular unit, either a syllable or a phoneme within a syllable. Results showed that response times for syllable identification were faster than for identification of a particular phoneme, suggesting the syllable was first perceived as a unit, before the phoneme itself was identified.
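Returning to Wickelgren's context-sensitive coding, the claim that correct order can be recovered from an unordered set of units can be illustrated with a minimal sketch. The function names, the "#" boundary marker, and the tuple representation are illustrative assumptions and are not taken from Wickelgren [1969]; the sketch also assumes that no unit occurs twice within a word.

```python
BOUNDARY = "#"  # assumed marker for the silence at a word edge


def encode(phones):
    """Code a phone sequence as an unordered set of (left, phone, right) units."""
    padded = [BOUNDARY, *phones, BOUNDARY]
    return {tuple(padded[i - 1:i + 2]) for i in range(1, len(padded) - 1)}


def decode(units):
    """Recover the phone order from an unordered set of units.

    Assumes every (left, phone) pair is unique, so each unit has exactly
    one possible successor.
    """
    successor = {(left, phone): right for left, phone, right in units}
    starts = [u for u in units if u[0] == BOUNDARY]
    if len(starts) != 1:
        raise ValueError("expected exactly one word-initial unit")
    left, phone, right = starts[0]
    phones = [phone]
    while right != BOUNDARY:
        # The next unit is the one whose left context is the current phone.
        left, phone = phone, right
        right = successor[(left, phone)]
        phones.append(phone)
        if len(phones) > len(units):
            raise ValueError("units do not describe a single ordered word")
    return phones


print(decode(encode(["t", "æ", "p"])))
```

Because each unit names its neighbours, the set can be chained from the word-initial unit to the word-final one without any stored order. The sketch also shows why the inventory of units grows so quickly: every phone needs a separate unit for each pair of neighbours it can take.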
Certain syntactic sequences may be perceived as units. By presenting extraneous sounds (clicks) during sentences, Ladefoged and Broadbent [1960] found that listeners tend to locate the clicks far removed from their actual location. They argue that subjective displacement of clicks is towards boundaries of perceptual units. Several further studies on click displacement, outlined by Lehiste [1972], have been carried out with inconsistent results. Subjective location of extraneous sounds is also related to stress, intonation, and other surface phenomena. However, it is clear that acoustic cues alone do not determine the boundaries of perceptual units, and that higher level sequences are somehow perceived as units.

Lehiste [1972] sums up a discussion on perceptual units by saying that two basic steps exist in speech perception: primary processing, consisting of auditory and phonetic processing, and linguistic processing, consisting in part of phonological and syntactic processing. Though the auditory level must precede other levels of processing, it is possible that phonetic and linguistic processing may proceed concurrently. Units at different levels differ in size.

Thus we see that, though perception is primarily a categorical process on one level, and though higher level sequences may act as units in perception, listeners are indeed capable of making subphonemic distinctions. It is this type of discrimination that subjects are asked to make in the coarticulation studies outlined below. It is possible that many of the large number of distinctions a listener can make when hearing a speech sample are ignored, in favor of grouping several different, but somehow similar, sounds into a single category for quicker processing.
Whether subphonemic perception is of primary importance in the speech perception process is not clear, since discrimination is consistently poorer within a phoneme category than across its boundaries. However, in times of unfavorable conditions, for example a noisy environment, or a large amount of information having to be processed quickly, it may be that subphonemic nuances are used by the perceptual mechanism to provide additional cues.

Perceptual reality of coarticulatory effects would mean that, on hearing one sound, the listener not only has acoustic information on its value, but has information to verify the value he has assigned to the preceding phone(s), and to tentatively anticipate the value of the upcoming phone(s). Such a process would facilitate correct identification of any one speech sound. Let us now examine the few studies that have been done on the perception of coarticulatory effects.

Moll and Daniloff [1971] had shown that velopharyngeal opening in CVN and CVVN sequences (where N is a nasal consonant) almost always begins during the CV transition. To test the perceivability of this coarticulated nasality, Ali et al. [1971] spliced the final consonant and its VC transition from English CVC and CVVC utterances, in which the final consonant was sometimes a nasal and sometimes not. Twenty-two subjects were presented with the spliced utterances and asked to identify the missing consonant as nasal or non-nasal. Results show that nasal stimuli were correctly identified significantly above chance level. There was no significant difference between correct perception of /n/ and /m/. Stop consonants were identified as nasals more frequently than were fricatives. Consonants following the vowel /a/ were perceived as nasal more often than consonants following other vowels.
Significant individual subject differences were found. The authors believe that in the case of nasality, the perceptual mechanism does make use of coarticulated information.

Lehiste and Shockey [1972] tested the perceivability of vowels removed from a VCV utterance (where C is a stop consonant). Ohman [1966] had previously shown that the VC and CV transitions in such an utterance are influenced by the transconsonantal vowel. For the perceptual test, VCV utterances were cut in two during the consonant closure. Over twenty subjects were asked to identify the missing initial or final vowel. Though Lehiste and Shockey noted the same coarticulation effects spectrographically for their utterances as did Ohman, they found that these contextual effects are not sufficient for identification of the deleted segment. Nor was enough information present in the spliced utterances to identify a feature of the deleted phone, such as high/low or front/back; incorrect responses did not tend to share a feature with the correct response. The authors conclude that "whatever the effects of coarticulation in terms of their influence on formant transitions, these effects are not sufficient to have an influence on perception" [Lehiste and Shockey, 1972, p. 84]. Lehiste [1972] cites these results as evidence against Wickelgren's [1969] model of speech perception, which involves coding of context-sensitive allophones.

    The physical modifications are undoubtedly there, but if the context of a context-sensitive allophone is not perceptible, it seems unjustified to assume that context-sensitive allophones are the basic units of speech perception. [Lehiste, 1972, p.
5]

Lehiste and Shockey's [1972] findings are contrary to those of Kuehn [1970], as cited by Carney and Moll [1971], who found that listeners were able to predict V2 of a V1CV2 utterance above chance level when they were given the initial segments of the utterance. However, Carney and Moll do not discuss the test situation used by Kuehn, and therefore strict comparisons between the two studies cannot be made.

In comparing the Ali et al. and Lehiste and Shockey studies, we see that context of the CV- and CVV- units was recoverable, but that context of the VC- or -CV unit was not. It may be noted that in the first case, the subphonemic cues relating to context must be elicited from the preceding vowel, and in the second case, from the preceding or following VC or CV transition. It has already been seen that subphonemic discriminations are more easily made for vowels than for consonants [Stevens et al., 1969], and if we for a moment consider the CV or VC transition as part of the consonant, or at least as behaving as a consonant in this respect, then we may adduce an explanation for the above findings: coarticulation effects on a vowel are more easily perceived than those on a consonant. However, it must be kept in mind that there is indication that vowels in phonetic context are not perceived as differently from consonants as data on isolated vowels suggest [Liberman et al., 1967]. Also, the motor gestures involved in the coarticulatory effects of the two cases described above are different: the first involves lowering of the velum, the second involves tongue movement. It may be that the effects of these two motor gestures are perceived to different extents. Human listeners may be inherently more aware of slight changes in one type of gesture than in another.
Clark and Sharf [1973] looked at coarticulatory effects of V2 on short term recall of V1 in V1CV2 utterances. By presenting lists of VC/V (final vowel deleted), VCV (final vowel retained), and VC# (no final vowel produced and thus no coarticulation present) utterances to subjects, they found that the presence of coarticulation influenced the percentage of correct recall of the initial vowel. They determined that the coarticulation effects in question are perceived by the listener and registered in short term memory. Previous investigators have suggested that the listener remembers for a certain time the spectral characteristics of the phone he hears, and on identifying it as a phoneme, uses the necessary information and discards the rest [Lehiste, 1972]. In other words, he retains subphonemic information in his memory for some unspecified length of time. Whether the process as described by Clark and Sharf is naturally operative in speech perception is not clear, since, though recall for the VC/V condition was facilitated over the VC# condition, the VCV condition did not have the same facilitative effect. The authors attribute this to a possible perceptual overloading, the subject hearing twice as many vowels in the VCV as in the VC/V condition. They suggest that even in the VCV condition, the effects may be registered but ignored.

Sharf and Ostreicher [1973] looked at the effects of coarticulation on identification of nasal consonants in noise. Using utterances of the form C1VNC2V, where C2 consists of 0, 1, or 2 non-nasal consonants, they found that identification of N was significantly better when all the post-nasal sounds were retained than when they were deleted.
That is, when the carryover coarticulation effects present in the post-nasal sounds were available, subjects scored better in nasal identification in noise than when these effects were removed. By asking subjects to identify the final vowel from the same truncated utterances, the authors noted a better than chance level of correct identification if no consonant had originally intervened between N and V, and a consistent but insignificant trend for the number of correct vowel identifications to decrease as the number of intervening consonants increased from 0 to 2. This seems to indicate that anticipatory coarticulation effects of the vowel on the nasal aid in identification of the deleted vowel, but that as nasal and vowel move farther apart, the weakened coarticulatory effect becomes imperceptible. Thus they conclude that anticipatory coarticulation produces a strong enough cue in the nasal to facilitate identification of the upcoming vowel, and that cues present in the vowel due to carryover coarticulation with the preceding nasal aid in the correct perception of the nasal.

It remains to be seen which coarticulatory influences are perceivable and which are not, and over how long a sequence of phones coarticulatory information is usable.

CHAPTER 3

AIMS OF THE EXPERIMENT

Some major questions in the study of speech perception are: What features and cues does the listener abstract from the speech signal in attempting to identify it? Is all the acoustic information present in the signal utilizable for the perception process? Are all the fine, as well as gross, motor adjustments involved in the production of the speech signal recognized and interpreted by the listener?
Research has shown that neither the acoustic value of a phoneme, nor the motor gesture that produced it, is invariant across different contexts. How much of this variation is perceivable, and to what extent does it actually provide cues for perception? Studies on the perceptual correlates of coarticulation have begun to indicate that the listener may use some of the ever-present contextual variation as an aid in identifying speech sounds.

The present experiment attempts to provide further information in this area. It asks whether coarticulation provides perceivable information, that is, whether it contains cues usable in the speech perception process. Utterances containing the sequence -C1C2C3C4V- (where Cn is a consonant and V a rounded or unrounded vowel), in which coarticulated lip rounding is known to occur when V is a rounded vowel, are truncated at four points before the vowel. Edited versions thus contain different amounts of coarticulated information. By presenting these stimuli to phonetically trained and phonetically naive native French and native English speakers, the present experiment attempts to do the following:

1. To discover whether coarticulation of lip rounding in French produces perceivable information, by asking subjects to identify a missing vowel for which coarticulation is present.

2. To discover over how many segments such information is perceivable. Coarticulation on the articulatory level is known to extend over all four consonants in the type of utterance described above.

3.
To investigate the language-dependent nature of the perception of coarticulated information, by comparing results from French and English speakers; and to reveal whether perception of these cues plays a normal part in the speech perception process, or whether they may nevertheless be abstracted from speech by a suitably trained listener, by comparing results from phonetically trained and phonetically naive subjects.

CHAPTER 4

MATERIALS AND METHODS

4.1 Pilot Study

Preparation of Test Tapes

Two pilot test tapes were constructed. The items of the first test contained the consonant cluster /kstr/ followed by each of the three vowels /i/, /y/, and /u/. The sequences were derived from the three French utterances "la dextre inimitable," "la dextre universelle," and "la dextre outragée." These utterances were recorded during the course of a previous experiment [Cowan, 1973] in a non-soundproof environment. A wide-band hum due to the operation of a graphic recorder during their recording produced distracting background noise on the original tapes. However, it was decided to use these recordings because the speech wave, the duplex oscillogram, the log intensity of the speech signal, and a graphic representation of the speaker's upper lip protrusion were all available, displayed on separate channels of a Siemens Oscillomink graphic recorder. The speaker was a male native speaker of French, from Lausanne, Switzerland.

The utterances were edited at three points each, on a PDP-12 digital computer, using a set of computer programs written by L. Rice at the UCLA Phonetics Laboratory. (The editing process will be described in Section 4.2.) Three edited versions were made:

    /ladekstr/
    /ladekst/
    /ladeks/

The test items were recorded onto a Revox A77 tape recorder. (This procedure is also described in Section 4.2.)
The pilot test tape consisted of three samples of each of the three utterances truncated at each of three points, for a total of 27 items. The test was constructed so that the longest of the edited versions made up the first third of the test, the next longest the second third, and the shortest the last third, i.e.:

    Group 1: 9 items of /ladekstr/
    Group 2: 9 items of /ladekst/
    Group 3: 9 items of /ladeks/

However, the order of presentation with respect to the missing vowel was random within each group, with each vowel being represented 1/3 of the time.

The second pilot test tape was made in response to some subjects' comments that the first tape was noisy and distracting, and that they had felt unsure of the task required of them until at least one or two utterances had been played. It was constructed similarly, except that the utterances were recorded under soundproof conditions, using an Altec 681A LO microphone and a Scully 280 tape recorder. The same speaker recorded the same utterances as used in the first test. These speech samples were edited with the same set of computer programs and at the same three points as described above. It was proposed that the results of the first and second tests be compared to determine whether background noise on Cowan's tapes produced a sufficiently lower score to warrant the use of new tapes recorded under soundproof conditions for the main experiment. In response to the comment that subjects were not sure of the task until at least two items had been played, the second test contained 29 items, the first two being practice items whose results were not considered in the analysis.

Subjects

Subjects were six adults (3 male, 3 female), all of whom had some knowledge of phonetics. Only one subject, who was also the speaker on the tapes, was a native speaker of French.
One subject was a native speaker of German, a language which makes use of the three vowels under study. The same 6 subjects took part in both Tests I and II.

Test Procedure

Subjects were seated, one at a time, alone in a quiet room. The test items were presented over headphones at a comfortable listening level. Subjects were asked to indicate in writing whether the missing vowel was /i/, /y/, or /u/. They were first told what the original utterances had been. Test I was given in one session and Test II in another. At the time that Test II was administered, Test I was readministered to see if familiarity with the test situation affected test results. The tests are hereafter referred to as Test Ia (first session), Test Ib (second session), and Test II (second session).

Results

Values for relative transmission (T_rel) of information (a measure to be discussed in Section 4.2) and % correct score were calculated. Scores were generally higher for Test II than for Test Ia or Ib. Since no significant differences were noted between Tests Ia, given in the first session, and Ib, given in the second session, it was assumed that no practice effect was contributing to the increase in T_rel and score from Test I to Test II. This suggests that improvement from Test I to Test II was probably due to the better listening conditions on the second tape.

A distribution of T_rel based on random responses to a 27-item test was calculated. This distribution is shown in Figure 2 and described in detail in Section 4.2. From the distribution, the maximum value of T_rel which a subject could obtain by chance 10% of the time was determined. T_rel values above this level were considered significant values of information transmission, and the following was observed: all subjects obtained significant T_rel values for Group 1 items; 2 out of 6 obtained significant values for Group 2; no subject obtained a significant score for Group 3.
Responses were also analyzed to see if they tended to have a feature in common with the stimulus. The ability to perceive front/back distinctions and unrounded/rounded distinctions was examined. For all groups of items, the front/back distinction was made more often than the unrounded/rounded distinction. Both distinctions were made more often for Group 1 than for Group 2, and for Group 3, which contained the shortest edited versions, subjects were giving responses no different from random guessing.

4.2 Main Study

Speech Samples

Because scores were generally higher on Pilot Test II than on Test I (a or b), it was decided to use utterances recorded under soundproof conditions for the main study. Three male native speakers of French recorded the utterances. Speaker #1 was born in Lausanne, Switzerland, and had been in North America for 14 years. Speaker #2 was born in Grenoble, France, and had been in North America for 4 years. Speaker #3 was born in Albi, France, and had been in North America for 9 years.

Fifteen utterances were recorded by each speaker, at least twice each. Each utterance contained one of the consonant sequences /kstr/, /rstr/, /rskr/, followed by one of the three vowels /i/, /y/, or /u/ in all possible combinations. Cowan [1973] had shown that, for the utterances described above, upper lip protrusion most often begins with the approach to the first consonant in the cluster, if the cluster is followed by the rounded vowel /y/ or /u/. Cowan's findings also applied to utterances with 6-consonant clusters. Such utterances were considered for use in the experiment, but since the pilot test had shown no significant information to be available when the utterance was truncated after the second consonant of a 4-consonant cluster, utterances with 6-consonant clusters were not used.
Recordings for the present experiment were made in an IAC 1204 soundproof room using an Altec 681A LO microphone and a Scully 280 tape recorder. One set of 9 utterances, consisting of examples of each of the three clusters followed by each of the three vowels, was chosen from each speaker. Utterances were chosen on the subjective bases of clarity of the speaker's voice, absence of background noise, similarity of intonation patterns of utterances containing the same cluster, and presence of all phonemes in the cluster. These 9 utterances are listed in Appendix I. The remaining 6 utterances from each speaker contained additional samples of clusters which were present in the other utterances, and these samples were not used.

Spectrograms, on a Kay Sona-Graph Model 7029A, and mingograms, on a Siemens Oscillomink graphic recorder, were made of all utterances, for reference in the editing process.

Editing of Speech Samples and Preparation of Test Tapes

Editing of utterances was carried out using a set of computer programs written by Lloyd Rice for a PDP-12 digital computer. This set of programs digitizes the speech signal and displays it on the computer oscilloscope screen, and allows the speech waveform data to be manipulated in various ways.

The speech signal was first low-pass filtered at 6 kHz to prevent aliasing of the input signal. It was intended to digitize the speech wave at 12 kHz; however, limitations of the equipment meant that the computer could not keep up with such a fast transfer rate for the length of time it took to sample the utterance. The computer was therefore skipping some samples, once the core buffer had been filled, and notable distortion resulted.
To overcome this problem, each utterance was played at half speed and digitized with a 10-bit analog-to-digital converter at 6 kHz sample frequency, for an equivalent of 12,000 samples per second. The digitized speech wave thus produced was stored on digital tape and could be displayed on the computer screen. A knob controlled the velocity of the speech waveform data as it moved backward or forward across the screen. The waveform could also be made stationary on the screen. In this way, the speech wave could be visually examined as the operator saw fit.

The speech wave was then edited as follows: the speech wave of the whole utterance was displayed on the screen, and the operator marked the desired initial point of the truncated utterance by a command on the teletype. In all cases, this point was marked just before the onset of phonation at the beginning of the utterance. The waveform was then moved slowly across the screen until the desired endpoint was visible. This point was also entered by a teletype command. The entire edited segment was then stored elsewhere on the digital tape. In this way, an edited utterance could be obtained, leaving the original utterance intact and available for making further editions.

Each utterance was truncated at four different points, producing the four groups of stimuli shown in Table I. Since results of the pilot study showed that no significant information is available when truncation takes place after the second consonant of the cluster, the shortest group of stimuli for the main experiment was truncated after release of the third consonant (either a /t/ or a /k/) of the cluster. With three speakers, 9 utterances per speaker, and 4 truncation points per utterance, a total of 108 test items was available.

TABLE I

Groups of Edited Stimuli with Consonant Clusters /kstr/, /rstr/, and /rskr/. Each Sample as Described Above has 3 Versions, One of Which Originally had the Following Vowel /i/, One /y/, and One /u/. Original Utterances From Which the Edited Stimuli Were Derived Are Listed in Appendix I

    GROUP I     Truncation immediately after the final consonant of the cluster
                /ladekstr/    /laverstr/    /lamorskr/

    GROUP II    Truncation in the middle of the final consonant
                /ladekstʳ/    /laverstʳ/    /lamorskʳ/

    GROUP III   Truncation immediately after aspiration of the third consonant
                /ladekstʰ/    /laverstʰ/    /lamorskʰ/

    GROUP IV    Truncation immediately after release of the third consonant, before aspiration
                /ladekst/     /laverst/     /lamorsk/

Truncation points were identified primarily by visual examination of the speech wave on the computer oscilloscope screen. Spectrograms and mingograms were examined for additional cues when necessary. Figure 1 shows a mingogram of one of the utterances, and the four points of truncation. Identification of truncation points proved difficult for only one case: the identification of the end of aspiration of the third consonant. As displayed on the computer screen and the mingograph, aspiration was not easily separated from the following final consonant, /r/. Spectrograms were heavily relied upon for this information. Each edited utterance was checked by two listeners for auditory confirmation of the point of truncation.

Truncated utterances were played back from the computer through a digital-to-analog converter, low-pass filtered at 6 kHz to remove high frequency digital noise generated by the computer, and recorded onto both channels of a two-channel Scully 280 tape recorder.
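The count of 108 test items follows directly from the design described above (3 speakers, 3 clusters, 3 vowels, 4 truncation points). The short sketch below simply enumerates that design to confirm the totals; the variable names and the dictionary layout are illustrative assumptions and are not taken from the thesis.

```python
from collections import Counter
from itertools import product

# Design factors as described in the text (labels are illustrative).
SPEAKERS = (1, 2, 3)
CLUSTERS = ("kstr", "rstr", "rskr")
VOWELS = ("i", "y", "u")
GROUPS = ("I", "II", "III", "IV")  # the four truncation points of Table I

# One test item per combination of speaker, cluster, vowel and truncation point.
items = [
    {"speaker": s, "cluster": c, "vowel": v, "group": g}
    for s, c, v, g in product(SPEAKERS, CLUSTERS, VOWELS, GROUPS)
]

# Items per truncation group, and per (group, vowel) cell.
per_group = Counter(item["group"] for item in items)
per_group_vowel = Counter((item["group"], item["vowel"]) for item in items)

print(len(items), dict(per_group))
```

Each truncation group therefore holds 27 items, 9 for each missing vowel, which is why the chance distributions used later are based on 27-item tests.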
The computer program also controlled the operation of the tape recorder; it was set so that 3.25 seconds of silence was recorded before and after each utterance, for a total of 5.5 seconds of silence between each item.

The order of taping items was randomized with respect to speaker, cluster, and vowel. Three practice items, picked at random from among the 108 test items, were recorded at the beginning of the tape. Two buffer items, also chosen at random from among the test items, were also recorded, one before the test items and one after the test items. Thus the tape contained three practice items, followed by utterances #1 to 110, of which #2 to 109 were the test items, and #1 and #110 were buffer items whose results were not considered in the analysis.

Figure 1. Mingogram of one of the test utterances, "la dextre universelle", showing the 4 points of truncation:
    1. after the final consonant of the cluster (/r/)
    2. in the middle of the final consonant
    3. after aspiration of the third consonant of the cluster (/tʰ/)
    4. after release of the third consonant, before aspiration
(Channels shown: Speech Wave, Duplex Oscillogram, Log Intensity of Speech Signal.)

Two tapes were made from the original tape, using two Revox A77 tape recorders. Tape A contained items in the original random order. Tape B contained the same practice items, but the two halves of the test (i.e. #1 to 55, and #56 to 110) were interchanged. Thus two test tapes were available.

The test was recorded on both tracks I and II of each tape. Numbers were recorded before each test item. A non-native speaker of French recorded French numbers on channel I of each test tape, and an English speaker recorded English numbers on channel II.
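The relation between the two tape orders can be made concrete with a small sketch. It assumes, as a reading of the description above, that Tape B is obtained by swapping the two 55-item halves of the numbered sequence while leaving the practice items in front; the function name and the practice-item labels are hypothetical.

```python
def interchange_halves(order):
    """Return a new order with the first and second halves swapped."""
    half = len(order) // 2
    return order[half:] + order[:half]

# Utterances #1 to #110 in the original (Tape A) order.
tape_a = list(range(1, 111))
tape_b = interchange_halves(tape_a)

# Items #2 to #109 are scored; #1 and #110 are buffer items.
scored = [n for n in tape_a if 2 <= n <= 109]

# Practice items precede the numbered sequence on both tapes (labels are placeholders).
practice = ["P1", "P2", "P3"]
full_tape_b = practice + tape_b

print(tape_b[:3], tape_b[-3:], len(scored))
```

Under this reading both tapes contain exactly the same items, so any difference in results between them would reflect position in the test rather than content.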
Subjects

A group of 10 native French speaking adults and a group of 10 native English speaking adults participated in the experiment.

Four females and six males made up the French speaking group. They had been in North America from 3 to 14 years. Seven subjects had been born in France, two in Switzerland, and one in Haiti. One of the French-born subjects had lived in several places in Europe as a child, but had always spoken French in the home. One of the Swiss subjects had grown up speaking both French and German, though French was her mother tongue. The Haitian subject had grown up speaking both French and Spanish. All subjects had at least a working knowledge of English. Four subjects within the French speaking group had no knowledge of phonetics, while three had had formal phonetic training and three were teachers of the French language with some informal phonetic background. Three of the subjects had served as the speakers on the test.

Six females and four males made up the English speaking group. All subjects had had approximately 4 years of high school French in Canada, while two had had additional French courses in university, also in Canada, and had each spent several months in France. None of the subjects considered himself fluent in French. Six subjects had no knowledge of phonetics while the other four had some degree of phonetic training.

All 20 subjects passed a pure tone hearing screening test at 15 dB HL for the frequencies 500, 1000, 2000, 4000, and 6000 Hz.

Test Procedure

The subjects were seated, one at a time, in a soundproof room with the experimenter. The test tape was played on a Scully 280 tape recorder and presented over TDH-39 Maico headphones at a level of 60-70 dB SPL as measured on a Bruel and Kjaer 2203 precision sound level meter with a Bruel and Kjaer 6 cm³ 4152 artificial ear.
The experimenter monitored the test over headphones and controlled movement of the tape in the soundproof room by a remote control unit.

Subjects were instructed in writing to listen to each utterance and to mark the missing vowel on an answer sheet. The missing vowels were described as "i" as in "dites," "u" as in "une," and "ou" as in "bout." (See Appendix II for complete instructions.) The vowels were phonetically transcribed as /i/, /y/, and /u/ for those who had a knowledge of phonetics. English subjects were first asked whether they were familiar with the vowels as represented in French orthography. The experimenter then pronounced each vowel in isolation for the English subjects.

Included in the instructions were the nine whole utterances from which the edited versions had been taken. Inclusion of this list was meant to show subjects that each truncated utterance could in fact be followed by each of the three vowels. Subjects were told that the vowels were represented in approximately equal proportion on the test (that is, that each vowel appeared approximately 1/3 of the time). Guessing was strongly encouraged. Subjects were asked to indicate the confidence they had in their answers by marking each response with a 1 (for most confident), 2, or 3, but only if they felt they had time to make this judgment.

The tape track containing French numbers was played for all but three subjects. It was one of these subjects, a French speaker who was one of the speakers on the test, who suggested that numbering be done in French instead of the original English. Subsequently all French subjects heard French numbers. Each English subject was asked whether he preferred to hear the numbers in French or English, and each chose French.

The tape was stopped after the three practice items and subjects were given the opportunity to hear these items again.
Measures of Perceivability

The two measures described below, relative transmission (T_rel) and correct score (S), were used in analyzing the results of both the pilot and the main experiment.

The relative transmission is a measure of covariance between input (the stimulus) and output (the subject's response) [Miller and Nicely, 1955]. This measure was used to describe the amount of transmissible information available in the truncated utterances, and is given by

$$T_{rel}(x;y) = \frac{-\sum_{i,j} p_{ij}\,\log_2 \dfrac{p_i\,p_j}{p_{ij}}}{-\sum_{i} p_i\,\log_2 p_i}$$

where the input variable is x, with any one input x_i having the probability p_i, and the output variable is y, with any one output y_j having the probability p_j. The symbol p_ij represents the probability that a particular input x_i will elicit the particular response y_j.

The more consistently a response can be predicted from the stimulus, that is, the better the transmission of information, the closer T_rel is to a value of 1. If the transmission of information is poor, then stimulus and response are unrelated, and T_rel has a value near 0.

Values of relative transmission for a series of computer-generated random responses were calculated. Figure 2 shows the distribution of T_rel based on random responses (with equal probabilities of 0.333 each) to 1000 27-item tests. In this graph, the bins for T_rel values from 0 to 50% represent intervals of 1%, the nth bin representing the number of cases where T_rel has a value between 0.01 × n and 0.01 × (n+1). Each asterisk represents 2 cases. A bin with less than 2 cases shows one asterisk.

The correct score, either in % or absolute value, was also calculated for each subject. Figure 3 shows the distribution of correct scores, based on computer-generated random responses (with equal probabilities of 0.333 each) to 1000 27-item tests.
Each asterisk represents 2 cases, a bin with less than 2 cases showing one asterisk.

The above distributions are based on tests of 27 items for comparison with each group of edited utterances, as there were 27 items of Group I utterances, 27 items of Group II utterances, and so on.

Figure 2. Distribution of T_rel (%) based on random responses to a 27-item test. (Axes: T_rel in 1% bins from 0 to 50; number of cases per bin.)

Figure 3. Distribution of Scores based on random responses to a 27-item test. (Axes: Score, maximum 27; number of cases per bin.)
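The two measures and the chance distributions described in this section can be illustrated with a brief sketch. It computes T_rel from a 3 x 3 confusion matrix following the Miller and Nicely definition, then simulates 1000 random 27-item tests (9 items per vowel, responses drawn with equal probability) and reads off chance levels from the sorted results. The function names, the random seed, and the percentile convention are assumptions made for illustration; the original computer program is not described in enough detail to reproduce exactly.

```python
import math
import random


def relative_transmission(matrix):
    """T_rel for a confusion matrix (rows = stimuli, columns = responses)."""
    total = sum(sum(row) for row in matrix)
    row_p = [sum(row) / total for row in matrix]
    col_p = [sum(row[j] for row in matrix) / total for j in range(len(matrix[0]))]
    # Transmitted information T(x;y) in bits; empty cells contribute nothing.
    transmitted = 0.0
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            if count > 0:
                p_ij = count / total
                transmitted += p_ij * math.log2(p_ij / (row_p[i] * col_p[j]))
    # Input (stimulus) entropy H(x) in bits.
    input_entropy = -sum(p * math.log2(p) for p in row_p if p > 0)
    return transmitted / input_entropy if input_entropy > 0 else 0.0


def correct_score(matrix):
    """Number of items on the diagonal (stimulus identified correctly)."""
    return sum(matrix[k][k] for k in range(len(matrix)))


def random_test(rng, per_vowel=9):
    """Confusion matrix for one simulated 27-item test with random responses."""
    matrix = [[0, 0, 0] for _ in range(3)]
    for stimulus in range(3):
        for _ in range(per_vowel):
            matrix[stimulus][rng.randrange(3)] += 1
    return matrix


def chance_level(values, alpha):
    """Value reached or exceeded by roughly a fraction alpha of random runs."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int((1 - alpha) * len(ordered)))
    return ordered[index]


rng = random.Random(1974)  # fixed seed so the sketch is repeatable
simulated = [random_test(rng) for _ in range(1000)]
t_values = [relative_transmission(m) for m in simulated]
scores = [correct_score(m) for m in simulated]

for alpha in (0.10, 0.05, 0.01):
    print(alpha, chance_level(t_values, alpha), chance_level(scores, alpha))
```

Note that a subject who consistently answers /i/ whenever the stimulus is /y/, and vice versa, still obtains T_rel = 1 under this definition, which is why the score S is reported alongside it.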
CHAPTER 5

RESULTS

5.1 Correlation Between Relative Transmission (T_rel) and Score (S)

Because the relative transmission is a measure of information transmission, and not necessarily correct information transmission, a high T_rel value does not always reflect a high score. For example, if a subject consistently responds with the vowel /i/ when the stimulus is /y/, he is receiving predictable information from the stimulus. Although he misinterprets this information consistently, he obtains a high value of T_rel (assuming other responses are also highly predictable from their stimuli). If a subject responds in a random manner, using no information from the stimulus, S will be at chance level, for example in the test described here, distributed around 9 (out of a maximum of 27), as shown in Figure 3; T_rel will be relatively low, distributed as shown in Figure 2. The better a subject performs on the test, the better one would expect T_rel and S to correlate.

To determine whether this was so for performances on the present test, Pearson correlation coefficients were calculated between T_rel and S for each of the 4 groups of items and also for a series of random responses. Correlations were as follows:

    Group I             R =  0.91
    Group II            R =  0.73
    Group III           R =  0.76
    Group IV            R =  0.50
    Random Responses    R = -0.08

From the above, one sees that the longer the portion of the cluster in the test item, the better the correlation between the two measures used here to describe performance. It will also be seen in Section 5.2 that the longer the test item, the better the subjects' performance. Therefore, as expected, the highest correlations between T_rel and S occur for the items for which the subject does best. In examining performance for Group IV items, one measure is not a good indicator of the other measure.

5.2 Identification of the Missing Vowel

Responses were tabulated in 3 x 3 confusion matrices, one matrix per group of items and per subject. There were therefore 4 matrices per subject, each with 27 items. T_rel and S were calculated for each matrix.

Figure 4 shows mean T_rel and S values for each of the four groups of items, displayed separately for French and English speakers. T_rel and S values one standard deviation about the mean are also shown. Levels that one subject would obtain by chance 1%, 5%, and 10% of the time (obtained from Figures 2 and 3, Chapter 4) are also shown in Figure 4. As can be seen in that figure, all subjects showed a downward trend in both T_rel and S, from Group I to Group IV. That is, the farther from the vowel the utterance was truncated, the less able subjects were to correctly identify the vowel.

An analysis of variance showed a significant treatment effect among the groups of items for both T_rel and S for both French and English subjects. This effect is significant at the levels indicated in the table below.

                 Treatment Effects On:
                 T_rel          S
    French       p < 0.001      p < 0.001
    English      p < 0.05       p < 0.005

The Newman-Keuls test, which indicates between which groups of items significant differences exist [Winer, 1971, pp. 191-196], was also applied to the data. Significant differences were found between several pairs of groups of items, for both French and English speakers, as shown in Table II.

Results show a great deal of individual variation. Figure 5 shows T_rel and S values for each group of utterances, for 3 different subjects. Levels that one subject would obtain by chance 1%, 5%, and 10% of the time are indicated. Subject AS scored consistently higher than any other subject. She was a native French speaker who was a teacher of French, but had had no formal phonetic training.
Subject CM was a female native English speaker who had had some phonetic training. Subject CB was a male native speaker of French and one of the speakers on the test; he had also had some phonetic training.

Because of her high performance relative to other subjects, subject AS was retested. On the second run of the test she maintained her high levels, scoring slightly higher than she had on the first run.

Results as shown in Figures 4 and 5 indicate that, for several of the item groups, subjects were able to identify the missing vowel above chance levels. For example, on the average, English subjects obtained T_rel and/or S values higher than those one subject would obtain by chance 5% of the time, for items of Group I, and French subjects, on the average, obtained similarly high levels for items of Group I and II. Several individuals of both languages performed well above the 5% chance levels for Groups I and II, and 5 individuals did so for Group III. In general though, Group III and IV performances were at the level which one subject would obtain by chance 80% of the time.

Figure 4. Mean values of T_rel and S, plus or minus one standard deviation, for each group of items. a, b - French speakers; c, d - English speakers. (Horizontal axis: Group I to IV; chance levels of 1%, 5%, and 10% are marked.)

TABLE II

Differences in T_rel and in S Between Each Group of Items, for French and English Subjects

    FRENCH SPEAKERS

              T_rel (%)                              Score
    Group     I           II          III            I       II      III
    II         35.18                                 10
    III       132.03**     96.85*                    46**    36**
    IV        172.97**    137.79**    40.92          58**    48**    12

    ENGLISH SPEAKERS

              T_rel (%)                              Score
    Group     I           II          III            I       II      III
    II         60.99                                 19
    III       121.33*      60.34                     41**    22
    IV        123.27*      62.28       1.94          40**    21*      1

    ** 0.01 level of significance.
    *  0.05 level of significance.
It is interesting to note that individual variations were so great that some subjects were able to identify the vowel for Group III and IV items better than others were able to identify vowels for Groups I and II.

In general, French subjects tended to distribute their responses evenly, responding approximately 1/3 of the time with each vowel. This tendency was somewhat weaker for the English subjects, who, for Groups III and IV, tended to make more /u/ and /i/ responses respectively.

Correct answers were not evenly distributed for either language group, for any group of items. For all groups except Group IV, correct /y/ responses were less frequent than correct /i/ or /u/ responses. French subjects did not show a different pattern of correct responses from English subjects. The percent of correct responses for all items of a particular stimulus, pooled for all subjects, is shown in Table III.

TABLE III

Percent of Items Answered Correctly in Each Vowel Category for Each Group of Items, All Subjects Pooled Together

                  STIMULUS
    GROUP     /i/       /y/       /u/
    I         64.4%     40.5%     63.9%
    II        51.7      40.0      61.1
    III       41.2      38.3      41.5
    IV        45.0      35.6      33.4

Examination of the confusion matrices showed certain confusions to be more common than others. Confusions between /i/ and /y/, and /y/ and /u/, were more common than the /i/-/u/ confusion. This is not surprising when one considers that while /i/ and /y/ share the front feature, and /y/ and /u/ share the rounding feature, /i/ and /u/ share neither of these. Confusions between all pairs of vowels increased as the items got shorter, the only exception to this trend being the /y/-/u/ confusion, which was made less often in Group IV than in Group III. Table IV shows the percent of time each confusion was made. For example, in Group I the /i/-/y/ confusion was made on 27.5% of the items for which either /i/ or /y/ was the stimulus.
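The confusion percentage defined in the example above can be written out explicitly. The sketch below assumes that a pairwise confusion rate is the number of items on which one member of the pair was given in response to the other, divided by the number of items for which either member was the stimulus. The function name and the sample matrix are hypothetical and do not represent data from the experiment.

```python
VOWELS = ("i", "y", "u")


def pair_confusion_percent(matrix, a, b):
    """Percent of items with stimulus a or b that drew the other vowel as response.

    matrix[s][r] counts responses r to stimulus s, indexed in VOWELS order.
    """
    ia, ib = VOWELS.index(a), VOWELS.index(b)
    confused = matrix[ia][ib] + matrix[ib][ia]
    presented = sum(matrix[ia]) + sum(matrix[ib])
    return 100.0 * confused / presented


# Hypothetical 27-item confusion matrix (rows: stimulus /i/, /y/, /u/).
example = [
    [6, 2, 1],
    [3, 4, 2],
    [1, 2, 6],
]

for a, b in (("i", "y"), ("y", "u"), ("i", "u")):
    print(f"/{a}/-/{b}/ confusion: {pair_confusion_percent(example, a, b):.1f}%")
```

Because each rate is taken over a different pair of stimulus rows, the three percentages for a group need not sum to any fixed total.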
Several subjects reported that they were most often undecided as to whether the missing vowel was /i/ or /y/, or /y/ or /u/, and several reported that they were never confused between /i/ and /u/. Performances seem consistent with the first observation, but not strictly so with the second.

TABLE IV

Percent of Items in Each Group for Which Various Vowel Confusions Were Made, All Subjects Pooled Together

                  VOWEL CONFUSIONS
    GROUP     /i/-/y/     /y/-/u/     /i/-/u/
    I         27.5%       27.8%       10.3%
    II        27.5        27.2        16.1
    III       30.8        37.2        21.7
    IV        40.6        28.1        24.2

Further findings on feature relationships between stimulus and response are discussed in Section 5.3 below.

5.3 Identification of Individual Features of the Missing Vowel

To examine the perceivability of a particular feature, responses were grouped in the following ways: /i/ and /y/ vs /u/ (front-vs-back), and /i/ vs /y/ and /u/ (unrounded-vs-rounded). That is, if the stimulus was a front vowel, and the response either the same or the other front vowel, the response was considered correct in the front/back analysis. A similar procedure was employed for the unrounded/rounded analysis.

When a 27-item 3 x 3 confusion matrix, for which each row has a total of 9 entries, is collapsed in the manner described above, a 2 x 2 matrix results in which one row has 9 entries, and the other row 18 entries. In such a matrix, the feature for which the data are grouped forms 2/3 of the total data. Therefore an error among the grouped data affects the score more than does an error among the un-grouped data.

Figure 6. Mean values of T_rel and S for front-vs-back distinctions, and unrounded-vs-rounded distinctions, shown for three groups of items.
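The collapsing of a 3 x 3 matrix into a 2 x 2 feature matrix, together with the halving that is applied to the 18-entry row before T_rel and S are calculated, might be sketched as follows. The function name, the index convention, and the sample matrix are assumptions for illustration only; the thesis does not give its own procedure in this form.

```python
def collapse_by_feature(matrix, grouped):
    """Collapse a 3 x 3 confusion matrix into a 2 x 2 feature matrix.

    matrix[s][r] counts responses r to stimulus s in the order /i/, /y/, /u/.
    grouped holds the indices of the two vowels that share a feature value,
    e.g. (0, 1) for front /i/, /y/ or (1, 2) for rounded /y/, /u/.
    Row and column 0 of the result refer to the grouped pair, 1 to the other vowel.
    """
    single = [k for k in range(3) if k not in grouped]
    classes = [list(grouped), single]
    collapsed = [
        [sum(matrix[s][r] for s in stim_class for r in resp_class)
         for resp_class in classes]
        for stim_class in classes
    ]
    # The grouped row pools two stimulus vowels (18 entries); halve it so that
    # both rows carry the same weight (9 entries each).
    collapsed[0] = [count / 2 for count in collapsed[0]]
    return collapsed


# Hypothetical 27-item confusion matrix (rows: stimulus /i/, /y/, /u/).
example = [
    [6, 2, 1],
    [3, 4, 2],
    [1, 2, 6],
]

front_back = collapse_by_feature(example, (0, 1))          # /i/, /y/ vs /u/
unrounded_rounded = collapse_by_feature(example, (1, 2))   # /y/, /u/ vs /i/
print(front_back, unrounded_rounded)
```

After halving, each row of the 2 x 2 matrix sums to 9, so the two feature classes are equally weighted when the measures are computed.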
To overcome this imbalance, values in the row containing 18 entries were halved before T_rel and S were calculated.

Figure 6 shows mean values of T_rel and S for perception of the front/back distinction and for the unrounded/rounded distinction. Values shown are mean values for all 20 subjects. Only responses for Groups I to III are shown, as Group III responses are already at chance levels.

An analysis of variance showed that, for items of Group I, subjects did not make front/back distinctions significantly better than they made rounded/unrounded distinctions. This was the case when either T_rel or S was taken as an indication of performance. Because differences were greatest for Group I but yet were not significant, analysis of variance between feature distinctions in the other groups was not carried out.

Much individual variation was seen, both in ability to make a feature distinction, and in which distinction, either front/back or unrounded/rounded, was more easily made. The table below shows the wide range of T_rel values for feature distinctions for 4 French subjects. It also shows that some subjects made the front/back distinction more often, some made the unrounded/rounded distinction more often, and some made both distinctions equally. Similar individual differences were observed for English subjects.

                 T_rel for             T_rel for
    Subject      Front/Back            Unrounded/Rounded
                 Distinction           Distinction
    DN            0.30%                 2.19%
    EA            8.17                 25.33
    CB           10.52                 11.24
    PC           65.49                 29.07

5.4 Differences Between Subject Groups

Tables V and VI compare mean T_rel values and standard deviations for the 10 French and 10 English subjects, and for the 7 phonetically trained and 13 phonetically naive subjects, respectively.
Treatment-by-levels analyses of variance showed no significant differences between French and English subjects, for either T_rel or for S, and no significant differences between phonetically trained and phonetically naive subjects, for T_rel or for S.

TABLE V

Mean T_rel Values (x) and Sample Standard Deviations (s) for Each Group of Items, Shown for French and English Subjects. Differences Between the Two Language Groups are not Significant. N = Number of Subjects in Each Group

                                  GROUP
    SUBJECT     N           I        II       III      IV
    French      10    x     25.4     21.9     12.2      8.1
                      s     17.3     14.9     10.8      7.1
    English     10    x     22.7     16.6     10.5     10.3
                      s     14.8     11.2     11.7      7.2

TABLE VI

Mean T_rel Values (x) and Sample Standard Deviations (s) for Each Group of Items, Shown for Phonetically Trained and Phonetically Naive Subjects. Differences Between the Two Groups With Different Phonetic Backgrounds are not Significant. N = Number of Subjects in Each Group

                                               GROUP
    SUBJECT                  N           I        II       III      IV
    Phonetically Trained      7    x     26.2     22.4     13.2     11.1
                                   s      6.2      7.4      7.5      4.9
    Phonetically Naive       13    x     22.8     17.5     10.4      8.2
                                   s     14.3     10.6      7.9      4.9

5.5 Speaker Differences

Responses were examined to see if subjects performed better on items spoken by one of the three speakers than by the others. No significant differences were found between results for items by each speaker. However, again some individual variations were observed. Several subjects stated that items spoken by one or another of the speakers were easiest to answer. Such remarks were usually consistent with the subjects' better performance for that particular speaker, but there was no consistent trend as to who the "best" speaker was.

All of the three speakers served as subjects in the test. They did not perform consistently differently from the other subjects. Nor did they perform best on the items for which they themselves were speaking. In fact, two of the speakers performed somewhat worse on
items for which they were the speakers, than for items uttered by another speaker.

CHAPTER 6

DISCUSSION

6.1 Identification of the Missing Vowel

French utterances containing the sequence -C1C2C3C4V- were truncated at four points before the vowel, as shown in Table I (Chapter 4). Cowan [1973] has shown that for production of these utterances by native French speakers, upper lip protrusion most often begins with the first consonant of the cluster or earlier, when the vowel following the cluster is a rounded one. Items of each of the four groups prepared for the present experiment therefore contained different amounts of information as to the nature of the following vowel.

Subjects were able to predict the upcoming vowel above chance levels for items in Groups I and II. These items had been truncated after C4, and in the middle of C4, respectively. In all cases, C4 was the phoneme /ʁ/. In general, subjects were unable to predict the upcoming vowel for items in Groups III and IV. These items had been truncated after aspiration of C3 (C3 being /k/ or /t/), and after release of C3 but before aspiration, respectively.

One sees from these results, represented graphically in Figures 4 and 5 (Chapter 5), that the segments up to and including C4 contain information about the following vowel that is utilizable in the perception process. This information is not restricted to the C4V juncture, since the vowel can be correctly predicted when segments up to only the middle of C4 are heard. Though on the articulatory level the influence of the vowel is apparent as far back as the first consonant of the cluster or earlier, the information present in C3 of the cluster and before is not by itself utilizable by the listener as an aid in identifying the upcoming vowel.
It is not apparent whether the perceivable information is restricted to C4, or whether it is the cumulative information present in the whole cluster up to and including at least half of C4 which is used in perception. However, because coarticulation due to the vowel may begin by the first consonant of the cluster, it seems likely that several segments, and not just C4, contain information which, when it is all available to the listener, can be used in the perception process, but when only early segments are available, is not perceptually useful. However, there are great individual differences, and one subject at least was able to consistently predict the upcoming vowel above the level she would obtain by chance 5% of the time even for items of Group IV.

Lehiste and Shockey [1972] have determined that coarticulatory effects in VCV utterances are not perceivable, whereas Sharf and Ostreicher [1973] cite evidence that these effects are perceivable in CVNV utterances. It is not clear why, for some utterances, coarticulated information is perceivable, while for others it is not. The extent of coarticulation may depend on several factors: on the articulators involved [for example, Carney and Moll, 1971], the place, manner, and voicing characteristics of the neighbouring phonemes [Stevens and House, 1963], and the language being spoken [Ohman, 1966]. Depending on factors such as these, coarticulation at the articulatory level may be of an extent to produce more or less perceivable effects, or none at all. It may be, for instance, that coarticulatory influences of a vowel on a preceding nasal (as in Sharf and Ostreicher's study) are perceivable, whereas coarticulatory influences of a vowel on a preceding stop consonant (as in Lehiste and Shockey's study) are less so.
Further studies comparing perceivability of coarticulatory influences on fricatives, nasals, stops, and glides, voiced and unvoiced, would yield results relevant to this matter. In comparing such studies to the present one, it should be noted that the /ʁ/ used here is the uvular fricative, as opposed to the English retroflexed sonorant.

Table II (Chapter 5) shows between which groups of items performance differed significantly. One sees that for both French and English subjects, and for both T_rel and S, no significant differences were found between Groups I and II, although the trend was a slight decrease from I to II. Subjects were able to identify the vowel when segments only up to the middle of C4 were present almost as well as they could identify it when they heard all segments including the entire consonant. Similarly, no significant differences were noted between Groups III and IV, indicating that hearing all of C3 did not increase a listener's performance over hearing only part of that consonant. It seems that without the information present in C4, the amount of other preceding information present makes no difference to a listener's ability to identify the following vowel.

6.2 Identification of Individual Features of the Missing Vowel

Ali et al. [1971] found that coarticulated nasality was perceivable in CVN and CVVN utterances from which the final nasal was deleted. The present study found that coarticulation of two other features, front/back and unrounded/rounded, also has perceivable effects.

As shown in Tables V and VI (Chapter 5), /i/-/u/ confusions were less frequent than /i/-/y/ or /y/-/u/ confusions, and the vowel /y/ was correctly identified less of the time than the other vowels.
These results are probably due to the fact that, while /i/ and /y/ share the feature value front, and /y/ and /u/ the feature value rounded, /i/ and /u/ share neither of these. Thus on hearing an item containing information for a /y/, a subject may misinterpret it as either an /u/ or an /i/, based on his perception of the shared features discussed above. Similarly, he may misinterpret an /i/ as a /y/, but is less likely to misinterpret it as an /u/; he may misinterpret an /u/ as a /y/, but is less likely to misinterpret it as an /i/. Because /y/ shares features with both other vowels, misinterpretations of the kind described here are more likely to occur for the vowel /y/ than for the other vowels.

Figure 6 (Chapter 5) compares perceivability of the front/back and unrounded/rounded distinctions. Individual features are known to differ significantly in intelligibility, some being more readily perceivable than others [Wang and Bilger, 1973]. However, the features in question here are equally perceivable, though several individuals were better able to make one distinction than the other. As expected, perception of either feature decreased as the test item grew shorter, performances for Groups III and IV being at the chance levels indicated at the right of each graph in Figure 6. One sees that, just as segments preceding C4 provide no usable information regarding the vowel on their own, they also provide no usable information regarding a feature of the vowel.

Comparison of Figures 4 and 6 shows that scores (converted to %) were considerably higher for feature identification than for vowel identification, in part a consequence of collapsing the matrix and including entries off the diagonal of the 3 x 3 matrix.
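The effect of collapsing the matrix can be illustrated with a short sketch. It rests on several assumptions: that T_rel denotes relative transmitted information in the sense of Miller and Nicely [1955], which this chapter does not restate; that rows are stimuli and columns are responses; and that the halving of the doubled row described in Chapter 5 can be left out for simplicity. The function names are illustrative only, and the counts are hypothetical, not data from the experiment.

```python
import math

VOWELS = ["i", "y", "u"]
# Feature values assumed from the text: /i/ and /y/ are front, /y/ and /u/ are rounded.
FRONT = {"i": True, "y": True, "u": False}
ROUNDED = {"i": False, "y": True, "u": True}


def collapse(matrix, feature):
    """Collapse a 3 x 3 stimulus-by-response count matrix into a 2 x 2
    matrix for one binary feature (index 0 = False, index 1 = True)."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for r, stim in enumerate(VOWELS):
        for c, resp in enumerate(VOWELS):
            out[int(feature[stim])][int(feature[resp])] += matrix[r][c]
    return out


def percent_correct(matrix):
    """Percentage of all responses that fall on the main diagonal."""
    total = sum(sum(row) for row in matrix)
    hits = sum(matrix[k][k] for k in range(len(matrix)))
    return 100.0 * hits / total


def relative_transmission(matrix):
    """Transmitted information divided by stimulus entropy, in percent
    (assumed definition of T_rel, after Miller and Nicely)."""
    total = sum(sum(row) for row in matrix)
    row_p = [sum(row) / total for row in matrix]
    col_p = [sum(row[c] for row in matrix) / total for c in range(len(matrix[0]))]
    transmitted = 0.0
    for r, row in enumerate(matrix):
        for c, count in enumerate(row):
            if count > 0:
                p = count / total
                transmitted += p * math.log2(p / (row_p[r] * col_p[c]))
    stimulus_entropy = -sum(p * math.log2(p) for p in row_p if p > 0)
    return 100.0 * transmitted / stimulus_entropy


# Hypothetical counts, for illustration only (rows: stimulus i, y, u).
example = [[6, 3, 3],
           [3, 6, 3],
           [3, 3, 6]]

vowel_score = percent_correct(example)
front_score = percent_correct(collapse(example, FRONT))
```

With these hypothetical counts the vowel score is 50% while the front/back score is about 67%, because /i/-/y/ confusions lie off the diagonal of the 3 x 3 matrix yet count as correct once the two front vowels are pooled. A percentage score can therefore rise under collapsing without any more information being transmitted, which is why the two kinds of measure are reported separately.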
However, T_rel values for feature identification were slightly lower than for vowel identification. One expects feature identification to be better than vowel identification, the subject being presented with a two-way discrimination task in the first case, and a three-way task in the second. The slight decrease in T_rel from the vowel to the feature condition shows that, when both feature distinctions are considered together, as is necessary for correct vowel identification, slightly more information is abstracted from the stimulus than when either feature is considered on its own.

The distinctions front/back and unrounded/rounded are the manifestations of specific articulatory gestures, necessary for the production of the vowels described above. Coarticulation in the utterances described here causes the articulators to initiate these gestures in anticipation of the upcoming vowel. To an extent, this effect is perceivable. In the case of rounding, such anticipatory coarticulation is known to occur as early as the first consonant of the C1...C4V sequence, yet in general, is perceivable only if segments up to and including at least half of C4 are present, and not if less than this amount of information is available. It is not known how extensively fronting is coarticulated in these utterances, but similar to rounding, it is perceivable only if all segments including at least half of C4 are present.

6.3 Differences Between Subject Groups

Results show that the perceivability of coarticulated information does not seem to be related to the listener's native language, even though one of the vowels employed in the study (/y/) is not an English phoneme. Such findings are in agreement with the findings of Stevens et al.
[1969], that the listener's linguistic background, be it English or Swedish, did not affect his ability to make subphonemic distinctions, even among vowels that were not present in his language. Though the pattern of coarticulation may be language-dependent, its perception does not seem to be.

Phonetic training did not affect the test results. The subjects with phonetic background in general were not better able to identify the missing vowel than the phonetically naive subjects. This suggests that the ability to make use of coarticulatory information does not depend on specific training.

However, large individual variations in test performance indicate that not all listeners make the subphonemic distinctions necessary to predict the missing vowel. Whether they are completely unable to do so, or whether several subjects were not sufficiently motivated or did not completely understand the task, is not clear. Other researchers of speech perception abilities have also noted considerable individual differences [Ali et al., 1971; Liberman et al., 1957; Stevens et al., 1969]. It is likely that some subjects in these tasks are more motivated than others; however, it also seems possible that some individuals possess keener powers of discrimination than others. Several of the poorer performers in the present study had in fact shown keen interest and motivation in the task.

6.4 Subjects' Comments

Without exception, all subjects reported that they found the test difficult. Most felt certain they had performed badly (though they may or may not have), and that they had guessed a large proportion of the time.
The fact that subjects thought the test was a difficult one and that they had "only guessed" does not necessarily mean that the perceptual mechanism, to a large extent working subconsciously, could not handle the task. However, only one of the twenty subjects consistently gave an indication of his confidence in each of his responses, as was suggested in the instructions. This seems to indicate that the task of identifying the vowel was sufficiently difficult to impede subjects from making the further decision of how confident they were in each response. This may mean that, though the vowel was perceivable to an extent, use of coarticulatory information is not a process used in everyday speech perception.

As noted in Section 5.5, some subjects tended to do better on items spoken by one speaker than the others, though there was no general trend for all subjects to perform best for one particular speaker. Subjects were usually correct when they stated they had performed best for one of the speakers. Familiarity with one or more of the speakers did not affect a subject's performance, and the three speakers, who also served as subjects, did not perform best on their own utterances.

Most subjects could not describe the strategy they had used in responding. However, several subjects were seen to repeat the test item subvocally two or three times before choosing their response. Another comment some subjects made was that their choice was sometimes influenced by a vowel heard in the test item. Specifically, for an item containing the sequence /lamorsk/, they would tend to choose the vowel /u/ as the missing one, because of the back vowel /ɔ/ in the test item. Other subjects reported that they tended to choose /u/ for items of the speaker with the lowest voice, and one phonetically trained subject said she often chose /i/ and /y/ for a speaker who she judged to have "more fronted speech."
Subjects were sometimes, but not always, accurate in their descriptions of their response tendencies. Thus it seems several factors may have influenced a subject's response, perhaps sometimes masking out the perceivable effect due to coarticulation. However, none of the factors described above was looked at specifically in the analysis.

6.5 Conclusions

Ali et al. [1971] hypothesize that if the effects of coarticulation are perceivable, then speech perception can be said to follow speech production and make use of its idiosyncrasies. This relationship is predicted by the motor theory of speech perception [Liberman et al., 1967]. Results of the present study suggest that such a relationship between production and perception exists to an extent. The perception process can make use of some of the idiosyncrasies of production; coarticulated information is only sometimes perceivable in -C1...C4V utterances, notably when all segments up to and including C4 are present.

The present results seem to be predicted by Wickelgren's [1969] model of context-sensitive coding, in which each unit specifies its right- and left-hand neighbours. The final consonant of the cluster contains information which specifies the immediately following vowel, but segments preceding the final consonant seem to contain no perceivable information regarding the vowel. However, as discussed previously, it is likely that it is the cumulative information present in all preceding segments that is used perceptually. Also, there is no reason to assume that coarticulatory influences of a vowel could never be strong enough to produce a completely perceivable effect on a phoneme more than one removed from the vowel.
Some subjects were able to identify the vowel when hearing utterances truncated after C3 of the cluster, suggesting that, for them at least, context sensitivity is not limited to the immediately neighbouring phoneme. In addition, C3 in the utterances used here was always a voiceless stop. It is not known what the coarticulatory influence of the vowel on a nasal or fricative in that position may be.

The fact that subjects can use subphonemic coarticulatory information to identify an upcoming vowel does not mean that the perception process necessarily incorporates this ability. There is evidence that subphonemic distinctions are not as well perceived as phonemic ones [Liberman et al., 1957; Stevens et al., 1969], and speech perception seems to be primarily a categorical process. But it is possible that in unfavorable conditions, such as a noisy environment or a large amount of information having to be processed quickly, coarticulatory effects are used as cues by the perceptual mechanism. Use of such redundant cues would facilitate correct identification of any one speech sound. It is clear that some coarticulatory effects provide significantly perceivable information to the listener.

BIBLIOGRAPHY

ALI, L., GALLAGHER, T., GOLDSTEIN, J., and DANILOFF, R. (1971). "Perception of Coarticulated Nasality," J. Acoust. Soc. Amer. 49: 538-540.

AMERMAN, J.D., DANILOFF, R., and MOLL, K.L. (1970). "Lip and Jaw Coarticulation for the Phoneme /æ/," J. Speech Hearing Res. 13: 147-161.

CARNEY, P.J., and MOLL, K.L. (1971). "A Cinefluorographic Investigation of Fricative Consonant-Vowel Coarticulation," Phonetica 23: 193-202.

CLARK, M., and SHARF, D.J. (1973). "Coarticulation Effects of Post-Consonantal Vowels on the Short-Term Recall of Pre-Consonantal Vowels," Language and Speech 16: 67-76.

COWAN, H.A. (1973). "A Study of Upper Lip Protrusion in French," Master's Thesis, University of British Columbia.

DANILOFF, R., and MOLL, K.L. (1968). "Coarticulation of Lip Rounding," J. Speech Hearing Res. 11: 707-721.

DELATTRE, P.C., LIBERMAN, A.M., and COOPER, F.S. (1955). "Acoustic Loci and Transitional Cues for Consonants," J. Acoust. Soc. Amer. 27: 769-773.

FROMKIN, V.A. (1966). "Neuro-Muscular Specifications of Linguistic Units," Language and Speech 9: 170-199.

FRY, D.B. (1964). "Experimental Evidence for the Phoneme," in In Honour of Daniel Jones, David Abercrombie, D.B. Fry, P.A.D. MacCarthy, N.C. Scott, and J.L. Trim, Eds. (Longmans, London), 59-72.

HENKE, W.L. (1966). "Dynamic Articulatory Model of Speech Production Using Computer Simulation," Doctoral Thesis, M.I.T.

HOUSE, A.S., and FAIRBANKS, G. (1953). "The Influence of Consonant Environment upon the Secondary Acoustical Characteristics of Vowels," J. Acoust. Soc. Amer. 25: 105-113.

KELSEY, C.A., WOODHOUSE, R.J., and MINIFIE, F.D. (1969). "Ultrasonic Observations of Coarticulation in the Pharynx," J. Acoust. Soc. Amer. 46: 1016-1018.

KOZHEVNIKOV, V.A., and CHISTOVICH, L.A. (1965). Speech, Articulation, and Perception (translated from Russian), Joint Publication Research Service, U.S. Dept. Commerce No. 30,543 (Washington).

KUEHN, D. (1970). "Perceptual Effects of Forward Coarticulation," M.A. Thesis, University of Iowa.

LADEFOGED, P., and BROADBENT, D.E. (1960). "Perception of Sequence in Auditory Events," Quart. J. Exp. Psych. 12: 162-170.

LEHISTE, I. (1972). "The Units of Speech Perception," Working Papers in Linguistics No. 12, The Ohio State University, 1-32.

LEHISTE, I., and SHOCKEY, L. (1972). "On the Perception of Coarticulation Effects in English VCV Syllables," Working Papers in Linguistics No. 12, The Ohio State University, 78-86.

LIBERMAN, A.M., COOPER, F.S., SHANKWEILER, D.P., and STUDDERT-KENNEDY, M. (1967). "Perception of the Speech Code," Psych. Review 74: 431-461.

LIBERMAN, A.M., HARRIS, K.S., HOFFMAN, H.S., and GRIFFITH, B.C. (1957). "The Discrimination of Speech Sounds within and across Phoneme Boundaries," J. Exper. Psych. 54: 358-368.

LOTZ, J., ABRAMSON, A., GERSTMAN, L., INGEMANN, F., and NEMSER, W.J. (1960). "The Perception of English Stops by Speakers of English, Spanish, Hungarian, and Thai," Language and Speech 3: 71-77.

MACNEILAGE, P.F. (1963). "Electromyographic and Acoustic Study of the Production of Certain Final Clusters," J. Acoust. Soc. Amer. 35: 461-463.

MACNEILAGE, P.F. (1972). "Speech Physiology," in Speech and Cortical Functioning, John H. Gilbert, Ed. (Academic Press, New York & London), 1-72.

MACNEILAGE, P.F., and DECLERK, J.L. (1969). "On the Motor Control of Coarticulation in CVC Monosyllables," J. Acoust. Soc. Amer. 45: 1217-1233.

MILLER, G.A., and NICELY, P.E. (1955). "An Analysis of Perceptual Confusions Among Some English Consonants," J. Acoust. Soc. Amer. 27: 338-352.

MOLL, K.L., and DANILOFF, R. (1971). "Investigation of the Timing of Velar Movements During Speech," J. Acoust. Soc. Amer. 50: 678-684.

OHMAN, S.E.G. (1966). "Coarticulation in VCV Utterances: Spectrographic Measurements," J. Acoust. Soc. Amer. 39: 151-168.

OHMAN, S.E.G. (1967). "Numerical Model of Coarticulation," J. Acoust. Soc. Amer. 41: 310-320.

PERKELL, J.S. (1969). Physiology of Speech Production: Results and Implications of a Quantitative Cineradiographic Study (M.I.T. Press, Cambridge).

PETERSON, G.E., and BARNEY, H.L. (1952). "Control Methods Used In a Study of the Vowels," J. Acoust. Soc. Amer. 24: 175-184.

SAVIN, H.B., and BEVER, T.G. (1970). "The Nonperceptual Reality of the Phoneme," J. Verb. Learning Verb. Behavior 9: 295-302.

SHARF, D.J., and OSTREICHER, H. (1973). "Effect of Forward and Backward Coarticulation on the Identification of Speech Sounds," Language and Speech 16: 196-206.

STEVENS, K.N., and HOUSE, A.S. (1963). "Perturbation of Vowel Articulations by Consonantal Context: An Acoustical Study," J. Speech Hearing Res. 6: 111-128.

STEVENS, K.N., HOUSE, A.S., and PAUL, A.P. (1966). "Acoustical Description of Syllabic Nuclei: An Interpretation in Terms of a Dynamic Model of Articulation," J. Acoust. Soc. Amer. 40: 123-132.

STEVENS, K.N., LIBERMAN, A.M., STUDDERT-KENNEDY, M., and OHMAN, S.E.G. (1969). "Cross Language Study of Vowel Perception," Language and Speech 12: 1-23.

WANG, M.D., and BILGER, R.C. (1973). "Consonant Confusions in Noise: A Study of Perceptual Features," J. Acoust. Soc. Amer. 54: 1248-1266.

WICKELGREN, W.A. (1969). "Context-Sensitive Coding in Speech Recognition, Articulation, and Development," in Information Processing in the Nervous System, K.N. Leibovic, Ed. (New York-Heidelberg-Berlin: Springer), 85-95.

WINER, B.J. (1971). Statistical Principles in Experimental Design (McGraw-Hill Book Company, New York).

APPENDIX I

Utterances Used in the Experiment

la dextre inimitable       /la dekstrinimitabl/
la dextre universelle      /ladekstryniversel/
la dextre outragée         /ladekstrutraʒe/
l'averse tribale           /laverstribal/
l'averse truquée           /laverstryke/
l'averse troublée          /laverstruble/
l'amorce criptique         /lamorskriptik/
l'amorce cruciforme        /lamorskrysiform/
l'amorce croupissante      /lamorskrupisant/

APPENDIX II

Instructions

You will be hearing a tape of a series of short French utterances. The end of each utterance has been deleted. Listen carefully and decide what vowel will follow the truncated utterance. The possible answers are the French vowels "i" as in "dites," "u" as in "une," and "ou" as in "bout" (that is, the phonetic symbols /i/, /y/, /u/).
For example, the utterance may be:

       la dextre inimitable
    or la dextre universelle
    or la dextre outragée

However, you will hear the phrase cut off before the vowel:

       la dextr(e) ---

In all cases, your task is to decide if the missing vowel is "i," "u," or "ou." Choose your answer on the basis of what you hear, and what vowel sounds as if it is coming up. Do not be concerned with the meaning of the utterance.

The next sheet contains a list of all the utterances. Remember, you will not be hearing the whole utterance, only a shortened form. The list is meant to familiarize you with all the possible answers. Your task is to identify only the missing vowel.

Mark your answer in the appropriate column on the answer sheet. If you feel you do not know the answer, it is important that you guess. Approximately 1/3 of the answers are "i," 1/3 "u," and 1/3 "ou." These numbers are only approximate, so listen carefully and mark your answer as the vowel you feel most sure is the missing one.

If you like, you can mark an indication of the confidence you have in your choice. If you are reasonably sure you have answered correctly, mark a '1' beside your answer. If you are not too sure of the answer you have put down, or if you have no confidence at all in your response, mark a '2' or a '3' respectively beside the answer. You need not make this judgment for each response if you feel you do not have the time.

There are 110 items on the test. It will last approximately 20 minutes. You will first be hearing three practice items, after which the tape will be stopped in case you have any questions. You may ask to stop the tape any time during the test if you feel you need a break, but no item will be repeated.

Choose your answer on the basis of what you hear, and what vowel sounds as if it is coming up.
