Open Collections

UBC Theses and Dissertations

Use of the analysis by synthesis model of speech perception by children acquiring the sound system of… Reddy, Christine Ann 1977

USE OF THE ANALYSIS BY SYNTHESIS MODEL OF SPEECH PERCEPTION BY CHILDREN ACQUIRING THE SOUND SYSTEM OF LANGUAGE

by

CHRISTINE ANN REDDY
B.Sc., University of British Columbia, 1974

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE
in the Division of Audiology and Speech Sciences
in the Department of Paediatrics

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
March, 1977
© Christine Ann Reddy, 1977

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of [illegible]
The University of British Columbia
2075 Wesbrook Place
Vancouver, Canada V6T 1W5
Date [illegible]

ACKNOWLEDGEMENTS

I should like to express sincere gratitude to Dr. John Gilbert for his continued advice and criticism throughout the writing of this thesis. Many thanks are also due to my friends and members of my family for the much-appreciated understanding and the support that they have offered.
ABSTRACT

During the time when a child learns the sound system of his language, there is much evidence that the child can perceive phonological distinctions, and therefore detect phonetic differences, before he can produce these distinctions. This evidence is often provided to disprove the hypothesis that the child could be using an "active" model of speech perception. One such model, the analysis by synthesis model of speech perception, supposes that decoding of the acoustic signal employs the articulatory representation that would be required to produce the hypothesized identity of the incoming signal. The model proposes that while the human auditory system is innately equipped to handle the segments contained in speech, the correlations between the acoustic information and articulation are learned with experience and form the basis for the division of the continuous acoustic signal into discrete categories of speech sounds.

This thesis reviews recent research into the speech perception process and revises the analysis by synthesis model. It reveals that the human auditory system is innately equipped to divide stimuli (both speech and non-speech) that vary along certain acoustic dimensions into discrete classes. The unique processing that results for speech stimuli occurs when the stimulus is recognized as having a function in the system of language. Hence the requirements for phonetic processing involve the psychological realization that the stimulus originated in the human vocal tract.

This investigation then reviewed the available literature on the perception and production of children acquiring language to determine whether there is support for their use of the revised analysis by synthesis model.
The results favoured the conclusion that children do use such a model. When resolving the various acoustic cues that combine to form a stimulus complex, the child does refer to his articulatory abilities. Because he lacks full articulatory knowledge, the perceptual errors that typify children's language occur. It was shown that the child need not have the precise adult articulatory configuration in order to utilize this model. The model is operative during the child's perception of both himself and the adult. In both instances the comparator performs the function of matching the child's articulatory representation with his perceived representation of a form. The results serve to improve his knowledge of acoustic-articulatory correlations. In this manner the processes of perception and production are closely integrated, and as understanding of their fine interrelationship improves, production becomes more accurate and perception is simplified.

TABLE OF CONTENTS

LIST OF FIGURES

Section

1.0 INTRODUCTION
    1.1 Introduction
    1.2 Review of the Literature
        1.2.1 The Status of Perception with Respect to Production in the Child Developing Language
        1.2.2 Summary
    1.3 Hypothesis
2.0 THE ANALYSIS BY SYNTHESIS MODEL OF SPEECH PERCEPTION
    2.1 History
    2.2 The Existing Model
    2.3 Recent Research into the Speech Perception Process
    2.4 A Revised Model
3.0 THE USE OF THE REVISED MODEL BY THE CHILD
    3.1 The Literature on Children's Perception and Production, Re-reviewed
        3.1.1 Literature Evaluating the Relationship of Perception to Production
        3.1.2 Literature Examining the Child's Perception of Speech
    3.2 Conclusions
4.0 DISCUSSION
    4.1 The Importance of Babbling
    4.2 Implications for Research
SELECTED BIBLIOGRAPHY

LIST OF FIGURES

Chapter 1
INTRODUCTION

1.1 Introduction

When the infant is approximately four months of age he "babbles" and begins to experience the sensory-motor connections that are so important for the development of speech. The child progresses from experiencing simple sensations and discriminating between auditory stimuli, to associating the different movements in the vocal tract with the auditory sensations that they produce (by comparison of kinesthetic feedback with auditory feedback). When the child learns to attach linguistic meaning to the discriminated stimuli he is perceiving phonological distinctions. At the same time, as the child begins to interact with his environment, his experiences accrue and he begins to form concepts about the world surrounding him. These concepts, formed from the child's perceptions about objects, are the "meanings" of the words spoken by the child. Hence development of the semantic system as well as the articulatory and perceptual systems is responsible for development of the child's language system. Semantics form the fundamental structure of language and as the child interacts with his environment in a more complex manner, his concepts become more abstract. In a sense, language begets language, i.e. language can be utilized to form new concepts and hence new language.

Not only does the child learn to recognize phonological contrasts, but he learns to disregard allophonic differences not relevant in his language, to choose the correct allophone required in a specific phonetic context, and to use morphophonemic rules.
Phonological systems describe a language that is both spoken and heard, so the child acquires the ability to produce and perceive these aspects of the system. It is likely, though, that perception and production do not develop simultaneously. In fact, due to the fine motor co-ordination required to articulate the sounds of a language, it is likely that a child has knowledge of his phonological system before his productions reveal this knowledge.

Research into phonological or phonetic development has, for the most part, studied the order of appearance of children's phonemes and looked for recurring patterns that may reveal what constitutes "ease of production". As the child has only limited experience with language, such research may yield information as to which abilities are innate, which are acquired and what develops in order to promote acquisition. Theories of phonological development use results of such research and attempt to incorporate them into a theory, usually consistent with the author's linguistic background.

At any particular time during this process the child actually has his own system, yet he must remain flexible enough to allow changes, usually towards the adult's system. It might be expected that the child's system is a "subset" of the complete adult system, yet certain behaviours are frequently reported that contradict this assumption. These behaviours are apparent conflicts with use of the adult's system or with use of the child's system as it is revealed by his production. In order to explain such behaviours it becomes necessary to research the state of the child's perceptual abilities and to hypothesize about the role of perception in the acquisition of phonology. An adequate theory of phonological development must not only be complete and
account for all aspects of the phonological system, and be consistent with other aspects of language acquisition, but it must also account for behaviours that apparently contradict use of a system resembling the adult's. At the same time it must allow for flexibility and progression towards the adult system. To do this, knowledge of perceptual development and the interaction between perception and production is mandatory. No component of the child's developing system is static and it is this interaction of developing systems that allows for change and at the same time produces the occasional contradiction.

Perception obviously plays a crucial role in the development of production, yet it has been scarcely researched. While diary studies provide extensive data on a child's longitudinal production, virtually no research has examined perceptual development over a period of time. There are inherent problems with such an undertaking. The adult observer will be biased in his interpretation of the meaning associated with a child's production, and in his phonetic interpretation of that production. Research in this area often assumes the format of examining the common features of the child's production and the adult's production (usually in terms of distinctive feature analysis) to determine what contrasts are maintained and therefore assumed to be perceived. Two assumptions underlie such a rationale. First, that the child can produce what he perceives and second, that the child is perceiving by using an incomplete subset of the adult system.

The child is in a unique position, for he not only perceives the adult but presumably perceives himself, even though the two forms are not identical.
One of three possible explanations is usually cited to explain this. The child may perceive some constituents that are common to both forms and he would not discriminate between the forms. His perception of a speech event would be a partial subset of the adult perception and he would produce the same form that he perceived. The rest of the perceptual information would be "noise" to the child and not yet utilized in his system. Alternatively, the child may perceive as the adult does, but his productive abilities are not mature enough to permit him to produce the forms that he can perceive. In order to be able to perceive himself, the child must have some knowledge of the correspondence between his own and the adult forms. He would develop his production by using this knowledge to reduce the difference between forms. A third possibility exists: the child can perceive using two independent systems, one for the adult and one for himself. His mode of perception would depend upon the speaker and he would not have knowledge of the direct correspondence between forms.

To learn more about what the child perceives it is appropriate to investigate how the child perceives. Is there evidence that perception relies upon production abilities (or vice versa) and that the two systems therefore develop concurrently? Is the child first able to discriminate acoustic information to which the auditory system responds well, and does he then progress to more complex forms of analysis? With some insight into the perceptual process, one can better understand the processes of phonetic and phonological acquisition. One can begin to speculate on what is innate, what is acquired and how the process is perpetuated.
This thesis is concerned with understanding the role of perception in the acquisition of phonology by seeking evidence for the use of a speech perception model by the child. The analysis by synthesis model of speech perception, proposed in its latest form by Stevens and House (1972), has been favourably (though not unanimously) received to account for speech perception by the adult. The model proposed that while some acoustic attributes of the speech signal may be simply converted to linguistic data, for the large part some reference is made to the articulatory mechanisms during speech perception. The acoustic signal undergoes auditory analysis, then a hypothesis is formed concerning its phonetic identity. This hypothesis is "synthesized" to yield an articulatory description (at some neurophysiological level) that is compared with the results of preliminary analysis. The hypothesis is either accepted or rejected. Processing at different linguistic levels occurs in parallel so that information from other levels is available when forming a hypothesis about a segment's phonetic identity. Perception and production are not independent processes, but two aspects of the same system. In order to use this model, the adult must have developed a strong knowledge of acoustic-articulatory correlations.

When this model was proposed it was supported by extensive research into various aspects of the perception process. This thesis will review that literature and then re-assess the model with respect to the research that has since been published. The revised analysis by synthesis model will then be evaluated in terms of the literature that is available concerning the perception process in the child.
It can then be determined whether there exists support for the use of this "active" model of speech perception by the child. Phonetic development, phonological development and their interaction can then be better understood.

1.2 Review of the Literature

1.2.1 The Status of Perception with Respect to Production in the Child Developing Language

This section will review literature that has evaluated the ability of the child to perceive and produce phonetic and phonological distinctions.

In Russia, Shvachkin (1973) performed an experiment to determine the order in which Russian children perceive the vowel and consonant distinctions in their language that differentiate the meanings of words. Eighteen children, ranging in age from ten months to a year and one-half, were tested over a period of time ranging from one to eight months. The children were first taught monosyllabic nonsense names for novel objects until the examiner was satisfied that the child could correctly relate the object to its name. Shvachkin did not clearly state the exact method used to teach the names nor the criteria established to accept a name as "taught". When the subject had learned several names, he was required to select one object from a pair of objects and then from a group of three objects. An initial evaluation was made of the phonological distinctions that the child could successfully perceive. Additional experimentation was carried out on the distinctions that the children could not yet perceive in order to determine the sequence in which children acquire the ability to make these contrasts.
The children were again taught words and a set of six methods was used to determine when the distinctions could be made. The methods required that the subjects follow various instructions containing the words being tested. Not all children were tested with the complete set of methods and Shvachkin again did not specify the criterion used to determine the subjects' successful performance.

The results from the experiment indicated that the distinctions Shvachkin tested could be ordered into a sequence of twelve stages and that each child acquired the distinctions with little variation from this order. Briefly, vowels were found to be phonemically discriminated before consonants, then the presence or absence of the consonant in CVC-VC pairs was perceived. Finally consonants were differentiated in a series of ten stages. Although this study was apparently unrelated to the work being done by Jakobson (1968), the results paralleled Jakobson's remarkably well.

Jakobson hypothesized that the child learns phonological oppositions in an invariant order and that this order is universally valid. Certain distinctions must appear in a child's speech before later distinctions may be acquired. The sequence of stages is based upon the hypothesis that children learn to distinguish consonants and vowels in a manner beginning with maximal contrasts and progressing to minimal contrasts. The complexity of the contrast is determined by the number of oppositions that a sound participates in, within the child's system at a specific time.
The child's system at first consists of an optimal vowel plus an optimal consonant, such as /pa/, and then the distinctive features are differentiated along two axes: the sonority axis, which includes secondary consonantal source features, and the tonality axis, which includes the oral resonance features. Jakobson did not explicitly state the order of appearance of specific phonological contrasts, but he stated that certain oppositions will appear before or after other oppositions. Attempts to verify this theory have led to its support and its rejection. There is no evidence from acoustic-physiological data to support this theory. Shvachkin's work may appear to offer support but in view of the methodological weaknesses and lack of quantitative data in this study, it is wise to reject the existence of a universal invariant order for the acquisition of the perception or production of phonological oppositions.

In addition to determining a sequence for phonological development, Shvachkin made the observation that the children in his experiment were often able to perceive an opposition yet not produce the contrast. Also, those oppositions that the child could produce were perceived from the outset of the experiment and the oppositions that were acquired during the experiment were not usually produced by the children for the duration of testing. Shvachkin therefore concluded that the role of articulation must not be overrated in the process of phonological development. He stated that articulation and hearing influence the phonemic development of child language. In some cases the child's hearing may serve to distinguish phonemes.
In other cases some approximation to correct pronunciation will be sufficient to discriminate phonemes. But phonemes not clearly pronounced by the child will be discriminated at a later stage than those which the child already pronounces clearly.

Garnica (1973) performed an experiment similar to Shvachkin's with English-speaking children. A sequence of stages was tentatively established using Shvachkin's results as a guide, but changing some of the specific oppositions to be consistent with the English language. Following a pilot study the sequence was re-ordered. Sixteen children ranging in age from 1;5 to 1;10 served as subjects (older children were found to be able to perceive most of the oppositions). The children were first trained to perform the task of choosing only one object named from a pair. The stimuli consisted of nonsense CVC syllables that when paired differed only in the initial consonant sound. The stimuli were presented to the children as "names" for novel objects. On each trial in a testing session the subject was required to perform an activity with the object that was named from a pair of objects. The criterion for success was seven or more correct trials out of ten trials. Testing began at the stage where a child was judged to be performing, then more advanced oppositions were tested. Later the child was re-tested on oppositions that he failed to make in the earlier stage of the experiment.

Garnica's results do not support the invariant sequence postulated by Shvachkin. The subjects in her study displayed a great deal of variation in the order in which they acquired the oppositions.
However, she observed that general trends did exist in her data, "which suggest that the order of acquisition is simply more variable than Shvachkin's data would indicate [p. 221]."

For many of the subjects in Garnica's study, all of the oppositions were discriminated by the end of testing. At this time the children were less than two years old and were therefore at an age when it is very unlikely that they could produce such distinctions in their speech. Although Garnica made no mention of her subjects' productions, it appears again that there is evidence that children are capable of perceiving phonological distinctions before producing them.

In a recent article, Barton (1975a) criticized the criterion used in experiments investigating the speech-sound discrimination abilities of children. The criterion used by Garnica will allow the child to be successful by using random choice in 17.2% of the sessions. In order to reduce the chance of success by random choice (and therefore produce reliable data) Barton raised the criterion to fifteen correct responses in twenty trials. In this case the subject could reach the criterion by chance in only about 2% of the sessions. Alternatively, if the subject got the first five trials correct, this could occur by chance only 3% of the time, and testing could be discontinued.

Barton (1975b) sought to determine "whether most phonologically relevant discriminations can be made by a child at an early stage of his or her phonological development and whether they are acquired as pronunciation develops [p. 1]." He carried on the Shvachkin-Garnica experiments but made some changes in methodology and criterion.
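The chance levels quoted from Barton (1975a) can be checked with a binomial calculation. The sketch below is only an illustration: it assumes each trial is an independent two-alternative choice, so that a guessing child is correct with probability 0.5, and the function and variable names are hypothetical rather than taken from any of the studies.

```python
from math import comb


def chance_of_reaching_criterion(correct_needed: int, trials: int, p: float = 0.5) -> float:
    """Probability of at least `correct_needed` correct responses in `trials`
    independent guesses, each correct with probability `p` (assumed 0.5 for a
    choice between two objects)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(correct_needed, trials + 1)
    )


# Garnica's criterion: seven or more correct out of ten trials.
garnica_criterion = chance_of_reaching_criterion(7, 10)

# Barton's criterion: fifteen or more correct out of twenty trials.
barton_criterion = chance_of_reaching_criterion(15, 20)

# Barton's early-stopping rule: the first five trials all correct.
first_five_correct = 0.5**5

print(f"7 of 10 by guessing:       {garnica_criterion:.1%}")
print(f"15 of 20 by guessing:      {barton_criterion:.1%}")
print(f"first 5 correct by chance: {first_five_correct:.1%}")
```

Under these assumptions the three values round to 17.2%, about 2% and about 3%, in line with the figures reported above.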
Twenty subjects ranging in age from 2;3 to 2;11 were tested with phonological contrasts that are generally considered to be acquired the latest. The stimuli were pairs of monosyllabic words that differed from each other in one phonological feature of one segment. The words were the names of objects that the children were likely to know. Testing for several pairs of words took place in one session. For each child the testing took place over ten days.

The first stage of the experiment consisted of presenting to the child each stimulus to be used in the session and requesting that he identify it. The word was recorded as named if the child named the object without the tester naming it, as prompted if the experimenter was required to name the object once, or as taught if the experimenter had to name the object more than once and give some explanation about its function, etc. The first part of the session was complete when the child could identify each of the stimuli by discriminating it from words differing in all segments.

In the second portion of the experiment the child was required to discriminate the word-pairs that differed only in one phonetic segment. If the child responded correctly in each of the first five trials, testing was discontinued. Otherwise twenty trials were administered for each word-pair, with the criterion of fifteen correct responses in twenty trials.

Analysis of seven pairs of words testing the voicing contrast showed that the children's errors increased when the words were prompted, and increased even more so when the words were taught.
Considering the results for named words only, Barton found that nearly all pairs were discriminated by the subjects consistently, either in the first five trials or in fifteen out of twenty trials. Only in three out of seventy cases did the subject respond randomly or in a biased manner, choosing one of the pair consistently. While the children differed in their overall pattern of results, all of them could make some voicing distinctions. It was not found to be the case that some children made the voicing distinction and others did not. For pairs where one or both of the words were taught, most of the subjects again discriminated the pairs (twenty-five out of forty cases) but did not do so as well as when they could name the words.

In Shvachkin's and Garnica's experiments the stimuli were invented words, so the subjects could not have performed better than the subjects did with the prompted or taught words in Barton's study. Some of the errors found in these experiments may have resulted from the children not knowing the words well enough. The children may have been able to make the discriminations that they were reported to not make, if real words had been used.

Barton obtained similar results using pairs of words testing coronality and nasality. The level of accuracy was high, suggesting that children of this age have already acquired these discriminations.

Barton also cautioned that generalizations should not be made about the acquisition of a certain class of oppositions based on just one or some members of that class. For example, although the subjects discriminated the g/k pair well, these were words that the children were likely to know.
He suggested that although the order of acquisition and the order of perceptual difficulty are not necessarily the same, difficulties in discrimination may limit the acquisition of some voicing pairs.

Kornfeld (1971) hypothesized that children's production is based predominantly upon their perception of language, with only some influence from motor constraints and knowledge of the language system. She sought evidence revealing the nature of the representational system that the child has internalized, in order to determine in what way, if any, the child's encoder differs from the adult's. Kornfeld rejected the hypothesis that the child's system of phonological distinctions is determined by that of the adult, i.e. that the child perceives as the adult does, or by using some proper subset of the adult's system, but that he cannot produce the distinctions that he perceives due to motor constraints.

Kornfeld studied thirteen children ranging in age from 1;6 to 2;6 over a period of time (not stated). The children's productions of cluster words were compared with their productions of non-cluster words. The observed pairs differed in the initial segment. Some words began with a singleton consonant, while the others began with a cluster, e.g. 'truck' and 'tuck'. The samples were obtained from the spontaneous speech of children in play sessions. The utterances were transcribed immediately and were also tape recorded. The words were then spliced out and analyzed spectrographically.

Spectrographic analysis revealed that the cluster words were consistently distinguishable from the singleton words even though the transcriber had failed to observe any difference.
The children did simplify the clusters, but the resulting singleton consonant differed from the consonant produced in the singleton words. When the clusters contained liquids and glides, the children were judged as having produced a /w/, but spectrographic analysis revealed that these were not "real" "w's". Moreover, the "w" produced for /l/ and the "w" produced for /r/ differed in the mean frequency of their second formant locus. This difference does not correspond to the acoustic difference produced by the adult for /l/ and /r/.

Disregarding whether these results support Kornfeld's hypothesis, they do reveal that children produce distinctions between sequences of phonemes that the adult does not perceive. In order to produce an acoustic variant the child must have perceived that a difference exists. The child may in fact produce the form that he perceives, as Kornfeld proposed, or he may not yet have mastered the articulation necessary to replicate the representation that he has perceived. In either instance the ability of the child to perceive differences is at least equal to, if not more advanced than, his ability to produce the difference.

In a further study Kornfeld (1976) presented evidence to show that language learners may attend to unique principles when classifying phonological distinctions. Kornfeld argued that the child must depend more upon phonetic data than the adult, due to his poor knowledge of morpheme structure constraints, and that the child may interpret phonetic contexts differently when there exists a conflict of acoustic cues. She presented acoustic-phonetic data and showed that ambiguous acoustic cues do exist in just the environments where children simplify initial consonant clusters.
Again, Kornfeld supported the notion that the child's production is based on the perceptual distinctions that he is capable of making. The state of the child's perception is at least equivalent to the state of his production, and most likely precedes his production by some margin.

In a longitudinal study of the phonological development of his son, A, Smith (1973) presented some interesting observations about his son's behaviour that indicate the relative status of his perception with respect to his production. Smith analyzed his son's production data in two ways: first as though A's language was a language in its own right bearing no necessary connection to the adult language, and second, as though A's language was derived from the adult system, i.e., the child perceived in a manner equivalent to the adult, but he could not yet produce these forms. Smith failed to find evidence supporting the hypothesis that his son's language existed as a system in its own right. He supported the viewpoint that A had internalized the adult system in order to represent the forms that he perceived. The child would be unable to produce the perceived, underlying forms due to the inability of his articulatory mechanism to perform the required gestures and as a result of the child not yet having made certain hypotheses about the language he is learning.

Three examples led Smith to believe that A could perceive distinctions, and do so in the adult manner, before he could produce them. As reported in other diary studies (Velten, 1943; Weir, 1962) the child's productions revealed extensive homonymy, yet when A was required to demonstrate that he could phonologically distinguish the adult forms he did so. Before A could speak at all, Smith tested him with word-pairs such as 'mouth' and 'mouse' and found that he could indicate the correct object from the pair.
A was still unable to produce both members of such a pair at the time Smith wrote the book.

A was able to understand his own productions provided that he was still at the stage where he produced these forms. However, if his "incorrect" production of a word was equivalent to the adult form for a different lexical item and A was required to identify the word, he would invariably give the meaning expressed by the adult. Smith quotes the following example:

NVS  What does [maus] mean?
A    Like a cat.
NVS  Yes: what else?
A    Nothing else.
NVS  It's part of you.
A    [disbelief]
NVS  It's part of your head.
A    [fascinated]
NVS  [touching A's mouth] What's this?
A    [maus]
[p. 137]

A must perceive that there is a phonological distinction between the two forms, or he would have responded without the extensive prodding. But did A realize that his production of 'mouth' was not the same as the adult's form? This would seem to imply that he perceived the distinctions made by the adult but not the lack of this distinction in his own speech. Perhaps a more likely solution is that A was very aware of his "incorrect" pronunciation and was avoiding saying the word. Avoidance behaviours have been reported in other research (Drachman, 1971; Engel, 1973) as well as in Smith (1973). Often the child will deliberately avoid producing a word that he knows does not make a necessary phonological distinction. Therefore the child is able to perceive the phonological distinctions that are made both by the adult and by himself.

Smith reported that on several occasions upon mastering a new word, A would comment on his newly acquired ability.
Again this behaviour would appear to indicate that the child perceived that his forms did or did not contain certain phonological distinctions.

These observations made by Smith support the notion that a child perceives distinctions made in his language before he is able to produce them, and furthermore that he is aware of the state of his productive abilities.

1.2.2 Summary

The research presented here indicates that a child can perceive distinctions made in segments of the adult's speech and that he utilizes this information to differentiate the meaning of lexical items. The child can definitely perceive the distinctions that he produces and it is likely that he can perceive the lack of certain distinctions in his system. Therefore there exists evidence that the child is able to perceive phonological oppositions contained in his language before he can produce those oppositions.

1.3 Hypothesis

The analysis by synthesis model of speech perception (Stevens & House, 1972) "is based on the premise that there exists close ties between the processes of speech production and speech perception, and that there are components or operations that are common to both processes [p. 51]". Once the acoustic attributes have been analyzed to produce a partial specification of the features and a hypothesis concerning the input has been made, the hypothesis is generated in terms of the instructions to the articulatory mechanisms that are necessary to actualize the hypothesis. Then the acoustic information is compared with the articulatory instructions to determine whether the hypothesis can be accepted.
To utilize this system the subject must be able to generate, at some level, the articulatory equivalent of the signal undergoing processing and he must have knowledge of the match between auditory patterns and articulatory instructions that give rise to these patterns.

The research presented above supports the hypothesis that children are capable of making perceptual distinctions before they are capable of producing them. In order to discriminate two phonemes the child must be able to detect phonetic differences in the acoustic signal. The implication is that children must be able to perceive phonetic information in a segment without relying upon their ability to articulate the same segments. Therefore the null hypothesis is that the analysis by synthesis model of speech perception cannot be valid for the child.

Chapter 2

THE ANALYSIS BY SYNTHESIS MODEL OF SPEECH PERCEPTION

2.1 History

The most recently published form of the analysis by synthesis model of speech perception (Stevens & House, 1972) was derived from updating previous "active" models for this process proposed by Stevens and various other researchers.

In 1960, Stevens proposed the design for a machine that would both recognize speech by accepting the speech wave and generating a sequence of phonetic symbols, and synthesize speech by accepting a sequence of symbols and generating a speech wave. Two peripheral units, an analog filter and a model of the vocal tract, achieve coupling between the acoustic speech signal and the machine. Briefly, the speech wave first undergoes "peripheral analysis" where it is reduced by a set of analog filters to display its short-time spectra and periodicities. Thus some information about the segment's phonetic identity and pitch is extracted.
The central unit then makes a tentative decision as to the articulatory instructions that could give rise to the information about this segment, using its knowledge of previous analyses, its knowledge about analysis of adjacent spectral samples, and previous scores for the sample under analysis. The "model" then generates speech spectra from the decision made by the control unit and the comparator compares these spectra with the acoustic attributes of the input that are residing in temporary store. If the comparator accepts the match then the output at this level is an articulatory description of the input. Stevens states that this description may be at the acoustical, anatomical or neurophysiological level, but he does not commit the model to any of these.

In the second stage of processing the articulatory description is converted to a sequence of phonetic symbols, again by actively synthesizing signals to be compared with the input that is under analysis. The rules for synthesizing possible signals at each stage of conversion are contained in the machine, and the order in which the hypothesized trial signals are generated during analysis is determined by strategies (contained in the control unit) that are also built into the machine or can be evolved as the analysis proceeds. The machine can synthesize speech by converting an input of symbols into the articulatory descriptions and then activating the peripheral unit, the vocal tract.

Bell, Fujisaki, Heinz, Stevens and House (1961) described the procedures used in an analysis by synthesis technique to reduce a speech wave to time-varying vocal tract resonance and source characteristics.
The analysis by synthesis model described above was programmed on a digital computer to perform the first stage of conversion of the speech wave, i.e. into an articulatory description. Comparison spectra were generated using rules based on an acoustical theory of speech production. The control unit provided information on the poles and zeros of the transfer function and on the type of vocal tract excitation. The experimenters were concerned with establishing rules or strategies that the control component may utilize to convert the results from peripheral analysis into such information with maximum speed and accuracy. Two methods were used to implement the operations in the control component: either the experimenter performed the control function, or a rudimentary strategy that permitted automatic analysis was employed. When the experimenter controlled the strategies the values for the spectral poles and zeros were found to be more accurately matched than by automatic analysis; however, the time required was far greater than by automatic analysis. The spectral match determined by automatic analysis resulted in consistent small errors in the formant locations. These errors could be eliminated if more elaborate and time-consuming strategies were used to match the spectra.

The authors concluded that such a "feedback" model has advantages over other models because it ensures that an adequate representation of the input is being analyzed, whereas in open-loop analysis a simple attribute analysis is performed and may omit important data or permit errors to occur. Also in this model, once variations due to speaker differences have been resolved, the strategy can be simplified. The strategy is then only concerned with extracting the dynamic features of speech.
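The feedback loop just described (a control strategy proposes resonance values, a model generates a comparison spectrum, and a comparator scores it against the stored input) can be illustrated with a minimal sketch. It is not the procedure of Bell et al.; the resonance shape in `synthesize`, the squared-error measure in `compare`, the one-formant-at-a-time search in `analyze`, and all numeric values are assumptions chosen only to make the loop concrete.

```python
import math

# Hypothetical analysis frequencies: 100 Hz to 4000 Hz in 100 Hz steps.
FREQS = [100.0 * k for k in range(1, 41)]


def synthesize(formants, bandwidth=100.0):
    """Toy 'model': a log-magnitude spectrum built from simple resonance peaks."""
    spectrum = []
    for f in FREQS:
        magnitude = sum(1.0 / (1.0 + ((f - fc) / bandwidth) ** 2) for fc in formants)
        spectrum.append(math.log(magnitude + 1e-6))
    return spectrum


def compare(a, b):
    """Comparator: mean squared difference between two spectra."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def analyze(input_spectrum, initial, step=200.0, min_step=12.5, max_passes=200):
    """Control strategy (assumed): move one formant at a time and keep any
    change that lowers the comparator error; halve the step when stuck."""
    guess = list(initial)
    error = compare(synthesize(guess), input_spectrum)
    for _ in range(max_passes):
        if step < min_step:
            break
        improved = False
        for i in range(len(guess)):
            for delta in (-step, step):
                trial = list(guess)
                trial[i] += delta
                trial_error = compare(synthesize(trial), input_spectrum)
                if trial_error < error:
                    guess, error, improved = trial, trial_error, True
        if not improved:
            step /= 2.0
    return guess, error
```

In this sketch a smaller `min_step` plays the role of the "more elaborate and time-consuming" strategy: it allows a closer spectral match at the cost of more synthesis-and-compare cycles.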
Halle and Stevens (1964) presented a speech perception model that utilized an active or feedback process. The model was proposed to account for the conversion of a continuous speech wave into a set of discrete phonemes without segmentation. The dynamic speech wave containing transitions and overlap of information results from the interaction of independent articulatory structures, each with its inertia and limitations in neural and muscular control. Therefore in this model reference is made to a set of phonetic parameters that describe the independent articulators' behaviours. Halle and Stevens acknowledged that the phonetic parameters may be akin to distinctive feature systems or the more traditional classificatory systems. They did not commit the model to either but stated that "the vocal-tract behaviour must be described by specifying a set of quasi-independent phonetic parameters [p. 605]" in order that the speech wave preserve its non-segmentable and time-varying characteristics.

The first step in the model consists of preliminary analysis, where vocal-tract information is extracted and tentative identification by special attributes is made, thereby limiting the number of possible sequences that may be generated later. The generative rules are capable of synthesizing spectra for all sets of phonetic parameters that they may receive, but do not need to do so because the control unit orders likely representations to achieve rapid convergence to a match.

The analysis takes place in two stages. The first stage reduces the spectral representation "to a set of parameters which describe the pertinent motions and excitations of the vocal tract, i.e. the phonetic parameters [p. 609]".
Here variations between different speakers and a specific speaker under different conditions are resolved. In the second stage, the output from stage I, the phonetic parameters, is transformed to a sequence of phonemes. Here the variations in speech signals due to rate of speech, linguistic background, dialect, and the contextual variation of phonemes are resolved. The authors do not expand upon how the transformation from phonetic parameters to phonetic segments takes place nor upon the form that phonetic description takes. The model is also capable of transforming an input of discrete phonemes into a continuous speech wave, by using the same set of phonetic parameters in the intermediate stage and then generating speech at a peripheral vocal tract.

2.2 The Existing Model

During the next decade, the extensive research that was carried out on various aspects of the speech perception process led Stevens and House to refine existing forms of the analysis by synthesis model and to propose a revised version (Stevens & House, 1972). Two areas of research were particularly important for the model's development.

First of all, investigation was carried out to support the existence of a set of "features" that would describe each linguistic unit, the phoneme, in articulatory terms. (The best known such system being that of Chomsky & Halle (1968).) At the linguistic level these twenty to thirty features are capable of describing, uniquely, the phonemes contained in all languages by specifying the presence or absence of attributes that exist at some level in the speech event.
They describe such characteristics of the segment as place of articulation, secondary vocal-tract constrictions, the manner of articulation and the type of acoustic source of vocal-tract excitation.

The results from experiments that analyzed errors made in short-term memory, metathetical errors and the perceptual confusions that result when the speech signal is degraded indicate that there is some validity for the existence of the distinctive feature as an entity that is analyzed at some point in both speech production and perception. Furthermore the authors stated that because these features occur repeatedly in all languages and because "children seem to learn the rules for manipulating these features from exposure to a relatively small number of samples, this constitutes strong evidence that the nervous system of man is predisposed not only to encode sounds as segments and features, but also to decode sounds in the same ways [p. 13]". If such features are analyzed during perception of speech then there is strong support for reference being made to articulatory knowledge during the conversion of the speech wave to phonemes, as is required by the analysis by synthesis model.

Researchers then turned to examining the acoustic waveform in cases where some feature was known to exist, to determine the acoustic correlates of that particular feature. Such analysis has not been entirely successful because while some features are represented by invariant acoustic information, the acoustic manifestations for many vary according to the phonetic context and the particular speaker.
Therefore research was unable to define particular acoustic attributes that the speech-perception mechanism is predisposed to detect or recognize, and there can exist no simple acoustic-to-articulatory-to-phonetic translation. The authors proposed that the role of synthesis during perception is still necessary to resolve the non-invariant information carried in the speech wave.

Data indicating a perceptual dichotomy between speech and nonspeech led to the discovery of the phenomenon known as "categorical perception". Studies in this area form the second class of experiments that influenced the theory behind the most recent analysis by synthesis model. These experiments examine the identification versus the discrimination of synthetically produced speech sounds. The stimuli vary in one feature such as 'voice' or 'place of articulation' by varying the VOT or formant characteristics in equal steps along a continuum. It was found that, for stimuli of syllabic, dynamic structure, the listener could discriminate two stimuli only if he could identify them as being different. Discrimination was excellent near a phoneme boundary yet poor within a phoneme region.

The authors dismissed the possibility that categorical perception results from a type of "filtering" mechanism in the auditory system that separates acoustic data into classes, because of the lack of data showing that the auditory system performs such complex analysis. They proposed that the listener uses his knowledge of linguistic categories to separate the auditory patterns into categories and that the listener is assisted in doing so because, as a generator of speech, he has knowledge of the articulatory instructions that produce speech available for his use.
They exemplify this proposal with the fact that when producing a stop consonant, the tongue is directed toward a discrete target, and that when perceiving such a consonant, the listener learns to hear these articulatory plateaus or targets as discrete classes.

Therefore distinctive features were considered to be partly innate and partly learned. The ability to handle features is innate in the human nervous system, but resolving the various forms of a phoneme into a phonetic class requires knowledge of articulatory-auditory relationships. Also the listener learns to disregard intra-phonemic information that is not relevant in his language system. Stevens and House state that such knowledge is probably built up from an early age in the child when he begins to form acoustic-articulatory connections.

In order for a signal to be processed as speech in this model, it is necessary that it evoke the "speech perception mode". When a signal contains the syllabic, dynamic characteristics that identify it as a stimulus that can be categorized in an articulatory class, this mode is achieved. The listener learns to perceive signals that may have originated in the vocal tract as speech, due to the characteristics of the signal that evoke reference to articulatory mechanisms.

In this model, the two-loop synthesis system from previous models can be collapsed to one loop. The existence of distinctive features is a natural explanation for the acoustic-to-articulatory-to-phonetic transformations.
These features are assumed to be represented at all levels and once initial analysis has derived the more invariant acoustic information for some features, the remaining features are resolved using knowledge of the correlations between linguistic categories and the articulatory patterns that produce classes of sounds.

The components of this model are shown in Figure 2.1. The acoustic signal undergoes peripheral processing whether it be speech or not. It is at this stage that the presence of the dynamic characteristics of speech are sought and, if found, the signal is processed accordingly. The peripheral processing transforms the acoustic waveform into a neural time-space pattern that will be used at later stages of analysis. It is known that at this level more than simple time-frequency analysis takes place, and probably extraction of some of the more easily discriminated information relevant to description of certain features also occurs. Some normalization of the signal must also take place, so that the output is a set of normalized attributes (A) that contribute to later identification of linguistic units. These auditory patterns are placed in temporary store to await further processing.

The preliminary analysis component derives the features that are not strongly context dependent from the signal (A). These features are available after conversion of the results from peripheral processing and result in a partial specification of a feature matrix of the utterance (B).

The control component has access to the results of preliminary processing as well as the results of analysis of previous parts of the utterance, the lexicon, and the output of the comparator.
With this information, the control unit makes a hypothesis concerning the representation of the utterance in terms of morphemes. The features are abstract quantities that underlie, but are not necessarily identified with, the acoustic attributes or the production of the signal. This hypothesized representation (B) travels to the generative rules where it is transformed into a representation of the articulatory instructions that would be necessary to generate such an utterance. (The articulatory instructions (V) that result could be used to control the articulatory mechanisms and produce speech output.)

[Figure 2.1. A proposed model of the speech perception and production processes (from Stevens and House, 1972). Components shown: peripheral auditory analysis, temporary store, preliminary analysis, control (includes lexicon), generative rules, comparator, articulatory mechanism, error signal, and output of analysis.]

The generated patterns (V) are compared by the comparator with the attributes of the analyzed utterance residing in temporary store and judged as to their closeness of match. This information is relayed to the control component and the hypothesized sequence is either accepted or a new hypothesis is made using the error detected in the comparator. This loop is traversed until the message has been successfully identified.

The model relies on the comparison of auditory patterns (A) with articulatory instructions (V) that potentially are able to produce such patterns. The comparator must contain a catalogue of such relations. The articulatory gesture is represented in terms of tactile, proprioceptive sensations and motor-commands.
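As a rough illustration of the single loop described above, the following sketch passes a stored auditory pattern (A) through preliminary analysis to a partial feature specification (B), lets a control step draw hypotheses from a lexicon, converts each hypothesis to articulatory instructions (V), and has a comparator check V against A through a catalogue of articulatory-auditory relations. Every name and value here is hypothetical: the four-entry `LEXICON`, the cue labels in `CATALOGUE`, and the choice to let preliminary analysis recover only voicing are assumptions for illustration and are not taken from Stevens and House.

```python
# Hypothetical lexicon: morpheme -> articulatory instructions (V) as (place, voicing).
LEXICON = {
    "pa": ("labial", "voiceless"),
    "ba": ("labial", "voiced"),
    "ta": ("alveolar", "voiceless"),
    "da": ("alveolar", "voiced"),
}

# Hypothetical catalogue relating each articulatory instruction to the auditory
# cue it is assumed to give rise to (toy labels, not phonetic claims).
CATALOGUE = {
    "labial": "F2-rising",
    "alveolar": "F2-falling",
    "voiceless": "long-VOT",
    "voiced": "short-VOT",
}


def preliminary_analysis(auditory):
    """Partial feature specification (B): only voicing is read off the signal."""
    voicing = {"long-VOT": "voiceless", "short-VOT": "voiced"}.get(auditory[1])
    return (None, voicing)


def control(partial):
    """Control: lexical hypotheses compatible with the partial specification."""
    return [word for word, features in LEXICON.items()
            if all(p is None or p == f for p, f in zip(partial, features))]


def generative_rules(word):
    """Generative rules: hypothesis -> articulatory instructions (V)."""
    return LEXICON[word]


def comparator(instructions, auditory):
    """Comparator: number of cues predicted by V that mismatch the stored A."""
    predicted = tuple(CATALOGUE[gesture] for gesture in instructions)
    return sum(p != a for p, a in zip(predicted, auditory))


def perceive(auditory):
    """Traverse the loop until a hypothesis is accepted, or return None."""
    partial = preliminary_analysis(auditory)
    for hypothesis in control(partial):
        if comparator(generative_rules(hypothesis), auditory) == 0:
            return hypothesis
    return None
```

In this simplified version a rejected hypothesis merely causes the next candidate to be tried; in the model as described, the error detected by the comparator would itself guide the formation of the next hypothesis.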
Stevens and House state that the catalogue is built up as the child begins to utter sounds, hear the auditory result and form their association. Therefore the child is aided in learning to perceive speech by being able to articulate it.

2.3 Recent Research into the Speech Perception Process

In view of the studies of categorical perception that have taken place since the Stevens and House model was proposed, the theory underlying this model must be revised. Results of various studies on infant perception reveal that the young infant is capable of discriminating sounds varying in acoustic dimensions that are common to all languages.

Eimas, Siqueland, Jusczyk and Vigorito (1971) presented to one- and four-month-old infants the syllables /pa/ and /ba/, which varied in only one acoustic dimension, voice onset time. The stimuli were synthesized so that the VOT varied from -20 to +80 msec in 20-msec steps. The infant's responses were measured by determining the rate of non-nutritive sucking. After a baseline rate was established, an auditory stimulus was presented which caused the rate to increase. When the infant became habituated to this stimulus and his rate of sucking decreased, a second stimulus was presented. Typically, the sucking rate then increased again. In the control group, the second stimulus was identical to the first stimulus. An infant was assumed to have discriminated the stimuli if his sucking rate increased after a change in stimulation, or if its decrease was of smaller magnitude than that shown by the control subjects. The results indicated that the infants discriminated /p/ from "non-/p/" and that they did so at the same phoneme boundary determined for adult subjects by Lisker and Abramson (1970).
In addition, the infants more often discriminated the stimuli when they were from different adult phonemic categories than when they were from the same category. Eimas et al. took this as evidence for categorical perception and stated "that the means by which the categorical perception of speech, that is, perception in a linguistic mode, is accomplished may well be part of the biological makeup of the organism and, moreover, that these means must be operative at an unexpectedly early age [p. 306]."

Morse (1972) designed an experiment using the same method as Eimas et al. for infants between one and two months of age. The stimuli were the syllables /ba/ and /ga/ varying only in one acoustic characteristic, their formant frequency pattern. The second and third formant transitions varied in direction and rate of change in order to vary the feature 'place of articulation'. Morse was also concerned as to whether children can discriminate intonation patterns before they can discriminate phonemic differences, so the fundamental frequency contour in the syllable /ba/ was varied so that intonation ranged from rising to falling, (ba+) to (ba-). The results showed that the infants categorically perceived 'place of articulation' at the same boundary and in the same manner as the adults. The infants also discriminated the acoustic cues that determine intonation patterns. Morse concluded that if infants of this age can utilize acoustic information in a linguistically relevant manner, their experience in producing these contrasts is not necessary for perceiving these distinctions.
He suggested that the results support the theory that the speech code is structured in terms of articulatory invariants that are extracted from the speech signal during perception (proposed by Liberman (1970)), and that the articulatory basis for this code must be primarily a phylogenetic acquisition.

Kuhl and Miller (1975) performed an experiment to determine whether the chinchilla, an animal without a phylogenetic history of phonetic knowledge, could be trained to differentiate classes of speech sounds differing in the feature 'voice'. In Experiment I, the caged animals were presented with one of two auditory stimuli and were trained to cross a barrier whenever the "negative" stimulus was presented. They were positively reinforced if they did so, and negatively reinforced if they failed to do so. The stimuli were /ti, ta, tu/ and /di, da, du/ produced by four different speakers. The chinchillas were successfully trained to discriminate each pair into voiced-voiceless classes and in addition, without further training, they generalized this learning to discriminate pairs selected from /te, tæ, to/ and /de, dæ, do/.

In Experiment II, learning was kept to a minimum by controlling the order of stimulus presentation and rewarding the animal after each trial. The stimuli consisted of /ta/ and /da/ tokens with the VOT value ranging in 10-msec steps from 0 to 80 msec. Four English-speaking adults identified the same stimuli as /ta/ or /da/. The animals identified the stimuli in a manner strikingly similar to the adults. The adults and the chinchillas determined a very similar phonetic boundary, 35.2 and 33.5 msec respectively.
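A phonetic boundary of the kind reported above is commonly taken as the VOT value at which identification of the stimuli as voiced crosses 50 per cent. The sketch below shows one way such a value could be read from an identification function by linear interpolation. It is offered only as an illustration: the function name `phonetic_boundary` and the response proportions are invented, and neither the data nor the method is taken from Kuhl and Miller.

```python
def phonetic_boundary(vots, proportion_voiced):
    """Return the VOT (msec) at which the proportion of 'voiced' responses
    falls through 0.5, using linear interpolation between adjacent steps.
    Returns None if the identification function never crosses 0.5."""
    points = list(zip(vots, proportion_voiced))
    for (v0, p0), (v1, p1) in zip(points, points[1:]):
        if p0 >= 0.5 > p1:
            return v0 + (p0 - 0.5) * (v1 - v0) / (p0 - p1)
    return None


# Continuum as described in the text: 0 to 80 msec in 10-msec steps.
vots = list(range(0, 90, 10))

# Invented proportions of /da/ responses at each step (illustration only).
made_up_responses = [1.0, 1.0, 0.95, 0.8, 0.2, 0.05, 0.0, 0.0, 0.0]

boundary = phonetic_boundary(vots, made_up_responses)
```

With these invented proportions the crossover falls between the 30-msec and 40-msec stimuli, which is the region in which the reported adult and chinchilla boundaries lie.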
Both sets of subjects were further tested with labial and velar VOT series and again they both determined the same location for phonetic boundaries, and these values agreed with those reported by Lisker and Abramson (1970). The authors concluded that if speech is "special" because reference is made to articulatory representations during perception, or because "special phonetic feature detectors" (review following) are involved in this process, then the chinchillas should not have performed as they did. They also suggested that the articulatory basis for speech classes cannot be considered to be phylogenetic. Kuhl and Miller proposed that it is necessary to reveal the "special" status of speech and that it is more likely that these perceptual abilities result because speech-sound oppositions are selected to be highly distinctive to auditory systems, both human and animal.

These experiments provide evidence that the interpretation given to categorical perception by Stevens and House must be reconsidered. To summarize the revised information available on categorical perception:

1. The ability to perceive certain auditory stimuli in categories (at least along the dimensions of 'place of articulation' and 'voicing') does not require previous experience in articulating these sequences, nor is learning required to form phonetic boundaries.

2. Categorical perception may reflect a general characteristic of the auditory system and speech-sound oppositions may have been selectively chosen to coincide with the system's resolving powers. (In this case, categorical perception does not reflect "perception in a linguistic mode" as stated by Eimas et al. (1971).)

3. Consequently, theories that consider speech as "special" must not consider categorical perception as evidence for this "specialness", because the chinchilla also perceives speech sounds categorically.

Therefore, neither reference to articulatory representations nor the use of phonetic feature decoders is required by a listener to divide auditory stimuli changing along one acoustic dimension into specific classes.

In order to determine whether there exist general acoustic dimensions to which the auditory system may respond in a discontinuous fashion, researchers investigated the nature of stimuli that would evoke categorical perception. It was necessary to show that these dimensions evoked categorical perception in nonspeech stimuli as well as speech stimuli. At the same time researchers were interested in determining what differentiates speech from nonspeech such that it is processed in a different manner.

House, Stevens, Sandel and Arnold (1962) showed that listeners do categorize stimuli as either speech or nonspeech and that there are no degrees mediating the two judgments. Dichotic-listening experiments, such as those performed by Shankweiler and Studdert-Kennedy (1967), Studdert-Kennedy and Shankweiler (1970), Kimura (1964) and Cutting (1974), and experiments on humans with brain lesions (Kimura, 1961) support the view that speech (or signals resembling speech) and nonspeech stimuli are processed in different areas of the brain. Gilbert and Climan (1974) found this to be true for children as young as two and one-half years of age.
If this speech-nonspeech distinction is not made in the auditory system, signalled by the presence of certain acoustic dimensions, and if resolving these acoustic dimensions into specific categories can be considered as forming a kind of general "auditory concept", then the question may be proposed, "When does an auditory concept become a phonetic concept?". The following experiments were directed towards answering these questions.

Burdick and Miller (1975) performed an experiment to determine whether chinchillas were capable of distinguishing /a/ and /i/. The method was the same as that used by Kuhl and Miller. Four chinchillas were successfully trained to discriminate the stimuli /a/ and /i/ produced by one speaker, for the same token at constant pitch and sound level. The token, speaker and pitch level were then varied until 48 stimuli were presented for identification. The chinchillas successfully generalized their training to the new stimuli and thereby ignored variations in pitch level, sound level and voice quality when making the phonetic judgments. Synthetic /a/ and /i/ tokens varying in formant structure and pitch contour were then presented to the subjects. The chinchillas correctly observed the relevant formant differences and ignored the pitch differences when differentiating these stimuli. In these experiments, the chinchillas were required to do more than simply differentiate between two stimuli differing by one variable.
They were required to isolate the essential differences, disregard nonessential variables and disregard differences of the same nature as those that were essential, but were still within a vowel category. These requirements meet those considered in general psychology as being essential for concept formation. The authors proposed that the animals defined the stimuli as members of a psychological category, in this case an acoustic category. The chinchillas do this in the same way as they form concepts for the meaningful sounds of nature that occur in their environment. The authors suggested that speech sounds whose category membership can be perceived should be described as auditory concepts, for no cognitive rule is employed when judging membership in these categories. The chinchillas perceived the category membership and therefore formed auditory concepts, so Burdick and Miller concluded that this ability can be accounted for by general psychoacoustical processing.

Sinnott, Beecher, Moody, and Stebbins (1976) studied the ability of Old World monkeys to discriminate the speech sounds /ba/ and /da/. Four monkeys were successfully trained to discriminate /ba/ and /da/ stimuli spoken by an adult male. Two monkeys carried this training over to synthetic /ba/ and /da/ stimuli, while the other two required retraining but did achieve the transfer. In the second experiment human and monkey subjects discriminated synthetic /ba/ and /da/ stimuli that varied in second and third formant frequencies (80 Hz steps between the endpoint stimuli used for initial training). Latencies from stimulus onset to response time were recorded on a PDP-8 computer and the interstimulus interval (ISI) was one of 0.5, 2.0 or 3.8 sec.
The task was a version of an AX discrimination task, where the subject heard two stimuli and released a key if the second was different from the first. The human subjects identified a phoneme boundary, yet they were able to detect intraphonemic stimulus differences and were not completely "categorical" in their discrimination. The difference threshold for formant transitions was 160 Hz for human subjects and 320 Hz for monkey subjects. The latency functions for the monkeys were essentially linear with some slight increase as the two stimuli approached the same value. For the humans, latencies were constant when the stimuli were from distinct categories, yet there was a discrete latency increase as the second stimulus approached the value of the first. The nature of the AX task permits a direct comparison of the auditory stimuli using "echoic memory", whereas the ABX task usually used in categorical perception experiments requires that the stimuli be coded and stored in short-term memory. The authors suggested that the humans first sought interphonemic differences and if they were found the response was immediate. Otherwise the response was delayed to search for differences in timbre. The monkeys relied solely on timbre information, producing constant latencies. Further support for this proposal occurred when ISI increased. The humans' performance became more categorical. In this case the stimuli would need to be coded into short-term memory and the task more closely resembled an ABX paradigm. Monkeys showed no such signs of latency differences even after extensive training.
The authors interpreted this finding as evidence that the monkeys perceived the stimuli in a continuous manner and the humans in a categorical manner, and that there exists some difference in the underlying perceptual mechanism. Sinnott et al. proposed that while auditory systems may detect acoustic discontinuities that occur along a continuum, a particular species will form concepts about such distinctions that are meaningful for it in its environment and communication system.

Cutting and Rosner (1974) first demonstrated that categories and boundaries occur for both speech and nonspeech stimuli that differ in rise time. In Experiment I subjects were presented with four sets of stimuli that varied in rise time in nine steps, from 0 to 80 msec. The nonspeech stimuli were sawtooth waves at 294 Hz and 440 Hz. The speech stimuli were the affricate/fricative pair /tʃ/-/ʃ/ combined with one of the two vowels, /a/ or /æ/. The rise time varied in 10 msec intervals. The subjects performed an identification task and then an ABX comparison task. The results of both arrays of nonspeech stimuli were combined as they revealed no differences. Similarly, the two sets of speech stimuli were analyzed as one set. Subjects identified the sawtooth waves as either "plucked" or "bowed" sounds from a string instrument and identified a boundary at 40 msec. The discrimination function showed a sharp peak at this boundary and discrimination was significantly poorer for stimuli within a category. The speech stimuli were also perceived in a categorical manner.
Therefore categorical perception takes place for the affricate/fricative distinction as well as for distinctions among stop consonants. In Experiment II, the order of the tasks was reversed to rule out the possibility that the identification task had influenced the discrimination task. The same two sawtooth series as well as two series of sine waves (at 294 and 440 Hz) varying in rise time from 0 to 70 msec were presented. Reversing the order of tasks had no effect on the results, and categorical perception occurred for sine waves as well as the sawtooth waves. Moreover, the category boundary determined by the peak in the discrimination function occurred at the same location, 40 msec.

In the ABX discrimination task it is necessary that some type of coding take place, for echoic memory could not store all three stimuli during analysis. Within-category information may be lost during the coding of A and B because these differences did not remain long enough in short-term memory. "The quick loss of within-category information is the crux of categorical perception [p. 569]." Therefore the ABX paradigm is very sensitive to identification of perceptual categories in audition. For stop consonants, it is obvious that the listener perceives the event almost immediately, as a phonetic class. But for the "plucked" and "bowed" notes the encoding cannot be phonetic. "This fact, coupled with the result of the first experiment, which demonstrated that rise time can cue perceptual categories in both speech and music, suggests that certain aspects of phonetic coding may be intimately related to the coding of naturally occurring nonlinguistic sounds [p. 569]."
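The link between labelling and ABX discrimination described above can be made concrete. The following is a minimal sketch, assuming the classic label-only prediction in which a listener who retains nothing but category labels discriminates a pair with probability 0.5(1 + (pA - pB)^2), where pA and pB are the proportions of one label given to each stimulus. This formula is not stated in the studies reviewed here, and the function names and identification proportions are hypothetical.

```python
def predicted_abx(p_a, p_b):
    """Predicted proportion correct in ABX if only category labels are retained.

    p_a, p_b : proportion of trials on which stimuli A and B receive the
               same given label (e.g. "plucked").
    """
    return 0.5 * (1.0 + (p_a - p_b) ** 2)


def predicted_function(ident, step=2):
    """Predicted discrimination for every pair of stimuli `step` steps apart."""
    return [predicted_abx(ident[i], ident[i + step])
            for i in range(len(ident) - step)]


# Hypothetical "plucked" proportions for rise times 0, 10, ..., 80 msec.
rise_times = list(range(0, 90, 10))
hypothetical_ident = [1.0, 1.0, 0.95, 0.85, 0.5, 0.15, 0.05, 0.0, 0.0]

predicted = predicted_function(hypothetical_ident, step=2)
peak = max(range(len(predicted)), key=predicted.__getitem__)
print("peak pair:", rise_times[peak], "vs", rise_times[peak + 2], "msec")
```

With these invented proportions the predicted peak falls on the pair that straddles the labelling boundary, and within-category pairs stay near chance. That is the pattern the discrimination functions above are described as showing.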
The authors concluded that the perception of speech has developed around the existing properties of the auditory system and that rise time may be one of the dimensions by which different categories are detected.

Miller, Weir, Pastore, Kelly and Dooling (1976) investigated whether noise-buzz sequences would be categorically perceived in a similar manner to plosive consonants differing in voice onset time. A 100 Hz buzz of constant duration (500 msec) was presented with thermal noise that led the buzz in 10 msec steps along a continuum from -10 msec to +80 msec. In the discrimination test the stimuli were presented in groups of three. Two were identical while the third differed, and the subject was required to identify the different stimulus. For the identification test, the subject labelled the stimulus as "noise" or "no-noise", labels chosen from the subjects' observations. The stimuli were discriminated and identified in a manner that met the requirements for categorical perception proposed by Studdert-Kennedy, Liberman, Harris and Cooper (1970). The authors interpreted categorical perception as the detectability of one single component in the stimulus complex that is judged relative to the constant part of the stimulus context. There are psychophysical boundaries or thresholds for perceptual effects that are encountered as the one component changes, and at the threshold the effects undergo rapid changes in discriminability, clarity or perceived magnitude. Pairs of stimuli that straddle the threshold will be discriminated well, but pairs selected from within a category will follow Weber's law.
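Weber's law, as invoked here, can be written out explicitly. The symbols are assumptions introduced for illustration: X is the value of the varying component (for example, noise lead time), ΔX is the just-discriminable change in X, and k is the Weber fraction. The numerical values in the comments are hypothetical and are not taken from Miller et al.

```latex
\frac{\Delta X}{X} = k
\quad\Longrightarrow\quad
\Delta X = kX
% Hypothetical illustration with k = 0.25:
%   X = 40 msec gives \Delta X = 10 msec
%   X = 80 msec gives \Delta X = 20 msec
```

Under this relation a fixed 10 msec step is a proportionally smaller change at larger values of X.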
Therefore if constant differences are tested they will be discriminated at a constant Weber fraction (the ratio of ΔX to X). Miller et al. added that training, attention and memory may serve to sharpen the boundary and enhance discrimination of items from either side of the boundary, but that these effects are only a matter of degree. Also, when discrimination involves comparisons of categories that are named, as opposed to comparing sensory traces, then the results will more likely be categorical. Therefore while categorical perception may occur in general sensory psychoacoustical experiments, it will occur in ideal form only when "certain combinations of stimulus configurations as well as training and perceptual-cognitive factors obtain [p. 416]".

The boundary determined by Miller et al. for noise-lead (20 msec) was also determined in an experiment by Stevens and Klatt (1974). In Experiment I, the stimuli consisted of a 5 msec burst of noise followed by a variable interval of silence (from 0 to 40 msec) and the onset of a synthetic vowel with fixed formants. The stimuli were not judged to be speech. Listeners could not detect a silent interval for a VOT up to 15 msec. After a VOT of 25 msec, listeners reported a silent interval. Fifty percent detection occurred at a VOT of 20 msec. In Experiment II the stimuli were /ta/ or /da/ tokens that varied independently in both voice onset time and transition duration.
The authors sought evidence for listeners perceiving the voiced-voiceless distinction, not only by detecting VOT differences but by detecting the presence or absence of a significant formant transition following voicing onset (a characteristic of natural English stop consonants). The results showed that the VOT at the phoneme boundary increased when the duration of the formant transition increased. The average VOT at the phoneme boundary moved from 26 to 39 msec for a 30 msec change in formant transition duration. These findings indicate that the various components contained in the initial 20-25 msec of a stimulus are integrated and perceived as a unit. The presence of a transition after the onset of voicing is a cue for a voiced consonant. The authors postulate "first that the cue for the presence of a consonantal segment is a rapid change in the acoustic spectrum occurring at a point where there is an abrupt or discontinuous increase in intensity in some frequency range (Stevens 1971). This rapid change in the acoustic spectrum is in the frequency range above about 1000 Hz, and occurs over a brief time interval of 20-30 msec. The characteristics of this transient spectrum shift provide some of the cues for place of articulation for the consonant [p. 657]."

The results of Experiment II can be explained by this hypothesis. When VOT is delayed, a second discontinuity arises in the spectrum and the two onsets are perceived as separate events. If the VOT is less than 20 msec then the two onsets are perceived as simultaneous and the stimulus is identified as voiced.
In voiced consonants the transition is of significant duration and occurs at the onset of voicing, but in the voiceless consonants the transition is short and is virtually complete by the beginning of VOT. The listener determines whether the onsets are simultaneous or discrete and whether a rapid spectrum change is present or absent at the onset of voicing, as cues to the voiced-voiceless distinction. These results clearly explain the results found by Eimas et al., whose stimuli were varied along the VOT continuum in 20 msec steps. The infants perceived the 20 msec VOT as voiced and the 40 msec VOT as voiceless. The present results would predict such a division based on the VOT values chosen.

Stevens and Klatt offered a tempting explanation for the longer VOTs that occur for velar consonants than dentals, and for dental consonants than labials. "It is known that the duration of the movement of the articulator that forms the closure is greatest for the tongue body, less for the tongue tip, and least for the lips [p. 658]". The formant transitions are manifested acoustically during this time and therefore their rate of change follows the same order. When the rate of the transition is the slowest, the voice onset time for voiceless stops must also increase so that the transition is complete before voicing begins and a conflicting cue for a voiced stop does not result. Hence the authors posited an articulatory basis for some of the acoustic cues that exist in a stimulus, and the suggestion is apparent that perceptual distinctions are based on distinctions performed necessarily during production.
While the above studies clearly reveal that the auditory system is capable of distinguishing auditory stimuli into classes when one acoustic dimension is varied about specific "thresholds" for that dimension, it is also clear that the simple notion of fixed regions of auditory sensitivity will not suffice to explain this ability. Lisker and Abramson (1970), in their studies of categorical perception, revealed that speakers of languages with characteristically different VOT values will also determine categorical boundaries at these different values. Experience then must play some role in shaping these boundaries and furthermore the experience involves articulating the classes of speech sounds.

Burdick and Miller (1975), Miller et al. (1976), and Sinnott et al. (1976) have all formed conclusions in this direction. Burdick and Miller claim that the categorical perception experiments reveal no more than the formation of an auditory concept and that this concept is likely "perceived". They suggest that these concepts are then relevant to the perception of the constituents of language, allophones, phonemes and syllables, but elaborate no further. Miller et al. concluded that training, attention and memory are important factors in establishing categories and they point out the parallels that exist with studies on training and generalization (Mostofsky, 1965). Most important is the authors' observation that "overlearned responses or labels for the stimuli probably increase the chances that the subjects will compare labels or names of the stimuli rather than their sensory traces during discrimination tasks [p. 416]".
Therefore knowledge of the speech sounds in a language will contribute to the ability of the mechanism to identify the speech stimuli as "named" sounds (that name themselves). Sinnott et al. proposed that a specific species will perceive stimuli that vary along an acoustic continuum as distinct categories depending upon the concepts they have formed about such stimuli in their communication system, and therefore, if these categories are meaningful to that species in its environment. A human may perceive a series of animal calls as falling along an acoustic continuum, yet the animal may respond categorically. The boundaries and classes that exist in a communication system reveal the relevant distinctions that the users require.

The paradigms used to determine discrimination functions reveal that coding into short-term memory is necessary to reveal the characteristics of discrimination found in categorical perception. If such coding is not required, the sensory traces stored in "echoic memory" can be directly compared and perception becomes more continuous. The above studies propose that poor discrimination for stimuli within a class results because the items are identified as members of the same class and enter phonetic store.
This information is analyzed during discrimination because the intraphonemic differences are lost due to poor auditory memory. Discrimination is good for stimuli from different classes because the representations in phonetic store are sufficient for accurate discrimination. The latency functions reported by Sinnott et al. support this idea and indicate that the listener will first process phonetic information and then (and probably not normally) process intraphonemic information.

Pisoni (1973a), using an AX same-different task, varied the interval from A to X from zero to two seconds, for both vowel and stop consonant continua. When the stimuli were from different categories, discrimination was high for both consonants and vowels and did not vary with the interstimulus interval. Presumably the phonetic store was utilized, it having a slower rate of decay than the auditory store. When the stimuli were from the same category the discrimination was poor for consonants and independent of stimulus delay. For vowels, discrimination was high but declined with an increase in the delay interval. Presumably in this case, the information was coded into auditory store, it having a rapid rate of decay.
If the hypothesis of the role of auditory memory in categorical perception is to be accepted, then it has been shown that auditory memory is strong for vowels and weak for consonants. It follows that when vowels are degraded so that they are poorly represented in auditory memory, they are perceived more categorically (Pisoni, 1973b; Lane, 1965; Sachs, 1969). Consonants and vowels are distinguished in these experiments not by the processes of assignment to their phonetic class, but by their characteristics and the duration of their auditory stores. These considerations and their implications for the revision of the analysis by synthesis model proposed by Stevens and House will be presented in the following section. First it is necessary to make some mention of the possible existence of feature detectors.

Neurophysiological evidence for the existence of feature detectors in the auditory system has been found for species other than the human. Single cells at the cortical level have been found to respond to a specific stimulus or a specific characteristic contained in stimuli. For example, Wollberg and Newman (1972) described single cells in the cortex of the squirrel monkey that responded only to the species' 'isolation peep'. In humans, the adaptation paradigm is used to evidence the existence of such feature analyzing systems. In short, the rationale is that if such detectors exist and a stimulus is presented continuously, then it will adapt or fatigue the detector being utilized and sensitize an adjacent detector (or system). Therefore, if perceptual shifts can be demonstrated to occur after prolonged stimulation by one feature, a feature detector is said to exist for that feature.
If such systems exist, then several questions need to be answered: "Are they peripheral or central?", "Are there separate detectors for the presence or absence of a feature, or is there only one detector?" and "Are the detectors auditory or phonetic?".

Adaptation studies have shown that perceptual boundaries are shifted towards the adapting stimulus, for the features 'voicing' and 'place of articulation' (Ades, 1974a; Cooper, 1974a; Eimas, Cooper, & Corbit, 1973; Eimas & Corbit, 1973). Ades (1974b) and Eimas et al. (1973) demonstrated that the shift occurs as strongly when the adapting stimulus is presented to the ear contralateral to the ear being tested, as when both stimuli are presented binaurally. They construed this as evidence that the detectors are central. Support for the existence of separate detectors for the two values of a feature comes from evidence that voiced stops are more resistant to adaptation and yield smaller boundary shifts than do voiceless stops (Eimas & Corbit, 1973; Eimas et al., 1973).

Whether such systems are phonetic or auditory in nature has not been determined. Eimas et al. (1973) demonstrated that the first 50 msec of /da/ (an acoustic segment that contains voicing information but is not heard as speech) was ineffective as an adapting stimulus. Eimas and Corbit (1973) showed that when using the continua /ba/ to /pa/ or /da/ to /ta/ and the extreme values from either set as an adapting stimulus, cross-series adaptation took place. Also the discrimination function shifted to coincide with the adapted phonetic boundary. Eimas et al. (1973) showed that adaptation with labial stops produced boundary shifts on alveolar and velar stop consonant VOT continua.
If the effect were acoustic, differences in second and third formant transition directions might have ruled out the effect. The researchers interpreted this as evidence that the detectors are phonetic. But the feature tested was voice onset time, a feature identified by complex and relational cues (Stevens & Klatt, 1974). Cooper (1974a) and Ades (1974b) produced cross-series adaptation for /b/-/d/ continua with different vowels. Therefore the formant transitions could rise for a token with one vowel and fall for a token with another vowel. Ades and Cooper interpreted the results as showing that the detectors are phonetic in nature. Bailey (1973) constructed two /ba/-/da/ series. In one, F2 was fixed and place cues were in F3. In the other, there was no F3 and all place cues were in F2. The experiment yielded cross-adaptation from the F2 cue series to the fixed F2, but none from the F3 cue series to the non-existent F3. These results strongly favour the view that adaptation is auditory in nature.

The controversy over phonetic or auditory detectors has not been resolved. At this stage it is probably wise not to rule out auditory, phonetic or both auditory and phonetic systems. Cooper (1974b) showed that adaptation on the /bi/-/pi/ continuum produced not only a perceptual boundary shift, but that speakers also shifted this boundary during production. Here is strong support for the often postulated perception-production link. Studdert-Kennedy (1976) suggested that the origin of this link be sought in studies of language acquisition. In the following chapter, literature concerning child perception and production will be examined to seek support for this production-perception link.
Recent research has succeeded in demonstrating that the auditory system performs analysis more complex than was previously believed to be the case. However it has not demonstrated that results of this analysis produce a complete specification of features that can be converted "dictionary-style" into a sequence of phonemes. Invariance between acoustic attributes and linguistic features has not been successfully resolved and there remains the need for a mediating step, such as the synthesis of the signal at some abstract articulatory level. Results from studies of categorical perception (Lisker & Abramson, 1970) and feature analyzing systems (Cooper, 1974b) also support the existence of a link with articulation during perception.

The studies reviewed in the previous section demonstrated that the type of processing previously believed to be reserved for speech also occurs for nonspeech that contains specific acoustic dimensions. Hence this analysis occurs in the auditory system for all stimuli. Knowledge that speech perception requires processing beyond that used for nonspeech, and that unique cortical sites are employed for such, led research to re-investigate the requirements for phonetic processing. Miller et al. (1976), Burdick and Miller (1975), Cutting and Rosner (1974), and Sinnott et al. (1976) hypothesized that phonetic processing requires that the stimuli be recognized as part of a larger system of sound stimuli that interact and are hierarchically organized. The transformation occurs at the acoustic to psychological level. Stimuli are processed accordingly when they have some meaning in this system and represent a concept.
Such systems may be linguistic or nonlinguistic (music) but when they are linguistic, the components are sounds that form auditory concepts, their own "name".

The current research has led to only a surface understanding of the generalities involved during speech perception. The specifics are buried under discrepancies in data and authors' interpretations of their data. This is certainly the case concerning the nature of speech processing that extends beyond that used for nonspeech. In the following section the analysis by synthesis model will be revised as specifically as possible, in view of current research.

2.4 A Revised Model

Peripheral Analysis

Huggins (1964) and Cherry and Taylor (1954) presented continuous speech to subjects while alternating the signal between ears. The subjects were required to "shadow" the speech by repeating it aloud as they heard it. At a critical rate of alternation (about 3 to 4 alternations per second) the intelligibility of the speech was sharply reduced. Huggins also demonstrated that this effect was not the result of an "attention-switching" factor but that processing at the peripheral level was interrupted. This rate corresponds approximately to the rate of syllable occurrence. Therefore it was suggested that a partial extraction of cues occurs at the peripheral level and that such analysis takes place over the approximate length of a syllable.

Stevens and House (1972) proposed that this peripheral analysis is more than simple frequency analysis and that all signals undergo such processing.
Acoustic differences that result from different speakers are normalized at this level and the result is a set of normalized acoustic attributes, some of which may correspond directly to features that are not strongly context-dependent.

Results from categorical perception experiments show that stimuli not heard as speech can be perceived categorically if they contain certain acoustic information that varies over a continuum. Therefore such analysis is not linguistically relevant and is a general characteristic of the auditory system. All acoustic stimuli must undergo the same auditory analysis. This finding also indicates that the differentiation between speech and nonspeech does not occur at this level, due simply to the detection of these acoustic dimensions.

Stevens and House cited neurophysiological studies of animals "such as [those of] Kiang, Watanabe, Thomas, and Clark (1965) on the auditory system of the cat, of Lettvin, Maturana, McCulloch, and Pitts (1959) on the visual system of the frog, and of Frishkopf and Goldstein (1963) on the frog's auditory system as evidence that fairly complex processing takes place peripherally [p. 48]". Current research examining the acoustic correlates of the more invariant features supports the view that complex analysis is performed in the auditory system. Stevens and Klatt (1974) report that the voiced-voiceless distinction is indicated by the presence or absence of rapid spectrum changes at the onset of voicing (also found in nonspeech stimuli).
The presence of a stop consonant is detected by a rapid intensity increase in the spectrum (Stevens, 1971), and rise time indicates the fricative-affricate distinction as well as differentiating classes of nonspeech stimuli (Cutting & Rosner, 1974).

Recently, evidence has been produced to show that some of the more strongly context-dependent features can be derived directly from acoustic analysis of the speech signal (apart from the possible existence of feature analyzing systems). Kuhn (1975) studied spectrographic samples from fricative and normal speech to determine whether the front cavity resonance frequency could be directly calculated. A variable frequency component in the fricative speech was found to specify the quarter-wave resonance of the front cavity. In normal speech this resonance was calculated from the second or third formant transition depending on whether the vowel was formed in the back or front of the mouth. Kuhn also determined that stop consonant bursts yield information that can be used to estimate the cavity resonance and hence place of articulation data. Further research may verify that such information is derived directly by acoustic analysis and that the same may occur for other non-invariant features.

Stevens (1960) required that the model perform linguistic and nonlinguistic analysis simultaneously, yet independently. Stevens and House cite examples from neurophysiological studies of animals that show that the periodicity of the vowel and the vowel's spectrum envelope are analyzed separately in the auditory system (Davis, Silverman, & McAuliffe, 1951; Schouten, Ritsma, & Lopes Cardozo, 1962).
Burdick and Miller showed that chinchillas could discriminate two categories of speech sounds without responding to the irrelevant variation in sound level, pitch level, pitch contour and voice quality. Therefore it seems that the animals processed the information relevant for the phonetic decision independently of the irrelevant information.

Wood (1975) reported evidence for parallel extraction of pitch and segmental information bearing on phonetic classification. In a speeded classification task the reaction times increased when subjects were required to make decisions on 'place of articulation' when pitch was also varied independently (as opposed to the control condition where only the target dimension varied). When pitch was the target dimension and 'place' also varied, the reaction times did not increase. Wood (1974) varied fundamental frequency and phonetic class in a correlated manner rather than independently and found that reaction times were significantly shorter than on the two-dimensional test. He concluded that linguistic and non-linguistic information are extracted separately and simultaneously.

The Stevens and House model proposed that the acoustic attributes arising from peripheral analysis are placed in temporary store while the synthesis "loop" is traversed and phonetic analysis is completed. Hence the model implicitly assumed that auditory analysis is performed at the peripheral level and phonetic analysis beyond. The temporary store component must be an auditory store as no phonetic coding has yet taken place. The nature of auditory memory (to be discussed below) presents problems if such is assumed to exist at such an early stage in the speech perception process.
It is more appropriate to divide the study of the processes involved into auditory and phonetic processes with their respective stores, rather than peripheral and central processes. The results considered above, under the heading "peripheral analysis", include the complex analysis that is performed in the auditory system regardless of site.

Temporary Store — One of the alternatives that a model of speech perception must specify is whether processing is carried out in a series of stages, i.e. auditory analysis, phonetic analysis, morphological analysis and so forth, or whether processing of these various levels occurs in parallel. The Stevens and House model allows for parallel processing and therefore some form of store must be utilized during analysis. As mentioned above the temporary store must be auditory in nature. "The auditory store, or trace, is usually assumed to be rather like an echo: a faint simulacrum, if not of the waveform, at least of its neural correlates at an early stage of processing [p. 262]" (Studdert-Kennedy, 1976). The echo is an analog of the original and decays rapidly. If another sound arrives before decay is complete, the trace is immediately displaced. As revealed by the results of categorical perception experiments, the auditory store is utilized when discriminating stimuli that are phonetically members of the same class yet acoustically different. The resulting discrimination functions revealed that this memory is strong for vowels yet weak for consonants.

Auditory memory can be divided into two stores. Store I is very brief, while Store II can last several seconds. Decoding of a CV syllable requires analysis of information spread throughout the syllable.
Therefore syllable information (consonant and vowel information) must be processed in parallel and Store I must last over syllable length, probably 200-300 msec (Pisoni & Tash, 1974; Liberman, 1970). However, studies that sought to specify the duration of this store by determining the time required to free a target CV from interruption by a masking syllable produced widely varied results, probably due to the variety of masking conditions imposed. Syllable processing time probably varies with attentional control, speaking rate and other factors.

Store II of auditory memory probably lasts several seconds. It was studied in detail by Crowder and Morton (1973), under the term "precategorical acoustic storage" (PAS). Three effects are typically evidenced in PAS:

1. Error increases from the beginning to the end of a list that is recalled, with a slight drop on terminal items (recency effect).
2. The terminal drop is increased if the list is presented by ear rather than eye (modality effect).
3. The recency effect is reduced if the auditory list is followed by a redundant spoken suffix that indicates the subject should begin recall (suffix effect).

These three effects were shown to occur when CV lists consist of members differing in vowel alone or vowel and consonant. The effects do not occur for CV or VC syllable lists where members differ in voiced stop consonants (Cole, 1973). Crowder (1971a) concluded that vowels receive representation in PAS, while voiced stop consonants do not. However, in a study by Pisoni and Tash (1974) subjects were required to make same-different judgments for pairs of stimuli drawn from the /ba/-/pa/ continuum. "Same" reaction times were faster when the pairs were identical than when they were acoustically distinct. "Different" reaction times decreased as the acoustic differences between items in a pair from different categories increased. Hence it appears that at least some trace of consonants must reside in the store. In addition, Darwin and Baddeley (1974) demonstrated the recency effect for tokens of a stop CV, /ga/, and the two CV syllables lja/ and /ma/. They also eliminated the recency effect for vowels by reducing their duration (30 msec of a 60 msec CV syllable). They concluded that items in PAS are not reliably accessed when they are similar acoustically and that the consonant-vowel distinction is, for the large part, irrelevant.

During recall of an eight-item list, the degree of interference will decrease as time between items decreases and, if the suffix is presented two seconds after the last item, the suffix effect virtually disappears. Crowder (1971b) postulated that an active rehearsal process at the articulatory level is operative and therefore that performance will improve as time allowed for PAS decay increases. Crowder stated, then, that decay does not occur in PAS because the subject can check rehearsal of items against his auditory store. If another stimulus arrives before this verification is complete the error probability will increase. "A preliminary articulatory, if not phonetic, decision must be made before PAS is lost if rehearsal is to permit cross-check with the store [p. 267]" (Studdert-Kennedy, 1976).

Although Crowder's hypothesis may appear to support the temporary store component in the Stevens and House model, there exists one important difference.
Precisely the information that would need to be represented in PAS in order to resolve invariance (i.e. stop consonant data) appears to be poorly represented in auditory memory. Consonantal auditory memory is probably much less than a second. In order to allow the synthesis loop to operate, the store would need to be of longer duration. The PAS rehearsal loop may go into operation at an early stage and prolong the auditory memory. Consonantal information would still decay though, and there is no evidence of early operation of this loop. Studies therefore began to reconsider the serial-parallel distinction and began to distinguish the types of information processed auditorily and phonetically and their interaction.

Wood (1975) performed four experiments to distinguish between auditory and phonetic levels of processing. Two techniques were utilized; the first measured reaction times on a speeded classification test and the second measured average evoked potentials during the same tests.

Experiment I measured reaction times for the identification of a phonetic dimension (place of articulation for voiced stop consonants /ba/ and /ga/) and an auditory dimension (fundamental frequency). The evoked potentials for each cerebral hemisphere were also recorded. The stimuli varied between two values for each dimension and subjects performed a two-choice identification task. In the control condition only the target dimension varied; in the orthogonal condition the target dimension and the irrelevant nontarget dimension varied orthogonally. Results showed there was a substantial increase in reaction time from the control condition to the orthogonal condition for place, but there was minimal difference for pitch.
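As a rough illustration of the control and orthogonal conditions just described, the sketch below builds the stimulus set for each. The pitch labels (`"low"`, `"high"`) and the function names are assumptions introduced here for illustration; they are not Wood's actual stimulus parameters.

```python
from itertools import product

# Two values per dimension, as in a two-choice identification task.
# The pitch labels are hypothetical placeholders.
PLACE = ("ba", "ga")
PITCH = ("low", "high")


def control_set(target, fixed_value):
    """Control condition: only the target dimension varies.

    fixed_value is the constant value of the nontarget dimension.
    Each stimulus is returned as a (place, pitch) pair.
    """
    if target == "place":
        return [(place, fixed_value) for place in PLACE]
    if target == "pitch":
        return [(fixed_value, pitch) for pitch in PITCH]
    raise ValueError("target must be 'place' or 'pitch'")


def orthogonal_set():
    """Orthogonal condition: both dimensions vary independently."""
    return list(product(PLACE, PITCH))
```

In the control condition the listener hears two stimuli; in the orthogonal condition the same two-choice judgment must be made over four stimuli, two of which differ only on the irrelevant dimension.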
There was no difference in the average evoked potentials for processing of pitch and place at right hemisphere locations, but significant differences occurred at every electrode location of the left hemisphere. The conclusions were that an auditory-phonetic distinction can be made, that the phonetic dimension required processing beyond that used for the auditory dimension, and that such processing took place in the specialized left hemisphere.

In Experiment II two auditory dimensions, pitch and intensity, were compared to determine whether the results from Experiment I did indeed occur due to different levels of processing required for phonetic and auditory processing. The reaction times for both auditory dimensions increased significantly from the control to orthogonal condition. The evoked potentials for pitch and intensity did not vary at any electrode location. These results support the view that phonetic processing took place for the phonetic dimension, 'place', in Experiment I and indicate that the auditory dimensions, pitch and intensity, can interfere with each other's processing when irrelevant variations occur.

Experiment III was designed to determine whether the processes involved in speech perception that extend beyond those involved for nonspeech perception were auditory or phonetic in nature. That is, were mechanisms responding to specific acoustic events contained in the speech signal or were abstract phonetic features extracted from the speech signal? The stimuli varied again on the auditory dimension of pitch and also varied on the isolated second formant transitions from the stimuli used in Experiment I (/ba/ and /ga/).
If the results from Experiment I were due to processing of second formant transitions alone, then the results for Experiment III should have been identical. The reaction times increased for both dimensions from the control to orthogonal condition. The average evoked potentials revealed no significant differences at any electrode location. These results resemble those from Experiment II. Hence the perception of second formant transitions in isolation is closely related to perception of nonlinguistic dimensions. In order to evoke phonetic processing the phonetic context is required.

Experiment IV investigated the dimensions of pitch contour and pitch. Pitch contour was chosen because it is considered to cue linguistic distinctions but it is not under context-conditioned variation. The reaction times and error rates increased substantially from the control to orthogonal condition. In addition the reaction time was significantly larger for pitch contour than for pitch under both conditions. The evoked potentials revealed no differences between dimensions at any location. Reaction times were faster for pitch than for place in Experiment I, or for pitch contour in Experiment IV. However, in Experiment IV this occurred in both conditions, whereas in Experiment I, only in the orthogonal condition. Hence this difference reflects different temporal requirements for the judgments, rather than interference between dimensions.
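The interference discussed across these four experiments can be expressed as the increase in mean reaction time from the control to the orthogonal condition, computed separately for each target dimension. The following is a minimal sketch under that assumption. The function names and the fixed criterion (a stand-in for a proper significance test) are hypothetical, and the reaction times in the usage note are invented for illustration; they are not data from Wood (1975).

```python
def interference(mean_rt):
    """Return orthogonal-minus-control reaction time (msec) per dimension.

    mean_rt maps a dimension name to a dict with the keys
    "control" and "orthogonal", each holding a mean RT in msec.
    """
    return {dim: rt["orthogonal"] - rt["control"] for dim, rt in mean_rt.items()}


def is_asymmetric(effects, criterion_msec):
    """True when exactly one of two dimensions exceeds the criterion.

    The fixed criterion is only a placeholder for a significance test.
    """
    exceeds = [effect > criterion_msec for effect in effects.values()]
    return len(exceeds) == 2 and sum(exceeds) == 1
```

With invented means of 400 and 450 msec for place and 380 and 383 msec for pitch, `interference` gives 50 and 3 msec, and `is_asymmetric` with a 20 msec criterion is true, the kind of pattern reported for Experiment I. If both effects were large, as reported for Experiment II, the function would return false.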
Wood noted that the auditory and phonetic distinction had been observed in dichotic listening results, the "phoneme boundary effect", speeded classification tasks, average evoked potential recordings, and adaptation studies. He believed that there exists an empirical basis for this distinction and that it is now necessary to investigate the nature of phonetic processing. The results from Experiments I and III indicated that phonetic processing consists of more than simple detection of certain acoustic features. The isolated second formant transitions were not perceived in the same manner as when they are embedded in a phonetic context and serve as cues for a linguistic distinction. In view of recent evidence that certain acoustic features have been isolated that do correspond with linguistic features, it seems plausible that phonetic processing involves mechanisms that extract relatively invariant acoustic properties from the speech signal and more abstract mechanisms that resolve context-variable acoustic cues.

Wood cites Studdert-Kennedy (1976) on the subject of adaptation studies: "We should not discount the possibility that the auditory-phonetic distinction is misleading in this context, and that the adapted systems are both auditory and phonetic. If, for example, the output from the auditory analyzers tuned to speech tunneled directly into phonetic processors so that adaptation of one set entailed adaptation of the other, a convincing separation of the two would be difficult to demonstrate [p. 278]." (Studdert-Kennedy then reported the study by Cooper (1974b) mentioned above as support for this theory.)
In addition to distinguishing between auditory and phonetic processing levels, these experiments yield information on the relationship of phonetic processes with the general auditory system. Wood rules out the possibility that processing follows a strict serial order, with auditory information processed at one level followed by phonetic processing at another level. Wood (1974) proposed a serial-parallel model with three components. A peripheral component performs preliminary analysis of all signals, a central auditory component processes nonlinguistic auditory information and a central phonetic component extracts phonetic features from the results of preliminary analysis. The two central components perform in parallel but are dependent upon input from the peripheral component. The auditory level of processing then consists of two parts, peripheral analysis and a central component working in parallel with the phonetic component.

Pastore, Ahroon, and Puleo (1975) disagreed with the basic assumption underlying Wood's (1974) research into the auditory-phonetic distinction. The assumption is that because certain aspects of auditory stimuli are found to be important when discriminating different phonetic stimuli, the use of these aspects in auditory stimuli as variables in a study implies a phonetic causality in the results. The authors therefore believed that Wood did not observe any phonetic processing and the results could be explained in auditory terms. Pastore et al. replicated Wood's experiment using two tone pips varying in pitch and duration. The reaction times and error rates were determined. The results followed the same pattern as Wood's results.
Reaction times for correlated conditions were faster than those for the control condition and only the reaction times for the orthogonal tone pip condition were slower.

In retrospect, a comparison of the results from Experiments I and III in Wood (1975), and the significant differences in average evoked potential values that resulted between these experiments, show that Wood revealed that a type of processing more complex than auditory processing, and involving the left hemisphere, took place. Furthermore, the results indicated the importance of the acoustic dimension existing in a phonetic context for the processing to take place in the left hemisphere. Therefore the results of Pastore et al. may indicate that certain acoustic features may be processed in a manner similar to speech, just as nonspeech was shown to be categorically perceived, yet something more is involved to evoke phonetic processing. Wood has shown that phonetic processing exists, but in order for it to be utilized, the signal must be judged as having sufficient speech-like features.

Blechner, Day and Cutting (1976) varied nonspeech stimuli along two dimensions: rise time and intensity. Having shown that sawtooth waves differing in rise time (not heard as speech) could be categorically perceived (Cutting & Rosner, 1974) and could be selectively adapted (Cutting, Rosner & Foard, 1976), both characteristics of speech stimuli, the authors were interested in determining whether these stimuli would produce results similar to those for speech on a speeded classification task. The stimuli were the same as those used by Cutting and Rosner (1974) except that they varied in two levels of intensity. The results were identical to those of Wood (1975).
Reaction times increased significantly for the rise time dimension in the orthogonal condition. Thus there was an asymmetrical pattern of interference; intensity variation interfered with the processing of rise time, while the reverse was not true. The authors proposed that neither a strict serial nor a strict parallel processing model could account for their results and that factors such as stimulus discriminability and task characteristics may affect the mode of processing. In this experiment nonspeech stimuli varying in rise time and intensity produced the same result as speech stimuli. Therefore two auditory dimensions produced asymmetric interference and the authors felt that the auditory system should be considered to possess various processing levels. The linguistic-nonlinguistic distinction cannot be overlooked as there is evidence for left hemisphere processing. But Blechner et al. proposed that less emphasis should be given to "special" speech processing and that the linguistic-nonlinguistic dimension is not an accurate way of describing the non-acoustic factors that determine certain perceptual processes. Instead, they proposed that speech is an important hierarchically coded system of sound where sounds can be recoded into higher order linguistic units. The "plucks" and "bows" used in this experiment are also lower level components in a highly structured sound system, music. The need for certain perceptual processing may be determined by non-acoustic factors and therefore studying the linguistic-nonlinguistic difference may not be an accurate method of studying these factors.
The type of processing results from the interaction of the acoustic nature of a sound and the manner in which it may be coded in a hierarchically organized sound system.

Preliminary Analysis — At this stage in the Stevens and House model the acoustic attributes that arise from peripheral processing undergo analysis and some of those that correspond more invariantly with features are decoded into such. This results in a partial feature specification of the segment.

Support for this stage of analysis comes from several sources. The categorical perception studies, which revealed that certain acoustic dimensions produce this type of processing whether they are contained in speech or nonspeech, support the view that there can exist acoustic detection of some features. Such dimensions as rise-time (Cutting & Rosner, 1974), noise lead (Miller et al., 1976) and the presence or absence of sharp spectral change at voice onset time (Stevens & Klatt, 1974) all produce categorical perception in nonspeech and cue respectively the features of affricate/fricative distinction, voicing, and the voiced-voiceless distinction in speech. In addition, Stevens (1971) showed that a sharp increase in intensity at a specific frequency range in the spectrum revealed the presence of a stop consonant. As mentioned earlier, Kuhn (1975) may have found evidence that a more context-dependent feature, place of articulation, can be acoustically analyzed directly from the speech signal.

The studies on the auditory-phonetic processing distinction support the view that there are various levels of auditory processing.
Central levels of auditory processing perform more complex analysis on the results from preliminary analysis. The preliminary analysis component in the analysis by synthesis model would correspond to a central auditory component, such as described by Wood (1974). Forming the distinction between a central auditory and a central phonetic component will explain the results of Pastore et al. and Blechner et al. that seemed to indicate only auditory processing took place. In these two studies the stimuli underwent extraction of acoustic attributes that were decoded into "feature" information, but in the absence of phonetic context they were not judged as "speech", and the psychological transformation from sound into "named" sound, which would engage the central phonetic component, did not take place. That phonetic processing exists cannot be denied. It involves the more abstract transformation of the invariant acoustic attributes into features.

Control and Generative Rules — These two components are the "hypothesis" of this model and by their very nature defy investigation into their existence. The control component uses the partial feature specification that results from preliminary analysis and forms a hypothesis as to its identity. It has available for its use information from analysis of adjacent segments, analysis of earlier parts of the utterance, the lexicon and results from the comparator. Knowledge of the lexicon provides information about what sequences may be expected in the particular language. Little is known about the form that the lexicon assumes, except that it would be unreasonable to expect that every possible sequence for every speaker under every condition could be realized, let alone stored.
Therefore a model with recourse to a generative rule system is required, rather than one where direct matching occurs between the input and a lexicon.

The hypothesized representation is transmitted to the generative rules in the form of morphemes, segments and features. The generative rules yield a representation of instructions to the articulators that would be necessary to generate the utterance. Stevens (1960) recognized that the representation could exist at one of three levels (acoustic, anatomical or neurophysiological) but did not commit the model to one. The generative rules could be used to generate actual speech, given a set of phonemes as input and actualizing the peripheral oral musculature.

The control unit and generative rules form the phonetic processing component in this model. In order for a signal to be processed in this manner the distinction between speech and nonspeech must have been made prior to the activation of the synthesis loop. It is necessary that peripheral auditory analysis be performed before preliminary analysis (or central auditory analysis) and the phonetic processing. The model operates in serial order with some parallel recourse between central auditory and phonetic processing, in much the same manner as that proposed by Wood (1974) and Blechner et al. (1976).

The Comparator — The comparator calculates the degree of match between the articulatory representation and the acoustic attributes of the signal under analysis that reside in temporary store. The degree of match is communicated to the control component and the hypothesis is either accepted or rejected. If the hypothesis is rejected the control component uses the error detected in the comparator to form a new hypothesis.
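The hypothesize, generate, and compare cycle described above can be sketched as a small loop. This is only a toy illustration under stated assumptions: the two-element attribute vectors in `TEMPLATES`, the squared-distance rule in `revise`, and all function names are invented here and are not part of the Stevens and House model, which does not specify the form of the representation.

```python
# Made-up attribute vectors standing in for the output of the generative rules.
TEMPLATES = {"ba": (1.0, 0.0), "da": (0.5, 0.5), "ga": (0.0, 1.0)}


def generate(hypothesis):
    """Generative rules (toy): map a hypothesized segment to predicted attributes."""
    return TEMPLATES[hypothesis]


def compare(stored, predicted):
    """Comparator (toy): per-attribute error between the store and the prediction."""
    return tuple(s - p for s, p in zip(stored, predicted))


def revise(hypothesis, error):
    """Control (toy): use the error to pick the candidate nearest the stored signal."""
    target = tuple(p + e for p, e in zip(generate(hypothesis), error))
    return min(
        TEMPLATES,
        key=lambda k: sum((t - v) ** 2 for t, v in zip(target, TEMPLATES[k])),
    )


def analysis_by_synthesis(stored, hypothesis, tolerance=1e-9, max_passes=10):
    """Traverse the loop until the comparator reports a match.

    Returns (accepted hypothesis, passes used), or (None, max_passes)
    if no candidate matches within the allowed number of passes.
    """
    for passes in range(1, max_passes + 1):
        error = compare(stored, generate(hypothesis))
        if max(abs(e) for e in error) <= tolerance:
            return hypothesis, passes
        hypothesis = revise(hypothesis, error)
    return None, max_passes
```

In this sketch a correct initial hypothesis is accepted after a single pass through the comparator, while an incorrect one is revised from the error signal and checked again.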
The loop is traversed until a match is successfully achieved. The authors proposed that because much acoustic information is already in a form that is related to features, the initial hypothesis will usually be correct, yet the matching process is always employed as a check.

Stevens and House provided essentially two direct sources of evidence to support comparison with articulation during perception. When a subject is required to repeat a CV nonsense syllable from an ensemble of varying size, Saslow (1958) found that the reaction time is independent of the size of the ensemble and therefore postulated that the mechanism required does not depend on anticipated features but that a direct recoding is made from the stimulus pattern to the motor response.

Kozhevnikov and Chistovich (1965) examined the course of events that take place as a speaker repeats an utterance originated by another speaker. They measured tongue, palate and lip contacts and nasal closure as they occurred in the original production and the repetition. The experiments used VCV and CV syllables with the consonant changing values. The repeater was found to constantly correct and refine his articulatory position for the consonant as more information became available to him from the speaker. The repeater approximates the consonant he is to repeat, but does not release it until the speaker has effected closure and release, and the repeater's articulation correctly approximates the position of closure. (About 100 msec later the repeater releases his consonant.) Kozhevnikov and Chistovich hypothesized that the states that the producer goes through in this case parallel the perceptual mechanisms that are taking place. Each new state depends on previous analyses and modifies the next state.
For this to be the case, they argued, there must be a common underlying feature set, and these features form the substructure of phonemes, in articulatory terms anyway. Kozhevnikov and Chistovich proposed that the speaker refers to an "inner state" that is reflective of the information received up to a specific time and that is continually reviewed as more information is added.

The research reviewed above also supports reference to articulation during perception. The different values that speakers of different languages produce for phonetic classes are also reflected in the exact boundary position that they determine in perceiving phonetic classes (Lisker & Abramson, 1970). Similarly, after adaptation with a token from one phonetic class, subjects not only shift their perceptual boundary, but a shift also occurs in the value that they produce (Cooper, 1974b). Hence the reference to articulation during perception is ongoing. Certain relationships are not established and then left to be used. The link is actively reinforced with each production and perception.

The research of Kuhn (1975) also supported the view that there exists reference to an articulatory representation during perception, but the inference here is that synthesis is not required to accomplish the link, as the information can be directly extracted through auditory analysis. Stevens (1972) studied spectrographic data and computations from a model of the vocal tract to describe possible acoustic correlates of phonetic features. He found that spectral patterns associated with place of articulation do not change continuously but in quantal steps, within which a change in the point of constriction produces little acoustic effect.
These plateaus are bounded by abrupt acoustic discontinuities. These plateaus correspond to place of articulation in many languages and therefore the origin of phonetic categories may lie in the human vocal tract.

Stevens and House stated that a natural explanation for the "speech perception mode" arises if the model contains a component that establishes a correspondence between auditory and articulatory patterns.

"After processing by peripheral structures, some attributes of an incoming auditory pattern are then, as it were, looked up in the dictionary of auditory-articulatory correspondences. If a correspondence is found - i.e., if it is established that the stimulus is of a class that could have been produced by the human articulatory mechanism - then the speech perception mode is brought into play, and in subsequent processing, use is made of articulatory information derived from the dictionary [p. 54]."

The requirements that the signal need meet to enter "speech perception mode" must equal those needed for a signal to be processed phonetically. From the foregoing discussion on phonetic processing it is obvious that detection of a specific acoustic dimension "peculiar" to speech is not sufficient for phonetic processing to occur. That is to say, nonspeech containing an acoustic dimension found in speech will not be phonetically processed, yet it will undergo complex auditory analysis and reveal some characteristics similar to speech. The acoustic dimensions that characterize speech need to exist in phonetic context, and hence in their natural state, before they are processed as speech. The distinction is a psychological one, a division of stimuli into natural classes, like those found in general psychology.
In the case of speech the natural classes form auditory concepts, or the names of sounds. These names exist in the form of symbols (phonemes) or in the form of acoustic patterns. It follows that division of acoustic stimuli into these classes relies upon identification of the stimuli as a distinct product of the vocal tract. Stevens and House quote Hayek (1962) as writing [p. 362] that "identification" means that "some movement (or posture, etc.) of our own which is perceived through one sense is recognized as being of the same kind as the movements of other people which we perceive through another sense [p. 54]."

Chapter 3

THE USE OF THE REVISED MODEL BY THE CHILD

3.1 The Literature on Children's Perception and Production, Re-reviewed

3.1.1 Literature Evaluating the Relationship of Perception to Production

Experiments that have been performed using adults as subjects to investigate the processes involved in speech perception have been paralleled using infants and children as subjects. Such categorical perception studies revealed that by one month of age the human can perceive categorically acoustic stimuli that vary along the acoustic dimensions that specify the features voicing and place of articulation (Eimas et al., 1971; Morse, 1972). The infants identify the stimuli about the same boundary as determined by the adult, and their discrimination functions reveal sharp peaks at this boundary and troughs within a phonetic class. In view of the total knowledge about categorical perception, it is obvious that the infant utilizes an innate ability of his auditory system to perform in this manner. No more than auditory analysis, the same as that used for nonspeech stimuli containing similar information, is carried out.
At the age of one month, or in the case of the chinchilla, no knowledge of language as a system can be expected. Until the child has experience in producing sound combinations and can attend to the language system, he has not realized his potential as a user of that language and the phonetic percept cannot be realized.

Having seen that adults and preverbal infants perceived categorically a series of synthetic speech stimuli that varied in VOT, Wolf (1973) assessed the ability of kindergarten and second-grade children to do the same. The two groups of subjects performed an identification task and a same-different AX discrimination task. The stimuli were nine tokens of /ba/ or /pa/ with VOTs ranging from -10 msec to 70 msec. Both groups determined a phoneme boundary that agreed with adult values reported by Lisker and Abramson (1970). The obtained discrimination functions followed the predicted discrimination functions based on absolute categorical perception, but the obtained discrimination was somewhat lower at the peaks and higher at the troughs.

A second experiment was performed to test the generality of the findings from Experiment I and to determine whether the task order had affected discrimination of the speech stimuli. The stimuli were a set of /da/ and /ta/ tokens varying in VOT, and the discrimination task was presented before the identification task. Again the groups determined phoneme boundaries that did not differ significantly from each other or from the values determined by adults. The discrimination functions were very nearly categorical, but again discrimination was poorer than predicted in the region of the phoneme boundary and better than predicted at the end of the continuum.
The order in which the tasks were presented did not affect discrimination.

Comparison of the children's identification and discrimination functions with those of the adult revealed that children identify the stimuli only slightly less consistently than the adult and at very similar phoneme boundaries. Their discrimination, though, is consistently poorer than the adults' at peak levels. Wolf explained that the children's discrimination was poorer due to extra-linguistic aspects of the task. The subject was required to identify two stimuli and remember the two successive stimuli before making a same-different judgment. Therefore effects of memory, attention and the cognitive same-different judgment may have affected the discrimination results.

The nature of the discrimination task may explain the children's improved discrimination of items within a phonemic class. The AX same-different discrimination task permits direct comparison of the stimuli from auditory store without coding into phonetic classes. Supposing that at least some traces of consonants are represented in this store, it follows that their discrimination would be better in such a task than in an ABX discrimination task. Regardless of these minor variations, the results of this experiment demonstrated that, like infants and adults, children also perceive the VOT continuum categorically and therefore the mechanisms involved are biologically innate.

Chapter 2 reviewed studies of children's acquisition of their phonological/phonetic knowledge.
These studies indicated that children's perceptual abilities are more advanced than their productive abilities and hence implied that the analysis by synthesis model of speech perception could not be operative during this stage of language acquisition. The evidence will now be reviewed again and reconsidered in view of the revised analysis by synthesis model to determine whether the children could be using such a model for speech perception.

In the studies by Shvachkin (1973), Garnica (1973), and Barton (1975b) the child was required to make a distinction between phonological contrasts that resulted in a meaningful difference between stimulus items. Hence the child was making a phonetic discrimination in a linguistically relevant manner and was not just listening to the "music" of language. The results of adaptation and categorical perception studies indicate that the child can make discriminations from a very young age based on the results of auditory analysis. The type of processing that the child is performing is revealed by the nature of the task and the nature of the distinctions to which the child must attend. In order for the child to make a distinction that forms a phonological contrast, that distinction must have been conceptualized in the child's system and hence have achieved the phonetic percept. It may be expected from the outset that as a child gains knowledge about production, this knowledge will aid in his conceptualization of sounds and their function in the system and hence aid in phonetic processing also.

It has been shown that Shvachkin's results do not have the universal validity that he supposed they had.
Nor can this piece of research be reliably assessed, due to its methodological weaknesses and lack of quantitative data. It produces two interesting facts though: the children did perceive contrasts before they could produce them and, although the results were not invariant and universal, some general trends appeared when compared to Garnica's work.

The fact that the children could differentiate stimuli before they could produce these differences can be explained by analyzing the task they performed. The children all had some articulatory skills, they were equipped with the ability to make auditory discriminations and they indicated that they were aware of the role of these distinctions in the language system. From Shvachkin's vague account it is probable that in many instances the children were taught the phonological distinction before being tested. The children learned the association between the stimulus and the corresponding article and in the task relied upon this learned association and the abilities of the auditory system. Subjects were successful as long as the opposition being tested involved acoustic differences that could be assessed either relatively directly by auditory analysis or from the child's existing phonetic system. In this manner, the child underwent a "forced" and probably premature learning experience. During the normal course of learning such distinctions, it is likely that matching of a perceived distinction with different lexical items takes place. The child learns the function of the distinction within his language system.
When the child has achieved the ability to produce a distinction, the perceived phonetic distinction is matched with the articulatory distinction and the information required by the comparator unit of the analysis by synthesis model is established. It is expected that the ability to perceive distinctions is available to the child before the ability to produce them because of the innate characteristics of the auditory system. When the corresponding articulatory data are also available, phonetic processing no longer occurs in a "trial and error" manner but is more consistent. This notion will be discussed further with reference to Barton's study (1975b).

That there exist differences in the order of phonetic acquisition between languages is not surprising. The phonemes contained in different languages are composed of different combinations of features. The features that differentiate two phonemes must be discriminated with respect to the rest of the phonetic information in the stimulus complex. Certain combinations of acoustic attributes may result in a stimulus complex that contains non-contradictory information. A feature contained in such a stimulus may be readily discriminated and therefore discriminated by the child before other stimuli containing that feature. Combinations where there exist conflicting cues about a feature's identity may be expected to be acquired last by the child. Nor can it be expected that a specific feature will be perceived in all phonetic contexts instantaneously.
The acoustic attributes that correspond more or less directly with certain features may be expected to be perceived earlier than those that have not been shown to have any type of invariant relationship. These attributes will be perceived most easily in phonetic contexts where there exist no other acoustic cues that contradict their specification of the feature. For example, the presence or absence of a stop consonant, cued by a sharp increase in spectral energy after a period of very low energy, should be an early feature acquisition. In both Shvachkin's and Garnica's studies detection of the presence or absence of a stop consonant preceded most other feature distinctions.

Barton's (1975a) legitimate criticism of the criterion used by Garnica to determine success renders Garnica's results uninterpretable. However, again the same issues can be raised as for Shvachkin's work.

Barton's later study (1975b) produced three interesting results. First, the correct responses were more consistent for words that were learned than those that were taught. Second, a consistent pattern of results existed across all subjects. The individual differences were a matter of degree, not of being able to perceive a contrast or not. Third, features were not discriminated equally well when combined with different combinations of acoustic information for specific contrasts. That is, the feature 'voice' was more easily detected in the /k-g/ pair than in the /f-v/ pair.

Discrimination of learned words involved contrasts that the child was able to articulate. Presumably the articulatory distinction aided in perceiving the phonetic distinction.
The fact that the subjects could be taught the words and then perform the task indicates again that the ability to perceive distinctions precedes the ability to produce them. By the time that the child could produce the distinction it might be expected that the ability to perceive it was simply more established. However, once the child can articulate the contrast, the phonetic "meaning" of the contrast has been firmly established and the phonetic percept has been achieved. Barton's study reveals the difference between the ability to make a phonetic discrimination based solely on the results of auditory analysis and the ability to make one based on acoustic-articulatory correspondences.

The fact that the subjects could all perform the discriminations, but that individual performances differed in their consistency, supports the notion that all normal children have the same innate ability to process stimuli in their auditory system, but that each individual learns to use these abilities at his own rate and in his own order, i.e. by using individual strategies.

The third consideration reveals that phonetic processing involves learning to recognize the existence of features in a complex, rather than in isolation. The auditory system is able to identify and discriminate isolated acoustic cues that resemble those found in speech, but the recognition of the total stimulus complex evokes use of the left hemisphere when the complex is recognized as a component of the language system that needs decoding. Such recognition is achieved when the auditory-articulatory correspondence is discovered by the child.
The correspondence may be discovered before the child has acquired the motor skills required for articulation. But when the articulatory skills are developed, the link is established that will permit the acoustic patterns to be traced to their articulatory source and hence consistently phonetically perceived.

Kornfeld's (1971) hypothesis that the child produces what he perceives, and does so according to a system that is more abstract than the adult's, is worthy of careful consideration. The evidence for this theory comes from spectrographic analysis of children's productions, which reveals that the children mark distinctions that are not heard by the adult. The adult would assign the production to a different phonetic class, but the acoustic data show that the child has marked distinctions that are not found in either phoneme. It is difficult to understand how a child whose auditory system has been shown to perform the same acoustic analysis and make the same discriminations as the adult could hear both the adult production and his own production as anything other than that heard by the adult. In this case the child, using his abstract system, would perceive the adult form and his form as being identical. His production matches his perception. Several difficulties arise from such a theory. First of all, if the child hears both forms as being identical, what promotes the development of his system steadily towards that of the adult? Related to this question, Kornfeld maintained that a child rejects the adult's imitations of his productions because the adult does not mark the essential differences that the child perceives.
But numerous reports on the child acquiring language state that the child is aware that his productions are not in adult form. If this is indeed the case, then the child can perceive the difference between his form and the adult form, and he rejects the adult imitation because he can detect the difference and expects the adult form from the adult. This behaviour is repeatedly reported in Smith (1973).

Kornfeld (1976) elaborated upon the reasons for the child's use of a more abstract system. She supposed that certain combinations of acoustic information can be ambiguously interpreted during analysis and therefore result in a "misperception". In other words, the child's perceptual abilities are developing as well as his productive abilities. His more abstract perceptual system is explainable in terms of the interaction of acoustic attributes in a stimulus complex.

Studying words with initial consonant clusters, Kornfeld cited examples of phonetic contexts where there exist conflicts of acoustic cues. She argued that, lacking the adult's knowledge of morpheme structure constraints and phonological redundancies, the child is more reliant upon phonetic information during speech recognition. The misperceptions result not from imperfect "tuning" at the level of auditory analysis but from conflicts that arise at the level of phonetic processing.

Kornfeld outlined the morpheme constraint knowledge that the adult has available for his use when decoding C1C2V and C1C2C3V clusters. The morpheme structure for English immediately predicts that for a three-segment cluster the first segment must be /s/, the second a voiceless plosive, and the third either /r/ or /l/.
For two-segment clusters not containing an initial /s/, the first segment, a plosive, may be voiced or voiceless, and the second segment is an /r/ or /l/. In this paper, Kornfeld discussed the perception of the voicing contrast because there is much information available on its acoustic-phonetic relationship.

In addition to the determination of absolute VOT, Kornfeld listed five other acoustic patterns that have been shown to serve to judge voicelessness in a segment. These include cues such as the absence of a rapid spectral change at the onset of voicing (see Stevens & Klatt, 1974, above), higher onset frequency of F1, the presence of aspiration, greater intensity and duration of the burst phase and a high pitch on adjacent vowels. In different contexts these dimensions vary in their effectiveness to cue the voiceless distinction. Those cues that can override other conflicting cues would be used primarily to make the phonetic decision.

In CV contexts where the consonant is a stop, it is primarily the VOT and detection of a rapid spectral change at voicing onset that are used to make the voiced-voiceless contrast (Stevens & Klatt, 1974). In #stop-/r/ clusters, the presence of the /r/ causes the VOT for the voiceless stop to be increased by 30-50 msec. This change also causes the F1-transition to be lengthened, which lowers the F1 onset frequency and therefore produces cues for a voiced plosive. Kornfeld argued then that if the child is using F1-transition data as a primary cue, he will perceive the stop as voiced. She supplied data from Ingram (1974) and Smith (1973) that reveal that the child substitutes a voiced stop singleton consonant for such clusters.
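
The morpheme structure constraints on initial clusters summarized above lend themselves to a small illustration. The following Python sketch is not part of Kornfeld's account; the function name and the segment sets are assumptions introduced only for illustration. It encodes just the two constraints as they are stated in the text, so it accepts some sequences (for example /tl/) that English does not in fact permit, and it does not cover two-segment clusters beginning with /s/.

```python
# Illustrative sketch only: the names and segment sets below are assumptions,
# not taken from Kornfeld (1976).
VOICELESS_PLOSIVES = {"p", "t", "k"}
VOICED_PLOSIVES = {"b", "d", "g"}
LIQUIDS = {"r", "l"}


def is_permitted_cluster(cluster):
    """Check a word-initial cluster against the two constraints stated in the text.

    Three segments: /s/ + voiceless plosive + /r/ or /l/.
    Two segments (no initial /s/): voiced or voiceless plosive + /r/ or /l/.
    Anything else is outside the stated constraints and returns False.
    """
    segments = tuple(cluster)
    if len(segments) == 3:
        first, second, third = segments
        return first == "s" and second in VOICELESS_PLOSIVES and third in LIQUIDS
    if len(segments) == 2:
        first, second = segments
        plosives = VOICELESS_PLOSIVES | VOICED_PLOSIVES
        return first in plosives and second in LIQUIDS
    return False


for candidate in ("str", "sbr", "gr", "kl"):
    print(candidate, is_permitted_cluster(candidate))
```

On this reading, an adult who hears /s/ followed by a plosive and /r/ can settle the voicing of the plosive from the constraint alone, whereas a child without this knowledge must rely on the acoustic cues discussed here.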
Stevens and Klatt (1974) explained that the lengthening of the VOT in such clusters corresponds with an increased duration of the /r/ segment. The sonorant is lengthened so that the voicing onset is not concurrent with the rapid formant motions of the sonorant-vowel transition and does not falsely cue a voiced segment. Hence conflicting cues for voicelessness, i.e. increased VOT and a rapid spectrum change at the onset of voicing, are resolved by an articulatory adjustment, lengthening of the sonorant. The child who has not achieved the articulatory skills necessary to produce a cluster cannot use such knowledge to perceive unambiguously the cluster produced by an external source, the adult, let alone modify his own production based on auditory feedback so that he perceives his productions as correct. While Stevens and Klatt provided evidence that the F1-transition cues and the VOT cues are unambiguous in nature, Kornfeld provided evidence that the F1 onset frequency is a conflicting cue. It can only be assumed that the child will use whatever cues he has previously found to be successful for making phonetic decisions. Until the child can relate the stimulus complex to the articulatory adjustments required to produce it, his analysis may rest upon ambiguous cues.

Kornfeld reported that children often analyze and produce an adult cluster as an affricate or a single initial consonant marked by heavy aspiration. Haggard (1973) and Klatt (1973) showed that in /tr/ clusters, the delayed onset of voicing may be accompanied by turbulence prior to the first glottal pulse on the following vowel. Kornfeld proposed that the child may turn to the turbulence and loudness as cues when VOT fails.
In adult utterances such a cluster may become affricated due to an open glottis that allows greater airflow and greater acoustic intensity. Again, not having the articulatory correspondence to rely upon for interpretation of the acoustic information contained in the cluster, the child is likely to interpret the /tr/ as its homorganic affricate /tʃ/.

As a third example, Kornfeld presented for discussion the cluster /gr/, when the VOT value for /g/ extends into the normally voiceless category. If the child were depending upon the canonical VOT value to mark the distinction, he would perceive the stop as being voiceless and aspirated. Kornfeld presented examples from the literature that support the occurrence of such substitutions in children's speech. Stevens and Klatt (1974) offered an articulatory explanation for the increased VOT values that exist for voiceless aspirated stops in the velar position. As the articulation for /g/ is also performed with the slow-moving tongue body, it might be expected that VOT is somewhat lengthened, although not enough to ensure that the transition is complete before voicing commences, as for /k/. Hence if the child lacks knowledge of the articulatory manipulations required to co-ordinate a complex of non-conflicting cues, he cannot rely upon them to process external auditory stimuli and hence may use a more basic cue such as the absolute VOT value. In this case, because the child does not know that the VOT value for a /k/ in a /kr/ cluster is extended to the length that it is, the /g/ in a /gr/ cluster may be judged as voiceless on the basis of its VOT.
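
The /gr/ example can be made concrete with a small numerical sketch. The Python code below is illustrative only and is not a model proposed by Kornfeld or by Stevens and Klatt. The boundary value, the size of the contextual adjustment and the sample VOT are all assumed figures, with the adjustment chosen from within the 30-50 msec lengthening reported for voiceless stops before /r/; treating that lengthening as a shift of the perceptual boundary is itself an assumption.

```python
# Illustrative sketch only: all numerical values are assumptions.
CANONICAL_BOUNDARY_MSEC = 25.0  # assumed voiced/voiceless VOT boundary in CV context
R_CLUSTER_SHIFT_MSEC = 40.0     # assumed adjustment, within the 30-50 msec range cited


def judge_voicing(vot_msec, before_r=False, uses_context=False):
    """Label a stop as 'voiced' or 'voiceless' from its VOT alone.

    A listener who uses context raises the boundary when the stop precedes /r/;
    a listener relying on the absolute VOT value keeps the canonical boundary.
    """
    boundary = CANONICAL_BOUNDARY_MSEC
    if before_r and uses_context:
        boundary += R_CLUSTER_SHIFT_MSEC
    return "voiceless" if vot_msec > boundary else "voiced"


vot_of_g_in_gr = 35.0  # assumed VOT for /g/ in /gr/, extending past the canonical boundary

# Listener relying on the absolute VOT value (the child in the /gr/ example)
print(judge_voicing(vot_of_g_in_gr, before_r=True, uses_context=False))  # voiceless
# Listener who allows for the lengthening of VOT before /r/
print(judge_voicing(vot_of_g_in_gr, before_r=True, uses_context=True))   # voiced
```

Under these assumed values the same token is labelled differently depending only on whether the contextual lengthening is taken into account, which is the pattern of misperception described for the /gr/ cluster.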
Finally, the case of voiceless unaspirated stop consonants contained in clusters beginning with /s/ was cited. Here both the VOT value and the F1-transition cues for voicing indicate that the consonant is voiced. The cues of burst intensity and duration also go in the wrong direction. Davidsen-Neilsen (1969) reported that when the /s/ segment is removed, 92% of the subjects' responses identified the resulting stimuli as voiced. Kornfeld cited the high frequency of such forms produced by children as substantiating that they misperceive the voicing of the plosive in such clusters.

In summary, two important conclusions can be drawn from this study. First of all, children may use certain acoustic cues to make the voicing distinction that will lead them to a misperception. With the acquisition of the articulatory knowledge required to produce the segment under analysis, the child learns to identify the acoustic cues with their source and hence learns to "dis-ambiguate" any conflicting cues. (Evidence has been provided to show that during articulation the cues are "dis-ambiguated".) In order to articulate a sequence the child must have as input a sequence of phonemes; hence he must have conceptualized the existence of the entities in the utterance. To do this, the child must trace the acoustic attributes to their source through manipulation of his articulatory and auditory abilities.
Second, once the possible phoneme combinations are familiar to the child, he has knowledge of morpheme constraints in his language, and this knowledge can be utilized to aid in decoding the speech signal, especially in cases where the acoustic information is contradictory, such as in some clusters.

One objection can be raised to Kornfeld's hypothesis. If the child's production does reveal his perception of the adult form, then the child does not discriminate any difference between the two productions. In this case, the external model and the child's auditory feedback are equivalent and he could assume that his articulation was correct. Without some kind of more sensitive self-monitoring it is difficult to see how the child's forms advance towards the adult's. In the example cited earlier from Smith (1973) the child was aware that his production was not equivalent to the adult's. Smith assumed that the child perceives in the same manner as the adult. Kornfeld, on the other hand, has presented evidence that the child's perceptual abilities are developing, as well as his production abilities. Taking both of these viewpoints into consideration, it is most likely that the child's perception abilities are more advanced (yet not in adult form) than his production abilities. The child can perceive the difference between his form and the adult form, yet until he is able to develop the motor skills required to equate the forms, he is not able to perceive what constitutes the difference. This is accomplished by being able to trace the auditory stimulus back to the articulatory configuration that produced it. Hence the child develops his productive repertoire by matching his auditory feedback to the external model.
The fact that Smith's son, A, could discriminate the objects by name before he could speak at all indicates that he was able to do so by referring to the results of auditory analysis and forming an association between objects and their names. He was not required to identify what the difference was between the lexical items and therefore did not need any articulatory knowledge. The task resembled an auditory discrimination task.

3.1.2 Literature Examining the Child's Perception of Speech

Tikofsky and McInish (1968a) tested the ability of seven-year-olds to discriminate stimuli that varied in the initial segment. The subjects performed a forced-choice same-different discrimination task. The stimuli were 120-item lists composed of word-word pairs, word-nonsense syllable pairs or nonsense syllable-nonsense syllable pairs. The initial segment in each pair differed in one to five distinctive features. The subjects were able to discriminate well by seven years of age and only 2% errors occurred. Most errors occurred when the consonants differed by only one distinctive feature. Some features were more discriminable than others and within a distinctive feature category some pairs were more discriminable than others. This last finding indicates that the stimulus complex as a whole undergoes phonetic analysis and that the interaction of acoustic information may either facilitate or impair the recognition of specific features. In cases where the acoustic information produces conflicting cues the child may depend upon misleading cues and further impair his ability to make the relevant discrimination. Most of the errors were based on the features place or voicing.
The highest error rate occurred for the /f-θ/ and the /v-ð/ pairs. These pairs have also been found to be poorly discriminated by adults (Tikofsky & McInish, 1968b; Miller & Nicely, 1955; Strevens, 1960). The errors that occurred for voicing occurred in the pairs /v-f/, /z-s/ and /ð-θ/.

Abbs and Minifie (1969) examined the acoustic cues that were found to be important for perceptual discrimination of fricatives by pre-school children. Each of /f/, /v/, /θ/, /ð/, /s/ and /z/ was paired with each of /a/, /i/ and /ai/ in VC and CV combinations. The syllables were paired so that they differed only in the consonant. Seventeen children ranging in age from 3;0 to 5;1 were subjects. As each syllable in a pair was presented, one of two pictures in front of the subject was illuminated. The subject was then asked, "Who said ____?" After responding the subject was either reinforced or corrected, allowed to change his response, and then reinforced.

The results showed that there were no differences in errors due to the changing of vowel environment, yet overall there were significantly fewer errors when the consonant followed the vowel than when it preceded the vowel. However, the pairs /v-z/, /f-s/ and /v-ð/ produced more errors as VC than as CV, contrary to the overall trend. The most difficult pairs to discriminate were /f-θ/ and /v-ð/, while the easiest were /θ-z/ and /s-z/.

The stimuli were then analyzed to determine what acoustic cues were utilized by the subjects in performing the discriminations. Duration, intensity and spectral measures were analyzed.
Fricatives in final position were found to be longer than those in initial position and unvoiced fricatives were longer than the voiced fricatives in either position. /s/ was found to be significantly longer than any of the other fricatives, while /v/ and /ð/ were significantly shorter. The analysis of the ratio of fricative duration to vowel duration showed that the ratios for unvoiced fricatives were larger than those for the voiced fricatives. The position effect showed that in CV syllables the vowel is much longer in relation to the preceding fricative, whereas in VC syllables the vowel was only slightly longer or shorter than the following consonant. The differences in the C:V ratios reflect changes in both vowel and consonant duration. An important cue for the perception of voicing may be the ratio of these durations.

/s/ and /z/ were found to be 10-15 dB more intense than the other fricatives. There were no differences in intensity that corresponded to the position of the fricative in the syllable or to voicing differences.

Spectrographic analysis revealed that /s/ and /z/ have resonances at the high end of the spectrum while /f/, /v/, /θ/ and /ð/ are significantly lower. There were no differences in center frequency associated with position in the syllable. It was also determined that the half-power bandwidths for /s/ and /z/ were considerably narrower than those for the rest of the fricatives. Therefore /s/ and /z/ are more intense and have major resonances at a higher frequency than the rest of the fricatives.
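The consonant-to-vowel duration ratio described above can be made concrete with a small sketch. The function name and the durations below are hypothetical values invented for illustration; they are not measurements from Abbs and Minifie (1969).

```python
def cv_ratio(consonant_ms: float, vowel_ms: float) -> float:
    """Return fricative duration divided by vowel duration (the C:V ratio)."""
    if vowel_ms <= 0:
        raise ValueError("vowel duration must be positive")
    return consonant_ms / vowel_ms


# Hypothetical (fricative, vowel) durations in milliseconds for two VC syllables.
# These numbers are assumptions chosen only to show the direction of the effect.
syllables = {
    "VC voiceless": (200.0, 180.0),
    "VC voiced": (120.0, 260.0),
}

ratios = {label: cv_ratio(c, v) for label, (c, v) in syllables.items()}

for label, ratio in ratios.items():
    print(f"{label}: C:V = {ratio:.2f}")
```

With durations of this kind, a longer fricative paired with a shorter vowel yields the larger ratio, which is the pattern the text attributes to unvoiced fricatives.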
Overall, fewer errors occurred when one of the consonants was voiced and the other was unvoiced than when both were voiced or unvoiced. Fewer errors occurred when the fricative was in final position rather than initial position. When one of /s/ or /z/ was compared with one of /f/, /v/, /θ/, /ð/, fewer errors resulted than when two members of the latter group were compared with each other.

The voiced-unvoiced distinction for fricatives appears to be facilitated, especially in VC syllables, by the ratio of the consonant to vowel duration. The C:V ratios are significantly larger for VC syllables containing unvoiced fricatives than for those containing a voiced fricative. Although similar changes occur in CV syllables they are not large enough to be statistically significant. The other cues that may be used when the fricative is in initial position are the presence of a low frequency component (the voice bar) and the well defined formant structure of voiced fricatives. In the Miller et al. (1976) experiment it was determined that subjects judged noise-buzz sequences to contain noise if the noise-lead was 16 msec or greater. If the noise-lead was smaller than 16 msec the stimuli were identified as "no-noise". Although Miller et al. used nonspeech stimuli there may exist a parallel with the detection of voicing in fricatives. For an unvoiced fricative in initial position, the noise produced at the point of constriction extends for a period far greater than 16 msec and then voicing begins with the onset of the following vowel.
Voiced fricatives in initial position have two components, the friction produced at the point of articulation and a low frequency voicing component from the glottal source. The voicing distinction may be determined by detecting the noise-lead time in a manner similar to that in the Miller et al. experiment. In addition, it was found in the study by Abbs and Minifie (1969) that voiceless fricatives have a longer duration than voiced fricatives, possibly to ensure a noise-lead. The voiced distinction may also be facilitated by the presence of strong formant patterns that are detected in a manner similar to those in stop-consonant CVs (Stevens & Klatt, 1974).

/s/ and /z/ are principally distinguished from other fricatives by their high resonance frequency and their relatively intense, short spectrum. They also have relatively high peak amplitudes. The highest error rate occurred when two members of /f/, /v/, /θ/ and /ð/ were contrasted and there were no major voicing, frequency, bandwidth or intensity differences. But when two of these dimensions were contrasted in a pair the error rate dropped considerably.

Essentially the acoustic cues used by children to discriminate the fricatives were of a temporal nature or due to significant frequency or intensity contrasts. These acoustic cues were used to distinguish the features ± voice and ± [stridency, coronal, anterior]. When neither feature was present to be discriminated, the pairs were poorly discriminated. The type of analysis required to perform these discriminations resembles the auditory analysis that has been shown to take place for the categorical perception of stop consonants.
Conceivably with further research, the type of auditory analysis necessary to perform fricative discrimination could also be shown to take place in a categorical manner. Cutting and Rosner (1974) have shown that a variable acoustic dimension, rise-time, can result in the categorical perception of speech (to cue the affricate/fricative distinction) and of nonspeech ("plucks" and "bows").

Graham and House (1971) investigated the adequacy of current linguistic descriptions to describe the cues used by a child to deduce the adult's phonological system. Thirty girls from three to four and one-half years of age performed a same-different discrimination task on pairs of "words". The "words" consisted of the frame /haCad/ containing one of the sixteen commonly occurring English consonants. The words were paired either with themselves or another word in the group. Analysis of the errors revealed that the children made the same type of errors as adults, only more. Individual differences were a matter of degree, not type of error. The error rate was the highest when the consonants differed by only one distinctive feature. When the stimulus pair differed by two or more distinctive features, the error rate dropped and then levelled off.

Graham and House also examined the "equivalence" of distinctive features for purposes of discriminability. When one-feature contrast pairs were examined, some features contributed more to discrimination than others. However these features were not "additive", i.e. when combined so that a pair differed by two highly "discriminable" features, the result was not always a highly discriminable pair.
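The notion of a pair "differing by n distinctive features" can be illustrated with a short sketch. The three-way description used here (voicing, place, manner) and the consonant labels are simplifying assumptions made for illustration; they are not one of the feature systems that Graham and House evaluated.

```python
from itertools import combinations

# Simplified articulatory descriptions. This table is an illustrative
# assumption, not a feature system taken from the studies reviewed.
CONSONANTS = {
    "f":  {"voice": False, "place": "labiodental", "manner": "fricative"},
    "v":  {"voice": True,  "place": "labiodental", "manner": "fricative"},
    "th": {"voice": False, "place": "dental",      "manner": "fricative"},  # /θ/
    "dh": {"voice": True,  "place": "dental",      "manner": "fricative"},  # /ð/
    "s":  {"voice": False, "place": "alveolar",    "manner": "fricative"},
    "z":  {"voice": True,  "place": "alveolar",    "manner": "fricative"},
    "t":  {"voice": False, "place": "alveolar",    "manner": "stop"},
    "d":  {"voice": True,  "place": "alveolar",    "manner": "stop"},
}


def feature_distance(a: str, b: str) -> int:
    """Count the dimensions on which two consonants differ."""
    fa, fb = CONSONANTS[a], CONSONANTS[b]
    return sum(fa[name] != fb[name] for name in fa)


def pairs_at_distance(n: int) -> list:
    """List every consonant pair in the table that differs on exactly n dimensions."""
    return [(a, b) for a, b in combinations(CONSONANTS, 2)
            if feature_distance(a, b) == n]


print(feature_distance("f", "th"))  # one-dimension contrast (place)
print(feature_distance("f", "d"))   # differs in voice, place and manner
print(pairs_at_distance(1))
```

A raw count of this kind treats every one-feature pair as equally close, which is exactly the assumption that the non-additivity finding calls into question.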
The authors used multidimensional scaling techniques to determine if they could find the indicated four dimensions that would account for ninety percent of their data's "fit", in terms of several distinctive feature systems. None of the feature systems would combine into feature combinations that describe those relevant for children's perceptual judgments. The authors concluded that while linguistic distinctive features may adequately describe the articulatory gestures used to produce speech, they are not adequate to describe the perceptual parameters used by a listener in categorizing speech. The pattern of errors made by the children did support their coding acoustic information in terms of articulatory data such as manner, place and voicing. Therefore the children may use articulatory data when decoding the speech signal.

Considering the fact that Graham and House determined that the discriminability of features was not additive and that the interaction of acoustic cues often obscures the specification of the features present, it is not surprising that the authors failed to find an adequate system to describe the perceptual parameters used in decoding the speech signal. Unless more invariant acoustic attributes are found to correspond with feature distinctions such an analysis would not be possible. The results of this experiment again indicate that the processing of acoustic information about a feature is performed with respect to the total stimulus complex.
The information pertaining to a feature is variable in nature according to the rest of the acoustic information in the stimulus complex, and interactions among the acoustic cues may yield conflicting information about the features that are present. It should also be noted in the Graham and House study that the pair /f-θ/ produced the most errors. Here is another example where the lack of a well-defined acoustic basis for a phonetic decision results in poor discrimination.

Moskowitz (1975) stated that a differentiation must be made between the child's acquisition of the phonetic system, i.e. learning to associate the articulatory gestures with specific acoustic effects, the variations imposed upon a sound by its phonetic context and the timing required to produce articulatory gestures with respect to each other, and the acquisition of phonology, i.e. learning the systematic aspects of the sound system including functional oppositions and the system in which they are functional. If this differentiation is kept in view then many apparent discrepancies in the child's system can be resolved. It is possible that the child's system may contain a phonological distinction yet he cannot phonetically realize the items or, the reverse, that the child may be able to produce the items forming a distinction yet not use the items to form a phonological contrast. Both systems interact and either one may lead or lag behind the other, not revealing the true status of the other system. In cases where the child indicates that his perception is more advanced than his production, it is usually supposed that the child's phonetic system is not as advanced as his phonological system.
Moskowitz reviewed data from eight subjects to determine the acquisition of the eight English fricatives in both systems. She used a substitution analysis framework. Several relevant observations were made. First, the most common substitution for /θ/ is /f/, precisely the phoneme that is so poorly discriminated from /θ/ by children as well as adults. Second, the usual substitute for /ð/ was /d/. However, on elicited imitation tasks the phoneme /v/ was frequently substituted for /ð/.

Moskowitz established the following criterion to establish that a phoneme has been phonologically acquired: "A phoneme, X, can be said to have been acquired when the pattern of phonetic realization of X is consistently distinct from the pattern of phonetic realization of any other phoneme, Y [p. 146]." Using this criterion the voiceless fricatives /f, s, ʃ, θ/ were acquired before the voiced fricatives /v, z, ʒ, ð/. In addition all fricatives showed evidence of phonological acquisition before they were phonetically stable. When a phoneme has been phonologically learned, it has been established as an entity in the system of language. It has been distinguished from other phonemes as being acoustically different and being able to differentiate lexical items. The phoneme is recognized as a speech-sound contained in the system and hence becomes eligible for phonetic processing. Phonetic learning can now begin to determine what distinguishes the phoneme from other phonemes in articulatory terms.

The three unvoiced fricatives /f, s, ʃ/ occurred phonetically before the four voiced fricatives.
Phonetically, /θ/ was acquired last, as the children persisted in substituting /f/ for /θ/ for long periods of time. The four voiced fricatives exhibited four different patterns of phonetic acquisition and did not duplicate the acquisition pattern of the unvoiced fricatives.

Moskowitz explained the alternative substitutions for /ð/ as resulting from conflicts between the two sources of phonetic information available to the child, the perceptual mode and the articulatory mode. The child who has not yet mastered /ð/ and who does not have /θ/ will substitute /d/ in his spontaneous speech. /d/ is his closest possible articulatory substitution and during production he attempts to modify the articulation of /d/ to match /ð/. In the imitation task though, he does not need to maximize articulatory practice, but needs to maximize acoustic effect. To do this, he substitutes /v/, a closer match acoustically than /d/.

Moskowitz presented the suppression process as an example of a process that limits the child's phonetic capacities. Briefly, for each allophone of a phoneme, a speaker has recognized that there exists a nucleus of possible intended productions that result in acceptable instances of that phone. Outside this nucleus is a second layer of phonetic possibilities that are close enough that they may be considered acceptable during acquisition. A third layer exists beyond the second layer and it is composed of all the phones that could not serve as an acceptable realization of the phone in question. During acquisition the child learns the boundary between the "acceptable" and the "second layer" productions.
In the process, he suppresses the phonetic realizations that exist in the second layer. If however two phones are being acquired and their "second layers" overlap, then this area will be strongly suppressed. If ranges of values are acceptable for one phone yet being suppressed for another phone, the acceptable ranges may shift to avoid a conflict of demands.

/θ/ falls into the suppressed layers of the phonetic realizations of /t/, /s/ and /f/. Phonologically /θ/ emerges but phonetically it battles with the phonetic realizations of /t/, /s/ and /f/ and hence is triply suppressed. The child may substitute /f/ for /θ/ during this stage as /f/ offers the closest match acoustically. /f/ occurs as a substitute for /θ/ when he cannot approach its production in an articulatory manner. /d/ occurs as a substitute for /ð/ while he is attempting to modify his articulation to approach /ð/. /v/ occurs as a substitute for /ð/ in an imitation task because it is an expedient acoustic substitute.

Moskowitz, then, outlined the information that a child has available for acquiring phonetic realizations. He uses his articulatory knowledge and his perceptual knowledge to achieve the realization in the most expedient manner for the task at hand. This account suggests that the child monitors his productions and attempts to modify his kinesthetic feedback until, on successive attempts, a match is achieved between his auditory feedback and his perception of the external model. Until he has achieved this goal his productions will be composed of segments that are either a close articulatory match or a close acoustic match.
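The matching process just described, in which successive attempts are adjusted until auditory feedback agrees with the external model, has the shape of a simple feedback loop. The following is only a toy sketch of that loop: the linear articulation-to-sound mapping, the function names and all numeric values are assumptions introduced for illustration, not a model of the vocal tract or a procedure proposed by Moskowitz.

```python
def acoustic_result(articulation: float) -> float:
    """Toy mapping from an articulatory setting to an acoustic value (assumed)."""
    return 2.0 * articulation + 1.0


def match_external_model(target: float, start: float = 0.0, gain: float = 0.25,
                         tolerance: float = 0.01, max_attempts: int = 100):
    """Adjust the articulation until its acoustic result matches the target.

    Returns the final articulation and the number of attempts used.
    """
    articulation = start
    for attempt in range(1, max_attempts + 1):
        # Compare the auditory feedback from this attempt with the external model.
        error = target - acoustic_result(articulation)
        if abs(error) <= tolerance:
            return articulation, attempt
        # Nudge the articulation in the direction that reduces the mismatch.
        articulation += gain * error
    return articulation, max_attempts


articulation, attempts = match_external_model(target=5.0)
print(f"matched after {attempts} attempts, articulation = {articulation:.3f}")
```

Before the loop converges, each intermediate articulation is only an approximation of the target, which loosely parallels the observation that the child's productions are close matches rather than exact ones until the goal is achieved.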
Menyuk (1973) surveyed data collected on the mastery of speech sounds by American and Japanese children to determine if there existed a specific order for the acquisition of feature distinctions. This data was compared with data on sound substitutions made by children with articulatory problems and data on perceptual confusions made by adults. Menyuk hoped to analyze these collections of data and derive some information on the cues used in perception and production of consonants by children during the developmental period.

The features examined were gravity, diffuseness, stridency, continuancy, nasality and voicing. The data on the mastery of features was determined by calculating the percentage of sounds containing a feature that were produced with this feature at various stages during the children's development. Menyuk did not state whether both members of a feature opposition were required before a feature was considered learned. As no mention was made concerning use of the opposition in a meaningful way, it is assumed that Menyuk was examining phonetic capacities, not phonological abilities. The specific phonemes in which features were contained were not given, so no information can be derived from this study concerning the specific physiological and acoustic parameters that were used to make these distinctions.

The data showed that Japanese and American children mastered the features in a very similar order. The order derived did not correspond well with data collected on the frequency of occurrence of features in the adult language. Therefore Menyuk gave little support to the theory that children first produce the features that they hear most frequently.
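The mastery measure described above, the percentage of sounds containing a feature that were produced with that feature, can be sketched as follows. The data layout, the function name and the sample attempts are hypothetical and serve only to show the calculation; they are not Menyuk's data.

```python
def maintenance_percentage(attempts, feature):
    """Percentage of target sounds carrying `feature` whose production kept it.

    `attempts` is a list of (target_features, produced_features) pairs of sets.
    Returns None when no target sound carries the feature.
    """
    relevant = [(target, produced) for target, produced in attempts
                if feature in target]
    if not relevant:
        return None
    kept = sum(1 for _, produced in relevant if feature in produced)
    return 100.0 * kept / len(relevant)


# Hypothetical attempts, invented for illustration only.
attempts = [
    ({"voice", "strident"}, {"voice", "strident"}),  # e.g. /z/ produced as /z/
    ({"voice", "strident"}, {"voice"}),              # e.g. /z/ produced as /d/
    ({"strident"}, {"strident"}),                    # e.g. /s/ produced as /s/
    ({"strident"}, set()),                           # e.g. /s/ produced as /t/
    ({"voice", "nasal"}, {"voice", "nasal"}),        # e.g. /m/ produced as /m/
]

for feature in ("voice", "nasal", "strident"):
    print(feature, maintenance_percentage(attempts, feature))
```

Because the measure pools every sound that carries a feature, it cannot show in which phonemes the feature was kept or lost, which is the limitation noted in the text.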
A comparison was made between the substitutions made by the American children and the group with articulation problems, with the perceptual errors made by adults. The normal and the deviant groups performed in a similar manner, but the deviant group generally maintained features less well. Stridency was best maintained by adults in their recall of CV syllables, while it was the feature least maintained by the children in the "articulation problem" group. Voicing and nasality ranked high in all groups. Substitutions made by the normal children correlated very well with the adult perceptual errors, indicating that there may exist some underlying acoustic cues in those features that are not readily perceived, recalled and produced.

The features ± nasal and ± voiced ranked high in the maintenance of features by both groups of children and the adults' perceptual errors. Therefore they may be the easiest features to perceive, recall and produce. The feature ± strident was best maintained in the recall of consonants by adults. It was among the last of the features to be acquired by the American and Japanese children, yet it was best maintained after the features ± voice and ± nasal in the substitutions of normal children. This feature ranks last, though, in the features maintained by the children with articulation problems.

Menyuk proposed that some features are more easily mastered due to their on-off characteristics. That is, for the feature ± voice the folds either vibrate or do not, and the feature is either present or absent.
Features such as ± continuant, though, cannot be as clearly delineated; the acoustic signal lasts for some time longer than a burst, but cues are not clearly specified in absolute terms.

The features that were best maintained by the children (± voice and ± nasal) were found to be discriminated by children with varying efficiency depending on the phonemes that contained the contrast (see Graham & House, 1971). For example, the feature voice was discriminated well in the pair /t-d/, yet poorly in the /s-z/ pair. Features may be detected through basic auditory processing in some stimulus complexes and not in others. The specific acoustic information that needed to be decoded in the above results cannot be examined because specific contexts of occurrence were not provided and features are manifested in different manners in different phonemes.

Menyuk (1971) sought evidence that phonological processing takes place in parallel with semantic-syntactic processing. Nonsense syllable sequences and three- to five-word sentences were presented to a normal and a language delayed group of children for repetition. Menyuk argued that the language delayed children should not perform as well as the normal children if semantic-syntactic information was utilized during phonetic processing.

The results indicated that language delayed children made more consonant errors when repeating both types of stimuli. The largest number of errors was made by both groups when repeating nonsense syllables. The language delayed group could not analyze phonological sequences in terms of segmental or syllabic phonological features. When the substitutions made by both groups were analyzed a similar feature maintenance pattern resulted.
Although the scores deteriorated when the stimuli were meaningless, the pattern of recall was the same for both nonsense syllables and words.

Menyuk concluded that phonological processing into lexical representations must include consideration of semantic information and that "phonological sequences are identified first in terms of lexical form (semantic and phonological features) and then in terms of their speech sound sequences - syllabic or segmental [p. 188]". It is interesting also to note that the same pattern and type of errors occurred for both sets of stimuli. This implies that while semantic information may improve positive identification of a speech sound sequence, all auditory stimuli undergo the same basic auditory analysis (including nonspeech). Furthermore a signal that has been judged to be a potential component of the language system will be processed phonetically, and information from one level of analysis is available at another level. This is a requirement of the analysis by synthesis model.

Menyuk and Anderson (1969) examined the production and identification of the three liquids /w/, /r/ and /l/ by preschool age children. Adults and thirteen-year-olds served as control groups. The authors wished to determine whether the children could distinguish a sharp phonetic boundary for these liquids, as has been found for stop consonant stimuli varying in several acoustic dimensions. The values produced by the children were also recorded.

Results showed that the children did not establish sharp phonetic boundaries for the liquid sounds, either in their production or during the identification task.
The children maintained a distinct boundary better in the perception task than the production task. The authors reasoned that when children are developing their phonetic system, they first identify differences between members in a set and then learn to produce them. Menyuk and Anderson were also concerned as to whether children match an articulatory gesture to a previously identified sound, or "mimic" sounds and base their perceptual categories on the resulting articulatory gestures.

The adults in this study were found to observe the boundaries more consistently when both producing and identifying the three sounds. However, the thirteen-year-olds performed the best of all three groups. This result was interpreted as possible support for the use of the analysis by synthesis model of speech perception. At thirteen years of age, the time when linguistic knowledge is considered to be fully developed (and the left hemisphere also), children will have the most reliable acoustic-articulatory data available for their use that they will ever have. Up until this point in their lives, they have been experimenting with articulation and its acoustic effects. Having the raw data that results from auditory analysis available, they then go about matching an articulatory configuration to a desired sound and hence build up a bank of articulatory-acoustic data. Furthermore this store can be drawn upon for phonetic processing. At this age then, the child's system is probably "tuned" the finest it will ever be. The adult loses the acuity of his auditory system, and his discrimination abilities and production may deteriorate.
The acoustic-articulatory image becomes less vivid. Menyuk and Anderson were unable to place much significance on this finding due to the small number of thirteen-year-old subjects (N=4). It would be of interest if researchers further investigated any processing differences that do occur at this age.

3.2 Conclusions

The above research on children's acquisition of the sound system is consistent with what is known about adult processing of speech stimuli. The results can be integrated to show that the analysis by synthesis model of speech perception is used by children during the time that they acquire language.

From birth, the human auditory system is equipped to perform complex auditory analysis on speech or nonspeech stimuli. The nature of this processing involves the extraction of data from certain acoustic dimensions and the resulting division of stimuli into their natural categories. The categories perceived in speech processing characterize the "names" of sounds and it may be more than coincidental that the acoustic dimensions that differentiate many features correspond to distinctions that the auditory system determines readily. Accordingly, infants have been shown to be able to perceive speech stimuli in a categorical manner similar to the adult. In these studies, the stimuli vary in only one acoustic dimension. In the case of natural speech, each sound is composed of a complex of the acoustic data representing each feature. In these complexes the acoustic data interact and can produce poorly defined, if not conflicting, data for a specific feature's identification.
Also, in different stimulus complexes the information for a specific feature is realized in a different manner. Hence features are not simply detected as an isomorphic mapping between acoustic information and feature. The child can be expected to first learn a feature in some contexts before others. For example, Barton (1975b) found that two-and-one-half-year-olds could readily discriminate the feature 'voice' for the stimulus pair /k-g/, yet Tikofsky and McInish (1968a) found that seven-year-olds still had difficulty with that feature in the pairs /f-s/, /s-z/ and /θ-ð/. There is a great difference between detecting one feature from one acoustic dimension and processing the entire stimulus complex to yield a set of features. Herein lies the difference between auditory and phonetic processing.

What causes a signal to be recognized as belonging to the system of language and undergo left hemisphere processing? For speech or nonspeech, if the acoustic characteristics of a stimulus define it as an element of a system (therefore it may be classified) and the listener has developed a concept about the existence of that element in the system, then special perceptual processing occurs. In the case of speech the sounds are recognized as originating in the vocal tract and during phonetic processing are traced back to their place of origin. Decoding of the total stimulus complex is achieved through seeking the acoustic-articulatory relationships. The child may begin to process sounds phonetically when he has recognized that the sound originated in the vocal tract, but does not need to have mastered the articulatory realizations.
The studies by Shvachkin (1973), Garnica (1973) and Barton (1975b) required that the child detect a contrast that produced a difference in meaning between lexical items. To do this the child must have formed a concept about the existence of the phonemes being discriminated. He may detect a difference between two sounds, yet until he has mastered their articulation he cannot identify the difference. The phonemes become real for the child when he can articulate them. Before then he uses what articulatory knowledge he has to approximate a phoneme during production. It is postulated then, that the child begins to process stimuli phonetically when he has formed a concept concerning the segment's place in the speech system and begins to seek the articulatory configuration that produces the segment. Hence the articulatory pattern is not necessary for a segment's phonetic interpretation, but the realization that the segment can be traced to its source, and the active pursuit of the source, are necessary. Once this information is available to the child, he no longer depends so much on auditory analysis because he has a reliable means of perceiving all tokens of the same stimulus.

Moskowitz provided evidence that the child has formed some relationships between acoustic and articulatory images during phonetic acquisition of a phoneme. In substituting /v/ for /ð/ the children were able to match acoustic effects when necessary, although in their spontaneous speech they provided an articulatory match, /d/. Hence the child has formed ideas about best matches in both articulatory and acoustic terms.
In the development of articulation he modifies the articulatory gesture until it matches the model acoustically.

Smith's son A was able to understand a recording of his speech provided he was still at the same stage. If several weeks lapsed before he listened to the recording, he could not understand his speech. The child must be able to use his articulatory data to perceive his speech in a manner similar to that proposed by the analysis by synthesis model. When listening to the adult though, if the adult used an "adult form" that also corresponded to a different lexical item in the child's speech, the child would interpret the adult meaning. In this case the child could not have used his articulatory data in a direct manner. However, the child also indicated on numerous occasions that he was aware of his "incorrect" pronunciation. Provided that this is so, the child knew there was a difference yet could not perceive the nature of the difference in articulatory terms. He had a concept of the phoneme's entity and he could perceive the phoneme without being able to produce it.

In the analysis by synthesis model, each time a sequence is processed the articulatory representation is activated and compared with the acoustic data. Hence the acoustic-articulatory link is reinforced. In a case where the articulation is not 'correct' this reinforcement serves to re-establish the difference between the existing production and the acoustic goal. Hence processing in this manner also explains the source of the child's awareness that his articulation is deficient. The comparator provides this information.
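The comparison step described above can be illustrated with a minimal sketch. It is offered only as an illustration under stated assumptions, not as the model specified by its authors: each segment is reduced to two hypothetical numeric acoustic dimensions, the child's articulatory store is a small table mapping each segment he can produce to the acoustic pattern it yields, and the comparator is assumed to be a simple Euclidean distance. All names and values are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical two-dimensional acoustic description of a segment.
# The dimensions and the numbers used below are illustrative only.
Acoustic = Tuple[float, float]


@dataclass
class Comparison:
    hypothesis: str   # segment proposed as the identity of the signal
    mismatch: float   # residual difference reported by the comparator


def synthesize(store: Dict[str, Acoustic], segment: str) -> Acoustic:
    """Acoustic pattern predicted from the listener's own articulation."""
    return store[segment]


def compare(heard: Acoustic, predicted: Acoustic) -> float:
    """Comparator (assumed here to be Euclidean distance)."""
    return sum((h - p) ** 2 for h, p in zip(heard, predicted)) ** 0.5


def perceive(heard: Acoustic, store: Dict[str, Acoustic]) -> Comparison:
    """Try every stored segment as a hypothesis and keep the closest match."""
    best = min(store, key=lambda seg: compare(heard, synthesize(store, seg)))
    return Comparison(best, compare(heard, synthesize(store, best)))


# A child who can so far produce only /d/ and /v/ (invented values).
child_store = {"d": (1.0, 0.0), "v": (0.0, 1.0)}
result = perceive((0.9, 0.2), child_store)
print(result.hypothesis, round(result.mismatch, 3))
```

In this toy case the signal is identified with the child's /d/, but the mismatch is not zero. That residual stands in for the information the comparator is said to supply about the gap between the child's existing production and the acoustic goal.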
During analysis of adult speech, the child uses his articulatory knowledge and can attend to the differences between his form and the adult form, but identification of the sequence is more dependent upon auditory analysis during this stage. This explains why the more acoustically identifiable features are perceived first.

Perception then develops along with production, but precedes production by the interval extending from when the phoneme is recognized as a part of the system to when it is correctly articulated. During this interval the link between the perceived acoustic characteristics and the produced articulatory representation is established and the two forms are manipulated until they coincide. Once articulation is mastered, the child can convert a sequence of symbols (the "names" of the speech sounds) into articulatory representations and he begins to acquire knowledge of the morphological system. As morpheme constraints are learned, this knowledge further aids in perceiving sequences of sounds.

Although all human auditory systems are capable of the same auditory analysis from birth, all human languages do not contain the same combinations of features in their phonemes. Therefore it is unlikely that more than general trends of acquisition could occur, due to the different facility with which features contained in different combinations are perceived.

Between the time of birth and adulthood, the human's perception and production of a feature take on the specific values determined in his native language. Eimas et al. (1971) showed that infants in an English-speaking environment perceived the phonetic boundary for voicing at a location very near that found in adult English.
Chinchillas did the same (Kuhl & Miller, 1975) so it is unlikely that infants in a Spanish-speaking environment would do otherwise. Lisker and Abramson (1970) found that Spanish speakers produce and identify a different phonetic boundary for the feature 'voice'. During the period of phonetic acquisition the child's auditory system becomes "tuned" to the specific values found in his language environment. There is some modification of the characteristics of innate auditory processing to specific values perceived in a language, and production takes on these values also. It would be interesting to further investigate at what age these language-specific values for phoneme boundaries would be revealed in categorical perception experiments and whether production takes on the specific value from its onset or is modified at a later date.

The results of discrimination testing showed that children perceived in a manner similar to the adult, though less accurately. Stimuli that were poorly discriminated by the adult were also poorly discriminated by the child. In these cases, it was shown that acoustic information did not clearly define features. In a discrimination experiment where the adult cannot resort to using contextual information for identification of a stimulus, performance is restricted to pure auditory-phonetic processing and a condition more comparable to the child's mode of processing results. Hence it is not surprising that both populations made similar errors; they have similar auditory systems.
Nor is it surprising that children are poorer at discriminating in many cases, because they are more dependent upon auditory analysis, which is not as reliable as drawing upon acoustic-articulatory correlations. In certain cases it was also determined that phonemes acquired last are among those that are poorly discriminated. The pair /f-θ/ was poorly discriminated by children and adults and furthermore, /f/ persists as a substitute for /θ/ for a long period in many children's speech. In this case the children are able to draw upon an acoustic substitute when an articulatory substitute is not available.

Menyuk (1973) proposed that some features are more easily perceived and produced due to their on-off characteristics. The detection of an acoustic attribute can be associated with the presence or absence of a well-defined articulatory motion. But Menyuk's study failed to examine the acquisition of features in specific contexts, and features are not learned in all contexts simultaneously. Again the example arises that in Barton's study (1975b) the /g-k/ voicing contrast was easily discriminated but other voicing contrasts were not. The same result was found by Graham and House (1971). Hence again, it appears that the clear interpretation of acoustic data interacting in a stimulus complex determines which features in their respective contexts are discriminated first. This discrimination is a prerequisite for the analysis of what differences exist between this stimulus and other stimuli and hence identification of the segment in articulatory terms.
The development of phonetic perception that occurs in this manner explains the development of the acoustic-articulatory relations that are contained in the comparator unit of the analysis by synthesis model. Once these relationships have been determined, the child is fully equipped to use this mode of analysis. But prior to this time how does the child use such a model?

The child can perceive two forms of speech, his own and the adult form. The child can understand his speech during the brief time that he remains at that level. In order to do this he uses the analysis by synthesis method and processing employs his articulatory representations. In perceiving the adult though, he lacks the articulatory specifications for some segments. For those segments that are phonologically learned, he has discriminated the segment from others in the language and he has formed a concept of the segment's role in the language system. The child is also aware that his articulatory knowledge is not sufficient to realize the segment in production. But this very fact, that the child knows his articulation and the acoustic model do not correspond, explains how the child uses the analysis by synthesis model for processing and the value of its use.

When a segment is analyzed that is beyond the child's articulatory abilities, a hypothesis concerning its identity is made by the control unit. The articulatory representation used by the child is generated and the comparator analyzes the difference between this representation and the results of preliminary analysis (auditory analysis).
The results from the comparator serve to produce the child's awareness of the difference between forms, as the acoustic-articulatory differences are exactly what are determined. Information about the difference can then be utilized in the child's next production. Hence perception actively modifies the child's productions and articulation ultimately aids in perception.

Stevens and House suggest that in speech processing a match of acoustic information with articulatory representations does not always need to be performed by the adult because often a reliable hypothesis can be derived from the results of preliminary analysis; however, it is performed as a check. The child is aware that he cannot rely upon this match and relies more on earlier processing and his knowledge of his articulatory discrepancy from the model. If the child were not aware of his limited production capacities, he would fail to understand the adult's speech. This awareness originates in the comparator during perception of the adult forms but also occurs during the child's production.

In a theory that presumes perception and production are components of the same system it is not surprising to find that both processes interact in their development. The comparator with its vast store of acoustic-articulatory relations provides the "goal" information for production and attempts to determine how articulation is deficient, based on previous analysis.

When the child produces a phoneme not yet phonetically learned, he has a goal for this production, based on his acoustic and articulatory knowledge of that phoneme and the discrepancies determined by the comparator unit.
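The way in which information about the difference might shape the child's next production can be sketched as a simple error-correcting loop. This is a hedged illustration only: the single numeric articulatory parameter, the fixed correction gain and the stopping tolerance are all assumptions introduced for the example, and nothing in the studies reviewed establishes that correction is proportional or proceeds in exactly this way.

```python
from typing import List


def refine_production(goal: float, start: float, gain: float = 0.5,
                      tolerance: float = 0.01,
                      max_attempts: int = 50) -> List[float]:
    """Return successive productions as each one is corrected toward the goal.

    goal  -- hypothetical target value supplied by the comparator
    start -- the child's current (immature) production
    gain  -- assumed fraction of the discrepancy corrected on each attempt
    """
    attempts = [start]
    while abs(goal - attempts[-1]) > tolerance and len(attempts) < max_attempts:
        # The comparator reports how far off-target the last attempt was.
        discrepancy = goal - attempts[-1]
        # The correction factor is applied to the next production.
        attempts.append(attempts[-1] + gain * discrepancy)
    return attempts


attempts = refine_production(goal=1.0, start=0.0)
errors = [abs(1.0 - a) for a in attempts]
print(len(attempts), attempts[-1])
```

Under these assumptions each production lies closer to the goal than the one before it, which is the sense in which perception could be said to modify production over successive attempts.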
During production the child receives kinesthetic feedback from the vocal tract and auditory feedback that is processed in the same manner as external stimuli. The hypothesis is precisely the goal that contains correctional articulatory information. The comparator compares this hypothesis with the actual articulation and computes the difference. The articulators can then be informed how far off-target they were and this correction factor can be used in the next production. In this case a comparison is made between the actual articulatory representation and the hypothesized articulatory representation supplied by the comparator. The comparator can anticipate that the articulatory representation used in its next analysis of that segment will be in this form.

It is postulated that the child utilizes the analysis by synthesis model from the time he begins to phonetically perceive speech. While the articulatory representation may not be available for his use, processing utilizes the results of auditory analysis and the comparator's knowledge of the discrepancies to be expected between the child and adult forms. While the comparator recognizes that these forms are not identical, the child is aware of his immature production. During perception of the adult, the comparison of the acoustic model with the child's articulatory representation provides information that may be used later to modify production. During production the comparison is made between the actual articulatory representation used and the articulatory adjustments proposed by the "goal".
This information is used to keep the comparator informed of the child's production compared with the goal so that the comparator can anticipate this relationship for its next perceptual comparison. Until the child has developed his articulatory competence and can trace stimuli to their origin in the vocal tract, it is to be expected that he will have difficulty perceiving sequences where clearly defined acoustic-phonetic information is at a premium.

Chapter 4 DISCUSSION

4.1 The Importance of Babbling

The introduction raised some issues that may be better explained with some insight into the perceptual process used by children.

The child can perceive both his own speech and that of the adult. He does this using the same method of processing. When perceiving himself he is able to use the analysis by synthesis model in the same way that the adult uses it. To perceive the adult, he uses this model again but also uses the knowledge that the comparator contains about the relationship between the forms. This information is also used to modify articulation towards the external adult model. It should be noted that the comparator does not necessarily know the exact articulatory adjustments required to correct an articulation, but its knowledge is probably in general acoustic-articulatory terms and has been derived from previous experimentation with articulation.

The child's perception is not in adult form from a very young age. Although his auditory system is capable of analyzing stimuli in an adultlike manner, phonetic perception develops along with, but probably slightly ahead of, phonetic production. These two aspects of the same system depend on each other for their development.
Although the results of auditory analysis may contribute largely to phonetic processing, it is the final resolution with an articulatory representation that completes this learning.

One consideration of theories of phonological development is whether babbling contributes to acquisition of the sound system. During this period the child produces a complex variety of sounds, some of which are not contained in his language. He may not babble some sounds contained in his language. Therefore, babbling is not language specific. After this period, the vocalizations trail off somewhat and then assume qualities of sound sequences found in language. Sometime after the child's first birthday, "words" appear; they are assumed to have some meaning for the child. Deaf children babble as normal children do, but do not progress beyond this stage.

It may be supposed that during the babbling period the infant is experimenting with his articulatory abilities, but not in a linguistically relevant manner. Babbling and its auditory feedback are necessary for the next stage of development where the acoustic-articulatory correspondences are established. The infant is born with innate auditory capabilities but although he may hear sounds he has not yet learned what to listen for. Similarly the infant is equipped with the oral musculature necessary for speech but he has not yet discovered the product that results from manipulating the peripheral oral structures. During babbling the child discovers the sensations associated with speaking and hearing, without perceiving their role in the system of language.
Studdert-Kennedy (1976) outlined a study by Marler (1975) of the extent to which several species of birds depend on hearing their own productions and on hearing an external model in order to develop normal bird song. The dove or chicken needs to hear neither an external model nor its own voice for song to develop normally. The song sparrow needs to hear its own voice but does not need an external model. The white-crowned sparrow needs both the external model and its own feedback in order to develop normal song. If it is deafened early in life a highly abnormal song develops. If reared in isolation an abnormal song with some natural characteristics will develop.

Marler suggested that this rudimentary song reflects the existence of an auditory template that consists of information for auditory processing about the structure of vocal sounds. Matched with an external model the template tunes the development of rudimentary speech to characteristics of the model. The bird gradually discovers the motor controls needed to match its output with the modified template. Without a model the template establishes some basic features of normal song in the rudimentary form.

Marler proposed that the child learns language in a similar manner. Early in life sensory mechanisms analyze the sounds of others and then turn to analyzing the child's own productions. On the motor side vocal development is dependent on auditory feedback and there has developed "neural circuitry necessary to modify patterns of motor outflow so that sounds generated can be matched to pre-established auditory templates [p. 33]". Similar patterns develop in the speech of the child who is born deaf or raised in isolation. The congenitally deaf child produces little, if any, speech and voice with abnormal "deaf" qualities.
If raised in isolation the child produces a highly abnormal pattern of speech (see Fromkin, Krashen, Curtiss, Rigler, & Rigler, 1974). Therefore there is support for some type of speech-related auditory sensorimotor mechanism that may modify patterns of motor output to match sounds generated by the vocal apparatus against some standard. This sensorimotor interaction would provide the mechanism for discovering auditory-articulatory correspondences.

4.2 Implications For Research

The results of studies on children's perceptual status can be explained in terms of the child's use of the analysis by synthesis model. If the results are not sufficient to wholly support the existence of this model in children's processing systems, it is due simply to the lack of research in this area. While there are inherent difficulties associated with experimentation in this area, several theoretical issues arise that could yield information about the acquisition process.

Beginning from a very young age, it would be of interest to determine at what stage in the acquisition process, and for what type of tasks, phonetic processing occurs. Gilbert and Climan (1974) found in a dichotic listening experiment that an REA was present in children as young as two and one-half years. Using recordings of average evoked potentials, such as those used in Wood (1975), some idea of what is necessary to evoke phonetic processing could be obtained.

More integrated research into both production and perception at successive stages in a child's acquisition of his sound system could yield more specific information on the relationship and interaction between these forms.
Most of the present studies involve one of these processes and draw inferences about the other. Analysis of the acoustic information contained in both forms may indicate what information is processed relatively directly from the signal and where discrepancies in acoustic cues arise, as well as indicating whether the perceived form is in fact produced.

Another issue upon which investigators disagree is whether the child can perceive the difference between his form and the adult form. This is difficult to assess if the child can perceive both forms when produced by the appropriate speaker. To test this, both forms would need to be produced by the same speaker. Therefore spectrographic analysis of the child's productions would be necessary to ensure that the speaker replicates exactly the child's form.

During phonological acquisition, between the time when distinctions are taught and named, do other differences exist in processing besides consistency? Processing time could be examined as well as any difference in the acoustic information that is being extracted as a basis for the phonetic decision.

As mentioned earlier, the result observed by Menyuk and Anderson (1969) could be further researched to determine whether thirteen-year-olds really do more closely observe phonetic boundaries in perception and production than either younger children or adults. This could lead to indirect evidence for a close tie between perception and production and the theory that this tie is developed in the pre-puberty stage of life.
Also mentioned earlier was the investigation into when language-specific values for features such as voicing occur, and whether these boundaries are both perceived and produced at this value from the beginning, or whether production is later modified to these specific values.

SELECTED BIBLIOGRAPHY

Abbs, M.S., and Minifie, F.D. Effect of acoustic cues in fricatives on perceptual confusion in preschool children. Journal of the Acoustical Society of America, 1969, 46, 1535-1542.
Ades, A.E. A study of acoustic invariance by selective adaptation. Perception and Psychophysics, 1974, 16, 61-66. (a)
Ades, A.E. A bilateral component in speech perception. Journal of the Acoustical Society of America, 1974, 56, 610-616. (b)
Bailey, P. Perceptual adaptation for acoustical features in speech. Speech Perception, 1973, 1, 29-34.
Barton, D. Statistical significance in phonemic perception experiments. Journal of Child Language, 1975, 2, 297-298. (a)
Barton, D. The discrimination of minimally-different pairs of real words by children aged 2;3 to 2;11. Paper presented at the Third Child Language Symposium, London, 1975. (b)
Bell, C.G., Fujisaki, H., Heinz, J.M., Stevens, K.N., and House, A.S. Reduction of speech spectra by analysis by synthesis techniques. Journal of the Acoustical Society of America, 1961, 33, 1725-1736.
Blechner, M.J., Day, R.S., and Cutting, J.E. Processing two dimensions of nonspeech stimuli: The auditory-phonetic distinction reconsidered. Journal of Experimental Psychology: Human Perception and Performance, 1976, 2, 257-266.
Burdick, C.K., and Miller, J.D. Speech perception by the chinchilla: Discrimination of sustained /a/ and /i/. Journal of the Acoustical Society of America, 1975, 58, 415-427.
Cherry, E.C., and Taylor, W.K. Some further experiments upon the recognition of speech, with one and two ears. Journal of the Acoustical Society of America, 1954, 26, 554-559.
Chomsky, N., and Halle, M. The Sound Pattern of English. New York: Harper and Row, 1968.
Cole, R.A. Different memory functions for consonants and vowels. Cognitive Psychology, 1973, 4, 39-54.
Cooper, W.E. Adaptation of phonetic feature analyzers for place of articulation. Journal of the Acoustical Society of America, 1974, 56, 617-627. (a)
Cooper, W.E. Contingent feature analysis in speech perception. Perception and Psychophysics, 1974, 16, 201-204. (b)
Crowder, R.G. The sound of vowels and consonants in immediate memory. Journal of Verbal Learning and Verbal Behavior, 1971, 10, 587-659. (a)
Crowder, R.G. Waiting for the stimulus suffix: Decay, delay, rhythm and readout in immediate memory. Quarterly Journal of Experimental Psychology, 1971, 21, 324-340. (b)
Crowder, R.G., and Morton, J. Precategorical acoustic storage (PAS). Perception and Psychophysics, 1973, 11, 502-506.
Cutting, J.E. Two left-hemisphere mechanisms in speech perception. Perception and Psychophysics, 1974, 16, 601-612.
Cutting, J.E., and Rosner, B.S. Categories and boundaries in speech and music. Perception and Psychophysics, 1974, 16, 564-570.
Cutting, J.E., Rosner, B.S., and Foard, C.F. Perceptual categories for musiclike sounds: Implications for theories of speech perception. Quarterly Journal of Experimental Psychology, (in press).
Darwin, C.J., and Baddeley, A.D. Acoustic memory and the perception of speech. Cognitive Psychology, 1974, 6, 41-60.
Davidsen-Nielsen, N. English stops after initial /s/. English Studies, 1969, 50 (4), 321-339.
Davis, H., Silverman, S.R., and McAuliffe, D.R. Some observations on pitch and frequency. Journal of the Acoustical Society of America, 1951, 23, 40-42.
Drachman, G. Some strategies in the acquisition of phonology. Paper presented at the Urbana Conference on Phonology, Urbana, Illinois, April 1971.
Eimas, P.D., Cooper, W.E., and Corbit, J.D. Some properties of linguistic feature detectors. Perception and Psychophysics, 1973, 11, 247-252.
Eimas, P.D., and Corbit, J.D. Selective adaptation of linguistic feature detectors. Cognitive Psychology, 1973, 99-109.
Eimas, P.D., Siqueland, E.R., Jusczyk, P., and Vigorito, J.M. Speech perception in infants. Science, 1971, 171, 303-306.
Engel, W. von Raffler. An example of linguistic consciousness in the child. In C.A. Ferguson and D.I. Slobin (Eds.), Studies of Child Language Development. New York: Holt, Rinehart and Winston, 1973. Pp. 155-158.
Frishkopf, L.S., and Goldstein, M.H., Jr. Responses to acoustic stimuli from single units in the eighth nerve of the bullfrog. Journal of the Acoustical Society of America, 1963, 35, 1219-1228.
Fromkin, V.A., Krashen, S., Curtiss, S., Rigler, D., and Rigler, M. The development of language in Genie: A case of language acquisition beyond the Critical Period. Brain and Language, 1974, 1, 81-107.
Fry, D.B. Perception and recognition in speech. In M. Halle, H.G. Lunt, and C.H. van Schooneveld (Eds.), For Roman Jakobson. The Hague: Mouton, 1956. Pp. 169-173.
Garnica, O.K. The development of phonemic speech perception. In T.E. Moore (Ed.), Cognitive development and the acquisition of language. New York: Academic Press, 1973. Pp. 215-223.
Gilbert, J.H.V., and Climan, I. Proceedings of the Speech Communication Seminar, Stockholm: Almqvist and Wiksell, 1974. Pp. 321-329.
Graham, L.W., and House, A.S. Phonological oppositions in children: A perceptual study. Journal of the Acoustical Society of America, 1971, 49, 559-566.
Haggard, M.P. Abbreviation of consonants in English pre- and post-vocalic clusters. Journal of Phonetics, 1973, 1, 9-24.
Halle, M., and Stevens, K.N. Speech recognition: A model and a program for research. In J.A. Fodor and J.J. Katz (Eds.), The structure of language. Englewood Cliffs, New Jersey: Prentice-Hall, 1964. Pp. 604-612.
Hayek, F.A. Rules, perception, and intelligibility. Proc. Br. Acad., 1962, 48, 321-344.
House, A.S., Stevens, K.N., Sandel, T.T., and Arnold, J.B. On the learning of speechlike vocabularies. Journal of Verbal Learning and Verbal Behavior, 1962, 1, 133-143.
Huggins, A.W.F. Distortion of the temporal pattern of speech: Interruption and alternation. Journal of the Acoustical Society of America, 1964, 1055-1064.
Ingram, D. Phonological rules in young children. Journal of Child Language, 1974, 1, 49-64.
Jakobson, R. Child language, aphasia, and phonological universals. The Hague: Mouton, 1968.
Kiang, N.Y-S., Watanabe, T., Thomas, E.C., and Clark, L.F. Discharge patterns of single fibers in the cat's auditory nerve. M.I.T. Research Monographs, 1965, No. 35.
Kimura, D. Some effects of temporal-lobe damage on auditory perception. Canadian Journal of Psychology, 1961, 15, 156-165.
Kimura, D. Left-right differences in the perception of melodies. Quarterly Journal of Experimental Psychology, 1964, 16, 355-358.
Klatt, D. Voice onset-time, frication, and aspiration in word-initial consonant clusters. Q.P.R., Research Laboratory of Electronics, M.I.T., 1973, 109, 124-135.
Kornfeld, J.R. Theoretical issues in child phonology. Papers from the Seventh Regional Meeting, Chicago Linguistics Society, 1971, 454-468.
Kornfeld, J.R. Implications of studying reduced consonant clusters in normal and abnormal child speech. Paper presented at the Psychology of Language Conference, Stirling, Scotland, June 1976.
Kozhevnikov, V.A., and Chistovich, L.A. Speech: Articulation and Perception. Washington, D.C.: Joint Publications Research Service, 1965.
Kuhl, P.K., and Miller, J.D. Speech perception by the chinchilla: Voiced-voiceless distinction in alveolar plosive consonants. Science, 1975, 190, 69-72.
Kuhn, G.M. On the front cavity resonance and its possible role in speech perception. Journal of the Acoustical Society of America, 1975, 58, 428-433.
Lane, H.L. The motor theory of speech perception: A critical review. Psychological Review, 1965, 21, 275-309.
Lettvin, J.Y., Maturana, H.R., McCulloch, W.S., and Pitts, W.H.
What the frog's eye t e l l s the brai n . Proceedings of the I n s t i t u t e of  Radio Engineers, 1959, AZ, 1940-1951. Liberman, A.M. The grammars of speech and language. Cognitive Psy- chology, 1970, 1, 301-323. Liske r , L., and Abramson, A.S. The voi c i n g dimension: Some experiments i n comparative phonetics. Proceedings of the Sixth International  Congress of Phonetic Sciences, Prague: Academia, 1970, 563-567. Marler, P. On the o r i g i n of speech from animal sounds. In J.F. Kavanagh, and J.E. Cutting (Eds.), The r o l e of speech i n language. Cambridge, Massachusetts: M.I.T. Press, 1975. Pp. 12-37. Menyuk, P. D i r e c t i o n of language processing. Q.P.R., Research Laboratory  of E l e c t r o n i c s , M.I.T., 1971, 101, 182-188. 118 Menyuk, P. The r o l e of d i s t i n c t i v e features i n children's a c q u i s i t i o n of phonology. In C A . Ferguson, and D.I. Slob i n (Eds.), Studies  of c h i l d language development. New York: Holt, Rinehart and Winston, 1973. Pp. 44-52. Menyuk, P., and Anderson, S. Children's i d e n t i f i c a t i o n and reproduction of the speech sounds /w/, / r / , and / l / . Journal of Speech and  Hearing Research, 1969, JJL, 39-52. M i l l e r , G.A., and Nicely, P.E. An analysis of perceptual confusions among some English consonants. Journal of the Acou s t i c a l Society  of America, 1955, 21, 338-352. M i l l e r , J.D., Weir, C C , Pastore, R.E., K e l l y , W.J., and Dooling, R.T. Discrimination and l a b e l i n g of noise-buzz sequences with varying noise-lead times: An example of c a t e g o r i c a l perception. Journal  of the Acou s t i c a l Society of America, 1976, 60, 410-417. Morse, P.A. The disc r i m i n a t i o n of speech and nonspeech s t i m u l i i n early infancy. Journal of Experimental C h i l d Psychology, 1972, 14, 477-492. -Moskowitz, B.A. The a c q u i s i t i o n of f r i c a t i v e s : A study i n phonetics and phonology. Journal of Phonetics, 1975, 1, 141-150. 
Mostofsky, D.T/ (Ed.) Stimulus generalization. Stanford U n i v e r s i t y , Stanford, C a l i f o r n i a , 1965. Pastore, R.E., Ahroon, W.A., and Puleo, J.S. Hybrid s e r i a l - p a r a l l e l model f or acoustic and phonetic processing: A re-evaluation. Journal of the Acou s t i c a l Society of America, 1975,58, 584. (Abstract) P i s o n i , D.B. Auditory and phonetic memory codes i n the discrimination of consonants and vowels. Perception and Psychophysics, 1973, 11, 253-260. (a) Pi s o n i , D.B. The r o l e of auditory short-term memory i n vowel perception. Status Report on Speech Research, 1973, SR-34, 89-119, Haskins Laboratories. (b) Pi s o n i , D.B., and Tash, J . Reaction times to comparisons within and across phonetic categories. Perception and Psychophysics, 1974, l j l , 285-290. Sachs, R.M. Vowel i d e n t i f i c a t i o n and disc r i m i n a t i o n i n i s o l a t i o n vs. word context. Q.P.R., Research Laboratory of E l e c t r o n i c s , M.I.T., 1969, J2, 220-229. Saslow, M.G. Reaction time to consonant-vowel s y l l a b l e s i n ensembles of various s i z e s . Q.P.R., Research Laboratory of E l e c t r o n i c s , M.I.T., 1958, 5JL, 143-144. 119 Schouten, J.F., Ritsma, R.J., and Lopes Cardozo, B. P i t c h of the r e s i -due. Journal of the Ac o u s t i c a l Society of America, 1962, 34, 1418-1424. Shankweiler, D.,and Studdert-Kenedy, M. I d e n t i f i c a t i o n of consonants and vowels presented to l e f t and r i g h t ears. Quarterly Journal  of Experimental Psychology, 1967, 19, 59-63. Shvachkin, N.Kh. The development of phonemic speech perception i n early childhood. In CA. Ferguson, and D.I. Slobin (Eds.), Studies of c h i l d language development. New York: Holt, Rinehart and Winston, 1973. Pp. 91-127. Sinnott, J.M.,; Beecher, M.D., Moody, D.B., and Stebbins, W.C Speech sound d i s c r i m i n a t i o n by monkeys and humans. Journal of the  Ac o u s t i c a l Society of America, 1976, jjO_, 687-695. Smith, N.V. 
The a c q u i s i t i o n of phonology. Cambridge: Cambridge Uni-v e r s i t y Press, 1973. Stevens, K.N. Toward a model for speech recognition. Journal of the  Aco u s t i c a l Society of America, 1960, 32, 47-55. Stevens, K.N. The r o l e of rapid spectrum changes i n the production and perception of speech. In E. Fischer-Jorgensen (Ed.), Form and substance. Copenhagen: Akademisk Forlag, 1971. Pp. 95-101. Stevens, K.N. The quantal nature of speech: Evidence from a r t i c u l a -tory-acoustic data. In E.E. David, J r . , and P.B. Denes (Eds.), Human communication: A u n i f i e d view. New York: McGraw-Hill, 1972. Pp. 51-66. Stevens, K.N., and House, A.S. Speech perception. In J.V. Tobias (Ed.), Foundation of modern auditory theory. (Vol. 2). New York: Academic Press, 1972. Pp. 3-62. Stevens, K.N., and K l a t t , D.H. Role of formant t r a n s i t i o n s i n the voiced-voiceless d i s t i n c t i o n for stops. Journal of the Ac o u s t i c a l  Society of America, 1974, 5JL, 653-659. Strevens, P. Spectra of f r i c a t i v e s noise i n human speech. Language and  Speech, 1960, 1, 32-49. Studdert-Kennedy, M. Speech perception. In N.J. Lass (Ed.), Contem- porary issues i n experimental phonetics. New York: Academic Press, 1976. Pp. 243-293. Studdert-Kennedy, M., Liberman, A.M., Ha r r i s , K.S., and Cooper, F.S. Motor theory of speech perception: A reply to Lane's c r i t i c a l review. Psychological Review, 1970, 77, 234-249. r 120 Studdert-Kennedy, M., and Shankweiler, D.P. Hemispheric s p e c i a l i z a t i o n f o r speechperception. Journal of the Acou s t i c a l Society of  America, 1970, 4jL, 579-594. Tikofsky, R.S., and Mclnish, J.R. Consonant di s c r i m i n a t i o n by seven year olds: A p i l o t study. Psychonomic Science, 1968, 10, 61-62. (a) Tikofsky, R.S., and Mclnish, J.R. E f f e c t s of intrastimulus delay on speech and accuracy i n a forced choice speech di s c r i m i n a t i o n task. 
Report Number 20, October 1, 1968, University of Michigan, Grant Number 1 P01 HD 01368-04, National I n s t i t u t e for Child Health and Human Development. (b) Velten, H.V. The growth of phonemic and l e x i c a l patterns i n infant speech. Language, 1943, ljL, 281-292. Weir, R. Language i n the c r i b . The Hague: Mouton, 1962. Wolf, C.G. The perception of stop consonants by ch i l d r e n . Journal of  Experimental C h i l d Psychology, 1973, IJL, 318-331. Wollberg, Z., and Newman, J.D. Auditory cortex of s q u i r r e l monkey: Response patterns of s i n g l e c e l l s to s p e c i e s - s p e c i f i c v o c a l i -zations. Science, 1972, 175. 212-213. Wood, C C . P a r a l l e l processing of auditory and phonetic processing i n speech perception. Perception and Psychophysics, 1974, 15_, 501-508. Wood, C C Auditory and phonetic l e v e l s of processing i n speech per-ception: neurophysiolpgical and information-processing analysis. Journal of Experimental Psychology: Human Perception and Per- formance, 1975, 104, 1-33. 
