UBC Theses and Dissertations

Recognition of syntactic structure based on prosodic or segmental cues Wiley, Michelle Dawn 1998

RECOGNITION OF SYNTACTIC STRUCTURE BASED ON PROSODIC OR SEGMENTAL CUES

by

MICHELLE DAWN WILEY

B.Sc., The University of British Columbia, 1995

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF MEDICINE (School of Audiology and Speech Sciences)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA

April 1998

© Michelle Dawn Wiley, 1998

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

The University of British Columbia, Vancouver, Canada

ABSTRACT

The purpose of this study was to determine whether a listener is able to recognize sentential syntactic type on the basis of prosodic or segmental cues when the availability of the other type of cue is severely reduced. Through a process known as "spectral inversion" (Blesser, 1972), prosodic cues (temporal and waveform amplitude cues) were maintained while segmental cues were reduced. Two experiments were conducted in which participants listened to digitized recordings. Experiment 1 demonstrated that when speech was spectrally inverted, listeners were able to use syllabicity to some extent to identify words or word combinations from a closed set, but that overall word-recognition ability was severely reduced.
In the main experiment, Experiment 2, fifteen participants (aged 21 to 29 years) listened to three lists of sentences, each containing five exemplars of nine syntactic types of sentence that ranged in complexity. Each list was presented in a separate condition: Condition 1 consisted of unaltered sentences, Condition 2 of spectrally-inverted sentences, and Condition 3 of concatenated sentences with reduced prosodic cues. Participants indicated the type heard using a closed-set, forced-choice, nine-alternative response paradigm. Participants recognized the syntactic type with near-perfect accuracy in Conditions 1 and 3, but were less accurate in Condition 2. Examination of response errors, however, indicated that the poorer performance in Condition 2 was primarily attributable to a decreased ability to recognize only two of the syntactic types, and did not necessarily reflect overall poorer performance. Additionally, the error patterns in Condition 2 showed that the number of syllables in a sentence served as an important cue. It was concluded that recognition of syntactic type is possible with prosodic cues (temporal and waveform amplitude cues) even when segmental cues are severely reduced.
TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements

Chapter 1: LITERATURE REVIEW
1.1 Introduction
1.2 Speech Perception and Language Comprehension
1.3 Allocation of Working Memory Resources
1.4 Segmental Cues
1.5 Defining Prosodic Cues
1.6 Prosodic Support of Syntax
1.7 Syntactic Complexity
1.8 Caplan's Work
1.9 Hypotheses

Chapter 2: METHODS
2.1 Chapter Preview
2.2 Experiment 2: Purpose
2.3 Experiment 2: Participants
2.4 Experiment 2: Materials
2.4.1 Preparation of Stimuli for the Intact Condition
2.4.2 Preparation of Stimuli for the Prosodic Condition
2.4.3 Preparation of Stimuli for the Concatenated Condition
2.4.4 Preparation of Stimuli for the Practice Sentences
2.4.5 Ordering of Stimuli
2.4.6 Calibration of the Sound Level of the Stimuli
2.5 Experiment 2: Conditions of Presentation
2.6 Experiment 2: Experimental Task
2.7 Experiment 1: Issues
2.8 Experiment 1: Procedure
2.9 Experiment 1: Participants
2.10 Experiment 1: Materials
2.11 Experiment 1: Calibration
2.12 Experiment 1: Presentation of Materials

Chapter 3: RESULTS
3.1 Experiment 1: Word Identification
3.2 Experiment 2: Sentence Structure Identification
3.2.1 Effect of Removal of Spectral Cues on Reaction Time
3.2.2 Effect of Removal of Spectral Cues on the Accuracy of Sentence Type Recognition
3.2.3 Effect of Sentence Complexity on the Accuracy of Syntactic Type Recognition
3.2.4 Effect of Interaction of Condition and Sentence Type on Accuracy of Response
3.2.5 Error Patterns
3.2.6 Correlation of Working Memory with Syntactic Type Recognition
3.2.7 Summary

Chapter 4: DISCUSSION
4.1 Review of Hypotheses
4.2 Summary of Results
4.2.1 Experiment 1
4.2.2 Experiment 2
4.3 Conclusions: Experiment 1
4.3.1 Response Accuracy
4.3.2 Syllabic Recognition
4.4 Conclusions: Experiment 2
4.4.1 Reaction Time
4.4.2 Accuracy of Sentence Recognition
4.4.3 Syntactic Type Recognition
4.4.4 Effect of Condition and Syntactic Type on Accuracy of Response
4.4.5 Examination of the Prosodic Condition and Accuracy of Syntactic Type Recognition Based on the Number of Syllables
4.4.6 Working Memory in the Prosodic Condition
4.4.7 Hierarchy of Syntactic Types
4.5 Utility of Prosodic Cues
4.6 Future Directions

REFERENCES
APPENDIX A: Syntactic Trees of an Ambiguous Sentence
APPENDIX B: Experiment 2: Participants' Pure-Tone Thresholds
APPENDIX C: Experiment 2: Background Information on Participants
APPENDIX D: Experiment 2: Words for the Concatenation Condition
APPENDIX E: Sentences and Word Lengths for the Materials Used in Experiment 2 in the Concatenated Condition
APPENDIX F: Experiment 2: Order of Presentation of Sentences
APPENDIX G: Calibration Calculations
APPENDIX H: Set-up of Tucker Davis Technology Modules
APPENDIX I: Experiment 2: Instructions Presented to Participants
APPENDIX J: Experiment 1: Alphabetical List Given to Participants
APPENDIX K: Experiment 1: Order of Presentation of Individual Words
APPENDIX L: Mean and Standard Deviation of Reaction Times for Syntactic Types Across Condition

LIST OF TABLES

Table 1: Sentence Hierarchy According to Syntactic Complexity
Table 2: Comparison of Syntactic Theories
Table 3: F Ratios for Target Sentences in Each Condition
Table 4: Student-Newman-Keuls Test for Condition Differences
Table 5: Mean Correct Score for Each Syntactic Type for Each Condition
Table 6: Student-Newman-Keuls Test for Syntactic Type Differences
Table 7: Student-Newman-Keuls Test for Syntactic Type Differences by Condition
Table 8: Response Matrix for Experiment 2
Table 9: Syllabic Response Matrix for Experiment 2
Table 10: Correlation of Working Memory and Syntactic Type

LIST OF FIGURES

Figure 1: Spectrograms
Figure 1a: Wide Band Spectrograms
Figure 1b: Narrow Band Spectrograms
Figure 2: Experiment 1: Total Number of Correct Responses
Figure 3: Experiment 1: Total Number of Participants Correctly Responding to Each Token
Figure 4: Experiment 1: Condition 1: Raw Score of Correct Syllables Given
Figure 5: Experiment 1: Condition 2: Raw Score of Correct Syllables Given
Figure 6: Experiment 1: Condition 3: Raw Score of Correct Syllables Given
Figure 7: Experiment 1: Condition 1: Percent Correct of Number of Syllables Given
Figure 8: Experiment 1: Condition 2: Percent Correct of Number of Syllables Given
Figure 9: Experiment 1: Condition 3: Percent Correct of Number of Syllables Given
Figure 10: Mean Score Correct of Each Target Syntactic Type by Block
Figure 11: Experiment 2: Mean Response of all Participants for all Blocks by Condition
Figure 12: Experiment 2: Mean Number of Sentences Recognized for Each Syntactic Type

ACKNOWLEDGEMENTS

I would like to thank Kathy Pichora-Fuller for her support, patience and expertise throughout this project; and Jeff Small, Rushen Shi, and Andre-Pierre Benguerel for their knowledge and suggestions. I am very grateful to the faculty, staff, and participants from the School of Audiology and Speech Sciences for their encouragement, understanding and assistance in meeting my timeline. I would also like to thank the "lab gang" (Glynnis, Ruth, Val, Lisa, Hollis, Christiane, and Kristin), on whom I depended immensely. Finally, I would like to thank my husband Greg, my friends and my family for their love and support throughout this project. This research was supported by a grant from the Natural Sciences and Engineering Research Council of Canada to M.K. Pichora-Fuller.

Chapter 1: LITERATURE REVIEW

1.1 Introduction

In everyday life, most communication is spoken. The goal of a listener is to understand spoken language.
Part of understanding spoken language involves the recognition of sentential syntactic type, which is supported by the presence of normal prosody and/or segmental cues (Wingfield, Lombardi, & Sokol, 1984). A normal-hearing listener perceives a rich array of prosodic and segmental cues. This cue redundancy is not available to a hard-of-hearing listener; however, prosodic cues frequently remain available even when spectrally-dependent segmental cues do not. Diminished availability of cues typically affects the hard-of-hearing listener's ease of listening and overall comprehension (Erber, 1988). By determining the relative usefulness of segmental versus prosodic cues, it may be possible in the future to promote better utilization of the cues which remain available to hard-of-hearing people. This study is designed to evaluate the extent to which prosodic and segmental cues support the recognition of syntactic structure. The purpose of this study is to determine whether a listener is able to recognize sentential syntactic type on the basis of prosodic or segmental cues when the availability of the other type of cue is severely reduced. In the present chapter, relevant findings concerning speech perception and the comprehension of spoken language will be reviewed. Special attention will be given to the importance of prosodic cues in syntactic processing.

1.2 Speech Perception and Language Comprehension

Human ability to communicate using spoken language is complex. To fully understand spoken language, a listener must decipher the syntactic type of the utterance. The syntactic type of a sentence can be determined from reading printed words in text, or from auditory perception of lexical items accompanied by supportive prosodic structure (Wingfield, Lahar, & Stine, 1989).
Studies with normal-hearing listeners have shown that prosodic cues provide support for linguistic factors including word stress, syntactic structure and semantic interpretation (Cooper, 1983). For example, cues which correlate with syntactic aspects of processing include lengthening of clause- or sentence-final vowel(s), and rising or falling fundamental frequency at the ends of clauses or sentences. Additionally, prosodic information has been shown to aid listeners in determining whether specific lexical information is missing from an utterance (Cooper & Tye-Murray, 1985). The above cues may contribute to listeners' recognition of syntactic types. Studies on the detailed acoustic phonetics of prosodic cues are essential, but so are studies of prosodic cues in general; thus, the focus of this study is to evaluate listeners' general ability to utilize prosodic cues. The degree to which language abilities are innate has been debated (Kuhl, 1993; Jusczyk, 1985; Berk, 1993); however, all agree that experience promotes language development to some extent, even in early infancy. One developmental model argues that prosody supports the development of syntactic understanding (Morgan, 1996; Morgan & Demuth, 1996). This model of language development views prosodic cues as facilitating the acquisition of syntax: Morgan (1986, 1996) states that prosodic cues assist in the bracketing of certain syntactic constituents. Along the same line of research, Cassidy and Kelly (1991) suggest that an infant is able to use prosodic cues (pauses and stress) to determine the roles that nouns and verbs play within an utterance. Such a contrast is fundamental in syntactic processing. Empirical evidence (e.g., Nelson, Hirsh-Pasek, Jusczyk, & Cassidy, 1989) supports this view of syntactic acquisition.
In that study, infants benefited from prosodic cues because these cues helped them segment the speech stream into perceptual units that corresponded to clauses (Nelson et al., 1989). This suggests that language comprehension in infants is a top-down process: by knowing some of the initial syntactic constraints of a language through prosodic support, further deduction of syntactic rules, speech segmentation, and word learning are possible. Although the above models did not address the role of prosody in adult perception of syntax, it seems possible that similar processes continue into adulthood. Indeed, Cutler (e.g., Cutler & Norris, 1988; Cutler, 1989) found that in English-speaking adults, prosodic cues aid listeners' segmentation of an ongoing acoustic signal such that lexical access is facilitated. Other, more general, models of phonological representation tend to view speech as more hierarchical. These models (e.g., Fraser, 1992) hold that the phonological representation provides input to the lexical representation, which in turn provides input to the syntactic representation. The phonological representation of speech includes prosodic units at all levels: the syllable, the word, the phrase, and the utterance (Nespor & Vogel, 1983; Selkirk, 1978; Selkirk, 1996). There is at least some degree of matching between syntactic constituents and prosodic units. Given these phonological models, it is reasonable to propose that prosody supports syntactic recognition. Therefore, in the present study, adults' ability to utilize prosodic cues to recognize syntactic type will be evaluated.

1.3 Allocation of Working Memory Resources

In understanding how listeners process incoming acoustic signals as language, it is necessary to recognize that perception involves ongoing cognitive processing. Specifically, in decoding an acoustic signal a listener must integrate recently heard information with new information.
Working memory is required for this integration. According to a generalized working memory model (Carpenter, Miyake, & Just, 1994; Carpenter, Miyake, & Just, 1995; Daneman & Carpenter, 1980), there is one reservoir of resources for the processing and storage of information, and a trade-off is necessary when language processing requires more than the available resources. That is, a listener has a limited capacity for the maintenance, storage and processing of information. When a listener's capacity is saturated, resources can be reallocated, sometimes resulting in a compromise between maintenance, storage and processing functions. Pichora-Fuller, Schneider, and Daneman (1995) suggest that when listening is difficult, a balance must be struck between perception and storage: when more resources are allocated to perceiving what is heard, fewer resources remain for processing, storage, and maintenance. In one study examining the relationship between working memory capacity and language comprehension at the syntactic level of processing (King & Just, 1991), it was concluded that syntactically difficult sentences stress working memory, resulting in comprehension errors. It seems, then, that working memory resources can be allocated to perceptual processing in adverse listening conditions and/or to syntactic processing. In either case, stresses on processing can potentially exceed the constrained capacity of working memory resources, resulting in inefficient or inaccurate processing of information. Normal-hearing listeners have both segmental and prosodic cues available to them, whereas hard-of-hearing listeners have reduced cues available. The reduction of cues, or redundancies, available to a hard-of-hearing listener likely results in working memory resources being allocated differently than they are in normal-hearing listeners.
Put another way, the normal-hearing listener does not have to use as many working memory resources to understand spoken language: with fewer working memory resources allocated to lower-level perceptual processing of the acoustic signal, he or she is able to allocate more working memory to storage or higher levels of comprehension (Pichora-Fuller, 1997). Because the hard-of-hearing listener is always functioning with limited cues, working memory is constantly being stressed by perceptual processing and there is less working memory available to allocate to storage or comprehension (Pichora-Fuller et al., 1995). Therefore, interest in how listeners may rely on one set of cues over another (for example, prosodic cues over segmental cues) motivates an investigation of the extent to which it is possible to use impoverished acoustic cues to understand syntax. If a disadvantageous listening situation is created, effectively removing a variety of cues, is it still possible to recognize the syntactic type of a sentence?

1.4 Segmental Cues

Segmental cues involve primarily frequency detection and discrimination (Ohde, Haley, Vorperian, & McMahon, 1995). Depending on the configuration and severity of their hearing loss, hard-of-hearing listeners may be unable to detect some of the frequency information necessary to understand segmental cues. This is because many hard-of-hearing listeners have hearing loss in the high frequencies (above 1.5 kHz), and frequency-specific information in this range is necessary for consonant recognition. For example, [f], [s], [k], [θ], and [ʃ] all have energy concentrated primarily above 1.5 kHz and are all relatively low in intensity. Therefore, these sounds are often inaudible to hard-of-hearing listeners.
Several studies have examined the role of prosodic support in the resolution of syntactic ambiguity (e.g., Allbritton, McKoon, & Ratcliff, 1996; Speer, Kjelgaard, & Dobroth, 1996; Bradford, 1995; Price, Ostendorf, Shattuck-Hufnagel, & Fong, 1991). Studies of how segmental cues support the resolution of syntactic ambiguity, on the other hand, are lacking, but it is clear that in English there is a direct link between the segmental items in a sentence and the syntax (e.g., Chomsky, 1965). Moreover, if segmental items could not directly support syntax, then readers would be unable to decode syntactic structure.

1.5 Defining Prosodic Cues

Prosodic cues are defined as changes in the fundamental frequency (or pitch), duration (or temporal cues), and amplitude (or loudness) of the voice (Pell, 1996; Shattuck-Hufnagel & Turk, 1996; Tidball, 1995; Wingfield et al., 1989). These three main prosodic cues usually remain audible to hard-of-hearing listeners (Ginsberg & Thomas, 1994). When a hard-of-hearing person reports that he or she is able to "tell when someone is talking", or can "hear but has trouble understanding", it is typically the prosodic cues and partial segmental cues which are being detected. A hard-of-hearing person is also usually able to hear low-frequency information, including the fundamental frequency of the voice (Ginsberg & Thomas, 1994). Additionally, the temporal cues of the speech signal remain detectable in most cases: a typical hard-of-hearing listener does not have temporal resolution problems, to the extent that it remains possible for him or her to recognize when the pauses within and between words occur (Turner, Smith, Aldridge, & Stewart, 1997). Hard-of-hearing listeners can also usually discriminate changes in amplitude or loudness when signals are suprathreshold.
Most energy in speech is at or below 1.0 kHz, such that loudness changes in the overall speech signal are often detectable by the hard-of-hearing. In the present study, spectral inversion is used, whereby temporal and overall amplitude cues are maintained (see Figure 1a) while spectrally-dependent segmental cues are scrambled or reduced, thus affecting vowel and consonant spectral features. Specifically, in this procedure, acoustic materials were band-pass filtered (0.2-4 kHz, 48 dB/octave) and spectrally inverted (around 2.1 kHz). According to Blesser (1972), and as confirmed by other authors (e.g., Lehiste, 1980; Kreiman, 1982; Duez, 1985), spectral inversion preserves supra-segmental speech cues while removing most of the segmental ones. However, because the signal is spectrally inverted around 2.1 kHz, the fundamental frequency prior to spectral inversion is moved to a higher frequency and is therefore no longer the fundamental. In the judgement of some authors (e.g., Swerts & Geluykens, 1993), the removal of the fundamental frequency may mean that the related percept of pitch is absent from the spectrally-inverted sentences. However, others have argued that the fundamental frequency does not have to be physically present to be perceived (e.g., Yost & Nielsen, 1985, chapter 13): the fundamental frequency and pitch can be perceived even when only the harmonics of the fundamental are present. Importantly, in spectrally-inverted materials, speech stimuli are inverted such that the relative relationship between the harmonic energy changes little, although the frequencies at which the harmonics occur are shifted. As seen in Figure 1b, after spectral inversion, regularly distributed energy (which had been harmonics of the fundamental prior to spectral inversion) is still present in vowel segments of the signal, as is consonantal noise. It is also true that when listening to the spectrally-inverted signal, the speech does not sound natural.
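The band-pass-and-invert procedure described above can be sketched in a few lines of signal-processing code. The sketch below is a minimal illustration only, not the implementation used in this thesis: it assumes the classic ring-modulation approach to spectral inversion (multiplying by a carrier at twice the inversion frequency mirrors the pass-band about that frequency), and the filter design and order are arbitrary choices.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def spectrally_invert(signal, fs, centre_hz=2100.0, band=(200.0, 4000.0)):
    """Mirror the spectrum of `signal` about `centre_hz`.

    Band-pass the input, multiply by a cosine carrier at twice the
    inversion frequency (ring modulation maps a component at f to
    2*centre_hz - f), then band-pass again to discard the unwanted
    upper sideband.
    """
    sos = butter(8, band, btype="bandpass", fs=fs, output="sos")
    x = sosfiltfilt(sos, signal)
    t = np.arange(len(x)) / fs
    carrier = np.cos(2.0 * np.pi * (2.0 * centre_hz) * t)  # 4.2 kHz carrier
    return sosfiltfilt(sos, x * carrier)
```

For example, a 1 kHz tone processed this way re-emerges with its dominant energy near 3.2 kHz (2 x 2.1 kHz - 1 kHz), while temporal gaps and the overall amplitude envelope are untouched.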
This unnatural sound quality persists whether the fundamental is present or not. Apparently, the harmonic structure, in combination with higher amplitude in the high frequencies, results in a different pitch sensation following spectral inversion. Due to the unusual sound quality of the sentences, the experimenter was not confident that the associated change in pitch would have no impact on the listener; specifically, the concern was that prosodic cues associated with changes in pitch were altered. In this study, therefore, a more conservative view is taken: namely, it is certain that temporal and overall amplitude cues remain following spectral inversion, whereas pitch cues are at least partially compromised. It is also the view of the experimenter that the possible absence of useful pitch cues does not weaken the findings of the study: if listeners are able to select the syntactic type using only two of the three main prosodic cues, a more specific argument is obtained.

[Figures 1a and 1b: wide- and narrow-band spectrograms of the stimuli]

1.6 Prosodic Support of Syntax

A number of studies have examined the support of syntactic coding by prosodic cues. These studies have primarily focused on how prosody is used to disambiguate syntactically ambiguous sentences (e.g., Allbritton, McKoon, & Ratcliff, 1996; Bradford, 1995; Nicol, 1996; Price et al., 1991). A syntactically ambiguous surface structure is one which has two (or more) possible syntactic structures, or parsing options, at some point in the sentence (Chomsky, 1957). For example, in the sentence

(1) John knew the answer [was correct].

the phrase "the answer" could be processed as a sentence-final object NP (John knew the answer) or as the subject NP of a sentence complement (John knew the answer was correct). Initially, there are two possible syntactic interpretations of this sentence, and the ambiguity is resolved at a later point in the sentence (i.e., "was correct"; see Appendix A).
Additionally, speakers have been shown to alter the intonation or prosodic structure of a sentence depending on the intended meaning (Allbritton et al., 1996; Bradford, 1995), and listeners have been shown to use prosodic cues reliably to interpret a syntactically ambiguous sentence correctly (Price et al., 1991). This research highlights the importance of prosody in correctly interpreting syntactic information when it is ambiguous, and it points to the need to investigate prosodic support of non-ambiguous sentences. A study by Shannon, Zeng, and Kamath (1995) illustrated the importance of prosodic information to the recognition of speech in sentences. This study focused primarily on the temporal aspects of speech, although the amplitude envelopes were also preserved. Speech was first filtered into three bands (0-0.5 kHz, 0.5-1.5 kHz, and 1.5-4.0 kHz) and the amplitude envelope of each band was measured. The amplitude envelope of each band was then used to modulate a noise in the same frequency range, and the three amplitude-modulated noises were added together to produce a signal which maintained only the temporal (and amplitude) properties in each of these three broad frequency regions, with only rudimentary spectral shape information being transmitted. Participants were then asked to repeat as many words as possible from each sentence. Despite the reduced spectral content, participants were able to produce 90% of the words from a given sentence correctly with only the temporal (and amplitude) cues. These findings have strong implications for how prosodic cues support the understanding of spoken language: that participants were able to repeat back 90% of the words correctly with only two prosodic cues, timing and loudness fluctuations, suggests that prosodic cues play an important role in spoken language comprehension.
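The band-and-envelope manipulation described above is a noise vocoder. The sketch below is a rough illustration of the idea rather than Shannon et al.'s implementation: the band edges follow the three regions mentioned in the text (with an assumed 50 Hz lower edge, since a band-pass filter needs a non-zero corner), and a Hilbert envelope stands in for the rectify-and-low-pass envelope extraction of the original study.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(signal, fs, edges=(50.0, 500.0, 1500.0, 4000.0), seed=0):
    """Replace each band's fine structure with noise, keeping its envelope."""
    rng = np.random.default_rng(seed)
    out = np.zeros(len(signal))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, (lo, hi), btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, signal)
        envelope = np.abs(hilbert(band))       # band's amplitude envelope
        noise = sosfiltfilt(sos, rng.standard_normal(len(signal)))
        out += envelope * noise                # envelope-modulated band noise
    return out
```

The output carries the timing and loudness fluctuations of the original speech in three broad channels, while the spectral detail within each channel is reduced to noise.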
These findings are consistent with the views of Morgan (Morgan, 1996; Morgan & Demuth, 1996) on language acquisition, as mentioned above. Therefore, because prosodic cues can be viewed as a means for understanding spoken language, and because prosodic cues aid a listener in disambiguating syntactically ambiguous sentences, it is worthwhile to investigate whether prosodic cues can be used to recognize and select the correct syntactic type of a sentence.

1.7 Syntactic Complexity

Sentences may be syntactically complex on several different levels. In the present study, nine different syntactic types of sentences, ranging in complexity, were presented. By including a range of syntactic complexity, it was possible to determine whether participants performed similarly in recognizing different syntactic types of sentence using a limited set of cues. In considering syntactic complexity, it is generally agreed that there exists a hierarchy of complexity (Caplan & Evans, 1990; Dillon, 1995; Shapiro & Nagel, 1995; Smith, 1988), and that a listener's performance is differentially affected depending on the syntactic structure of a sentence, with more difficulty being observed for more complex than for simple sentences (Caplan, Baker, & Dehaut, 1985; Dillon, 1995). The views of Shapiro and Nagel (1995) draw attention to the importance of verbs and their lexical properties both in linguistic theory and in sentence processing. "Verbs interact with syntax and interpretation in interesting ways" (Shapiro & Nagel, 1995), and the organization of the lexicon has direct implications for syntactic constraints. The lexical representation of a verb contains four types of information: strict subcategorization, argument structure, thematic role information, and lexical-conceptual structure. Of these, only argument structure and thematic role information, and their impact on syntax, will be discussed here.
"Argument structure" is defined as specifying how many participants occur in the "action" which the verb describes (Shapiro & Nagel, 1995). Thus, a sentence can be viewed as the representation of relations between a predicate and its arguments. "Thematic information" is described as the "thematic role" each participant plays in the action described by the verb (Shapiro & Nagel, 1995). Each argument takes on a thematic role (e.g. agent, theme, goal, experiencer) and each verb selects a set of possible thematic roles assigned to its arguments. The listener must then determine the correct thematic role from the set of possibilities. Shapiro and Nagel (1995) also state that sentences which serve as arguments of a verb complicate the overall argument structure. Additionally, the sentence serving as an argument for a verb, which also contains a verb itself, takes on thematic information. In sentence (2) below, the portion of the sentence in brackets indicates what is an argument of the verb "know". (2) Joelle knows [that Dillon broke the glass]. According to Shapiro and Nagel (1995), a verb is defined as less complex if it selects fewer thematic roles, and it is considered more complex the more 18 thematic roles it selects. Using measures of reaction time, these authors found verbs with more thematic roles (e.g. 4) to be more difficult to process compared to verbs with fewer (e.g. 2) thematic roles (Shapiro & Nagel, 1995). Another perspective on syntactic complexity is that of Smith (1988). According to this author, syntactic complexity is determined by: "amount", "density", and "ambiguity". "Amount" refers to the specific number of linguistic units (words or morphemes) in a sentence. A sentence which is longer and contains more complex morphology is considered to be more complex. Morphemic material may be distributed homogeneously among the linguistic units in a sentence or it may be more compressed into a dense unit. 
"Density" involves the way in which linguistic material is distributed in a sentence. Primarily, this concept examines the embedded phrases within the syntactic phrase structure tree. A sentence is considered to be more complex the more non-terminal structures within a phrase (Smith, 1988). For example, a sentence (3) with the form NP followed by PP is simpler than sentence (4) with an embedded AdjP within an NP. (3) Sue bought a hat with her allowance. (4) Sue bought a brilliant burgundy hat with her allowance. In the above sentences, the more complex sentence (4) has more non-terminal node structures and also contains more morphemes. 19 "Ambiguity" is when there are alternative interpretations of the surface structure of a sentence. The greater the number of possible syntactic trees and possible interpretations, the more complex the sentence is considered to be. As described above in sentence (1), there are initially two possible syntactic interpretations of this sentence, therefore it is considered to be ambiguous. Smith's (1988) view of syntactic complexity is generally one limited by both perceptual (was the morpheme audible) and cognitive constraints (determining the syntactic tree). From this perspective, it is therefore possible to affect syntactic understanding by either impacting the perceptual or the cognitive processes involved. It is also possible to evaluate the relative cognitive contribution to syntactic understanding by degrading the perceptual/acoustic cues available. Therefore, from Smith's discussion of syntactic complexity, it seems warranted to evaluate what cues are important at the perceptual level in order to facilitate the cognitive processes needed to comprehend spoken language. A slightly different way to operationalize syntactic complexity is discussed by Caplan and Evans (1990). 
These authors list several features which may contribute to the syntactic complexity of a sentence, including: non-canonical (i.e. unexpected) word order, the number of arguments per verb, and the number of verbs per sentence. Caplan et al. (1985) describe these features as additive, and they state that the complexity of a sentence can be predicted from the number of features which occur in it. See Table 1 below.

When examining Caplan's definitions in comparison to those of Shapiro and Nagel (1995), similarities in the general definitions of syntactic complexity are apparent. For example, verbs with two agents take on more thematic roles. As Caplan's sentence complexity hierarchy increases, so does the number of possible options of thematic roles.

Sentence Complexity Hierarchy

i) Sentences with one verb

Two-place verb sentences:
(A)  Active          The frog followed the goose.
(P)  Passive         The dog was swallowed by the cat.
(CS) Cleft Subject   It was the cat that tackled the goose.
(CO) Cleft Object    It was the frog that the cat punched.

Three-place verb sentences:
(D)  Dative Active   The dog smacked the frog to the goose.
(DP) Dative Passive  The dog was smacked to the frog by the cat.

ii) Sentences with two verbs

(C)  Coordinated     The goose tickled the dog and tackled the cat.
(SO) Subject-Object  The frog that the goose followed smacked the dog.
(OS) Object-Subject  The cat punched the frog that swallowed the goose.

Table 1: Sentence Hierarchy According to Syntactic Complexity (Caplan et al., 1985).

In comparing Smith (1988) and Caplan (Caplan & Evans, 1990; Caplan et al., 1985), it is seen that a general pattern in the ordering of syntactic complexity is common. However, Caplan's views of syntactic complexity are specific to the types of sentences which he is considering, whereas Smith's (1988) definitions are more generalizable.
With respect to Smith's definition of amount, there are typically fewer morphemes in sentences with two-place verbs than in sentences with three-place verbs. The exception to this is syntactic type D, a three-place verb syntactic type which is similar in number of morphemes to the two-place verb syntactic types CS and CO. Additionally, fewer morphemes occur in most one-verb sentences than in the two-verb sentences, with the exception of one-verb syntactic type DP, which has more morphemes than any other of Caplan's syntactic types. Smith's definition of density is also consistent with Caplan's hierarchy of syntactic types. The density of Caplan's syntactic types increases in accordance with Caplan's hierarchy, with two-place one-verb syntactic types being less dense and two-verb syntactic types being more dense. Smith's category of ambiguity does not apply to Caplan's sentence hierarchy, since none of his nine syntactic types is ambiguous.

Table 2 below compares the main defining areas of syntactic complexity of Caplan (1985, 1990), Shapiro and Nagel (1995), and Smith (1988). It can be seen that, in general, the concepts on which syntactic complexity is based are similar within these three theories. Accordingly, similar decisions are reached regarding the hierarchy of sentences in terms of complexity. Therefore, as can be seen from the discussion above, Caplan's sentence hierarchy is in accordance with more general views of syntactic complexity. This supports the use of Caplan's syntactic types in evaluating syntactic comprehension.

Caplan (1985, 1990)     Shapiro & Nagel (1995)            Smith (1988)
Number of verbs         quantity of thematic information  amount and density
Expected word order     more common arguments             density

Table 2: Comparison of Syntactic Theories of Caplan (1985, 1990), Shapiro & Nagel (1995), and Smith (1988).
1.8 Caplan's Work

The work of Caplan and his colleagues has been primarily motivated by research with the aphasic population and the difficulties which these individuals have in comprehending language, particularly at the syntactic level. Caplan has sought to develop an explanation of how sentences of varying syntactic complexity are comprehended and interpreted. Caplan's work was initially motivated by his interest in how aphasics process syntactically complex sentences in the Token Test (DeRenzi & Vignolo, 1962). He found this work to be limited because it failed to investigate a range of syntactic types differing in complexity. Consequently, Caplan developed a set of sentences including nine types of syntactic complexity. Through empirical research on aphasic populations, he then developed predictions of complexity, or a hierarchy of syntactic complexity. That is, by trying to improve upon the Token Test (DeRenzi & Vignolo, 1962), a set of sentences composed of nine syntactic types was developed and studied extensively in the aphasic population. Aphasic participants acted out the thematic roles of the nouns and verb(s) in the sentence by manipulating toy animals whose names were used in the sentence (referred to as the object manipulation test (OMT)). Results from several different studies (Caplan et al., 1985; Caplan & Futter, 1986; Caplan, 1987) indicate that there is a consistent hierarchy among the nine syntactic types. Specifically: 1) in each study, the same syntactic types produced the best mean correct scores, except for variation in the relative ordering of scores for sentence type C in comparison to syntactic types CO, DP, SO, and OS; 2) sentences with more frequently occurring word order were easier than sentences with less frequently occurring word order (i.e.
A versus P; CS versus CO; D versus DP, and C; and OS versus SO); 3) verb argument structure affected correct interpretation: when matched for word order, syntactic type D was harder than syntactic type A, and syntactic type DP was harder than syntactic type P; 4) two-verb sentences were more difficult than one-verb sentences. Results of Caplan and Evans (1990) indicated that, for non-aphasics, the above-stated results do not show the same quantitative effect of syntactic structure on sentence comprehension. Aphasics understand differentially depending on syntactic type, whereas non-aphasics' understanding depends far less on syntactic type. From the perspective of theories concerning the allocation of working memory resources (Carpenter et al., 1994; Carpenter et al., 1995), it is possible that aphasics have fewer working memory resources available than normals, and that for this reason they are less able to process syntactically complex sentences. Therefore, other studies have tried to mimic in non-aphasics the results seen in aphasics by essentially stressing the perceptual system. Some studies have visually presented sentential stimuli which have been time compressed (Miyake, Carpenter, & Just, 1994), and others have presented the sentential stimuli in the presence of competing noise (Kilborn, 1991; Dillon, 1995). Overall, it seems that it is possible to mimic some of the aphasic errors in the non-aphasic population, but further investigation into the bases for the errors is still needed.

1.9 Hypotheses

In the present chapter the following were reviewed: speech perception and comprehension of spoken language, working memory, segmental and prosodic cues, syntactic understanding, and syntactic complexity. Prosodic support of syntax was also discussed.
It has generally been stated that there are models in which prosody provides cues which can be used to support syntactic comprehension, and research has shown that prosody helps in disambiguating syntactic meaning as well as more generally supporting the understanding of spoken language. Additionally, there is thought to be a reservoir of working memory resources available for processing spoken language (including both perception and comprehension). When perceptual or language processing demands are high, more working memory resources are required, resulting in a trade-off between the allocation of resources to perception, comprehension, and storage. Also examined was the nature of the reduction of perceptual cues available to hard-of-hearing listeners. That some cues (temporal and waveform amplitude) are more readily available to them than others warrants investigation into their utility. Therefore, by examining how listeners use prosodic cues when spectral cues are removed, it may be possible in the future to promote better utilization of the cues which remain available to hard-of-hearing people. Such studies would also further our general understanding of how listeners understand spoken language. In the present study, support of syntactic recognition in normal-hearing listeners under conditions of reduced prosodic and segmental cues was examined. Stated below are the null hypotheses which were tested. Predictions regarding these hypotheses are presented and justified.

Null Hypothesis 1: Participants will repeat spectrally-inverted words as accurately as non-spectrally-inverted words.

Predictions for Null Hypothesis 1: This null hypothesis will be refuted. Participants will perform less accurately at repeating spectrally-inverted words compared to non-spectrally-inverted words, because recognition of lexical items depends on spectrally-dependent segmental cues, and these cues are reduced by spectral inversion.
Furthermore, in the open-set paradigm, there is little possibility of guessing correctly using reduced cues.

Null Hypothesis 2: Participants will select the syntactic type of spectrally-inverted sentences with the same degree of accuracy as for non-altered sentences and sentences containing primarily segmental cues.

Predictions for Null Hypothesis 2: This null hypothesis will be supported, and it will be shown that participants perform similarly when selecting the syntactic type of sentence in all three conditions. It is predicted that this null will be supported because, from reviewing the literature, it seems that it will be possible to recognize syntactic structures on the basis of either primarily prosodic or primarily spectral cues.

Null Hypothesis 3: Participants will perform equally well in recognizing each syntactic type.

Predictions for Null Hypothesis 3: This null hypothesis will be supported. If prosodic cues are sufficient to enable recognition of one syntactic type, then they will be sufficient for the recognition of all syntactic types. As can be seen in the review of the literature, prosodic cues are claimed to support recognition of syntactic type. Note that the absence of a main effect of condition on response accuracy (Hypothesis 2) and the absence of a main effect of sentence type (Hypothesis 3) imply that there will also be no interaction effect.

Null Hypothesis 4: Participant reaction time will not vary with sentential syntactic type.

Predictions for Null Hypothesis 4: This prediction will be supported, and it will be shown that syntactically more complex sentences do not require longer reaction times than syntactically less complex sentences. The literature has shown that syntactically more complex sentences require more effort to understand than less complex sentences and therefore take longer to process, but given the off-line nature of the present experimental task, these differences will not be observed.
Null Hypothesis 5: Working memory will not correlate with recognition of syntactic type.

Predictions for Null Hypothesis 5: This null hypothesis will be refuted, and there will be a significant correlation between participants' working memory span and the accuracy with which they recognize syntactic type. Specifically, consistent with the literature, participants with larger working memory spans will have more resources to allocate to decoding sentences with reduced cues and will be able to recognize the syntactic type with a higher degree of accuracy than participants with smaller working memory spans.

2. METHODS

2.1 Chapter Preview

The purpose of this study was to determine if an individual is able to recognize sentential syntactic type when prosodic cues are severely reduced but segmental cues remain, and vice versa. While designing the main experiment (hereafter called Experiment 2), it was decided that an additional initial experiment was needed (hereafter called Experiment 1). The purpose of Experiment 1 was to validate assumptions underlying Experiment 2. Experiment 2 is the main study, and thus it will be described first. Following the description of the methods for Experiment 2, the methods for Experiment 1 will be described.

2.2 Experiment 2: Purpose

The purpose of Experiment 2 was to determine if participants were able to recognize the sentential syntactic type in a 9-alternative forced-choice task when: (1) sentences were intact; (2) segmental cues were essentially eliminated; and (3) sentential prosodic cues were essentially eliminated.

2.3 Experiment 2: Participants

Fifteen subjects participated in this experiment. All were female, native English speakers between the ages of 21 and 29. All participants had pure-tone air-conduction thresholds within normal limits bilaterally (see Appendix B for details).
Participants had basic knowledge of linguistics, ensuring that concepts pertaining to the syntactic, lexical, and prosodic features of a sentence would be familiar. Additional information collected on participants included: reading working memory span (as developed by Daneman & Carpenter, 1980), babble threshold, and Mill Hill vocabulary score (Raven, 1938). See Appendix C for details of participant characteristics.

2.4 Experiment 2: Materials

Three different lists of 45 sentences were employed. The lists were devised by Dillon (1995) based on Caplan et al. (1985). Each list contained five exemplars of nine syntactic types of sentences: active (A), passive (P), cleft subject (CS), cleft object (CO), dative active (D), dative passive (DP), coordinated (C), subject-object relative (SO), and object-subject relative (OS). Examples of these syntactic types are:

1) Active          The frog tickled the goose.
2) Passive         The dog was followed by the cat.
3) Cleft Subject   It was the cat that tackled the goose.
4) Cleft Object    It was the frog that the cat punched.
5) Dative Active   The dog smacked the frog to the goose.
6) Dative Passive  The dog was smacked to the frog by the cat.
7) Coordinated     The goose tickled the dog and tackled the cat.
8) Subject-Object  The frog that the goose followed smacked the dog.
9) Object-Subject  The cat punched the frog that swallowed the goose.

In the first experimental condition (hereafter called the intact condition), participants were presented with an unaltered list of sentences. In the second experimental condition (hereafter called the prosodic condition), participants were presented with sentences which had been spectrally inverted, so that spectrally-dependent segmental cues were largely removed and only prosodic cues remained. (Issues surrounding spectral inversion are discussed in section 2.4.2 below.)
In the third condition (hereafter called the concatenated condition), participants were presented with sentences formed from words that had been spoken and recorded in isolation, then edited and assembled by the experimenter to form the sentences. This process resulted in sentences in which sentential prosodic cues were effectively removed but spectrally-dependent segmental cues remained. (Details of the concatenating procedure are discussed in section 2.4.3 below.)

2.4.1 Preparation of Stimuli for the Intact Condition

Stimuli for all conditions were played in Computer Speech Research Environment 4.2 (CSRE) (1995). Therefore, all stimuli had to be stored in CSRE format. The stimuli for the intact condition (Dillon's 1995 sentences) were stereo soundfiles already in CSRE format, with the sentences on one channel and multi-talker babble on the other. In this experiment, only the sentence channel was presented to the participants (see section 2.5 on Conditions of Presentation below).

2.4.2 Preparation of Stimuli for the Prosodic Condition

Dillon's 1995 sentences were re-recorded from their original NeXT format in Sound Works 3.0, v. 2 (a sound-recording/playing program) onto one channel of an audio cassette. These materials were then band-pass filtered (0.2-4 kHz, 48 dB/octave) and spectrally inverted (around approximately 2.1 kHz) using in-house customized hardware (Benguerel, under review). The spectral-inversion technique has been described by Blesser (1972). It is known that the long-term spectrum of speech has a peak at approximately 1 kHz and falls off above this frequency with a slope of about 12 dB/octave. In order to avoid the presence of too much energy around 3 kHz after spectral inversion, spectral pre-emphasis of approximately 24 dB/octave between 0.5 and 3.5 kHz was applied before spectral inversion. The result of this spectral-inversion process can be seen in Figure 1a.
This figure illustrates that, as discussed in Chapter 1, the spectral-inversion process maintains the temporal cues and waveform shape. The spectrally-inverted sentences were then recorded onto a second audio cassette. The sentences were re-recorded from this cassette into Sound Works 3.0 in a mono soundfile at a sampling rate of 32 kHz. All Sound Works files were then converted from a sampling rate of 32 kHz to 20 kHz. Using the NeXT program "Garbage In Sound Out" (GISO), all sentences were converted from Sound Works to binary format. The binary files were then copied to an IBM-clone PC and imported into CSRE, from which they were played for participants.

2.4.3 Preparation of the Stimuli for the Concatenated Condition

As mentioned earlier, the concatenated condition used sentences formed from words that had been spoken and recorded in isolation. The individually-spoken words for the concatenated condition were recorded live, directly into Sound Works 3.0 at a sampling rate of 32 kHz, using a female speaker (the same person who spoke the sentences used by Dillon, 1995). See Appendix D for a list of all the individual words read. During recording, the speaker sat in a sound-attenuating, double-walled IAC booth. The speaker read three different orderings of the list of all target words. All words were recorded using a Sennheiser K3U model microphone positioned approximately six inches from the speaker's mouth. These stimuli were recorded in mono, through a Proport (model 656) stereo-audio DSP port interface, at a sampling rate of 32 kHz. Once recorded, the amplitude of the soundfile was increased using the "gain" parameter in Sound Works: a waveform amplitude gain of 200% was applied three times. This resulted in a soundfile containing waveforms of an amplitude that made calibration possible.
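The spectral inversion described in section 2.4.2 was performed with analog hardware, but the underlying operation can be sketched digitally. The following is only an illustrative approximation of the general technique (ring modulation by a carrier at twice the inversion frequency, which mirrors the spectrum about that frequency once the sum band is filtered away): the function name and parameters are hypothetical, and the band-pass filtering and pre-emphasis stages are omitted.

```python
import math

def spectrally_invert(samples, fs, fc=2100.0):
    """Mirror the spectrum of `samples` about fc (Hz), approximately.

    Ring-modulating by a carrier at 2*fc maps a component at frequency f
    to components at (2*fc - f) and (2*fc + f); low-pass filtering below
    2*fc (omitted in this sketch) then leaves only the mirrored band.
    """
    return [x * math.cos(2.0 * math.pi * 2.0 * fc * n / fs)
            for n, x in enumerate(samples)]
```

For a signal band-limited to 0.2-4 kHz, as in section 2.4.2, a 1 kHz component would reappear near 2 x 2.1 - 1 = 3.2 kHz, while the temporal envelope of the waveform is left largely intact, consistent with Figure 1a.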
The experimenter listened to and visually inspected the waveforms for the three tokens of each target word and selected what she considered to be the best token. The best token of each target word was copied into its own soundfile. These words were then edited and concatenated to form sentences. Initially, the words in isolation were spoken at a slow rate of speech, such that each word was two to three times longer than its counterpart recorded in the Dillon (1995) sentences. Hence the words spoken in isolation were re-recorded at a more rapid rate of speech. However, even though the words were spoken at a much faster rate, the average length of each word and of each of the concatenated sentences assembled from them was still slightly longer than the length of the original words and sentences (see Appendix E). During concatenation, 50 msec of silence was inserted after each word to ensure that participants would be able to discriminate lexical items and to reduce sentential temporal cues. (In natural, running speech, the inter-word silences are a potential prosodic cue. To ensure that this cue was absent, the same pause was inserted between all words.) The fact that 50 msec of silence was inserted after each word contributes to the concatenated sentences being longer than the corresponding naturally-spoken sentences.

2.4.4 Preparation of Stimuli for the Practice Sentences

A set of 18 practice sentences, two repetitions of one exemplar of each of the nine different syntactic types, was recorded at the same time and using the same method as described in section 2.4.3 above.

2.4.5 Ordering the Stimuli

Each list in each of the three conditions was ordered in a blocked fashion. Specifically, there were five exemplars of each of the nine types of sentences, and the lists were ordered so that in block one, each of the nine types of sentences was heard.
In block two, one of each of the nine types of sentences was also presented, and so on for the remaining blocks. Thus, five blocks were created, each containing one of each of the nine types of sentences. Within each of the five blocks, the order of the sentences was random, as determined using a random numbers table. This was done so that if the first block showed a noteworthy learning effect, this block could optionally be excluded from the statistical analysis or be analyzed separately (see Appendix F).

2.4.6 Calibrating the Sound Level of the Stimuli

Each set of stimuli was calibrated by first using an in-house program (rms_spch.exe, designed by Kim Yue, Erindale College, University of Toronto) to find the rms voltage of each soundfile. The average rms voltage of the soundfiles used in each condition in each experiment was then calculated. Next, following the preferred method (Wilber, 1994, p. 83), a 1-kHz calibration tone was played through the entire system of equipment, set up as if an experiment was being run. A 1-kHz tone is used because the intensity of a 1-kHz tone is a good representation of the peak intensity of speech. The 1-kHz calibration tone was first played through an experiment in CSRE in 'ecoscon' through the Tucker Davis Technology modules, set up exactly as during the experiment but without any attenuation. Then the 1-kHz calibration tone was routed through the jack panel of the sound booth to the right headphone of the TDH-39 headphones. The intensity of the 1-kHz calibration tone was then measured using a sound level meter and was found to be 91.9 dBA. To ensure that there was no peak clipping of the 1-kHz calibration tone, it was next attenuated by 20 dB and played in the same manner as stated above. Following this attenuation, the 1-kHz tone was found to be 71.8 dBA, consistent with a lack of peak clipping.
The 1-kHz calibration tone was then run through the same in-house program to find its rms voltage. The difference between the rms voltage of the 1-kHz calibration tone and the average rms voltage of the soundfiles to be used in each condition was then calculated and expressed in decibels using the formula 20 log (voltage A / voltage B) (see Appendix G for the calculations of the decibel differences and the average rms voltages for each condition). For example, in the intact condition, the sentences were found to be on average 14.30 dB below the calibration tone. Since the calibration tone was 91.9 dBSPL, the level of the intact sentences was on average 91.9 dBSPL - 14.30 dB = 77.60 dBSPL. The average level of conversational speech is 70 dBSPL (equivalent to 50 dBHL; Davis, 1947), and this is the level at which the stimuli were presented to the participants. Therefore, the difference between 70 dBSPL and the average level of the speech stimuli in each condition was calculated, and the corresponding amount of attenuation was entered into the CSRE experiment generator program, ecosgen. For example, the amount of attenuation required for the intact condition was 77.60 dBSPL - 70 dBSPL = 7.60 dB. See Appendix G for the attenuation values used for each condition in each experiment.

2.5 Experiment 2: Conditions of Presentation

All stimuli were presented monaurally to the participant's right ear. As stated above, the speech stimuli were presented at 70 dBSPL [equivalent to 50 dBHL (Wilber, 1994, p. 83)], which is approximately normal conversational level. All stimuli were presented in quiet. All experimental sessions took place with the participant seated in a sound-attenuating, double-walled IAC booth. The participant faced a computer screen on which the numbers "1" through "9" were presented vertically. An example of the target syntactic type was posted beside each number. The experimenter controlled the experiment from outside the sound booth.
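The calibration arithmetic of section 2.4.6 can be sketched as follows. This is an illustrative reconstruction only, not the in-house rms_spch.exe program; the function names are hypothetical, and the numbers in the usage note are the intact-condition figures reported above.

```python
import math

def db_below_tone(tone_rms_v, stimulus_rms_v):
    """Level of the stimuli relative to the calibration tone, in dB,
    using 20 * log10(voltage A / voltage B)."""
    return 20.0 * math.log10(tone_rms_v / stimulus_rms_v)

def attenuation_db(tone_level_db_spl, db_below, target_db_spl=70.0):
    """Attenuation to enter into the experiment generator so that the
    stimuli play at the target presentation level (70 dBSPL here)."""
    stimulus_level = tone_level_db_spl - db_below  # e.g. 91.9 - 14.30 = 77.60
    return stimulus_level - target_db_spl          # e.g. 77.60 - 70 = 7.60
```

With a 91.9 dBSPL tone and stimuli averaging 14.30 dB below it, attenuation_db(91.9, 14.30) reproduces the 7.60 dB entered into ecosgen for the intact condition.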
The digital recordings of the talker's voice were played from the CSRE experimental control program, "ecoscon", to the participant via the Tucker Davis Technologies D/A and attenuator modules (see Appendix H for the set-up of the Tucker Davis Technologies modules). Participants listened through Madsen TDH 39P 10W headphones. All participants underwent the experimental conditions in the same order. As discussed earlier, the intact condition was presented first, followed by the prosodic condition, then the concatenated condition. It was decided that the intact condition should be presented first to ensure that participants were comfortable with the task before completing the more challenging conditions in which cues were reduced. The specific sentences and the order in which they were presented were the same for all participants. Note that Dillon (1995) demonstrated that all sentence lists were equivalent.

2.6 Experiment 2: Experimental Task

Prior to beginning Experiment 2, all participants completed practice trials with 18 practice sentences to ensure that they understood the task before beginning the experimental conditions. For both the practice and experimental conditions, participants were instructed to "select the number beside the sentence on the computer screen that best matched the sentence type which they heard" (see Appendix I for a copy of the instructions given to participants). Participants were instructed to base their decision on the type of sentence heard, regardless of the particular words in that sentence. Participants responded by selecting one of the nine choices using the computer mouse. To ensure that participants responded to all sentences, they were instructed to guess if unsure of the sentence type and told that the next sentence would not be presented until they had done so. No time restrictions were placed on participants. The participant's response was recorded by the same PC which presented the sentential stimuli.
An output file was created, including target, response, and reaction time. Each of the three conditions in this experiment, plus the practice condition, was recorded in a separate text file (.txt) on the computer.

2.7 Experiment 1: Issues

While preparing the spectrally-inverted stimuli, the experimenter noted that, with experience, it was possible to recognize the lexical content of the spectrally-inverted sentences with a surprising degree of accuracy. This observation raised a question regarding the degree to which segmental cues might still be present, even after spectral inversion. The presence of segmental cues would contaminate the condition intended to examine the unique contribution of prosodic cues to syntactic processing. It was necessary to determine if such cues were present in the spectrally-inverted speech signals, or if the experimenter's ability to recognize sentence content was based rather on her knowledge of the materials and top-down processing. Thus, it was decided to conduct a preliminary experiment (Experiment 1) to determine if the segmental cues had, in fact, been removed, or if enough remained to enable participants to identify content.

2.8 Experiment 1: Procedure

Participants for Experiment 1 attended one session. Each participant experienced a total of three conditions, two with spectrally-inverted words and one with non-inverted words. In the first condition, participants were presented with words which had been spectrally inverted. In the second condition, participants heard a set of unaltered words, to ensure that they were able to accurately identify the stimuli when all normal acoustic cues were present. In the third condition, the spectrally-inverted words were presented again; this time, however, participants were given a printed, alphabetical list of all the words they would hear (see Appendix J for a copy of this list).
The spectrally-inverted words were presented twice to evaluate the potential benefits of supports and experience with these altered materials. The first time, the spectrally-inverted words were presented before the non-spectrally-inverted words, to assess the participants' ability to recognize the stimuli in the absence of any supports or experience. The second time, they were presented after the non-spectrally-inverted words and with a change to a closed-set format, so that the participants' ability to recognize the stimuli with supports and experience could be assessed. The second time participants heard the spectrally-inverted words, they were allowed to scan the list before responding. All participants heard the three experimental conditions in the same order. The words were presented in a different random order across conditions. In each condition, participants listened to and repeated aloud the words. Participants were told that if unsure they were required to guess, and that the next word would not be presented until they had done so. Participants' responses were recorded in writing on a score sheet by the experimenter on-line as the participant responded (see Appendix J).

2.9 Experiment 1: Participants

No participant in Experiment 1 participated in Experiment 2. There were 10 participants in Experiment 1: two male and eight female. All were native English speakers between 23 and 37 years of age. All had hearing within normal limits bilaterally.

2.10 Experiment 1: Materials

The materials for this experiment consisted of words or word combinations excised from sentence soundfiles similar to those used in the intact and prosodic conditions in Experiment 2. Because of coarticulation, it was sometimes difficult to excise individual words from intact sentences such that they remained clearly intelligible; in these cases, the words were also excised in combination with adjacent words.
For example, in the sentence "The fox was passed to the owl by the pig.", the words "was" and "passed" and the word combination "was passed" were excised from the sentence (see Appendix K for a list of the target words and word combinations). The same units excised from the intact sentences were also excised from the spectrally-inverted sentences. These excised individual words or word combinations were copied into their own soundfiles.

2.11 Experiment 1: Calibration

The calibration procedure was the same as for Experiment 2; see section 2.4.6 above.

2.12 Experiment 1: Presentation of Materials

In Experiment 1 the set-up was the same as for Experiment 2, except that the computer screen was not used.

3. RESULTS

3.1 Experiment 1: Word Recognition

In this experiment, participants could correctly identify the target word, or they could incorrectly identify it but give a response with the correct number of syllables. The number of words correct and the number of syllables in the response were measured. In all three conditions in Experiment 1, participants were asked to repeat back what they heard. The three conditions in the experiment were: condition 1, spectrally-inverted words; condition 2, non-spectrally-altered words; condition 3, spectrally-inverted words with the aid of a list of all the possible responses.

[Figure 2. Experiment 1: Total Number of Correct Responses, by participant, for conditions 1, 2, and 3 (maximum score 34).]

As can be seen in Figure 2, for all subjects, the number of words correct, out of a maximum of 34, was far better in condition 2, the non-altered word condition, than in conditions 1 or 3. Furthermore, compared to condition 1, participants generally did better in condition 3, the spectrally-inverted word condition presented after practice and in a closed-set response format.
It was originally thought that, with the benefit of practice and a closed-set response format, the participants might get one or two more correct answers, but the finding that two participants got 10 out of 34 possible items correct was surprising. Nevertheless, all participants still performed much worse than in condition 2. To further investigate performance in these conditions, an item-by-item analysis was conducted. Figure 3 illustrates the total number of participants who got each of the 34 items correct. Again, participants did best in condition 2. The target words or word combinations in condition 3 which participants tended to get correct were those with multiple syllables (e.g. "by the mouse", "that passed", "to the owl"). Further examination of syllabicity yielded some interesting patterns. When the number of responses in which the correct number of syllables was given was evaluated, participants generally were seen to be able to recognize the correct number of syllables in all three conditions. That is, when the word-correct score, for which the participant had to give the exact lexical item, was compared to the number of syllables correctly stated, participants did much better in recognizing the number of syllables. Even though the word-correct score was poor in conditions 1 and 3, the score on syllables correct was high. In evaluating the number of syllables correct, and only for this portion of the analysis, participants were not penalized for getting the lexical item wrong.
So, if a participant gave the correct number of syllables, regardless of whether or not the lexical item matched, they were scored as correct for the syllabic portion of the analysis.

Figure 3 Experiment 1: Total Number of Participants Correctly Responding to Each Token (conditions 1 to 3; the 34 tokens were: a, and, bumped, by, by the mouse, chased, duck, fox, grabbed, hauled, hugged, it, it was, kicked, kissed, mouse, owl, passed, pig, pulled, pushed, scratched, tapped, that, that passed, the, the pig, to, to the owl, tossed, touched, tripped, was, was passed)

Figures 4 to 6 show the number of tokens for which the correct number of syllables was given in each of the three conditions for all 10 participants.
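The two scoring criteria just described, exact word-correct versus syllables-correct, can be sketched in a few lines. This is an illustrative reconstruction, not the scoring procedure actually used in the study, and the example trial is hypothetical:

```python
def score_trial(target, target_syllables, response, response_syllables):
    """Score one repetition trial under both criteria.

    word_correct requires the exact lexical item; syllables_correct
    ignores lexical identity and compares only the syllable counts.
    """
    word_correct = (response == target)
    syllables_correct = (response_syllables == target_syllables)
    return word_correct, syllables_correct

# A participant hears "was passed" (2 syllables) but repeats "the duck"
# (also 2 syllables): wrong word, but the right number of syllables.
print(score_trial("was passed", 2, "the duck", 2))  # -> (False, True)
```

Under this scheme a trial can count toward the syllabic analysis even when it counts as an error in the word-correct analysis, which is exactly the dissociation reported above.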
In these figures the raw score of each participant is given for the 1-, 2-, and 3-syllable words. The maximum possible score for the 1-, 2-, and 3-syllable words is constant across all three conditions: 1-syllable words = 27, 2-syllable words = 5, and 3-syllable words = 2. These maximum values are indicated on each of the three figures (see Figures 4 to 6).

Figure 4 Experiment 1: Condition 1: Raw Score of Correct Syllables Given (participants SS1 through SS10; maxima: 1-syllable = 27, 2-syllable = 5, 3-syllable = 2)

Figure 5 Experiment 1: Condition 2: Raw Score of Correct Syllables Given (participants SS1 through SS10; maxima as in Figure 4)

Figure 6 Experiment 1: Condition 3: Raw Score of Correct Syllables Given (participants SS1 through SS10; maxima as in Figure 4)

Since the number of items varied widely between the 1-, 2-, and 3-syllable target word categories, percentages for each participant were calculated. Figures 7 to 9 show the percentages correct for the 1-, 2-, and 3-syllable words in each of the three conditions, respectively, for all 10 participants. That many participants were able to give a response with the correct number of syllables is consistent with the claim that spectral inversion maintains temporal/amplitude cues.
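Because the syllable categories differ so much in size, the raw syllable-correct scores are converted to percentages before comparison. The normalization can be sketched as follows (the example participant's raw scores are invented; the category maxima are those of the materials):

```python
# Maximum possible score in each syllable category (from the materials:
# 27 one-syllable, 5 two-syllable, and 2 three-syllable items).
MAX_PER_CATEGORY = {1: 27, 2: 5, 3: 2}

def to_percent(raw_scores):
    """Convert one participant's raw correct counts to percentages."""
    return {syllables: 100 * correct / MAX_PER_CATEGORY[syllables]
            for syllables, correct in raw_scores.items()}

# An invented participant: 20/27, 5/5, and 1/2 correct.
percentages = to_percent({1: 20, 2: 5, 3: 1})
print(percentages)  # 1-syllable about 74.1%, 2-syllable 100.0%, 3-syllable 50.0%
```

Expressing each category as a percentage of its own maximum is what allows the 1-, 2-, and 3-syllable scores to be compared on a common scale in Figures 7 to 9.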
Figure 7 Experiment 1: Condition 1: Percent of Correct Number of Syllables Given (participants SS1 through SS10; 1-, 2-, and 3-syllable words)

Figure 8 Experiment 1: Condition 2: Percent of Correct Syllables Given (participants SS1 through SS10; 1-, 2-, and 3-syllable words)

Figure 9 Experiment 1: Condition 3: Percent of Correct Syllables Given (participants SS1 through SS10; 1-, 2-, and 3-syllable words)

3.2 Experiment 2: Sentence Structure Recognition

In Experiment 2, in each of three conditions, participants were asked to select the syntactic type heard from the nine choices on a computer screen. In the intact condition, participants were presented with sentences which were acoustically intact. In the prosodic condition, participants were presented with sentences which had been spectrally-inverted so that segmental cues were reduced whereas prosodic cues (temporal and waveform amplitude) remained. In the concatenated condition, participants were presented with sentences that had been edited and assembled by the experimenter, so that segmental cues were intact whereas sentential prosodic cues were removed. The purpose of the main experiment was to determine if a listener is able to recognize sentential syntactic type on the basis of prosodic or spectrally-dependent segmental cues with severe reductions in the other type of cue. The effects of manipulating the nature of the signal and sentential complexity on the accuracy and latency of sentence recognition were analyzed.

3.2.1 Effect of Removal of Spectral Cues on Reaction Time

An analysis of variance was performed on the median reaction time for each syntactic type in each condition for each participant.
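The summary statistic entered into this analysis, one median per syntactic type per condition per participant, can be sketched as follows. The reaction times shown are hypothetical, chosen only for illustration:

```python
import statistics

# Hypothetical reaction times (ms) for one participant in one condition:
# five exemplars per syntactic type, as in the experiment.
reaction_times = {
    "A":  [933, 1200, 1010, 980, 1500],
    "DP": [2100, 3300, 2800, 6553, 2500],
}

# One median per syntactic type (per condition, per participant) was the
# unit entered into the analysis of variance.
medians = {syntype: statistics.median(rts)
           for syntype, rts in reaction_times.items()}
print(medians)  # -> {'A': 1010, 'DP': 2800}
```

Using the median rather than the mean keeps a single very slow trial (such as the 6553 ms value above) from dominating a participant's summary score.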
The results indicated that there was no significant effect of condition or of target sentence on reaction time (p > 0.05), nor was there any significant interaction effect between condition and syntactic type (p > 0.05). For the recognition of each syntactic type in each condition, the median reaction times were long (from 933 ms to 6553 ms) and showed a wide range of standard deviations (from 361 ms to 3376 ms). (For details see Appendix L.) These results are likely due to the off-line nature of the task. Because the participants had to select the correct sentence from a list of sentences on a computer screen, they were likely scanning all of the sentences each time a sentence was presented. Therefore, if decision time rather than syntactic processing time was measured, the fact that reaction time did not differ significantly with syntactic type or stimulus condition is not surprising.

3.2.2 Effect of Removal of Spectral Cues on the Accuracy of Sentence Type Recognition

Figures 10a through 10i show each target syntactic type for each block in each condition for all participants. With zero (0) indicating an incorrect response and one (1) indicating a correct response, the mean score for all participants in a given block in a given condition for a given target sentence was calculated. Overall, no trend indicating a practice or learning effect was observed. An analysis of variance was performed on each block separately. For each block, there was a significant effect of target sentence and of condition, and there was also a significant interaction effect between target sentence and condition (see Table 3 below for details with respect to F ratios and significance values). The fact that the pattern of results in the analysis of variance was the same for all blocks provides evidence of the similarity of the results across blocks. Therefore, the responses for all the blocks were combined in further statistical analyses.
Block 1: Target F(8,112) = 4.547, p < 0.001; Condition F(2,28) = 40.132, p < 0.001; Target x Condition F(16,224) = 4.302, p < 0.001
Block 2: Target F(8,112) = 4.912, p < 0.001; Condition F(2,28) = 14.367, p < 0.001; Target x Condition F(16,224) = 4.912, p < 0.001
Block 3: Target F(8,112) = 4.750, p < 0.001; Condition F(2,28) = 11.429, p < 0.001; Target x Condition F(16,224) = 4.750, p < 0.001
Block 4: Target F(8,112) = 3.209, p < 0.01; Condition F(2,28) = 21.510, p < 0.001; Target x Condition F(16,224) = 3.311, p < 0.001
Block 5: Target F(8,112) = 5.655, p < 0.001; Condition F(2,28) = 13.500, p < 0.001; Target x Condition F(16,224) = 5.655, p < 0.001

Table 3: F Ratios for Target Sentences in Each Condition: F ratios of the effects of target sentence and condition for each block, as well as the interaction between target and condition. Each F ratio and significance (p value) is displayed separately to illustrate that there was little difference across blocks.

Figure 10 Mean Score of Each Target Syntactic Type by Block (Figures 10a through 10i: one panel per target syntactic type, in the order A, P, CS, CO, D, DP, C, SO, OS, each plotting the mean score across blocks 1 to 5 for conditions 1, 2, and 3)

Figure 11 indicates the mean response of all participants for all blocks for all target sentences in each condition.
The maximum mean score is five (5) because there were five exemplars of each target syntactic type in each condition for each participant. From Figure 11, it can be seen that participants performed equally well in the intact and concatenated conditions, but more poorly in the prosodic condition. However, it is also true that participants performed at ceiling in the intact and concatenated conditions. An analysis of variance confirmed this description with a significant effect of condition, F(2,28) = 40.132, p < 0.001. A Student-Newman-Keuls test of multiple comparisons (see Table 4) indicated that there was a significant difference (p < 0.01) between the prosodic condition and the other two conditions, but no significant difference was found between the intact and concatenated conditions (p > 0.05). Because participants performed at ceiling in the intact and concatenated conditions, the remainder of the results focus on the prosodic condition.

intact condition: 4.9    concatenated condition: 4.9    prosodic condition: 4.1

Table 4. Student-Newman-Keuls Test for Condition Differences: Results of the Student-Newman-Keuls test of multiple comparisons performed at p < 0.01. Numbers represent the mean correct score in each condition. Conditions joined by a common line do not differ from one another; conditions not joined by a common line differ significantly from one another at p < 0.01.

Figure 11 Experiment 2: Mean Response of all Participants for all Blocks (mean correct score, out of 5, for the intact, prosodic, and concatenated conditions)

3.2.3 Effect of Sentence Complexity on the Accuracy of Syntactic Type Recognition

Recall that according to Caplan et al. (1985) syntactic types can be grouped into the following hierarchy:
1. two-place sentences with one verb per sentence: A, P, CS, CO
2. three-place sentences with one verb per sentence: D, DP
3. sentences with two verbs: C, SO, OS.
Additionally, Dillon (1995) found these general hierarchical categories to be maintained for young normal-hearing listeners in noise. The mean number of correct sentences selected for each syntactic type (A, P, CS, CO, D, DP, C, SO, OS) in each condition is shown in Table 5.

Syntactic type: intact condition / prosodic condition / concatenated condition
A:  5.0 / 5.0 / 5.0
P:  5.0 / 4.265 / 4.933
CS: 5.0 / 4.6 / 4.933
CO: 5.0 / 4.4 / 5.0
D:  4.933 / 3.933 / 5.0
DP: 5.0 / 2.467 / 5.0
C:  5.0 / 4.333 / 5.0
SO: 5.0 / 4.266 / 4.933
OS: 5.0 / 3.334 / 4.866

Table 5. Mean Correct Score for Each Syntactic Type in Each Condition.

Figure 12 Experiment 2: Mean Number of Sentences Correctly Identified for Each Sentence Type (types A, P, CS, CO, D, DP, C, SO, OS, in each of the three conditions)

Figure 12 shows the mean number of sentences correctly identified for each syntactic type for all three conditions. Nearly all responses were correct in the intact and concatenated conditions, but more errors were made in the prosodic condition. Fewer of the syntactic types D, DP, and OS were identified correctly in the prosodic condition, but more than 80% of the sentences were identified correctly for the other six syntactic types. An analysis of variance demonstrated a significant effect of syntactic type on the number of correct sentences identified, F(8,112) = 4.547, p < 0.001. The results of a Student-Newman-Keuls test of multiple comparisons (see Table 6) indicated the following significant differences among the syntactic types (p < 0.01):
1. syntactic types A, P, CS, CO, D, C, and SO did not differ significantly from each other, but they differed significantly from DP and OS, which were the most poorly recognized.
2. syntactic types P, CO, D, C, SO and OS did not differ significantly from each other, but as a group, they differed from types A and CS, which were more easily recognized, as well as from DP, which was the most poorly recognized.
3. syntactic types DP and OS, the most poorly recognized, did not differ from each other.

A: 5.0   CS: 4.84   CO: 4.8   C: 4.7   P: 4.7   SO: 4.7   D: 4.6   OS: 4.4   DP: 4.2

Table 6. Student-Newman-Keuls Test for Syntactic Type Differences: Results of the Student-Newman-Keuls test of multiple comparisons performed at p < 0.01. Numbers represent the mean correct score for each syntactic type. Syntactic types joined by a common line do not differ from one another; syntactic types not joined by a common line do differ.

3.2.4 Effect of Interaction of Condition and Syntactic Type on Accuracy of Response

As stated above, the pattern of sentences correctly identified for the different syntactic types was similar in the intact and concatenated conditions, but differed in the prosodic condition. This was confirmed in an analysis of variance which demonstrated a significant interaction effect between the condition and the sentence target type, F(16,224) = 4.302, p < 0.001. Referring back to Figure 12, it can be seen that participants performed differently in the prosodic condition, especially on syntactic types D, DP, and OS. The results of a Student-Newman-Keuls test of multiple comparisons (see Table 7) indicated the following significant differences among the syntactic types (p < 0.01) and conditions.

Score:                  >4.9   4.6   4.4   4.3   4.2   4.0   3.4   2.5
intact condition:       all syntactic types
concatenated condition: all syntactic types
prosodic condition:     A      CS    CO    C, P  SO    D     OS    DP

Table 7. Student-Newman-Keuls Test for Syntactic Type Differences by Condition: Results of the Student-Newman-Keuls test of multiple comparisons performed at p < 0.01. Entries are mean correct scores for each syntactic type in each condition. Syntactic types joined by a common line do not differ from one another; syntactic types not joined by a common line do differ.

1. for all syntactic types in the intact and concatenated conditions, performance was the same. Now looking at the results for the prosodic condition only,
2.
for syntactic type A, which was perfectly recognized, performance did not differ from performance on sentences in the intact and concatenated conditions.
3. syntactic types CS, CO, C, and P in the prosodic condition were still recognized well enough that performance on them did not differ significantly from performance on sentences in the intact and concatenated conditions.
4. for syntactic type SO in the prosodic condition, performance differed from performance on sentences in the intact and concatenated conditions, but not from performance on syntactic types CS, CO, C, or P in the prosodic condition.
5. for syntactic type D in the prosodic condition, performance differed from performance on sentences in the intact and concatenated conditions, but not from performance on syntactic types CO, C, P or SO in the prosodic condition.
6. syntactic types OS and DP were significantly more poorly recognized than the other syntactic types in the prosodic condition, and performance on them differed significantly from performance on each other.

3.2.5 Error Patterns

Further examination of the pattern of errors in the prosodic condition of Experiment 2 yielded some interesting patterns. Presented in Table 8 below is a response matrix for all participants in the prosodic condition. Syntactic types are ordered as in the hierarchy found in this experiment (for further discussion of this hierarchy see section 4.4.7). Along the top of the matrix is the target syntactic type and along the side is the syntactic type selected as the response. The maximum response for each cell on the diagonal is 75 (there were 15 participants who each heard each syntactic type in the prosodic condition a total of five times). The actual number of correct responses, on the diagonal, is highlighted in gray. Numbers in the remaining cells correspond to the number of each type of error that was made.
It can be seen from Table 8 that participants had no difficulty with syntactic type A, and increasing difficulty with the remainder of the syntactic types moving across the matrix. For example, for syntactic type DP, two participants never selected this syntactic type. Clearly, overall performance on syntactic type DP was poor compared to the other syntactic types.

RESPONSE \ TARGET    A    CS   CO   C    P    SO   D    OS   DP
A                    75   .    1    .    8    .    .    .    .
CS                   .    69   6    1    .    2    .    2    .
CO                   .    3    66   .    .    1    .    1    2
C                    .    .    .    65   .    .    3    16   14
P                    .    3    1    .    64   2    8    .    5
SO                   .    .    .    .    .    64   1    1    4
D                    .    .    .    3    2    4    59   3    12
OS                   .    .    .    6    .    1    .    50   1
DP                   .    .    1    .    1    1    4    2    37

Table 8: Response Matrix for Experiment 2: Matrix of responses given in the prosodic condition of Experiment 2 for all syntactic types, for all participants. Along the top is the target, and along the side is the response given. Diagonal entries indicate the total number of correct responses given by all participants, out of a maximum of 75; the remaining entries indicate the numbers of incorrect responses. The total number of responses in each column is 75.

Table 9, below, is a matrix showing the pattern of errors based on the number of syllables in the targets. Along the top of the matrix are the target syntactic types categorized by the number of syllables. As can be seen in Table 9, when participants made errors, their responses tended to have the same number of syllables as the targets, although this was not true to the same degree for syntactic type DP. Note that DP has more syllables (ten) than any other syntactic type. However, for syntactic type DP, the most common number of syllables in response errors was nine. Specifically, participants frequently gave syntactic types C, OS, and SO instead of target type DP. Overall, errors in the number of syllables were usually in the direction of fewer syllables than the number in the target. These results suggest that participants are able to recognize the number of syllables in a sentence and use this cue, at least to some extent, to select the syntactic type.
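The response matrix in Table 8 is, in effect, a confusion matrix. Tallying one from (target, response) trial pairs can be sketched as follows; the trial data here are hypothetical, not the study's responses:

```python
from collections import defaultdict

# Hypothetical (target, response) pairs from a forced-choice task.
trials = [("A", "A"), ("DP", "C"), ("DP", "DP"), ("OS", "CS"), ("DP", "C")]

# matrix[response][target] mirrors the layout of Table 8:
# targets along the top, responses along the side.
matrix = defaultdict(lambda: defaultdict(int))
for target, response in trials:
    matrix[response][target] += 1

# Diagonal cells count correct responses; off-diagonal cells count errors.
print(matrix["C"]["DP"])  # -> 2 (two DP targets drew the response C)
```

Reading down a column of such a matrix shows how the 75 presentations of one target were distributed across the nine possible responses, which is how the error patterns above were examined.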
One possibility is that DP is difficult because of the high processing load due to its high number of syllables. It is interesting, however, to note that syntactic type P, which has only seven syllables, is the next most poorly recognized after DP, even though it is the only type with this number of syllables. Another possibility is that DP and P are both difficult because they are especially dependent on a specific spectral cue: in particular, for either P or DP, the word "by" must be identified.

RESPONSE \ TARGET         A (5 syll.)   P (7 syll.)   CS, CO, D (8 syll.)   C, SO, OS (9 syll.)   DP (10 syll.)
A (5 syllables)           75 (100%)     8 (11%)       1 (1%)                .                     .
P (7 syllables)           .             64 (85%)      12 (5%)               2 (1%)                5 (7%)
CS, CO, D (8 syllables)   .             2 (3%)        203 (90%)             17 (8%)               14 (19%)
C, SO, OS (9 syllables)   .             .             4 (2%)                203 (90%)             19 (25%)
DP (10 syllables)         .             1 (1%)        5 (2%)                3 (1%)                37 (49%)

Table 9: Syllabic Response Matrix for Experiment 2: Matrix of responses given in Experiment 2, organized by number of syllables, in the prosodic condition for all syntactic types, for all participants. The number in parentheses after each label is the number of syllables in that (or those) target syntactic type(s). Along the top is the target, and along the side is the response given. Diagonal entries indicate the total number of correct responses given by all participants, out of a maximum of 75 for syntactic types A, P, and DP and 225 for the grouped types CS, CO, D and C, SO, OS. The remaining entries indicate the numbers of incorrect responses; the percentages in parentheses are relative to the corresponding column totals.

3.2.6 Correlation of Working Memory with Syntactic Type Recognition

Recall from above that the overall response in the intact and concatenated conditions was at ceiling; therefore, correlation was only examined in the prosodic condition, where some variation in the participants' responses was seen.
As can be seen in Table 10, participants' recognition of syntactic type correlated with working memory for seven of the nine syntactic types. The two syntactic types which did not correlate with working memory were A and OS; however, note that no errors were made on type A, so that it was not possible to observe a meaningful correlation.

Syntactic Type:              A     CS       CO       C        P       SO       D        OS      DP
Working Memory Correlation:  ***   0.723**  0.651**  0.832**  0.601*  0.793**  0.904**  0.314   0.455**

* significant at p < 0.05
** significant at p < 0.01
*** no errors

Table 10: Correlation of Working Memory and Syntactic Type: Each number is the correlation coefficient. Values which are significant are indicated with an asterisk (* or **); values without an asterisk are not significant. There were no errors on type A, therefore no meaningful correlation could be observed.

3.2.7 Summary

From these results it can be seen that the effect of syntactic type was primarily due to performance in the prosodic condition; in particular, syntactic types DP and OS were poorly recognized in the prosodic condition. Additionally, the differences seen between conditions in section 3.2.4 above are due to the differences in performance seen for syntactic types DP and OS in the prosodic condition. Importantly, for most syntactic types, performance in the prosodic condition was very good. Furthermore, the errors which participants tended to make reflected their use of the number of syllables in the sentence. Finally, working memory and recognition of syntactic type in the prosodic condition were seen to correlate for all but one syntactic type when enough errors were made for it to be possible to see meaningful correlations.

4. DISCUSSION

4.1 Review of Hypothesis

This study was designed to determine if a listener is able to recognize sentential syntactic type on the basis of prosodic or segmental cues when the availability of the other type of cue is severely diminished.
In a forced-choice, closed-set paradigm, participants selected the syntactic type they thought they heard from a set of nine choices displayed on a computer screen. By presenting prosodic cues (specifically waveform amplitude and temporal cues) while altering spectral cues by spectrally inverting the signal, or by presenting concatenated words with altered sentence prosody, it was possible to evaluate the utility of the respective cues for recognizing syntactic type. In this study the following null hypotheses were tested:
1) Participants will repeat spectrally inverted words as accurately as non-spectrally inverted words.
2) Participants will select the syntactic type of spectrally inverted sentences with the same degree of accuracy as compared to non-altered sentences and sentences containing primarily segmental cues.
3) Participants will perform equally well on the recognition of all nine syntactic types in the prosodic condition.
4) Participant reaction time will not vary with syntactic type.
5) Working memory will not correlate with recognition of syntactic type.

4.2 Summary of Results

4.2.1 Experiment 1

Experiment 1 was intended to demonstrate that participants were unable to recognize the spectrally-inverted words used in the sentence materials for Experiment 2. It was found in Experiment 1 that participants were able to identify non-spectrally-inverted words and word combinations, but were essentially unable to identify the spectrally-inverted words and word combinations (see Figure 2). Thus, hypothesis 1 was refuted. In this experiment, it was also found that, of the few segments which were correctly recognized, participants' response accuracy was superior for 2- and 3-syllable tokens (see Figures 4 to 9). Overall, participants seemed to rely on the number of syllables to identify segments.
4.2.2 Experiment 2

This experiment was designed to determine if participants could recognize the syntactic type of a sentence from prosodic or segmental cues when the other type of cue was severely reduced. It was found that reaction time was not affected by the type of cue available, nor by the target syntactic type, supporting hypothesis 4. In Experiment 2, it was also shown that overall accuracy was similar in the intact condition, where all cues were available, and the concatenated condition, where segmental cues were available but prosodic cues were disrupted. However, participants performed significantly worse in the prosodic condition, where waveform amplitude and temporal cues were the primary cues available (see Figure 11 and Table 4). Thus, hypothesis 2 was not supported. Further examination of the participants' performance, however, showed that the reduction in sentence recognition in the prosodic condition was primarily attributable to poor recognition of two syntactic types (DP and OS), with performance on the other syntactic types not being significantly different from performance in the other two conditions. Thus, hypothesis 3 was not completely supported. Additionally, accuracy in recognizing seven of the nine syntactic types correlated significantly with working memory in the prosodic condition, thus partially refuting hypothesis 5.

4.3 Conclusions: Experiment 1

4.3.1 Response Accuracy

It was hypothesized that participants would not be able to repeat non-spectrally-inverted and spectrally-inverted words equally well. It was found that participants were unable to recognize spectrally-inverted words or word combinations, but they achieved near-perfect recognition for the same segments when they were not spectrally-inverted. It is not surprising that spectral information is necessary for word recognition, especially when other contextual cues are minimal.
In other words, when a listener does not have all the cues necessary to decode a signal, whether contextual cues supporting top-down processing or signal cues supporting perceptual processing, recognizing words is a challenge. Shannon et al. (1995) found that participants were able to select target vowels and consonants from a closed set using temporal and amplitude cues alone. Their finding suggests that, at least in some situations, it is possible to decode phonemes with temporal and amplitude cues alone. The fact that participants in the present study were unable to recognize the target words or word combinations in an open-set format is not in conflict with Shannon et al.'s (1995) findings. Participants in the present study were not required merely to identify the correct phoneme from a closed-set list of possible phonemes; rather, they were required to recognize larger word or word-combination patterns, without any indication as to what the word might be (in condition 1). In contrast, in condition 3 of Experiment 1, with the benefit of experience and a closed-set format, results more similar to those of Shannon et al. (1995) were found. The fact that participants performed comparatively better in condition 3 than in condition 1 suggests that, when there is a limited set of choices, listeners are more accurate at selecting the target. However, participants in the present study still did not perform as well as those in the study of Shannon et al. (1995). Further differences between the results of the two studies are probably attributable to the greater complexity of the segments in the present study, and to the larger size of the closed set of response options in the present study.

4.3.2 Syllabic Recognition

In Experiment 1, it was found that participants, although not necessarily able to recognize the target word accurately, were able to respond with the correct number of syllables in almost all cases (see Figures 4 to 9).
This suggests that spectral inversion maintains important temporal speech cues. One aspect of speech that is primarily supported by temporal cues is syllabicity. Recall (Figure 1b) that the vowel/consonant alternations were visibly preserved in spectrally-inverted sentences: the alternations between the periodicity of the vowels and the aperiodic noise of the consonants are apparent. Participants' ability to respond with the correct number of syllables offers further evidence that useful temporal cues are maintained in the spectral-inversion process.

The use of supra-segmental information, such as syllabicity, could play a role in other contexts, such as sentences. Specifically, rhythm cues could be beneficial at the phrase and sentence levels (for details see Cooper, 1983). For example, cues such as vowel lengthening or fundamental frequency rises/falls correspond to phrase- and sentence-final syllables.

In Experiment 1, condition 3, it was also found that participants were very accurate in identifying the three-syllable target word combinations. It is possible that participants were better able to identify targets containing more than one syllable because, as the amount of supportive structural context increased, more acoustic information became available and the participants were better able to use various top-down processes (Wingfield, 1996). Generally speaking, an increase in the quantity of information available corresponds to an increase in the duration of the signal. Having more cues available over time creates a supportive context, resulting in increased opportunities for the listener to decode the acoustic signal. On the other hand, if increased signal duration itself were the reason for the participants' better performance, then in condition 1 of Experiment 1 they would also have performed better on multisyllabic targets than on single-syllable targets. This, however, was not the case.
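As an aside, the property that spectral inversion scrambles spectral (segmental) content while leaving waveform amplitude and timing cues untouched can be made concrete with a small sketch. The ring-modulation form below is a textbook variant of spectral inversion, offered for illustration only; it is not necessarily identical to Blesser's (1972) procedure used in this study, and the function name and parameters are the author's own:

```python
import numpy as np

def spectrally_invert(x):
    """Mirror the spectrum about fs/4 by ring modulation with a
    Nyquist-rate cosine: y[n] = x[n] * (-1)**n.  Only the sign of
    alternate samples changes, so |y[n]| == |x[n]|: the amplitude
    envelope and all temporal cues survive intact, while every
    frequency component f is displaced to fs/2 - f."""
    signs = np.where(np.arange(len(x)) % 2 == 0, 1.0, -1.0)
    return x * signs

# Demonstration: a 500 Hz tone sampled at 8 kHz comes out at 3500 Hz.
fs = 8000
t = np.arange(fs) / fs                    # 1 s of signal -> 1 Hz FFT bins
x = np.sin(2 * np.pi * 500 * t)
y = spectrally_invert(x)

assert np.allclose(np.abs(y), np.abs(x))  # envelope untouched
peak_bin = int(np.argmax(np.abs(np.fft.rfft(y))))
print(peak_bin)                           # 3500, i.e. fs/2 - 500 Hz
```

Because the sample-by-sample magnitudes are unchanged, any cue carried by the envelope (syllable rhythm, pauses, durations) survives the manipulation exactly, which is consistent with the syllabicity results described above.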
In condition 1 of Experiment 1, participants performed equally poorly regardless of the number of syllables in the target. Therefore, in condition 3 of Experiment 1, it seems most likely that participants in fact benefited from the closed-set paradigm; specifically, in condition 3 there were only two targets containing three syllables. Overall, results from Experiment 1 indicate that participants are able to use temporal cues to identify supra-segmental information, especially the number of syllables, but that spectrally-dependent segmental information is not recognizable in spectrally-inverted words or word combinations.

4.4 Conclusions: Experiment 2

At first glance, the results from Experiment 2 seem to show that participants did not recognize syntactic types as well in the prosodic condition as in the intact and concatenated conditions. It will be argued, however, that the differences in the prosodic condition are due to poor performance on two syntactic types, and do not necessarily reflect poorer performance overall in this condition. Overall, the results from Experiment 2 suggest that prosodic cues do play an important role in the recognition of most syntactic types.

4.4.1 Reaction Time

It was hypothesized that participants' reaction times would not vary with syntactic type. Results from Experiment 2 indicate that the time a participant requires to select a syntactic type from a closed set is affected neither by the type of acoustic information available (i.e., the intact, prosodic, and concatenated conditions) nor by the target syntactic type. These findings are most likely due to the experimental task or paradigm itself, and further investigation into reaction time as related to sentential syntactic type is warranted using a more on-line task.
Because participants in Experiment 2 had to select from nine choices that were constantly present on a screen in front of them, it is not possible to rule out that participants scanned or read the set of choices after hearing each experimental sentence and before deciding which syntactic type to choose. In a study specifically designed to examine reaction time as a function of increasing syntactic complexity (Shapiro & Nagel, 1995), it was found that syntactically more complex sentences took longer for listeners to process. The paradigm utilized by Shapiro and Nagel (1995) required participants to complete a visual cross-modal lexical decision task. When a participant's ability to process syntactic type is measured in this way, an effect on reaction time is seen. Therefore, it will be necessary in the future to repeat Shapiro and Nagel's (1995) study using the materials from the present experiment. The goal of such a study would be to highlight the utility of prosodic cues in the recognition of syntactic structure in a more on-line manner. If participants performed a cross-modal task requiring on-line processing, we would expect reaction times to syntactically more complex sentences to be longer than those to less complex sentences.

4.4.2 Accuracy of Sentence Recognition

In Experiment 2, it was expected that participants' performance would improve with experience, as reflected by improved scores in each condition across blocks of trials. Examination of response accuracy across blocks in each condition for each syntactic type did not yield a consistent pattern of improvement (see Figures 10a to 10i). It seems that participants fully understood the task after the practice sentences, and that the task was relatively simple and did not require much training to master. This suggests immediate use of the limited cues which remained available.
It was expected that participants would be able to select the sentential syntactic type for normally-spoken sentences, that they would not perform more poorly on spectrally-inverted sentences, and that they would also not perform more poorly when segmental cues were available but prosodic cues were reduced. Examination of response accuracy in each of the three conditions (see Figure 11) indicated that participants performed the same in the intact and concatenated conditions, but more poorly in the prosodic condition. This issue will be further discussed in section 4.4.4 below. The finding that participants performed similarly in the intact and concatenated conditions is consistent with the findings of Wingfield et al. (1989), in whose study young participants performed equally well at immediately recalling words in sentences both with and without supportive prosody. Therefore, young normal-hearing participants in quiet listening situations retain their ability to recognize words in sentences even if sentential prosodic cues are minimized.

4.4.3 Syntactic Type Recognition

It was hypothesized that participants would perform equally well on all nine syntactic types. When syntactic types were examined collapsed across conditions, it was found that syntactic types A, P, CS, CO, D, C, and SO did not differ significantly from each other, but as a group they differed significantly from DP and OS; syntactic types P, CO, D, C, SO, and OS did not differ significantly from each other, but as a group they differed from types A, CS, and DP; and syntactic types DP and OS did not differ from each other, but as a group they differed from A, P, CS, CO, D, C, and SO. Overall, based on the mean correct-response scores, the following hierarchy, from less to more complex, was found: A, CS, CO, C, P, SO, D, OS, DP. At first glance this hierarchy seems to differ from that found by Caplan et al.
(1985) and Dillon (1995); however, as discussed in the following sections, further examination indicates that this is not necessarily the case. Specifically, syntactic type DP was the most difficult in the present study, whereas Caplan et al. (1985) and Dillon (1995) found this structure to be only of moderate difficulty. Additionally, SO in the present study was found to be of moderate difficulty, but Caplan et al. (1985) and Dillon (1995) found it to be among the most difficult structures. Syntactic type P was also found by both Caplan et al. (1985) and Dillon (1995) to be one of the easier syntactic types, whereas in the present study it fell in the middle of the hierarchy of complexity. However, since syntactic type P in the present study did not differ significantly from the easier syntactic types, the shift of P in the hierarchy is of little concern. These shifts in the hierarchy can primarily be attributed to differential performance in the prosodic condition and can most likely be accounted for by the necessity of spectrally-dependent segmental cues (see section 4.4.7 below for further discussion).

4.4.4 Effect of Condition and Syntactic Type on Accuracy of Response

It was expected that there would be no syntactic type by condition interaction effect on accuracy. When accuracy is examined without collapsing across conditions or syntactic types, some interesting patterns are evident. First and foremost, in the prosodic condition, syntactic types DP and OS were recognized significantly less accurately than all other syntactic types in all three conditions. The percentage of correct responses on these two syntactic types in the prosodic condition is extremely poor. The fact that participants had difficulty with syntactic type OS is consistent with the findings of Caplan et al. (1985) and Dillon (1995).
Dillon (1995) found that participants had difficulty distinguishing between syntactic types OS and C, and she noted that these two syntactic types differ only on the basis of the words "that" and "and", respectively. As in the study of Dillon (1995), the patterns of errors in the present study indicated that errors on syntactic type OS were largely due to incorrect selection of syntactic type C. In the future it will be necessary to further explore the acoustic basis for this confusion between syntactic types OS and C. A question arises concerning what temporal cues, if any, would be useful in distinguishing between "that" and "and" in syntactic types OS and C, respectively. Potential temporal differences could exist in the pause length before or after "and"/"that", in the duration of "and"/"that", or in the lengthening or shortening of surrounding lexical items. A cursory examination of such temporal differences, however, failed to reveal any useful distinctive cue. It may be that in degraded conditions, such as spectral inversion or poor signal-to-noise conditions, these structures are highly confusable because no clearly distinguishing temporal cues are available.

Syntactic type DP was also very difficult for participants. This difficulty was greater than that found by Dillon (1995) and Caplan et al. (1985). Examination of the raw data did not point to any consistent error patterns. However, participants demonstrated a general avoidance of this syntactic type as a response. This can also be seen in Figures 10a to 10i, which show that the frequency with which syntactic type DP was selected never exceeded a mean score of 60%. Additionally, it is noteworthy that two of the participants never selected syntactic type DP in the prosodic condition, and two other participants selected syntactic type DP only once out of the five times it was presented in the prosodic condition.
One possible reason that the mean response for this syntactic type was much lower in the present study than in the studies of Dillon (1995) and Caplan et al. (1985) is that the passive relies on the recognition of the word "by". When spectral cues are minimized, "by" might become indistinguishable from a variety of other prepositions (e.g., in, to, on). If listeners process the "by NP" as a more general "PP", then they would fail to find this structure in the response choice list, resulting in confusion and errors that are unpredictable.

Next, in considering the mean response for syntactic type by condition, it can be seen that all syntactic types in the intact and concatenated conditions, and syntactic types A, P, CS, CO, and C in the prosodic condition, did not differ from each other. As well, in the prosodic condition, mean responses to syntactic types CS, CO, C, P, and SO did not differ from each other, nor did mean responses to syntactic types CO, C, P, SO, and D differ from each other. The overlapping groupings of mean responses do not provide evidence of a clear hierarchy of syntactic types, except that types OS and DP ranked lower than the other types, as discussed above.

Returning to the issue of the overall mean accuracy for the prosodic condition, as raised in section 4.3.2 above, the large impact which the two syntactic types, OS and DP, had on overall accuracy needs to be re-considered. The fact that participants did not perform well overall in the prosodic condition is mostly due to poorer performance on these two syntactic types (DP and OS). This poor performance may not necessarily reflect the general performance of listeners in the prosodic condition.
It remains possible that the difficulty which participants had with a limited number of the syntactic types arose from the relative paucity of distinctive temporal cues for those types only, whereas other syntactic types might contain more distinctive temporal cues, which might have contributed to the remarkably high degree of recognition accuracy.

4.4.5 Examination of the Prosodic Condition and Accuracy of Sentence Type Recognition Based on the Number of Syllables

When participants made errors, they tended to select syntactic types with the same number of or fewer syllables (see Table 9). Specifically, when participants are credited with a correct response based on the number of syllables in a sentence, it becomes clear that syllabicity was utilized by participants, although not with the same degree of accuracy for syntactic types DP and P. For seven of the nine syntactic types (A, CS, CO, C, SO, D, and OS), incorrect responses usually had the same number of syllables as the target sentence. While syllabicity evidently provides a usable cue to syntactic type, it is important to note that errors were still made, especially for DP and P, even though there were no other response choices with the same number of syllables (ten and seven, respectively). This pattern of responses is consistent with the finding from Experiment 1 that participants tended to report the correct number of syllables in their responses even when lexical items were not correctly identified. This is not surprising, since the number of syllables was audible to participants.
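The premise underlying this syllable-based scoring, that syllable count is recoverable from the amplitude envelope even when spectral detail is gone, can be illustrated with a simple envelope peak-counting sketch. This is not the analysis used in the thesis; the function name, smoothing window, and threshold are illustrative choices:

```python
import numpy as np

def count_syllables(signal, fs, smooth_ms=50, thresh_ratio=0.3):
    """Estimate the number of syllables from the amplitude envelope
    alone: rectify the waveform, smooth it with a moving average,
    then count rising threshold crossings (one per energy burst)."""
    env = np.abs(signal)
    win = max(1, int(fs * smooth_ms / 1000))
    env = np.convolve(env, np.ones(win) / win, mode="same")
    above = env > thresh_ratio * env.max()
    rising = np.flatnonzero(above[1:] & ~above[:-1])
    return len(rising) + (1 if above[0] else 0)

# Toy "utterance": three 100 ms tone bursts separated by 100 ms silences.
fs = 8000
t = np.arange(int(0.1 * fs)) / fs
burst = np.sin(2 * np.pi * 200 * t)
gap = np.zeros(int(0.1 * fs))
speech = np.concatenate([burst, gap, burst, gap, burst, gap])
print(count_syllables(speech, fs))  # 3
```

Because this estimator ignores spectral content entirely, it behaves identically on a signal and its spectrally-inverted counterpart, which parallels the finding that listeners preserved syllable count even when lexical identity was lost.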
Re-examining the working-memory account of language comprehension developed by Carpenter and her colleagues and others (Carpenter et al., 1994; Carpenter et al., 1995; Daneman & Carpenter, 1980; Pichora-Fuller et al., 1995), it seems reasonable to suggest that one reason for the degree of difficulty which participants experienced in recognizing syntactic type DP, and for the errors made, might have been working-memory constraints arising from the high number of syllables. Specifically, when making errors, participants often selected a syntactic type with fewer, but still a "high" number of, syllables. Other factors may also contribute to the pattern of errors. Perhaps when selecting a syntactic type with a high number of syllables, participants knew that there were many syllables but selected a more common syntactic type. Also, following from the discussion in section 4.4.4 above, the necessity of spectrally-dependent segmental cues may also have influenced response selection. The poor response accuracy for syntactic type DP is not inexplicable if we take the following three factors into consideration: 1) working-memory constraints might have been at play (participants were only able to recall that there were a high number of syllables); 2) the frequency of occurrence of syntactic type DP is low; 3) segmental cues are key to recognizing the passive.

Further examination of how syllabicity serves sentence recognition is needed. It will be necessary in the future to investigate specifically whether or not syllabicity is an important cue for syntactic recognition, and to determine whether it is used only to rule specific syntactic types in or out, or whether it serves other beneficial roles.

4.4.6 Working Memory in the Prosodic Condition

It was hypothesized that participants' working memory span would be correlated with their ability to correctly recognize syntactic types.
Correlational statistics were conducted to evaluate whether or not working memory span correlated with the ability to correctly recognize syntactic type in the prosodic condition. Correlational statistics were not run on either of the other two conditions because performance was at ceiling and no meaningful correlations could have been obtained. Results indicated that syntactic types CS, CO, C, P, SO, D, and DP correlated with working memory span in the prosodic condition; however, this was not the case for syntactic types A and OS. Note that performance on syntactic type A was perfect, making it impossible to observe meaningful correlations. There is no obvious reason why recognition accuracy for syntactic type OS was not significantly correlated with working memory span.

In section 4.4.5 above, issues surrounding working memory and syntactic type DP were raised. The fact that working memory correlated significantly with recognition accuracy for this syntactic type supports the idea that working-memory limitations may have been at play. Working memory correlates with recognition accuracy for most syntactic types; in the absence of segmental cues, participants with lower working memory spans performed more poorly than those with higher working memory spans. Conversely, when stressed, participants with high working memory spans were better able to utilize the residual prosodic cues to recognize syntactic types than participants with lower working memory spans.

4.4.7 Hierarchy of Syntactic Types

In section 4.4.3 above, the issue of the hierarchy of syntactic types was raised. Participants tended to respond less accurately to syntactic type DP in the prosodic condition, which inevitably affects the hierarchy. Note that the hierarchy is based on response accuracy, and fails to take into account the patterns of responses or the cues underlying those patterns.
As discussed above in section 4.4.3, in the present study syntactic type P was higher in the ranking of syntactic types than it was in the studies of Dillon (1995) and Caplan et al. (1985), where it was found to be among the simplest of syntactic types. The reason for difficulty with syntactic type P may be akin to the reasons already discussed with respect to syntactic type DP. Additionally, further examination of the results indicates that this difference in the hierarchy is not statistically significant. That is, in the present study, syntactic type P in all three conditions did not differ significantly from syntactic types A, CS, or CO (the simpler syntactic types identified in previous studies). Therefore, the apparent discrepancy in the ranking of type P across the various studies is not really significant and does not warrant further in-depth examination.

In the hierarchy found in the present study, as mentioned in section 4.4.3 above, syntactic type SO was ranked in the middle. In contrast, Dillon (1995), Caplan et al. (1985), and many others found this syntactic type to be among the most difficult. This shift in the hierarchy is again primarily due to mean accuracy on this syntactic type in the prosodic condition. However, the explanation differs from that offered for the ranking of syntactic types DP or P. Recall that in the intact and concatenated conditions, performance was at ceiling and there were no differences in mean accuracy among the syntactic types; therefore, based solely on these two conditions, no groupings or hierarchy would be established. A hierarchy is observable only in the prosodic condition. The range of mean accuracy scores in the prosodic condition, with the exception of syntactic types DP and OS, is also quite small, with mean accuracy scores ranging from 80% to 100%.
Therefore, although the mean accuracy scores were used to determine a hierarchy of syntactic types, the hierarchy was primarily based on problems with types OS and DP only. To a large extent, therefore, the hierarchy found in the present study does not stand in contrast to that found by Dillon (1995) or Caplan et al. (1985). This study was designed to examine the ability of listeners to recognize syntactic types in the presence of specific limited acoustic cues. Therefore, any apparent differences in hierarchy can be explained with reference to the usefulness of these specific types of cues. Participants' ability to recognize syntactic types in the different conditions is further discussed in section 4.5 below.

4.5 Utility of Prosodic Cues

This study was conducted to determine whether young normal-hearing listeners are able to recognize syntactic types based on prosodic or segmental cues when the other type of cue is severely reduced. It is clear, as mentioned in section 4.4.2 above, that participants were able to recognize all syntactic types on the basis of segmental cues alone: participants did not perform statistically differently in the concatenated condition as compared to the intact condition. The fact that participants were able to recognize syntactic types with the same degree of accuracy when segmental cues were available but prosodic cues were minimized indicates that segmental cues are sufficient for the recognition of many syntactic types.

Participants' ability to recognize sentential syntactic type on the basis of prosodic cues when segmental cues were minimized was also examined. Issues surrounding the nature of the redundancies which a normal-hearing listener might use to recognize sentence structures were discussed above (section 1.3). Normal-hearing listeners have available to them both prosodic and segmental cues which they can utilize in quiet, non-adverse listening situations.
In contrast, hard-of-hearing listeners do not have such a rich array of cues available to them. Segmental information is primarily carried by the high frequencies, whereas prosodic information is primarily carried by the low-frequency components of the speech signal. Since low-frequency cues are less likely to be diminished by hearing loss, the potential use of prosodic cues for the recognition of syntactic type is important to explore. If a listener is able to recognize syntactic type on the basis of prosodic cues when spectrally-dependent segmental cues are minimized, then these cues would likely be advantageous to a hard-of-hearing person. By being able to recognize syntactic types, a listener would be better able to deduce, or use top-down processing to decode, the meaning of the sentence. Such top-down use of syntactic information may facilitate the listener's processing of reduced spectrally-dependent information.

In the prosodic condition, listeners recognized seven of the nine syntactic types as accurately as they recognized them in the other conditions. Participants performed significantly more poorly in recognizing two of the nine syntactic types (DP and OS) in the prosodic condition than in the intact and concatenated conditions. Thus, participants were able to use prosodic cues most, but not all, of the time to correctly recognize syntactic types when segmental cues were minimized. Additionally, when participants were credited if the number of syllables in the response matched the number of syllables in the target, it was found that they were able to recognize the number of syllables in the sentence most of the time. Listeners thus seem able to utilize temporally-based syllabicity cues to recognize most syntactic types. The notion that prosodic cues serve as a framework or structure supporting sentence recognition is in agreement with Shapiro and Nagel (1995).
These authors state that "prosodic information is used very early (perhaps at the first-pass) to resolve temporary ambiguities". The use of prosodic information early in sentence recognition is consistent with the notion that prosodic support is used to recognize syntactic types, and that listeners use it to fill in or assemble segmental details provided by other, more local properties of the acoustic signal.

In the present study, participants had to recognize syntactic types from a closed set of common choices. In contrast, in everyday life, a listener may need to recognize a larger number of syntactic types. The findings of the present study are, therefore, limited to only a subset of the total number of such types. Nevertheless, it seems reasonable to argue that prosodic information is used to distinguish syntactic structures (e.g., Allbritton et al., 1996; Bradford, 1995; Nicol, 1996; Price et al., 1991). In cases of syntactic ambiguity there is often a closed set of possible alternative interpretations. Therefore, using prosodic information to resolve syntactic ambiguities in these situations is comparable to the closed-set nature of the task used in the present study: in the present study participants had to select syntactic type from a closed set of options, and similarly, when resolving syntactic ambiguities there is often a closed set of choices. In both cases, the recognition of the correct syntactic type is enhanced or enabled by prosodic cues. In the future, it will be important to determine whether prosodic cues are sufficient for resolving other kinds of syntactic ambiguities.

Studies by Drullman (1995a, 1995b) have investigated the relative contribution of temporal cues to speech intelligibility. In these studies, the author presented speech in the presence of noise; in the control condition the speech signal was unaltered, but in the experimental condition a portion of the speech signal was removed or masked by noise.
When Drullman (1995a, 1995b) removed the portion of speech below the noise floor that was assumed to have been masked, speech intelligibility was, surprisingly, drastically reduced as compared to the control condition. Drullman's (1995a, 1995b) conclusions are consistent with those of this study, insofar as both studies illustrate listeners' ability to use temporal cues when decoding speech in conditions where segmental cues are minimized.

4.6 Future Directions

Since this study, to our knowledge, was the first of its kind, it will be necessary in the future to further examine listeners' ability to recognize syntactic type on the basis of prosodic cues when segmental cues are minimized. Further research along this line could examine participants' general ability to use prosodic cues in everyday situations using other paradigms. For example, studies could be conducted using a free-recall paradigm in which the participants repeat the sentence, and syntactic structure is credited regardless of the identity of lexical items. It will also be necessary to investigate listeners' ability to use prosodic cues to disambiguate syntactically ambiguous sentences. Furthermore, it will be necessary to investigate the possible contribution of fundamental frequency as a distinct prosodic cue. In the present study, the conservative view was taken that pitch cues were not available; in future work, explicitly including the fundamental frequency would ensure that pitch is perceptible.

Normal-hearing listeners have the opportunity to take advantage of particular kinds of cue redundancies when listening to speech. In the present study, redundancies were eliminated by removing specific cues. In everyday life, redundancies are often eliminated by competing noise.
As an extension of this study and of the work by Drullman (1995a, 1995b), it will be important in the future to extend the present paradigm to investigate normal-hearing listeners' abilities to recognize syntactic types in the presence of competing noise. Such a study would help to determine whether prosodic cues remain audible in the presence of background noise, and would increase our understanding of the utility of these cues in speech perception by hard-of-hearing listeners. Conversely, if the cues do not remain audible or cannot be used in background noise, then hard-of-hearing listeners may not benefit from them.

It will also be necessary in the future to further evaluate the issues raised concerning the allocation of working memory resources during the comprehension of spoken language. Even if prosodic cues remain audible in the presence of background noise, how does their availability relate to the allocation of working memory resources to higher-order linguistic and/or cognitive processing of the material heard?

The present study found no effect of syntactic type on reaction time. It seems likely that syntactically more complex sentences would require more time to process; the present paradigm was apparently not sensitive to processing time. Having participants scan nine syntactic choices apparently resulted in an off-line measure of sentence recognition. In the future, a more on-line paradigm, more sensitive to reaction time, should be undertaken. Such an investigation should illuminate more fully how prosodic cues influence sentence processing.

The specific acoustic prosodic cues utilized by listeners were not evaluated in this study, and further investigation into this area will be necessary in the future. Specifically, it will be necessary to determine which prosodic cues are the most important for syntactic recognition.

Finally, the present study has many long-term implications.
By demonstrating the usefulness of prosodic cues in sentence recognition, we gain insight into how listeners decode acoustic signals into the intended meaning. Since the results from the present study point to the utility of syllabicity cues in recognizing syntactic type, a starting point for further research has already been identified. It will be necessary in the future to investigate specifically whether or not syllabic recognition is an important cue for syntactic recognition, and to sort out whether it can be used only to rule some syntactic types in or out, or whether it offers other benefits. Additionally, there is a strong body of literature which argues for prosodic support during syntactic development; by understanding how adults are able to use prosodic cues to support the recognition of syntactic type, insights might be gained into how infants come to understand syntax.

On the more practical level of aural rehabilitation, a hard-of-hearing listener's ability to understand is affected by factors including the speaker, the listener, the environment, and the message (Erber, 1988). In aural rehabilitation, these factors are manipulated in an attempt to achieve optimal communication, whether for the purpose of information exchange or social interaction. An appreciation of the role of prosodic cues in syntactic recognition will allow future development of rehabilitative strategies. Such strategies would permit hard-of-hearing listeners to take greater advantage of the prosodic cues which are preserved in everyday listening conditions.

REFERENCES

Allbritton, D.W., McKoon, G., & Ratcliff, R. (1996). Reliability of prosodic cues for resolving syntactic ambiguity. Journal of Experimental Psychology, 22(3), 714-735.

Benguerel, A.P. (under review). Stress-timing vs. syllable-timing vs. mora-timing: The perception of speech rhythm by native speakers of different languages.
To appear in Etudes et Travaux (1998), published by the Institut des Langues Vivantes et de Phonetique, Universite Libre de Bruxelles.

Berk, L.E. (1993). Infants, children, and adolescents. Needham Heights, MA: Allyn & Bacon.

Blesser, B. (1972). Speech perception under conditions of spectral transformations, I: Phonetic characteristics. Journal of Speech and Hearing Research, 15, 5-41.

Bradford, L. (1995). The effect of noise on segmental and prosodic timing in speech production. Master of Science thesis, UBC.

Caplan, D., Baker, C., & Dehaut, F. (1985). Syntactic determinants of sentence comprehension in aphasia. Cognition, 21, 117-175.

Caplan, D., & Futter, C. (1986). Assignment of thematic roles to nouns in sentence comprehension by an agrammatic patient. Brain and Language, 27, 117-134.

Caplan, D. (1987). Discrimination of normal and aphasic subjects on a test of syntactic comprehension. Neuropsychologia, 25, 173-184.

Caplan, D., & Evans, C. (1990). The effects of syntactic structure on discourse comprehension in patients with parsing impairments. Brain and Language, 39, 206-234.

Carpenter, P.A., Miyake, A., & Just, M.A. (1994). Working memory constraints in comprehension: Evidence from individual differences, aphasia and aging. In M.A. Gernsbacher (Ed.), Handbook of psycholinguistics (pp. 1075-1121). New York: Academic Press.

Carpenter, P.A., Miyake, A., & Just, M.A. (1995). Language comprehension: Sentence and discourse processing. Annual Review of Psychology, 46, 91-120.

Cassidy, K.W., & Kelly, M.H. (1991). Phonological information for grammatical category assignments. Journal of Memory & Language, 30(3), 348-369.

Chomsky, N. (1957). Syntactic structures. The Hague: Mouton.

Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: The MIT Press.

Cooper, W. (1983). The perception of fluent speech. American Psychological Association, 405, 48-63.

Cooper, W.E., & Tye-Murray, N. (1985).
Acoustical cues to the reconstruction of missing words in speech perception. American Psychological Association, 38(1), 30-40.

CSRE (4.2) (1995). Computer Speech Research Environment. London, ON: AVAAZ Innovations, Inc.

Cutler, A., & Norris, D. (1988). The role of strong syllables in segmentation for lexical access. Journal of Experimental Psychology: Human Perception & Performance, 14(1), 113-121.

Cutler, A. (1989). Auditory and lexical access: where do we start? In W. Marslen-Wilson (Ed.), Lexical representation and process. Cambridge, MA: MIT Press.

Daneman, M., & Carpenter, P.A. (1980). Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behaviour, 19, 450-466.

Davis, H. (1947). Acoustics and psychoacoustics. In H. Davis and S.R. Silverman (Eds.), Hearing and deafness. New York: Holt, Rinehart and Winston.

DeRenzi, E., & Vignolo, L. (1962). The token test: a sensitive test to detect receptive disturbances in aphasics. Brain, 86, 665-678.

Dillon, L.M. (1995). The effect of noise and syntactic complexity on listening comprehension. Master of Science thesis, UBC.

Drullman, R. (1995a). Temporal envelope and fine structure cues for speech intelligibility. Journal of the Acoustical Society of America, 97(1), 585-592.

Drullman, R. (1995b). Speech intelligibility in noise: relative contribution of speech elements above and below the noise level. Journal of the Acoustical Society of America, 98(3), 1796-1798.

Duez, D. (1985). Perception of silent pauses in continuous speech. Language and Speech, 28(4), 377-389.

Erber, N.P. (1988). Communication therapy for hearing-impaired adults. Abbotsford, Australia: Clavis Publishing.

Fraser, H. (1992). The subject of speech perception. London: Macmillan Press.

Ginsberg, I.A., & White, T.P. (1994). Otologic disorders and examination. In J. Katz (Ed.), Handbook of clinical audiology, 4th edition (chapter 2, pp. 6-24). Baltimore: Williams & Wilkins.

Jusczyk, P.W. (1985).
On characterizing the development of speech perception. In J. Mehler & R. Fox (Eds.), Neonate cognition: beyond the blooming, buzzing confusion. Norwood, NJ: Ablex.

Kilborn, K. (1991). Selective impairment of grammatical morphology due to induced stress in normal listeners: implications for aphasia. Brain and Language, 41, 275-288.

King, J., & Just, M.A. (1991). Individual differences in syntactic processing: the role of working memory. Journal of Memory and Language, 30, 580-602.

Kreiman, J. (1982). Perception of sentence and paragraph boundaries in natural conversation. Journal of Phonetics, 10, 163-175.

Kuhl, P.K. (1993). Early linguistic experience and phonetic perception: implications for theories of developmental speech perception. Special issue: phonetic development. Journal of Phonetics, 21(1-2), 125-139.

Lehiste, I. (1980). Phonetic characteristics of discourse. Meeting of the Committee on Speech Research, Acoustical Society of Japan, April 20.

Miyake, A., Carpenter, P.A., & Just, M.A. (1994). A capacity approach to syntactic comprehension disorders: making normal adults perform like aphasic patients. Cognitive Neuropsychology, 11, 671-717.

Morgan, J.L., & Demuth, K. (Eds.) (1996). Signal to syntax: bootstrapping from speech to grammar in early acquisition. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

Morgan, J.L. (1996). Prosody and the roots of parsing. Language & Cognitive Processes, 11(1-2), 69-106.

Morgan, J.L. (1986). From simple input to complex grammar. Cambridge, MA: MIT Press.

Nelson, D.G.K., Hirsh-Pasek, K., Jusczyk, P.W., & Cassidy, K.W. (1989). How the prosodic cues in motherese might assist language learning. Journal of Child Language, 16, 55-68.

Nespor, M., & Vogel, I. (1983). Prosodic structure above the word. In A. Cutler and D.R. Ladd (Eds.), Prosody: models and measurements (pp. 123-140). Berlin: Springer.

Nicol, J.L. (1996). What can prosody tell a parser? Journal of Psycholinguistic Research, 25(2), 179-192.
Ohde, R.N., Haley, K.L., Vorperian, H.K., & McMahon, C.W. (1995). A developmental study of the perception of onset spectra for stop consonants in different vowel environments. Journal of the Acoustical Society of America, 97(6), 3800-3812.

Pell, M.D. (1996). On the receptive prosodic loss in Parkinson's Disease. Cortex, 32, 693-704.

Pichora-Fuller, M.K. (1997). Language comprehension in older listeners. Journal of Speech-Language Pathology and Audiology, 21(2), 125-142.

Pichora-Fuller, M.K., Schneider, B.A., & Daneman, M. (1995). How young and old adults listen to and remember speech in noise. Journal of the Acoustical Society of America, 97, 593-608.

Price, P.J., Ostendorf, S., Shattuck-Hufnagel, S., & Fong, C. (1991). The use of prosody in syntactic disambiguation. Journal of the Acoustical Society of America, 90(6), 2956-2970.

Raven, J.C. (1938). The Mill Hill Vocabulary Scale. London: Lewis.

Selkirk, E.O. (1978). Paper presented at the Sloan Workshop on the Mental Representation of Phonology, University of Massachusetts, Nov. 18-19. (Reproduced by the Indiana University Linguistics Club, November, 1980.)

Shapiro, L.P., & Nagel, H.N. (1995). Lexical properties, prosody, and syntax: implications for normal and disordered language. Brain and Language, 50, 240-257.

Shattuck-Hufnagel, S., & Turk, A.E. (1996). A prosody tutorial for investigators of auditory sentence processing. Journal of Psycholinguistic Research, 25(2), 193-247.

Shannon, R.V., Zeng, F.-G., & Kamath, V. (1995). Speech recognition with primarily temporal cues. Science, 270, 303-304.

Smith, C. (1988). Factors of linguistic complexity and performance. In A. Davison & G. Green (Eds.), Linguistic complexity and text comprehension: readability issues reconsidered (pp. 247-279). Hillsdale, NJ: Erlbaum.

Speer, S.R., Kjelgaard, M.M., & Dobroth, K.M. (1996). The influence of prosodic structure on the resolution of temporary syntactic closure ambiguities. Journal of Psycholinguistic Research, 25(2),
249-271.

Swerts, M., & Geluykens, R. (1993). The prosody of information in spontaneous monologue. Phonetica, 50, 189-196.

Tidball, G.A. (1995). The effects of noise on identification of topic changes in discourse. Master of Science thesis.

Turner, C.W., Smith, S.J., Aldridge, P.L., & Stewart, S.L. (1997). Formant transition duration and speech recognition in normal and hearing-impaired listeners. Journal of the Acoustical Society of America, 101(5, Pt. 1), 2822-2825.

Wilber, L.A. (1994). Calibration, puretone, speech and noise signals. In J. Katz (Ed.), Handbook of clinical audiology, 4th edition (chapter 6, pp. 73-94). Baltimore: Williams & Wilkins.

Wingfield, A. (1996). Cognitive factors in auditory performance: context, speed of processing, and constraints of memory. Journal of the Acoustical Society of America, 7, 175-182.

Wingfield, A., Lahar, C.J., & Stine, E.A. (1989). Age and decision strategies in running memory for speech: effects of prosody and linguistic structure. Journal of Gerontology, 44(4), 106-113.

Wingfield, A., Lombardi, L., & Sokol, S. (1984). Prosodic features and the intelligibility of accelerated speech: syntactic versus periodic segmentation. Journal of Speech and Hearing Research, 27, 128-134.

Yost, W.A., & Nielsen, D.W. (1985). Fundamentals of hearing, 2nd edition, chapter 13. Chicago: Holt, Rinehart and Winston, Inc.
APPENDIX A
Syntactic Trees of an Ambiguous Sentence: John knew the answer [was correct]

(The original appendix shows two syntactic tree diagrams, not reproduced here; only their captions survive.)
Structure a) John knew the answer
Structure b) John knew the answer was correct

APPENDIX B
Experiment 2 Participants' Pure-Tone Thresholds (dB HL)

                          Test Frequency (in kHz)
          0.25        0.5         1           2           4           8
Subject   R     L     R     L     R     L     R     L     R     L     R     L
   1      0     5     0     0    -5     5    10     0    -5    -5     0     0
   2      5     5    -5     0    -5   -10     0   -10   -10   -10     5    10
   3      0    -5     0     0     0     0     5     5     0     0     5     0
   4      0     5     0     0     0     0     0     0     0     0    -5     5
   5      5     0     0     5     0    -5     0     0     0     0    10    -5
   6      0    -5     0    -5    -5    -5     0    -5     5    -5    15    10
   7      0     5    -5     5   -10   -10   -10   -10   -10   -10    10     0
   8    -10     0   -10    -5   -10    -5   -10    -5   -10    -5   -10    10
   9    -10   -10   -10    -5   -10     0   -10     0   -10   -10   -10     0
  10      0     0     0     0     0    -5     0     5    -5   -10     0     5
  11      5     0     5     0     0     5     0     0     0   -10     5    15
  12      0     0     0     0    -5    -5    -5    -5    -5    -5     0     0
  13      5    -5    -5    -5   -10     0   -10    -5    -5    15    15     0
  14      0     5     0     0     0   -10     0   -10     5     0    15    10
  15      5    10    15     5    15    10    15     5     0     5     0    15

APPENDIX C
Experiment 2 Background Information on Participants

ss#   PTA     PTA     SRT    SRT    Vocab   Age   Handed-   Babble    Babble    *WMS
      Right   Left    Right  Left   Score         ness      Thresh.   Thresh.
                                                            Right     Left
 1     1.33    1.33     0      0     15      22   Left        0         5       3
 2    -3.33   -6.66    -5     -5     13      24   Right      -5        -5       4
 3     1.33    1.33     0      0     16      29   Right      -5         0       5.67
 4     0       0        0      0     13      21   Right       0         5       3
 5     0       0        0     -5     15      23   Right      -5         0       4
 6    -1.33   -5       -5     -5     13      23   Right       0        -5       3.33
 7    -8.33   -5       -5      0     15      22   Right      -5        -5       3.67
 8   -10      -1.33     5     -5     12      22   Right     -10         0       2.67
 9   -10       0       -5      0     16      23   Right     -10         0       4
10     0       0        0      0     14      22   Right       0         0       3.33
11     1.33    1.33     0      0     16      26   Right       0        -5       3
12    -3.33   -3.33    -5    -10     13      22   Right      -5        -5       2.67
13    -8.33   -3.33   -10    -10     16      23   Right      -5         0       3
14     0      -6.66    -5     -5     16      27   Right       0         0       2.67
15    15       6.66     5     10     16      23   Right      10        10       2.67

*WMS: This term refers to reading working memory span measured with the methods and materials of Daneman & Carpenter (1980).
APPENDIX D
Experiment 2 Words for the Concatenation Condition

FUNCTORS: the, that, was, by, to, and, it
ANIMALS: pig, duck, owl, mouse, fox
VERBS: pulled, pushed, scratched, bumped, tapped, passed, kissed, tossed, touched, tripped, kicked, hugged, grabbed, chased, hauled

APPENDIX E
Sentence and Word Lengths for the Materials Used in Experiment 2, the Concatenated Condition

To ensure that the concatenated sentences were as similar as possible in duration to the same sentences spoken naturally, the average duration per word was used as the basis of comparison. This measure was used instead of average sentence length because nine different syntactic types, varying in duration, were included in the comparison. The average duration per word was calculated by dividing the total duration of each sentence (including inter-word pauses) by the number of words in that sentence. The difference between the natural and concatenated average word durations is partly attributable to the 50 msec inter-word pause used in the concatenated sentences. (This pause duration is greater than that which occurred between the words in running speech.)

Sentence   Duration:   Duration:      Number     Avg. Duration    Avg. Duration
           Normal      Concatenated   of Words   Per Word:        Per Word:
                                                 Normal           Concatenated
  1        2517 ms     2991 ms          8         315 ms           374 ms
  2        3678 ms     3297 ms          9         409 ms           366 ms
  3        2994 ms     2942 ms          8         374 ms           368 ms
  4        3660 ms     3480 ms          9         407 ms           387 ms
  5        1902 ms     1701 ms          5         380 ms           340 ms
  6        2312 ms     2685 ms          7         330 ms           384 ms
  7        3361 ms     3838 ms         10         336 ms           384 ms
  8        3530 ms     3137 ms          9         392 ms           349 ms
  9        2663 ms     2806 ms          8         333 ms           351 ms
 10        3667 ms     3499 ms          9         407 ms           389 ms
 11        2954 ms     3126 ms          8         369 ms           391 ms

Appendix E, con't...
Sentence   Duration:   Duration:      Number     Avg. Duration    Avg. Duration
           Normal      Concatenated   of Words   Per Word:        Per Word:
                                                 Normal           Concatenated
 12        2597 ms     2973 ms          8         325 ms           372 ms
 13        3009 ms     3748 ms         10         301 ms           375 ms
 14        3496 ms     3518 ms          9         388 ms           391 ms
 15        3351 ms     3361 ms          9         372 ms           373 ms
 16        1929 ms     1891 ms          5         386 ms           364 ms
 17        2223 ms     2643 ms          7         318 ms           378 ms
 18        2690 ms     3201 ms          8         336 ms           400 ms
 19        3673 ms     3221 ms          9         408 ms           358 ms
 20        2954 ms     3007 ms          8         369 ms           376 ms
 21        2338 ms     2693 ms          7         334 ms           385 ms
 22        2443 ms     2998 ms          8         305 ms           375 ms
 23        3584 ms     3528 ms          9         398 ms           392 ms
 24        2212 ms     2045 ms          5         442 ms           409 ms
 25        2514 ms     3115 ms          8         314 ms           389 ms
 26        3581 ms     3310 ms          9         398 ms           368 ms
 27        3237 ms     3826 ms         10         324 ms           383 ms
 28        3340 ms     3736 ms         10         334 ms           374 ms
 29        2463 ms     2986 ms          8         309 ms           373 ms
 30        3688 ms     3366 ms          9         410 ms           374 ms
 31        2575 ms     2975 ms          8         322 ms           372 ms
 32        2395 ms     2783 ms          7         342 ms           398 ms
 33        2999 ms     2923 ms          8         375 ms           365 ms
 34        3547 ms     3584 ms          9         394 ms           398 ms
 35        2050 ms     2940 ms          5         410 ms           588 ms
 36        3703 ms     3474 ms          9         411 ms           386 ms
 37        2644 ms     3182 ms          8         331 ms           398 ms
 38        2765 ms     3081 ms          8         346 ms           385 ms
 39        3569 ms     3546 ms          9         397 ms           394 ms
 40        1873 ms     1783 ms          5         375 ms           357 ms
 41        3770 ms     3909 ms         10         377 ms           391 ms
 42        3067 ms     2872 ms          8         383 ms           359 ms
 43        3418 ms     3205 ms          9         380 ms           356 ms
 44        4025 ms     3790 ms          9         447 ms           421 ms
 45        2106 ms     2683 ms          7         301 ms           383 ms
Average    2957 ms     3095 ms          8         365 ms           383 ms

APPENDIX F
Experiment 2 Order of Presentation of Sentences

Practice Sentences
(A trailing 2 marks the second presentation of a sentence type.)
1. The frog followed the goose.
2. The dog was swallowed by the cat.
3. It was the cat that tackled the goose.
4. It was the frog that the cat punched.
5. The dog smacked the frog to the goose.
6. The dog was tickled to the frog by the cat.
7. The goose tickled the dog and tackled the cat.
8. The frog that the goose followed smacked the dog.
9. The cat punched the frog that swallowed the goose.
10. The frog that the goose followed smacked the dog.  2
11. The dog was tickled to the frog by the cat.  2
12. It was the frog that the cat punched.  2
13. The dog was swallowed by the cat.  2
14. The frog followed the goose.  2
15. It was the cat that tackled the goose.  2
16. The dog smacked the frog to the goose.  2
17. The goose tickled the dog and tackled the cat.  2
18. The cat punched the frog that swallowed the goose.  2

Appendix F, con't...
Experimental Sentence Lists

List 1: Intact Condition
(The number following each sentence indicates its block.)
1. The fox that the owl kissed tripped the duck.  1
2. It was the pig that the fox grabbed.  1
3. The mouse pulled the owl.  1
4. The duck was scratched by the mouse.  1
5. The fox chased the owl that tapped the mouse.  1
6. The owl touched the mouse and hugged the pig.  1
7. The mouse hauled the fox to the duck.  1
8. It was the duck that kicked the owl.  1
9. The owl was pushed to the mouse by the duck.  1
10. The pig pushed the duck to the fox.  2
11. The pig that the duck hugged kissed the mouse.  2
12. The owl was kicked to the fox by the duck.  2
13. The owl was bumped by the mouse.  2
14. It was the mouse that the duck passed.  2
15. The pig pulled the duck that chased the owl.  2
16. The duck tossed the pig.  2
17. It was the duck that bumped the mouse.  2
18. The pig grabbed the duck and touched the owl.  2
19. The pig kissed the fox that pushed the mouse.  3

Appendix F, con't...
List 1: Intact Condition, con't...
20. The mouse was tossed to the duck by the fox.  3
21. The mouse tripped the duck and tapped the pig.  3
22. The fox scratched the owl.  3
23. It was the pig that tapped the owl.  3
24. The fox that the mouse bumped grabbed the duck.  3
25. It was the pig that the owl kicked.  3
26. The mouse was tossed by the owl.  3
27. The pig passed the owl to the mouse.  3
28. It was the pig that the owl hugged.  4
29. The pig touched the fox that tripped the duck.  4
30. It was the duck that passed the pig.  4
31. The duck tossed the mouse.  4
32. The mouse hauled the pig to the fox.  4
33. The fox that the duck scratched chased the owl.  4
34. The pig was pulled by the fox.  4
35. The mouse passed the fox and kissed the owl.  4
36. The mouse was hauled to the duck by the fox.  4
37. The mouse was pushed to the fox by the pig.  5
38. The pig tapped the owl that kicked the fox.  5
39. The owl that the fox grabbed hugged the duck.  5

Appendix F, con't...
List 1: Intact Condition, con't...
40. It was the mouse that bumped the fox.  5
41. It was the fox that the pig scratched.  5
42. The owl pulled the pig.  5
43. The mouse hauled the duck to the fox.  5
44. The pig was chased by the owl.  5
45. The owl touched the pig and tripped the fox.  5

Appendix F, con't...
List 1: Intact Condition con't...
Summary of Sentence Numbers by Sentence Target and Block

                        Block
Syntactic type     1    2    3    4    5
Active             3   16   22   31   42
Passive            4   13   26   34   44
Cleft Subject      8   17   23   30   40
Cleft Object       2   14   25   28   41
Dative Active      7   10   27   32   43
Dative Passive     9   12   20   36   37
Co-Ordinated       6   18   21   35   45
Subject-Object     1   11   24   33   39
Object-Subject     5   15   19   29   38

Appendix F, con't...
List 2: Prosodic Condition
(The number following each sentence indicates its block.)
1. The pig touched the owl and chased the fox.  1
2. The owl scratched the fox that tapped the mouse.  1
3. The pig was pushed to the owl by the duck.  1
4. It was the duck that the fox bumped.  1
5. The fox was hauled by the mouse.  1
6. It was the pig that passed the duck.  1
7. The owl that the mouse pulled tossed the duck.  1
8. The owl kicked the pig to the fox.  1
9. The duck tripped the fox.  1
10. The duck was scratched by the fox.  2
11. The fox was passed to the owl by the pig.  2
12. The duck touched the pig and tripped the mouse.  2
13. The fox pushed the pig to the duck.  2
14. The pig that the fox hugged grabbed the mouse.  2
15. It was the mouse that the owl chased.  2
16. The fox chased the owl.  2
17. It was the owl that kissed the mouse.  2
18. The fox touched the pig that tapped the owl.  2
19. The pig passed the fox and tossed the mouse.  3
20. The duck tossed the fox to the mouse.  3

Appendix F, con't...
List 2: Prosodic Condition, con't...
21. The owl kissed the pig.  3
22.
The owl bumped the pig that tapped the duck.  3
23. The fox was kicked to the pig by the owl.  3
24. It was the owl that the mouse pulled.  3
25. The mouse was tripped by the owl.  3
26. The duck that the mouse hugged pulled the pig.  3
27. It was the pig that hauled the fox.  3
28. It was the mouse that the owl hauled.  4
29. The pig was touched by the duck.  4
30. The duck that the mouse kissed hugged the owl.  4
31. The owl kicked the duck that bumped the mouse.  4
32. The pig scratched the mouse.  4
33. The fox was passed to the mouse by the duck.  4
34. It was the fox that grabbed the mouse.  4
35. The duck hauled the pig to the fox.  4
36. The duck pushed the pig and bumped the mouse.  4
37. The owl pushed the duck.  5
38. The fox tossed the pig to the duck.  5
39. The duck was kissed by the pig.  5
40. The fox was pulled to the mouse by the pig.  5

Appendix F, con't...
List 2: Prosodic Condition, con't...
41. The owl that the pig grabbed kicked the duck.  5
42. It was the owl that the mouse chased.  5
43. It was the fox that tapped the duck.  5
44. The owl grabbed the fox and tripped the mouse.  5
45. The mouse that scratched the owl hugged the duck.  5

Appendix F, con't...
List 2: Prosodic Condition con't...
Summary of Sentence Numbers by Sentence Target and Block

                        Block
Syntactic type     1    2    3    4    5
Active             9   16   21   32   37
Passive            5   10   25   29   39
Cleft Subject      6   17   27   34   43
Cleft Object       4   15   24   28   42
Dative Active      8   13   20   35   38
Dative Passive     3   11   23   33   40
Co-Ordinated       1   12   19   36   44
Subject-Object     7   14   26   30   41
Object-Subject     2   18   22   31   45

Appendix F, con't...
List 3: Concatenated Condition
(The number following each sentence indicates its block.)
1. It was the pig that the mouse bumped.  1
2. The mouse hugged the owl that pushed the duck.  1
3. The owl passed the duck to the mouse.  1
4. The owl tripped the fox and tossed the duck.  1
5. The pig kicked the owl.  1
6. The duck was tapped by the pig.  1
7. The pig was kicked to the duck by the fox.  1
8. The pig that the duck grabbed bumped the owl.  1
9.
It was the duck that bumped the owl.  1
10. The mouse that the duck touched grabbed the fox.  2
11. The fox passed the mouse to the pig.  2
12. It was the owl that the mouse hugged.  2
13. The mouse was pushed to the owl by the duck.  2
14. The mouse chased the owl and pulled the fox.  2
15. The fox grabbed the pig that tripped the duck.  2
16. The duck chased the mouse.  2
17. The duck was kissed by the pig.  2
18. It was the fox that touched the mouse.  2
19. The mouse bumped the owl that kissed the pig.  3
20. The owl tossed the pig to the fox.  3

Appendix F, con't...
List 3: Concatenated Condition, con't...
21. The duck was hauled by the mouse.  3
22. It was the owl that kissed the mouse.  3
23. The mouse kissed the owl and scratched the pig.  3
24. The pig scratched the fox.  3
25. It was the fox that the mouse pulled.  3
26. The mouse that the duck hauled tapped the owl.  3
27. The owl was tossed to the mouse by the duck.  3
28. The duck was pulled to the fox by the owl.  4
29. It was the mouse that hugged the duck.  4
30. The pig that the fox tripped touched the owl.  4
31. It was the owl that the fox pulled.  4
32. The pig was kicked by the fox.  4
33. The owl pushed the fox to the pig.  4
34. The fox grabbed the pig and scratched the duck.  4
35. The duck chased the fox.  4
36. The pig hauled the mouse that hugged the fox.  4
37. It was the fox that the duck tossed.  5
38. It was the pig that tapped the mouse.  5
39. The mouse that the fox touched tapped the pig.  5
40. The duck passed the owl.  5

Appendix F, con't...
List 3: Concatenated Condition, con't...
41. The owl was passed to the pig by the fox.  5
42. The fox hauled the owl to the pig.  5
43. The owl kicked the pig that pushed the duck.  5
44. The pig chased the mouse and scratched the fox.  5
45. The mouse was tripped by the duck.  5

Appendix F, con't...
List 3: Concatenated Condition con't...
Summary of Sentence Numbers by Sentence Target and Block

                        Block
Syntactic type     1    2    3    4    5
Active             5   16   24   35   40
Passive            6   17   21   32   45
Cleft Subject      9   18   22   29   38
Cleft Object       1   12   25   31   37
Dative Active      3   11   20   33   42
Dative Passive     7   13   27   28   41
Co-Ordinated       4   14   23   34   44
Subject-Object     8   10   26   30   39
Object-Subject     2   15   19   36   43

APPENDIX G
Calibration Calculations

Part I: Voltages

Calibration Tone: Intensity = 91.9 dBA; Voltage = 2.3786 V

Average Voltage
Experiment 1:
  Inverted Words                      0.4206 V
  Non-Inverted Words                  1.0261 V
Experiment 2:
  Practice Sentences                  0.2471 V
  Intact Condition Sentences          0.4585 V
  Prosodic Condition Sentences        0.3369 V
  Concatenated Condition Sentences    0.7116 V

Part II: Calculations

Step 1: The formula to calculate the difference in dB between two voltage values is:

  20 log (voltage A / voltage B)

The resulting difference is how much larger (or smaller) voltage A is than voltage B. In calculating the attenuation, the next step is to calculate the difference, in dB, between the voltage of the speech signal and the voltage of the calibration tone. In all of the following calculations, voltage A is the voltage of the speech stimuli, and voltage B is the voltage of the calibration tone (2.3786 V). For all of the experimental conditions in this paper, the speech was less intense than the calibration tone. The sign (positive or negative) of the result merely represents the relationship between the two voltages: the dB difference is based on a ratio, and if the result is negative, then the value on top (voltage A) is smaller than the value on the bottom (voltage B). If the same numbers were entered with the ratio inverted, the result would be the same number but positive. The absolute value of this number should be used in Step 2.
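As an illustration, the complete attenuation calculation (Step 1 above, together with Steps 2 and 3 below) reduces to a short computation. The following Python sketch is ours, not part of the original procedure; the constants are taken from Part I, and the function name is illustrative:

```python
import math

CAL_VOLTAGE = 2.3786    # calibration tone voltage (V), from Part I
CAL_LEVEL = 91.9        # calibration tone intensity (dB SPL)
TARGET_LEVEL = 70.0     # target presentation level (dB SPL)

def attenuation_db(stimulus_voltage):
    """Attenuation (dB) to enter into the attenuator so that stimuli
    with the given average voltage play back at TARGET_LEVEL dB SPL."""
    # Step 1: dB difference between the stimulus and the calibration
    # tone (absolute value, since the speech is quieter than the tone)
    difference = abs(20 * math.log10(stimulus_voltage / CAL_VOLTAGE))
    # Step 2: sound pressure level of the stimulus
    stimulus_level = CAL_LEVEL - difference
    # Step 3: attenuation needed to bring the stimulus to the target
    return stimulus_level - TARGET_LEVEL
```

For example, the spectrally inverted words (average voltage 0.4206 V) yield an attenuation of about 6.85 dB, matching the hand calculation for Experiment 1 below.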
Step 2: Having determined the difference in dB between the calibration tone and the speech stimuli, it is possible to find the intensity in dB of the speech stimuli. The next step is to subtract the value obtained in Step 1 from 91.9 dB to get the intensity of the speech stimuli in dB SPL.

Step 3: Knowing the intensity of the speech stimuli from Step 2, it is now possible to calculate the amount of attenuation required in order to present the speech stimuli at 70 dB SPL (the target presentation level). Subtracting 70 dB SPL from the value (in dB SPL) obtained in Step 2 gives the amount of attenuation at which to set the attenuator. The experimenter enters this value into the attenuate portion of the ecosgen program.

Experiment 1

Spectrally Inverted Words:
Step 1: 20 log (0.4206 / 2.3786) = -15.049 = |15.049|
The inverted words are on average 15.05 dB quieter than the calibration tone.
Step 2: The average sound pressure level of the inverted words is: 91.9 dB - 15.05 dB = 76.85 dB.
Step 3: To present the inverted words at 70 dB SPL, the attenuation to be entered into ecosgen is: 76.85 dB - 70 dB = 6.85 dB.

Non-inverted Words:
Step 1: 20 log (1.0261 / 2.3786) = -7.3026 = |7.3026|
The non-inverted words are on average 7.3 dB quieter than the calibration tone.
Step 2: The average sound pressure level of the non-inverted words is: 91.9 dB - 7.3 dB = 84.6 dB.
Step 3: To present the non-inverted words at 70 dB SPL, the attenuation to be entered into ecosgen is: 84.6 dB - 70 dB = 14.6 dB.

Experiment 2

Practice Sentences:
Step 1: 20 log (0.2471 / 2.3786) = -19.67 = |19.67|
The practice sentences are on average 19.67 dB quieter than the calibration tone.
Step 2: The average sound pressure level of the practice sentences is: 91.9 dB - 19.67 dB = 72.23 dB.
Step 3: To present the practice sentences at 70 dB SPL, the attenuation to be entered into ecosgen is: 72.23 dB - 70 dB = 2.23 dB.

Intact Condition:
Step 1: 20 log (0.4585 / 2.3786) = -14.30 = |14.30|
The intact condition sentences are on average 14.3 dB quieter than the calibration tone.
Step 2: The average sound pressure level of the intact condition sentences is: 91.9 dB - 14.3 dB = 77.6 dB.
Step 3: To present the intact condition sentences at 70 dB SPL, the attenuation to be entered into ecosgen is: 77.6 dB - 70 dB = 7.6 dB.

Prosodic Condition:
Step 1: 20 log (0.4206 / 2.3786) = -15.049 = |15.049|
The prosodic condition sentences are on average 15.05 dB quieter than the calibration tone.
Step 2: The average sound pressure level of the prosodic condition sentences is: 91.9 dB - 15.05 dB = 76.85 dB.
Step 3: To present the prosodic condition sentences at 70 dB SPL, the attenuation to be entered into ecosgen is: 76.85 dB - 70 dB = 6.85 dB.

Concatenated Condition:
Step 1: 20 log (0.7116 / 2.3786) = -10.48 = |10.48|
The concatenated condition sentences are on average 10.48 dB quieter than the calibration tone.
Step 2: The average sound pressure level of the concatenated condition sentences is: 91.9 dB - 10.48 dB = 81.42 dB.
Step 3: To present the concatenated condition sentences at 70 dB SPL, the attenuation to be entered into ecosgen is: 81.42 dB - 70 dB = 11.42 dB.

APPENDIX H

APPENDIX I
Experiment 2: Instructions Presented to Participants

The instructions below were presented on a computer monitor. The participants were told to read through the instructions, and if they had any questions, to ask the experimenter.

i. Practice Sentences:

INSTRUCTIONS

NOTE: To move the screen down, use the mouse on the arrow keys at the side of the screen.

You are going to hear some sentences. The sentences that you hear will be similar to these:

A. "The cat tackled the dog."
B.
"It was the goose that the frog followed."
C. "The frog was tickled by the cat."

Each sentence will have one of nine (9) possible sentence structures. Your job is to indicate which of the nine (9) sentence structures you heard. The sentence structures will look like the following:

1. The cat tackled the dog.
2. The frog was tickled by the cat.
3. It was the goose that smacked the dog.
4. It was the goose that the frog followed.
5. The cat followed the dog to the goose.
6. The frog was smacked to the dog by the cat.
7. The dog tickled the cat and swallowed the frog.
8. The goose that the dog punched tackled the cat.
9. The frog swallowed the goose that punched the dog.

Each sentence structure is represented by a number on the computer screen. An example of each structure is posted beside the number on a piece of paper. Each time that you hear a sentence, use the computer mouse to point to and click on the number that you think best matches the sentence structure you heard.

You will now hear eighteen practice sentences. You will hear each of the nine types of sentences twice. Use the mouse to indicate which type of sentence structure you heard. The computer will not play the next sentence until you have clicked on one of the nine boxes. After you click on a number, the computer will highlight the correct answer in green. Therefore, if your response was correct, it will be highlighted in green. If your response was incorrect, the correct answer will be highlighted in green and your incorrect answer will be highlighted in red.

Once you have finished the eighteen practice sentences, you will complete three sets of 45 real sentences. There will be a brief message before each set to let you know when each is about to begin. Each set of real sentences will take about 20 minutes to complete. When you are ready to hear the sentences, tell the experimenter.
ii. Condition 1:

Now that you have completed the sample sentences, you are going to hear a set of 45 real sentences. After each real sentence you hear, use the mouse to point to and click on the sentence which best matches the sentence you heard. Remember that at all times the example sentences will be posted on the computer screen, and your task is to match the sentence type you hear to the nine possible sentence type options. The computer will not play the next sentence until you have clicked on one of the nine boxes. This time, you won't get any feedback about whether your response was correct. This test set will take about 20 minutes to complete. Good listening!

iii. Condition 2:

You are now going to hear another set of 45 sentences, with the same nine possible sentence structures. Each time you hear a sentence, use the computer mouse to point to and click on the number that you think best matches the sentence structure that you heard. This set of sentences will sound rather distorted, but they are meant to sound this way. If you are not sure of the sentence structure, please try to guess. This set will take about 20 minutes to complete.

NOTE: After reading these instructions, the participants were then told again by the experimenter that the sentences would sound very unusual. This point was emphasized because pilot participants laughed when they heard the first sentence and therefore found it difficult to select the correct sentence type.

iv. Condition 3:

You will now hear a third set of 45 sentences. This set of sentences will sound different from either of the two sets that you have heard so far. Each time you hear a sentence, use the computer mouse to click on the number that you think best matches the sentence structure heard. If you are not sure of the sentence structure, please try to guess. This set will take about 20 minutes to complete.
APPENDIX J
Experiment 1 Alphabetical List Given to Participants

This is a list of the words you will hear. Select one of these words.

and chased, and, bumped, by the mouse, by, chased, duck, fox, grabbed, hauled, hugged, it was, it, kicked, kissed, mouse, owl, passed, pig, pulled, pushed, scratched, tapped, that passed, that, the pig, the, tossed, to the fox, touched, to, tripped, was passed, was

Appendix J, con't...
Score Sheet for Experiment 1: numbered response blanks 1 through 34.

APPENDIX K
Experiment 1 Order of Presentation of Individual Words

(The original tables also listed the number of syllables for each item; that column did not survive transcription.)

Condition 1: Spectrally-inverted words, first time (in the order presented)
1. chased  2. kicked  3. tossed  4. mouse  5. was passed  6. owl  7. by the mouse  8. hauled  9. kissed  10. tapped  11. the  12. it  13. duck  14. was  15. scratched  16. hugged  17. tripped  18. to the fox  19. touched  20. grabbed  21. passed  22. pushed  23. and  24. it was  25. to  26. and chased  27. pig  28. pulled  29. bumped  30. by  31. the pig  32. that passed  33. that  34. fox

Appendix K, con't...
Condition 2: Non-altered words (in the order presented)
1. owl  2. hauled  3. scratched  4. the  5. to the owl  6. and  7. chased  8. was passed  9. to  10. pulled  11. it  12. touched  13. passed  14. mouse  15. bumped  16. the pig  17. tapped  18. grabbed  19. fox  20. was  21. and chased  22. by  23. kicked  24. by the mouse  25. pushed  26. it was  27. that passed  28. hugged  29. that  30. pig  31. tossed  32. duck  33. kissed  34. tripped

Appendix K, con't...
Condition 3: Spectrally-inverted words, second time (in the order presented)
1. it was  2. pulled  3. duck  4. that  5. hauled  6. passed  7. was passed  8. kissed  9. owl  10. mouse  11. tapped  12. and chased  13. by  14. tripped  15. fox  16. the pig  17. bumped  18. and  19. by the mouse  20. kicked  21. tossed  22. that passed  23. was  24. chased  25. pushed  26. to  27. grabbed  28. touched  29. pig  30. to the fox  31. the  32. hugged  33. scratched  34.
it

APPENDIX L
Mean and Standard Deviation of Reaction Times for Syntactic Types Across Conditions

In each cell of the table, the mean reaction time for all participants is listed, with the standard deviation of that mean in brackets.

Syntactic Type   Condition 1         Condition 2         Condition 3
A                1602 ms (685 ms)    1598 ms (496 ms)    2069 ms (453 ms)
CS               1514 ms (1134 ms)   2030 ms (1670 ms)    932 ms (361 ms)
CO               1486 ms (922 ms)    2275 ms (1714 ms)   1320 ms (864 ms)
C                2170 ms (2004 ms)   1430 ms (1041 ms)   1668 ms (1647 ms)
P                1742 ms (909 ms)    3078 ms (1533 ms)   1351 ms (694 ms)
SO               2020 ms (975 ms)    1884 ms (1008 ms)   2418 ms (1110 ms)
D                3096 ms (1454 ms)   2750 ms (1065 ms)   3549 ms (1590 ms)
OS               4439 ms (2534 ms)   2550 ms (2007 ms)   3106 ms (1340 ms)
DP               2204 ms (1079 ms)   6553 ms (3376 ms)   2156 ms (1019 ms)
Overall          2252 ms (1648 ms)   3055 ms (3549 ms)   2063 ms (1341 ms)
