IDENTIFICATION OF ENVIRONMENTAL SOUNDS

by

CHRISTIANE SUSAN SPANIK

B.Sc., McGill University, 1995

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF MEDICINE (School of Audiology and Speech Sciences)

We accept this thesis as conforming to the required standard

UNIVERSITY OF BRITISH COLUMBIA
April 1999
© Christiane Susan Spanik, 1999

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Audiology and Speech Sciences
The University of British Columbia
Vancouver, Canada
DE-6 (2/88)

ABSTRACT

The purpose of this study was to investigate the processing of environmental sound, and to compare this to the processing of spoken language. This was done by conducting an identification experiment using the gating paradigm to assess the on-line processing of environmental sounds. In this experiment, twenty participants identified brief, isolated segments of environmental sounds and reported a confidence rating for their responses. Additional context was provided by gating the original recordings in both the preceding and following directions. Eight different recordings of environmental sound were presented, with each listener hearing additional segments of each recording in either the preceding or following direction. It was found that identification performance improved with the addition of context in both directions, suggesting that, as with language, there are context effects in environmental sound processing. Unlike language, however, there is no clear effect of direction of context. There was some evidence that high-context sound sequences were more easily identified than low-context sound sequences, but these results were not consistent. An error analysis provided strong evidence that the process of environmental sound identification entails the activation of a cohort, which is compelling evidence that there may be some similarities in the auditory processing of language and environmental sound. It appears that item identification in these two types of auditory input may be similar, with bottom-up processes activating an initial cohort based primarily on acoustic information, and top-down processes narrowing the size of the cohort as more information becomes available.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENTS

Chapter 1: LITERATURE REVIEW
1.1 Overview
1.2 The Hypothetical Special Status of Language
1.3 Environmental Sound
1.4 Is There a Language of Environmental Sound?
1.5 Context in Identification of Environmental Sounds
1.6 The Vancouver Soundscape
1.7 Language Processing Issues
1.7.1 The Cohort Theory of Word Identification
1.7.2 Cognitive Factors in Spoken Word Identification
1.8 The Present Study
1.8.1 Purpose
1.8.2 Relevance
1.9 Hypotheses

Chapter 2: METHODS
2.1 Purpose of the Experiment
2.2 Participants
2.3 Materials
2.3.1 High and Low Context Environmental Sounds
2.4 Preparation of the Stimuli
2.5 Presentation of the Stimuli
2.6 Ordering of the Stimuli
2.7 Calibration of the Stimuli
2.8 Experimental Conditions
2.9 Experimental Design
2.10 The Experimental Task
2.10.1 Stopping Criteria

Chapter 3: RESULTS
3.1 Chapter Preview
3.2 Scoring Procedure
3.3 Soundfiles Identified Correctly
3.4 Identification Results for Individual Soundfiles
3.4.1 Soundfile 1
3.4.2 Soundfile 2
3.4.3 Soundfile 3
3.4.4 Soundfile 4
3.4.5 Soundfile 5
3.4.6 Soundfile 6
3.4.7 Soundfile 7
3.4.8 Soundfile 8
3.5 Patterns in the Data
3.6 Direction of Context
3.7 Amount of Context
3.8 Listener Performance
3.8.1 High Reading Working Memory Listeners
3.8.2 Low Reading Working Memory Listeners
3.8.3 All Listeners
3.9 Error Analysis

Chapter 4: DISCUSSION
4.1 Overview
4.1.1 Identification of Environmental Sounds
4.1.2 Effects of Additional Context on the Identification of Brief Environmental Sounds
4.1.3 Patterns of Recognition
4.1.4 Quantity of the Context
4.1.5 Performance of Individual Listeners
4.1.6 Comparison of Environmental Sound Processing to Language Processing
4.1.6.1 Degree of Context
4.1.6.2 Direction of Context
4.1.6.3 Working Memory Span
4.1.6.4 Error Analysis
4.2 Future Directions

REFERENCES

APPENDIX A: Participants' Pure Tone Thresholds (dB HL) for Right (R) and Left (L) Ears
APPENDIX B: Instructions for the Reading Working Memory Span Test
APPENDIX C: Participants' Characteristics
APPENDIX D: Source of Experimental Soundfiles from SoundScape Archives with Descriptions
APPENDIX E: Time Waveforms of Experimental Stimuli
APPENDIX F: Characteristics of Experimental Stimuli
APPENDIX G: Calibration of Experimental Stimuli
APPENDIX H: Schematic of the TDT Set-up
APPENDIX I: Order of Stimuli by Soundfile Number and Condition
APPENDIX J: Participant Instructions
APPENDIX K: Reduced Graphic Representation of Response Form

LIST OF TABLES

Table 1. Assignment of Soundfile Presentation Order for the Low Reading Working Memory Group
Table 2. Responses Accepted as Correct for Scoring
Table 3. Error Analysis for Soundfile 1
Table 4. Error Analysis for Soundfile 2
Table 5. Error Analysis for Soundfile 3
Table 6. Error Analysis for Soundfile 4
Table 7. Error Analysis for Soundfile 5
Table 8. Error Analysis for Soundfile 6
Table 9. Error Analysis for Soundfile 7
Table 10. Error Analysis for Soundfile 8

LIST OF FIGURES

Figure 1. Illustration of the Gating Procedure
Figure 2. Illustration of Gating in the Opposite Direction
Figure 3. Correct Identification of Soundfiles, Combined for the High Working Memory Span Listeners
Figure 4. Correct Identification of Soundfiles by Initial Context Direction for the High Reading Working Memory Span Group
Figure 5. First Response - Mean, Median Number of Gates by Soundfile and Condition
Figure 6. First Response at Confidence Rating of 5 or Better - Mean, Median Number of Gates by Soundfile and Condition
Figure 7. First Response at Maximum Confidence - Mean, Median Number of Gates by Soundfile and Condition
Figure 8. First Response - Mean, Median Percentage of Gates by Soundfile and Condition
Figure 9. First Response at Confidence Rating of 5 or Better - Mean, Median Percentage of Gates by Soundfile and Condition
Figure 10. First Response at Maximum Confidence - Mean, Median Percentage of Gates by Soundfile and Condition
Figure 11. Soundfile 1: Distribution of Correct Responses for Each Listener in the Preceding-First Condition
Figure 12. Soundfile 1: Distribution of Correct Responses for Each Listener in the Following-First Condition
Figure 13. Soundfile 1: Distribution of First Responses for Listeners in the Preceding-First Condition
Figure 14. Soundfile 1: Distribution of First Responses for Listeners in the Following-First Condition
Figure 15. Soundfile 2: Distribution of Correct Responses for Each Listener in the Preceding-First Condition
Figure 16. Soundfile 2: Distribution of Correct Responses for Each Listener in the Following-First Condition
Figure 17. Soundfile 2: Distribution of First Responses for Listeners in the Preceding-First Condition
Figure 18. Soundfile 2: Distribution of First Responses for Listeners in the Following-First Condition
Figure 19. Soundfile 3: Distribution of Correct Responses for Each Listener in the Preceding-First Condition
Figure 20. Soundfile 3: Distribution of Correct Responses for Each Listener in the Following-First Condition
Figure 21. Soundfile 3: Distribution of First Responses for Listeners in the Preceding-First Condition
Figure 22. Soundfile 3: Distribution of First Responses for Listeners in the Following-First Condition
Figure 23. Soundfile 4: Distribution of Correct Responses for Each Listener in the Preceding-First Condition
Figure 24. Soundfile 4: Distribution of Correct Responses for Each Listener in the Following-First Condition
Figure 25. Soundfile 4: Distribution of First Responses for Listeners in the Preceding-First Condition
Figure 26. Soundfile 4: Distribution of First Responses for Listeners in the Following-First Condition
Figure 27. Soundfile 5: Distribution of Correct Responses for Each Listener in the Preceding-First Condition
Figure 28. Soundfile 5: Distribution of Correct Responses for Each Listener in the Following-First Condition
Figure 29. Soundfile 5: Distribution of First Responses for Listeners in the Preceding-First Condition
Figure 30. Soundfile 5: Distribution of First Responses for Listeners in the Following-First Condition
Figure 31. Soundfile 6: Distribution of Correct Responses for Each Listener in the Preceding-First Condition
Figure 32. Soundfile 6: Distribution of Correct Responses for Each Listener in the Following-First Condition
Figure 33. Soundfile 6: Distribution of First Responses for Listeners in the Preceding-First Condition
Figure 34. Soundfile 6: Distribution of First Responses for Listeners in the Following-First Condition
Figure 35. Soundfile 7: Distribution of Correct Responses for Each Listener in the Preceding-First Condition
Figure 36. Soundfile 7: Distribution of Correct Responses for Each Listener in the Following-First Condition
Figure 37. Soundfile 7: Distribution of First Responses for Listeners in the Preceding-First Condition
Figure 38. Soundfile 7: Distribution of First Responses for Listeners in the Following-First Condition
Figure 39. Soundfile 8: Distribution of Correct Responses for Each Listener in the Preceding-First Condition
Figure 40. Soundfile 8: Distribution of Correct Responses for Each Listener in the Following-First Condition
Figure 41. Soundfile 8: Distribution of First Responses for Listeners in the Preceding-First Condition
Figure 42. Soundfile 8: Distribution of First Responses for Listeners in the Following-First Condition
Figure 43. First Response at CR of 5 or Better - Ordered Median, Mean Number of Gates by Soundfile and Condition
Figure 44. First Response at CR of 5 or Better - Ordered Median, Mean Percentage of Gates by Soundfile and Condition
Figure 45. Participant 1: First Correct Response Gate at CR of 5 or Better With Group Data
Figure 46. Participant 3: First Correct Response Gate at CR of 5 or Better With Group Data
Figure 47. Participant 6: First Correct Response Gate at CR of 5 or Better With Group Data
Figure 48. Participant 8: First Correct Response Gate at CR of 5 or Better With Group Data
Figure 49. Participant 10: First Correct Response Gate at CR of 5 or Better With Group Data
Figure 50. Participant 12: First Correct Response Gate at CR of 5 or Better With Group Data
Figure 51. Participant 13: First Correct Response Gate at CR of 5 or Better With Group Data
Figure 52. Participant 15: First Correct Response Gate at CR of 5 or Better With Group Data
Figure 53. Participant 2: First Correct Response Gate at CR of 5 or Better With Group Data
Figure 54. Participant 4: First Correct Response Gate at CR of 5 or Better With Group Data
Figure 55. Participant 5: First Correct Response Gate at CR of 5 or Better With Group Data
Figure 56. Participant 7: First Correct Response Gate at CR of 5 or Better With Group Data
Figure 57. Participant 9: First Correct Response Gate at CR of 5 or Better With Group Data
Figure 58. Participant 11: First Correct Response Gate at CR of 5 or Better With Group Data
Figure 59. Participant 14: First Correct Response Gate at CR of 5 or Better With Group Data
Figure 60. Participant 16: First Correct Response Gate at CR of 5 or Better With Group Data
Figure 61. Participant 18: First Correct Response at CR of 5 or Better With Group Data
Figure 62. Participant 19: First Correct Response at CR of 5 or Better With Group Data
Figure 63. Participant 17: First Correct Response at CR of 5 or Better With Group Data
Figure 64. Participant 20: First Correct Response at CR of 5 or Better With Group Data
Figure 65. Listener Performance: Number of Soundfiles Correctly Identified by Participants: First Response and CR of 5 or Better
Figure 66. Listener Performance: Analysis of Maximum Confidence Ratings Reported by Participants
Figure 67. Listener Performance: Correlation Between Number of Soundfiles Correctly Identified and Average Confidence Ratings

ACKNOWLEDGEMENTS

I would like to thank Kathy Pichora-Fuller for the idea and support for this thesis and all of the guidance she has given me along the way, and Jeff Small and Rushen Shi for their input and enthusiasm for the study. I would also like to thank Barry Truax and his staff and students at SFU who helped me a great deal in my endeavor.
I am grateful to the staff, faculty, and students of the School of Audiology and Speech Sciences for providing help and moral support, especially John Nichol, the computer guru. I thank all of my fellow thesis students for their company and empathy. I am indebted to my family, particularly my parents, who set the bar high and have provided encouragement and support through my many years of school. Finally, I am most grateful to Craig, who has provided more support, love and understanding than could reasonably be asked of anyone.

This research was supported by a grant from the Natural Sciences and Engineering Research Council of Canada to M.K. Pichora-Fuller.

Chapter 1: LITERATURE REVIEW

1.1 Overview

It is interesting to know how people ascribe meaning to environmental sound. Despite its constant presence in our lives, relatively little is known about the way humans process this type of auditory input. Although the literature on human auditory perception contains innumerable studies of spoken language comprehension and, to a lesser extent, musical stimuli, there are considerably fewer studies that have investigated the processes which underlie human identification of environmental sound stimuli.

The following section will present a review of relevant literature. The discussion will begin with the hypothesized special status for speech within the brain. The term environmental sound will be defined and discussed, and the possibility that environmental sound has structural features that are processed analogously to spoken language will be considered. Certainly, both can be regarded as instances of pattern perception. This has raised some interesting questions about auditory processing. If environmental sound has a structure like language, do we process it in a similar way? Studies that have investigated the influence of context in environmental sound processing will also be discussed. A current model of language processing, the cohort theory, will be described in some detail, but it bears mention at this point that there is no corresponding theory of environmental sound processing. Numerous models, and components thereof, have been invoked, but the general consensus appears to be that, due to the lack of a coherent knowledge base in this field, these models lack sufficient empirical support (McAdams, 1993; Truax, 1996). Finally, two recent studies of on-line language processing will be discussed, as these will form a framework for the present research.

1.2 The Hypothetical Special Status of Language

In the literature on auditory perception and processing, much debate has surrounded the possibility that language is special or privileged above other types of auditory input, such as environmental sound or music (Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967). In addition to the usual emphasis on the differences between these types of auditory inputs, it may also be interesting to consider what these types of processing may have in common. Despite the position that speech is special, there may be some situations in which speech might be more likely to be perceived as environmental sound. The linguistic content of language is not likely to be fully appreciated by the listener in examples such as a non-native listener hearing a foreign language, or an infant hearing a language it has not yet acquired. There are also some aspects of language that may be processed more like environmental sound, such as voice or emotion recognition.
For the purposes of this debate, the experimental paradigm should ideally contrast well-matched speech and non-speech stimuli. Specifically, the stimuli used should be matched for naturalness; if natural speech sounds are used, the non-speech stimuli should not be laboratory-created artificial sounds. Some of the early studies failed to address this issue, using non-speech stimuli such as pure or complex tones, white noise, and clicks (for example, Haydon & Spellancy, 1973; Morton & Chambers, 1976). As these are not typical of the environmental sounds most listeners encounter in their lives, and are therefore meaningless to listeners, conclusions drawn based on such stimuli are questionable. Fortunately, more recent studies have addressed the issue of naturalness in stimulus selection (for example, Cycowicz & Friedman, 1998; Van Petten & Rheinfelder, 1995).

Related to the larger issue of the special status of language in the brain are issues concerning the hypothesized routing of the different types of auditory inputs at some pre-central stage of processing, and the lateralization of different types of auditory input in the brain. Jackendoff (1987) assumed in his theories of the computational mind that there is some kind of precognitive streaming of sensory input which takes place at some point following the peripheral systems but prior to central processing. For the present discussion, we are interested primarily in this streaming of inputs to the auditory system. The division he proposes would direct incoming auditory information according to whether it was language, musical, or general purpose auditory input. Presumably, by general auditory input, Jackendoff was referring to a class of sounds that includes environmental sounds.

Lateralization refers to the fact that different types of perceptual input are processed differentially by the hemispheres of the brain. Numerous studies have found that environmental sounds are not processed in the same regions of the brain, or indeed in the same hemisphere of the brain, as language. For example, Van Petten and Rheinfelder (1995) investigated the conceptual relationships between spoken words and environmental sounds in an event-related brain potential (ERP) study. They found that the ERP context effects elicited by words and environmental sounds showed an asymmetry: "the contextual modulation of a late negative wave in the ERPs elicited by sounds was larger over the left hemisphere, [which] is consistent with the idea that the two context effects received differential contributions for the two cerebral hemispheres" (p. 501). The association of this response with the left hemisphere may seem counterintuitive because the left hemisphere is considered to be the site of language processing. However, Van Petten and Rheinfelder note that the patterns of asymmetries in ERP studies are not easily interpretable because activity in one hemisphere or the other could be reflected in either positivity or negativity in the response. Also, because it is not only the location of neurons, but their orientation to the skull, which affects the scalp distribution of the ERP, it is difficult to make claims regarding the hemisphere involved in processing. Therefore, the asymmetry indicated simply reflects differential hemispheric involvement in the processing of speech and environmental sound stimuli.

Interestingly, this division in the brain between the processors of linguistic and non-linguistic stimuli is not restricted to auditory language.
Hickok, Bellugi & Klima (1996) found that there is a similar dissociation for visual stimuli. In a study of brain-lesioned ASL users, the authors found that participants with left hemisphere lesions performed significantly worse on language measures that assessed production, reception, comprehension, naming and repetition. Further, it was found that these deficits did not reflect deficits in general spatial cognitive ability. In fact, there was a double dissociation found: some of the participants with left hemisphere lesions had disrupted ASL abilities, but performed well on visuo-spatial tasks, and conversely, some of the participants with right hemisphere lesions had normal ASL abilities, but performed poorly on visuo-spatial tasks. Therefore, the observed dissociation between the brain's processing of linguistic and non-linguistic input is apparently not specific to the auditory modality.

We recognize that, irrespective of the status of language in auditory processing, spoken language is a rich code and deserving of study. It is enormously important to our lives for the purposes of cognitive processing, exchanging information, and the socialization of our species. On the other hand, it would be a disservice to underestimate the importance of environmental sound. It allows us to understand and monitor the world around us, a function that should not be taken lightly.

1.3 Environmental Sound

What is an 'environmental sound'? Environmental sounds will be defined as sounds which are "produced by real events [and have] meaning by virtue of their causal events" (Ballas & Howard, 1987, p. 97). Ballas and Howard objected to considering environmental sounds as those which are non-speech and non-music because this definition implies a secondary status for environmental sound, and assumes that adequate definitions of speech and music exist, an assumption that is unwarranted in their opinion. Nonetheless, considering these sounds as those which are non-speech and non-music provides another frame of reference.

The critical semantic difference between language and environmental sound, as described by Ballas and Howard, is the manner in which meaning is conveyed. They argued that the spoken word has meaning because of social convention. Two people must agree that the acoustic sequence [blu] refers to the color blue in order for the sequence to have that meaning. Conversely, environmental sounds convey meaning by virtue of their causal events; the sound of a car door closing simply means that a car door was closed. Ballas and Howard did not note that there may be situations in which an environmental sound may convey a culturally defined meaning. Such an example is the use of particular sequences of tones (two long and one short, or one long and one short) to signal the floor in a hospital where there is a fire alarm. Although these types of socially meaningful environmental sounds are probably not the most common instances of environmental sound in the auditory world, they are interesting to note because, like language, the meaning of the sound is stored in the mind of the sender and receiver.

Ballas and Howard did not address the nature of meaning in music, but for a more complete discussion it too needs to be considered. Like language, music has meaning because of social convention. For example, scholars may refer to the characteristics and history of music of specific epochs and societies, and certainly, our auditory perception of music is more a matter of aesthetics than survival.
Finally, music also serves a kind of communicative purpose, especially with respect to the communication of emotions.

Environmental sound is the term of choice for the present study because it reflects an underlying ecological view of these sounds. Not only do the sounds reflect events in the environment, they also provide an interface between the individual and the environment. Truax (1996) called this interaction acoustic communication. This is a considerably different stance than that taken in much of the previous work in this field. Many studies of the perception of environmental sound have used terms such as 'non-speech', 'non-verbal', 'non-linguistic', or 'everyday sounds' to refer to the stimuli used. The first three of these by their very nature are biased towards speech and, furthermore, these studies frequently used non-meaningful stimuli, such as pure tones, complex tones or noise. The label 'everyday sounds' is also insufficient, because the everyday sounds typically used are limited in ecological scope; they may adequately represent the everyday sounds of some individuals and communities but not others. Environmental sounds must be seen for what they are; they are meaningful if the sound sources are meaningful to the individual, the community, the culture or the time. For illustration, consider the ringing of a cell phone. This sound can be identified by the majority of citizens of present-day Vancouver, but we can only speculate about how Vancouverites of the 1800s, or aboriginal peoples of remote regions of the world, would interpret the sound.

It is worth considering whether attentional factors differ depending on whether the input is an environmental sound, language or music. Listeners may not be aware that they are constantly monitoring the world around them (Truax, 1996), whereas they may be more aware that they are actively engaged in spoken language communication or musical appreciation. For all types of input, it seems that awareness may be heightened when communication breaks down. Certainly, communication partners are aware of listening when conversations are conducted in difficult listening conditions, or the other partner is distracted, or conversation is too one-sided. Listeners are also capable of directing their attention to environmental sound, as anyone who has awoken to a bump in the night may report. The listener can become acutely aware of environmental sounds and foreground them when the situation warrants it. Conversely, speech or music may be perceived as nothing more than background, for example if a person is reading while conversations are being conducted around him.

1.4 Is There a Language of Environmental Sound?

In a discussion of the similarities and differences between environmental sound and spoken language, the proposition that environmental sound itself comprises a language is very tempting to consider. Ballas and Howard (1987) made this very argument, basing their position on the fact that, like language, environmental sound is a system that conveys meaning, has some structure, and is processed both from the bottom up and from the top down. This, of course, is a very simplified view of the criteria for language. Scholars have established an extensive set of criteria, called "design features" (O'Grady & Dobrovolsky, 1992, p. 575), to describe human language.
Most of these features are not and cannot be fulfilled by environmental sounds, such as "arbitrariness [which means that] there is no natural or inherent connection between a token and its referent" (p. 576) and "displacement [which means that] users of the system are able to refer to events remote in space and time" (p. 577). It would also be difficult to argue that environmental sound is generative or rule-governed in the same sense that language is; for example, there are no sounds which could be considered analogous to the function words of language, such as for or to, which serve only to connect other words. By its nature, environmental sound is non-arbitrary and rooted in the natural and physical world. As such, productivity and rules are governed and limited by the physical nature of the sound-producing objects and events. If the sign is not arbitrary, can it constitute language? Linguists would probably say no. However, we can once again recognize some characteristics that are shared between these types of auditory input. For example, listeners can think about environmental sounds, describe them, attribute qualities to them, and think about them at later times, just as they do with spoken language. Notwithstanding the oversimplification of Ballas and Howard's characteristics of a language, we will consider their argument in our discussion.

Before continuing on to Ballas and Howard's (1987) argument, though, it is very interesting to note that the points they make regarding the features which make environmental sounds language-like have also been used by Jackendoff in a discussion of the representation and processing of music in the mind. He described the meaning of music, its hierarchical levels of structure and its processing, which, like language, proceed both from the bottom up and the top down. However, a crucial difference between Jackendoff's (1987) discussion and the position of Ballas and Howard is that Jackendoff did not presume to refer to music as a language. He simply identified some similarities between these types of auditory input, and applied them where appropriate by using some of the language of the linguistic processing literature.

The discussion of a possible language of environmental sound will begin with Ballas and Howard's (1987) contention that, like spoken language, environmental sounds are represented at different levels: phonological, semantic, and syntactic. The fact that environmental sounds convey meaning suggests a semantic level. The potential to derive meaning from the sequence of sounds may be thought of as evidence of a structure somewhat like syntax, although a true syntax of language includes many features not observed in environmental sounds, such as hierarchical and governing relationships among its elements. Ballas and Howard did not discuss the lexical or the discourse levels; however, the auditory object (a single auditory happening) and the auditory scene (the acoustic content of the listener's auditory world) may be thought of as approximately equivalent levels of representation to lexical and discourse, respectively. These terms, derived from Bregman's (1990) work on auditory scene analysis, will be more fully discussed in section 2.1. An environmental sound equivalent for the phonemic level of representation appears, however, to be elusive, and a full set of acoustic features has yet to be postulated. The authors pointed out that for speech the articulatory mechanism is limited in the number of possible sounds it can produce.
The real noise-producing world is not confined to the same extent, making simple categorization difficult. Greenberg (1996) offered an evolutionary view of this situation. He contended that since the auditory system preceded the vocal apparatus in human evolution, it is "likely that the auditory system has imposed far more constraints on the human vocal apparatus than vice versa" (Greenberg, p. 398). The auditory system, however, is certainly powerless to shape the rest of the natural world. This may be the reason why features of environmental sound, unlike those in phonology, have not yet been identified; environmental sounds were not made to measure for the purpose of human auditory perception.

Returning to Ballas and Howard's (1987) position, the present discussion will now turn to auditory processing issues. Parallels between the processing of language and environmental sounds relate to bottom-up and top-down processes that are apparent for both types of auditory information. Bottom-up processing is reflected in a listener's ability to identify isolated environmental sounds, notably when this occurs in a controlled, experimental setting in which the contextual cues contributing to top-down processing are limited. Identification under such conditions is possible, and errors can illuminate parallels to word identification. The noted example is the phenomenon of homonyms; in the case of both codes, these are sounds which may be confused due to a great degree of acoustic similarity, although they have different meanings or sources. The string [ðɛr] could mean their, they're or there if spoken out of context; the sound of a car backfiring may sound to the listener like a gunshot if he or she is unaware of the context.

Top-down processing of sound is context driven. This type of processing may be observed for environmental sounds as well as for spoken language. Ballas and Howard (1987) used the example of temporal structure, which they proposed as a loose syntax of environmental sound. They argued that, like the syntax of spoken language, the syntax of environmental sound is learned and used implicitly. Evidence for their position comes from studies (Howard & Ballas, 1980, 1982) in which participants were able to learn and use patterns of sound in a recognition task, but they were unaware of having done so. Subjects in these studies were required to categorize a sequence of water- and steam-related sounds that were either grammatical (following a finite-state grammar developed by Howard and Ballas) or ungrammatical (randomly generated). Subjects in these studies performed better on the sequentially structured sequences, even when these were minimally interpretable. In addition, results suggested that, as with speech, there was an important interaction of semantic and syntactic factors that affected performance (Howard & Ballas, 1982). Performance of the subjects was enhanced when they were explicitly told the semantic context.
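The grammaticality manipulation in those studies can be pictured with a small sketch. The following is a minimal illustration only: the states, sound labels, and transitions are invented for this example and are not the actual finite-state grammar used by Howard and Ballas (1980).

```python
import random

# Hypothetical finite-state grammar over water- and steam-related sound events.
# The states and transitions are illustrative, not the grammar from Howard & Ballas (1980).
TRANSITIONS = {
    "START": ["faucet_on"],
    "faucet_on": ["water_running"],
    "water_running": ["kettle_filling", "water_off"],
    "kettle_filling": ["steam_hiss"],
    "steam_hiss": ["whistle"],
    "water_off": ["END"],
    "whistle": ["END"],
}

def generate(grammatical: bool, length: int = 5) -> list[str]:
    """Return a sound-event sequence that either follows or ignores the grammar."""
    events = [e for e in TRANSITIONS if e != "START"]
    if not grammatical:
        return random.sample(events, k=min(length, len(events)))  # arbitrary order
    seq, state = [], "START"
    while True:
        state = random.choice(TRANSITIONS[state])
        if state == "END":
            return seq
        seq.append(state)

print(generate(True))   # e.g. ['faucet_on', 'water_running', 'kettle_filling', ...]
print(generate(False))  # shuffled, sequentially unstructured events
```

Sequences drawn from the transition table carry the kind of implicit sequential structure that listeners in those studies appeared to exploit, whereas the shuffled sequences do not.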
The notion that environmental sounds may comprise a language is certainly contentious. In fact, in Ballas and Mullins (1991), the authors revisited the notion that environmental sound possesses a language-like structure and concluded that the similarities between language and environmental sound are notably fewer than proposed in their earlier work (Ballas & Howard, 1987). Ballas and Mullins (1991) asserted that drawing parallels between language and environmental sounds is problematic at best, and at worst, fundamentally invalid. They argued that comparison of the effects of context in language to that of environmental sound may be ill-advised, and they even questioned the meaningfulness of environmental sound. Truax (1996) contended that environmental sound does not comprise a language because, although there is organization and meaning, these are fundamentally different from analogous representations in language or music. Regardless of these reservations, there are important and undeniable similarities between spoken language and environmental sound. The similarities of interest for the purpose of this study are pattern perception and the use of context in auditory processing.

1.5 Context in Identification of Environmental Sounds

It is generally accepted that context does play an important role in the comprehension of environmental sounds (Ballas & Howard, 1987; Truax, 1996). Some previous studies have investigated the effects of context on the comprehension of environmental sounds. Ballas and Mullins (1991) used homonymous sounds in a series of experiments intended to determine whether interpretation of a sound is affected by the sequence in which it is embedded. Pairs of nearly homonymous sounds were identified, and stimuli were created using these such that each sound would occur in two contexts: one in which it was consistent with the other sounds, and one in which it was substituted for its homonym. For example, the sound of a fuse burning and food frying would be a pair of near homonyms. The natural sequence would be the sequence of a match being struck, the fuse burning, and an explosion. The substitution sequence might be slicing food, chopping food, and the fuse burning. Results of this study indicated that there was at best a minimal positive effect of context, and more prominent was a strong negative effect of context. That is, identification performance for sounds occurring within the natural context was not significantly better than the baseline measure of identification of the same sounds presented in isolation. The negative effect of context refers to the finding that the presence of context tended to bias the listener against the correct interpretation when the context was the substitution sequence; the listener would be less likely to correctly respond that the stimulus was the sound of a fuse burning if the surrounding context was slicing food and chopping food than if the context was a match being struck and an explosion.

Van Petten and Rheinfelder (1995), in an event-related potential study, found that context effects were similar for both spoken language and environmental sound stimuli in the patterns of results from a mismatch negativity study, providing evidence that both are influenced by context.

1.6 The Vancouver Soundscape

The stimuli used in most studies of environmental sound are taken from commercial or experimental recordings or from libraries of isolated real sounds. These are combined artificially into sequences, typically through electronic manipulations. Although the researchers might preserve the acoustical properties of the sounds themselves, there may be other factors involved in the natural occurrences of these sequences, such as ambient noise or subtle temporal or intensity cues, which are lost by splicing together manufactured sounds.
For the purposes of the present study, we were fortunate to have access to recordings that were made in the real world and so were able to provide the participants with the most natural stimuli possible, in the hope that this would eliminate the potential loss of subtle auditory cues. The corpus from which these recordings were taken was the archives of the Vancouver Soundscape Project. This project began in the 1970s with the genesis of the World Soundscape Project (WSP), the principal purpose of which was "to document and archive soundscapes, to describe and analyze them, and to promote increased public awareness of environmental sound through listening and critical thinking" (Truax, 1996, p. 54). The recordings have subsequently been used in a relatively new genre of musical composition called "soundscape composition" (Truax, 1996, p. 54).

Soundscape study for the WSP began in Vancouver before extending across Canada and around the world, and the results, including those early recordings, comprise the Vancouver Soundscape Project. This work has continued to the present, and more recently recorded materials were used for this study. Those selected for use in this experiment were recorded on a good quality portable digital audio tape (DAT) recorder between late 1991 and early 1993. Relatively recent stimuli were selected because it has been found that the urban soundscape changes considerably over time (Truax, 1996).

Obviously, listeners usually perceive auditory input along with visual, olfactory, and other modalities of input. Human perception of the world is dependent upon the processing of these various modalities, as well as interactions between modalities. For this experiment, however, we wanted to address auditory input without co-occurring input from other modalities. It was assumed that listeners would be able to mentally generate a context based on the auditory stimulus alone. A pilot study using the selected experimental stimuli was conducted to ensure this was the case.

1.7 Language Processing Issues

In order to compare the processing of language to the processing of environmental sounds, an understanding of the former is important. The first of the following sections will describe a contemporary and well regarded theory of lexical access, the cohort theory. This theory was selected for discussion because of its wide acceptance, the support for it in the literature, and its capacity to address working memory issues. The second of the following sections will describe some of the cognitive factors that have been found to play a role in lexical access.

1.7.1 The Cohort Theory of Word Identification

In studies of word recognition and word-shadowing, Marslen-Wilson and Welsh (1978) showed that not only does the process of word identification involve the interaction of bottom-up (data-driven) and top-down (knowledge-driven) processes, but in fact the immediate result of that interaction is the percept. This contrasted with earlier models that postulated serial processing of bottom-up and top-down information. Based on these results, a new model, an active direct access model of word identification, was created. This model was based on the notion that there is an initial input-driven activation of a class of word-candidates, called a cohort. This activation is based purely on acoustic and phonetic properties of the input, and it effectively and immediately limits the number of potential interpretations for a given word to those with similar acoustic onsets. Top-down processes then apply other knowledge sources to rapidly narrow in on and select the correct word. This process restricts the potential candidates to at least the same extent as a purely bottom-up model would, and it is likely to have the correct interpretation among the initial candidates. The interaction of top-down and bottom-up processing makes recognition faster than if it were solely driven by bottom-up processes. This seminal work has since been refined and the model has been dubbed the cohort theory (Marslen-Wilson & Tyler, 1980).
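As a rough illustration of these two stages, the sketch below activates an initial cohort from the word onset heard so far and then prunes it with a context score. The toy lexicon, the context-fit values, and the threshold are invented for illustration and are not part of Marslen-Wilson and Tyler's model.

```python
# Minimal illustration of onset-driven cohort activation followed by
# knowledge-driven narrowing (toy lexicon and scores, invented for this example).
LEXICON = ["captain", "capital", "captive", "capsule", "cat", "dog"]

def initial_cohort(onset: str) -> list[str]:
    """Bottom-up step: activate every word sharing the acoustic onset heard so far."""
    return [w for w in LEXICON if w.startswith(onset)]

def narrow(cohort: list[str], context_fit: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Top-down step: keep only candidates whose fit with the context is high enough."""
    return [w for w in cohort if context_fit.get(w, 0.0) >= threshold]

cohort = initial_cohort("cap")                            # ['captain', 'capital', 'captive', 'capsule']
print(narrow(cohort, {"captain": 0.9, "capital": 0.2}))   # -> ['captain']
```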
Research into this theory has benefited greatly from the development of the gating paradigm, which is a method used to study the on-line processing of words (Grosjean, 1980). Previously used on-line processing tasks included shadowing (Marslen-Wilson & Welsh, 1978) and monitoring tasks (Marslen-Wilson & Tyler, 1980), which revealed the end products of processing, but not the interim products. The important feature of the gating paradigm, developed by Grosjean (1980), is the ability it afforded researchers to investigate the temporal structure of word comprehension. Essentially, this method involves the successive presentation of increasingly longer portions of a stimulus. The procedure whereby the stimuli are excised into predetermined durations is known as gating, and each of these incrementally longer portions is known as a gate. Typically, the first gate is very short in duration, less than the minimum required for correct identification, and the longest gate is the full duration of the stimulus. The listener hears the gates of a given stimulus in order, from briefest to longest. The task is to guess the identity of the stimulus following the presentation of each gate and provide a confidence rating for the response given. By presenting stimuli in this way, the researcher is able to study the on-line process of word identification. This is done by determining the isolation point, which is the gate at which the stimulus is guessed correctly. It is at this gate that the stimulus is isolated from the other possible words in the initial cohort. Error analysis is also performed, because it provides information on the word isolation process by providing insight into the composition of the cohort at various points in time. Error analysis also illuminates the types of strategies listeners utilize in ambiguous situations. This paradigm is notable for its flexibility (for a review, see Grosjean, 1996), making it an ideal choice for the present study in which the comprehension of non-linguistic auditory input was evaluated.
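A minimal sketch of how a recording might be excised into gates is given below. The 50-ms first gate and 50-ms increments are assumptions chosen only for illustration; they are not the gate durations used in this thesis or in Grosjean (1980).

```python
import numpy as np

def make_gates(signal: np.ndarray, sr: int, first_ms: float = 50.0,
               step_ms: float = 50.0) -> list[np.ndarray]:
    """Cut a recording into incrementally longer, onset-aligned gates.
    Gate 1 spans first_ms; each later gate adds step_ms; the last gate is the full signal."""
    gates, n = [], len(signal)
    end = int(sr * first_ms / 1000)
    while end < n:
        gates.append(signal[:end])
        end += int(sr * step_ms / 1000)
    gates.append(signal)  # final gate = full stimulus
    return gates

# A one-second toy "recording" at 16 kHz yields 20 gates in 50-ms increments.
gates = make_gates(np.zeros(16000), sr=16000)
print(len(gates), [len(g) / 16000 for g in gates[:3]])  # 20 gates; 0.05, 0.10, 0.15 s
```

The listener's response and confidence rating after each successive gate are what allow the isolation point described above to be located.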
Language processing studies in which the gating paradigm has been used provide further evidence for the cohort theory. The results of early studies of this type (Grosjean, 1980; Tyler, 1984) suggested that listeners could identify a word after hearing as little as half of its duration. The fact that the listener did not need to hear the entire duration of the word to correctly identify it suggests that this process may be facilitated by context. The speed with which identification takes place indicates considerable efficiency, consistent with the cohort theory.

In the present experiment, the cohort theory, or some modification thereof, will be considered with respect to the processing of environmental sound. As discussed, compelling evidence for the cohort theory, as applied to environmental sounds, would be evidence of activation of a group of candidates before correct identification of the target. Often, before words or environmental sounds are fully recognized, a few possible candidates may cross the listener's mind. An example might be the experience of driving and suddenly hearing a sound that seems to indicate car trouble. Occasionally it is caused by car trouble, but it might be a sound outside the vehicle. Until that is determined for certain, however, the car owner may be quite nervous. Although activation of a cohort may be expected for environmental sound stimuli, other supporting evidence for the cohort theory, rapid identification, may not be evident. The term rapid identification refers to the previously noted finding that listeners required less than half of the duration of a word to correctly identify it (Grosjean, 1980; Tyler, 1984).

1.7.2 Cognitive Factors in Spoken Word Identification

Two studies motivated and provided an experimental framework for the present study (Wingfield, Aberdeen & Stine, 1991; Wingfield, Alexander & Cavigelli, 1994). Both investigations compared the word recognition performance of young listeners to that of older listeners, ultimately suggesting some of the cognitive factors which underlie how our brains process language and some of the changes this processing undergoes as we age. The gating paradigm was used for the presentation of the stimuli in both studies.

The first of these studies (Wingfield et al., 1991) investigated the role of context in spoken language comprehension. The first goal of this experiment was to determine the stimulus duration listeners needed to hear in order to correctly identify words under three stimulus conditions: high, low and no context. The second goal of the experiment was to examine error patterns in listeners' pre-recognition responses, which are the incorrect responses given before the correct word was identified. The latter analysis was performed to provide insights about bottom-up processing. The results of the study indicate that listeners were able to correctly identify words in the no context condition when presented with just over half of the word. Identification was faster still in high and low context conditions, with listeners needing to hear, respectively, twenty and less than thirty percent of the word. Analysis of the data hinted at an effect of age that appeared to increase with decreasing context, but this did not reach significance. The findings of this study suggest that older listeners are able to effectively employ linguistic context in word processing.

The second study of interest (Wingfield et al., 1994) investigated the effects of direction of context on the identification of words. This study specifically addressed the issue of working memory. It was postulated that elderly listeners whose working memory resources have been compromised due to normal aging (Carpenter, Miyake, & Just, 1994) would perform more poorly on tasks which require them to hold ambiguous words in memory until they can be disambiguated. In this study, a target word was selected based on difficulty of identification in isolation, and gating proceeded on a word-by-word basis in both the preceding and following directions. The young and elderly listeners heard context in one direction or the other for each of these ambiguous target words. Several findings emerged from this study. Identification of ambiguous words was facilitated by the introduction of surrounding context, as demonstrated by the fact that the more gates listeners heard, the more likely they were to correctly identify the target word.
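The word-by-word gating in the preceding and following directions can be pictured with the toy sketch below; the sentence, target word, and windowing logic are invented for illustration and are not the actual stimuli or procedure from Wingfield et al. (1994).

```python
# Toy illustration of word-by-word gating in the preceding vs. following
# direction around an ambiguous target (sentence and target are invented).
def directional_gates(words, target_idx, direction="preceding"):
    """Return successively larger word windows anchored on the target word."""
    gates, span = [[words[target_idx]]], 1
    while True:
        if direction == "preceding":
            lo, hi = target_idx - span, target_idx + 1
        else:  # "following"
            lo, hi = target_idx, target_idx + 1 + span
        gate = words[max(lo, 0):min(hi, len(words))]
        if gate == gates[-1]:          # no more context to add in this direction
            return gates
        gates.append(gate)
        span += 1

sentence = "the sailor tightened the rigging before the storm".split()
for g in directional_gates(sentence, target_idx=4, direction="following"):
    print(" ".join(g))   # "rigging", "rigging before", "rigging before the", ...
```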
Generally, younger listeners tended to perform better in this task than older listeners; however, there were marked differences in performance for the preceding and following context conditions. Preceding context appeared to better facilitate identification than following context for both groups of listeners, and patterns of improvement were similar for both age groups. No significant effects of age were found for the preceding condition. For the following context condition, there were significant effects of age. Although both groups benefited from hearing increasing amounts of following context, the relative improvement for the older listeners was less than for the younger listeners, a difference that is likely to be related to the working memory limitations of older listeners.

Working memory must be discussed in some detail, as it is a central focus of the previously described study. This type of memory is important to the study of language comprehension and production because it provides storage and computational resources for language processing. Working memory is involved in on-line language processing. For example, as a listener identifies words, these are held in working memory and processing of incoming speech continues until sentence-level interpretation is complete. There is a necessary trade-off between the limited computational and storage resources available to the language processing system. If either the storage or processing demands of a linguistic operation stress the working memory system too much, the other will suffer. For example, the test of working memory which was utilized by Wingfield et al. (1994), and the one which will be used for this experiment, is the reading working memory span (Daneman & Carpenter, 1980). This measure of working memory requires the participant to read sentences aloud while maintaining the final word of each in memory. The number of sentences in each set increases from two to six, and subjects begin to have difficulty with this task after their working memory capacity is exceeded. This is because the system reaches a point where there are insufficient resources to hold the items in memory and simultaneously process new sentences.
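A simplified sketch of how such a reading span task might be scored is shown below. The all-or-nothing scoring and the stopping rule are assumptions made for illustration; they are not necessarily the exact procedure of Daneman and Carpenter (1980) or of this thesis, and the demo sentences are invented.

```python
# Rough sketch of scoring a reading working memory span task.
def reading_span(sets_of_sentences, recall_fn):
    """Return the largest set size at which all sentence-final words were recalled in order."""
    span = 0
    for sentences in sets_of_sentences:                  # set sizes grow (e.g., 2 up to 6)
        targets = [s.split()[-1].strip(".") for s in sentences]
        recalled = recall_fn(sentences)                  # participant's reported final words
        if [w.lower() for w in recalled] == [t.lower() for t in targets]:
            span = len(sentences)
        else:
            break                                        # capacity exceeded; stop testing
    return span

demo_sets = [["The dog chased the ball.", "She opened the door."],
             ["He read the letter.", "The rain hit the window.", "They sang a song."]]
print(reading_span(demo_sets, lambda s: ["ball", "door"]))  # -> 2
```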
The studies by Wingfield et al. (1991, 1994) which form the basis for the present experiment compared results of on-line language processing in young adults to those in elderly listeners. It has been found that older adults frequently have reduced working memory capacities, which are usually discovered when memory load is great (Carpenter et al., 1994; Kemper, 1992). This reduction in working memory capacity may not be entirely due to a cognitive decline, but also to a general deterioration of perception (Schneider & Pichora-Fuller, 1999). Although it has been reported that the reduction in working memory span may contribute to an age-related decline in language comprehension, Wingfield et al. (1994) found that there is evidence in older listeners of some compensatory adaptations that decrease the semblance and experience of a decline in language comprehension. Of course, when the system is stressed, these constraints become apparent, but under normal conditions, the adaptations make up for some of what was lost.

Working memory may also play a role in the recognition of environmental sound sequences. Specifically, it would be interesting to determine if similar patterns are found for identification of environmental sound as have been found for language. These patterns are that sound identification is facilitated more through the introduction of preceding compared to following context, and that those with low working memory spans benefit significantly less from context in the following direction. Ostensibly, this would be due to a reduced capacity to hold the ambiguous sound segment in working memory while simultaneously attending to and interpreting the surrounding contextual cues.

Wingfield (1996) discussed changes in language processing that occur with normal aging. He noted that there are significant differences in patterns of word identification between younger and older listeners. It seems that when minimal or no context was available to the listener, younger listeners performed better than older listeners. This is presumably a reflection of bottom-up processing ability. However, when context was introduced, younger and older listeners made use of it to different degrees. Improvement in the ability to correctly identify target words when context is introduced is thought to reflect top-down language processing. Based on the results of the aforementioned studies, it was concluded that elderly listeners make differentially greater use of context than young adults. Of course, both younger and older adults make use of both top-down and bottom-up information in the identification of spoken language. However, there appears to be an adaptive age-related shift in the balance of a listener's dependence on these sources of information, possibly as a strategy to compensate for the age-related decreases in working memory and/or perceptual capacities. Elderly listeners are significantly less able to take advantage of context following an unintelligible word than are younger listeners; however, the great degree to which they are able to use preceding context may provide compensation in many everyday situations.

1.8 The Present Study

1.8.1 Purpose

There were two purposes of the present study. The primary purpose was to investigate the auditory processing of environmental sound stimuli using an on-line processing task. The second purpose was to compare patterns of results to those from studies of language processing in order to evaluate the similarities or differences in the processing of these two auditory codes. In order to make this comparison as meaningful as possible, the experiment was created to be approximately analogous to previous language processing studies (Wingfield et al., 1991, 1994).

1.8.2 Relevance

If it is found that similar patterns of on-line processing occur for both language and environmental sound, this would indicate that the auditory processing for these two codes may be similar in substantial ways. This finding might also suggest that factors which affect the processing of speech might similarly affect the processing of environmental sounds. For example, it is known that listeners with normal hearing, but particularly those with sensorineural hearing loss, have difficulty hearing speech in adverse conditions (Crandell, 1993; Dillon, 1995; Finitzio-Hieber & Tillman, 1978). Processing similarities might suggest that the extraction of environmental sounds from noisy conditions might also be difficult. Some potential applications of this are noted below for both normal and hearing-impaired individuals.

Normal hearing people live in a noisy world. Of course, the term noise has become subjectively defined in our present society.
One might consider anything a listener doesn't care to hear, or which interferes with hearing something else, as noise. However, for this discussion, we will define noise as a collection of environmental sounds, the presence of which interferes with listening to a listener-defined sound of interest. We know that even normal hearing listeners have difficulty hearing speech in noise, but how might a low signal-to-noise ratio affect recognition when the signal is an environmental sound? This warrants consideration because the ability to pick a warning sound out of the acoustic environment is a matter of personal safety. Ramsdell (1970) referred to the level of listening at which we are aware of the sounds of the environment as the signal or warning level. Fortunately, we have other senses that can provide warning cues, but there are situations in which other senses are impaired, or input is not available. Consider an individual with a visual impairment on a noisy street corner, or the fact that auditory input is often available when visual input is not, for example around a corner or in the middle of the night in a dark house.

A second potential application relates to hearing-impaired listeners. Generally, clients presenting with hearing loss are managed by hearing professionals from a communicative point of view. That is, amplification and aural rehabilitation focus on maximizing communicative functions. It is typically assumed that communicating with others is the most important aspect of hearing. Certainly, hearing for communication contributes greatly to quality of life and mental and emotional health. However, understanding the acoustical world around us is important with respect to both quality of life and survival. Ramsdell (1970) discussed the depression that deafened individuals experience due to the loss of the primitive level of hearing. This is the level at which we perceive background sounds even when we are not actively attending to, or are even aware of, them. It is the loss of these sounds that makes the world seem dead, and can lead to depression. Perhaps hearing health professionals should address this issue when helping clients to manage hearing loss. Certainly, the importance of a more ecological approach in aural rehabilitation, and the recognition of what the hearing-impaired person has truly lost, has been identified by some (Noble, 1983).

1.9 Hypotheses

Identification of Environmental Sounds

Hypothesis H01a: Listeners will be unable to identify environmental sounds based solely on auditory input.

It was predicted that H0 would not be supported. Planned analysis will be conducted to determine patterns of identification with respect to working memory of listeners, amount of context of stimuli, and direction of context.

H01b: Listeners will be unable to identify brief environmental sounds when additional auditory context is added.

It was predicted that H0 would not be supported. Although this hypothesis seems similar to H01a, it addresses whether or not additional context aids in the identification of environmental sounds that are presumed to be too brief to identify. Planned analysis will be conducted to determine the amount of context required by listeners to achieve correct identification.

Patterns of Environmental Sound Identification

H02a: Listeners will show similar patterns of identification for all environmental sound stimuli in terms of the number of gates required for correct recognition and high confidence ratings.
It was predicted that H 0 would not be supported, but that sound-specific patterns would emerge. Specifically, it was a predicted that the high-context soundfiles would be more easily identified by listeners than low-context soundfiles. Wingfield et al. (1991) found that listeners were better able to identify targets presented in high-context sentences than those presented in low-context sentences, and a similar result was expected in this experiment. Planned analysis will be conducted for each of the environmental sounds. H 02b: Listener performance will be affected both by the quantity of the signal heard (number of gates) and the contents of the acoustic signal (the auditory objects). It was predicted that H 0 would be not supported, but that the content of the acoustic signal would have a greater effect on identification than duration. This was expected because it was the nature of the auditory objects which was expected to provide supportive context; discrete auditory objects were predicted to constitute more supportive context and repetitive auditory objects were predicted to constitute less supportive context. Planned analysis will be conducted to determine the relationship between identification of the target and the contents and duration of the acoustic signal. 26 H 02c: A l l listeners will perform equally well in the environmental sound identification task. It was predicted H 0 would not be supported; it was predicted that some of the listeners would perform better than others under conditions that place stress on working memory, such as when the ambiguous target is held in memory while context is added in the following direction. This is primarily because working memory was assumed to have an effect on processing of environmental sounds, but also because it was predicted that some participants would be more familiar with specific environmental sounds. Planned analysis will be conducted to determine the differences between listeners. Comparison of Environmental Sound Processing To Language Processing Hypothesis: H 0 3: Performance of listeners will differ from performance of listeners in analogous language processing studies. Three sub-hypothesis follow. It was predicted that H 0 would not be supported; it was expected that some kind of similarity between on-line processing of language and environmental sounds might emerge. For example, it was expected that the findings of this experiment would include evidence of cohort elicitation, as has been found in language studies (Grosjean, 1980; Tyler, 1984). H 03a: Listeners will perform equally well for high- and low-context environmental sounds. 27 It was predicted that H 0 would not be supported, but that listeners would find the high-context sounds easier to identify, because, as mentioned, Wingfield et al. (1994) found listeners identification performance was better in more supportive context. Planned analysis will be conducted to determine the nature of the difference in performance for high- and low-context environmental sounds. H 03b: Listeners will perform equally in both context direction conditions. It was predicted that H 0 would not be supported. Results similar to those found by Wingfield et al. (1994) were expected, where listeners benefited more from context preceding the target than following it. 
This was attributed to the demands of working memory; holding an ambiguous item in working memory as disambiguating context is presented following it is presumed to place more stress on working memory than receiving disambiguating context before the ambiguous target item. Planned analysis will be conducted to determine the difference in performance between preceding and following context directions. H 03c: Listeners in both the high and low reading working memory span groups will perform equally well. It was predicted that H 0 would not be supported. Wingfield et al. (1991, 1994) found that elderly adults, with presumably reduced working memory, performed more poorly on the tasks that stressed their working memory capacity. It was expected that the differences between young and old in the Wingfield studies (Wingfield et al., 1991, 1994) would be 28 similar to differences observed between high and low working memory span participants in the present study. Planned analysis will be conducted to determine the difference in performance between high and low reading working memory span groups. H 03d: Listener's errors will not reveal an initial cohort in the environmental sound identification process. It was predicted that H 0 would not be supported, but that the pre-identification responses from the participants would reveal a large cohort at early gates, which would eventually converge upon the correct responses. Planned analysis will be conducted to determine the composition of the initial cohort. 29 METHODS 2.1 Purpose of the Experiment The purpose of this study was to investigate on-line environmental sound processing and to study the similarities and differences between language processing and the processing of environmental sound. The intention was to accomplish this by creating an experiment that could be considered analogous to previous studies of on-line language processing (Wingfield et al., 1991, 1994). In these previous studies, the gating paradigm was used to determine the amount of auditory input listeners required to correctly identify a target word in a sentence. These researchers studied the effects of degree of context in the sentence (high versus low), direction of context (words preceding or following the target word in a sentence) and age of the listener (young versus old). Wingfield et al. (1991,1994) were interested in the effects of cognitive factors on performance, and so compared the results from younger and older listeners. The superior performance of the younger listeners, particularly on tasks that required listeners to hold ambiguous elements in memory, was attributed to the reduction of working memory span which accompanies aging (Wingfield et al., 1991). As with these previous studies, the focus of the present experiment was to determine the effects of degree of context and direction of context in identification tasks. The analogs of target words and sentences that were used were gathered from recordings of environmental sound. The targets used were brief recordings of environmental sound. These will be referred to as auditory objects, which will be defined as the sum of the sequence of acoustical energy that is created by a physical happening. This term is derived from Bregman's definition of auditory stream,' our perceptual grouping of the parts of the neural spectrogram that go 30 together ... [or] the perceptual unit that represents a single happening ... [which] can incorporate more than one sound" (1990, pp. 9-10). 
The term object will be used here rather than stream because the word stream is more semantically suggestive of continuous than discrete sounds. As the targets are very brief, the term object is more appropriate. The context for the present experiment will be provided by the environmental sound that surrounds the target auditory object in the original recordings as it would in the natural world. This context may be adequately described as the auditory sequence, which will be defined as any sequence of sounds that occurs within the time-domain of interest. They need not be related by physical source or acoustic properties. The perceptual experience of the auditory sequence would be equivalent to Bregman's auditory scene. Bregman does not define this term explicitly; however, his discussion and allusions to the processing in the visual modality seem to indicate that the auditory scene is analogous to the visual scene. That is, the perceptual experience of the auditory world in a particular space and in time. 2.2 Participants Twenty listeners participated in this study; 14 female and 6 male. Sixteen of these comprised the high reading working memory span group (12 female, 4 male) and four were low reading working memory span listeners (2 male, 2 female). The age range of all participants was 19.8 to 35.2 years, with a mean age of 26.45. A l l were either monolingual speakers of English or bilingual with English as the primary language. Those who didn't learn English as a first language began leaning it before age 8. A l l participants had air-conduction thresholds at or below 25 dB H L for the pure tone test frequencies 250 to 8000 Hz (for details of testing see Appendix A). In order to maximize the likelihood that 31 participants had heard sounds recorded in Vancouver (for example, the sky train or cable bus), participants were screened for having lived in the Greater Vancouver Area for a minimum of two years. In addition to hearing screening, preliminary testing also included a test of reading working memory span (Daneman & Carpenter, 1980; for a description of the test see Appendix B). Participants were assigned to groups according to their working memory span. Those in the high-span group had scores of 3 or better and those in the low-span groups had scores of less than 3. Each participant was tested for this measure during both experimental sessions, to ensure that their working memory score remained stable (Waters & Caplan, 1996). Participants whose working memory scores were not stable were not included in the study. (For participant details see Appendix C) 2.3 Materials The stimuli consisted of eight recordings of naturally occurring environmental sounds, ranging in length from 10.006 to 42.408 seconds. The sounds used in this study were selected from a corpus of soundfiles archived at Simon Fraser University as the Vancouver SoundScape Project (Truax, 1996). A total of 30 recordings were reviewed for suitability. A subset of recordings were selected. These were judged by the experimenter to be fairly common or recorded from popular or frequented locations within the Greater Vancouver Area. A pilot study was conducted to ensure that these sounds were identifiable when played in their entirety. The three participants in the pilot study listened to the entire and 32 intact soundfiles presented in conditions similar to those to be used in the main study. 
This pilot study determined that the selected soundfiles were reasonably easy to identify (at least two of the three listeners correctly identified the soundfiles) in their intact condition. This suggests that the auditory scenes the soundfiles describe ought to be accessible to most listeners. The recordings were also judged for the forefronting of potential targets from any ambient noises. The term forefronting used here refers to an auditory object of interest being of a higher intensity, or more audible, than other simultaneous sounds or ambient noise. In some cases, a high level of ambient noise in the recording may have confounded the listener's ability to hear the intended target. An example of this was the soundfile of a cash register in a noisy department store. A beep of the cash register scanner is audible, but the sounds of footsteps, public address system music, and voices overlap this sound, making it difficult for the listener to isolate the intended target. Therefore, it seemed important to select soundfiles in which the sounds of interest could be heard out, or isolated from the other elements of the auditory scene. Based on the results, and the experimental goal, eight soundfiles were selected which were believed to represent a range of sounds, in different contextual conditions (high versus low). A chart giving the tape and take number and duration of each of the selected soundfiles, as well as the description provided by the Soundscape archives catalog can be found in Appendix D. 33 2.3.1 High and Low Context Environmental Sounds Degree of context for environmental sounds, to our knowledge, has not previously been described. For the purposes of this experiment, high-context environmental sounds were those which contained auditory sequences of discrete or rapidly changing related sounds and low-context environmental sounds were those which consisted of slowly changing or repetitive sequences of sounds. Soundfile 1 is an example of a high-context sequence, containing the sequence of bus approaching and decelerating, braking (air brakes), door opening, person stepping on steps, putting change in farebox, bus doors closing and engine revving as the bus drives away. It was assumed that such a related sequence would readily support identification. Soundfile 6 is an example of a low-context sequence, containing primarily fizzling, crackling, and rumbling as a fire begins to burn. This sound also has a slowly changing element; the intensity increases as the fire burns more strongly. We thought that the low-context sounds would be less identifiable due to the limited acoustical cues conveyed. 2.4 Preparation of the Stimuli The selected soundfiles were copied from the original D A T tapes found in the Vancouver Soundscape Project archives at Simon Fraser University onto a Jaz disc using a Macintosh computer running the Sounddesigner program. The files were copied in a format that was suitable for using with the Soundworks program on the N E X T computer system in the School of Audiology and Speech Sciences at the University of British Columbia. Soundworks is a soundfile manipulation program that was used for preparing the stimuli for the experiment. 34 The eight soundfiles selected as stimuli were each manipulated in a similar way. First, a section of sound was selected to be the target. It was this portion which was considered to be analogous to the target word in the Wingfield et al. (1994) experiment. 
Selection of targets was based on both temporal and acoustic factors, as well as on relative position within the soundfile. In certain soundfiles, specifically the high-context soundfiles, the event(s) represented tended to be of a temporally discrete nature (a sequence of relatively discrete auditory events). An example of this type would be the bus example described previously (see Soundfile 1 in Appendix F). In other soundfiles, specifically the low-context soundfiles, events were of a more continuous nature. An example of this type would be the motorcycle engine example described previously. In the case of soundfiles with discrete sequences of sound, a relatively central and brief sound was selected as the target. For example, the target selected from Soundfile 1 was change falling into the fare box on a bus. In the case of more continuous sound sequences, a central portion was selected as well, but with less regard to specific defining acoustic or temporal characteristics. Selection of the target was done in this way in an effort to make this experiment as analogous as possible to the Wingfield experiment (1994). As in the case of the mental representation of words, certain sound events may be relatively discrete, and capturing this discreteness was attempted. It was also important to locate a target at a position within the file so that there would be ample context both preceding and following it.

For each of the soundfiles, once the position of the target had been selected, a 400 msec portion containing the target was excised and saved in its own soundfile. This file, containing the target only, would become the first gate of the experiment. Gating then continued independently in both the preceding and following directions, with each gate being 400 msec longer than the previous. This procedure continued until the beginning and end of each soundfile was reached. If the gating procedure resulted in portions at either end of the soundfile which were less than the full 400 msec, the portions were included as part of the previous gate if their duration was less than 100 msec. Portions longer than 100 msec were made into separate gates. This procedure is illustrated in Figure 1 below. A chart describing the target, auditory scene, and number of gates in each direction is included in Appendix F. In addition, charts showing the time-waveforms for the 8 selected stimuli are given in Appendix E.

Figure 1. Illustration of the Gating Procedure (1a: preceding direction; 1b: following direction; each panel shows the soundfile waveform, the 400 msec target gate, and successively longer gates). Gating proceeds in 400 msec increments in the preceding (shown in 1a) and following (shown in 1b) directions independently until the beginning and end of the soundfile are reached.

It was recognized that for some of the soundfiles, context in one direction only might be insufficient for the correct identification of the target. That is, listeners might need to hear context in both directions. For this reason, it was decided that if identification was not achieved after hearing the entire context in one direction, gating would proceed in the opposite direction of context. This situation is illustrated in Figure 2.

Figure 2. Illustration of gating in the opposite direction (soundfile waveform with gates x to x+5 and the 400 msec target). The topmost arrow represents the final gate in the preceding direction. The arrows below it represent the length of gates in subsequent presentations. These include the entire beginning portion of the soundfile, as well as increasing increments in the following direction.
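To make the gating rule concrete, the following is a minimal sketch, in present-day Python rather than the software actually used, of how the 400 msec gates could be generated once the sample position of the target is known; the function name make_gates and all variable names are hypothetical.

    # A minimal sketch of the gating rule described above (hypothetical names).
    # Gates grow in 400 msec steps away from the target; a leftover portion shorter
    # than 100 msec is folded into the previous gate, otherwise it becomes a gate.
    def make_gates(n_samples, target_start, target_len,
                   sample_rate=20000, step_ms=400, remainder_ms=100):
        step = int(sample_rate * step_ms / 1000)
        remainder = int(sample_rate * remainder_ms / 1000)
        target_end = target_start + target_len

        # The first gate in each direction is the excised target portion itself.
        preceding = [(target_start, target_end)]
        following = [(target_start, target_end)]

        start = target_start
        while start > 0:
            if start < remainder:                 # < 100 msec left at the beginning
                preceding[-1] = (0, target_end)   # fold it into the previous gate
                break
            start = max(0, start - step)
            preceding.append((start, target_end))

        end = target_end
        while end < n_samples:
            if n_samples - end < remainder:       # < 100 msec left at the end
                following[-1] = (following[-1][0], n_samples)
                break
            end = min(n_samples, end + step)
            following.append((target_start, end))

        return preceding, following

Under these assumptions, a 10-second recording with its 400 msec target near the centre would yield roughly a dozen gates in each direction.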
2.5 Presentation of the Stimuli

Experimental materials were delivered to the subjects using the Computerised Speech Research Environment, version 4.5 (CSRE 4.5, 1995) on a PC. Before stimuli were transferred to the PC for final preparation, the sampling rate was changed from the original 44,100 Hz to 20,000 Hz. This conversion was performed within the Soundworks program on the NeXT system. All files were then converted to binary format by a program on the NeXT system called GISO (Garbage in, sound out). At this point, files containing the stimuli were transferred from the NeXT to the PC and assembled in the experimental protocol libraries using the CSRE 4.5 software.

2.6 Ordering of the Stimuli

The CSRE 4.5 program 'ecosgen' was used to design the experimental protocol. Four blocks of stimuli were prepared from each of the eight soundfiles, resulting in a total of 32 blocks. For each soundfile, two of these blocks were for the initial direction of context: one preceding and one following. For the blocks in which context was presented in one direction only, the block was designed so that the first stimulus delivered would be the target, and subsequent gates for that one direction would then be presented. In addition, two blocks were prepared for conditions in which preceding and following gates are presented along with the entire context in the opposite direction. In preparation for this, the paste function of CSRE 4.5 was used to splice together gates of one direction with the entire opposite-direction file. These blocks were designed so that the first stimulus delivered would be the second gate in one direction, along with its complementary portion in the other. Subsequent presentations would include successively higher gates combined with the complementary portion. The purpose was to present the listeners with the equivalent of the final gate in the initial direction of context combined with successively higher gates in the unheard, or opposite, direction of context. These are represented by the bottom five arrows in Figure 2. An additional soundfile was prepared in a manner identical to that described above. This was a briefer recording that was used only for practice and to ensure the clarity of the instructions.

2.7 Calibration of the Stimuli

Each soundfile was calibrated individually to ensure a comfortable listening level for the subjects. This was accomplished by standardizing the intensity of the experimental stimuli (as reflected in the RMS voltage) to the intensity of a standard 1 kHz tone. This procedure is commonplace in the delivery of speech materials (Olsen & Matkin, 1991), but in this case it was extended for use with other acoustic stimuli (i.e., environmental sound). Calibration began by calculating the root mean square (RMS) voltage for each soundfile by running an in-house program (rms-spch.exe, written by Kim Yue of the Erindale College of the University of Toronto) for each of the eight experimental soundfiles as well as the practice soundfile. Next, using the 'ecosgen' program, an experiment was generated to deliver a 1-kHz pure tone in stereo to the headphones which would be used in the experiment (TDH-39), with all equipment set up as if the experiment were being run. The pure tone was delivered both with and without attenuation so that the experimenter could ensure that there was no peak clipping. A sound level meter (Quest Electronics, model 1800 with a model OB 300 filter) was used to measure the output of the headphones. The 1-kHz tone had an intensity of 108.1 dBA without attenuation, and 88.1 dBA with 20 dB attenuation. Because these values differed by 20 dB, it was concluded that there was no peak clipping. The 1-kHz tone was run through the same RMS calculation program to determine its RMS voltage. This value was then used as the denominator in the equation 20 log10 (voltage A / voltage B), the numerator being the RMS voltage of the stimulus soundfile being measured. The result of this equation gives the average difference in sound pressure level between the soundfile and the calibration tone, so the average sound pressure level of the soundfile can be calculated by adding the result of the equation to the intensity of the calibration tone. It is standard procedure to deliver experimental speech materials at an intensity of 70 dB SPL, which corresponds to average levels for speech (50 dB HL; Miller, 1951). In order to achieve this presentation value, 70 dB was subtracted from the calculated average sound pressure level of each soundfile. The resulting difference was the amount of attenuation used during delivery of the stimuli. The attenuation values for each soundfile were entered into their corresponding experimental blocks using 'ecosgen'. Calculations for each of the soundfiles are included in Appendix G.
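As an illustration only, the arithmetic just described can be summarized in a short sketch; the function name and the example voltages below are hypothetical, and the unattenuated tone level of 108.1 dBA is taken as the reference.

    # A minimal sketch of the calibration arithmetic described above
    # (hypothetical names and example values).
    import math

    def attenuation_for(rms_file, rms_tone, tone_level=108.1, presentation_level=70.0):
        # Average level difference between the soundfile and the 1-kHz tone.
        diff_db = 20.0 * math.log10(rms_file / rms_tone)
        # Estimated average level of the soundfile at the headphone output.
        file_level = tone_level + diff_db
        # Attenuation needed to bring the soundfile down to the presentation level.
        return file_level - presentation_level

    # Example with made-up voltages: a soundfile whose RMS voltage is one quarter
    # of the tone's sits about 12 dB below the tone (roughly 96 dB here), so about
    # 26 dB of attenuation would be entered for its blocks.
    print(round(attenuation_for(0.25, 1.0), 1))   # 26.1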
2.8 Experimental Conditions

Stimuli were delivered binaurally to the subject via Madsen TDH 39P 10W earphones at the previously described comfortable listening level. Stimuli were presented in the absence of any experimentally introduced competing noise. Participants were tested individually in this experiment, and for the duration of each experimental session, the participant was seated in a sound-attenuating, double-walled IAC booth. The stimuli were delivered to the earphones by the CSRE 4.5 'ecoscon' program, which is the experimental control system, via the Tucker Davis Technologies D/A and attenuator modules (Appendix H provides a schematic of the configuration). The participant was able to communicate with the experimenter via a microphone located in the soundbooth, which was routed through a Madsen OB 802 audiometer. The experimenter wore a pair of Sennheiser earphones routed from the audiometer in order to monitor the participant's responses. The stimuli delivered by the TDT were routed to the headphones worn by the participant via the headphone jacks of the sound-attenuating booth. Thus, the experimenter was unable to communicate with the participant directly via the headphones. When necessary, the experimenter spoke to the participant using the audiometer microphone routed to a loudspeaker inside the booth.

2.9 Experimental Design

The experiment was run in two sessions, each lasting from one to two hours, with four soundfiles used during each session. The practice soundfile was presented to all participants at the beginning of testing during the first session to ensure comprehension of the task. Two high-context sounds and two low-context sounds were presented at each experimental session. Each soundfile, in each context direction, was heard as the first experimental stimulus by at least one of the 16 participants. Soundfile order and context directions were counter-balanced.
The table found in Appendix I illustrates the order of soundfile presentation for each of the participants. Participant numbers were assigned randomly, and the numbers in that table correspond to the participants in the high working memory span group. The soundfile presentation orders randomly assigned to the low working memory span group are given in Table 1.

    Participant Code    Soundfile Presentation Order (Participant Number)
    17                  02
    18                  06
    19                  13
    20                  16

Table 1. Assignment of Soundfile Presentation Order for the Low Reading Working Memory Group. The soundfile presentation order corresponds to the subject number column in Appendix H.

2.10 The Experimental Task

Following preliminary testing (RWMS testing, hearing screening), the participants were seated in the sound-attenuating booth and the experimenter read the instructions for the experiment to them. They were told about the nature of the sounds they would hear; specifically, that they were recordings of 1 minute or less in duration, recorded from naturally occurring sound events in the Vancouver area, with no exotic sounds and no splicing together of odd or incongruous sounds. It was clearly stated that the purpose of the experiment was not to trick or mislead the listener by using unusual sounds or combinations of sounds. Participants were asked to identify the sound presented on each trial. They were encouraged to give as much detail as possible, for example, the number of objects causing the sound. The participants were also asked to give a confidence rating, a score from 1 to 10 that would reflect their confidence that each answer given was correct. A rating of 1 would represent complete uncertainty, and a score of 10 would represent complete certainty. (See Appendix J for the instructions given to the participants.)

The participants were required to respond by identifying the target and rating their confidence in their answer after hearing each and every gate presented. Guessing was encouraged. Although it was made clear that the goal was to identify the target portion, participants were encouraged to report anything else they heard. The participants' responses were recorded by the experimenter. Immediately following each gate presentation, the experimenter recorded the gate number, the participant's response for the target, the confidence rating, and any other information the participant provided (an example of a completed response form is included in Appendix K). The experimenter then presented the listener with the next gate.

2.10.1 Stopping Criteria

Criteria were established for aborting a block should the objective be met, that is, if correct identification of the target was achieved with a high degree of confidence. Presentations in a block were terminated if two criteria were met. First, the response was correct five trials in a row, as determined by the range of accepted responses (discussed in Results). Second, confidence was rated seven or better. Participants were not told of any stopping criteria, but only that the object was to correctly identify the target.
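Stated as a rule, and purely as an illustration (the procedure itself was administered by the experimenter, not by software), the stopping decision could be sketched as follows; the function and argument names are hypothetical.

    # A minimal sketch of the stopping rule in 2.10.1 (hypothetical names):
    # stop once the last five responses were all correct and the current
    # confidence rating is at least 7.
    def should_stop(correct_history, confidence_rating):
        # correct_history: list of booleans, one entry per gate presented so far
        return (len(correct_history) >= 5
                and all(correct_history[-5:])
                and confidence_rating >= 7)

    # Example: five consecutive correct responses with a confidence of 8 ends the block.
    print(should_stop([False, True, True, True, True, True], 8))   # True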
RESULTS

3.1 Chapter Preview

In this experiment, participants responded to experimental stimuli by guessing the identity of a brief environmental sound. The amount of sound presented was incrementally increased in either the preceding or following direction. The purpose was to determine the amount of sound stimuli necessary for listeners to correctly identify a predetermined target auditory object within the stimuli. The following sections will describe the scoring procedure and the results of this experiment. The results will be discussed separately for each soundfile. Participant performance and the effects of the amount and direction of context will be considered. The general discussion of the results will be based on the performance of the 16 listeners with high reading working memory spans. The results of the four listeners with low reading working memory spans will be discussed following the general discussion.

3.2 Scoring Procedure

The responses to be accepted as correct for each of the soundfiles were determined a priori. The acceptable responses are based on the descriptions of the soundfiles found in the Soundscape archives catalog (see Appendix D). The acceptable responses appear in Table 2. The experimenter scored the results in both an on-line and off-line manner. During testing, if the participant gave the correct response, a checkmark was entered in the relevant column of the answer sheet (for a sample completed response sheet, see Appendix J). Following the testing sessions, the experimenter recorded the gate numbers and confidence ratings (CRs) corresponding to each correct response given for each soundfile.

    Soundfile     Target (Soundscape Archives description)   Accepted responses

    High-context soundfiles
    Soundfile 1   Change in bus farebox                      Change in bus farebox
    Soundfile 3   Skytrain chimes                            Skytrain chimes or bell
    Soundfile 5   Drive humming as computer boots up         Hard drive or floppy drive or PC reading drive or start-up or computer booting
    Soundfile 7   Dot-matrix printer printing                Dot matrix printer or early/old-fashioned printer or non-laser type printer or computer printer printing

    Low-context soundfiles
    Soundfile 2   Harley Davidson engine revving             Motorcycle or motorbike or Harley Davidson engine revving
    Soundfile 4   Ducks flying out of water                  Ducks flying out of water
    Soundfile 6   Fire crackling                             Fire crackling or campfire burning
    Soundfile 8   Waves                                      Ocean waves or waves

Table 2. Responses Accepted as Correct for Scoring.
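Although all scoring was done by hand by the experimenter, the accepted-response rule in Table 2 amounts to checking a response against a small set of phrases per soundfile; the sketch below is illustrative only, with a hypothetical dictionary holding a few of the entries from Table 2.

    # A minimal, illustrative sketch of the a priori scoring rule (hypothetical names).
    ACCEPTED = {
        1: ["change in bus farebox"],
        3: ["skytrain chimes", "skytrain bell"],
        6: ["fire crackling", "campfire burning"],
        # ... remaining soundfiles as listed in Table 2
    }

    def is_correct(soundfile_number, response):
        response = response.lower()
        return any(phrase in response for phrase in ACCEPTED.get(soundfile_number, []))

    print(is_correct(6, "It sounds like a campfire burning"))   # True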
3.3 Soundfiles Identified Correctly

Two of the eight soundfiles were correctly identified by all members of the high working memory span group; these were Soundfile 3 and Soundfile 7. The remaining six soundfiles were correctly identified by the majority of listeners; however, the number of listeners responding correctly varied depending on the soundfile. Figure 3 illustrates the number of listeners that responded correctly for each soundfile. This figure reflects two values for each of the soundfiles: the number of listeners that responded correctly at some point, irrespective of the confidence rating for this response (loose criterion), and the number of listeners that responded correctly and had a confidence rating of 5 or better (strict criterion). A confidence level of 5 or better reflects that the listener was at least 50% certain that the response was correct. Note that there are only three soundfiles for which there is a difference between the number of listeners meeting the loose and strict criteria. Therefore, all listeners who responded correctly eventually reached a confidence level of more than 50% certainty for five of the eight soundfiles. Also note that half or more of the listeners responded correctly with a confidence level of more than 50% certainty for every soundfile.

Figure 3. Correct Identification of Soundfiles, Combined for the High Working Memory Span Listeners (number of listeners correct, out of 16, by Soundfile Number; series: Total - First Response, Total - Confidence Rating of 5 or better).

Identification of individual soundfiles according to the initial direction of context is illustrated in Figure 4. These results suggest that for some of the soundfiles, initial direction of context may have had an effect on identification. This figure will be described further in later sections with respect to individual soundfiles.

Figure 4. Correct Identification of Soundfiles by Initial Context Direction for the High Working Memory Span Group (number of listeners correct by Soundfile Number; series: Preceding Context - First Response, Preceding Context - Confidence Rating 5 or better, Following Context - First Response, Following Context - Confidence Rating 5 or better).

3.4 Identification Results for Individual Soundfiles

It was decided that results for each soundfile would be discussed separately, due both to the considerable differences observed in the patterns of identification during scoring and to the very different acoustical natures of the different soundfiles. In this section, various aspects of the results will be described. The first description will concern the distribution of correct responses by gate number and corresponding confidence ratings. For each soundfile, results are separated by initial direction-of-context condition. In the accompanying figures, the soundfile is illustrated temporally on the x-axis, with the negative numbers representing preceding gates, the positive numbers representing following gates, and the gate identified as 'Target' corresponding to the first gate heard in both directions. In the text as well, preceding gates will be assigned a negative sign and following gates will be unmarked. Included in this discussion will be reference to the auditory scenes contained in the soundfiles. The accompanying time waveforms of the soundfiles can be found in Appendix F.

The second description of the results concerns distributions which include three different responses from each listener: (a) the first correct response (referred to as First Response in the key), (b) the first response with a confidence rating (CR) of five or better (referred to as Half Response in the key), and (c) the first response at which the listener reported their maximum confidence rating for that soundfile (referred to as Max Response in the key). This analysis was performed to explore which of these measures would be the most informative for reporting the group results. Our goal was to discover whether the distribution of these responses of interest comprised a standard/normal distribution or not. The figures (5 to 10) illustrating the distribution of these responses are separated by preceding or following condition and show the total number of responses at each gate. Figure 5 shows mean and median for first response, Figure 6 shows these statistics for the first response with a confidence rating of 5 or better, and Figure 7 shows these statistics for the first response with the maximum confidence reported. The most informative measure for each will be discussed. In addition to the absolute number of gates, statistics will also be calculated for the percentage of gates in the initial direction of context.¹ Figure 8 shows mean and median for first response, Figure 9 shows the results for the first response with a confidence rating of 5 or better, and Figure 10 shows the results for the first response with the maximum confidence reported. Using percentages in our analysis allows a discussion of a more relative nature and clearly shows the cases in which the typical listener required more than the initial direction of context to correctly identify the target. For example, if the mean or median is less than 100%, this indicates that the typical listener was able to identify the target before hearing the entire soundfile in the initial direction of context. Conversely, if the mean or median is more than 100%, this indicates that the typical listener required context in both the initial and opposite directions in order to identify the target.

¹ We recognize that for many soundfiles and conditions, the mode(s) of the distributions are the most appropriate statistic. Modes are noted where appropriate; however, for convenience, mean and median will be the statistics used to compare results between soundfiles.
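As a purely illustrative note on how this percentage measure behaves, the sketch below (hypothetical names) expresses a single response as a percentage of the gates available in the initial direction of context; values above 100% mean the listener went on to hear opposite-direction context.

    # A minimal sketch of the percentage-of-gates measure (hypothetical names).
    # gates_heard counts every gate presented up to the response of interest,
    # including any gates added in the opposite direction of context.
    def percent_of_initial_direction(gates_heard, gates_in_initial_direction):
        return 100.0 * gates_heard / gates_in_initial_direction

    # Example: a correct response on the 18th gate of a 24-gate preceding-first
    # block is at 75%; a response that needed 6 further opposite-direction gates
    # after all 24 had been heard would be at 125%.
    print(percent_of_initial_direction(18, 24))   # 75.0
    print(percent_of_initial_direction(30, 24))   # 125.0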
Figure 5. First Response - Mean, Median Number of Gates by Soundfile and Condition.
Figure 6. First Response at Confidence Rating of 5 or Better - Mean, Median Number of Gates by Soundfile and Condition.
Figure 7. First Response at Maximum Confidence - Mean, Median Number of Gates by Soundfile and Condition.
Figure 8. First Response - Mean, Median Percentage of Gates by Soundfile and Condition.
Figure 9. First Response at Confidence Rating of 5 or Better - Mean, Median Percentage of Gates by Soundfile and Condition.
Figure 10. First Response at Maximum Confidence - Mean, Median Percentage of Gates by Soundfile and Condition.

3.4.1 Soundfile 1

The majority of listeners correctly identified Soundfile 1, as shown in Figure 3. Seven of the eight listeners in each condition gave a CR of 5 or better for the correct response, as shown in Figure 4. The distribution of all correct responses for Soundfile 1 is illustrated in Figures 11 and 12. The results from listeners in the preceding-first context condition show a relatively high concentration of correct responses from gates -13 to -19, and two listeners who responded with a greater number of preceding gates. Steep confidence rating curves are observed for several of these listeners. Three of the listeners who responded correctly also heard the following direction of context. These participants had a widely distributed pattern of correct responses. For Soundfile 1, gates -13 to -21 contained the sounds of the brakes squeaking (peaks at gates -16 and -15) and then releasing air (peaks at gates -14 and -13). The results from listeners who heard the following-first direction of context show a more widely distributed pattern of correct identification through the gates following the target. Four of the listeners who responded correctly also heard the preceding direction of context. There is a concentration of responses from these listeners from gates -2 to -10. These listeners have similar CR curves.
Gates -2 to -10 contained the sounds of the door opening (gate -9) and a person stepping onto the bus (gates -7 and -6).

Figure 11. Soundfile 1: Distribution of Correct Responses for Each Listener in the Preceding-First Context Group (confidence ratings by Gate Number, one curve per listener).

Figure 12. Soundfile 1: Distribution of Correct Responses for Each Listener in the Following-First Context Group (confidence ratings by Gate Number, one curve per listener).

The distribution of the three response categories is shown in Figure 13 for the preceding-first direction of context, and Figure 14 for the following-first direction of context. It can be seen that the distribution for these responses was closer to normal for the preceding-first condition than for the following-first condition. Therefore, the median or mode is more appropriate than the mean for describing the central tendency of the distribution. The modes in these distributions are at gates -14 and 21 in the preceding-first context condition (Figure 13), and at gate -4 in the following-first context condition (Figure 14).

Figure 13. Soundfile 1: Distribution of First Responses for Listeners in the Preceding-First Context Condition (Number of Listeners, Out of 16, by Gate Number; First, Half, and Max Responses).

Figure 14. Soundfile 1: Distribution of First Responses for Listeners in the Following-First Context Condition (Number of Listeners, Out of 16, by Gate Number; First, Half, and Max Responses).

The means and medians for these response categories are shown in Figures 5 through 10. Results of the analysis of the distribution of first responses suggest that the appropriate statistic should be the median or the mode for both direction-of-context conditions. Note that for the preceding condition, the standard deviation reveals considerable variation for all three response categories. Note that both the mean and median number of gates for the preceding-first condition were lower than for the following-first condition for all response categories (Figures 5-7). Finally, note that the means and medians were at less than 100% of the number of gates in both initial direction of context conditions for all but the maximum confidence response (Figures 8-10). That is, listeners tended to correctly identify the target after hearing context in only one direction, whether preceding or following.

3.4.2 Soundfile 2

A total of 15 of the 16 listeners responded correctly with a confidence rating of at least five (see Figure 3). All of the eight listeners in the following context condition responded correctly, and all but one of those in the preceding context condition responded correctly (see Figure 4). The distribution of listeners' correct responses to Soundfile 2 is illustrated in Figures 15 and 16. The results from the listeners in the preceding-first context condition show a high concentration of responses in the preceding direction, beginning almost immediately preceding the target. Results indicate that the majority of listeners in this condition gradually increased in CR as more of the soundfile was heard.
Three of these listeners also heard context in the following direction. Two responded with a rapidly increasing CR in the first seven gates of the following direction, and one did not increase in confidence until much more of the soundfile had been heard. Six of the eight listeners reached a high level of confidence (CR of 8 or better). The gates preceding the target contain a random series of engine sounds (revs) of an idling motorcycle. The results from the listeners in the following-first context condition indicate a concentration of correct responses with two trends. The first was a rapid increase in confidence in the correct response between gates Target and 14. The second was a more gradual increase from gates 1 to 26. Six of the eight listeners reached a high level of confidence (CR of 8 or better). None of the listeners in this condition required context in the preceding direction to correctly identify the target. The gates following the target contained more engine sounds of a motorcycle, but these became more periodic as the motorcycle started moving (approximately gate 4). These engine noises continued and became louder (first peak was at gate 18) as the motorcycle approached the microphone of the recorder and passed by (gates 24 and 25). The intensity decreased as the motorcycle drove away (gate 30 to the end of the soundfile).

Figure 15. Soundfile 2: Distribution of Correct Responses for Each Listener in the Preceding-First Context Group (confidence ratings by Gate Number, one curve per listener).

Figure 16. Soundfile 2: Distribution of Correct Responses for Each Listener in the Following-First Context Group (confidence ratings by Gate Number, one curve per listener).

The distribution of the three response categories is shown in Figure 17 for the preceding-first direction of context, and Figure 18 for the following-first direction of context. It can be seen that the distribution for these responses was closer to normal for the preceding condition than for the following condition. The modes in these distributions are at gate -4 for the preceding context condition (Figure 17), and at gates 4 and 20 for the following context condition (Figure 18).

Figure 17. Soundfile 2: Distribution of First Responses for Listeners in the Preceding-First Context Condition (Number of Listeners, Out of 16, by Gate Number; First, Half, and Max Responses).

Figure 18. Soundfile 2: Distribution of First Responses for Listeners in the Following-First Context Condition (Number of Listeners, Out of 16, by Gate Number; First, Half, and Max Responses).

The means and medians for the three response categories are shown in Figures 5 through 10. Results of the analysis of the distributions suggested that the statistic of choice is the mean for the preceding context condition and the mode or median for the following context condition. Note, again, that there is considerable variation for all three response categories in both conditions. Note that in both direction conditions, correct identification of the target frequently required only a small number of gates. Note that less than 50% of the number of gates available in the preceding direction (Figures 8-10) was sufficient for target identification, although more gates were required to achieve the maximum confidence (Figure 10).
Similarly, less than 50% of the number of gates available in the following condition was sufficient for target identification, even to achieve maximum confidence. In both direction-of-context conditions, the maximum confidence rating was high (CR of 8 or better).

3.4.3 Soundfile 3

All listeners correctly guessed the target for Soundfile 3 (see Figures 3 and 4). The distribution of listeners' correct responses to Soundfile 3 is illustrated in Figures 19 and 20. For listeners in the preceding-first context direction there is a concentration of correct responses between gates -3 and -13, with their CRs for these responses uniformly high. For the listeners in this condition who correctly responded at other gates, the gate where they achieved correct identification varied widely, and one listener required following context to correctly identify the target. Gates -3 to -13 contained the electric hum of the Skytrain and the station ambience (lots of reverberation), feet running (gates -5 and -4), and doors engaging (gate -3). Gate -2 was the sound of the first of three tones that announce the closing of the Skytrain doors (the others were at gate Target and following gate 2). The results from the listeners in the following-first context direction indicate a concentration of correct responses with gradually increasing CRs from gate 4 to gate 19. The CR curves for these listeners are similar. Two more listeners responded correctly with high CRs from gates 24 to 29. One listener required context in the preceding direction before responding correctly. Gates 4 to 19 contain the final two Skytrain bells (gates Target and 2), the door closing (gates 4 to 9), and the electrical hum of the Skytrain increasing in intensity as the engine engages (gates 12-19) and it begins to move.

Figure 19. Soundfile 3: Distribution of Correct Responses for Each Listener in the Preceding-First Context Group (confidence ratings by Gate Number, one curve per listener).

Figure 20. Soundfile 3: Distribution of Correct Responses for Each Listener in the Following-First Context Group (confidence ratings by Gate Number, one curve per listener).

The distribution of the three response categories is shown in Figure 21 for the preceding-first direction of context, and Figure 22 for the following-first direction of context. It can be seen that the distribution for these responses is not particularly normal for either the preceding-first context condition or the following-first context condition. The modes in these distributions are at gate -3 for the preceding-first context condition (Figure 21) and at gates 9 and 24 in the following-first context condition (Figure 22).

Figure 21. Soundfile 3: Distribution of First Responses for Listeners in the Preceding-First Context Condition (Number of Listeners, Out of 16, by Gate Number; First, Half, and Max Responses).

Figure 22. Soundfile 3: Distribution of First Responses for Listeners in the Following-First Context Condition (Number of Listeners, Out of 16, by Gate Number; First, Half, and Max Responses).

The means and medians for these response categories are shown in Figures 5 through 10.
Results of the previous analysis suggest that the appropriate statistic should be the median or mode for both the preceding-first and following-first context conditions. Note, again, that there is considerable variation for all three response categories in both context direction conditions. Note that the number of gates required for target identification in the preceding-first condition was lower than for the following-first condition for all response categories. Finally, note that the means and medians were at considerably less than 100% of the number of gates for all responses and for both conditions.

3.4.4 Soundfile 4

Eleven of the sixteen listeners correctly identified Soundfile 4 (see Figure 3); five in the preceding direction and six in the following direction (see Figure 4). This was one of the soundfiles identified by the fewest listeners. The distribution of listeners' correct responses to Soundfile 4 is illustrated in Figures 23 and 24. The results from the listeners in the preceding-first context direction indicate that no listeners responded correctly with context only in the preceding direction; all required following context. The distribution of correct responses occurs from gates 3 to 12. Four of the five listeners who responded correctly had high CRs (8 or better) and similar CR curves. Preceding gates contain the sounds of low-intensity quacking. The results from the listeners in the following-first context direction indicate that the distribution of correct responses occurs from following gates 7 to 18. Confidence ratings varied, as did CR curves. One listener dropped from a high CR to a lower one. Three of the listeners also heard context in the preceding direction. Following gates in this soundfile contain the sounds of ducks' wings splashing in the water (Target to gate 7), then a series of higher-intensity quacks that decrease in intensity (gates 6 to 11) as the ducks fly away, followed by a sound from a different species of bird (gates 14 to 16).

Figure 23. Soundfile 4: Distribution of Correct Responses for Each Listener in the Preceding-First Context Group (confidence ratings by Gate Number, one curve per listener).

Figure 24. Soundfile 4: Distribution of Correct Responses for Each Listener in the Following-First Context Group (confidence ratings by Gate Number, one curve per listener).

The distribution of the three response categories is shown in Figure 25 for the preceding-first direction of context, and Figure 26 for the following-first direction of context. It can be seen that the distribution for these responses is relatively multimodal for the preceding-first condition, but even more dispersed for the following-first condition. The modes in these distributions are at gates 5 and 7 in the preceding-first direction of context (Figure 25) and at gates -8, 7, and 12 in the following-first direction of context (Figure 26).

Figure 25. Soundfile 4: Distribution of First Responses for Listeners in the Preceding-First Context Condition (Number of Listeners by Gate Number; First, Half, and Max Responses).

Figure 26. Soundfile 4: Distribution of First Responses for Listeners in the Following-First Context Condition (Number of Listeners by Gate Number; First, Half, and Max Responses).
The means and medians for these response categories are shown in Figures 5 through 10. Results of the previous analysis suggest that the appropriate statistic should be the modes for both initial direction of context conditions. Note the relatively small standard deviation about the means for all response categories. This combination of multimodal distribution and small variation may have been due to the small size of the soundfile; compared to the other soundfiles used, Soundfile 4 had considerably fewer gates. Note that both the mean and median number of gates for the following-first condition were only slightly lower than for the preceding-first condition for all response categories. Finally, note that all three response categories for the typical listener in the following-first context condition were at less than 100%, but all three response categories were at more than 100% for the typical listener in the preceding-first context condition. Note, again, that this was a small soundfile, and that there were only nine gates in the preceding direction of context.

3.4.5 Soundfile 5

Eleven of the 16 listeners correctly identified Soundfile 5 (see Figure 3). Five of the eight in the preceding-first context condition were correct, and seven of the eight in the following-first context condition were correct (see Figure 4). The distribution of listeners' correct responses to Soundfile 5 is illustrated in Figures 27 and 28. The results from the listeners in the preceding-first context direction indicate that only two listeners responded correctly with context in only the preceding direction, and only one of these had high confidence ratings. Three listeners required context in the following direction to reach stopping criteria. Two of these listeners followed a similar response pattern from gates 2 to 12. The preceding gates contained the sounds of a switch being flipped (gates -30 and -29), a hum (gates -28 to -20), a series of clicks (gates -19 to -12), more humming (gates -11 to -6), a buzz from a computer drive (gate -5), more humming (gates -4 to -2), and the buzz of the computer drive (Target). The results from the listeners in the following-first context direction indicate a concentration of similar responses for three listeners from gates 8 to 19. The distribution of the other listeners' correct responses is dispersed throughout the following gates. Three of the listeners also heard context in the preceding direction; the distribution of their correct responses is dispersed throughout the preceding gates. In the following gates, the Target buzz is followed by more humming, a beep (gate 8), then a rattling hum (gate 10 to the end of the soundfile).

Figure 27. Soundfile 5: Distribution of Correct Responses for Each Listener in the Preceding-First Context Group (confidence ratings by Gate Number, one curve per listener).

Figure 28. Soundfile 5: Distribution of Correct Responses for Each Listener in the Following-First Context Group (confidence ratings by Gate Number, one curve per listener).
The distribution of the three response categories is shown in Figure 29 for the preceding-first direction of context, and Figure 30 for the following-first direction of context. It can be seen that the distribution for these responses is somewhat flat for the preceding-first condition, but more multimodal for the following-first condition. Modes are at gates -13 and 8 for the following-first context condition. Modes for the preceding-first condition are more difficult to determine due to the flatness of the distribution.

Figure 29. Soundfile 5: Distribution of First Responses for Listeners in the Preceding-First Context Condition (Number of Listeners by Gate Number; First, Half, and Max Responses).

Figure 30. Soundfile 5: Distribution of First Responses for Listeners in the Following-First Context Condition (Number of Listeners by Gate Number; First, Half, and Max Responses).

The means and medians for these response categories are shown in Figures 5 through 10. Results of the previous analysis suggest that the appropriate statistic should be the median for the preceding-first context condition and the modes for the following-first context condition. Note that, consistent with the flatness of the distributions, the standard deviations are large. Note that both the mean and median number of gates for the following-first condition were lower than for the preceding-first condition for all response categories, but that this difference is smallest for the maximum CR response. Finally, note that the means and medians were at less than 100% of the number of gates for all response categories for the following condition, but were at or greater than 100% for all response categories for the preceding condition.

3.4.6 Soundfile 6

Thirteen of the listeners correctly identified Soundfile 6, but only ten of these reached a confidence rating of five or better (see Figure 3). Five listeners in each of the two conditions responded correctly. The distribution of listeners' correct responses to Soundfile 6 is illustrated in Figures 31 and 32. The results from the listeners in the preceding-first context direction indicate a concentration of correct responses from gate -23 to gate -28 and beyond. Five of the listeners required context in the following direction. Two trends emerge in the following context responses: two of the listeners responded correctly with high CRs (8 or better), and three responded correctly with much lower CRs. Preceding gates contain the striking (gate -25) and fizzling of a match (gates -24 to -23) and the crackling of a fire (beginning at gate -21). The intensity of the sound continues to increase to the Target gate. The results from the listeners in the following-first context direction indicate that only two of the listeners responded correctly with only following context. Five listeners also required preceding context. The distribution of the results from these listeners is concentrated from gates -25 to -28. The following gates contain the sound of the fire beginning to rumble (starting shortly after the target) and continuing to increase in intensity to the end of the soundfile. The concentration of responses at preceding gates -28 to -23 corresponds to the match sounds described above.

Figure 31. Soundfile 6: Distribution of Correct Responses for Each Listener in the Preceding-First Context Group (confidence ratings by Gate Number, one curve per listener).
Figure 31. Soundfile 6: Distribution of Correct Responses for Each Listener in the Preceding-First Context Group

Figure 32. Soundfile 6: Distribution of Correct Responses for Each Listener in the Following-First Context Group

The distribution of the three response categories is shown in Figure 33 for the preceding-first direction of context, and Figure 34 for the following-first direction of context. It can be seen that the distribution for these responses was multimodal for the preceding-first context condition, and more unimodal but skewed for the following-first context condition. Modes in the distributions are at gate 12 for the preceding-first context condition (Figure 33) and gates -25 to -27 for the following-first context condition (Figure 34).

Figure 33. Soundfile 6: Distribution of First Responses for Listeners in the Preceding-First Context Condition

Figure 34. Soundfile 6: Distribution of First Responses for Listeners in the Following-First Context Condition

The means and medians for these response categories are shown in Figures 5 through 10. The statistic of choice is the mode for both direction-of-context conditions. Note the large mean number of gates for each of the response categories, particularly for the following-first context condition. Both the mean and median number of gates for the preceding-first condition were considerably lower than for the following-first condition for all response categories. Note also that the medians were generally greater than 100% of the number of gates in the initial direction of context for both direction conditions. The only exception was the first response in the preceding direction.

3.4.7 Soundfile 7

All listeners in both direction-of-context conditions correctly identified Soundfile 7 (see Figure 7). The distribution of listeners' correct responses to Soundfile 7 is illustrated in Figures 35 and 36.

The results from the listeners in the preceding-first context direction indicate that all listeners responded correctly with preceding context only. Further, the distribution of responses from five of the listeners shows that the concentration of correct responses occurs from the Target gate to preceding gate 16, with a steep increase in CRs. Three listeners responded later, or with more gradual increases in CRs. Preceding gates Target to 16 contain the sounds of keystrokes (gates 14-13 and 11), paper feeding (gate 7), the whine of a line being printed (gates 6-5), parts moving in preparation for printing the next line (gates 5-4), and the whine of the printer as a line is printed (gates 4 to Target).

The results from the listeners in the following-first context direction indicate that seven of the listeners responded correctly by or before the second gate. The response curves for these listeners were, on average, shallower than those for the listeners in the preceding condition.
Three listeners started at and maintained high CRs, three increased CRs quickly (over less than ten gates), and two increased more slowly and to a lower maximum CR. Only one of the listeners in this condition required context in the preceding direction. Following gates contain the sounds of the printer as several lines are printed (Target to gate 30), humming (gates 31-32), and feeding the paper through (gates 33 to the end of the soundfile).

Figure 35. Soundfile 7: Distribution of Correct Responses for Each Listener in the Preceding-First Context Group

Figure 36. Soundfile 7: Distribution of Correct Responses for Each Listener in the Following-First Context Group

The distribution of the three response categories is shown in Figure 37 for the preceding-first direction of context, and Figure 38 for the following-first direction of context. It can be seen that the distribution for these responses is relatively unimodal but slightly skewed for both context conditions. The modes in these distributions are at gate -5 for the preceding-first context condition (Figure 37) and at gate 2 for the following-first context condition (Figure 38).

Figure 37. Soundfile 7: Distribution of First Responses for Listeners in the Preceding-First Context Condition

Figure 38. Soundfile 7: Distribution of First Responses for Listeners in the Following-First Context Condition

The means and medians for these response categories are shown in Figures 5 through 10. The statistic of choice is the median for both direction-of-context conditions. Note, however, that the standard deviation reveals considerable variation for all three response categories in the following-first context condition. Note that both the mean and median number of gates for the following-first condition were marginally lower than for the preceding-first condition for all response categories. Finally, note that the mean percentage of gates in the initial direction was much less than 100% for all response categories for both direction-of-context conditions.

3.4.8 Soundfile 8

Fourteen of the 16 listeners correctly identified Soundfile 8, but only 13 of these with a confidence rating of 5 or better (see Figure 1). All of the eight listeners who heard the preceding-first context condition responded correctly, but only 5 of those who heard the following-first context condition responded correctly with a confidence rating of 5 or better. The distribution of listeners' correct responses to Soundfile 8 is illustrated in Figures 39 and 40.

The results from the listeners in the preceding-first context direction indicate a concentration of correct responses from gate -9 to -19, continuing in the following direction from following gates 2 to 11. There appears to be a gradually increasing slope for the CR curves.
Four listeners also heard context in the following direction. Preceding gates contain the sounds of a wave crashing on the shore and breaking (gates -16 to -14), an increase in intensity (peaking at gate -12), then pebbles knocking against one another as the water recedes (gates -5 to Target, continuing to gate 12).

The results from the listeners in the following-first direction of context indicate a dispersed multimodal distribution of responses in the following direction. Six of the listeners in this condition heard context in the preceding direction. There is a concentration of responses from these listeners from gates -2 to -11. The slopes of these CR curves are shallow. As mentioned in the preceding paragraph, the gates Target to 12 contain the sounds of pebbles against each other as the wave recedes before another wave begins rolling in (gate 12).

Figure 39. Soundfile 8: Distribution of Correct Responses for Each Listener in the Preceding-First Context Group

Figure 40. Soundfile 8: Distribution of Correct Responses for Each Listener in the Following-First Context Group

The distribution of the three response categories is shown in Figure 41 for the preceding-first direction of context, and Figure 42 for the following-first direction of context. It can be seen that the distribution for these responses is multimodal for the preceding-first condition, with even more modes for the following-first condition. The modes in these distributions are at gates -17 and -10 for the preceding-first condition (Figure 41) and at -19, -9, and -4 for the following-first condition (Figure 42).

Figure 41. Soundfile 8: Distribution of First Responses for Listeners in the Preceding-First Context Condition

Figure 42. Soundfile 8: Distribution of First Responses for Listeners in the Following-First Context Condition

The means and medians for these response categories are shown in Figures 5 through 10. The statistic of choice is the mode for both direction-of-context conditions. Note that both the mean and median number of gates for the preceding-first condition were lower than for the following-first condition for all response categories. Note also that the mean percentage of gates in the initial direction is no greater than 100% for any of the response categories for the preceding condition, but that these percentages are much higher in the following-first context condition, exceeding 100% for the CR 5 or Better and Maximum CR responses.

3.5 Patterns in the data

This section will describe some of the patterns that emerge from these distributions. This description will focus primarily on the distributions of all correct responses for each of the soundfiles.

Correct responses at early gates: Listeners were able to identify some targets after hearing relatively little of the soundfile. The soundfiles and conditions that were correctly identified by several listeners within five gates of the target were Soundfiles 2 and 7, in both the preceding-first and following-first conditions, and Soundfile 3 in the preceding-first condition.
Accordingly, both the mean and median number of gates for the three response categories are considerably lower for these soundfile conditions than for many of the others.

Correct responses based on one direction of context only: In two cases, all listeners responded correctly after hearing the soundfile in one direction of context only. This occurred for the following-first direction-of-context condition for Soundfile 2 and the preceding-first direction of context for Soundfile 7. As is noted in the following two points, for these two soundfiles, correct responses were concentrated in both directions.

A concentration of responses in the preceding direction: Soundfiles 1, 2, 6, 7, and 8 show a concentration of responses in the preceding-first context condition. In the case of Soundfiles 1, 6, and 8, there was a similar concentration for the gates preceding the target for the listeners who were in the following-first context condition.

A concentration of responses in the following direction: Soundfiles 2, 3, 5, and 7 show a concentration of responses in the following-first context condition.

3.6 Direction of Context

There are no clear effects of initial direction of context. Medians were generally lower for the preceding-first conditions than for the following-first conditions for Soundfiles 1, 3, 6, and 8. The opposite trend was observed for Soundfiles 2, 4, 5, and 7. The implication is that identification may be more strongly based on the information conveyed in the acoustic signal than on the amount or direction of the soundfile heard. That is, response accuracy may be based on quality as opposed to quantity of input.

3.7 Amount of Context

Soundfiles 1, 3, 5, and 7 were considered to be high-context sounds (as defined by the criteria outlined in section 2.3.1, "High and Low Context Environmental Sounds"). Soundfiles 2, 4, 6, and 8 were considered to be low-context (also as defined in section 2.3.1). The soundfiles that yielded the best identification performance, in terms of the number of listeners who correctly identified them, were the high-context Soundfiles 1, 3, and 7. Those that yielded the worst performance were primarily the low-context Soundfiles 4, 6, and 8. However, listeners performed relatively poorly on Soundfile 5, one of the so-called high-context soundfiles, and they performed relatively well on Soundfile 2, one of the so-called low-context soundfiles. It is of interest to consider why the identification of Soundfiles 2 and 5 departed from what was expected based on the initial assumptions regarding context.

A comparison in terms of the median number of gates at the CR of 5 or better response category was undertaken to further explore the nature of context. Medians were used because most of the distributions were not normal. The medians were ordered from smallest to largest and are shown in Figure 43; a sketch of this ordering is given below. It can be seen that there are no clear patterns of results which can be explained as an effect of context; however, it can be seen that, at the extremes, the median number of gates is low for Soundfile 7 (high-context) in both conditions, and very high for Soundfile 6 (low-context) in both conditions. A similar ordering of medians and means is presented in Figure 44, in this case for percentage of gates in the initial direction of context. This figure illustrates the amount of auditory input required for correct identification relative to the length of the initial direction of context. Soundfile 7 is again at one extreme and Soundfile 6 is near the other.
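For concreteness, the ordering just described can be expressed as a short computation. The following Python fragment is a minimal sketch written for illustration only, not the analysis code actually used in this study; the dictionary of gate counts and all variable names are hypothetical, and the numbers shown are placeholders rather than real data.

```python
from statistics import median

# Hypothetical gate counts at which each listener first responded correctly
# with a CR of 5 or better, keyed by (soundfile, initial direction of context).
# The values below are placeholders, not the data collected in this study.
gates_to_cr5 = {
    (7, "preceding"): [3, 5, 5, 6, 8, 12, 14, 16],
    (7, "following"): [2, 2, 3, 4, 5, 7, 9, 11],
    (6, "preceding"): [23, 25, 26, 28, 31],
    (6, "following"): [25, 27, 28, 40, 44],
}

# The median is used because most of the distributions were not normal.
medians = {key: median(values) for key, values in gates_to_cr5.items()}

# Order the soundfile/condition pairs from the smallest to the largest median,
# which is the ordering plotted in Figure 43.
for (soundfile, direction), med in sorted(medians.items(), key=lambda item: item[1]):
    print(f"Soundfile {soundfile}, {direction}-first: median = {med} gates")
```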
These soundfiles contrast on several dimensions, both in terms of acoustic content and in terms of other performance parameters. Soundfile 7 is the sound of a machine that is probably very common in the acoustic environment of the typical listener in this experiment (a university student), whereas fire is probably much less frequently heard. Soundfile 7 was disambiguated by listeners at much earlier gates than Soundfile 6, which listeners typically did not disambiguate until the match striking was heard in the last few gates in the preceding direction. Finally, more listeners in general were able to identify Soundfile 7 than Soundfile 6.

If the number of modes, rather than the medians or means, is considered with respect to high- and low-context soundfiles, there is once again a lack of any strong patterns, with unimodal and multimodal distributions of responses being found for both high- and low-context soundfiles, in both the preceding and following context conditions.

Figure 43. First Response at CR of 5 or Better - Ordered Mean, Median Number of Gates by Soundfile and Condition

Figure 44. First Response at CR of 5 or Better - Ordered Mean, Median Percentage of Gates by Soundfile and Condition

3.8 Listener Performance

3.8.1 High Reading Working Memory Listeners

Listener performance for the high reading working memory group will be described in this section. First correct response gates for each individual are plotted along with the group median. The median is used because some of the distributions were non-normal. These results are found in Figures 45 through 60. The ordering of the results for individual participants is by direction of context, then by amount of context. This was intended to make apparent any effects of direction and amount of context on performance.

Listeners who performed extremely well or extremely poorly could not be readily identified. More often, a listener's performance was only marginally different from the group median, with possibly one or two outlying values. Three individual listeners, however, appeared to perform consistently better or worse than the median. Participant 5 consistently responded correctly at gates at or below the median (see Figure 55). Participant 3 responded correctly at gates at or above the median for the majority of soundfiles (see Figure 46). Participant 1 also responded correctly at gates above the median for the majority of the soundfiles (Figure 45). With these exceptions, it does not seem that individuals are consistently excellent or poor at identifying environmental sounds. Perhaps each listener had varying degrees of experience with particular sounds.

Figure 45. Participant 1: First Correct Response Gate at CR of 5 or Better With Group Data

Figure 46. Participant 3: First Correct Response Gate at CR of 5 or Better With Group Data
Figure 47. Participant 6: First Correct Response Gate at CR of 5 or Better With Group Data

Figure 48. Participant 8: First Correct Response Gate at CR of 5 or Better With Group Data

Figure 49. Participant 10: First Correct Response Gate at CR of 5 or Better With Group Data

Figure 50. Participant 12: First Correct Response Gate at CR of 5 or Better With Group Data

Figure 51. Participant 13: First Correct Response Gate at CR of 5 or Better With Group Data

Figure 52. Participant 15: First Correct Response Gate at CR of 5 or Better With Group Data

Figure 53. Participant 2: First Correct Response Gate at CR of 5 or Better With Group Data

Figure 54. Participant 4: First Correct Response Gate at CR of 5 or Better With Group Data

Figure 55. Participant 5: First Correct Response Gate at CR of 5 or Better With Group Data

Figure 56. Participant 7: First Correct Response Gate at CR of 5 or Better With Group Data
Figure 57. Participant 9: First Correct Response Gate at CR of 5 or Better With Group Data

Figure 58. Participant 11: First Correct Response Gate at CR of 5 or Better With Group Data

Figure 59. Participant 14: First Correct Response Gate at CR of 5 or Better With Group Data

Figure 60. Participant 16: First Correct Response Gate at CR of 5 or Better With Group Data

3.8.2 Low Reading Working Memory Listeners

Only four low working memory listeners participated in this study, and so there is not a full group to compare to the high reading working memory group. The description of the performance of individual low reading working memory listeners will therefore be limited to qualitative observations. For the purposes of comparison, identification results will be compared to the medians for the high working memory listeners. These results are shown in Figures 61 through 64.

Performance for the low reading working memory span listeners was generally similar to that for the high reading working memory span listeners. Note, however, that Participant 17 consistently heard less than the median number of gates before correctly identifying the target with a confidence of 5 or greater (see Figure 63). Participants 19 and 20 almost consistently heard more than the median number of gates before correctly identifying the target with a confidence of 5 or greater (see Figures 62 and 64, respectively). Note also that Participant 19 never identified three of the soundfiles, and Participant 20 never identified two of the soundfiles.

Figure 61. Participant 18: First Correct Response Gate at CR of 5 or Better With Group Data
Figure 62. Participant 19: First Correct Response Gate at CR of 5 or Better With Group Data

Figure 63. Participant 17: First Correct Response Gate at CR of 5 or Better With Group Data

Figure 64. Participant 20: First Correct Response Gate at CR of 5 or Better With Group Data

3.8.3 All Listeners

Individual trends in identification performance and confidence ratings were observed. Figure 65 shows the performance of participants by the number of soundfiles that were correctly identified, arranged from most identified to least. As can be seen, three participants correctly identified all soundfiles, 10 participants correctly identified 7, and the remainder did less well. There are no strong effects of working memory in this analysis; one of the listeners in the low reading working memory group correctly identified all soundfiles, whereas the majority of the high reading working memory span listeners did not identify all the soundfiles. Two of the listeners with low reading working memory spans, however, are among the listeners who correctly identified the fewest soundfiles (Participants 19 and 20).

Figure 65. Listener Performance: Number of Soundfiles Correctly Identified By Participants: First Response and CR of 5 or Better

Trends in the maximum confidence ratings were also investigated and are displayed in Figure 66. All but two of the participants reached a maximum of 10 for at least one of the soundfiles. The mean of the maximum CRs ranged from 8 to 10. Two participants, 1 and 12, consistently reached a maximum CR of 10. The vast majority of the other listeners had an average maximum CR of 8 or greater.

Figure 66. Listener Performance: Analysis of Maximum Confidence Ratings Reported by Participants

A regression analysis was performed to determine if there was a relationship between the listener performance measures described in Figures 65 and 66. That is, the total number of soundfiles correctly identified by each listener was compared to their average maximum confidence ratings. Figure 67 shows the results. Note that R² is quite low, indicating the lack of a strong correlation between listeners' performance on the identification task and their confidence in the correctness of their responses.
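A minimal sketch of how such a correlation can be computed is given below, assuming the two listener-level measures just described (number of soundfiles correctly identified and average maximum confidence rating). The fragment is illustrative only: the values are placeholders rather than the data from this experiment, and the variable names are introduced for the sketch.

```python
# Hypothetical listener-level data: soundfiles correctly identified (out of 8)
# and the average of each listener's maximum confidence ratings (out of 10).
# These values are placeholders, not the data collected in this study.
num_identified = [8, 7, 7, 7, 6, 8, 7, 5, 6, 7]
avg_max_cr = [10.0, 8.5, 9.0, 8.0, 8.5, 9.5, 8.0, 7.5, 9.0, 8.5]

n = len(num_identified)
mean_x = sum(num_identified) / n
mean_y = sum(avg_max_cr) / n

# Pearson correlation between the two performance measures.
covariance = sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(num_identified, avg_max_cr))
var_x = sum((x - mean_x) ** 2 for x in num_identified)
var_y = sum((y - mean_y) ** 2 for y in avg_max_cr)
r = covariance / (var_x * var_y) ** 0.5

# R-squared for the simple regression of one measure on the other;
# a value near zero indicates that one measure does not predict the other.
print(f"r = {r:.2f}, R-squared = {r * r:.2f}")
```

A low R-squared in such a computation would simply mean that knowing how many soundfiles a listener identified says little about how confident that listener tended to be.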
Figure 67. Listener Performance: Correlation Between Number of Soundfiles Correctly Identified and Average Confidence Ratings

3.9 Error Analysis

Error analysis was performed to investigate the range of the responses given for each of the soundfiles. This analysis was conducted on a soundfile-by-soundfile basis. Each of the responses given was recorded, and then the number of times each listener gave an answer was counted. The results are found in Tables 3 through 10. In these tables, the high and low reading working memory groups are separated, and within these groups, the listeners who heard the soundfile in a preceding-first direction of context are listed before those who heard the following-first direction of context. Within each of the tables, the responses have been ordered according to two considerations. First, the responses are arranged by the number of participants who gave the response, from most to least. Second, responses were ordered by the total number of responses given by all listeners. Ordering was done in this way to illustrate the frequency of patterns of responses given by listeners; it clearly shows that there were some responses common to many listeners and others that were relatively rare.

General trends are apparent in these results. Most notably, there was a wide variety of responses given for some of these soundfiles. A time-response analysis was not performed, but the results presented here do indicate that a range of possible sources of these sounds was elicited from the listeners. This phenomenon may be thought of as approximately analogous to the elicitation of a word-initial cohort.

Furthermore, there were some interesting similarities in the responses given by listeners. In some cases, these similarities took the form of correct identification of the object or objects involved, at times in the absence of correct identification of the action that caused the target sound. In Tables 6 and 10, it can be seen that many listeners gave responses for Soundfile 4 (ducks taking off from the water) which included various types of water sounds and objects, including ducks, acting upon water. Similarly, many participants gave water-related responses for Soundfile 8. This relatedness of responses was also found for Soundfile 1 (change dropping into a bus farebox), shown in Table 3, for which many listeners responded that the target was hard objects hitting each other or a hard surface. The most common error responses to the target of Soundfile 3 (bells announcing the closing of Skytrain doors) were bells from other sources, such as an elevator bell or doorbell, as shown in Table 5.

Another pattern found in the error analysis was responses that included objects that were related to, but different from, those which actually caused the target sound. For some of the soundfiles, particularly those that contained machine and transportation sounds, many participants identified the target as related machines or transportation modes. For example, many responses given for Soundfile 2 (a motorcycle engine revving), shown in Table 4, were machines or vehicles that are loud and/or have gas engines, such as a jackhammer, a chainsaw, and a lawnmower or leafblower. Similarly, responses for Soundfile 7 (a dot-matrix printer printing), shown in Table 9, included many other mechanical devices. In some cases, the errors that were commonly made were ones in which the objects identified were neither correct nor obviously related to the actual objects causing the target sound.
Such responses were observed for Soundfile 6 (fire crackling), shown in Table 8. The responses given were possibly acoustically similar to the target, but there was little or no relationship in terms of the objects causing the sound. The more common pre-recognition responses for this soundfile included wheels on a surface and rain falling onto a thin surface. For Soundfile 1, shown in Table 3, the second most common response given by participants was the sound of a turnstile being turned. As well, for Soundfile 5 (the whine of a printer printing a page), shown in Table 7, a common response was the horn of a large vehicle, for example, a truck.

It is interesting to note certain parallels to language regarding levels of processing that might be postulated from the types of errors described. When a listener correctly identifies the objects involved in a sound-producing event, but not the action, or the action but not the objects, this might be considered to be evidence of a level of environmental sound processing roughly analogous to semantic processing in language. On the other hand, when responses contain neither the sound-producing objects nor events, but seem to be determined more on the basis of the acoustical signal, this might be considered to be evidence of a level of processing roughly analogous to phonetic or phonological processing in language.
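The tallying and ordering procedure described in section 3.9, which produced Tables 3 through 10, can be illustrated with a short sketch. The fragment below is a hypothetical illustration only: the response records, labels, and variable names are invented for the example, and the actual tables were compiled from the listeners' recorded responses rather than by this code.

```python
from collections import defaultdict

# Hypothetical response records for one soundfile: (participant code, response label)
# for each gate at which that response was given. Invented for illustration only.
responses = [
    (1, "putting money into bus farebox"),
    (1, "putting money into bus farebox"),
    (3, "turnstile being turned"),
    (3, "putting money into bus farebox"),
    (6, "coins going into a vending machine"),
]

# For each response label, count how many distinct participants gave it
# and how many times it was given in total.
participants_giving = defaultdict(set)
times_given = defaultdict(int)
for participant, label in responses:
    participants_giving[label].add(participant)
    times_given[label] += 1

# Order first by the number of participants who gave the response (most to least),
# then by the total number of times it was given, as in Tables 3 through 10.
ordered_labels = sorted(
    times_given,
    key=lambda label: (len(participants_giving[label]), times_given[label]),
    reverse=True,
)
for label in ordered_labels:
    print(label, len(participants_giving[label]), times_given[label])
```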
Table 3. Error Analysis for Soundfile 1. Responses given by each listener, separated into high and low reading working memory groups and by initial direction of context, with the number of listeners giving each response and the total number of times it was given. The most frequent responses were putting money into a bus farebox, a turnstile being turned, coins going into a phone or vending machine, a coin going into a Skytrain ticket dispenser, and metal objects falling onto a hard surface, followed by a long tail of responses given by only one listener each.

Table 4. Error Analysis for Soundfile 2. The most frequent responses were a motorcycle revving or moving, a motorcycle starting, a jackhammer, a chainsaw cutting through a tree, a Harley Davidson motorcycle, and a lawnmower or leafblower.

Table 5. Error Analysis for Soundfile 3. The most frequent responses were the ding of the Skytrain door warning, other sounds related to the Skytrain, an elevator bell, a bell, a machine hum or buzz, a car or truck horn, and a doorbell.
Table 6. Error Analysis for Soundfile 4. Responses given by each listener, separated into high and low reading working memory groups and by initial direction of context, with the number of listeners giving each response and the total number of times it was given. The most frequent responses were ducks taking off from the water, a duck moving its wings in the water or splashing, water running in a creek or river, a duck landing on the water, water falling from a fountain, and running water.

Table 7. Error Analysis for Soundfile 5. The most frequent responses were a printer, a large truck horn honking, a computer booting up, some kind of horn, a PC starting up, and some kind of device buzzing, along with many machine-related responses given by single listeners.
Table 8. Error Analysis for Soundfile 6. Responses given by each listener, separated into high and low reading working memory groups and by initial direction of context, with the number of listeners giving each response and the total number of times it was given. The most frequent responses were fire burning, rain, rain on a wooden roof, water flowing or running, horses and a carriage, rain falling onto a metal or plastic roof, rain falling in trees or on leaves, and rain on a window.

Table 9. Error Analysis for Soundfile 7. The most frequent responses were an older-style computer printer, a printer printing, a dot matrix printer, something printing out, some sort of machine, and a saw blade such as a table saw.
Table 10. Error Analysis for Soundfile 8. Responses given by each listener, separated into high and low reading working memory groups and by initial direction of context, with the number of listeners giving each response and the total number of times it was given. The most frequent responses were waves on the ocean crashing onto the beach, a large crowd cheering, waves, rainfall, a river or rapids running, strong wind blowing, and white noise or static.

DISCUSSION

4.1 Overview

The purpose of this study was to investigate the processing of environmental sound, and to compare this to the processing of spoken language. Participants listened to brief segments of environmental sounds and guessed the identity of the auditory objects they represented. Additional context was provided by gating the original recordings in both preceding and following directions. The following sections discuss whether or not the findings of the experiment supported the predictions that were put forward in section 1.9.

4.1.1 Identification of Environmental Sounds

It was predicted that participants would be able to identify environmental sound recordings based solely on auditory input. This prediction was supported; it was found in both the pilot study and the main experiment that listeners are able to identify many environmental sounds based on auditory cues only. Earlier, we discussed the fact that input from other, normally available, modalities is absent in the experimental situation. This absence may have had some effects on performance, but it certainly did not make identification impossible. This is perhaps unsurprising, given previous studies and the nature of auditory perception. Notably, Ballas (1993) found that listeners were able to identify brief environmental sounds (maximum duration was 0.625 sec), but that there was considerable variability in accuracy and identification time for different stimuli. Performance was found by Ballas (1993) to be "related to variables in different domains, including acoustic variables, ecological frequency, causal uncertainty, and sound typicality.
[Ballas' findings suggested that] sound identifiability is related to the ease with which a mental picture is formed of the sound, context independence, the familiarity of the sound to a mental stereotype, the ease in using words to describe the sound, and the clarity of the sound" (p. 262). 4.1.2 Effects of Additional Context on the Identification of Brief Environmental Sounds It was predicted that listeners would be able to identify brief environmental sounds when additional context, in the form of longer duration of the original recording, was added. This prediction was supported. Findings suggest that additional context aided the listeners in identification of brief environmental sounds. Ballas and Mullins (1991) also found positive effects of supportive context on identification of environmental sounds, although the significant effects they found, once other factors had been ruled out, were small. They also found that when an environmental sound was replaced by one that was nearly a homonym, the context of the sequence biased the listeners against giving the acoustically correct response. To explain their results, Ballas and Mullins hypothesized that encoding of environmental sounds is too slow (relative to encoding of language) to allow the auditory system to benefit from surrounding context in perception of the stimuli sound, and that performance is further reduced by the negative effects of a non-supportive acoustic context. An alternative explanation is that listeners used context in to a greater extent than they used acoustic cues. That is, in the condition in which the target sound was the near-homonym 101 instead of the acoustically correct sound, greater weight may have been given to the context than the sound itself in the listeners' perception. Overall, the findings of the present study are consistent with Ballas and Mullin's contention that comparison of the effects of context between language and environmental sound is problematic although context effects are observed in both cases. 4.1.3 Patterns of Recognition It was predicted that different patterns of identification for the different soundfiles would emerge. In particular, it was expected that high-context soundfiles would be identified more easily and with higher confidence than the low-context soundfiles. It was also expected that preceding context would facilitate identification of the target to a greater extent than following context, due to working memory constraints. This prediction was generally supported, although some of the patterns that emerged did not show grouping by amount of or direction of context as was predicted. There was considerable variability observed in the distribution of responses, both between soundfiles and between direction-of-context conditions. Some soundfiles appeared to be answered with more ease, and with much less between-participant variation in responses. As well, some patterns were observed in more than one soundfile; for example, for some soundfiles and conditions (Soundfiles 2 and 7 in both direction conditions and Soundfile 3 in the preceding condition), there was a concentration of correct early responses. . 1 0 2 4.1.4 Quantity of the Context It was predicted that the performance of listeners would be influenced by the acoustic content of the signal heard in terms of the nature of the auditory events, as opposed to the quantity in terms of number of gates. This prediction was supported. 
If the quantity of the signal were the only factor that determined listener performance, we would expect that similar relationships between number of gates heard and correct identification would have emerged for all soundfiles. Instead, different patterns emerged for the different soundfiles. Additionally, i f quantity of the signal were the most important factor, then we would not expect that the nature of specific auditory events would enhance identification. This was observed in the data, however. For example, listeners in both the preceding and following conditions responded correctly to the gates of Soundfile 6 that corresponded to the sounds of a match being struck and fizzling, even though these events occurred at a point where more gates had been heard by those in the following condition than by those in the preceding condition. 4.1.5 Performance of Individual Listeners It was predicted that some listeners would perform better than others on the environmental sound identification task. This prediction was supported. There was a range of performance, as measured by the number of soundfiles to which the participants responded correctly with a confidence rating of 5 or better. At one extreme, some listeners correctly identified all soundfiles; at the other extreme, only half of the soundfiles were correctly identified. The majority of the participants in this study, however, correctly responded to at least 7 of the soundfiles. 103 Performance in this task was also evaluated in terms of confidence ratings for the correct responses. The distribution of these data is similar to that for the identification data. That is, there was a range from consistently confident to less confident. The majority of listeners had a mean confidence rating at 8 or above, although two listeners never achieved a maximum confidence of 10. The results of the preceding analyses were reviewed to determine if the relative performance on these two outcome measures were correlated. Participant 1 consistently responded correctly and with the maximum level of confidence possible. In contrast, Participant 19 correctly responded to only 5 of the 8 soundfiles, and consistently gave relatively low confidence scores. In most cases, however, performance on one measure did not necessarily predict performance on the other, as was illustrated in Figure 67. 4.1.6 Comparison of Environmental Sound Processing to Language Processing It was predicted that the performance of listeners would bear some similarities to the performance of listeners in analogous language processing studies. The predictions relating to each sub-hypothesis are discussed separately. 4.1.6.1 Degree of Context It was predicted that listeners would perform better for high-context than low-context sounds. This prediction was generally not supported. Two measures will be considered in this discussion: the number of listeners who correctly identified the soundfile, and the mean number of gates to correct identification with a confidence rating of 5 or better. 104 A l l listeners correctly identified two of the high-context soundfiles (Soundfiles 3 and 7), and almost all identified a third (Soundfile 1). However, almost all listeners also correctly identified one of the low-context soundfiles (Soundfile 2). The remaining soundfiles were identified by fewer listeners. 
When soundfiles were ordered from lowest to highest median number of gates to first correct response with a confidence rating of 5 or better (as seen in Figure 43), high- and low-context soundfiles were scattered throughout the range. Nevertheless, some of the results are consistent with the prediction; low medians were observed for high-context Soundfiles 3 and 7, and high medians for the low-context Soundfile 6. There are at least two possible reasons for the apparent lack of strong correlation between context and identification performance. First, there may indeed be no effects of context on identification of environmental sounds. Second, the factors influencing amount of context in environmental sounds may not have been adequately appreciated. Both of these possibilities warrant further consideration. The first possibility seems unlikely to be entirely true. As was shown by Ballas and Mullins (1993), there are significant effects of context; however, these effects are not entirely consistent. Notably, these researchers found that the context a listener hears may bias the listener against the acoustically correct interpretation when it is inconsistent with the context, with a semantically consistent interpretation being favored. The second possibility, that the selection of criteria for estimating degree of context were inadequate warrants discussion. It was presumed that temporally ordered sequences of discrete but related auditory events would provide contextual support to a greater extent than would be provided by slowly changing, repetitive, or continuous auditory objects. This was 105 not necessarily shown to be the case. Predictability or ease of identification of a sound may rely on a number of factors, including properties of the target sound itself as well the properties of the surrounding context. For example, we may consider properties such as distinctiveness, category (for example, transportation sounds, animal sounds, mechanical sounds, human sounds), ecological or cultural importance (where ecological importance would refer to safety or survival, and cultural importance would refer to culture-specific signals, for example, the ring of a telephone), familiarity, or frequency of the sound to determine what properties make a sound easy to identify. The size of the initial cohort could also be a determining factor in the ease of identifying an auditory event. It may be worthwhile to establish whether or not the size of the initial cohort correlates to the average number of gates required for correct identification of various targets. Although a full investigation of these issues will be left to future researchers, we will make a note here of the properties of the sounds and their contexts that seem likely to have affected the performance of listeners. Listeners tended to perform best on the soundfiles that described mechanical and transportation sounds, specifically, the printer, the Skytrain bells, and the Harley Davidson motorcycle. These sounds are either highly familiar and/or distinctive; even a listener who reported having ridden the Skytrain on a single occasion was able to identify the chimes with high confidence. In contrast, the bus, another transportation sound, was somewhat less easily identified. The relative difficulty of listeners in identifying this sound may be attributed to the typicality of the acoustical properties of the recording of the auditory object itself. 
In this particular recording, the sound of change in the farebox is quite high in intensity relative to other auditory objects in the recording, perhaps due to the microphone being closer to the source than a typical listener's ear would be. Some listeners reported that this high relative intensity caused confusion; logically, they felt they were hearing change in the bus farebox, but it just didn't sound the way they expected it should.

Listeners also had difficulty identifying the computer booting up. Again, this may have been due to a lack of acoustical typicality. The microphone of the recorder was placed directly on the hard drive, presumably causing the sounds emanating from this part of the computer to have a relatively higher intensity and a higher frequency emphasis than the listener would normally experience. Again, this was reflected in comments made by the listeners, who mentioned the correct source of the sound as a possibility, but dismissed it as not sounding quite right.

The situations described, in which listeners dismissed the correct target sound even though it was the logical explanation because it did not sound as the listener expected, may benefit from the application of auditory virtual reality techniques. Auditory virtual reality attempts to simulate real-world conditions under earphones, which allows the listener to experience auditory input in a three-dimensional space and makes sound localization possible. The application of head-related transfer functions (HRTFs) may also provide more natural auditory input. Ear-level microphones are also available and could be used to collect recordings that would more accurately represent the acoustic signal arriving at the typical listener's ear. In the present experiment, the stimuli were presented in stereo, and some listeners were able to recognize the movement of the sound-producing objects in some of the soundfiles, particularly the motorcycle in Soundfile 3. It would be interesting to study whether listeners' overall performance is improved through enhancement of auditory cues using virtual reality techniques.

Overall, listeners performed less well on sounds of nature. In the soundfile of ducks taking off from the water, many listeners identified that ducks and water were involved, but the action was not correct. Listeners tended to have the greatest difficulty overall identifying fire, and it is possible that without the sound of the match striking many listeners would never have responded correctly. This might be considered a case of disambiguating context, and the results for these gates indicate a narrowing of the cohort with the addition of this information.

4.1.6.2 Direction of Context

It was predicted that listeners would not perform equally well in both direction conditions, but that listeners would benefit more from preceding context than from following context, as was found by Wingfield et al. (1994). This prediction was not generally supported. There was no clear pattern of effects favoring either the preceding or the following direction of context. More listeners correctly identified the target in the preceding context condition for Soundfile 8. More listeners correctly identified the target in the following context condition for Soundfiles 2, 4, and 5. In terms of median number of gates, half of the soundfiles had lower medians for the preceding context condition and half had lower medians for the following context condition. In the present study, direction of context has been shown to have little if any effect on identification.
A number of possible explanations come to mind. First, there is a temporal order to the world that is upset under the present experimental conditions only insofar as gating alters the time window of the signal that is heard. Furthermore, nature does not add information in a backwards direction. Although gating is a somewhat unnatural procedure, a listener may sometimes hear only a portion of a sound in nature. It is difficult to assess the extent to which upsetting the natural temporal order of environmental sounds may have complex and unpredictable effects. It is also possible that such effects are more disruptive for environmental sound recognition than for language processing, because consistent effects of direction of context have been found in studies of word identification (Wingfield et al., 1994).

Second, there are temporal relationships between auditory objects; auditory sequences may be spread out over a long time, or a very short time. Within a sequence there may be series of regions that contain relatively many or few auditory objects, and auditory objects may be simple or complex patterns. For example, the ringing of a phone is typically punctuated by periods of silence. It is the presence, durations, and repetition rate of these pauses that tell us, for example, that we are hearing a telephone and not an alarm clock. If the length of the pauses is altered, this can also provide information. A long silence may tell us that the phone has been answered, or that the caller has given up. Therefore, the temporal distribution of signal information may be a critical feature of particular soundfiles.

The third characteristic of environmental sounds, and one we have already discussed, is that it is not just the quantity, but the nature of the acoustic information that provides cues to the listener. We cannot accurately determine the effects of the initial direction of context unless the nature of the signal is controlled so that acoustic content is equalized in both directions of context. There are different ways in which this may be achieved. A reasonable approach might be to use auditory sequences that are relatively matched in intensity, duration, and possibly frequency composition (for the segments both preceding and following the target). These would be either repetitive sequences (such as the motorcycle that was included in this study as Soundfile 2), or sequences of discrete auditory events. These could be either repeating events, or distinct events at regular intervals. An example of the former is a telephone ring; however, finding a sequence of periodically occurring but distinct auditory events may be more difficult. It is more common in nature for sounds that are separated by uniform increments to be repetitions of the same sound, or close variations on it. It is much less likely for a natural sequence of discrete auditory objects to be periodic. Sequences could certainly be designed, but these would not necessarily be representative of naturally occurring auditory sequences. We leave these matters to future researchers to address.

4.1.6.3 Working Memory Span

It was predicted that listeners with high reading working memory spans would perform better than listeners with low working memory spans.
As was found by Wingfield et al. (1994), it was expected that listeners with high working memory spans would identify target auditory events at earlier gates than listeners with low working memory spans, particularly in the following context condition, which requires listeners to hold items in memory as context is added. The small sample size of the low reading working memory group prevents us from concluding that the prediction was or was not supported. However, a similar range of performance was found for the members of both the high and low working memory groups who were tested. Considering individual listeners, some members of the low working memory group performed very well, while others did poorly relative to the range of the high working memory span listeners. As a group, performance measures of the low working memory span listeners did not cluster together. It is possible that with a full complement of low working memory span listeners, more decisive trends might emerge; however, this will be left to future study.

4.1.6.4 Error Analysis

It was predicted that listeners' errors would reveal an initial cohort in the environmental sound identification process. An initial cohort would be a variety of responses given by listeners early in the identification process that may share some common characteristics, such as acoustic similarities, with the correct target. This prediction was supported. In the analysis of errors from each of the soundfiles, it was shown that listeners' pre-recognition responses fell within a fairly narrow range. The findings suggest the possibility that listeners generate a cohort of possible identities for environmental sounds that are too brief to be correctly identified. We propose that this activation of possible identities may be similar to that which has been extensively studied in language by Marslen-Wilson and others. The cohort model invokes both top-down and bottom-up processing in the process of lexical identification. The process begins with bottom-up processing, making use of acoustic and phonetic cues to activate a cohort, or a set of word candidates. As more acoustic information becomes available, and top-down knowledge is applied, the number of possible candidates decreases; thus the size of the cohort narrows until the lexical item is correctly identified.

The experiment presented here indicates that a similar process may occur in the identification of environmental sound. Most listeners were not able to correctly identify the 400-msec targets until more context had been added. The range of pre-recognition responses indicates that, as is the case with language, there may be activation of a cohort that is narrowed down as more acoustical information is added and the auditory system applies top-down processing. In the analysis of error responses, both types of processing seem to be evident. Taking, for example, the pre-recognition responses for Soundfile 1, the target of which was change falling into a bus farebox, several of the responses given involved acoustically similar events, for example, coins, marbles, or keys hitting each other or something hard. Listeners also demonstrated use of top-down processing when they recognized engine noise, and used this information to determine that the target was a turnstile turning at the entrance to a Skytrain station (although Skytrain stations do not have turnstiles) or near a bus stop.
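The bottom-up activation and top-down narrowing described above can be made concrete with a toy sketch. This is only an illustration of the logic, not a model fitted to the present data; the candidate sounds, their acoustic feature labels, and the scene knowledge are invented for the example, loosely following the Soundfile 1 errors just discussed.

```python
# Minimal illustrative sketch of cohort activation and narrowing.
# Candidate set, feature labels, and scene knowledge are invented for this example.
CANDIDATES = {
    "coins in farebox":  {"metallic", "impact", "jingling", "engine-noise-nearby"},
    "keys dropped":      {"metallic", "impact", "jingling"},
    "marbles colliding": {"impact", "jingling"},
    "turnstile turning": {"metallic", "rhythmic", "engine-noise-nearby"},
}

def narrow_cohort(evidence, scene=None):
    """Bottom-up step: keep candidates consistent with all acoustic evidence so far.
    Top-down step: if a scene has been recognized, keep candidates plausible in it."""
    cohort = {name for name, feats in CANDIDATES.items() if evidence <= feats}
    if scene == "bus interior":
        cohort &= {"coins in farebox", "turnstile turning"}  # toy scene knowledge
    return cohort

# Early gate: only a brief metallic impact is heard -> several candidates survive.
print(narrow_cohort({"metallic", "impact"}))
# Later gates add jingling, and engine noise leads to a "bus interior" scene
# interpretation -> the cohort narrows toward the correct identity.
print(narrow_cohort({"metallic", "impact", "jingling"}, scene="bus interior"))
```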
Of all the predictions put forth to investigate the similarities and differences between the auditory perception of language and environmental sound, the indication of cohort generation is the most compelling similarity. It remains possible that similar processes underlie the identification of both speech and environmental sound input, even though unique processes may also be involved in the encoding of the two types of input.

4.2 Future Directions

In this study, a paradigm previously used only for linguistic stimuli was applied to study environmental sound processing. The results of this experiment on the identification of environmental sounds indicate that listeners are able to identify environmental sounds, and that adding context to brief environmental sounds helps listeners in identification. Performance for different sounds varied quite widely, with some soundfiles being apparently easier to identify than others. There was also a range of performance between listeners. Generally, similarities in identification between spoken language and environmental sounds were not suggested by the results. Cohort generation seems to be the most notable similarity between perception of these two kinds of input. These findings present many questions that may be addressed in future research. Some of these have been alluded to in the foregoing discussion.

Subject characteristics must be investigated more fully. In order to draw reasonable conclusions regarding the effects of working memory span, complete study groups must be used. It may also be useful to determine if listeners' performance in recognizing environmental sounds correlates with other areas of auditory processing, such as gap detection threshold or auditory pattern perception for artificial sounds. It may also be useful to probe the thinking style of participants. Following each experimental session, participants were invited to make written comments regarding the experimental task and the stimuli used. Many participants provided insights about their experiences in this experiment. For example, a large number reported that they were visually oriented in their thinking or that they created pictures or images in their head during the experiment. During word recognition, the acoustical pattern triggers lexical access. During identification of environmental sounds, the acoustical pattern apparently often gives rise to scene imaging. Comments made to this effect included:

"I found it fascinating to be in a sterile, closed off foreign environment trying to determine what each sound was. It made me very conscious of how visual I am and how I use sound as a filler.... I tend to see situations visually: I would find a scenario and tend to stick to it until I had enough information to guess otherwise.... It is interesting to play a little guessing game with sound as I discovered the initial bit of sound can send one off in a completely opposite direction - in terms of what the final guess is. The visual imagery (for me) kept changing as I worked out in my brain what the sound was." -09

"Really wild images came to mind, I try (sic) to fit additional information into that image. Then new information doesn't fit that image for sure, I would discard it and start trying a new hypothesis." -10

"I attempted to build a picture!" -14

Other listeners provided other insights into the process by which they arrived at the correct response:

"Once I thought it was one thing, it is (sic) difficult to change trail of thought in other direction."
-11

"I would hear the noise and then find as many things that would fit the sound and pick the most likely but fell into a mind set with the first one where I could not picture anything else that would fit the sound then made an educated guess with it (sic)." -18

"I listened each time for the possability (sic) of something other than what I though I was hearing. When I was not very certain I was especially open to reinterpreting the sound. It was difficult to answer when there were 2 equal possabilities (sic)." -16

"I would take a wild guess first, then take other surrounding clues and narrow down the possibilities" -13

"I found it interesting how I could jump between 2 possible targets in the same trial over several exposures to the same sound. I found I also got 'locked in' to one target which made it hard to think of other possibilities." -01

"At first I thought it was important to correctly identify the sound within the first few guesses. After doing a few I went with whatever I thought and didn't worry about getting it right so quickly - instead I talked aloud - 'free associating' until I got right answer." -14

Comments were also made regarding the length of the experiment. Most of the soundfiles were quite long in total duration, and subjects who heard many gates for each soundfile reported that the task became tedious.

"As an aurally aware (acutely) person, I found the length of the sequences long. Perhaps more than 4 sounds per session would be better with less repetition of each." -17

"The exp't (sic) was fun overall, but tedious when the sound sequences were long." -13

"I found that after a while, more sound on either side of the target was not helpful in identifying the target after a while (sic). The result was I had made up my mind what the sound was (or that I didn't know what the sound was) and would stop trying to identify it." -06

It is possible that the factors the participants themselves described had an influence on performance, and these factors are worth investigating thoroughly. For example, regarding the knowledge base of the participants, just as we have methods for assessing vocabulary to categorize participants in language-based experiments, a counterpart measure may be useful. Since environmental sound identification and processing is far less studied, there are no standard tests of sound categorization, but such tests could be developed. In addition, it may be useful in future experiments to vary experimental design to address the attentional demands on the listener in such a task, so that tedium or fatigue does not affect the results. This may be accomplished by using soundfiles of shorter total duration.

It has been mentioned, but it bears repeating here, that any effects of context appear to be influenced by both the amount and the nature of the components. For this reason, future studies should try to better control for these factors. Some suggestions for how this may be accomplished have been noted. Generally, however, it seems that in order to strictly control content in terms of the patterning of discrete or repetitive auditory events, the stimuli may require a significant degree of experimental manipulation. If the goal is to determine how listeners identify natural, real-world recordings, substantial manipulation would likely be undesirable.

Some of the patterns observed in the cohorts for different sounds have been discussed.
It may be interesting to perform a more in-depth analysis to better understand the bases for the cohort for various sounds. Further study may also address the effects of noise on the identification of environmental sound. Observation has shown that presenting an auditory object in a noisy recording caused the subject to report difficulty in determining the intended target. A listener who heard the soundfiles at the pilot test stage reported that it was extremely difficult to determine which sound was the target in a soundfile that was recorded in a noisy department store. The target, the sound of a cash register printing a receipt, was heard simultaneously with music on the public address system, voices speaking, footsteps, and many other auditory objects. This difficulty demonstrates the importance of understanding how environmental sounds are heard against a background of competing sounds or noise. A future direction for this type of study may require modification of the stimuli and task; for example, the subject may be asked to report everything they hear, or to describe the auditory scene depicted in the recordings. It might also be possible to provide a visual stimulus to set the context, for example, a picture of a department store.

It may be interesting to investigate how well listeners are able to identify sounds that vary in perceived importance, depending on the listener's goals. That is, some sounds are useful, such as the warning bells of the Skytrain doors, or an approaching vehicle. Other sounds may be pleasant, but much less informative for the typical listener's purposes, such as a flock of ducks taking off or waves on a beach. It may be informative to study whether or not listener performance varies between sounds that are more often foregrounded, or actively attended to, and sounds that are usually backgrounded, or not actively attended to.

Finally, it would be very interesting to assess the identification abilities of hard of hearing listeners. Ramsdell (1970) reported that the auditory world of many deafened adults becomes dead. Determining how these listeners perceive and interpret environmental sounds would be a worthwhile endeavor. The importance and utility of environmental sounds has been recognized in the development of auditory tests that use environmental sounds as test items for hearing assessment in different populations. Several of the standard test batteries used for listeners with cochlear implants include environmental sounds, for example the Minimal Auditory Capabilities (MAC) battery, the Iowa Test Battery, and the HIPPS Profile (Faulkner & Read, 1991). A hearing test for the pediatric population, the Sound Effects Recognition Test (SERT), was developed to assess hearing in children and other listeners with limited verbal abilities (Finitzo-Hieber, Gerling, Matkin, & Cherow-Skalka, 1980). These tests were originally developed to assess hearing in listeners with limited speech discrimination abilities. For example, the MAC battery was developed in 1981, and was intended for use in assessment of the early, single-channel cochlear implants, the candidates for which would have had poor speech discrimination abilities. The nature of these tests may change over time, as selection criteria are expanded and technology improves. It would also be useful to develop protocols for use with a broader group of listeners. Such a test could be useful in clinical protocols, or for specific purposes, such as hearing aid fittings.
It might also be illuminating to assess listeners' awareness of environmental sounds before and after an experimental task or therapy. It is possible that this could inform both researchers and hard of hearing listeners themselves about the experience of hearing loss from an ecological perspective. Finally, an environmental sound-based test may be useful in the study of central auditory processing disorders (CAPD). It would be illuminating to determine whether the auditory processing disruptions observed in listeners with CAPD are language-based, or more generalized in the auditory system.

REFERENCES

Ballas, J.A. (1993). Common factors in the identification of an assortment of brief everyday sounds. Journal of Experimental Psychology: Human Perception and Performance, 19(2), 250-267.

Ballas, J.A., & Howard, J.H. (1987). Interpreting the language of environmental sounds. Environment and Behavior, 19(1), 91-114.

Ballas, J.A., & Mullins, T. (1991). Effects of context on the identification of everyday sounds. Human Performance, 4(3), 199-219.

Bregman, A.S. (1990). Auditory scene analysis. Cambridge, MA: MIT Press.

Carpenter, P.A., Miyake, A., & Just, M.A. (1994). Working memory constraints in comprehension: Evidence from individual differences, aphasia, and aging. In M.A. Gernsbacher (Ed.), Handbook of psycholinguistics (pp. 1075-1122). New York: Academic Press.

Crandell, C. (1993). Speech recognition in noise by children with minimal degrees of sensorineural hearing loss. Ear & Hearing, 14(3), 210-216.

CSRE (4.5) (1995). Computer Speech Research Environment [Computer software]. London, Ont.: AVAAZ Innovations, Inc.

Cycowicz, Y.M., & Friedman, D. (1998). Effect of sound familiarity on the event-related potentials elicited by novel environmental sounds. Brain and Cognition, 6, 30-51.

Daneman, M., & Carpenter, P.A. (1980). Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior, 19, 450-466.

Dillon, L.M. (1995). The effect of noise and syntactic complexity on listening comprehension. Unpublished master's thesis, University of British Columbia, Vancouver, British Columbia, Canada.

Faulkner, A., & Read, T. (1991). Speech perception and its assessment. In H. Cooper (Ed.), Cochlear implants: A practical guide (pp. 251-282). San Diego, CA: Singular Publishing Group.

Finitzo-Hieber, T., Gerling, I.J., Matkin, N.D., & Cherow-Skalka, E. (1980). A sound effects recognition test for the pediatric audiological evaluation. Ear & Hearing, 1(5), 271-276.

Finitzo-Hieber, T., & Tillman, T.W. (1978). Room acoustics effects on monosyllabic word discrimination for normal and hearing impaired children. Journal of Speech and Hearing Research, 21(3), 440-458.

Greenberg, S. (1996). Auditory processing of speech. In N.J. Lass (Ed.), Principles of experimental phonetics (pp. 362-407). St. Louis, MO: Mosby.

Grosjean, F. (1980). Spoken word recognition processes and the gating paradigm. Perception & Psychophysics, 28(4), 267-283.

Grosjean, F. (1996). Gating. Language and Cognitive Processes, 11(6), 597-604.

Haydon, S.P., & Spellancy, F.J. (1973). Monaural reaction time asymmetries for speech and non-speech sounds. Cortex, 9(3), 288-294.

Hickok, G., Bellugi, U., & Klima, E.S. (1996). The neurobiology of sign language and its implications for the neural basis of language. Nature, 381, 669-702.

Howard, J.H., & Ballas, J.A. (1980). Syntactic and semantic factors in the classification of nonspeech transient patterns. Perception & Psychophysics, 28(5), 431-439.
Howard, J.H., & Ballas, J.A. (1982). Acquisition of acoustic pattern categories by exemplar observation. Organizational Behavior and Human Performance, 30, 157-173.

Jackendoff, R. (1987). Consciousness and the computational mind. Cambridge, MA: A Bradford Book, The MIT Press.

Kemper, S. (1992). Language and aging. In F.I.M. Craik & T.A. Salthouse (Eds.), The handbook of aging and cognition (pp. 213-270). Hillsdale, NJ: Lawrence Erlbaum Associates.

Liberman, A.M., Cooper, F.S., Shankweiler, D.P., & Studdert-Kennedy, M. (1967). Perception of the speech code. Psychological Review, 74, 431-461.

Marslen-Wilson, W.D., & Tyler, L.K. (1980). The temporal structure of spoken language understanding. Cognition, 8, 1-71.

Marslen-Wilson, W.D., & Welsh, A. (1978). Processing interactions and lexical access during word recognition in continuous speech. Cognitive Psychology, 10, 29-63.

McAdams, S. (1993). Recognition of sound sources and events. In S. McAdams & E. Bigand (Eds.), Thinking in sound: The cognitive psychology of human audition (pp. 146-198). Oxford: Clarendon Press.

Miller, G.A. (1951). Language and communication. New York: McGraw-Hill.

Morton, J., & Chambers, S.M. (1976). Some evidence for 'speech' as an acoustic feature. British Journal of Psychology, 67(1), 31-45.

Noble, W. (1983). Hearing, hearing impairment, and the audible world: A theoretical essay. Audiology, 22, 325-338.

O'Grady, W., & Dobrovolsky, M. (1992). Contemporary linguistic analysis (2nd ed.). Toronto, Canada: Copp Clark Pitman Ltd.

Olsen, W.O., & Matkin, N.D. (1991). Speech audiometry. In W.F. Rintelmann (Ed.), Hearing assessment (2nd ed., pp. 39-140). Needham Heights, MA: Allyn and Bacon.

Ramsdell, D.A. (1970). The psychology of the hard-of-hearing and the deafened adult. In H. Davis & S.R. Silverman (Eds.), Hearing and deafness (3rd ed., pp. 435-446). New York: Holt, Rinehart and Winston.

Schneider, B.A., & Pichora-Fuller, M.K. (in press). Implications of perceptual deterioration for cognitive aging research. In F. Craik & T. Salthouse (Eds.), The handbook of cognitive aging (2nd ed.). Lawrence Erlbaum Associates.

Truax, B. (1996). Soundscape, acoustic communication and environmental sound composition. Contemporary Music Review, 15(1), 49-65.

Tyler, L.K. (1984). The structure of the initial cohort: Evidence from gating. Perception & Psychophysics, 36(5), 417-427.

Van Petten, C., & Rheinfelder, H. (1995). Conceptual relationships between spoken words and environmental sounds: Event-related brain potential measures. Neuropsychologia, 33(4), 485-508.

Waters, G.S., & Caplan, D. (1996). The measurement of verbal working memory capacity and its relation to reading comprehension. The Quarterly Journal of Experimental Psychology, 49A(1), 51-79.

Wingfield, A. (1996). Cognitive factors in auditory performance: Context, speed of processing, and constraints of memory. Journal of the American Academy of Audiology, 7, 175-182.

Wingfield, A., Alexander, A.H., & Cavigelli, S. (1994). Does memory constrain utilization of top-down information in spoken word recognition? Evidence from normal aging. Language and Speech, 37(3), 221-235.

Wingfield, A., Aberdeen, J.S., & Stine, E.A.L. (1991). Word onset gating and linguistic context in spoken word recognition by young and elderly adults. Journal of Gerontology: Psychological Sciences, 46(3), 127-129.
122 APPENDIX A Participants' Pure Tone Thresholds (dBHL) for Right (R) and Left (L) Ears Participant code Test Frequency 250 Hz 500 Hz 1000 Hz 2000 Hz 4000 Hz 8000 Hz R L R L R L R L R L R L High Working Memory Span Group 01 0 0 0 5 5 5 -10 o 5 5 10 5 02 0 0 0 -5 0 -5 0 -5 -5 0 5 0 03 5 0 5 0 0 5 -5 -10 -10 0 0 -5 04 0 0 0 -5 0 -5 0 -10 -5 -5 5 10 05 5 0 0 0 0 0 0 0 0 0 0 0 06 5 0 0 -5 -5 . -5 -5 -5 0 0 10 5 07 10 5 15 5 5 5 5 5 5 5 5 20 08 5 5 5 5 0 0 -10 -5 -5 -5 -5 5 09 0 5 5 5 0 0 -5 -10 0 -10 0 10 10 0 -5 -5 -10 0 -5 0 5 0 -5 5 5 11 5 5 5 0 5 0 5 -5 -5 -10 5 0 12 5 5 10 0 10 5 5 5 0 0 5 5 13 10 5 5 0 -5 0 5 0 0 -5 15 5 14 20 0 10 0 0 5 10 5 -5 -10 5 0 15 5 0 10 10 5 5 -5 5 -5 0 15 25 16 15 5 10 15 0 0 0 5 0 15 5 5 Low Working Memory Span Group 17 0 0 5 5 10 10 10 5 0 5 20 0 18 0 0 0 0 0 -5 -5 -5 0 0 5 5 19 10 10 5 5 5 5 -5 5 -5 -5 10 10 20 0 -5 5 0 5 0 5 5 -5 -10 -5 5 Note. H L = Hearing Level 123 APPENDIX B Instructions for the Reading Working Memory Span Test Participants read the following: INSTRUCTIONS READING SPAN T E S T -In this task you will be presented with a series of unrelated sentences printed on the screen. Whenever a sentence is presented to you, you are to read the sentence out loud. After you finish reading the sentence, a new sentence will appear on the screen and you are to read that one; keep doing this until you see a blank screen. The blank screen signifies that the trial is over. What you are to do then is to say back to me the last word in each of the sentences in the trial. For example, suppose you read the following sentences: A ROSE IS STILL A ROSE B Y A N Y OTHER N A M E . FRIENDS, ROMANS, C O U N T R Y M E N - L E N D M E Y O U R EARS. Upon seeing the blank screen, you would say, " N A M E , EARS" . If possible, you are to say out the last words in the order in which they were presented. If you are unable to remember them sequentially, you may say them in any order, but you should not start with the last sentence first unless it is the only one you can remember. Your goal is to try to say back as many of the last words in the trial as possible. We will be starting off with trials consisting of two sentences and will be periodically increasing the number of sentences per trial and I will warn you when this is about to happen. That is, we will progress to three-sentence trials, then four-sentence trials, and so on. The first few trials are for practice so that you can get the hang of it. 124 APPENDIX C Participants' Characteristics Participant Code Age Sex Handedness Years in GVA First Language RWMS Scores PTA in dBHL SRT in dBHL WD' @35/ SI r(% 40dB Participant Code Age Sex Handedness Years in GVA First Language 1st R L R L R L High Wor dng Memory Span Group 01 32 F R 2 E 4 4 -1.6 3.3 -5 -5 100 100 02 28 F R 4+ E 5 2/3 6 0 -5 0 -5 96 96 03 20 F R 2+ E 3 1/3 3 1/3 0 -1.6 0 -5 100 96 04 28 F R 3+ E 3 2/3 3 2/3 0 -6.6 0 -5 100 96 05 20/ 21 M R 5+ E 3 2/3 3 0 0 0 0 100 100 06 24 M R 4 E 3 3 1/3 -3.3 -5 0 -5 100 100 07 24 F R 2+ E 3 2/3 3 2/3 8.3 5 5 5 92 92 08 26 F R 3+ E 3 1/3 3 2/3 -1.6 0 -5 -5 96 96 09 29 F R 2+ E 3 3 2/3 0 -1.6 0 -5 96 100 10 26 F R life E 4 2/3 5 -1.6 -3.3 0 -5 100 100 11 19 F R life E 3 1/3 3 2/3 5 -1.6 5 0 96 96 12 19 M R 10+ C 3 1/3 3 1/3 8.3 3.3 5 0 100 100 13 30 F R 11 D 3 2/3 3 1/3 1.6 0 0 0 100 92 14 35 M R 2+ E 3 4 6.6 3.3 5 5 100 100 15 30 F R 6+ E 3 3 1/3 3.3 6.6 5 5 100 100 16 28 F R 4+ E 3 1/3 4 3.3 5 0 5 . 
92 100
Low Working Memory Span Group
17 31 M R 10+ E 2 2 2/3 8.3 6.6 5 5 100 100
18 28 M Ra life E 2 2/3 3 -1.6 -3.3 0 0 96 100
19 21 F R life E 2 2/3 2 2/3 1.6 5 0 -5 96 100
20 21 F Ra 10+ C 2 1/3 2 1/3 5 1.6 0 -5 100 100

Note. RWMS = Reading working memory span; PTA = Pure tone average; WDT = Word detection threshold; SRT = Spondee reception threshold; GVA = Greater Vancouver Area; SL = Sensation level; Ra = Participant is ambidextrous, but generally writes with right hand; E = First language (L1) is English; C = L1 was a Chinese dialect, but participant began learning English simultaneously with first language acquisition, no later than 3 years of age; D = L1 was Dutch. Participant began learning English at the age of eight and speaks English as the primary language.

APPENDIX D
Source of Experimental Soundfiles from SoundScape Archives with Descriptions

Soundfile 1 (Tape 3, Take 18): Main Street bus. Boarding bus, sitting in rear, conversation between child and mother at 3:57, change into dispenser at 0:25. Duration of take: 5:59.
Soundfile 2 (Tape 8, Take 7): Gastown. Harley Davidson motorcycle. Duration of take: 0:56.
Soundfile 3 (Tape 3, Take 3): Waterfront Skytrain station. Station ambience, train arrival and departure, stationary mic position. Duration of take: 2:58.
Soundfile 4 (Tape 3, Take 12): Lost Lagoon, ambience. Higher levels than take 11, ducks, distant park traffic, nice ambience, ducks take off at 2:01, loud duck at 3:00, some geese. Duration of take: 4:54.
Soundfile 5 (Tape 1, Take 37): Computer booting. Mic placed on hard drive. Duration of take: 0:30.
Soundfile 6 (Tape 1, Take 40): Hearth fire; burning paper. Match strikes and fire starts, nice resonance in chimney, crackling paper. Duration of take: 2:00.
Soundfile 7 (Tape 1, Take 39): Computer printer, dot matrix. Printing one page and line feeding. Duration of take: 0:20 (a).
Soundfile 8 (Tape 1, Take 8): French Beach, ocean waves. Large waves, steep gravel shore, dynamic undertow, nice spectral movement. Duration of take: 1:13.
Practice (Tape 8, Take 10): Warehouse area. Good, clear sounds of closing door. (Identified as Vancouver Port Authority.) Duration of take: 1:05.

Note. (a) I believe this to be an error in the index, as the duration of this take actually exceeds 0:20 seconds. Where descriptions are underscored, this indicates the auditory events of interest within soundfiles containing several auditory events. These were the portions used for the purposes of this study. Durations of the relevant portions are included in Appendix E.

APPENDIX E
[Graphic content; not reproducible in this text transcription.]

APPENDIX F
Characteristics of Experimental Stimuli

Soundfile 1: Target = change dropping into bus fare box; degree of context = high; gates preceding target = 27; gates following target = 52; duration of entire stimulus = 31.581 sec.
Soundfile 2: Target = revving of motorcycle engine (Harley Davidson); degree of context = low; gates preceding target = 20; gates following target = 49; duration of entire stimulus = 27.883 sec.
Soundfile 3: Target = Skytrain warning chimes; degree of context = high; gates preceding target = 49; gates following target = 56; duration of entire stimulus = 42.408 sec.
Soundfile 4: Target = ducks taking off from water; degree of context = low; gates preceding target = 8; gates following target = 17; duration of entire stimulus = 10.006 sec.
Soundfile 5: Target = computer reading drive as it boots up; degree of context = high; gates preceding target = 30; gates following target = 41; duration of entire stimulus = 28.253 sec.
Soundfile 6: Target = fire crackling; degree of context = low; gates preceding target = 27; gates following target = 39; duration of entire stimulus = 30.488 sec.
Soundfile 7: Target = dot matrix printer printing; degree of context = high; gates preceding target = 38; gates following target = 41; duration of entire stimulus = 31.900 sec.
Soundfile 8: Target = waves on a gravelly shore; degree of context = low; gates preceding target = 18; gates following target = 18; duration of entire stimulus = 14.338 sec.
Practice: Target = squeaking of door hinges; gates preceding target = 9; gates following target = 9; duration of entire stimulus = 7.6 sec.

APPENDIX G
Calibration of Experimental Stimuli

Part 1: Voltages
1 kHz Calibration Tone: 2.3786 V
Soundfile 1: 0.7953 V
Soundfile 2: 1.0057 V
Soundfile 3: 1.7677 V
Soundfile 4: 0.4087 V
Soundfile 5: 0.7099 V
Soundfile 6: 0.2535 V
Soundfile 7: 0.6543 V
Soundfile 8: 1.0785 V
Practice Soundfile: 0.2743 V

Part 2: Procedure

Step 1: Apply the equation 20 log (voltage A / voltage B) to determine the difference in decibels between the two voltage values. The 1-kHz calibration tone will be the standard for comparison, as the intensity for this tone is known. It was measured in both headphones using a sound level meter, as per the description in the Methods chapter, and found to be 88.1 dBA. The product of this equation reveals how much greater or lesser voltage A (the values found for the experimental stimuli) is in relation to voltage B (the value found for the 1-kHz calibration tone). For all equations, voltage B will be the value found for the calibration tone, 2.3786 V. In cases in which the result is negative, voltage B is greater in intensity than voltage A. Conversely, in cases in which the result of the equation is positive, voltage A is greater in intensity than voltage B. All soundfiles used in this experiment were found to be lower in intensity than the 1-kHz calibration tone.

Step 2: Once the difference in decibels has been calculated, the intensity of the original soundfiles may be calculated by subtracting the result of Step 1 from the known intensity of the 1-kHz tone (88.1 dBA).

Step 3: In order to provide a comfortable listening level to the subjects, the presentation level of the stimuli was attenuated. The amount by which to attenuate the stimuli was calculated by subtracting 70 dB SPL (the predetermined desired presentation level) from the result of Step 2 (the effective intensity of the unattenuated stimuli). The results of Step 3 were entered into the ecosgen program in the corresponding blocks so that each of the soundfiles would be attenuated by the appropriate amount when presented to subjects. In cases in which the result of this step was negative, the stimulus was presented unattenuated.

Part 3: Calculations

Soundfile 1
Step 1: 20 log (0.7953 / 2.3786) = -9.516 = |9.516|
Step 2: 88.1 dB - 9.516 dB = 78.584 dB
Step 3: 78.584 dB - 70 dB = 8.6 dB

Soundfile 2
Step 1: 20 log (1.0057 / 2.3786) = -7.477 = |7.477|
Step 2: 88.1 dB - 7.477 dB = 80.623 dB
Step 3: 80.623 dB - 70 dB = 10.6 dB

Soundfile 3
Step 1: 20 log (1.7677 / 2.3786) = -2.578 = |2.578|
Step 2: 88.1 dB - 2.578 dB = 85.522 dB
Step 3: 85.522 dB - 70 dB = 15.5 dB

Soundfile 4
Step 1: 20 log (0.4087 / 2.3786) = -15.298 = |15.298|
Step 2: 88.1 dB - 15.298 dB = 72.802 dB
Step 3: 72.802 dB - 70 dB = 2.8 dB

Soundfile 5
Step 1: 20 log (0.7099 / 2.3786) = -10.502 = |10.502|
Step 2: 88.1 dB - 10.502 dB = 77.598 dB
Step 3: 77.598 dB - 70 dB = 7.6 dB

Soundfile 6
Step 1: 20 log (0.2535 / 2.3786) = -19.447 = |19.447|
Step 2: 88.1 dB - 19.447 dB = 68.653 dB
Step 3: 68.653 dB - 70 dB ≈ 0 dB

Soundfile 7
Step 1: 20 log (0.6543 / 2.3786) = -11.210 = |11.210|
Step 2: 88.1 dB - 11.210 dB = 76.89 dB
Step 3: 76.89 dB - 70 dB = 6.9 dB

Soundfile 8
Step 1: 20 log (1.0785 / 2.3786) = -6.870 = |6.870|
Step 2: 88.1 dB - 6.870 dB = 81.23 dB
Step 3: 81.23 dB - 70 dB = 11.2 dB

Practice Soundfile
Step 1: 20 log (0.2743 / 2.3786) = -18.76 = |18.76|
Step 2: 88.1 dB - 18.76 dB = 69.34 dB
Step 3: 69.34 dB - 70 dB ≈ 0 dB
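The same three-step calculation can be expressed compactly in code. The sketch below is only an illustration of the procedure documented above (it is not the program used in the study); the voltages, the 88.1 dBA calibration-tone level, and the 70 dB target presentation level are taken from Parts 1 and 2, and negative results are treated as "present unattenuated," as described in Step 3.

```python
# Sketch of the Appendix G attenuation calculation (Steps 1-3).
import math

CAL_TONE_V = 2.3786   # RMS voltage of the 1-kHz calibration tone
CAL_TONE_DB = 88.1    # measured level of that tone under the earphones (dBA)
TARGET_DB = 70.0      # desired presentation level

def attenuation_db(soundfile_voltage):
    level_re_tone = 20 * math.log10(soundfile_voltage / CAL_TONE_V)  # Step 1
    effective_level = CAL_TONE_DB + level_re_tone                    # Step 2
    attenuation = effective_level - TARGET_DB                        # Step 3
    return max(attenuation, 0.0)   # negative result -> present unattenuated

# Soundfile 1 (0.7953 V): 88.1 - 9.5 = 78.6 dB, so attenuate by about 8.6 dB.
print(round(attenuation_db(0.7953), 1))   # 8.6
# Soundfile 6 (0.2535 V) works out below 70 dB, so no attenuation is applied.
print(round(attenuation_db(0.2535), 1))   # 0.0
```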
APPENDIX H
Schematic of the TDT Set-up
[Figure not reproducible in this text transcription.]

APPENDIX I
Order of Stimuli by Soundfile Number and Condition

Order of soundfiles, Subjects 1-8:

Subject Number:   1    2    3    4    5    6    7    8

Session 1
                 <1   <2   <3   <4   <5   <6   <7   <8
                 2>   3>   4>   1>   6>   7>   8>   5>
                 <3   <4   <1   <2   <7   <8   <5   <6
                 4>   1>   2>   3>   8>   5>   6>   7>

Session 2
                 <8   <5   <6   <7   <4   <1   <2   <3
                 7>   8>   5>   6>   3>   4>   1>   2>
                 <6   <7   <8   <5   <2   <3   <4   <1
                 5>   6>   7>   8>   1>   2>   3>   4>

(continued...)
Order of soundfiles, Subjects 9-16:

Subject Number:   9   10   11   12   13   14   15   16

Session 1
                 1>   2>   3>   4>   5>   6>   7>   8>
                 <2   <3   <4   <1   <6   <7   <8   <5
                 3>   4>   1>   2>   7>   8>   5>   6>
                 <4   <1   <2   <3   <8   <5   <6   <7

Session 2
                 8>   5>   6>   7>   4>   1>   2>   3>
                 <7   <8   <5   <6   <3   <4   <1   <2
                 6>   7>   8>   5>   2>   3>   4>   1>
                 <5   <6   <7   <8   <1   <2   <3   <4

Note: Left arrow (<) indicates preceding context. Right arrow (>) indicates following context.

APPENDIX J
Participant Instructions

The participants were read the following:

For this study, we have used recorded segments of less than one minute in duration. These recordings were made in Vancouver, and consist of naturally occurring sound events. What I have done is selected a very small portion of each of these recordings and removed it from the rest. I call this the target. I will play you this section over the earphones and ask you to describe to me, with as much detail as possible, what event is represented by the sound. However, this snippet is so small, it is unlikely that you will be able to tell what it is right away. So what I have done is cut the recording into progressively longer segments by adding more either before or after the portion I am interested in. By making the segments you hear longer and longer, I hope to provide you with enough information to correctly identify the target. The objective is to find out how much of the sound you need to hear to correctly identify the target.

When I have played each segment to you, I would like you to tell me first, what you heard, and second, how confident you are that your guess is correct. Again, what I am most interested in is the original snippet. Try to provide a noun and a verb in your answer to describe the cause of the noise. Sometimes it helps to talk through the 'scene' that you hear, and I will record the comments you make. However, I cannot move on to the next segment until you have responded. Also, your confidence rating should be reported each time, based on a scale of 1 to 10. A rating of 1 indicates little or no confidence that your response is correct; a rating of 10 indicates that you are certain your response is correct. When you have provided a response and a confidence rating, I will play the next longer segment to you.

I would like to make a few comments about responses. First, if your guess hasn't changed from one segment to the next, you can say, 'same answer, same score'. Also, please be as explicit as you can be in your responses. For instance, instead of responding, 'that was a knock', try to describe the objects making the sound. A more specific response might be, 'that sounds like a hard object hitting a wooden object'.

In some cases, once you have heard all of the sound in one direction, I will start adding on in the other direction. When this happens, you will hear the entire recording in the original direction, plus you will start hearing bits of the recording added on in the opposite direction. So, for example, if you have heard everything that comes before the target, you will continue by hearing the beginning portion plus increasing segments after it. It might become difficult to remember what the target is because it is no longer at one edge of the segment you are hearing. Just try to remember enough to help you establish where it comes in, and do the best you can.

Last, some of these sounds are harder to guess than others, and some are longer than others. Just do the best you can, and try not to get frustrated. No matter what your responses are, they are still telling me something useful.
Do you have any questions before we begin?

APPENDIX K
Reduced Graphic Representation of Response Form

Environmental Sounds Study: Score Sheet
Sheet ___ of ___    Subject Code: ___    Date: ___    Session: 1 2    Soundfile: 1 2 3 4
Direction of context: Preceding (Beginning -> following sample) / Following (Preceding -> end)
Columns: Gate # | Response | Additional Comments | Conf. Score

[The remainder of this appendix is a scanned, hand-completed example of the score sheet; the handwritten entries are not reproducible in this text transcription.]
