CROSS-MODAL EFFECTS ON SPEECH PERCEPTION: THE INFLUENCE OF TEXT ON THE RESOLUTION OF AMBIGUOUS SPOKEN WORDS

by

ALLEN SHOOLINGIN

B.Sc., The University of Victoria, 2000

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES (Audiology and Speech Sciences)

THE UNIVERSITY OF BRITISH COLUMBIA

October 2005

© Allen Shoolingin, 2005

ABSTRACT

The current study explores the influence of printed words on spoken word recognition. Theoretical frameworks outline ties between the modalities (Massaro, 1987) and, in particular, between orthography and phonology (Grainger, Diependaele, Spinelli, Ferrand, & Farioli, 2003). This study investigated whether text can influence the resolution of ambiguous spoken words, and whether this influence can occur when the text is displayed below conscious awareness. Twenty-two young adults participated in a cross-modal word repetition task (i.e., a textual prime followed by an auditory target, which was repeated). The textual primes were either identifiable to the participants or not. The auditory targets were either unambiguous or ambiguous (through digital editing of the voice onsets of spoken words). The pairing of the prime and target words was also manipulated to form various priming relationships (e.g., pan-PAN, ban-PAN, net-PAN, etc.). The results show that text can facilitate or inhibit the processing of ambiguous spoken words, providing some evidence of interconnectivity between orthography and phonology. These effects appear to be limited to when the text is consciously identifiable, suggesting that cross-modal priming is dependent on the length of exposure of a printed word. There was also a general perceptual bias by participants to identify (i.e., repeat) the ambiguous targets as voiced (e.g., BAN), indicating that the degree of ambiguity was not sufficient to allow preceding text to influence the identification of these targets.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENTS
CHAPTER 1: LITERATURE REVIEW
1.1 Introduction
1.2 Processes of Auditory Word Recognition
1.2.1 Ambiguity and its relevance to auditory word recognition
1.2.2 Ambiguity resolution
1.2.3 Research on constraints of auditory word recognition
1.2.4 Interactivity between levels of activation: A connectionist framework
1.3 Visual information and speech perception
1.4 Models of spoken and visual word recognition
1.4.1 Models of spoken word recognition
1.4.1.1 Fuzzy Logic Model of Perception
1.4.1.2 Evaluation of the FLMP
1.4.1.3 Limitations of models of spoken word recognition
1.4.1.4 Summary
1.4.2 Models of visual word recognition
1.4.2.1 Bimodal interactive activation model
1.4.2.2 Evaluation of the bimodal interactive activation model
1.4.2.3 Limitations of models of visual word recognition
1.4.3 Relating spoken and visual word recognition models
1.5 The present study
1.5.1 Reaction time
1.5.2 Repetition identity
CHAPTER 2: METHODS
2.1 Introduction
2.2 Pilot study
2.2.1 Overview
2.2.2 Participants
2.2.3 Materials
2.2.3.1 Stimuli description
2.2.3.2 Stimuli preparation
2.2.3.2.1 Sound recording
2.2.3.2.2 Low-pass filtering
2.2.3.2.3 Amplification
2.2.3.2.4 Noise gating
2.2.3.2.5 Voice Onset Time (VOT) continua construction
2.2.4 Procedure
2.2.4.1 Participant instructions
2.2.4.2 Experimental testing
2.3 Cross-modal priming study
2.3.1 Overview
2.3.2 Participants
2.3.2.1 Hearing screening
2.3.2.2 Reading screening
2.3.2.3 Participant summary
2.3.3 Experimental task and conditions
2.3.4 Materials
2.3.4.1 Stimuli description
2.3.4.1.1 Auditory (target) stimuli
2.3.4.1.2 Visual (prime) stimuli
2.3.5 Procedure
2.3.5.1 Participant instructions
2.3.5.2 Experimental testing
CHAPTER 3: RESULTS
3.1 Overview
3.2 Preparation of the data for analysis
3.2.1 Equipment error
3.2.2 Response errors
3.2.3 Extreme outliers
3.3 Reaction time data
3.3.1 Omnibus ANOVA
3.3.2 Prime-target trial type
3.3.2.1 Ambiguous versus unambiguous trials
3.3.2.2 Matching versus unrelated trials
3.3.2.3 Voiced and voiceless bias versus unrelated ambiguous trials
3.3.2.4 Opposing versus unrelated trials
3.3.3 Prime duration
3.3.4 Prime-target trial type and prime duration
3.3.5 Filler trials
3.4 Word repetition data
3.4.1 Unambiguous targets
3.4.2 Ambiguous targets across all conditions
3.4.3 Ambiguous targets by prime type
3.4.4 Frequency of responses matching prime
3.4.4.1 Biasing identification of ambiguous targets
3.4.4.2 Biasing identification by prime duration
3.5 Summary of results
CHAPTER 4: DISCUSSION
4.1 Introduction
4.2 Reaction time data
4.2.1 Prime-target trial type
4.2.1.1 Ambiguous versus unambiguous trials
4.2.1.2 Matching versus unrelated trials
4.2.1.3 Voiced and voiceless bias versus unrelated ambiguous trials
4.2.1.4 Opposing versus unrelated trials
4.2.2 Prime duration
4.2.3 Prime-target trial type and prime duration
4.2.3.1 Cross-modal effects with briefly displayed primes
4.3 Word repetition data
4.3.1 Perceptual bias effect
4.3.2 Frequency of responses matching prime
4.3.2.1 Biasing identification of ambiguous targets
4.3.2.2 Biasing identification with briefly displayed primes
4.4 General discussion
4.4.1 Print and the resolution of ambiguity in speech
4.4.2 Cross-modal influence below level of awareness
4.4.3 Summary
4.5 Theoretical account of the data
4.6 Limitations of the study
4.7 Future directions
4.8 Conclusion
REFERENCES
APPENDIX I: Percentage of voiced identification versus voice onset time
APPENDIX II: Auditory (target) stimuli
APPENDIX III: Visual (prime) stimuli
APPENDIX IV: Trial list 1

LIST OF TABLES

Table 2.1: Selected VOT continua sound files to serve as ambiguous stimuli
Table 3.1: Target reaction times (ms) by prime duration and prime-target pair type
Table 3.2: Planned comparisons of experimental trials by prime-target trial type
Table 3.3: Reaction time means and difference (ms) by prime duration
Table 3.4: Post hoc comparisons of session trial sections
Table 3.5: Paired t-tests of experimental trials by prime-target trial type (long)
Table 3.6: Comparisons of voiceless responses to ambiguous targets by prime type
Table 3.7: Frequency of responses matching visual prime by prime type and duration

LIST OF FIGURES

Figure 1.1: An example of a language model
Figure 2.1: Trial sequence in the short (33-ms) prime duration condition
Figure 2.2: Trial sequence in the long (100-ms) prime duration condition
Figure 3.1: Reaction time averages over the course of the session
Figure 3.2: Reaction time of experimental trials by prime-target trial type and prime duration
Figure 3.3: Reaction time of filler trials by prime-target trial type and prime duration
Figure 3.4: Summary of word repetition responses to ambiguous targets
Figure 3.5: Percentage of voiceless, voiced, and error responses to ambiguous targets by prime-target pairings
Figure 3.6: Percentage of voiceless, voiced, and error responses to ambiguous targets by prime type

ACKNOWLEDGEMENTS

I would first like to thank Dr. Jeff Small for his support, patience, and guidance at all stages of my work. Likewise, I thank my committee, namely Dr. Joseph Stemberger, Dr. Barbara Bernhardt, and Barbara Purves, for their support and contributions. My thanks extend to the faculty and staff of the School of Audiology and Speech Sciences for facilitating me throughout the process. Thanks are warranted for my friends for rounding out my academic character with adventurous trips to karaoke, mini-golf, and the prairies. I want to thank my dad for his unconditional love and pride in what I do. Special thanks go to Murray for providing me an escape from my work, all the while understanding whenever I need to change our plans in order to meet deadlines.

CHAPTER 1: LITERATURE REVIEW

1.1 Introduction

There are examples in the world where text and speech coincide. For example, many new televisions have closed captioning that supplements the audio signal with onscreen text for the hearing impaired. Lecturers often use text in their multimedia presentations to visually support their speech. Likewise, students in class often listen to the teacher's spoken words while simultaneously copying her or his printed words off an overhead projector. As well, young children follow the words in a storybook as an adult reads to them (Owen & Borowsky, 2003). While speech and text differ in modality, both acts of communication are abstract, functional, and in different ways learned. Also, when available in a language, orthography is generally based on the language's phonology. For most adult readers of such a language, the written word automatically conjures its sound in the mind (Frost, 2003).
This link between speech and text, or more particularly between phonology and orthography, has been the object of much experimental research and is conveyed in models of auditory and visual word recognition (see section 1.4). This study investigates whether text can influence the resolution of lexically ambiguous spoken words. The study also examines whether such an influence can occur below the subject's awareness through a rapidly presented visual stimulus. While the specific domain of the study is the interface between orthography and phonology, the study encompasses a set of overlapping and distinct topics, including auditory word recognition, ambiguity resolution, and visual word recognition.

1.2 Processes of Auditory Word Recognition

Recognizing spoken words involves different levels of representation. As depicted in Caplan (1992, p. 13), a spoken word is analyzed into sublexical and lexical phonological units. As well, semantic representations are activated, accessing the word's meaning. How this process occurs is complex, and research in this field is ongoing. The current inquiry will describe some of the processes and factors that affect auditory word recognition, particularly those related to the lexical access of phonological forms. In terms of auditory word recognition, the following subsections outline ambiguity in speech, negative and positive constraints, and a connectionist framework.

1.2.1 Ambiguity and its relevance to auditory word recognition

There are situations where ambiguity arises in speech perception. At the lexical and semantic levels of processing, the activation upon hearing the word sight has to be distinguished from cite and site. Talker variability factors such as accent introduce sources of ambiguity at the phonemic level. Background noise can mask the acoustic features in the speech signal (Mullenix, Pisoni, & Martin, 1989). Coarticulation effects from the speaker's interacting speech articulators can also neutralize phonemic differences in the speech signal (Gaskell, 2001).

Ambiguity has a real-world presence in our daily lives. Speech perception, and more specifically spoken word recognition, occurs in real time. The rapidity of the speech signal results in blending of speech sounds that may lead to ambiguity (e.g., Gaskell & Marslen-Wilson, 2001) and misunderstanding. As well, speech acts often take place in environments with less than ideal signal-to-noise ratios, resulting in degradation of the acoustic signal (Newman, 2004). Yet, the remarkable fact is that auditory word recognition is generally successful despite the many challenges. The underlying processes and mechanisms that enable this are complex and not fully understood. Thus, the goal of studying ambiguity resolution is to better understand how the speaker's intended message is attained from the acoustic medium, and further, in the case of the present study, whether orthography can influence this process.

1.2.2 Ambiguity resolution

Lloyd (2001) outlined different types of ambiguity in speech, including those relevant to the current study. Lexical ambiguity occurs when a word has a double meaning (e.g., That is a hot car, where hot can mean either high in temperature or excellent). Perceptual ambiguity stems from background noise or poor articulation affecting the identity of a speech sound (e.g., a segment that could be heard as either /d/ or /t/).
Perceptual-lexical ambiguity is a blend of the two types: it is the situation where an ambiguous phonemic feature (e.g., voicing) permits more than one lexical interpretation (e.g., a word sounding like both dip and tip) (Lloyd, 2001; Connine, Blasko, & Wang, 1994). While this can occur from signal degradation (Lloyd, 2001), perceptual-lexical ambiguity can also be experimentally induced through the editing of acoustic speech cues (Connine et al., 1994). For example, voice onset time cues can be manipulated to induce perceptual-lexical ambiguity (see the Methods chapter for further description). Presumably, listeners resolve perceptual-lexical ambiguity by benefiting from the constraints of sentential context (Connine et al., 1994; Connine, Blasko, & Hall, 1991), which illustrates one of a variety of constraints on auditory word recognition.

1.2.3 Research on constraints of auditory word recognition

The literature has documented a variety of factors that affect how well spoken words are processed. Touching on these experimental findings is necessary to understand the challenges present and the cues available in spoken word recognition. The following provides an overview of these factors in terms of negative and positive constraints.

A variety of studies outline factors that negatively affect spoken word recognition. Talker variability studies have shown that word recognition is more difficult when the speaker is unfamiliar (Nygaard et al., 1994; Legge, Grosmann, & Pieper, 1984), and when the words are presented at different speaking rates and by multiple speakers (Sommers, Nygaard, & Pisoni, 1994). Perceptual-lexical ambiguity is more difficult for listeners to resolve without an available context (Lloyd, 2001; Salasoo & Pisoni, 1985). Prosodic research has found that English word comprehension may be impaired when prosody is absent (Wingfield, Lahar, & Stine, 1989; Wingfield, Lindfield, & Goodglass, 2000; Herron & Bates, 1997), or when prosodic cues are not English (Cutler, Dahan, & Donselaar, 1997). Studies using neighborhood activation modeling (i.e., focusing on how similar-sounding words affect each other's access) have pointed to increased difficulty in recognizing spoken words that have a low word frequency (i.e., are infrequently used in the language) and a high neighborhood density (i.e., are similar in word shape to many other words) (Dirks, Takayanagi, Moshfegh, Noffsinger, & Fausti, 2001; Sommers, 1996; Garlock, Walley, & Metsala, 2001).

A collection of studies has reported how spoken word recognition can be facilitated by various cues and linguistic knowledge. Prosodic cues for syllabic stress helped listeners to identify word-onset gated words (Macdonald, 2003; Wingfield et al., 2000). Lexical status (i.e., whether or not a spoken segment consists of a real word) has been seen to influence the perception of an ambiguous or distorted string (Connine & Clifton, 1987; Connine, Titone, Deelman, & Blasko, 1997). Lexical status has also been found to be a better facilitator of word recognition when sublexical distortion occurs later, as opposed to earlier, in the spoken string (Marslen-Wilson & Welsh, 1978; Samuel, 1981). Sentential context appears to support the identification of high neighborhood density, low word frequency words when the semantic context increases the predictability of those words (Sommers & Danielson, 1999).
Similarly, sentential context facilitates the resolution of ambiguous words from a voice onset time (VOT) continuum (Connine, 1987), and of words from an environment conducive to phonological assimilation (Gaskell & Marslen-Wilson, 2001).

1.2.4 Interactivity between levels of activation: A connectionist framework

Collectively, the aforementioned research findings and the availability of cues point to an inherent interaction of information between different levels of processing. The effects of lexical status and sentential context provide evidence that high-level knowledge affects how the incoming speech signal is interpreted. This multi-level interaction can be modeled by a computational connectionist framework.

Connectionism can be broadly described as a theoretical stance that attempts to account for human cognitive performance by utilizing artificial neural structures. These constructed neural networks are based to some degree on biological neuronal and synaptic architectures. Connectionist models have been used to model language behavior such as reading and syntactic parsing, as well as a host of other cognitive, behavioral, and physiological phenomena (Garson, 2002).

McClelland and Elman's (1986) TRACE model of speech perception can serve as an example of connectionist modeling. In contrast to the stepwise unidirectional diagrams of Caplan (1992, p. 13), connectionist models like TRACE depict bi-directional connections between levels of representation. The levels are linked by an array of connections that may be programmed towards accomplishing particular tasks, such as phoneme feature detection. Connections are probabilistically weighted, and the weights change during language experience. Top-down and bottom-up connections between levels are excitatory (i.e., increase activation) while connections within levels are inhibitory (i.e., decrease activation), allowing for competitive activation at each representational level. The concerted excitatory and inhibitory forces facilitate the activation of one successful candidate over others. The TRACE model depicts one variant of many possible connectionist models of language, as these models differ in architecture, learning, and the nature of representations (Garson, 2002). By manipulating the representations, structure, and algorithmic aspects of connectionist models, researchers can test their simulations against human performance. Finding links between the human and simulated data helps support or refute theoretical predictions about spoken and visual word processing.
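To make these competitive dynamics concrete, the toy sketch below implements a single TRACE-style update rule: between-level excitation feeds two word candidates while within-level lateral inhibition lets the better-supported candidate suppress its competitor. The parameter values and the two-word vocabulary are invented for illustration; this is a minimal sketch, not McClelland and Elman's actual implementation.

    import numpy as np

    def update(act, excitation, inhib=0.3, decay=0.1):
        """One interactive-activation step: between-level excitation,
        within-level lateral inhibition, and passive decay."""
        lateral = inhib * (act.sum() - act)   # pull from competitors
        return np.clip(act + excitation - lateral - decay * act, 0.0, 1.0)

    # Two lexical candidates receiving bottom-up support from an
    # ambiguous token whose voicing slightly favours "goat".
    words = np.array([0.0, 0.0])              # activations for [goat, coat]
    evidence = np.array([0.55, 0.45])         # ambiguous bottom-up input

    for _ in range(20):
        words = update(words, excitation=0.2 * evidence)

    print(dict(zip(["goat", "coat"], words.round(3))))
    # Lateral inhibition amplifies the small bottom-up advantage,
    # so "goat" settles at a much higher activation than "coat".

Even in this stripped-down form, the sketch shows the property relevant to the present study: a small external bias on one candidate (here bottom-up, but in principle top-down or cross-modal) is enough to decide the competition.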
1.3 Visual information and speech perception

A collection of studies has examined the influence of non-textual visual input on auditory experience. For example, when spoken syllables are mismatched with lip and facial posture information, listeners may report hearing a syllable that exists in neither the auditory nor the visual domain. This so-called McGurk effect is evidence for the availability of visual information to auditory word processing (Shigeno, 2000; Green, 1998; Sams, Manninen, Surakka, Helin, & Katto, 1998; McGurk & MacDonald, 1976). While McGurk phenomena research provides an example of visual information influencing speech perception, the research does not make claims about text.

There is some evidence that print influences speech perception. At the syllabic level, Massaro, Cohen, and Thompson (1988) found that printed BA or DA had a significant effect on the identification of synthesized tokens from a BA-DA continuum, especially when the auditory token was the most ambiguous (see section 1.4.1.1). At the phonemic and word levels, Borowsky, Owen, and Fonos (1999), and later Owen and Borowsky (2003), found orthographic influence on cross-modal phonological discrimination in noise. Based on their findings, and in the context of Seidenberg and McClelland's (1989) model of visual word recognition, they asserted that there were directionally weighted connections from orthographic to phonological representations.

From priming research, there is evidence that print can contribute to faster response times to auditory stimuli. Kirsner and Smith (1974) provided participants a continuous sequence of words and nonwords appearing in either the visual or auditory modality. They found significantly faster lexical decision times for auditory words preceded by an identical visual word. In a series of signal detection studies, Frost, Repp, and Katz (1988) discovered that subjects were more likely to treat amplitude-modulated noise like speech when co-occurring print matched the signal. The same condition also led to significant priming. Dijkstra, Schreuder, and Frauenfelder (1989) found that consonant graphemes facilitated faster vowel identification in auditory CV syllables only when the letter matched the sound of the syllable onset. In a lexical decision task with cross-modal repetition priming, Kouider and Dupoux (2001) found that printed words primed matching auditory target words, but only when the prime was consciously perceived. However, Grainger et al. (2003) (reviewed in section 1.4.2.1) found priming of spoken words by print even when the visual prime was not identifiable (i.e., presented for a very short duration). In Miller and Swick's (2003) study involving an auditory lexical decision, patients with alexia were more primed by words that were orthographically and phonologically related to target words than by primes that were only phonologically related. While the primes were not visual, the orthographic representations activated by the prime still had a facilitative effect.

There is evidence that written words facilitate the learning of new words or novel stimuli (i.e., nonwords). Moreno and Mayer (2002) observed that university students increased their retention of new information about lightning formation when redundant on-screen text was presented with narration. Bird and Williams (2002) used a nonword rhyme-monitoring task and noted an increase in spoken nonword recognition scores for bimodally presented items over auditory- and visual-only nonwords. The authors argued that print facilitated the processing of the auditory stimuli when the auditory information was ambiguous or not sufficient to establish a phonological form.

The findings to date indicate that text can influence speech perception. The effects of text appeared in subjects' identification and discrimination judgments (e.g., Massaro et al., 1988; Borowsky et al., 1999). As well, text is evidenced to facilitate the processing of spoken words when they are related (e.g., Dijkstra et al., 1989; Grainger et al., 2003). Finally, bimodal information (i.e., text and speech presented together) increased subjects' abilities to remember novel words (e.g., Moreno & Mayer, 2002). This review of the findings calls for a theoretical account of the cross-modal (i.e., visual-auditory) effects of text on spoken word recognition, which is addressed in the following section.
1.4 Models of spoken and visual word recognition

Models of language are important in order to operationalize conceptions about language representations and processes (e.g., auditory comprehension, reading aloud, etc.). Such models provide an account of behavioral data, allow one to test hypotheses in computational networks, and drive further experimental research. An examination of models of spoken and visual word recognition is, therefore, relevant to the current inquiry regarding the interaction of text and speech processing.

An illustration of a language model that is appropriate for the current inquiry is shown in Figure 1.1. It exemplifies a variety of features of a lexical-semantic network across two modalities. There is the overall architecture that describes how parts of the system relate to each other. There are also depictions of internalized representations or knowledge (shown as white nodes in Figure 1.1) that correspond to linguistic domains (e.g., semantics, phonology, orthography). Pathways or connections link these representations in a uni- or bi-directional fashion. The intermediate (or hidden unit) layer (shown in light gray in Figure 1.1) is necessary because the relationships between linguistic domains are nonlinear. For example, the word cat maps onto the semantic concept animal, but the same concept maps onto the words dog and hamster. While the links in Figure 1.1 are shown as bi-directional, some models limit their scope to one direction (e.g., Levelt, Roelofs, & Meyer, 1999).

Figure 1.1. An example of a language model showing processes for visual (i.e., text) and auditory (i.e., speech) input. Activated pathways are shown in black. The illustration is adapted from Seidenberg and McClelland (1989), McClelland and Elman (1986), and Owen and Borowsky (2003).

Spoken and visual word recognition models correspond to some degree to the example given in Figure 1.1. While the similarity will be more apparent for models of visual word recognition, a review of spoken word models is necessary, as they relate to questions about the interaction between text and speech.

A model of language that encompasses both visual and auditory word recognition would be ideal for discussing the cross-modal (visual-auditory) influence of text on speech perception. However, as the perfect model does not exist, one is limited to reviewing existing models that are generally modality specific. While there are many models in each modality, the selection was limited to those that specifically addressed the influence of text on spoken word recognition. In order to address these cross-modal effects, the selected models extended their scope to include both modalities.

1.4.1 Models of spoken word recognition

1.4.1.1 Fuzzy Logic Model of Perception

Massaro (1987) outlined the Fuzzy Logic Model of Perception (FLMP), which states that auditory syllables are compared to internal exemplars called prototypes. These prototypes, stored in memory, house an aggregate of abstracted acoustic characteristics called features. The model states that speech is processed in a stage-like fashion where the incoming signal is compared to internalized features and prototypes in order to attain the best degree of fit, or match, between a prototype and the input. The model argues that all available sources of information are independently evaluated (Massaro, 1987), including those from another modality.
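Concretely, the FLMP's evaluation and integration stages amount to assigning each candidate a fuzzy truth value per information source, multiplying the (independent) values, and normalizing across candidates; this is the model's relative goodness rule. A minimal numerical sketch, with invented support values:

    def flmp(auditory, visual):
        """Relative goodness rule: multiply per-source support values
        (fuzzy truth values in [0, 1]) and normalize across candidates."""
        support = {c: auditory[c] * visual[c] for c in auditory}
        total = sum(support.values())
        return {c: s / total for c, s in support.items()}

    # A near-ambiguous /ba/-/da/ token paired with printed "BA":
    print(flmp(auditory={"ba": 0.55, "da": 0.45},
               visual={"ba": 0.90, "da": 0.10}))
    # -> {'ba': ~0.92, 'da': ~0.08}: the text dominates the decision.

Running the same visual support against an unambiguous token (e.g., auditory support of 0.95 versus 0.05) shifts the outcome far less, which matches the finding, discussed next, that print matters most in the ambiguous region of the continuum.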
Prototypes are capable of containing visual features, including orthography. Massaro, Cohen, and Thompson (1988) conducted a cross-modal experiment that included the pairing of printed syllables, BA or DA, with synthesized syllables ranging on a continuum between /ba/ and /da/. Massaro et al. (1988) found that the effect of text on the auditory syllable identification task was significant (p < 0.001). They also stated that print had a stronger effect when it was paired with auditory stimuli from the ambiguous region of the /ba/-/da/ continuum (Massaro et al., 1988). These findings indicate that print can bias the perception of auditory strings, particularly when the string is ambiguous.

1.4.1.2 Evaluation of the FLMP

The FLMP is similar to connectionist models like TRACE (McClelland & Elman, 1986) in that both continuously evaluate the incoming speech signal, allowing flexible decision-making in the identification of particular speech sounds and syllables. However, unlike TRACE, Massaro's model does not permit top-down activation to directly influence the processing of the bottom-up signal. Instead, it assigns the top-down information as a new independent (i.e., bottom-up) input for evaluation (Hawkins, 1999).

While the FLMP provides an explanation for the biasing of speech by textual information, there are theoretical limitations in this account. The FLMP focuses on the access of spoken syllables as opposed to whole words (Hawkins, 1999), and hence does not include predictions of cross-modal effects at the lexical level. As well, the model cannot explain why print facilitates faster latencies when it matches the auditory word. Furthermore, the FLMP architecture does not outline how orthography and phonology are related. Even though prototypes contain visual and auditory feature specifications, the model does not explicitly state whether these prototypes, in turn, comprise an amodal level of representation or a highly interconnected but dissociable bimodal layer. There is arguably a lack of detail in this model regarding how visual words are processed, but this criticism is not specific to the FLMP.

1.4.1.3 Limitations of models of spoken word recognition

As exemplified by the FLMP, models of spoken word recognition are not able to explain how text is processed. While some models (such as the FLMP) predict connections between text and speech, there is not sufficient detail about these links. Gaskell and Marslen-Wilson (2002; 1997) identified this flaw within their own connectionist framework, the distributed cohort model, when attempting to account for the effects of audible speech fragments (i.e., gates) on the processing of printed single words. In order to address the processing of text in their repetition and semantic priming experiments, Gaskell and Marslen-Wilson (2002) selected Plaut, McClelland, Seidenberg, and Patterson's (1996) model of visual word recognition. Unfortunately, the distributed cohort model's extension into cross-modal priming did not provide any predictions of text influencing speech perception, as the paradigm used in Gaskell and Marslen-Wilson's (2002) study employed auditory primes and visual targets. One would have to infer from models showing connections from orthography to phonology (see section 1.4.2) that text could influence speech as well as the reverse.
At this time, the available spoken word models, and the experimental evidence supporting them, are not adequate to enable one to formulate predictions about cross-modal effects from print to speech.

1.4.1.4 Summary

The FLMP (Massaro, 1987) states that the degree of match between the incoming speech and prototypical representations is continuously evaluated. The model permits textual information to be included in this evaluation, which is supported by the finding that text can bias the identification of speech syllables when they are ambiguous. The model does not venture to state whether reaction times for processing words can be affected cross-modally, or how orthography and phonology are interconnected. The latter shortcoming, however, is characteristic of models of spoken word recognition and requires an examination of models of visual word recognition.

1.4.2 Models of visual word recognition

1.4.2.1 Bimodal interactive activation model

The bimodal interactive activation model of word recognition proposed by Grainger, Diependaele, Spinelli, Ferrand, and Farioli (2003) closely follows the earlier work by Grainger and his colleagues (e.g., Grainger & Ferrand, 1994; Jacobs, Rey, Ziegler, & Grainger, 1998). This is a visual word recognition model that has been extended into the auditory word processing domain. Similar to Figure 1.1 (see section 1.4), the model proposes separate processing paths for visual and auditory words and proposes bi-directional connections. The bimodal model does not include semantic representations in its architecture. Instead, it narrows its range to orthographic and phonological processing. This model distinguishes between lexical and sublexical processing, a defining feature of "dual-route" models (e.g., Coltheart, Curtis, Atkins, & Haller, 1993; Zorzi, Houghton, & Butterworth, 1998). The bimodal model contains an extra layer between the sublexical orthographic and phonological representations that processes and bridges modality-specific clusters (i.e., graphemes and phonemes). Relevant to the current inquiry, the authors conducted a series of cross-modal priming experiments and found that lexical decision latencies to spoken words were facilitated by matching (i.e., identical) visual words, even when the visual words were briefly displayed (compare to Kouider and Dupoux, 2001). The authors argued that their findings pointed to automatic activation along the lexical route between orthography and phonology.

1.4.2.2 Evaluation of the bimodal interactive activation model

As stated previously, Grainger et al. (2003) is similar to Coltheart et al. (1993) and Zorzi et al. (1998) in that they provide a dual-route view of reading: a "special" lexical route that processes exception words, and a sublexical route that analyzes regular words and nonwords. This differs from the single-route models developed by Seidenberg and McClelland (1989) and Plaut et al. (1996), which outline a single route between orthography and phonology to handle the processing of all words and strings. All of these models agree that there are nonsemantic connections between orthography and phonology. Unlike Coltheart et al. (1993), Zorzi et al. (1998), Seidenberg and McClelland (1989), and Plaut et al. (1996), the Grainger et al. (2003) model is not computational or algorithmic in its current form. Similar to Stone, Vanhoy, and Van Orden (1997) and Seidenberg and McClelland (1989), Grainger et al. (2003) views the connections between orthography and phonology as automatic and bi-directional.
Grainger et al.'s (2003) use of an intermediate layer between sublexical orthography and phonology (i.e., a complex sublexical interface) can be compared to the hidden unit layer in the models by Seidenberg and McClelland (1989) and Plaut et al. (1996), but its inclusion in the architecture is to account for priming by pseudohomophones (i.e., nonwords such as brane that can be pronounced as real words).

An advantage of Grainger et al.'s (2003) model (compared to spoken word recognition models) is that its architecture contains the means to explain both visual and auditory word processing. The bimodal interactive activation model can also account for cross-modal priming effects on spoken words by identical printed words. A limitation of Grainger et al. (2003) is that it does not predict what will happen if a spoken word is ambiguous (e.g., a distorted word that can be perceived as goat or coat). It is not known if the model can predict the biased perception of ambiguous speech by potentially matching text (as found in Massaro et al., 1988).

1.4.2.3 Limitations of models of visual word recognition

As exemplified by Grainger et al. (2003) in the previous section, visual word recognition models (e.g., Plaut et al., 1996; Coltheart et al., 1993; Zorzi et al., 1998) do not specify how ambiguous spoken words are processed. Visual word recognition models also do not state how these ambiguous spoken words can be biased or primed by potentially identical text (as suggested in Frost et al., 1988). With Grainger et al. (2003) being an exception, models of visual word recognition do not say whether unambiguous spoken words are facilitated by identical printed words.

1.4.3 Relating spoken and visual word recognition models

Recall from section 1.4.1.3 that the spoken word models were evaluated as being incomplete in providing predictions about cross-modal priming effects of print on speech. While the FLMP did argue that the printed word could provide information for integration with speech information (see section 1.4.1.1), it did not provide sufficient detail about how textual information is integrated with auditory information (Massaro et al., 1988). As indicated by Gaskell and Marslen-Wilson (2002), spoken word modeling would need to refer to visual word models to make predictions about the interaction between orthography and phonology.

Visual word models have an advantage over spoken word models by including orthographic representations. However, the two kinds of modeling are not mutually exclusive, as they both include phonological processing. Grainger et al. (2003) illustrates this point by providing an extended visual word recognition model that includes spoken word processing. Yet, even though Grainger et al. (2003) appears to best model the influence of text on speech perception compared to other visual word recognition models, all of these textual models are inadequate in predicting how ambiguous spoken words are processed. Therefore, in order to fully account for the biasing and priming effects of text on speech (see section 1.3), an idealized cross-modal model will have to be built upon the current spoken and visual word models. Merging visual and auditory word models would be facilitated by shared assumptions (i.e., about architecture, flow of information, etc.).
For example, Gaskell and Marslen-Wilson (2002) selected Plaut et al.'s (1996) model as an account of visual word processing because it contained a connectionist framework similar to their distributed cohort model of spoken word recognition. In the absence of an idealized cross-modal model, the available spoken word models (e.g., the FLMP) and visual word models (e.g., the bimodal interactive activation model) serve as the theoretical support for the cross-modal effects of print on speech.

1.5 The present study

The conclusions from prior research that print can influence speech perception deserve further inspection because the scope or extent of cross-modal influence is not clear. For example, while there is evidence of facilitation of spoken words by identical printed words (Kirsner & Smith, 1974; Kouider & Dupoux, 2001), it is not known if text can inhibit the processing of a spoken word. Questions about online versus offline (i.e., automatic versus strategy-driven) processing effects on spoken words by briefly displayed text also remain (Kouider & Dupoux, 2001; Grainger et al., 2003). Another related question regards ambiguity resolution. Despite evidence that ambiguous speech can be influenced by text (Massaro et al., 1988), it is not known whether these cross-modal effects are robust when the text is briefly displayed.

The current inquiry intends to expand our understanding of text and its influence on speech by examining the effects of visual information (text) on the resolution of perceptual-lexical ambiguity in the auditory domain (spoken words). The experiment will also study whether rapidly presented text can subconsciously influence the perception of these ambiguous words. This inquiry extends previous research in several ways. First, the investigation will pair text with spoken words made ambiguous by voice onset time (VOT) manipulation (elaborated in the next chapter). Second, the study will precede these ambiguous spoken words with briefly displayed text. Third, the inquiry will include cross-modal word pairs that differ by the initial letter or sound (i.e., opposing word pairs like bin and pin) to ascertain possible inhibitory effects of text on speech.

In this study, participants viewed a computer screen, were presented with text, listened to speech stimuli, and repeated what they heard. The independent and dependent variables were as follows:

Independent variables (all within subject):

• Prime duration: 33 ms (not perceived) or 100 ms (consciously perceived)

• Prime-target trial type: six possible conditions over two target types.
  o Unambiguous targets: primes were either a matching or opposing endpoint, or an unrelated word.
  o Ambiguous targets: primes were either a voiced or voiceless bias, or an unrelated word.

Dependent variables:

• Reaction time: measured from the onset of the auditory target to the onset of the repetition.

• Repetition identity: what the subject actually repeated.

The specific research questions addressed in this study are:

1. Can print influence the resolution of ambiguity in the speech signal?
2. And if so, is this cross-modal influence available below the level of awareness?

Hypotheses constructed from these queries are organized by dependent variable:

1.5.1 Reaction time

1. Reaction times on ambiguous target trials will be slower than on unambiguous trials. This will be determined by comparing the reaction times of unrelated prime-ambiguous target trials to unrelated prime-unambiguous target trials.
For example, a prime-target pairing such as reef-XOAT will be slower than reef-GOAT or reef-COAT. This expectation is based on VOT research that found that ambiguous stimuli produce slower reaction times (Pisoni & Tash, 1974; Repp, 1981).

2. Trials where the visual prime is identical to the auditory target will have faster reaction times than unrelated prime-target trials. In other words, pairings such as pin-PIN will have faster reaction times than sag-PIN. This hypothesis is based on priming research that found facilitative effects of identical prime-target pairings (Forster & Davis, 1984; Lukatela, Frost, & Turvey, 1999; Kouider & Dupoux, 2001).

3. Responses to ambiguous targets following potentially matching primes will be faster than to the same targets following unrelated primes. For example, the trials coat-XOAT or goat-XOAT will have faster reaction times than reef-XOAT. The ambiguous target will activate multiple lexical candidates, but a potentially matching prime will cross-modally reinforce the activation of one lexical candidate over the others (i.e., coat or goat in the above examples). This expected facilitation effect corresponds to the Frost, Repp, and Katz (1988) findings, where related text paired with ambiguous speech in noise produced faster reaction times.

4. Trials where the visual prime differs from the target by the initial letter and phoneme will have slower responses than trials with unrelated primes. In other words, slower reaction times will be observed in opposing prime-target pairs like cap-GAP compared to fin-GAP. The expectation is that the prime will activate a representation that is phonologically similar to but lexically different from the target, and will therefore slow the processing of the target. This kind of competition or inhibitory effect has been observed in priming studies using rhymes (e.g., Lukatela & Turvey, 1996).

5. Visual primes will influence reaction times even when they are briefly displayed. The prior four hypotheses will hold regardless of prime duration. While participants will not be able to explicitly identify blinked primes, visual word processing will occur. This is based on visual lexical decision and naming studies that used brief visual primes (Ferrand & Grainger, 1993; Lukatela & Turvey, 1996) and the theoretical explanation that such primes automatically access lexical representations (Forster, 1998).

1.5.2 Repetition identity

1. Visual primes will influence the identification of ambiguous targets. Given that VOT-manipulated stimuli taken from the voicing category boundary region are potentially ambiguous, they will activate multiple lexical candidates. The prime will automatically activate a phonological representation (an endpoint) that will bias the selection of one of these candidates. For example, if a participant sees the word tip and hears an ambiguous word that is acoustically between dip and tip (i.e., XIP), then the listener will likely repeat tip. This effect will be assessed by counting the number of instances in which a response matches the prime. In trials such as dip-XIP or tip-XIP, it is expected that the majority of responses will correspond to the prime. This expectation stems from the finding that print influences the identification of ambiguous synthesized speech syllables (Massaro, Cohen, & Thompson, 1988; Massaro, 1998).

2. Visual primes will influence the identification of ambiguous targets even when the prime is blinked (i.e., displayed for 33 ms).
The scenario described in the previous hypothesis will occur even when the subject is not consciously aware of the prime's identity. Given that briefly displayed primes produce significant activation in visual word recognition studies (e.g., Lukatela & Turvey, 1996), it is expected that the prime's orthography-to-phonology activation will be sufficient to bias the subjects' responses.

CHAPTER 2: METHODS

2.1 Introduction

The description of the methodology is divided into two major parts. The first section describes the pilot study that was conducted to obtain ambiguous verbal stimuli. The second section presents the methodology of the cross-modal priming experiment that used the ambiguous stimuli selected from the pilot study.

2.2 Pilot study

2.2.1 Overview

The purpose of the pilot study was to ascertain which voice onset time (VOT) manipulated sound files were ambiguous. The following sections describe the participants, the stimuli, and the procedure. The section on stimuli includes a detailed description of the manipulation of the voice onset time of the auditory stimuli.

2.2.2 Participants

There were six volunteers in the pilot study. Participants' ages ranged from 23 to 48 years (M = 31.7; SD = 10.0). All were recruited through the university department or from personal contacts of the experimenter. All pilot subjects were female due to a lack of male candidates. None of the pilot volunteers reported any hearing problems. All reported being monolingual speakers of English. Five of the six participants reported having limited foreign language exposure through previous schooling, but none of them stated being proficient in a second language.

2.2.3 Materials

2.2.3.1 Stimuli description

There were 46 unique sound files used in the pilot study, of which 34 had been experimentally manipulated, while 12 were unedited. The sound files were single words corresponding to minimal pairs: bin or pin, ban or pan, dip or tip, dab or tab, gap or cap, goat or coat. The experimentally manipulated sound files were potentially ambiguous, but it was expected that listeners would interpret each of these files as either member (or endpoint) of the corresponding minimal pair. The minimal pairs were selected so that the endpoints were relatively matched in word frequency according to Francis, Kucera, & Mackie (1982). However, two minimal pairs were selected due to their use in other studies. The words goat and coat were used in a phoneme categorization study by Borsky, Tuller, & Shapiro (1998). As well, bin and pin were used in an earlier phoneme categorization study by Repp & Lin (1991).

2.2.3.2 Stimuli preparation

2.2.3.2.1 Sound recording

Using his own voice, the experimenter recorded the stimuli. The 30-year-old experimenter, native to British Columbia, was a monolingual speaker of Canadian English. All spoken words were uttered twice, pausing before each repetition. These word repetitions were produced with a similar pitch and loudness. Recording of the words was completed in four excerpts to allow the experimenter to rest his voice.

Sound recordings of single words were done in a sound-treated booth (Acoustic Systems BB-880L), using a unidirectional low-impedance microphone (Audio-Technica ATR20) that was connected to the microphone input on a laptop computer (Toshiba Satellite 2450).
The specifications of the laptop were more than sufficient to record the audio (Pentium 4, 512 M B of R A M , Microsoft Windows X P Professional, integrated digital audio processing). To minimize fan noise during recording, the laptop operated without A C power on a fully charged battery. Internally, the recording volume was set to 50%, and the microphone boost was turned off. To ensure consistent recording quality, the microphone was maintained approximately 15 cm from the experimenter's mouth, angled 45 degrees to the right. The words were recorded onto the hard drive by means of a sound editing application, Cool Edit 2000 (Syntrillium Software Corporation, 2000). The sound recordings were digitized at 22 kHz in 16-bit amplitude resolution. A l l recordings were saved in an uncompressed WAV-file format. 2.2.3.2.2  Low-pass filtering  As a precaution against aliasing, low-pass filtering was conducted on the waveforms. The microphone used in the recording had a maximum frequency response at 12 kHz, 1 kHz higher than 11 kHz, half of the sampling rate. This potentially allowed the digitization of false acoustic data (i.e., aliasing) at frequencies as low as 10 kHz (Kent & Read, 1996). The acoustic stimuli in this study were filtered in Cool Edit 2000 using a fast Fourier transform (FFT) filter set to 100 percent pass at 9.6 kHz and zero percent pass at 10 kHz. 2.2.3.2.3  Amplification  Each raw file, containing multiple word recordings, was amplified in Cool Edit 2000. In order to increase the loudness without distorting the audio, the raw waveforms  25 were amplified so that waveform peaks approached -6 dB. After this process, the sound files were verified in Cool Edit 2000 to have similar loudness levels. Peak and average root mean square (RMS) amplitudes were -0.98 dB (SD = 0.81) and -37.56 dB (SD = 0.48) respectively. 2.2.3.2.4  Noise gating  A by-product of the amplification process was an increase in noise, especially noticeable in the silent sections of the recordings. To remedy this problem, digital noise gating was performed on the recordings in a second sound-editing application, SoundForge X P (Sonic Foundry, 1999). This process effectively removed low-amplitude signals, below a set threshold, from the waveform while leaving the high-amplitude signals intact. After analyzing the noise with the software's acoustic statistics tool, the selected threshold was -36.7 dB. The attack and release parameters of the noise gate were one and 150 ms, respectively. To ensure minimal degradation of the words of interest, timing points were inserted into the original waveforms. These points marked the onset and the offset of each word. Extra points were marked onto the minimal pair waveforms marking release bursts, and the voicing onsets. This permitted visual comparison of pre- and post-noise gated waveforms to ensure that the acoustic signals at these points were untouched. Whenever the acoustic signals at these points appeared changed, the original waveform was restored and the noise threshold was lowered in 1-dB increments. Once the process was complete, each waveform was auditioned and saved separately in an uncompressed WAV-file format.  26 2.2.3.2.5  Voice Onset Time (VOT) continua construction  Voice onset time (VOT) is an acoustic measure of the interval between the stop release and the onset of periodical vibration in the speech signal (Baken & Orlikoff, 2000; Kent & Read, 1992). 
While this time interval is typically measured from wideband spectrograms (Baken & Orlikoff, 2000), it can be calculated from the acoustic waveform directly (Till & Stivers, 1981). The measure, representing a complex sequence of laryngeal and vocal tract configurations (Abramson, 1977), is associated with the voicing feature of plosive speech sounds. In English, stops with shorter V O T (less than 25 ms) are likely perceived as voiced (e.g., /b/, /d/, and /g/), whereas those with longer V O T (greater than 25 ms) are more typically heard as voiceless (e.g., /p/, /t/, and IkJ). V O T varies significantly within the same speaker and according to other factors including place of articulation, height of the following vowel, speech rate, word stress, and utterance length (Baken & Orlikoff, 2000; Kent & Read, 1992). V O T has been used by many experimenters as a means to study various issues in speech perception (e.g., Joanisse & Seidenberg, 1998). Researchers have been able to electronically manipulate the time interval using speech synthesizers (e.g., McMurrary, Tanenhaus, & Aslin, 2002) or digital editing tools (e.g., Baum, 2001). This level of control permits researchers to create stimuli based on a V O T continuum that could not otherwise be replicated by typical speakers. For example, it is possible to edit speech samples to sound like both [t] and [d]. This kind of stimulus has been used in research to study ambiguity resolution (e.g., Connine & Clifton, 1987). In this study, six V O T continua were constructed. The process followed the methodology described by Boyczuk & Baum (1999), Baum (2001), and Miller & Dexter  27 (1988). For each continuum, best exemplars of the minimal pair, i.e. endpoints, were chosen (e.g., bin and pin). Using SoundForge X P , the experimenter located the release burst and the voicing onset markings for each endpoint, permitting measurement of VOT. The difference in V O T between endpoints was then divided into ten intermediate steps to form a continuum. With two V O T continua, dip-tip and gap-cap, the V O T difference between endpoints was large enough that it warranted 20 intermediate steps. The average length of the intermediate step ranged from 6.0 to 9.1 ms across V O T continua. The waveforms of the voiced and voiceless endpoints were marked at locations corresponding to the average intermediate step. These markings were adjusted so that they intersected the waveforms at points where the zero line on the y-axis was crossed. These zero-crossings were marked to minimize transient noise artifacts during sound editing. Once the intermediate steps were marked along the waveforms of the endpoints, a systematic manipulation of the voiced endpoint began. The first section of the voiced endpoint, between the release burst and the onset of voicing, was replaced by a corresponding first section of the voiceless endpoint. This replacement occurred by means of using copying and pasting features in the sound-editing program. Similarly, the next token was made by continuing this process. The first and second sections of the voiced endpoint were replaced by corresponding sections of the voiceless endpoint. The continuum series increased as larger sections of the voiced endpoint were replaced by the voiceless counterpart. As this digital splicing continued, the V O T of the edited sound file increased, while the file's duration was maintained.  
Upon completion of the continua, there were ten tokens for ban-pan, bin-pin, dab-tab, and goat-coat, and 20 tokens for dip-tip and gap-cap. The tokens of each continuum were arranged by VOT within a file window. Using the sound preview feature in Windows, the experimenter auditioned each continuum from either endpoint and located where his perception of the phoneme changed categorically. Once this categorical shift was located on the continuum, three consecutive tokens from that region were selected.

2.2.4 Procedure

2.2.4.1 Participant instructions

Before starting the pilot experiment, participants were told that they were going to identify similar-sounding words that had been digitally altered. It was stated that the goal of the study was to identify potentially ambiguous words for use in a later experiment. They were verbally instructed to wear headphones, look at the computer screen, and identify the spoken words. Similarly, onscreen instructions directed subjects to watch a fixation point, listen to the word, and identify it by matching it to one of the two printed words that followed. Word identifications were made by pressing the 1 or 2 key on the keyboard corresponding to the printed words. There were no time constraints, but subjects were encouraged to use their initial impressions.

2.2.4.2 Experimental testing

The pilot study was conducted in a sound booth. The apparatus consisted of a desktop computer and two pairs of headphones. The computer was a Pentium 4 and was more than equipped to perform the experiment: 2.4 GHz processing, 512 MB of memory, an 80 GB hard drive, and a 64 MB video card (Radeon 7000 Series). Peripherals included a Microsoft Internet Keyboard, a 2-button scroll wheel mouse, and a 17-inch liquid-crystal display (LCD) monitor (Viewsonic VG700). The operating system was Microsoft Windows 2000 Professional. The application that ran the experiment was E-Prime version 1.1 (Psychology Software Tools, 2003).

There were two pairs of headphones in this task. The participant wore one pair (JVC HA-G33) while the experimenter wore the other pair (JVC HA-D30). Both headphones were plugged into the audio output of the computer via a single-to-dual headphone jack. This setup allowed the experimenter to track the course of the experiment from inside the sound booth.

During a trial, the fixation point, an asterisk ("*"), appeared onscreen for 1500 ms. The fixation point was then replaced by a visual cue, a plus sign ("+"), and the simultaneous presentation of a spoken word. The plus sign remained onscreen for 1.0 s after onset. Finally, two printed word choices were displayed with their respective numbers until the subject responded. All word choices were centered onscreen in bold 18-point Courier New font. The word choices corresponded to the endpoints of the sound file. The software randomized the position of the individual word choices (top or bottom).

Three sound files were selected from each continuum that had been perceived to be potentially ambiguous by the experimenter. With six continua, there were 18 sound files. Each sound file was presented six times to the subject for identification. There were a total of 108 VOT-manipulated trials. As a control, a block of 12 non-manipulated sound files was appended to the study. These unedited files corresponded to the endpoints of each continuum and consequently were presented only once each. Hence, the session consisted of 120 trials. Within each block, the software randomized the trials.
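The resulting session structure (18 edited tokens presented six times, plus a 12-trial endpoint block, each block randomized) can be reconstructed schematically as follows; the file names are invented for illustration and this is not the actual E-Prime script:

    import random

    pairs = ["bin-pin", "ban-pan", "dip-tip", "dab-tab", "gap-cap", "goat-coat"]
    edited = [f"{p}_token{i}.wav" for p in pairs for i in range(1, 4)]  # 18 files
    endpoints = [f"{w}.wav" for p in pairs for w in p.split("-")]       # 12 files

    block1 = edited * 6        # 18 tokens x 6 repetitions = 108 trials
    block2 = endpoints[:]      # each unedited endpoint presented once
    random.shuffle(block1)     # trials randomized within each block
    random.shuffle(block2)
    session = block1 + block2
    assert len(session) == 120  # 108 + 12 trials per session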
As initial data were collected, it became evident that only two sound files showed potential ambiguity, corresponding to the continua dab-tab and bin-pin. As a result, a second session with the exact same experimental setup was performed with 16 new sound files (i.e., untested tokens) plus the two ambiguous files from the first session. This second attempt increased the total number of trials to 240 (2 x 120 trials) over two sessions. Two individuals completed both sessions on the same day, while the other four participants completed the second session about a month later. Each session was approximately 15 minutes long.

Following pilot data collection, an average percentage of voiced responses was calculated for each wave file: the number of voiced responses to a particular token was summed across all participants and then divided by the number of subjects. As expected, the percentage of voiced responses decreased as the edited VOT increased (see Appendix I). The wave file with a voiced-response rate closest to 50% was identified as an ambiguous token of the corresponding continuum. The selected tokens are shown in Table 2.1. One continuum, dip-tip, did not appear to have an ambiguous wave file, as there was a steep decrease in voiced responses between tokens three and four. As a result, a token with an edited VOT of 37.1 ms was selected for the cross-modal experiment.

Table 2.1
Selected VOT Continua Sound Files to Serve as Ambiguous Stimuli

Continuum    VOT (ms)
Bin-Pin      18.7
Ban-Pan      20.9
Dip-Tip      37.1
Dab-Tab      37.1
Gap-Cap      34.3
Goat-Coat    41.8

2.3 Cross-modal priming study

2.3.1 Overview

This portion of the chapter outlines the cross-modal priming experiment. There are descriptions of the participants, the experiment and conditions, the auditory and visual stimuli, and the session instructions. The final section on the experimental testing includes a description of the trial sequences.

2.3.2 Participants

Initially, 29 participants were recruited for this study, consisting of 5 men and 24 women. Only individuals between the ages of 18 and 30 were permitted to participate, stemming from evidence that older adults experience a general decrease in cognitive processing resources (e.g., Sommers & Danielson, 1999). The majority of participants (n = 24) were Master's students in an audiology and speech sciences program. None were paid for their participation. All were recruited by postings around the university or by personal contacts. English monolingual speakers with minimal exposure to other languages were sought for the experiment. However, the latter criterion had to be relaxed to gain a sufficient subject pool. This change permitted more volunteers, including those with exposure to other languages, ranging from a non-English-speaking home life to traveling abroad as adults. As a result, a subset of participants who had a non-English language spoken at home (n = 6) was removed from the study. The remaining participants reported having some exposure to another language (e.g., taking French as a foreign-language class in high school), but none reported being proficient in a second language.

2.3.2.1 Hearing screening

With the aid of a portable audiometer (Maico MA-40), the session began with a hearing screening. Participants were asked to wear headphones and raise their hand whenever they heard a tone. Pure tones were presented to each ear at frequencies of 1000, 2000, 4000, and 500 Hz.
Participants passed the screening at 20 dB HL or better at all frequencies in both ears. Those who reported passing a hearing test in the last 12 months were not screened in this study. One individual did not meet the hearing criteria and was subsequently removed from the data.

2.3.2.2 Reading screening

After the hearing screening, participants were given a reading test. The experimenter administered the short version of the North American Adult Reading Test, or NAART35 (Uttl, 2002). During this test, participants were instructed to read aloud a list of 35 words, pausing between words, while the experimenter phonetically transcribed what was uttered. Participants received a point for each correctly pronounced word; any error in the pronunciation of an item (e.g., reading corps as corpse) forfeited that item's point. As directed by the test administration, dialectal variation in pronunciation was not penalized (Spreen & Strauss, 1991). The average score was 27.3 out of 35 (Range = 18 to 35; SD = 4.31). No participant was excluded from the study, since all NAART35 scores were within or above normal limits (i.e., no more than 1.5 standard deviations below the mean).

2.3.2.3 Participant summary

In sum, 17 women and 5 men participated in the experiment. Ages ranged from 22 to 30 years (M = 25.41; SD = 2.59). All were native English speakers, and all reported having some exposure to another language through school or travel. One individual reported having a high school diploma; the remaining participants reported holding at least a bachelor's degree.

2.3.3 Experimental task and conditions

The experiment used an auditory word repetition task delivered in a cross-modal priming paradigm similar to that used by Kouider and Dupoux (2001). The priming methodology employed the presentation of two key stimuli: a prime (e.g., a visual word) and a target (e.g., an auditory word). As the term cross-modal denotes, the stimuli were in different modalities (i.e., a visual prime and an auditory target). The relationship between the prime and target was experimentally varied to explore its effects on the time to repeat the target word. The two independent variables formed the following conditions. The prime duration variable, or the length of time that the visual prime was displayed onscreen, consisted of two conditions: short duration (33 ms) and long duration (100 ms). The second variable, prime-target trial type, denoted the relationship between the prime and target stimuli. This second variable consisted of six conditions across ambiguous and unambiguous target words (the ambiguous target words are spelled with an X for clarity):

Condition                   Example      Number of trials
Matching                    goat-GOAT    12
Opposing                    coat-GOAT    12
Unrelated                   reef-GOAT    12
Unrelated-Ambiguous         reef-XOAT     6
Voiced Bias-Ambiguous       goat-XOAT     6
Voiceless Bias-Ambiguous    coat-XOAT     6

2.3.4 Materials

2.3.4.1 Stimuli description

2.3.4.1.1 Auditory (target) stimuli

The auditory stimuli consisted of 72 unique WAV files. Each sound file contained an isolated spoken English word (see Appendix II). The sound files consisted of 12 endpoint words, six ambiguous words chosen from the pilot study, and 54 filler words. The filler words were selected from a published corpus (Francis, Kucera, & Mackie, 1982). To help control for word frequency, only words with fifty or fewer occurrences in the corpus, across all word classes, were used in the study. The one exception was the endpoint word coat, which had a total frequency count of 58. All words had a phonological CVC shape. All auditory stimuli were drawn from the recordings made for the pilot study.
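As an illustration only, this frequency-and-shape screen can be expressed as a simple filter. The corpus file, its column names, and the phonological-shape field below are hypothetical, since the published corpus is a printed frequency count.

```python
# Hypothetical sketch of the stimulus screen: keep CVC words with a total
# corpus frequency of 50 or less. File format and column names are assumed.
import pandas as pd

corpus = pd.read_csv("word_frequencies.csv")   # columns: word, total_freq, shape
candidates = corpus[(corpus["total_freq"] <= 50) & (corpus["shape"] == "CVC")]
print(candidates["word"].tolist())
```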
2.3.4.1.2 Visual (prime) stimuli

The visual stimuli consisted of single words, three to five letters in length. There were 45 unique printed words, 18 of which were experimental primes (see Appendix III). These experimental stimuli consisted of 12 voiced and voiceless primes (related to minimal pairs like pan-ban) plus six control or unrelated primes (e.g., net). These 18 experimental primes were selected to map onto the 18 endpoint and ambiguous auditory stimuli (see the previous section). The other 27 visual primes were chosen to pair with the auditory filler words. All visual stimuli were centered onscreen in lower-case letters, using an 18-point Courier New font with a bold setting.

Similar to the auditory stimuli, the visual stimuli were selected from Francis et al. (1982). Only words with total frequencies lower than 50 occurrences in the Francis et al. (1982) corpus were included. All visual stimuli corresponded to a CVC phonological shape. Words with homophonic representations (e.g., bear and bare) were excluded from the study. All the words were examined for their orthographic neighborhood density by accessing the Speech and Hearing Lab Neighborhood Database (2001), available online. It was difficult to control for neighborhood density, since many short words had overlapping neighborhoods, especially those corresponding to minimal pairs. However, individual orthographic word densities were recorded and taken into consideration when pairing visual and auditory stimuli and when composing the sequences of trials. All words were also examined for their semantic content. This was done informally with the aid of online dictionaries (Oxford English Dictionary Online, 2004; Pickett et al., 2000). Strong word associations were identified (for example, the word chop was associated with the terms cut and wood), and care was taken to minimize the selection of semantically related items.

2.3.5 Procedure

2.3.5.1 Participant instructions

Participants were told they were going to take part in a cross-modal priming experiment that entailed listening to single words and repeating them aloud. They were instructed to look at a computer screen while listening to each word, and were told that single written words would be displayed right before they heard the auditorily presented word. It was explained that the printed words would be displayed briefly, sometimes as if to "blink". The experimenter stated that the purpose of the experiment was to see if text influences what one hears. The participants were also told that some of the spoken words might sound ambiguous, though the specifics of the ambiguity were not divulged. Written instructions directed the participants to watch the " * " symbol at the center of the screen, watch a sequence of rapidly changing events, and listen to single words over the headphones. The participants were instructed to repeat the spoken word as soon as they heard the entire word. The instructions informed subjects that some of the words might sound similar and that there were no right or wrong answers.

2.3.5.2 Experimental testing

The cross-modal priming experiment was conducted in the same sound-treated booth as the pilot study. The desktop computer described in the pilot study was used again in the main experiment.
The only difference in computer hardware was the replacement of the liquid-crystal display (LCD) with a 17-inch cathode-ray tube (CRT) monitor (Philips 107B). Displaying visual stimuli with durations of less than 100 ms warranted the use of a CRT monitor. The E-Prime documentation cautioned that such brief visual stimuli are vulnerable to the display rate of monitors and advised using a CRT, since stimulus presentation can be synchronized to a CRT's refresh rate. LCD monitors were stated to display pixels differently, and hence could not be synchronized in this way (Schneider, Eschman & Zuccolotto, 2002; Psychology Software Tools, n.d.).

The display settings were adjusted for the experiment. A refresh test measuring the drawing time to the center of the screen confirmed that the refresh rate of the monitor was 60 Hz. Internal display settings were set to a screen area of 600 by 800 pixels with 32-bit color. Brightness and contrast were set for comfortable viewing. The experiment used a five-button, five-lamp serial response box containing a voice key (Psychology Software Tools, model 200A). A headset with a unidirectional microphone (Optimus model 33-3012) was attached to the voice key input. The serial response box was connected to the desktop computer by a serial cable. This setup provided the means to collect reaction times. In addition to the voice key, another microphone was used in the experiment: a boundary microphone (Optimus model 33-3022) placed on the desk between the subject and the monitor. The output of this microphone was connected to a portable cassette recorder (Marantz PMD430), permitting a recording of the subject's actual verbal output. The auditory stimuli were presented through a pair of headphones (JVC HA-G33). Participants wore these headphones on top of the headset microphone, which was oriented approximately two centimeters from the participants' lips. With this arrangement in place, the participants were seated comfortably approximately 70 cm in front of the monitor and boundary microphone.

As shown in Figures 2.1 and 2.2, a trial began with a fixation point, a " * " symbol, at the center of the screen for 1.5 seconds. The fixation point was then replaced by a row of ten " # " symbols for 500 ms, functioning as a forward mask. As soon as the mask vanished, a single word appeared briefly onscreen for either 33 or 100 milliseconds. The visual prime was immediately replaced by the backwards mask, a row of ten " & " symbols. As the backwards mask was displayed, the auditory target was presented through the headphones. There was a fixed time lag, or stimulus onset asynchrony (SOA), of 100 ms between the onset of the visual prime and the onset of the auditory target. The backwards mask remained onscreen until the participant provided a verbal response, advancing the experiment to the next trial.

[Figure 2.1: trial timeline showing the fixation point (1500 ms), forward mask (500 ms), prime (33 ms), backwards mask (67 ms), and backwards mask plus auditory target (100 ms after prime onset)]

Figure 2.1. Trial sequence in the short (33-ms) prime duration condition. Order of events from left to right: fixation point, forward mask, visual prime, backwards mask, backwards mask plus auditory target, subject's response. Time intervals are not to scale.

[Figure 2.2: trial timeline showing the fixation point (1500 ms), forward mask (500 ms), prime (100 ms), and backwards mask plus auditory target]

Figure 2.2. Trial sequence in the long (100-ms) prime duration condition. Order of events from left to right: fixation point, forward mask, visual prime, backwards mask plus auditory target, subject's response. Time intervals are not to scale.
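The timelines in both figures follow directly from the fixed 100-ms SOA. The event scheduling itself ran in E-Prime; the following re-expression in Python is illustrative only.

```python
# Illustrative event schedule for one trial; all times in ms relative to
# prime onset. In the short condition the backwards mask fills the 67 ms
# between prime offset and target onset; in the long condition the prime
# offset and the target onset coincide.
SOA = 100  # prime onset to auditory target onset

for prime_ms in (33, 100):
    events = [
        ("fixation", -2000),        # 1500 ms fixation, then 500 ms forward mask
        ("forward mask", -500),
        ("prime", 0),
        ("backwards mask", prime_ms),
        ("auditory target", SOA),
    ]
    print(f"prime duration {prime_ms} ms:",
          ", ".join(f"{name} at {t} ms" for name, t in events))
```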
The voice key measured reaction times from the onset of the auditory target. The tape recorder continuously recorded all verbal responses during the task. As well, the experimenter, seated behind and to the right of the participant, transcribed verbal responses and noted any behaviors that might have affected the voice key (e.g., coughs or lip smacking).

The cross-modal experiment was broken into two major blocks of 108 trials each. The first and second blocks consisted of the short (33-ms) and long (100-ms) prime duration conditions, respectively. The reason for this fixed block sequence was to minimize the extent of the priming effects of the longer 100-ms condition on the response times of the shorter 33-ms condition. The literature distinguishes long-term (i.e., conscious) from short-term (e.g., masked) priming in the extent of their effects (e.g., Bowers, Damian, & Havelka, 2002). Conscious priming includes an episodic memory trace, and hence its effects have been reported to last minutes, hours, or longer; short-term priming effects, however, are believed to last only a couple of seconds (Bowers et al., 2002). Hence, it was expected that any priming effects in the 33-ms condition would not extend into the 100-ms condition.

Of the 108 trials in each block, 54 were experimental and 54 were fillers. The experimental trials were divided into the six prime-target trial conditions (see section 2.3.3 for the specific numbers of trials). The filler trials were composed of 27 matching (e.g., pose-POSE) and 27 unrelated (e.g., pose-LUG) trials.

The trials in each block were presented in one of two fixed, ordered lists (see Appendix IV for the first trial list). Each trial list contained 108 trials that were pseudo-randomly arranged. The first trial list was constructed using an online random sequence generator (Haahr, 1999). Afterwards, the sequence of trials was adjusted to ensure that the trial types were balanced across the list. Changes were made to ensure that no two consecutive trials shared the same onset, orthographically or phonologically. Semantic relations between primes and targets across consecutive trials were similarly eliminated. As well, no more than two matching trials (i.e., prime matching target) were allowed in a row (a programmatic check of these constraints is sketched at the end of this section). A second trial list was created by reversing the order of the first; in other words, the first list began with the pair hedge-SOOT, whereas the second list ended with hedge-SOOT. Each trial list corresponded to a single prime duration block. The pairing of trial list and block was controlled by E-Prime and was counterbalanced by subject. There were 216 trials in total. As well, there were ten practice trials to permit the participants to become familiar with the task. Breaks were provided within and between blocks. The overall session, including breaks, took about one hour to complete.
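The ordering constraints above lend themselves to an automated check. The following is a hypothetical validator, not the procedure actually used (the published lists were adjusted by hand); the trial representation is assumed, and initial letters stand in as a rough proxy for orthographic and phonological onsets.

```python
# Hypothetical validator for the trial-list constraints: no shared onsets
# on consecutive trials, and no more than two matching (prime == target)
# trials in a row. Trials are (prime, target) pairs of lower-case words.
def list_violations(trials):
    problems = []
    matching_run = 0
    for i, (prime, target) in enumerate(trials):
        matching_run = matching_run + 1 if prime == target else 0
        if matching_run > 2:
            problems.append((i, "more than two matching trials in a row"))
        if i > 0:
            prev_prime, prev_target = trials[i - 1]
            if prime[0] in (prev_prime[0], prev_target[0]) \
               or target[0] in (prev_prime[0], prev_target[0]):
                problems.append((i, "consecutive trials share an onset"))
    return problems

print(list_violations([("hedge", "soot"), ("pose", "pose"), ("pan", "pan")]))
# -> [(2, 'consecutive trials share an onset')]
```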
CHAPTER 3: RESULTS

3.1 Overview

This chapter begins with an explanation of how the data were prepared for analysis, including the treatment of errors and outliers. The rest of the chapter divides the results into two parts: first, findings from the reaction time data are described; the second section presents results for the word repetition data. The chapter ends with a summary of findings.

3.2 Preparation of the data for analysis

Studies using reaction time data have typically identified and removed a small subset of trials containing errors or extreme outliers (e.g., Kouider & Dupoux, 2001). Eliminating these errors permits the researcher to evaluate the relationships of interest in the data more accurately. Likewise, the data here were screened prior to analysis for errors related to participants and equipment, and extreme outliers were identified as described below. Given that there were 216 trials per participant and 22 participants, a maximum of 4752 data points were available for analysis. Of this total, 218 points (4.6%) were removed due to error or extreme outliers. A breakdown of the excluded trials follows.

3.2.1 Equipment error

An equipment error was defined as an event where the voice key failed to activate on the participant's response. These instances were noted during the session whenever a trial failed to advance. Inspecting the reaction time data for unusually long latencies (e.g., over five seconds) confirmed the identification of the error. Only 10 of 4752 data points (0.2%) across all subjects were removed due to voice key problems.

3.2.2 Response errors

In unambiguous target trials, a response error was identified when the participant repeated a word that was not the target. In ambiguous target trials, a response error was noted when the response was not a member of the associated minimal pair (e.g., tin instead of bin or pin). Response errors were identified during the session by the experimenter on a response sheet. After the session, the experimenter reviewed the audio recording to verify occurrences of response error. A total of 192 data points (4.0%) were removed from the reaction time analysis due to response error. Fifty of those errors occurred in trials containing ambiguous targets.

3.2.3 Extreme outliers

The procedure used for the identification of outliers varies slightly between authors, with some choosing values outside the mean plus or minus two or three standard deviations (e.g., Ziegler, Ferrand, & Montant, 2004). In the current study, reaction time values lying more than three interquartile ranges beyond either limit of the interquartile range were defined as extreme outliers. These extreme values were labeled in SPSS for Windows (SPSS Inc., 1999) using the Explore feature. In this manner, 16 data points (0.3%) were removed from analysis.
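This is the same definition of an extreme value that SPSS's Explore feature applies. A minimal re-expression of the rule, for illustration only (the latencies are made up):

```python
# Sketch of the extreme-outlier rule: flag reaction times lying more than
# three interquartile ranges beyond the quartiles themselves.
import numpy as np

def extreme_outliers(rts):
    rts = np.asarray(rts, dtype=float)
    q1, q3 = np.percentile(rts, [25, 75])
    iqr = q3 - q1
    return (rts < q1 - 3 * iqr) | (rts > q3 + 3 * iqr)

rts = np.array([812, 790, 845, 3020, 760])    # placeholder latencies (ms)
print(rts[extreme_outliers(rts)])             # -> [3020.]
```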
3.3 Reaction time data

This portion of the chapter reviews the reaction time data from the current experiment. The following sections are structured to reflect the ANOVA output and are ordered accordingly: the overall ANOVA, each main effect, and the interaction effect. Where relevant within a particular section, particular findings are highlighted by noting their relationship to the study's hypotheses. An overview of the findings for filler trials is appended to the end of this portion.

3.3.1 Omnibus ANOVA

The average reaction time (RT) values ranged from 605 to 1053 ms (M = 807.06; SD = 123.43). The RT means and standard deviations for the experimental conditions are shown in Table 3.1 below.

Table 3.1
Target Reaction Times (ms) by Prime Duration and Prime-Target Pair Type

                          Short (33-ms)       Long (100-ms)
Prime-Target Trial Type     M      SD           M      SD
Matching                   822     122         706     151
Opposing                   824     122         744     145
Unrelated                  820     114         734     124
Voiced Bias                824     146         719     152
Voiceless Bias             832     135         796     191
Unrelated-Ambiguous        843     120         737     180

Note. Voiced Bias, Voiceless Bias, and Unrelated-Ambiguous trial types contain ambiguous targets, whereas Matching, Opposing, and Unrelated trial types do not. Examples of each prime-target condition in parentheses: Matching (ban-BAN), Opposing (pan-BAN), Unrelated (net-BAN), Voiced bias (ban-XAN), Voiceless bias (pan-XAN), Unrelated-Ambiguous (net-XAN).

For experimental trials, a 2 x 6 repeated measures ANOVA was conducted in SPSS, designating prime duration (2) and prime-target trial type (6) as within-subjects factors (see Table 3.1 above for the respective levels). Trial list sequence (i.e., list 1 then 2 versus list 2 then 1) was also included as a between-subjects factor. The General Linear Model (GLM) repeated measures feature in SPSS indicated that sphericity could not be assumed for prime-target trial type or for its interaction with prime duration. As a result, the Greenhouse-Geisser adjustment of the degrees of freedom was used for tests of significance on those within-subjects effects (unadjusted degrees of freedom are reported here for clarity, since the significance values were virtually the same). The main effects of prime-target trial type, F(5, 100) = 4.45, MSE = 5,551, p = 0.008, and prime duration, F(1, 20) = 27.94, MSE = 17,990, p < 0.001, were significant. The interaction between prime-target trial type and duration was also significant, F(5, 100) = 3.82, MSE = 4,515, p = 0.018. Analysis of between-subjects effects revealed that trial list sequence was not significant, F(1, 20) = 0.198, MSE = 211,020, p = 0.661. Interactions between trial list sequence and the within-subjects factors were also not significant (all tests showing p > 0.1).

3.3.2 Prime-target trial type

The main effect of prime-target trial type was further analyzed using paired t-tests. These planned comparisons were made in accordance with the reaction time hypotheses outlined in section 1.5.1 of the introduction chapter. The tests of significance were one-tailed, as each planned comparison was arranged so that the first prime-target trial type was expected to be slower than the second. The results of the t-tests are shown in Table 3.2 below, and they are individually addressed in the following subsections.

Table 3.2
Planned Comparisons of Experimental Trials by Prime-Target Trial Type

df = 21, α = 0.05

Paired t-test                            t        p (1-tailed)
Unrelated Ambiguous-Voiced Bias          1.77     0.046
Unrelated Ambiguous-Voiceless Bias      -1.73     0.050
Unrelated-Matching                       1.667    0.055
Opposing-Unrelated                       0.899    0.190
Unrelated Ambiguous-Unrelated            0.896    0.190

Note. Examples of each prime-target condition in parentheses: Matching (ban-BAN), Opposing (pan-BAN), Unrelated (net-BAN), Voiced bias (ban-XAN), Voiceless bias (pan-XAN), Unrelated-Ambiguous (net-XAN).
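For readers who prefer to see one of these contrasts spelled out, the sketch below reproduces a planned comparison of this kind with SciPy's paired t-test (the `alternative` argument requires SciPy 1.6 or later). The per-subject condition means are fabricated placeholders, not the study's data.

```python
# One-tailed paired comparison of the kind reported in Table 3.2:
# is the Unrelated condition slower than the Matching condition?
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
matching = 740 + 60 * rng.standard_normal(22)      # placeholder subject means
unrelated = matching + 20 + 30 * rng.standard_normal(22)

t, p = ttest_rel(unrelated, matching, alternative="greater")
print(f"t(21) = {t:.3f}, one-tailed p = {p:.3f}")
```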
3.3.2.1 Ambiguous versus unambiguous trials

As seen in Table 3.2 above, ambiguous targets were not significantly different (i.e., p > 0.1) from unambiguous targets when both were paired with unrelated visual primes. This did not support the first hypothesis in section 1.5.1 that ambiguous auditory stimuli would be slower to process than unambiguous stimuli.

3.3.2.2 Matching versus unrelated trials

Even though the difference was marginal (i.e., p < 0.1), there was a trend for trials containing identical or matching prime-target relationships to be faster than trials containing an unrelated prime. This would support the second hypothesis in section 1.5.1 that a target word is primed by an identical word.

3.3.2.3 Voiced and voiceless bias versus unrelated ambiguous trials

The difference in latencies between ambiguous targets preceded by a voiced bias prime and those preceded by an unrelated prime was significant (i.e., p < 0.05). This partially supports the third hypothesis in section 1.5.1, which stated that processing an ambiguous auditory target (e.g., XOAT) is facilitated by a potentially matching visual prime (i.e., goat as opposed to reef). The difference between ambiguous targets paired with voiceless bias primes and those paired with unrelated primes was marginally significant (i.e., p = 0.05), but in the reverse direction and contrary to the hypothesis. These mixed findings are interpreted in the discussion chapter.

3.3.2.4 Opposing versus unrelated trials

The latencies in trials containing opposing prime-target pairings (e.g., ban-PAN) were not significantly different (i.e., p > 0.1) from unrelated pairings (e.g., net-BAN). This finding did not support the fourth hypothesis in section 1.5.1, which stated that an auditory target is slower to process when preceded by a visual prime that differs from the target by the initial letter (and phoneme).

3.3.3 Prime duration

Post-hoc examination of prime duration was conducted by reviewing the GLM repeated measures outputs for experimental trials. The GLM feature in SPSS automatically provided a post hoc comparison between the short (33-ms) and long (100-ms) conditions of the prime duration variable, using 95% confidence intervals for differences in means. As seen in Table 3.3 below, the short priming condition had slower latencies than the long condition. This difference between conditions was significant (p < 0.001).

Table 3.3
Reaction Time Means and Difference (ms) by Prime Duration

α = 0.05

Condition           M      SE      p (2-tailed)
Short              828     26.5
Long               741     32.4
Short - Long (ΔM)   87     16.6    < 0.001

Note. Short and Long respectively refer to the 33-ms and 100-ms prime duration conditions.

The short and long condition reaction times were analyzed for learning effects (i.e., faster performance over the course of the session), as such learning artifacts would confound the main effect of prime duration. Average reaction times were compiled for four session sections, each composed of 27 trials. The first and second sections corresponded to the first and last 27 trials of the short block, whereas the third and fourth sections contained the first and last 27 trials of the long block. As illustrated in Figure 3.1, there was a general trend of reaction times decreasing across the course of the session. A four-level repeated measures ANOVA was performed, defining section as a within-subjects factor and trial list sequence as a between-subjects factor. Sphericity could not be assumed for section, and therefore the Greenhouse-Geisser adjustment of the degrees of freedom for section was used for tests of significance (but unadjusted degrees of freedom are reported here for clarity). The ANOVA reported a significant main effect of section, F(3, 60) = 28.48, MSE = 5,158, p < 0.001, and a non-significant effect of trial list sequence, F(1, 20) = 0.16, MSE = 65,556, p = 0.90. There was a significant interaction effect between section and trial list sequence, F(3, 60) = 3.31, MSE = 17,095, p = 0.045, but both trial list sequence groups followed the same trend depicted in Figure 3.1. As a result, this interaction effect was not pursued further.
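Compiling the section means plotted in Figure 3.1 amounts to slicing each 108-trial block. The sketch below assumes one subject's 216 latencies in presentation order, with removed trials coded as NaN; the array layout is an assumption.

```python
# Sketch of the four 27-trial session sections: first and last 27 trials of
# the short (33-ms) block, then first and last 27 trials of the long block.
import numpy as np

def section_means(rts):
    rts = np.asarray(rts, dtype=float)      # 216 latencies, NaN = excluded
    short_block, long_block = rts[:108], rts[108:]
    return [np.nanmean(seg) for block in (short_block, long_block)
            for seg in (block[:27], block[-27:])]
```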
[Figure 3.1: bar graph titled "Reaction time across the session by section", plotting mean reaction time (ms) for Sections 1 through 4]

Figure 3.1. Reaction time averages (with SE bars) over the course of the session, indicating a practice effect in the experiment. Sections are in chronological order, each composed of 27 consecutive trials. Sections 1 and 2 correspond respectively to the first and last 27 trials of the short (33-ms) prime duration condition; Sections 3 and 4 refer to the first and last 27 trials of the long (100-ms) condition respectively.

The repeated measures ANOVA generated tests of significance between section latencies using 95% confidence intervals for differences in means, with a Bonferroni adjustment applied for these multiple comparisons. The tests revealed that five of the six possible comparisons showed significant differences (see Table 3.4); only the difference between Sections 2 and 3 was non-significant. The comparisons confirmed that reaction times decreased within each prime duration condition and across the session. This finding pointed to a practice or training effect during the experiment, which confounds the main effect of prime duration reported in the omnibus ANOVA (see section 3.3.1). As stated in the previous chapter, the fixed order of short and long prime duration conditions was necessary to prevent consciously identifiable primes from influencing the latencies in trials using briefly displayed primes.

Table 3.4
Post Hoc Comparisons of Session Trial Sections

α = 0.05

Comparison           ΔM      SE      p (2-tailed)
Section 1 and 2       50     12.6    0.004
Section 1 and 3       78     22.8    0.016
Section 1 and 4      162     22.3    < 0.001
Section 2 and 3       28     15.5    0.504
Section 2 and 4      112     15.8    < 0.001
Section 3 and 4       84     16.7    < 0.001

Note. All p-values show the Bonferroni adjustment.

3.3.4 Prime-target trial type and prime duration

The significant interaction effect found in the repeated measures ANOVA for experimental trials required further examination. A six-level repeated measures ANOVA was used for each prime duration to assess the differences between the prime-target trial types. The short duration condition had no significant main effect of trial type, F(5, 100) = 0.89, MSE = 1,788, p = 0.491. The long duration condition contained a significant main effect, F(5, 100) = 5.71, MSE = 7,163, p = 0.003, sphericity not assumed. Collectively, this interaction and the follow-up univariate analyses indicate that the main effect previously identified for prime-target trial type in section 3.3.1 was limited to the long duration condition (see Figure 3.2).

[Figure 3.2: line graph titled "Reaction time of experimental trials by prime duration and prime-target trial type", with separate lines for the Matching, Opposing, Unrelated, Voiced, Voiceless, and Unrelated-Ambiguous conditions across the Short and Long prime durations]

Figure 3.2. Reaction time of experimental trials by prime-target trial type and prime duration. Prime duration conditions: Short (33-ms) and Long (100-ms). Examples of each prime-target condition: Matching (ban-BAN), Opposing (pan-BAN), Unrelated (net-BAN), Voiced bias (ban-XAN), Voiceless bias (pan-XAN), Unrelated-Ambiguous (net-XAN).

As a further follow-up, the planned comparisons described in section 3.3.2 were evaluated in the long condition using paired one-tailed t-tests.
As seen in Table 3.5 below, the analyses showed that the latencies for matching trial types were significantly faster than unrelated trial types, in agreement with the finding described in section 3.3.2.2. Also, as shown in Table 3.5, the voiceless bias trials were significantly slower than unrelated ambiguous trials, echoing the finding described in section 3.3.2.3. All other comparisons in the long duration condition were not significant (p > 0.1).

Table 3.5
Paired t-Tests of Experimental Trials by Prime-Target Trial Type (Long)

df = 21, α = 0.05

Paired t-test                            t        p (1-tailed)
Unrelated-Matching                       2.700    0.007
Unrelated Ambiguous-Voiceless Bias      -2.28     0.017
Unrelated Ambiguous-Voiced Bias          1.080    0.146
Opposing-Unrelated                       0.888    0.192
Unrelated Ambiguous-Unrelated            0.150    0.441

Note. Planned comparisons performed in the long prime duration condition only. Examples of each prime-target condition in parentheses: Matching (ban-BAN), Opposing (pan-BAN), Unrelated (net-BAN), Voiced bias (ban-XAN), Voiceless bias (pan-XAN), Unrelated-Ambiguous (net-XAN).

As illustrated in Figure 3.2 above, the voiceless bias trials appeared to be driving the interaction effect. To explore this finding further, the ANOVA output for the long priming condition was reviewed for post-hoc comparisons between voiceless bias trials and the other trial types. The pairwise comparisons revealed that voiceless bias trials were significantly slower than all other trial types in the long (100-ms) priming condition (all two-tailed p's < 0.05). The implication of this finding will be discussed in the following chapter.

The contrasting effects of voiced and voiceless bias trials necessitated further investigation, because the voice key responds more slowly to voiceless stops than to voiced stops (e.g., Kessler, Treiman, & Mullennix, 2002); the apparent inhibition effect in the voiceless bias condition might therefore be attributed to the number of voiceless responses in that condition (see section 3.4.3). The reaction time data in the 100-ms condition for the ambiguous target trials were accordingly separated by the voicing onset of the subjects' responses (e.g., XAN identified as either BAN or PAN). The voiced and voiceless bias trials were compared to the unrelated-ambiguous trials within the voiced and voiceless response subsets using paired t-tests. These post-hoc comparisons required two-tailed significance tests in light of the unexpected negative direction of the voiceless bias trials. Within the voiced response data (e.g., XAN identified as BAN), voiceless bias trials were marginally slower than unrelated ambiguous trials, t(21) = 2.074, p = 0.051. Within the small voiceless response data subset (e.g., XAN identified as PAN), the voiceless bias trials appeared to be faster than unrelated ambiguous trials, but the difference was not significant (p > 0.1). Regardless of the voicing of the subjects' response onsets, the voiced bias and unrelated ambiguous trials were not significantly different (p's > 0.1). Overall, when accounting for possible voice key effects, there was still an apparent inhibition effect by voiceless bias primes, as indicated by the marginal effect in the voiced responses data subset.

3.3.5 Filler trials

Filler trials were relevant to this experiment in two supplementary ways.
First, these trials were arranged in either matching (e.g., chop-CHOP) or unrelated (e.g., soot-HEDGE) prime-target trial types, which permitted an additional opportunity to test the prediction that matching trials would have faster reaction times than unrelated trials (see the second hypothesis in section 1.5.1). Second, these filler trial types were included in the short prime duration condition, which allowed a secondary test of the prediction that briefly displayed primes matching their corresponding auditory targets would lead to faster reaction times. This section summarizes the filler trial findings in that order. Figure 3.3 below shows filler trial latencies across conditions.

[Figure 3.3: line graph titled "Reaction time of filler trials by prime duration and prime-target trial type", plotting the matching and unrelated filler conditions across the Short and Long prime durations]

Figure 3.3. Reaction time of filler trials by prime-target trial type and prime duration. Prime duration conditions: Short (33-ms) and Long (100-ms).

Using the GLM feature, a 2 x 2 repeated measures ANOVA was conducted in SPSS, designating prime-target trial type (i.e., matching and unrelated) and prime duration (i.e., short and long) as within-subjects factors. Trial list sequence was included as a between-subjects factor. The main effects of prime-target trial type, F(1, 20) = 7.12, MSE = 839, p = 0.015, and prime duration, F(1, 20) = 38.21, MSE = 4,813, p < 0.001, were significant. The interaction between prime-target trial type and duration was also significant, F(1, 20) = 26.83, MSE = 843, p < 0.001. Analysis of between-subjects effects revealed that trial list sequence was not significant, F(1, 20) = 0.027, MSE = 61,774, p = 0.872. Interactions between trial list sequence and the within-subjects factors were also not significant (all tests showing p > 0.1).

As a planned comparison, a paired t-test was conducted comparing average latencies for the matching and unrelated prime-target trial type conditions (respectively M = 825.21, SD = 128.21, and M = 843.11, SD = 115.41). Unrelated trials were predicted to be slower to process than matching trials; hence, the test of significance was one-tailed at the 0.05 alpha level. The t-test was significant, t(21) = 2.93, p = 0.004, indicating that reaction times for matching trials were faster than for unrelated trials. This coincided with the trend reported for experimental trials in section 3.3.2.2 and partially supported the hypothesis that auditory words are processed faster when preceded by an identical visual word (see section 1.5.1).

A second paired t-test compared the average latencies for matching (M = 888.35, SD = 118.58) and unrelated (M = 873.54, SD = 113.91) trial types in the short priming condition. The one-tailed t-test was not significant, as unrelated trials appeared to be faster than matching trials, t(21) = -1.957, p = 0.0319. This did not support the hypothesis that briefly displayed primes identical to the auditory target result in faster reaction times (see section 1.5.1 in the introduction chapter).

A third paired, one-tailed t-test of filler prime-target conditions was conducted in the long prime duration condition. The average reaction times for matching (M = 763.76, SD = 150.35) and unrelated (M = 812.50, SD = 125.02) trial types were significantly different, t(21) = 5.068, p < 0.001. This finding coincided with the first t-test mentioned above and indicated that latencies for matching trials were faster than for unrelated trials.
3.4 Word repetition data

This part of the chapter reviews the word repetition data in the study, particularly responses to ambiguous targets. The first three sections describe the data and identify apparent trends. The last section specifically reviews the instances in which a response to an ambiguous target corresponded to (i.e., matched) a visual prime.

3.4.1 Unambiguous targets

Unambiguous target trials were not of interest for the word repetition analysis, as responses were expected to, and did, vary little. Incidentally, the majority of response errors occurred in trials with unambiguous targets. Examples of error-prone words were fool, hedge, jug, and thief, which were interpreted by some participants as pool, edge, chug, and teeth, respectively.

3.4.2 Ambiguous targets across all conditions

For every ambiguous stimulus, the majority of responses were voiced. As illustrated in Figure 3.4, voiced responses ranged from 64.4% for XOAT to 96.2% for XIN. Voiceless responses did not exceed 7.6%, with the exception of XOAT at 35.6%; for XAN, only one voiceless response (0.8%) was noted. The percentage of errors ranged from zero for XIN and XOAT to 16.7% for XAP.

[Figure 3.4: bar graph titled "Responses to ambiguous targets", showing the percentages of voiced, voiceless, and error responses for the targets XAN, XIN, XAB, XIP, XAP, and XOAT]

Figure 3.4. Summary of word repetition responses to ambiguous targets. Examples of voiced and voiceless responses for each target in parentheses: XAN (Ban-Pan), XIN (Bin-Pin), XAB (Dab-Tab), XIP (Dip-Tip), XAP (Gap-Cap), and XOAT (Goat-Coat). Examples of response errors in parentheses: XAN (bam, an), XAB (dam, dan), XIP (tick), and XAP (gack, cat). No response errors were observed for XIN and XOAT.

3.4.3 Ambiguous targets by prime type

Collapsing across prime duration conditions, responses were examined by specific prime-target pairing. As seen in Figure 3.5, the majority of responses were voiced, ranging from 54.6% for coat-XOAT to 100% for sag-XIN. Voiceless responses ranged from 0% for ban-XAN, net-XAN, sag-XIN, and jar-XAB to 45.6% for coat-XOAT. Errors ranged from 0% for XIN pairings to 18.2% for cap-XAP.

[Figure 3.5: stacked bar graph titled "Responses to ambiguous targets by prime-target pair", showing the percentages of voiced, voiceless, and error responses for each prime-target pairing]

Figure 3.5. Percentage of voiceless, voiced, and error responses to ambiguous targets by prime-target pairings. Short (33-ms) and long (100-ms) prime durations collapsed.

Collapsing across prime-target pairings (see Figure 3.6), voiced responses continued to dominate the data. Voiced biasing and unrelated prime trials had the most voiced responses (86.4% and 85.6%, respectively). Errors ranged from 5.7% for voiceless biasing to 7.2% for unrelated prime trials. As seen in Table 3.6 below, trials with voiceless biasing primes had the most voiceless responses (15.5%), verified by paired t-tests (i.e., p's < 0.05). While this finding appeared to support the hypothesis of a biasing effect, the link was weak, since over 75% of responses to ambiguous trials with voiceless primes were voiced.

[Figure 3.6: stacked bar graph titled "Responses to ambiguous targets by prime type", showing the percentages of voiced, voiceless, and error responses for voiced, voiceless, and unrelated primes]

Figure 3.6. Percentage of voiceless, voiced, and error responses to ambiguous targets by prime type: Voiced (e.g., ban-XAN), Voiceless (e.g., pan-XAN), and Unrelated (e.g., net-XAN). Short (33-ms) and long (100-ms) prime durations collapsed.
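The percentages behind Figures 3.4 through 3.6 are straightforward cross-tabulations. A hypothetical sketch follows; the trial-level data frame and its column names are assumptions, with placeholder rows standing in for the real responses.

```python
# Sketch of tabulating repetition responses to ambiguous targets by prime
# type, as percentages within each prime type (cf. Figure 3.6).
import pandas as pd

trials = pd.DataFrame({
    "prime_type": ["voiced", "voiceless", "unrelated", "voiced"],  # placeholders
    "response":   ["voiced", "voiced", "voiced", "error"],
})
pct = pd.crosstab(trials["prime_type"], trials["response"],
                  normalize="index") * 100
print(pct.round(1))
```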
Table 3.6
Comparisons of Voiceless Responses to Ambiguous Targets by Prime Type

df = 21, α = 0.05

Paired t-test            t       p (2-tailed)
Voiceless - Unrelated    3.58    0.002
Voiceless - Voiced       2.87    0.009
Voiced - Unrelated       0.29    0.77

Note. Examples in parentheses: Voiced (e.g., ban-XAN), Voiceless (e.g., pan-XAN), and Unrelated (e.g., net-XAN).

3.4.4 Frequency of responses matching prime

There were 36 trials across the session using ambiguous targets. These trials included 12 voiced-biasing primes, 12 voiceless-biasing primes, and 12 unrelated primes. If a related prime had a biasing effect on the repetition of an ambiguous target, then the majority of responses were expected to correspond to the voiced- and voiceless-biasing primes. When calculating the frequency of responses matching related primes, unrelated prime-ambiguous target trials were ignored (no one repeated the unrelated prime in these trials). This left 24 trials per subject for examination. Table 3.7 shows the breakdown of responses matching the prime.

Table 3.7
Frequency of Responses Matching the Visual Prime by Prime Type and Duration

                  Short       Long        Total
Voiced bias      107 (81)    121 (92)     228
Voiceless bias    22 (17)     19 (14)      41
Total            129         140          269

Note. Cell percentages in parentheses. Each cell is out of 132 trials (i.e., 6 trials multiplied by 22 subjects). Short and Long correspond to the 33-ms and 100-ms prime duration conditions. Prime type: Voiced bias (bin-XIN), Voiceless bias (pin-XIN).

3.4.4.1 Biasing identification of ambiguous targets

Across the participants, the number of responses matching the prime ranged from 9 to 17 out of 24 trials. The mean and standard deviation were 12.27 and 1.72, respectively.
A paired, two-tailed, t-  62 test with an alpha level of 0.05 showed the difference between short and long durations was not significant, t (21) = -1.226,/? = 0.234. 3.5  Summary of results Reaction times on trials consisting of matching (i.e., identical) prime and target  words (e.g., ban-BAN) were significantly faster than on unrelated trials (e.g., net-BAN). As well, it appeared that ambiguous spoken words (e.g., XAN) were processed faster when the visual prime was potentially identical and contained a voiced onset (i.e., ban primed XAN, not pan). However, these effects were only observed in the condition with the longer prime duration. Overall, ambiguous spoken words did not have significantly slower latencies than unambiguous words when both were preceded by an unrelated visual prime. Additionally, opposing prime and target words (e.g., ban-PAN) were not significantly different from unrelated word pairs. Surprisingly, voiceless bias trials (e.g., pan-XAN) were significantly slower than unrelated-ambiguous trials (e.g., net-XAN) in the long priming condition. Also unexpected was a significant decrease in reaction times over the course of the session. Word repetition data analysis found that the majority of responses to ambiguous targets were voiced. Voiceless bias trials had significantly more voiceless responses than either voiced bias or unrelated ambiguous trials, yet the majority of responses in voiceless bias trials were voiced. Overall, there was a small majority of responses matching (or corresponding to) the biasing primes. On further inspection, the majority of ambiguous spoken words in voiced bias trials was identified the same as the visual prime (i.e., XAN in ban-XAN was identified as BAN). Conversely, a minority of ambiguous words in  63 voiceless bias trials was repeated as voiceless. Voiced bias trials had significantly more matches than voiceless bias trials regardless of prime duration condition.  64 C H A P T E R 4: DISCUSSION  4.1  Introduction As stated in the first chapter, the current inquiry intended to explore the  constraints on ambiguity resolution during auditory word recognition. This exploration sought to better understand the process of extrapolating the speaker's intended message from the acoustic medium. Spoken communication is complicated by a variety of factors that can result in ambiguity when parsing speech. However, a variety of cues are also available for the listener to decipher another's spoken message. Past research has shown cross-modal effects of print on speech perception (e.g., Kirsner & Smith, 1974; Dodd et al., 1988; Borowsky et al., 1999), indicating an interdependent processing relationship between the modalities that enables orthography to serve as a cue for understanding ambiguous spoken words. The current study tested this notion by pairing printed words with isolated spoken words, some of which were digitally edited to be perceptual-lexically ambiguous (e.g., ban versus pan). It was expected that phonologically related primes would bias listeners to identify the ambiguous spoken word targets in accordance with what they saw (i.e., ban priming XAN resulting in a ban response). A second goal of the current inquiry was to discover the extent to which print could influence the speed of spoken word recognition (i.e., a priming effect), depending on the nature of the prime-target relationship. Prior research had indicated faster processing of auditory words when preceded by identical visual words (e.g., Kouider & Dupoux, 2001). 
The current project extended this line of research by pairing different combinations of visual primes with auditory targets. In one condition, pairs with identical primes and targets (e.g., ban-BAN) were expected to show  65 a priming effect relative to unrelated pairings (e.g., net-BAN). Pairs containing ambiguous targets and potentially matching primes were anticipated to show a smaller priming effect (e.g., ban-XAN).  Opposing prime-target pairs (e.g., ban-PAN) were  expected to be slower to process than unrelated pairs (e.g., net-PAN) due to competing activations of phonological representations that have a high degree of overlap (Lukatela & Turvey, 1996). A third goal of the current study was to ascertain whether the effect of print on speech perception could occur below the listener's level of awareness. In a recent study by Kouider and Dupoux (2001), the cross-modal priming effect by print only occurred when the visual prime was displayed long enough to be recognized, whereas within-modal priming by print (i.e., both prime and target are text) was observed regardless of prime exposure. More recently, Grainger et al. (2003) found contrary evidence showing cross-modal priming in the absence of conscious recognition of the visual primes. This issue was addressed by employing short and long prime exposures within the current study. It was expected that cross-modal priming below the participant's awareness would be evidence of strong automatic connections between orthography and phonology. This chapter discusses evidence from the present study regarding the ability of printed words to disambiguate spoken words and whether this facilitation is below subjects' awareness. A review and evaluation of results, organized by the order of findings in chapter three, precedes a general discussion of their implications. Finally, the study's limitations and suggestions for future endeavors are provided.  66 4.2  Reaction time data  4.2.1  Prime-target trial type The significant main effect of prime-target trial type (see results section 3.3.2)  indicated that there was at least one difference between two pairing types. As seen in Table 3.2 (see section 3.3.2), three of the five planned comparisons revealed significant or marginally significant differences between prime-target trial types, which highly suggest that ambiguous and unambiguous spoken words are influenced by preceding text. Before examining these planned comparisons in detail, it should be stated that marginal significance (i.e., where 0.1 >p > 0.05) for some contrasts indicate a need to replicate these findings with a larger sample size. 4.2.1.1  Ambiguous versus unambiguous trials Ambiguous words were hypothesized to be harder to process than unambiguous  words. This hypothesis was based on findings from prior research (Pisoni & Tash, 1974; Repp, 1981). Slower reaction times were therefore anticipated for ambiguous targets preceded by unrelated primes relative to unambiguous targets following unrelated primes. The current study, however, did not find a significant difference in reaction times between ambiguous and unambiguous trials preceded by unrelated primes. One explanation for not finding a difference in latencies is that listeners can process ambiguous words as fast as unambiguous words—an idea that would not be. supported by the literature. Equally unviable would be an account that argued that the unambiguous stimuli were too ambiguous, because these stimuli were not digitally edited. 
Failing to support the hypothesis, the data bring into question the ambiguous quality of the experimentally manipulated stimuli. This possibility accords with the finding of a strong  67 tendency for subjects to repeat ambiguous targets as voiced (to be discussed further in section 4.3.1 below). 4.2.1.2  Matching versus unrelated trials Prior research (e.g., Forster & Davis, 1984) has found that target words were  processed faster if they were preceded by an identical, as opposed to an unrelated, prime (e.g., ban-BAN).  This priming effect has been observed even when the prime is visual  and the target is auditory (Kouider & Dupoux, 2001; Grainger et al., 2003). It was therefore hypothesized that matching prime-target pairings would have faster reaction times relative to unrelated prime-target trials. Supporting this hypothesis, the current study found that reaction times were faster for identical prime-target pairs than unrelated pairings. Even though the effect was marginal for this pair type in experimental trials in the omnibus A N O V A (see 3.3.2.2), closer inspection of the long prime duration condition revealed a robust priming effect (i.e.,/> = 0.007, as reported in section 3.3.4). As well, there was a significant priming effect for matching prime-target trials in filler trials (e.g., chop-CHOP).  The observed facilitation indicated that spoken words benefited by their  representation being activated by a prior identical prime in another modality. It also illustrated the presence of connections between the prime's orthographic form and its phonological representation shared with the target. Such interconnectivity would be supported by connectionist models of spoken and visual word recognition (e.g., Plaut et al., 1996). 4.2.1.3  Voiced and voiceless bias versus unrelated ambiguous trials It was hypothesized that ambiguous targets would be easier to process i f preceded  by a potentially matching prime word instead of an unrelated prime (e.g., ban-XAN or  68 pan-XAN versus net-XAN). This prediction stemmed from the finding that spoken words degraded in noise were faster to process when coinciding with a potentially matching displayed prime word (Frost, Repp, & Katz, 1988). Likewise, a printed word that was a potential candidate of an ambiguous spoken word was anticipated to facilitate reaction times. The hypothesis was not fully supported by the data. Voiced bias trials (e.g., banXAN) were significantly faster than unrelated ambiguous pairings (e.g., net-XAN), which supported the hypothesis. However, this effect was not robust enough to be detected in the long priming condition (i.e., p > 0.1, see section 3.3.4). Unexpectedly, voiceless bias trials (e.g., pan-XAN) were marginally slower than unrelated ambiguous trials. Further inspection of the long prime duration condition revealed a significant inhibition effect in voiceless bias trials compared to unrelated ambiguous trials. Post-hoc analysis also indicated that voiceless bias trials differed significantly from all other trials, resulting in a significant interaction effect in the omnibus A N O V A (to be further discussed in section 4.2.3 below). The implications of these mixed results will be discussed in section 4.4 below. 
4.2.1.4  Opposing versus unrelated trials For trials where the visual prime was an orthographic and phonological neighbor  (i.e., competitor) of the auditory target (e.g.,pan-BAN),  it was hypothesized that such  auditory targets would be slower to process than if they had been preceded by an unrelated prime. This inhibitory effect has been documented in priming studies using rhymes (e.g., Lukatela & Turvey, 1996). It was expected that a prime opposing its target on the initial letter would activate a phonological neighbor of the target that in turn would interfere with the processing of the target. The results did not show such an inhibitory  69 effect in these opposing prime-target trials relative to unrelated pairings. The interpretation of this finding will be discussed (see section 4.4 below). 4.2.2  Prime duration Latencies for the long prime duration condition were significantly faster than the  short condition. On further examination, it was found that reaction times decreased throughout the session, evidence of a training effect. Counterbalancing the order of short and long prime duration blocks might have controlled this training effect, but fears of long-term priming effects by the consciously identifiable primes were more of a concern. Although the full impact of this artifact on the reaction time data is unknown, one can anticipate that the latencies in the long condition would be more affected, as it represents the latter half of the session. For example, one could argue that the priming effect for matching prime-target trials in the long condition (as discussed in 4.2.1.2) was due to the training effect. However, this argument is rebuffed in the discussion about the interaction effect (see next section). 4.2.3  Prime-target trial type and prime duration The significant interaction effect implied that differences between prime-target  trial types were dependent on the prime duration. Further examination of the interaction effect revealed that the significant differences between prime-target trial types were limited to the long prime duration condition (see section 3.3.4 in the results chapter). Matching trials (e.g., ban-BAN) were significantly different from unrelated trials (e.g., net-BAN) as discussed above, and voiceless bias trials were significantly slower than unrelated ambiguous trials. Post-hoc analysis also found that voiceless bias trials were  70 significantly slower than all other trial types in the long condition, indicating that these trials were driving the interaction effect. The interaction effect has implications concerning the training effect (as described in the previous section). It was proposed that one might attribute a decrease in latencies within a prime-target trial type to the training effect. However, the training effect cannot account for the significant differences between prime-target types in the long condition. Particularly, the training effect cannot explain the robust inhibition effect observed in voiceless bias trials. It is possible that the training effect reduced the means to detect priming effects in conditions with weaker power, but it did not wash out the effects of stronger conditions like the voiceless bias trials. 4.2.3.1  Cross-modal effects with briefly displayed primes It was hypothesized that each of the reaction time hypotheses stated above would  be observed even when the prime duration was too short for conscious identification. 
This prediction stemmed from visual word recognition studies that reported priming effects with briefly displayed primes (e.g., Ferrand & Grainger, 1993; Lukatela & Turvey, 1996). Contrary to expectation, there was evidence against cross-modal priming effects with briefly displayed primes. In particular, the significant interaction effect found in the omnibus ANOVA (see section 3.3.4) indicated that the main effect of prime-target type depended on the prime duration condition. This interaction was supported by separate ANOVAs within the short and long conditions: no significant differences between prime-target trial types were found in the short prime duration condition. Additional counterevidence came from the priming effect for identical (i.e., matching, as in ban-BAN) prime-target pairings being limited to the 100-ms condition (see section 4.2.1.2 above). The inhibition effect observed in voiceless bias trials was likewise observed only in the long prime duration condition (see section 4.2.1.3). These results replicated the finding of Kouider and Dupoux (2001), who reported a similar null cross-modal effect with briefly displayed primes. These findings indicate that there were temporal constraints on the influence of text on spoken words. In particular, for cross-modal priming to occur, it appears necessary to provide sufficient prime exposure to maximize its effects. Furthermore, the results indicate that cross-modal priming requires longer prime exposures than those documented in within-modal paradigms (i.e., visual primes and visual targets).

4.3  Word repetition data

4.3.1  Perceptual bias effect

The majority of all responses to ambiguous targets were voiced (e.g., XAN was most often repeated as BAN). This was the case regardless of prime type, prime duration, trial list sequence, and specific minimal pair. Cumulatively, this suggests either that the experimentally manipulated stimuli were not sufficiently ambiguous, or that subjects had a voicing bias. If the stimuli were truly ambiguous and visual primes had no effect on ambiguous spoken words, one would still anticipate a split between voiced and voiceless responses; that is, auditory stimuli like XAN would be identified as ban or pan at chance. While there were significantly more voiceless responses when the biasing prime was voiceless than with other types of primes, the same could not be said for voiced responses, as both unrelated and voiced biasing primes yielded predominantly voiced responses. Together, this was weak evidence of the visual prime influencing the repetition (i.e., identity) of the ambiguous target. However, any possible trend toward a cross-modal bias effect was attenuated by the larger perceptual bias toward voiced responses. While the majority of voiced responses to the ambiguous stimuli could be attributed to the voice onset times being too short, the digitally manipulated stimuli were tested and deemed ambiguous in the pilot study (see Appendix I).

4.3.2  Frequency of responses matching the prime

4.3.2.1  Biasing identification of ambiguous targets

It was hypothesized that visual primes would influence the identification of ambiguous spoken words; the majority of responses to ambiguous targets were expected to match the preceding voiced and voiceless biasing primes. The data did not support this hypothesis. The results showed that just over half of such trials had responses corresponding to the prime.
This contradicted a prior study that observed an influence of printed letters on the identification of ambiguous synthesized syllables (Massaro et al., 1988). The current finding also contested another experiment, which found that printed words biased listeners to perceive amplitude-modulated noise as true words when the prime had a potentially matching contour (Frost et al., 1988). Given that printed words did not clearly influence the identification of ambiguous spoken words, it would seem that any activation by a visual prime was not sufficient to resolve the perceptual-lexical ambiguity in audition. This finding could be attributed to a functional dissociation between visual and auditory word processing (i.e., the way the two modalities are connected), or to a failure to attain the level of activation required to resolve the ambiguity in the auditory domain. Dissociation between orthographic and phonological word processing has received some support from the developmental literature (e.g., Sprenger-Charolles, Siegel, Bechennec, & Serniclaes, 2003) and from neuropsychological accounts (e.g., Booth, Burman, Meyer, Gitelman, Parrish, & Mesulam, 2002). However, as discussed in the first chapter, other studies have demonstrated interconnectivity between orthography and phonology through cross-modal priming (e.g., Kouider & Dupoux, 2001) and through feedforward and feedback consistency effects (e.g., Ziegler & Ferrand, 1998). Moreover, spoken words were primed by identical printed words in the present study. Hence, one must consider the possibility that the amount of activation by the visual prime in the present study was not sufficient to produce an observable biasing effect.

Another factor to consider is the perceptual bias toward voiced responses. As described in section 4.3.1 above, the majority of ambiguous targets preceded by a voiced biasing prime were identified as voiced, whereas only a minority of such targets preceded by a voiceless biasing prime were labeled as voiceless. The number of matches in the voiced and voiceless bias trials was therefore significantly different. This finding does not lend support to the current hypothesis, as the hypothesis did not predict a biasing effect on ambiguous trials by voiced primes alone.

4.3.2.2  Biasing identification with briefly displayed primes

It was also predicted that the influence of text on ambiguous words would be observed in the short (as well as the long) prime duration condition. In other words, visual primes were expected to influence the identification of ambiguous spoken words even when the prime was not consciously identifiable to subjects. It has been documented that briefly displayed primes produce significant activation in visual word recognition studies (e.g., Lukatela & Turvey, 1996). However, this hypothesis was not supported by the data. Only about half of the responses to ambiguous targets corresponded to voiced or voiceless biasing primes when the prime was blinked. While this might indicate that printed words were not able to bias the resolution of ambiguous spoken words below conscious awareness, such a conclusion could not be drawn, given the pattern of results from the repetition data described in section 4.3.2.1. Namely, there was no interaction between prime duration and the number of matches, as the perceptual voicing bias was observed equally in the short and long conditions.

4.4  General discussion

Recall that the current inquiry asked the following questions:
1. Can print influence the resolution of ambiguity in the speech signal?
2. If so, is this cross-modal influence available below the level of awareness?

Each of these questions will be addressed in turn using the results of the current study.

4.4.1  Print and the resolution of ambiguity in speech

The first question, regarding the resolution of ambiguity in speech by print, was addressed by recording reaction times to monitor changes in processing speed, and by measuring the number of matches between the subject's response and the biasing prime.

The reaction time results were mixed. On one hand, ambiguous targets were facilitated by voiced bias primes (e.g., ban-XAN; see section 4.2.1.3 above), although a second planned comparison within the long condition (a comparison with reduced power to detect a difference, as it drew on a smaller subset of the data) showed no significant facilitation by the voiced bias primes. On the other hand, ambiguous spoken words appeared to be inhibited by voiceless bias primes (e.g., pan-XAN), an effect that was robust despite a training effect in the data. This was unexpected, since both voiced and voiceless bias primes had been predicted to facilitate the ambiguous words. One could argue that both types of influence provide evidence that print can influence the resolution of ambiguous spoken words; that is, the facilitative and inhibitory effects can be seen respectively as positive and negative constraints on ambiguity resolution.

Overall, the word repetition data did not provide supporting evidence of text influencing the resolution of ambiguous spoken words. The majority of responses to ambiguous targets were voiced (e.g., XAN identified as BAN), which was interpreted as a perceptual bias effect in the data (see section 4.3.1 above). In turn, the high number of voiced responses coincided with the majority of responses in voiced bias trials matching the voiced prime, and with only a minority of responses in voiceless bias trials matching the voiceless prime.

The predominance of voiced responses to ambiguous spoken words casts doubt on the degree of ambiguity of the digitally edited stimuli. If the stimuli were perceptually more voiced, that would account for the majority of voiced responses in the repetition data. The perceptual bias would explain why the unrelated-ambiguous trials (e.g., net-XAN) were not significantly different from unambiguous trials (e.g., net-BAN). It would also explain the facilitative effect in reaction times by voiced bias primes, as such primes before a perceptually voiced target word would approximate an identical pairing (i.e., ban-BAN instead of ban-XAN). As well, it would account for the apparent inhibitory effect by voiceless bias primes, since a voiced target word following a voiceless prime would lead to lexical competition between two similar phonological forms (i.e., pan-BAN instead of pan-XAN). It is unclear whether voiceless bias primes (e.g., pan) facilitated latencies when ambiguous targets were identified as voiceless (e.g., XAN identified as PAN), since there were not enough voiceless response data to verify this. The low number of voiceless responses also made it difficult to determine whether voiced bias primes inhibited the processing of ambiguous targets identified as voiceless (i.e., similar to an opposing pairing, ban-PAN).

Unlike the prediction for the voiceless bias trials, opposing prime-target trials (e.g., pan-BAN) were hypothesized to have longer latencies due to competing similar lexical forms.
However, no inhibitory effects were observed in these trials compared to unrelated trials (e.g., net-BAN). The opposing trials did not contain digitally manipulated stimuli; hence, one could argue that the unambiguous targets in these trials were not vulnerable to the inhibitory effects of an opposing prime, because each target activated only its own representation. Conversely, the ambiguity in the edited stimuli may have allowed the voiceless bias prime to inhibit processing, because the target activated both the voiced and the voiceless representations. This conclusion is problematic because the ambiguous stimuli were apparently not highly ambiguous (although they were identified as such in the pilot study), yet they could still be inhibited by a potentially opposing prime. To reconcile these two conflicting findings, one can suggest that the degree of ambiguity of the digitally edited stimuli was sufficient to permit the inhibitory effect in voiceless bias trials, even though the ambiguous stimuli were perceptually more voiced. Recall that voiceless bias trials had more voiceless responses than the other categories (see section 3.4.4 in the results chapter). Even though the tendency to provide a voiced response in this category was still strong, participants appeared to be mildly biased toward providing a voiceless response. This provided evidence of some degree of ambiguity in the digitally edited stimuli.

In sum, the current study did not provide conclusive support for text resolving ambiguous spoken words. It did, however, show that reaction times to ambiguous spoken words were positively and negatively affected by the preceding printed word. These facilitative and inhibitory effects can be explained by the apparent perceptual tendency, seen in the word repetition data, to identify the ambiguous spoken words as voiced. It appears that the degree of ambiguity in these spoken words was sufficient to permit text to either facilitate or inhibit their processing, but not sufficient to allow text to bias their identification.

4.4.2  Cross-modal influence below the level of awareness

The second question pertained to the influence of briefly displayed text on ambiguous spoken words. The reaction time data did not reveal any differences between prime-target trial types when the prime was not identifiable (i.e., at 33 ms; see section 4.2.3.1 above), indicating that prime exposure was not sufficient to permit cross-modal priming. The word repetition data likewise showed no evidence of text biasing ambiguous spoken words in the short prime duration condition, as only about half of the responses to ambiguous spoken words corresponded to the preceding prime (see section 4.3.2.2).

The effects of text on spoken words, regardless of ambiguity, appear to be limited to the condition in which the printed word is identifiable (i.e., at 100 ms). While this finding supports Kouider and Dupoux's (2001) claim that cross-modal effects of print on spoken word recognition require conscious awareness of the visual prime word, there is a caveat to concluding that cross-modal influence is limited to awareness, in light of the Grainger et al. (2003) study. Replicating the cross-modal condition of Kouider and Dupoux's (2001) experiment, Grainger et al. (2003) found evidence of print facilitating the processing of spoken words even when subjects could not identify what they saw. This contradicted Kouider and Dupoux's assertion that cross-modal effects of print were limited to conscious awareness.
In opposition to Grainger et al.'s (2003) findings, the current results suggest that a printed word needs sufficient exposure (i.e., more than 33 ms) to produce a detectable priming effect on auditory target words.

4.4.3  Summary

The evidence of text influencing the processing of ambiguous (and unambiguous) speech was confined to the reaction time data. Despite participants' perceptual bias to identify ambiguous spoken words as voiced, the preceding text increased or decreased the latencies to identify these stimuli. These cross-modal effects were observed when the text was identifiable, but not when it was blinked, indicating that adequate prime exposure (i.e., more than 33 ms) was required to elicit them.

4.5  Theoretical account of the data

The Grainger et al. (2003) bimodal interactive activation model can account for some of the findings in the current study. The observed facilitation of spoken words by identical text has been modeled by Grainger and his colleagues. Within the framework, the printed word activates the orthographic representation and the corresponding phonological representation via a bimodal lexical route that connects the two lexicons. The prime's activation of the phonological representation, in turn, speeds access to that representation by the matching auditory target.

Even though the bimodal interactive activation model does not adequately specify how ambiguous spoken words are processed, the model can be extended to account for the finding that the ambiguous targets (which were perceptually more voiced) were facilitated by potentially matching, voiced primes. The voiced bias condition was likely somewhat analogous to the identical (i.e., matching) prime-target pairing scenario, owing to the perceptual bias toward voiced responses. In terms of the Grainger et al. (2003) model, the voiced prime activated the corresponding (voiced) phonological representation, which facilitated the ambiguous target's subsequent access to that voiced representation. The lack of robust priming by voiced bias primes in the long prime duration condition suggests that the prime only weakly activated the voiced phonological representation.

The bimodal interactive activation model can also be extended to explain the inhibition of ambiguous spoken words by voiceless primes. A voiceless prime would activate the corresponding voiceless phonological form (i.e., pan) in the Grainger et al. (2003) framework. Assuming that the ambiguous words activated both voiced and voiceless counterparts (e.g., XAN activating ban and pan), and that the spoken words were perceptually more voiced (i.e., more activation of ban than of pan), the overlapping voiced and voiceless phonological representations would compete, resulting in longer latencies.

The word repetition data can be interpreted using Massaro's (1987) Fuzzy Logic Model of Perception (FLMP). In this framework, ambiguous information (such as digitally edited speech) is vulnerable to biasing from another source (including text; see Massaro et al., 1988). The apparent tendency in the current data for participants to perceive the ambiguous stimuli as voiced would be explained as a high degree of fit between the digitally edited onsets of the ambiguous stimuli and the internal prototypes for voiced segments. The finding that voiceless bias trials had significantly more voiceless responses than other ambiguous target trials would be explained as (voiceless) textual information from the prime being integrated with the somewhat ambiguous acoustic information, in turn increasing the degree of fit with an internal prototype for a voiceless segment.
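The FLMP's decision rule makes this account concrete for the two-alternative case here. The equation below is the standard form of the model (Massaro, 1987); the numerical values that follow are purely illustrative and are not fitted to the current data. If the acoustic signal supports the voiced alternative to degree a and the printed prime supports it to degree t (each between 0 and 1), the two sources are combined multiplicatively and evaluated relative to the competing alternative:

P(\text{voiced}) = \frac{a \, t}{a \, t + (1 - a)(1 - t)}

If the edited onsets fit the voiced prototype well, say a = 0.8, then a neutral context (t = 0.5) leaves P(voiced) = 0.8, consistent with the observed voiced bias. A voiceless prime that lowers the textual support for the voiced alternative to t = 0.3 yields P(voiced) = (0.8)(0.3) / [(0.8)(0.3) + (0.2)(0.7)], or approximately 0.63: a modest shift toward voiceless responses of the kind seen in the voiceless bias trials.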
The FLMP and the bimodal interactive activation model each contribute to the interpretation of the cross-modal findings in the current study. The priming effects of text on speech are better suited to Grainger et al.'s (2003) model, whereas the word repetition data can be interpreted within Massaro's (1987) framework. As stated in the introduction, no single model can explain both the priming and the biasing effects of text on speech perception found in the current study, as well as those documented in the literature. A challenge for other models of spoken and visual word recognition is to expand their theoretical scope to predict, or even simulate, these cross-modal effects. In addition to Grainger et al. (2003) and Massaro (1987), other researchers (e.g., Gaskell & Marslen-Wilson, 2002) recognize the need to address the interconnectivity between orthography and phonology.

4.6  Limitations of the study

The question of whether there would be temporal constraints on the activation of spoken words by print was posited earlier in the chapter (see section 4.2.3.1). The current study did not provide clear findings that speak to the minimum exposure length required to permit cross-modal priming. Only two prime duration conditions were used, representing two extremes of exposure. These conditions were selected partly to minimize the total number of trials per subject, and partly to use a simple blinked versus non-blinked structure to test the hypotheses. Intermediate exposure lengths should be included in the future to explore the possibility that longer prime exposures yield larger priming effects.

In light of the apparent perceptual bias toward voiced responses, it would be necessary to reexamine the methodology used to obtain experimentally controlled ambiguous stimuli. As seen in Appendix I, the pilot study found that the percentage of voiced responses to the edited stimuli decreased as the voice onset time increased. Yet the ambiguous stimuli selected from these pilot results (i.e., the tokens closest to the 50% voiced-response level) were apparently too perceptually voiced for the current sample. The discrepancy between pilot and experimental perceptions of the ambiguous stimuli may be due to the pilot study's small sample size. It may be necessary in the future to obtain a larger sample for the pilot testing of the VOT stimuli.
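The stimulus selection rule just described can be made concrete with a short sketch: for each minimal pair, choose the edited token whose pilot voiced-response rate is nearest the 50% crossover. The pair labels echo Appendix I, but the VOT values and percentages below are placeholders rather than the actual pilot results, and the code is only an illustration of the rule, not a procedure used in the thesis.

# Illustrative sketch of the ambiguous-token selection described above.
# All numbers are placeholders, not the actual pilot data (see Appendix I).

pilot_results = {
    # minimal pair: list of (voice onset time in ms, % voiced responses)
    "ban-pan": [(18.0, 90.0), (24.0, 70.0), (30.0, 55.0), (36.0, 30.0)],
    "dab-tab": [(27.7, 85.0), (33.0, 60.0), (37.1, 45.0), (41.6, 20.0)],
}

def nearest_crossover(tokens):
    """Return the (VOT, % voiced) token closest to the 50% voiced level."""
    return min(tokens, key=lambda t: abs(t[1] - 50.0))

for pair, tokens in pilot_results.items():
    vot, pct = nearest_crossover(tokens)
    print(f"{pair}: select token at VOT = {vot} ms ({pct}% voiced)")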
4.7  Future directions

Considering the unexpected outcomes of this study, it would be worthwhile to conduct a similar experiment using a different type of ambiguous stimulus (e.g., speech in noise, acoustic manipulation of the coda, gated stimuli, distortion). As well, the inclusion of varying degrees of prime duration could be used to explore the temporal constraint question regarding cross-modal priming. The lack of inhibitory effects in the opposing prime-target relationship (e.g., ban-PAN), alongside the presence of such effects in voiceless bias trials (e.g., pan-XAN), warrants further study to explain the discrepancy.

4.8  Conclusion

This cross-modal study showed that printed words could affect the automatic (i.e., online) processing of spoken words. Printed words appeared to facilitate or inhibit the speed of processing of the ambiguous spoken words. However, this influence did not translate into biasing the identification of ambiguous spoken words, as the majority of these words were perceived as voiced. The cross-modal priming effects can be explained in a connectionist framework as evidence of interconnectivity between orthography and phonology (e.g., Plaut et al., 1996). These priming effects were limited to trials containing identifiable visual primes, suggesting that cross-modal priming was dependent on the length of exposure of a printed word.

REFERENCES

Abramson, A.S. (1977). Laryngeal timing in consonant distinctions. Phonetica, 34, 295-303.

Baken, R.J., & Orlikoff, R.F. (2000). Clinical measurement of speech and voice. San Diego, CA: Singular.

Baum, S. (2001). Contextual influences on phonetic identification in aphasia: The effects of speaking rate and semantic bias. Brain and Language, 76, 266-281.

Bird, S.A., & Williams, J.N. (2002). The effect of bimodal input on implicit and explicit memory: An investigation into the benefits of within-language subtitling. Applied Psycholinguistics, 23, 509-533.

Booth, J.R., Burman, D.D., Meyer, J.R., Gitelman, D.R., Parrish, T.B., & Mesulam, M.M. (2002). Functional anatomy of intra- and cross-modal lexical tasks. Neuroimage, 16, 7-22.

Borowsky, R., Owen, O.J., & Fonos, N. (1999). Reading speech and hearing print: Constraining models of visual word recognition by exploring connections with speech perception. Canadian Journal of Experimental Psychology, 53, 294-305.

Borsky, S., Tuller, B., & Shapiro, L.P. (1998). "How to milk a coat": The effects of semantic and acoustic information on phoneme categorization. Journal of the Acoustical Society of America, 103, 2670-2676.

Bowers, J.S., Damian, M.F., & Havelka, J. (2002). Can distributed orthographic knowledge support word-specific long-term priming? Apparently so. Journal of Memory and Language, 46, 24-38.

Boyczuk, J.P., & Baum, S. (1999). The influence of neighbourhood density on phonetic categorization in aphasia. Brain and Language, 67, 46-70.

Caplan, D. (1992). Language: Structure, processing, and disorders. Cambridge, MA: MIT Press.

Coltheart, M., Curtis, B., Atkins, P., & Haller, M. (1993). Models of reading aloud: Dual-route and parallel-distributed-processing approaches. Psychological Review, 100, 589-608.

Connine, C. (1987). Constraints on interactive processes in auditory word recognition: The role of sentence context. Journal of Memory and Language, 26, 527-538.

Connine, C.M., Blasko, D.G., & Wang, J. (1994). Vertical similarity in spoken word recognition: Multiple lexical activation, individual differences, and the role of sentence context. Perception and Psychophysics, 56, 624-636.

Connine, C.M., & Clifton, C. (1987). Interactive use of lexical information in speech perception. Journal of Experimental Psychology: Human Perception and Performance, 13, 291-299.

Connine, C., Titone, D., Deelman, T., & Blasko, D. (1997). Similarity mapping in spoken word recognition. Journal of Memory and Language, 37, 463-480.

Cutler, A., Dahan, D., & Donselaar, W. (1997). Prosody in the comprehension of spoken language: A literature review. Language and Speech, 40, 141-201.

Dijkstra, T., Schreuder, R., & Frauenfelder, U.H. (1989). Grapheme context effects on phonemic processing. Language and Speech, 32, 89-108.

Dirks, D.D., Takayanagi, S., Moshfegh, A., Noffsinger, P.D., & Fausti, S.A. (2001).
Examination of the Neighborhood Activation Theory in normal and hearing-impaired listeners. Ear and Hearing, 22, 1-13.

Dodd, B., Oerlemans, M., & Robinson, R. (1988). Cross-modal effects in repetition priming: A comparison of lipread, graphic, and heard stimuli. Visible Language, 22, 58-77.

Ferrand, L., & Grainger, J. (1993). The time course of orthographic and phonological code activation in the early phases of visual word recognition. Bulletin of the Psychonomic Society, 31, 119-122.

Forster, K.I. (1998). The pros and cons of masked priming. Journal of Psycholinguistic Research, 27, 203-233.

Forster, K.I., & Davis, C. (1984). Repetition priming and frequency attenuation in lexical access. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 680-698.

Fowler, C.A. (1986). An event approach to the study of speech perception from a direct-realist perspective. Journal of Phonetics, 14, 3-28.

Fowler, C.A., & Dekle, D.J. (1991). Listening with eye and hand: Cross-modal contributions to speech perception. Journal of Experimental Psychology: Human Perception and Performance, 17, 816-828.

Francis, W.N., Kucera, H., & Mackie, A.W. (1982). Frequency analysis of English usage: Lexicon and grammar. Boston: Houghton Mifflin.

Frost, R. (2003). Toward a strong phonological theory of visual word recognition: True issues and false trails. Psychological Bulletin, 123, 71-99.

Frost, R., Repp, B.H., & Katz, L. (1988). Can speech perception be influenced by simultaneous presentation of print? Journal of Memory and Language, 27, 741-755.

Garlock, V.M., Walley, A.C., & Metsala, J.L. (2001). Age-of-acquisition, word frequency, and neighborhood density effects on spoken word recognition by children and adults. Journal of Memory and Language, 45, 468-492.

Garson, J. (2002). Connectionism. In E.N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy (Winter 2002 ed.). Retrieved August 9, 2005, from http://plato.stanford.edu/archives/win2002/entries/connectionism/

Gaskell, M.G. (2001). Phonological variation and its consequences for the word recognition system. Language and Cognitive Processes, 16, 723-729.

Gaskell, M.G., & Marslen-Wilson, W.D. (1997). Integrating form and meaning: A distributed model of speech perception. Language and Cognitive Processes, 12, 613-656.

Gaskell, M.G., & Marslen-Wilson, W.D. (2001). Lexical ambiguity resolution and spoken word recognition: Bridging the gap. Journal of Memory and Language, 44, 325-349.

Gaskell, M.G., & Marslen-Wilson, W.D. (2002). Representation and competition in the perception of spoken words. Cognitive Psychology, 45, 220-266.

Grainger, J., Diependaele, K., Spinelli, E., Ferrand, L., & Farioli, F. (2003). Masked repetition and phonological priming within and across modalities. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 1256-1269.

Grainger, J., & Ferrand, L. (1994). Phonology and orthography in visual word recognition: Effects of masked homophone primes. Journal of Memory and Language, 33, 218-233.

Grainger, J., & Jacobs, A.M. (1996). Orthographic processing in visual word recognition: A multiple read-out model. Psychological Review, 103, 518-565.

Green, K.P. (1998). The use of auditory and visual information during phonetic processing: Implications for theories of speech perception. In R. Campbell, B. Dodd, & D. Burnham (Eds.), Hearing by eye II: Advances in the psychology of speechreading and auditory-visual speech (pp. 3-25). East Sussex, UK: Psychology Press.

Haahr, M. (1999).
Random sequence generator. Retrieved October 5, 2004, from http://www.random.org/sform.html

Hawkins, S. (1999). Reevaluating assumptions about speech perception: Interactive and integrative theories. In J.M. Pickett (Ed.), The acoustics of speech communication: Fundamentals, speech perception theory and technology (pp. 232-288). Needham Heights, MA: Allyn & Bacon.

Heron, D.T., & Bates, E.A. (1997). Sentential and acoustic factors in the recognition of open- and closed-class words. Journal of Memory and Language, 37, 217-239.

Jacobs, A.M., Rey, A., Ziegler, J.C., & Grainger, J. (1998). MROM-p: An interactive activation, multiple read-out model of orthographic and phonological processes in visual word recognition. In J. Grainger & A.M. Jacobs (Eds.), Localist connectionist approaches to human cognition (pp. 147-188). Mahwah, NJ: Lawrence Erlbaum Associates.

Joanisse, M.F., & Seidenberg, M.S. (1998). Specific language impairment: A deficit in grammar or processing? Trends in Cognitive Sciences, 2, 240-247.

Kent, R.D., & Read, C. (1992). The acoustic analysis of speech. San Diego, CA: Singular.

Kessler, B., Treiman, R., & Mullennix, J. (2002). Phonetic biases in voice key response time measurements. Journal of Memory and Language, 47, 145-171.

Kirsner, K., & Smith, M.C. (1974). Modality effects in word identification. Memory & Cognition, 2, 637-640.

Kouider, S., & Dupoux, E. (2001). A functional disconnection between spoken and visual word recognition: Evidence from unconscious priming. Cognition, 82, 35-49.

Kucera, H., & Francis, W.N. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.

Legge, G.E., Grosmann, C., & Pieper, C.M. (1984). Learning unfamiliar voices. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 298-303.

Liberman, A.M., & Mattingly, I.G. (1985). The motor theory of speech perception revised. Cognition, 21, 1-36.

Lloyd, V.L. (2001). Use of phonological and lexical cues in the resolution of perceptual-lexical ambiguities by listeners differing in working memory span. Doctoral dissertation, University of British Columbia, Vancouver, BC.

Lukatela, G., & Turvey, M.T. (1996). Inhibition of naming by rhyming primes. Perception and Psychophysics, 58, 823-835.

Macken, M.A., & Barton, D. (1980). The acquisition of the voicing contrast in English: A study of voice onset time in word-initial stop consonants. Journal of Child Language, 7, 41-74.

Massaro, D.W. (1987). Speech perception by ear and eye: A paradigm for psychological inquiry. Hillsdale, NJ: Lawrence Erlbaum Associates.

Massaro, D.W. (1998). Illusions and issues in bimodal speech perception. Proceedings of Auditory Visual Speech Perception '98, 21-26.

Massaro, D.W., Cohen, M.M., & Thompson, L.A. (1988). Visible language in speech perception: Lipreading and reading. Visible Language, 22, 8-31.

Macdonald, S.L. (2003). The role of segmental and suprasegmental cues in lexical access. Master's thesis, University of British Columbia, Vancouver, BC.

McClelland, J.L., & Elman, J.L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18, 1-86.

McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746-748.

McMurray, B., Tanenhaus, M.K., & Aslin, R.N. (2002). Gradient effects of within-category phonetic variation on lexical access. Cognition, 86, 33-42.

Miller, J.L., & Dexter, E.R. (1988). Effects of speaking rate and lexical status on phonetic perception.
Journal of Experimental Psychology: Human Perception and Performance, 14, 369-378.

Miller, K.M., & Swick, D. (2003). Orthography influences the perception of speech in alexic patients. Journal of Cognitive Neuroscience, 15, 981-990.

Moreno, R., & Mayer, R.E. (2002). Verbal redundancy in multimedia learning: When reading helps listening. Journal of Educational Psychology, 94, 156-163.

Mullennix, J.W., Pisoni, D.B., & Martin, C.S. (1989). Some effects of talker variability on spoken word recognition. Journal of the Acoustical Society of America, 85, 365-378.

Newman, R.S. (2004). Perceptual restoration in children versus adults. Applied Psycholinguistics, 25, 481-493.

Nygaard, L.C., Sommers, M.S., & Pisoni, D.B. (1994). Speech perception as a talker-contingent process. Psychological Science, 5, 42-46.

Owen, W.J., & Borowsky, R. (2003). Examining the interactivity of lexical orthographic and phonological processing. Canadian Journal of Experimental Psychology, 57, 290-303.

Oxford English Dictionary Online. (n.d.). Retrieved September 24, 2005, from http://www.oed.com

Pichora-Fuller, M.K., Schneider, B.A., & Daneman, M. (1995). How young and old adults listen to and remember speech in noise. Journal of the Acoustical Society of America, 97, 593-608.

Pickett, J.P., et al. (Eds.). (2000). American heritage dictionary of the English language (4th ed.). New York: Bartleby.com.

Pisoni, D.B., & Tash, J. (1974). Reaction times to comparisons within and across phonetic categories. Perception and Psychophysics, 15, 285-290.

Plaut, D.C. (1999). A connectionist approach to word reading and acquired dyslexia: Extension to sequential processing. Cognitive Science, 23, 543-568.

Plaut, D.C., McClelland, J.L., Seidenberg, M.S., & Patterson, K. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56-115.

Psychology Software Tools. (2003). E-Prime Beta 1.1 [Computer software]. Pittsburgh, PA: Author.

Psychology Software Tools, Inc. (n.d.). Frequently asked questions about E-Prime. Retrieved October 10, 2004, from http://www.pstnet.com/products/eprime/faq/efaqp.htm#14c

Rastle, K., & Davis, M.H. (2002). On the complexities of naming. Journal of Experimental Psychology: Human Perception and Performance, 28, 307-314.

Repp, B.H. (1981). Perceptual equivalence of two kinds of ambiguous speech stimuli. Bulletin of the Psychonomic Society, 18, 12-14.

Repp, B.H., & Lin, H. (1991). Effects of preceding context on the voice-onset-time category boundary. Journal of Experimental Psychology: Human Perception and Performance, 17, 289-302.

Rosen, S., & Howell, P. (1987). Auditory, articulatory, and learning explanations of categorical perception in speech. In S. Harnad (Ed.), Categorical perception: The groundwork of cognition (pp. 113-160). New York: Cambridge University Press.

Salasoo, A., & Pisoni, D.B. (1985). Interaction of knowledge sources in spoken word identification. Journal of Memory & Language, 24, 210-231.

Sams, M., Manninen, P., Surakka, V., Helin, P., & Katto, R. (1998). McGurk effect in Finnish syllables, isolated words, and words in sentences: Effects of word meaning and sentence context. Speech Communication, 26, 75-87.

Samuel, A.G. (1981). Phonemic restoration: Insights from a new methodology. Journal of Experimental Psychology: General, 110, 474-494.

Schneider, W., Eschman, A., & Zuccolotto, A. (2002). E-Prime user's guide. Pittsburgh, PA: Psychology Software Tools Inc.
Scott, T., Green, W.B., & Stuart, A. (2001). Interactive effects of low-pass filtering and masking noise on word recognition. Journal of the American Academy of Audiology, 12, 437-444.

Seidenberg, M.S., & McClelland, J.L. (1989). A distributed, developmental model of word recognition and naming. Psychological Review, 96, 523-568.

Shaffer, J.P. (1995). Multiple hypothesis testing. Annual Review of Psychology, 46, 561-585.

Shigeno, S. (2000). Influence of vowel context on the audio-visual speech perception of voiced stop consonants. Japanese Psychological Research, 42, 155-167.

Sommers, M.S. (1996). The structural organization of the mental lexicon and its contribution to age-related changes in spoken word recognition. Psychology and Aging, 11, 333-341.

Sommers, M.S., & Danielson, S.M. (1999). Inhibitory processes and spoken word recognition in young and older adults: The interaction of lexical competition and semantic context. Psychology and Aging, 14, 458-472.

Sommers, M.S., Nygaard, L.C., & Pisoni, D.B. (1994). Stimulus variability and spoken word recognition. I. Effects of variability in speaking rate and overall amplitude. Journal of the Acoustical Society of America, 96, 1314-1324.

Sonic Foundry. (1999). Sound Forge XP Version 4.5b [Computer software]. Madison, WI: Author.

Speech and Hearing Lab Neighborhood Database. (2001). Retrieved October 19, 2004, from http://128.252.27.56/Neighborhood/SearchHome.asp

Spreen, O., & Strauss, E. (1991). A compendium of neuropsychological tests: Administration, norms, and commentary (pp. 41-44). New York: Oxford University Press.

Sprenger-Charolles, L., Siegel, L.S., Bechennec, D., & Serniclaes, W. (2003). Development of phonological and orthographic processing in reading aloud, in silent reading, and in spelling: A four-year longitudinal study. Journal of Experimental Child Psychology, 84, 194-217.

Stone, G.O., Vanhoy, M., & Van Orden, G.C. (1997). Perception is a two-way street: Feedforward and feedback phonology in visual word recognition. Journal of Memory and Language, 36, 337-359.

Switchboard Database. (n.d.). Retrieved June 26, 2004, from http://www.ldc.upenn.edu/LOL/

Syntrillium Software Corporation. (2000). Cool Edit 2000 [Computer software]. Phoenix, AZ: Author.

Till, J.A., & Stivers, D.K. (1981). Instrumentation and validity for direct-readout voice onset time measurement. Journal of Communication Disorders, 14, 507-512.

Tun, P.A., & Wingfield, A. (1999). One voice too many: Adult age differences in language processing with different types of distracting sounds. Journal of Gerontology: Psychological Sciences, 54B, 317-327.

Uttl, B. (2002). North American Adult Reading Test: Age norms, reliability, and validity. Journal of Clinical and Experimental Neuropsychology, 24, 1123-1137.

Wingfield, A., Lahar, C., & Stine, E. (1989). Age and decision strategies in running memory for speech: Effects of prosody and linguistic structure. Journal of Gerontology: Psychological Sciences, 44, 106-113.

Wingfield, A., Lindfield, K.C., & Goodglass, H. (2000). Effects of age and hearing sensitivity on the use of prosodic information in spoken word recognition. Journal of Speech, Language, and Hearing Research, 43, 915-925.

Ziegler, J.C., & Ferrand, L. (1998). Orthography shapes the perception of speech: The consistency effect in auditory word recognition. Psychonomic Bulletin & Review, 5, 683-689.

Ziegler, J.C., Montant, M., & Jacobs, A.M. (1997). The feedback consistency effect in lexical decision and naming.
Journal of Memory and Language, 37, 533-554.

Zorzi, M., Houghton, G., & Butterworth, B. (1998). Two routes or one in reading aloud? A connectionist dual-process model. Journal of Experimental Psychology: Human Perception & Performance, 24, 1131-1161.

APPENDIX I

Percentage of voiced identification versus voice onset time

[Figure: Bin-Pin (bilabial), percentage of voiced responses (0-100%) versus voice onset time; VOT steps of 16.1, 18.7, 24.5, 24.5, 28.2, and 33.4 ms]
[Figure: Ban-Pan (bilabial), percentage of voiced responses (0-100%) versus voice onset time (ms)]
[Figure: Dip-Tip (alveolar), percentage of voiced responses (0-100%) versus voice onset time; VOT steps of 22.2, 24.8, 31.3, 43.5, 49.4, and 55.8 ms]
[Figure: Dab-Tab (alveolar), percentage of voiced responses (0-100%) versus voice onset time; VOT steps of 27.7, 33, 37.1, 37.1, 41.6, and 46.3 ms]
[Figure: Gap-Cap (velar), percentage of voiced responses (0-100%) versus voice onset time; VOT steps of 28.2, 34.3, 40.2, 52.3, 58.2, and 63.4 ms]
[Figure: Goat-Coat (velar), percentage of voiced responses (0-100%) versus voice onset time; VOT steps of 31, 35.9, 41.8, 49, 53.2, and 58 ms]

APPENDIX II

Auditory (target) stimuli

Experimental auditory stimuli

Minimal pair    Ambiguous word
ban-pan         XAN
bin-pin         XIN
dab-tab         XAB
dip-tip         XIP
gap-cap         XAP
goat-coat       XOAT

Filler auditory stimuli

bane, bud, cheer, chill, chop, cog, cub, den, dock, fad, fool, fuzz, gang, gum, gut, hedge, hem, huff, jab, jug, kiss, lawn, lug, lull, mess, mill, mole, mop, nag, node, poke, pose, rag, rave, rum, sap, sash, sham, sheen, shove, soot, thief, tuck, veer, vet, vibe, wheat, whim, wick, wool, yawn, zap, zeal, zip

APPENDIX III

Visual (prime) stimuli

Experimental visual stimuli

Minimal pairs: ban, pan, bin, pin, dab, tab, dip, tip, gap, cap, goat, coat
Control words: fin, jar, net, sag, reef, yam

Filler visual stimuli

bane, cheer, chop, cog, den, fuzz, gang, hedge, hem, jab, kiss, lull, mess, mop, nag, node, pose, rag, rave, sash, sham, thief, veer, whim, wick, yawn, zeal

APPENDIX IV

Trial list 1

Trial no  Prime (Visual)  Target (Auditory)  Relationship     Minimal Pair  Target Type
1         hedge           soot               none
2         jar             tab                none             tab-dab
3         chop            chop               matching
4         mess            vibe               none
5         cap             gap                opposing         cap-gap
6         mop             fool               none
7         jab             jab                matching
8         goat            coat               opposing         coat-goat
9         pan             XAN                voiceless bias   pan-ban       ambiguous
10        tip             tip                matching         tip-dip
11        fin             gap                none             cap-gap
12        nag             cub                none
13        sash            sash               matching
14        dab             tab                opposing         tab-dab
15        sag             XIN                none             pin-bin       ambiguous
16        node            node               matching
17        cog             vet                none
18        lull            lull               matching
19        bin             pin                opposing         pin-bin
20        goat            XOAT               voiced bias      coat-goat     ambiguous
21        nag             nag                matching
22        rave            rave               matching
23        tip             dip                opposing         tip-dip
24        lull            sheen              none
25        rave            gut                none
26        yam             tip                none             tip-dip
27        reef            XOAT               none             coat-goat     ambiguous
28        kiss            zap                none
29        rag             rag                matching
30        cheer           mole               none
31        sag             pin                none             pin-bin
32        fuzz            wheat              none
33        veer            veer               matching
34        wick            rum                none
35        dip             XIP                voiced bias      tip-dip       ambiguous
36        kiss            kiss               matching
37        pin             pin                matching         pin-bin
38        zeal            fad                none
39        reef            goat               none             coat-goat
40        sham            zip                none
41        tab             XAB                voiceless bias   tab-dab       ambiguous
42        net             ban                none             pan-ban
43        tab             tab                matching         tab-dab
44        den             poke               none
45        pin             bin                opposing         pin-bin
46        sham            sham               matching
47        dab             XAB                voiced bias      tab-dab       ambiguous
48        dip             dip                matching         tip-dip
49        node            huff               none
50        coat            coat               matching         coat-goat
51        ban             ban                matching         pan-ban
52        gap             cap                opposing         cap-gap
53        hedge           hedge              matching
54        yam             XIP                none             tip-dip       ambiguous
55        chop            tuck               none
56        pose            pose               matching
57        cheer           cheer              matching
58        cap             XAP                voiceless bias   cap-gap       ambiguous
59        pose            lug                none
60        hem             hem                matching
61        zeal            zeal               matching
62        net             XAN                none             pan-ban       ambiguous
63        dip             tip                opposing         tip-dip
64        pin             XIN                voiceless bias   pin-bin       ambiguous
65        den             den                matching
66        whim            whim               matching
67        hem             jug                none
68        sag             bin                none             pin-bin
69        yawn            yawn               matching
70        ban             XAN                voiced bias      pan-ban       ambiguous
71        coat            goat               opposing         coat-goat
72        ban             pan                opposing         pan-ban
73        wick            wick               matching
74        thief           lawn               none
75        jar             dab                none             tab-dab
76        bane            mill               none
77        tip             XIP                voiceless bias   tip-dip       ambiguous
78        gap             gap                matching         cap-gap
79        yam             dip                none             tip-dip
80        gang            wool               none
81        dab             dab                matching         tab-dab
82        coat            XOAT               voiceless bias   coat-goat     ambiguous
83        pan             pan                matching         pan-ban
84        yawn            chill              none
85        fin             cap                none             cap-gap
86        thief           thief              matching
87        cap             cap                matching         cap-gap
88        bin             XIN                voiced bias      pin-bin       ambiguous
89        sash            dock               none
90        mop             mop                matching
91        tab             dab                opposing         tab-dab
92        goat            goat               matching         coat-goat
93        cog             cog                matching
94        jar             XAB                none             tab-dab       ambiguous
95        gang            gang               matching
96        net             pan                none             pan-ban
97        reef            coat               none             coat-goat
98        bin             bin                matching         pin-bin
99        veer            sap                none
100       rag             bud                none
101       gap             XAP                voiced bias      cap-gap       ambiguous
102       pan             ban                opposing         pan-ban
103       mess            mess               matching
104       fin             XAP                none             cap-gap       ambiguous
105       bane            bane               matching
106       whim            shove              none
107       fuzz            fuzz               matching
108       jab             gum                none
