Open Collections

UBC Theses and Dissertations

Attention and salience in lexically-guided perceptual learning McAuliffe, Michael 2015

Attention and salience in lexically-guided perceptual learning

by

Michael McAuliffe

B.A., University of Washington, 2009

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Doctor of Philosophy

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Linguistics)

The University of British Columbia
(Vancouver)

July 2015

© Michael McAuliffe, 2015

Abstract

Psychophysical studies of perceptual learning find that perceivers only improve the accuracy of their perception on stimuli similar to what they were trained on. In contrast, speech perception studies of perceptual learning find generalization to novel contexts when words contain a modified ambiguous sound. This dissertation seeks to resolve the apparent conflict between these findings by framing the results in terms of attentional sets. Attention can be oriented towards comprehension of the speaker’s intended meaning or towards perception of a speaker’s pronunciation. Attention is proposed to affect perceptual learning as follows. When attention is oriented towards comprehension, more abstract and less context-dependent representations are updated and the perceiver shows generalized perceptual learning, as seen in the speech perception literature. When attention is oriented towards perception, more finely detailed and more context-dependent representations are updated and the perceiver shows less generalized perceptual learning, similar to what is seen in the psychophysics literature. This proposal is supported by three experiments. The first two implement a standard paradigm for perceptual learning in speech perception. In these experiments, promoting a more perception-oriented attentional set causes less generalized perceptual learning. The final experiment uses a novel paradigm where modified sounds are embedded in sentences during exposure. Perceptual learning is found only when the modified sound is embedded in words that are not predictable from the sentence.
When modified sounds are in predictable words, no perceptual learning is observed. To account for this lack of perceptual learning, I hypothesize that sounds in predictable sentences are less reliable than sounds in words in isolation or unpredictable sentences. In the cases where perceptual learning is present, contexts which support comprehension-oriented attentional sets show larger perceptual learning effects than contexts promoting perception-oriented attentional sets. I argue that attentional sets are a key component to the generalization of perceptual learning to new contexts.

Preface

All of the work presented henceforth was conducted in the Speech in Context Laboratory at the University of British Columbia, Point Grey campus. All experiments and associated methods were approved by the University of British Columbia’s Research Ethics Board [certificate #H06-04047].

I was the lead investigator for all experiments. Jamie Russell and Jobie Hui aided in data collection for the experiments in Chapter 2. Jobie Hui and Michelle Chan were involved in stimulus preparation and data collection for the experiment in Chapter 3. Molly Babel was involved throughout all experiments in concept formation and manuscript edits.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgments
1 Introduction
  1.1 Perceptual learning
  1.2 Linguistic expectations and perceptual learning
    1.2.1 Lexical bias
    1.2.2 Semantic predictability
  1.3 Attention and perceptual learning
  1.4 Category typicality and perceptual learning
  1.5 Current contribution
2 Lexical decision
  2.1 Motivation
  2.2 Experiment 1
    2.2.1 Methodology
    2.2.2 Results
    2.2.3 Discussion
  2.3 Experiment 2
    2.3.1 Methodology
    2.3.2 Results
  2.4 Grouped results across experiments
  2.5 General discussion
3 Cross-modal word identification
  3.1 Motivation
  3.2 Methodology
    3.2.1 Participants
    3.2.2 Materials
    3.2.3 Pretest
    3.2.4 Experiment design
    3.2.5 Procedure
    3.2.6 Analysis
  3.3 Results
    3.3.1 Exposure
    3.3.2 Categorization
  3.4 Discussion
4 Discussion and conclusions
  4.1 Specificity and generalization in perceptual learning
  4.2 Effect of increased linguistic expectations
  4.3 Attentional control of perceptual learning
  4.4 Category atypicality
  4.5 Implications for cognitive models
  4.6 Conclusion
Bibliography

List of Tables

Table 1.1 Summary of predictions for size of perceptual learning effects under different linguistic expectations and attention.
Table 1.2 Summary of predictions for size of perceptual learning effects when exposed to different typicalities of the modified category.
Table 2.1 Filler words used in all experiments.
Table 2.2 Filler nonwords used in Experiments 1 and 2.
Table 2.3 Words containing /ʃ/ in all experiments.
Table 2.4 Mean and standard deviations for frequencies (log frequency per million words in SUBTLEXus) and number of syllables of each item type
Table 2.5 Frequencies (log frequency per million words in SUBTLEXus) of words used in categorization continua
Table 2.6 Step chosen for each Word-initial stimulus in Experiment 1 and the proportion /s/ response in the pretest
Table 2.7 Step chosen for each Word-medial stimulus in Experiment 1 and the proportion /s/ response in the pretest
Table 2.8 Step chosen for each Word-initial stimulus in Experiment 2 and the proportion /s/ response in the pretest
Table 2.9 Step chosen for each Word-medial stimulus in Experiment 2 and the proportion /s/ response in the pretest
Table 3.1 High predictability filler sentences.
Table 3.2 Low predictability filler sentences.
Table 3.3 High predictability sentences with /ʃ/ words.
Table 3.4 Low predictability sentences with /ʃ/ words.
Table 3.5 High predictability sentences with target /s/ words.
Table 3.6 Low predictability sentences with target /s/ words.

List of Figures

Figure 1.1 Schema of perceptual learning. The top panel shows categories for /s/ and /ʃ/ along a continuum, with a modified /s/ category in the dashed line. The bottom panel shows a categorization function for exposure to a typical /s/ (solid) and a modified /s/ (dashed).
Figure 1.2 Schema of the predictive coding framework adapted from Clark (2013). Representations are hierarchical and are more abstract the higher they are in the hierarchy. Blue arrows represent expectations, red arrows are error signals, and yellow is the actual sensory input.
Figure 1.3 A schema for predictive coding under a perception-oriented attentional set. Attention is represented by the pink box, where gain is enhanced for detection, but error signal propagation is limited to lower levels of sensory representation where the expectations must be updated. This is represented by the lack of pink nodes outside the attention box. As before, blue arrows represent expectations, red arrows represent error signals, and yellow represents the sensory input.
Figure 1.4 A schema for predictive coding under a comprehension-oriented attentional set.
Attention is represented by the green box, where it is oriented to higher, more abstract levels of sensory representation. Error signals are able to propagate farther and update more than just the fine grained low level sensory representations. As before, blue arrows represent expectations, red arrows represent error signals, and yellow represents the sensory input.
Figure 1.5 Distribution of original categories (red endpoints) and modified /s/ categories used in Experiments 1 and 2. Experiment 1 uses a maximally ambiguous category between /s/ and /ʃ/. Experiment 2 uses a category that is more like /ʃ/ than /s/. The x-axis was generated using acoustic similarities used to generate Figures 2.3 and 2.7.
Figure 2.1 Proportion of word-responses for Word-initial exposure words. Solid lines represent Experiment 1 selection criteria (50% word-response rate) and dashed lines represent Experiment 2 selection criteria (30% word-response rate). Dots are averaged word-response across subjects, and the blue line is a binomial model of the responses.
Figure 2.2 Proportion of word-responses for Word-medial exposure words. Solid lines represent Experiment 1 selection criteria (50% word-response rate) and dashed lines represent Experiment 2 selection criteria (30% word-response rate). Dots are averaged word-response across subjects, and the blue line is a binomial model of responses.
Figure 2.3 Multidimensional scaling of the acoustic distances between the sibilants of original productions, categorization tokens and the exposure tokens in Experiment 1. Categorization and exposure tokens were synthesized from the original productions using STRAIGHT (Kawahara et al., 2008).
Figure 2.4 Within-subject mean accuracy for words in the exposure phase of Experiment 1, separated out by Trial Type (Filler, /s/, and /ʃ/). Error bars represent 95% confidence intervals.
Figure 2.5 Within-subject mean reaction time to words in the exposure phase of Experiment 1, separated out by Trial Type (Filler, /s/, and /ʃ/). Error bars represent 95% confidence intervals.
Figure 2.6 Proportion /s/ response along the 6 step continua as a function of Exposure Type and Attention in Experiment 1. Error bars represent 95% confidence intervals.
Figure 2.7 Multidimensional scaling of the acoustic distances between the sibilants of original productions, categorization tokens and the exposure tokens in Experiment 2. Categorization and exposure tokens were synthesized from the original productions using STRAIGHT (Kawahara et al., 2008).
Figure 2.8 Within-subject mean accuracy in the exposure phase of Experiment 2, separated out by Trial Type (Filler, /s/, and /ʃ/). Error bars represent 95% confidence intervals.
Figure 2.9 Within-subject mean reaction time in the exposure phase of Experiment 2, separated out by Trial Type (Filler, /s/, and /ʃ/). Error bars represent 95% confidence intervals.
Figure 2.10 Proportion /s/ response along the 6 step continua as a function of Exposure Type and Attention in Experiment 2. Error bars represent 95% confidence intervals.
Figure 2.11 Proportion /s/ response along the 6 step continua as a function of Exposure Type and Attention in Experiment 1 and Experiment 2. Error bars represent 95% confidence intervals.
Figure 2.12 Correlation of cross-over point in categorization with the proportion of word responses to critical items containing an ambiguous /s/ token in Experiments 1 and 2.
Figure 3.1 Multidimensional scaling of the acoustic distances between the sibilants of original productions, categorization tokens and the exposure tokens in Experiment 3. Note that the only Isolation tokens are the Categorization tokens.
Figure 3.2 Within-subject mean reaction time in the exposure phase of Experiment 3, separated out by Trial Type (Filler, /s/, and /ʃ/). Error bars represent 95% confidence intervals.
Figure 3.3 Proportion /s/ response along the 6 step continua as a function of Exposure Type and Attention in Experiment 3. Error bars represent 95% confidence intervals.
Figure 3.4 Proportion /s/ response along the 6 step continua as a function of Exposure Type and Attention in Experiment 3 and the word-medial condition of Experiment 1. Error bars represent 95% confidence intervals.
Figure 3.5 Schema of category relaxation in predictable sentences. The solid vertical line represents the mean of the modified category similar to the one used for Experiment 1, and a dashed vertical line represents the mean of the Experiment 2 modified category. A more atypical category, as was used in Experiment 2, has a higher probability of being categorized as /s/ in predictable sentences than in isolation.
Figure 3.6 Distribution of cross-over points for each participant across comparable exposure tokens in Experiments 1 and 3. Larger bulges represent more subjects located at that point in the distribution. The dashed line represents the mean step of the continua. Large bulges around the dashed line for Control, Unpredictive and Predictive conditions indicate that many speakers did not change their category boundaries, compared to the Isolation conditions.
Figure 4.1 A schema for predictive coding under a perception-oriented attentional set.
Attention is represented by the pink box, where gain is enhanced for detection, but error signal propagation is limited to lower levels of sensory representation where the expectations must be updated. This is represented by the lack of pink nodes outside the attention box. As before, blue arrows represent expectations, red arrows represent error signals, and yellow represents the sensory input.
Figure 4.2 A schema for predictive coding under a comprehension-oriented attentional set. Attention is represented by the green box, where it is oriented to higher, more abstract levels of sensory representation. Error signals are able to propagate farther and update more than just the fine grained low level sensory representations. As before, blue arrows represent expectations, red arrows represent error signals, and yellow represents the sensory input.

Acknowledgments

This dissertation would not have been possible without support from innumerable sources. First and foremost, I must thank Molly Babel, whose enthusiasm and passion for linguistic research are infectious. Equally impressive is her focus and organization, which really helped get this dissertation done in a (somewhat) timely fashion. I could not imagine a better supervisor and collaborator. Additionally, feedback and discussions from my committee members, Eric Vatikiotis-Bateson and Carla Hudson Kam, have been invaluable in shaping my thinking on all of my projects over the past five years.

Several seminars provided excellent perspectives on my dissertation research. The first, taught by Martina Wiltschko and Michael Rochemont, helped shape my initial prospectus. The second, taught by Molly Babel and Kathleen Currie Hall, provided a forum to share my initial findings and thought process and helped refine them. I am also grateful to my fellow students in those seminars for all the lively discussions.
My fellow graduate students have shaped my thinking and my graduate experience, namely: Kevin McMullin, Mark Scott, Anita Szakay, Jen Abel, Alexis Black, Heather Bliss, Blake Allen, Michael Fry, Scott Mackie, and Murray Schellenberg. You’re all awesome.

I would like to thank the members of the Speech in Context lab, who are uniformly wonderful people. In particular, Michelle Chan and Jobie Hui helped me with stimuli selection, and Jobie Hui and Jamie Russell helped me with running participants. I am indebted to all of you.

My wonderful parents have supported me from the beginning of studying linguistics, even if it wasn’t the most viable major for a career in undergrad. I certainly would not have found my passion in linguistics if it weren’t for the exposure to other languages and cultures that I had growing up. My brother, aunts, uncles and cousins from around the world have been amazingly supportive as well. Thank you especially to Laura Tammpere who has been a constant source of support and fun. I love you all!

Chapter 1
Introduction

Listeners are faced with a large degree of phonetic variability when interacting with their fellow language users. Speakers differ in size, gender, and sociolect, which makes speech sound categories overlap in acoustic dimensions. Despite this variation, listeners can interpret disparate and variable productions as belonging to a single word type or sound category, a phenomenon referred to as perceptual constancy (Shankweiler et al., 1977; Kuhl, 1979) or recognition equivalence (Sumner and Kataoka, 2013). One of the processes for achieving this constancy is perceptual learning, whereby perceivers update a perceptual category based on contextual factors.

While perceptual learning is a response to speaker variation, there is also variation on the part of the listener. For instance, a professor with a non-native accent can cause shifts in their students’ attention. Some students may be unfazed and focus on the content of the lecture.
Some may focus on the unfamiliar pronunciations in order to better understand the professor, while others would be distracted by the unfamiliarity. Perceptual learning is typically framed in terms of speaker variation. In contrast, this dissertation examines the effect of listener variation on perceptual learning.

In the speech perception literature, perceptual learning has two distinct, yet related, usages. The first refers to the process of learning to understand a group of speakers that share a common characteristic, such as a nonnative accent (Bradlow and Bent, 2008). The second usage, which this dissertation adopts, is more constrained. Perceptual learning here refers to a listener updating their perceptual categories following exposure to a single speaker with a modified sound category (Norris et al., 2003).

The primary locus of investigation within the perceptual learning literature is generalization to novel contexts. In most studies, exposure to modified sound categories generalizes to other words and nonwords when the modified exposure tokens are embedded in real words (Norris et al., 2003; Reinisch et al., 2013). This paradigm is referred to as lexically-guided perceptual learning, as listeners are exposed to the modified sounds in the context of real words in a lexical decision task. In a lexical decision task, participants hear words (e.g. silver in English) or nonwords (e.g. shilver). For each auditory token, they are asked whether they heard a word in English or not. On the other hand, generalization appears to be more limited when perceptual learning is induced through visually-guided paradigms. In visually-guided paradigms, listeners are exposed to a modified category in a nonword (e.g., a token halfway between aba and ada) matched to an unambiguous video signal (e.g., a person saying aba).
The perceptual learning exhibited from these visually-guided experiments is found to influence only the specific nonword that perceivers are exposed to and not other similar nonwords (Reinisch et al., 2014). This kind of exposure-specificity effect has been widely reported in perceptual learning studies in the psychophysics literature (see Gibson, 1953, for review).

Why, then, does lexically-guided perceptual learning produce such generalization? Here I posit that these results can be understood by considering the attentional set exploited in the exposure phase. An attentional set is a strategy that is employed by perceivers to prioritize certain aspects of stimuli. Attentional sets can be induced through instructions or through properties of stimuli themselves. A common example in the visual domain is the attentional sets employed in visual search tasks. If a perceiver has to find a target shape in a field of shapes, there are two possible attentional sets (Bacon and Egeth, 1994). The first is a singleton-detection attentional set: a diffuse set where any salient information along any dimension will be given priority. If the target shape is a circle in a field of squares, then the singleton-detection attentional set is elicited, because the shape to find is always a highly salient element. Singleton detection can result in slowed reaction times if a distractor singleton (e.g. a red square) is present. The second attentional set is the feature-detection set: a focused set limited to the target’s defining feature (e.g. the color red or the square shape). If there are multiple redundant targets or many distractor singletons, then the singleton-detection attentional set will not be an effective strategy and feature detection will be employed. Using feature detection, participants are not distracted by singletons. Bacon and Egeth (1994) speculate that singleton detection is the default attentional set.
Feature detection is only employed when singleton detection is made ineffective through stimulus design.

In the speech perception literature, two broad attentional sets have been posited (Cutler et al., 1987; Pitt and Szostak, 2012). The first is a comprehension-oriented or diffuse attentional set; this is the attentional set assumed to operate during normal language use. When oriented towards comprehension, listeners are focused on comprehending the intended message of the speech, and a comprehension set is promoted by tasks that focus on word identity and word recognition. The comprehension-oriented attentional set is elicited in lexically-guided perceptual learning paradigms through their use of lexical decision tasks and the embedding of modified sound categories in word tokens. A second kind of attentional set is a perception-oriented or focused attentional set, where a listener is focused more on the low-level signal properties of the speech rather than the message. The perception-oriented attentional set is promoted by tasks such as phoneme/syllable monitoring or mispronunciation detection. The tasks used in visually-guided perceptual learning and perceptual learning within the psychophysics literature can be thought of as eliciting this attentional set. Stimuli used in these paradigms are either nonwords or visual stimuli, and so lack linguistic meaning.

Comprehension and perception are, of course, interconnected concepts. For the purposes of this thesis, I largely follow the distinction drawn in Pitt and Szostak (2012). In this distinction, comprehension refers to processing and understanding a speaker’s intended meaning. In other words, it is the activation or retrieval of linguistic objects that have been abstracted away from the specific instances. Perception is defined as hearing and processing the speech as pronounced. It is then the encoding of the fine details of a specific instance of an abstract linguistic category.
If a listener hears the word cat ([kæ̰t̚]), the listener both perceives the fine details of the specific production (e.g., an unreleased [t], glottalization on the vowel, gender of the speaker, etc.) and comprehends the lexical item “cat”. Perception and comprehension do not always occur together, however. If a listener hears a nonword like keet ([kit]), the listener still perceives the details, but there is no lexical item to comprehend. Conversely, listeners can comprehend items that are not actually produced. For instance, in a conversation where one person has a cat, they can say “I have to go home to feed my, uh, you know.” The listener can still comprehend the speaker’s meaning of feeding their cat, without an actual pronunciation of the word cat. In terms of theories of speech perception, comprehension maps to the abstract linguistic representations (sound categories, words, etc.), and perception to the fine-detailed episodic traces. Attention to linguistic properties (e.g. syntactic category) or signal properties (e.g. the speaker’s gender) has been shown to change the relative strengths of encoding for abstract or episodic representations (Goldinger, 1996; Theodore et al., 2015). These differences in attention correspond well to the attentional sets proposed above for comprehension and perception.

The core hypothesis of this dissertation is that perceivers who adopt a more comprehension-oriented attentional set will show more generalization than those who adopt a more perception-oriented attentional set. This hypothesis captures the basic findings of perceptual learning in the speech perception and psychophysics literature. Comprehension-oriented exposure tasks (e.g., lexical decision) lead to generalization on novel test items, but perception-oriented exposure tasks (e.g., speech reading) do not generalize to novel test items.
To test the hypothesis, I use a lexically-guided perceptual learning paradigm to expose listeners to an /s/ category modified to sound more like /ʃ/. Groups of participants differ in whether comprehension-oriented or perception-oriented attentional sets are favored when processing the modified /s/ category based on experimental manipulations. The favoring of attentional sets is implemented in four ways across the three experiments presented in this dissertation. Two manipulations are linguistic in nature, one is instruction-based, and the fourth is stimulus-based. The rest of this section is devoted to an overview of these manipulations and their motivations.

Before introducing the manipulations themselves, a definition of perceptual salience is necessary. Salience is a widely used term that is poorly defined across the literature. For the purposes of this dissertation I adopt the following definition: an element is salient if it is unpredictable given the context and/or easily distinguishable from other possible elements. The modified /s/ category that is being learned in this dissertation already has salient signal properties in the form of high frequency, relatively high amplitude, aperiodic noise. The perceptual salience of the modified /s/ category is increased by embedding it in a linguistic position with little conditioning context or by increasing its acoustic distance from a typical /s/ production.

I argue that increased perceptual salience promotes a more perception-oriented attentional set. In general, psycholinguistic studies have a small number of target trials and a large number of unrelated filler trials. By overwhelming the target trials with filler trials, it is assumed that participants will be unlikely to notice the true purpose of the experiment until debriefing.
Changing the nature of the filler trials can induce attentional set changes: if more fillers are words rather than nonwords, a comprehension-oriented attentional set is promoted (Mirman et al., 2008). Increasing the number of target trials can also induce attentional set changes. Instructing participants to attend to a modified sound has less of an effect when there are proportionally more of those trials (Pitt and Szostak, 2012). Put another way, if the modified sounds are salient due to prevalence, participants are more likely to notice the targets and adopt a more perception-oriented attentional set without explicit instructions. Increasing perceptual salience through stimulus design in this dissertation is predicted to have the same effect.

The first linguistic manipulation used to promote different attentional sets is the position of the modified /s/ in the exposure words. Accurate perception is most critical when expectations are low, as perception of highly expected elements serves a more confirmatory role (Marslen-Wilson and Welsh, 1978; Gow and Gordon, 1995). Some groups of participants are exposed to the modified sound only at the beginnings of words (e.g. silver, settlement) and other groups are exposed to the category only in the middle of words (e.g. carousel, fossil). Word-initial positions lack the expectations afforded to the word-medial positions, and lexical information exhibits less of an effect on word-initial positions as compared to later positions (Pitt and Samuel, 2006). As such, I predict word-initial exposure will promote a more perception-oriented attentional set through increased perceptual salience of the modified /s/ category within the task. In contrast, I predict word-medial exposure will promote a more comprehension-oriented attentional set. Further background on manipulating word position is given in Section 1.2.1.

The second linguistic manipulation employed is the context in which a word appears.
In Experiments 1 and 2, participants are exposed to the modified sound category in isolated words, as in previous work (Norris et al., 2003). However, in Experiment 3, the words containing the sound category have been embedded in sentences that are either predictive or unpredictive of the target word. Use of sentence frames is predicted to promote comprehension-oriented attentional sets more than words in isolation. Increasing the predictability of a word increases the expectations for the sounds in those words as well, mirroring the word-position manipulation above. Further background on the use of the sentence frames is given in Section 1.2.2.

Participants in all three experiments receive the same general instructions for the exposure task, but one group of participants in each experiment receives additional instructions about the nature of the /s/ category, following previous studies (Pitt and Szostak, 2012). Without any additional instructions, the task is predicted to promote a comprehension-oriented attentional set. The instructions about /s/ are expected to promote a perception-oriented attentional set. Section 1.3 contains background on attention and instructions.

The stimulus-based manipulation is the degree of typicality of the modified /s/ category; Experiments 1 and 2 differ in this respect. In line with previous work (Norris et al., 2003), participants in Experiment 1 are exposed to a modified category halfway between /s/ and /ʃ/. Participants in Experiment 2 are exposed to an even more atypical /s/: the modified fricative is more /ʃ/-like than /s/-like. Exposure to an atypical category is predicted to promote a more perception-oriented attentional set because its atypicality is predicted to be more perceptually salient.

The structure of the thesis is as follows.
This chapter provides an overview of relevant literature on perceptual learning (Section 1.1), linguistic expectations (Section 1.2), attention (Section 1.3), and category typicality (Section 1.4) as they relate to and motivate the three experiments of this dissertation. Chapter 2 details two experiments using a lexically-guided perceptual learning paradigm, each with different conditions for levels of lexical bias and attention. The two experiments differ in the acoustic properties of the exposure tokens, with the first experiment using a slightly atypical /s/ category that is halfway between /s/ and /ʃ/. The second experiment uses a more atypical /s/ category that is more /ʃ/-like than /s/-like. Chapter 3 details an experiment using a novel exposure paradigm that manipulates semantic predictability to increase the linguistic expectations during exposure. Finally, Chapter 4 summarizes the results and discusses implications and future directions. The perceptual learning literature has generally used consistent processing conditions to elicit perceptual learning effects, and a goal of this dissertation is to examine the robustness and degree of perceptual learning across conditions that promote more comprehension- or more perception-oriented attentional sets.

1.1 Perceptual learning

Perceptual learning is a well established phenomenon in the psychophysics literature. Training can improve a perceiver’s ability to discriminate in many disparate modalities (e.g., visual acuity, somatosensory spatial resolution, weight estimation, and discrimination of hue and acoustic pitch; see Gibson, 1953, for review). In the psychophysics literature, perceptual learning is an improvement in a perceiver’s ability to judge the physical characteristics of objects in the world through attention on the task, but not reinforcement, correction, or reward.
Perceptual learning in speech perception refers to updating a listener’s sound categories based on exposure to a speaker’s modified production category (Norris et al., 2003; Vroomen et al., 2007). Figure 1.1 shows a schema of perceptual learning. Exposure to a speaker exhibiting the modified category’s distribution of /s/ (top panel) causes participants to update their perceptual /s/ category to include more /ʃ/-like instances. This expanded category (assuming no modifications to their /ʃ/ category) results in a greater willingness of the participant to categorize ambiguous /s/-/ʃ/ instances as /s/ rather than /ʃ/ (bottom panel). Perceptual learning effects are then evaluated as the difference between the normal categorization function and the one following exposure to a modification.

Figure 1.1: Schema of perceptual learning. The top panel shows categories for /s/ and /ʃ/ along a continuum, with a modified /s/ category in the dashed line. The bottom panel shows a categorization function for exposure to a typical /s/ (solid) and a modified /s/ (dashed). [Figure not reproduced. Horizontal axis in both panels: continuum step, 1 to 11. Vertical axis in the bottom panel: percent /s/ response, 0 to 100.]

Norris et al. (2003) began the recent set of investigations into lexically-guided perceptual learning in speech. Norris and colleagues exposed one group of Dutch listeners to a fricative halfway between /s/ and /f/ at the ends of words like olif “olive” and radijs “radish”, while exposing another group to the ambiguous fricative at the ends of nonwords, like blif and blis. Following exposure, both groups of listeners were tested on their categorization of a fricative continuum from 100% /s/ to 100% /f/. Listeners exposed to the ambiguous fricative at the end of words shifted their categorization behavior, while those exposed to the same sounds at the end of nonwords did not. The exposure using words was further differentiated by the bias introduced by the words.
That is, half the tokens ending in the ambiguous fricative formed a word if the fricative was interpreted as /s/ but not if it was interpreted as /f/, and the others were the reverse. Listeners exposed only to the /s/-biased tokens categorized more of the /f/-/s/ continuum as /s/, and listeners exposed to /f/-biased tokens categorized more of the continuum as /f/. The ambiguous fricative was associated with either /s/ or /f/ according to the bias induced by the word, which led to an expanded category for that fricative at the expense of the other category. These results crucially show that perceptual categories in speech are malleable, and that the lexical system of the listener facilitates generalization to that category in new forms and contexts.

In addition to lexically-guided perceptual learning, unambiguous visual cues to sound identity can cause perceptual learning as well; this is referred to as perceptual recalibration. In Bertelson et al. (2003), an auditory continuum from /aba/ to /ada/ was synthesized and paired with a video of a speaker producing /aba/ or /ada/. Participants first completed a pretest that identified the maximally ambiguous step of the /aba/-/ada/ auditory continuum. In eight blocks, participants were randomly exposed to the ambiguous auditory token paired with video for /aba/ or /ada/. Following each block, they completed a short categorization test. Participants showed perceptual learning effects, such that they were more likely to respond with /aba/ if they had been exposed to the video of /aba/ paired with the ambiguous token in the preceding block, and vice versa for /ada/.

Visually-guided perceptual learning in speech perception has been modeled using a Bayesian belief updating framework (Kleinschmidt and Jaeger, 2011). In this framework, the model categorizes the incoming stimuli based on an acoustic-phonetic feature and a binary visual feature, and then updates the distribution to reflect that categorization.
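As a rough illustration of the kind of update step just described, and not the model of Kleinschmidt and Jaeger (2011) itself, the sketch below treats each category as a Gaussian over a single acoustic cue and assumes that the binary visual feature identifies the intended category outright. The cue values, variances, category labels, and function names are all hypothetical choices made for this example, and the conjugate normal update of the category mean is a standard simplification swapped in for the full model.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def prob_b(cue, means, sd):
    """Posterior probability of category "b" for an acoustic cue,
    assuming equal prior probability for "b" and "d"."""
    like_b = normal_pdf(cue, means["b"], sd)
    like_d = normal_pdf(cue, means["d"], sd)
    return like_b / (like_b + like_d)


def update_mean(prior_mean, prior_var, cue, noise_var):
    """Conjugate normal-normal update of the belief about a category mean
    after observing one cue value (an assumed simplification)."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
    post_mean = post_var * (prior_mean / prior_var + cue / noise_var)
    return post_mean, post_var


def expose(n_trials, visual_label, cue=6.0, sd=1.5):
    """Simulate exposure to an ambiguous cue paired with a visual label.
    Only the visually indicated category has its mean updated."""
    means = {"b": 3.0, "d": 9.0}      # hypothetical category means
    mean_var = {"b": 1.0, "d": 1.0}   # hypothetical uncertainty about each mean
    for _ in range(n_trials):
        means[visual_label], mean_var[visual_label] = update_mean(
            means[visual_label], mean_var[visual_label], cue, sd ** 2
        )
    return means


if __name__ == "__main__":
    baseline = {"b": 3.0, "d": 9.0}
    print("P(b | ambiguous cue) before exposure:", prob_b(6.0, baseline, 1.5))
    print("after 8 trials paired with b:", prob_b(6.0, expose(8, "b"), 1.5))
    print("after 8 trials paired with d:", prob_b(6.0, expose(8, "d"), 1.5))
```

In this toy version, repeated pairing of the ambiguous cue with one visual label pulls that category's mean toward the ambiguous value, so the same cue is subsequently more likely to be assigned to that category, which is the direction of the recalibration effect reported in the behavioral studies.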
This updated conditional distribution is then used for future categorizations in an iterative process. Kleinschmidt and Jaeger (2011) effectively model the results of the behavioral study in Vroomen et al. (2007) in a Bayesian framework, with models fit to each participant capturing the perceptual recalibration and selective adaptation shown over the course of the experiment. The Bayesian belief updating framework has only been applied to the visually-guided perceptual learning paradigm.

A similar, but more general Bayesian framework for perception and action in cognition is the predictive coding model (Clark, 2013), schematized in Figure 1.2. This framework uses a hierarchical generative model that aims to minimize prediction error between bottom-up sensory inputs and top-down expectations. Mismatches between the top-down expectations and the bottom-up signals generate error signals that are used to modify future expectations. Perceptual learning is the result of modifying expectations to match learned input and reduce future error signals. The lowest levels of the hierarchical model have the most detailed representations. Representations lose detail and become more abstract the higher in the hierarchy they are.

Figure 1.2: Schema of the predictive coding framework adapted from Clark (2013). Representations are hierarchical and are more abstract the higher they are in the hierarchy. Blue arrows represent expectations, red arrows are error signals, and yellow is the actual sensory input.

The predictive coding framework is adopted for this thesis rather than theoretical models of speech perception because representations in predictive coding can be more general than linguistic objects. Models of speech perception – understandably – focus primarily on the linguistic representations. Representations are usually proposed to encode both abstract and episodic information (e.g. McLennan et al., 2003).
Abstract and episodic information would map to higher and lower levels of a representation in the predictive coding framework, respectively. However, a representation in the predictive coding framework is not limited to the linguistic domain. For instance, individual speakers can be thought of as part of more abstract accent representations. If a listener is exposed to multiple speakers of an accent, they are better at understanding novel speakers of that accent than a listener exposed to just one speaker of that accent (Bradlow and Bent, 2008). Predictions for sensory input when listening to the novel speaker would then be shaped by expectations based on the abstract accent as well as the linguistic representations generally assumed.

Perceptual learning in the psychophysics literature has shown a large degree of exposure-specificity, where observers show learning effects only on the same or very similar stimuli as those they were trained on. As such, perceptual learning has been argued to reside in or affect the early sensory pathways, where stimuli are represented with the greatest detail (Gilbert et al., 2001). Visually-guided perceptual learning has also shown a large degree of exposure-specificity, where participants do not generalize cues across speech sounds (Reinisch et al., 2014) or across speakers unless the sounds are sufficiently similar (Eisner and McQueen, 2005; Kraljic and Samuel, 2005, 2007; Reinisch and Holt, 2013). Crucially, lexically-guided perceptual learning in speech has shown a greater degree of generalization than what would be expected from a purely psychophysical standpoint. The testing stimuli are in many ways quite different from the exposure stimuli. Participants are trained on multisyllabic words ending in an ambiguous sound, but tested on monosyllabic words (Reinisch et al., 2013) and nonwords (Norris et al., 2003; Kraljic and Samuel, 2005).
In these cases, generalization is robust; however, some exposure-specificity has been found when exposure and testing use different positional allophones (Mitterer et al., 2013).

Why is lexically-guided perceptual learning more context-general? The experiments performed in this dissertation provide evidence that this context-generality is the result of a listener’s attentional set, which can be influenced by linguistic, instructional, or stimulus properties. A comprehension-oriented attentional set, where a listener’s goal is to understand the meaning of speech, promotes generalization and leads to greater perceptual learning. A purely perception-oriented attentional set, where a listener’s goal is to perceive specific qualities of a signal, does not promote generalization. The attentional set promoted in the experiments in this dissertation is comprehension-oriented, as the tasks are lexically guided. However, perception-oriented attentional sets will be promoted in some conditions. I predict that perceptual learning will be present across all conditions, but participants in conditions that promote perception-oriented attentional sets should show smaller perceptual learning effects. In terms of a Bayesian framework with error propagation, a more perception-oriented attentional set would keep error propagation more local, resulting in the exposure-specificity seen more in the psychophysics literature and the visually-guided paradigm. A more comprehension-oriented attentional set would propagate errors farther upward to more abstract representations. In both cases, errors would propagate to where attention is focused, but more abstract representations would be more applicable to novel contexts, leading to the observed context-general perceptual learning.
These attentional sets will be explored in more detail in Section 1.3 following an examination of the linguistic factors that will be manipulated in the experiments in Chapters 2 and 3.

1.2 Linguistic expectations and perceptual learning

The linguistic manipulations used to induce different attentional sets are lexical bias and semantic predictability. Chapter 2 presents two experiments using a standard lexically-guided perceptual learning paradigm, which uses lexical bias as the means to link an ambiguous sound to an unambiguous category. In Chapter 3, a novel paradigm is used to further promote use of comprehension-oriented attentional sets. This paradigm embeds words in sentences that differ in their semantic predictability.

1.2.1 Lexical bias

Lexical bias is the primary way through which perceptual learning is induced in the experimental speech perception literature. Lexical bias, also known as the Ganong Effect, refers to the tendency for listeners to interpret a speaker’s (noncanonical) production as a meaningful word rather than a nonsense word. For instance, given a continuum from a word to a nonword that differs only in the initial sound (e.g., task to dask), listeners are more likely to interpret any step along the continuum as the word endpoint rather than the nonword endpoint as compared to a continuum where neither endpoint is a word (Ganong, 1980). This bias is exploited in perceptual learning studies to allow for noncanonical, ambiguous productions of a sound to be readily linked to pre-existing sound categories. In terms of the attentional sets proposed in this dissertation, lexical effects, including lexical bias, arise due to comprehension-oriented attentional sets.

Comprehension-oriented attentional sets are not limited to just comprehension tasks. Lexical effects have also been found for reaction time in phoneme detection tasks: sounds are detected faster in words than in nonwords. However, such lexical effects are dependent on the attentional set being employed.
If the stimuli are sufficiently repetitive (e.g. all having the same CV shape) the lexical bias effects disappear (Cutler et al., 1987). The monotony or variation of filler items is sufficient to bias listeners towards perception-oriented or comprehension-oriented attentional sets, respectively. The lexical status of the filler items contributes to attentional set adoption as well. Lexical effects are found when the proportion of word fillers is high, but disappear when the proportion of nonword fillers is high (Mirman et al., 2008).

The lexical effect of interest in this dissertation is lexical bias. The degree to which a word biases the perception of a sound is primarily determined by properties of the word. Longer words show stronger lexical bias than shorter words (Pitt and Samuel, 2006). Continua formed using trisyllabic words, such as establish and malpractice, were found to show consistently larger lexical bias effects than monosyllabic words, such as kiss and fish. Pitt and Samuel (2006) also found that lexical bias from trisyllabic words was robust across experimental conditions (e.g., compressing the durations by up to 30%), but lexical bias from monosyllabic words was more fragile and condition dependent. The lexical bias effects shown by monosyllabic words only approached those of trisyllabic words when the participants were told to keep response times within a certain margin and were given feedback when the response time fell outside the desired range. The reaction time monitoring could have added a greater cognitive load for participants, which has been shown to increase lexical bias effects (Mattys and Wiget, 2011). Pitt and Samuel (2006) argue that longer words exert stronger lexical bias because they contain more conditioning information, and because shorter words face greater lexical competition.

Within a given word, different positions have stronger or weaker lexical bias effects.
Pitt and Szostak (2012) used a lexical decision task with a continuum of fricatives from /s/ to /ʃ/ embedded in words that differed in the position of the sibilant. They found that ambiguous fricatives later in the word, such as establish or embarrass, show greater lexical bias effects than the same ambiguous fricatives embedded earlier in the word, such as serenade or chandelier. Pitt and Samuel (1993) found that for monosyllabic words, token-final targets produce more robust lexical bias effects than token-initial targets. Lexical bias is strengthened over the course of the word. As a listener hears more evidence for a particular word, their expectations for hearing the rest of that word increase.

One final research paradigm that has investigated lexical biases is the phoneme restoration task (Samuel, 1981). In this paradigm, listeners hear words with noise added to or replacing sounds and are asked to identify whether noise completely replaced part of the speech or noise was simply added to the speech. Lower sensitivity to noise addition versus noise replacement and increased bias for responding that noise has been added are indicative of phoneme restoration – that is, listeners are perceiving sounds not physically present in the signal. Samuel (1981) identified several factors that increase the likelihood of the phoneme restoration effect. In the lexical domain, words are more likely than nonwords to have phoneme restorations. More frequent words are also more likely to exhibit phoneme restoration effects, and longer words also show greater phoneme restoration effects. The position of the sound in the word also influences listeners’ decisions, with non-initial positions showing greater effects.
The other influences on phoneme restoration discussed in Samuel (1981), namely the signal properties and sentential context, are discussed in subsequent sections.

Lexical bias effects are found across a wide range of tasks involving speech. However, lexical bias effects are not uniform across the word. Various models of lexical access give a large role to the initial sounds in the word (Marslen-Wilson and Welsh, 1978; Gow and Gordon, 1995). In such models, initial perception of sounds plays a disproportionate role in providing lexical identity, allowing later sounds to be perceived in reference to the initially perceived lexical identity. In spontaneous speech, sounds are more likely to be produced canonically in early positions than in later positions (Pitt and Szostak, 2012). In terms of the predictive coding framework (Clark, 2013), decreased lexical bias would result from the lack of (or decreased) expectations from higher levels of representation.

Lexical effects are generally argued to be the result of attentional sets rather than the cause (Cutler et al., 1987; Pitt and Szostak, 2012). I propose that ambiguous sounds that are salient, in this case due to their position in the word, cause listeners to adopt a more perception-oriented attentional set over the course of the experiment. Under this proposal, increasing the perceptual salience of a few ambiguous sounds is functionally equivalent to having many ambiguous sounds that are not as salient. That is, having some number of modified /s/ tokens at the beginnings of words is similar to having a larger number of modified /s/ tokens at the ends of words (with the same overall number of trials) or modified /s/ tokens that are less typical of /s/. In both cases, the likelihood of the participant noticing the ambiguous sound is higher, and so too is their likelihood of adopting a more perception-oriented attentional set for completing the task.
The experiments in this dissertation do not fully test this proposal, as the number of exposure tokens is never manipulated. However, it does predict that less typical modified sounds (discussed in Section 1.4) embedded later in words should produce perceptual learning effects comparable to those of more typical modified sounds at the beginnings of words. It is important to note that perceptual learning is predicted to occur regardless of exposure location, following previous research (Norris et al., 2003; Kraljic and Samuel, 2005; Kraljic et al., 2008a,b; Clare, 2014), but the degree of perceptual learning is predicted to be smaller when exposure is at the beginnings of words. Table 1.1 lists the predicted perceptual learning effects for the linguistic manipulations and instruction. Experiments 1 and 2 in Chapter 2 test the predictions related to lexical bias.

Table 1.1: Summary of predictions for size of perceptual learning effects under different linguistic expectations and attention.

                    Lexical bias                    Semantic predictability
                    Word-initial    Word-medial     Unpredictable   Predictable
Regular attention   Smaller effect  Larger effect   Larger effect   Largest effect
Attention to /s/    Smaller effect  Smaller effect  Smaller effect  Smaller effect

1.2.2 Semantic predictability

In addition to lexical bias, semantic predictability (Kalikow et al., 1977) can increase the predictability of modified sound categories by increasing the predictability of the word containing them. Sentences are semantically predictable when the words prior to the final word point almost definitively to the identity of that final word. For instance, the sentence fragment The cow gave birth to the... from Kalikow et al. (1977) is almost guaranteed to be completed with the word calf. On the other hand, a fragment like She is glad Jane called about the... is far from having a guaranteed completion beyond being a noun.
Despite its name, semantic predictability does not incorporate formal semantic theory, but refers to world knowledge that language users have.

Words that are predictable from context are temporally and spectrally reduced compared to words that are less predictable (Scarborough, 2010; Clopper and Pierrehumbert, 2008). Despite this acoustic reduction, highly predictable sentences are generally more intelligible. Sentences that form a semantically predictable whole have higher word identification rates across varying signal-to-noise ratios (Kalikow et al., 1977) in both children and adults (Fallon et al., 2002), and across native monolingual and early (but not late) bilingual listeners (Mayo et al., 1997). Highly predictable sentences are more intelligible to native listeners in noise, even when signal enhancements are not made, but nonnative listeners require both signal enhancements and high predictability to see any benefit (Bradlow and Alexander, 2007). However, when words at the ends of predictive sentences are excised from their context, they tend to be less intelligible than words excised from non-predictive contexts (Lieberman, 1963).

Semantic predictability has similar effects to lexical bias on phoneme categorization (Connine, 1987; Borsky et al., 1998). In those studies, a continuum from one word to another, such as coat to goat, is embedded in a sentence frame that semantically coheres with one of the endpoints. The category boundary shifts based on the sentence frame. If the sentence frame cues the voiced stop, more of the continuum is subsequently categorized as the voiced stop and vice versa for the voiceless stop.

In the phoneme restoration paradigm, higher semantic predictability has been found to bias listeners toward perceptually restoring a sound (Samuel, 1981). This increased bias towards interpreting the stimulus as an intact word was also coupled with an increase in sensitivity between the two types of stimuli (i.e.
noise added to speech, speech replaced with noise), which Samuel (1981) suggests is the result of a lower cognitive load in predictable contexts. Later work has suggested that in cases of lower cognitive load, finer phonetic details are encoded (see also Mattys and Wiget, 2011).

To summarize, the literature on semantic predictability has shown effects largely similar to those of lexical bias in terms of how sounds are categorized and restored. From this, I hypothesize that increasing the expectations for a word through semantic predictability will promote a comprehension-oriented attentional set, as perception of the modified sound category will not be strictly necessary for comprehension. Listeners who are exposed to an /s/ category that is more /ʃ/-like only in words that are highly predictable from context are therefore predicted to show larger perceptual learning effects than listeners exposed to the same category only in words that are unpredictable from context. However, there may be an upper limit for listener expectations when both semantic predictability and lexical bias are high, as committing too much to a particular expectation could lead to garden path phenomena (Levy, 2008) or other misunderstandings. Table 1.1 lists the predicted perceptual learning effects for the linguistic manipulations and instruction. The effect of semantic predictability on perceptual learning is explicitly tested in Chapter 3.

1.3 Attention and perceptual learning

Attention is a large topic of research in its own right, and this section only reviews literature that is directly relevant to perceptual learning and this thesis. Attention has been found to have a role in perceptual learning in the psychophysics literature. Indeed, Gibson (1953) identifies attention as the sole prerequisite to perceptual learning. There is some evidence of short-lived perceptual learning without explicit attention (Watanabe et al., 2001), but the effects are not as robust as for attended perceptual learning.
Perceptual learning is not alone in requiring attention; learning statistical regularities in a speech stream likewise depends on auditory attention, either explicitly or through passive listening (Toro et al., 2005; Saffran et al., 1997, but see Finn et al., 2014). The model of attention used in this dissertation is that of attentional sets.

Attentional sets refer to the strategies that the perceiver uses to perform a task. The attentional sets widely used in the visual perception literature do not align completely with the notion of perception-oriented and comprehension-oriented attentional sets used here. For instance, in a visual search task, attending to color, orientation, motion, and size are the predominant strategies (Wolfe and Horowitz, 2004). However, some parallels are present. The two broad categories in the visual perception literature are focused and diffuse attentional sets. Focused sets direct attention to components of the sensory input, perceiving the trees instead of the forest. Diffuse sets direct attention to global properties of the sensory input, perceiving the forest instead of the trees. The two attentional sets employed in visual search paradigms introduced above – singleton-detection and feature-detection attentional sets – are diffuse and focused, respectively. Perception-oriented and comprehension-oriented attentional sets have also been referred to as focused and diffuse attentional sets in recent speech perception work. In Pitt and Szostak (2012), a diffuse attentional set is employed when distinguishing words from nonwords in a lexical decision task, and a focused attentional set is employed when participants’ attention is directed to a potentially misleading sound.
Attentional set selection is primarily affected by the instructions and the stimuli for the task, and attentional sets tend to become entrenched over time (Leber and Egeth, 2006).

Listeners can employ different attentional sets depending on the nature of the task, as well as other processing considerations. For instance, listeners can attend to particular syllables or sounds in syllable- or phoneme-monitoring tasks (Norris and Cutler, 1988, and others), and even particular linguistically relevant positions (Pitt and Samuel, 1990). However, even in these low-level, signal-based tasks, lexical properties of the signal can exhibit some influence if the stimuli are not monotonous enough to disengage comprehension (Cutler et al., 1987). Additionally, when performing a phoneme categorization task under higher cognitive load, such as performing a more difficult concurrent task, listeners show increased lexical bias effects (Mattys and Wiget, 2011). Stimulus variation in general seems to lead towards a more diffuse, comprehension-oriented attentional set, where the goal is firmly more comprehension-based than low-level perception-based. In comprehension-oriented tasks, such as a lexical decision task, explicit instructions can promote a more perception-oriented attentional set. When listeners are told that the speaker’s /s/ is ambiguous and to listen carefully to ensure correct responses, they are less tolerant of noncanonical productions across all positions in the word (Pitt and Szostak, 2012). That is, listeners whose attention is directed to the speaker’s sibilants are less likely to accept the modified production as a word than listeners given no particular instructions about the sibilants. While the primary task has a large influence on the type of attentional set adopted, other instructions and aspects of the stimuli can shift the listener’s attentional set toward another one.

Attentional sets have been found to affect what aspects of stimuli are perceptually learned in the visual domain.
Ahissar and Hochstein (1993) found that, in general, attending to global features for detection (i.e., discriminating different orientations of arrays of lines) does not make participants better at using local features for detection (i.e., detection of a singleton that differs in angle in the same arrays of lines), and vice versa. Perceptual learning in the visual domain is limited to the aspects of the stimuli to which participants were attending.

Attentional sets have not been directly manipulated in previous lexically-guided perceptual learning literature, but some work has been done on how individual differences in attention control can impact perceptual learning. Scharenborg et al. (2014) present a perceptual learning study of older Dutch listeners in the model of Norris et al. (2003). In addition to the exposure and test phases, these older listeners completed tests for high-frequency hearing loss, selective attention, and attention-switching control. Selective attention refers to the ability of the participants to focus on one element in a visual string to the exclusion of other (potentially distracting) elements, measured using the Flanker Test (Eriksen and Eriksen, 1974). Attention-switching control is measured using the Trail-Making Test (Reitan, 1958), where participants complete two tasks of connecting dots. In the first task, dots are numbered from 1 to 25 and the trail must go from 1 to 25 in order. In the second task, dots are labeled with either letters or numbers, and the trail must alternate between the two in ascending order (1-A-2-B, etc.).
The difference in completion time between these two tasks is indicative of a participant’s attention-switching control, with faster performance in the second task indicating better attention-switching control.

Scharenborg and colleagues found no evidence that perceptual learning was influenced by listeners’ hearing loss or selective attention abilities, but they did find a significant relationship between a listener’s attention-switching control and their perceptual learning. Listeners with worse attention-switching control showed greater perceptual learning effects, which the authors ascribed to an increased reliance on lexical information. Older listeners were shown to have smaller perceptual learning effects compared to younger listeners, but the differences were most prominent directly following exposure (Scharenborg and Janse, 2013). Younger listeners initially had a larger perceptual learning effect in the first block of testing, but the effect lessened over the subsequent blocks. Older listeners showed more consistent, but smaller perceptual learning effects, hypothesized to be due to greater prior experience. Scharenborg and Janse (2013) also found that participants who endorsed more of the target items as words in the exposure phase showed significantly larger perceptual learning effects in the testing phase.

There is evidence that attentional sets in the visual domain become entrenched over time (Leber and Egeth, 2006). However, the fact that attention-switching control in older adults was a significant predictor of the size of perceptual learning effects (Scharenborg et al., 2014) reinforces that comprehension and perception, as defined in this dissertation, are not mutually exclusive. These findings do suggest that attention can be switched between comprehension and perception, and that this switching has consequences for perceptual learning.
The lexical decision task is oriented towards comprehension, so the primary attentional set is likely to be a diffuse one relying more on lexical information than acoustic information. Participants with worse attention-switching control would have been less able to attend to the fine details of the signal than those with better attention-switching control, and it is precisely those with worse attention-switching control that showed the larger perceptual learning effects. The ability to attend to finer sensory representations could prevent error propagation to more abstract representations, leading to a smaller perceptual learning effect for participants with better attention-switching control.

Figure 1.3: A schema for predictive coding under a perception-oriented attentional set. Attention is represented by the pink box, where gain is enhanced for detection, but error signal propagation is limited to lower levels of sensory representation where the expectations must be updated. This is represented by the lack of pink nodes outside the attention box. As before, blue arrows represent expectations, red arrows represent error signals, and yellow represents the sensory input.

As stated above, Bayesian models account well for the results of perceptual learning experiments. Attentional sets are crucial to the hypothesis tested in this dissertation, but they do not play a role in the conceptual and computational models of perceptual learning. The predictive coding framework (Clark, 2013) provides a gain-based attentional mechanism. Gain is typically likened to increasing the volume. For instance, attending to a specific location on a screen has subjective effects similar to increasing contrast (Ling and Carrasco, 2006) and attending to speech from a single ear is subjectively similar to increasing the volume for that ear. In this model, attention causes greater weight to be attached to error signals from mismatched expectations and sensory input, increasing their effect on future expectations.
However, as noted by Block and Siegel (2013), this view of attention does not capture the full range of experimental results. For instance, in a texture segregation task, spatial attention to the periphery improves detection accuracy where spatial resolution is poor, but attention to central locations, where spatial resolution is high, actually harms accuracy (Yeshurun and Carrasco, 1998). This detrimental effect is an instance of missing the forest for the trees: spatial resolution increased too much in the central locations, and the fine details interfered with perceiving the larger texture. The attentional mechanism proposed in this dissertation limits error propagation beyond where attention is focused. Attending to perception rather than comprehension should only update expectations about perception of that individual instance. The lower sensory levels are where stimuli are represented with the greatest degree of detail (Gilbert et al., 2001). Perceptual learning at these lower levels should be more exposure-specific and less generalized than any learning that propagates to higher representational levels. Figures 1.3 and 1.4 show schemas of the proposed attention mechanism for updating future expectations under perception-oriented and comprehension-oriented attentional sets, respectively. In contrast, according to the mechanism proposed in Clark (2013), any increases in attention, perception-oriented or otherwise, are predicted to lead to greater perceptual learning.

1.4 Category typicality and perceptual learning

A primary finding across the perceptual learning literature is that learning effects are found only on testing items that are similar in some sense to the exposure items.
In the most extreme instance, perceptual learning is only found on the exact same items as exposure (Reinisch et al., 2014), but most commonly, perceptual learning is limited to items produced by the same speaker as the exposure items (Norris et al., 2003; Reinisch et al., 2013). However, a less studied question is what properties of the exposure items cause different degrees of perceptual learning. In this dissertation, two levels of category typicality are used. Figure 1.5 shows four categories. At the left and right ends are the original categories for /s/ and /ʃ/ as produced by a male Vancouver English speaker. The two categories in the middle are the modified categories used in Experiments 1 and 2. Experiment 1 uses a modified /s/ category that is halfway between the original /s/ and /ʃ/, while the modified category for Experiment 2 is skewed more towards /ʃ/. Thus, Experiment 2 uses a more atypical /s/ category (farther from the typical /s/ distribution) than Experiment 1. The more atypical category is predicted to be more salient, and therefore to promote a perception-oriented attentional set.

Figure 1.4: A schema for predictive coding under a comprehension-oriented attentional set. Attention is represented by the green box, where it is oriented to higher, more abstract levels of sensory representation. Error signals are able to propagate farther and update more than just the fine-grained low-level sensory representations. As before, blue arrows represent expectations, red arrows represent error signals, and yellow represents the sensory input.

Sumner (2011) investigated category typicality through a manipulation of presentation order. Listeners were exposed to French-accented English with modifications to the /b/-/p/ category boundary. Participants were exposed to stimuli ranging from English-like to French-like voice onset time for /b/ and /p/.
In one presentation order, the order of stimuli was random, but in the others the voice onset time changed in a consistent manner, such as starting as more French-like and becoming more English-like. The presentation order that showed the greatest perceptual learning effects was the one that began more English-like and ended more French-like. The condition that mirrored the more normal course of nonnative speaker pronunciation changes, starting as more French-like and ending as more English-like, did not produce significantly different behavior than control participants who only completed the categorization task. The random presentation order had perceptual learning effects in between the two ordered conditions.

Figure 1.5: Distribution of original categories (red endpoints) and modified /s/ categories used in Experiments 1 and 2. Experiment 1 uses a maximally ambiguous category between /s/ and /ʃ/. Experiment 2 uses a category that is more like /ʃ/ than /s/. The x-axis was generated using the acoustic similarities used to generate Figures 2.3 and 2.7.
[Figure 1.5 plot; categories from left to right: /s/, Exp 1, Exp 2, /ʃ/.]

These results suggest that listeners constantly update their category following each successive input, rather than only relying on initial impressions (contra Kraljic et al., 2008b). This finding is mirrored in Vroomen et al. (2007), where participants initially expand their category in response to a single, repeated modified input, but then entrench that category as subsequent input is persistently the same. The data in Vroomen et al. (2007) are modeled using a Bayesian framework with constant updating of beliefs. However, in Sumner (2011), the constantly shifting condition also shows more perceptual learning than a random order of the same stimuli, suggesting that small differences between expectations and observed input induce greater updating than large differences.
The bias towards small differences is better captured by the exemplar model proposed by Pierrehumbert (2001), where only input similar to the learned distribution is used for updating that distribution.

Variability is a fundamental property of the speech signal, so sound categories must have some variance associated with them, and certain contexts can have increased degrees of variability. For example, Kraljic et al. (2008a) exposed participants to ambiguous sibilants between /s/ and /ʃ/ in two different contexts. In one, the ambiguous sibilants were intervocalic, and in the other they occurred as part of a /stɹ/ cluster in English words. Participants exposed to the ambiguous sound intervocalically showed a perceptual learning effect, while those exposed to the sibilants in /stɹ/ environments did not. The sibilant in /stɹ/ often surfaces closer to [ʃ] in many varieties of English, due to coarticulatory effects from the other consonants in the cluster (Baker et al., 2011). Kraljic et al. (2008a) argue that the ambiguous sound is interpreted in the context of the surrounding sounds, and only when the pronunciation variant is unexplainable from context is the variant learned and attributed to the speaker (see also Kraljic et al., 2008b). In other words, a more /ʃ/-like /s/ category is typical in the context of the /stɹ/ clusters, but is atypical in intervocalic position. Interestingly, given the lack of learning in the /stɹ/ context, some degree of salience seems to be required to trigger perceptual learning.

Similarity of input to known distributions has effects in many psycholinguistic paradigms. For instance, in phoneme restoration, Samuel (1981) found that the likelihood of restoring a sound increases when that sound is acoustically similar to the noise replacing it. When the replacement noise is white noise, fricatives and stops are more likely to be restored than vowels and liquids.
Acoustic signals that better match expectations are less likely to be noticed as atypical.

In this dissertation, the degree of typicality of the modified category is manipulated across Experiments 1 and 2. In one case, the /s/ category for the speaker is maximally ambiguous between /s/ and /ʃ/, but in the other, the category is more like /ʃ/ than /s/. The maximally ambiguous category is hypothesized to be less salient than the more /ʃ/-like /s/ category. This lessened salience will result in greater use of comprehension-oriented attentional sets. I hypothesize that the more /ʃ/-like category will shift listeners’ attentional sets to be more perception-oriented due to its greater atypicality, which will lead to less generalized perceptual learning. The predictions for this hypothesis are summarized in Table 1.2.

Table 1.2: Summary of predictions for size of perceptual learning effects when exposed to different typicalities of the modified category.

                    Lexical bias
                    Word-initial                      Word-medial
                    Less atypical    More atypical    Less atypical    More atypical
Regular attention   Smaller effect   Smaller effect   Larger effect    Smaller effect
Attention to /s/    Smaller effect   Smaller effect   Smaller effect   Smaller effect

1.5 Current contribution

Lexically-guided perceptual learning generalizes to new forms and contexts far more than would be expected from a purely psychophysical perspective (Norris et al., 2003; Gilbert et al., 2001). Lexically-guided paradigms focus on comprehension and psychophysics tasks focus on perception, promoting the respective attentional sets. Indeed, visually-guided perceptual learning, with its emphasis on perception of speech, shows exposure-specificity effects largely similar to the psychophysics findings (Reinisch et al., 2014). This dissertation expands on the existing literature by modifying the exposure tasks to promote comprehension- or perception-oriented attentional sets.
Perceptual learning effects are hypothesized to be smaller in the conditions that promote perception-oriented attentional sets, as perception exposure tasks have shown greater exposure-specificity effects than comprehension exposure tasks.

Chapter 2
Lexical decision

2.1 Motivation

The experiments in this chapter implement a standard lexically-guided perceptual learning experiment with exposure to a modified /s/ category during a lexical decision task. Because the exposure task is one of word recognition, participants are predicted to default to a comprehension-oriented attentional set. Recall that comprehension-oriented attentional sets are hypothesized to facilitate perceptual learning and generalization. Two experimental manipulations guide listeners to use more of a perception-oriented attentional set. The first manipulation relates to the position of the modified /s/ category in the exposure tokens (silver versus carousel). Lexical bias effects increase as the length of the word increases and as the word unfolds (Pitt and Samuel, 2006; Pitt and Szostak, 2012), so I predict that more learning will take place in carousel-like words. The second manipulation is through explicit instructions about the modified /s/ category. Such instructions have been shown to reduce lexical bias effects in lexical decision tasks (Pitt and Szostak, 2012), thus I predict a reduction in learning when attention is drawn to speech sounds. Both of these manipulations are present in Experiments 1 and 2.

Experiments 1 and 2 differ in the atypicality of the modified /s/ category. Studies have reported greater perceptual learning when ambiguous stimuli are closer to the distribution expected by a listener than when the ambiguous stimuli are farther away from expected distributions (Sumner, 2011). Words containing stimuli farther away from the target production are in general less likely to be endorsed as words, but similar effects of attention are found across word position (Pitt and Szostak, 2012).
Experiment 2 contains the same manipulations to attention and lexical bias as Experiment 1, but with ambiguous stimuli farther from the target production than those used in Experiment 1. Lower rates of generalized perceptual learning are predicted for all conditions in Experiment 2.

The hypothesis of this dissertation is that the greatest perceptual learning effects should be observed when no attention is directed to the ambiguous sounds and when lexical bias is maximized. In such a case, participants should use a comprehension-oriented attentional set. If selective attention is directed to the ambiguous sounds, a more perception-oriented attentional set should be adopted, with less generalization in perceptual learning as a result. Likewise, if the ambiguous sound is in a linguistically salient position with little to no lexical bias, a listener’s attention should be drawn to the ambiguous sound, causing adoption of a more perception-oriented attentional set. Finally, if the ambiguous sounds are more atypical, they should be more salient to listeners regardless of lexical bias, leading again to a more perception-oriented attentional set. Regardless of the cause, adopting a perception-oriented attentional set is predicted to inhibit a generalized perceptual learning effect.

2.2 Experiment 1

In this experiment, listeners are exposed to ambiguous productions of words containing a single instance of /s/, where the /s/ has been modified to sound more /ʃ/-like. Exposure comes in the guise of a lexical decision task. In one group, the critical words have an /s/ in word-initial position (e.g., cement), with no /ʃ/ neighbor (a word that differs only in the sibilant; e.g., shement); this is referred to as the Word-initial condition. In the other group, the critical words have an /s/ in word-medial position (tassel) with no /ʃ/ neighbor (tashel); this is referred to as the Word-medial condition.
In addition, half of each group is given instructions that the speaker has an ambiguous /s/ and to listen carefully, following Pitt and Szostak (2012).

2.2.1 Methodology

Participants

A total of 173 participants from the UBC population completed the experiment and were compensated with either $10 CAD or course credit.[1] The data from 77 nonnative speakers of English and two native speakers of English with reported speech or hearing disorders were excluded from the analyses. This left data from 94 participants for analysis. Twenty additional native English speakers participated in a pretest to determine the most ambiguous sounds. Twenty-five other native speakers of English participated for course credit in a control experiment.

[1] The student population of the University of British Columbia has a diverse language background. In order to control for the language background of participants and to make the results of the current experiments more comparable to previous research, participants were only analyzed if they were self-reported native speakers of English. Participants were still compensated for their participation, but the data are currently unanalyzed.

Materials

One hundred and twenty English words and 100 phonologically-legal nonwords were used as exposure materials. The set of words consisted of 40 critical items, 20 control items, and 60 filler words. Filler words and nonwords are listed in Tables 2.1 and 2.2, respectively. The control words containing /ʃ/ are given in Table 2.3. Half of the critical items had an /s/ in the onset of the first syllable (Word-initial) and half had an /s/ in the onset of the final syllable (Word-medial). All critical tokens formed nonwords if their /s/ was replaced with /ʃ/. Half the control items had an /ʃ/ in the onset of the first syllable and half had an /ʃ/ in the onset of the final syllable. Each critical item and control item contained just the one sibilant, with no other /s z ʃ ʒ tʃ dʒ/. Filler words and nonwords did not contain any sibilants. Frequencies and number of syllables across item types are in Table 2.4.

Table 2.1: Filler words used in all experiments.

acorn        acrobat    antenna    apple      balloon   bamboo
buckle       butterfly  cabin      calendar   camel     campfire
candy        cockpit    collar     cowboy     cradle    cutlery
darkroom     diamond    doorbell   dryer      elephant  feather
fingerprint  garlic     goalie     gondola    graffiti  helicopter
ladder       ladle      librarian  lightning  lumber    mannequin
meadow       microwave  minivan    motel      movie     mural
napkin       omelet     painter    piano      ponytail  popcorn
referee      table      tadpole    teapot     theatre   tire
tortilla     tractor    traffic    tunnel     umbrella  weatherman

Table 2.2: Filler nonwords used in Experiments 1 and 2.

apolm      arafimp     arnuff        balrop    bambany      bawapeet
bettle     bimobel     bipar         blial     brahata      danoor
darnat     deoma       follipocktel  foter     gallmit      gamtee
ganla      gippelfraw  giptern       gittle    glaple       golthin
goming     gompy       gorder        hagrant   hammertrent  hintarber
hovear     iddle       iglopad       igoldion  impomo       inoret
kempel     kimmer      kire          klogodar  kowack       lefeloo
lindel     mogmet      mopial        motpem    namittle     nartomy
nepow      neproyave   nidol         noler     nometin      nonifem
omplero    pammin      peltlon       pickpat   pidbar       pluepelai
poara      poltira     pomto         potha     prickpor     prithet
radadub    rigloriem   rinbel        rindner   ripnem       roggel
roppet     rudle       talell        talot     tankfole     tayade
teerell    tello       tepple        teygot    theely       theerheb
thorkwift  thragkole   timmer        tingora   tinogail     tirack
tirrenper  tovey       toygaw        tuckib    tuddom       tutrewy
wapteep    wekker      wogim         yovernon
Frequencies and number of syllables across item types are in Table 2.4Table 2.3: Words containing /S/ in all brochure cashier chandeliercushion eruption hibernation parachutepatient shadow shampoo shareholdershelter shiny shoplifter shouldershovel sugar tissue usherFour monosyllabic minimal pairs were selected as test items for categorization.These minimal pairs differed only in the voiceless sibilant at the beginning of the30Table 2.4: Mean and standard deviations for frequencies (log frequency permillion words in SUBTLEXus) and number of syllables of each item typeItem type Frequency Number of syllablesFiller words 1.81 (1.05) 2.4 (0.55)/s/ Word-initial 1.69 (0.85) 2.4 (0.59)/s/ Word-medial 1.75 (1.11) 2.3 (0.47)/S/ Word-initial 2.01 (1.17) 2.3 (0.48)/S/ Word-medial 1.60 (1.12) 2.4 (0.69)word (sack-shack, sigh-shy, sin-shin, and sock-shock). Two of the pairs had ahigher log frequency per million words (LFPM) from SUBTLEXus (Brysbaert andNew, 2009) for the /s/ word and two had higher LFPM for the /S/ word, as shownin Table 2.5.Table 2.5: Frequencies (log frequency per million words in SUBTLEXus) ofwords used in categorization continuaContinuum /s/-word frequency /S/-word frequencysack-shack 1.11 0.75sigh-shy 0.53 1.26sin-shin 1.20 0.48sock-shock 0.95 1.46All words and nonwords were recorded by a male Vancouver English speakerin a quiet room. Critical words for the exposure phase were recorded in pairs, oncenormally and once with the sibilant swapped forming a nonword. The speaker wasinstructed to produce both forms with comparable speech rate, speech style, andprosody.For each critical item, the word and nonword versions were morphed togetherin an 11-step continuum (0%-100% of the nonword /S/ recording, in steps of 10%)using STRAIGHT (Kawahara et al., 2008) in Matlab. Prior to morphing, the wordand nonword versions were time aligned based on acoustic landmarks, such as stopbursts, onset of F2, nasalization or frication, etc. 
All control items and filler words were processed and resynthesized by STRAIGHT to ensure a consistent quality across stimulus items.

Pretest

To determine which step of each continuum would be used in exposure, a phonetic categorization experiment was conducted. Participants were presented with each step of each exposure word-nonword continuum and each categorization minimal pair continuum, resulting in 495 trials (40 exposure words plus five minimal pairs by 11 steps) for each listener, blocked into exposure and categorization. Participants completed a lexical decision task for the exposure continua, responding with either “word” or “nonword” to each step of the continua. For the categorization continua, participants identified the first sound as either “s” or “sh”. The experiment was implemented in E-Prime (Psychology Software Tools, 2012).

Figure 2.1: Proportion of word-responses for Word-initial exposure words. Solid lines represent Experiment 1 selection criteria (50% word-response rate) and dashed lines represent Experiment 2 selection criteria (30% word-response rate). Dots are averaged word-response across subjects, and the blue line is a binomial model of the responses.
[Figure 2.1 plot: one panel per Word-initial exposure word; x-axis: Step number; y-axis: Proportion /s/ response.]

Table 2.6: Step chosen for each Word-initial stimulus in Experiment 1 and the proportion /s/ response in the pretest.

Word        Step chosen   Proportion /s/ response
ceiling     7             0.40
celery      7             0.30
cement      7             0.26
ceremony    7             0.44
saddle      8             0.25
safari      6             0.45
sailboat    7             0.35
satellite   7             0.45
sector      6             0.39
seminar     7             0.33
settlement  7             0.42
sidewalk    7             0.30
silver      7             0.21
socket      7             0.30
sofa        7             0.26
submarine   7             0.45
sunroof     6             0.39
surfboard   7             0.59
syrup       6             0.37
Average     6.8           0.36

The proportion of /s/-responses (or word responses for exposure items) at each step of each continuum was calculated and the most ambiguous step chosen. The threshold for the ambiguous step for Experiment 1 was the point at which the percentage of /s/-responses dropped to near 50%. The lists of steps chosen for Word-initial and Word-medial target stimuli are in Table 2.6 and Table 2.7, respectively. For the minimal pairs, six steps surrounding the 50% cross-over point were selected for use in the phonetic categorization task. Due to experimenter error, the continuum for seedling was not included in the stimuli, so the chosen step was the average chosen step for the /s/-initial words. The average step chosen for Word-initial /s/ words was 6.8 (SD = 0.5), and for Word-medial /s/ words the average step was 7.7 (SD = 0.8).

Figure 2.2: Proportion of word-responses for Word-medial exposure words. Solid lines represent Experiment 1 selection criteria (50% word-response rate) and dashed lines represent Experiment 2 selection criteria (30% word-response rate). Dots are averaged word-response across subjects, and the blue line is a binomial model of responses.
[Figure 2.2 plot: one panel per Word-medial exposure word (carousel, castle, concert, croissant, currency, cursor, curtsy, dancer, dinosaur, faucet, fossil, galaxy, medicine, missile, monsoon, pencil, pharmacy, tassel, taxi, whistle); x-axis: Step number; y-axis: Proportion /s/ response.]

To visualize the effect of morphing on the acoustics of the sibilants and to confirm the desired effects, a multidimensional scaled plot of acoustic distance was constructed, similar to Mielke (2012). Using the python-acoustic-similarity package (McAuliffe, 2015), sibilants were transformed into arrays of mel-frequency cepstral coefficients (MFCCs), which are an auditory representation of acoustic waveforms. Pairwise distances between each sibilant production were computed via dynamic time warping to create a distance matrix of the sibilant productions. The dynamic time warping algorithm aligns time frames that are similar while allowing for time to be compressed or expanded for one of the productions. The distance returned is independent of differences in timing, but differences in order of frames are maintained. This distance matrix from the pairwise calculations was then multidimensionally scaled to produce Figure 2.3.
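The dynamic time warping step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation in the python-acoustic-similarity package: the function names `dtw_distance` and `distance_matrix` are hypothetical, frames are compared with Euclidean distance, and the short two-coefficient "tokens" are toy stand-ins for real MFCC arrays. The resulting symmetric matrix is the kind of input a multidimensional scaling routine would take.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of frames.

    Each frame is a list of coefficients (toy stand-ins for MFCC frames).
    Frames may be repeated or skipped over in time, but never reordered.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # acc[i][j] = cheapest cost of aligning the first i frames of a
    # with the first j frames of b.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            frame_cost = math.dist(a[i - 1], b[j - 1])
            acc[i][j] = frame_cost + min(
                acc[i - 1][j],      # stretch b (repeat its frame)
                acc[i][j - 1],      # stretch a (repeat its frame)
                acc[i - 1][j - 1],  # advance both sequences
            )
    return acc[n][m]


def distance_matrix(tokens):
    """Symmetric matrix of pairwise DTW distances between tokens."""
    size = len(tokens)
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1, size):
            matrix[i][j] = matrix[j][i] = dtw_distance(tokens[i], tokens[j])
    return matrix


# Toy tokens (hypothetical values, two coefficients per frame).
s_like = [[0.0, 1.0], [0.0, 2.0], [0.0, 3.0]]
s_slow = [frame for frame in s_like for _ in range(2)]  # same frames, half speed
s_reversed = list(reversed(s_like))                     # same frames, other order
sh_like = [[5.0, 1.0], [5.0, 2.0], [5.0, 3.0]]

matrix = distance_matrix([s_like, s_slow, s_reversed, sh_like])
```

In this toy case the slowed-down token is at distance zero from the original, while the reversed token is not, which mirrors the statement that timing differences are ignored but frame order is preserved.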
As seen in Figure 2.3, the original, unsynthesized productions (in blue) form four quadrants based on the two principal components of the distance matrix. The first dimension is associated with the centroid frequency of the sibilant, separating /s/ tokens from /ʃ/. The second dimension separates out the word-medial sibilants (in smaller font) from the word-initial sibilants (in larger font), likely due to the different coarticulatory effects based on word position. The categorization tokens (all word-initial) predictably occupy the space between the word-initial /s/ tokens and the word-initial /ʃ/ tokens. The exposure tokens pattern as expected. Exposure /ʃ/ tokens overlap with the original distributions for /ʃ/ tokens. Exposure /s/ tokens are in between /s/ and /ʃ/, though word-medial /s/ tokens are closer to the original /ʃ/ distribution, reflecting the difference in average stimuli step chosen in Tables 2.6 and 2.7.

Table 2.7: Step chosen for each Word-medial stimulus in Experiment 1 and the proportion /s/ response in the pretest.

Word       Step chosen   Proportion /s/ response
carousel   7             0.45
castle     7             0.50
concert    7             0.53
croissant  7             0.42
currency   7             0.58
cursor     8             0.53
curtsy     8             0.40
dancer     7             0.45
dinosaur   7             0.50
faucet     7             0.45
fossil     8             0.30
galaxy     9             0.47
medicine   8             0.55
missile    10            0.30
monsoon    8             0.42
pencil     7             0.45
pharmacy   8             0.42
tassel     8             0.35
taxi       8             0.50
whistle    7             0.58
Average    7.7           0.45

Figure 2.3: Multidimensional scaling of the acoustic distances between the sibilants of original productions, categorization tokens and the exposure tokens in Experiment 1.
Categorization and exposure tokens were synthesized from the original productions using STRAIGHT (Kawahara et al., 2008).
[Figure 2.3 plot: x-axis: First principal component; y-axis: Second principal component; legend "Experiment": Categorization, Exposure, Original; legend "Exposure Type": Word-medial, Word-initial.]

Experiment design

Participants were assigned to one of four groups from a 2x2 between-subject factorial design. The first factor was the position of the ambiguous sibilant in the exposure words (Exposure Type: Word-initial versus Word-medial) and the second factor was whether participants were given additional instructions about the sibilant (Attention: Attention versus No Attention). Two of the groups of participants were exposed to only critical items that began with /s/ (Word-initial) and the other two were exposed to only critical items that had an /s/ in the onset of the final syllable (Word-medial). This gave a consistent 200 trials in all exposure phases with identical control and filler items for all participants. Participants in half the groups (Attention) received additional instructions that the speaker’s “s” sounds were sometimes ambiguous, and to listen carefully to ensure correct responses in the lexical decision. Participants in the control experiment completed only the categorization task.

Procedure

Participants in the experimental conditions completed an exposure task and a categorization task in E-Prime (Psychology Software Tools, 2012). Exposure was a lexical decision task, where participants heard auditory stimuli and were instructed to respond with either “word” if they thought what they heard was a word or “nonword” if they did not think it was a word. The buttons corresponding to “word” and “nonword” were counterbalanced across participants.
Trial order was pseudorandom. Stimuli containing sibilants (/s/ or /ʃ/) did not appear in the first six trials or next to each other (following Reinisch et al., 2013). For each trial, a blank screen was shown for 500 ms, and then the two responses and their corresponding buttons on the button box were shown (e.g., “word” and “1” on one side of the screen and “nonword” and “5” on the other side of the screen). The auditory stimulus was played 500 ms following the presentation of the response options. Participants had 3000 ms from the onset of the auditory stimulus to respond. Participants were given feedback about whether a response was detected in the 3000 ms window. This feedback did not include accuracy or response time information and was shown for 500 ms before the following trial began. Every 50 trials participants were given a break, and the next trial did not start until the participant pressed a button.

In the categorization task, participants heard an auditory stimulus and had to categorize it as one of two words differing only in the onset sibilant, e.g., sin or shin. The buttons corresponding to the words were counterbalanced across participants. The six most ambiguous steps of the four minimal pair continua were used with seven repetitions each, giving a total of 168 trials. Each trial proceeded similarly to exposure. A blank screen was displayed for 500 ms, followed by the response screen for 500 ms (e.g., “sin” and “1” on one side, “shin” and “5” on the other) before the auditory stimulus was presented. Participants had 3000 ms from the onset of the auditory stimulus to respond, and feedback about whether the response was detected was shown for 1500 ms. Participants were given a break every 40 trials, except after 160 trials, as that would leave eight trials in the rest of the experiment.

To remove experimenter interaction between exposure and categorization, participants were given oral instructions explaining both tasks at the beginning of the experiment.
Written instructions were presented to participants at the beginning of each task as well. The instructions for the exposure task given to participants assigned to an Attention condition included explicit reference to the modified sibilants. Participants were told that “this speaker’s ‘s’ sound is sometimes ambiguous” and instructed to “listen carefully so as to choose the correct response.”

Analysis

Perceptual learning effects are assessed through logistic mixed-effects models of the categorization task data. Responses were coded as 1 for /s/ responses and 0 for /ʃ/ responses. Positive significant estimates therefore indicate higher likelihood of /s/ response across categorization. Thus, positive significant effects are indicative of perceptual learning, as higher likelihood of /s/ response is associated with an expanded /s/ category.

Deviance contrast coding is used for all two-level independent variables, so the intercept of the model represents the grand mean. Main effects for factors are calculated with other factors held at their average value, rather than at an arbitrary reference level. For any factors that have three levels, treatment (dummy) contrasts are used with an appropriate reference level to aid interpretation. Numeric independent variables were centered prior to inclusion in models. Although continuum step is discrete (i.e., Step 1 and Step 2, but no intermediate tokens), it is entered as a numeric variable in the models to reduce the complexity of models. Graphs of categorization results show continuum step as a categorical factor to aid interpretation.

For categorization models, Continuum was a random effect. However, there were only four minimal pair continua used in the categorization, so the random effect status may not be warranted. The estimates for the continua effects are likely not very reliable, but differences between continua are not the principal question being investigated.
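The effect of deviance contrast coding and centering described in this section can be illustrated with a small sketch. It works on a linear scale with one value per cell of a balanced 2x2 design, which is a simplifying assumption; the models in this chapter are logistic mixed-effects models. The ±0.5 codes follow the coding reported for Exposure Type and Attention, whereas the cell values and the helper `orthogonal_coefficient` are hypothetical and chosen only for illustration.

```python
# Deviance contrast codes for the two between-subject factors.
EXPOSURE_CODES = {"Word-initial": 0.5, "Word-medial": -0.5}
ATTENTION_CODES = {"No attention": 0.5, "Attention": -0.5}

# Hypothetical cell values (log-odds of an /s/ response); NOT experimental data.
cells = {
    ("Word-initial", "No attention"): 0.6,
    ("Word-initial", "Attention"): 0.4,
    ("Word-medial", "No attention"): 1.2,
    ("Word-medial", "Attention"): 0.6,
}


def orthogonal_coefficient(codes, values):
    """Least-squares coefficient for one predictor column.

    Valid here because, in a balanced design with +/-0.5 codes, the
    intercept, main effect, and interaction columns are mutually orthogonal.
    """
    return sum(c * v for c, v in zip(codes, values)) / sum(c * c for c in codes)


exposure = [EXPOSURE_CODES[e] for e, _ in cells]
attention = [ATTENTION_CODES[a] for _, a in cells]
values = list(cells.values())

# The intercept is the grand mean of the four cells.
intercept = orthogonal_coefficient([1.0] * len(values), values)
# Each main effect is a difference between marginal means,
# averaged over the levels of the other factor.
exposure_effect = orthogonal_coefficient(exposure, values)
attention_effect = orthogonal_coefficient(attention, values)
interaction = orthogonal_coefficient(
    [e * a for e, a in zip(exposure, attention)], values
)

# Centering a numeric predictor: the six continuum steps become -2.5 ... 2.5.
steps = [1, 2, 3, 4, 5, 6]
mean_step = sum(steps) / len(steps)
centered_steps = [s - mean_step for s in steps]
```

With treatment (dummy) coding, by contrast, the intercept would equal the value of the reference cell rather than the grand mean, which is why treatment coding is reserved here for three-level factors with a meaningful reference level.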
Use of a by-Continuum random effect structure with maximal random slopes allowed for estimation of fixed effects that are not driven by one particular minimal pair continuum.

2.2.2 Results

Control experiment

Responses with reaction times less than 200 ms or greater than 2500 ms were excluded from analyses (following Reinisch et al., 2013). A logistic mixed-effects model was fit with Subject and Continuum as random effects and Step as a fixed effect with by-Subject and by-Continuum random slopes for Step. The intercept was not significant (β = 0.43, SE = 0.29, z = 1.5, p = 0.13), indicating that control participants did not differ significantly from the pretest participants. Step was significant (β = −2.61, SE = 0.28, z = −9.1, p < 0.01), with higher steps (more /ʃ/-like) responded to more as /ʃ/. Results from the control experiment are shown in Figure 2.6 and in all other categorization figures as a reference point for interpreting the figures.

Exposure

Performance on the exposure task was high overall: 92% of the filler words were correctly accepted and 89% of nonwords were correctly rejected. Trials with nonword stimuli and responses faster than 200 ms or slower than 2500 ms were excluded from further analysis. A logistic mixed-effects model with accuracy as the dependent variable was fit with fixed effects for Trial (0-200), Trial Type (Filler, /s/, and /ʃ/), Attention (No Attention and Attention), Exposure Type (Word-initial and Word-medial), and their interactions. Trial Type was coded using treatment (dummy) coding, with Filler as the reference level. Deviance contrast coding was used for Exposure Type (Word-initial = 0.5, Word-medial = -0.5) and Attention (No attention = 0.5, Attention = -0.5). The random effect structure was as maximally specified as possible with random intercepts for Subject and Word. By-Subject random slopes were specified for Trial, Trial Type, and their interactions.
By-Word random slopes were specified for Attention, Exposure Type, and their interactions.

A significant fixed effect was found for Trial Type of /s/ versus Filler (β = −2.13, SE = 0.31, z = −6.8, p < 0.01), as participants were less likely to endorse words containing the modified /s/ category than filler words. However, there was a significant interaction between Trial and Trial Type of /s/ versus Filler (β = 0.45, SE = 0.14, z = 3.1, p = 0.01), so the difference in accuracy between words with /s/ and filler words diminished over time. Participants adapted to the speaker’s /s/ over the course of the experiment. There was also a significant main effect of Attention (β = −0.57, SE = 0.28, z = −2.0, p = 0.04), indicating that participants were more accurate at identifying words in the Attention condition than in the No Attention condition. However, there was a marginal interaction between Attention and Trial Type of /s/ versus Filler (β = 0.72, SE = 0.39, z = 1.8, p = 0.06), suggesting that attention only increased accuracy for words not containing the modified /s/ category. Figure 2.4 shows within-subject mean accuracy across exposure, with Trial in four blocks.

Figure 2.4: Within-subject mean accuracy for words in the exposure phase of Experiment 1, separated out by Trial Type (Filler, /s/, and /ʃ/). Error bars represent 95% confidence intervals.
[Plot not reproduced. Panels: Filler, /s/, /ʃ/. x-axis: Exposure trial block (1-50, 51-100, 101-150, 151-200). y-axis: Word recognition accuracy. Legend: Attention (No attention, Attention).]

A linear mixed-effects model with logarithmically-transformed reaction time as the dependent variable was fit with the same fixed effect and random effect structure as the logistic model for accuracy. Significant effects were found for Trial Type of /s/ versus Filler (β = 0.71, SE = 0.07, t = 10.8) and the interaction between Trial and Trial Type of /s/ versus Filler (β = −0.08, SE = 0.02, t = −3.1).
These effects follow the pattern found in the accuracy model: participants begin with slower reaction times to words with /s/, but over time the difference between words with /s/ and filler words lessens. Figure 2.5 shows within-subject mean reaction time across exposure, with Trial in four blocks.

Figure 2.5: Within-subject mean reaction time to words in the exposure phase of Experiment 1, separated out by Trial Type (Filler, /s/, and /ʃ/). Error bars represent 95% confidence intervals.
[Plot not reproduced. x-axis: Exposure trial block (1-50, 51-100, 101-150, 151-200). y-axis: Reaction time (ms). Legend: Trial Type (Filler, /s/, /ʃ/).]

Categorization

Responses with reaction times less than 200 ms or greater than 2500 ms were excluded from analyses. Two participants were excluded because their initial estimated cross-over point for the continuum lay outside of the 6 steps presented. A logistic mixed-effects model was constructed with Subject and Continuum as random effects, a by-Subject random slope for Step, and by-Continuum random slopes for Step, Attention, Exposure Type, and their interactions. Fixed effects for the model were Step, Exposure Type, Attention, and their interactions. Deviance contrast coding was used for Exposure Type (Word-initial = 0.5, Word-medial = −0.5) and Attention (No attention = 0.5, Attention = −0.5). An /s/ response was coded as 1 and an /ʃ/ response as 0.

Figure 2.6: Proportion /s/ response along the 6 step continua as a function of Exposure Type and Attention in Experiment 1. Error bars represent 95% confidence intervals.
[Plot not reproduced. Panels: Word-initial, Word-medial. x-axis: Continua step (1-6). y-axis: Proportion /s/ response. Legend: Attention (No attention, Attention, Control).]

There was a significant effect for the intercept (β = 0.76, SE = 0.22, z = 3.3, p < 0.01), indicating that participants categorized more of the continua as /s/ in general. This is evidence of learning compared to participants in the control experiment.
There was also a significant main effect of Step (β = −2.14, SE = 0.15, z = −14.2, p < 0.01), and a significant interaction between Exposure Type and Attention (β = −0.93, SE = 0.45, z = −2.04, p = 0.04). The results are visualized in Figure 2.6. When exposed to a modified /s/ category at the beginning of words, participants show a general expansion of the /s/ category, with no difference in behavior induced by the attention manipulation. However, when the exposure is to ambiguous /s/ tokens later in the words, there are differences in behavior beyond the general /s/ category expansion. Participants who were not warned of the speaker’s ambiguous tokens categorized more of the continua as /s/ than those who were warned of the speaker’s ambiguous /s/ productions.

2.2.3 Discussion

The condition that showed the largest perceptual learning effect was the one most biased toward a comprehension-oriented attentional set. Participants exposed to the modified /s/ category in the middle of words and with no explicit instructions about /s/ had larger perceptual learning effects than participants in any of the other conditions. The other conditions showed roughly equivalent sizes of perceptual learning, suggesting that there was not a compounding effect of explicit attention and word position. That is, the comprehension-oriented nature of the primary task still exerts an effect on attentional set selection, and a significant perceptual learning effect was found on novel words.

The findings of this experiment do not support the predictions of a purely gain-based mechanism for attention, such as the one posited by Clark (2013). If attention functioned as a gain mechanism (that is, by increasing the weight of error signals generated by mismatches between expectations and incoming signals), we should expect to see greater perceptual learning when listeners were instructed to attend to the speaker’s /s/ sounds. Instead, the opposite was found.
Participants told to attend to the speaker’s /s/ sounds showed smaller perceptual learning effects. The nature and sentiment of the instructions may affect the outcome of attention. In this experiment, the instructions regarding /s/ were phrased to suggest that the ambiguity of the speaker’s “s” could harm accuracy, a negative sentiment. If the instructions about the speaker’s “s” were more positive, such as giving an explanation for the cause of the ambiguity, then a different pattern might be observed. Under a gain-based mechanism, however, the sentiment of the instructions is not predicted to matter: attention should always increase the gain of error signals. If the sentiment of the instructions does change behavior, it would be another mark against a gain mechanism.

In addition to the perceptual learning effects of the categorization phase, the exposure phase also demonstrates learning. In the initial trials, words with a modified /s/ are responded to more slowly and less accurately, but over the course of exposure, both reaction times and accuracy approach those of filler and unmodified /ʃ/ words. Interestingly, only the attention manipulation had an effect on exposure performance, with participants attending to the /s/ category responding more accurately overall. Exposure Type did not significantly influence accuracy or reaction time in the exposure task.

Much of the literature on perceptual learning in speech perception focuses on the issue of generalization and specificity. For instance, listeners have been shown to generalize across speakers more if the exposure speaker’s modified category happens to be within the range of variation of the categorization speaker’s stimuli (Eisner and McQueen, 2005; Kraljic and Samuel, 2005).
Additionally, many perceptual learning studies artificially enhance the similarity between exposure tokens and categorization tokens, for instance by splicing the maximally ambiguous step of the categorization continuum into exposure words (Norris et al., 2003). Because exposure-specificity plays such a large role in perceptual learning, it is natural to consider whether the greater perceptual learning effects in some conditions arise from greater similarity to the exposure stimuli. However, as shown in Figure 2.3, Word-medial exposure tokens are acoustically farther from the categorization tokens than the Word-initial exposure tokens, so similarity cannot account for the larger effect following Word-medial exposure. Even if auditory similarity of /s/ across exposure and categorization played a role, it was overridden by the experimental manipulations.

In this experiment I used a method for exposure stimuli selection similar to that of Reinisch et al. (2013), but with a threshold of 50% word response rate in the pretest as the cutoff rather than 70%. With their 70% stimuli, Reinisch and colleagues report word endorsement rates that consistently exceeded 85%. In contrast, Experiment 1 used 50% as the threshold and had correspondingly lower word endorsement rates (mean = 76%, SD = 22%). Despite the lower word endorsement rates and the less canonical stimuli used, perceptual learning effects remained robust. This raises the question: can perceptual learning occur from a modified category even more atypical than the one used in this experiment? More atypical categories should be more salient and induce a more perception-oriented attentional set, and therefore result in smaller perceptual learning effects.
In Experiment 2, we test whether a comprehension-oriented attentional set can be maintained despite the category atypicality triggering perception-oriented attentional sets.

2.3 Experiment 2

Experiment 2 uses stimuli that are farther from the canonical productions of the critical exposure tokens containing /s/.

2.3.1 Methodology

Participants

A total of 127 participants from the UBC population completed the experiment and were compensated with either $10 CAD or course credit. The data from 31 nonnative speakers of English were excluded from the analyses. This left data from 96 participants for analysis.

Materials

Experiment 2 used the same items as Experiment 1, except that the step along the /s/-/ʃ/ continua chosen as the ambiguous sound had a different threshold. For this experiment, 30% identification as the /s/ word was used as the threshold. The average step chosen for /s/-initial words was 7.3 (SD = 0.8), and for /s/-medial words the average step was 8.9 (SD = 0.9). The steps chosen for Word-initial and Word-medial target stimuli are listed in Tables 2.8 and 2.9, respectively. Note that for several stimuli, the same steps are used for both Experiment 1 and 2. There were large jumps in proportion /s/ response between steps for the continua for those stimuli. However, the key aspect of the stimuli is the distribution of the /s/ category as a whole, and not the individual steps.

Multidimensional scaling was employed to assess the distributions of the stimuli used in Experiment 2. The pattern for Experiment 2 is similar to that for Experiment 1. The axes remain the same as before, with the first dimension corresponding to differences between sibilants, and the second dimension corresponding to differences in word position. The original productions, categorization tokens, and /ʃ/ tokens in Figure 2.7 are identical to those shown in Figure 2.3, but the exposure token distribution is shifted towards the /ʃ/ distribution.
In the Word-medial position, the distributions of /s/ and /ʃ/ are close to overlapping, and in the Word-initial position, they are still separated, but closer than in the stimuli for Experiment 1.

Figure 2.7: Multidimensional scaling of the acoustic distances between the sibilants of original productions, categorization tokens and the exposure tokens in Experiment 2. Categorization and exposure tokens were synthesized from the original productions using STRAIGHT (Kawahara et al., 2008).
[Plot not reproduced. x-axis: First principal component. y-axis: Second principal component. Points are labelled s or ʃ. Legends: Experiment (Categorization, Exposure, Original); Exposure Type (Word-medial, Word-initial).]

Table 2.8: Step chosen for each Word-initial stimulus in Experiment 2 and the proportion /s/ response in the pretest

Word          Step chosen   Proportion /s/ response
ceiling       8             0.20
celery        7             0.30
cement        7             0.26
ceremony      8             0.39
saddle        8             0.25
safari        7             0.21
sailboat      7             0.35
satellite     8             0.30
sector        6             0.39
seminar       7             0.33
settlement    8             0.35
sidewalk      7             0.30
silver        7             0.21
socket        7             0.30
sofa          7             0.26
submarine     9             0.32
sunroof       6             0.39
surfboard     8             0.25
syrup         6             0.37
Average       7.3           0.30

Table 2.9: Step chosen for each Word-medial stimulus in Experiment 2 and the proportion /s/ response in the pretest

Word          Step chosen   Proportion /s/ response
carousel      8             0.25
castle        9             0.25
concert       10            0.30
croissant     8             0.20
currency      9             0.30
cursor        11            0.30
curtsy        9             0.26
dancer        8             0.26
dinosaur      9             0.39
faucet        8             0.25
fossil        8             0.30
galaxy        10            0.26
medicine      9             0.30
missile       10            0.30
monsoon       9             0.15
pencil        8             0.37
pharmacy      9             0.39
tassel        8             0.35
taxi          10            0.35
whistle       9             0.35
Average       8.9

Procedure

The procedure and instructions were identical to those of Experiment 1.

Analysis

Response data and factors were transformed and analyzed in the same way as in Experiment 1.

Results

Exposure

Trials with nonword stimuli and responses faster than 200 ms or slower than 2500 ms were excluded from analysis. Performance on the exposure task was as high as in Experiment 1, with accuracy on filler trials averaging 92%. A logistic mixed-effects model with accuracy as the dependent variable and a linear mixed-effects model with reaction time (logarithmically-transformed) as the dependent variable were fit with the same specifications as in Experiment 1.

In the logistic mixed-effects model of accuracy, a significant fixed effect was found for Trial Type of /s/ versus Filler (β = −3.56, SE = 0.31, z = −11.4, p < 0.01), with participants less likely to respond that an item was a word if it contained the modified /s/ category. There were significant interactions between Trial and Trial Type of /s/ versus Filler (β = 0.29, SE = 0.11, z = 2.5, p < 0.01) and between Trial and Trial Type of /ʃ/ versus Filler (β = −0.33, SE = 0.16, z = −2.0, p = 0.04). These interactions indicate that participants became more likely to endorse words with modified /s/ productions over time, but also became less accurate on words containing /ʃ/. Figure 2.8 shows within-subject mean accuracy across exposure, with Trial in four blocks.

Figure 2.8: Within-subject mean accuracy in the exposure phase of Experiment 2, separated out by Trial Type (Filler, /s/, and /ʃ/). Error bars represent 95% confidence intervals.
[Plot not reproduced. x-axis: Exposure trial block. y-axis: Word recognition accuracy. Legend: Trial Type (Filler, /s/, /ʃ/).]

In the linear mixed-effects model of reaction time, a significant effect was found for Trial Type of /s/ versus Filler (β = 0.94, SE = 0.07, t = 14.4), indicating that reaction times were slower for words containing the modified /s/ category. Also significant was the interaction between Trial and Trial Type of /ʃ/ versus Filler (β = 0.07, SE = 0.02, t = 3.4). However, there was no interaction between Trial and Trial Type of /s/ versus Filler (β = −0.02, SE = 0.02, t = −0.8).
This indicates that reaction time remained relatively stable for words containing the modified /s/ category, but lengthened for words containing the /ʃ/ control. There was a marginal effect for Trial Type of /ʃ/ versus Filler (β = 0.14, SE = 0.07, t = 1.9), indicating that words with /ʃ/ tended to be responded to more slowly than filler words. Finally, there was a marginal effect of Exposure Type (β = 0.17, SE = 0.09, t = 1.9), indicating that words in the Word-medial condition tended to be responded to faster. Figure 2.9 shows within-subject mean reaction time across exposure, with Trial in four blocks.

Figure 2.9: Within-subject mean reaction time in the exposure phase of Experiment 2, separated out by Trial Type (Filler, /s/, and /ʃ/). Error bars represent 95% confidence intervals.
[Plot not reproduced. Panels: Filler, /s/, /ʃ/. x-axis: Exposure trial block. y-axis: Reaction time (ms). Legend: Exposure Type.]

Categorization

Responses with reaction times less than 200 ms or greater than 2500 ms were excluded from analyses. Two participants were excluded because their initial estimated cross-over point for the continuum lay outside of the 6 steps presented. A logistic mixed-effects model was constructed with the same specification as in Experiment 1.

There was a significant effect for the Intercept (β = 0.60, SE = 0.26, z = 2.3, p = 0.02), indicating that participants categorized more of the continua as /s/ in general. There was also a significant main effect of Step (β = −2.51, SE = 0.19, z = −13.1, p < 0.01). Unlike in Experiment 1, there were no other significant effects, suggesting that participants across conditions had similar perceptual learning effects.
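To make the intercept estimates easier to read, the sketch below converts the reported log-odds intercepts to proportions with the inverse logit. It assumes that each intercept can be read as the log-odds of an /s/ response at the centered Step, averaged over conditions, for a participant and continuum with random effects at zero; the function name `inverse_logit` is introduced here for illustration. The three intercepts come from separately fitted models, so the comparison is only a rough guide.

```python
import math


def inverse_logit(log_odds):
    """Convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Intercept estimates (log-odds) as reported in the text of this chapter.
reported_intercepts = {
    "Control experiment": 0.43,
    "Experiment 1": 0.76,
    "Experiment 2": 0.60,
}

for label, beta in reported_intercepts.items():
    print(f"{label}: intercept {beta:.2f} -> proportion /s/ of about {inverse_logit(beta):.2f}")
```

An intercept of zero corresponds to a proportion of 0.5, so positive intercepts indicate that more than half of responses at the middle of the continuum were /s/.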
These results are shown in Figure 2.10.

Figure 2.10: Proportion /s/ response along the 6 step continua as a function of Exposure Type and Attention in Experiment 2. Error bars represent 95% confidence intervals.
[Plot not reproduced. Panels: Word-initial, Word-medial. x-axis: Continua step (1-6). y-axis: Proportion /s/ response. Legend: Attention (No attention, Attention, Control).]

2.4 Grouped results across experiments

The data from Experiment 1 and Experiment 2 were pooled and analyzed in the same way as above, but with Experiment and its interactions as additional fixed effects to directly assess the effect of category atypicality. In the logistic mixed-effects model, there were significant main effects for Intercept (β = 1.00, SE = 0.36, z = 2.7, p < 0.01) and Step (β = −2.64, SE = 0.21, z = −12.1, p < 0.01), a significant two-way interaction between Experiment and Step (β = 0.51, SE = 0.20, z = 2.5, p = 0.01), and a marginal four-way interaction between Step, Exposure Type, Attention and Experiment (β = 0.73, SE = 0.42, z = 1.7, p = 0.08). These results can be seen in Figure 2.11. The four-way interaction can be seen in the Word-medial conditions across the two experiments, where Experiment 1 shows a difference between the Attention and No Attention conditions, but Experiment 2 does not. The two-way interaction between Experiment and Step, together with the lack of a main effect for Experiment, suggests that while the category boundary was not significantly different across experiments, the slope of the categorization function was.

In previous research, a link has been shown between the proportion of word endorsement for exposure tokens and the size of perceptual learning effects (Scharenborg and Janse, 2013). Listeners who endorsed more of the exposure tokens as words showed a larger perceptual learning effect.
To assess such a link in the current experiments, a logistic mixed-effects model was constructed in the same way as above, except that participants’ word endorsement rate for target /s/ words was included as an additional fixed effect, along with its interactions with all other fixed effects. Word endorsement rate was calculated as the ratio of the number of word responses to the total number of /s/ trials. Prior to inclusion in the model, an arcsine transformation was performed on the word endorsement rates. Word endorsement rate was significant (β = 1.55, SE = 0.38, z = 4.1, p < 0.01), replicating the effect found in previous work. Participants who endorsed more exposure tokens showed larger perceptual learning effects. Word endorsement rate significantly interacted with Step (β = 0.63, SE = 0.24, z = 2.6, p < 0.01) and was involved in a marginal interaction with Step, Attention and Experiment (β = −1.79, SE = 0.94, z = −1.8, p = 0.06).

Figure 2.11: Proportion /s/ response along the 6 step continua as a function of Exposure Type and Attention in Experiment 1 and Experiment 2. Error bars represent 95% confidence intervals.
[Plot not reproduced. Panels: Exposure Type (Word-initial, Word-medial) by Attention. x-axis: Continua step (1-6). y-axis: Proportion /s/ response. Legend: Experiment (Experiment 1, Experiment 2, Control).]

To better investigate the nature of the four-way interaction, word endorsement rates were correlated with estimated cross-over points by participant. Cross-over points occur where a participant’s perception switches from predominantly /s/ to predominantly /ʃ/. The cross-over point was determined from the Subject random intercept and the by-Subject random slope of Step in a simple model containing only those random effects, similar by-Continuum random effects, and a fixed effect for Step (Kleber et al., 2012). Scatter plots of word endorsement rate and cross-over point across all experimental conditions are shown in Figure 2.12.
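The cross-over calculation can be sketched as follows. In a logistic model the cross-over is the step at which the predicted log-odds equal zero, that is, where the proportion of /s/ responses is 0.5. The sketch assumes that a participant's intercept and slope are the sums of the fixed estimates and that participant's random adjustments, and that Step was centered at 3.5 (the midpoint of a balanced 6-step continuum); neither detail is spelled out in the text, and the function name and the participant adjustments below are hypothetical.

```python
def crossover_step(intercept, slope, step_center=3.5):
    """Step at which the predicted log-odds of /s/ equal zero.

    Solves intercept + slope * (step - step_center) = 0 for step.
    """
    if slope == 0:
        raise ValueError("A flat categorization function has no cross-over point.")
    return step_center - intercept / slope


# Illustration with the fixed-effect estimates reported for Experiment 1
# (intercept 0.76, Step slope -2.14). The thesis derives cross-over points
# from a simpler model, so this is only indicative.
group_level = crossover_step(0.76, -2.14)

# Hypothetical random adjustments for one participant (invented values).
participant_intercept = 0.76 + 0.50
participant_slope = -2.14 + 0.20
participant_level = crossover_step(participant_intercept, participant_slope)

print(f"group-level cross-over: step {group_level:.2f}")
print(f"hypothetical participant cross-over: step {participant_level:.2f}")
```

A larger intercept moves the cross-over to a higher step, meaning that more of the continuum is categorized as /s/, which is the direction of change expected under perceptual learning.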
In general there is a positive correlation between word endorsement rate in the exposure phase and the cross-over point from /s/ to /ʃ/ in the categorization phase. Participants in Experiment 1, who were exposed to more typical /s/ stimuli, showed a stronger correlation across conditions than participants in Experiment 2, who were exposed to a more atypical /s/ category. An analysis of word endorsement rates across Exposure Type, Attention, and Experiment revealed a significant difference in endorsement rates only for Experiment (F(1,187) = 26.8, p < 0.01). Experiment 1 had a mean word endorsement rate of 75% (SD = 23%) and Experiment 2 had a mean endorsement rate of 58% (SD = 27%).

Figure 2.12: Correlation of cross-over point in categorization with the proportion of word responses to critical items containing an ambiguous /s/ token in Experiments 1 and 2.
[Plot not reproduced. Panels: Exposure Type (Word-medial, Word-initial) by Attention (No Attention, Attention). x-axis: Proportion "word" response to exposure /s/ items. y-axis: Crossover step across continua. Legend: Experiment (Experiment 1, Experiment 2).]

In Experiment 1, the strongest correlations between word endorsement rate and cross-over point are seen in the Word-initial conditions (Attention: r = 0.46, t(22) = 2.4, p = 0.02; No Attention: r = 0.45, t(22) = 2.4, p = 0.02), with the next strongest, and more marginal, correlation in the Word-medial/Attention condition (r = 0.39, t(23) = 2.0, p = 0.06).
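The t statistics reported with these correlations follow from r and the degrees of freedom through the standard relation t = r√df / √(1 − r²), where df = n − 2. The sketch below recomputes them from the rounded values given in the text; the function name `t_from_r` is introduced here for illustration, and small discrepancies from the reported values are expected because r is rounded to two decimals.

```python
import math


def t_from_r(r, df):
    """t statistic for a Pearson correlation r with df = n - 2 degrees of freedom."""
    return r * math.sqrt(df) / math.sqrt(1.0 - r * r)


# (condition, r, df, reported t) as given in the text for Experiment 1.
reported = [
    ("Word-initial/Attention", 0.46, 22, 2.4),
    ("Word-initial/No Attention", 0.45, 22, 2.4),
    ("Word-medial/Attention", 0.39, 23, 2.0),
]

for label, r, df, reported_t in reported:
    print(f"{label}: recomputed t({df}) = {t_from_r(r, df):.2f}, reported {reported_t}")
```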
The condition for which the most perceptual learning was observed (Word-medial/No Attention) actually has the weakest relationship (r = 0.32, t(21) = 1.5, p = 0.13).

In Experiment 2, the strongest correlation is in the Word-initial/No Attention condition (r = 0.40, t(23) = 2.1, p = 0.05), with two trending correlations for the Word-medial conditions (Attention: r = 0.33, t(20) = 1.6, p = 0.12; No Attention: r = 0.27, t(22) = 1.3, p = 0.20). Finally, the correlation for the Word-initial/Attention condition is not significant (r = −0.05, t(20) = −0.2, p = 0.82).

2.5 General discussion

The perceptual learning effects found in Experiments 1 and 2 align with either comprehension-oriented or perception-oriented attentional sets. The perception-oriented attentional sets are predicted to exhibit less generalization, similar to what is seen in the psychophysics literature and in visually-guided perceptual learning in speech perception (Reinisch et al., 2014). In support of this, participants in perception-oriented conditions of Experiment 1 (i.e., Attention conditions and Word-initial conditions) showed uniform and modest amounts of perceptual learning. Those in Experiment 2, who were triggered to use a perception-oriented set by the category atypicality of the stimuli, showed similarly modest levels of perceptual learning. Participants who were not exposed to any triggers towards perception-oriented attentional sets (Experiment 1/No Attention/Word-medial) were predicted to use a more comprehension-oriented attentional set, which aligns with the task performed. These participants showed a substantially larger perceptual learning effect than those biased towards perception-oriented attentional sets.

Compared to Experiment 1, Experiment 2 had a weaker correlation between critical word endorsement rates and cross-over boundary points.
This suggests that although the stimuli used in Experiment 2 were farther from the canonical production, they did not shift the category boundary as much as the stimuli in Experiment 1. While neither attention nor position of the ambiguous sound had an effect on the correlation, the distance from the canonical production did. This potentially suggests that the degree to which a category is shifted is inversely related to the token’s distance from the category mean.

One condition in Experiment 1 did not have a strong correlation between word response rate and cross-over point. This condition (No Attention/Word-medial exposure) also had the largest perceptual learning effect. The lack of correlation in precisely this condition falls out from the proposed attention mechanism. Comprehension-oriented attentional sets are proposed to update higher, more abstract linguistic representations. Initial endorsements might shift the boundary more than later endorsements under such an attentional set, which would result in a non-linear relationship between endorsement rate and cross-over point. Lexically-guided perceptual learning is typically induced with relatively few tokens, usually 20 of 200 total tokens (Norris et al., 2003; Reinisch et al., 2013), but as few as 10 modified tokens have been shown to cause perceptual learning (Kraljic et al., 2008b). Under perception-oriented attentional sets, the relationship between endorsement rate and cross-over point may be more linear, with equal updating per endorsement, but with each individual instance contributing less than initial comprehension-oriented endorsements. Visually-guided perceptual learning generally uses hundreds of target tokens with no fillers (Vroomen et al., 2007; Reinisch et al., 2014).

The correlation between word response rate in the exposure phase and the category boundary in the categorization phase across both experiments has two possible explanations.
Under a causal interpretation of the link between exposure and categorization, as each ambiguous sound is processed and errors propagate, the distribution for that category (for that particular speaker) is updated. Participants who processed more of the ambiguous sounds as /s/ updated their perceptual category for /s/ more. This explanation fits within a Bayesian model of the brain (Clark, 2013) or a neo-generative model of spoken language processing (Pierrehumbert, 2002). A non-causal story is also plausible: the correlation may reveal individual differences on the part of the participants, where some participants are more adaptable or tolerant of variability than others. These more tolerant listeners then show greater degrees of perceptual adaptation. Individual differences in attention-switching control have previously been found to affect perceptual learning (Scharenborg et al., 2014), which supports a non-causal interpretation as well.

As mentioned in the discussion for Experiment 1, the findings do not support a simple gain mechanism for attention (contra Clark, 2013). In Yeshurun and Carrasco (1998), attention to areas with finer spatial resolution caused observers to miss larger patterns. If attention simply boosted the error signal, attention of all kinds should always be beneficial to perceptual learning. The findings of these two experiments support a larger role for attention in a predictive coding framework, which has been previously noted in Block and Siegel (2013). The propagation-limiting attention mechanism proposed in this dissertation explains both the findings in the visual domain and the current findings. Attention to a level of representation causes errors between expectations and observed signals to be resolved and updated at that level.
If attention is more oriented towards comprehension, errors can be propagated to a higher, more abstract level of linguistic representation before updating expectations.

A lexical decision task by default biases a participant towards a comprehension-oriented attentional set. The experimental manipulations promoted perception-oriented attentional sets that attenuated generalized perceptual learning effects. To fully examine the use of perception- and comprehension-oriented attentional sets in perceptual learning, manipulations that induce comprehension-oriented attentional sets are necessary. In the following chapter, such a manipulation is implemented through increasing linguistic expectations with semantic predictability.

Chapter 3

Cross-modal word identification

3.1 Motivation

The largest perceptual learning effect in this dissertation was found in Experiment 1 in the No Attention/Word-medial condition, which is the condition that was the least likely to promote a perception-oriented attentional set in listeners. The lexical decision task is a comprehension-oriented task, so the comprehension-oriented attentional set is the default attentional set. Participants with no manipulation promoting a perception-oriented attentional set would have maintained this default attentional set. The experiment in this chapter examines perceptual learning in larger sentence contexts, as opposed to lexically guided perceptual learning in single word paradigms. Using sentences ending with final target words that contain a modified /s/ category, semantic predictability is used in conjunction with lexical bias to boost linguistic expectations. The linguistic expectation exploited in Chapter 2 was a lexical expectation. Hearing part of a word increases a listener’s expectation for hearing the rest of that word. But as the words are presented in isolation in the lexical decision task, all words have equal likelihood of occurring.
No particular expectations are present prior to hearing the initial sounds of the word. In a sentence context, however, expectations of a particular word can be boosted by the words preceding it. If the expectation for a word is increased, the expectations for the sounds within it would be increased as well.

For our purposes, semantic predictability refers to how predictable the final word in a sentence is (Kalikow et al., 1977). Example (1) is a high predictability sentence and Example (2) is an unpredictable sentence. Although the final word is the same in both, the preceding sentence cues the final word in the predictable sentence, but not in the unpredictable one. Semantic predictability does not reference formal models of semantics explicitly, but is rather more about world knowledge.

(1) The cow gave birth to the calf.
(2) She is glad Jane called about the calf.

In general, high predictability sentences contain less signal information, but are easier to process and understand. For example, semantic predictability in production studies is associated with phonetic reduction (particularly duration of words and sounds) independent of lexical factors like frequency and neighborhood density (Scarborough, 2010; Clopper and Pierrehumbert, 2008). In speech perception work, semantic predictability and lexical bias have been found to have similar effects on phoneme categorization (Connine, 1987; Borsky et al., 1998). Sentences with higher semantic predictability are more intelligible in noise, particularly for native speakers (Kalikow et al., 1977; Mayo et al., 1997; Fallon et al., 2002; Bradlow and Alexander, 2007, and others). Similarly, in phoneme restoration tasks, semantic predictability increases the bias for a listener to hear a complete word, which may account for the increased intelligibility.
However, this increased bias is coupled with an increased sensitivity in detecting missing sounds for semantically predictable words (Samuel, 1981).

Samuel (1981) proposes that high predictability sentences place a lower cognitive load on the perceptual system. The lower cognitive load allows for more cognitive resources to be allocated to the primary perception-oriented task, resulting in greater perceptual sensitivity. Mattys and Wiget (2011) manipulated cognitive load through easier or harder concurrent visual search tasks during a phoneme categorization task. Mirroring the phoneme restoration results, Mattys and colleagues found greater perceptual sensitivity in conditions with lower cognitive load. In both of these cases, the goal of the listener was oriented towards perception. A lower cognitive load on the perceptual system may allow more cognitive resources (including attention) to be allocated to perception. In a comprehension-oriented task, lower cognitive load would not necessarily always result in greater perceptual sensitivity. If a listener’s end goal is not perception of a specific production of a speech sound, then performance on the task would not necessarily be increased by attending further to perception.

There are many possible outcomes for perceptual learning in this experiment. Many theoretical frameworks do not make explicit predictions about how the perceptual system will be updated in the context of full sentences. Most models of speech perception end at perception of words. In a sentence like Example (1), perceiving individual words is likely not the goal. For instance, independently perceiving the word “to” does not aid in comprehending the meaning of the sentence. Instead, comprehension is likely more oriented towards the relations between concepts and perceiving phrases or multiword chunks. If the perceived chunk is larger than a word, are the fine details still as faithfully encoded as they are for words in isolation?
Even if the fine details are encoded, are they reliable enough evidence for perceptual learning? The experiment reported here takes a first pass at answering some of these questions, and sets the stage for future inquiry into perceptual learning from sentences.

One promising avenue for exploring this chapter’s research questions lies in the reliability of evidence, which has been shown previously to be crucial for perceptual learning (Kraljic et al., 2008b,a). If ambiguous productions are accompanied by a video of the speaker holding a pen in their mouth, then no perceptual learning is observed (Kraljic et al., 2008b). Likewise, if listeners are first exposed to typical tokens, and then exposed to atypical tokens, no perceptual learning is observed. If the order is flipped (atypical tokens first), then perceptual learning effects are present (Kraljic et al., 2008b). If there is a linguistic context that conditions greater variability, then modified tokens in those contexts will not cause perceptual learning (Kraljic et al., 2008a). However, the unlearned modification must lie within the range of variability conditioned by the context. For /s/ in /stɹ/ clusters, where /s/ becomes more /ʃ/-like due to coarticulation, a more /ʃ/-like modification will not be learned. Presumably, modifications that lie outside of the variability conditioned by the context will still be learned. Kraljic and colleagues argue from these studies that listeners will attribute variation to the context as much as possible, and only fall back to updating perceptual categories when no other explanation is available.

Extended beyond single words, reliability can be thought of in terms of perceived carefulness of a word production. In an experimental setting, every stimulus is carefully curated by the experimenter. However, both in the laboratory and outside, words in isolation are produced longer and more clearly than their counterparts in full sentences.
Words in isolated sentences are going to be produced less clearly than words in isolation (though not necessarily unintelligibly). Words in spontaneous conversation are likely to be the least clear, as seen in the “massive reduction” in the Buckeye corpus (Johnson, 2004; Dilts, 2013). All of these factors are dependent on aspects of the sentence (focus, clause type, etc.) or of the speech style, so words in casual conversation will be less clear than words in a formal presentation.

From a perception standpoint, the more clear an acoustic token, the more signal information is available to be processed. Clear tokens typically have longer durations, increased intensity, and more distinct formants (Krause and Braida, 2004). A listener would view tokens that were produced more clearly or with greater care as more reliable productions for that speaker. Listeners have been found to recognize careful and casual speech equally well, but signal information is used more in careful speech (Sumner et al., 2015). If we extend the argument by Kraljic and colleagues, it would predict that sentences should have less perceptual learning than words in isolation because (some of) the variability of a word’s production in a sentence can be attributed to the fact that the item is in a sentence context. Additionally, given the propensity for acoustic reduction in high predictability contexts (Scarborough, 2010), words in predictable sentences would be even less reliable, leading to less perceptual learning.

This is not to say that sentences are ineffective in driving perceptual learning as compared to words in isolation. From the literature on perceptual learning of foreign accents, sentences are extremely useful in learning to perceive nonnatively accented speakers (Bradlow and Bent, 2008). For the purposes of learning an accent, sentences are probably better than words in isolation, as the greater context would allow for better identification of the words.
Differences in perceptual learning from native and nonnative speakers can also be seen in the contradictory findings of Sumner (2011) and Kraljic et al. (2008b). Sumner (2011) found that listeners could update their perceptual categories constantly over the course of the experiment. In contrast, Kraljic et al. (2008b) found that listeners adapted to the first instances of the category that they heard and did not use subsequent tokens. The nativeness of the exposure speaker differed in the two experiments. Constant adaptation was found for a nonnative speaker (Sumner, 2011) rather than a native speaker (Kraljic et al., 2008b). Listeners may be more biased toward typical native categories for native speakers, so that exposure to an initially typical category causes listeners to disregard the later atypical category as unreliable. Listeners can therefore leverage their previous experience with native speakers more readily. Listeners’ previous experience does not as readily extend to nonnative speech, where interspeaker and intraspeaker variation is more prevalent, so constant adaptation and consistent token reliability would improve comprehension the most.

This dissertation largely adopts the predictive coding framework presented in Clark (2013) to account for perceptual learning. Reliability of sensory information is not directly addressed in Clark’s exposition. The basic form of his model, however, predicts that increasing expectations should always increase error signals. In Egner et al. (2010, cited in Clark (2013)), participants were exposed to faces and non-face objects (i.e., houses) embedded in white noise on a computer screen (static). Participants who were told about the faces had equally-sized neuronal responses in the fusiform face area for face stimuli as for non-face stimuli. Participants who were not expecting to see faces showed neuronal responses in that area only for the face stimuli.
The mismatch between expectation and the perceived signal generated an error signal of similar magnitude to the signal itself. If increased expectations result in increased error signals, perceptual learning should be largest for participants exposed to the modified category in higher predictability sentences. If smaller perceptual learning effects are observed for those participants, then reliability weighting or attribution of error signals would be necessary in the model.

To test these predictions, a novel exposure paradigm was used in place of a lexical decision task. In this paradigm, participants are presented with sentences auditorily. Following the sentence, two pictures appear on the screen: one matching the final word of the sentence and the other a distractor. Participants are instructed to indicate which picture corresponds to the final word of the sentence. Following exposure, participants completed the same categorization task as those in Experiments 1 and 2. This experiment will validate lexical decision tasks for learning a single characteristic (/ʃ/-like /s/) in a context that more closely resembles actual language use. At the same time, this experiment provides a link between lexically-guided perceptual learning and experiments that use sentential stimuli for learning non-native accents (Bradlow and Bent, 2008).

3.2 Methodology

3.2.1 Participants

A total of 137 participants from the UBC population completed the experiment and were compensated with either $10 CAD or course credit. The data from 39 nonnative speakers of English were excluded from the analyses. No participants reported speech or hearing disorders. This left data from 98 participants for analysis. Twenty additional native English speakers participated in a pretest to determine sentence predictability, and 10 other native English speakers participated in a picture naming pretest.

3.2.2 Materials

One hundred and twenty sentences were used as exposure materials.
The set of sentences consisted of 40 critical sentences, 20 control sentences and 60 filler sentences. The critical sentences ended in one of 20 of the critical words in Experiments 1 and 2 that had an /s/ in the onset of the final syllable. The 20 control sentences ended in the 20 control items used in Experiments 1 and 2, and the 60 filler sentences ended in the 60 filler words in Experiments 1 and 2. Half of all sentences were written to be predictive of the final word, and the other half were written to be unpredictive of the final word. Unlike previous studies using sentence or semantic predictability (Kalikow et al., 1977), unpredictive sentences were written with a range of sentence structures. In all cases, the final words were plausible objects of lexical verbs and prepositions. The high and low predictability filler sentences can be found in Tables 3.1 and 3.2, respectively. The high and low predictability filler sentences with /ʃ/ words can be found in Tables 3.3 and 3.4, respectively. Finally, the high and low predictability critical sentences can be found in Tables 3.5 and 3.6, respectively. Aside from the sibilants in the critical and control words, the sentences contained no sibilants (/s z ʃ ʒ tʃ dʒ/). The same minimal pairs for
The same minimal pairs forphonetic categorization as in Experiments 1 and 2 were used.Table 3.1: High predictability filler sentences.Sentence Word DistractorThe oak tree grew from a tiny acorn pineappleThe radio in the car didn’t work with a bent antenna towelThe clown made the girl an animal from a balloon pancakeEveryday the panda had to eat a lot of bamboo boomerangThe belt had an ornate buckle hamburgerThe caterpillar came out of the cocoon a beautiful butterfly crayonThe hermit lived in a log cabin paradeThey marked the date on the calendar antlerThey rode to the pyramid on the back of a camel atomRight before the plane took off,the captain called the flight crew from the cockpit doorknobThe woman threaded the bowtie through her collar ladybugAt the rodeo, the cattle were rounded up by the cowboy rippleThe baby rocked in her cradle telephoneThe delivery man rang the doorbell firewoodHe moved the wet laundry over to the dryer hydrantThe tiny rodent terrified the big, grey elephant pepperThe criminal wore a glove to not leave behind one fingerprint islandThe cook needed one more clove of garlic wheelbarrowRed paint in hand, the youth tagged the building with graffiti catapultThe watery dinner had to be poured out with a ladle lollipopThe man reheated the leftover dinner in the microwave ukuleleEvery dinner plate came with a folded napkin toothpickThe ballroom had a grand piano dolphinThe woman tied her hair back in a ponytail airportThe adult frog developed from a tadpole bucketThe acting company performed in an old theatre earmuffHer favorite burrito came wrapped in a flour tortilla falafelThe farm youth rode around on the tractor barbecueThe train went under a mountain through a tunnel wagonThe heavy rainfall could have been predicted by the weatherman robotSentences were recorded by the same male Vancouver English speaker used inExperiments 1 and 2. 
Table 3.2: Low predictability filler sentences.

Sentence | Word | Distractor
They clapped loudly for the | acrobat | pillow
The man liked to begin the day with an | apple | vampire
Wearily, the woman built up her | campfire | bagel
He looked forward to freely available | candy | donkey
The couple never agreed on the | cutlery | butter
He took pride in the renovated | darkroom | candle
They were enthralled by the | diamond | kiwi
While he lay on the ground, the boy played with a | feather | broccoli
To get any farther, they definitely needed a good | goalie | waterfall
He didn’t know how to get to the | gondola | honey
They were a little frightened to board the | helicopter | cannon
The woman needed to borrow a | ladder | flagpole
He had to track down and get help from the | librarian | tornado
They had a good view of the | lightning | coffee
Toward the end, they were running low on | lumber | anvil
They liked how it looked on the | mannequin | parrot
On the way, they liked walking through the | meadow | cupcake
The couple were looking forward to buying a | minivan | tugboat
He finally made it to the | motel | armadillo
They went out for a quick bite to eat after the | movie | volleyball
He really liked the look of the | mural | monocle
After a long night, he devoured the whole | omelet | hummingbird
On the field trip, they learned all about the | painter | pulley
The boy cried when they took away the | popcorn | puppet
The irate woman yelled at the | referee | propeller
When they were called, the group moved to the | table | helmet
He didn’t know about a problem with the | teapot | crowbar
He had to remember to pick up the | tire | parakeet
Every day he dreaded the late afternoon | traffic | rowboat
The woman kept a lookout for the | umbrella | catamaran

Critical sentences were recorded in pairs, with one normal production and then a production of the same sentence with the /s/ in the final word replaced with an /ʃ/.
The speaker was instructed to produce both sentences with comparable speech rate, speech style, and prosody.

As in Experiments 1 and 2, the critical items were morphed together into an 11-step continuum using STRAIGHT (Kawahara et al., 2008); only the final word in each sentence was morphed. The preceding words were the synthesized versions of the sentence with the correct /s/ production to minimize artifacts of the morphing algorithm. The control and filler items were also processed and resynthesized to ensure consistent quality. The ambiguous point selection was based on the pretest performed for Experiment 1 and 2 exposure items.

Table 3.3: High predictability sentences with /ʃ/ words.

Sentence | Word | Distractor
The bidding became frantic for the final item in the | auction | accordion
While waiting in line at the new bank, the woman read their introductory | brochure | blueberry
The woman only got a dime back after paying the | cashier | laptop
He could only kneel for a little while without a plump | cushion | forklift
Lava flowed down the volcano after the violent | eruption | pumpkin
The bear awoke from her winter | hibernation | violin
After jumping out of the plane, the woman opened her | parachute | camera
The doctor took the time to look in on every | patient | crocodile
While down with the flu, the woman invariably carried a clean | tissue | whirlpool
The opera-goer found her row with the help of an | usher | doormat

Table 3.4: Low predictability sentences with /ʃ/ words.

Sentence | Word | Distractor
The woman couldn’t wait to fill up the | bookshelf | muffin
The whole family travelled for an hour to the | coronation | waffle
He did not look forward to the | handshake | raccoon
He gave a wide berth to the | machine | kitten
They dragged their feet on the way to the | mansion | treadmill
He had a hard time with | meditation | beekeeper
They were deeply worried about the | militia | peanut
He went on and on about the | milkshake | elbow
For winter break, he wanted to go to the | ocean | iguana
He could finally get a new | windshield | koala
Table 3.5: High predictability sentences with target /s/ words.

Sentence | Word | Distractor
At the carnival the girl rode a unicorn around the | carousel | pirate
A deep moat protected the old | castle | martini
The encore from the pop duo perfected the whole | concert | earplug
From the bakery he got a flaky, buttery | croissant | windmill
After her world trip, the traveller had a little money leftover in every local | currency | elevator
When the computer locked up, he couldn’t move the | cursor | clover
The lady returned the bow with a formal | curtsy | gavel
The critic raved about the ballet and the lead | dancer | cricket
Long ago, a comet hit the earth, killing every big | dinosaur | bandana
Water poured into the bath tub from the | faucet | doughnut
After millennia, the bone in the riverbed turned into a | fossil | menorah
The name ‘Milky Way’ can perfectly depict our | galaxy | kayak
We no longer worry about the plague due to modern | medicine | cucumber
In the heated aerial battle, neither pilot could lock on with a | missile | cookie
Rain fell every day in India during the | monsoon | gargoyle
The man wrote on the paper with a graphite | pencil | trombone
The woman got an over-the-counter drug at her local | pharmacy | kettle
From the cap of the new grad hung a golden | tassel | guitar
The New Yorker flagged down a | taxi | ribbon
The traffic cop alerted the driver by blowing her | whistle | ravioli

The ambiguous steps of the continua chosen corresponded to the 50% cross-over point in Experiment 1.

Acoustic distances between exposure tokens, categorization tokens, and their original productions were multidimensionally scaled. In Figure 3.1, the original productions are separated again by the first dimension, which corresponds to the centroid frequency of the sibilant. The categorization tokens are predictably in between the original productions and offset in the second dimension due to their different position in the word.
The exposure tokens for Experiment 3 fit in between the original productions and the categorization tokens.

Sentences were paired with two pictures apiece. Pictures of 200 words, with 100 pictures for the final word of the sentences and 100 for distractors, were selected in two steps. First, a research assistant selected five images from a Google image search of the word, and then a single image representing that word was selected from amongst the five by me. To ensure consistent behavior in E-Prime (Psychology Software Tools, 2012), pictures were resized to fit within a 400x400 area with a resolution of 72x72 DPI and converted to bitmap format. Additionally, any transparent backgrounds in the pictures were converted to plain white backgrounds.

Table 3.6: Low predictability sentences with target /s/ words.

Sentence | Word | Distractor
They got back in line for the | carousel | pirate
He dreaded the long walk to the | castle | martini
He prepared night and day for the | concert | earplug
The man had a craving for a | croissant | windmill
They weren’t worried about the different | currency | elevator
The man could never find the | cursor | clover
The girl didn’t want to make a | curtsy | gavel
The boy wanted to become a better | dancer | cricket
The boy really wanted to ride the | dinosaur | bandana
The woman hoped to get a working | faucet | doughnut
No one knew where to find the | fossil | menorah
The man talked at length about the | galaxy | kayak
With that GPA, they could have a career in | medicine | cucumber
The boy wanted to build a toy | missile | cookie
On their picnic, they avoided the | monsoon | gargoyle
The woman looked frantically for her | pencil | trombone
The woman loved her work at the | pharmacy | kettle
He worried about the color of the | tassel | guitar
The woman had no luck getting a | taxi | ribbon
The boy ran away when he heard the | whistle | ravioli

3.2.3 Pretest

The same twenty participants that completed the lexical decision continua pre-test also completed a sentence predictability task before the phonetic categorization task described in Experiment
1. Participants were compensated with $10 CAD for both tasks, and were native North American English speakers with no reported speech, language or hearing disorders. In this task, participants were presented with the 120 exposure sentences with the final target word removed. Participants were instructed to type in the word that came to mind when reading the fragment, and to enter any additional words that came to mind that would also complete the sentence. There was no time limit for entry and participants were shown an example with the fragment “The boat sailed across the...” and the possible completions “bay, ocean, lake, river”. Responses were collected in E-Prime (Psychology Software Tools, 2012), and were sanitized by removing miscellaneous keystrokes, spell checking, and standardizing variant spellings and plural forms.

Figure 3.1: Multidimensional scaling of the acoustic distances between the sibilants of original productions, categorization tokens and the exposure tokens in Experiment 3. Note that the only Isolation tokens are the Categorization tokens.
[Figure 3.1 plot: x-axis, first principal component; y-axis, second principal component; legend, Experiment (Categorization, Exposure, Original) and Exposure Type (Isolation, Unpredictive, Predictive).]

From the sanitized data, responses were coded as either 0 if the target word was not present or 1 if it was. For each sentence, the target response rate was calculated by averaging responses from all participants. The target response rate was 0.49 (range 0-0.95) for predictive sentences and 0.03 (range 0-0.45) for unpredictive sentences. Predictive sentences that had target response rates of 0.2 or less were rewritten.
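The scoring step just described (code each participant as 1 or 0 for the target word, average per sentence, and flag predictive sentences at or below 0.2) can be sketched as follows. This is only an illustrative sketch: the function names and the example responses are hypothetical, and the original responses were collected in E-Prime and sanitized by hand rather than with this code.

```python
from statistics import mean


def target_response_rate(responses, target):
    """Proportion of participants whose completions include the target word.

    `responses` holds one list of sanitized completions per participant;
    each participant is coded 1 if the target is present, otherwise 0.
    """
    coded = [1 if target in completions else 0 for completions in responses]
    return mean(coded)


def needs_rewrite(rate, threshold=0.2):
    """Flag a predictive sentence whose rate is at or below the threshold."""
    return rate <= threshold


# Hypothetical sanitized completions from five participants for one fragment.
example_responses = [
    ["spray paint", "paint"],
    ["graffiti"],
    ["spray paint"],
    ["paint", "ink"],
    ["spray paint", "marker"],
]

rate = target_response_rate(example_responses, "graffiti")
print(f"target response rate = {rate:.2f}, rewrite = {needs_rewrite(rate)}")
```

In this made-up example only one of five participants supplies the target, so the sentence sits exactly at the 0.2 cut-off and would be flagged for rewriting.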
The predictive sentences for auction, brochure, carousel, cashier, cockpit, concert, cowboy, currency, cursor, cushion, dryer, graffiti, and missile were rewritten to remove any syntactic or semantic ambiguities. For instance, a common completion for the predictive sentence “The youth tagged the wall with...” was “spray paint” rather than “graffiti”. To promote the likelihood of “graffiti”, the sentence was changed to “Red paint in hand, the youth tagged the wall with...”, which would eliminate “spray paint” as a possible completion.

Five volunteers participated in another pretest to determine how suitable the pictures were at representing their associated word. All participants were native speakers of North American English, with reported corrected-to-normal vision. Participants were presented with a single image in the middle of the screen. Their task was to type the word that first came to mind, and any other words that described the picture equally well. There was no time limit and presentation of the pictures was self-paced. Responses were sanitized as above.

Pictures were replaced if 20% or less of the participants (1 of 5) responded with the target word and the responses were semantically unrelated to the target word. Five pictures were replaced: toothpick and falafel were replaced with clearer pictures, and ukulele, earmuff and earplug were replaced with rollerblader, anchor and bedroom. All five replacements were for distractor words.

3.2.4 Experiment design

Participants were assigned to one of four groups from a 2x2 between-subject factorial design. The first factor was whether the word containing the ambiguous sibilant was predictable from the preceding words or not (Predictability: Predictive versus Unpredictive). All participants were therefore exposed to a consistent 100 stimulus sentences, with identical control and filler items for all participants.
The second factor was whether participants were given additional instructions about the sibilant or not (Attention: Attention versus No Attention). Participants in the Attention condition received additional instructions that the speaker’s “s” sounds were sometimes ambiguous, and to listen carefully to ensure correct responses.

3.2.5 Procedure

As in Experiments 1 and 2, participants completed an exposure task and a categorization task in E-Prime (Psychology Software Tools, 2012). For the exposure task, participants heard a sentence via headphones for each trial. Immediately following the auditory presentation, they were presented with two pictures on the screen. Their task was to select the picture on the screen that corresponded to the final word in the sentence they heard. The order was pseudorandom with the same constraints described in Experiment 1. Half of the matching pictures were selected via one button and half via the other.

Each trial proceeded as follows. A blank screen was presented for 250 ms. Immediately following, a sentence was presented auditorily. Following the auditory stimulus, two pictures and their respective buttons appeared on the screen. For example, a sentence ending in “dog” would show a picture of a dog and “1” on one side of the screen, and a picture of a banana and “5” on the other side of the screen. Participants had up to 3000 ms to indicate which picture matched the final word in the sentence. Feedback as to whether a response was detected was shown for 500 ms before the next trial began. Participants were given a self-paced break after 50 trials.

Following the exposure task, participants completed the same categorization task described in Experiments 1 and 2.

Participants were given oral instructions explaining both tasks at the beginning of the experiment to remove experimenter interaction between exposure and categorization. Written instructions were presented to participants at the beginning of each task as well.
The instructions for the exposure task given to participants assigned to an Attention condition included explicit reference to the modified sibilants. Participants were told that “this speaker’s ‘s’ sound is sometimes ambiguous” and instructed to “listen carefully so as to choose the correct response.”

3.2.6 Analysis

Response data and factors were transformed and analyzed in the same way as in Experiment 1 and 2.

3.3 Results

3.3.1 Exposure

Performance in the task was high, with accuracy near ceiling across all subjects (mean accuracy = 99.5%, sd = 0.8%). Due to these ceiling effects, a logistic mixed-effects model of accuracy was not constructed. A linear mixed effects model for logarithmically-transformed reaction time was constructed with a similar structure as in Experiments 1 and 2. Fixed effects were Trial (0-100), Trial Type (Filler, /s/, and /ʃ/), Attention (No Attention and Attention), Predictability (Unpredictive and Predictive), and their interactions. By-Subject and by-Word random effect structure was as maximal as permitted by the data, with by-Subject random slopes for Trial, Trial Type, Predictability, and their interactions and by-Word random slopes for Attention, Predictability, and their interaction. Trial Type was coded using treatment (dummy) coding, with Filler as the reference level. Deviance contrast coding was used for Predictability (Unpredictive = 0.5, Predictive = -0.5) and Attention (No attention = 0.5, Attention = -0.5).

Figure 3.2: Within-subject mean reaction time in the exposure phase of Experiment 3, separated out by Trial Type (Filler, /s/, and /ʃ/). Error bars represent 95% confidence intervals.
[Figure 3.2 plot: panels, Filler, /s/, /ʃ/; x-axis, exposure trial block; y-axis, reaction time (ms); legend, Attention (No attention, Attention).]

A significant effect was found for Trial (β = −0.20, SE = 0.01, t = −11.0), indicating that reaction time became faster over the course of the experiment.
There was a significant effect for Trial Type of /ʃ/ versus Filler (β = 0.19, SE = 0.09, t = 2.1), but not for /s/ versus Filler (β = 0.11, SE = 0.09, t = 1.3), suggesting that words with /ʃ/ in them were responded to more slowly than filler words or those with a modified /s/ in them. There was a significant interaction between Trial and Trial Type of /s/ versus Filler (β = 0.05, SE = 0.02, t = 2.4) and between Trial and Trial Type of /ʃ/ versus Filler (β = 0.05, SE = 0.02, t = 2.9), indicating that reaction time for words with /s/ or /ʃ/ in them did not become as fast across the experiment as those for filler words. These results are shown in Figure 3.2. Note that the y-axis has a different scale than that used in Experiments 1 and 2 for reaction times. Participants were faster in general in this task than in the lexical decision task. Responses to predictable sentences (mean = 669 ms, SD = 321 ms) were not significantly faster than responses to unpredictable sentences (mean = 646 ms, SD = 299 ms), suggesting that performance was at floor.

3.3.2 Categorization

Responses with reaction times less than 200 ms or greater than 2500 ms were excluded from analyses. A logistic mixed effects model was constructed with Subject and Continua as random effects and continua Step as random slopes, with 0 coded as a /ʃ/ response and 1 as a /s/ response. Fixed effects for the model were Step, Exposure Type, Attention, and their interactions, with deviance coding used for contrasts for Exposure Type (Unpredictive = 0.5, Predictive = -0.5) and Attention (No attention = 0.5, Attention = -0.5).

Figure 3.3: Proportion /s/ response along the 6 step continua as a function of Exposure Type and Attention in Experiment 3.
Error bars represent 95% confidence intervals.
[Figure 3.3 plot: panels, Unpredictive, Predictive; x-axis, continua step (1-6); y-axis, proportion /s/ response; legend, Attention (No attention, Attention, Control).]

As in the previous experiments, there was a significant effect of the intercept (β = 0.52, SE = 0.20, z = 2.6, p < 0.01) and of Step (β = −2.49, SE = 0.19, z = −12.7, p < 0.01). Exposure Type (β = 0.23, SE = 0.23, z = 0.97, p = 0.33), Attention (β = 0.30, SE = 0.21, z = 1.4, p = 0.15), and their interaction (β = 0.38, SE = 0.44, z = 0.9, p = 0.39) are all not significant, despite the visible differences in Figure 3.3. In Figure 3.3, there appears to be a similar interaction pattern as was seen for Experiment 1 (Figure 2.6). Participants in the different attention conditions for Unpredictive exposure appear to differ in Step 4. However, the lack of significance suggests that this may be less reliable or more localized to Step 4 than in Experiment 1.

Figure 3.4: Proportion /s/ response along the 6 step continua as a function of Exposure Type and Attention in Experiment 3 and the word-medial condition of Experiment 1. Error bars represent 95% confidence intervals.
[Figure 3.4 plot: panels, Isolation, Unpredictive, Predictive; x-axis, continua step (1-6); y-axis, proportion /s/ response; legend, Attention (No attention, Attention, Control).]

As an additional comparison, the data from this experiment was combined with the subset of participants in Experiment 1 who were exposed to the same set of words (the word-medial condition). Exposure Type was recoded as a three-level factor, using treatment (dummy) contrast coding, with the Experiment 1 exposure (Isolation) as the reference level. An identically specified logistic mixed effects model was fit to this data set as to the initial data. In this model, there was a significant effect of Attention (β = 0.74, SE = 0.32, z = 2.2, p = 0.02), such that participants in Attention conditions were less likely to categorize the continua steps as /s/.
Exposure Type had a marginal effect of Predictive compared to Isolation (β = −0.43, SE = 0.23, z = −1.9, p = 0.05), indicating that participants in the Predictive condition were less likely to categorize the continua as /s/ overall as compared to participants in the Isolation condition (Experiment 1). Step interacted with both Unpredictive as compared to Isolation (β = −0.42, SE = 0.17, z = −2.4, p = 0.01) and Predictive as compared to Isolation (β = −0.32, SE = 0.14, z = −2.2, p = 0.02). These interactions indicate that the categorization functions for sentential stimuli had a steeper cross-over than those for the Isolation condition. As shown in Figure 3.4, the endpoints (Steps 5 and 6) for the sentential conditions are wholly overlapping with the control categorization for those steps. While participants in the Unpredictive condition showed a shifted category boundary, the perceptual learning affected less of the continua than for participants in the Isolation condition.

An additional model was run with the reference level for Exposure Type as Predictive to check whether participants in the Predictive condition showed perceptual learning effects at all. In the model with Predictive as reference, the intercept is no longer significant (β = 0.44, SE = 0.28, z = 1.5, p = 0.12), indicating perceptual learning was not robustly present in participants in the Predictive condition. The difference between the Predictive condition and the Isolation condition remains (β = 0.77, SE = 0.32, z = 2.4, p = 0.01), and, as above, the difference between Predictive and Unpredictive is not significant (β = 0.42, SE = 0.31, z = 1.3, p = 0.17). These results indicate participants in the Predictive condition showed no perceptual learning effects, and participants in the Unpredictive condition were in between Predictive and Isolation, but not significantly different from either.
Increasing the statistical power might separate the conditions further.

3.4 Discussion

The key finding of the current experiment is that modified categories embedded in words in meaningful sentences produce less perceptual learning than words in isolation. In fact, participants exposed to a modified category only in predictive words had a similar boundary as those in the control experiment who had no exposure to a modified /s/ category. This pattern of results aligns the most with the extension to Kraljic and colleagues’ argument that perceptual learning is a last resort. If there is any way to attribute the acoustic atypicality to either linguistic or other sources of variation, no perceptual learning occurs (Kraljic et al., 2008b,a). In the current experiment, semantic predictability may be a linguistic source to which variation can be attributed. Semantic predictability shows effects that are similar to a more local source like consonant cluster coarticulation.

The prediction of a simple predictive coding model (Clark, 2013) was not borne out. Rather than increased expectations enhancing error signals, the conditions with increased expectations showed no perceptual learning at all. How, then, can we reconcile the predictive coding model and the findings of the current experiment? One way, certainly, is to incorporate the reliability argument of Kraljic and colleagues. Bayesian approaches capture uncertainty quite well, so the unreliable tokens, such as those in the high predictability sentences, would have greater uncertainty associated with them. Another possibility is that perceptual learning did occur, but it was not generalized to the test items.
Participants could have learned from their exposure how the speaker produces /s/ in high predictability contexts, but the context of words in isolation was too different from the exposure context. Put another way, the participants could have learned how the speaker reduces his /s/ category, but not how the speaker normally produces it.

However, if semantic predictability functions in a similar way to consonant cluster coarticulation, listeners would not show perceptual learning effects even if they were tested on a continuum in a high predictability sentence. In Kraljic et al. (2008a), listeners exposed to an ambiguous /s/ in the context of /stɹ/ and then tested on a continuum from /astɹi/ to /aʃtɹi/ showed no perceptual learning effects. Participants who were exposed to ambiguous /s/ intervocalically showed perceptual learning on both /asi/-/aʃi/ and /astɹi/-/aʃtɹi/ continua. There was no exposure-specificity effect, so participants did not even learn that the speaker produces a more /ʃ/-like /s/ in that context. Any abstract encoding process accounts for and removes the variability associated with the context, leaving the unmodified perceptual category. A similar pattern is likely to be seen with high predictability exposure. Importantly, the speaker’s durations for the target /s/ words did not differ across predictability conditions (Predictable /s/ words: mean = 0.53 s, SD = 0.06 s; Unpredictable /s/ words: mean = 0.53 s, SD = 0.07 s). Any effect of predictability is more likely to stem from listener perception than from speaker production in this experiment.

Figure 3.5: Schema of category relaxation in predictable sentences. The solid vertical line represents the mean of the modified category similar to the one used for Experiment 1, and a dashed vertical line represents the mean of the Experiment 2 modified category.
A more atypical category, as was used in Experiment 2, has a higher probability of being categorized as /s/ in predictable sentences than in isolation.
[Figure 3.5 panels: Isolation, Predictable; legend: modified /s/ category (Experiment 1, Experiment 2); categories: /s/, /ʃ/]

One question raised by this finding is whether perceptual learning is possible at all in high predictability sentences. If the range of acceptable variation for all categories is expanded (schematized in Figure 3.5), the modified category would have fallen closer to the expected mean in predictable sentences compared to isolation. In terms of error propagation, the modified categories used here may not have generated enough errors to learn from. Presenting listeners with a more atypical category should then cause more perceptual learning in this case. In Figure 3.5, the atypical category from Experiment 2 would have a higher likelihood of being categorized as /s/ in predictable sentences than in isolation. If increasing the atypicality in predictable sentences did in fact result in increased perceptual learning, it would suggest that perceptual learning is maximized in a particular range. Tokens too close to the expected mean are too typical to learn from, and tokens too far from the expected mean are too unreliable. However, if listeners simply ignore atypical sounds in highly predictable words, then increasing the atypicality of the category (i.e., using the ambiguity threshold from Experiment 2) would not increase perceptual learning. If that were the case, listeners might not even be sensitive to replacing the /s/ with another sound category entirely (e.g., /f/) in a comprehension-oriented task (but see Samuel, 1981).

Figure 3.6: Distribution of cross-over points for each participant across comparable exposure tokens in Experiments 1 and 3. Larger bulges represent more subjects located at that point in the distribution. The dashed line represents the mean step of the continua.
Large bulges around the dashed line for Control, Unpredictive and Predictive conditions indicate that many participants did not change their category boundaries, compared to the Isolation conditions.
[Figure 3.6 axes: Exposure Condition (Control, Isolation, Unpredictive, Predictive) by cross-over step across continua (3 to 5); legend (Attention): No attention, Attention, Control]

As a final point in this discussion, the distribution of individuals’ perceptual learning effects differs in shape as compared to Experiment 1. Figure 3.6 shows the distribution of cross-over points of each subject in Experiment 3 and participants in the condition of Experiment 1 that used the same exposure words. Cross-over points are where along the continua perception switches from primarily /s/ to primarily /ʃ/, and higher cross-over points are indicative of greater perceptual learning. Participants exposed to the modified category in sentences show more consistency (larger bulges in the violin plots) than those exposed in isolated words. In the Isolation conditions, participants follow a fairly wide, even distribution. In contrast, participants in the sentence conditions are more tightly clustered either around the normalized cross-over point or a step above, suggesting potentially discrete groups in the distribution.

One possible reason for these more discrete groups may relate to cognitive load. Under lower cognitive load conditions, participants in perception-oriented tasks show greater perceptual sensitivity. In this experiment, the task is comprehension-oriented, so lower cognitive load could have distributed cognitive resources either to the comprehension task or to aspects of the signal. Participants with better attention-switching control might devote those resources to perception, while those with worse attention-switching control might not, which may be the cause of the findings in Scharenborg and Janse (2013).
Future research should quantify participants’ attention-switching abilities and other individual differences that may play a role in explaining these findings.

Chapter 4

Discussion and conclusions

This dissertation set out to examine the influence of listener attentional sets on perceptual learning. Perceptual learning is a phenomenon common to many fields involved in cognitive science. How perceptual learning generalizes to new contexts, however, is quite different across paradigms. Perceptual learning in psychophysics is the process of a perceiver aligning their senses to the world. Perceptual learning in speech perception is the process by which perceivers align their perceptual system to an interlocutor to facilitate understanding.

I argue that perceptual learning as a mechanism is shared between linguistic and non-linguistic domains. However, psychophysics paradigms employ primarily perception-oriented attentional sets, while speech perception paradigms employ both perception-oriented and comprehension-oriented attentional sets. Perception-oriented attentional sets in all domains lead to less generalized learning. Conversely, comprehension-oriented attentional sets lead to more generalized learning. The first two experiments of this dissertation implement a standard lexically-guided perceptual learning paradigm, a lexical decision task, but with manipulations promoting perception-oriented attentional sets. Even in a comprehension-oriented lexical decision task, promoting more perception-oriented attentional sets leads to less generalized learning. These results provide a crucial link between fully comprehension-oriented perceptual learning in the lexically-guided paradigm and fully perception-oriented perceptual learning in visually-guided paradigms. The remainder of this chapter first summarizes the results of the dissertation as they relate to specificity and generalization in the perceptual learning literature.
The four manipulations used to promote the different attentional sets are then examined, followed finally by implications for models of cognition and psycholinguistics.

4.1 Specificity and generalization in perceptual learning

The results of this dissertation speak to the dichotomy between specificity and generalization found in the perceptual learning literature. In Experiment 1, participants had larger perceptual learning effects when they were exposed to ambiguous sounds later in the words rather than at the beginning of words (e.g., carousel versus cement). And yet, the testing continua consisted of stimuli with the sibilant at the beginnings of words, which are more similar to the exposure tokens beginning with the ambiguous sound. Exposure that matched the word position of the categorization (word-initial) showed no greater perceptual learning effects than word-medial exposure. Perceptual learning, therefore, occurred at a level of abstraction that is usually not assumed in perceptual learning studies. Most lexically-guided perceptual learning studies attempt to make the exposure tokens and the categorization similar (and in some cases, the same) in order to maximize exposure-specificity effects. In this dissertation, listeners generalized from stimuli with large degrees of coarticulation (i.e., in the middle of the word) to stimuli without as much coarticulation. In some cases, the perceptual learning effect was largest in precisely the cases where coarticulatory effects differed from exposure to test.
One aspect that was not tested in the current studies is exposure-specificity at the level of the item. Perhaps a more perception-oriented attentional set would show greater perceptual learning on the specific exposure items.

The effect of attentional set manipulations in Experiments 1 and 2 suggests that when listeners adopt more perceptually-oriented attentional sets, even within tasks that are oriented toward comprehension, generalization of perceptual learning to new forms is inhibited. Lexically-guided perceptual learning is more likely to be expanded to new contexts than visually-guided perceptual learning (Norris et al., 2003; Kraljic et al., 2008a; Reinisch et al., 2014, but see Mitterer et al., 2013). Visually-guided and psychophysical perceptual learning paradigms often have highly repetitive stimuli with little variation. Both of these aspects add to the monotony of the task and the likelihood of perception-oriented attentional sets (Cutler et al., 1987).

Lexically-guided perceptual learning, on the other hand, requires very few instances to affect the perceptual system. The standard number is around 20 ambiguous tokens within 200 trials, but as few as 10 ambiguous tokens have been shown to have comparable effects (Kraljic et al., 2008b). A consequence of the proposed attention mechanism is that it nicely captures the different number of stimuli needed for perceptual learning across comprehension-oriented and perception-oriented tasks. Tokens heard under comprehension-oriented attentional sets should have a relatively large effect on the perceptual system as compared to tokens heard under a perception-oriented attentional set. A single token updating a more abstract representation will generalize more than many repetitions updating fine-grained episodic representations. From this, we could predict that word endorsement rate and category boundary shift would be less linearly correlated the more comprehension-oriented participants are.
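
One way to make this prediction concrete is sketched below. It assumes each participant's categorization is summarized by a logistic function of continuum step (log-odds of an /s/ response = intercept + slope × step), so that the cross-over point is the step at which the log-odds equal zero. The participant values and function names are hypothetical placeholders, not data from the experiments.

```python
import math


def crossover_step(intercept, slope):
    """Continuum step at which a logistic categorization function crosses 50%."""
    if slope == 0:
        raise ValueError("a flat categorization function has no cross-over point")
    return -intercept / slope


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical participants: (word endorsement rate, intercept, slope).
# Slopes are negative because /s/ responses fall along an /s/-to-/ʃ/ continuum.
participants = [
    (0.55, 5.4, -1.5),
    (0.70, 5.8, -1.5),
    (0.85, 6.3, -1.5),
    (0.95, 6.6, -1.5),
]

endorsement = [rate for rate, _, _ in participants]
crossovers = [crossover_step(b0, b1) for _, b0, b1 in participants]
print("cross-over steps:", [round(c, 2) for c in crossovers])
print("correlation with endorsement rate:", round(pearson_r(endorsement, crossovers), 3))
```

Under the proposal in the text, a correlation computed this way would be expected to weaken in conditions that favor comprehension-oriented attentional sets.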
This prediction is borne out by the lack of correlation between word endorsement rate and cross-over point in Experiment 1 in the Word-final/No Attention condition. This condition is predicted to have the most comprehension-oriented attentional set of the conditions, and it is the only instance in Experiment 1 where a significant correlation between word endorsement rate and cross-over point is not present. Participants in this condition have relatively high cross-over points that do not depend as much on the sheer number of tokens endorsed.

4.2 Effect of increased linguistic expectations

The conditions of Experiment 1 that are most similar to previous lexically-guided perceptual learning paradigms are those with no explicit instructions about the /s/ category. In these conditions, increasing linguistic expectations through lexical bias resulted in larger perceptual learning effects. I argue that the increased perceptual learning is due to increased maintenance of comprehension-oriented attentional sets by participants in the Word-medial condition. The participants exposed to a modified /s/ category at the beginnings of words would be more likely to have their attention drawn to the atypicality of the modified /s/ category. There are two potential scenarios for how this would have affected participants. In the first, “normal” word processing would proceed with the perception of the modified /s/ as part of comprehending the word, but the attentional set would not change. In the second, processing the word would trigger an attentional set change that would get reinforced for each new modified /s/ encountered. The experiments in this dissertation do not definitively answer which scenario is more likely, and it could be that different participants fall into different scenarios. However, when participants were told about the ambiguity of the /s/, they did not behave any differently whether the /s/ was word-initial or word-medial.
This similarity of behavioral patterning suggests the second scenario is more likely, and more perception-oriented attentional sets were adopted as a result of exposure to words beginning with a modified /s/ category.

Increasing linguistic expectations through semantic predictability did not increase perceptual learning. In fact, there was a trend towards unpredictive sentences increasing perceptual learning. Semantic predictability has previously been shown to affect perception-oriented tasks in a similar way to lexical bias (Connine, 1987; Borsky et al., 1998). In Experiment 3, however, participants exposed to the modified /s/ category in high predictability sentences showed no perceptual learning effects at all. While the Isolation condition (Word-medial condition in Experiment 1) was not significantly different from the Unpredictive condition of Experiment 3, there was a trend toward reduced perceptual learning when the modified sound category was embedded in a sentential context in general. The lack of a perceptual learning effect from high predictability exposure sentences is reminiscent of studies that find no perceptual learning when a modified /s/ category is embedded in a /stɹ/ cluster that conditions that variation (Kraljic et al., 2008a). In both cases, the modified category is embedded in a context that conditions increased variability. However, there is a difference between the consonant cluster context and the semantic predictability context. In the consonant cluster, there is a straightforward coarticulatory reason for /s/ to surface as more /ʃ/-like in /stɹ/ clusters, with the /s/ produced more in a postalveolar position due to the upcoming /ɹ/. For semantic predictability, there is no particular reason why an /s/ should surface as more /ʃ/-like in high predictability sentences.
If high semantic predictability can be the attributed cause of /s/ surfacing as more /ʃ/-like, it seems reasonable that the range of acceptable productions for all categories would be expanded (as schematized in Figure 3.5 of Chapter 3).

Perceptual learning of nonnative accents is possible through hearing sentences of varying predictability (Bradlow and Bent, 2008, and others). However, the phonetic variability involved in those tasks reaches far beyond the sibilant modified here. The speaker producing the sentences in this dissertation is a native English speaker of the local dialect. Even with the synthesis applied to the sound files, he is more intelligible than the speakers in studies involving nonnative accents. The ease of comprehension of the speaker in this study might actually inhibit perceptual learning in sentences, because listeners can leverage so much of their perceptual experience with other speakers of the local dialect.

On the flip side, how nonnative listeners perceptually adapt to speech that varies in predictability is an interesting question as well. Nonnative listeners do not benefit from high semantic predictability as much as native listeners (Mayo et al., 1997). This tends to result in less accuracy for transcribing speech in noise. As the sentences presented here did not include noise, the lessened benefit from semantic predictability might manifest differently. If high predictability sentences are not as predictable for those listeners, they may show perceptual learning effects more similar to unpredictable sentences.

4.3 Attentional control of perceptual learning

The findings of Experiment 1 support the hypothesis that comprehension-oriented attentional sets produce larger perceptual learning effects than perception-oriented attentional sets.
Although all participants showed perceptual learning effects, those exposed to the ambiguous sound with increased lexical bias only showed larger perceptual learning effects when the instructions about the speaker’s ambiguous sound were withheld. Attention on the ambiguous sound equalized the perceptual learning effects across lexical bias. However, in Experiment 2, there is no such effect of attention. This suggests that ambiguous sounds farther away from the canonical production induce a more perception-oriented attentional set regardless of explicit instructions.

One question raised by the current results is whether perception-oriented attentional sets always result in decreased perceptual learning. The instructions used to focus the listener’s attentional set framed the ambiguity in a negative way, with listeners being cautioned to listen carefully to ensure they made the correct decision. If the attention were directed to the ambiguous sound by framing the ambiguity in a positive way (i.e., that the ambiguous “s” is from a non-native accent or a speech disorder), would we still see the same pattern of results? The current mechanism would predict that attention of any kind to signal properties would block the propagation of errors, reducing perceptual learning. This prediction will be tested in future work.

Attention’s role in perceptual learning may extend to the realm of sociolinguistics. In sociolinguistics, there are three categories of linguistic variables: indicators, markers, and stereotypes (Labov, 1972). Of these, stereotypes are the most known to speakers of the dialect and speakers of other dialects. If attention to perception inhibits perceptual learning, then perceptual learning of these stereotyped linguistic variables would be inhibited.
For instance, New Zealand English has undergone several vowel shifts as compared to other varieties of English, but these shifted vowels differ in salience depending on the listener’s dialect (Bell, 2015). For Australian English listeners, the KIT vowel is salient (fish and chips heard as more like fush and chups). For North American English listeners, the DRESS vowel is more salient (Bret heard as Brit). These two populations of listeners are predicted to perceptually adapt to these vowel changes differently. North American English listeners should adapt to KIT more than Australian English listeners, and vice versa for DRESS. Given that the scale from indicators to markers to stereotypes is ordered in terms of speaker (or listener) awareness, the role of attention proposed in this dissertation would predict progressively less perceptual learning as awareness increases. Salient social variants (e.g., r-lessness) have also been found to not be encoded as robustly as canonical productions (Sumner and Samuel, 2009). Are less salient social variants learned more easily in general?

4.4 Category atypicality

In Experiment 2, there was no effect of explicit instructions or lexical bias on perceptual learning, with a stable perceptual learning effect present for all listeners. There are two potential, non-exclusive explanations for the lack of effects. The first, as stated above, is that the increased distance to the canonical production drew the listener’s attention to the ambiguous productions, resulting in a perception-oriented attentional set. The second potential explanation is that the productions farther from canonical produce a weaker effect on the updating of a listener’s categories, as predicted from the neo-generative model in Pierrehumbert (2002).
This explanation is supported in part by the weaker correlation between word endorsement rate and cross-over point found in Experiment 2, and by the findings of Sumner (2011), where the highest rates of perceptual learning were found when the categories began more typical and gradually became less typical over the course of exposure. This explanation could be tested straightforwardly by implementing the same gradual shift paradigm used in Sumner (2011) with the manipulations used in this dissertation.

An interesting extension to the current findings would be to observe the perceptual learning effects in a cognitive load paradigm. Speech perception under cognitive load has been shown to have greater reliance on lexical information due to weaker initial encoding of the signal (Mattys and Wiget, 2011). Following exposure to a modified ambiguous category, we might expect to see less perceptual learning if the exposure was accompanied by high cognitive load. Scharenborg et al. (2014), however, found that hearing loss of older participants did not significantly influence their perceptual learning. Therefore it may be that perceptual learning would not fluctuate across cognitive loads. Higher cognitive loads, however, might allow for more atypical ambiguous stimuli to be learned, due to the increased reliance on lexical information during initial encoding.

It is important to bear in mind that what is typical in one context is not necessarily typical in another. The methodology employed for Experiment 3 assumed that expected variation for the category /s/ would be common across all experiments. However, it may be that the perfectly ambiguous /s/ category in Experiment 3 was within the range of variation in high predictability sentences.
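
The intuition that a fixed token can be atypical in one context but within range in another can be illustrated with a toy calculation. The sketch below assumes, purely for illustration, that /s/ and /ʃ/ are equal-variance Gaussian categories on a one-dimensional acoustic scale with equal priors, and that predictable sentences widen both categories. The means, standard deviations, token positions and function names are all hypothetical.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and SD at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def prob_s(token, sd, mean_s=0.0, mean_sh=1.0):
    """Posterior probability of /s/ for a token, assuming equal priors."""
    like_s = gaussian_pdf(token, mean_s, sd)
    like_sh = gaussian_pdf(token, mean_sh, sd)
    return like_s / (like_s + like_sh)


def distance_in_sd(token, sd, mean_s=0.0):
    """How many standard deviations the token lies from the /s/ mean."""
    return abs(token - mean_s) / sd


SD_ISOLATION = 0.2    # hypothetical narrow category (words in isolation)
SD_PREDICTABLE = 0.4  # hypothetical relaxed category (predictable sentences)

tokens = {"less atypical token": 0.5, "more atypical token": 0.7}
for label, token in tokens.items():
    for context, sd in [("isolation", SD_ISOLATION), ("predictable", SD_PREDICTABLE)]:
        print(
            f"{label}, {context}: "
            f"{distance_in_sd(token, sd):.2f} SD from /s/ mean, "
            f"P(/s/) = {prob_s(token, sd):.3f}"
        )
```

With these assumed values, widening the categories halves each token's distance from the /s/ mean in standard-deviation units and raises the probability that the more atypical token is categorized as /s/, which is the pattern schematized in Figure 3.5.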
In this case, had the category atypicality been more like that of Experiment 2, we may have actually seen more of an effect, perhaps back to the level of Experiment 1 (as schematized in Figure 3.5 of Chapter 3).

Figure 4.1: A schema for predictive coding under a perception-oriented attentional set. Attention is represented by the pink box, where gain is enhanced for detection, but error signal propagation is limited to lower levels of sensory representation where the expectations must be updated. This is represented by the lack of pink nodes outside the attention box. As before, blue arrows represent expectations, red arrows represent error signals, and yellow represents the sensory input.

4.5 Implications for cognitive models

The model that this dissertation adopts is based on the predictive coding framework (Clark, 2013). In this model, expectations about the incoming signal are fed from higher levels of representation to lower ones. The mismatch between the actual perceived signal and the expectations is then propagated back to the higher levels as an error signal. Future expectations are modified based on the error signal. This framework captures the basics of perceptual learning, and a similar computational framework has been used to model visually-guided perceptual learning tasks (Kleinschmidt and Jaeger, 2011). However, the attentional mechanism in the predictive coding framework does not work well for some instances of visual attention (Block and Siegel, 2013) or for the current results. I propose a new attentional mechanism for predictive coding, one in which attention inhibits error propagation beyond the level to which attention is directed. Figures 4.1 and 4.2 show schemas reproduced from Chapter 1 for perception-oriented and comprehension-oriented attentional sets, respectively. Such a mechanism explains both the previous findings and the current results.

Figure 4.2: A schema for predictive coding under a comprehension-oriented attentional set.
Attention is represented by the green box, where it is oriented to higher, more abstract levels of sensory representation. Error signals are able to propagate farther and update more than just the fine-grained low-level sensory representations. As before, blue arrows represent expectations, red arrows represent error signals, and yellow represents the sensory input.

The predictive coding framework advanced here has implications for other psycholinguistic research outside of perceptual learning. Recent innovations in speech perception models have emphasized the role of episodic memory traces (Goldinger, 1996; Pierrehumbert, 2001). That is, listeners encode more phonetic detail than is strictly necessary for linguistic comprehension, and can process previously heard tokens of a word type faster than novel tokens of that word type. Theodore et al. (2015) argue that attention during encoding can emphasize abstract (i.e., lexical) information at the expense of episodic (i.e., talker) information or vice versa. Such a proposal is similar to that put forth in this dissertation, but encoding in a predictive coding framework would be updating of predictions. The lack of an explicit memory trace mechanism in the predictive coding framework may be a weakness concerning cognition as a whole, but I would argue that it still accounts for the speech perception data. The principal motivation for episodic memory traces was originally to account for behavioral data that showed sensitivity to fine details of previous stimuli (Goldinger, 1996). However, Sumner and Kataoka (2013) highlight recent findings of recognition equivalence but memory inequality between frequent forms and idealized, infrequent forms.
For instance, the word flute is generally pronounced with an unreleased /t/ in North American English, but it is also produced less frequently with a fully released /t/ (the idealized form) or with a glottal stop. All pronunciation variants are recognized equally well in short term processing tasks (accuracy and reaction time). However, infrequent, idealized pronunciations are remembered better in long term recall tasks. Sumner and colleagues propose an alternate route to linguistic encoding, which they term socioacoustic encoding. Hierarchical representations in the predictive coding framework can account for this data without appealing to episodic memory traces. The socioacoustic encoding would be a speaker-based hierarchical representation, with abstracted gender and accent representations.

While most of the discussion here has concerned representations of speech sounds, predictive coding representations are not solely limited to linguistic objects. In fact, one of the key findings of perceptual learning is that it is largely dependent on the speaker. In these cases, perceptual learning is not updating just the distribution for what is expected of a speech sound, but also what is expected for that speaker. Perceptual learning from a group of speakers that share a trait (e.g., the same non-native accent) facilitates the creation (or perhaps simply the identification) of a more abstract category for that group of speakers, enhancing intelligibility on future novel speakers (Bradlow and Bent, 2008).

The predictive coding framework can be applied to speech production as well, and has particular applicability to sound change. When a person produces speech, they are also perceiving it and compensating for any deviations from their predictions (Hickok et al., 2011). In addition to hierarchical representations for what a person’s own speech should sound like, there could also be social representations that act on it.
If a speaker identifies with a particular speech group, then their predictions for their own speech should align with what they expect other members of the speech group to produce. One way of thinking about speech style in production (e.g., reading, interview, casual conversation) is in terms of attention to speech production (Labov, 1997). As attention to speech production increases, speech group markers (e.g., non-rhoticity in New York City) become less prevalent (Labov, 1997). Perhaps attention plays an inhibitory role on abstract social representations in speech production. In terms of sound change, an individual would change their speech when they both associate a particular trait with a speech group and consider themselves a member of that group.

Recent work on a historical vowel shift in New Zealand English proposed that low frequency words led the shift (Hay et al., 2015). The mechanism they propose to account for this data is one where tokens that are difficult to comprehend are less likely to be encoded. Low frequency words are particularly affected because they are likely to be interpreted as higher frequency neighbors and nonwords. The experiments of this dissertation contained a similar situation at an individual speaker level. Participants were more likely to not recognize words containing a modified /s/ category as real English words than the filler words. In Experiment 1, the amount to which a participant’s boundary shifted was, in general, correlated with the amount of /s/ words recognized as words. To the extent that difficult to comprehend words are treated as nonwords, these findings reinforce the findings of Hay et al. (2015).

Outside of psycholinguistics, this dissertation suggests testable predictions for perceptual learning in the visual domain using visual illusions. In the Kanizsa illusion, for instance, three Pac-man like objects are arranged to give the illusion of three circles overlaid by a triangle (Kanizsa, 1976).
Perception of this illusion requires more abstract representations that are not in the signal, much like the objects of comprehension as defined in this dissertation. The proposed mechanism for attention would predict that perceptual learning involving visual illusions should be more general and less exposure specific. In the Kanizsa case, perceivers would perceptually learn characteristics of the abstract triangles and circles instead of the Pac-man shapes. Visual illusions allow perceivers to better organize complex scenes in short-term and working memory (Vandenbroucke et al., 2012). Similar to these illusions, words and higher linguistic structures allow better organization of complex auditory signals. Drawing attention to either the circles and triangles of the illusion or to the Pac-man symbols should induce attentional sets similar to comprehension-oriented and perception-oriented ones proposed in this dissertation, with similar effects on the generalization of perceptual learning.

I have argued that attentional sets, particularly within the predictive coding framework, are crucial to the generalization of perceptual learning to new contexts. Recently proposed models of speech perception treat linguistic representations as a balance of both more abstract elements and more fine-detailed elements (Theodore et al., 2015) and also incorporate aspects of social representations (Szakay, 2012; Sumner and Kataoka, 2013). Both of these trends are easy to incorporate into a predictive coding framework. Such a model accounts for the findings of this thesis and those of the larger psycholinguistic literature.

4.6 Conclusion

This dissertation investigated the influences of attention and linguistic salience on perceptual learning in speech perception. Perceptual learning was modulated by the attentional set of the listener.
Perception-oriented attentional sets were induced through increasing the salience of the modified category, either by reducing the lexical bias, increasing the atypicality, or giving explicit instructions. In all these instances, participants showed robust perceptual learning effects, but smaller effects than participants not biased towards perception-oriented attentional sets. Exposure to modified categories in predictive sentences resulted in no perceptual learning effects, potentially due to the attribution of the modified sound category to reduced speech clarity. These results support a greater role of attention than previously assumed in predictive coding frameworks, such as the proposed propagation-blocking mechanism. Finally, these results suggest that the degree to which listeners perceptually adapt to a new speaker is under their control to the same degree as attentional set adoption. However, given the robust perceptual learning effects found across experiments, perceptual learning is a largely automatic process when variation cannot be attributed to contextual factors.

Bibliography

Ahissar, M. and Hochstein, S. (1993). Attentional control of early perceptual learning. Proceedings of the National Academy of Sciences of the United States of America, 90(12):5718–22.

Bacon, W. F. and Egeth, H. E. (1994). Overriding stimulus-driven attentional capture. Perception & Psychophysics, 55(5):485–496.

Baker, A., Archangeli, D., and Mielke, J. (2011). Variability in American English s-retraction suggests a solution to the actuation problem. Language Variation and Change, 23(03):347–374.

Bell, A. (2015). The indexical cycle and the making of social meaning in language: Why everybody needs good neighbours sociolinguistically. Colloquium at University of British Columbia.

Bertelson, P., Vroomen, J., and De Gelder, B. (2003). Visual Recalibration of Auditory Speech Identification: A McGurk Aftereffect. Psychological Science, 14(6):592–597.
→ pages 9Block, N. and Siegel, S. (2013). Attention and perceptual adaptation. Behavioraland Brain Sciences, 36(3):205–6. → pages 21, 56, 87Borsky, S., Tuller, B., and Shapiro, L. P. (1998). ”How to milk a coat:” the effectsof semantic and acoustic information on phoneme categorization. Journal ofthe Acoustical Society of America, 103(5 Pt 1):2670–2676. → pages 16, 58,82Bradlow, A. R. and Alexander, J. A. (2007). Semantic and phonetic enhancementsfor speech-in-noise recognition by native and non-native listeners. Journal ofthe Acoustical Society of America, 121(4):2339–2349. → pages 16, 58Bradlow, A. R. and Bent, T. (2008). Perceptual adaptation to non-native speech.Cognition, 106(2):707–729. → pages 1, 10, 60, 62, 83, 8992Brysbaert, M. and New, B. (2009). Moving beyond Kucera and Francis: a criticalevaluation of current word frequency norms and the introduction of a newand improved word frequency measure for American English. BehaviorResearch Methods, 41(4):977–990. → pages 31Clare, E. (2014). Applying phonological knowledge to phonetic accommodation.In Poster presented at the 14th Conference on Laboratory Phonology,Tachikawa, Tokyo. → pages 15Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the futureof cognitive science. Behavioral and Brain Sciences, 36(3):181–204. →pages ix, 9, 10, 14, 20, 22, 43, 55, 56, 61, 75, 86Clopper, C. G. and Pierrehumbert, J. B. (2008). Effects of semantic predictabilityand regional dialect on vowel space reduction. Journal of the AcousticalSociety of America, 124(3):1682–1688. → pages 16, 58Connine, C. (1987). Constraints on interactive processes in auditory wordrecognition: The role of sentence context. Journal of Memory andLanguage, 538:527–538. → pages 16, 58, 82Cutler, A., Mehler, J., Norris, D., and Segui, J. (1987). Phoneme identification andthe lexicon. Cognitive Psychology, 19(2):141–177. → pages 3, 13, 14, 18, 81Dilts, P. C. (2013). 
Modelling Phonetic Reduction in a Corpus of Spoken EnglishUsing Random Forests and Mixed-effects Regression. PhD thesis, Universityof Alberta. → pages 60Egner, T., Monti, J. M., and Summerfield, C. (2010). Expectation and surprisedetermine neural population responses in the ventral visual stream. TheJournal of Neuroscience, 30(49):16601–16608. → pages 61Eisner, F. and McQueen, J. M. (2005). The specificity of perceptual learning inspeech processing. Perception & Psychophysics, 67(2):224–238. → pages11, 44Eriksen, B. A. and Eriksen, C. W. (1974). Effects of noise letters upon theidentification of a target letter in a nonsearch task. Perception &Psychophysics, 16(1):143–149. → pages 19Fallon, M., Trehub, S. E., and Schneider, B. A. (2002). Children’s use of semanticcues in degraded listening environments. The Journal of the AcousticalSociety of America, 111(5 Pt 1):2242–2249. → pages 16, 5893Finn, A. S., Lee, T., Kraus, A., and Kam, C. L. H. (2014). When it hurts (andhelps) to try: The role of effort in language learning. → pages 17Ganong, W. F. (1980). Phonetic categorization in auditory word perception.Journal of Experimental Psychology: Human Perception and Performance,6(1):110–125. → pages 12Gibson, E. J. (1953). Improvement in perceptual judgments as a function ofcontrolled practice or training. Psychological Bulletin, 50(6):401–431. →pages 2, 7, 17Gilbert, C., Sigman, M., and Crist, R. (2001). The neural basis of perceptuallearning. Neuron, 31:681–697. → pages 11, 22, 26Goldinger, S. D. (1996). Words and voices: episodic traces in spoken wordidentification and recognition memory. Journal of Experimental Psychology:Learning, Memory, and Cognition, 22(5):1166–1183. → pages 4, 88Gow, D. W. and Gordon, P. C. (1995). Lexical and prelexical influences on wordsegmentation: evidence from priming. Journal of Experimental Psychology:Human Perception and Performance, 21(2):344–359. → pages 5, 14Hay, J. B., Pierrehumbert, J. B., Walker, A. J., and LaShell, P. (2015). 
Trackingword frequency effects through 130years of sound change. Cognition,139:83–91. → pages 89Hickok, G., Houde, J., and Rong, F. (2011). Sensorimotor integration in speechprocessing: computational basis and neural organization. Neuron,69(3):407–422. → pages 89Johnson, K. (2004). Massive reduction in conversational American English. InSpontaneous speech: Data and analysis. Proceedings of the 1st session ofthe 10th international symposium, pages 29–54. Citeseer. → pages 60Kalikow, D., Stevens, K., and Elliott, L. (1977). Development of a test of speechintelligibility in noise using sentence materials with controlled wordpredictability. Journal of the Acoustical Society of America, 61(5). → pages15, 16, 58, 62Kanizsa, G. (1976). Subjective contours. Scientific American, 234(4):48–52. →pages 89Kawahara, H., Morise, M., Takahashi, T., Nisimura, R., Irino, T., and Banno, H.(2008). Tandem-straight: A temporally stable power spectral representation94for periodic signals and applications to interference-free spectrum, F0, andaperiodicity estimation. In Proceedings of IEEE International Conference onAcoustics, Speech and Signal Processing, pages 3933–3936. → pages x, xi,31, 36, 48, 64Kleber, F., Harrington, J., and Reubold, U. (2012). The Relationship between thePerception and Production of Coarticulation during a Sound Change inProgress. Language and Speech, 55(3):383–405. → pages 53Kleinschmidt, D. and Jaeger, T. F. (2011). A Bayesian belief updating model ofphonetic recalibration and selective adaptation. In Proceedings of the 2ndACL Workshop on Cognitive Modeling and Computational Linguistics.Association for Computational Linguistics. → pages 9, 86Kraljic, T., Brennan, S. E., and Samuel, A. G. (2008a). Accommodating variation:dialects, idiolects, and speech processing. Cognition, 107(1):54–81. →pages 15, 25, 59, 75, 80, 82Kraljic, T. and Samuel, A. G. (2005). Perceptual learning for speech: Is there areturn to normal? Cognitive Psychology, 51(2):141–78. 
→ pages 11, 15, 44Kraljic, T. and Samuel, A. G. (2007). Perceptual adjustments to multiplespeakers. Journal of Memory and Language, 56(1):1–15. → pages 11Kraljic, T., Samuel, A. G., and Brennan, S. E. (2008b). First impressions and lastresorts: how listeners adjust to speaker variability. Psychological Science,19(4):332–8. → pages 15, 24, 25, 55, 59, 60, 61, 75, 81Krause, J. C. and Braida, L. D. (2004). Acoustic properties of naturally producedclear speech at normal speaking rates. The Journal of the Acoustical Societyof America, 115(1):362–378. → pages 60Kuhl, P. K. (1979). Speech perception in early infancy: perceptual constancy forspectrally dissimilar vowel categories. The Journal of the Acoustical Societyof America, 66(6):1668–1679. → pages 1Labov, W. (1972). Sociolinguistic Patterns. Language, 2(4):344. → pages 84Labov, W. (1997). The Social Stratification of (r) in New York City DepartmentStores. In Coupland, N., editor, Sociolinguistics: A Reader, pages 168–178.St. Martin’s Press. → pages 89Leber, A. B. and Egeth, H. E. (2006). Attention on autopilot: Past experience andattentional set. Visual Cognition, 14(4-8):565–583. → pages 18, 2095Levy, R. (2008). Expectation-based syntactic comprehension. Cognition,106(3):1126–1177. → pages 17Lieberman, P. (1963). Some Effects of Semantic and Grammatical Context on theProduction and Perception of Speech. Language and Speech, 6(3):172 –187.→ pages 16Ling, S. and Carrasco, M. (2006). When sustained attention impairs perception.Nature neuroscience, 9(10):1243–1245. → pages 20Marslen-Wilson, W. D. and Welsh, A. (1978). Processing interactions and lexicalaccess during word recognition in continuous speech. Cognitive Psychology,10(1):29–63. → pages 5, 14Mattys, S. L. and Wiget, L. (2011). Effects of cognitive load on speechrecognition. Journal of Memory and Language, 65(2):145–160. → pages 13,17, 18, 58, 85Mayo, L. H., Florentine, M., and Buus, S. (1997). Age of second-languageacquisition and perception of speech in noise. 
Journal of Speech, Language,and Hearing Research, 40(3):686–693. → pages 16, 58, 83McAuliffe, M. (2015). python-acoustic-similarity. Available from → pages 34McLennan, C. T., Luce, P. a., and Charles-Luce, J. (2003). Representation oflexical form. Journal of Experimental Psychology: Learning, Memory, andCognition, 29(4):539–53. → pages 10Mielke, J. (2012). A phonetically based metric of sound similarity. Lingua,122(2):145–163. → pages 34Mirman, D., McClelland, J. L., Holt, L. L., and Magnuson, J. S. (2008). Effects ofAttention on the Strength of Lexical Influences on Speech Perception:Behavioral Experiments and Computational Mechanisms. Cognitive Science,32(2):398–417. → pages 5, 13Mitterer, H., Scharenborg, O., and McQueen, J. M. (2013). Phonologicalabstraction without phonemes in speech perception. Cognition,129(2):356–361. → pages 11, 80Norris, D. and Cutler, A. (1988). The relative accessibility of phonemes andsyllables. Perception & Psychophysics, 43(6):541–550. → pages 1896Norris, D., McQueen, J. M., and Cutler, A. (2003). Perceptual learning in speech.Cognitive Psychology, 47(2):204–238. → pages 2, 6, 7, 11, 15, 19, 22, 26,44, 55, 80Pierrehumbert, J. (2002). Word-specific phonetics. Laboratory phonology 7. →pages 55, 85Pierrehumbert, J. B. (2001). Exemplar dynamics: Word frequency, lenition andcontrast. In Bybee, J. and Hopper, B., editors, Frequency and the emergenceof linguistic structure, pages 137–158. John Benjamins, Amsterdam. →pages 24, 88Pitt, M. and Szostak, C. (2012). A lexically biased attentional set compensates forvariable speech quality caused by pronunciation variation. Language andCognitive Processes, (April 2013):37–41. → pages 3, 5, 6, 13, 14, 18, 27, 28Pitt, M. A. and Samuel, A. G. (1990). Attentional allocation during speechperception: How fine is the focus? Journal of Memory and Language,29(5):611–632. → pages 18Pitt, M. A. and Samuel, A. G. (1993). An empirical and meta-analytic evaluationof the phoneme identification task. 
Journal of Experimental Psychology:Human Perception and Performance, 19(4):699–725. → pages 14Pitt, M. A. and Samuel, A. G. (2006). Word length and lexical activation: longeris better. Journal of Experimental Psychology: Human Perception andPerformance, 32(5):1120–1135. → pages 5, 13, 27Psychology Software Tools, I. (2012). E-Prime. → pages 32, 37, 66, 67, 69Reinisch, E. and Holt, L. L. (2013). Lexically Guided Phonetic Retuning ofForeign-Accented Speech and Its Generalization. Journal of ExperimentalPsychology: Human Perception and Performance, 40(2):539–555. → pages11Reinisch, E., Weber, A., and Mitterer, H. (2013). Listeners retune phonemecategories across languages. Journal of Experimental Psychology: HumanPerception and Performance, 39(1):75–86. → pages 2, 11, 22, 37, 39, 44, 55Reinisch, E., Wozny, D. R., Mitterer, H., and Holt, L. L. (2014). Phoneticcategory recalibration: What are the categories? Journal of Phonetics,45:91–105. → pages 2, 11, 22, 26, 54, 55, 8097Reitan, R. M. (1958). Validity of the trail making test as an indicator of organicbrain damage. Perceptual and motor skills, 8(3):271–276. → pages 19Saffran, J. R., Newport, E. L., Aslin, R. N., Tunick, R. A., and Barrueco, S.(1997). Incidental language learning: Listening (and learning) out of thecorner of your ear. Psychological science, 8(2):101–105. → pages 17Samuel, A. G. (1981). Phonemic restoration: insights from a new methodology.Journal of Experimental Psychology: General, 110(4):474–494. → pages14, 16, 25, 58, 77Scarborough, R. (2010). Lexical and contextual predictability: Confluent effectson the production of vowels. In Laboratory phonology 10, pages 575–604.→ pages 16, 58, 60Scharenborg, O. and Janse, E. (2013). Comparing lexically guided perceptuallearning in younger and older listeners. Attention, Perception &Psychophysics, 75(3):525–36. → pages 20, 51, 78Scharenborg, O., Weber, A., and Janse, E. (2014). 
The role of attentional abilitiesin lexically guided perceptual learning by older listeners. Attention,Perception, & Psychophysics, pages 1–15. → pages 19, 20, 55, 85Shankweiler, D., Strange, W., and Verbrugge, R. R. (1977). Speech and theproblem of perceptual constancy. → pages 1Sumner, M. (2011). The role of variation in the perception of accented speech.Cognition, 119(1):131–136. → pages 22, 24, 28, 60, 61, 85Sumner, M. and Kataoka, R. (2013). Effects of phonetically-cued talker variationon semantic encoding. The Journal of the Acoustical Society of America,134(6). → pages 1, 88, 90Sumner, M., McGowan, K., D’Onofrio, A., and Pratt, T. (2015). Casual speech ismore sensitive to top-down information than careful speech. In Oralpresentation at the Linguistic Society of America 2015 Annual Meeting. →pages 60Sumner, M. and Samuel, A. G. (2009). The effect of experience on the perceptionand representation of dialect variants. Journal of Memory and Language,60(4):487–501. → pages 84Szakay, A. (2012). The effect of dialect on bilingual lexical processing andrepresentation. PhD thesis, University of British Columbia. → pages 9098Theodore, R. M., Blumstein, S. E., and Luthra, S. (2015). Attention modulatesspecificity effects in spoken word recognition: Challenges to the time-coursehypothesis. Attention, Perception, & Psychophysics, pages 1–11. → pages 4,88, 90Toro, J. M., Sinnett, S., and Soto-Faraco, S. (2005). Speech segmentation bystatistical learning depends on attention. Cognition, 97(2). → pages 17Vandenbroucke, A. R. E., Sligte, I. G., Fahrenfort, J. J., Ambroziak, K. B., andLamme, V. A. F. (2012). Non-Attended Representations are PerceptualRather than Unconscious in Nature. PLoS ONE, 7(11). → pages 90Vroomen, J., van Linden, S., de Gelder, B., and Bertelson, P. (2007). Visualrecalibration and selective adaptation in auditory-visual speech perception:Contrasting build-up courses. Neuropsychologia, 45(3):572–577. → pages7, 9, 24, 55Watanabe, T., Na´n˜ez, J. 
E., and Sasaki, Y. (2001). Perceptual learning withoutperception. Nature, 413(6858):844–848. → pages 17Wolfe, J. M. and Horowitz, T. S. (2004). What attributes guide the deployment ofvisual attention and how do they do it? Nature Reviews Neuroscience,5(6):495–501. → pages 18Yeshurun, Y. and Carrasco, M. (1998). Attention improves or impairs visualperformance by enhancing spatial resolution. Nature, 396(6706):72–75. →pages 21, 5699




Related Items