Acquisition of allophony from speech input by adult learners

by

Masaki Noguchi

B.A. Humanity, Soka University, 2000
M.A. Anthropology, Tulane University, 2007

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Doctor of Philosophy

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Linguistics)

The University of British Columbia
(Vancouver)

May 2016

© Masaki Noguchi, 2016

Abstract

Sound systems are a basic building block of any human language. An integral part of the acquisition of sound systems is the learning of allophony. In sound systems, some segments are used as allophones, or contextually-conditioned variants of a single phoneme, and learners need to figure out whether given segments are different phonemes or allophones of a single phoneme. There is a growing interest in the question of how allophony is learned from speech input (e.g., Seidl and Cristia, 2012). This dissertation investigates the mechanisms behind the learning of allophony. Whether given segments are different phonemes or allophones of a single phoneme is partly determined by the contextual distribution of the segments. When segments occur in overlapping contexts and their occurrences are not predictable from the contexts, they are likely to be different phonemes. When segments occur in mutually exclusive contexts, and their occurrences are predictable from the contexts (i.e., they are in complementary distribution), the segments are likely to be allophones. This dissertation starts with the hypothesis that allophonic relationships between segments can be learned from the complementary distribution of the segments in input.

With data collected in a series of laboratory experiments with adult English speakers, I make the following claims. First, adults can learn allophonic relationships between two segments from the complementary distribution of the segments in input. The results of Experiment 1 showed that participants learned to treat two novel segments as something like allophones when they were exposed to input in which the segments were in complementary distribution. Second, the learning of allophony is constrained by the phonetic naturalness of the patterns of complementary distribution. The results of Experiment 2 showed that the learning of allophony happened only when learners were exposed to input in which relevant segments were in phonetically natural complementary distribution. Third, the learning of allophony involves the learning of the context-dependent perception of relevant segments. The results of Experiment 3 showed that, through exposure to input, participants' perception of the relevant segments became more dependent on context such that they perceived the segments as being more similar to each other when they heard the segments in phonetically natural complementary contexts.

Preface

This research project was conceived and designed by Masaki Noguchi with assistance from Carla L. Hudson Kam, Gunnar Ólafur Hansson, and Molly Babel. Data collection was performed by Masaki Noguchi, and data analyses (including statistical analyses) were performed by Masaki Noguchi with assistance from Carla L. Hudson Kam and Gunnar Ólafur Hansson. This research project was funded by an NSERC Discovery Grant (Individual) to Carla L.
Hudson Kam, "Constraints on language acquisition and how they change (or don't) with age." All experiments presented in this dissertation were approved by the University of British Columbia's Research Ethics Board [certificate #H12-02287].

The following is a list of presentations and publications in which various parts of this dissertation were first introduced.

• The results of Experiment 1 (Chapter 2) were first presented as a poster at The 14th Conference on Laboratory Phonology in Tachikawa, Tokyo (Noguchi and Hudson Kam, 2014b). The poster was created by Masaki Noguchi with assistance from Carla L. Hudson Kam.

• The results of Experiment 2 (Chapter 3) were first presented as a poster at The 39th Annual Boston University Conference on Language Development in Boston, MA (Noguchi and Hudson Kam, 2014a). The poster was created by Masaki Noguchi with assistance from Carla L. Hudson Kam.

• The results of Experiment 3 (Chapter 4) were first presented as a poster at the 2015 Annual Meeting of Phonology in Vancouver, B.C. (Noguchi and Hudson Kam, 2015b). The poster was created by Masaki Noguchi with assistance from Carla L. Hudson Kam and Gunnar Ólafur Hansson.

• The results of the experiment presented in Appendix A were first presented in the proceedings of Acoustics Week in Canada 2015 (Noguchi and Hudson Kam, 2015a). The paper was written by Masaki Noguchi with assistance from Carla L. Hudson Kam.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgments
1 Introduction
  1.1 Phonological relationships: Phonemic contrasts vs. allophony
  1.2 The perceptual effect of phonological relationships
    1.2.1 Reduced sensitivity to allophonic differences: an information theoretic account
  1.3 Acquisition of phonological relationships
    1.3.1 Early acquisition of phonemic contrasts
    1.3.2 Early acquisition of allophonic variation
    1.3.3 Late acquisition of non-native phonemic contrasts
    1.3.4 Late acquisition of non-native allophonic variation
  1.4 Learning mechanisms
    1.4.1 Distributional learning of sound categories
    1.4.2 Distributional learning of allophony
  1.5 Outline of dissertation
2 Experiment 1: Distributional learning of allophony
  2.1 Introduction
  2.2 Target segments
    2.2.1 Post-alveolar fricatives in Mandarin
    2.2.2 Post-alveolar fricatives in Mandarin and English
  2.3 Method
    2.3.1 Participants
    2.3.2 Exposure stimuli
    2.3.3 Conditions
    2.3.4 AX discrimination test
    2.3.5 Design
    2.3.6 Procedure
  2.4 Results
  2.5 Discussion
  2.6 Conclusion
3 Experiment 2: Phonetic naturalness and the learning of allophony
  3.1 Introduction
  3.2 Constraints on statistical learning
  3.3 Constraints on the learning of phonology
  3.4 Constraints on the learning of allophony
  3.5 Methods
    3.5.1 Participants
    3.5.2 Exposure stimuli
    3.5.3 AX discrimination task
    3.5.4 Design and procedure
  3.6 Results
  3.7 Discussion
    3.7.1 Context effects in speech perception
    3.7.2 Context effects and the distributional learning of sound categories
  3.8 Conclusion
4 Experiment 3: Learning of context-dependent perception of novel sounds
  4.1 Introduction
  4.2 Methods
    4.2.1 Participants
    4.2.2 Exposure stimuli
    4.2.3 Test stimuli
    4.2.4 Design
    4.2.5 Procedure
  4.3 Results
    4.3.1 Cumulative link model
    4.3.2 Trials with different critical syllables
    4.3.3 Trials with same critical syllables
  4.4 Discussion
  4.5 Conclusion
5 General discussion
  5.1 Summary of findings
  5.2 The role of context in the learning of sound categories
  5.3 Some remaining questions about the context effects hypothesis
    5.3.1 Directionality
    5.3.2 Non-spectral information
  5.4 Future directions
  5.5 Final remarks
Bibliography
A Categorical perception of post-alveolar fricatives by native speakers of Mandarin
  A.1 Design
  A.2 Participants
  A.3 Procedure
  A.4 Results

List of Tables

Table 2.1 F2 transitions of Mandarin sibilants in [Ca] syllables (based on Table 2 in Chiu, 2009)
Table 2.2 Identification of Mandarin sibilant fricatives and the rating of their similarity to English fricatives by English speakers (based on Table 8 in Hao, 2012)
Table 2.3 Exposure stimuli (Experiment 1)
Table 2.4 Test stimuli (Experiment 1)
Table 3.1 Exposure stimuli in four conditions (Experiments 1 and 2)
Table 3.2 Test stimuli (Experiment 2: same as the ones used in Experiment 1)
Table 4.1 Stimuli for similarity rating task
Table 4.2 Helmert contrast coding for Context
Table 5.1 Allophones of English /t/

List of Figures

Figure 1.1 Continuum between allophony and phonemic contrast (based on Figure 1 in Hall, 2012)
Figure 1.2 Frequency distribution of VOT values in English (200 samples generated on the basis of the data in Allen and Miller, 1999)
Figure 1.3 Inference of categories from distributional shape
Figure 1.4 Category effect
Figure 1.5 Bimodal vs. unimodal distribution of [da]-[ta] stimuli (based on Figure 1 in Maye et al., 2002)
Figure 1.6 Non-complementary distribution vs. complementary distribution
Figure 2.1 Spectral measurements of [ʂ] and [ɕ]
Figure 2.2 Formant transitions of [ʂa] and [ɕa] (with 95% CI)
Figure 2.3 Spectra of resynthesized frication noise tokens
Figure 2.4 Formant transitions of resynthesized vowel tokens
Figure 2.5 Spectrograms of steps 1, 4, 7, and 10
Figure 2.6 Aggregate distribution of critical syllables (Experiment 1)
Figure 2.7 Distribution of 32 critical syllables in the non-complementary condition (Experiment 1)
Figure 2.8 Distribution of 32 critical syllables in the complementary condition (Experiment 1)
Figure 2.9 Mean d′ scores for distant pair trials with 2 SE (Experiment 1)
Figure 2.10 Mean d′ scores for close pair trials with 2 SE (Experiment 1)
Figure 3.1 Aggregate distribution of critical syllables (Experiment 2)
Figure 3.2 Distribution of 32 critical syllables in the complementary-unnatural condition (Experiment 2)
Figure 3.3 Mean d′ scores with 2 SEs for distant pair (Experiments 1 and 2)
Figure 3.4 Mean d′ scores with 2 SEs for close pair (Experiments 1 and 2)
Figure 3.5 Context effects in the categorization of an [ɪ]-[ʊ] continuum
Figure 3.6 Context effects in the categorization of a [da]-[ga] continuum
Figure 3.7 F2 transitions from context syllables to critical syllables
Figure 3.8 Complementary-natural condition
Figure 3.9 Complementary-unnatural condition
Figure 4.1 Aggregate distribution of critical syllables (Experiment 3)
Figure 4.2 Distribution of 32 critical syllables (Experiment 3)
Figure 4.3 Distribution of responses to trials with different critical syllables (1 = "very similar" and 7 = "very different")
Figure 4.4 Distribution of responses to trials with the same critical syllables (1 = "very similar" and 7 = "very different")
Figure 4.5 Complementary-natural condition
Figure 4.6 Complementary-unnatural condition
Figure 5.1 Probabilistic distribution
Figure A.1 Mean d′ scores (with 95% CI)
Figure A.2 Proportion of /ɕa/ responses (with 95% CI)

Acknowledgments

I would like to express my sincere gratitude to my supervisors, Prof. Carla L. Hudson Kam and Prof. Gunnar Ólafur Hansson, for the continuous support of my dissertation and related research, for their patience, motivation, and immense knowledge. Besides my supervisors, I would like to thank Prof. Molly Babel for her insightful comments and encouragement.

Chapter 1

Introduction

Sound systems are a basic building block of any human language. A major component of sound systems is the inventory of phonemes. Phonemes are the categories of sounds that are used to make lexical contrasts (e.g., Trubetzkoy, 1969; Twaddell, 1935). A great deal of research has investigated the acquisition of phonemes. Studies on infant speech perception have demonstrated that infants start categorizing speech sounds according to the inventory of phonemes in their target language during the first year of life even though they have a very limited amount of lexical knowledge (e.g., Kuhl et al., 1992; Werker and Tees, 1984). Studies have suggested that infants can induce phonetic categories from statistical information in input: specifically, the frequency distribution of the sounds in acoustic space (e.g., Maye et al., 2002). These phonetic categories include representations for individual sounds or segments.

[Footnote 1: Following Pierrehumbert (2003, p. 118), I use the term "segments" to refer to temporally discrete units of speech that are equivalent to IPA symbols.]

The acquisition of phonemes is more complex than the learning of segments (e.g., Werker and Curtin, 2005). One of the complexities in the acquisition of phonemes is the learning of allophony. In sound systems, every segment belongs to a phoneme, but a phoneme may comprise multiple segments. In other words, some segments are used as variants of a single phoneme. These variants are called allophones. In order to acquire phonemes, infants need to know whether given segments are separate phonemes or allophones of the same phoneme. There is a growing interest in the question of when and how infants learn allophony in their target language (see Seidl and Cristia, 2012, for a review). Studies have suggested that infants start learning allophony in their target language in the first year of life (e.g., Seidl et al., 2009). However, the mechanisms behind the learning of allophony are yet to be understood.

Whether two given segments are separate phonemes or allophones is partly determined by the distribution of the segments in particular environments. Specifically, when the segments occur in the same contexts, and thus their occurrences are unpredictable, they are likely to be separate phonemes.
By contrast, when the segments occur in mutually exclusive contexts, and thus their occurrences are predictable, they are likely to be allophones (e.g., Hall, 2009; Jones, 1950; Trubetzkoy, 1969). Therefore, researchers have argued that allophonic relationships between segments can be learned from the relative predictabilities of the segments in particular environments (e.g., Peperkamp et al., 2006a).

In this dissertation, I investigate the mechanisms behind the learning of allophony. Studies show that human learners are sensitive to the frequency distribution of sounds in acoustic space (e.g., Maye, 2000; Maye et al., 2002). Studies also show that human learners are sensitive to statistical dependencies between linguistic items presented in a sequence (e.g., Saffran et al., 1996a,b), and they probably use this ability to learn regularities in the distribution of segments across environments (i.e., phonotactic distributions) (e.g., Chambers et al., 2003; Onishi et al., 2002). Here, I hypothesize that human learners can learn allophonic relationships between sounds based on these two different kinds of distributional information. They can learn how to categorize sounds based on the frequency distribution of the sounds in acoustic space. But they can also learn the contextual distribution of the categories or segments at the same time and use their knowledge about the contextual distribution to treat the categories as separate phonemes or allophones. If the segments occur in mutually exclusive contexts, and their occurrences are predictable, they are likely to be treated like allophones. But if the segments occur in overlapping contexts, and their occurrences are not predictable, they are likely to be treated like separate phonemes.

With the data collected in a series of laboratory experiments presented in the following chapters, I will make the following claims: (1) adults can learn allophonic relationships between segments from the contextual distribution of the segments in input (Experiment 1), (2) the learning of allophony is constrained by the phonetic naturalness of the patterns of contextual distribution (Experiment 2), and (3) the learning of allophony involves the learning of context-dependent perception of relevant segments (Experiment 3).

1.1 Phonological relationships: Phonemic contrasts vs. allophony

In phonological analysis, sounds are classified into categories at both segmental and sub-segmental levels. At the segmental level, sounds are classified into segmental categories (or simply segments). Categorization at the segmental level assumes that continuous speech can be analyzed as a string of temporally discrete segment-sized units. However, such an assumption has been questioned by some researchers on both phonetic and phonological grounds (see Ladd, 2014, for a recent critical review on segment-based representations in phonology). Arguments have also been put forward against the idea that segments are the basic unit of speech processing (see Klatt, 1979; Port, 2010; Port and Leary, 2005, for arguments against the primacy of segment-sized units in speech processing). Despite these criticisms, the notion of segments is still widely used in phonetics and phonology, and segments are the basic building blocks of most theories of phonological relationships.

At the sub-segmental level, sounds are classified into sub-segmental categories like features.
While some theories assume that features are tied to segmental units (e.g., distinctive features: Chomsky and Halle, 1968; Clements, 1985; Jakobson et al., 1951), others assume that they are not (e.g., articulatory gestures: Browman and Goldstein, 1989, 1992). Features are sub-segmental in the sense that a segment comprises multiple features, but a categorization based on a feature may comprise multiple segments (i.e., a natural class).

Segments in sound systems can stand in different types of phonological relationships with each other. When differences between segments make lexical contrasts, the relationship between the segments is contrastive (i.e., the segments are separate phonemes). In other words, the substitution of one segment for another in a word can change the meaning of the word. This also means that the distribution of these segments is not conditioned by particular environments; these segments can occur in the same contexts (i.e., they are in non-complementary distribution) and their occurrences are not predictable from the contexts. In English, for example, substituting [u] for [i] in the word beat [bit] results in another word, boot [but], and this is concomitant with the fact that [u] and [i] can occur in the same contexts, after [b] and before [t]. Therefore, [u] and [i] are in a phonemically contrastive relationship; they are separate phonemes, /u/ and /i/.

[Footnote 2: In phonology, segments and allophones are transcribed within brackets ([ ]), and phonemes are transcribed within slashes (/ /).]

When differences between segments do not make lexical contrasts, the relationship between the segments is allophonic (i.e., the segments are allophones or variants of a single phoneme). The distribution of allophones is usually conditioned by particular environments; they occur in mutually exclusive contexts (i.e., they are in complementary distribution) and their occurrences are predictable from the contexts. In English, for example, so-called "light" [l] and "dark" (velarized) [ɫ] are in complementary distribution; light [l] occurs in syllable-initial position as in the word leaf [lif], and dark [ɫ] occurs in syllable-final position as in the word feel [fiɫ] (Ladefoged and Johnson, 2014, p. 73). Therefore, light [l] and dark [ɫ] are allophones of a single phoneme /l/ in English. In this case, substituting [ɫ] for [l] in syllable-initial position, or presenting [ɫ] in the inappropriate context, affects the processing of [ɫ] (e.g., slows down phoneme monitoring) but does not change the identity of [ɫ] as /l/ (e.g., Lin, 2011).

Sometimes, multiple segments are used interchangeably as surface realizations of a single phoneme: free variation (e.g., Trubetzkoy, 1969). For example, in English, [t] and [tʰ] can occur in the same context (e.g., word-final position), and the former can substitute for the latter in a word without changing the meaning of the word (e.g., both [kʰæt] and [kʰætʰ] are acceptable realizations of the word cat). Free variation is not usually considered to be a part of allophony, at least of the type discussed in this dissertation (Hall, 2009, p. 11).

Recent approaches view phonological relationships as a probabilistic phenomenon rather than a categorical dichotomy between contrastive and allophonic (Hall, 2009, 2012, 2013b; Peperkamp et al., 2006a).
Hall, for instance, claims that phonological relationships are determined on the basis of the relative predictabilities of the occurrences of two segments in particular environments; the more the occurrences of two segments are predictable, the more allophonic the relationships between the segments are. This probabilistic approach has a significant advantage over the traditional approach, particularly in the analysis of so-called intermediate phonological relationships (Hall, 2009, 2012, 2013b). In Japanese, for example, the alveolar sibilants [s] and [(d)z] and the alveolopalatal sibilants [ɕ] and [(d)ʑ], respectively, are in such an intermediate relationship. They are in complementary distribution in a subset of the lexicon (old Japanese words). While the alveolar sibilants occur before the vowels [a], [e], [o], and [ɯ], the alveolopalatal sibilants occur before the vowel [i]. But the complementarity is weakened in the other subsets of the lexicon (old loanwords from Chinese, recent loanwords, and onomatopoetic words); there the alveolopalatal sibilants also occur before the vowels [a], [e], [o], and [ɯ]. Under the traditional approach, the relationship between the alveolar sibilants and alveolopalatal sibilants is neither contrastive nor allophonic. Under the probabilistic approach, as shown in Figure 1.1, the relationship can be defined as falling anywhere between a perfect phonemic contrast and perfect allophony depending on how much the distribution of the alveolar sibilants and the distribution of the alveolopalatal sibilants overlap with each other (Hall, 2013a).

[Figure 1.1: Continuum between allophony and phonemic contrast (based on Figure 1 in Hall, 2012). The figure plots the distributions of two segments, A and B: a non-overlapping distribution corresponds to perfect allophony, and an overlapping distribution corresponds to a perfect phonemic contrast.]

1.2 The perceptual effect of phonological relationships

Whether segments are contrastive or allophonic significantly affects the way the segments are perceived. Since allophones are variants of a single phoneme category, listeners tend to be less sensitive to acoustic differences between allophones. For instance, listeners discriminate native allophones with less accuracy and longer latency than native phonemes (Beddor and Strange, 1982; Boomershine et al., 2008; Harnsberger, 2001; Peperkamp et al., 2003; Whalen et al., 1997). However, comparing phonemes and allophones in terms of their discriminability within a single language has a potential drawback. Since the same set of segments cannot be both phonemes and allophones in the same language, any such comparison must be made between different sets of segments. This makes the interpretation of any differences between phonemes and allophones difficult. This problem is overcome by comparing phonemes and allophones across languages (Boomershine et al., 2008; Johnson and Babel, 2010). One such study looked at the perception of the voiced alveolar stop [d] and alveolar tap [ɾ] by Spanish and English speakers. Crucially, these two segments are phonemes in Spanish but allophones in English (i.e., the tap [ɾ] occurs intervocalically and the voiced stop [d] occurs elsewhere) (Boomershine et al., 2008). The results of the study showed that English speakers perceived these segments as being more similar to each other than Spanish speakers did.
This clearly indicates that the same set of segments are perceived differently by speakers of different languages depending on whether the segments are contrastive or allophonic in their native languages.

[Footnote 3: Phonological relationships are gradient (see Section 1.1). Hall (2009) argues that listeners' sensitivity is also gradient; listeners are less sensitive to acoustic differences between segments that are more allophonic. For brevity, in what follows I will just say "allophones" when what is intended is "allophones of a single phoneme".]

Despite the perceptual effects of allophonic relationships that have been demonstrated, listeners are not completely insensitive to acoustic differences between native allophones. Listeners' sensitivity to allophonic variation is affected by task variables as well. For example, Pegg and Werker (1997) demonstrated that listeners show better sensitivity to allophonic variation in their native language when tested with a task that allows them to compare test stimuli at the level of auditory processing (e.g., the AX paradigm). In such a task, listeners' responses are based more on the acoustic properties of the stimuli than on the categorization of the stimuli (Fujisaki and Kawashima, 1969, 1970; Pisoni, 1973).

Studies have also demonstrated that context plays an important role in the perception of allophones (Peperkamp et al., 2003; Whalen et al., 1997). For example, listeners' sensitivity to allophonic variation depends on whether the allophones are presented in the appropriate contexts or not. Specifically, the discrimination of native allophones is harder when they are presented in appropriate contexts than when they are presented in inappropriate contexts (Peperkamp et al., 2003). Other studies have demonstrated that listeners use their knowledge about allophonic variation in various speech processing tasks (e.g., Church, 1987). For example, listeners use allophonic variation as a cue for syllable boundaries in speech segmentation (Christie Jr., 1974; Nakatani and Dukes, 1977). Christie Jr. (1974) demonstrated that as the amount of aspiration in [t] in [asta] increases, English speakers' judgments about the location of a syllable boundary change from [a.sta] to [as.ta]. This is because the aspirated [tʰ] and the unaspirated [t] are allophones; while the aspirated [tʰ] occurs in syllable-initial position, the unaspirated [t] occurs in a consonant cluster following [s], and the presence or absence of aspiration in [t] serves as a cue for the location of a syllable boundary.

In sum, phonological relationships significantly affect listeners' sensitivity to acoustic differences between segments. Specifically, listeners are sensitive to acoustic differences between segments that are contrastive in their native language, but they are less sensitive to acoustic differences between segments that are allophonic in their native language.

1.2.1 Reduced sensitivity to allophonic differences: an information theoretic account

Hall (2009) recently proposed an information theoretic account of the perceptual effect of phonological relationships. In information theory (Cover and Thomas, 2006; Shannon and Weaver, 1949; Shannon, 1951), the amount of information carried by a set of messages is measured in entropy, an index of how predictable the messages are. When the messages are more predictable, they are less informative and their entropy is lower. When the messages are less predictable, they are more informative and their entropy is higher.
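To make the link between predictability and entropy concrete, consider the uncertainty about which of two segments occurs in some environment e. The expression below is the standard two-outcome (binary) entropy formula from information theory, offered here only as an illustration; Hall's own measure is defined over pairs of segments in environments and may differ in its details:

    H(e) = −p log2(p) − (1 − p) log2(1 − p)

where p is the probability that one of the two segments (rather than the other) occurs in e. If the two segments are in perfect complementary distribution, p is 0 or 1 in every environment and H(e) = 0 bits: the choice of segment is fully predictable, the allophonic end of the continuum in Figure 1.1. If both segments are equally likely in the same environment, p = 0.5 and H(e) = 1 bit: the choice is maximally unpredictable, the contrastive end.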
In an information theoretic view of human cognition, the amount of information in stimuli, or the predictability of the stimuli, significantly affects efficiency in the processing of the stimuli. When the amount of information is larger, or the stimuli are less predictable, the processing requires more cognitive resources. By contrast, when the amount of information is smaller, or the stimuli are more predictable, the processing requires fewer cognitive resources (e.g., Hyman, 1953; Pierce, 1980; Wickens, 1981). With speech stimuli, studies have demonstrated that higher predictability of stimuli facilitates speech processing in a high-demand task (e.g., Moray and Taylor, 1958; Treisman, 1960, 1964, 1965). For example, in Treisman (1964), English speakers heard two passages simultaneously in a binaural recording and were asked to shadow one of the passages. Participants performed better in the shadowing task when the semantic predictability of the words in the passage was higher (i.e., adjacent words showed more English-like transitional probabilities).

In speech perception, researchers have argued that listeners modulate selective attention according to the predictability of stimuli (Astheimer and Sanders, 2009, 2011). For example, listeners have a tendency to attend to the sounds that occur in word onset position (e.g., Connine et al., 1993; Marslen-Wilson and Zwitserlood, 1989). According to Astheimer and Sanders (2009, 2011), this is because the sounds that occur at the onset of a word are less predictable than the same sounds that occur within a word (e.g., the transitional probability between adjacent syllables is lower across a word boundary than within a word: Saffran et al. 1996a). Astheimer and Sanders (2009, 2011) argue that this selective attention is beneficial for speech perception. Perceiving speech sounds requires the processing of rapidly changing acoustic information, and attending to the position in which the occurrences of sounds are unpredictable helps listeners to process the acoustic information in detail and to resolve uncertainty about the sounds. However, when the occurrences of sounds are predictable, listeners may not need to process the acoustic information in detail because some parts of the information may be predictable and redundant. In this way, listeners can minimize the amount of resources needed to process predictable sounds.

In a similar vein, Hall (2009) argues that listeners become less sensitive to allophonic variation because allophones are more predictable and less informative, and listeners allocate fewer resources to process allophones (i.e., they become less attentive to the acoustic properties of allophones). According to Hall,

    "for pairs of sounds (X and Y) for which there is a low degree of uncertainty (in context C), it is less crucial for mature language users to pay particular attention to acoustic and articulatory cues used to differentiate X and Y in C because these cues are redundant with the information provided by C" (Hall, 2009, p. 117).

1.3 Acquisition of phonological relationships

1.3.1 Early acquisition of phonemic contrasts

A great deal of research has investigated the early acquisition of phonemes by infants. Infants show signs of the categorical perception of speech sounds from an early age.
Categorical perception is a phenomenon in which observers' sensitivity to physical differences between stimuli is largely determined by the way the stimuli are categorized; they are sensitive to the differences between stimuli that are classified into two different categories but are less sensitive to the same degree of difference between stimuli that are classified into a single category (see Goldstone and Hendrickson, 2010; Harnad, 2005, for recent reviews). In speech perception, listeners are sensitive to acoustic differences between sounds that are classified into two different phonemes but are less sensitive to the same degree of acoustic difference between sounds that are classified into a single phoneme (Fry et al., 1962; Liberman et al., 1957, 1961, 1967).

[Footnote 4: Note, however, that categorization is not the sole factor that determines listeners' sensitivity. As mentioned in Section 1.2 above, task variables also determine listeners' sensitivity. For example, listeners show better sensitivity to within-phoneme acoustic variation in a task that allows them to compare stimuli at the level of auditory processing (e.g., an AX discrimination task with a short inter-stimulus interval (ISI)). Schouten et al. (2003) have argued that categorical perception is something that emerges from the interaction of listeners' linguistic knowledge and task variables.]

Eimas et al. (1971) demonstrated that 1- and 4-month-old English-learning infants perceived a bilabial stop voicing continuum (i.e., a Voice Onset Time (VOT) continuum) categorically; they discriminated a pair of stimuli with a 20 ms difference in VOT from a certain region of the continuum (+20 ms vs. +40 ms), but not pairs of stimuli with the same amount of difference in VOT from other regions of the continuum (−80 ms vs. −60 ms and 0 ms vs. +20 ms). Lasky et al. (1975) demonstrated that 4- to 6.5-month-old Spanish-learning infants also perceived a bilabial stop continuum categorically; they discriminated the stimuli with −60 and −20 ms VOT and the stimuli with +20 and +60 ms VOT, but not the stimuli with −20 and +20 ms VOT. Since the boundary between Spanish voicing categories falls between −20 and +20 ms, the results of Lasky et al. (1975) suggest that language experience has little effect on determining the 4- to 6.5-month-old Spanish-learning infants' sensitivity to the VOT differences, and yet these infants show different levels of sensitivity to the same degree of difference in VOT from different regions of the continuum. Some studies have suggested that what seems to be a category effect in the perception of a VOT continuum actually arises from natural sensitivities of the auditory system, not from listeners' knowledge about categories. For example, Pastore et al. (1988) demonstrated that the auditory system has different levels of sensitivity to different timing relationships between acoustic events.

Similarly, Eimas (1974) demonstrated that 3-month-old English-learning infants perceived a stop place continuum (i.e., a formant transitions continuum) categorically; they discriminated a pair of stimuli with differences in F2 and F3 transitions that straddled the boundary between alveolar [dæ] and velar [gæ], but did not discriminate a pair of stimuli with the same amount of differences in F2 and F3 transitions that did not straddle the boundary.
These studies suggest that infants are born with the ability to perceive speech sounds categorically.

[Footnote 5: Some studies have suggested that categorical perception is a property of the auditory system. For example, Cutting and Rosner (1974) demonstrated that adult listeners perceived non-speech audio stimuli categorically. In their study, they used a continuum of sawtooth waveforms that differed in the rise time from 0 ms to 80 ms. The stimuli with a rapid onset sounded like the plucking of a string instrument whereas the stimuli with a slow onset sounded like the bowing of the same string instrument. The results of the study showed that adult English speakers perceived the continuum categorically; their categorization of the stimuli as either "plucking" or "bowing" showed a non-linear categorization curve, and their performance in a discrimination task was determined by the categorization. Jusczyk et al. (1977) replicated Cutting and Rosner (1974)'s results with 2-month-old infants, showing that 2-month-old infants perceived the continuum categorically as well. Other studies have even suggested that categorical perception is not a unique property of the human auditory system, showing categorical perception of human speech sounds by non-human animals like rhesus monkeys (Waters and Wilson, 1976), Japanese macaques (Kuhl and Padden, 1983), and chinchillas (Kuhl and Miller, 1978).]

Infants initially show good sensitivity to acoustic differences between a wide range of speech sounds, including ones that are not different phonemes in the language of their environment (e.g., Streeter, 1976; Trehub, 1976; Werker and Tees, 1983, 1984). Trehub (1976) demonstrated that 1- and 4-month-old English-learning infants can discriminate the alveolar fricative [z] and retroflex fricative [ʐ] from Czech. Werker and Tees (1983, 1984) demonstrated that 6- to 8-month-old English-learning infants can discriminate various non-native phonemes, such as the dental stop [d̪] and retroflex stop [ɖ] from Hindi and the velar ejective stop [k'] and uvular ejective stop [q'] from Nɬeʔkepmxcín (Thompson River Salish).

Infants' sensitivity, however, changes over the first year of life according to the inventory of phonemes in their target language (perceptual reorganization: e.g., Werker and Tees 1984). They maintain or gain good sensitivity to acoustic differences between sounds that are different phonemes in their target language but become less sensitive to differences between sounds that are not. For example, Werker and Tees (1984) demonstrated that 6- to 8-month-old English-learning infants can discriminate Hindi dental [d̪] and retroflex [ɖ], but 10- to 12-month-old infants cannot. Similarly, Tsushima et al. (1994) demonstrated that 6- to 8-month-old Japanese-learning infants can discriminate English [l] and [ɹ], but 10- to 12-month-old infants cannot. These findings suggest that infants learn the sounds that are phonemes (specifically consonants) in their target language as separate categories between 6-to-8 months and 10-to-12 months of age. Once they learn the categories, their sensitivity to acoustic differences between speech sounds becomes more dependent on their knowledge about the categories.

Compared to consonants, infants seem to learn vowels a little earlier (e.g., Polka and Werker, 1994; Polka and Bohn, 1996). A series of studies by Kuhl and colleagues demonstrated that infants start showing perceptual magnet effects in the perception of vowels by 6 months of age (Grieser and Kuhl, 1989; Kuhl, 1991; Kuhl et al., 1992).
Perceptual magnet effects are a kind of prototype effect in speech perception and are considered to be indicative of listeners' knowledge about the internal structure of sound categories. When listeners compare two physically different sounds from the same category, the discrimination of these sounds is harder when one of the sounds is a category prototype and the other is a non-prototype than when both of them are non-prototypes. This is because prototype stimuli work as a perceptual magnet and perceptually assimilate non-prototypical stimuli (a phenomenon referred to as perceptual warping: Kuhl 1991; Kuhl and Iverson 1995), and this reduces the perceived distance between prototypical stimuli and non-prototypical stimuli.

Kuhl et al. (1992) tested 6-month-old English-learning infants and 6-month-old Swedish-learning infants on their perception of English and Swedish vowels. While both English and Swedish have the high front unrounded vowel [i] as a phoneme, Swedish also has the high front rounded vowel [y] as a phoneme. The results of the study demonstrated that while English-learning infants showed a stronger magnet effect with the English [i] prototype, Swedish-learning infants showed a stronger magnet effect with the Swedish [y] prototype. In other words, while English-learning infants performed worse in discriminating the English [i] prototype from its within-category non-prototypical variants than in discriminating the Swedish [y] prototype from its within-category non-prototypical variants, Swedish-learning infants performed worse in discriminating the Swedish [y] prototype from its within-category non-prototypical variants than in discriminating the English [i] prototype from its within-category non-prototypical variants. These findings suggest that infants learn the categories of vocalic sounds that are used as phonemes in their target language by 6 months of age.

[Footnote 6: Kuhl et al. (1992) tested the prototypicality of the vowels with adult speakers. Adult English speakers and Swedish speakers were presented with the English [i] prototype and the Swedish [y] prototype and were asked to decide whether the vowel is used in their language, to decide which category the vowel belongs to, and how well the vowel represents the category using a scale from "1" (poor) to "7" (good). Adult English speakers categorized the [i] prototype as a good exemplar of the English /i/ (with an average rating of 5.4), but they responded that the [y] prototype is not used in their language. Adult Swedish speakers categorized the [y] prototype as a good exemplar of the Swedish /y/ (with an average rating of 4.7). They responded that the [i] prototype is used in their language but is ambiguous with regard to the category; they categorized the [i] prototype as the Swedish /e/ with an average rating of 2.6 or the Swedish /i/ with an average rating of 1.8 (Kuhl et al., 1992, footnote 6). The finding that Swedish speakers did not categorize the English [i] prototype as a good exemplar of the Swedish /i/ suggests that the acoustic properties of the high front unrounded vowel /i/ are quite different between these two languages.]

Perceptual reorganization may help infants to acquire their target language. Specifically, attending to the sounds that are used to make lexical contrasts (i.e., phonemes) may facilitate the learning of words. For example, Kuhl et al. (2008) demonstrated that infants' sensitivity to acoustic differences between non-native phonemes at 7.5 months of age predicts the rate of their vocabulary growth in the next two years. Specifically, those who were more sensitive to differences between non-native phonemes at 7.5 months acquired fewer words by 24 months.

Perceptual reorganization, however, is not the acquisition of phonemes per se. Phonemes are the categories of sounds that are used to make lexical contrasts. Therefore, the acquisition of phonemes involves not only the learning of categories but also the learning of their contrastive function. Studies have suggested that, for infants at the stage of early vocabulary development, being sensitive to acoustic differences between sounds that are phonemes in their target language does not necessarily mean that they understand the contrastive function of the sounds.

For example, Stager and Werker (1997) reported that 14-month-old English-learning infants reliably discriminated two English phonemes, the bilabial stop /b/ and alveolar stop /d/, but they failed to learn a pair of novel words that differed from each other in one segment, where the differing segments were the phonemes /b/ and /d/ (a minimal pair), /bɪ/ and /dɪ/, in an audio-visual word learning task. Similarly, Thiessen (2007) reported that 15-month-old English-learning infants reliably discriminated two English phonemes, the voiced stop /d/ and voiceless stop /t/, but they failed to learn a minimal pair, /dɔ/ and /tɔ/, in an audio-visual word learning task. These studies suggest that despite the ability to discriminate segments that are distinct phonemes in their target language, the infants of this age group have not yet acquired the contrastive function of the segments that are phonemes in their target language.

[Footnote 7: Interestingly, Thiessen (2007, 2011b) found that 15-month-old English-learning infants successfully learned the lexical contrast between /dɔ/ and /tɔ/ when they were exposed to non-minimal pairs, the words that differed from each other in more than one segment (e.g., /dɔgo/ and /tɔbo/), as well as minimal pairs. Feldman and colleagues have argued that for infants of this age group the presentation of a phonemic contrast in a minimal pair or in overlapping lexical contexts makes the perception of the contrast harder while the presentation of a phonemic contrast in a non-minimal pair or in non-overlapping lexical contexts makes the perception of the contrast easier because the non-overlapping lexical contexts serve as a cue for the phonemic contrast. According to this view, the infants in Thiessen (2007, 2011b) learned the lexical contrast between /dɔ/ and /tɔ/ when they were exposed to non-minimal pairs as well because non-overlapping lexical contexts in those non-minimal pairs helped them to establish a more robust phonetic contrast between /t/ and /d/ (Feldman et al., 2011, 2013a,b). Alternatively, Rost and McMurray (2009, 2010) have argued that the infants in previous studies (e.g., Stager and Werker, 1997; Thiessen, 2007) failed to learn the novel minimal pairs because they failed to differentiate the critical phonemes due to the lack of within-category variability in the input (e.g., the infants in Thiessen's study were trained with a single exemplar of each category). Rost and McMurray demonstrated that 14-month-old English-learning infants learned a novel minimal pair, /buk/ and /puk/, in an audio-visual word learning task with input that showed a large amount of within-category variability (e.g., multiple tokens by multiple talkers: Rost and McMurray 2009, and multiple exemplars by a single talker: Rost and McMurray 2010).]

1.3.2 Early acquisition of allophonic variation

Compared to the large number of studies on the acquisition of phonemic contrasts by infants, there are relatively few studies on the acquisition of allophonic variation by infants. The existing studies have demonstrated that infants become less sensitive to acoustic differences between native allophones at the same time as they learn native phonemes (Dietrich et al., 2007; Hohne and Jusczyk, 1994; Seidl et al., 2009). For example, Seidl et al. (2009) compared French-learning infants and English-learning infants on their perception of vowel nasality. The difference between oral and nasal vowels is contrastive in French but allophonic in English (i.e., the nasalized vowels occur when followed by a nasal consonant). In their study, Seidl et al. exposed French-learning infants (11-month-old) and English-learning infants (4-month-old and 11-month-old) to an artificial language in which the type of the coda consonant in CVC syllables was determined by the nasality of the preceding vowel (e.g., the fricatives occurred after a nasal vowel and the stops occurred after an oral vowel). Their prediction was that if infants can discriminate the nasal and oral vowels, they should be able to learn the phonotactic patterns. The results of the study showed that the 4-month-old English-learning infants and the 11-month-old French-learning infants learned the phonotactic patterns but the 11-month-old English-learning infants did not, suggesting that the older English-learning infants had become less sensitive to the acoustic differences between the nasal and oral vowels.

[Footnote 8: It is also possible that the older English-learning infants had already learned the contextually-conditioned distribution of the nasal and oral vowels in English and their L1 phonological knowledge interfered with the learning of the artificial phonotactic patterns that involved an illegal configuration (i.e., the nasal vowels occurring before a non-nasal consonant) (e.g., Finn and Hudson Kam, 2008).]

Although infants' sensitivity to acoustic differences between native allophones starts declining towards the end of the first year of life, this does not necessarily mean that they become completely insensitive to allophonic variation. Jusczyk et al. (1999) demonstrated that 10.5-month-old English-learning infants are able to use their knowledge about allophonic variation to segment speech. In English, the voiceless aspirated stop [tʰ] and voiceless unreleased stop [t̚] are allophones of /t/. The former occurs in syllable onset position as in the first /t/ of the word nitrate [ˈnʌɪ.tʰɹeɪt̚], and the latter occurs in syllable coda position as in the first /t/ of the phrase night rate [ˈnʌɪt̚.ɹeɪt̚]. Jusczyk et al. reported that in a speech segmentation task 10.5-month-old infants reliably placed a syllable boundary before /t/ when the /t/ was aspirated but not when the /t/ was unreleased.

In sum, infants learn to categorize speech sounds according to the phoneme inventory of their target language by the end of the first year of life. They maintain or gain sensitivity to acoustic differences between segments that are distinct phonemes in their target language.
At the same time, infants become less sensitive to acoustic differences between allophones in their target language.

1.3.3 Late acquisition of non-native phonemic contrasts

As a consequence of perceptual reorganization and subsequent phonological acquisition, the perception of sounds from other languages that are not used as separate phonemes in the native language becomes significantly more difficult for adults (e.g., Best, 1995; Flege, 1995).

According to the Perceptual Assimilation Model (PAM), the degree of difficulty in the perception of the difference between non-native phonemes is determined by how the non-native phonemes are mapped onto native phoneme categories (Best et al., 1988, 2001; Best, 1995). Within the framework of the PAM, cross-language speech perception can be classified into five different types. In the first type, two non-native phonemes are mapped onto a single native phoneme, and the difference is hard to perceive (single-category assimilation, e.g., Hindi dental [d̪] and retroflex [ɖ] for English speakers: Tees and Werker 1984). In the second type, two non-native phonemes are mapped onto two different native phonemes, and the difference is easy to perceive (two-category assimilation, e.g., Zulu voiced lateral fricative [ɮ] and voiceless lateral fricative [ɬ] for English speakers: Best et al. 2001). In the third type, one of two non-native phonemes is perceived as a good exemplar of a native phoneme while the other is perceived as a poor exemplar of the same native phoneme, and the difference is relatively easy to perceive (category goodness assimilation, e.g., Zulu aspirated stop [kʰ] and ejective stop [k'] for English speakers: Best et al. 2001). In the fourth type, two non-native phonemes are mapped onto somewhere in between native phonemes, and the ease of perception of the difference depends on the proximity to native phonemes (uncategorized, e.g., English [l] and [ɹ] for Japanese speakers: Guion et al. 2000). In the fifth type, non-native phonemes are not mapped onto any native phonemes, and the difference is easy to perceive (non-assimilation, e.g., Zulu apical click [|] and palatal click [!] for English speakers: Best et al. 1988).

Despite the difficulty that adults experience in the perception of some non-native phonemes, studies have demonstrated that perceptual training can significantly improve their perception (Hirata et al., 2007; Iverson and Evans, 2007, 2009; Iverson et al., 2012; Kingston, 2003; Logan et al., 1991; Lively et al., 1993). For example, adult Japanese speakers have great difficulty in perceiving English liquids [l] and [ɹ] (Gillette, 1980; Goto, 1971; Miyawaki et al., 1975; Mochizuki, 1981). This is largely because of the absence of these English sounds in Japanese. Japanese has an alveolar tap [ɾ] that is similar to English [l] and [ɹ]. However, the relationship between Japanese [ɾ] and English [l] and [ɹ] is complicated. Takagi (1993) reported that Japanese speakers perceive Japanese [ɾ] as being more similar to English [l] than to English [ɹ]. Sekiyama and Tohkura (1993) reported that Japanese speakers perceive English [ɹ] as Japanese [ɾ], [ɰ], and [g]. Finally, Yamada and Tohkura (1992) reported that Japanese speakers perceive stimuli that are intermediate between English [l] and [ɹ] as [ɰ]. The other factor that significantly affects the perception of the difference between English [l] and [ɹ] by Japanese speakers is perceptual cue weighting. Iverson et al.
(2003) reported that while English speakers primarily attend to F3 to perceive the difference, Japanese speakers attend to F2. Since F3 is the dominant acoustic cue that differentiates English [l] from [ɹ], attending to the wrong cue makes perception of the difference difficult for Japanese speakers (Lotto et al., 2004).

Studies have demonstrated that Japanese speakers' perception of English [l] and [ɹ] can be significantly improved through perceptual training (Logan et al., 1991; Lively et al., 1993, 1994). In Logan et al. (1991), Japanese speakers were trained to identify [l] and [ɹ] produced by multiple talkers in various phonetic contexts. After three weeks of training, participants showed a significant improvement in the correct identification of [l] and [ɹ] in both trained and novel words. Lively et al. (1993) examined the effect of talker variability in training stimuli and demonstrated that learners who were trained with stimuli produced by multiple talkers generalized their learning to test stimuli produced by a novel talker, but those who were trained with stimuli produced by a single talker did not. Finally, Lively et al. (1994) demonstrated that the training had a long-lasting effect; Japanese speakers maintained the effect of the training over three to six months after the last training session. Studies have also demonstrated that Japanese speakers can shift their attention to F3 in perceiving [l] and [ɹ] through training (e.g., Ingvalson et al., 2012; Lim and Holt, 2011).

1.3.4 Late acquisition of non-native allophonic variation

The learning of non-native allophonic variation by adults has been much less studied, but there is some evidence showing that adults can learn allophonic variation in a second language (L2). Darcy et al. (2007, 2009) tested the recognition of French words by L2 French learners (native speakers of English). In French, stop consonants assimilate to the voicing of a following obstruent across word boundaries. For example, the word bott is realized as [bot] in isolation but as [bod] before a voiced obstruent. In other words, stop consonants have two variants, voiceless and voiced, depending on whether or not a voiced obstruent follows. As a consequence of this contextual variation, words like bott have two different pronunciations, and this complicates the recognition of the words. Native French speakers, however, know the contextual variation and have no problem in recognizing such words. For example, when they hear [bod] before a voiced obstruent, they can infer that the voicing of the final stop is the result of assimilation to the following voiced obstruent and that the underlying form is voiceless (a process referred to as compensation for assimilation). In this way, they can recognize the word bott from [bod] occurring before a voiced obstruent. In Darcy et al. (2007, 2009), both beginning and advanced French learners correctly recognized assimilated words, but advanced learners did so more consistently than beginning learners. These findings suggest that adults can learn L2 allophonic variation, but that this depends on the amount of experience with the target L2.

1.4 Learning mechanisms

1.4.1 Distributional learning of sound categories

As discussed in Section 1.3.1, infants learn how to categorize speech sounds according to the inventory of phonemes in their target language by the end of the first year of life.
Although it is unlikely that they have acquired phonemes with their functional values at this point, they seem to have acquired phonetic categories of sounds, such as segments. Researchers have argued that infants can learn phonetic categories from statistical information in input, specifically, the frequency distribution of sounds in acoustic space (distributional learning of sound categories: Kuhl, 1994; Lacerda, 1995, 1998; Maye, 2000; Pierrehumbert, 2003).

Figure 1.2: Frequency distribution of VOT values in English (200 samples generated on the basis of the data in Allen and Miller, 1999)

Naturally produced speech comes with an infinite amount of variability. However, the variability is not completely random. The sounds produced in a language are systematically distributed in acoustic space according to the kinds of sounds used in the language (Abramson and Lisker, 1964; Hillenbrand et al., 1995; Peterson and Barney, 1952). Figure 1.2, for instance, shows the frequency distribution of VOT values from word-initial stop consonants in English. In this figure, two separate frequency peaks are clearly visible, one at around 10-15 ms and the other at around 65-75 ms, and these two frequency peaks represent two stop voicing categories in English, the voiceless unaspirated stops (or short-lag) and the voiceless aspirated stops (or long-lag).

Figure 1.3: Inference of categories from distributional shape ((a) Distribution of sounds; (b) Learned categories)

Figure 1.4: Category effect

The distributional learning hypothesis assumes that human learners are sensitive to this kind of distributional information in input. They can keep track of the frequencies of sounds that occur in the input and learn the frequency distributions of the sounds in acoustic space. Once they learn the distributions, they can use the frequency peaks as categories or form categories as abstract representations based on the frequency peaks, and start using their knowledge about the categories to classify the sounds. For instance, Figure 1.3a shows the frequency distribution of sounds along a hypothetical acoustic continuum. The distribution has two frequency peaks (bimodal distribution). The distributional learning hypothesis predicts that when learners are exposed to this input, they should be able to learn two categories based on the number of frequency peaks in the input. Figure 1.3b shows the two categories that should be learned from this input.9

In the distributional learning hypothesis, it is assumed that the learning of categories has a significant impact on speech perception. Specifically, it affects learners' sensitivity to acoustic differences between sounds. Learners become more sensitive to acoustic differences between sounds that are classified into separate categories and/or less sensitive to the same degree of acoustic difference between sounds that are classified into the same category. Figure 1.4 shows that once learners acquire knowledge about the categories, they start treating two sounds that are classified into two categories as different sounds and two sounds that are classified into the same category as the same sound.
As a result, they become more sensitive to acoustic differences that straddle the boundary between two categories and/or less sensitive to the same degree of acoustic difference that occurs within the same category.

Experimental studies have demonstrated that infants are sensitive to this kind of distributional information (Cristia et al. 2011a, Maye et al. 2002, 2008, Yoshida et al. 2010, cf. Pons et al. 2006). For example, Maye et al. (2002) exposed two groups of English-learning infants (6-month-olds and 8-month-olds) to syllables taken from an 8-step continuum between the prevoiced [da] and voiceless unaspirated [ta] in which VOT was systematically manipulated. For one group, the frequency distribution of the eight syllables had two separate peaks (Bimodal group). For the other group, the frequency distribution had only one peak (Unimodal group) (Figure 1.5). After exposure, infants in the bimodal group discriminated the test stimuli taken from the end points of the continuum (i.e., the canonical tokens of [da] and [ta]), but infants in the unimodal group did not. These results suggest that while infants in the bimodal group learned to classify the syllables into two separate categories, infants in the unimodal group learned to classify the syllables into a single category.

9 This inductive learning process has been implemented in various computational models (De Boer and Kuhl, 2003; Guenther and Gjaja, 1996; Lin, 2005; McMurray et al., 2009; Vallabha et al., 2007).

Figure 1.5: Bimodal vs. Unimodal distribution of [da]-[ta] stimuli (Based on Figure 1 in Maye et al., 2002)

Experimental studies have demonstrated that adults are also sensitive to this kind of distributional information. These studies have tested the distributional learning of stop voicing categories (Maye, 2000; Maye and Gerken, 2000, 2001; Hayes-Harb, 2007), vowel categories (Gulian et al., 2007; Goudbeek et al., 2008), fricative place categories (Pajak, 2012; Pajak and Levy, 2012), consonantal duration categories (Pajak, 2012; Pajak and Levy, 2012), and tonal categories (Ong et al., 2015). Maye (2000), for example, exposed two groups of adult English speakers to syllables taken from 8-step stop voicing continua (e.g., between the prevoiced [da] and voiceless unaspirated [ta]) in which VOT, F1 transition, and F2 transition were systematically manipulated.10 For one group, the frequency distribution of the stop consonants showed a bimodal shape (Bimodal group). For the other group, the frequency distribution of the stop consonants showed a unimodal shape (Unimodal group). After exposure, participants in the bimodal group showed significantly better sensitivity to acoustic differences between the prevoiced and voiceless unaspirated stops, compared to participants in the unimodal group. These results suggest that while participants in the bimodal group learned to classify the stop consonants into two separate categories, participants in the unimodal group learned to classify the stop consonants into a single category.

10 Note that acoustic space is multi-dimensional. A phonetic contrast between two categories can be made by multiple acoustic cues, and the distributional learning of sound categories involves the integration of these acoustic cues (Toscano and McMurray, 2010). Most of the previous experimental studies on the distributional learning of sound categories have focused on the role of a single acoustic cue, but studies have suggested that different acoustic cues have different weights in terms of their contribution to the learning of sound categories (e.g., Cristia et al., 2011a).

These studies suggest that distributional learning is a learning mechanism used by both infants and adults to learn sound categories.
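The inferential step assumed by the distributional learning hypothesis can be made concrete with a small simulation. The sketch below is illustrative only and does not reproduce any model from the studies cited above; the token counts are invented, and a two-component Gaussian mixture (fit with scikit-learn) simply stands in for whatever category-formation mechanism the learner actually uses.

```python
# Illustrative sketch only (not a model from the studies cited above): infer
# "categories" from the shape of a frequency distribution along a hypothetical
# 8-step acoustic continuum by fitting a two-component Gaussian mixture.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
steps = np.arange(1, 9)

# Hypothetical token counts per step (invented for illustration).
bimodal_counts = np.array([1, 8, 4, 1, 1, 4, 8, 1])
unimodal_counts = np.array([1, 2, 4, 8, 8, 4, 2, 1])

def fitted_category_means(counts):
    """Expand per-step counts into tokens, add small phonetic noise, fit a GMM."""
    tokens = np.repeat(steps.astype(float), counts).reshape(-1, 1)
    tokens = tokens + rng.normal(0.0, 0.1, tokens.shape)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(tokens)
    return np.sort(gmm.means_.ravel())

print(fitted_category_means(bimodal_counts))   # two well-separated means
print(fitted_category_means(unimodal_counts))  # two heavily overlapping means
```

With the bimodal counts the two fitted means separate cleanly, whereas with the unimodal counts they land on heavily overlapping regions of the continuum; in this idealized sketch the outcome depends only on the shape of the input distribution.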
However, this does not mean that distributional learning is equally effective for learners from all age groups. As learners acquire more knowledge about their L1 phonetics and phonology, that knowledge significantly affects the effectiveness of the learning. For example, Yoshida et al. (2010) tested the distributional learning of the prevoiced [da] and voiceless unaspirated [ta] by 10-month-old English-learning infants. After exposure, 10-month-old infants showed different levels of sensitivity to acoustic differences between [da] and [ta] depending on the distributional information in their input, but compared to 6- and 8-month-old infants, 10-month-old infants required a longer exposure to show the same amount of learning. This is probably because 10-month-old infants have already acquired the English voicing categories and their L1 knowledge interferes with the learning of novel voicing categories.

Studies have also demonstrated the impact of adults' L1 phonological knowledge on the distributional learning of novel sound categories (Pajak, 2012; Pajak and Levy, 2012). Pajak and Levy (2012) tested the distributional learning of consonantal duration categories (short consonants vs. long consonants) by adult speakers of Korean and Mandarin. While the difference in segmental duration is phonemic for vowels in Korean, but not for consonants, the difference in segmental duration is not phonemic at all in Mandarin. Therefore, Pajak and Levy expected that Korean speakers would be more sensitive to the distribution of consonants along a segmental duration continuum because they already know that the difference in segmental duration is phonemic for vowels in their native language (i.e., their L1 phonological knowledge biases their perception such that it predisposes them to attend to segmental duration). The results of the study showed that Korean speakers were indeed more sensitive than Mandarin speakers to the difference between the bimodal and unimodal distributions of consonants along a segmental duration continuum. These results suggest that the effectiveness of distributional learning by adults is affected by their L1 phonological knowledge.

Another difference between infants and adults is found in the robustness of the learning, specifically the ability to generalize what they learn from input to novel stimuli. Maye and colleagues exposed infants to input that supported the learning of novel voicing categories for stop consonants from one place of articulation (e.g., the alveolar prevoiced [d] and alveolar voiceless unaspirated [t]). After exposure, the infants not only learned the voicing categories they were exposed to, they also generalized the categories to an unfamiliar place of articulation (e.g., the velar prevoiced [g] and velar voiceless unaspirated [k]) (Maye and Weiss, 2003; Maye et al., 2008). Maye and colleagues also tested adults' ability to generalize the same voicing categories from one place of articulation to another place of articulation, but the adults did not generalize (Maye, 2000; Maye and Gerken, 2001). Later, Pajak (2012) demonstrated that adults generalized what they learned from input to novel stimuli.
In her study, adults learned novel segmental duration categories forconsonants from one manner class (e.g., short sonorant [l] and long sonorant [l:])from the input and generalized the newly learned categories to consonants from adifferent manner class (e.g., short fricative [s] and long fricative [s:]).From these limited data, it seems that adults have less robust abilities to gen-eralize what they learn from input to novel stimuli. One possible explanation isa difference between infants and adults in their cognitive capacity; infants havea smaller cognitive capacity, and this restriction forces them to learn new cate-gories in a more feature-based manner (e.g., prevoiced vs. voiceless unaspirated),while adults have a larger cognitive capacity, and this enables them to learn newcategories in a more item-based manner (e.g., alveolar prevoiced stop vs. alveolarvoiceless unaspirated stop) (e.g., Newport, 1988, 1990). Whatever the source ofthis difference in generalization is, it remains true that both infants and adults areable to learn the categories of sounds from the frequency distribution of the sounds23in input.1.4.2 Distributional learning of allophonyThe limitation of the distributional learning hypothesis is that it explains the learn-ing of phonetic categories such as segments but does not explain the acquisitionof sound systems. In sound systems, some segments are used as allophones orcontext-dependent variants of a single phoneme. Studies on the development of in-fants’ speech perception suggest that infants are learning the segments used in theirtarget language and the allophonic relationships between some of these segments atthe same time (Dietrich et al., 2007; Hohne and Jusczyk, 1994; Seidl et al., 2009).Researchers have argued that allophony can be learned from statistical informationin input (Peperkamp et al., 2006a). The kind of information infants would have totrack to learn allophony is quite different from that required to learn segments. Be-cause some allophones occur in mutually exclusive contexts, their occurrences arepredictable from the contexts. Therefore, by learning the relative predictabilities ofsegments in particular environments, learners should be able to figure out whetherthe segments are allophones.Note, however, that in natural languages, segments that are in complementarydistribution are not always allophones. For example, in French, the bilabial ap-proximant [4] always occurs as the last consonant of an initial consonant cluster(e.g., pluie [pl4i]), while the mid-low front rounded vowel [œ] always occurs in aclosed syllable (e.g., peur [pœK]). This means that [4] and [œ] are in complemen-tary distribution; the former occurs before a vowel and the latter occurs before aconsonant. However, these two segments are not allophones in French (Peperkampet al., 2006a). Therefore, the learning of allophony from the statistical informationin input should be constrained in some way such that not just any segments that arein complementary distribution are learned as allophones.Peperkamp et al. (2006a) proposed two constraints on the learning of allo-phones. The first one requires that potential allophones are phonetically similar toeach other. The second one requires that the contextual distribution of potential al-lophones is phonetically natural in a sense that the allophones and their respectivecontexts are phonetically similar to each other (i.e., the distribution assumes assim-24ilatory patterns). 
In the above example, [4] and [œ] are not allophones in Frenchbecause they are phonetically too different to be allophones and/or the occurrencesof [4] before a vowel and [œ] before a consonant are not natural in the above sense.Studies have demonstrated that infants learn phonotactic regularities in theirtarget language at the same time as they learn the categories of speech sounds(Archer and Curtin, 2011; Jusczyk et al., 1993; Jusczyk and Luce, 1994; Nazziet al., 2009). For example, Jusczyk et al. (1993) demonstrated that 6- and 9-month-old English-learning infants show a sensitivity to the difference between phonotac-tically legal and illegal forms in English. Jusczyk and Luce (1994) further demon-strated that 6- and 9-month-old English-learning infants show a similar sensitivityto phonotactic probabilities in English; they prefer to listen to nonce words that arephonotactically more probable (e.g., [kæz]) over nonce words that are phonotacti-cally less probable (e.g., [guS]). Therefore, it is possible that infants can integratetheir knowledge about segments and their phonotactic distribution and figure outwhether the occurrences of the segments are predictable in particular environments.Artificial language learning experiments have demonstrated that infants havea robust ability to learn phonotactic regularities in input with a relatively smallamount of exposure (Chambers et al., 2003, 2011; Cristia and Seidl, 2008; Cristiaet al., 2011b; Saffran and Thiessen, 2003; Seidl and Buckley, 2005). For example,Saffran and Thiessen (2003) demonstrated that 9-month-old English-learning in-fants learned novel first-order phonotactic patterns—positional restrictions on theoccurrence of certain classes of segments (e.g., voiceless stops in syllable onsetposition and voiced stops in syllable coda position)—with a fairly small amount ofexposure (two minutes).If infants have such a robust ability to learn phonotactic regularities in input,the question arises as to whether they can use their knowledge about the phonotac-tic distribution of segments across environments to infer allophonic relationshipsbetween the segments. As far as I know, there is only one study that has testedthe learning of allophony by infants from the phonotactic distribution of segmentsin artificial input. White et al. (2008) exposed 8.5- and 12-month-old English-learning infants to input in which a pair of segments (e.g., [b] and [p]) occurredin overlapping contexts ([b]-initial words and [p]-initial words occurring after [na]and [rot]: e.g., na bevi, na pevi, rot bevi, rot pevi), but the other pair (e.g., [z] and25[s]) occurred in mutually exclusive contexts ([z]-initial words occurring after [na]and [s]-initial words occurring after [rot]: e.g., na zuma, rot suma). In other words,phonotactic regularities in the input implied that [b] and [p] are contrastive and [z]and [s] are allophonic. After exposure, infants were presented with the alternationof a [b]-initial word and its [p] initial counterpart (e.g., rot poli, na boli, rot poli,na boli, ...) and the alternation of a [z]-initial word and its [s]-initial counterpart(e.g., rot sadu, rot sadu, na zadu, rot sadu,...). White et al. predicted that if infantshad learned [b] and [p] as distinct phonemes and [z] and [s] as allophones, theyshould hear the [b]-[p] alternation differently from the [z]-[s] alternation. Specifi-cally, they should not hear the [z]-[s] alternation as alternation because they treat [z]and [s] as allophones. 
After exposure, both 8.5- and 12-month-old infants listenedlonger to the [z]-[s] alternation. White et al. interpreted the results as a noveltyeffect; since infants had been exposed to a wide variety of stimuli presented in arandom order during exposure, the presentation of alternating allophones (i.e., therepetition of the “same” sound) triggered a novelty effect. The results of the studysuggest that infants can learn to treat pairs of sounds differently depending on theirphonotactic distribution in input.Adults also have a robust ability to learn phonotactic regularities (Dell et al.,2000; Onishi et al., 2002; Chambers et al., 2010). For example, Onishi et al. (2002)demonstrated that adult English speakers learned first-order phonotactic patterns—positional restrictions on the occurrence of certain classes of segments (e.g., [b,k, m, t] in syllable onset position and [p, g, n, tS] in syllable coda position)—with a fairly small amount of exposure (120-130 items). Onishi et al. (2002) fur-ther demonstrated that adult English speakers learned more complex second-orderphonotactic patterns, where the occurrence of certain segments in certain positionswas conditioned by the types of neighbouring segments (e.g., [b] in syllable onsetposition and [p] in syllable coda position if the vowel nucleus is [æ], but [p] inonset position and [b] in coda position if the vowel nucleus is [I]).Peperkamp et al. (2003) tested the learning of allophony by adults from thephonotactic distribution of segments in artificial input. The input used in theirstudy contained two different kinds of distributional information: (1) a bimodalfrequency distribution of sounds that implied the categorization of the sounds intotwo segments, and (2) a phonotactic distribution of the segments that implied ei-26ther a phonemic contrast or allophony between the segments. They exposed twogroups of adult French speakers to input in which the frequency distribution im-plied the categorization of fricative sounds into the voiced uvular fricative [K] andvoiceless uvular fricative [X]. For one group, the phonotactic distribution was notconditioned by context; both [K] and [X] occurred before voiced and voiceless con-sonants. This implied a phonemic contrast between the segments. For the othergroup, the phonotactic distribution was conditioned by context; [K] occurred be-fore voiced consonants and [X] occurred before voiceless consonants. This impliedan allophonic relationship between the segments. All participants were tested onthe discrimination of [K] and [X] before and after exposure.Peperkamp et al. (2003) predicted that participants in the second group wouldlearn the target segments as allophones and thus become less sensitive to acousticdifferences between the segments. The results of the study, however, were not soclear. First, participants in the second group showed significantly better sensitivityto acoustic differences between [K] and [X] compared to participants in the firstgroup already in the pre-test. This makes the interpretation of any possible learn-ing effects difficult. Second, participants in both groups showed significant im-provement in sensitivity after exposure. Those who were in the first group showeda numerically larger improvement, but the difference between the groups was notstatistically significant. 
The interpretation of this possible learning effect, however, is complicated by the fact that [K] and [X] are allophones in French; the voiced [K] occurs before voiced consonants and the voiceless [X] occurs before voiceless consonants. Therefore, it is not entirely clear whether the difference between the groups was due to the learning of different phonological relationships, or simply the interference of participants' L1 phonological knowledge with the processing of the exposure stimuli. Participants in the second group could have learned [K] and [X] as allophones in the artificial language, but it is also possible that they processed [K] and [X] in the exposure stimuli as allophones from the beginning, since they occurred in contexts that conformed to the complementary distribution of these segments in French, and no learning happened. To date, there is no convincing evidence that adults can learn allophony from the phonotactic distribution of segments in input.

Figure 1.6: Non-complementary distribution vs. complementary distribution ((a) Overlapping distribution; (b) Complementary distribution)

1.5 Outline of dissertation

In this dissertation, I investigate the mechanisms behind the learning of allophony. Specifically, I test the learning of an allophonic relationship between two novel segments by adults. Adults have the ability to learn phonetic categories or segments from the frequency distribution of sounds in acoustic space. They also have the ability to learn the phonotactic regularities in the input. However, there does not yet exist any convincing evidence demonstrating that adults can learn phonological relationships between segments from the phonotactic distribution of the segments.

Figure 1.6a shows the frequency distribution of eight sounds taken from a hypothetical acoustic continuum. The bimodal shape of the distribution implies the categorization of the eight sounds into two separate segments (i.e., sounds 1-4 belong to one segment, and sounds 5-8 belong to the other segment). In this figure, all of the eight sounds occur in two different contexts (C1 and C2) with equal frequency. Therefore, the implied segments are also occurring in two different contexts with equal frequency. This means that the segments are occurring in overlapping contexts (i.e., they are in non-complementary distribution), and their occurrences are not predictable from the contexts. When learners are exposed to this kind of input, they should learn that the sounds are categorized into two segments and that the segments are potentially contrastive.11

Figure 1.6b shows the same bimodal distribution of the same eight sounds. In this figure, the first half of the eight sounds consistently occur in one context (i.e., sounds 1-4 occurring in context C2), and the second half consistently occur in the other context (i.e., sounds 5-8 occurring in context C1). This means that the implied segments are occurring in two mutually exclusive contexts (i.e., they are in complementary distribution), and their occurrences are predictable from the contexts. When learners are exposed to this kind of input, they should learn that the sounds are categorized into two segments but that the segments are allophonic, representing a single phoneme category.
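The two input types in Figure 1.6 can also be expressed as a small sketch. The snippet below is purely illustrative: the token counts and context labels are invented to mirror the figure, and the conditional probabilities are just one simple way of quantifying how predictable the implied segments are from their contexts; it is not a procedure used in the experiments reported later.

```python
# Illustrative sketch of the two input types in Figure 1.6: the same bimodal
# distribution of eight sounds, with contexts either shared by both implied
# segments (non-complementary) or assigned by segment (complementary).
from collections import Counter

steps = range(1, 9)
counts = [1, 8, 4, 1, 1, 4, 8, 1]            # invented bimodal token counts

def segment_of(step):
    return 'A' if step <= 4 else 'B'         # implied segment per step

def build_input(complementary):
    tokens = []
    for step, n in zip(steps, counts):
        for i in range(n):
            if complementary:
                context = 'C2' if segment_of(step) == 'A' else 'C1'
            else:
                context = 'C1' if i % 2 == 0 else 'C2'   # both contexts, every step
            tokens.append((context, segment_of(step)))
    return tokens

def p_segment_given_context(tokens):
    context_totals = Counter(c for c, _ in tokens)
    joint = Counter(tokens)
    return {(c, seg): round(joint[(c, seg)] / context_totals[c], 2)
            for (c, seg) in joint}

print(p_segment_given_context(build_input(complementary=False)))  # all near 0.5
print(p_segment_given_context(build_input(complementary=True)))   # all exactly 1.0
```

In the complementary input each implied segment is fully predictable from its context, whereas in the non-complementary input the context carries no information about which segment will occur.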
The difference between the learning of phonemic contrast and the learning of allophonic variation should be reflected in many aspects of learners' speech perception. In this dissertation, I focus on learners' sensitivity to acoustic differences between target segments because this is the measure that has been used to assess the learning of novel sound categories in most of the previous studies on the distributional learning of sound categories. If learners learn the target segments as allophones, they should show reduced sensitivity to acoustic differences between the target segments.

In Experiment 1, adult English speakers were exposed to input in which the frequency distribution of novel fricative sounds implied the categorization of the sounds into two segments, the retroflex fricative [ù] and alveolopalatal fricative [C], and the phonotactic distribution of the segments implied either a phonemic contrast or allophony between the segments. In one condition, participants were exposed to input in which the occurrences of the segments were not predictable from the contexts (non-complementary condition; cf. Figure 1.6a). In another condition, participants were exposed to input in which the occurrences of the segments were predictable from the contexts (complementary condition; cf. Figure 1.6b). The results of Experiment 1 suggest that participants in the complementary condition learned to treat the novel fricatives as something like allophones. This supports the hypothesis that adults can learn allophonic relationships between segments from the phonotactic distribution of the segments in input.

11 Since no semantic information is taken into consideration, it remains unknown whether the segments are learned as phonemes.

Experiment 2 examined whether the learning of allophony is constrained by the phonetic naturalness of the patterns of complementary distribution. Studies on artificial language learning have demonstrated that the learning of phonological patterns can be constrained or biased by the phonetic naturalness of the patterns; phonetically motivated patterns are more learnable than phonetically unmotivated ones (Carpenter, 2010; Schane et al., 1975; Wilson, 2003, 2006). Experiment 2 tested whether the phonetic naturalness of the complementary distribution of the retroflex [ù] and alveolopalatal [C] in input affects the learning of these fricatives as allophones. The results of Experiment 2, alongside the results of Experiment 1, indicate that adults can learn allophonic relationships between two segments only when the patterns of complementary distribution are phonetically natural.

In order to explain how phonetic naturalness affects the learning of allophony, I explore the role of perceptual biases and propose a hypothesis about the mechanisms behind the acquisition of reduced sensitivity to acoustic differences between allophones (the context effects hypothesis). The hypothesis is that the learning of allophony involves the context-dependent perception of sounds in the input. When listeners hear sounds presented in different contexts, they perceive the sounds differently.
Specifically, listeners perceive the instances of two different segments asbeing more similar to each other when they hear the sounds in phonetically naturalcomplementary contexts than in phonetically unnatural complementary contexts.Therefore, when learners are exposed to input in which the instances of two tar-get segments are occurring in phonetically natural complementary contexts, thecontext-dependent perception of the sounds affects the aggregate distribution ofthe sounds in auditory space such that frequency peaks are closer to each otherthan they actually are and the boundary between the categories is less clear. Thelearning of such an aggregate distribution leads to the learning of less distinct cat-egories, or it may even lead to the learning of a single category. Experiment 3tested whether adult learners’ perception of the retroflex [ù] and alveolopalatal [C]is affected by context in the way I assume. The results showed that learners’ per-ception became significantly more dependent on context after exposure; learnersperceived the instances of these two segments as being more similar to each otherwhen the sounds were presented in phonetically natural complementary contextsthan in phonetically unnatural complementary contexts.30In Chapter 2, I present the details of Experiment 1. In Chapter 3, I presentthe details of Experiment 2. The presentation of the results of Experiment 2 isfollowed by a discussion where I propose the the context effects hypothesis aboutthe learning of allophonic relationships. In Chapter 4, I present the details of Ex-periment 3. Finally, in Chapter 5, I will discuss the results of the experiments in alarger context.31Chapter 2Experiment 1: Distributionallearning of allophony2.1 IntroductionExperiment 1 tested whether adults can learn an allophonic relationship betweentwo novel segments. The experiment was designed as a modified version of previ-ous experiments on the distributional learning of sound categories by adults (e.g.,Maye, 2000; Maye and Gerken, 2000; Peperkamp et al., 2003). In this experiment,adult English speakers were exposed to input in which the frequency distribution ofnovel sounds showed a bimodal shape, implying that the sounds are classified intotwo segments. While the shape of the frequency distribution was kept the samein the input for all participants, the phonotactic distribution of the segments dif-fered between two experimental conditions. In the first condition, the segmentsoccurred in overlapping contexts and the occurrences of the segments were notpredictable from the contexts (non-complementary condition). In the second con-dition, the segments occurred in mutually exclusive contexts and the occurrencesof the segments were predictable from the contexts (complementary condition).I predicted that participants in the non-complementary condition would learn totreat the target segments as something like distinct phonemes and maintain or gainsensitivity to acoustic differences between the segments. By contrast, participantsin the complementary condition would learn to treat the target segments as some-32thing like allophones and become less sensitive to acoustic differences between thesegments.2.2 Target segmentsThe target segments used in this experiment were two post-alveolar voiceless frica-tives from Mandarin, the retroflex [ù] and alveolopalatal [C]. 
While English hasonly one class of post-alveolar fricatives, the palato-alveolar fricatives ([S], [Z]),Mandarin has two, and the phonetic contrast between these two classes of frica-tives in Mandarin is known to be difficult for English speakers to acquire (e.g.,Chao, 1948). In this section, some background information about these Mandarinsounds will be provided.2.2.1 Post-alveolar fricatives in MandarinMandarin has two classes of post-alveolar sibilants. The first class is often calledretroflex in the literature ([ù, úù, úùh]) (Chao, 1948; Duanmu, 2007). However, ar-ticulatory studies have demonstrated that these sounds are not really retroflex (Hu,2008; Ladefoged and Wu, 1984; Lee, 2008; Lee-Kim, 2014; Noguchi et al., 2015a;Proctor et al., 2012; Toda and Honda, 2003). For example, using X-ray imagesand palatograms, Ladefoged and Wu (1984) demonstrated that the articulation ofso-called retroflex sibilants in Mandarin does not involve the retroflexion of thetongue tip as is usually observed in the articulation of truly retroflex consonantsin other languages. Rather, their articulation involves the formation of a short andslack constriction channel in the post-alveolar region using the upper part of thetongue front and the formation of a large front cavity. Similar observations havebeen made in later studies using other imaging techniques such as MRI (Proc-tor et al., 2012; Toda and Honda, 2003) and ultrasound (Lee-Kim, 2014; Noguchiet al., 2015a). Due to the absence of retroflexion, researchers have used differentlabels for this class of Mandarin post-alveolar sibilants, such as flat post-alveolar(Ladefoged and Maddieson, 1996; Toda and Honda, 2003) and apical post-alveolar(Lee, 2008; Proctor et al., 2012). Here, I will use the traditional label retroflex forconvenience.The second class of post-alveolar sibilants is often called palatal ([C, tC, tCh])33(Chao, 1948; Duanmu, 2007). The articulation of these sibilants in Mandarin in-volves the formation of a long and narrow constriction channel over the post-alveolar and palatal regions using the tongue blade and the tongue body. The for-mation of the palatal constriction is achieved by the forward and upward move-ments of the tongue body and the advancement of the tongue root (Hu, 2008;Ladefoged and Wu, 1984; Lee, 2008; Lee-Kim, 2014; Proctor et al., 2012; Todaand Honda, 2003). Researchers have used different labels for this class of Man-darin post-alveolar sibilants, such as alveolopalatal (Ladefoged and Maddieson,1996) and anterodorsal post-alveolar (Lee, 2008). Here, I will use the label alve-olopalatal for convenience.1The phonetic contrasts between fricatives from different places of articula-tion are characterized by differences in a number of acoustic properties. Princi-pal among them are the spectral shape of frication noise and formant transitionsinto the following vowel (Gordon et al., 2002; Hughes and Halle, 1956; Jongmanet al., 2000; McMurray and Jongman, 2011; Wilde, 1993). For sibilant fricatives,the spectral shape of frication noise is largely determined by the size of the frontcavity; the smaller the front cavity is, the higher the centroid frequency is (Heinzand Stevens, 1961). 
In English, for example, while the alveolar fricatives ([s] and[z]) have a spectral peak around 7000 Hz, the palato-alveolar fricatives ([S] and [Z])have a spectral peak around 4000 Hz (Jongman et al., 2000).There are a number of studies on the acoustics of sibilant fricatives in Man-1The phonological relationships between alveolopalatal consonants and consonants from otherplaces of articulation have been a major issue in the literature on Mandarin phonology. Synchron-ically speaking, alveolopalatal consonants and some other consonants are in complementary dis-tribution; alveolopalatal consonants occur before high front vowels ([i, y]) or glides ([j, 4]) whiledental, retroflex, and velar consonants do not. Phonologists have proposed various hypotheses forthe phonological status of alveolopalatal consonants: (1) allophones of dental consonants (Duanmu,2007; Hartman, 1944), (2) allophones of velar consonants (Chao, 1948), or (3) underlying phonemes(Cheng, 1973). Recently, Lu (2011) examined how much the hypothesized phonological status ofalveolopalatal consonants affects Mandarin speakers’ perception. In her study, Lu compared Man-darin and Korean speakers’ ratings of the similarity between the dental [s] and alveolopalatal [C].Crucially, these two sounds are in complementary distribution in both languages, but they alternatewith each other in phonological processes only in Korean (i.e., these two sounds are more allophonicin Korean than in Mandarin in the sense that the allophony is supported by distribution and alterna-tion in Korean but only by distribution in Mandarin). The results of the study showed that Koreanspeakers rated the sounds as being more similar to each other than Mandarin speakers did, suggestingthat the allophonic relationship between [s] and [C] is not as well established in Mandarin as it is inKorean.34darin (Chang, 2013; Chiu, 2009; Lee, 2011; Li, 2008; Li et al., 2007; Svantesson,1986). For example, Lee (2011) reported that the dental/alveolar [s] has a centroidfrequency between 7000 and 11000 Hz, the retroflex [ù] has a centroid frequencybetween 3000 and 6000 Hz, and the alveolopalatal [C] has a centroid frequencybetween 5000 and 10000 Hz. These values reflect how these fricatives are pro-duced. Dental/alveolar fricatives are produced with an anterior constriction anda small front cavity. This is reflected in the concentration of spectral energy inthe higher frequency region. Retroflex fricatives, by contrast, are produced with aposterior constriction and a large front cavity. This is reflected in the concentra-tion of spectral energy in the lower frequency region. Moreover, a short and slackconstriction channel for the production of retroflex fricatives allows a higher de-gree of acoustic coupling between the front and back cavities and further increasesspectral prominence in the lower frequency region (Stevens et al., 2004). Alve-olopalatal fricatives are produced with a constriction channel that stretches fromthe post-alveolar to palatal regions, but the long and narrow constriction channelreduces the amount of the acoustic coupling and prevents the increase of spectralprominence in the lower frequency region. This is reflected in the concentration ofspectral energy in the intermediate frequency region.In a CV string, formant transitions between the consonant and the vowel sys-tematically change according to the place of the articulation of the consonant (e.g.,Sussman et al. 1991; Wilde 1993; cf., Fowler 1994). 
For sibilant fricatives, for example, studies on Polish fricatives, which are similar to the ones in Mandarin, have reported that F2 transitions systematically change according to the class of the sibilant fricatives (Nowak, 2006; Zygis and Padgett, 2010). Nowak (2006) measured F2 transitions at 25 ms and 75 ms of vowels following three sibilant fricatives in Polish, the dental [s”], retroflex [ù], and alveolopalatal [C]. Nowak reported that the vowel [a] showed a slight F2 fall after the dental [s”] and retroflex [ù] but a steep F2 fall after the alveolopalatal [C].

For Mandarin fricatives, Chiu (2009) measured F2 transitions from the onset to the midpoint of the vowel [a] after three sibilant fricatives in Taiwan Mandarin, the dental/alveolar [s], retroflex [ù], and alveolopalatal [C]. Chiu reported that F2 transitions after these three fricatives showed a falling contour, and the transition after the alveolopalatal [C] showed the steepest fall (see Table 2.1).

Table 2.1: F2 transitions of Mandarin sibilants in [Ca] syllables (based on Table 2 in Chiu, 2009)

    Class             IPA    Onset F2 − Mid F2 (Hz)
    Dental/alveolar   [s]     64.18
    Retroflex         [ù]    170.05
    Alveolopalatal    [C]    337

The differences in the spectral shape of frication noise and formant transitions are used as perceptual cues in the perception of fricatives (Delattre et al., 1962; Harris, 1954, 1958; Wagner et al., 2006; Whalen, 1981a,b, 1991), but the degree to which listeners rely on these cues varies depending on the class of fricatives and the inventory of fricatives in a given language. For example, Harris (1954) reported that English speakers rely more on frication noise in the perception of two sibilant fricatives ([s] and [S]) but on formant transitions in the perception of two non-sibilant fricatives ([f] and [T]). Wagner et al. (2006) argued that perceptual cue weighting is largely determined by the inventory of fricatives. For example, English speakers rely on formant transitions in the perception of [f] because English has another non-sibilant fricative, [T], which is spectrally similar to [f]. Dutch speakers, by contrast, rely on frication noise in the perception of [f] because Dutch does not have another non-sibilant fricative that is spectrally similar to [f]. Studies have suggested that the perceptual cue weighting is learned over the course of language acquisition. In English, for example, children rely more on formant transitions and less on frication noise than adults do (Nittrouer, 1992; Nittrouer and Studdert-Kennedy, 1987; Nittrouer and Miller, 1997a,b; Nittrouer, 2002).

For Mandarin sibilant fricatives, Chiu (2010) demonstrated that native speakers rely on different cues in perceiving different sibilant fricatives. In his study, Mandarin speakers were asked to identify three sibilant fricatives, the dental/alveolar [s], retroflex [ù], and alveolopalatal [C], presented before congruent and incongruent formant transitions in CV syllables. Stimuli were created by cross-splicing the tokens of frication noise before the tokens of the vowel [a] with different patterns of formant transitions. The stimuli with congruent formant transitions were created by cross-splicing a token of frication noise before a token of [a] originally produced after the same fricative sound (e.g., [s] + [(s)a]).
The stimuli with incongruent for-mant transitions were created by cross-splicing a token of frication noise before atoken of [a] originally produced after different fricative sounds (e.g., [s] + [(ù)a]).The results of the study showed that incongruent formant transitions affected theidentification of the fricatives in some cases. Specifically, the identification of thedental/alveolar [s] and retroflex [ù] was fairly accurate except when these fricativeswere presented before [(C)a]; then they were misidentified as the alveolopalatal [C].Moreover, the identification of the alveolopalatal [C] was less accurate when theywere presented before [(s)a] and [(ù)a]. From these results, Chiu concluded thatMandarin speakers primarily attend to frication noise in perceiving the dental/alve-olar [s] and retroflex [ù] but to formant transitions in perceiving the alveolopalatal[C].2.2.2 Post-alveolar fricatives in Mandarin and EnglishSince this experiment tests the learning of Mandarin post-alveolar fricatives bynative speakers of English, it is crucial to understand, first, how these Mandarinsounds are different from English post-alveolar fricatives, and second, how En-glish speakers perceive these Mandarin sounds. Toda and Honda (2003) made anarticulatory comparison between English and Mandarin sibilant fricatives. In theirMRI data, English palato-alveolar [S] overlaps with Mandarin retroflex [ù] and alve-olopalatal [C] in terms of the size of the front cavity and the average width of thepalatal constriction channel. Li et al. (2007) made an acoustic comparison betweenEnglish and Mandarin sibilant fricatives and reported that English palato-alveolar[S] falls in between Mandarin retroflex [ù] and alveolopalatal [C] in terms of am-plitude ratio (the difference in dB between the amplitude of the most prominentspectral peak and the amplitude of the second formant), an acoustic measure thatcorrelates with the degree of palatalization. In other words, English [S] is slightlymore palatalized than Mandarin [ù] but less palatalized than Mandarin [C]. 
These similarities between English [S] and Mandarin [ù] and [C] explain why the phonetic contrast between these two Mandarin sounds is particularly difficult for English speakers to acquire (e.g., Chao, 1948).

Table 2.2: Identification of Mandarin sibilant fricatives and the rating of their similarity to English fricatives by English speakers (based on Table 8 in Hao, 2012)

    Stimuli  Group        % identification (similarity score) across the response categories /s/, /z/, /S/, /Z/, /tS/
    [ù1]     Advanced     82 (5.84)   10 (6.29)
             Beginning    76 (5.98)   13 (5.78)
             No-exposure  72 (6.33)   14 (4.3)
    [ùu]     Advanced     85 (6.22)
             Beginning    78 (5.98)   14 (5.9)
             No-exposure  61 (5.82)   10 (5.14)   18 (5.31)
    [Ci]     Advanced     13 (2.89)   69 (3.38)   16 (5)
             Beginning    13 (3.89)   64 (4.76)   14 (4.6)
             No-exposure  33 (6.17)   14 (4.4)    33 (5.83)   11 (4)
    [Cy]     Advanced     69 (3.98)   13 (5.22)   12 (3.25)
             Beginning    65 (5.68)   25 (4.83)
             No-exposure  69 (6.1)    13 (5.22)   10 (6.29)

Hao (2012) tested the perception of Mandarin sibilant fricatives by three groups of English speakers: (1) advanced learners of Mandarin, (2) beginning learners of Mandarin, and (3) naive English speakers who had no previous exposure to Mandarin. Participants heard three sibilant fricatives in CV syllables, the dental/alveolar fricative (in the syllables [s1] and [su]), the retroflex fricative (in the syllables [ù1] and [ùu]), and the alveolopalatal fricative (in the syllables [Ci] and [Cy]), and were asked to label the fricative sounds using one of five phonetic symbols used to transcribe English phonemes, exemplified for them in English words (e.g., /s/ for son). Participants were also asked to rate the similarity between the fricative sounds in the stimuli and the English fricatives that they chose on a scale of “1” (less similar) to “7” (more similar).

Table 2.2 shows the results of the identification and similarity rating. Important findings from the results are that naive English speakers in the no-exposure group perceived Mandarin retroflex [ù] as English palato-alveolar /S/ most of the time, but their perception of Mandarin alveolopalatal [C] was dependent on vowel context. They perceived the alveolopalatal fricative in [Cy] as the palato-alveolar /S/ most of the time, but they perceived the alveolopalatal fricative in [Ci] either as the alveolar /s/ or the palato-alveolar /S/. These findings suggest that the perception of these Mandarin fricatives by naive English speakers can be characterized either as single category assimilation or category goodness assimilation in the framework of the Perceptual Assimilation Model (Best et al., 1988; Hao, 2012). Both the retroflex [ù] and the alveolopalatal [C] are mapped onto the palato-alveolar /S/ (single category assimilation), but depending on the vowel context, the alveolopalatal [C] is mapped onto either the alveolar /s/ or the palato-alveolar /S/, which means that overall the retroflex [ù] fits the category of the palato-alveolar /S/ better than the alveolopalatal [C] does (category goodness assimilation).

Hao (2012) also tested her participants’ discrimination using an AXB task. Participants compared the retroflex [ù] and alveolopalatal [C] in the vowel contexts in which these fricatives were identified as the palato-alveolar /S/ (comparing [ù1] vs. [Cy] and [ùu] vs. [Cy]). The results showed that naive English speakers performed well in the discrimination task (mean accuracies were higher than 75%). This goes against the hypothesis that the perception of these Mandarin fricatives by naive English speakers is characterized as single category assimilation.
However, Hao speculates that the good performance in the discrimination task is partly due to the fact that the stimuli differed not only in the fricative portion but also in the vocalic portion. Even though the vowel contrasts used in the test stimuli ([1] vs. [y] and [u] vs. [y]) were non-English, participants were still able to perceive acoustic differences between these vowels and use the differences to discriminate the test stimuli (Hao, 2012, p.103).

Taken together, the results of Hao's study suggest that, although English speakers can distinguish [ù] and [C], they may classify the sounds into one or two categories. These conclusions are complicated, however, by the fact that the discrimination stimuli differed in vowel as well as consonant, and the categorization task forced English speakers to sort the Mandarin sounds into English categories, instead of testing how many categories English listeners would naturally put the Mandarin sounds into.

Lee et al. (2012) tested the identification of Mandarin fricatives by native Mandarin speakers and English-speaking Mandarin learners using the hearing-in-noise test. In their study, participants were asked to identify Mandarin fricatives in CV syllables, the dental/alveolar [sa], retroflex [ùa], and alveolopalatal [Ca], presented with different levels of speech-shaped noise. The results of the study showed both commonalities and differences between Mandarin speakers and English speakers. Overall, Mandarin speakers performed better than English speakers. With the dental/alveolar [sa], while Mandarin speakers frequently misidentified it as the retroflex [ùa], English speakers were more likely to misidentify it as the alveolopalatal [Ca]. With the retroflex [ùa], both Mandarin and English speakers frequently misidentified it as the alveolopalatal [Ca]. With the alveolopalatal [Ca], both Mandarin and English speakers accurately identified it as the alveolopalatal [Ca] (mean accuracies were higher than 80% in all noise conditions). The results of the Mandarin speakers conform to the claim made in Chiu (2010). Mandarin speakers primarily attend to frication noise in the perception of the dental/alveolar [s] and retroflex [ù]. Therefore, the masking noise obscured the spectral shape of the frication noise and made the identification of these fricatives harder. Mandarin speakers primarily attend to formant transitions in the perception of the alveolopalatal [C]. Since the masking noise did not affect formant transitions as much as the spectral shape of the frication noise, the identification of the alveolopalatal [C] was less affected. The results of the English speakers are less clear. At least the finding that English speakers performed as well as Mandarin speakers in the identification of the alveolopalatal [Ca] in all noise conditions suggests that they are learning to attend to formant transitions in the perception of the alveolopalatal [C].

Some studies have suggested that English speakers' learning of non-English post-alveolar sibilant fricatives is more sensitive to information from formant transitions than to information from frication noise. For example, McGuire (2007a, 2008) tested the perceptual learning of the phonetic contrast between Polish retroflex [ù] and alveolopalatal [C], which are similar to the ones in Mandarin, by English speakers.
After a period of laboratory training, participants showed a significantimprovement in sensitivity to the phonetic contrast when they were trained withinput in which the contrast was cued by formant transitions but not when theywere trained with input in which the contrast was cued by the spectral shape of thefrication noise.In sum, Mandarin retroflex [ù] and alveolopalatal [C] overlap with Englishpalato-alveolar [S] in terms of both articulation and acoustics. Naive English speak-ers perceive the retroflex [ù] and alveolopalatal [C] as English palato-alveolar [S]40most of the time, but they also perceive the alveolopalatal [C] as English alveolar[s] in some vowel contexts.2.3 MethodExperiment 1 consisted of two sessions over two consecutive days. In each session,participants were first exposed to input, and then were tested on the discriminationof the retroflex [ù] and alveolopalatal [C]. The experiment was split into two ses-sions over two consecutive days because previous research has demonstrated thatmemory consolidation resulting from sleep facilitates the learning of speech soundcategories (e.g., Fenn et al., 2003). Participants were randomly assigned to one ofthree conditions, one control and two experimental. Participants in different con-ditions were exposed to different sets of exposure stimuli, but they did the samediscrimination test.2.3.1 ParticipantsSixty-two adult native speakers of English with no known speech or hearing prob-lems participated in the experiment. All participants completed two sessions overtwo consecutive days and were paid $20 for their participation. All reported En-glish to be their first and dominant language. Many of the participants were mul-tilingual, but none of them were familiar with any language containing two ormore post-alveolar fricatives as phonemes. Two participants were excluded fromthe analyses on the basis of their poor performance in a monitoring task duringexposure (details described below). This left 20 participants in each condition.2.3.2 Exposure stimuliExposure stimuli consisted of 256 bisyllabic strings. Each string comprised a con-text syllable followed by a critical syllable or a filler syllable. There were eightcontext syllables, [li], [lu], [mi], [mu], [pi], [pu], [gi], and [gu]. Context syllableswere classified into two groups according to the vowel quality, one with the highfront unrounded vowel [i] ([i] context) and the other with the high back roundedvowel [u] ([u] context). There were eight critical syllables drawn from a 10-stepcontinuum between the retroflex [ùa] and the alveolopalatal [Ca]. There were also41four filler syllables, [tha], [ta], [fa], and [ha].Recording and the acoustic analyses of target syllablesA male native speaker of Mandarin from Taiwan, who is also a trained phonetician,produced four repetitions of each one of eight context syllables ([li], [lu], [mi],[mu], [pi], [pu], [gi], and [gu]), two target syllables ([ùa] and [Ca]), and four fillersyllables ([tha], [ta], [fa], and [ha]). Recording was done in a soundproof booth withan external dynamic omnidirectional microphone (SHURE SM63LB) connectedto an iMac computer through a preamp (M-Audio Fast Track Pro). Recordingsampling rate was 44,100Hz. The speaker produced the context syllables in a singlesyllable form, but the target syllables and filler syllables in a bisyllabic string withthe preceding vowel [a] (e.g., [aùa]). 
All of the syllables were produced with tone 1 (high level tone).

I analyzed the acoustics of the target syllables looking at the spectral shape of the frication noise and the formant transitions into the following vowel. First, two spectral measurements, centroid frequency (i.e., center of gravity or CoG) and peak frequency, were measured from the midpoint of the frication noise. Second, the first three formants were measured at the 10 ms and 100 ms points of the following vowel. All of the acoustic measurements were made with Praat (Boersma and Weenink, 2001).

From the frication noise of each token of the target syllables, a slice of 24 ms around the midpoint was extracted using the Hamming window function. An FFT spectrum was generated from the slice. The FFT spectrum was smoothed using cepstral smoothing with a bandwidth of 500 Hz, and CoG was measured from the smoothed spectrum. The FFT spectrum was also converted into a Long Time Averaged Spectrum (LTAS) with a bandwidth of 125 Hz, and peak frequency was measured from the LTAS. Figure 2.1 shows the CoG and peak frequency of the retroflex [ù] and alveolopalatal [C]. Both CoG and peak frequency are higher for the alveolopalatal [C]. The values of CoG are within the ranges of centroid frequency that have been reported for Mandarin retroflex and alveolopalatal fricatives in previous studies (e.g., Lee, 2011).

Figure 2.1: Spectral measurements of [ù] and [C] ((a) Centre of gravity; (b) Spectral peak)

From the vowel of each token of the target syllables, the first three formants were measured on a psychoacoustic scale (ERB-rate: Moore and Glasberg 1983) at 10 ms and 100 ms after the onset of voicing. Figure 2.2 shows the trajectories of the first three formants from 10 ms to 100 ms of the vowel [a] after the retroflex [ù] and alveolopalatal [C]. The vowel after the alveolopalatal [C] has a lower onset F1 and a higher onset F2.

Figure 2.2: Formant transitions of [ùa] and [Ca] (with 95% CI)
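For readers who want a rough sense of what the centre-of-gravity measurement computes, the sketch below estimates a spectral centroid from a 24 ms Hamming-windowed slice using numpy and scipy. It is not the Praat procedure used for the analyses reported above (in particular, it omits the cepstral smoothing and the LTAS-based peak measure), and the file name is hypothetical.

```python
# Rough numpy/scipy sketch of a centre-of-gravity (CoG) measurement from a
# 24 ms Hamming-windowed slice at the midpoint of a frication noise token.
# Not the Praat procedure used in the dissertation (no cepstral smoothing).
import numpy as np
from scipy.io import wavfile

rate, signal = wavfile.read("fricative_token.wav")    # hypothetical mono file
signal = signal.astype(float)

midpoint = len(signal) // 2
half_window = int(0.012 * rate)                       # 24 ms slice in total
segment = signal[midpoint - half_window : midpoint + half_window]
segment = segment * np.hamming(len(segment))          # Hamming window

spectrum = np.abs(np.fft.rfft(segment)) ** 2          # power spectrum
freqs = np.fft.rfftfreq(len(segment), d=1.0 / rate)

cog = np.sum(freqs * spectrum) / np.sum(spectrum)     # amplitude-weighted mean
print(f"Centre of gravity: {cog:.0f} Hz")
```

The numbers from such a sketch would only approximate the values in Figure 2.1, which were taken from cepstrally smoothed spectra in Praat.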
Resynthesis of target syllables

A 10-step continuum between the retroflex [ùa] and alveolopalatal [Ca] was constructed by resynthesizing the original recordings of the target syllables. The resynthesis was done using a method similar to that employed by McGuire (2007a,b). First, the recordings of the target syllables were segmented into the frication noise and the vowel. Separate continua were constructed for the frication noise and the vowel. Then, the two continua were combined to create a continuum of syllables.

For the frication noise, tokens of [ù] and [C] were selected on the basis of the quality of the recordings. From each token, 200 ms around the midpoint of the frication noise were extracted using a parabolic windowing function to smooth the onset and the offset of the extracted frication noise. After the extraction, the mean intensity of the frication noise was adjusted to be 50 dB. After the duration and the intensity were equalized, the Pitch Synchronous Overlap Add (PSOLA) method in Praat was used to interpolate a stepwise transition (10 steps) from the token of [ù] to the token of [C] (Boersma and Weenink, 2001; Moulines and Charpentier, 1990). Figure 2.3 shows the LPC spectra of the frication noise at steps 1, 4, 7, and 10. As the steps move from the retroflex end (step 1) to the alveolopalatal end (step 10), the prominent energy peak at around 4000 Hz is reduced and energy in the higher frequency regions increases.

Figure 2.3: Spectra of resynthesized frication noise tokens

For the vowel, tokens of [a] after [ù] and [C] were selected on the basis of the quality of the recordings. From each token, the initial 200 ms were extracted using a method similar to the one used for the frication noise. First, 500 ms around the onset of voicing were extracted using a parabolic windowing function. Then, 200 ms after the onset of voicing were extracted. In this way, the intensity rise after the onset was retained to be the same as the original, but the intensity fall towards the offset was smoothed such that the two vowel tokens had similar envelope shapes towards the offset. After the extraction, the mean intensity of the vowel tokens was adjusted to be 63 dB.2 After the duration and the intensity were equalized, the Speech Transformation and Representation by Adaptive Interpolation of Weighted Spectrogram (STRAIGHT) method was used to interpolate a stepwise transition (10 steps) from the token of [a] after [ù] to the token of [a] after [C] (Kawahara et al., 1999). Figure 2.4 shows the first two formants measured at 10 ms and 100 ms of steps 1, 4, 7, and 10. As the steps move from the retroflex end (step 1) to the alveolopalatal end (step 10), the onset F1 becomes lower and the onset F2 becomes higher.

2 The vowel intensity was determined based on the mean of the ratios of the vowel intensity to the frication noise intensity across the two fricatives in the original recordings.

Figure 2.4: Formant transitions of resynthesized vowel tokens ((a) F1; (b) F2)

Figure 2.5: Spectrograms of steps 1, 4, 7, and 10

Finally, the frication noise continuum and the vowel continuum were combined to create a 10-step continuum from [ùa] to [Ca]. Figure 2.5 shows the spectrograms of steps 1, 4, 7, and 10 from the continuum. In the methods described above, the interpolations of steps between the retroflex [ùa] and alveolopalatal [Ca] were based purely on the acoustic properties of the end-point tokens. In order to determine the location of the perceptual boundary between [ùa] and [Ca], I tested the perception of the 10 syllables from the continuum by native speakers of Mandarin using an ABX discrimination task and an identification task. The results of the tests showed that the perceptual boundary is located at step 6 (Noguchi and Hudson Kam 2015a; also see Appendix A). For this experiment, eight critical syllables were selected from the continuum, four from the retroflex side of the perceptual boundary (steps 2, 3, 4, 5) and four from the alveolopalatal side of the perceptual boundary (steps 7, 8, 9, 10).
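The general idea of stepwise acoustic interpolation can be illustrated schematically. The sketch below builds a crude 10-step noise continuum by linearly interpolating the magnitude spectra of two source tokens; it is not the PSOLA or STRAIGHT resynthesis actually used for the stimuli, and the file names are hypothetical.

```python
# Illustrative only: a crude 10-step frication noise continuum made by linearly
# interpolating the magnitude spectra of two source tokens. The actual stimuli
# were resynthesized with PSOLA (frication noise) and STRAIGHT (vowels).
import numpy as np
from scipy.io import wavfile

rate_a, tok_a = wavfile.read("retroflex_noise.wav")        # hypothetical files
rate_b, tok_b = wavfile.read("alveolopalatal_noise.wav")
assert rate_a == rate_b

n = min(len(tok_a), len(tok_b))
spec_a = np.fft.rfft(tok_a[:n].astype(float))
spec_b = np.fft.rfft(tok_b[:n].astype(float))

for k in range(10):
    w = k / 9.0                                            # 0 = [ʂ]-end, 1 = [ɕ]-end
    magnitude = (1 - w) * np.abs(spec_a) + w * np.abs(spec_b)
    phase = np.angle(spec_a)                               # keep one token's phase
    step = np.fft.irfft(magnitude * np.exp(1j * phase), n)
    step = step / np.max(np.abs(step))                     # peak-normalize
    wavfile.write(f"noise_step{k + 1:02d}.wav", rate_a, (step * 32767).astype(np.int16))
```

A faithful reconstruction would first equalize duration and intensity and would treat the vocalic portion separately, as described above.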
Construction of exposure stimuli

Exposure stimuli were constructed by concatenating the context syllables and critical syllables or filler syllables. In all of the stimuli, a context syllable came first, followed by a critical syllable or a filler syllable, such that the context vowel immediately preceded the target consonant. Before the concatenation, the duration of context syllables and critical syllables was manipulated so that all exposure stimuli had the same prosodic structure. The duration of context syllables and filler syllables was changed to 400 ms using a method similar to the one described above; 400 ms around the onset of the vowel was extracted using the parabolic window function. The first half (200 ms) of the extracted recording contained the acoustic signal that corresponded to the consonants, and the second half (200 ms) contained the acoustic signal that corresponded to the vowels. By doing this, context syllables, critical syllables, and filler syllables had the same vowel duration (200 ms). Moreover, all exposure stimuli had the same intervocalic interval duration (200 ms). Furthermore, mean syllable intensity was adjusted to be 55 dB for context syllables and 60 dB for critical syllables and filler syllables.

The frequencies of critical syllables were manipulated so that their aggregate distribution shows a bimodal shape (see Figure 2.6). This implies that the first half of the critical syllables (steps 2 to 5) forms the retroflex category, and the second half (steps 7 to 10) forms the alveolopalatal category. These 16 tokens of critical syllables were combined with eight context syllables ([li], [lu], [mi], [mu], [pi], [pu], [gi], and [gu]) to generate 128 bisyllabic strings (critical stimuli). Similarly, 32 tokens of filler syllables (eight tokens each of [ta], [tha], [fa], and [ha]) were combined with the eight context syllables to generate 256 bisyllabic strings (filler stimuli). Participants in the experimental conditions heard the 128 critical stimuli and one half of the 256 filler stimuli (the ones with [ta] and [tha]). Participants in the control condition heard the 256 filler stimuli. In all conditions, the 256 stimuli were divided into two subsets according to the consonant of the context syllables, a subset with [l] and [p] and the other subset with [m] and [g]. The first subset was used in Session 1 and the second subset was used in Session 2.

Figure 2.6: Aggregate distribution of critical syllables (Experiment 1)
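A minimal sketch of how the critical exposure stimuli could be assembled from these ingredients is given below, including the context assignment used in the experimental conditions described in the next subsection. The per-step token counts are illustrative placeholders that merely produce a bimodal shape (they are not the exact frequencies used in the experiment), the condition names are shorthand, and plain Python with no external dependencies is assumed.

```python
# Illustrative sketch of exposure-stimulus construction, not the actual
# stimulus-generation script. Token counts per continuum step are placeholders
# chosen only to give a bimodal aggregate distribution over 16 tokens.
BIMODAL_COUNTS = {2: 1, 3: 3, 4: 3, 5: 1, 7: 1, 8: 3, 9: 3, 10: 1}
CONTEXTS = ["li", "lu", "mi", "mu", "pi", "pu", "gi", "gu"]
RETROFLEX_STEPS = {2, 3, 4, 5}        # retroflex side of the boundary
ALVEOLOPALATAL_STEPS = {7, 8, 9, 10}  # alveolopalatal side of the boundary

def critical_stimuli(condition):
    """Return (context syllable, continuum step) pairs for one condition."""
    stimuli = []
    for context in CONTEXTS:
        for step, count in BIMODAL_COUNTS.items():
            if condition == "complementary":
                # retroflex tokens only after [u], alveolopalatal only after [i]
                if context.endswith("u") and step not in RETROFLEX_STEPS:
                    continue
                if context.endswith("i") and step not in ALVEOLOPALATAL_STEPS:
                    continue
            # non-complementary: every step occurs in both [i] and [u] contexts
            stimuli.extend([(context, step)] * count)
    return stimuli

# e.g., len(critical_stimuli("non-complementary")) == 128 pairings.
# The actual experiment equated the total number of critical stimuli across
# conditions; the counts produced here are illustrative only.
```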
2.3.3 Conditions

The experiment had three conditions, two experimental and one control. In the two experimental conditions, the phonotactic distribution of critical syllables was manipulated. In the first experimental condition (non-complementary condition), all of the critical syllables occurred in overlapping contexts: after the high front unrounded vowel [i] and the high back rounded vowel [u]. Figure 2.7 shows the distribution of the 32 tokens of critical syllables, where the same number of tokens of each step occur in the [i] context and the [u] context. For example, one token of step 2 occurred in the [u] context and the other token of step 2 occurred in the [i] context, for a total of two tokens at this step. In this condition, the retroflex [ù] and alveolopalatal [C] were in a non-complementary distribution, and their occurrences were not predictable from the preceding contexts. In the second experimental condition (complementary condition), the critical syllables from the retroflex category occurred after the high back rounded vowel [u], and the critical syllables from the alveolopalatal category occurred after the high front unrounded vowel [i]. Figure 2.8 shows the distribution of the same 32 tokens of critical syllables, where the tokens of steps 2, 3, 4, and 5 occur in the [u] context and the tokens of steps 7, 8, 9, and 10 occur in the [i] context. In this condition, the retroflex [ù] and alveolopalatal [C] were in complementary distribution, and their occurrences were fully predictable from the preceding contexts.

Figure 2.7: Distribution of 32 critical syllables in the non-complementary condition (Experiment 1)

Figure 2.8: Distribution of 32 critical syllables in the complementary condition (Experiment 1)

The pattern of complementary distribution implemented in the input used in the complementary condition was defined on the basis of the relative similarities between the target segments and their respective contexts. Generally speaking, retroflex consonants share certain phonetic features with back rounded vowels (Flemming, 2003; Hamann, 2003, 2002), and palatal/palatalized consonants share certain phonetic features with high front vowels (Guion, 1996, 1998; Wilson, 2006). For example, the articulation of truly retroflex consonants involves the formation of a post-alveolar apical constriction: the lowering of the tongue front and the raising of the tongue tip towards the post-alveolar region. These gestures are facilitated by the retraction of the tongue body because it makes enough space for the lowering of the tongue front and the raising of the tongue tip. The retraction of the tongue body is also seen in the articulation of back vowels. The articulation of retroflex consonants also involves the formation of a large front cavity, which is reflected in the acoustic signal as the lowering of the onset F3 of the following vowel. This F3 lowering parallels an acoustic effect of lip rounding and/or lip protrusion that happens in the articulation of rounded vowels (Bhat, 1974; Dixit and Flege, 1991; Ladefoged and Bhaskararao, 1983). The connection between retroflex consonants and back vowels is phonologized in some languages. For example, in Kodagu (Dravidian), retroflex consonants occur after back vowels but not after front vowels (e.g., [uã] is attested but [iã] is not: Bhat 1973; Flemming 2003; Gnanadesikan 1994).

The articulation of palatal/palatalized consonants involves the upward and forward movements of the tongue body, which are also seen in the articulation of high front vowels. The connection between palatal/palatalized consonants and front vowels is phonologized in many languages as palatalization. Palatalization usually involves three processes: tongue raising, tongue fronting, and spirantization (Bhat, 1978). For example, in Slavic, the velar stop [k] is realized as the palato-alveolar affricate [tS] before front vowels ([i], [œ]), and in Japanese, the alveolar fricative [s] is realized as the alveolopalatal fricative [C] before the high front vowel [i] (Bateman, 2007; Bhat, 1978; Guion, 1998).

For post-alveolar fricatives in Mandarin, the connections between the retroflex [ù] and the high back rounded vowel [u] are not so obvious. Since the articulation of Mandarin retroflex fricatives does not involve the formation of a post-alveolar apical constriction, their articulation does not necessarily involve the retraction of the tongue body. The articulation of Mandarin retroflex fricatives does involve the formation of a large front cavity. However, for the stimuli used in this experiment, the tokens of the retroflex [ùa] and alveolopalatal [Ca] did not show any significant difference in terms of F3 transition (see Figure 2.2).3 By contrast, the connections between the alveolopalatal [C] and the high front unrounded vowel [i] are very strong.
The articulation of the alveolopalatal fricatives involves the formation of a long and narrow constriction channel over the post-alveolar and palatal regions. These articulatory gestures are achieved by moving the tongue body forward and upward, which are also seen in the articulation of the high front unrounded vowel [i]. For the stimuli used in this experiment, the presence of the palatal constriction is reflected in the low onset F1 and high onset F2 (see Figure 2.2). Overall, the occurrence of the retroflex [ù] next to the high back rounded vowel [u] and the alveolopalatal [C] next to the high front unrounded vowel [i] is phonetically more natural than the reverse.

3 One study has reported that the articulation of retroflex fricatives by some native speakers of Mandarin from Taiwan involves significant lip rounding (Chang, 2010). The speaker who produced the target syllables for this experiment is from Taiwan. However, it is not clear whether such a feature was present in his production of retroflex fricatives.

Here, it should be made clear that despite the relatively natural connections described above, the particular pattern of complementary distribution implemented in the complementary condition of the current experiment is, as far as I know, not attested in any natural languages. In natural languages, there is no case in which the occurrences of retroflex and alveolopalatal fricatives are conditioned by the preceding vowels. Cases like Kodagu where the occurrences of retroflex consonants are conditioned by the preceding vowels are usually limited to stops (Bhat, 1973; Flemming, 2003; Gnanadesikan, 1994). This is probably because in the VC transition stops are strongly coarticulated with the preceding vowel, but fricatives are not (Stevens and Blumstein, 1975). Therefore, the pattern of complementary distribution implemented here does not assume any sort of typological universality.

The third condition was the control condition. In this condition, participants heard the same number of exposure stimuli as in the two experimental conditions, but the stimuli did not contain critical syllables at all; instead, they contained four filler syllables ([ta], [tha], [fa], [ha]). The control condition was included to assess native English speakers' baseline sensitivity to acoustic differences between the retroflex [ù] and alveolopalatal [C] as well as how sensitivity might be affected by repeated testing and familiarity with the test stimuli. Table 2.3 summarizes the exposure stimuli used in all three conditions.

Condition | Context syllables | Critical syllables | Filler syllables
Non-complementary | li-, mi-, pi-, gi- | steps 2, 3, 4, 5, 7, 8, 9, 10 | -ta, -tha
Non-complementary | lu-, mu-, pu-, gu- | steps 2, 3, 4, 5, 7, 8, 9, 10 | -ta, -tha
Complementary | li-, mi-, pi-, gi- | steps 7, 8, 9, 10 | -ta, -tha
Complementary | lu-, mu-, pu-, gu- | steps 2, 3, 4, 5 | -ta, -tha
Control | li-, mi-, pi-, gi- | N/A | -ta, -tha, -fa, -ha
Control | lu-, mu-, pu-, gu- | N/A | -ta, -tha, -fa, -ha
Table 2.3: Exposure stimuli (Experiment 1)

2.3.4 AX discrimination test

Participants' sensitivity to acoustic differences between the retroflex [ù] and alveolopalatal [C] was tested using an AX discrimination paradigm. In each test trial, participants compared a token of [ùa] and a token of [Ca] (different trial) or two non-identical tokens of [ùa] or [Ca] (same trial). There were two types of different trials depending on the acoustic distance between the test stimuli, distant pairs and close pairs.
In the distant pair trials, participants compared two stimuli that are acoustically quite different from each other (step 2 and step 10 from the continuum). In the close pair trials, they compared two stimuli that are acoustically more similar to each other (step 4 and step 8 from the continuum). There were also two types of same trials. In the retroflex category trials, participants compared two non-identical tokens of the retroflex category (step 2 and step 4). In the alveolopalatal category trials, participants compared two non-identical tokens of the alveolopalatal category (step 8 and step 10). In order to make the focus of the study less obvious, filler trials were included, where participants compared two different filler syllables ([ta] vs. [tha] and [fa] vs. [ha]) or two non-identical tokens of a single filler syllable ([ta] vs. [ta], [tha] vs. [tha], [fa] vs. [fa], and [ha] vs. [ha]). Table 2.4 summarizes the stimuli used in the AX discrimination test.

Test trials | Different trials | Distant pair | step 2 vs. step 10
Test trials | Different trials | Close pair | step 4 vs. step 8
Test trials | Same trials | Retroflex | step 2 vs. step 4
Test trials | Same trials | Alveolopalatal | step 8 vs. step 10
Filler trials | Different trials | ta vs. tha, fa vs. ha
Filler trials | Same trials | ta vs. ta, tha vs. tha, fa vs. fa, ha vs. ha
Table 2.4: Test stimuli (Experiment 1)

Participants in all three conditions took the same AX discrimination test with the same test stimuli. Since participants in the control condition were not exposed to critical syllables at all, all of the items used in the test trials were novel stimuli for these participants. Similarly, since participants in the non-complementary and complementary conditions were exposed to only half of the filler syllables ([ta] and [tha]), the other half ([fa] and [ha]), which were used in the filler trials, were novel stimuli for these participants.

2.3.5 Design

E-Prime Professional (ver. 2.0) was used to control the presentation of stimuli and the recording of responses (Schneider et al., 2002). A session consisted of three phases: practice, exposure, and test. In the practice phase, participants did a practice AX discrimination test, in which they compared 18 pairs of English monosyllabic words. Half of the pairs contained two non-identical tokens of a single word (e.g., cap and cap), and the other half contained two different words (e.g., cap and gap). The purpose of including the practice trials was to familiarize participants with the AX discrimination task. Therefore, the structure of practice trials was identical to that of test trials (see below).

In the exposure phase, participants heard a block of 128 stimuli presented in a random order with a one-second interstimulus interval (ISI). They heard the block four times. Exposure stimuli were presented as "short sentences" so that each syllable in a stimulus could be treated as a word. In order to help participants stay attentive to the stimuli, a monitoring task was given to them. In each block of stimulus presentation, a monitoring stimulus (a filler stimulus with long vowels, e.g., [li:ta:]) was randomly inserted in every subblock of 16 presentations. Participants were asked to press the spacebar when they heard instances of "slow speech".

In the test phase, participants did an AX discrimination test. There were 64 test trials, 32 different trials and 32 same trials. Half of the 32 different trials were distant pair trials, and the other half were close pair trials. Half of the 32 same trials were retroflex trials, and the other half were alveolopalatal trials.
Similarly, there were 64 filler trials (32 different trials and 32 same trials). In each trial, the paired stimuli were separated by an ISI of 750 ms, which is long enough to let participants process the stimuli at a higher, non-auditory level (Pisoni, 1973; Werker and Logan, 1985). The same ISI has been used in previous studies on the distributional learning of sound categories by adults (e.g., Maye and Gerken, 2000; Pajak, 2012). Participants were given a maximum of five seconds to respond, and the trial was terminated whenever they recorded a response. The intertrial interval (ITI) was two seconds.

2.3.6 Procedure

The experiment consisted of two sessions over two consecutive days. It was conducted at the Language and Learning Lab at The University of British Columbia. On Day 1, participants came into the lab and signed the consent form. Then, Session 1 began. Participants were first given the following information about the experiment.

In this experiment, you will listen to someone speaking in a language that you never heard before. After listening to the speech for a while, you will be tested on what you learned from listening. During the test, you will hear the person saying two things in the language and you will decide whether they are two repetitions of the same word or two different words.

After the introduction, participants proceeded to the practice phase. Participants were given the following instruction at the beginning of the practice phase.

In each trial, you will hear someone saying two things in English. They are either two repetitions of the same word or two different words. If you think that they are two repetitions of the same word, press the "SAME" key. If you think that they are two different words, press the "DIFFERENT" key.

No feedback was provided during the practice phase. But after completing the practice trials, participants were allowed to ask for further instruction on the task if they needed it. Otherwise, participants proceeded to the exposure phase. Participants were given the following instructions at the beginning of the exposure phase.

You will listen to someone speaking in a language that you never heard before. You will be listening to a person saying short sentences in the language. In the recordings, each sentence consists of two words. Unlike English, words in this language are very short and consist of only one syllable. Therefore a sentence could be something like "wa ko". Sometimes words will be pronounced very slowly, like "waaa kooo". When you hear the slow speech, press the "SPACE" bar.

After completing the exposure phase, participants proceeded to the test phase. Participants were given the following instructions at the beginning of the test phase.

We will see what you learned about the new language. The test is just like the practice that you took at the beginning. But this time, you will be tested with the new language.

In each trial, you will hear the person saying two things in the language you just heard. If you think that they are two repetitions of the same word in this language, press the "SAME" key. If you think that they are two different words in this language, press the "DIFFERENT" key.

Some words in this language are very similar to each other. Listen carefully to the way they sound. Even if you are not sure about your answer, make a guess based on what you heard in the previous listening section, and make sure that you answer all of the trials.

On Day 2, participants came back to the lab and did Session 2.
Session 2 followed the same procedure as Session 1, except that participants filled out a language background questionnaire and received $20 after finishing the test phase.

2.4 Results

First, participants' performance on the monitoring task was checked to see whether they were attentive to the stimuli during exposure. Participants who detected fewer than 75% of the monitoring stimuli were excluded from further analyses. This affected two participants, one in the control condition and the other in the complementary condition (note that they were excluded prior to the end of the study and were replaced with two new participants so that the condition Ns were balanced). This left 60 participants (20 per condition) for analyses.

Responses to test trials in the AX discrimination test were converted into sensitivity or d-prime (d′) scores (Macmillan and Creelman, 2004). d′ is based on the difference between the likelihood of the correct detection of a signal, that is, the likelihood of answering "yes" in the presence of a signal (hit rate), and the likelihood of the incorrect detection of a signal, that is, the likelihood of answering "yes" in the absence of a signal (false alarm rate). A larger difference between these two likelihoods means better sensitivity (i.e., the likelihood of correct detection is much higher than the likelihood of incorrect detection). The actual computation of d′ takes the difference between the z-transformed hit rate and the z-transformed false alarm rate.

d′ = z(Hit rate) − z(False alarm rate)

d′ scores were computed for each participant for each pair (distant or close) in each session (1 or 2). "Hit" was defined as the correct detection of a change in signal when participants heard two stimuli from two different categories (i.e., answering different in different trials), and "false alarm" was the incorrect detection of a change in signal when participants heard two stimuli from a single category (i.e., answering different in same trials). In order to compute hit rates and false alarm rates, different trials were coupled with same trials that shared the same first syllable (the A of an AX pair). For example, the different trial in which stimuli were presented in the order step 2, step 10 was coupled with the same trial in which stimuli were presented in the order step 2, step 4. In other words, a hit was the detection of an across-category change, and a false alarm was the detection of a within-category change. Following standard practice, when the hit rate or false alarm rate was 0, it was replaced by 1/(2N), and when the hit rate was 1, it was replaced by 1 − 1/(2N) (N = number of trials for a particular trial type) (Macmillan and Creelman, 2004, p. 8). All of the following statistical analyses were done in R 3.0.3 (R Core Team, 2014).
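A minimal sketch of this d′ computation, including the standard replacement of rates of 0 and 1, is given below. It is not the analysis script used for the study (which was written in R); it is an illustrative Python version, and the counts in the usage example are hypothetical.

```python
from statistics import NormalDist

def d_prime(hits, n_different, false_alarms, n_same):
    """Compute d' from counts of 'different' responses.

    hits: 'different' responses on different (across-category) trials
    false_alarms: 'different' responses on the paired same (within-category) trials
    Rates of 0 and 1 are replaced by 1/(2N) and 1 - 1/(2N), following
    Macmillan and Creelman (2004).
    """
    def adjusted_rate(count, n):
        rate = count / n
        if rate == 0.0:
            rate = 1.0 / (2 * n)
        elif rate == 1.0:
            rate = 1.0 - 1.0 / (2 * n)
        return rate

    z = NormalDist().inv_cdf   # z-transform: inverse of the standard normal CDF
    return (z(adjusted_rate(hits, n_different))
            - z(adjusted_rate(false_alarms, n_same)))

# Hypothetical example: 15/16 hits on distant-pair trials and 4/16 false alarms
# on the paired same trials.
# print(d_prime(15, 16, 4, 16))
```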
Figure 2.9 shows the mean d′ scores for distant pair trials by session and condition. Figure 2.10 shows the mean d′ scores for close pair trials by session and condition. A repeated-measures ANOVA was conducted on d′ scores with condition as a between-participant variable and pair and session as within-participant variables. The significance level was set at p < 0.05. There were significant main effects of condition [F(2,57) = 4.2, p = 0.019], session [F(1,57) = 14.153, p < 0.001], and pair [F(1,57) = 235.576, p < 0.001]. There was no significant two-way interaction between condition and pair [F(2,57) = 1.386, p = 0.258], between condition and session [F(1,57) = 1.177, p = 0.838], nor between pair and session [F(1,57) = 3.325, p = 0.074]. There was no three-way interaction [F(1,57) = 1.642, p = 0.203].

Figure 2.9: Mean d′ scores for distant pair trials with 2 SE (Experiment 1)

With condition, post-hoc pairwise comparison with the Holm adjustment method indicated that d′ scores of the complementary condition (M = 1.07, SD = 1.06) were significantly lower than those of the control condition (M = 1.62, SD = 0.86) (p < 0.001) and the non-complementary condition (M = 1.64, SD = 0.95) (p < 0.001), but there was no significant difference between the control and non-complementary conditions (p = 0.905). With pair, post-hoc pairwise comparison indicated that d′ scores for the distant pair trials (M = 1.91, SD = 0.97) were significantly higher than those for the close pair trials (M = 0.97, SD = 0.78) (p < 0.001). With session, post-hoc comparison indicated that d′ scores for Session 2 (M = 1.61, SD = 0.99) were significantly higher than those for Session 1 (M = 1.28, SD = 0.98) (p < 0.001).

Figure 2.10: Mean d′ scores for close pair trials with 2 SE (Experiment 1)

Participants in the control and non-complementary conditions showed about the same level of sensitivity to acoustic differences between [ùa] and [Ca] after exposure. If the performance of participants in the control condition reflects the baseline sensitivity, or English speakers' pre-existing sensitivity, to the acoustic differences, this finding suggests that no significant learning happened in the non-complementary condition. Exposure to the input in which [ù] and [C] occurred in overlapping contexts did not significantly affect learners' pre-existing sensitivity to acoustic differences between [ùa] and [Ca]. By contrast, participants in the complementary condition showed significantly lower sensitivity compared to those in the control and non-complementary conditions after exposure. This suggests that exposure to the input in which [ù] and [C] occurred in mutually exclusive contexts significantly reduced the learners' pre-existing sensitivity.

Participants across all three conditions showed a significant improvement in sensitivity from Session 1 to Session 2. This is probably due to familiarization with the test stimuli. Since the same set of test stimuli were used in both sessions, it is possible that participants in all three conditions became more sensitive to acoustic differences between the test stimuli. If that is the case, the data from Session 1 should provide a clearer picture of the effects of the input. A follow-up analysis was conducted with just the data from Session 1. A repeated-measures ANOVA was conducted on d′ scores with condition as a between-participants variable and pair as a within-participant variable.
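Before turning to the results of that follow-up analysis, here is a sketch of the shape of such a mixed-design analysis (one between-participants factor, one within-participants factor). The dissertation's analyses were run in R; this is an illustrative Python version that assumes the pingouin package, a long-format data frame, and hypothetical file and column names.

```python
import pandas as pd
import pingouin as pg

# Hypothetical long-format data: one row per participant x pair type,
# with Session 1 d' scores.
# Columns: participant, condition (control / non-comp / comp), pair (distant / close), dprime
df = pd.read_csv("session1_dprime.csv")

# Mixed-design ANOVA: condition varies between participants, pair within participants.
aov = pg.mixed_anova(data=df, dv="dprime", within="pair",
                     subject="participant", between="condition")
print(aov)

# Holm-adjusted post-hoc pairwise comparisons of the between-participants factor
# could then be run (e.g., with pingouin's pairwise-test utilities).
```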
There were significant main effects of condition [F(2,57) = 4.793, p = 0.012] and pair [F(1,57) = 165.82, p < 0.001]. There was no significant interaction between condition and pair [F(2,57) = 2.58, p = 0.085]. Post-hoc pairwise comparison indicated that d′ scores of the complementary condition (M = 0.87, SD = 1.02) were significantly lower than those of the control condition (M = 1.48, SD = 0.86) (p = 0.011) and the non-complementary condition (M = 1.5, SD = 0.92) (p = 0.011), but there was no significant difference between the control and the non-complementary conditions (p = 0.937). Post-hoc pairwise comparison indicated that d′ scores for the distant pair trials (M = 1.8, SD = 0.96) were significantly higher than those for the close pair trials (M = 0.77, SD = 0.68) (p < 0.001).

In sum, participants in the control and non-complementary conditions showed the same level of sensitivity to acoustic differences between [ùa] and [Ca] after exposure, but participants in the complementary condition showed a significantly lower level of sensitivity after exposure.

2.5 Discussion

The results of Experiment 1 showed that a difference in the phonotactic distribution of target segments in input significantly affected learners' sensitivity to acoustic differences between the segments. Participants in the two experimental conditions were exposed to input in which the frequency distribution of novel sounds showed a bimodal shape. This implied the categorization of the sounds into two segments. In the non-complementary condition, these two segments occurred in overlapping contexts, and thus their occurrences were not predictable from the contexts. In the complementary condition, these two segments occurred in mutually exclusive contexts (i.e., they were in complementary distribution), and thus their occurrences were predictable from the contexts. The finding that participants in the complementary condition showed reduced sensitivity after exposure suggests that these participants learned to treat the target segments as something like allophones.

These results are consistent with the hypothesis that adults can learn allophonic relationships between two segments from the complementary distribution of the segments in input. However, the finding that participants in the control and non-complementary conditions did not show any significant difference may be surprising. Participants in the control condition did not get any exposure to critical syllables, while participants in the non-complementary condition got exposure to the bimodal distribution of critical syllables, which was supposed to be a robust cue for the learning of [ù] and [C] as separate categories. Therefore, one may wonder why exposure to the bimodal distribution did not help them to improve their sensitivity.

To this point I have been implicitly assuming a model in which all sensitivities are learned. However, we know that this is not the case. Following Aslin and Pisoni (1980), Maye (2000) discusses three possible hypotheses about how exposure to input affects learners' sensitivity in a distributional learning paradigm. The first is the maintenance hypothesis. In this hypothesis, learners have good pre-existing sensitivity to acoustic differences between two target segments. Therefore, exposure to a bimodal distribution does not affect their sensitivity, but exposure to a unimodal distribution will decrease their sensitivity. The second is the facilitation hypothesis. In this hypothesis, learners have poor pre-existing sensitivity to acoustic differences between two target segments.
Therefore, exposure to a bimodal distribution will improve their sensitivity, but exposure to a unimodal distribution does not affect their sensitivity. The third is the underspecification hypothesis. In this hypothesis, learners' sensitivity to the acoustic differences between two target segments is initially underspecified. Therefore, exposure to a bimodal distribution will improve their sensitivity, and exposure to a unimodal distribution will decrease their sensitivity.

In her own study on the distributional learning of stop voicing categories by adult English speakers, Maye (2000) compared three conditions: control, unimodal, and bimodal. In the control condition, participants did not get any exposure at all. In the unimodal condition, participants were exposed to a unimodal distribution that implied the classification of stop consonants into a single voicing category (between prevoiced and voiceless unaspirated). In the bimodal condition, participants were exposed to a bimodal distribution that implied the classification of stop consonants into two voicing categories (prevoiced and voiceless unaspirated). After exposure, all participants were tested on the discrimination of the prevoiced and voiceless unaspirated stops. The results showed that participants in the bimodal condition performed significantly better than participants in the unimodal condition. The results also showed that participants in the control condition performed better than the participants in the unimodal condition but worse than the participants in the bimodal condition, even though the differences were not statistically significant. From these results, Maye concluded that the distributional learning happened in both directions, supporting the underspecification hypothesis; while exposure to the unimodal distribution decreased learners' sensitivity, exposure to the bimodal distribution improved their sensitivity (Maye, 2000, pp. 103-105).

Subsequent studies, however, have shown more varied results. Hayes-Harb (2007) tested the distributional learning of stop voicing categories by adult English speakers, using the same stimuli used in Maye (2000). She compared several exposure conditions including ones that were equivalent to the control, unimodal, and bimodal conditions of Maye (2000). Interestingly, the results of Hayes-Harb's study showed that after exposure, while participants in the control and bimodal conditions performed at the same level in the discrimination task, participants in the unimodal condition performed significantly worse than those in the control and bimodal conditions, suggesting that learning happened only for participants in the unimodal condition.

A similar finding was reported in Pajak (2012). Pajak tested the distributional learning of consonantal duration categories (short consonants vs. long consonants) by adult English speakers. In her study, participants were exposed to the bimodal distribution of consonants along a segmental duration continuum (bimodal condition) or the unimodal distribution of consonants along the same segmental duration continuum (unimodal condition). After exposure, participants in the bimodal condition performed at chance level in the discrimination task, but participants in the unimodal condition showed a significant bias towards the "same" response.
According to Pajak, while participants in the bimodal condition did not learn anything, participants in the unimodal condition did learn to classify test stimuli into a single category.

Compared to the results of these studies, the results of Experiment 1 are not unusual. While exposure to the input in the non-complementary condition did not affect participants' sensitivity, exposure to the input in the complementary condition led to a reduction in participants' sensitivity. This is analogous to the maintenance hypothesis in Maye (2000), and suggests that participants in the complementary condition learned to treat the retroflex [ù] and alveolopalatal [C] as variants of a single category.

Part of the maintenance hypothesis is good pre-existing sensitivity, which is something we see in the data from the present experiment. Participants in the control condition performed fairly well in the discrimination task. They performed particularly well in distant pair trials (step 2 vs. step 10). The mean accuracy of their responses in these trials was 75.4% (SD = 27) across the two sessions. A possible factor that contributed to the good sensitivity of participants in the control condition is the fact that the retroflex [ù] and alveolopalatal [C] were completely new to them. In other words, not having experienced these sounds resulted in good sensitivity.

This may sound quite counterintuitive, but studies on L2 phonological acquisition have demonstrated that in some cases having less knowledge of L2 phonology allows learners to be more sensitive to acoustic differences between some L2 phonemes. Diehm (1998), for example, tested the perception of the different degrees of palatalization in Russian by Russian speakers and English-speaking Russian learners. Russian has four different degrees of palatalization: non-palatalized (CV), simple palatalization (CjV), palatalized yod (CjjV), and palatalized i-yod (CjijV). Surprisingly, Diehm found that English speakers did better than Russian speakers with the identification of the palatalized i-yod (CjijV); Russian speakers correctly identified CjijV only 26% of the time and misidentified it as CjjV 73% of the time, while English speakers correctly identified CjijV 78% of the time and misidentified it as CjjV only 15% of the time. According to Diehm, CjijV has extremely low functional load in Russian, and CjijV and CjjV are in a situation of near-merger. Therefore, for Russian speakers, having the functional knowledge about CjijV and CjjV made them less sensitive to the acoustic information that differentiates CjijV from CjjV. By contrast, English speakers do not have the functional knowledge Russian speakers do, so their perception is not biased by the linguistic knowledge; their perception is based more on the acoustic properties of CjijV and CjjV. In this way, having less experience with an L2 sometimes enables learners to attend to the acoustic properties of the sounds in the L2 better. The learning examined in Diehm (1998) is very different from the learning examined in this experiment, but it is still possible that, for participants in the control condition, not having experienced the retroflex [ù] and alveolopalatal [C] at all could enable them to attend to the acoustic properties of these sounds better in the test trials.
Second, when English speakers are exposed to input in which these two segments occur in overlapping contexts, and thus the occurrences of the segments are not predictable from the contexts, they maintain the pre-existing sensitivity. Third, when English speakers are exposed to input in which these two segments occur in mutually exclusive contexts, and thus the occurrences of the segments are predictable from the contexts, they become less sensitive to the acoustic differences. The third finding is particularly important since it suggests that the segments in complementary distribution are learned as something like allophones. Experiment 1 provided the first experimental support for the hypothesis that adults can learn allophonic relationships between segments from the complementary distribution of the segments in input.

Chapter 3

Experiment 2: Phonetic naturalness and the learning of allophony

3.1 Introduction

Experiment 1 tested whether adults can learn allophonic relationships between segments from the complementary distribution of the segments in input. Learners in two experimental conditions were exposed to the same bimodal distribution of novel sounds that implied the classification of the sounds into two segmental categories, the retroflex [ù] and alveolopalatal [C]. The crucial difference between these two conditions was in the phonotactic distribution of the target segments. In one condition, the segments occurred in overlapping contexts and their occurrences were not predictable from the contexts (non-complementary condition). In the other condition, the segments occurred in mutually exclusive contexts such that their occurrences were predictable from the contexts (complementary condition). The results showed that learners in the non-complementary condition maintained their pre-existing sensitivity to acoustic differences between [ùa] and [Ca], but learners in the complementary condition showed reduced sensitivity after exposure. These results suggest that learners in the complementary condition learned to treat [ù] and [C] as something like allophones.

In this chapter, I address the question of how robust this learning is. I specifically ask whether the learning of allophony is constrained by the naturalness of the patterns of complementary distribution. This question relates to an issue of major importance in current research on language learning: the way in which the inductive learning of linguistic patterns is subject to constraints. Human learners have the ability to inductively learn regularities in input (e.g., statistical learning: Aslin et al. 1998, Saffran et al. 1996a, Saffran et al. 1996b). However, it has been demonstrated that inductive learning is constrained or biased such that some patterns are more learnable than others (e.g., Moreton and Pater, 2012a,b; Newport and Aslin, 2004; Saffran, 2002; Thiessen, 2011a). If allophony is inductively learned from the complementary distribution of segments in input, it is important to understand whether the learning is constrained and how.

3.2 Constraints on statistical learning

Studies have demonstrated that both infants and adults are able to segment continuous speech into sub-strings based on statistical dependencies between adjacent syllables; they extract sub-strings that have higher between-syllable transitional probabilities (e.g., Aslin et al., 1998; Saffran et al., 1996a,b). Studies have also demonstrated that statistical learning is domain-general.
Human learners can segment continuous non-speech tone sequences into sub-strings based on statistical dependencies between adjacent tones (Saffran et al., 1999), and they can also learn visual patterns based on statistical dependencies between visual objects in a scene (Fiser and Aslin, 2002; Kirkham et al., 2002).

Despite the prevalence of studies showing the robust effects of statistical learning, it has been recognized that statistical learning is constrained by perceptual factors. For example, Newport and Aslin (2004) reported that adults failed to segment continuous speech into sub-strings based on transitional probabilities between non-adjacent syllables but successfully did segmentation based on transitional probabilities between non-adjacent segments of the same class (i.e., between non-adjacent consonants or non-adjacent vowels). The failure to do segmentation based on transitional probabilities between non-adjacent syllables suggests that distance between the objects over which statistical dependencies are learned affects the ease of learning (cf. Gómez, 2002). The success of doing segmentation based on transitional probabilities between non-adjacent segments of the same class suggests that similarity between the objects over which statistical dependencies are learned also affects the ease of learning (through Gestalt principles of similarity, according to Newport and Aslin 2004). The similarity constraint has been reported in a study on the segmentation of non-speech tone sequences as well (Creel et al., 2004). Creel et al. reported that adults performed better in the segmentation of non-speech tone sequences based on transitional probabilities between non-adjacent tones when the tones were similar to each other (e.g., from the same pitch range).

3.3 Constraints on the learning of phonology

There is a growing number of studies identifying constraints on the learning of phonology. In artificial language learning experiments, it has been demonstrated that the inductive learning of phonological patterns is constrained such that some patterns are more learnable than others. In this section, I will provide a brief review of factors that have been considered to affect phonological pattern learning. Previous studies have investigated constraints on phonological pattern learning by both infants and adults, and these studies suggest that the learning is constrained in similar manners with both infants and adults.1

1 Cristia et al. (2011c) reported some developmental changes in the effects of constraints on the learning of phonotactic patterns by infants. Specifically, the learning by 4-month-old infants is less constrained compared to the learning by 7.5-month-old infants.

Some studies have suggested that the phonetic naturalness of patterns constrains the learning (Carpenter, 2010; Gerken and Bollt, 2008; Schane et al., 1975). Phonetically natural patterns are more learnable than unnatural ones. Here, natural patterns are those that can be explained with reference to aspects of speech production and/or speech perception (e.g., Blevins, 2008). For example, Schane et al. (1975) exposed adult English speakers to input in which word-final consonants were deleted in certain environments. In one condition, word-final consonants were deleted before consonant-initial words (e.g., amuf + paSi → amupaSi). In another condition, word-final consonants were deleted before vowel-initial words (e.g., amuf + oga → amuoga). The first pattern is more natural than the second
one because it involves consonant cluster reduction, which is grounded in phonetic effects such as the gestural overlap between consonants in a cluster (e.g., Byrd and Tan, 1996). After exposure, learners in the first condition showed better learning performance than learners in the second condition. Carpenter (2010) exposed adult English speakers to input in which the distribution of stress was conditioned by the height of vowels. In one condition, low vowels were stressed, and high vowels were unstressed. In another condition, high vowels were stressed, and low vowels were unstressed. In natural languages, stress patterns can be sensitive to the sonority of vowels; in such cases, stress preferentially targets vowels with higher sonority (e.g., De Lacy, 2004). Since low vowels have higher sonority, the first stress pattern is more natural than the second one. After exposure, learners in the first condition showed better learning performance than learners in the second condition.

The claim that phonetic naturalness affects the learning of phonological patterns does not imply that unnatural patterns are impossible to learn. For example, Pycha et al. (2003) compared the learning of vowel harmony and vowel disharmony by adult English speakers. According to Pycha et al., vowel harmony is more natural than vowel disharmony because the former has phonetic grounding in phenomena such as vowel-to-vowel coarticulation. However, the results of the study showed that adults could learn vowel harmony and vowel disharmony equally well. Seidl and Buckley (2005) tested the learning of phonotactic patterns by 9-month-old English-learning infants. In their study, a group of infants were exposed to input in which sibilants occurred in intervocalic position and stops occurred elsewhere. Another group of infants were exposed to input in which stops occurred in intervocalic position and sibilants occurred elsewhere. The first pattern is more natural than the second one because it assumes the results of spirantization in intervocalic position, which can be explained in terms of phonetic factors such as articulatory effort (e.g., Kirchner, 1998). After exposure, infants in both groups showed learning equally well.

Studies have also demonstrated that the formal complexity of patterns constrains the learning. Here, the complexity of patterns is determined by the number of phonological features and the number of operations that are required to describe the patterns (e.g., Moreton and Pater, 2012a). Specifically, patterns that apply to sets of sounds that are defined by a small number of phonological features (i.e., natural classes) are simple (or systematic) compared to the ones that apply to sets of random sounds. Studies have demonstrated that simple patterns are more learnable than complex ones (Endress and Mehler, 2010; Kuo, 2009; Peperkamp et al., 2006b; Pycha et al., 2003; Saffran and Thiessen, 2003; Skoruppa et al., 2011; Wilson, 2003). For example, Saffran and Thiessen (2003) exposed 9-month-old English-learning infants to input in which the distribution of consonants was conditioned by syllable position. A group of infants were exposed to input in which voiceless stops ([p, t, k]) occurred in syllable onset position and voiced stops ([b, d, g]) occurred in syllable coda position.
Another group of infants were exposed to input in which a set of arbitrary consonants ([p, d, k]) occurred in syllable onset position and another set of arbitrary consonants ([b, t, g]) occurred in syllable coda position. After exposure, infants in the first group learned the phonotactic patterns, but infants in the second group did not. Peperkamp et al. (2006b) compared the learning of systematic and arbitrary phonological alternations by adult French speakers. A group of learners were exposed to input in which consonants from the same place and manner class (e.g., homorganic stops) alternated between voiced and voiceless; the consonants were voiced in intervocalic position (e.g., [nEl pemuS]∼[Ka bemuS]). Another group of learners were exposed to input in which the alternation involved place and manner of articulation in addition to voicing (e.g., [nEl pemuS]∼[Ka ZemuS]). After exposure, learners in the first group learned the alternation, but learners in the second group did not.

Despite the robust effects of complexity, complex patterns are not impossible to learn. Some studies have demonstrated that both infants and adults can learn phonotactic patterns that apply to sets of arbitrary sounds (Chambers et al., 2003; Kuo, 2009; Onishi et al., 2002). For example, Chambers et al. (2003) demonstrated that 16.5-month-old English-learning infants learned phonotactic patterns in which [b, k, m, t] occurred in syllable onset position and [p, g, n, t, S] occurred in syllable coda position.

In some of these studies, however, it is not clear whether the factor that constrains the learning is phonetic naturalness or complexity. This is because phonetically natural patterns are usually simple in their formal representations. Therefore, a comparison between systematic and arbitrary patterns can often be interpreted as a comparison between phonetically natural and unnatural patterns. For example, Wilson (2003) exposed adult English speakers to input in which the distribution of two allomorphs ([-la] and [-na]) of a single suffix was conditioned by the consonants of the preceding words. In one condition, while [-la] occurred after CVCV words whose second consonant was either [t] or [k] (e.g., [suto-la] and [tuko-la]), [-na] occurred after CVCV words whose second consonant was a nasal (e.g., [dume-na]). In another condition, [-la] occurred after words with a nasal or [t] (e.g., [dume-la] and [suto-la]), and [-na] occurred after words with [k] (e.g., [tuko-na]). After exposure, learners in the first condition learned the distribution, but learners in the second condition did not. Here, the distribution of [-la] and [-na] in the first condition assumed a pattern of nasal harmony; the sonorant [l] was nasalized after a nasal consonant. This is phonetically natural in the sense that nasal harmony is phonetically grounded (in coarticulatory nasalization) and formally simple (or systematic) in the sense that the environment that conditioned the alternation can be defined by a single phonological feature ([+nasal]) and the alternation can be described as the change of the sonorant from [-nasal] to [+nasal]. Indeed, such nasal consonant harmony patterns are attested in natural languages (e.g., Hansson, 2010).

Researchers have argued that what makes natural and simple patterns more learnable are inductive biases that learners are subject to (Moreton, 2008; Moreton and Pater, 2012a,b; Wilson, 2006). Wilson (2006) proposed a category of inductive biases he called substantive biases.
According to Wilson, substance in phonology refers to "any aspect of grammar that has its basis in the physical properties of speech. These properties include articulatory inertias, aerodynamic pressures, and degrees of auditory salience and distinctiveness" (Wilson, 2006, p. 946). Substantive biases are cognitive biases that predispose learners toward those patterns that are phonetically grounded.

Wilson (2006) demonstrated that substantive biases affect the way learners make generalizations in the learning of phonological patterns. In his artificial language learning experiment, adult English speakers were exposed to input in which velar consonants are palatalized in certain environments. In one condition, velar palatalization happened only before a high front vowel (e.g., /ki/ → [tSi]). In another condition, velar palatalization happened only before a mid front vowel (e.g., /ke/ → [tSe]). After exposure, learners were tested on the generalization of the rule to new vowel contexts: generalization to the mid vowel context for learners in the first condition and generalization to the high vowel context for learners in the second condition. The results showed that learners in the first condition did not generalize the rule to the mid vowel context, but learners in the second condition did generalize the rule to the high vowel context. According to Wilson, learners knew that palatalization in the high vowel context is phonetically more natural than palatalization in the mid vowel context and that the occurrence of the latter implies the occurrence of the former. Therefore, the learning of palatalization in an unnatural (or less expected) context allowed learners to infer that palatalization should happen in the more natural (expected) context as well. White and Sundara (2014) reported a similar bias in the learning of phonological alternations by 12-month-old English-learning infants. In their study, a group of infants were exposed to input in which there was an alternation between a pair of relatively similar sounds ([b]∼[v]). Another group of infants were exposed to input in which there was an alternation between a pair of relatively less similar sounds ([p]∼[v]). After exposure, learners in the first group learned the alternation they were familiarized with, but did not generalize it to a pair of less similar sounds ([p]∼[v]). By contrast, learners in the second group learned the alternation they were familiarized with, and generalized it to a pair of more similar sounds ([b]∼[v]). According to White and Sundara, infants knew that phonological alternations between phonetically similar sounds are more natural than alternations between phonetically dissimilar sounds. Therefore, learning an unnatural alternation allowed learners to infer that its more natural counterpart should happen as well.

Moreton and Pater (2012a) introduced another category of learning biases, complexity biases. These are cognitive biases that predispose learners towards patterns that are formally simple and systematic. Moreton and Pater (2012b) further argued that substantive biases are a part of complexity biases. This is because, as discussed above, the patterns that are considered to be phonetically natural are usually simple and systematic in their phonological representations.
Moreton (2012) also claimed that, unlike substantive biases, which rely on learners' knowledge about phonological substance, complexity biases are domain-general.

3.4 Constraints on the learning of allophony

Some of the studies mentioned above have investigated constraints on the learning of phonological alternations, which are basically allophonic rules (Peperkamp et al., 2006b; Skoruppa et al., 2011; Wilson, 2003). However, no study has investigated constraints on the learning of allophony as a question of category learning. Experiment 2 is the first attempt to explore constraints on the learning of allophony. Specifically, it tests the effect of the phonetic naturalness of the patterns of complementary distribution in input.

In the input used in the complementary condition of Experiment 1, the retroflex [ù] occurred after the high back rounded [u], and the alveolopalatal [C] occurred after the high front unrounded [i]. This particular pattern of complementary distribution was implemented on the basis of relative similarities between the target segments and their respective contexts. While the connections between [ù] and [u] are not so obvious, the connections between [C] and [i] are strong; both the articulation of [C] and the articulation of [i] involve the formation of a palatal constriction, and the presence of a palatal constriction is reflected in the acoustic signal as low F1 and high F2. These relative similarities between the target segments and the contexts make the complementary distribution natural in the sense that the connections between the target segments and the conditioning contexts, specifically the connections between [C] and [i], may imply some sort of coarticulation (e.g., the palatality of [C] has arisen as a result of coarticulation with the preceding [i]).

Here, the relative similarities between the target segments and the contexts may indirectly affect the learning of the allophonic relationship between [ù] and [C]. As discussed earlier, studies show that the similarity between linguistic objects over which statistical dependencies are learned may affect the ease of learning (e.g., Newport and Aslin, 2004). Therefore, it is possible that the acoustic similarities between the target segments and the contexts may facilitate the learning of the complementary distribution. On the assumption that the learning of allophony crucially relies on the learning of complementary distribution, this may eventually facilitate the learning of the allophonic relationship. Alternatively, as discussed in the previous section, studies show that learners have learning biases that predispose them towards phonetically natural patterns (e.g., Wilson, 2006). Therefore, it is possible that these learning biases may facilitate the learning of the phonetically natural complementary distribution and the allophonic relationship.

If the learning of allophony is constrained by phonetic naturalness, learners should be able to learn segments as allophones only when the segments are in phonetically natural complementary distribution. If phonetic naturalness has no effect on the learning, by contrast, learners should be able to learn segments as allophones when the segments are in complementary distribution no matter whether the pattern of the distribution is phonetically natural or not.
Since the learning of an allophonic relationship between two segments from input with a phonetically natural complementary distribution has already been tested in Experiment 1, in Experiment 2 I test whether adults can learn the same allophonic relationship from input with a phonetically unnatural complementary distribution.

3.5 Methods

In Experiment 2, I tested the learning of the allophonic relationship between the retroflex [ù] and alveolopalatal [C] from input in which these two target segments are in a phonetically unnatural complementary distribution; the tokens of the retroflex [ù] occurred after the high front unrounded [i] and the tokens of the alveolopalatal [C] occurred after the high back rounded [u] (complementary-unnatural condition). If the learning is constrained by the phonetic naturalness of the patterns of complementary distribution, learners in the complementary-unnatural condition should not learn to treat the target segments as something like allophones, and thus should not show a reduction in their sensitivity to acoustic differences between the segments.

The method was the same as in Experiment 1. There were two sessions over two consecutive days. In each session, I first exposed participants to input, and then tested their sensitivity to acoustic differences between the target segments using an AX discrimination paradigm.

3.5.1 Participants

Twenty adult native English speakers with no known language or hearing disorder participated in Experiment 2. All participants completed two sessions over two consecutive days and were paid $20 for their participation. All participants reported that English was their first and dominant language. Many of them were multilingual, but none of them was familiar with any language that has two or more post-alveolar fricatives as phonemes.

3.5.2 Exposure stimuli

Exposure stimuli consisted of 256 bisyllabic strings. Each string comprised a context syllable followed by either a critical syllable or a filler syllable. Syllables in the exposure stimuli were the same as the ones used in Experiment 1. There were eight context syllables grouped into two classes according to the vowel quality, the [i] context ([li], [mi], [pi], and [gi]) and the [u] context ([lu], [mu], [pu], and [gu]). There were eight critical syllables drawn from a 10-step continuum between [ùa] and [Ca] (steps 2, 3, 4, 5, 7, 8, 9, and 10). There were four filler syllables, [ta], [tha], [fa], and [ha], of which only [ta] and [tha] were used in exposure stimuli (i.e., [fa] and [ha] were used only in test stimuli).

The frequency of critical syllables was manipulated so that the aggregate distribution showed exactly the same bimodal shape with two frequency peaks as in Experiment 1 (Figure 3.1). These 16 tokens of critical syllables were combined with the eight context syllables to generate 128 critical stimuli. Similarly, 16 tokens of filler syllables (eight tokens of each of [ta] and [tha]) were combined with the eight context syllables to generate 128 filler stimuli. These 256 exposure stimuli were divided into two subsets according to the consonants of the context syllables, a subset with [l] and [p] and the other subset with [m] and [g].
Figure 3.1: Aggregate distribution of critical syllables (Experiment 2)

The phonotactic distribution of the critical syllables was manipulated so that the tokens of the retroflex category (steps 2, 3, 4, and 5) occurred after [i], and the tokens of the alveolopalatal category (steps 7, 8, 9, and 10) occurred after [u] (Figure 3.2). Table 3.1 summarizes the exposure stimuli used in Experiment 2 alongside the ones used in Experiment 1.

Figure 3.2: Distribution of 32 critical syllables in the complementary-unnatural condition (Experiment 2)

Condition                         Context syllables    Critical syllables              Filler syllables
Non-complementary (Exp. 1)        li-, mi-, pi-, gi-   steps 2, 3, 4, 5, 7, 8, 9, 10   -ta, -tʰa
                                  lu-, mu-, pu-, gu-   steps 2, 3, 4, 5, 7, 8, 9, 10   -ta, -tʰa
Complementary (Exp. 1)            li-, mi-, pi-, gi-   steps 7, 8, 9, 10               -ta, -tʰa
                                  lu-, mu-, pu-, gu-   steps 2, 3, 4, 5                -ta, -tʰa
Complementary-unnatural (Exp. 2)  li-, mi-, pi-, gi-   steps 2, 3, 4, 5                -ta, -tʰa
                                  lu-, mu-, pu-, gu-   steps 7, 8, 9, 10               -ta, -tʰa
Control (Exp. 1)                  li-, mi-, pi-, gi-   NA                              -ta, -tʰa, -fa, -ha
                                  lu-, mu-, pu-, gu-   NA                              -ta, -tʰa, -fa, -ha

Table 3.1: Exposure stimuli in four conditions (Experiments 1 and 2)

3.5.3 AX discrimination task

Participants’ sensitivity to acoustic differences between the retroflex [ʂ] and alveolopalatal [ɕ] was tested using an AX discrimination paradigm. The test was identical to the one used in Experiment 1. In each one of the test trials, participants compared a token of [ʂa] and a token of [ɕa] (different trials) or two non-identical tokens of [ʂa] or [ɕa] (same trials). There were two types of different trials depending on the acoustic distance between the test stimuli, distant pair and close pair. In the distant pair trials, participants compared two stimuli that are acoustically quite different from each other (step 2 and step 10). In the close pair trials, participants compared two stimuli that are acoustically more similar to each other (step 4 and step 8). There were also two types of same trials. In the retroflex category trials, participants compared two non-identical tokens of [ʂa] (step 2 vs. step 4). In the alveolopalatal category trials, participants compared two non-identical tokens of [ɕa] (step 8 vs. step 10). Table 3.2 summarizes the test stimuli used in the AX discrimination test.

Test trials    Different trials   Distant pair     step 2 vs. step 10
                                  Close pair       step 4 vs. step 8
               Same trials        Retroflex        step 2 vs. step 4
                                  Alveolopalatal   step 8 vs. step 10
Filler trials  Different trials                    ta vs. tʰa, fa vs. ha
               Same trials                         ta vs. ta, tʰa vs. tʰa, fa vs. fa, ha vs. ha

Table 3.2: Test stimuli (Experiment 2: same as the ones used in Experiment 1)

3.5.4 Design and procedure

Experiment 2 followed the same design and procedure used in Experiment 1. It consisted of two sessions over two consecutive days. On Day 1, participants came into the lab and signed the consent form. At the beginning of the session, participants were told that they would hear someone speaking in a language that they had never heard before, and they would be asked about what they learned about the language after hearing the speech (see Section 2.3.6 for the actual instructions given to participants). A session comprised three phases: practice, exposure, and test. In the practice phase, participants got some practice on the AX discrimination task. In the exposure phase, participants heard a block of 128 exposure stimuli presented in a random order four times. The ISI was one second.
The exposure stimuli were presented as short “sentences” so that each syllable in the stimuli could be treated as a word. In order to help participants stay attentive to the exposure stimuli, they were given a monitoring task to perform while listening (see Section 2.3.5). In the test phase, there were 64 test trials. Half of the test trials were different trials, and the other half were same trials. The 32 different trials included 16 distant pair trials and 16 close pair trials. The 32 same trials included 16 retroflex category trials and 16 alveolopalatal category trials. There were 32 filler trials (16 different trials and 16 same trials). In each trial, the ISI was 750 ms. Participants were given a maximum of five seconds to respond, but the trial was terminated whenever participants recorded a response. The ITI was two seconds. On Day 2, participants came back to the lab and did Session 2. Session 2 followed the same procedure used in Session 1, except that participants filled out a language background questionnaire and received $20 after finishing the test phase. E-prime Professional (ver. 2.0) was used to control the presentation of stimuli and the recording of responses (Schneider et al., 2002).

3.6 Results

Before the analyses of the data, participants’ performance on the monitoring task was checked to see whether they were attentive to the stimuli during exposure. All participants performed better than 75% on the monitoring task, and therefore they were all included in the following data analyses. Responses to test trials in the AX discrimination test were converted into sensitivity or d′ scores (Macmillan and Creelman, 2004). d′ scores were computed for each participant with each pair type (distant or close) in each session (1 or 2). In order to compute hit rates and false alarm rates, different trials were coupled with same trials that shared the same first syllable (the A of an AX pair) (see Section 2.4). Following standard practice, when a hit rate or false alarm rate was 0, it was replaced by 1/(2N), and when a hit rate was 1, it was replaced by 1 − 1/(2N) (N = number of trials for a particular trial type) (Macmillan and Creelman, 2004, p. 8). All of the following statistical analyses were done in R 3.0.3 (R Core Team, 2014).
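For reference, the d′ computation just described can be sketched as follows. The original analyses were carried out in R; this is a minimal Python sketch, and the response counts in the example are hypothetical rather than observed values.

```python
"""Sketch of the d-prime computation with the 1/(2N) correction described
above (Macmillan and Creelman, 2004). Not the original R implementation."""

from statistics import NormalDist


def adjusted_rate(k, n):
    """Proportion k/n with 0 and 1 replaced by 1/(2n) and 1 - 1/(2n)."""
    rate = k / n
    if rate == 0.0:
        return 1.0 / (2 * n)
    if rate == 1.0:
        return 1.0 - 1.0 / (2 * n)
    return rate


def d_prime(hits, n_different, false_alarms, n_same):
    """d' = z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf
    hit_rate = adjusted_rate(hits, n_different)
    fa_rate = adjusted_rate(false_alarms, n_same)
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts: 14 "different" responses on 16 different trials and
# 3 "different" responses on the 16 paired same trials.
print(round(d_prime(14, 16, 3, 16), 2))
```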
Since the design of Experiment 2 was fully comparable to that of Experiment 1, the results of Experiment 2 were analyzed alongside the results of Experiment 1. Figure 3.3 shows the mean d′ scores for distant pair trials by session (1 and 2) and condition (control (Exp. 1), non-complementary (Exp. 1), complementary-natural (Exp. 1), complementary-unnatural (Exp. 2)). Note that, in order to make the distinction between the complementary condition in Experiment 1 and the complementary-unnatural condition in Experiment 2 clear, the former condition will be referred to as the complementary-natural condition from now on. Figure 3.4 shows the mean d′ scores for close pair trials by session and condition.

Figure 3.3: Mean d′ scores with 2 SEs for distant pair (Experiments 1 and 2)

Figure 3.4: Mean d′ scores with 2 SEs for close pair (Experiments 1 and 2)

A repeated-measures ANOVA was conducted on the d′ scores with condition as a between-participant variable and pair and session as within-participant variables. The significance level was set at p < 0.05. There were significant main effects of condition [F(3,76) = 2.985, p = 0.036], pair [F(1,76) = 273.906, p < 0.001], and session [F(1,76) = 14.433, p < 0.001]. There was a significant two-way interaction between pair and session [F(1,76) = 5.624, p = 0.02], but no significant two-way interaction between condition and pair [F(3,76) = 1.154, p = 0.333] or between condition and session [F(3,76) = 0.618, p = 0.605]. There was no significant three-way interaction [F(3,76) = 1.166, p = 0.328].

For the main effect of condition, post-hoc pairwise comparisons with the Holm adjustment method indicated that the d′ scores of the complementary-natural condition (M = 1.07, SD = 1.06) were significantly lower than those of the control condition (M = 1.62, SD = 0.86) (p = 0.001), the non-complementary condition (M = 1.64, SD = 0.95) (p = 0.001), and the complementary-unnatural condition (M = 1.69, SD = 1.05) (p < 0.001), but there was no significant difference among the last three conditions.

Due to the interaction between pair and session, the effects of pair and session were explored separately in follow-up analyses. First, I explored the effect of pair for Session 1 and Session 2 separately. For each session, I conducted a separate repeated-measures ANOVA on the d′ scores with condition as a between-participant variable and pair as a within-participant variable. Since there was no significant three-way interaction, the effect of condition and its interactions with pair were not explored. For Session 1, the analysis yielded a significant main effect of pair [F(1,76) = 190.519, p < 0.001]. A post-hoc pairwise comparison indicated that the d′ scores for distant pair trials (M = 1.87, SD = 0.95) were significantly higher than those for close pair trials (M = 0.86, SD = 0.76) (p < 0.001). For Session 2, the analysis yielded a significant main effect of pair [F(1,76) = 155.467, p < 0.001]. A post-hoc pairwise comparison indicated that the d′ scores for the distant pair trials (M = 2.05, SD = 0.97) were significantly higher than those for the close pair trials (M = 1.24, SD = 0.89) (p < 0.001).

Second, I explored the effect of session for distant pair trials and close pair trials separately. I conducted separate repeated-measures ANOVAs on the d′ scores with condition as a between-participant variable and session as a within-participant variable. Since there was no significant three-way interaction, the effects of condition and its interactions with session were not explored. For distant pair trials, the analysis yielded no significant main effect of session [F(1,76) = 3.606, p = 0.061]. For close pair trials, the analysis yielded a significant main effect of session [F(1,76) = 25.13, p < 0.001]. A post-hoc comparison indicated that the d′ scores for Session 2 (M = 1.24, SD = 0.9) were significantly higher than those for Session 1 (M = 0.87, SD = 0.76) (p = 0.005).
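For reference, the Holm step-down adjustment used in the pairwise comparisons above can be sketched as follows. The original comparisons were computed in R; the unadjusted p-values in the example are placeholders (six values, one per pairwise comparison among the four conditions), not the reported results.

```python
"""Minimal sketch of the Holm step-down adjustment for multiple comparisons.
The input p-values below are placeholders, not the values reported above."""

def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # ascending p-values
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = (m - rank) * p_values[i]   # multiplier m, m-1, ..., 1
        running_max = max(running_max, candidate)   # enforce monotonicity
        adjusted[i] = min(1.0, running_max)
    return adjusted


# Hypothetical unadjusted p-values for six pairwise condition comparisons.
raw = [0.0004, 0.0006, 0.0002, 0.41, 0.78, 0.62]
print([round(p, 4) for p in holm_adjust(raw)])
```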
In sum, participants in the complementary-unnatural condition showed the same level of sensitivity as participants in the control and non-complementary conditions, and they showed significantly higher sensitivity than participants in the complementary-natural condition. Participants across conditions showed a significant improvement in sensitivity to the acoustic differences between the test stimuli used in close pair trials from Session 1 to Session 2.

3.7 Discussion

The crucial finding of Experiments 1 and 2 is that participants in the complementary-unnatural condition showed significantly better sensitivity to acoustic differences between the retroflex [ʂa] and alveolopalatal [ɕa] than participants in the complementary-natural condition. In these two conditions, participants were exposed to input in which [ʂ] and [ɕ] were in complementary distribution. The crucial difference between these two conditions was in the phonetic naturalness of the patterns of complementary distribution. In the complementary-natural condition, the target segments occurred in phonetically natural contexts. In the complementary-unnatural condition, by contrast, the target segments occurred in phonetically unnatural contexts. The results of Experiments 1 and 2 together suggest that allophonic relationships between two segments can be learned from input in which the segments occur in phonetically natural complementary contexts but not from input in which the segments occur in phonetically unnatural complementary contexts. This suggests that the learning of allophony, at least as indicated by a reduction in sensitivity to acoustic differences, is constrained by phonetic naturalness.

These findings add to the growing evidence for the effects of constraints on the learning of phonology (see Section 3.3). Previous studies of constraints on the learning of phonology have focused on the learning of phonological patterns, such as phonotactics and phonological alternations (including allophonic rules), but very little attention has been paid to constraints on the learning of sound categories (cf. Gilkerson, 2005). The findings of Experiments 1 and 2 provide the first evidence showing that the learning of allophony is constrained by the phonetic naturalness of the patterns of complementary distribution.

Now the question is how exactly phonetic naturalness affects the learning of allophony. In Section 3.4, I mentioned two hypotheses about the way phonetic naturalness may affect the learning of allophony. First, the acoustic similarities between the target segments and the natural contexts may facilitate the learning of the statistical dependencies between the target segments and the contexts, and thus the learning of the complementary distribution of the target segments (e.g., Newport and Aslin, 2004). On the assumption that the learning of allophony crucially relies on the learning of complementary distribution, this may indirectly facilitate the learning of the allophonic relationship between the target segments. Second, learners may have learning biases that predispose them to favour those phonotactic patterns that are phonetically grounded (e.g., Wilson, 2006), and these biases may facilitate the learning of the complementary distribution, and thus the learning of the allophonic relationships.

The findings of Experiments 1 and 2 do not favour one of these interpretations over the other. Therefore, the question of how exactly phonetic naturalness affects the learning of allophony remains open, and further studies are needed.
In the following subsections, I will propose yet another hypothesis about the way phonetic naturalness can affect the learning of allophony: the context effects hypothesis. In this hypothesis, phonetic naturalness serves as the basis for perceptual biases that affect the learning of sound categories in a more direct way. The hypothesis assumes that when learners hear sounds in different contexts, as was the case in the complementary-natural and complementary-unnatural conditions, they actually perceive the sounds differently (i.e., speech perception is subject to context effects; see below for details). More specifically, learners perceive two different sounds as being more similar to each other when they hear the sounds in phonetically natural complementary contexts than in phonetically unnatural complementary contexts.

According to the distributional learning hypothesis (e.g., Maye, 2000; see Section 1.4.1), learners form categories for speech sounds based on the frequency distributions of speech sounds in acoustic space. However, it is more precise to say that what they actually learn is the aggregate distribution of the sounds in auditory space. This is because their knowledge about the input is built upon their experience of the acoustic signal rather than the acoustic signal itself. This means that learners’ knowledge about the aggregate distribution can be affected by the ways they perceive the sounds in the input. In the following subsections, I first explain context effects in speech perception and then how they can affect the learning of segmental categories by comparing the input used in the complementary-natural and complementary-unnatural conditions.

3.7.1 Context effects in speech perception

In the speech signal, acoustic cues for the perception of sounds are temporally distributed across segmental boundaries. For example, the perception of a stop consonant as either voiced or voiceless is cued not only by the acoustic properties of the stop consonant itself (e.g., VOT) but also by the acoustic properties of the neighbouring sounds (e.g., formant transitions into the following vowel) (e.g., Lisker, 1986). A major source of temporally distributed acoustic cues is coarticulation. When we produce speech, articulatory gestures required for the production of successive sounds overlap in time, and this gestural overlap systematically affects the acoustic realization of these sounds (Farnetani and Recasens, 1997). As a result, the portion of the acoustic signal that corresponds to a sound carries information about the identity of the sound as well as information about the coarticulatory effects of the neighbouring sounds. This means that acoustic cues for the perception
of a target sound are available not only from the portion of the acoustic signal that corresponds to the target sound itself but also from the portions of the acoustic signal that correspond to the neighbouring sounds, because the coarticulatory effects in the neighbouring sounds provide information about the identity of the target sound that triggered the coarticulation.

Studies have demonstrated that the perception of sounds is significantly affected by manipulating the surrounding sounds (context effects) (Lindblom and Studdert-Kennedy, 1967; Mann, 1980; Mann and Repp, 1980; Repp and Mann, 1981; Repp, 1981; Summerfield, 1975). For example, Lindblom and Studdert-Kennedy (1967) demonstrated that the categorization of vowels is significantly influenced by consonantal contexts. In their study, adult English speakers categorized vowel tokens taken from a continuum between [ɪ] and [ʊ] occurring in two consonantal contexts: [j_j] and [w_w]. The results showed that participants were more likely to label intermediate tokens as [ɪ] in the [w_w] context and were more likely to label the same intermediate tokens as [ʊ] in the [j_j] context; the category boundary shifted towards the [ʊ] end in the [w_w] context and towards the [ɪ] end in the [j_j] context (see Figure 3.5).

Figure 3.5: Context effects in the categorization of an [ɪ] - [ʊ] continuum. (a) Categorization in the [w_w] context; (b) categorization in the [j_j] context.

Similarly, Mann (1980) demonstrated that the categorization of consonants is significantly affected by consonantal contexts. In her study, adult English speakers categorized syllable tokens taken from a continuum between [da] and [ga] occurring in two consonantal contexts: [al] and [aɹ]. The results showed that participants were more likely to label intermediate tokens as [ga] in the [al] context and were more likely to label the same intermediate tokens as [da] in the [aɹ] context; the category boundary shifted towards the [ga] end in the [aɹ] context and towards the [da] end in the [al] context (see Figure 3.6).

Figure 3.6: Context effects in the categorization of a [da] - [ga] continuum. (a) Categorization in the [aɹ] context; (b) categorization in the [al] context.

In the literature, there are at least two major approaches to the question of how exactly context effects arise: the articulatory approach and the auditory approach. The articulatory approach assumes that listeners know how the articulatory and acoustic realization of the same sound may vary in different contexts due to coarticulation and take this context-dependent variation into consideration in mapping the acoustic signal onto sound categories (compensation for coarticulation) (Lindblom and Studdert-Kennedy, 1967; Mann, 1980). There are two major theories in the articulatory approach: the motor theory of speech perception (Liberman and Mattingly, 1985; Liberman and Whalen, 2000) and the direct realist theory (Fowler, 1986, 1996, 2006). These theories both assume that the target of speech perception is the source of speech production; listeners perceive speech sounds by translating the acoustic signal into articulatory events. When listeners hear coarticulated sounds, they perceive the sounds as coarticulatory events, or the interaction between the articulatory events for the target sounds and the neighbouring sounds. Therefore, depending on the articulatory events of the neighbouring sounds, the same acoustic signal may be mapped onto the articulatory events of different target sounds. (The major difference between these two theories is that the former assumes that the mapping of the acoustic signal onto articulatory events is mediated by some sort of representation (e.g., features and gestures) and that perception is achieved through analysis-by-synthesis, a comparison between the incoming signal and candidates that are internally synthesized using these features and gestures. By contrast, the latter does not assume the necessity of these representations; perception is achieved by the direct mapping of the acoustic signal onto articulatory events.)

According to the articulatory approach, for example, listeners in Mann (1980) know that when [d] is produced after [ɹ], the articulatory gestures for [d] and [ɹ] overlap in time. Since the place of articulation of [ɹ] is further back than the place of articulation of [d], the gestural overlap moves the place of articulation of [d] backwards. Listeners also know how this articulatory variation is reflected in the acoustic signal.
Since [ɹ] has a low F3, the gestural overlap results in the lowering of the onset F3 of [da]. Therefore, when they hear stimuli that are intermediate between [da] and [ga] (i.e., stimuli with a too-low-to-be-[da] onset F3) after [ɹ], they are more likely to perceive the stimuli as [da] because they infer that the too-low-to-be-[da] onset F3 is a result of coarticulation with the preceding [ɹ], not an inherent property of the stimuli. Similarly, listeners know that when [g] is produced after [l], the gestures for [g] and [l] overlap in time. Since the place of articulation of [l] is further forward than the place of articulation of [g], the gestural overlap moves the place of articulation of [g] forward. They also know that this articulatory variation is reflected in the raising of the onset F3 of [ga]. Therefore, when they hear stimuli that are intermediate between [da] and [ga] (i.e., stimuli with a too-high-to-be-[ga] onset F3) after [l], they are more likely to perceive the stimuli as [ga] because they infer that the too-high-to-be-[ga] onset F3 is a result of coarticulation with the preceding [l], not an inherent property of the stimuli.

The auditory approach assumes that context effects arise as part of general contrast effects in perception. A dominant theory in this approach is the spectral contrast theory (Kluender et al., 2003; Lotto et al., 1997; Lotto and Kluender, 1998). It explains that context effects result from how the auditory system responds to changes in the signal. Rapid changes in the amplitude and the spectrum of the acoustic signal trigger an abrupt increase in the discharge rate of auditory nerve fibres (ANFs), but the peak in discharge rate is always followed by a gradual decay or adaptation. This happens because of the depletion of the neurotransmitter at the synapses between hair cells and ANFs in the cochlea (e.g., Smith, 1979). Adaptation happens in different frequency regions. For example, with vowel-like sounds, it happens in at least five different regions: low (below F1), F1, mid (between F1 and F2), F2, and high (above F2) (Delgutte and Kiang, 1984). One role of adaptation in speech perception is the enhancement of the spectral contrasts between successive sounds. Once ANFs are adapted, they become less responsive to subsequent stimulation. However, the adaptation of some ANFs leads to a relative exaggeration of the responses of the other, unadapted ANFs to the subsequent stimulation. In this way, adaptation enhances the spectral contrasts between successive sounds (Delgutte, 1980, 1997; Delgutte and Kiang, 1984).

According to the spectral contrast theory, context effects happen because neural adaptation to the spectrum of the precursor sound modulates the perception of the spectrum of the following sound. For example, when listeners in Mann (1980) heard an intermediate syllable after [ɹ], adaptation happened with the ANFs that are responsive to the frequency range around the F3 of [ɹ], and this adaptation modulated the perception of the spectrum of the following syllable such that the weight of the spectrum shifted toward the higher end, since [ɹ] has a low F3. As a result, the syllables were more likely to be perceived as [da]. Similarly, when listeners heard the same intermediate syllables after [l], adaptation happened with the ANFs that are responsive to the frequency range around the F3 of [l]. This adaptation modulated the perception of the spectrum of the following syllables such that the weight of the spectrum shifted towards the lower end, since [l] has a high F3. As a result, the syllables were more likely to be perceived as [ga].
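A toy numerical sketch may help make the direction of these predicted shifts concrete. All values below (the formant frequencies, the category boundary, and the contrast weight) are hypothetical and are chosen only to reproduce the qualitative pattern reported by Mann (1980); neither the articulatory nor the auditory account commits to this particular linear form.

```python
"""Toy illustration of a contrastive context effect on a [da]-[ga] continuum:
the perceived onset F3 of an ambiguous token is pushed away from the F3 of the
precursor. All frequency values and the contrast weight are hypothetical."""

NEUTRAL_F3 = 2500.0     # hypothetical reference F3 (Hz)
CONTRAST_WEIGHT = 0.3   # hypothetical strength of the contrast effect

# Hypothetical precursor F3 values: [ɹ] has a low F3, [l] has a high F3.
PRECURSOR_F3 = {"r": 1700.0, "l": 2900.0}

# Hypothetical boundary: higher onset F3 -> [da], lower onset F3 -> [ga].
BOUNDARY_F3 = 2450.0


def perceived_onset_f3(acoustic_f3, precursor):
    """Shift the percept away from the precursor's F3 (spectral contrast)."""
    return acoustic_f3 - CONTRAST_WEIGHT * (PRECURSOR_F3[precursor] - NEUTRAL_F3)


ambiguous_token = 2440.0   # acoustically just on the [ga] side of the boundary
for precursor in ("r", "l"):
    f3 = perceived_onset_f3(ambiguous_token, precursor)
    label = "da" if f3 > BOUNDARY_F3 else "ga"
    print(precursor, round(f3), label)
# After [ɹ] (low F3) the percept shifts upward and is labelled [da];
# after [l] (high F3) it shifts downward and is labelled [ga].
```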
In sum, listeners integrate temporally distributed acoustic cues in speech perception. Therefore, the perception of speech sounds is influenced by context in systematic ways. Specifically, listeners may perceive the same stimuli as different sounds when the stimuli are presented in different contexts. In the next subsection, I will discuss how the context-dependent perception of sounds may impact the distributional learning of sound categories.

3.7.2 Context effects and the distributional learning of sound categories

According to the distributional learning hypothesis, learners form categories for sounds based on their knowledge about the frequency distributions of the sounds in acoustic space. However, it is more precise to say that what they actually learn
To do so, I will compare the input used inthe complementary-natural and complementary-unnatural conditions, so that I canhighlight the difference between the phonetically natural complementary distribu-tion and the phonetically unnatural complementary distribution.Before going into the details of possible context effects in the perception ofthe novel sounds in phonetically-natural and phonetically-unnatural contexts, letus recapitulate the phonetic properties of the target segments (the retroflex [ù] andalveolopalatal [C]) and the contexts ([i] and [u]). The articulation of Mandarin89[ù] involves the formation of a short and slack constriction channel in the post-alveolar region (e.g., Ladefoged and Wu, 1984; Lee-Kim, 2014; Noguchi et al.,2015a; Proctor et al., 2012; Toda and Honda, 2003). By contrast, the articulationof Mandarin [C] involves the formation of a long and narrow constriction channelover the post-alveolar and palatal regions. The formation of a palatal constrictionis achieved by moving the tongue body forward and upward (Hu, 2008; Ladefogedand Wu, 1984; Lee, 2008; Lee-Kim, 2014; Proctor et al., 2012; Toda and Honda,2003). The presence of a palatal constriction in the articulation of [C] is reflected inthe acoustic signal as low F1 and high F2 at the onset of the following vowel. Thearticulation of the high front unrounded [i] involves the forward and upward move-ments of the tongue body, and the acoustics of [i] are characterized by low F1 andhigh F2. The articulation of the high back rounded [u] involves the backward andupward movements of the tongue body, and the acoustics of [u] are characterizedby low F1 and low F2.To highlight the relationship between the target segments and the contexts,Figure 3.7 shows F2 frequencies of the vowels in the context syllables and criticalsyllables used in the input of Experiments 1 and 2. Figure 3.7 (a) shows meanF2 frequencies at the 100 ms (50%) and 190 ms (95%) points of the vowels inthe context syllables. The mean F2 frequencies of [i] are higher than the meanF2 frequencies of [u]. Figure 3.7 (b) shows F2 frequencies at the 10 ms (5%) and100 ms (50%) points of the vowels in two critical syllables: step 2 (the canonicaltoken of the retroflex [ùa]) and step 10 (the canonical token of the alveolopalatal[Ca]). The onset F2 frequency of the alveolopalatal token is higher than the onsetF2 frequency of the retroflex token. Note that mean F2 frequency at the offset of[i] is higher than onset F2 frequencies of both retroflex and alveolopalatal tokens,and mean F2 frequency at the offset of [u] is lower than onset F2 frequencies ofboth retroflex and alveolopalatal tokens.The contrast between the retroflex [ù] and alveolopalatal [C] is also made bydifferences in the spectral shape of the frication noise. Here, I focus on the formanttransitions for two reasons. First, from the learners’ perspective, adult Englishspeakers seem to rely more on formant transitions in learning non-native post-alveolar fricatives (McGuire, 2007a,b, 2008). 
Second, studies on context effectshave demonstrated that the categorization of CV syllables from a continuum be-90121620100 190Time (ms)ERB-rate voweliu(a) F2 (context syllables)12162010 100Time (ms)ERB-rate step210(b) F2 (critical syllables)Figure 3.7: F2 transitions from context syllables to critical syllablestween [ba] and [da], in which the onset F2 was systematically manipulated, wassignificantly affected by the preceding vowel (Coady et al., 2003; Holt, 1999).These results suggest that the interaction of non-adjacent formant cues is robust.In the input used in the complementary-natural condition, the tokens of theretroflex [ùa] occur after [u] and the tokens of the alveolopalatal [Ca] occur after[i]. According to the articulatory approach, when participants hear a token of theretroflex [ùa] after [u], they may infer that there is coarticulation between the unfa-miliar syllable and the preceding vowel [u]. More specifically, they may infer thatthe onset F2 of the unfamiliar syllable has been lowered as a result of the coarticu-lation, and thus that the inherent onset F2 of the unfamiliar syllable is higher thanit is in the acoustic signal. This makes the unfamiliar syllable seem more like [Ca]on the [ùa] - [Ca] continuum. Similarly, when participants hear a token of the alve-olopalatal [Ca] after [i], they may infer that the onset F2 of the unfamiliar syllablehas been raised as a result of coarticulation with the preceding vowel [i], and thus,the inherent onset F2 of the unfamiliar syllable is lower than it is in the acoustic91signal. This makes the unfamiliar syllable seem more like [ùa] on the [ùa] - [Ca]continuum.According to the auditory approach, when participants hear a token of theretroflex [ùa] after [u], their perceptual system first adapts to the frequency regionaround the F2 of [u], and this adaptation exaggerates the relative height of the onsetF2 of the following unfamiliar syllable. This makes the unfamiliar syllable seemmore like [Ca]. Similarly, when participants hear a token of the alveolopalatal [Ca]after [i], their perceptual system first adapts to the frequency region around the F2of [i], and this adaptation exaggerates the relative lowness of the onset F2 of thefollowing unfamiliar syllable. This makes the unfamiliar syllable seem more like[ùa].What is crucial here is that both the articulatory and auditory approaches pre-dict that the presentation of these unfamiliar syllables in phonetically natural con-texts modulates the perceived onset F2 of the syllables. The onset F2 of the retroflextoken is perceived to be higher than it is in the acoustic signal, and the onset F2 ofthe alveolopalatal token is perceived to be lower than it is in the acoustic signal.Figure 3.8 demonstrates how these context effects may affect the aggregatedistribution in auditory space. Figure 3.8a shows a schematic representation of thefrequency distribution of the unfamiliar syllables in acoustic space in the input usedfor the complementary-natural condition. We can see that the distribution has twopeaks, implying that the syllables are classified into two categories, the retroflex[ùa] to the lower end and the alveolopalatal [Ca] to the higher end of the onsetF2 continuum. Figure 3.8b shows how context effects may affect the perception ofthese unfamiliar syllables when they are presented in phonetically natural contexts,the tokens of [ùa] after [u] and the tokens of [Ca] after [i]. 
Second, studies on context effects have demonstrated that the categorization of CV syllables from a continuum between [ba] and [da], in which the onset F2 was systematically manipulated, was significantly affected by the preceding vowel (Coady et al., 2003; Holt, 1999). These results suggest that the interaction of non-adjacent formant cues is robust.

Figure 3.7: F2 transitions from context syllables to critical syllables. (a) F2 (context syllables); (b) F2 (critical syllables).

In the input used in the complementary-natural condition, the tokens of the retroflex [ʂa] occur after [u] and the tokens of the alveolopalatal [ɕa] occur after [i]. According to the articulatory approach, when participants hear a token of the retroflex [ʂa] after [u], they may infer that there is coarticulation between the unfamiliar syllable and the preceding vowel [u]. More specifically, they may infer that the onset F2 of the unfamiliar syllable has been lowered as a result of the coarticulation, and thus that the inherent onset F2 of the unfamiliar syllable is higher than it is in the acoustic signal. This makes the unfamiliar syllable seem more like [ɕa] on the [ʂa] - [ɕa] continuum. Similarly, when participants hear a token of the alveolopalatal [ɕa] after [i], they may infer that the onset F2 of the unfamiliar syllable has been raised as a result of coarticulation with the preceding vowel [i], and thus that the inherent onset F2 of the unfamiliar syllable is lower than it is in the acoustic signal. This makes the unfamiliar syllable seem more like [ʂa] on the [ʂa] - [ɕa] continuum.

According to the auditory approach, when participants hear a token of the retroflex [ʂa] after [u], their perceptual system first adapts to the frequency region around the F2 of [u], and this adaptation exaggerates the relative height of the onset F2 of the following unfamiliar syllable. This makes the unfamiliar syllable seem more like [ɕa]. Similarly, when participants hear a token of the alveolopalatal [ɕa] after [i], their perceptual system first adapts to the frequency region around the F2 of [i], and this adaptation exaggerates the relative lowness of the onset F2 of the following unfamiliar syllable. This makes the unfamiliar syllable seem more like [ʂa].

What is crucial here is that both the articulatory and auditory approaches predict that the presentation of these unfamiliar syllables in phonetically natural contexts modulates the perceived onset F2 of the syllables. The onset F2 of the retroflex token is perceived to be higher than it is in the acoustic signal, and the onset F2 of the alveolopalatal token is perceived to be lower than it is in the acoustic signal.

Figure 3.8 demonstrates how these context effects may affect the aggregate distribution in auditory space. Figure 3.8a shows a schematic representation of the frequency distribution of the unfamiliar syllables in acoustic space in the input used for the complementary-natural condition. We can see that the distribution has two peaks, implying that the syllables are classified into two categories, the retroflex [ʂa] towards the lower end and the alveolopalatal [ɕa] towards the higher end of the onset F2 continuum. Figure 3.8b shows how context effects may affect the perception of these unfamiliar syllables when they are presented in phonetically natural contexts, the tokens of [ʂa] after [u] and the tokens of [ɕa] after [i]. As discussed above, the context effects may shift the location of the tokens of [ʂa] to the higher end of the onset F2 continuum and the location of the tokens of [ɕa] to the lower end of the onset F2 continuum in auditory space. Figure 3.8c shows the hypothetical representation of the aggregate distribution of these unfamiliar syllables in auditory space. As a result of the context effects, the two frequency peaks show a large overlap, and the aggregate distribution has a shallower trough between the peaks than the bimodal distribution in acoustic space.

Figure 3.8: Complementary-natural condition. (a) Bimodal distribution in acoustic space; (b) context effects in complementary natural contexts; (c) aggregate distribution in auditory space.

In the input used in the complementary-unnatural condition, by contrast, the tokens of the retroflex [ʂa] occur after [i], and the tokens of the alveolopalatal [ɕa] occur after [u]. According to the articulatory approach, when participants hear a token of [ʂa] after [i], they may infer that the onset F2 of the unfamiliar syllable has been raised due to coarticulation with the preceding [i], and thus they may infer that the inherent onset F2 of the unfamiliar syllable is lower than it is in the acoustic signal. Similarly, when participants hear a token of the alveolopalatal [ɕa] after [u], they may infer that the onset F2 of the unfamiliar syllable has been lowered due to coarticulation with the preceding [u], and thus they infer that the inherent onset F2 of the unfamiliar syllable is higher than it is in the acoustic signal.

According to the auditory approach, when participants hear a token of the retroflex [ʂa] after [i], their perceptual system first adapts to the frequency region around the F2 of [i], and this adaptation exaggerates the relative lowness of the onset F2 of the following unfamiliar syllable; the onset F2 is perceived to be lower than it is in the acoustic signal. Similarly, when participants hear a token of the alveolopalatal [ɕa] after [u], their perceptual system first adapts to the frequency region around the F2 of [u], and this adaptation exaggerates the relative height of the onset F2 of the following unfamiliar syllable; the onset F2 is perceived to be higher than it is in the acoustic signal.

Figure 3.9 demonstrates how these context effects may affect the aggregate distribution in auditory space. Figure 3.9a shows a schematic representation of the frequency distribution of the unfamiliar syllables in acoustic space in the input used for the complementary-unnatural condition. The input implies that the syllables are classified into the retroflex and alveolopalatal categories. Figure 3.9b shows how context effects may affect the perception of these unfamiliar syllables when they are presented in phonetically unnatural contexts, the tokens of [ʂa] after [i] and the tokens of [ɕa] after [u].
As discussed above, the context effects may shift the location of the tokens of [ʂa] to the lower end of the onset F2 continuum and the location of the tokens of [ɕa] to the higher end of the onset F2 continuum in auditory space. Figure 3.9c shows the hypothetical representation of the aggregate distribution of these unfamiliar syllables in auditory space. As a result of the context effects, the overlap between the frequency peaks is very small, and the aggregate distribution has two peaks that are clearly distinguishable from each other.

Figure 3.9: Complementary-unnatural condition. (a) Bimodal distribution in acoustic space; (b) context effects in complementary unnatural contexts; (c) aggregate distribution in auditory space.
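The scenario depicted in Figures 3.8 and 3.9 can be made concrete with a small simulation, under the simplifying assumption that the context effect amounts to a constant shift of perceived onset F2. The peak locations, spreads, and shift size below are hypothetical values chosen for illustration; the point is only the direction of the predicted difference between the two conditions.

```python
"""Simulation sketch of Figures 3.8 and 3.9: a context-dependent shift of
perceived onset F2 changes how bimodal the aggregate distribution looks in
auditory space. All numerical values are hypothetical."""

import numpy as np

rng = np.random.default_rng(1)

N = 2000
RETROFLEX_F2 = 1800.0       # hypothetical mean onset F2 of [ʂa] tokens (Hz)
ALVEOLOPALATAL_F2 = 2600.0  # hypothetical mean onset F2 of [ɕa] tokens (Hz)
SPREAD = 250.0
SHIFT = 300.0               # hypothetical size of the context effect

# Bimodal distribution of onset F2 in acoustic space (same in both conditions).
retroflex = rng.normal(RETROFLEX_F2, SPREAD, N)
alveolopalatal = rng.normal(ALVEOLOPALATAL_F2, SPREAD, N)

# Complementary-natural condition: [ʂa] after [u], [ɕa] after [i].
# The percepts shift inwards, towards each other.
natural = np.concatenate([retroflex + SHIFT, alveolopalatal - SHIFT])

# Complementary-unnatural condition: [ʂa] after [i], [ɕa] after [u].
# The percepts shift outwards, away from each other.
unnatural = np.concatenate([retroflex - SHIFT, alveolopalatal + SHIFT])


def trough_depth(samples):
    """Depth of the dip between the two histogram peaks (small = unimodal-looking)."""
    counts, edges = np.histogram(samples, bins=40)
    centers = (edges[:-1] + edges[1:]) / 2
    lower = counts[centers < np.median(samples)]
    lower_peak_idx = int(lower.argmax())
    upper_peak_idx = int(counts[len(lower):].argmax()) + len(lower)
    trough = counts[lower_peak_idx:upper_peak_idx + 1].min()
    return int(min(counts[lower_peak_idx], counts[upper_peak_idx]) - trough)


print("natural  :", trough_depth(natural))    # shallow dip -> looks more unimodal
print("unnatural:", trough_depth(unnatural))  # deep dip -> clearly bimodal
```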
If the context effects happen as described above, the input differs in terms of the shape of the aggregate distribution in auditory space between the complementary-natural and complementary-unnatural conditions. The aggregate distribution may be bimodal in both conditions, but the trough between the peaks is shallower in the input used in the complementary-natural condition (Figure 3.8c) than in the input used in the complementary-unnatural condition (Figure 3.9c). If learners form categories for speech sounds based on the shape of the aggregate distribution of the sounds in auditory space, the categories that the learners in the complementary-natural and complementary-unnatural conditions learn should be different from each other. Specifically, the learners in the complementary-natural condition learn an aggregate distribution in which the two frequency peaks are more closely spaced and thus less distinct. As a result, these learners are more likely to interpret the distribution as unimodal, representing a single category. By contrast, the learners in the complementary-unnatural condition learn an aggregate distribution in which the two peaks are distantly spaced and more distinct. As a result, these learners are more likely to interpret the distribution as bimodal, representing two categories.

These differences may explain the outcomes of the complementary-natural and complementary-unnatural conditions in Experiments 1 and 2. The participants in the complementary-natural condition were more likely to learn a single category for the novel fricative sounds and therefore to become less sensitive to acoustic differences between [ʂ] and [ɕ]. By contrast, the participants in the complementary-unnatural condition were more likely to learn two separate categories for the novel fricative sounds and therefore to maintain their sensitivity. In other words, the input used in the complementary-natural condition provided less robust distributional cues for the learning of the two categories than the input used in the complementary-unnatural condition.

This hypothesis may explain the difference between the complementary-natural and complementary-unnatural conditions. However, it does not explain the absence of a difference between the complementary-unnatural and non-complementary conditions. The results of Experiments 1 and 2 demonstrated that participants in these two conditions showed the same level of sensitivity to acoustic differences between [ʂ] and [ɕ] after exposure. However, the input used in the non-complementary condition contained the novel sounds occurring in both natural and unnatural contexts. In other words, half of the exposure stimuli provided less robust cues for the learning of two categories, and the other half provided more robust cues for the learning of two categories. This means that the non-complementary condition should be intermediate between the complementary-natural and complementary-unnatural conditions in terms of the learnability of the phonetic contrast between [ʂ] and [ɕ]. However, that was not the case.

A possible explanation for the absence of a difference between the complementary-unnatural and non-complementary conditions is the learning of the predictable occurrences of the target segments. Despite the fact that the input used in the complementary-unnatural condition provided more robust distributional cues for the learning of two categories, it did not help learners to improve their sensitivity to acoustic differences between the target segments, because they were also detecting the predictable occurrences of the target segments at the same time. In other words, the aggregate distribution of the novel fricative sounds in auditory space could have helped learners to improve their sensitivity to acoustic differences between the target segments, but the learning of the predictable occurrences of the target segments wiped out the effect of the aggregate distribution.

Another explanation is that the context-dependent perception of critical syllables was the primary factor that determined the learners’ sensitivity to acoustic differences between the target segments, but learners in the non-complementary and complementary-unnatural conditions were simply showing a ceiling effect. Since participants in the control condition showed fairly good sensitivity to acoustic differences between the retroflex [ʂ] and alveolopalatal [ɕ] (especially when they compared the canonical tokens of these segments), it is possible that the amount of exposure was not enough for learners in the non-complementary and complementary-unnatural conditions to improve their sensitivity further.

3.8 Conclusion

In this chapter, I presented the results of Experiment 2 and compared them to the results of Experiment 1. The results suggest that adults can learn the allophonic relationship between two novel segments, the retroflex [ʂ] and the alveolopalatal [ɕ], only when they are exposed to input in which the tokens of these segments occur in phonetically natural complementary contexts, but not when they are exposed to input in which the tokens of these segments occur in complementary contexts that are phonetically unnatural. These findings indicate that the learning of allophony is constrained by the phonetic naturalness of the patterns of complementary distribution. In order to account for the role of phonetic naturalness, I proposed a hypothesis about the mechanisms behind the learning of allophony (the context effects hypothesis). According to this hypothesis, learners in the complementary-natural condition became less sensitive to acoustic differences between [ʂ] and [ɕ] because they actually learned less distinct categories or even a single category, and this happened through the learning of an aggregate distribution of the novel fricative sounds with less distinct peaks in auditory space, as a result of the context-dependent perception of the novel fricative sounds.

Chapter 4

Experiment 3: Learning of context-dependent perception of novel sounds

4.1 Introduction

In Chapter 3, I proposed a hypothesis about the mechanisms behind the learning of allophony. The hypothesis is that the context-dependent perception of sounds in input affects the aggregate distribution of the sounds in auditory space.
Specifically, when learners hear tokens of two target segments in phonetically natural complementary contexts, they perceive the tokens as being more similar to each other than they actually are. This affects the shape of the aggregate distribution of the sounds in auditory space such that the distance between the two distributional peaks becomes smaller. This eventually leads to the learning of less distinct categories or even a single category.

A crucial assumption that underlies the hypothesis is that the perception of sounds, no matter whether they are familiar or novel, is affected by context. There is a large body of evidence for context effects in the perception of sounds in listeners’ native languages (e.g., Lindblom and Studdert-Kennedy, 1967; Mann, 1980; Mann and Repp, 1980; Repp and Mann, 1981; Repp, 1981; Summerfield, 1975). However, our knowledge about context effects in the perception of novel (or non-native) sounds is still limited. Experiment 3 tests whether the perception of the retroflex [ʂ] and alveolopalatal [ɕ] by adult English speakers is affected by context in the ways that were discussed in Chapter 3. Specifically, it tests whether adult English speakers perceive the retroflex [ʂ] and alveolopalatal [ɕ] as being more similar to each other when they hear the sounds in phonetically natural complementary contexts than in phonetically unnatural complementary contexts.

Hao (2012) reported that naive English speakers categorized the alveolopalatal fricative in [ɕy] as English palato-alveolar /ʃ/ most of the time, but they categorized the alveolopalatal fricative in [ɕi] either as English alveolar /s/ or palato-alveolar /ʃ/ (see Table 2.2 in Chapter 2). What is interesting in these results is that naive English speakers perceived the alveolopalatal [ɕ] as being less palatalized (i.e., as /s/) before the high front unrounded vowel [i]. This suggests that they might be inferring that the palatality in the alveolopalatal [ɕ] is a result of coarticulation with the following [i] and are compensating for the coarticulation.

The question of how the perception of novel (or non-native) sounds is affected by context relates to the question of how much context effects depend on listeners’ experience with specific languages. Some researchers have argued that context effects are based on listeners’ general knowledge about acoustic and articulatory events (Mann, 1986; Viswanathan et al., 2010). For example, Mann (1986) demonstrated that Japanese speakers showed context effects in the categorization of a [da]-[ga] continuum when the stimuli were presented after the English liquids [l] and [ɹ]; they were more likely to label intermediate tokens as [ga] after [l] and were more likely to label the same intermediate tokens as [da] after [ɹ]. Given the fact that Japanese speakers have great difficulty in categorizing these English liquid sounds (e.g., Miyawaki et al., 1975), the finding that these English sounds significantly affected Japanese speakers’ categorization of the [da]-[ga] continuum was striking. According to Mann (1986), even though Japanese speakers do not have categories for these English liquid sounds, they still have access to phonetic information (e.g., formants and underlying articulatory events) from the liquid sounds and integrate the information into the perception of the stimuli from the [da]-[ga] continuum.
Therefore, context effects do not necessarily rely on listeners’ knowledge about the categories of sounds in specific languages.

Some researchers have further argued that context effects are based on general contrast effects in perception (Kluender et al., 2003; Lotto et al., 1997; Lotto and Kluender, 1998). In this view, context effects in speech perception arise because of the way the auditory system works. For example, Lotto and Kluender (1998) demonstrated that English speakers showed context effects in the categorization of a [da]-[ga] continuum when the stimuli were presented after frequency-modulated (FM) glides that mimicked the trajectories of the F3 of English [l] and [ɹ]. This suggests that context effects do not necessarily rely on the linguistic processing of speech sounds.

Other researchers have argued that there is some language-specificity in context effects. For example, Beddor et al. (2002) compared native speakers of English and Shona in the categorization of vowels. According to Beddor et al., both languages show non-local vowel-to-vowel coarticulation. In English, the directionality of coarticulation is symmetric; both anticipatory and carryover coarticulation happen with the same magnitude. In Shona, by contrast, the directionality is asymmetric; anticipatory coarticulation is stronger than carryover coarticulation. In perception, while English speakers showed the same amount of context effects with anticipatory and carryover coarticulation, Shona speakers showed stronger context effects with anticipatory coarticulation. These results suggest that language-specific phonetic knowledge plays a role in context effects.

If context effects are based on listeners’ general knowledge about articulatory and acoustic events or on general contrast effects in speech perception, the perception of non-native sounds should be affected by context in ways that are predicted by the kinds of information that are available in the acoustic signal. If context effects are based on listeners’ knowledge about specific languages, the perception of non-native sounds should be affected by context only after exposure to input.

As discussed in Chapter 3, both the articulatory and auditory theories of context effects predict that English speakers would perceive the retroflex [ʂ] and the alveolopalatal [ɕ] as being more similar to each other when they hear these sounds in phonetically natural contexts than in phonetically unnatural contexts. If the context-dependent perception of these novel sounds is something that requires learning, the asymmetry between the phonetically natural contexts and the phonetically unnatural contexts would emerge only after exposure to input that supports such learning.

4.2 Methods

The experiment consisted of two sessions over two consecutive days. In Session 1, participants first did a similarity rating task and then heard the input stimuli. In Session 2, the order was reversed: they first heard the exposure stimuli and then performed the similarity rating task. The fact that testing was carried out both pre- and post-exposure allows us to assess the effects of learning through exposure.

4.2.1 Participants

Twenty native speakers of English with no identified language or hearing disorder took part in the experiment. All participants were undergraduate students enrolled in linguistics courses at The University of British Columbia, and they received course credit for participation. All participants self-reported that English was their first and dominant language.
Most of them were multilingual, but none of them was familiar with any language that contains two or more post-alveolar sibilants as phonemes.

4.2.2 Exposure stimuli

The exposure stimuli used in Experiment 3 were identical to the ones used in the non-complementary condition in Experiment 1 (see Section 2.3.2). They consisted of 256 bisyllabic strings. Each string comprised a context syllable followed by a critical syllable or a filler syllable. There were eight context syllables, [li], [lu], [mi], [mu], [pi], [pu], [gi], and [gu]. Context syllables were grouped into two classes according to vowel quality: syllables with the high front unrounded vowel [i] ([i] context) and syllables with the high back rounded vowel [u] ([u] context). There were eight critical syllables taken from a 10-step continuum between natural productions of the Mandarin retroflex [ʂa] and alveolopalatal [ɕa]. Based on the categorization of the continuum by native speakers of Mandarin, four steps from each side of the category boundary (step 6) were used as critical syllables: steps 2-5 for the retroflex [ʂ] and steps 7-10 for the alveolopalatal [ɕ] (Noguchi and Hudson Kam, 2015a; also see Appendix A). The frequencies of critical syllables were manipulated so that their aggregate distribution showed a bimodal shape with two peaks (Figure 4.1). There were two filler syllables, [tʰa] and [ta].

Figure 4.1: Aggregate distribution of critical syllables (Experiment 3)

Exposure stimuli were constructed by concatenating context syllables with either critical syllables or filler syllables. The sixteen tokens of critical syllables shown in Figure 4.1 were combined with the 8 context syllables to generate 128 critical stimuli. Similarly, 16 tokens of filler syllables (8 tokens each of [tʰa] and [ta]) were combined with the 8 context syllables to generate 128 filler stimuli. These 256 exposure stimuli were divided into two subsets according to the consonant of the context syllables, a subset with [l] and [p] and the other subset with [m] and [g]. The first subset was used in Session 1 and the second subset was used in Session 2. In this input, all of the critical syllables occurred in overlapping contexts. This means that tokens of the retroflex [ʂ] occurred in both the natural context (the [u] context) and the unnatural context (the [i] context), and tokens of the alveolopalatal [ɕ] occurred in both the natural context (the [i] context) and the unnatural context (the [u] context). Figure 4.2 shows the 32 tokens of critical syllables, where the same number of tokens of each step occur in the [i] context and the [u] context.

Figure 4.2: Distribution of 32 critical syllables (Experiment 3)

4.2.3 Test stimuli

Stimuli for the similarity rating task were bisyllabic strings. The first syllable was either [i] or [u] with no onset consonant. The second syllable was either a critical syllable or a filler syllable. The critical syllables were taken from the end points of the continuum used in the exposure stimuli, step 2 for the retroflex [ʂa] and step 10 for the alveolopalatal [ɕa]. The filler syllables were [tʰa] and [ta].

Stimuli were paired for the similarity rating task. For each pair, the second syllables were either the same syllable ([V.CV1]–[V.CV1]) or two different syllables ([V.CV1]–[V.CV2]). Pairs with different critical syllables were classified into three groups according to the type of context: overlapping, complementary-natural, and complementary-unnatural.
In pairs with overlapping contexts, the critical syllables were presented in the same context, either after [i] or after [u] (e.g., i-step 2 and i-step 10). In pairs with complementary-natural contexts, the step 2 (retroflex) syllable was presented after [u] and the step 10 (alveolopalatal) syllable was presented after [i] (u-step 2 and i-step 10). In pairs with complementary-unnatural contexts, the step 2 (retroflex) syllable was presented after [i] and the step 10 (alveolopalatal) syllable was presented after [u] (i-step 2 and u-step 10). Note that in the presentation of different critical syllables in overlapping contexts, one of the critical syllables was in the natural context and the other was in the unnatural context (e.g., [u] is natural for the step 2 syllable but unnatural for the step 10 syllable).

Pairs with the same critical syllables were classified into two groups according to the type of context as well: overlapping and complementary. In pairs with overlapping contexts, the same syllable was presented twice in the same context, either after [i] or after [u] (e.g., i-step 2 and i-step 2). In these pairs, a single critical syllable was presented twice, both times either in the natural context (e.g., u-step 2 and u-step 2) or in the unnatural context (e.g., i-step 2 and i-step 2). In pairs with complementary contexts, by contrast, one was presented after [i] and the other was presented after [u] (e.g., i-step 2 and u-step 2). In these pairs, a single critical syllable was presented twice, once in the natural context and the other time in the unnatural context. Finally, pairs with filler syllables were classified into two groups according to the type of context, overlapping and complementary. Table 4.1 summarizes the paired stimuli used in the test.

Table 4.1: Stimuli for similarity rating task

2nd syllable  Trial type  Context                   Pairs                                              Naturalness
Critical      Different   Overlapping               i-step 2 and i-step 10                             Natural for step 10
Critical      Different   Overlapping               u-step 2 and u-step 10                             Natural for step 2
Critical      Different   Complementary-natural     u-step 2 and i-step 10                             Natural for both
Critical      Different   Complementary-unnatural   i-step 2 and u-step 10                             Unnatural for both
Critical      Same        Overlapping               u-step 2 and u-step 2; i-step 10 and i-step 10     Natural for both
Critical      Same        Overlapping               i-step 2 and i-step 2; u-step 10 and u-step 10     Unnatural for both
Critical      Same        Complementary             u-step 2 and i-step 2; u-step 10 and i-step 10     Natural for one stimulus
Filler        Different   Same                      i/u-tʰa and i/u-ta                                  N/A
Filler        Different   Different                 u-tʰa and i-ta; i-tʰa and u-ta                      N/A
Filler        Same        Same                      i/u-tʰa and i/u-tʰa; i/u-ta and i/u-ta              N/A
Filler        Same        Complementary             i-tʰa and u-tʰa; i-ta and u-ta                      N/A
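The critical pair types in Table 4.1 can be enumerated compactly in the following sketch. The labels are plain strings standing in for the recorded stimuli, and orders of presentation and repetitions (described in Section 4.2.4) are omitted; this is an illustration of the design, not the E-prime implementation.

```python
"""Sketch of the critical trial types for the similarity rating task (Table 4.1).
Stimuli are text labels of the form 'vowel-step'; the real stimuli were
recorded bisyllabic strings."""

STEP2, STEP10 = "step2", "step10"   # retroflex vs. alveolopalatal endpoints


def pair(ctx1, syl1, ctx2, syl2):
    """A test pair: two bisyllabic strings whose second syllables are compared."""
    return (ctx1 + "-" + syl1, ctx2 + "-" + syl2)


critical_pairs = {
    # different critical syllables
    "different-overlapping": [pair("i", STEP2, "i", STEP10),
                              pair("u", STEP2, "u", STEP10)],
    "different-complementary-natural": [pair("u", STEP2, "i", STEP10)],
    "different-complementary-unnatural": [pair("i", STEP2, "u", STEP10)],
    # same critical syllable
    "same-overlapping": [pair("u", STEP2, "u", STEP2),      # natural context
                         pair("i", STEP10, "i", STEP10),    # natural context
                         pair("i", STEP2, "i", STEP2),      # unnatural context
                         pair("u", STEP10, "u", STEP10)],   # unnatural context
    "same-complementary": [pair("u", STEP2, "i", STEP2),
                           pair("u", STEP10, "i", STEP10)],
}

for trial_type, pairs in critical_pairs.items():
    print(trial_type, pairs)
```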
This is because context effects would shift the perceived loca-106tions of the critical syllables in the same direction; there would be no change inthe perceived distance between the critical syllables. However, the presentationof different critical syllables in overlapping contexts might actually highlight theacoustic discrepancy between the syllables instead. If that is the case, the perceivedsimilarity between the critical syllables should be low in overlapping contexts.For pairs with the same critical syllables, a single critical syllable should beperceived differently in different contexts. This is because one of the repetitionswas presented in the natural context and the other was presented in the unnaturalcontext. In such a case, context effects would shift the location of a single criti-cal syllable in opposite directions along the continuum, analogous to the demon-stration of context effects in categorization in original studies (e.g., Lindblom andStuddert-Kennedy, 1967; Mann, 1980).4.2.4 DesignE-prime Professional (ver. 2.0) was used to control the presentation of stimuli andthe recording of responses (Schneider et al., 2002). A session consisted of twophases, exposure and test. In Session 1, the test phase was followed by the expo-sure phase. In Session 2, the exposure phase was followed by the test phase. In theexposure phase, participants heard a block of 128 stimuli presented in a randomorder with one second ISI. They heard the block four times. Exposure stimuli werepresented as “words” in a foreign language. In order to help participants stay at-tentive to the stimuli, a monitoring task was given to them. In each block of stimulipresentation, a monitoring stimulus (a filler stimulus with long vowels:e.g., [li:ta:])was randomly inserted in every subblock of 16 presentations. Participants wereasked to press the spacebar when they heard the instances of “slow speech”.In the test phase, in each trial, participants heard a pair of bisyllabic stringsand were asked to rate the similarity between the second syllables of the stringson a scale of 1 to 7, where 1 means “very similar” and 7 means “very different”.Emphasis was placed on the point that they should compare the second syllablesand not the whole (two-syllable) strings. They were also encouraged to use allpoints on the scale when rating the similarity. ISI was 750 ms. Participants weregiven a maximum of five seconds to respond, but the trial was terminated whenever107participants recorded a response. ITI was two seconds.There were 48 test trials (trials with critical syllables): 24 with different crit-ical syllables and 24 with the same critical syllables. There were eight trials withdifferent critical syllables in overlapping contexts (two orders of presentation,two contexts, and two repetitions), eight trials with different critical syllables incomplementary-natural contexts (two orders of presentation and 4 repetitions), andeight trials with different critical syllables in the complementary-unnatural con-texts (two orders of repetition and 4 repetitions). Similarly, there were eight tri-als with the same critical syllables in overlapping contexts (two critical syllables,two vowels, and two repetitions) and 16 trials with the same critical syllables incomplementary contexts (two critical syllables two orders of presentation, andfour repetitions). 
Note that the number of trials with the same critical syllables in complementary contexts was two times the number of trials with the same critical syllables in overlapping contexts. This is because the number of times participants heard the combinations of different context vowels was made to be the same in the trials with different critical syllables (which contrasted complementary-natural and complementary-unnatural contexts) and the trials with the same critical syllables (which did not contrast complementary-natural and complementary-unnatural contexts).

There were 32 filler trials: 16 trials with different filler syllables and 16 trials with the same filler syllables. There were eight trials with different filler syllables in overlapping contexts (two orders of presentation, two vowels, and two repetitions) and eight trials with different filler syllables in complementary contexts (two orders of presentation for the filler syllables, two orders of presentation for the vowels, and two repetitions). There were eight trials with the same filler syllables in overlapping contexts (two filler syllables, two vowels, and two repetitions) and eight trials with the same filler syllables in complementary contexts (two filler syllables, two orders of presentation for the vowels, and two repetitions).

4.2.5 Procedure

On Day 1, participants came into the lab and signed the consent form. In Session 1, participants first did the similarity rating task. Participants were given the following instructions at the beginning of the test phase.

In each trial, you will hear a pair of words in a foreign language. Each word has two syllables. Therefore a pair will be something like "gewo baro". Your task is to compare the second syllables of the words ("wo" of "gewo" and "ro" of "baro" in the example above) and rate the similarity between the syllables on the scale of 1 to 7.

In this scale, 1 means "very similar", 7 means "very different", and intermediate numbers mean intermediate similarities between "very similar" and "very different". When you rate the similarity, try to use all of the 7 levels.

After completing the test phase, participants proceeded to the exposure phase. Participants were given the following instruction at the beginning of the exposure phase.

Now, you will listen to more words in this language. Your task is to listen to the recordings. Sometimes words will be pronounced very slowly, like "waaakooo". When you hear the slow speech, press the "SPACE" bar.

On Day 2, participants came back to the lab and did Session 2. In Session 2, the order of the phases was reversed: the exposure phase was first and the test phase was second. After completing the test phase, participants filled out a language background questionnaire.

4.3 Results

First, each participant's performance on the monitoring task was checked to see whether they were attentive to the stimuli during exposure.
All participants did better than 75% on the monitoring task and thus were included in the analyses. Since responses on the similarity rating trials were categorical and ordered, ordinal logistic regression models, or cumulative link models, were used to analyze the responses (Agresti, 2002; Christensen, 2015a,c).

4.3.1 Cumulative link model

When response categories are ordered on a scale, the probability that the response Y will fall at or below a response category j is called the cumulative probability of the response j.

P(Y ≤ j) = π1 + ... + πj,   j = 1, ..., J   (4.1)

Logits of cumulative probabilities are computed for J − 1 categories.

logit[P(Y ≤ j)] = log[P(Y ≤ j) / (1 − P(Y ≤ j))] = log[(π1 + ... + πj) / (πj+1 + ... + πJ)],   j = 1, ..., J − 1   (4.2)

A cumulative link model is a regression model for cumulative logits. It is like a binary logistic regression model for binary responses with the pair of outcomes Y ≤ j and Y > j.

logit[P(Y ≤ j)] = αj + βx,   j = 1, ..., J − 1   (4.3)

The model consists of parameters α and β. Importantly, α is dependent on j but β is not. This means that the parameter β, which specifies the effect of an explanatory variable x on the log odds of responses being at or below j, is constant across all J − 1 cumulative logits. Agresti (2002, 2010) writes the cumulative link model with a plus sign on the right-hand side, as shown in (4.3). With this formulation, a positive value of β means that the probability that the response Y will fall at or below j is higher for higher values of x. In other words, a positive β indicates that x has the effect of lowering scores on the ordinal 1-to-J scale. In contrast, Christensen (2015a,c) writes the model with a minus sign on the right-hand side, as shown in (4.4). With this formulation, a positive value of β means that the probability that the response Y falls at or above j is higher for higher values of x.

logit[P(Y ≤ j)] = αj − βx,   j = 1, ..., J − 1   (4.4)

Christensen (2015c) argues that the latter formulation is more intuitive because the effect of an explanatory variable x is interpreted in the same way as in ordinary linear regression or ANOVA models. The analyses presented below were done in R 3.0.3 (R Core Team, 2014) with the mixed effects cumulative link model function clmm from the ordinal package (Christensen, 2015b). Since the function adopts the formulation in (4.4), I will interpret the parameters of explanatory variables accordingly.

Responses on trials with different critical syllables and trials with the same critical syllables were analyzed separately. This was because these two types of trials had different sets of contexts: different trials had three contexts (overlapping, complementary-natural, and complementary-unnatural), and same trials had two contexts (overlapping and complementary). The random effects structure contained both random by-participant intercepts and random by-participant slopes for all predictor variables. However, when a model failed to converge, the structure was simplified in the following way. First, uncorrelated random intercepts and random slopes were used instead of correlated random intercepts and random slopes. If the model still failed to converge, the interaction between random slopes was excluded from the structure if the model contained an interaction between predictors; otherwise a random slope was excluded from the structure.
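To make this modelling pipeline concrete, the sketch below shows how a mixed effects cumulative link model of this general form could be fitted with clmm from the ordinal package and compared against a reduced model. The data frame `d` and its column names (score, session, context, participant) are hypothetical placeholders, and the random-effects structure shown is one possible simplification rather than the exact structure of each reported model.

```r
# Minimal sketch (R), assuming a long-format data frame `d` with one row per
# rating: an ordered 1-7 `score`, factors `session` and `context`, and a
# `participant` identifier. Names are placeholders, not the original script.
library(ordinal)

d$score <- factor(d$score, levels = 1:7, ordered = TRUE)

# Model with the session-by-context interaction; a maximal structure would add
# by-participant slopes, e.g. (1 + session * context | participant), simplified
# here to a random intercept as if the fuller models had failed to converge.
m_int <- clmm(score ~ session * context + (1 | participant), data = d)

# Reduced model without the interaction term.
m_add <- clmm(score ~ session + context + (1 | participant), data = d)

# Likelihood ratio test for the interaction.
anova(m_add, m_int)
```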
This sort of simplification inflates Type I error rates (Barr et al., 2013). However, failure to converge may arise from trying to fit a model that is too complex for the data: the amount of information in the data is too small for reliable estimation of all the parameters specified in the maximal random effects structure, and a reduction in complexity is necessary (Bates et al., 2015).

4.3.2 Trials with different critical syllables

Figure 4.3 shows the distribution of responses on the trials with different critical syllables by context and session. The responses were analyzed with three predictors: session (Session 1 and Session 2), context (overlapping, complementary-natural, and complementary-unnatural), and the interaction between session and context. Since the levels of context are hierarchically organized (i.e., complementary-natural and complementary-unnatural can form a level "complementary" that contrasts with overlapping), the effect of context was examined in two steps. First, a contrast was made between overlapping and complementary. Then, another contrast was made between complementary-natural and complementary-unnatural. To do so, the predictor levels were coded using a Helmert contrast coding scheme (Table 4.2).

Figure 4.3: Distribution of responses to trials with different critical syllables by session and context (1 = "very similar" and 7 = "very different")

Table 4.2: Helmert contrast coding for Context

Trial type   | 1st comparison | 2nd comparison
Overlap      | 0.66           | 0
Comp. nat.   | -0.33          | 0.5
Comp. unnat. | -0.33          | -0.5

Likelihood ratio tests were used to evaluate the statistical significance of the predictors by comparing a predictor model (a model with a given predictor) and a reduced model (a model without that predictor). [Footnote 1: The likelihood ratio test statistic is −2(ℓ0 − ℓ1), where ℓ1 and ℓ0 are the log-likelihoods of the observed data under the predictor model and the reduced model, respectively. It asymptotically follows a χ2 distribution with degrees of freedom equal to the difference in the number of parameters between a predictor model and its reduced counterpart (Christensen, 2015c).] The significance level was set at p < 0.05. The tests yielded no significant main effect of session [χ2(1) = 2.922, p = 0.087] or context [χ2(2) = 2.314, p = 0.314], but a significant interaction between session and context [χ2(2) = 21.951, p < 0.001]. [Footnote 2: The interaction model failed to converge with the maximal random effects structure and even with uncorrelated random intercepts and slopes. Therefore, the interaction model and the reduced model were fitted without the interaction between random slopes.]

In order to understand the nature of the interaction, follow-up analyses were conducted with the data from Session 1 and Session 2 separately. Responses were analyzed with one predictor, context (overlapping, complementary-natural, and complementary-unnatural, coded as shown in Table 4.2). The analysis of responses from Session 1 revealed no significant main effect of context [χ2(2) = 0.932, p = 0.628]. The analysis of responses from Session 2 revealed a significant main effect of context [χ2(2) = 20.646, p < 0.001]. [Footnote 3: The reduced model failed to converge with the maximal random effects structure and even with uncorrelated random intercepts and random slopes. Therefore, the context model and the reduced model were fitted without random slopes.] According to the context model, the first Helmert comparison revealed that the estimated odds of giving a score at j or higher (for any j > 1) were 2.12 times higher in the overlapping contexts than in the complementary contexts (p < 0.001). This means that participants perceived the different critical syllables as being less similar to each other when they (the syllables) were presented in the overlapping contexts than when they were presented in the complementary contexts.
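The two planned comparisons in Table 4.2 can be encoded directly as a custom contrast matrix on the context factor. The sketch below shows one way this could be done in R, reusing the hypothetical data frame from the earlier sketch; the level names and the subsetting by session are illustrative assumptions, not the original analysis code.

```r
# Minimal sketch (R): Helmert-style coding matching Table 4.2. Column 1 tests
# overlapping vs. complementary; column 2 tests natural vs. unnatural.
d$context <- factor(d$context,
                    levels = c("overlap", "comp_natural", "comp_unnatural"))
contrasts(d$context) <- cbind(overlap_vs_comp = c(0.66, -0.33, -0.33),
                              nat_vs_unnat    = c(0.00,  0.50, -0.50))

# Follow-up analysis within one session: main effect of context (2 df).
d2        <- subset(d, session == "Session 2")
m_context <- clmm(score ~ context + (1 | participant), data = d2)
m_null    <- clmm(score ~ 1       + (1 | participant), data = d2)
anova(m_null, m_context)   # likelihood ratio test for the context predictor
```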
The second Helmert comparison revealed that the estimated odds of giving a score at j or higher (for any j > 1) were 1.58 times higher in the complementary-unnatural contexts than in the complementary-natural contexts (p = 0.033). This means that participants perceived the different critical syllables as being less similar to each other when they were presented in the complementary-unnatural contexts than when they were presented in the complementary-natural contexts, but only after exposure.

4.3.3 Trials with same critical syllables

Figure 4.4 shows the distribution of responses on the trials with the same critical syllables by session and context. [Footnote 4: Remember that the number of trials with the same critical syllables in complementary contexts was two times the number of trials with the same critical syllables in overlapping contexts (see Section 4.2.4).] The responses were analyzed with three predictors: session (Session 1 and Session 2), context (overlapping and complementary), and the interaction between session and context.

Figure 4.4: Distribution of responses to trials with the same critical syllables by session and context (1 = "very similar" and 7 = "very different")

The analyses revealed no significant main effect of session [χ2(1) = 0.86, p = 0.353], but a significant main effect of context [χ2(1) = 32.371, p < 0.001]. [Footnote 5: Both the session model and the context model failed to converge with the maximal random effects structure and even with uncorrelated random intercepts and random slopes. Therefore, these models and their reduced counterparts were fitted without random slopes.] According to the context model, the estimated odds of giving a score at j or higher (for any j > 1) were 2.23 times higher in the complementary contexts than in the overlapping contexts (p < 0.001). This means that participants perceived two repetitions of a single critical syllable as being less similar to each other when they were presented in the complementary contexts than when they were presented in the overlapping contexts. The analyses also revealed a significant interaction between session and context [χ2(1) = 13.254, p < 0.001]. [Footnote 6: The interaction model failed to converge with the maximal random effects structure and even with uncorrelated random intercepts and random slopes. Therefore, the interaction model and the reduced model were fitted without random slopes.]

In order to understand the nature of the interaction, follow-up analyses were conducted with the data from Session 1 and Session 2 separately. The responses were analyzed with one predictor, context (overlapping and complementary). The analysis of responses from Session 1 revealed a significant main effect of context [χ2(1) = 36.96, p < 0.001]. [Footnote 7: The context model failed to converge with the maximal random effects structure and even with uncorrelated random intercepts and random slopes. Therefore, the context model and the reduced model were fitted without the random slope.] According to the context model, the estimated odds of giving a score at j or higher (for any j > 1) were 3.64 times higher in the complementary contexts than in the overlapping contexts (p < 0.001).
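For readers wondering how odds ratios such as 2.23 or 3.64 relate to the model output, the short sketch below shows the conversion under the formulation in (4.4). The fitted object and the coefficient name are hypothetical and depend on how the factor contrasts were set up; this is not the reported analysis itself.

```r
# Minimal sketch (R): under logit[P(Y <= j)] = alpha_j - beta*x, exp(beta) is
# the factor by which the odds of responding at or above any score j change for
# a one-unit increase in x. Object and coefficient names are placeholders.
summary(m_context)                              # coefficient table on the logit scale
b <- coef(m_context)[["contextnat_vs_unnat"]]   # hypothetical coefficient name
exp(b)                                          # odds ratio for that comparison
```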
The analysis of responses from Session 2 revealed no significant main effect of context [χ2(1) = 2.77, p = 0.096]. [Footnote 8: The context model failed to converge with the maximal random effects structure and even with uncorrelated random intercepts and random slopes. Therefore, the context model and the reduced model were fitted without the random slope.] Participants perceived two repetitions of a single critical syllable as being less similar to each other when they were presented in the complementary contexts than when they were presented in the overlapping contexts in Session 1, but not in Session 2.

4.4 Discussion

The results of Experiment 3 revealed the following points. First, the rating of the similarity between different critical syllables was not affected by context in Session 1, whereas it was affected by context in Session 2. In Session 2, participants perceived [ùa] and [Ca] as being less similar to each other when they were presented in the overlapping contexts than when they were presented in the complementary contexts, regardless of whether the complementary contexts were natural or unnatural. This differs from the prediction made by the context effects hypothesis. According to the hypothesis, the perceived similarity between [ùa] and [Ca] should be higher in the overlapping context than in the complementary-natural context but lower in the overlapping context than in the complementary-unnatural context. However, the results showed that the perceived similarity between [ùa] and [Ca] was significantly lower in the overlapping context than in both the complementary-natural and the complementary-unnatural contexts. This is probably because the presentation of the test stimuli in the overlapping contexts (i.e., after physically identical vowels) highlighted the acoustic discrepancy between the stimuli, so that participants became more sensitive to the difference between them.

In Session 2, participants also perceived [ùa] and [Ca] as being less similar to each other when they were presented in the complementary-unnatural contexts than in the complementary-natural contexts. These results followed the prediction made by the context effects hypothesis: the perceived similarity between [ùa] and [Ca] is higher in the complementary-natural context than in the complementary-unnatural context. The finding that participants showed a significant effect of context only after exposure is of particular importance. It suggests that they learned to perceive [ùa] and [Ca] in a context-dependent manner through exposure to the input stimuli. Specifically, they started perceiving these syllables as being more similar to each other when they were presented in the complementary-natural contexts than in the complementary-unnatural contexts.

Second, the perception of two repetitions of a single critical syllable was significantly affected by context in Session 1. Participants rated two repetitions of [ùa] or [Ca] as being less similar to each other in the complementary contexts than in the overlapping contexts. However, the effect of context became non-significant in Session 2. The difference between the complementary and the overlapping contexts was expected under the context effects hypothesis.
When two identical tokens of [ùa] or [Ca] were presented in overlapping contexts, context effects should shift the perceived location of the tokens in the same direction along the continuum, which should have no impact on the perceived distance between the tokens. When two identical tokens of [ùa] or [Ca] were presented in complementary contexts, context effects should perceptually shift the location of the tokens in different directions along the continuum, which should increase the perceived distance between the tokens. Therefore, two repetitions of [ùa] or [Ca], one in the natural context and the other in the unnatural context, should not be perceived as identical (as in the original demonstration of context effects in categorization in Mann 1980). The results of Experiment 3 suggest that participants showed context-dependent perception of the test stimuli in the trials with the same critical syllables in Session 1.

It is interesting to note that the effect of context became non-significant in Session 2. Does this mean that participants' perception of [ùa] and [Ca] became less dependent on context after exposure? Assuming that participants initially did not have categories for the novel fricative sounds, it could be the case that their perception of the novel sounds was based more on the information in the acoustic signal. Since the acoustic signal is continuous, the processing of information from the portions of the signal that corresponded to the novel sounds could be influenced by information from the portions of the signal that corresponded to the neighbouring sounds. Once participants learned the categories, their perception could become more dependent on their knowledge about the categories. In other words, before learning the categories, participants were more sensitive to interactions of acoustic information across segmental boundaries. However, this does not explain the pattern of responses on the trials with different critical syllables.

Participants did not show context effects in their responses on the trials with different critical syllables in Session 1. A possible explanation for the absence of context effects in the comparison of different critical syllables in Session 1 is that the acoustic differences between the syllables were large enough to eliminate any potential context effects. Context effects should reduce the perceived distance between [ùa] and [Ca] when the syllables are presented in complementary-natural contexts. However, as we saw in Experiment 1, English speakers seem to have fairly good pre-existing sensitivity to acoustic differences between [ùa] and [Ca]. Therefore, it is possible that participants in Experiment 3 could initially perceive the differences between [ùa] and [Ca] well enough that their perception was not affected by context. After exposure to input that supported the learning of the context-dependent perception of [ù] and [C], significant context effects emerged in Session 2.

Another possible explanation for the pattern of responses on the trials with the same critical syllables is that participants took the (non-)identity of the context syllables into consideration in the similarity rating. In other words, participants rated two identical tokens of a single critical syllable presented in the complementary contexts as being less similar to each other because they consciously or unconsciously compared the disyllabic strings instead of just the second syllables.
If this is the case, it would be a serious confound for any comparison between the overlapping contexts and the complementary contexts.

In order to test whether this confound was actually present, responses on trials with the same filler syllables were analyzed. The filler syllables used were the aspirated [tha] and the unaspirated [ta]. As far as I am aware, there is no straightforward phonetic connection between laryngeal features in stop consonants and vowel frontness, backness, or rounding. Therefore, there should not be any context effects in the perception of the filler syllables in complementary contexts, and there should not be any difference between the overlapping and complementary contexts in terms of the perceived similarity between the filler syllables. The responses were analyzed using mixed effects ordinal logistic regression models with three predictors: session (Session 1 and Session 2), context (overlapping and complementary), and the interaction between session and context. The analyses revealed no significant main effect of session [χ2(1) = 1.73, p = 0.188], but a significant main effect of context [χ2(1) = 7.326, p = 0.007]. According to the context model, the estimated odds of giving a response at j or higher (for any j > 1) were 2.88 times higher in the complementary contexts than in the overlapping contexts (p = 0.005). This means that participants perceived two repetitions of a single filler syllable as being less similar to each other when the syllables were presented in the complementary contexts than when they were presented in the overlapping contexts. The analysis yielded no significant interaction between session and context [χ2(1) = 2.329, p = 0.127]. [Footnote 9: The interaction model failed to converge with the maximal random effects structure as well as with uncorrelated random intercepts and random slopes. Therefore, the interaction model and the reduced model were fitted without the interaction between random slopes.]

The significant effect of context is striking. Participants showed a strong bias to rate two identical tokens of a single filler syllable as being less similar to each other when the syllables were presented in different contexts, even though there should not be any context effects that could explain the difference. These results suggest that at least some participants took the (non-)identity of the context syllables into consideration in the similarity rating. This unfortunate confound makes the comparison between the overlapping and complementary contexts difficult. Interestingly, responses on the trials with different critical syllables did not show the same bias: there was no significant effect of context in Session 1. Moreover, participants perceived different critical syllables as being less similar to each other in the overlapping contexts than in the complementary contexts in Session 2.
These results suggest that, through expo-sure, participants learned to perceive [ùa] and [Ca] in a way that was predicted bythe context effects hypothesis.The finding that the context-dependent perception of novel sounds requiressome learning raises a question about the generality of context effects. Asdiscussed above, while some researchers have claimed that context effects arelanguage-general, other researchers have claimed that there is some languagespecificity in context effects. The current finding supports the language speci-ficity of context effects. However, further research is needed to understand themechanisms supporting this learning.The results of Experiment 3 provide support for one of the crucial assumptionson which the context effects hypothesis for allophony learning stands: Learnersperceive novel sounds as more similar to each other when the segments are pre-sented in phonetically natural complementary contexts than in phonetically unnat-ural complementary contexts. The results of Experiment 3 also demonstrated thatthe context-dependent perception of novel sounds requires learning. The results ofExperiments 1 and 2 showed that learners who were exposed to input in which twonovel segments were in a phonetically natural complementary distribution seemedto have learned the segments as allophones (i.e., they showed reduced sensitivityto acoustic differences between the segments after exposure), while learners whowere exposed to input in which the novel segments were in a phonetically unnaturalcomplementary distribution did not (i.e., they maintained their pre-existing sensi-tivity to acoustic differences between the segments). The context effects hypothesissays that this is due to the difference in the shape of the aggregate distribution of thenovel sounds in auditory space. In the complementary-natural condition, partici-pants heard tokens of two novel segments occurring only in phonetically natural119FrequencyShift due to context effectShift due to context effectLow onset F2 High onset F2[u] context[i] context(a) Context effects in the perception of input stimuliFrequencyAggregate distribution[ʂa] [ɕa]Low onset F2 High onset F2(b) Aggregate distribution in auditory spaceFigure 4.5: Complementary-natural conditioncontexts, and the context-dependent perception of these sounds led participantsbuild the aggregate distribution of the sounds in auditory space with large overlapbetween two peaks, and thus two categories that are less distinct from each otheror even a single category. Once participants built less distinct categories or a singlecategory for the novel sounds, their perception of the novel sounds became moredependent on their knowledge about the categories (or the category) even when thesounds were presented in isolation. Therefore, participants became less sensitiveto acoustic differences between the sounds (see Figure 4.5). In the complementary-120FrequencyShift due to context effectShift due to context effectLow onset F2 High onset F2[u] context[i] context(a) Context effects in the perception of input stimuliFrequencyAggregate distribution[ʂa] [ɕa]Low onset F2 High onset F2(b) Aggregate distribution in auditory spaceFigure 4.6: Complementary-unnatural conditionunnatural condition, participants heard tokens of two segments occurring only inphonetically unnatural contexts, and thus the shape of the aggregate distributionthat participants built in their auditory space had distantly separated peaks, leadingto the learning of two clearly distinct categories. 
This is why participants in the complementary-unnatural condition maintained good sensitivity to acoustic differences between the sounds (see Figure 4.6).

In order to verify whether the learning of context-dependent perception is really a part of the mechanisms behind the learning of allophony, there are some gaps that need to be filled. First, in Experiments 1 and 2, participants were tested on their sensitivity to acoustic differences between [ùa] and [Ca] presented in isolation in a discrimination task. The current experiment, by contrast, tested participants' ratings of the similarity between [ùa] and [Ca] presented in contexts. How these two different tasks are related to each other is a question that needs to be answered. Second, the current experiment was designed as a within-subjects comparison: participants were exposed to tokens of [ùa] and [Ca] in both the natural and the unnatural contexts. In Experiments 1 and 2, by contrast, participants in the complementary-natural condition were exposed to tokens of [ùa] and [Ca] only in the natural contexts, while participants in the complementary-unnatural condition were exposed to tokens of [ùa] and [Ca] only in the unnatural contexts. If participants in these two conditions of Experiments 1 and 2 were actually learning the context-dependent perception of the novel sounds, they should show context effects in the similarity rating task: participants in the complementary-natural condition would perceive [ùa] and [Ca] as being more similar to each other than participants in the complementary-unnatural condition would. Moreover, if the learning of context-dependent perception significantly affected the learning of [ùa] and [Ca] as separate categories, participants in the complementary-natural and the complementary-unnatural conditions should show a significant difference in their ratings of the similarity between [ùa] and [Ca] even in isolation. A follow-up experiment using a between-subjects design and similarity rating in isolation would be desirable to make stronger connections between context effects and the learning of allophonic relationships. This will be a future direction of this work.

4.5 Conclusion

In this chapter, I presented the results of Experiment 3. In this experiment, I tested whether the perception of two novel sounds is affected by the contexts in which the sounds are presented and whether the context-dependent perception of the novel sounds requires learning. The results showed that context-dependent perception of the novel sounds was learned through exposure to input: participants perceived the two novel sounds, the retroflex [ùa] and the alveolopalatal [Ca], as more similar to each other in the complementary-natural context than in the complementary-unnatural context. The results provide support for one of the basic assumptions on which the context effects hypothesis for the learning of allophony stands: learners perceive novel sounds differently in different contexts. But the context effects in the perception of the novel sounds still require learning.

Chapter 5

General discussion

In this chapter, I first summarize the findings of the three experiments presented in the previous chapters. Then, I discuss the implications of these findings for theories of sound category learning. I also discuss some remaining questions about the context effects hypothesis.
Finally, I discuss some possible future directions of the research initiated in this dissertation.

5.1 Summary of findings

The goal of this dissertation was to investigate the mechanisms behind the learning of allophony. The results of Experiment 1 suggest that allophony can be learned from the complementary distribution of segments in input. There were three important findings from Experiment 1. First, the results of the control condition showed that adult English speakers have fairly good pre-existing sensitivity to acoustic differences between the Mandarin retroflex [ù] and alveolopalatal [C]. Second, the results of the non-complementary condition showed that adult English speakers maintained this pre-existing sensitivity when they were exposed to input in which these segments were in non-complementary distribution (i.e., the occurrences of these segments were unpredictable from the relevant contexts). Third, the results of the complementary-natural condition showed that adult English speakers became less sensitive to acoustic differences between [ù] and [C] when they were exposed to input in which these segments were in complementary distribution (i.e., the occurrences of these segments were predictable from the relevant contexts). The finding that exposure to input in which [ù] and [C] were in complementary distribution resulted in a reduction in learners' sensitivity to the difference between these segments suggests that the segments were learned as something like allophones.

The results of Experiment 2, together with the results of Experiment 1, suggest that the learning of allophony is constrained by the phonetic naturalness of the patterns of complementary distribution. In the input used in the complementary-natural condition (Experiment 1), the target segments occurred in phonetically natural complementary contexts: the retroflex [ù] occurred after the high back rounded vowel [u], and the alveolopalatal [C] occurred after the high front unrounded vowel [i]. In Experiment 2, adult English speakers were exposed to input in which the target segments occurred in phonetically unnatural complementary contexts: the retroflex [ù] occurred after the high front unrounded vowel [i], and the alveolopalatal [C] occurred after the high back rounded vowel [u]. The important finding from Experiment 2 was that adult English speakers did not show any significant reduction in their sensitivity to acoustic differences between [ù] and [C] when they were exposed to input in which these segments were in complementary distribution but the pattern of the distribution was phonetically unnatural.

In order to account for the role of phonetic naturalness, I proposed a hypothesis about the mechanisms behind the learning of allophony (the context effects hypothesis). I hypothesized that the learning of allophony, as seen in the reduction in learners' sensitivity, is partly attributable to the way learners perceive the instances of the target segments during exposure. Specifically, learners perceive instances of the target segments as being more similar to each other when they hear the sounds in phonetically natural complementary contexts. This has a significant impact on the shape of the aggregate distribution of the sounds in auditory space, and thus on the categories that learners build for the sounds.
Since learners in the complementary-natural condition heard instances of the target segments occurring only in phonetically natural contexts, the aggregate distribution they built in auditory space had two closely spaced peaks, and thus they formed less distinct categories, or even a single category, for the sounds in the input. This led to the reduction in their sensitivity to acoustic differences between the target sounds.

Experiment 3 tested whether adult English speakers' perception of the retroflex [ù] and alveolopalatal [C] was affected by context, and whether the context-dependent perception of these novel sounds was due to perceptual biases that adult English speakers already had or to biases that they learned through exposure to input. In Experiment 3, adult English speakers rated the similarity between [ùa] and [Ca] before and after exposure to input. In the similarity rating task, learners compared [ùa] and [Ca] in three different types of context: overlapping context (i.e., both [ùa] and [Ca] were presented after the same vowel, either [i] or [u]), complementary-natural context (i.e., [ùa] was presented after [u], and [Ca] was presented after [i]), and complementary-unnatural context (i.e., [ùa] was presented after [i], and [Ca] was presented after [u]). I predicted that learners would perceive these two novel syllables as being more similar to each other in the complementary-natural context than in the complementary-unnatural context. There were two important findings from the results of Experiment 3. First, there was no significant effect of context type on the similarity ratings in the pre-exposure test. Second, there was a significant effect of context type on the similarity ratings in the post-exposure test: participants rated [ùa] and [Ca] as being more similar to each other in the complementary-natural context than in the complementary-unnatural context. These results suggest that the context-dependent perception of these novel sounds requires some learning.

In sum, the experiments in this dissertation suggest that allophony can be learned from the complementary distribution of target segments in input, but that the learning is constrained by the phonetic naturalness of the patterns of complementary distribution. I argued that the mechanisms underlying the learning of allophony between two segments in phonetically natural complementary contexts involve the learning of the context-dependent perception of the instances of the target segments.

5.2 The role of context in the learning of sound categories

As reviewed in Chapter 1, numerous studies have demonstrated that both infant and adult learners can learn sound categories from the frequency distribution of sounds in acoustic space. However, recent studies have pointed out that frequency distribution is not the only cue that learners can use to learn sound categories. For example, Yeung and Werker (2009) demonstrated that 9-month-old infants are already sensitive to the functional value of novel segments. In their study, 9-month-old English-learning infants failed to discriminate the Hindi dental [d”a] and retroflex [ãa], but they successfully discriminated these Hindi sounds after being presented with two novel objects paired with the dental [d”a] and the retroflex [ãa], respectively.
These results suggest that infants as young as 9 months of age can already use semantic cues for the learning of novel segmental categories.

Another source of information for the learning of sound categories is the lexical context (Feldman et al., 2009, 2011, 2013a,b; Martin et al., 2013; Swingley, 2009; Thiessen, 2011b). Research on the role of the lexical context in the learning of sound categories has evolved from studies on the learning of minimal pairs by infants. Stager and Werker (1997) first reported that 14-month-old English-learning infants reliably discriminated the English phonemes /b/ and /d/, but failed to learn a minimal pair that relied on the phonemic contrast in an audio-visual word learning task ([bI] vs. [dI]). Similarly, Thiessen (2007) reported that 15-month-old English-learning infants reliably discriminated the English phonemes /t/ and /d/, but failed to learn a minimal pair that relied on the phonemic contrast in an audio-visual word learning task ([dO] vs. [tO]). These results suggest that infants' ability to discriminate native phonemes at this age does not necessarily mean that they are ready to use that ability to learn new words. They also suggest that the word learning task makes the processing of two discriminable but still similar sounds challenging for infants of this age, possibly due to cognitive capacity constraints (e.g., Werker and Fennell, 2004). Interestingly, Thiessen also reported that 15-month-old infants successfully learned the minimal pair when they were trained with a non-minimal pair along with the minimal pair ([dO] vs. [tO] and [dObo] vs. [tOgu]). Thiessen (2011b) argued that the presentation of the similar sounds in very distinct lexical contexts made the sounds more differentiable and facilitated the learning of the minimal pair. This happened through the process of acquired distinctiveness. [Footnote 1: Rost and McMurray (2009, 2010) argue that the failure to learn the minimal pairs in earlier studies is due to the lack of within-category variability in the stimuli used in training (e.g., only one token of each category was used in Thiessen (2007)).]

"if an organism has difficulty differentiating between two similar stimuli, A and B (for example, two similar sounds), they can be repeatedly paired with two easily differentiable outcomes, X and Y (X might be punishment, and Y a reward), such that the organism consistently experiences AX and BY pairings. Over time, these pairs reinforce the original subtle distinction between A and B and make it easier to detect" (Thiessen, 2011b, p. 1449)

Based on these findings, Feldman and colleagues developed a model of sound category learning which takes both acoustic information and lexical information into consideration (the lexical distributional model: Feldman et al. 2009). In this model, learners make inferences about sound categories from the distribution of acoustic values, but they make these inferences for sounds used in specific lexical items. Therefore, when there is potential ambiguity in the acoustic information (i.e., there is a significant overlap between two distributional peaks), learners can rely on unambiguous lexical contexts, or unambiguously differentiable sounds in the lexical contexts, to overcome the ambiguity. According to this model, the contrast between [dO] and [tO] was potentially ambiguous for the infant learners in Thiessen (2007), but the contrast between the words in the non-minimal pair ([dObo] and [tOgu]) was not.
Therefore, the infant learners used their knowledge about the words to sort out the potential ambiguity between [dO] and [tO], and this significantly facilitated the learning of the phonetic contrast between [dO] and [tO] in the cognitively demanding word learning task.

Feldman and colleagues conducted a series of sound category learning experiments to test whether lexical context alone can help the learning of sound categories (Feldman et al., 2011, 2013b). First, they exposed adults to syllables taken from an 8-step continuum between [tA] and [tO]. The syllables were classified into two categories (i.e., the first four steps being [tA] and the second four steps being [tO]), but the categorization was not cued by the frequency distribution of the syllables; all of the syllables occurred with the same frequency. In the non-minimal pair condition, participants heard the [tA] tokens and the [tO] tokens in different lexical contexts ([gutA] and [litO] or [litA] and [gutO]). In the minimal pair condition, participants heard both [tA] tokens and [tO] tokens in the same lexical contexts ([gutA] and [gutO] or [litA] and [litO]). Participants in both conditions were tested on the discrimination of [tA] and [tO] after exposure. The results showed that participants in the non-minimal pair condition performed significantly better than participants in the minimal pair condition. These results suggest that the non-overlapping lexical contexts facilitated the learning of the phonetic contrast between [tA] and [tO].

Feldman and colleagues also tested the effect of the lexical context on the learning of sound categories by 8-month-old English-learning infants (Feldman et al., 2013b). In this experiment, infants were familiarized with syllables taken from the same 8-step continuum between [tA] and [tO]. In the non-minimal pair condition, infants heard these syllables in non-overlapping lexical contexts ([gutA] and [litO] or [litA] and [gutO]). In the minimal pair condition, infants heard the same syllables in overlapping lexical contexts ([gutA] and [gutO] or [litA] and [litO]). After exposure, infants were tested on the discrimination of [tA] and [tO] using a stimulus-alternation preference procedure. The results demonstrate that the infants in the non-minimal pair condition showed a significant preference for non-alternating test stimuli (e.g., [tA, tA, tA, ...]) as compared to alternating test stimuli (e.g., [tA, tO, tA, ...]), but the infants in the minimal pair condition did not show any preference, indicating that the infants in the non-minimal pair condition discriminated [tA] and [tO], but the infants in the minimal pair condition did not. These results suggest that the non-overlapping lexical contexts facilitated infants' learning of the phonetic contrast between [tA] and [tO].

A somewhat similar but distinct line of research has developed the idea that the implicit learning of sound categories is facilitated by the explicit processing of events that are systematically correlated with the occurrences of the target sounds (incidental learning of sound categories: Gabay et al. 2015, Lim and Holt 2011, Seitz et al. 2010, Vlahou et al. 2012, Wade and Holt 2005). The focus of these studies is a phenomenon called task-irrelevant perceptual learning (TIPL).
When learners perform a task that involves the processing of task-relevant stimuli, but their performance on the task or the presentation of the task-relevant stimuli is systematically correlated with the presentation of task-irrelevant stimuli, they learn the features of the task-irrelevant stimuli (Seitz and Watanabe, 2009, and references therein). TIPL is not a purely passive learning process. It is an interactive learning process in the sense that it relies on the reinforcement triggered by the learner's active engagement in the task or the active processing of the task-relevant stimuli. Because of this interactive nature, it has been argued that TIPL is an ecologically realistic model for the learning of sound categories. In the real world, for example, sound categories are learned through word learning, and word learning involves the active processing of non-phonetic information that is systematically correlated with the occurrences of phonetic information (e.g., the properties of the referents).

In order to demonstrate the effectiveness of incidental learning in sound category learning, researchers have used various experimental paradigms. For example, Lim and Holt (2011) used a video game paradigm to test the incidental learning of the English liquids [l] and [ô] by Japanese speakers. In this paradigm, participants play a video game in which they shoot or capture four aliens, and each of the four aliens is always accompanied by one of four syllables, including [la] and [ôa]. After playing the game for about 20 minutes, participants showed a significant improvement in the identification of the English liquids. Interestingly, participants also showed an improvement in cue weighting. It is known that while native English speakers rely largely on F3 in the categorization of [l] and [ô], naive Japanese speakers rely more on F2. Participants in Lim and Holt (2011) became more attentive to F3 after playing the video game.

Vlahou et al. (2012) used a different paradigm to test the incidental learning of the Hindi dental [d”a] and retroflex [ãa] by Greek speakers. In their experiments, one group of participants was trained on the identification of [d”a] and [ãa] with explicit feedback, and another group was trained in a TIPL paradigm. In the TIPL paradigm, participants heard pairs of learning stimuli: two tokens of the dental [d”a] or two tokens of the retroflex [ãa]. Crucially, while the stimuli in the dental pairs were played at the same intensity level, the stimuli in the retroflex pairs were played at different intensity levels, and participants were asked to decide whether the stimuli in each pair had the same intensity level or different intensity levels. This means that participants' responses were always correlated with the type of consonant: "same" for the dental pairs and "different" for the retroflex pairs. After exposure, participants who were trained in the TIPL paradigm performed as well as participants who were trained with explicit feedback in the identification and discrimination of [d”a] and [ãa]. These results suggest that incidental learning is as effective as learning with explicit feedback.

Lexical distributional learning and incidental learning are quite different learning processes, but both emphasize the role of context.
In lexical distributional learning, unambiguous phonetic contrasts in the lexical context help learners to sort out potential ambiguity in the input and thereby facilitate the learning of sound categories. In incidental learning, the active processing of unambiguous task-relevant stimuli helps the learning of the properties of potentially ambiguous task-irrelevant stimuli. Here, the task itself can be characterized as the context for the learning of sound categories. Compared to the role of context in lexical distributional learning and incidental learning, the role of context in this dissertation is very different. In this dissertation, I demonstrated that context contributes to the learning of allophony: when segments occur in phonetically natural complementary contexts, they are likely to be learned as context-conditioned allophones.

The contrast between Feldman et al. (2011, 2013b) and this dissertation is particularly striking because the two studies implemented very similar phonotactic distributions of the target categories in the input. In the input used in the non-minimal pair condition of Feldman et al.'s experiments, the target categories were in fact in a complementary distribution: the tokens of [tA] occurred after the vowel [i] and the tokens of [tO] occurred after the vowel [u], or the other way around. In the input used in the minimal pair condition, the target categories were in a non-complementary distribution: the tokens of [tA] and [tO] both occurred after the vowel [i] or [u]. If learners in Feldman et al.'s experiments were sensitive to the dependencies between the target categories and their contexts, those in the non-minimal pair condition could have learned [tA] and [tO] as contextually-conditioned variants of a single syllable. However, the results of Feldman et al.'s experiments showed the opposite pattern. The "complementary distribution" in the input used in the non-minimal pair condition in fact helped the learning of [tA] and [tO] as separate categories rather than as variants of a single category. How can the difference between the results of this dissertation and Feldman et al.'s studies be explained?

Feldman et al. (2011) were aware of the possibility that learners in the non-minimal pair condition of their experiment could have interpreted the lexical contexts as phonological contexts for allophonic variation between [tA] and [tO]. However, they rejected this possibility for the following reasons. In their experiment, learners in the non-minimal pair condition were in fact grouped into two sub-conditions. In one sub-condition, learners heard [tA] occurring after [li] and [tO] occurring after [gu]. In the other sub-condition, learners heard [tA] occurring after [gu] and [tO] occurring after [li]. According to Feldman et al., the phonotactic distribution of [tA] and [tO] was more natural in the first sub-condition than in the second. Since [A] has a higher F2 than [O], and [i] has a higher F2 than [u], the occurrence of [tA] after [li] and the occurrence of [tO] after [gu] could have been interpreted as a result of vowel-to-vowel coarticulation. With the assumption that the learning of phonology is biased towards patterns that are phonetically natural (e.g., Wilson, 2006), Feldman et al.
(2011) predicted that if learners in the non-minimal pair condition were learning the complementary distribution of [tA] and [tO], and allophonic variation between these syllables, those in the first sub-condition would have learned the allophony better and become less sensitive to acoustic differences between [tA] and [tO] than those in the second sub-condition. Feldman et al., however, did not find any significant difference between these two sub-conditions in terms of the learners' sensitivity to the acoustic differences. From these results, Feldman et al. argued that learners in their experiment were not interpreting the lexical contexts as phonological contexts and were not learning allophonic variation between [tA] and [tO].

In this dissertation, I assumed that context was used as phonological context for the learning of phonological relationships: the occurrence of different sounds in mutually exclusive contexts was used as a cue for the learning of an allophonic relationship between the sounds. Feldman et al. (2011), by contrast, claimed that context was used as lexical context for the learning of phonetic contrasts: the occurrence of different sounds in mutually exclusive contexts was used as a cue for establishing separate categories for the sounds. Therefore, the crucial difference between my claim and Feldman et al.'s claim lies in the way learners interpreted the context, either as lexical context or as phonological context. How did this difference emerge in the first place?

One possible explanation is the amount of variability in the exposure stimuli. In the non-minimal pair condition of Feldman et al.'s (2011, 2013b) experiments, the tokens of the target syllables (i.e., 8 syllables from a continuum between [tA] and [tO]) were presented after two syllables (e.g., [li] or [gu]). This means that there were two types of bisyllabic strings in the input (e.g., [litA] and [gutO]). By contrast, in the complementary-natural condition of Experiment 1 of this dissertation, the tokens of the target syllables (i.e., 8 syllables from a continuum between [ùa] and [Ca]) were presented after four syllables in a session (e.g., [li] and [pi] or [lu] and [pu]). This means that there were four types of bisyllabic strings in the input (e.g., [liCa], [piCa], [luùa], and [puùa]). This slight difference in the amount of variability in the exposure stimuli could have directed learners in these two studies in different directions in terms of their interpretation of the context. Studies have demonstrated that, under certain conditions, the learning of regularities in input is facilitated by the presence of variability that seems to highlight the regularities (e.g., Gómez, 2002). A recent study on the acquisition of phonotactics by infants has suggested that the acquisition of native phonotactic patterns is determined by the type frequency of the patterns, or the number of different items in which the patterns are instantiated in the input, rather than by the token frequency of the patterns, or how often the patterns occur in the input (Archer and Curtin, 2011).
Therefore, it is possible that learners in the experiments reported in this dissertation learned the dependencies between the target syllables and the contexts as phonotactic regularities because the input stimuli had enough variability, whereas learners in Feldman et al. (2011, 2013b) failed to learn those dependencies as phonotactic regularities (i.e., the dependencies were learned as parts of the lexical forms) because the input stimuli did not have enough variability.

To test whether the amount of variability in the input plays a crucial role in determining whether learners adopt a phonological or a lexical interpretation of the context, it would be useful to replicate the non-minimal pair condition of Feldman et al. (2011, 2013b) with more variability in the input stimuli. With more variability, I expect that learners would learn the dependencies between the target syllables and the contexts as a kind of complementary distribution and would therefore learn the allophonic relationship between [tɑ] and [tɔ]. Alternatively, it would be useful to replicate the complementary-natural condition of Experiment 1 of this dissertation without the stimulus variability. I expect that learners would then interpret the context as lexical context and therefore fail to learn the allophonic relationship between [ʂ] and [ɕ].

If the amount of variability in the input stimuli determines the way the context is interpreted, gradually increasing the amount of stimulus variability in input in which the target segments are in complementary distribution should induce inverted U-shaped learning. Initial exposure to input with low stimulus variability (e.g., one stimulus type for each of the target segments, just like the input used in the non-minimal pair condition of Feldman et al. 2011 and Feldman et al. 2013b) would induce learners to interpret the context as lexical context and would facilitate the learning of the target segments as distinct categories, but subsequent exposure to input with high stimulus variability would direct learners' attention to the complementary distribution and would therefore encourage the learning of an allophonic relationship between the target segments. If these two types of learning happen in succession, learners should initially show good sensitivity to acoustic differences between the target segments, but their sensitivity should decline as they get exposure to input stimuli with more variability.
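To make the variability manipulation concrete, the following minimal sketch (in Python; the syllable labels and token counts are illustrative placeholders, not the actual stimulus lists) builds a low-variability exposure set of the kind used by Feldman et al. and a high-variability set of the kind used in Experiment 1, and counts the token frequency of each bisyllabic string together with the number of distinct contexts (type frequency) in which each target syllable occurs:

from collections import Counter, defaultdict

# Hypothetical exposure sets: each item is a (context syllable, target syllable) pair.
# Low variability: two bisyllabic string types, as in the non-minimal pair condition
# of Feldman et al.; high variability: four string types, as in the
# complementary-natural condition of Experiment 1. Token counts are arbitrary.
low_variability = [("li", "tɑ"), ("gu", "tɔ")] * 24
high_variability = [("li", "ɕa"), ("pi", "ɕa"), ("lu", "ʂa"), ("pu", "ʂa")] * 12

def dependency_stats(pairs):
    """Token frequency of each string type and number of distinct contexts per target."""
    token_freq = Counter(pairs)
    contexts_per_target = defaultdict(set)
    for context, target in pairs:
        contexts_per_target[target].add(context)
    type_freq = {target: len(ctxs) for target, ctxs in contexts_per_target.items()}
    return token_freq, type_freq

for label, pairs in [("low variability", low_variability),
                     ("high variability", high_variability)]:
    token_freq, type_freq = dependency_stats(pairs)
    print(label, dict(token_freq), type_freq)

In both sets every target occurs only in contexts consistent with the complementary distribution; what differs is how many distinct context items instantiate each dependency, which is the type-frequency notion invoked above.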
5.3 Some remaining questions about the context effects hypothesis

In Chapter 3, I explained the results of Experiments 1 and 2 using both articulatory and auditory theories of context effects. Since both kinds of theories provided the same explanations for the results of Experiments 1 and 2, I did not make any judgments about which theories are better at explaining perceptual biases in the learning of allophony within the context of those experiments. However, when it comes to the question of how robust these theories are in explaining the learning of allophony beyond the results of Experiments 1 and 2, the auditory theory (i.e., spectral contrast theory) faces some serious problems. In this section, I discuss these problems.

5.3.1 Directionality

The first problem is directionality. The pattern of complementary distribution implemented in the input used in the complementary-natural condition of Experiment 1 assumed carryover assimilation. The distribution of the retroflex [ʂ] and alveolopalatal [ɕ] was conditioned by the quality of the preceding vowel: [ʂ] occurring after the high back rounded vowel [u] and [ɕ] occurring after the high front unrounded vowel [i]. Therefore, if [ʂ] and [ɕ] are considered to be two variants of a single phoneme, the variation could have arisen as a result of the assimilation of the fricative to the preceding vowel. In this case, both articulatory and auditory theories are able to explain the change in learners' sensitivity to acoustic differences between [ʂ] and [ɕ] after exposure (see Section 3.7.2). However, carryover assimilation is not the only process through which allophony arises; there are many instances in which allophony arises through anticipatory assimilation.

For example, in Japanese, the distribution of sibilants is conditioned by the quality of the following vowel. While the alveolar sibilants [s] and [(d)z] occur before [a], [e], [o], and [ɯ], the alveolopalatal sibilants [ɕ] and [(d)ʑ] occur before [i].2 The allophonic relationship between the alveolar sibilants and the alveolopalatal sibilants can be described as a result of palatalization; the alveolar sibilants assimilate to the palatality of the following high front unrounded vowel [i]. Moreover, one of the most commonly attested allophonic alternations in natural languages is vowel nasalization, and this predominantly happens as anticipatory assimilation; vowels are nasalized when followed by a nasal consonant. Since spectral contrast theory is all about how the perception of a precursor stimulus modulates the perception of the following stimulus, it has nothing to say about the influence of a following stimulus on the precursor stimulus (Fowler, 2006, pp. 163-164). Therefore, spectral contrast theory cannot explain the learning of allophony in general.

2 The complementary distribution of the alveolar sibilants and alveolopalatal sibilants is seen only in a subset of the Japanese lexicon, namely old Japanese words.

5.3.2 Non-spectral information

Another problem for spectral contrast theory is that its explanatory potential is limited to cases in which allophony relies on the spectral properties of the interacting segments. However, there are a great number of cases in which spectral properties are not relevant to allophony. For example, in English, voiceless stops have two allophonic variants, voiceless aspirated and voiceless unaspirated. The former occurs as a singleton in word-initial syllable onset position (e.g., [pʰɪt]), and the latter occurs in syllable onset position in a cluster following [s] (e.g., [spɪt]). Researchers have argued that the presence or absence of aspiration in voiceless stops in these two contexts can be explained by the timing relationships between oral and laryngeal gestures (Iverson and Salmons, 1995; Kim, 1970; Kingston, 1990; Löfqvist and Yoshioka, 1981; Yoshioka et al., 1981).

When a voiceless stop is produced as a singleton in word-initial syllable onset position, the vocal folds open for the production of the voiceless stop. Then, the vocal folds start closing after the release of the oral closure in order to produce voicing for the following vowel. Aspiration happens as a consequence of continuous airflow between the release of the oral closure and the onset of the voicing (Kim, 1970). By contrast, when a voiceless stop is produced in syllable onset position in a cluster following [s], the vocal folds open for the production of the voiceless [s].
Then, the vocal folds start closing during the production of [s], and by the time the oral closure for the following stop is released, the vocal folds are already in the setting for producing the voicing of the following vowel. Therefore, there is not enough time for aspiration to occur between the release of the oral closure and the onset of the voicing (Kim, 1970; Yoshioka et al., 1981; Löfqvist and Yoshioka, 1981). Browman and Goldstein (1986) argue that English has a constraint that allows only one laryngeal gesture in a word-initial (syllable-initial) consonant cluster. The articulatory theories can potentially explain the learning of the allophonic relationship between voiceless aspirated and voiceless unaspirated stops by assuming that learners have innate knowledge about the complexity of the timing relationships between oral and laryngeal gestures, or that learners acquire such knowledge through experience. However, spectral contrast theory cannot explain the learning of this allophony because it has nothing to say about the relative timing of articulatory and acoustic events.

The limitations of spectral contrast theory discussed above indicate that the articulatory theories are more robust in explaining the learning of allophony. However, this does not necessarily mean that spectral contrast effects are irrelevant to the learning of allophony. It is possible that different mechanisms are involved in the learning of different types of allophony, and that spectral contrast effects still play a role in the learning of certain types of allophony, such as the case examined in this dissertation. A better way to test the role of spectral contrast effects in the learning of allophony examined in this dissertation would be to replicate the results of Experiment 1 using non-linguistic contexts.

Studies have demonstrated that listeners show context effects in the perception of speech sounds even when the sounds are presented in non-linguistic auditory contexts (Holt, 1999, 2005; Holt et al., 2000; Lotto and Kluender, 1998). For example, Lotto and Kluender (1998) reported that English speakers showed context effects in the categorization of a continuum between [da] and [ga] when the stimuli were presented after frequency-modulated (FM) glides that mimicked the F3 trajectories of [l] and [ɹ]. These results have been considered strong support for spectral contrast effects in speech perception. Therefore, if the learning of allophony examined in this dissertation arises from spectral contrast effects, the same learning effects should be obtained even when the contexts are non-linguistic auditory stimuli. When learners are exposed to tokens of the retroflex [ʂa] after an FM glide that mimics the F2 trajectory of the high back rounded vowel [u] and to tokens of the alveolopalatal [ɕa] after an FM glide that mimics the F2 trajectory of the high front unrounded vowel [i], they should become less sensitive to acoustic differences between [ʂa] and [ɕa].
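Such non-linguistic context stimuli are straightforward to synthesize. The sketch below (Python with NumPy and SciPy) is only a minimal illustration: the frequency values, duration, and sample rate are placeholder assumptions, not the parameters of any stimuli used in this dissertation.

import numpy as np
from scipy.io import wavfile

SAMPLE_RATE = 22050   # Hz
DURATION = 0.25       # s; placeholder glide duration

def fm_glide(f_start, f_end, duration=DURATION, sample_rate=SAMPLE_RATE):
    """Sine-wave glide whose frequency moves linearly from f_start to f_end (Hz)."""
    t = np.linspace(0.0, duration, int(sample_rate * duration), endpoint=False)
    inst_freq = f_start + (f_end - f_start) * t / duration      # linear frequency trajectory
    phase = 2.0 * np.pi * np.cumsum(inst_freq) / sample_rate    # integrate frequency to get phase
    return np.sin(phase) * np.hanning(len(t))                   # taper onset and offset

# Placeholder F2 regions: roughly 2200 Hz for an [i]-like context and 900 Hz for a
# [u]-like context; actual trajectories would have to be modeled on the vowel stimuli.
glide_i_like = fm_glide(2300, 2100)
glide_u_like = fm_glide(850, 950)

for name, glide in [("glide_i_like.wav", glide_i_like), ("glide_u_like.wav", glide_u_like)]:
    wavfile.write(name, SAMPLE_RATE, (glide * 32767).astype(np.int16))

Exposure items could then be created by concatenating such a glide with a fricative token, parallel to the vowel contexts used in Experiment 1.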
5.4 Future directions

In this dissertation, I presented the first experimental support for the distributional learning of allophony by adults. What we need to test now is how robust the distributional learning of allophony is. In this dissertation, I demonstrated that adult English speakers can learn an allophonic relationship between two novel fricatives, the retroflex [ʂ] and the alveolopalatal [ɕ], when they are exposed to input in which these fricatives are in a phonetically natural complementary distribution (i.e., the retroflex [ʂ] occurs after the high back vowel [u] and the alveolopalatal [ɕ] occurs after the high front vowel [i]). As I made clear in Chapter 2, this particular pattern of complementary distribution is artificial in the sense that it is not attested in any natural language. Therefore, the distributional learning of allophony still needs to be tested with patterns that are attested in natural languages.

Allophony in natural language shows a wide variety of form and complexity. The contexts that determine the distribution include segments, features, and the structural properties of higher-level phonological representations, such as the syllable, foot, word, and phrase. Moreover, a phoneme can have more than two allophones. For example, the English alveolar stop /t/ has at least five allophones, and the contexts that determine their distributions vary from adjacent segments to prosodic positions (see Table 5.1).3 /t/ is aspirated when it occurs as a singleton at the onset of a stressed syllable, especially in word-initial position. It is unaspirated when it occurs in a consonant cluster following [s]. It is unreleased when it occurs in syllable coda position, especially in word-final position. It is realized as a glottal stop when it is followed by a syllabic nasal consonant. And it is realized as a tap when it occurs between two vowels, especially when the first vowel is stressed and the second vowel is unstressed.

Table 5.1: Allophones of English /t/

  Allophones                     Contexts                                   Examples
  [tʰ]  voiceless aspirated      syllable onset                             tie [ˈtʰaɪ]
  [t]   voiceless unaspirated    after [s]                                  sty [ˈstaɪ]
  [t̚]   unreleased               syllable coda                              mat [ˈmæt̚]
  [ʔ]   glottal stop             before a syllabic nasal                    beaten [ˈbiʔn̩]
  [ɾ]   tap                      between stressed and unstressed vowels     city [ˈsɪɾi]

3 This is not an exhaustive list of the allophones of /t/ and the contexts that determine their distributions. The realization of the allophones also depends on dialect (e.g., Ladefoged and Johnson, 2014).

It is also true that allophony is not categorical. First, as Hall (2009) has proposed, phonological relationships are a probabilistic phenomenon rather than a categorical dichotomy between phonemic contrast and allophony. The more predictable the occurrences of two segments are in particular environments, the more allophonic the segments are. Second, the phonetic realization of allophones can be gradient. For example, I discussed in Chapter 1 that English /l/ has two allophones: the light [l] occurring in syllable-initial position and the dark (velarized) [ɫ] occurring in syllable-final position. However, Sproat and Fujimura (1993) have demonstrated that the degree of velarization varies according to the duration of the syllable rhyme; the longer the rhyme is, the greater the degree of velarization.
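Although the realization of these allophones can be gradient, the categorical core of Table 5.1 amounts to a context-to-variant mapping. A minimal sketch of that mapping (the context labels are simplified descriptions taken from the table, not a full phonological analysis):

# Simplified context-to-variant mapping for English /t/, following Table 5.1.
# The context labels are informal descriptions, not formal phonological environments.
T_ALLOPHONES = {
    "onset of stressed syllable, singleton":  "tʰ",  # tie [ˈtʰaɪ]
    "onset cluster after [s]":                "t",   # sty [ˈstaɪ]
    "syllable coda":                          "t̚",   # mat [ˈmæt̚]
    "before syllabic nasal":                  "ʔ",   # beaten [ˈbiʔn̩]
    "between stressed and unstressed vowels": "ɾ",   # city [ˈsɪɾi]
}

def realize_t(context):
    """Return the expected surface variant of /t/ for a simplified context label."""
    return T_ALLOPHONES.get(context, "t")   # fall back to plain [t] for unlisted contexts

print(realize_t("between stressed and unstressed vowels"))   # -> ɾ

A learner facing this system has to acquire not one conditioning context but several, which is exactly the kind of complexity raised for distributional learning in the next paragraphs.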
The question of how distributional learning can deal with these complexities needs to be explored. For example, this dissertation demonstrated that adults can learn an allophonic relationship between two segments from input in which the occurrences of the segments were fully predictable from context. What will happen when learners are exposed to input in which the complementary distribution of two segments is probabilistic (e.g., the target segments occur in mutually exclusive contexts 75% of the time but in overlapping contexts 25% of the time)? Figure 5.1 shows such a distribution. In this hypothetical input, there are eight sounds taken from a continuum. The frequency distribution of these sounds has a bimodal shape, implying that they are classified into two segments. The phonotactic distribution of the segments is probabilistic: the instances of each syllable occur in both contexts C1 and C2, but with different frequencies. The syllables that belong to the first category (the distributional peak on the left) occur in C1 25% of the time and in C2 75% of the time, and the syllables that belong to the second category (the distributional peak on the right) occur in C1 75% of the time and in C2 25% of the time. Are learners able to learn the probabilistic dependencies between the target segments and their contexts and learn the target segments as probabilistic allophones? If so, how will that learning be reflected in learners' sensitivity to acoustic differences between the target segments?

Figure 5.1: Probabilistic distribution. (Token frequency by acoustic value, steps 1–8 of the continuum, shown separately for contexts C1 and C2.)

Studies on the distributional learning of syntax have demonstrated that adults are sensitive to probabilistic variation in the grammar of an artificial language (Hudson Kam and Newport, 2005). The learning of allophony depends on the learning of phonotactic regularities. If adults are sensitive to probabilistic variation in phonotactic regularities, they should be able to learn probabilistic allophones; they should learn two segments as being less allophonic when the segments occur in overlapping contexts 25% of the time than when the segments never occur in overlapping contexts. According to Hall's (2009) information-theoretic account of phonological status, listeners' sensitivity to the acoustic differences between allophones is determined by how predictable those allophones are: as the predictability of the allophones becomes lower, listeners' sensitivity to the allophonic variation becomes higher. Therefore, compared to learners who are exposed to input in which the occurrences of two segments are either fully predictable or fully unpredictable, those who are exposed to input in which the distribution of the segments is probabilistic should show an intermediate level of sensitivity to acoustic differences between the target segments.4

4 It is worth noting that Hudson Kam and Newport (2005) demonstrated an age difference in the learning of probabilistic variation in an artificial language. While adults learned the language with its probabilistic variation, children (5- to 7-year-olds) regularized the language by generalizing the probabilistically dominant patterns. It is therefore possible that there will be a similar age difference in the learning of probabilistic allophony as well.
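One way to make this predictability gradient concrete, in the spirit of Hall's information-theoretic account, is to compute the conditional entropy of the segment given its context. The sketch below uses hypothetical token counts for two segments in two contexts (not data from any experiment): it yields 0 bits for a fully predictable complementary distribution, 1 bit for a fully overlapping one, and an intermediate value for the 75%/25% case described above.

import math

def conditional_entropy(counts):
    """H(segment | context) in bits, from a dict of {(context, segment): token count}."""
    total = sum(counts.values())
    contexts = {c for (c, _) in counts}
    h = 0.0
    for c in contexts:
        context_counts = [n for (ctx, _), n in counts.items() if ctx == c]
        context_total = sum(context_counts)
        p_context = context_total / total
        for n in context_counts:
            p = n / context_total
            h -= p_context * p * math.log2(p)
    return h

# Hypothetical token counts for two segments (S1, S2) in two contexts (C1, C2).
fully_predictable = {("C1", "S1"): 40, ("C2", "S2"): 40}                 # complementary
probabilistic = {("C1", "S1"): 10, ("C2", "S1"): 30,                     # 25% / 75%
                 ("C1", "S2"): 30, ("C2", "S2"): 10}                     # 75% / 25%
unpredictable = {("C1", "S1"): 20, ("C2", "S1"): 20,
                 ("C1", "S2"): 20, ("C2", "S2"): 20}                     # fully overlapping

for label, counts in [("fully predictable", fully_predictable),
                      ("probabilistic", probabilistic),
                      ("unpredictable", unpredictable)]:
    print(f"{label}: {conditional_entropy(counts):.3f} bits")
# fully predictable: 0.000, probabilistic: 0.811, unpredictable: 1.000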
The goal of this dissertation was to understand the mechanisms behind the learning of allophony. This was largely motivated by the question of how infants learn allophony (see Chapter 1). Given the findings that both infants and adults are sensitive to various kinds of distributional information in input, specifically the phonotactic distribution of segments across contexts, I hypothesized that both infants and adults can learn allophony from the phonotactic distribution of segments in input. However, I only tested the learning of allophony by adults. Therefore, it remains to be tested whether infants can learn the same allophony. One study has already demonstrated that 8.5- and 12-month-old English-learning infants learned an allophonic relationship between two consonants when they were exposed to input in which the consonants occurred in complementary contexts (White et al., 2008). However, we need more experimental studies on the learning of allophony by infants to understand whether infants and adults are using the same learning mechanisms or not. In this dissertation, I demonstrated that the learning of allophony by adults is constrained by the phonetic naturalness of the patterns of complementary distribution. Do the same phonetic naturalness constraints apply to the learning of allophony by infants? Some studies have demonstrated that infants' learning of phonological patterns is constrained by the phonetic naturalness of the patterns (e.g., Gerken and Bollt, 2008; White and Sundara, 2014). Therefore, it is very important to understand how the learning of allophony by infants is constrained as well. In order to account for the role of phonetic naturalness in the learning of allophony, I proposed a specific hypothesis about the mechanisms behind this learning, the context effects hypothesis. If the same phonetic naturalness constraints apply to the learning of allophony by infants, it should be tested whether infants, too, learn the context-dependent perception of sounds through exposure.

5.5 Final remarks

The learning of allophony is a small but important part of phonological acquisition. By studying allophony, we can see how phonology is acquired as a system of knowledge. Human learners have the ability to learn various elements of phonological systems, such as phonetic categories and phonotactic regularities, from statistical information in input, and this learning is constrained by cognitive factors such as learning biases (both domain-general and language-specific) and perceptual factors such as context effects (both domain-general and language-specific). The learning of allophony is built upon the interplay of these aspects of language learning and speech processing. Therefore, studying the mechanisms behind the learning of allophony is studying the mechanisms involved in that interplay. There is much more work to be done to understand the details of the interplay.

Bibliography

Abramson, A. S. and Lisker, L. (1964). A cross-language study of voicing in initial stops: Acoustical measurements. Word, 20(3):384–422. → pages 18

Agresti, A. (2002). Categorical data analysis. Wiley Series in Probability and Statistics. Wiley-Interscience, Hoboken, 2nd edition. → pages 109, 110

Agresti, A. (2010). Analysis of ordinal categorical data. Wiley, Hoboken, 2nd edition. → pages 110

Allen, J. S. and Miller, J. L. (1999). Effects of syllable-initial voicing and speaking rate on the temporal characteristics of monosyllabic words. The Journal of the Acoustical Society of America, 106(4):2031–2039. → pages xi, 18

Archer, S. L. and Curtin, S. (2011). Perceiving onset clusters in infancy. Infant Behavior and Development, 34(4):534–540. → pages 25, 133

Aslin, R. N. and Pisoni, D. B. (1980). Some developmental processes in speech perception. In Yeni-Komshian, G. H., Kavanagh, J. F., and Ferguson, C. A., editors, Child phonology: Perception, volume 2, pages 67–96. Academic Press, New York. → pages 62

Aslin, R. N., Saffran, J. R., and Newport, E. L. (1998).
Computation ofconditional probability statistics by 8-month-old infants. PsychologicalScience, 9(4):321–324. → pages 67Astheimer, L. B. and Sanders, L. D. (2009). Listeners modulate temporallyselective attention during natural speech processing. Biological Psychology,80(1):23–34. → pages 8Astheimer, L. B. and Sanders, L. D. (2011). Predictability affects early perceptualprocessing of word onsets in continuous speech. Neuropsychologia,49(12):3512–3516. → pages 8142Barr, D. J., Levy, R., Scheepers, C., and Tily, H. J. (2013). Random effectsstructure for confirmatory hypothesis testing: Keep it maximal. Journal ofMemory and Language, 68(3):255–278. → pages 111Bateman, N. (2007). A crosslinguistic investigation of palatalization. PhD thesis,University of California, San Diego. → pages 52Bates, D., Kliegl, R., Vasishth, S., and Baayen, H. (2015). Parsimonious mixedmodels. Manuscript submitted for publication. → pages 111Beddor, P. S., Harnsberger, J. D., and Lindemann, S. (2002). Language-specificpatterns of vowel-to-vowel coarticulation: acoustic structures and theirperceptual correlates. Journal of Phonetics, 30(4):591–627. → pages 101Beddor, P. S. and Strange, W. (1982). Cross-language study of perception of theoral–nasal distinction. The Journal of the Acoustical Society of America,71(6):1551–1561. → pages 6Best, C. T. (1995). A direct realist view of cross-language speech perception. InStrange, W., editor, Speech perception and linguistic experience : Issues incross-language research, pages 171–204. York Press, Baltimore. → pages 15Best, C. T., McRoberts, G. W., and Goodell, E. (2001). Discrimination ofnon-native consonant contrasts varying in perceptual assimilation to thelistener’s native phonological system. The Journal of the Acoustical Society ofAmerica, 109(2):775–794. → pages 15Best, C. T., McRoberts, G. W., and Sithole, N. M. (1988). Examination ofperceptual reorganization for nonnative speech contrasts: Zulu clickdiscrimination by English-speaking adults and infants. Journal of ExperimentalPsychology: Human Perception and Performance, 14(3):345–360. → pages 15,16, 39Bhat, D. (1973). Retroflexion: an areal feature. Working Papers on LanguageUniversals, 13:27–67. → pages 52, 53Bhat, D. (1974). Retroflexion and retraction. Journal of Phonetics, 2:233–237. →pages 51Bhat, D. N. (1978). A general study of palatalization. In Greenberg, J. H., editor,Universals of human language, volume 2, pages 47–92. Stanford UniversityPress, Stanford. → pages 52143Blevins, J. (2008). Natural and unnatural sound patterns: A pocket field guide. InWillems, K. and De Cuypere, L., editors, Naturalness and iconicity inlanguage, pages 121–148. John Benjamins Publishing, Amsterdam. → pages68Boersma, P. and Weenink, D. (2001). Praat, a system for doing phonetics bycomputer. Glot International, 5(9):342–345. → pages 42, 44Boomershine, A., Hall, K. C., Hume, E., and Johnson, K. (2008). The impact ofallophony versus contrast on speech perception. In Avery, P., Dresher, B. E.,and Rice, K., editors, Contrasts in phonology: theory, perception, acquisition,pages 145–171. Mouton de Gruyter, Berlin. → pages 6Browman, C. P. and Goldstein, L. (1986). Towards an articulatory phonology.Phonology, 3(1):219–252. → pages 136Browman, C. P. and Goldstein, L. (1989). Articulatory gestures as phonologicalunits. Phonology, 6(2):201–251. → pages 3Browman, C. P. and Goldstein, L. (1992). Articulatory phonology: An overview.Phonetica, 49(3-4):155–180. → pages 3Byrd, D. and Tan, C. C. (1996). Saying consonant clusters quickly. 
Journal ofPhonetics, 24(2):263–282. → pages 69Carpenter, A. C. (2010). A naturalness bias in learning stress. Phonology,27(3):345–392. → pages 30, 68, 69Chambers, K. E., Onishi, K. H., and Fisher, C. (2003). Infants learn phonotacticregularities from brief auditory experience. Cognition, 87(2):B69–B77. →pages 2, 25, 70Chambers, K. E., Onishi, K. H., and Fisher, C. (2010). A vowel is a vowel:generalizing newly learned phonotactic constraints to new contexts. Journal ofExperimental Psychology: Learning, Memory, and Cognition, 36(3):821–828.→ pages 26Chambers, K. E., Onishi, K. H., and Fisher, C. (2011). Representations forphonotactic learning in infancy. Language Learning and Development,7(4):287–308. → pages 25Chang, Y.-H. S. (2010). Lip rounding in Taiwan Mandarin retroflex sibilants.Poster presented at the 84th Annual Meeting of the Linguistic Society ofAmerica, Baltimore. → pages 52144Chang, Y.-H. S. (2013). Variability in cross-dialectal production and perceptionof contrasting phonemes: the case of the alveolar-retroflex contrast in Beijingand Taiwan Mandarin. PhD thesis, University of Illinois atUrbana-Champaign. → pages 35, 172Chao, Y. R. (1948). Mandarin primer: an intensive course in spoken Chinese.Harvard University Press, Cambridge. → pages 33, 34, 37Cheng, C.-C. (1973). A synchronic phonology of Mandarin Chinese. Mouton deGruyter, Berlin. → pages 34Chiu, C. (2009). Acoustic and auditory comparisons of Polish and TaiwaneseMandarin sibilants. Canadian Acoustics, 37(3):142–143. → pages x, 35, 36Chiu, C. (2010). Attentional weighting of Polish and Taiwanese Mandarinsibilant perception. In Heijl, M., editor, Proceedings of the 2010 Canadianlinguistics association annual conference. → pages 36, 37, 40Chomsky, N. and Halle, M. (1968). The sound pattern of English. Harper & Row,New York. → pages 3Christensen, R. H. B. (2015a). Analysis of ordinal data with cumulative linkmodels—estimation with the ordinal package. R package version 2015.6-28.http://www.cran.r-project.org/package=ordinal/. → pages 109, 110Christensen, R. H. B. (2015b). ordinal—regression models for ordinal data. Rpackage version 2015.6-28. http://www.cran.r-project.org/package=ordinal/. →pages 110Christensen, R. H. B. (2015c). A tutorial on fitting cumulative link mixed modelswith clmm from the ordinal package. R package version 2015.6-28.http://www.cran.r-project.org/package=ordinal/. → pages 109, 110, 112Christie Jr., W. M. (1974). Some cues for syllable juncture perception in English.The Journal of the Acoustical Society of America, 55(4):819–821. → pages 7Chung, K. S. (2006). Hypercorrection in Taiwan Mandarin. Journal of AsianPacific Communication, 16(2):197–214. → pages 172Church, K. W. (1987). Phonological parsing and lexical retrieval. Cognition,25(1):53–69. → pages 7Clements, G. N. (1985). The geometry of phonological features. Phonology,2(1):225–252. → pages 3145Coady, J. A., Kluender, K. R., and Rhode, W. S. (2003). Effects of contrastbetween onsets of speech and other complex spectra. The Journal of theAcoustical Society of America, 114(4):2225–2235. → pages 91Connine, C. M., Blasko, D. G., and Titone, D. (1993). Do the beginnings ofspoken words have a special status in auditory word recognition? Journal ofMemory and Language, 32(2):193–210. → pages 8Cover, T. M. and Thomas, J. A. (2006). Elements of information theory.Wiley-Interscience, Hoboken, 2nd edition. → pages 7Creel, S. C., Newport, E. L., and Aslin, R. N. (2004). Distant melodies: statisticallearning of nonadjacent dependencies in tone sequences. 
Journal ofExperimental Psychology: Learning, Memory, and Cognition,30(5):1119–1130. → pages 68Cristia, A., McGuire, G. L., Seidl, A., and Francis, A. L. (2011a). Effects of thedistribution of acoustic cues on infants’ perception of sibilants. Journal ofPhonetics, 39(3):388–402. → pages 20, 22Cristia, A. and Seidl, A. (2008). Is infants’ learning of sound patterns constrainedby phonological features? Language Learning and Development,4(3):203–227. → pages 25Cristia, A., Seidl, A., and Francis, A. (2011b). Phonological features in infancy.In Clements, G. N. and Rachid, R., editors, Where do phonological contrastscome from? Cognitive, physical and developmental bases of distinctive speechcategories, pages 303–326. John Benjamins Publishing Company, Amsterdam.→ pages 25Cristia, A., Seidl, A., and Gerken, L. (2011c). Learning classes of sounds ininfancy. In Proceedings of The 34 th Annual Penn Linguistics Colloquium,volume 17 of University of Pennsylvania Working Papers in Linguistics, page 9.→ pages 68Cutting, J. E. and Rosner, B. S. (1974). Categories and boundaries in speech andmusic. Perception & Psychophysics, 16(3):564–570. → pages 10Darcy, I., Peperkamp, S., and Dupoux, E. (2007). Bilinguals play by the rules:Perceptual compensation for assimilation in late L2-learners. In Cole, J. andHualde, J. I., editors, Laboratory phonology, volume 9, pages 411–442.Mouton de Gruyter, Berlin. → pages 17146Darcy, I., Ramus, F., Christophe, A., Kinzler, K., and Dupoux, E. (2009).Phonological knowledge in compensation for native and non-nativeassimilation. In Ku¨gler, F., Fe´ry, C., and van de Vijver, R., editors, Variationand gradience in phonetics and phonology, pages 265–310. Mouton deGruyter, Berlin. → pages 17De Boer, B. and Kuhl, P. K. (2003). Investigating the role of infant-directedspeech with a computer model. Acoustics Research Letters Online,4(4):129–134. → pages 20De Lacy, P. (2004). Markedness conflation in optimality theory. Phonology,21(2):145–199. → pages 69Delattre, P. C., Berman, A., and Cooper, F. S. (1962). Formant transitions and locias acoustic correlates of place of articulation in American fricatives. StudiaLinguistica, 16(1-2):104–122. → pages 36Delgutte, B. (1980). Representation of speech-like sounds in the dischargepatterns of auditory-nerve fibers. The Journal of the Acoustical Society ofAmerica, 68(3):843–857. → pages 88Delgutte, B. (1997). Auditory neural processing of speech. In Hardcastle, W. J.and Laver, J., editors, The handbook of phonetic sciences, pages 507–538.Blackwell, Oxford. → pages 88Delgutte, B. and Kiang, N. Y. (1984). Speech coding in the auditory nerve: I.vowel-like sounds. The Journal of the Acoustical Society of America,75(3):866–878. → pages 88Dell, G. S., Reed, K. D., Adams, D. R., and Meyer, A. S. (2000). Speech errors,phonotactic constraints, and implicit learning: a study of the role of experiencein language production. Journal of Experimental Psychology: Learning,Memory, and Cognition, 26(6):1355–1367. → pages 26Diehm, E. E. (1998). Gestures and linguistic function in learning Russian:Production and perception studies of Russian palatalized consonants. PhDthesis, Ohio State University. → pages 64, 65Dietrich, C., Swingley, D., and Werker, J. F. (2007). Native language governsinterpretation of salient speech sound differences at 18 months. Proceedings ofthe National Academy of Sciences, 104(41):16027–16031. → pages 14, 24147Dixit, R. P. and Flege, J. E. (1991). 
Vowel context, rate and loudness effects oflinguopalatal contact patterns in Hindi retroflex /ú/. Journal of Phonetics,19(2):213–229. → pages 51Duanmu, S. (2007). The phonology of standard Chinese. Oxford UniversityPress, Oxford. → pages 33, 34Eimas, P. D. (1974). Auditory and linguistic processing of cues for place ofarticulation by infants. Perception & Psychophysics, 16(3):513–521. → pages10Eimas, P. D., Siqueland, E. R., Jusczyk, P., and Vigorito, J. (1971). Speechperception in infants. Science, 171(3968):303–306. → pages 9Endress, A. D. and Mehler, J. (2010). Perceptual constraints in phonotacticlearning. Journal of Experimental Psychology: Human Perception andPerformance, 36(1):235. → pages 70Farnetani, E. and Recasens, D. (1997). Coarticulation and connected speechprocesses. In Hardcastle, W. J. and Laver, J., editors, The handbook of phoneticsciences, pages 371–404. Blackwell, Oxford. → pages 84Feldman, N., Myers, E., White, K., Griffiths, T., and Morgan, J. (2011). Learnersuse word-level statistics in phonetic category acquisition. In Proceedings of the35th Boston Uniniversity Conference on Language Development, pages197–209. → pages 13, 127, 128, 131, 132, 133, 134Feldman, N. H., Griffiths, T. L., Goldwater, S., and Morgan, J. L. (2013a). A rolefor the developing lexicon in phonetic category acquisition. PsychologicalReview, 120(4):751. → pages 13, 127Feldman, N. H., Griffiths, T. L., and Morgan, J. L. (2009). Learning phoneticcategories by learning a lexicon. In Proceedings of the 31st annual conferenceof the Cognitive Science Society, pages 2208–2213. → pages 127, 128Feldman, N. H., Myers, E. B., White, K. S., Griffiths, T. L., and Morgan, J. L.(2013b). Word-level information influences phonetic learning in adults andinfants. Cognition, 127(3):427–438. → pages 13, 127, 128, 129, 131, 132, 133,134Fenn, K. M., Nusbaum, H. C., and Margoliash, D. (2003). Consolidation duringsleep of perceptual learning of spoken language. Nature, 425(6958):614–616.→ pages 41148Finn, A. S. and Hudson Kam, C. L. (2008). The curse of knowledge: Firstlanguage knowledge impairs adult learners’ use of novel statistics for wordsegmentation. Cognition, 108(2):477–499. → pages 14Fiser, J. and Aslin, R. N. (2002). Statistical learning of higher-order temporalstructure from visual shape sequences. Journal of Experimental Psychology:Learning, Memory, and Cognition, 28(3):458. → pages 67Flege, J. E. (1995). Second language speech learning: Theory, findings, andproblems. In Strange, W., editor, Speech perception and linguistic experience:Issues in cross-language research, pages 233–277. York Press, Baltimore. →pages 15Flemming, E. (2003). The relationship between coronal place and vowelbackness. Phonology, 20(3):335–373. → pages 51, 52, 53Fowler, C. A. (1986). An event approach to the study of speech perception from adirect-realist perspective. Journal of Phonetics, 14(1):3–28. → pages 86Fowler, C. A. (1994). Invariants, specifiers, cues: An investigation of locusequations as information for place of articulation. Perception & Psychophysics,55(6):597–610. → pages 35Fowler, C. A. (1996). Listeners do hear sounds, not tongues. The Journal of theAcoustical Society of America, 99(3):1730–1741. → pages 86Fowler, C. A. (2006). Compensation for coarticulation reflects gesture perception,not spectral contrast. Perception & Psychophysics, 68(2):161–177. → pages86, 135Fry, D. B., Abramson, A. S., Eimas, P. D., and Liberman, A. M. (1962). Theidentification and discrimination of synthetic vowels. 
Language and Speech,5(4):171–189. → pages 9Fujisaki, H. and Kawashima, T. (1969). On the modes and mechanisms of speechperception. Annual Report of the Engineering Research Institute, 28:67–73. →pages 7Fujisaki, H. and Kawashima, T. (1970). Some experiments on speech perceptionand a model for the perceptual mechanism. Annual Report of the EngineeringResearch Institute, 29:207–214. → pages 7Gabay, Y., Dick, F. K., Zevin, J. D., and Holt, L. L. (2015). Incidental auditorycategory learning. Journal of Experimental Psychology: Human Perceptionand Performance, 41(4):1124–1138. → pages 129149Gerken, L. and Bollt, A. (2008). Three exemplars allow at least some linguisticgeneralizations: Implications for generalization mechanisms and constraints.Language Learning and Development, 4(3):228–248. → pages 68, 141Gilkerson, J. (2005). Categorical perception of natural and unnatural categories:evidence for innate category boundaries. UCLA Working Papers in Linguistics,13:34–58. → pages 83Gillette, S. (1980). Contextual variation in the perception of L and R by Japaneseand Korean speakers. Minnesota Papers in Linguistics and the Philosophy ofLanguage, 6:59–72. → pages 16Gnanadesikan, A. (1994). The geometry of coronal articulations. In Gonza´lez,M., editor, Proceedings of the North East Linguistic Society, volume 24, pages125–139, Amherst. → pages 52, 53Goldstone, R. L. and Hendrickson, A. T. (2010). Categorical perception. WileyInterdisciplinary Reviews: Cognitive Science, 1(1):69–78. → pages 9Go´mez, R. L. (2002). Variability and detection of invariant structure.Psychological Science, 13(5):431–436. → pages 68, 133Gordon, M., Barthmaier, P., and Sands, K. (2002). A cross-linguistic acousticstudy of voiceless fricatives. Journal of the International Phonetic Association,32(2):141–174. → pages 34Goto, H. (1971). Auditory perception by normal Japanese adults of the sounds “l”and “r”. Neuropsychologia, 9(3):317–323. → pages 16Goudbeek, M., Cutler, A., and Smits, R. (2008). Supervised and unsupervisedlearning of multidimensionally varying non-native speech categories. SpeechCommunication, 50(2):109–125. → pages 21Grieser, D. and Kuhl, P. K. (1989). Categorization of speech by infants: Supportfor speech-sound prototypes. Developmental Psychology, 25(4):577. → pages11Guenther, F. H. and Gjaja, M. N. (1996). The perceptual magnet effect as anemergent property of neural map formation. The Journal of the AcousticalSociety of America, 100(2):1111–1121. → pages 20Guion, S. G. (1996). Velar palatalization: coarticulation, perception, and soundchange. PhD thesis, University of Texas at Austin. → pages 51150Guion, S. G. (1998). The role of perception in the sound change of velarpalatalization. Phonetica, 55(1-2):18–52. → pages 51, 52Guion, S. G., Flege, J. E., Akahane-Yamada, R., and Pruitt, J. C. (2000). Aninvestigation of current models of second language speech perception: the caseof Japanese adults’ perception of English consonants. The Journal of theAcoustical Society of America, 107(5):2711–2724. → pages 16Gulian, M., Escudero, P., and Boersma, P. (2007). Supervision hampersdistributional learning of vowel contrasts. In Trouvain, J. and Barry, W. J.,editors, Proceedings of the 16th International Congress of Phonetic Sciences,pages 1893–1896. → pages 21Hall, K. C. (2009). A probabilistic model of phonological relationships fromcontrast to allophony. PhD thesis, The Ohio State University. → pages 2, 4, 5,6, 7, 8, 9, 138, 140Hall, K. C. (2012). Phonological relationships: a probabilistic model. 
McGillWorking Papers in Linguistics, 22(1):1–14. → pages xi, 4, 5Hall, K. C. (2013a). Documenting phonological change: A comparison of twojapanese phonemic splits. In Luo, S., editor, Proceedings of the 2013 AnnualMeeting of the Canadian Linguistic Association. → pages 5Hall, K. C. (2013b). A typology of intermediate phonological relationships. TheLinguistic Review, 30(2):215–275. → pages 4, 5Hamann, S. (2002). Retroflexion and retraction revised. In Hall, T. A.,Pompino-Marschall, B., and Rochon, M., editors, Papers on phonetics andphonology : The articulation, acoustics and perception of consonants,volume 28 of ZAS papers in linguistics, pages 13–25. Zentrum fu¨r AllgemeineSprachwissenschaft, Sprachtypologie und Universalienforschung, Berlin. →pages 51Hamann, S. (2003). The phonetics and phonology of retroflexes. LOT Press,Utrecht. → pages 51Hansson, G. O´. (2010). Consonant harmony: long-distance interaction inphonology. UC Publications in Linguistics. University of California Press,Berkeley. → pages 71Hao, Y.-C. (2012). The effect of L2 experience on second language acquisition ofMandarin consonants, vowels, and tones. PhD thesis, Indiana University. →pages x, 37, 38, 39, 100151Harnad, S. (2005). To cognize is to categorize: Cognition is categorization. InCohen, H. and Lefebvre, C., editors, Handbook of categorization in cognitivescience, pages 20–45. Elsevier, Amsterdam. → pages 9Harnsberger, J. D. (2001). The perception of Malayalam nasal consonants byMarathi, Punjabi, Tamil, Oriya, Bengali, and American English listeners: Amultidimensional scaling analysis. Journal of Phonetics, 29(3):303–327. →pages 6Harris, K. S. (1954). Cues for the identification of the fricatives of AmericanEnglish. The Journal of the Acoustical Society of America, 26(5):952–952. →pages 36Harris, K. S. (1958). Cues for the discrimination of American English fricatives inspoken syllables. Language and Speech, 1(1):1–7. → pages 36Hartman, L. M. (1944). The segmental phonemes of the Peking dialect.Language, pages 28–42. → pages 34Hayes-Harb, R. (2007). Lexical and statistical evidence in the acquisition ofsecond language phonemes. Second Language Research, 23(1):65–94. →pages 21, 63Heinz, J. M. and Stevens, K. N. (1961). On the properties of voiceless fricativeconsonants. The Journal of the Acoustical Society of America, 33(5):589–596.→ pages 34Hillenbrand, J., Getty, L. A., Clark, M. J., and Wheeler, K. (1995). Acousticcharacteristics of American English vowels. The Journal of the Acousticalsociety of America, 97(5):3099–3111. → pages 18Hirata, Y., Whitehurst, E., and Cullings, E. (2007). Training native Englishspeakers to identify Japanese vowel length contrast with sentences at variedspeaking rates. The Journal of the Acoustical Society of America,121(6):3837–3845. → pages 16Hohne, E. A. and Jusczyk, P. W. (1994). Two-month-old infants’ sensitivity toallophonic differences. Perception & Psychophysics, 56(6):613–623. → pages14, 24Holt, L. L. (1999). Auditory constraints on speech perception: An examination ofspectral contrast. PhD thesis, University of Wisconsin–Madison. → pages 91,137152Holt, L. L. (2005). Temporally nonadjacent nonlinguistic sounds affect speechcategorization. Psychological Science, 16(4):305–312. → pages 137Holt, L. L., Lotto, A. J., and Kluender, K. R. (2000). Neighboring spectral contentinfluences vowel identification. The Journal of the Acoustical Society ofAmerica, 108(2):710–722. → pages 137Hu, F. (2008). The three sibilants in standard Chinese. 
In Sock, R., Fuchs, S., andYvis, L., editors, Proceedings of the 8th International Seminar on SpeechProduction, pages 105–108. INRIA. → pages 33, 34, 90Hudson Kam, C. L. and Newport, E. L. (2005). Regularizing unpredictablevariation: The roles of adult and child learners in language formation andchange. Language Learning and Development, 1(2):151–195. → pages 140Hughes, G. W. and Halle, M. (1956). Spectral properties of fricative consonants.The Journal of the Acoustical society of America, 28(2):303–310. → pages 34Hyman, R. (1953). Stimulus information as a determinant of reaction time.Journal of Experimental Psychology, 45(3):188. → pages 8Ingvalson, E. M., Holt, L. L., and McClelland, J. L. (2012). Can native japaneselisteners learn to differentiate /r–l/ on the basis of F3 onset frequency?Bilingualism: Language and Cognition, 15(02):255–274. → pages 17Iverson, G. K. and Salmons, J. C. (1995). Aspiration and laryngeal representationin Germanic. Phonology, 12(3):369–396. → pages 136Iverson, P. and Evans, B. G. (2007). Learning English vowels with differentfirst-language vowel systems: Perception of formant targets, formantmovement, and duration. The Journal of the Acoustical Society of America,122(5):2842–2854. → pages 16Iverson, P. and Evans, B. G. (2009). Learning English vowels with differentfirst-language vowel systems II: Auditory training for native Spanish andGerman speakers. The Journal of the Acoustical Society of America,126(2):866–877. → pages 16Iverson, P., Kuhl, P. K., Akahane-Yamada, R., Diesch, E., Tohkura, Y.,Kettermann, A., and Siebert, C. (2003). A perceptual interference account ofacquisition difficulties for non-native phonemes. Cognition, 87(1):B47–B57.→ pages 16153Iverson, P., Pinet, M., and Evans, B. G. (2012). Auditory training for experiencedand inexperienced second-language learners: Native French speakers learningEnglish vowels. Applied Psycholinguistics, 33(1):145–160. → pages 16Jakobson, R., Fant, G., and Halle, M. (1951). Preliminaries to speech analysis.The distinctive features and their correlates. The MIT Press, Cambridge. →pages 3Johnson, K. and Babel, M. (2010). On the perceptual basis of distinctive features:Evidence from the perception of fricatives by Dutch and English speakers.Journal of Phonetics, 38(1):127–136. → pages 6Jones, D. (1950). The phoneme: Its nature and use. Cambridge University Press,Cambridge. → pages 2Jongman, A., Wayland, R., and Wong, S. (2000). Acoustic characteristics ofEnglish fricatives. The Journal of the Acoustical Society of America,108(3):1252–1263. → pages 34Jusczyk, P. W., Friederici, A. D., Wessels, J. M., Svenkerud, V. Y., and Jusczyk,A. M. (1993). Infants’ sensitivity to the sound patterns of native languagewords. Journal of Memory and Language, 32(3):402–420. → pages 25Jusczyk, P. W., Hohne, E. A., and Bauman, A. (1999). Infants’ sensitivity toallophonic cues for word segmentation. Perception & Psychophysics,61(8):1465–1476. → pages 14, 15Jusczyk, P. W. and Luce, P. A. (1994). Infants’ sensitivity to phonotactic patternsin the native language. Journal of Memory and Language, 33(5):630–645. →pages 25Jusczyk, P. W., Rosner, B. S., Cutting, J. E., Foard, C. F., and Smith, L. B. (1977).Categorical perception of nonspeech sounds by 2-month-old infants.Perception & Psychophysics, 21(1):50–54. → pages 10Kawahara, H., Masuda-Katsuse, I., and De Cheveigne, A. (1999). 
Restructuringspeech representations using a pitch-adaptive time–frequency smoothing and aninstantaneous-frequency-based F0 extraction: Possible role of a repetitivestructure in sounds. Speech Communication, 27(3):187–207. → pages 46Kim, C.-W. (1970). A theory of aspiration. Phonetica, 21(2):107–116. → pages136154Kingston, J. (1990). Articulatory binding. In Kingston, J. and Beckman, M.,editors, Papers in laboratory phonology I: Between the grammar and thephysics of speech, pages 406–434. Cambridge University Press, Cambridge. →pages 136Kingston, J. (2003). Learning foreign vowels. Language and Speech,46(2-3):295–348. → pages 16Kirchner, R. M. (1998). An effort based approach to consonant lenition. PhDthesis, University of California Los Angeles. → pages 69Kirkham, N. Z., Slemmer, J. A., and Johnson, S. P. (2002). Visual statisticallearning in infancy: Evidence for a domain general learning mechanism.Cognition, 83(2):B35–B42. → pages 67Klatt, D. H. (1979). Speech perception: A model of acoustic-phonetic analysisand lexical access. Journal of Phonetics, 7(312):1–26. → pages 3Kluender, K. R., Coady, J. A., and Kiefte, M. (2003). Sensitivity to change inperception of speech. Speech Communication, 41(1):59–69. → pages 87, 101Kuhl, P. K. (1991). Human adults and human infants show a “perceptual magneteffect” for the prototypes of speech categories, monkeys do not. Perception &Psychophysics, 50(2):93–107. → pages 11, 12Kuhl, P. K. (1994). Learning and representation in speech and language. CurrentOpinion in Neurobiology, 4(6):812–822. → pages 18Kuhl, P. K., Conboy, B. T., Coffey-Corina, S., Padden, D., Rivera-Gaxiola, M.,and Nelson, T. (2008). Phonetic learning as a pathway to language: new dataand native language magnet theory expanded (NLM-e). PhilosophicalTransactions of the Royal Society B: Biological Sciences, 363(1493):979–1000.→ pages 12Kuhl, P. K. and Iverson, P. (1995). Chapter 4: Linguistic experience and the“perceptual magnet effect,”. In Strange, W., editor, Speech perception andlinguistic experience: Issues in cross-language research, pages 121–154. YorkPress, Baltimore. → pages 12Kuhl, P. K. and Miller, J. D. (1978). Speech perception by the chinchilla:Identification functions for synthetic VOT stimuli. The Journal of theAcoustical Society of America, 63(3):905–917. → pages 10155Kuhl, P. K. and Padden, D. M. (1983). Enhanced discriminability at the phoneticboundaries for the place feature in macaques. The Journal of the AcousticalSociety of America, 73(3):1003–1010. → pages 10Kuhl, P. K., Williams, K. A., Lacerda, F., Stevens, K. N., and Lindblom, B.(1992). Linguistic experience alters phonetic perception in infants by 6 monthsof age. Science, 255(5044):606–608. → pages 1, 11, 12Kuo, L.-J. (2009). The role of natural class features in the acquisition ofphonotactic regularities. Journal of Psycholinguistic Research, 38(2):129–150.→ pages 70Lacerda, F. (1995). The perceptual-magnet effect: An emergent consequence ofexemplar-based phonetic memory. In Elenuis, K. and Branderud, P., editors,Proceedings of the 13th International Congress of Phonetic Sciences,volume 2, pages 140–147. → pages 18Lacerda, F. (1998). An exemplar-based account of emergent phonetic categories.Proceedings of the 16th Internatinoal Congresses on Acoustics, 3:2013–2014.→ pages 18Ladd, D. R. (2014). Simultaneous structure in phonology. Oxford UniversityPress, Oxford. → pages 3Ladefoged, P. and Bhaskararao, P. (1983). Non-quantal aspects of consonantproduction-a study of retroflex consonants. 
Journal of Phonetics,11(3):291–302. → pages 51Ladefoged, P. and Johnson, K. (2014). A course in phonetics. Cengage Learning,Stamford, 7th edition. → pages 4, 138Ladefoged, P. and Maddieson, I. (1996). The sounds of the world’s languages.Wiley-Blackwell, Hoboken. → pages 33, 34Ladefoged, P. and Wu, Z. (1984). Places of articulation-an investigation ofPekingese fricatives and affricates. Journal of Phonetics, 12(3):267–278. →pages 33, 34, 90Lasky, R. E., Syrdal-Lasky, A., and Klein, R. E. (1975). VOT discrimination byfour to six and a half month old infants from Spanish environments. Journal ofExperimental Child Psychology, 20(2):215–225. → pages 10Lee, C.-Y., Zhang, Y., Li, X., Tao, L., and Bond, Z. (2012). Effects of speakervariability and noise on Mandarin fricative identification by native and156non-native listeners. The Journal of the Acoustical Society of America,132(2):1130–1140. → pages 39Lee, S.-I. (2011). Spectral analysis of Mandarin Chinese sibilant fricatives. InLee, W.-S. and Zee, E., editors, Proceedings of the 17th International Congressof Phonetic Sciences, pages 1178–1181. → pages 35, 42Lee, W.-S. (2008). Articulation of the coronal sounds in Peking dialect. In Sock,R., Fuchs, S., and Yvis, L., editors, Proceedings of the 8th InternationalSeminar on Speech Production, pages 109–122. INRIA. → pages 33, 34, 90Lee-Kim, S.-I. (2014). Revisiting Mandarin ‘apical vowels’: An articulatory andacoustic study. Journal of the International Phonetic Association,44(3):261–282. → pages 33, 34, 90Li, F. (2008). The phonetic development of voiceless sibilant fricatives in English,Japanese and Mandarin Chinese. PhD thesis, The Ohio State University. →pages 35Li, F., Edwards, J., and Beckman, M. (2007). Spectral measures for sibilantfricatives of English, Japanese, and Mandarin Chinese. In Trouvain, J. andBarry, W. J., editors, Proceedings of the 16th International Congress ofPhonetic Sciences, pages 917–920. → pages 35, 37Liberman, A. M., Cooper, F. S., Shankweiler, D. P., and Studdert-Kennedy, M.(1967). Perception of the speech code. Psychological Review, 74(6):431. →pages 9Liberman, A. M., Harris, K. S., Hoffman, H. S., and Griffith, B. C. (1957). Thediscrimination of speech sounds within and across phoneme boundaries.Journal of Experimental Psychology, 54(5):358. → pages 9Liberman, A. M., Harris, K. S., Kinney, J. A., and Lane, H. (1961). Thediscrimination of relative onset-time of the components of certain speech andnonspeech patterns. Journal of Experimental Psychology, 61(5):379. → pages9Liberman, A. M. and Mattingly, I. G. (1985). The motor theory of speechperception revised. Cognition, 21(1):1–36. → pages 86Liberman, A. M. and Whalen, D. H. (2000). On the relation of speech tolanguage. Trends in Cognitive Sciences, 4(5):187–196. → pages 86157Lim, S.-j. and Holt, L. L. (2011). Learning foreign sounds in an alien world:Videogame training improves non-native speech categorization. CognitiveScience, 35(7):1390–1405. → pages 17, 129, 130Lin, S. S. (2011). Production and perception of prosodically varyinginter-gestural timing in American English laterals. PhD thesis, The Universityof Michigan. → pages 4Lin, Y. (2005). Learning features and segments from waveforms: A statisticalmodel of early phonological acquisition. PhD thesis, University of CaliforniaLos Angeles. → pages 20Lindblom, B. E. and Studdert-Kennedy, M. (1967). On the role of formanttransitions in vowel recognition. The Journal of the Acoustical Society ofAmerica, 42(4):830–843. → pages 85, 86, 99, 107Lisker, L. (1986). 
“Voicing” in English: a catalogue of acoustic features signaling/b/ vs. /p/ in trochees. Language and Speech, 29(1):3–11. → pages 84Lively, S. E., Logan, J. S., and Pisoni, D. B. (1993). Training Japanese listeners toidentify English /r/ and /l/. II: The role of phonetic environment and talkervariability in learning new perceptual categories. The Journal of the AcousticalSociety of America, 94(3):1242–1255. → pages 16Lively, S. E., Pisoni, D. B., Yamada, R. A., Tohkura, Y., and Yamada, T. (1994).Training Japanese listeners to identify English /r/ and /l/. III: Long-termretention of new phonetic categories. The Journal of the Acoustical Society ofAmerica, 96(4):2076–2087. → pages 16, 17Lo¨fqvist, A. and Yoshioka, H. (1981). Interarticulator programming in obstruentproduction. Phonetica, 38(1-3):21–34. → pages 136Logan, J. S., Lively, S. E., and Pisoni, D. B. (1991). Training Japanese listeners toidentify English /r/ and /l/: A first report. The Journal of the Acoustical Societyof America, 89(2):874–886. → pages 16Lotto, A. J. and Kluender, K. R. (1998). General contrast effects in speechperception: Effect of preceding liquid on stop consonant identification.Perception & Psychophysics, 60(4):602–619. → pages 87, 101, 137Lotto, A. J., Kluender, K. R., and Holt, L. L. (1997). Perceptual compensation forcoarticulation by Japanese quail (Coturnix coturnix japonica). The Journal ofthe Acoustical Society of America, 102(2):1134–1140. → pages 87, 101158Lotto, A. J., Sato, M., and Diehl, R. L. (2004). Mapping the task for the secondlanguage learner: the case of Japanese acquisition of /r/ and /l/. In Slifka, J.,Manuel, S., and Matthies, M., editors, Proceedings of From Sound to Sense:50+ Years of Discoveries in Speech Communication, pages C181–C186. →pages 16Lu, Y.-a. (2011). The psychological reality of phonological representations: Thecase of Mandarin fricatives. In Zhuo, J.-S., editor, Proceedings of the 23rdNorth American Conference on Chinese Linguistics, volume 1, pages 251–226.→ pages 34Macmillan, N. A. and Creelman, C. D. (2004). Detection theory: A user’s guide.Lawrence Erlbaum Associate, Inc., Publishers, Mahwah, 2nd edition. → pages57, 58, 79Mann, V. A. (1980). Influence of preceding liquid on stop-consonant perception.Perception & Psychophysics, 28(5):407–412. → pages 85, 86, 87, 88, 99, 107,116Mann, V. A. (1986). Distinguishing universal and language-dependent levels ofspeech perception: Evidence from Japanese listeners’ perception of English “l”and “r”. Cognition, 24(3):169–196. → pages 100Mann, V. A. and Repp, B. H. (1980). Influence of vocalic context on perception ofthe [S]-[s] distinction. Perception & Psychophysics, 28(3):213–228. → pages85, 99Marslen-Wilson, W. and Zwitserlood, P. (1989). Accessing spoken words: Theimportance of word onsets. Journal of Experimental Psychology: HumanPerception and Performance, 15(3):576. → pages 8Martin, A., Peperkamp, S., and Dupoux, E. (2013). Learning phonemes with aproto-lexicon. Cognitive Science, 37(1):103–124. → pages 127Maye, J. and Gerken, L. (2000). Learning phonemes without minimal pairs. InHowell, C., Fish, S., and Keith-Lucas, T., editors, Proceedings of the 24thAnnual Boston University Conference on Language Development, volume 2,pages 522–533, Somerville. Cascadilla Press. → pages 21, 32, 55, 89Maye, J. and Gerken, L. (2001). Learning phonemes: How far can the input takeus. In Domı´nguez, L. 
and Johansen, A., editors, Proceedings of the 25th annualBoston University Conference on Language Development, volume 1, pages480–490, Somerville. Cascadilla Press. → pages 21, 23159Maye, J. and Weiss, D. (2003). Statistical cues facilitate infants’ discrimination ofdifficult phonetic contrasts. In Beachley, B., Brown, A., and Conlin, F., editors,Proceedings of the 27th annual Boston University Conference on LanguageDevelopment, volume 2, pages 508–518, Somerville. Cascadilla Press. →pages 23Maye, J., Weiss, D. J., and Aslin, R. N. (2008). Statistical phonetic learning ininfants: Facilitation and feature generalization. Developmental Science,11(1):122–134. → pages 20, 23Maye, J., Werker, J. F., and Gerken, L. (2002). Infant sensitivity to distributionalinformation can affect phonetic discrimination. Cognition, 82(3):B101–B111.→ pages xi, 1, 2, 20, 21Maye, J. C. (2000). Learning speech sound categories from statisticalinformation. PhD thesis, University of Arizona. → pages 2, 18, 21, 23, 32, 62,63, 64, 84McGuire, G. L. (2007a). English listeners’ perception of Polish alveopalatal andretroflex voiceless sibilants: A pilot study. UC Berkeley Phonology Lab AnnualReport, pages 391–415. → pages 40, 44, 90McGuire, G. L. (2007b). Phonetic category learning. PhD thesis, The Ohio StateUniversity. → pages 44, 90McGuire, G. L. (2008). Selective attention and English listeners’ perceptuallearning of the Polish post-alveolar sibilant contrast. Unpablished manuscript.→ pages 40, 90McMurray, B., Aslin, R. N., and Toscano, J. C. (2009). Statistical learning ofphonetic categories: Insights from a computational approach. DevelopmentalScience, 12(3):369–378. → pages 20McMurray, B. and Jongman, A. (2011). What information is necessary for speechcategorization? Harnessing variability in the speech signal by integrating cuescomputed relative to expectations. Psychological Review, 118(2):219. → pages34Miyawaki, K., Jenkins, J. J., Strange, W., Liberman, A. M., Verbrugge, R., andFujimura, O. (1975). An effect of linguistic experience: The discrimination of[r] and [l] by native speakers of Japanese and English. Perception &Psychophysics, 18(5):331–340. → pages 16, 100160Mochizuki, M. (1981). The identification of /r/ and /l/ in natural and synthesizedspeech. Journal of Phonetics, 9:283–303. → pages 16Moore, B. C. and Glasberg, B. R. (1983). Suggested formulae for calculatingauditory-filter bandwidths and excitation patterns. The Journal of theAcoustical Society of America, 74(3):750–753. → pages 43Moray, N. and Taylor, A. (1958). The effect of redundancy in shadowing one oftwo dichotic messages. Language and Speech, 1(2):102–109. → pages 8Moreton, E. (2008). Analytic bias and phonological typology. Phonology,25(1):83–127. → pages 71Moreton, E. (2012). Inter- and intra-dimensional dependencies in implicitphonotactic learning. Journal of Memory and Language, 67(1):165–183. →pages 72Moreton, E. and Pater, J. (2012a). Structure and substance in artificial-phonologylearning, part I: Structure. Language and Linguistics Compass, 6(11):686–701.→ pages 67, 69, 71, 72Moreton, E. and Pater, J. (2012b). Structure and substance in artificial-phonologylearning, part II: Substance. Language and Linguistics Compass,6(11):702–718. → pages 67, 71, 72Moulines, E. and Charpentier, F. (1990). Pitch-synchronous waveform processingtechniques for text-to-speech synthesis using diphones. SpeechCommunication, 9(5):453–467. → pages 44Nakatani, L. H. and Dukes, K. D. (1977). Locus of segmental cues for wordjuncture. 
The Journal of the Acoustical Society of America, 62(3):714–719. →pages 7Nazzi, T., Bertoncini, J., and Bijeljac-Babic, R. (2009). A perceptual equivalent ofthe labial-coronal effect in the first year of life. The Journal of the AcousticalSociety of America, 126(3):1440–1446. → pages 25Newport, E. L. (1988). Constraints on learning and their role in languageacquisition: Studies of the acquisition of American sign language. LanguageSciences, 10(1):147–172. → pages 23Newport, E. L. (1990). Maturational constraints on language learning. CognitiveScience, 14(1):11–28. → pages 23161Newport, E. L. and Aslin, R. N. (2004). Learning at a distance I: Statisticallearning of non-adjacent dependencies. Cognitive Psychology, 48(2):127–162.→ pages 67, 68, 73, 83Nittrouer, S. (1992). Age-related differences in perceptual effects of formanttransitions within syllables and across syllable boundaries. Journal ofPhonetics, 20(3):351–382. → pages 36Nittrouer, S. (2002). Learning to perceive speech: How fricative perceptionchanges, and how it stays the same. The Journal of the Acoustical Society ofAmerica, 112(2):711–719. → pages 36Nittrouer, S. and Miller, M. E. (1997a). Developmental weighting shifts for noisecomponents of fricative-vowel syllables. The Journal of the Acoustical Societyof America, 102(1):572–580. → pages 36Nittrouer, S. and Miller, M. E. (1997b). Predicting developmental shifts inperceptual weighting schemes. The Journal of the Acoustical Society ofAmerica, 101(4):2253–2266. → pages 36Nittrouer, S. and Studdert-Kennedy, M. (1987). The role of coarticulatory effectsin the perception of fricatives by children and adults. Journal of Speech,Language, and Hearing Research, 30(3):319–329. → pages 36Noguchi, M., Chiu, C., Po-Chun, W., and Yamane, N. (2015a). Uncoveringsibilant fricative merger in Taiwan Mandarin: Evidence from ultrasoundimaging and acoustics. Poster presented at Linguistic Society of America 2015Annual Meeting, Portland. → pages 33, 90, 172Noguchi, M., Chiu, C., Wei, P.-C., and Yamane, N. (2015b). Contrastive tongueshapes of the three sibilant fricatives in Taiwan Mandarin read speech.Canadian Acoustics, 43(3). → pages 172Noguchi, M. and Hudson Kam, C. L. (2014a). Learning phonetic categories withphonotactics: The influence of predictability and phonetic naturalness. Posterpresented at The 39th Annual Boston University Conference on LanguageDevelopment, Boston. → pages ivNoguchi, M. and Hudson Kam, C. L. (2014b). Learning sound categories withphonotactics. Poster presented at The 14th Conference on LaboratoryPhonology, Tokyo. → pages iv162Noguchi, M. and Hudson Kam, C. L. (2015a). Categorical perception ofpost-alveolar sibilants by Taiwan and Beijing Mandarin speakers. CanadianAcoustics, 43(3). → pages v, 47, 102Noguchi, M. and Hudson Kam, C. L. (2015b). Learning the context-dependentperception of novel speech sounds. Poster presented at 2015 Annual Meetingon Phonology, Vancouver. → pages ivNowak, P. M. (2006). The role of vowel transitions and frication noise in theperception of Polish sibilants. Journal of Phonetics, 34(2):139–152. → pages35Ong, J. H., Burnham, D., and Escudero, P. (2015). Distributional learning oflexical tones: A comparison of attended vs. unattended listening. PLOS ONE,10(7). → pages 21Onishi, K. H., Chambers, K. E., and Fisher, C. (2002). Learning phonotacticconstraints from brief auditory experience. Cognition, 83(1):B13–B23. →pages 2, 26, 70Pajak, B. (2012). Inductive inference in non-native speech processing andlearning. 
PhD thesis, University of California San Diego. → pages 21, 22, 23,55, 63Pajak, B. and Levy, R. (2012). Distributional learning of L2 phonologicalcategories by listeners with different language backgrounds. In Biller, A. K.,Chung, E. Y., and Kimball, A. E., editors, Proceedings of the 36th BostonUniversity Conference on Language Development, volume 2, pages 400–413,Somerville. Cascadilla Press. → pages 21, 22Pastore, R. E., Layer, J. K., Morris, C. B., and Logan, R. J. (1988). Temporalorder identification for tone/noise stimuli with onset transitions. Perception &Psychophysics, 44(3):257–271. → pages 10Pegg, J. E. and Werker, J. F. (1997). Adult and infant perception of two Englishphones. The Journal of the Acoustical Society of America, 102(6):3742–3753.→ pages 6Peperkamp, S., Le Calvez, R., Nadal, J.-P., and Dupoux, E. (2006a). Theacquisition of allophonic rules: Statistical learning with linguistic constraints.Cognition, 101(3):B31–B41. → pages 2, 4, 24163Peperkamp, S., Pettinato, M., and Dupoux, E. (2003). Allophonic variation andthe acquisition of phoneme categories. In Beachley, B., Brown, A., and Conlin,F., editors, Proceedings of the 27th annual Boston University Conference onLanguage Development, volume 2, pages 650–661, Somerville. CascadillaPress. → pages 6, 7, 26, 27, 32Peperkamp, S., Skoruppa, K., and Dupoux, E. (2006b). The role of phoneticnaturalness in phonological rule acquisition. In Magnitskaia, T. and Colleen,Z., editors, Proceedings of the 30th annual Boston University Conference onLanguage Development, volume 2, pages 464–475, Somerville. CascadillaPress. → pages 70, 73Peterson, G. E. and Barney, H. L. (1952). Control methods used in a study of thevowels. The Journal of the Acoustical Society of America, 24(2):175–184. →pages 18Pierce, J. R. (1980). An Introduction to Information Theory: Symbols, Signals andNoise. Dover Publications, Inc., New York, 2nd edition. → pages 8Pierrehumbert, J. B. (2003). Phonetic diversity, statistical learning, andacquisition of phonology. Language and speech, 46(2-3):115–154. → pages 1,18Pisoni, D. B. (1973). Auditory and phonetic memory codes in the discriminationof consonants and vowels. Perception & Psychophysics, 13(2):253–260. →pages 7, 55Polka, L. and Bohn, O.-S. (1996). A cross-language comparison of vowelperception in English-learning and German-learning infants. The Journal of theAcoustical Society of America, 100(1):577–592. → pages 11Polka, L. and Werker, J. F. (1994). Developmental changes in perception ofnonnative vowel contrasts. Journal of Experimental Psychology: HumanPerception and Performance, 20(2):421. → pages 11Pons, F., Sabourin, L., Cady, J. C., and Werker, J. F. (2006). Distributionallearning in vowel distinctions by 8-month-old English infants. Paper presentedat The 28th Annual Conference of the Cognitive Science Society, Vancouver.→ pages 20Port, R. F. (2010). Rich memory and distributed phonology. Language Sciences,32(1):43–55. → pages 3164Port, R. F. and Leary, A. P. (2005). Against formal phonology. Language,81(4):927–964. → pages 3Proctor, M., Lu, L. H., Zhu, Y., Goldstein, L., Narayanan, S., et al. (2012).Articulation of Mandarin sibilants: A multi-plane realtime MRI study. InProceedings of The 14th Australasian International Conference on SpeechScience and Technology, pages 113–116. → pages 33, 34, 90Pycha, A., Nowak, P., Shin, E., and Shosted, R. (2003). Phonologicalrule-learning and its implications for a theory of vowel harmony. In Garding,G. 
and Tsujimura, M., editors, Proceedings of The 22nd West Coast Conferenceon Formal Linguistics, pages 101–114, Somerville. Cascadilla Press. → pages69, 70R Core Team (2014). R: A Language and Environment for Statistical Computing.R Foundation for Statistical Computing, Vienna. → pages 58, 79, 110Repp, B. H. (1981). Two strategies in fricative discrimination. Perception &Psychophysics, 30(3):217–227. → pages 85, 99Repp, B. H. and Mann, V. A. (1981). Perceptual assessment of fricative–stopcoarticulation. The Journal of the Acoustical Society of America,69(4):1154–1163. → pages 85, 99Rost, G. C. and McMurray, B. (2009). Speaker variability augments phonologicalprocessing in early word learning. Developmental Science, 12(2):339–349. →pages 13, 14, 127Rost, G. C. and McMurray, B. (2010). Finding the signal by adding noise: Therole of noncontrastive phonetic variability in early word learning. Infancy,15(6):608–635. → pages 13, 14, 127Saffran, J. R. (2002). Constraints on statistical language learning. Journal ofMemory and Language, 47(1):172–196. → pages 67Saffran, J. R., Aslin, R. N., and Newport, E. L. (1996a). Statistical learning by8-month-old infants. Science, 274(5294):1926–1928. → pages 2, 8, 67Saffran, J. R., Johnson, E. K., Aslin, R. N., and Newport, E. L. (1999). Statisticallearning of tone sequences by human infants and adults. Cognition,70(1):27–52. → pages 67Saffran, J. R., Newport, E. L., and Aslin, R. N. (1996b). Word segmentation: Therole of distributional cues. Journal of Memory and Language, 35(4):606–621.→ pages 2, 67165Saffran, J. R. and Thiessen, E. D. (2003). Pattern induction by infant languagelearners. Developmental Psychology, 39(3):484–494. → pages 25, 70Schane, S. A., Tranel, B., and Lane, H. (1975). On the psychological reality of anatural rule of syllable structure. Cognition, 3(4):351–358. → pages 30, 68Schneider, W., Eschman, A., and Zuccolotto, A. (2002). E-Prime: User’s guide.Psychology Software Inc. → pages 55, 79, 107Schouten, B., Gerrits, E., and van Hessen, A. (2003). The end of categoricalperception as we know it. Speech Communication, 41(1):71–80. → pages 9Seidl, A. and Buckley, E. (2005). On the learning of arbitrary phonological rules.Language Learning and Development, 1(3-4):289–316. → pages 25, 69Seidl, A. and Cristia, A. (2012). Infants’ learning of phonological status.Frontiers in psychology, 3(448):1–10. → pages ii, 2Seidl, A., Cristia, A., Bernard, A., and Onishi, K. H. (2009). Allophonic andphonemic contrasts in infants’ learning of sound patterns. Language Learningand Development, 5(3):191–202. → pages 2, 14, 24Seitz, A. R., Protopapas, A., Tsushima, Y., Vlahou, E. L., Gori, S., Grossberg, S.,and Watanabe, T. (2010). Unattended exposure to components of speechsounds yields same benefits as explicit auditory training. Cognition,115(3):435–443. → pages 129Seitz, A. R. and Watanabe, T. (2009). The phenomenon of task-irrelevantperceptual learning. Vision Research, 49(21):2604–2610. → pages 129Sekiyama, K. and Tohkura, Y. (1993). Inter-language differences in the influenceof visual cues in speech perception. Journal of Phonetics, 21(4):427–444. →pages 16Shannon, C. E. (1951). Prediction and entropy of printed English. Bell SystemTechnical Journal, 30(1):50–64. → pages 7Shannon, C. E. and Weaver, W. (1949). The mathematical theory of information.University of Illinois Press, Chicago. → pages 7Skoruppa, K., Lambrechts, A., and Peperkamp, S. (2011). The role of phoneticdistance in the acquisition of phonological alternations. 
In Lima, S., Mullin, K.,and Smith, B., editors, Proceedings of The 39the Annual Meeting of the NorthEast Linguistic Society, volume 2, pages 464–475. CreateSpace IndependentPublishing Platform. → pages 70, 73166Smith, R. L. (1979). Adaptation, saturation, and physiological masking in singleauditory-nerve fibers. The Journal of the Acoustical Society of America,65(1):166–178. → pages 87Sproat, R. and Fujimura, O. (1993). Allophonic variation in English /l/ and itsimplications for phonetic implementation. Journal of Phonetics,21(3):291–311. → pages 138Stager, C. L. and Werker, J. F. (1997). Infants listen for more phonetic detail inspeech perception than in word-learning tasks. Nature, 388(6640):381–382. →pages 13, 127Stevens, K. N. and Blumstein, S. E. (1975). Quantal aspects of consonantproduction and perception: A study of retroflex stop consonants. Journal ofPhonetics, 3(4):215–233. → pages 53Stevens, K. N., Li, Z., Lee, C.-Y., and Keyser, S. J. (2004). A note on Mandarinfricatives and enhancement. In Fant, G., Fujisaki, H., Cao, J., and Xu, Y.,editors, From traditional phonology to modern speech processing: Festschriftfor professor Wu Zongji’s 95th birthday, pages 393–403. Foreign LanguageTeaching and Research Press, Beijing. → pages 35Streeter, L. A. (1976). Language perception of 2-mo-old infants shows effects ofboth innate mechanisms and experience. Nature, 259(5538):39–41. → pages 11Summerfield, Q. (1975). How a full account of segmental perception depends onprosody and vice versa. In Cohen, A. and Nooteboom, S. G., editors, Structureand process in speech perception, pages 51–68, Berlin. Springer-Verlag. →pages 85, 99Sussman, H. M., McCaffrey, H. A., and Matthews, S. A. (1991). An investigationof locus equations as a source of relational invariance for stop placecategorization. The Journal of the Acoustical Society of America,90(3):1309–1325. → pages 35Svantesson, J.-O. (1986). Acoustic analysis of Chinese fricatives and affricates.Journal of Chinese Linguistics, 14(1):53–70. → pages 35Swingley, D. (2009). Contributions of infant word learning to languagedevelopment. Philosophical Transactions of the Royal Society B: BiologicalSciences, 364(1536):3617–3632. → pages 127167Takagi, N. (1993). Perception of American English /r/ and /l/ by adult Japaneselearners of English: A unified view. PhD thesis, University of California Irvine.→ pages 16Tees, R. C. and Werker, J. F. (1984). Perceptual flexibility: maintenance orrecovery of the ability to discriminate non-native speech sounds. CanadianJournal of Psychology/Revue canadienne de psychologie, 38(4):579. → pages15Thiessen, E. D. (2007). The effect of distributional information on children’s useof phonemic contrasts. Journal of Memory and Language, 56(1):16–34. →pages 13, 127, 128Thiessen, E. D. (2011a). Domain general constraints on statistical learning. ChildDevelopment, 82(2):462–470. → pages 67Thiessen, E. D. (2011b). When variability matters more than meaning: the effectof lexical forms on use of phonemic contrasts. Developmental Psychology,47(5):1448. → pages 13, 127, 128Toda, M. and Honda, K. (2003). An MRI-based cross-linguistic study of sibilantfricatives. Paper presented at 6th International Seminar on Speech Production,Manly. → pages 33, 34, 37, 90Toscano, J. and McMurray, B. (2010). Cue integration with categories: Astatistical approach to cue weighting and combination in speech perception.Cognitive Science, 34(3):434–464. → pages 21Trehub, S. E. (1976). The discrimination of foreign speech contrasts by infantsand adults. 
Child Development, 47(2):466–472. → pages 11Treisman, A. M. (1960). Contextual cues in selective listening. Quarterly Journalof Experimental Psychology, 12(4):242–248. → pages 8Treisman, A. M. (1964). Verbal cues, language, and meaning in selectiveattention. The American Journal of Psychology, 77(2):206–219. → pages 8Treisman, A. M. (1965). The effects of redundancy and familiarity on translatingand repeating back a foreign and a native language. British Journal ofPsychology, 56(4):369–379. → pages 8Trubetzkoy, N. S. (1969). Principles of phonology. University of California Press,Berkeley. → pages 1, 2, 4168Tsushima, T., Takizawa, O., Sasaki, M., Shiraki, S., Nishi, K., Kohno, M.,Menyuk, P., and Best, C. T. (1994). Discrimination of English /r-l/ and /w-y/ byJapanese infants at 6-12 months: Language-specific developmental changes inspeech perception abilities. In Proceedings of The 3rd InternationalConference on Spoken Language Processing, pages 1695–1698, Yokohama.Acoustical Society of Japan. → pages 11Twaddell, W. F. (1935). On defining the phoneme. Language, 11(1):5–62. →pages 1Vallabha, G. K., McClelland, J. L., Pons, F., Werker, J. F., and Amano, S. (2007).Unsupervised learning of vowel categories from infant-directed speech.Proceedings of the National Academy of Sciences, 104(33):13273–13278. →pages 20Viswanathan, N., Magnuson, J. S., and Fowler, C. A. (2010). Compensation forcoarticulation: Disentangling auditory and gestural theories of perception ofcoarticulatory effects in speech. Journal of Experimental Psychology: HumanPerception and Performance, 36(4):1005–1015. → pages 100Vlahou, E. L., Protopapas, A., and Seitz, A. R. (2012). Implicit training ofnonnative speech stimuli. Journal of Experimental Psychology: General,141(2):363–381. → pages 129, 130Wade, T. and Holt, L. L. (2005). Incidental categorization of spectrally complexnon-invariant auditory stimuli in a computer game task. The Journal of theAcoustical Society of America, 118(4):2618–2633. → pages 129Wagner, A., Ernestus, M., and Cutler, A. (2006). Formant transitions in fricativeidentification: The role of native fricative inventory. The Journal of theAcoustical Society of America, 120(4):2267–2277. → pages 36Waters, R. and Wilson, W. (1976). Speech perception by rhesus monkeys: Thevoicing distinction in synthesized labial and velar stop consonants. Perception& Psychophysics, 19(4):285–289. → pages 10Werker, J. F. and Curtin, S. (2005). Primir: A developmental framework of infantspeech processing. Language Learning and Development, 1(2):197–234. →pages 1Werker, J. F. and Fennell, C. T. (2004). Listening to sounds versus listening towords: Early steps in word learning. In Hall, D. G. and Waxman, S. R., editors,Weaving a lexicon, pages 79–109. The MIT Press, Cambridge. → pages 127169Werker, J. F. and Logan, J. S. (1985). Cross-language evidence for three factors inspeech perception. Perception & Psychophysics, 37(1):35–44. → pages 55Werker, J. F. and Tees, R. C. (1983). Developmental changes across childhood inthe perception of non-native speech sounds. Canadian Journal ofPsychology/Revue canadienne de psychologie, 37(2):278. → pages 11Werker, J. F. and Tees, R. C. (1984). Cross-language speech perception: Evidencefor perceptual reorganization during the first year of life. Infant Behavior andDevelopment, 7(1):49–63. → pages 1, 11Whalen, D. (1981a). Effects of nonessential cues on the perception of english [s]and [ssˇ]. The Journal of the Acoustical Society of America, 69(S1):S94–S94.→ pages 36Whalen, D. H. (1981b). 
Effects of vocalic formant transitions and vowel quality on the English [s]–[š] boundary. The Journal of the Acoustical Society of America, 69(1):275–282. → pages 36

Whalen, D. H. (1991). Perception of the English /s/–/ʃ/ distinction relies on fricative noises and transitions, not on brief spectral slices. The Journal of the Acoustical Society of America, 90(4):1776–1785. → pages 36

Whalen, D. H., Best, C. T., and Irwin, J. R. (1997). Lexical effects in the perception and production of American English /p/ allophones. Journal of Phonetics, 25(4):501–528. → pages 6, 7

White, J. and Sundara, M. (2014). Biased generalization of newly learned phonological alternations by 12-month-old infants. Cognition, 133(1):85–90. → pages 72, 141

White, K. S., Peperkamp, S., Kirk, C., and Morgan, J. L. (2008). Rapid acquisition of phonological alternations by infants. Cognition, 107(1):238–265. → pages 25, 26, 140

Wickens, C. D. (1981). Processing resources in attention, dual task performance, and workload assessment (technical report EPL–81–3/ONR–81–3). Engineering-Psychology Research Laboratory, University of Illinois at Urbana-Champaign, Urbana-Champaign. → pages 8

Wilde, L. (1993). Inferring articulatory movements from acoustic properties at fricative-vowel boundaries. The Journal of the Acoustical Society of America, 94(3):1881–1881. → pages 34, 35

Wilson, C. (2003). Experimental investigation of phonological naturalness. In Garding, G. and Tsujimura, M., editors, Proceedings of The 22nd West Coast Conference on Formal Linguistics, pages 533–546, Somerville. Cascadilla Press. → pages 30, 70, 71, 73

Wilson, C. (2006). Learning phonology with substantive bias: An experimental and computational study of velar palatalization. Cognitive Science, 30(5):945–982. → pages 30, 51, 71, 72, 73, 83, 132

Yamada, R. A. and Tohkura, Y. (1992). The effects of experimental variables on the perception of American English /r/ and /l/ by Japanese listeners. Perception & Psychophysics, 52(4):376–392. → pages 16

Yeung, H. H. and Werker, J. F. (2009). Learning words' sounds before learning how words sound: 9-month-olds use distinct objects as cues to categorize speech information. Cognition, 113(2):234–243. → pages 127

Yoshida, K. A., Pons, F., Maye, J., and Werker, J. F. (2010). Distributional phonetic learning at 10 months of age. Infancy, 15(4):420–433. → pages 20, 22

Yoshioka, H., Löfqvist, A., and Hirose, H. (1981). Laryngeal adjustments in the production of consonant clusters and geminates in American English. The Journal of the Acoustical Society of America, 70(6):1615–1623. → pages 136

Zygis, M. and Padgett, J. (2010). A perceptual study of Polish fricatives, and its implications for historical sound change. Journal of Phonetics, 38(2):207–226. → pages 35

Appendix A

Categorical perception of post-alveolar fricatives by native speakers of Mandarin

In order to find the location of a perceptual boundary between the retroflex [ʂa] and alveolopalatal [ɕa], I tested the categorical perception of the syllables from the 10-step continuum between [ʂa] and [ɕa] by native speakers of Mandarin. Previous studies have demonstrated that there are some differences between the regional varieties of Mandarin with respect to the production of the retroflex sibilants.
Specifically, while the phonetic contrast between the dental and retroflex sibilants is maintained in the variety spoken around Beijing (so-called “Standard Mandarin”), the contrast tends to be lost in casual speech in other varieties, including the one spoken in Taiwan (Chang, 2013; Chung, 2006; Noguchi et al., 2015a,b). Since the stimuli used in this study were produced by a speaker from Taiwan, I tested the perception of Mandarin speakers from both Beijing and Taiwan.

A.1 Design

I used an ABX discrimination task as well as an identification task to test the categorical perception of the phonetic contrast between retroflex [ʂ] and alveolopalatal [ɕ]. Stimuli were drawn from the 10-step continuum from retroflex [ʂa] to alveolopalatal [ɕa] (see Section 2.3.2). In the ABX discrimination task, participants compared two test items that were separated by one step (e.g., step 1 vs. step 3). The ISI was 750 ms. In each trial, participants were given a maximum of five seconds to respond, but the trial was terminated as soon as they recorded a response. The ITI was two seconds. In the identification task, participants were asked to label a single test item as either retroflex or alveolopalatal. In each trial, participants were given a maximum of five seconds to respond, but the trial was terminated as soon as they recorded a response. The ITI was two seconds.

A.2 Participants

Ten Taiwan Mandarin speakers and seven Beijing Mandarin speakers participated in the study. All were living in or visiting Vancouver at the time of the study. All participants in the Taiwan Mandarin group self-reported living in Taiwan until adolescence. Similarly, all participants in the Beijing group self-reported living in Beijing until adolescence. Participants were paid $5 for their participation.

A.3 Procedure

All participants did the ABX discrimination task first and the identification task second. Instructions were given in Mandarin in written form. For the ABX discrimination task, participants heard three syllables and were asked to decide whether the last syllable was identical to the first one or the second one. There were eight blocks in this task. Each block contained 32 trials (8 ABX triads in 4 different orders). For the identification task, participants heard a single syllable and were asked to identify it as either retroflex [ʂa] or alveolopalatal [ɕa]. Category labels were shown in Zhuyin for Taiwan Mandarin speakers and in Pinyin for Beijing Mandarin speakers. There were eight blocks in the task, each consisting of 10 trials, one per step on the continuum. Trials within a block were presented in random order.

A.4 Results

For the discrimination task, responses to each unique triad were converted into sensitivity scores (d′). Figure A.1 shows the mean d′ scores by pair and language group. Participants in both groups show the highest sensitivity for the trials comparing step 5 and step 7. A repeated-measures ANOVA was conducted with d′ scores as the dependent variable, language group as a between-participant factor, and pair as a within-participant factor. The analysis yielded a significant effect of pair [F(7,105) = 21.45, p < 0.001] but not of language group [F(1,15) = 1.68, p = 0.21]. There was no interaction between the two factors [F(7,105) = 1.02, p = 0.41].

[Figure A.1: Mean d′ scores (with 95% CI), plotted by pair (1-3 through 8-10) and language group (BeiM, TaiM)]
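To make the conversion of ABX responses into d′ concrete, the following is a minimal sketch in R (the software used for the statistical analyses; R Core Team, 2014) of one standard way to do it: treat triads in which X matched the first item as "signal" trials and triads in which X matched the second item as "noise" trials, then compute d′ = z(H) − z(F) from the resulting hit and false-alarm rates. This is a simplified yes/no-style computation rather than one of the ABX-specific models discussed by Macmillan and Creelman (2004), and all data and variable names in it are hypothetical.

    # Hypothetical responses for one participant on one pair (32 trials, as in one block):
    # x_is_a = TRUE if X matched the first (A) item; resp_a = TRUE if the response was "first"
    trials <- data.frame(
      x_is_a = rep(c(TRUE, FALSE), each = 16),
      resp_a = c(rep(TRUE, 14), rep(FALSE, 2),    # mostly correct when X = A
                 rep(TRUE, 3),  rep(FALSE, 13))   # mostly correct when X = B
    )

    # Hit and false-alarm rates, with a 1/(2N) correction to avoid rates of exactly 0 or 1
    bound_rate <- function(k, n) pmin(pmax(k / n, 1 / (2 * n)), 1 - 1 / (2 * n))
    n_a <- sum(trials$x_is_a)
    n_b <- sum(!trials$x_is_a)
    hit <- bound_rate(sum(trials$resp_a & trials$x_is_a), n_a)
    fa  <- bound_rate(sum(trials$resp_a & !trials$x_is_a), n_b)

    # Sensitivity: d' = z(H) - z(F)
    d_prime <- qnorm(hit) - qnorm(fa)
    d_prime

A repeated-measures ANOVA of the kind reported above could then be run on the per-participant d′ scores, e.g., summary(aov(d_prime ~ language * pair + Error(subject / pair), data = scores)), where scores is a hypothetical long-format data frame with one d′ value per participant and pair.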
For the identification task, the proportion of /ɕa/ responses was calculated for each step on the continuum. Figure A.2 shows the mean proportion of /ɕa/ responses by step and language group. For both groups, the proportion of /ɕa/ responses is close to 0 in steps 1-5, jumps to around .5 at step 6, then increases to close to 1 in steps 7-10, indicating a perceptual boundary between retroflex [ʂa] and alveolopalatal [ɕa] at step 6. Note that both Taiwan and Beijing Mandarin speakers show the same identification curve.

[Figure A.2: Proportion of /ɕa/ responses (with 95% CI), plotted by step (1-10) and language group (BeiM, TaiM)]

The results of this study show that both Taiwan and Beijing Mandarin speakers perceive the contrast between retroflex [ʂa] and alveolopalatal [ɕa] in a categorical fashion, and there is no difference between the two language groups in where they place the boundary. Both groups placed the category boundary at step 6.
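The boundary reported above was read directly off the identification curve, but it can also be estimated by fitting a logistic identification function and locating the step at which the fitted probability of a /ɕa/ response crosses .5. The sketch below, again in R, illustrates this with made-up counts that mimic the pattern just described; the data frame and its values are hypothetical, not the actual responses.

    # Hypothetical group-level identification counts (8 responses per step, as in 8 blocks)
    ident <- data.frame(
      step  = 1:10,
      n_alv = c(0, 0, 0, 1, 1, 4, 7, 8, 8, 8),  # number of alveolopalatal (/ɕa/) responses
      n_tot = rep(8, 10)
    )

    # Binomial GLM (logistic identification function)
    fit <- glm(cbind(n_alv, n_tot - n_alv) ~ step, family = binomial, data = ident)

    # The 50% crossover point (category boundary) is -intercept / slope
    b <- coef(fit)
    boundary <- -b[["(Intercept)"]] / b[["step"]]
    boundary   # falls near step 6 for these made-up counts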
